
### Lab 12 Unittest & Linear Regression
---

`unittest` — Unit testing framework

The `unittest` unit testing framework was originally inspired by JUnit and has a similar flavor to the major unit testing frameworks in other languages. It supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework.

To achieve this, `unittest` supports some important concepts in an object-oriented way:

**test fixture**

A test fixture represents the preparation needed to perform one or more tests, and any associated cleanup actions. This may involve, for example, creating temporary or proxy databases, directories, or starting a server process.

**test case**

A test case is the individual unit of testing. It checks for a specific response to a particular set of inputs. `unittest` provides a base class, `TestCase`, which may be used to create new test cases.

**test suite**

A test suite is a collection of test cases, test suites, or both. It is used to aggregate tests that should be executed together.

**test runner**

A test runner is a component which orchestrates the execution of tests and provides the outcome to the user. The runner may use a graphical interface, a textual interface, or return a special value to indicate the results of executing the tests.
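The four concepts above can be seen together in a few lines. The following is a minimal sketch, not part of the lab solution: the class name `TestListFixture` and the list it checks are made up for illustration.

```python
import io
import unittest


class TestListFixture(unittest.TestCase):
    """A test case: each test_* method is one individual unit of testing."""

    def setUp(self):
        # Test fixture: preparation that runs before every test method.
        self.values = [3, 1, 2]

    def tearDown(self):
        # Test fixture: cleanup that runs after every test method.
        self.values.clear()

    def test_sorted_order(self):
        self.assertEqual(sorted(self.values), [1, 2, 3])

    def test_length(self):
        self.assertEqual(len(self.values), 3)


# Test suite: aggregate the tests that should be executed together.
suite = unittest.TestLoader().loadTestsFromTestCase(TestListFixture)

# Test runner: execute the suite and report through a textual interface.
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
print(stream.getvalue())
```

Building the suite and runner explicitly, rather than calling `unittest.main()`, is one way to run tests inside a notebook cell, since it does not try to parse the notebook's command-line arguments or exit the interpreter.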

#### Simple Linear Regression: A Practical Implementation in Python
To build a linear regression model in Python, we’ll follow six steps:

  1. Reading and understanding the data
  2. Data pre-processing
  3. Splitting the data into training and test sets
  4. Fitting the linear regression model to the training set
  5. Predicting test results
  6. Visualizing the test results


### Generating our data
Instead of using a popular sample dataset, let’s generate our own data. Because we choose the parameters ourselves, we will understand the values of the sample data better than if we took a real-life dataset, and we can judge the accuracy of our model by comparing its estimate against the known true value, as you will see in the later sections.

Let’s assume there is only one predictor variable, `x`. In that case the linear relationship has the form:

`y = mx + b + e`

where `m` is the slope and `b` is the intercept. If we normalize our data so that `b = 0`, we get the simplified form of the above equation:

`y = mx + e`

Here, `e` is a random value that represents the irreducible error that occurs with each measurement of `y`.

**Question 1** Let’s write a function to generate this data for us:

In [None]:
def generate_dataset(b, n, std_dev):
  # Here `b` plays the role of the slope `m` in `y = mx + e` (the intercept is taken to be 0)
  # Generate `x` as an array of `n` samples, each taking a value between 0 and 100
  ...
  # Generate the random error `e` as `n` samples drawn from a normal distribution with the
  # standard deviation given by the `std_dev` argument
  ...
  # Calculate `y` according to the equation discussed, then return `x` and `y`
  ...

**Question 2** Create the required number of samples, then separate them into training and test sets:

In [None]:
# Generate the dataset using b = 10, n = 50 and std_dev = 100
x, y = ...

# Take the first 40 samples to train, and the last 10 to test
...
...

### Estimating the coefficient from the data
Now that we have our data, let’s use scikit-learn’s `LinearRegression` model to estimate the coefficient from the raw data using the ordinary least squares method of regression.
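To make "ordinary least squares" concrete: for the no-intercept model `y = mx + e`, the least squares estimate of the slope has the closed form `sum(x*y) / sum(x*x)`. The sketch below uses NumPy only and a tiny hand-made dataset (hypothetical values, not the lab data) and checks the closed form against NumPy's general least squares solver.

```python
import numpy as np

# Tiny hypothetical dataset that roughly follows y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Closed-form least squares slope for the no-intercept model y = m*x + e
m_hat = np.sum(x * y) / np.sum(x ** 2)

# The same estimate from NumPy's general solver; it expects a 2-D design
# matrix, so the 1-D x is reshaped into a single column.
m_lstsq = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]

print(f"closed form: {m_hat:.2f}, lstsq: {m_lstsq:.2f}")  # both about 1.99
```

Note that `LinearRegression()` also fits an intercept by default, so its coefficient can differ slightly from this no-intercept estimate; passing `fit_intercept=False` corresponds to the model sketched here.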

**Question 3** Using linear regression, recover the value of `m`:

In [None]:
# Import the library and create an instance of a simple least squares regression model
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score


model = linear_model.LinearRegression()

# Train the model using the training data that we created
...
# Now that we have trained the model, we can print the coefficient of x that it has estimated
...

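# We then use the model to make predictions based on the test values of x.
# scikit-learn expects a 2-D array of shape (n_samples, n_features), so a 1-D
# x_test needs x_test.reshape(-1, 1) before it is passed to fit() or predict().
y_pred = model.predict(x_test)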

# Now we can calculate the model's error and fit metrics against the actual values of y
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("R^2 statistic: %.2f" % r2_score(y_test, y_pred))



**Question 4** Using `unittest`, test that the estimated `m` is within some percentage of the true `m` that was used to generate the synthetic data. This checks that the regression recovered the slope correctly.
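One way to express a "within some percentage" check is sketched below. It is only an illustration of the assertion style: the helper `within_percent`, the class name, and the numbers `TRUE_M`, `ESTIMATED_M` and `TOLERANCE_PERCENT` are hypothetical stand-ins, not values from the lab.

```python
import unittest


def within_percent(estimate, expected, percent):
    """Return True if `estimate` is within `percent` % of `expected`."""
    return abs(estimate - expected) <= abs(expected) * percent / 100.0


class TestSlopeRecovery(unittest.TestCase):
    # Hypothetical values standing in for the slope used to generate the
    # data and the coefficient reported by a fitted model.
    TRUE_M = 10.0
    ESTIMATED_M = 9.7
    TOLERANCE_PERCENT = 5.0

    def test_slope_within_percent(self):
        self.assertTrue(
            within_percent(self.ESTIMATED_M, self.TRUE_M, self.TOLERANCE_PERCENT)
        )

    def test_slope_with_delta(self):
        # assertAlmostEqual accepts an absolute tolerance through `delta`,
        # so the percentage is converted to an absolute difference first.
        delta = abs(self.TRUE_M) * self.TOLERANCE_PERCENT / 100.0
        self.assertAlmostEqual(self.ESTIMATED_M, self.TRUE_M, delta=delta)


suite = unittest.TestLoader().loadTestsFromTestCase(TestSlopeRecovery)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Because the data contains random error, the tolerance has to be wide enough that the test does not fail by chance; fixing the random seed when generating the data is one way to keep such a test repeatable.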


In [None]:
...
...
