### Linear regression for data with multiple features
Linear regression with sklearn on the auto-mpg dataset

***
#### Environment
`conda activate sklearn-env`

***
#### Goals
- Train a linear regression model with sklearn
- Predict values for the test dataset and compare them with the test labels

***
#### References
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

#### Basic Python imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import random


# Make numpy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)

#### Load the dataset from a CSV file on the UCI website

http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data  
If the URL does not work, the dataset can be loaded from the data folder instead: `./data/auto-mpg.data`.

In [None]:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', comment='\t',
                          sep=' ', skipinitialspace=True)
dataset = raw_dataset.copy()
dataset.tail(2)
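
The fallback to a local copy can be sketched as a small helper that tries each source in turn. This is only an illustration: `load_auto_mpg` and `COLUMN_NAMES` are hypothetical names that do not appear in the notebook, and the demo uses a missing path plus two in-memory rows written in the auto-mpg layout so that it runs without network access.

```python
import io

import pandas as pd

COLUMN_NAMES = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']


def load_auto_mpg(sources):
    """Return a DataFrame from the first source (URL, path, or buffer) that can be read."""
    last_error = None
    for source in sources:
        try:
            return pd.read_csv(source, names=COLUMN_NAMES,
                               na_values='?', comment='\t',
                               sep=' ', skipinitialspace=True)
        except OSError as error:  # covers missing files and failed URL requests
            last_error = error
    raise RuntimeError('none of the sources could be loaded') from last_error


# Demo: the first source does not exist, so the helper falls back to the buffer.
sample = io.StringIO(
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)
demo = load_auto_mpg(['./no-such-file.data', sample])
print(demo)
```

In the notebook itself, the call would presumably look like `load_auto_mpg([url, './data/auto-mpg.data'])`.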

#### Data preparation

Split the data into `training` and `test` datasets, then separate the `MPG` label column from the features.


In [None]:
dataset = dataset.dropna().copy()

train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')
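
To see what the `sample` / `drop` / `pop` pattern does, here is a minimal sketch on a made-up ten-row frame (the values are placeholders, not auto-mpg data). It should show that the two parts are disjoint, together cover every row, and that `pop` removes the label column from the features.

```python
import pandas as pd

# Toy frame with hypothetical values, used only to illustrate the split.
frame = pd.DataFrame({'MPG': range(10), 'Weight': range(100, 110)})

# 80% of the rows are drawn at random; the remaining rows form the test set.
train = frame.sample(frac=0.8, random_state=0)
test = frame.drop(train.index)

# pop() returns the label column and removes it from the feature frame in place.
train_x = train.copy()
train_y = train_x.pop('MPG')

print(len(train), len(test))
print(list(train_x.columns))
```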

#### Train the sklearn linear regression model (on the training dataset)

In [None]:
from sklearn.linear_model import LinearRegression

linear_regressor = LinearRegression().fit(train_features, train_labels)
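
After fitting, the learned weights are exposed through the `coef_` and `intercept_` attributes. The sketch below uses synthetic, noise-free data (an assumption made purely for illustration, with hypothetical variable names) so that the recovered weights can be checked against the known ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: labels are an exact linear function of two features.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
labels = 2.0 * features[:, 0] - 3.0 * features[:, 1] + 5.0

model = LinearRegression().fit(features, labels)

# One coefficient per feature column, plus a single intercept.
print(model.coef_)
print(model.intercept_)
```

On the auto-mpg data, `linear_regressor.coef_` should likewise hold one weight per feature column, in the column order of `train_features`.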

#### Predict values from test dataset and compare with test labels

In [None]:
scored_test = linear_regressor.predict(test_features)
test_dataset['Predicted'] = scored_test
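
Beyond inspecting individual rows, the comparison with the test labels can be summarized in a single number. A minimal sketch, assuming `mean_absolute_error` and `r2_score` from `sklearn.metrics` are suitable metrics here; the label and prediction values below are made up for illustration.

```python
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true labels and predictions, not taken from the auto-mpg data.
true_values = [3.0, -0.5, 2.0, 7.0]
predicted_values = [2.5, 0.0, 2.0, 8.0]

# Mean absolute error: average size of the prediction error, in label units.
mae = mean_absolute_error(true_values, predicted_values)

# R^2: fraction of the label variance explained by the predictions.
r2 = r2_score(true_values, predicted_values)

print(mae, r2)
```

In the notebook, the same calls would presumably take `test_labels` and `scored_test` as arguments.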

In [None]:
test_dataset.sample(10)

In [None]:
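# randrange(n) returns a valid row position in 0..n-1;
# randint(1, n) could return n, which is out of bounds for iloc, and never returns 0.
test_digit = random.randrange(len(scored_test))
test_dataset.iloc[[test_digit]]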