# Linear Regression Example - Diabetes dataset

This lesson uses the [diabetes dataset](course_datasets.md#diabetes), a sample dataset bundled with the scikit-learn package.

It is based on the linear regression diabetes tutorial in Section 2.1 of Microsoft's [ML for Beginners course](https://github.com/microsoft/ML-For-Beginners/tree/main/2-Regression/1-Tools).


In [None]:
import matplotlib.pyplot as plt # for charts
import numpy as np
# sklearn is the scikit-learn package
from sklearn import datasets, linear_model, model_selection

Load the data into the variables X and y. X is the 2-D array of independent variables (features), with one row per patient. y is the 1-D vector of target values (the disease progression measure). Both are NumPy arrays.

In [None]:
X, y = datasets.load_diabetes(return_X_y=True)
print(f'type of X: {type(X)}')
print(f'shape of X: {X.shape}')
print(f'first row of X: {X[0]}') 
print(f'type of y: {type(y)}') 
print(f'shape of y: {y.shape}')
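
To check which column holds which measurement, the feature names can be read from the object that `load_diabetes()` returns when `return_X_y` is left at its default. This is a minimal sketch; the variable names `diabetes` and `feature_names` are chosen here for illustration only.

```python
from sklearn import datasets

# Without return_X_y=True, load_diabetes() returns a Bunch holding the arrays plus metadata
diabetes = datasets.load_diabetes()
feature_names = diabetes.feature_names

# Print each column index next to its feature name
for index, name in enumerate(feature_names):
    print(index, name)
```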


Aside: an example of indexing NumPy arrays. Select all rows, but only the third column (index 2, the BMI value), from the independent variables.

In [None]:
X_BMI = X[:, 2]
print(f'shape of X_BMI: {X_BMI.shape}')
print(f'first element of X_BMI: {X_BMI[0]}') 
print(f'first three elements of X_BMI: {X_BMI[0:3]}') 
print(f'number of dimensions of X_BMI: {X_BMI.ndim}') 
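
Indexing a column with a plain integer, as above, gives a 1-D array. scikit-learn estimators expect a 2-D feature array, so a single column usually has to keep its column axis before it is passed to `fit()`. The sketch below shows the difference on a small made-up array (`demo` is not part of the lesson's data).

```python
import numpy as np

# Made-up 4x3 array standing in for a feature matrix
demo = np.arange(12).reshape(4, 3)

col_1d = demo[:, 2]                      # integer index drops the column axis
col_2d = demo[:, 2:3]                    # a slice keeps the column axis
col_2d_alt = demo[:, 2].reshape(-1, 1)   # reshape restores it explicitly

print(col_1d.shape)
print(col_2d.shape)
```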



In [None]:
# Plot the BMI values against the target values
plt.scatter(X_BMI, y)
plt.title('Disease Progression vs BMI')
plt.xlabel('Scaled BMI')
plt.ylabel('Disease Progression')
plt.show()

Split the data into training and test sets.

In [None]:
# Hold out 33% of the rows for testing; the split is random, so results vary between runs
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
print(f"shape of X_train: {X_train.shape}  y_train: {y_train.shape}")
print(f"first row of X_train: {X_train[0]}  y_train: {y_train[0]}")
print(f"number of dimensions of X_train: {X_train.ndim}  y_train: {y_train.ndim}")
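
Because the split shuffles the rows randomly, every run of the cell above produces different training and test sets. If repeatable results are wanted, `train_test_split` accepts a `random_state` seed. A sketch, with the seed value `0` chosen arbitrarily:

```python
from sklearn import datasets, model_selection

X, y = datasets.load_diabetes(return_X_y=True)

# The same random_state gives the same shuffle, so both calls return identical splits
split_a = model_selection.train_test_split(X, y, test_size=0.33, random_state=0)
split_b = model_selection.train_test_split(X, y, test_size=0.33, random_state=0)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
X_train_b, X_test_b, y_train_b, y_test_b = split_b

print(X_train_a.shape, X_test_a.shape)
print((X_train_a == X_train_b).all())
```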


Create a linear regression model and train it with the training data set.

In [None]:
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
print(f'\nmodel.coef_:\n {model.coef_}, \nmodel.intercept_:\n {model.intercept_}, \nmodel:\n {model}')
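
`LinearRegression` fits its coefficients by ordinary least squares. As a rough illustration of what that means, the same kind of fit can be reproduced with NumPy on a tiny made-up dataset. The points below are invented for the example and are not taken from the diabetes data.

```python
import numpy as np

# Made-up points lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y_line = 2.0 * x + 1.0

# Design matrix: one column for x, one column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])

# The least-squares solution minimizes the sum of squared residuals
solution, residuals, rank, singular_values = np.linalg.lstsq(A, y_line, rcond=None)
slope, intercept = solution

print(slope, intercept)
```

Here `slope` plays the role of `model.coef_` and `intercept` the role of `model.intercept_`.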

Generate predictions for the test data set with the model's predict() method.

In [None]:
y_pred = model.predict(X_test)
print(f'shape of y_pred: {y_pred.shape}')
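
For a fitted linear model, `predict()` amounts to multiplying each row of features by the coefficients and adding the intercept. The sketch below checks this on synthetic data; `X_demo`, `y_demo`, and `demo_model` are made-up names, and the true weights are arbitrary.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data: 20 rows, 3 features, a known linear relationship plus small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=20)

demo_model = linear_model.LinearRegression().fit(X_demo, y_demo)

by_method = demo_model.predict(X_demo)                        # library prediction
by_hand = X_demo @ demo_model.coef_ + demo_model.intercept_   # same formula written out

print(np.allclose(by_method, by_hand))
```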

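Create a scatter plot of the BMI column of X_test against disease progression, with the actual y values in black and the predicted values in green. The model was trained on all ten features, so the predictions do not fall on a single line when plotted against BMI alone.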

In [None]:
plt.scatter(X_test[:, 2], y_test, color='black')  # actual values
plt.scatter(X_test[:, 2], y_pred, color='green')  # predicted values
plt.xlabel('Scaled BMI')
plt.ylabel('Disease Progression')
plt.title('Disease Progression vs BMI: Actual (black) and Predicted (green)')
plt.show()

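A very quick evaluation of the model. For a regression model, score() returns the coefficient of determination (R²) rather than a classification accuracy; a value of 1.0 would be a perfect fit.

In [None]:
r2 = model.score(X_test, y_test)
r2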
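
The number that `score()` reports for a regressor is R², which compares the model's squared error with the squared error of always predicting the mean of y. A hand computation of the formula 1 - SS_res / SS_tot on made-up numbers (these are not values from the diabetes data):

```python
import numpy as np

# Made-up actual and predicted values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)           # squared error of the predictions
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared error of predicting the mean
r_squared = 1.0 - ss_res / ss_tot

print(round(r_squared, 4))
```

A model that is worse than predicting the mean gives a negative value, so R² is not bounded below by zero.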

END OF TUTORIAL