In [None]:
%matplotlib inline


# Linear Regression Example
The example below uses a single feature of the `diabetes` dataset (BMI),
so that the data points can be shown in a two-dimensional plot.
The straight line in the plot shows how linear regression
fits the line that minimizes the
residual sum of squares between the observed responses in the dataset
and the responses predicted by the linear approximation.

The weight (coefficient), bias (intercept), mean squared error and the
coefficient of determination are also calculated.
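
To make "minimizing the residual sum of squares" concrete, here is a minimal sketch that fits a line to a handful of made-up points using the closed-form least-squares formulas for a single feature. The data and variable names are illustrative assumptions and are not part of the diabetes example.

```python
import numpy as np

# Toy data, made up for illustration: y is roughly 2 * x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Closed-form least-squares estimates for one input feature
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()


def rss(w, b):
    """Residual sum of squares of the line y = w * x + b on the toy data."""
    return np.sum((y - (w * x + b)) ** 2)


print("slope:", slope, "intercept:", intercept)
print("RSS of fitted line:", rss(slope, intercept))
# Any other line gives a larger RSS, for example a slightly steeper one
print("RSS of a steeper line:", rss(slope + 0.1, intercept))
```

The `LinearRegression` model used later in this notebook solves the same least-squares problem, so you do not need to write these formulas yourself.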


## Installation

First, let's install the required libraries and then import them:

In [None]:
%pip install matplotlib
%pip install numpy
%pip install scikit-learn
%pip install pandas

In [None]:
# Import the libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

## Viewing the data

Next, let's look at the data we're working with.

In [None]:
# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True, as_frame=True)
diabetes_X.head()

In [None]:
diabetes_y.head()

## Choosing our input

Let's use BMI as our input feature and split the data into training and testing sets.

In [None]:
diabetes_bmi = diabetes_X["bmi"]

# scikit-learn expects a 2D input of shape (n_samples, n_features),
# so turn the 1D BMI column into a single-column 2D array
diabetes_bmi = diabetes_bmi.to_numpy()[:, np.newaxis]

# Split the data into training/testing sets
diabetes_X_train = diabetes_bmi[:-20]
diabetes_X_test = diabetes_bmi[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes_y.to_numpy()[:-20]
diabetes_y_test = diabetes_y.to_numpy()[-20:]
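
If the `np.newaxis` indexing and the negative slices look unfamiliar, the sketch below shows what they do to array shapes. It uses six made-up values and holds out the last two, rather than the real BMI column and the last 20 rows, so the names and numbers here are assumptions for illustration only.

```python
import numpy as np

# Six made-up feature values standing in for the BMI column
values = np.array([0.06, -0.05, 0.04, -0.01, -0.03, 0.02])
print(values.shape)  # 1D array: one axis only

# Add a second axis so each sample becomes a row with one feature
column = values[:, np.newaxis]
print(column.shape)  # 2D array: (n_samples, 1)

# Hold out the last two rows for testing, keep the rest for training
train = column[:-2]
test = column[-2:]
print(train.shape, test.shape)
```

Slicing off the last rows is a simple, deterministic split. It does not shuffle the data, which is fine for this example but worth keeping in mind for datasets that are sorted in some way.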

## Creating the model

In [None]:
# Create linear regression object


# Train the model using the training sets


# Make predictions using the testing set


<details><summary>Click to cheat</summary>

```python
# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

```
</details>

## Evaluating our model

We can print the raw numbers and plot the results!

In [None]:
# The weight
print("Weight:", regr.coef_[0])
# The bias
print("Bias:", regr.intercept_)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(diabetes_y_test, diabetes_y_pred))
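
To see what these two metrics measure, the sketch below computes them by hand with NumPy on a few made-up values. The arrays are illustrative assumptions; the formulas are the standard definitions of mean squared error and the coefficient of determination.

```python
import numpy as np

# Made-up observed and predicted values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# Mean squared error: the average of the squared residuals
mse = np.mean((y_true - y_pred) ** 2)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("Mean squared error: %.3f" % mse)
print("Coefficient of determination: %.3f" % r2)
```

A model that always predicts the mean of `y_true` would score 0 on the coefficient of determination, and a model worse than that would score below 0.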

In [None]:
# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test, color="black")
plt.plot(diabetes_X_test, diabetes_y_pred, color="blue", linewidth=3)

plt.xlabel("BMI")
plt.ylabel("Disease progression after 1 year")

plt.show()