## Tasks:
#### Task 1: Load and Explore the Data

- Use the load_diabetes() dataset from the sklearn library.
- Convert the dataset into a pandas DataFrame with appropriate column names.
- Print the first 5 rows and summary statistics (mean, median, etc.) of the dataset.
- Hint: Use the .DESCR attribute to understand the features of the dataset.

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_diabetes

raw_data = load_diabetes()

df = pd.DataFrame(raw_data.data, columns = raw_data.feature_names)
df['target'] = raw_data.target

print(df.head())
print('')
print(df.describe())

        age       sex       bmi        bp        s1        s2        s3  \
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401   
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412   
2  0.085299  0.050680  0.044451 -0.005671 -0.045599 -0.034194 -0.032356   
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038   
4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142   

         s4        s5        s6  target  
0 -0.002592  0.019908 -0.017646   151.0  
1 -0.039493 -0.068330 -0.092204    75.0  
2 -0.002592  0.002864 -0.025930   141.0  
3  0.034309  0.022692 -0.009362   206.0  
4 -0.002592 -0.031991 -0.046641   135.0  

                age           sex           bmi            bp            s1  \
count  4.420000e+02  4.420000e+02  4.420000e+02  4.420000e+02  4.420000e+02   
mean  -3.634285e-16  1.308343e-16 -8.045349e-16  1.281655e-16 -8.835316e-17   
std    4.761905e-02  4.761905e-02  4.761905e-02  4.761905e-
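
The hint about `.DESCR` and the request for the median can be followed up with a short sketch. It assumes the `as_frame=True` option of `load_diabetes()` as an alternative to building the DataFrame by hand; the names `bunch`, `frame`, and `medians` are illustrative and not part of the cell above.

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects; .frame holds the features plus a "target" column
bunch = load_diabetes(as_frame=True)
frame = bunch.frame

# Print only the first 20 lines of the description to keep the output short
print("\n".join(bunch.DESCR.splitlines()[:20]))

# describe() reports the median as its "50%" row; median() returns it directly
medians = frame.median()
print(medians)
```

The description explains what each of `age`, `sex`, `bmi`, `bp`, and `s1` to `s6` measures, which helps when reading the summary statistics.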

#### Task 2: Feature Scaling

- Use StandardScaler to scale the feature columns.
- Fit the scaler to the independent variables (exclude the target variable, which is the disease progression score) and transform the data.

In [11]:
scaler = StandardScaler()

data = df.drop(columns = 'target')

scaled_data = scaler.fit_transform(data)

print(scaled_data[0:5])

[[ 0.80050009  1.06548848  1.29708846  0.45983993 -0.92974581 -0.73206462
  -0.91245053 -0.05449919  0.41855058 -0.37098854]
 [-0.03956713 -0.93853666 -1.08218016 -0.55351103 -0.17762425 -0.40288615
   1.56441355 -0.83030083 -1.43655059 -1.93847913]
 [ 1.79330681  1.06548848  0.93453324 -0.11921776 -0.95867356 -0.71889748
  -0.68024452 -0.05449919  0.06020733 -0.54515416]
 [-1.87244107 -0.93853666 -0.24377122 -0.77065766  0.25629203  0.52539714
  -0.75764652  0.72130245  0.47707245 -0.19682291]
 [ 0.11317236 -0.93853666 -0.76494435  0.45983993  0.08272552  0.32789006
   0.17117751 -0.05449919 -0.67258161 -0.98056821]]
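
A quick sanity check on the scaled output, as a minimal sketch: after `StandardScaler`, every column should have mean close to 0 and standard deviation close to 1. The dataset description states that the features are already mean-centered and scaled so that each column's sum of squares is 1, so here the scaler should amount to multiplying by the square root of the number of samples. The names `X` and `X_scaled` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

X = load_diabetes().data
X_scaled = StandardScaler().fit_transform(X)

# Column means should be ~0 and (population) standard deviations ~1
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))

# The raw columns are already centered with unit sum of squares,
# so standardizing is equivalent to multiplying by sqrt(n_samples)
same_as_rescaling = np.allclose(X_scaled, X * np.sqrt(X.shape[0]), atol=1e-6)
print(same_as_rescaling)
```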


#### Task 3: Train-Test Split

- Split the dataset into training and test sets using an 80-20 ratio.
- Use X for the independent variables (features) and y for the target variable (disease progression score).

In [12]:
target = df['target']

data_train, data_test, target_train, target_test = train_test_split(scaled_data, target, test_size = 0.2)

print(f'Train Shape: {data_train.shape}')
print(f'Test Shape: {data_test.shape}')

Train Shape: (353, 10)
Test Shape: (89, 10)
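
The split above uses no fixed seed, so the shapes are stable but the rows in each set change from run to run. The sketch below is a variation, not the notebook's method: it uses the `X` and `y` names from the task text, fixes `random_state=42` (an arbitrary choice), and fits the scaler on the training rows only, which is a common way to keep test-set statistics out of preprocessing.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# A fixed seed (42 is arbitrary) makes the 80-20 split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Train Shape: {X_train_scaled.shape}")
print(f"Test Shape: {X_test_scaled.shape}")
```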


#### Task 4: Linear Regression Model

- Build a Linear Regression model using the training data.
- Fit the model to the scaled training data and make predictions on the test set.

In [13]:
lr = LinearRegression()

lr.fit(data_train, target_train)

predictions = lr.predict(data_test)

print(predictions[0:5])

[174.44272692 216.70269417 208.8950362  104.07510453 160.28356605]
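
To make clear what `predict` does for a fitted `LinearRegression`, the sketch below rebuilds the predictions by hand from the learned `coef_` and `intercept_`. It re-creates the data with its own seed (`random_state=0`, an assumption), so its numbers will not match the predictions printed above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=0
)

lr = LinearRegression().fit(X_train, y_train)

# A linear model predicts the weighted sum of the features plus an intercept
manual_predictions = X_test @ lr.coef_ + lr.intercept_
matches = np.allclose(manual_predictions, lr.predict(X_test))
print(matches)
```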


#### Task 5: Model Evaluation

- Evaluate the model’s performance using:
  - Mean Absolute Error (MAE)
  - Mean Squared Error (MSE)
- Print the calculated MAE and MSE values, and briefly assess how well the model performs based on these metrics.

In [15]:
MAE = mean_absolute_error(target_test, predictions)
MSE = mean_squared_error(target_test, predictions)

print(f'MAE = {MAE:.4f}')
print(f'MSE = {MSE:.4f}')

MAE = 46.8962
MSE = 3367.1504
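
MAE and MSE are hard to judge on their own, so one way to approach the requested assessment is to compare them with a naive baseline. The sketch below is an assumption-laden illustration rather than the notebook's evaluation: it uses its own seeded split (`random_state=42`), adds RMSE (the square root of MSE, in the same units as the target) and R², and compares against a `DummyRegressor` that always predicts the training mean.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

estimators = {
    "linear": LinearRegression().fit(X_train, y_train),
    # Baseline: always predict the mean of the training targets
    "baseline": DummyRegressor(strategy="mean").fit(X_train, y_train),
}

results = {}
for name, estimator in estimators.items():
    pred = estimator.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_test, pred),
    }
    print(name, {k: round(v, 4) for k, v in results[name].items()})
```

If the linear model's errors are clearly below the baseline's, the features carry real signal; how far the MAE sits from zero relative to the spread of the target indicates how much variation remains unexplained.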
