# TESTING ALL THE REGRESSION ALGORITHMS ON A SINGLE DATA SET

Steps Involved:
1. Importing the required libraries
2. Identifying the dataset type (is it a class or a function?)
3. Identifying the location of the dataset (GitHub or local repository)
4. Viewing the dataset contents to extract the keys of the dictionary
5. Defining an object to access the contents of the dataset
6. Printing the description of the dataset (DESCR attribute)
7. Creating a DataFrame from the existing dictionary of data
8. Importing the metrics for regression
9. Importing all the regression algorithms
10. Dividing the dataset into features (X) and target variable (y)
11. Splitting the data into training and testing sets
12. Standardizing the dataset (using StandardScaler)
13. Creating a dictionary of machine learning regression models
14. Defining a function to evaluate each model
15. Defining a dictionary to store the results
16. Applying the data to each regression algorithm (scaled data is supplied to SVR and the neural network)
17. Converting the results into a DataFrame for better readability

In [None]:
# Importing the required libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine # Search in google "sklearn datasets", then import your preferred dataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [None]:
print(type(load_wine())) # Checking the type of the object returned by load_wine()
# load_wine is a function; it returns a Bunch (a dictionary-like object), so we store it in a variable to access its contents

<class 'sklearn.utils._bunch.Bunch'>


In [None]:
# type(load_wine()) # To identify the location of the dataset: it was found in the scikit-learn repository on GitHub
# # In function signatures, the term "kwargs" stands for "keyword arguments", and
# # the double asterisks (**) allow you to pass a dictionary of arguments to the function.

In [None]:
print(load_wine()) # Printing the contents of the dataset
# We identify the key names of the dictionary: data - the feature values, DESCR - description of the data, feature_names - names of all columns of the dataset

{'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
        1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
        8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
        8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
        5.600e+02]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
 

In [None]:
wine = load_wine() # Storing the returned Bunch in a variable to access its contents
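The object returned by `load_wine()` is a `Bunch`, which behaves like a dictionary whose keys can also be read as attributes. The following is a minimal sketch of that behaviour, assuming a scikit-learn version in which `Bunch` subclasses `dict`; the variable name `wine_bunch` is only illustrative and is not part of the notebook.

```python
from sklearn.datasets import load_wine

# load_wine is a function; calling it returns a Bunch object
wine_bunch = load_wine()

# A Bunch can be used like a dictionary...
print(isinstance(wine_bunch, dict))
print(list(wine_bunch.keys()))

# ...and each key is also reachable as an attribute
print(wine_bunch["data"].shape)
print(wine_bunch.data.shape)
```

This is why the later cells can write `wine.data`, `wine.target` and `wine.DESCR` instead of `wine["data"]` and so on.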

In [None]:
wine

{'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
         1.065e+03],
        [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
         1.050e+03],
        [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
         1.185e+03],
        ...,
        [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
         8.350e+02],
        [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
         8.400e+02],
        [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
         5.600e+02]]),
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

In [None]:
print(wine.DESCR)  # To print the description of the dataset

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

:Number of Instances: 178
:Number of Attributes: 13 numeric, predictive attributes and the class
:Attribute Information:
    - Alcohol
    - Malic acid
    - Ash
    - Alcalinity of ash
    - Magnesium
    - Total phenols
    - Flavanoids
    - Nonflavanoid phenols
    - Proanthocyanins
    - Color intensity
    - Hue
    - OD280/OD315 of diluted wines
    - Proline
    - class:
        - class_0
        - class_1
        - class_2

:Summary Statistics:

                                Min   Max   Mean     SD
Alcohol:                      11.0  14.8    13.0   0.8
Malic Acid:                   0.74  5.80    2.34  1.12
Ash:                          1.36  3.23    2.36  0.27
Alcalinity of Ash:            10.6  30.0    19.5   3.3
Magnesium:                    70.0 162.0    99.7  14.3
Total Phenols:                0.98  3.88    2.29  0.63
Flavanoids:                   0.34  5.08    2.03  1.00

In [None]:
# To view the dataset in dataframe format
pd.DataFrame(data=wine.data, columns=wine.feature_names)

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
0,14.23,1.71,2.43,15.6,127.0,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065.0
1,13.20,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050.0
2,13.16,2.36,2.67,18.6,101.0,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185.0
3,14.37,1.95,2.50,16.8,113.0,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480.0
4,13.24,2.59,2.87,21.0,118.0,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,13.71,5.65,2.45,20.5,95.0,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740.0
174,13.40,3.91,2.48,23.0,102.0,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750.0
175,13.27,4.28,2.26,20.0,120.0,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835.0
176,13.17,2.59,2.37,20.0,120.0,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840.0
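As a possible shortcut for the DataFrame construction above, `load_wine` accepts an `as_frame=True` argument in recent scikit-learn releases (assumed here to be 0.23 or later). The sketch below shows what that option returns; the names `wine_frame`, `features_df` and `full_df` are illustrative only.

```python
from sklearn.datasets import load_wine

# Ask scikit-learn to return pandas objects instead of NumPy arrays
wine_frame = load_wine(as_frame=True)

features_df = wine_frame.data   # DataFrame with the 13 feature columns
full_df = wine_frame.frame      # the same features plus a "target" column

print(features_df.shape)
print(full_df.columns[-1])
```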


In [None]:
# Importing the metrics (measurements) for regression algorithms
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [None]:
# Regression algorithms
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor  # Multi-layer perceptron regressor (MLP)

In [None]:
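# Separate the wine dataset into features (X) and target (y)
# Note: the target is the wine class label (0, 1 or 2), which is treated here as a numeric value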
X = pd.DataFrame(wine.data, columns=wine.feature_names)
y = pd.Series(wine.target)

In [None]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
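To make the effect of `test_size=0.2` and `random_state=42` concrete, the sketch below repeats the split on the raw arrays and checks the resulting sizes. It uses `load_wine(return_X_y=True)` purely for brevity; the `_demo` and shortened variable names are illustrative and not part of the notebook.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Features and target as plain NumPy arrays
X_demo, y_demo = load_wine(return_X_y=True)

# 20% of the 178 rows go to the test set, the rest to the training set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print((X_te == X_te2).all())
```

With 178 rows, this leaves 142 rows for training and 36 for testing.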

In [None]:
X_train

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
158,14.34,1.68,2.70,25.0,98.0,2.80,1.31,0.53,2.70,13.00,0.57,1.96,660.0
137,12.53,5.51,2.64,25.0,96.0,1.79,0.60,0.63,1.10,5.00,0.82,1.69,515.0
98,12.37,1.07,2.10,18.5,88.0,3.52,3.75,0.24,1.95,4.50,1.04,2.77,660.0
159,13.48,1.67,2.64,22.5,89.0,2.60,1.10,0.52,2.29,11.75,0.57,1.78,620.0
38,13.07,1.50,2.10,15.5,98.0,2.40,2.64,0.28,1.37,3.70,1.18,2.69,1020.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
71,13.86,1.51,2.67,25.0,86.0,2.95,2.86,0.21,1.87,3.38,1.36,3.16,410.0
106,12.25,1.73,2.12,19.0,80.0,1.65,2.03,0.37,1.63,3.40,1.00,3.17,510.0
14,14.38,1.87,2.38,12.0,102.0,3.30,3.64,0.29,2.96,7.50,1.20,3.00,1547.0
92,12.69,1.53,2.26,20.7,80.0,1.38,1.46,0.58,1.62,3.05,0.96,2.06,495.0


In [None]:
y_train

Unnamed: 0,0
158,2
137,2
98,1
159,2
38,0
...,...
71,1
106,1
14,0
92,1


In [None]:
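# Standardize the dataset (used here only for SVR and the neural network, which tend to give better results on scaled data)
# The scaler is fitted on the training set only and then applied unchanged to the test set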
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
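What `fit_transform` and `transform` do here can be reproduced by hand: each column has the training mean subtracted and is divided by the training standard deviation. The following sketch checks this on a tiny made-up array (the numbers and the names `train`, `test` and `demo_scaler` are illustrative, not taken from the wine data).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny illustrative arrays: 3 training rows, 1 test row, 2 features
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
test = np.array([[2.0, 30.0]])

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean/std from train
test_scaled = demo_scaler.transform(test)        # reuses the train statistics

# Manual equivalent using the training mean and population std (ddof=0)
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_train = (train - mean) / std
manual_test = (test - mean) / std

print(np.allclose(train_scaled, manual_train))
print(np.allclose(test_scaled, manual_test))
```

Reusing the training statistics on the test set keeps information about the test data out of the preprocessing step.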

In [None]:
print(X_train_scaled)

[[ 1.66529275 -0.60840587  1.21896194 ... -1.65632857 -0.87940904
  -0.24860607]
 [-0.54952506  2.7515415   1.00331502 ... -0.58463272 -1.25462095
  -0.72992237]
 [-0.74531007 -1.14354109 -0.93750727 ...  0.35845962  0.2462267
  -0.24860607]
 ...
 [ 1.714239   -0.44172441  0.06884503 ...  1.04434496  0.56585166
   2.69572196]
 [-0.35374006 -0.7399965  -0.36244882 ...  0.01551695 -0.74044166
  -0.79631083]
 [-0.78201975  0.06709269  0.35637426 ... -0.67036839  1.09392769
  -0.98551793]]


In [None]:
print(X_test_scaled)

[[ 8.08733375e-01  6.37318741e-01  7.15785791e-01 -1.24128036e+00
   1.06556722e+00  6.46636689e-01  1.02724237e+00 -1.54932094e+00
   8.93605295e-02  1.82522280e-02  1.55169482e-02  1.06613421e+00
   3.65487151e-01]
 [ 1.50621744e+00  1.46195334e+00  2.84491948e-01 -1.66513218e-01
   7.23080697e-01  8.82684015e-01  6.47480801e-01 -5.32234717e-01
  -6.15594759e-01  7.85270273e-02 -3.70293555e-01  1.02444400e+00
   1.14555151e+00]
 [-6.00625639e-02  3.82910194e-01  1.21896194e+00  4.43489751e-01
  -3.04378866e-01 -1.17879597e+00 -1.50117016e+00  1.26722552e+00
  -1.47529633e+00 -1.97014913e-01 -7.98971892e-01 -3.93023224e-01
  -4.47771435e-01]
 [ 9.18862438e-01 -7.66314624e-01  1.21896194e+00  8.79206158e-01
   3.81076551e-02  1.11873134e+00  1.24710433e+00 -6.10472119e-01
   1.32733079e+00  2.76572797e-01  1.00147712e+00  1.62846276e-01
   1.82603318e+00]
 [-7.45310065e-01 -1.05581401e+00 -1.58444804e+00  3.68211048e-02
  -1.53733034e+00 -2.81816127e-01 -2.11135044e-03 -7.66946921e-01


In [None]:
# Dictionary of regression models to apply (key: model name, value: estimator object)
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Lasso Regression": Lasso(),
    "ElasticNet Regression": ElasticNet(),
    "Decision Tree": DecisionTreeRegressor(),
    "Random Forest": RandomForestRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(),
    "Support Vector Regressor": SVR(),
    "K-Nearest Neighbors": KNeighborsRegressor(),
    "Neural Network": MLPRegressor(max_iter=1000)
}
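Several of the models in this dictionary (decision tree, random forest, gradient boosting, neural network) involve randomness, so their scores can change from run to run. One possible way to make a run repeatable is to pass `random_state` to those estimators. This is an addition that the notebook itself does not use; the sketch below illustrates it with a small random forest and a hypothetical helper named `forest_predictions`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

def forest_predictions(seed):
    """Fit a small forest with a fixed seed and return its test predictions."""
    forest = RandomForestRegressor(n_estimators=20, random_state=seed)
    return forest.fit(X_tr, y_tr).predict(X_te)

# Two independent fits with the same seed give identical predictions
pred_a = forest_predictions(0)
pred_b = forest_predictions(0)
print(np.array_equal(pred_a, pred_b))
```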

In [None]:
# Function to evaluate model performance
def evaluate_model(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    return mae, rmse, r2
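The three metrics returned by `evaluate_model` can be worked out by hand on a small example. The sketch below uses four made-up values (not taken from the wine data) and plain NumPy, so each formula is visible; the variable names are illustrative.

```python
import numpy as np

# Four illustrative true values and predictions
y_true_demo = np.array([0.0, 1.0, 2.0, 2.0])
y_pred_demo = np.array([0.0, 1.0, 1.0, 2.0])

errors = y_true_demo - y_pred_demo

# MAE: mean of the absolute errors
mae_demo = np.mean(np.abs(errors))

# RMSE: square root of the mean squared error
rmse_demo = np.sqrt(np.mean(errors ** 2))

# R²: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_demo = 1.0 - ss_res / ss_tot

print(mae_demo, rmse_demo, r2_demo)
```

For these values the MAE is 0.25, the RMSE is 0.5 and R² is roughly 0.636. Lower MAE and RMSE are better, while R² closer to 1 is better.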

In [None]:
# Dictionary to store the results
results = {}

In [None]:
# Apply each model and compute metrics
for name, model in models.items():  # items() returns a view of the dictionary's (key, value) pairs.
                                    # Here "name" is the key and "model" is the value
    if name in ["Support Vector Regressor", "Neural Network"]:  # Standardized data is supplied only to the SVR and the neural network
        # Apply scaling for models that need it
        model.fit(X_train_scaled, y_train)
        y_pred = model.predict(X_test_scaled)
    else:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)

    # Compute evaluation metrics
    mae, rmse, r2 = evaluate_model(y_test, y_pred)
    results[name] = {"MAE": mae, "RMSE": rmse, "R²": r2}  # "name" (the algorithm name) is the key of the outer dictionary
    # In the "results" dictionary, the key is the name of the algorithm and the value is a dictionary holding its MAE, RMSE and R² values
    # results[name] refers to a different entry each time the name of the algorithm changes
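An alternative to branching on the model name is to bundle the scaler and the estimator into a single object with scikit-learn's `make_pipeline`, so that scaling is applied automatically at both fit and predict time. This is not what the notebook does; the sketch below only illustrates the idea for the SVR and checks that it matches the manual scaling approach. The `_demo` and shortened variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Pipeline: the scaler is fitted on the training data inside fit()
svr_pipeline = make_pipeline(StandardScaler(), SVR())
svr_pipeline.fit(X_tr, y_tr)
pipeline_pred = svr_pipeline.predict(X_te)

# Manual equivalent, scaling by hand as in the loop above
manual_scaler = StandardScaler().fit(X_tr)
manual_model = SVR().fit(manual_scaler.transform(X_tr), y_tr)
manual_pred = manual_model.predict(manual_scaler.transform(X_te))

print(np.allclose(pipeline_pred, manual_pred))
```

With pipelines stored in the `models` dictionary, every entry could be fitted on the unscaled `X_train`, which removes the need for the `if`/`else` branch.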

In [None]:
# Convert results to a DataFrame for better visualization
results_df = pd.DataFrame(results).T
print(results_df)

                               MAE      RMSE        R²
Linear Regression         0.203047  0.261789  0.882514
Ridge Regression          0.202464  0.260964  0.883254
Lasso Regression          0.480219  0.571480  0.440133
ElasticNet Regression     0.404472  0.499326  0.572584
Decision Tree             0.111111  0.408248  0.714286
Random Forest             0.096111  0.238234  0.902705
Gradient Boosting         0.114892  0.379391  0.753249
Support Vector Regressor  0.127050  0.162087  0.954962
K-Nearest Neighbors       0.316667  0.538516  0.502857
Neural Network            0.172429  0.233442  0.906579
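To compare the models at a glance, the results table can be sorted by one of the metrics with `DataFrame.sort_values`. The sketch below rebuilds three rows of the table above by hand (values copied from the printed output) so that it runs on its own; in the notebook, the same call could be applied to `results_df` directly.

```python
import pandas as pd

# Three rows copied from the printed results table
scores = pd.DataFrame(
    {
        "MAE": [0.203047, 0.127050, 0.480219],
        "RMSE": [0.261789, 0.162087, 0.571480],
        "R²": [0.882514, 0.954962, 0.440133],
    },
    index=["Linear Regression", "Support Vector Regressor", "Lasso Regression"],
)

# Highest R² first
ranked = scores.sort_values("R²", ascending=False)
print(ranked)
```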
