# Modeling

### Importing important libraries

In [1]:
### importing the important libraries
import pandas as pd
import numpy as np

### Train Test Split
from sklearn.model_selection import train_test_split

### for modeling
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.svm import SVR

### Evaluation metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

### Serialization
import pickle

### Data Loading

In [2]:
### specify the absolute path of dataset
dataset_path = "../dataset/ads.csv"

### importing the dataset
df = pd.read_csv(dataset_path)

### data preview
df.head(3)

Unnamed: 0.1,Unnamed: 0,TV,radio,newspaper,sales
0,1,230.1,37.8,69.2,22.1
1,2,44.5,39.3,45.1,10.4
2,3,17.2,45.9,69.3,9.3


### Selecting Target and features

#### Single Feature "TV"

In [3]:
features = df.loc[:,["TV"]]
target = df["sales"]

### Train Test Split
* The dataset is split into training and testing sets using train_test_split(). Here, we use 80% of the data for training and 20% for testing.

In [4]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

In [5]:
### checking the shape of train dataset
X_train.shape, y_train.shape

((160, 1), (160,))

In [6]:
### checking the shape of test dataset
X_test.shape, y_test.shape

((40, 1), (40,))

# Modeling

### 1. LinearRegression:
* The model is trained on the training data using fit().

In [7]:
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

#### Prediction using linearRegression
* We make predictions on the test set using predict()

In [8]:
lr_model_pred = lr_model.predict(X_test)

#### Evaluation
* Finally, we evaluate the model using r2_score, MAE, and MSE. These metrics provide insights into how well the model performs.

#### a. Mean Absolute Error

In [9]:
lr_model_mae = mean_absolute_error(y_true=y_test, y_pred=lr_model_pred)
print(f"The mean absolute error of linearregression with feature TV is {lr_model_mae}")

The mean absolute error of linearregression with feature TV is 2.444420003751042


#### b. Mean Square Error

In [10]:
lr_model_mse = mean_squared_error(y_true=y_test, y_pred=lr_model_pred)
print(f"The mean squared error of linearregression with feature TV is {lr_model_mse}")

The mean squared error of linearregression with feature TV is 10.204654118800956


#### c. R2 Score

In [11]:
lr_model_r2_score = r2_score(y_true=y_test, y_pred=lr_model_pred)
print(f"The r2 score of linearregression with feature TV is {lr_model_r2_score}")

The r2 score of linearregression with feature TV is 0.6766954295627076


### 2. Lasso Regression:
* The model is trained on the training data using fit().

In [12]:
lasso_model = Lasso()
lasso_model.fit(X_train, y_train)

#### Prediction using Lasso Regression
* We make predictions on the test set using predict()

In [13]:
lasso_model_pred = lr_model.predict(X_test)

#### Evaluation
*  Finally, we evaluate the model using r2_score, MAE, and MSE. These metrics provide insights into how well the model performs.

#### a. Mean Absolute Error

In [14]:
lasso_model_mae = mean_absolute_error(y_true=y_test, y_pred=lasso_model_pred)
print(f"The mean absolute error of lasso regression with feature TV is {lasso_model_mae}")

The mean absolute error of lasso regression with feature TV is 2.444420003751042


#### b. Mean Square Error

In [15]:
lasso_model_mse = mean_squared_error(y_true=y_test, y_pred=lasso_model_pred)
print(f"The mean squared error of lasso regression with feature TV is {lasso_model_mse}")

The mean squared error of lasso regression with feature TV is 10.204654118800956


#### c. R2 Score

In [16]:
lasso_model_r2_score = r2_score(y_true=y_test, y_pred=lasso_model_pred)
print(f"The r2 score of lasso regression with feature TV is {lasso_model_r2_score}")

The r2 score of lasso regression with feature TV is 0.6766954295627076


### 3. Ridge Regression:
* The model is trained on the training data using fit().

In [17]:
ridge_model = Ridge()
ridge_model.fit(X_train, y_train)

#### Prediction using Ridge Regression
* We make predictions on the test set using predict()

In [18]:
ridge_model_pred = ridge_model.predict(X_test)

#### Evaluation
*  Finally, we evaluate the model using r2_score, MAE, and MSE. These metrics provide insights into how well the model performs.

#### a. Mean Absolute Error

In [19]:
ridge_model_mae = mean_absolute_error(y_true=y_test, y_pred=ridge_model_pred)
print(f"The mean absolute error of ridge regression with feature TV is {ridge_model_mae}")

The mean absolute error of ridge regression with feature TV is 2.4444210257774612


#### b. Mean Square Error

In [20]:
ridge_model_mse = mean_squared_error(y_true=y_test, y_pred=ridge_model_pred)
print(f"The mean squared error of ridge regression with feature TV is {ridge_model_mse}")

The mean squared error of ridge regression with feature TV is 10.204657076656227


#### c. R2 Score

In [21]:
ridge_model_r2_score = r2_score(y_true=y_test, y_pred=ridge_model_pred)
print(f"The r2 score of lasso regression with feature TV is {ridge_model_r2_score}")

The r2 score of lasso regression with feature TV is 0.6766953358517286


### 4. Support Vector Regression:
* The model is trained on the training data using fit().

In [22]:
svr_model = SVR()
svr_model.fit(X_train, y_train)

#### Prediction using SVR Regression
* We make predictions on the test set using predict()

In [23]:
svr_model_pred = svr_model.predict(X_test)

#### Evaluation
*  Finally, we evaluate the model using r2_score, MAE, and MSE. These metrics provide insights into how well the model performs.

#### a. Mean Absolute Error

In [24]:
svr_model_mae = mean_absolute_error(y_true=y_test, y_pred=svr_model_pred)
print(f"The mean absolute error of support vector regression with feature TV is {svr_model_mae}")

The mean absolute error of support vector regression with feature TV is 2.6198405049216142


#### b. Mean Square Error

In [25]:
svr_model_mse = mean_squared_error(y_true=y_test, y_pred=svr_model_pred)
print(f"The r2 score of support vector regression with feature TV is {svr_model_mse}")

The r2 score of support vector regression with feature TV is 11.114260909710861


#### c. R2 Score

In [26]:
svr_model_r2_score = r2_score(y_true=y_test, y_pred=svr_model_pred)
print(f"The r2 score of support vector regression with feature TV is {svr_model_r2_score}")

The r2 score of support vector regression with feature TV is 0.6478772031555862


# Metrics table for TV

In [27]:
metrics_table ={
    "Regression_Algorithms": ["LinearRegression","LassoRegression","RidgeRegression", "SupportVectorRegression"],
    "Mean_Absolute_Error": [lr_model_mae, lasso_model_mae, ridge_model_mae, svr_model_mae],
    "Mean_Squared_Error": [lr_model_mse, lasso_model_mse, ridge_model_mse, svr_model_mse],
    "R2_score": [lr_model_r2_score, lasso_model_r2_score, ridge_model_r2_score, svr_model_r2_score]
}

In [28]:
metrics_df = pd.DataFrame(metrics_table)
metrics_df

Unnamed: 0,Regression_Algorithms,Mean_Absolute_Error,Mean_Squared_Error,R2_score
0,LinearRegression,2.44442,10.204654,0.676695
1,LassoRegression,2.44442,10.204654,0.676695
2,RidgeRegression,2.444421,10.204657,0.676695
3,SupportVectorRegression,2.619841,11.114261,0.647877


All three types of regression (Linear, Lasso, and Ridge) have the same R2 score of approximately 0.6767. The Support Vector Regression (SVR) has a slightly lower R2 score of approximately 0.6479.

Since all three types of regression models perform similarly based on the R2 score alone, other factors such as computational complexity, interpretability, and potential overfitting might come into consideration.

However, if we only consider the R2 score, we can't confidently say that one model is definitively better than the others. In such cases, it might be prudent to choose the simplest model, which in this case would be Linear Regression, as it may generalize better and is easier to interpret.

Therefor , Linear Regression could be selected as the best model, given its simplicity and comparable performance to the other regression techniques.

### Serialization model for features "TV"

In [29]:
with open("linearregression_model.pickle","bw") as file:
    pickle.dump(lr_model, file) # serializing logistic regression model