## Goal: Predict the sales value based on multiple features

# Note: Outputs are preserved for demonstration. To test, please run cells in order.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('../ML And DS/08-Linear-Regression-Models/Advertising.csv')

In [None]:
df.head()

- Let's visually check the relationship between each feature and sales!

In [None]:
# Alternatively, sns.pairplot(df) draws every pairwise relationship in one call
fig, axes= plt.subplots(nrows = 1, ncols = 3, figsize = (16, 6))

axes[0].plot(df['TV'], df['sales'], 'o')
axes[0].set_ylabel('Sales')
axes[0].set_xlabel('TV Spend')

axes[1].plot(df['radio'], df['sales'], 'o')
axes[1].set_ylabel('Sales')
axes[1].set_xlabel('Radio Spend')

axes[2].plot(df['newspaper'], df['sales'], 'o')
axes[2].set_ylabel('Sales')
axes[2].set_xlabel('Newspaper Spend')

- Obtain only the X features, excluding sales

In [None]:
X = df.drop('sales', axis=1)

In [None]:
X

- Now obtain only the label (sales)

In [None]:
y = df['sales']

- Now, let's do a train-test split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
#help(train_test_split)

In [None]:
# 70:30 train/test split; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

In [None]:
len(df)

In [None]:
X_train

In [None]:
y_train

In [None]:
len(X_test)
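
As a rough check of what `test_size = 0.3` and `random_state` do, here is a minimal sketch on made-up data. The arrays `X_demo` and `y_demo` and the row count of 200 are assumptions for illustration only, not the Advertising data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 rows, 3 features (not the Advertising data).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 3))
y_demo = rng.random(200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

# test_size=0.3 holds out 30% of the rows; the rest are used for training.
print(len(X_tr), len(X_te))

# Repeating the call with the same random_state gives the same split.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)
print(np.array_equal(X_te, X_te2))
```

Without a fixed `random_state`, each run would shuffle the rows differently, so the error metrics computed later would change slightly from run to run.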

- Let's train our linear model!


In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
# help(LinearRegression)

In [None]:
model = LinearRegression()  # create an untrained model with default settings

In [None]:
type(model)

In [None]:
# Fitting the model on the training data is what trains it
model.fit(X_train, y_train)

In [None]:
# Pass the X_test features so the model predicts the corresponding sales values (to compare against y_test):
test_predictions = model.predict(X_test)
test_predictions 

In [None]:
y_test  # The predictions are fairly close to these actual values; the remaining differences are the residuals (errors)

- Evaluating the performance metrics of our model by measuring the error between the true y labels and the predicted y labels
- We have mean absolute error, mean squared error, and root mean squared error

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error



### 📏 Error Metrics in Regression

**Mean Absolute Error** (MAE) is the mean of the absolute value of the errors:

$$\frac 1n\sum_{i=1}^n|y_i-\hat{y}_i|$$

**Mean Squared Error** (MSE) is the mean of the squared errors:

$$\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2$$

**Root Mean Squared Error** (RMSE) is the square root of the mean of the squared errors:

$$\sqrt{\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2}$$
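
To make the three formulas concrete, the following sketch computes them by hand with NumPy on four made-up values. The arrays `y_true` and `y_pred` are illustrative assumptions, not output from this notebook's model.

```python
import numpy as np

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred          # [ 1.  0. -2. -1.]

mae = np.mean(np.abs(errors))     # (1 + 0 + 2 + 1) / 4 = 1.0
mse = np.mean(errors ** 2)        # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)               # square root of 1.5, about 1.22

print(mae, mse, rmse)
```

RMSE is in the same units as the label, like MAE, but squaring gives larger errors more weight, which is why it comes out above the MAE here.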


In [None]:
# On data the model has never seen before, y hat will on average be off by the following amount
mean_absolute_error(y_test, test_predictions)

In [None]:
# RMSE
np.sqrt(mean_squared_error(y_test, test_predictions))

- Now compare the error to the range of sales:

If sales range from 0 to 30, then an error of 2.3 is ~7.7% of the full range. That might be acceptable.

- But if sales were between 0 and 5, then 2.3 would be huge: almost 50% of the range.
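
The comparison above is simple arithmetic, sketched below. The helper name `error_as_fraction_of_range` is hypothetical, and 2.3 is the example error value used in the text rather than a computed result.

```python
def error_as_fraction_of_range(error, y_min, y_max):
    """Return the error divided by the width of the label's range."""
    return error / (y_max - y_min)


# Same error, two different label ranges.
wide = error_as_fraction_of_range(2.3, 0, 30)
narrow = error_as_fraction_of_range(2.3, 0, 5)

print(f"{wide:.1%}")    # 7.7%
print(f"{narrow:.1%}")  # 46.0%
```

With real data, `y.min()` and `y.max()` (or `y.mean()`) would supply the reference scale instead of the hard-coded bounds.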

--------------------

### Checking whether linear regression is a valid model for this dataset


In [None]:
# Residuals: the difference between each true value and its prediction
test_residuals = y_test - test_predictions

In [None]:
# test_residuals

In [None]:
# plotting these residuals vs the true y values
sns.scatterplot(x = y_test, y = test_residuals)
plt.axhline(y=0, color = 'r', ls = '--')

In [None]:
sns.displot(test_residuals, bins = 25, kde = True)

### Conclusion

- Because the residual plot shows random scatter around zero rather than a particular pattern (such as a curve), linear regression appears to be
  a reasonable choice for this dataset
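
For contrast, here is a sketch of what a patterned residual plot looks like in numbers. It fits a straight line to made-up data that is purely quadratic; the data and the use of `np.polyfit` are assumptions for illustration, not part of the workflow above.

```python
import numpy as np

# Hypothetical data with a purely quadratic relationship (no noise).
x = np.linspace(-3, 3, 61)
y = x ** 2

# Fit a straight line with ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# The residuals trace a U shape: positive at both ends, negative in the middle.
print(residuals[0] > 0, residuals[30] < 0, residuals[-1] > 0)
```

A curve like this in the residuals would suggest that a straight-line model is missing structure in the data, which is the opposite of the random scatter seen above.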


-------------------------

### Model deployment for future use

In [None]:
final_model = LinearRegression()

In [None]:
# full dataset, not just train sets
final_model.fit(X, y)

- Checking the coefficients of all the features

In [None]:
final_model.coef_  # one coefficient per X feature, in column order (TV, radio, newspaper)

- Understanding the coefficients above

- Recall from the first plots of each X feature against sales that the newspaper plot was the most scattered.
----
- Newspaper spend was not as linearly related to sales as the other features: higher newspaper spend does not necessarily come with higher sales.
----
- The last coefficient says the same thing as a number: it is very close to 0, meaning the model barely uses newspaper spend when predicting
  sales.
---
- Radio ad spend appears to have the greatest impact on sales per unit of spend.
---
- Quantitatively, if we hold radio and newspaper constant and increase TV by 1, we would expect sales to rise by about 0.0457 on average.
---
- The same reading applies to radio. However, because the newspaper coefficient is negative, increasing only newspaper by 1 would decrease predicted sales by about 0.0010.
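
The "hold the others constant, increase one feature by 1" reading can be checked directly. The sketch below uses made-up data and a hypothetical `demo_model`, so the numbers differ from the Advertising coefficients; only the property is the same.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: 3 features, label built from chosen weights plus noise.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 100, size=(200, 3))
y_demo = 3.0 + X_demo @ np.array([0.05, 0.2, 0.0]) + rng.normal(0, 1, size=200)

demo_model = LinearRegression().fit(X_demo, y_demo)

base = np.array([[50.0, 20.0, 30.0]])
bumped = base.copy()
bumped[0, 0] += 1  # raise only the first feature by one unit

# The change in the prediction equals the first coefficient.
change = demo_model.predict(bumped)[0] - demo_model.predict(base)[0]
print(np.isclose(change, demo_model.coef_[0]))
```

Comparing coefficients by size is only meaningful when the features share the same units, as the three ad-spend columns do here.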

-----------

- Plotting the true y against y hat for the full dataset

In [None]:
y_hat = final_model.predict(X)  # predictions from the model fitted on the full dataset

In [None]:
fig,axes = plt.subplots(nrows=1,ncols=3,figsize=(16,6))

axes[0].plot(df['TV'],df['sales'],'o')
axes[0].plot(df['TV'],y_hat,'o',color='red')
axes[0].set_ylabel("Sales")
axes[0].set_title("TV Spend")

axes[1].plot(df['radio'],df['sales'],'o')
axes[1].plot(df['radio'],y_hat,'o',color='red')
axes[1].set_title("Radio Spend")
axes[1].set_ylabel("Sales")

axes[2].plot(df['newspaper'],df['sales'],'o')
axes[2].plot(df['newspaper'],y_hat,'o',color='red')
axes[2].set_title("Newspaper Spend");
axes[2].set_ylabel("Sales")
plt.tight_layout();  # True points are blue, predicted ones are in red

----------------- 

## Saving the model

In [None]:
from joblib import dump, load

In [None]:
dump(final_model, 'final_sales_model.joblib') # saving the file

In [None]:
# loading that model / file
loaded_model = load('final_sales_model.joblib')

In [None]:
loaded_model.coef_

In [None]:
# One new sample with three features: TV = 230, radio = 37, newspaper = 900
# reshape(1, -1) turns the 1D array into a 2D array with shape (1, 3): [[230, 37, 900]]
data = np.array([230, 37, 900]).reshape(1, -1)
# The newspaper coefficient is slightly negative, so raising only the last value lowers the predicted sales
loaded_model.predict(data)
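
As a sanity check on the save/load step, the sketch below confirms that a model restored with `joblib` predicts the same values as the original. It uses made-up data, a hypothetical `demo_model`, and a temporary directory instead of the notebook's `final_sales_model.joblib` file.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data and model (not the Advertising data).
rng = np.random.default_rng(0)
X_demo = rng.random((50, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0])

demo_model = LinearRegression().fit(X_demo, y_demo)

# Save to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")
    dump(demo_model, path)
    restored = load(path)

# The restored model carries the same coefficients and predictions.
same_coef = np.array_equal(restored.coef_, demo_model.coef_)
same_pred = np.allclose(restored.predict(X_demo), demo_model.predict(X_demo))
print(same_coef, same_pred)
```

A saved model is generally safest to load with the same scikit-learn version that created it.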