# **Sales Prediction using Python**

## Introduction

This notebook uses machine learning to predict sales from the data in `Advertising.csv`, which records TV, Radio, and Newspaper values alongside a Sales figure for each of 200 observations. After a brief inspection and cleanup of the data, the features are separated from the target, the data is split into training and test sets, and three regression models are trained and compared: Linear Regression, Random Forest, and Gradient Boosting. Each model is scored on the held-out test set with R-squared, MSE, RMSE, and MAE, so that the results can inform which model is best suited to anticipating future sales.

## Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Importing the Dataset

In [3]:
df= pd.read_csv("Advertising.csv")

In [4]:
df.head()

|   | Unnamed: 0 | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|---|
| 0 | 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 5 | 180.8 | 10.8 | 58.4 | 12.9 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  200 non-null    int64  
 1   TV          200 non-null    float64
 2   Radio       200 non-null    float64
 3   Newspaper   200 non-null    float64
 4   Sales       200 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 7.9 KB


In [6]:
df.drop(columns= ['Unnamed: 0'], inplace= True)

In [7]:
df.head()

|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |
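The `Unnamed: 0` column dropped above looks like a row index that was written into the CSV when the file was saved. As an alternative to dropping it afterwards, it can be read as the index at load time. The sketch below assumes the first header cell of the file is empty (which is what makes pandas name the column `Unnamed: 0`) and uses a small in-memory stand-in, `csv_text`, built from the first three rows shown earlier rather than the real file.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for Advertising.csv (first three rows only).
# The empty first header cell is an assumption about the real file.
csv_text = """,TV,Radio,Newspaper,Sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# index_col=0 makes pandas use the first column as the row index
# instead of loading it as a data column named 'Unnamed: 0'.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample_df.columns.tolist())  # ['TV', 'Radio', 'Newspaper', 'Sales']
```

With the real file, the equivalent call would be `pd.read_csv("Advertising.csv", index_col=0)`, provided the file has the layout assumed here.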


## Splitting the Dataset into Features and Target

In [8]:
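# Features: every column except the target. Target: the 'Sales' column.
X = df.drop(columns=['Sales'])
y = df['Sales'].values

In [9]:
# Convert the features to a NumPy array (y is already one, so this is a no-op for it).
X = np.array(X)
y = np.array(y)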

In [10]:
print(X)

[[230.1  37.8  69.2]
 [ 44.5  39.3  45.1]
 [ 17.2  45.9  69.3]
 [151.5  41.3  58.5]
 [180.8  10.8  58.4]
 [  8.7  48.9  75. ]
 [ 57.5  32.8  23.5]
 [120.2  19.6  11.6]
 [  8.6   2.1   1. ]
 [199.8   2.6  21.2]
 [ 66.1   5.8  24.2]
 [214.7  24.    4. ]
 [ 23.8  35.1  65.9]
 [ 97.5   7.6   7.2]
 [204.1  32.9  46. ]
 [195.4  47.7  52.9]
 [ 67.8  36.6 114. ]
 [281.4  39.6  55.8]
 [ 69.2  20.5  18.3]
 [147.3  23.9  19.1]
 [218.4  27.7  53.4]
 [237.4   5.1  23.5]
 [ 13.2  15.9  49.6]
 [228.3  16.9  26.2]
 [ 62.3  12.6  18.3]
 [262.9   3.5  19.5]
 [142.9  29.3  12.6]
 [240.1  16.7  22.9]
 [248.8  27.1  22.9]
 [ 70.6  16.   40.8]
 [292.9  28.3  43.2]
 [112.9  17.4  38.6]
 [ 97.2   1.5  30. ]
 [265.6  20.    0.3]
 [ 95.7   1.4   7.4]
 [290.7   4.1   8.5]
 [266.9  43.8   5. ]
 [ 74.7  49.4  45.7]
 [ 43.1  26.7  35.1]
 [228.   37.7  32. ]
 [202.5  22.3  31.6]
 [177.   33.4  38.7]
 [293.6  27.7   1.8]
 [206.9   8.4  26.4]
 [ 25.1  25.7  43.3]
 [175.1  22.5  31.5]
 [ 89.7   9.9  35.7]
 [239.9  41.5

In [11]:
print(y)

[22.1 10.4  9.3 18.5 12.9  7.2 11.8 13.2  4.8 10.6  8.6 17.4  9.2  9.7
 19.  22.4 12.5 24.4 11.3 14.6 18.  12.5  5.6 15.5  9.7 12.  15.  15.9
 18.9 10.5 21.4 11.9  9.6 17.4  9.5 12.8 25.4 14.7 10.1 21.5 16.6 17.1
 20.7 12.9  8.5 14.9 10.6 23.2 14.8  9.7 11.4 10.7 22.6 21.2 20.2 23.7
  5.5 13.2 23.8 18.4  8.1 24.2 15.7 14.  18.   9.3  9.5 13.4 18.9 22.3
 18.3 12.4  8.8 11.  17.   8.7  6.9 14.2  5.3 11.  11.8 12.3 11.3 13.6
 21.7 15.2 12.  16.  12.9 16.7 11.2  7.3 19.4 22.2 11.5 16.9 11.7 15.5
 25.4 17.2 11.7 23.8 14.8 14.7 20.7 19.2  7.2  8.7  5.3 19.8 13.4 21.8
 14.1 15.9 14.6 12.6 12.2  9.4 15.9  6.6 15.5  7.  11.6 15.2 19.7 10.6
  6.6  8.8 24.7  9.7  1.6 12.7  5.7 19.6 10.8 11.6  9.5 20.8  9.6 20.7
 10.9 19.2 20.1 10.4 11.4 10.3 13.2 25.4 10.9 10.1 16.1 11.6 16.6 19.
 15.6  3.2 15.3 10.1  7.3 12.9 14.4 13.3 14.9 18.  11.9 11.9  8.  12.2
 17.1 15.   8.4 14.5  7.6 11.7 11.5 27.  20.2 11.7 11.8 12.6 10.5 12.2
  8.7 26.2 17.6 22.6 10.3 17.3 15.9  6.7 10.8  9.9  5.9 19.6 17.3  7.6
  9.7 1

## Splitting the Data into Training and Test Sets

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test= train_test_split(X, y, test_size= 0.30, random_state= 42)
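To make the effect of `test_size=0.30` and `random_state=42` concrete, here is a small sketch on placeholder arrays with the same shape as the data above (200 rows, 3 features). The arrays `X_demo` and `y_demo` are hypothetical and their values are arbitrary; only the resulting shapes and the reproducibility of the split are of interest.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the advertising features and target.
X_demo = np.arange(600, dtype=float).reshape(200, 3)
y_demo = np.arange(200, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)
print(X_tr.shape, X_te.shape)  # (140, 3) (60, 3)

# Repeating the call with the same random_state reproduces the same partition.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)
print(np.array_equal(X_te, X_te2))  # True
```

So with 200 rows, the models in this notebook are fitted on 140 rows and evaluated on the remaining 60.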

## Training the Model Using Linear Regression

In [15]:
from sklearn.linear_model import LinearRegression
model_lr= LinearRegression()
model_lr.fit(X_train, y_train)
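A fitted `LinearRegression` exposes one coefficient per feature in `coef_` and the constant term in `intercept_`, which is often the first thing worth inspecting. The sketch below illustrates this on synthetic data generated from made-up coefficients (`true_coef` and the intercept `3.0` are assumptions for illustration, not values estimated from `Advertising.csv`).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: y = 3.0 + 0.05*x1 + 0.2*x2 + 0.0*x3.
# These coefficients are invented for the example.
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 300, size=(200, 3))
true_coef = np.array([0.05, 0.2, 0.0])
y_syn = 3.0 + X_syn @ true_coef

demo_lr = LinearRegression().fit(X_syn, y_syn)

# One coefficient per feature column, in column order.
for name, coef in zip(["TV", "Radio", "Newspaper"], demo_lr.coef_):
    print(f"{name:>9}: {coef:.4f}")
print(f"intercept: {demo_lr.intercept_:.4f}")
```

Because the synthetic target is exactly linear, the fit recovers the generating coefficients up to floating-point error. On the notebook's own model, the same attributes are available as `model_lr.coef_` and `model_lr.intercept_`.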

## Predicting the Test Set

In [17]:
pred_lr= model_lr.predict(X_test)

## Model Evaluation

In [18]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
r_squared = r2_score(y_test, pred_lr)
print(f"R-squared (R^2): {r_squared:.4f}")

R-squared (R^2): 0.8609


In [19]:
mse = mean_squared_error(y_test, pred_lr)
print(f"Mean Squared Error (MSE): {mse:.4f}")

rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

mae = mean_absolute_error(y_test, pred_lr)
print(f"Mean Absolute Error (MAE): {mae:.4f}")

Mean Squared Error (MSE): 3.7968
Root Mean Squared Error (RMSE): 1.9485
Mean Absolute Error (MAE): 1.5117
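For reference, the four metrics reported above can be computed directly from their definitions. The sketch below does so with NumPy on a tiny hand-made example (the arrays `y_true` and `y_pred` are hypothetical, chosen so the arithmetic is easy to check by hand).

```python
import numpy as np

# Hypothetical true values and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_pred                 # [ 1.,  0., -1., -2.]

mse = np.mean(errors ** 2)               # average squared error
rmse = np.sqrt(mse)                      # same units as the target
mae = np.mean(np.abs(errors))            # average absolute error

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MSE:  {mse:.4f}")   # 1.5000
print(f"RMSE: {rmse:.4f}")  # 1.2247
print(f"MAE:  {mae:.4f}")   # 1.0000
print(f"R^2:  {r2:.4f}")    # 0.7000
```

RMSE and MAE are expressed in the units of the target, so the values reported for the model above can be read directly as typical prediction errors in `Sales` units.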


## Training the Model Using Random Forest

In [21]:
from sklearn.ensemble import RandomForestRegressor
model_rf = RandomForestRegressor(n_estimators = 10, random_state = 42)
model_rf.fit(X_train, y_train)
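A fitted random forest also reports how much each feature contributed to its splits through `feature_importances_`. The sketch below shows this on synthetic data in which the third feature is deliberately unrelated to the target; the value ranges and coefficients are assumptions made for the example, not properties of `Advertising.csv`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends on the first two features only.
rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)   # deliberately unused in y_syn
X_syn = np.column_stack([tv, radio, newspaper])
y_syn = 3.0 + 0.05 * tv + 0.2 * radio

# Same hyperparameters as the notebook's model.
demo_rf = RandomForestRegressor(n_estimators=10, random_state=42)
demo_rf.fit(X_syn, y_syn)

# Importances are non-negative and sum to 1 across features.
importances = demo_rf.feature_importances_
for name, imp in zip(["TV", "Radio", "Newspaper"], importances):
    print(f"{name:>9}: {imp:.3f}")
```

On the notebook's own model the same attribute is available as `model_rf.feature_importances_`. Note that `n_estimators=10` is a fairly small forest (scikit-learn's default is 100), so importances and scores may shift somewhat with a different `random_state`.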

## Predicting the Test Set

In [22]:
pred_rf= model_rf.predict(X_test)

## Model Evaluation

In [23]:
r_squared = r2_score(y_test, pred_rf)
print(f"R-squared (R^2): {r_squared:.4f}")

R-squared (R^2): 0.9825


In [24]:
mse = mean_squared_error(y_test, pred_rf)
print(f"Mean Squared Error (MSE): {mse:.4f}")

rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

mae = mean_absolute_error(y_test, pred_rf)
print(f"Mean Absolute Error (MAE): {mae:.4f}")

Mean Squared Error (MSE): 0.4769
Root Mean Squared Error (RMSE): 0.6906
Mean Absolute Error (MAE): 0.5435


## Training the Model Using Gradient Boosting Regressor

In [27]:
from sklearn.ensemble import GradientBoostingRegressor
model_gb= GradientBoostingRegressor()
model_gb.fit(X_train, y_train)

## Predicting the Test Set

In [29]:
pred_gb= model_gb.predict(X_test)

## Model Evaluation

In [30]:
r_squared = r2_score(y_test, pred_gb)
print(f"R-squared (R^2): {r_squared:.4f}")

R-squared (R^2): 0.9779


In [31]:
mse = mean_squared_error(y_test, pred_gb)
print(f"Mean Squared Error (MSE): {mse:.4f}")

rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

mae = mean_absolute_error(y_test, pred_gb)
print(f"Mean Absolute Error (MAE): {mae:.4f}")

Mean Squared Error (MSE): 0.6036
Root Mean Squared Error (RMSE): 0.7769
Mean Absolute Error (MAE): 0.5683


## Summary

### Sales Prediction using Python
This project predicts sales with machine learning. The `Advertising.csv` dataset is loaded and cleaned by removing the redundant `Unnamed: 0` column. The data is divided into features (X: TV, Radio, Newspaper) and the target variable `Sales` (y), then split into a 70% training set and a 30% test set. Three regression models are trained and evaluated on the test set: Linear Regression, Random Forest, and Gradient Boosting Regressor. For each model, R-squared (R^2), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) are computed. On this split, Random Forest performed best, with an R^2 of 0.9825, MSE of 0.4769, RMSE of 0.6906, and MAE of 0.5435. Gradient Boosting was close behind (R^2 of 0.9779, RMSE of 0.7769), while Linear Regression trailed both (R^2 of 0.8609, RMSE of 1.9485).
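The comparison above rests on a single train/test split, and the gap between Random Forest and Gradient Boosting is small enough that a different split could reorder them. One common way to check this is k-fold cross-validation. The sketch below is not part of the original notebook: it runs 5-fold cross-validation for the same three model types on synthetic data (the generating formula, noise level, and the `random_state` passed to Gradient Boosting are assumptions for illustration).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the advertising data (invented coefficients and noise).
rng = np.random.default_rng(0)
n = 200
X_syn = np.column_stack([
    rng.uniform(0, 300, n),
    rng.uniform(0, 50, n),
    rng.uniform(0, 100, n),
])
y_syn = 3.0 + 0.05 * X_syn[:, 0] + 0.2 * X_syn[:, 1] + rng.normal(0, 1.0, n)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=10, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Shuffled 5-fold split, so every row is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_syn, y_syn, cv=cv, scoring="r2")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name:>17}: R^2 = {scores.mean():.4f} +/- {scores.std():.4f}")
```

To apply this to the real data, `X_syn` and `y_syn` would be replaced with the notebook's `X` and `y`. The spread across folds gives a rough sense of whether a difference between two models is larger than split-to-split variation.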
