# SALES FORECAST

Sales forecasting estimates present-day or future sales from data such as past sales, seasonality, festivals, and economic conditions.

The model described here predicts sales for a given day from a fixed set of inputs.

The model uses 8 input parameters:

* sales over the past seven days
* day of the week
* date – the date was transformed into 3 different inputs
* season
* festival or not
* sales on the same day in the previous year
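
The inputs listed above could be derived from a daily sales series roughly as follows. This is only a sketch: the synthetic sales numbers, the column names, the season encoding, the festival dates, and the choice to summarise the past seven days as a mean are all assumptions made for illustration, not details taken from the original model.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales series; a real project would load its own data.
dates = pd.date_range("2021-01-01", "2022-12-31", freq="D")
rng = np.random.default_rng(0)
sales = pd.Series(rng.integers(50, 200, size=len(dates)), index=dates, name="sales")

features = pd.DataFrame(index=dates)
# Mean of the previous seven days (shift(1) keeps today's value out of the window).
features["past_7_day_sales"] = sales.shift(1).rolling(7).mean()
features["day_of_week"] = dates.dayofweek      # 0 = Monday ... 6 = Sunday
features["day"] = dates.day                    # the date split into three inputs
features["month"] = dates.month
features["year"] = dates.year
# Assumed season encoding: 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov.
features["season"] = (dates.month % 12) // 3
# Assumed festival calendar, for illustration only.
festival_days = pd.to_datetime(["2021-12-25", "2022-01-01", "2022-12-25"])
features["is_festival"] = dates.isin(festival_days).astype(int)
# Sales on the same calendar date one year earlier (NaN where no history exists).
last_year = sales.copy()
last_year.index = last_year.index + pd.DateOffset(years=1)
features["sales_same_day_last_year"] = last_year.reindex(dates)

# Keep only rows where every input is available.
features = features.dropna()
```

With two years of history, only the second year has a value for every column, so `features` ends up with one row per day of 2022 and eight columns.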

### How does it work

First, all inputs are preprocessed into a numeric form the machine can work with. This is a linear regression model based on supervised learning, so the desired output is provided along with the inputs during training. The model learns a relation (a function) between the inputs and the output, and this function is then used to predict the output for a new set of inputs. In this case, parameters like the date and previous sales are the inputs, and the amount of sales is the output.

The model predicts a number between 0 and 1 because a sigmoid function is used in the last layer. Multiplying this output by a fixed number (in this case, the maximum sales) gives the corresponding sales amount for that day.

That prediction is then fed back in as an input to calculate sales for the next day. This cycle continues until the target date is reached.
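
The scale-and-feed-back cycle can be sketched in a few lines. The function name `recursive_forecast`, the seven-value window, and the stand-in `mean_model` are hypothetical; a trained model with a sigmoid output would take the place of `mean_model`.

```python
import numpy as np

def recursive_forecast(history, predict_scaled, max_sales, n_days):
    """Forecast n_days ahead, feeding each prediction back in as an input.

    history        : list of the most recent daily sales (at least 7 values)
    predict_scaled : callable taking 7 scaled values, returning a value in [0, 1]
    max_sales      : scaling constant (here the maximum historical sales)
    """
    window = list(history[-7:])
    forecasts = []
    for _ in range(n_days):
        scaled_inputs = np.array(window) / max_sales    # bring inputs into [0, 1]
        scaled_output = predict_scaled(scaled_inputs)   # model output in [0, 1]
        sales = float(scaled_output) * max_sales        # convert back to sales units
        forecasts.append(sales)
        window = window[1:] + [sales]                   # slide the window forward
    return forecasts

# Stand-in for a trained model (hypothetical): the mean of the scaled window.
def mean_model(scaled_inputs):
    return scaled_inputs.mean()

history = [100, 120, 90, 110, 130, 80, 140]
forecast = recursive_forecast(history, mean_model, max_sales=200, n_days=3)
```

Because each forecast becomes an input for the next day, errors can accumulate over long horizons, which is worth keeping in mind when forecasting many days ahead.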


## Required packages and Installation

* numpy
* pandas
* keras
* tensorflow
* csv (part of the Python standard library)
* matplotlib
* seaborn
* scikit-learn

## Step 1: Import the required libraries and dataset.

The dataset I chose for this exercise is a CSV file, so I load it with `pd.read_csv` from the pandas library, as shown below. The dataset contains 4 columns: TV, Radio, Newspaper, and Sales.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import warnings
warnings.filterwarnings('ignore')

In [None]:
df = pd.read_csv('C:/Users/asus/Downloads/advertising.csv')
df.head()

## Step 2: Check for null values in the dataset and data inspection.

After loading the data, it is time to check the dataset for null values and duplicate values.


In [None]:
pd.DataFrame(df.isnull().sum(),columns=['Count of Null Values']).T

In [None]:
df.describe(include='all')

In [None]:
df.info()
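
The cells above cover null values. A duplicate check could look like the sketch below; the tiny `sample` frame and its numbers are made up for illustration and only reuse the dataset's column names.

```python
import pandas as pd

# Tiny hypothetical frame with the same column names as the dataset.
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 44.5],
    "Radio": [37.8, 39.3, 39.3],
    "Newspaper": [69.2, 45.1, 45.1],
    "Sales": [22.1, 10.4, 10.4],
})

# duplicated() flags every row that is identical to an earlier row.
n_duplicates = int(sample.duplicated().sum())
# drop_duplicates() keeps the first occurrence of each repeated row.
deduplicated = sample.drop_duplicates().reset_index(drop=True)
```

The same two calls can be applied to `df` directly.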

## Step 3: Exploratory Data Analysis (EDA).

In [None]:
a = df['TV']
b = df['Sales']

In [None]:
plt.figure(figsize=(10,5))
plt.title('Scatterplot between TV and Sales(EDA)')
sns.scatterplot(x=a,y=b,color='orange')

In [None]:
a = df['Radio']
b = df['Sales']
plt.figure(figsize=(10,5))
plt.title('Scatterplot between Radio and Sales(EDA)')
sns.scatterplot(x=a,y=b,color='red')

In [None]:
a = df['Newspaper']
b = df['Sales']
plt.figure(figsize=(10,5))
plt.title('Scatterplot between Newspaper and Sales(EDA)')
sns.scatterplot(x=a,y=b,color='purple')

## Distplot:

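A distplot shows the univariate distribution of the data (a single variable), plotting its values against density. Note that `sns.distplot` is deprecated in recent seaborn releases, where `sns.histplot(..., kde=True)` is the suggested replacement.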
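
To make the density axis concrete, here is a small NumPy sketch. The `values` array is synthetic and merely stands in for one column of the dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=150.0, scale=80.0, size=200)   # synthetic stand-in for one column

# density=True rescales the bar heights so that the total bar area equals 1.
heights, edges = np.histogram(values, bins=10, density=True)
area = float(np.sum(heights * np.diff(edges)))
```

This is why the y-axis of a density plot shows small fractional values rather than row counts.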

In [None]:
plt.figure(figsize=(10,5))
sns.distplot(df['TV'])
plt.title('Distplot for TV')

In [None]:
plt.figure(figsize=(10,5))
sns.distplot(df['Radio'])
plt.title('Distplot for Radio')

In [None]:
plt.figure(figsize=(10,5))
sns.distplot(df['Newspaper'])
plt.title('Distplot for Newspaper')

In [None]:
plt.figure(figsize=(10,5))
sns.distplot(df['Sales'])
plt.title('Distplot for Sales')

In [None]:
# pairplot creates its own figure, so the title is set on the returned grid
g = sns.pairplot(df,x_vars=['TV','Radio','Newspaper'],y_vars=['Sales'],height=3,aspect=1)
g.fig.suptitle('Pair plot between TV, radio, and newspaper with respect to sales',y=1.05)

In [None]:
plt.figure(figsize=(10,5))
sns.heatmap(df.corr(),annot=True,vmin=0,vmax=1,cmap='ocean')
plt.title('Heatmap (EDA)')

## Step 4: Statistical Tasks

## Standard Deviation

Standard deviation (std) measures how far the values spread from the mean.
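
One detail worth knowing is that pandas and NumPy use different defaults here. The sketch below uses illustrative numbers, not values from the dataset.

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # illustrative numbers

sample_std = values.std()                     # pandas default: ddof=1 (divide by n - 1)
population_std = np.std(values.to_numpy())    # NumPy default: ddof=0 (divide by n)

# The same sample standard deviation written out by hand.
manual_std = np.sqrt(((values - values.mean()) ** 2).sum() / (len(values) - 1))
```

So `df.std()` reports the sample standard deviation, which is slightly larger than the population value for the same data.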

In [None]:
df.std()

## Correlation

Correlation (corr) measures the strength and direction of the linear relationship between pairs of variables.
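
By default, `corr()` computes the Pearson coefficient. A hand calculation on a toy frame (illustrative numbers only, not taken from the dataset) shows what that number is:

```python
import numpy as np
import pandas as pd

# Illustrative numbers only, not taken from the dataset.
toy = pd.DataFrame({"TV": [10.0, 20.0, 30.0, 40.0, 50.0],
                    "Sales": [3.0, 5.0, 6.0, 9.0, 10.0]})

x_dev = toy["TV"] - toy["TV"].mean()
y_dev = toy["Sales"] - toy["Sales"].mean()

# Pearson r: co-variation of x and y divided by the product of their spreads.
manual_r = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())
pandas_r = toy.corr().loc["TV", "Sales"]
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.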

In [None]:
df.corr()

## Variance

Variance (var) measures the dispersion of the data, taking the spread of all data points into account. It is the square of the standard deviation.

In [None]:
df.var()

## Mean

The mean is the average of each column in the dataset.

In [None]:
df.mean()

## Median

The median is the middle value of each column when its values are sorted.

In [None]:
df.median()

## Step 5: Linear regression model building and prediction.

In [None]:
x = df[['TV']]
y = df['Sales']

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state = 50, test_size = 0.2)
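
Conceptually, the split shuffles the row indices with a fixed seed and sets aside 20% of them for testing. The NumPy sketch below illustrates that idea only; it is not scikit-learn's internal implementation, and the row count of 200 is an assumption.

```python
import numpy as np

n_rows = 200                               # assumed dataset size, for illustration
rng = np.random.default_rng(50)            # a fixed seed makes the shuffle repeatable
shuffled = rng.permutation(n_rows)

n_test = int(round(0.2 * n_rows))          # corresponds to test_size = 0.2
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
```

Holding out rows the model never sees during fitting is what makes the test score an honest estimate.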

In [None]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train,y_train)

In [None]:
lr.intercept_

In [None]:
lr.coef_

### Model Building and splitting dataset.

In [None]:
print('The LR model is: Y =',lr.intercept_,'+', lr.coef_[0],'* TV')
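
For a single feature, the intercept and coefficient have a closed form, which the following sketch computes by hand. The numbers are illustrative, not taken from the dataset, and the TV budget of 60 is a hypothetical input.

```python
import numpy as np

# Illustrative numbers only; not taken from the dataset.
tv = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sales = np.array([3.0, 5.0, 6.0, 9.0, 10.0])

# Ordinary least squares for one feature: slope = cov(x, y) / var(x).
slope = np.sum((tv - tv.mean()) * (sales - sales.mean())) / np.sum((tv - tv.mean()) ** 2)
# The fitted line always passes through the point (mean of x, mean of y).
intercept = sales.mean() - slope * tv.mean()

# Prediction for a hypothetical TV budget of 60.
predicted = intercept + slope * 60.0
```

`LinearRegression` fits the same line by least squares, so `lr.coef_[0]` and `lr.intercept_` play the roles of `slope` and `intercept` here.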

In [None]:
lr.score(x_train,y_train)

In [None]:
lr.score(x_test, y_test)

In [None]:
y_pred = lr.predict(x_test)
y_pred

### Linear Regression output for test and train data.

In [None]:
diff = pd.DataFrame({'Actual': y_test,'Predicted': y_pred})
diff.head()

### Difference between actual data and predicted data

In [None]:
from sklearn import metrics
from sklearn.metrics import r2_score

In [None]:
R2 = r2_score(y_test, y_pred)
mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(metrics.mean_squared_error(y_test, y_pred))

In [None]:
print("R2 score =",round(R2*100,2),"%")
print("Mean Absolute Error =",round(mae,2))
print("Mean Squared Error =",round(mse,2))
print("Root Mean Squared Error =",round(rmse,2))
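
The metrics above are simple to compute by hand, which helps when interpreting them. The sketch below uses illustrative arrays rather than the notebook's `y_test` and `y_pred`.

```python
import numpy as np

# Illustrative numbers only.
y_true = np.array([3.0, 5.0, 6.0, 9.0, 10.0])
y_hat = np.array([3.0, 4.8, 6.6, 8.4, 10.2])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))      # average absolute miss, in sales units
mse = np.mean(errors ** 2)         # average squared miss, penalises large errors
rmse = np.sqrt(mse)                # back in sales units
# R^2 = 1 - (residual sum of squares / total sum of squares).
r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```

Note that R² is the share of variance explained by the model, not a classification accuracy, so it is best reported under its own name.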

### Accuracy of linear regression on the dataset.

In [None]:
plt.figure(figsize=(10,5))
sns.regplot(x=y_test,y=y_pred,scatter_kws={'color':'red'})
plt.title('Regression graph')

The regression graph plots the actual test values against the predicted values as red dots, and the blue line is the regression line fitted through them. Most of the red dots lie close to the line, which suggests that the model fits the test data well.

### Conclusion:
In a nutshell, TV advertising is the best of the three channels for predicting sales, and it is the feature used in the model above. This exercise is a good starting point, especially for understanding how Python and statistics apply to sales prediction.

## References:

* https://www.kaggle.com/code/aleemaparakatta/sales-prediction-regression/notebook
* https://www.geeksforgeeks.org/sales-forecast-prediction-python/
* https://medium.com/mlearning-ai/sales-prediction-using-a-linear-regression-model-ffeec84eede1