## **Sales Analysis + Prediction Model 👑**

<img src="https://img.freepik.com/free-vector/flat-design-sale-background_23-2149066483.jpg?t=st=1651386977~exp=1651387577~hmac=c84ec032dab78e215094bd566859dade07507e51c7b4424d5461876722d09e8a&w=996">

### Future Sales Prediction (Case Study) 🚀

The dataset records how much a business spent on advertising across three platforms, together with the resulting product sales. Below is the description of all the columns in the dataset:

* TV: Advertising cost spent in dollars for advertising on TV;
* Radio: Advertising cost spent in dollars for advertising on Radio;
* Newspaper: Advertising cost spent in dollars for advertising on Newspaper;
* Sales: Number of units sold;

So, in this dataset, the sales of the product are assumed to depend on the advertising cost. In the sections below, I will take you through the task of future sales prediction with machine learning using Python.

### **Import Packages & Data 🐶**

In [13]:
!pip install plotly --quiet

#### **Packages 🦊**

In [14]:
# Main Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Machine Learning Library
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Special Library
import missingno as msno

# Seaborn Style
sns.set(color_codes = True)
sns.set_style("white")

#### **Import Dataset 🦉**

In [15]:
df = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/advertising.csv")
df.head(5)

#### **Observations** 🦕

* TV: Advertising cost spent in dollars for advertising on TV ⚔️
* Radio: Advertising cost spent in dollars for advertising on Radio 🛡️
* Newspaper: Advertising cost spent in dollars for advertising on Newspaper 🪓
* Sales: Number of units sold 🧨

### **Data Exploration & Cleaning ☘️**

#### **Data Shape & Structure 🐋**

In [16]:
df.shape

#### **Observations** 🦕
* Row count is 200 ⚔️.
* Column count is 4 🚬.

#### **Data Inspection** 🎭

In [17]:
# Let's inspect the missing values 🐢
data_info= pd.DataFrame()
data_info['Column Names']= df.columns
data_info['Datatype'] = df.dtypes.to_list()
data_info['num_NA']= data_info['Column Names'].apply(lambda x: df[x].isna().sum())
# isna().mean() is a fraction, so multiply by 100 to report a percentage
data_info['%_NA']= data_info['Column Names'].apply(lambda x: df[x].isna().mean() * 100)
data_info

#### **Observations** 🦕
* Numerical Columns 🍂
        1. TV ✔️
        2. Radio ✔️
        3. Newspaper ✔️
        4. Sales ✔️
* Categorical Columns ☘️
        1. No Data ❌
* Fortunately there are no missing values ⚔️.
* Datatypes are appropriate 🚬.

In [18]:
# Let's inspect how many repeated values each column contains 💣
data_info= pd.DataFrame()
data_info['Column Names']= df.columns
data_info['Datatype'] = df.dtypes.to_list()
data_info['Duplicate']= data_info['Column Names'].apply(lambda x: df[x].duplicated().sum())
data_info

#### **Observations** 🦕
* Repeated values within a column are normal for numeric spend data and won't cause any problem for prediction ⚔️.
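
The check above counts repeated values inside each column separately. Whether entire rows are repeated is a different question, and usually the one that matters for modelling. Here is a minimal sketch of the difference, using a small made-up frame (`toy` and its values are hypothetical, not the advertising data):

```python
import pandas as pd

# Hypothetical toy frame (not the advertising data):
# the first and last rows are identical
toy = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 230.1],
    "Radio": [37.8, 39.3, 45.9, 37.8],
    "Newspaper": [69.2, 45.1, 45.1, 69.2],
    "Sales": [22.1, 10.4, 9.3, 22.1],
})

# Repeated values inside a single column (what the per-column check counts)
per_column = {col: int(toy[col].duplicated().sum()) for col in toy.columns}

# Fully duplicated rows (every column matches an earlier row)
n_duplicate_rows = int(toy.duplicated().sum())

print(per_column)
print("duplicate rows:", n_duplicate_rows)
```

On the notebook's own data, the row-level count would be `df.duplicated().sum()`.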

### **Data Distribution Analysis 🌋🏔️**

In [23]:
# To observe the shape of the data 🪐
# Note: trendline="ols" requires the statsmodels package to be installed
def scatterPlot(df, cList, t):
    # Build one scatter plot of Sales against each advertising column
    plotList = []
    for col, plotTitle in zip(cList, t):
        plotList.append(px.scatter(data_frame=df, x=col, y="Sales",
                                   size=col, trendline="ols", title=plotTitle))
    return plotList

if __name__ == "__main__":
    # Every column except the target is an advertising channel
    cList = list(df.columns)
    cList.remove("Sales")
    title = ["Sales from TV Advertising",
             "Sales from Radio Advertising",
             "Sales from Newspaper Advertising"]
    r = scatterPlot(df, cList, title)

    for plot in r:
        plot.show()

#### **Observations** 🦕
* The most money was spent on TV advertising, and higher TV spend goes with higher sales ⛩️
* Newspaper advertisements haven't been successful at generating sales 🛕
* TV advertisements are more expensive than the others 🏰

### **Outlier Detection 💰**

In [22]:
fig = make_subplots(rows=2, cols=2)

fig.add_trace(
    go.Box(x=df["TV"]),
    row=1, col=1
)

fig.add_trace(
    go.Box(x=df["Radio"]),
    row=1, col=2
)

fig.add_trace(
    go.Box(x=df["Newspaper"]),
    row=2, col=1
)


# Update xaxis properties
fig.update_xaxes(title_text="TV Spend ($)", row=1, col=1)
fig.update_xaxes(title_text="Radio Spend ($)", row=1, col=2)
fig.update_xaxes(title_text="Newspaper Spend ($)", row=2, col=1)

fig.update_layout(title = "Outlier Inspection", showlegend=False)
fig.show()


#### **Observations** 🦕
* Newspaper cost has a few outliers ⛩️
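
Box plots usually mark points as outliers when they fall more than 1.5 times the interquartile range beyond the quartiles. The same rule can be written out to list the flagged values directly. This is a sketch under assumptions: `iqr_outliers` is a hypothetical helper, the sample values are made up, and Plotly may compute quartiles slightly differently from pandas, so the flagged points can differ a little from what the plot shows.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]

# Hypothetical spend values, not taken from the dataset
spend = pd.Series([10, 12, 14, 15, 16, 18, 20, 100], name="Newspaper")
flagged = iqr_outliers(spend)
print(flagged.tolist())
```

On the notebook's data, the call would be `iqr_outliers(df["Newspaper"])`.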

### **Data Correlations 🔮**

In [44]:
# annot=True prints each correlation coefficient inside its cell
sns.heatmap(df.corr(), annot=True, cmap='Pastel1', linewidths=2)

#### **Observations** 🦕
* Sales and TV advertising cost have a strong positive correlation 🛕.
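
A heatmap is easy to misread by colour alone, so it can help to list each feature's correlation with the target as numbers. The sketch below ranks columns by the absolute Pearson correlation with a chosen target. `correlation_with_target` is a hypothetical helper and the toy values are made up for illustration.

```python
import pandas as pd

def correlation_with_target(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with `target`, strongest first."""
    corr = frame.corr()[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)

# Hypothetical toy frame, not the advertising data
toy = pd.DataFrame({
    "TV": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Newspaper": [2.0, 1.0, 4.0, 3.0, 5.0],
    "Sales": [2.0, 4.0, 6.0, 8.0, 10.0],
})
ranked = correlation_with_target(toy, "Sales")
print(ranked)
```

On the notebook's data, the call would be `correlation_with_target(df, "Sales")`.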

## **Linear Regression Model** 🦊
-----------------------

### **Data Preprocessing 🏺**

#### All data is numeric, so no preprocessing is needed ✔️

### **Recognizing Features & Targets** 🦏

In [47]:
# Splitting features & targets 🌱
# Feature Data 
x = df.iloc[:,0:3]
# Target Data 
y = df.iloc[:,-1] 

### **Data Train & Test Split** 🍂

In [50]:
# Splitting data into train and test datasets 🍂
X_train, X_test, Y_train, Y_test = train_test_split(x,y,test_size=0.2,random_state=2)

### **Model Training & Prediction 🐙**

In [51]:
# Training data on Linear Regression Algorithm 🛍️
lr = LinearRegression()
# Model Training 📞
lr.fit(X_train, Y_train)
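
A fitted linear regression is just an equation: an intercept plus one coefficient per feature. Reading those numbers shows how much the prediction changes per extra unit of spend on each channel. Here is a minimal sketch on synthetic data where the true relationship is known, so the recovered values can be checked (the data and the names `X`, `y`, `model` are illustrative assumptions, not the notebook's results):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship: Sales = 1 + 2*TV + 3*Radio (no noise)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, 50),
    "Radio": rng.uniform(0, 50, 50),
})
y = 1 + 2 * X["TV"] + 3 * X["Radio"]

model = LinearRegression().fit(X, y)

# Pair each coefficient with its column name for readability
coefficients = pd.Series(model.coef_, index=X.columns)
print("intercept:", round(float(model.intercept_), 3))
print(coefficients.round(3))
```

For the model trained above, the equivalent would be `pd.Series(lr.coef_, index=X_train.columns)` together with `lr.intercept_`.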

#### **Model Evaluation** 🗿

In [52]:
# Model prediction 💎
pred = lr.predict(X_test)
# R² score: share of the variance in test-set sales explained by the model 🚧
r2Value = r2_score(Y_test,pred)
print("r2 score value : ",r2Value)
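
The R² score is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. Writing it out by hand makes the number easier to interpret, and the root mean squared error (RMSE) complements it by reporting the typical error in the units of the target. This is a sketch with made-up values, and `r2_and_rmse` is a hypothetical helper:

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot and the root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(ss_res / len(y_true))
    return r2, rmse

# Hypothetical actual and predicted sales, for illustration only
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
r2, rmse = r2_and_rmse(actual, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

With the notebook's variables, the call would be `r2_and_rmse(Y_test, pred)`.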

### **Building a Predictive System** 🐋

In [60]:
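class Pred:

    def __init__(self):
        self.sales = None

    def model(self, tv, radio, newspaper):
        # One-row frame with the training column names, so scikit-learn
        # does not warn about missing feature names
        data = pd.DataFrame([[tv, radio, newspaper]], columns=X_train.columns)
        # predict returns a length-1 array; take its only element
        self.sales = float(lr.predict(data)[0])

        return self.sales

if __name__ == '__main__':
    obj = Pred()
    # input() returns strings, so convert to float before predicting
    t = float(input("\n Please enter cost of advertisement on TV : "))
    r = float(input("\n Please enter cost of advertisement on radio : "))
    n = float(input("\n Please enter cost of advertisement on newspaper : "))
    sales = obj.model(t, r, n)
    print("\n Sales Predicted are : ", round(sales, 2))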