## Sales prediction using Python

### Import the necessary libraries.

In [21]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

### Load the required dataset.

In [3]:
data = pd.read_csv('Advertising.csv')

In [4]:
data.head()

|   | Unnamed: 0 | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|---|
| 0 | 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 5 | 180.8 | 10.8 | 58.4 | 12.9 |


In [7]:
data = data.drop(columns = 'Unnamed: 0')

In [8]:
data.head()

|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |
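
The `Unnamed: 0` column looks like a row index that was saved into the CSV. As an alternative to dropping it after loading, it can be consumed at read time with `index_col = 0`. The sketch below assumes the file's first header cell is empty (which is what makes pandas name the column `Unnamed: 0`) and uses only the five rows shown above in place of the real file:

```python
import io

import pandas as pd

# Assumed file layout: an empty first header cell, followed by the five rows
# shown by data.head(). This text stands in for Advertising.csv.
csv_text = """,TV,Radio,Newspaper,Sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
"""

# index_col = 0 makes the first column the row index instead of a feature.
sample = pd.read_csv(io.StringIO(csv_text), index_col = 0)
print(sample.columns.tolist())
```

With the real file, the equivalent call would be `pd.read_csv('Advertising.csv', index_col = 0)`, provided the file has the layout assumed here.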


### Check for any missing data.

In [10]:
data.isnull().sum()

TV           0
Radio        0
Newspaper    0
Sales        0
dtype: int64

No missing values to deal with!

### Now let's look at the correlation between the features before training a model to predict future sales.

In [15]:
heatmap = go.Figure(data = go.Heatmap(z = data.corr(numeric_only = True),
                                      x = data.columns,
                                      y = data.columns,
                                      colorscale = 'Plotly3'))

heatmap.update_layout(title = 'Correlation between the features (Heatmap)', 
                      xaxis_title = 'Features - X', 
                      yaxis_title = 'Features - Y')

heatmap.show()

![Correlation between the features](https://github.com/Paul1518/Sales-prediction/blob/main/Plots/newplot.png?raw=true)
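
The heatmap shows every pairwise correlation. For modelling, the column that matters most is the one for `Sales`, and it can be read off as a sorted list. This is a minimal sketch that uses only the five rows shown by `data.head()` as a stand-in, so the values it prints are not representative of the full dataset:

```python
import pandas as pd

# The five rows shown by data.head(), used as a stand-in for the full dataset.
sample = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9],
})

# Correlation of each feature with the target, sorted by absolute strength.
sales_corr = (sample.corr(numeric_only = True)['Sales']
              .drop('Sales')
              .sort_values(key = abs, ascending = False))
print(sales_corr)
```

Running the same expression on `data` instead of `sample` gives the ranking for the whole dataset.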

### Now let's prepare the data and use the linear regression algorithm to train a sales prediction model.

In [20]:
# Extract the features (independent variables) and the target (dependent variable).
x = np.array(data.drop(columns = 'Sales'))
y = np.array(data['Sales'])

# Split the data into training and testing sets (80% training, 20% testing).
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.2, random_state = 42)

# Create and train the linear regression model.
model = LinearRegression()
model.fit(xtrain, ytrain)

# Make predictions on the testing set.
ypred = model.predict(xtest)

# Create a new DataFrame with the predicted sales values.
# `ypred.flatten()` returns a 1-dimensional copy of the predictions; `ypred` is already 1-D here because `y` is 1-D, so the call is only a safeguard.
data_predicted = pd.DataFrame(data = {'Predicted sales' : ypred.flatten()})
data_predicted

|   | Predicted sales |
|---|---|
| 0 | 16.408024 |
| 1 | 20.889882 |
| 2 | 21.553843 |
| 3 | 10.608503 |
| 4 | 22.112373 |
| 5 | 13.105592 |
| 6 | 21.057192 |
| 7 | 7.46101 |
| 8 | 13.606346 |
| 9 | 15.15507 |
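
The predictions alone do not show how close the model is to the actual sales. One common way to check is to compare `ypred` with `ytest` using metrics from `sklearn.metrics`. The sketch below is self-contained and runs on synthetic data (hypothetical values, not `Advertising.csv`), and the `_demo` names are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (hypothetical values):
# three budget columns and a target that depends linearly on them plus noise.
rng = np.random.default_rng(42)
x_demo = rng.uniform(0, 300, size = (200, 3))
y_demo = 3.0 + x_demo @ np.array([0.05, 0.2, 0.0]) + rng.normal(0, 1.0, size = 200)

xtrain_demo, xtest_demo, ytrain_demo, ytest_demo = train_test_split(
    x_demo, y_demo, test_size = 0.2, random_state = 42)

model_demo = LinearRegression().fit(xtrain_demo, ytrain_demo)
ypred_demo = model_demo.predict(xtest_demo)

# R^2 close to 1 and a small mean absolute error indicate a good fit on unseen rows.
r2_demo = r2_score(ytest_demo, ypred_demo)
mae_demo = mean_absolute_error(ytest_demo, ypred_demo)
print('R^2:', r2_demo)
print('MAE:', mae_demo)
```

In the notebook itself, the same two calls would be `r2_score(ytest, ypred)` and `mean_absolute_error(ytest, ypred)`.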


### So this is how we can predict the future sales of a product with machine learning.
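
A trained model can also score an advertising budget that is not in the dataset. The sketch below assumes the feature order `TV`, `Radio`, `Newspaper` used above. It fits on only the five rows shown by `data.head()` and uses a made-up budget, so the printed number is illustrative and not a real forecast:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in training data: the five rows shown by data.head(), not the full dataset.
x_sample = np.array([[230.1, 37.8, 69.2],
                     [44.5, 39.3, 45.1],
                     [17.2, 45.9, 69.3],
                     [151.5, 41.3, 58.5],
                     [180.8, 10.8, 58.4]])
y_sample = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
sample_model = LinearRegression().fit(x_sample, y_sample)

# Hypothetical budget, in the same column order as the training features.
# predict() expects a 2-D array: one row per budget, one column per feature.
new_budget = np.array([[100.0, 25.0, 30.0]])
predicted = sample_model.predict(new_budget)
print(predicted)
```

With the notebook's own model, the call would be `model.predict(new_budget)`.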

---