# Sales Prediction

This project focuses on predicting sales based on historical business data. Sales prediction is important for a business because it helps with planning inventory, allocating marketing budgets, and making informed decisions.

The objective of this project is to analyze the given dataset, identify patterns that influence sales, and build a regression model to predict sales values.

In [3]:
import pandas as pd
import numpy as np

In [4]:
df = pd.read_csv("/content/advertising.csv")

In [5]:
df.head()

|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |


In [7]:
df.shape

(200, 4)

In [8]:
df.columns

Index(['TV', 'Radio', 'Newspaper', 'Sales'], dtype='object')

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   TV         200 non-null    float64
 1   Radio      200 non-null    float64
 2   Newspaper  200 non-null    float64
 3   Sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB


In [10]:
df.describe()

|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| count | 200.0 | 200.0 | 200.0 | 200.0 |
| mean | 147.0425 | 23.264 | 30.554 | 15.1305 |
| std | 85.854236 | 14.846809 | 21.778621 | 5.283892 |
| min | 0.7 | 0.0 | 0.3 | 1.6 |
| 25% | 74.375 | 9.975 | 12.75 | 11.0 |
| 50% | 149.75 | 22.9 | 25.75 | 16.0 |
| 75% | 218.825 | 36.525 | 45.1 | 19.05 |
| max | 296.4 | 49.6 | 114.0 | 27.0 |


In [11]:
df.isnull().sum()

| Column | Missing values |
|---|---|
| TV | 0 |
| Radio | 0 |
| Newspaper | 0 |
| Sales | 0 |


## Exploratory Data Analysis

The dataset contains advertising expenditure across three channels (TV, Radio, and Newspaper) and the corresponding sales values for 200 records. There are no missing values, and the numerical features fall within plausible ranges (for example, TV spend runs from 0.7 to 296.4). This makes the dataset suitable for regression analysis.
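
One quick way to look for patterns that influence sales is to check how strongly each channel correlates with `Sales`. The sketch below is illustrative only: `sales_correlations` is a hypothetical helper, not part of the original notebook, and `sample` re-enters just the five rows shown by `df.head()` so the code runs without the CSV file. Five rows are far too few to draw conclusions from; on the real data, pass the loaded `df` instead.

```python
import pandas as pd

# The five rows shown by df.head(), re-entered by hand (assumption: used
# here only so the sketch is self-contained; use the full df in practice).
sample = pd.DataFrame(
    {
        "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
        "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
        "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
        "Sales": [22.1, 10.4, 12.0, 16.5, 17.9],
    }
)


def sales_correlations(frame: pd.DataFrame, target: str = "Sales") -> pd.Series:
    """Pearson correlation of each feature with the target, strongest first."""
    # Correlate every column with the target, then drop the target itself.
    corr = frame.corr()[target].drop(target)
    # Sort by absolute value so strong negative correlations also rank high.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


correlations = sales_correlations(sample)
print(correlations)
```

Correlation only captures linear, pairwise relationships, so it is a starting point for exploration rather than a substitute for fitting the model.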

In [13]:
# Features: the three advertising channels; target: Sales
x = df.drop("Sales", axis=1)
y = df["Sales"]

In [15]:
x.shape, y.shape

((200, 3), (200,))

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [18]:
from sklearn.linear_model import LinearRegression

In [19]:
model = LinearRegression()

In [20]:
model.fit(x_train, y_train)

In [21]:
y_pred = model.predict(x_test)

In [22]:
from sklearn.metrics import mean_absolute_error, r2_score

In [24]:
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

In [25]:
mae, r2

(1.2748262109549338, 0.9059011844150826)

## Model Evaluation

The Linear Regression model was evaluated on the held-out 20% test set using Mean Absolute Error (MAE) and the R-squared score. An MAE of approximately 1.27 means that predicted sales differ from actual sales by about 1.27 units on average. Against a mean Sales value of about 15.13 across the dataset, that is an error of roughly 8%, which indicates reasonable prediction accuracy.
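
To make the Mean Absolute Error concrete, it can be computed by hand as the average of the absolute differences between actual and predicted values. The sketch below assumes nothing beyond NumPy; `mean_absolute_error_manual` is a hypothetical helper name, and the `predicted` values are made up for illustration rather than taken from the fitted model.

```python
import numpy as np


def mean_absolute_error_manual(actual, predicted) -> float:
    """Average absolute difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Illustrative values only: 'predicted' is hypothetical, not model output.
actual = [22.1, 10.4, 12.0, 16.5]
predicted = [21.0, 11.0, 13.5, 16.0]

mae_example = mean_absolute_error_manual(actual, predicted)

# Dividing by the mean of the actual values puts the error on a relative scale.
relative_mae = mae_example / float(np.mean(actual))

print(f"MAE: {mae_example:.3f}, relative to mean: {relative_mae:.1%}")
```

Expressing the error relative to the mean of the target is one simple way to judge whether an MAE is large or small for a given dataset.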

The R-squared score of about 0.91 means the model explains roughly 91% of the variance in test-set sales using the three advertising features. This indicates a strong, measurable association between advertising expenditure and sales in this dataset.
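
The R-squared score compares the model's squared error with that of a baseline that always predicts the mean of the actual values: R² = 1 − SS_res / SS_tot. The sketch below works through that formula on small made-up numbers; `r2_manual` is a hypothetical helper name and none of the values come from the advertising dataset.

```python
import numpy as np


def r2_manual(actual, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # error left after the model
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # error of the mean-only baseline
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]

r2_perfect = r2_manual(actual, actual)                     # predictions match exactly
r2_baseline = r2_manual(actual, [13.0, 13.0, 13.0, 13.0])  # always predict the mean
r2_close = r2_manual(actual, [11.0, 12.0, 14.0, 15.0])     # small errors

print(r2_perfect, r2_baseline, r2_close)
```

A score near 1 means the predictions remove most of the error that the mean-only baseline would make, while a score near 0 means the model does no better than that baseline.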

## Conclusion

In this project, sales prediction was performed using historical advertising data to understand how different advertising channels influence sales outcomes. The dataset was explored and prepared carefully before applying a regression-based approach.

The results show that advertising expenditure is strongly associated with sales, and even a simple Linear Regression model is capable of capturing this relationship effectively. This project provided practical insight into applying regression techniques to real-world business data and highlighted the importance of interpreting model results for informed decision-making.