## Sales Prediction using Python

### Algorithms used: Linear Regression, Random Forest Regressor

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

#### Importing the dataset

In [2]:
df = pd.read_csv("advertising.csv")

#### First 5 rows

In [3]:
df.head()

      TV  Radio  Newspaper  Sales
0  230.1   37.8       69.2   22.1
1   44.5   39.3       45.1   10.4
2   17.2   45.9       69.3   12.0
3  151.5   41.3       58.5   16.5
4  180.8   10.8       58.4   17.9


The `shape` attribute returns the dimensions of the DataFrame: 200 rows and 4 columns.

In [4]:
df.shape

(200, 4)

#### Checking for null values.

There are none: every column has a null count of 0.

In [5]:
df.isnull().sum()

TV           0
Radio        0
Newspaper    0
Sales        0
dtype: int64

### Initialising the predictor matrix and the dependent variable.

Predictors: TV, Radio and Newspaper.

Dependent variable: Sales.

In [6]:
X = df.drop('Sales', axis=1)
y = df['Sales']

#### Splitting the dataset into training and testing sets.

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=417)
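
As a quick sanity check on the split, the sketch below uses a synthetic stand-in for the dataset (random values, not the real `advertising.csv`; the `_demo` and `_tr`/`_te` names are hypothetical). It illustrates that `test_size=0.2` on 200 rows should leave 160 rows for training and 40 for testing, and that fixing `random_state` makes the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the notebook's data
# (assumption: the values are random and carry no real meaning)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.uniform(0, 300, size=(200, 3)),
    columns=["TV", "Radio", "Newspaper"],
)
y_demo = pd.Series(rng.uniform(1, 27, size=200), name="Sales")

# Same arguments as in the cell above
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=417
)
# Repeat the call to confirm the same rows land in the test set
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=417
)

print("Train shape:", X_tr.shape)
print("Test shape:", X_te.shape)
print("Same test rows on repeat:", X_te.index.equals(X_te2.index))
```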

## Linear Regression

In [8]:
# Create a linear regression model and fit it to the training data
model_lin = LinearRegression()
model_lin.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model_lin.predict(X_test)

# Calculate metrics
mse_lin = mean_squared_error(y_test, y_pred)
r2_lin = r2_score(y_test, y_pred)

print('Mean Squared Error:', mse_lin)
print('R-squared Score:', r2_lin)

Mean Squared Error: 3.8192557529507924
R-squared Score: 0.8847298781765348


## Random Forest Regressor

In [9]:
# n_estimators sets the number of decision trees in the Random Forest ensemble
model_rfr = RandomForestRegressor(n_estimators=100, random_state=417)

model_rfr.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model_rfr.predict(X_test)

# Calculate metrics
mse_rfr = mean_squared_error(y_test, y_pred)
r2_rfr = r2_score(y_test, y_pred)

print('Mean Squared Error:', mse_rfr)
print('R-squared Score:', r2_rfr)

Mean Squared Error: 2.3330039499999975
R-squared Score: 0.9295868967865399


### Let's compile the different performance metrics of the two algorithms in a single dataframe for comparison.

In [10]:
lin_reg = {
    "R-squared": r2_lin,
    "MSE": mse_lin,
}
rfr = {
    "R-squared": r2_rfr,
    "MSE": mse_rfr,
}
combined_metrics = {
    "Linear Regression": lin_reg,
    "Random Forest Regressor": rfr,
}
# Creating a DataFrame from the combined_metrics dictionary
df_metrics = pd.DataFrame(combined_metrics)

# Transpose the DataFrame for a more readable format
df_metrics = df_metrics.transpose()

# Display the DataFrame
print(df_metrics)

                         R-squared       MSE
Linear Regression         0.884730  3.819256
Random Forest Regressor   0.929587  2.333004
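
Because MSE is expressed in squared Sales units, it can be easier to read as a root mean squared error (RMSE), which is in the same units as Sales. The sketch below rebuilds the table from the values printed above (the `df_metrics_demo` name is hypothetical) and adds an RMSE column.

```python
import numpy as np
import pandas as pd

# Metric values copied from the comparison table above
df_metrics_demo = pd.DataFrame(
    {
        "R-squared": [0.884730, 0.929587],
        "MSE": [3.819256, 2.333004],
    },
    index=["Linear Regression", "Random Forest Regressor"],
)

# RMSE is the square root of MSE
df_metrics_demo["RMSE"] = np.sqrt(df_metrics_demo["MSE"])
print(df_metrics_demo.round(3))
```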


### Conclusion:

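On this test split, the Random Forest Regressor performs better than linear regression on both metrics: a higher R-squared (0.930 versus 0.885) and a lower MSE (2.33 versus 3.82).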
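
A comparison based on one train/test split can shift with the choice of `random_state`. One way to check how stable the ranking is would be k-fold cross-validation, sketched below with scikit-learn's `cross_val_score`. The data here is a synthetic stand-in (an assumption, with a made-up linear relationship), so which model scores higher in this sketch says nothing about the real dataset; substitute the notebook's `X` and `y` to run the actual check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption: replace with the real X and y)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
y_demo = 0.05 * X_demo["TV"] + 0.2 * X_demo["Radio"] + rng.normal(0, 1, 200)

# Five shuffled folds, so each row is used for testing exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=417)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=417),
}

cv_scores = {}
for name, model in models.items():
    # One R-squared score per fold
    cv_scores[name] = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2")
    print(f"{name}: mean R2 = {cv_scores[name].mean():.3f}, std = {cv_scores[name].std():.3f}")
```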