# Big Sales Prediction using Random Forest Regressor

-------------

## **Objective**

The objective of this project is to build a Random Forest Regressor model to predict the sales of different products in various stores based on historical sales data and other related features.

## **Data Source**

The dataset for this project is the Big Mart Sales dataset, available on Kaggle. The notebook reads its training file, `Train.csv`, from the working directory.

## **Import Library**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

## **Import Data**

In [None]:
data = pd.read_csv('Train.csv')

## **Describe Data**

In [None]:
# Display the first few rows of the dataset
print(data.head())

# Summary statistics of the dataset
print(data.describe())

# Information about the dataset (info() prints directly and returns None,
# so wrapping it in print() would add a stray "None" line)
data.info()

## **Data Visualization**

In [None]:
# Visualize the distribution of the target variable
sns.histplot(data['Item_Outlet_Sales'], kde=True, bins=30)
plt.title('Distribution of Item Outlet Sales')
plt.show()

# Visualize some feature distributions
sns.histplot(data['Item_MRP'], kde=True, bins=30)
plt.title('Distribution of Item MRP')
plt.show()

## **Data Preprocessing**

In [None]:
# Check for missing values
print(data.isnull().sum())

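# Fill missing values: mean for the numeric column, mode for the categorical one.
# Assign the result back instead of calling fillna(inplace=True) on a column,
# which is unreliable under pandas copy-on-write.
data['Item_Weight'] = data['Item_Weight'].fillna(data['Item_Weight'].mean())
data['Outlet_Size'] = data['Outlet_Size'].fillna(data['Outlet_Size'].mode()[0])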

# Encode categorical variables, using a fresh LabelEncoder per column.
# Item_Identifier is included on the assumption that it is a string column
# in Train.csv; left unencoded, it would make StandardScaler fail below.
categorical_columns = [
    'Item_Identifier', 'Item_Fat_Content', 'Item_Type', 'Outlet_Identifier',
    'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type', 'Outlet_Type',
]
for column in categorical_columns:
    data[column] = LabelEncoder().fit_transform(data[column])

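# Standardize the feature variables
# (tree-based models do not require scaling; the step is kept as part of the workflow)
features = data.drop('Item_Outlet_Sales', axis=1)
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Convert scaled features back to a dataframe, taking column names from the
# feature frame rather than assuming the target is the last column
scaled_data = pd.DataFrame(scaled_features, columns=features.columns, index=features.index)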
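
To make the encoding step concrete, here is a minimal sketch on a toy column. The values are made up for illustration and are not taken from `Train.csv`. `LabelEncoder` assigns integer codes in sorted order of the labels, so the resulting numbers do not express any real ranking.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column; the values are illustrative only
sizes = pd.Series(['Small', 'High', 'Medium', 'Small', 'Medium'])

encoder = LabelEncoder()
codes = encoder.fit_transform(sizes)

# classes_ holds the labels in sorted order; a label's position is its code
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(codes.tolist())

# inverse_transform recovers the original labels from the codes
restored = encoder.inverse_transform(codes)
print(list(restored))
```

Here 'High' sorts before 'Medium' and 'Small', so it receives the lowest code. Tree-based models can usually cope with this arbitrary ordering, which is one reason label encoding is often paired with random forests.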

## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
X = scaled_data
y = data['Item_Outlet_Sales']

## **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## **Modeling**

In [None]:
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
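
A fitted random forest also exposes impurity-based feature importances, which can help interpret which inputs drive the predictions. The sketch below uses synthetic stand-in data with hypothetical column names (`feature_a`, `feature_b`, `noise`), not the Big Mart features, so it runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: the target depends strongly on 'feature_a',
# weakly on 'feature_b', and not at all on 'noise' (all names are hypothetical)
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({
    'feature_a': rng.normal(size=500),
    'feature_b': rng.normal(size=500),
    'noise': rng.normal(size=500),
})
y_demo = 10 * X_demo['feature_a'] + X_demo['feature_b'] + rng.normal(scale=0.1, size=500)

demo_model = RandomForestRegressor(n_estimators=100, random_state=42)
demo_model.fit(X_demo, y_demo)

# feature_importances_ is the mean decrease in impurity per feature,
# normalized so that the values sum to 1
importances = pd.Series(demo_model.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

The same two lines at the end should work on the notebook's `model` and `X_train`, assuming the model was fitted on a DataFrame as above.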

## **Model Evaluation**

In [None]:
# Predictions on the test set
y_pred = model.predict(X_test)

# Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# R-squared Score
r2 = r2_score(y_test, y_pred)
print("R-squared Score:", r2)
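
The two metrics are easier to read with their definitions in view. The sketch below computes them by hand with NumPy on a few hypothetical values, and also derives the root mean squared error (RMSE), which is in the same units as the sales target, whereas MSE is in squared units.

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 320.0, 380.0])

errors = y_true - y_hat

# MSE: average of the squared errors
mse_demo = np.mean(errors ** 2)

# RMSE: square root of MSE, back in the units of the target
rmse_demo = np.sqrt(mse_demo)

# R-squared: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_demo = 1 - ss_res / ss_tot

print("MSE:", mse_demo)
print("RMSE:", rmse_demo)
print("R-squared:", r2_demo)
```

For the notebook's own results, `np.sqrt(mse)` gives the RMSE directly from the value computed above.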

## **Prediction**

In [None]:
# Example prediction: select the first test row as a one-row DataFrame
# so the feature names seen during fitting are preserved
sample_data = X_test.iloc[[0]]
sample_prediction = model.predict(sample_data)
print("Sample Prediction:", sample_prediction)

## **Explanation**

In this project, we built a Random Forest Regressor to predict the sales of products across stores. Preprocessing consisted of filling missing values (mean for `Item_Weight`, mode for `Outlet_Size`), label-encoding the categorical variables, and standardizing the features. The model was trained on 80% of the data and evaluated on the remaining 20% using mean squared error and the R-squared score. A sample prediction on one test row shows how the fitted model is applied to new data. Together, these steps form a baseline workflow for applying the Random Forest algorithm to sales regression.
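
One possible way to chain the steps summarized above is a scikit-learn `Pipeline`, sketched below. This is not the notebook's code: the data is synthetic and the column names (`weight`, `price`, `size`, `sales`) are hypothetical placeholders. The point of the design is that imputation and scaling statistics are learned from the training split only, then reused on the test split.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Small synthetic frame with hypothetical columns (not the Big Mart schema)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'weight': rng.normal(12, 3, n),
    'price': rng.uniform(30, 250, n),
    'size': rng.choice(['Small', 'Medium', 'High'], n),
})
demo['sales'] = 15 * demo['price'] + rng.normal(0, 50, n)
# Simulate missing values in one numeric column
demo.loc[rng.choice(n, 20, replace=False), 'weight'] = np.nan

X_demo = demo.drop(columns='sales')
y_demo = demo['sales']
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Numeric columns: fill gaps with the training mean, then standardize
numeric_steps = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),
    ('scale', StandardScaler()),
])

# Categorical column: integer codes; unseen categories at predict time map to -1
categorical_steps = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)

preprocess = ColumnTransformer([
    ('num', numeric_steps, ['weight', 'price']),
    ('cat', categorical_steps, ['size']),
])

pipeline = Pipeline([
    ('preprocess', preprocess),
    ('model', RandomForestRegressor(n_estimators=100, random_state=42)),
])

# fit() learns preprocessing statistics and the forest from the training split only
pipeline.fit(X_tr, y_tr)

# score() returns R-squared on the held-out split
test_r2 = pipeline.score(X_te, y_te)
print("Test R-squared:", test_r2)
```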