# **Title of Project**

Big Sales Prediction Using Regressor

-------------

## **Objective**

The goal is to predict the sales of a product in stores, that is, to forecast the future sales performance of a particular product in a retail setting. A Random Forest Regressor is used to build a predictive model that forecasts sales for a given period from a set of input variables (features).

## **Data Source**

Sample dataset used: Big Sales Data.csv, loaded from the YBIFoundation/Dataset repository on GitHub (see the URL under Import Data).

## **Import Library**

In [None]:
import pandas as pd

## **Import Data**

In [None]:
df=pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Big%20Sales%20Data.csv')

## **Describe Data**

In [None]:
# display first 5 rows
df.head()
# display last 5 rows
df.tail()
# display info
df.info()
# display summary statistics of numerical columns
df.describe()
# display summary statistics of all columns
df.describe(include='all')
# display shape
df.shape
# number of unique categories in a column
df.nunique()

## **Data Visualization**

In [None]:
# Histogram view of dataset
df.hist()
# Pairplot view using seaborn library
import seaborn as sns
sns.pairplot(df)

## **Data Preprocessing**

In [None]:
# count of missing values
df.isna().sum()
# fill missing values in the 'Item_Weight' column with the mean weight of each 'Item_Type' group
# (assign the result back instead of calling fillna(..., inplace=True) on a selected column)
df['Item_Weight']=df['Item_Weight'].fillna(df.groupby('Item_Type')['Item_Weight'].transform('mean'))
df.info()
# to check for wrong values
df[['Item_Identifier']].value_counts()
df[['Item_Fat_Content']].value_counts()
# Replace wrong values with correct values
df.replace({'Item_Fat_Content':{'LF':'Low Fat','reg':'Regular','low fat':'Low Fat'}},inplace=True)
# To replace values to integer form for computation
df.replace({'Item_Fat_Content':{'Low Fat':0,'Regular':1}},inplace=True)
df[['Item_Type']].value_counts()
df.replace({'Item_Type':{'Fruits and Vegetables':0,'Snack Foods':0,'Household':1,'Frozen Foods':0,'Dairy':0,'Baking Goods':0,'Canned':0,'Health and Hygiene':1,'Meat':0,'Soft Drinks':0,'Breads':0,'Hard Drinks':0,'Others':2,'Starchy Foods':0,'Breakfast':0,'Seafood':0}},inplace=True)
df[['Outlet_Identifier']].value_counts()
df.replace({'Outlet_Identifier':{'OUT027':0,'OUT013':1,'OUT049':2,'OUT046':3,'OUT035':4,'OUT045':5,'OUT018':6,'OUT017':7,'OUT010':8,'OUT019':9}},inplace=True)
df[['Outlet_Size']].value_counts()
df.replace({'Outlet_Size':{'Small':0,'Medium':1,'High':2}},inplace=True)
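df[['Outlet_Location_Type']].value_counts()
# 'Outlet_Location_Type' is used as a feature later, so it also needs a numeric encoding
# (assumes the labels reported by value_counts() are 'Tier 1', 'Tier 2' and 'Tier 3')
df.replace({'Outlet_Location_Type':{'Tier 1':0,'Tier 2':1,'Tier 3':2}},inplace=True)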
df[['Outlet_Type']].value_counts()
df.replace({'Outlet_Type':{'Supermarket Type1' :1,'Grocery Store' :0,'Supermarket Type3' :3,'Supermarket Type2' :2}},inplace=True)
# Dataset after Preprocessing
df.head()
df.info()


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
y=df['Item_Outlet_Sales']
x=df[['Item_Weight', 'Item_Fat_Content', 'Item_Visibility',
       'Item_Type', 'Item_MRP', 'Outlet_Identifier',
       'Outlet_Establishment_Year', 'Outlet_Size', 'Outlet_Location_Type',
       'Outlet_Type']]

## **Train Test Split**

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=2529)
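
Because the call above passes no `test_size`, scikit-learn falls back to its default of holding out 25% of the rows. The sketch below checks this on hypothetical stand-in data (`features` and `target` are made-up names, not part of the notebook) and shows that a fixed `random_state` reproduces the same split.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows with one feature each
features = [[i] for i in range(100)]
target = list(range(100))

# No test_size given, as in the notebook: 25% of the rows are held out
a_train, a_test, b_train, b_test = train_test_split(
    features, target, random_state=2529
)
print(len(a_train), len(a_test))

# Repeating the call with the same seed yields the same test rows
_, a_test_again, _, _ = train_test_split(features, target, random_state=2529)
print(a_test == a_test_again)
```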

## **Modeling**

In [None]:
from sklearn.ensemble import RandomForestRegressor
rforest= RandomForestRegressor(random_state=2529)
rforest.fit(x_train,y_train)

## **Prediction**

In [None]:
y_pred=rforest.predict(x_test)
y_pred

## **Model Evaluation**

In [None]:
from sklearn.metrics import mean_absolute_percentage_error,mean_absolute_error
# y_pred must exist before the metrics are computed, so the prediction cell comes first
mean_absolute_percentage_error(y_test,y_pred)
mean_absolute_error(y_test,y_pred)
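
For reference, the two metrics used in this notebook, mean absolute error (MAE) and mean absolute percentage error (MAPE), can be computed by hand. The sketch below uses made-up numbers (`actual` and `predicted` are illustrative names) and plain Python; note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a value multiplied by 100.

```python
# Made-up sales figures, for illustration only
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

# Absolute error per row: |actual - predicted|
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

# MAE: average absolute error, in the same units as the target
mae = sum(abs_errors) / len(abs_errors)

# MAPE: average absolute error relative to the actual value (a fraction, not a percent)
mape = sum(e / abs(a) for e, a in zip(abs_errors, actual)) / len(actual)

print(f"MAE  = {mae:.2f}")
print(f"MAPE = {mape:.4f}")
```

MAE is easy to read in sales units, while MAPE puts errors on large and small sales values on a comparable scale; MAPE becomes unstable when actual values are close to zero.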

## **Explanation**

To implement big sales prediction using the Random Forest Regressor, I took the following steps:

1. Data Preparation: I loaded the dataset, filled the missing `Item_Weight` values, and encoded the categorical columns as integers.

2. Splitting: I separated the features (input variables) from the target variable (sales), then split the data into training and testing sets using the train_test_split function from the sklearn.model_selection module.

3. Model Creation and Training: Next, I created an instance of the RandomForestRegressor class from the sklearn.ensemble module. This class represents the Random Forest Regressor model. I then trained the model using the fit() method, passing in the training features (x_train) and the corresponding target values (y_train).

4. Predicting Sales: After training the model, I made predictions on the test set (x_test) using the predict() method.

5. Model Evaluation: I calculated two evaluation metrics, mean absolute percentage error (MAPE) and mean absolute error (MAE), to assess the performance of the model. These metrics show how closely the predicted sales values match the actual values.

By following these steps, I implemented big sales prediction using the Random Forest Regressor. The algorithm suits this task because it works with numerical features and with categorical features once they are encoded as numbers, and because it averages the predictions of many decision trees, which reduces the variance of any single tree.
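
The steps above can be sketched end to end without the CSV file. The example below is a minimal illustration on synthetic data, not the notebook's actual run: the `price` and `outlet_type` columns, the formula for `sales`, and `n_estimators=50` are all assumptions chosen to keep it small. It also checks that the forest's prediction is the average of its individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the Big Sales dataset)
rng = np.random.default_rng(0)
n_rows = 400
price = rng.uniform(30, 270, n_rows)       # hypothetical price-like feature
outlet_type = rng.integers(0, 4, n_rows)   # hypothetical integer-encoded category
x = np.column_stack([price, outlet_type])
sales = 500 + 5 * price * (1 + outlet_type) + rng.normal(0, 50, n_rows)

# Split, train, predict
x_train, x_test, y_train, y_test = train_test_split(x, sales, random_state=2529)
model = RandomForestRegressor(n_estimators=50, random_state=2529)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Evaluate with the same two metrics as the notebook
mae = mean_absolute_error(y_test, y_pred)
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"MAE = {mae:.1f}, MAPE = {mape:.3f}")

# The forest's prediction is the mean of its trees' predictions
tree_mean = np.mean([tree.predict(x_test) for tree in model.estimators_], axis=0)
print(np.allclose(tree_mean, y_pred))
```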