# Food Sales Prediction
- Zach Hanson
- Last updated 12/22/2022
 

## Project Description

### Task

### Data Dictionary

**Variable Name** | **Description**
--- | ---
Item_Identifier | Unique product ID
Item_Weight | Weight of product
Item_Fat_Content | Whether the product is low fat or regular
Item_Visibility | The percentage of total display area of all products in a store allocated to the particular product
Item_Type | The category to which the product belongs
Item_MRP | Maximum Retail Price (list price) of the product
Outlet_Identifier | Unique store ID
Outlet_Establishment_Year | The year in which store was established
Outlet_Size | The size of the store in terms of ground area covered
Outlet_Location_Type | The type of area in which the store is located
Outlet_Type | Whether the outlet is a grocery store or some sort of supermarket
Item_Outlet_Sales | Sales of the product in the particular store

### Import Libraries

In [1]:
#Pandas
import pandas as pd

#NumPy
import numpy as np

#Matplotlib
import matplotlib.pyplot as plt

#Seaborn
import seaborn as sns

#Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_selector
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

#Models
from sklearn.tree import DecisionTreeRegressor

#Metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

#Setting global SciKit-Learn configuration
from sklearn import set_config
set_config(display='diagram')
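
The imports above cover splitting, column-wise preprocessing, and a tree model. As a rough sketch of how these pieces typically fit together, the example below runs them on a small synthetic frame. The toy data, the column names `weight`, `price`, and `sales`, and the `random_state` values are illustrative assumptions, not taken from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): two numeric features and a noisy target
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    'weight': rng.uniform(5, 20, 100),
    'price': rng.uniform(30, 250, 100),
})
toy['sales'] = 15 * toy['price'] + rng.normal(0, 50, 100)

# Separate features from the target, then hold out 25% of rows (the default)
X = toy.drop(columns='sales')
y = toy['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale every numeric column, then fit a decision tree on the scaled features
num_selector = make_column_selector(dtype_include='number')
preprocessor = make_column_transformer((StandardScaler(), num_selector))
pipe = make_pipeline(preprocessor, DecisionTreeRegressor(random_state=42))
pipe.fit(X_train, y_train)

preds = pipe.predict(X_test)
print(preds[:5])
```

Wrapping the scaler and the model in one pipeline means the scaler is fit on the training rows only, which avoids leaking information from the test split.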

### Functions

In [2]:
#Function to take a fitted pipeline plus the train and test splits
#and print MAE, MSE, RMSE, and R2 metrics for each split
def evaluation_model(pipe, x_train, y_train, x_test, y_test, model_name=''):

    #Score the train split first, then the test split
    for split, x, y in [('Train', x_train, y_train), ('Test', x_test, y_test)]:
        #Predict once per split and reuse the predictions for every metric
        preds = pipe.predict(x)
        mae = mean_absolute_error(y, preds)
        mse = mean_squared_error(y, preds)
        rmse = np.sqrt(mse)
        r2 = r2_score(y, preds)

        #Display the metrics for the model
        print(f'{model_name} {split} Scores')
        print(f'MAE: {mae:,.4f} \nMSE: {mse:,.4f} \nRMSE: {rmse:,.4f} \nR2: {r2:.4f}\n')

## Loading and Inspecting Data

### Loading Data

In [4]:
#Import data
df_original = pd.read_csv('sales_predictions.csv')

### Inspect Data

In [5]:
df_original.head()

| | Item_Identifier | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Identifier | Outlet_Establishment_Year | Outlet_Size | Outlet_Location_Type | Outlet_Type | Item_Outlet_Sales |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | FDA15 | 9.3 | Low Fat | 0.016047 | Dairy | 249.8092 | OUT049 | 1999 | Medium | Tier 1 | Supermarket Type1 | 3735.138 |
| 1 | DRC01 | 5.92 | Regular | 0.019278 | Soft Drinks | 48.2692 | OUT018 | 2009 | Medium | Tier 3 | Supermarket Type2 | 443.4228 |
| 2 | FDN15 | 17.5 | Low Fat | 0.01676 | Meat | 141.618 | OUT049 | 1999 | Medium | Tier 1 | Supermarket Type1 | 2097.27 |
| 3 | FDX07 | 19.2 | Regular | 0.0 | Fruits and Vegetables | 182.095 | OUT010 | 1998 | NaN | Tier 3 | Grocery Store | 732.38 |
| 4 | NCD19 | 8.93 | Low Fat | 0.0 | Household | 53.8614 | OUT013 | 1987 | High | Tier 3 | Supermarket Type1 | 994.7052 |


- Data seems to be loaded properly

In [7]:
#Showing shape of data frame
df_original.shape

(8523, 12)

- There are 8,523 rows
- There are 12 columns
  - 11 features and 1 target (`Item_Outlet_Sales`)

In [8]:
#Viewing column names, data types, and count of non-null values
df_original.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8523 entries, 0 to 8522
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Item_Identifier            8523 non-null   object 
 1   Item_Weight                7060 non-null   float64
 2   Item_Fat_Content           8523 non-null   object 
 3   Item_Visibility            8523 non-null   float64
 4   Item_Type                  8523 non-null   object 
 5   Item_MRP                   8523 non-null   float64
 6   Outlet_Identifier          8523 non-null   object 
 7   Outlet_Establishment_Year  8523 non-null   int64  
 8   Outlet_Size                6113 non-null   object 
 9   Outlet_Location_Type       8523 non-null   object 
 10  Outlet_Type                8523 non-null   object 
 11  Item_Outlet_Sales          8523 non-null   float64
dtypes: float64(4), int64(1), object(7)
memory usage: 799.2+ KB
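
The `info()` output above shows that `Item_Weight` and `Outlet_Size` have fewer non-null values than the 8,523 rows, so both columns contain missing data. As a small worked check, the sketch below derives the missing counts and percentages from the numbers printed above. The variable names are illustrative; on the real frame, `df_original.isna().sum()` reports the same counts directly.

```python
import pandas as pd

# Row count and non-null counts copied from the info() output above
total_rows = 8523
non_null = pd.Series({'Item_Weight': 7060, 'Outlet_Size': 6113})

# Missing values are the rows without a non-null entry
missing = total_rows - non_null
missing_pct = (missing / total_rows * 100).round(2)

summary = pd.DataFrame({'missing': missing, 'missing_pct': missing_pct})
print(summary)
```

This gives 1,463 missing weights (about 17.17% of rows) and 2,410 missing outlet sizes (about 28.28%).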


In [None]:
#Showing descriptive statistics for numerical data
df_original.describe()

In [10]:
#Showing descriptive statistics for categorical data
df_original.describe(include='object')

## Visualization Models

## Splitting Data

## Preparing Data

## Regression Metrics