# ANALYSIS AND PREDICTION OF SALES DATA
![sales-prediction](trend-analysis-sales.jpg)

## INTRODUCTION

Sales forecasting plays an important role in business development. Regardless of the size of a business or the number of salespeople, accurate sales forecasting can have a significant impact on all aspects of sales management, including planning, budgeting, and determining sales targets. In densely populated cities, where the number of stores is growing, sound sales forecasting helps stores develop effective, data-driven sales strategies that increase revenue and reduce unnecessary losses.


## Data and Methods

The data scientists at Big Mart have collected 2013 sales data for 1559 products across 10 stores in different cities. Also, certain attributes of each product and store have been defined. The aim is to build a predictive model and predict the sales of each product at a particular outlet. Using this model, Big Mart will try to understand the properties of products and outlets which play a key role in increasing sales.

Please note that the data may have missing values, as some stores might not report all the data due to technical glitches. These missing values must therefore be handled before modeling.
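One common way to treat such gaps is sketched below on a small made-up frame: a numeric column is filled with its mean and a categorical column with its most frequent value. The column names `Item_Weight` and `Outlet_Size` follow the dataset's CSV header; the values and the choice of mean/mode imputation are assumptions for illustration, not necessarily the strategy used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration); only the column names come from the dataset.
toy = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 17.5, 19.2],
    "Outlet_Size": ["Medium", "Medium", None, "High"],
})

# Share of missing values per column, in percent.
missing_pct = toy.isna().mean() * 100

# Assumed strategy: mean for the numeric column, mode for the categorical one.
filled = toy.copy()
filled["Item_Weight"] = filled["Item_Weight"].fillna(filled["Item_Weight"].mean())
filled["Outlet_Size"] = filled["Outlet_Size"].fillna(filled["Outlet_Size"].mode()[0])

print(missing_pct)
print(filled)
```

Mean and mode are the simplest defaults; group-wise imputation (for example, per product or per outlet type) may fit this data better, but that would need to be checked against the actual columns.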

## Content
The dataset provides product details and outlet information for the products purchased, together with their sales values. It is split into a train set (8523 rows) and a test set (5681 rows).

+ Train file : CSV containing the item-outlet information with sales values
+ Test file : CSV containing the item-outlet combinations for which sales need to be forecast

### Variable Description

+ ProductID : unique product ID
+ Weight : weight of products
+ FatContent : specifies whether the product is low on fat or not
+ Visibility : percentage of total display area of all products in a store allocated to the particular product
+ ProductType : the category to which the product belongs
+ MRP : Maximum Retail Price (listed price) of the products
+ OutletID : unique store ID
+ EstablishmentYear : year of establishment of the outlets
+ OutletSize : the size of the store in terms of ground area covered
+ LocationType : the type of city in which the store is located
+ OutletType : specifies whether the outlet is just a grocery store or some sort of supermarket
+ OutletSales : (target variable) sales of the product in the particular store
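
The names in this list are descriptive labels; the CSV header printed later in the notebook uses names such as `Item_Identifier` and `Item_Outlet_Sales`. If the shorter labels are preferred in code, a rename mapping along these lines could be applied. The pairing below is inferred from the descriptions and should be treated as an assumption.

```python
import pandas as pd

# Assumed pairing of CSV header names to the descriptive labels listed above.
COLUMN_LABELS = {
    "Item_Identifier": "ProductID",
    "Item_Weight": "Weight",
    "Item_Fat_Content": "FatContent",
    "Item_Visibility": "Visibility",
    "Item_Type": "ProductType",
    "Item_MRP": "MRP",
    "Outlet_Identifier": "OutletID",
    "Outlet_Establishment_Year": "EstablishmentYear",
    "Outlet_Size": "OutletSize",
    "Outlet_Location_Type": "LocationType",
    "Outlet_Type": "OutletType",
    "Item_Outlet_Sales": "OutletSales",
}

# One-row frame with the CSV header names, used only to demonstrate the rename.
sample = pd.DataFrame(
    [["FDA15", 9.3, "Low Fat", 0.016047, "Dairy", 249.8092,
      "OUT049", 1999, "Medium", "Tier 1", "Supermarket Type1", 3735.138]],
    columns=list(COLUMN_LABELS),
)
renamed = sample.rename(columns=COLUMN_LABELS)
print(renamed.columns.tolist())
```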

In [1]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
import warnings
warnings.filterwarnings('ignore')



In [2]:
# create a few functions for analysis
# create a function for exploring data types, number of NaN and other statistical values
def explore(df):
    """
    Function to explore a pandas DataFrame and return a summary table of each column.

    Parameters:
    -----------
    df: pandas DataFrame
        The DataFrame to be explored.

    Returns:
    --------
    ret: pandas DataFrame
        A summary table of each column containing its type, minimum and maximum values,
        percentage of NaN values, number of values, and unique values.
    """
    ret = pd.DataFrame(columns=['Type','Min','Max',"Nan %",
                                '# Values','Unique values'])
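    for col, content in df.items():
        values = []
        values.append(content.dtype)  # actual dtype
        try:
            # min/max can fail for some columns (e.g. strings mixed with NaN)
            values.extend([content.min(), content.max()])
        except TypeError:
            values.extend([np.nan, np.nan])
        values.append(content.isnull().sum() / len(content) * 100)  # % of NaNs
        values.append(content.nunique())  # number of unique values
        values.append(content.unique())   # the unique values themselves

        ret.loc[col] = values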
        
    return ret
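
A similar per-column summary can also be assembled from vectorized pandas calls, without an explicit loop. The sketch below runs on made-up data; the helper name `quick_summary` is hypothetical, and it deliberately omits the min/max columns and the list of unique values.

```python
import numpy as np
import pandas as pd

def quick_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dtype, % of NaN, and number of distinct values per column."""
    return pd.DataFrame({
        "Type": df.dtypes,
        "Nan %": df.isna().mean() * 100,
        "# Values": df.nunique(),  # NaN is not counted as a distinct value
    })

# Toy frame (values made up); column names borrowed from the dataset header.
toy = pd.DataFrame({
    "Item_Weight": [9.3, 5.92, np.nan, 19.2],
    "Item_Fat_Content": ["Low Fat", "Regular", "Low Fat", "Regular"],
})
summary = quick_summary(toy)
print(summary)
```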

In [3]:
# create a function that sets the caption of a figure
def set_fig_caption(fig,fig_number,x=0,y=-0.01,gap=0.05,title=None,caption:str=None,size=8):
    """
    Function to set the caption of a matplotlib figure.

    Parameters:
    -----------
    fig: matplotlib figure
        The figure to set the caption for.
    fig_number: int
        The number of the figure.
    x: float, optional
        The x position of the caption.
    y: float, optional
        The y position of the caption.
    gap: float, optional
        The gap between the title and the caption.
    title: str, optional
        The title of the figure.
    caption: str, optional
        The caption of the figure.
    size: int, optional
        The size of the text.

    Returns:
    --------
    fig: matplotlib figure
        The figure with the caption set.
    """
    if not title:
        title=''
    fig.text(x,y,f'Figure {fig_number}. {title.title()}', # text parameters
             weight='bold',size=size+1)
    if caption:
        # shift the caption down by 0.02 for every line break it contains
        y -= 0.02 * caption.count('\n')
        fig.text(x,y-gap,caption,color='#474949',
                 ma='left',wrap=True,size=size)
    return fig

In [4]:
# read the data and print out the first few rows 
bigdata = pd.read_csv('BigMart Sales Data.csv')

# make a copy of the data
bigmart = bigdata.copy()
bigmart.head()

| | Item_Identifier | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Identifier | Outlet_Establishment_Year | Outlet_Size | Outlet_Location_Type | Outlet_Type | Item_Outlet_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FDA15 | 9.3 | Low Fat | 0.016047 | Dairy | 249.8092 | OUT049 | 1999 | Medium | Tier 1 | Supermarket Type1 | 3735.138 |
| 1 | DRC01 | 5.92 | Regular | 0.019278 | Soft Drinks | 48.2692 | OUT018 | 2009 | Medium | Tier 3 | Supermarket Type2 | 443.4228 |
| 2 | FDN15 | 17.5 | Low Fat | 0.01676 | Meat | 141.618 | OUT049 | 1999 | Medium | Tier 1 | Supermarket Type1 | 2097.27 |
| 3 | FDX07 | 19.2 | Regular | 0.0 | Fruits and Vegetables | 182.095 | OUT010 | 1998 | NaN | Tier 3 | Grocery Store | 732.38 |
| 4 | NCD19 | 8.93 | Low Fat | 0.0 | Household | 53.8614 | OUT013 | 1987 | High | Tier 3 | Supermarket Type1 | 994.7052 |


## DATA CLEANING

In [5]:
bigdata.shape

(8523, 12)