Import the necessary libraries and load the dataset

In [1]:
import numpy as np
import pandas as pd
data = pd.read_csv("supplement.csv")
data.head()

         ID  Store_id  Store_Type  Location_Type  Region_Code        Date  Holiday  Discount  #Order     Sales
0  T1000001         1          S1             L3           R1  2018-01-01        1       Yes       9   7011.84
1  T1000002       253          S4             L2           R1  2018-01-01        1       Yes      60  51789.12
2  T1000003       252          S3             L2           R1  2018-01-01        1       Yes      42  36868.20
3  T1000004       251          S2             L3           R1  2018-01-01        1       Yes      23  19715.16
4  T1000005       250          S2             L3           R4  2018-01-01        1       Yes      62  45614.52


Overview of the dataset we are working with: column names, non-null counts, and data types

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188340 entries, 0 to 188339
Data columns (total 10 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   ID             188340 non-null  object 
 1   Store_id       188340 non-null  int64  
 2   Store_Type     188340 non-null  object 
 3   Location_Type  188340 non-null  object 
 4   Region_Code    188340 non-null  object 
 5   Date           188340 non-null  object 
 6   Holiday        188340 non-null  int64  
 7   Discount       188340 non-null  object 
 8   #Order         188340 non-null  int64  
 9   Sales          188340 non-null  float64
dtypes: float64(1), int64(3), object(6)
memory usage: 14.4+ MB
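
The `info()` output lists `Date` as `object`, which means the dates are stored as plain strings. The following is a minimal sketch of converting that column to a datetime type, assuming the `YYYY-MM-DD` format seen in the `head()` output. The `sample` frame reuses a few of those rows, and the `day_of_week` and `month` column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

# A few rows copied from the data.head() output, so the sketch runs without the CSV.
sample = pd.DataFrame({
    "ID": ["T1000001", "T1000002", "T1000003"],
    "Date": ["2018-01-01", "2018-01-01", "2018-01-01"],
    "#Order": [9, 60, 42],
})

# Convert the string column to datetime64 so the .dt accessors become available.
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

# Derive calendar features (hypothetical column names).
sample["day_of_week"] = sample["Date"].dt.dayofweek  # Monday = 0
sample["month"] = sample["Date"].dt.month

print(sample.dtypes)
```

On the full dataset the same conversion could be applied to `data["Date"]`, or requested at load time with `pd.read_csv("supplement.csv", parse_dates=["Date"])`.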


In [4]:
data.isnull().sum()

ID               0
Store_id         0
Store_Type       0
Location_Type    0
Region_Code      0
Date             0
Holiday          0
Discount         0
#Order           0
Sales            0
dtype: int64

In [5]:
data.describe()

         Store_id   Holiday     #Order         Sales
count    188340.0  188340.0   188340.0      188340.0
mean        183.0  0.131783  68.205692  42784.327982
std    105.366308  0.338256  30.467415  18456.708302
min           1.0       0.0        0.0           0.0
25%          92.0       0.0       48.0       30426.0
50%         183.0       0.0       63.0       39678.0
75%         274.0       0.0       82.0       51909.0
max         365.0       1.0      371.0      247215.0


Distribution of records by store type. Note that `value_counts()` counts rows (one per store per day), not the sum of `#Order`.

In [6]:
import plotly.express as px
pie = data['Store_Type'].value_counts()
store = pie.index
orders = pie.values

In [7]:
fig = px.pie(data, values=orders, names = store)
fig.show()
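
The cell above feeds `value_counts()` into the pie chart, which counts rows per store type. If the goal is the share of actual orders, one option is to sum `#Order` per group instead. The following is a small sketch of the difference, using the five rows from the `head()` output as a stand-in for the full dataset; the names `record_counts` and `order_totals` are illustrative.

```python
import pandas as pd

# Store type and order count for the five rows shown by data.head().
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2", "S2"],
    "#Order": [9, 60, 42, 23, 62],
})

# Number of rows per store type (what value_counts() measures).
record_counts = sample["Store_Type"].value_counts()

# Total orders per store type (sum of the #Order column within each group).
order_totals = sample.groupby("Store_Type")["#Order"].sum()

print(record_counts)
print(order_totals)
```

`order_totals.index` and `order_totals.values` can be passed to `px.pie` as `names` and `values` in the same way as in the cell above.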

Distribution of records by location type.

In [8]:
pie2 = data['Location_Type'].value_counts()
location = pie2.index
orders = pie2.values

In [9]:
fig = px.pie(data, values=orders, names=location)
fig.show()

Now let's look at the distribution of records by discount status.

In [10]:
pie3 = data['Discount'].value_counts()
discount = pie3.index
orders = pie3.values

In [11]:
fig = px.pie(data, values=orders, names=discount)
fig.show()

According to the above figure, most records are for days without a discount, so people still buy supplements when no discount is offered.

Let's see how holidays affect the number of orders.

In [12]:
pie4 = data['Holiday'].value_counts()
holiday = pie4.index
orders = pie4.values

In [13]:
fig = px.pie(data, values=orders, names=holiday)
fig.show()

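Here we can see that most purchases are recorded on working days: only about 13% of records fall on a holiday (the `Holiday` mean in the `describe()` output is 0.131783).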
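
The pie chart shows how many records fall on holidays, but not whether a typical holiday brings more or fewer orders. A grouped mean is one way to check that. The sketch below uses hypothetical toy rows (not values from `supplement.csv`) with the same column names, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy rows, invented for illustration only.
toy = pd.DataFrame({
    "Holiday": [0, 0, 0, 1, 1],
    "#Order": [70, 60, 80, 40, 50],
})

# For each Holiday value: how many records there are and the average order count.
summary = toy.groupby("Holiday")["#Order"].agg(["count", "mean"])

print(summary)
```

Running the same `groupby` on `data` would give the average number of orders on working days versus holidays for the actual dataset.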

**Number of Orders Prediction Model**

Now we'll train a machine learning model to predict the number of orders.

In [14]:
# Encode the categorical columns as integers
data['Discount'] = data['Discount'].map({'No':0, 'Yes':1})
data['Store_Type'] = data['Store_Type'].map({'S1': 1, 'S2': 2, 'S3': 3, 'S4': 4})
data['Location_Type'] = data['Location_Type'].map({'L1': 1, 'L2': 2, 'L3': 3, 'L4': 4, 'L5': 5})
# dropna() returns a new DataFrame, so assign the result back
data = data.dropna()

# Features and target as NumPy arrays
x = np.array(data[['Store_Type', 'Location_Type', 'Holiday', 'Discount']])
y = np.array(data['#Order'])
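
Mapping `S1`..`S4` and `L1`..`L5` to consecutive integers implies an ordering between the categories. Tree-based models usually cope with this, but one-hot encoding is a common alternative that avoids the implied order. The following is a sketch of that alternative, not the approach used in this notebook; it uses the five rows from the `head()` output, and `encoded` and `x_onehot` are illustrative names.

```python
import pandas as pd

# Feature columns for the five rows shown by data.head().
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2", "S2"],
    "Location_Type": ["L3", "L2", "L2", "L3", "L3"],
    "Holiday": [1, 1, 1, 1, 1],
    "Discount": ["Yes", "Yes", "Yes", "Yes", "Yes"],
})

# Discount is binary, so a 0/1 mapping is enough.
sample["Discount"] = sample["Discount"].map({"No": 0, "Yes": 1})

# One indicator column per category value that appears in the data.
encoded = pd.get_dummies(sample, columns=["Store_Type", "Location_Type"], dtype=int)

# Feature matrix in the same role as `x` in the cell above.
x_onehot = encoded.to_numpy()

print(encoded.columns.tolist())
print(x_onehot.shape)
```

Only categories present in the frame get a column, so on this five-row sample `Location_Type` yields just `L2` and `L3` indicators.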

We'll split the data into 80% for training and 20% for testing.

In [15]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

We'll use the LightGBM (Light Gradient Boosting Machine) regressor to train the model.

In [17]:
# pip install lightgbm
import lightgbm as ltb
model = ltb.LGBMRegressor()
model.fit(xtrain, ytrain)

LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

Now let's predict the number of orders for the test set.

In [18]:
ypred = model.predict(xtest)
# Use a new name so the original `data` DataFrame is not overwritten
predictions = pd.DataFrame({'Predicted Orders': ypred.flatten()})
print(predictions.head())

   Predicted Orders
0         47.351897
1         97.068717
2         66.577788
3         85.143083
4         54.451098
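
The predictions above are only printed, not compared with the true order counts in `ytest`. The following is a minimal sketch of two common error measures, mean absolute error (MAE) and root mean squared error (RMSE), written with NumPy only. The helper name `regression_errors` and the toy values are hypothetical; in the notebook, `ytest` and `ypred` would be passed instead.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return (MAE, RMSE) for two equal-length sequences of numbers."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))          # average absolute error
    rmse = np.sqrt(np.mean(residuals ** 2))   # penalizes large errors more
    return mae, rmse

# Hypothetical toy values, invented for illustration only.
mae, rmse = regression_errors([50, 100, 60], [47, 97, 66])
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

Both values are in units of orders, which makes them easy to compare with the `#Order` statistics in the `describe()` output. `sklearn.metrics.mean_absolute_error` computes the same MAE.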
