# **Number of Orders Prediction**

If you want to predict the number of orders a company may receive for a particular product, you need historical data about the orders the company has received. For this task, I will use a dataset of supplement sales collected from Kaggle. It contains the following information:



1.   Product ID
2.   Store ID
3.   The type of store where the supplement was sold
4.   The type of location the order was received from
5.   Sales Date
6.   Region code
7.   Whether it is a public holiday or not at the time of order
8.   Whether the product was on discount or not
9.   Number of orders placed
10.  Sales



Now that you have an overview of the problem and the dataset, in the section below I will take you through the task of predicting the number of orders with machine learning using Python.


# **Number of Orders Prediction using Python**

Let’s start by importing the necessary Python libraries and the dataset:

In [1]:
import pandas as pd
import numpy as np
data = pd.read_csv("./supplement.csv")
data.head()

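| | ID | Store_id | Store_Type | Location_Type | Region_Code | Date | Holiday | Discount | #Order | Sales |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | T1000001 | 1 | S1 | L3 | R1 | 2018-01-01 | 1 | Yes | 9 | 7011.84 |
| 1 | T1000002 | 253 | S4 | L2 | R1 | 2018-01-01 | 1 | Yes | 60 | 51789.12 |
| 2 | T1000003 | 252 | S3 | L2 | R1 | 2018-01-01 | 1 | Yes | 42 | 36868.2 |
| 3 | T1000004 | 251 | S2 | L3 | R1 | 2018-01-01 | 1 | Yes | 23 | 19715.16 |
| 4 | T1000005 | 250 | S2 | L3 | R4 | 2018-01-01 | 1 | Yes | 62 | 45614.52 |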
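As the info() output below confirms, read_csv leaves Date as plain strings (object dtype). If you later want date-based features or a time-ordered split, one option is to parse the column while reading. A minimal sketch: it reads the five rows shown above from an in-memory string (standing in for supplement.csv, so it runs on its own) and uses pandas’ parse_dates argument:

```python
import io

import pandas as pd

# The five rows printed by data.head() above, used here as a stand-in
# for the full supplement.csv file.
csv_text = """ID,Store_id,Store_Type,Location_Type,Region_Code,Date,Holiday,Discount,#Order,Sales
T1000001,1,S1,L3,R1,2018-01-01,1,Yes,9,7011.84
T1000002,253,S4,L2,R1,2018-01-01,1,Yes,60,51789.12
T1000003,252,S3,L2,R1,2018-01-01,1,Yes,42,36868.2
T1000004,251,S2,L3,R1,2018-01-01,1,Yes,23,19715.16
T1000005,250,S2,L3,R4,2018-01-01,1,Yes,62,45614.52
"""

# parse_dates converts the Date column to datetime64 while reading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(sample.dtypes)
```

With the real file, the same argument applies: pd.read_csv("./supplement.csv", parse_dates=["Date"]).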


Now let’s look at some basic information about the dataset to understand what kind of data we are working with:

In [2]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188340 entries, 0 to 188339
Data columns (total 10 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   ID             188340 non-null  object 
 1   Store_id       188340 non-null  int64  
 2   Store_Type     188340 non-null  object 
 3   Location_Type  188340 non-null  object 
 4   Region_Code    188340 non-null  object 
 5   Date           188340 non-null  object 
 6   Holiday        188340 non-null  int64  
 7   Discount       188340 non-null  object 
 8   #Order         188340 non-null  int64  
 9   Sales          188340 non-null  float64
dtypes: float64(1), int64(3), object(6)
memory usage: 14.4+ MB


In [3]:
data.isnull().sum()

ID               0
Store_id         0
Store_Type       0
Location_Type    0
Region_Code      0
Date             0
Holiday          0
Discount         0
#Order           0
Sales            0
dtype: int64

In [4]:
data.describe()

| | Store_id | Holiday | #Order | Sales |
|---|---|---|---|---|
| count | 188340.0 | 188340.0 | 188340.0 | 188340.0 |
| mean | 183.0 | 0.131783 | 68.205692 | 42784.327982 |
| std | 105.366308 | 0.338256 | 30.467415 | 18456.708302 |
| min | 1.0 | 0.0 | 0.0 | 0.0 |
| 25% | 92.0 | 0.0 | 48.0 | 30426.0 |
| 50% | 183.0 | 0.0 | 63.0 | 39678.0 |
| 75% | 274.0 | 0.0 | 82.0 | 51909.0 |
| max | 365.0 | 1.0 | 371.0 | 247215.0 |
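Because Holiday is a 0/1 column, its mean in the summary table above (0.131783) is the fraction of rows recorded on a public holiday: about 13%, or roughly 0.131783 × 188,340 ≈ 24,820 rows. A small sketch with made-up values shows that this mean equals the normalized value_counts() share that the pie charts later in the article are based on:

```python
import pandas as pd

# Hypothetical toy column: 1 = public holiday, 0 = working day.
holiday = pd.Series([1, 0, 0, 0, 0, 0, 0, 1, 0, 0], name="Holiday")

# Mean of a 0/1 column = share of rows where the value is 1.
share_from_mean = holiday.mean()

# Same share via value_counts, the building block of the pie charts.
share_from_counts = holiday.value_counts(normalize=True)[1]

print(share_from_mean)    # 0.2
print(share_from_counts)  # 0.2
```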


Now let’s explore some of the important features in this dataset to see which factors may be related to the number of supplement orders:

In [6]:
import plotly.express as px
pie = data["Store_Type"].value_counts()
store = pie.index
orders = pie.values

fig = px.pie(values=orders, names=store)
fig.show()

The above figure shows how the records in the dataset are split across store types. Now let’s look at the same breakdown by location type:

In [7]:
pie2 = data["Location_Type"].value_counts()
location = pie2.index
orders = pie2.values

fig = px.pie(values=orders, names=location)
fig.show()

The above figure shows how the records are split across location types. Now let’s look at the breakdown by discount:

In [8]:
pie3 = data["Discount"].value_counts()
discount = pie3.index
orders = pie3.values

fig = px.pie(values=orders, names=discount)
fig.show()

According to the above figure, most of the records come from days without a discount, so people still buy supplements when they are not discounted. Now let’s look at the breakdown by holiday:

In [9]:
pie4 = data["Holiday"].value_counts()
holiday = pie4.index
orders = pie4.values

fig = px.pie(values=orders, names=holiday)
fig.show()

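According to the above figure, most of the records come from working days: only about 13% fall on public holidays, which matches the mean of 0.1318 for Holiday in the summary statistics.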
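The pie charts above are built from value_counts(), which counts rows. Each row here appears to be one store’s record for one date (365 store IDs × 516 days = 188,340 rows), so the slices show how records are split across categories rather than how many orders each category received. To weight the slices by actual orders, one option is to sum #Order per category with groupby. A minimal sketch on the five head() rows:

```python
import pandas as pd

# The five rows shown by data.head(), reduced to the columns needed here.
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2", "S2"],
    "#Order": [9, 60, 42, 23, 62],
})

# Row counts: what the pie charts above are built from.
rows_per_type = sample["Store_Type"].value_counts()

# Summed orders: how many orders each store type actually received.
orders_per_type = sample.groupby("Store_Type")["#Order"].sum()

print(rows_per_type)
print(orders_per_type)
```

On the full dataset, the order-weighted chart could then be drawn with px.pie(values=orders_per_type.values, names=orders_per_type.index).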

# **Number of Orders Prediction Model**

Now let’s prepare the data so that we can train a machine learning model to predict the number of orders. Here, I will convert the categorical string columns into numerical values and select the features and the target:

In [10]:
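# Encode the categorical string columns as integers
data["Discount"] = data["Discount"].map({"No": 0, "Yes": 1})
data["Store_Type"] = data["Store_Type"].map({"S1": 1, "S2": 2, "S3": 3, "S4": 4})
data["Location_Type"] = data["Location_Type"].map({"L1": 1, "L2": 2, "L3": 3, "L4": 4, "L5": 5})
# map() yields NaN for values not in a mapping; dropna() returns a new
# DataFrame, so the result must be assigned back to take effect
data = data.dropna()

# Features: store type, location type, holiday and discount flags; target: number of orders
x = np.array(data[["Store_Type", "Location_Type", "Holiday", "Discount"]])
y = np.array(data["#Order"])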
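Series.map with a dictionary returns NaN for any value that is not a key, which is why dropna() follows the encoding, and dropna() returns a new DataFrame instead of modifying the original, so its result has to be assigned. The toy frame below (hypothetical values, including a deliberately unmapped "L6") shows both points:

```python
import pandas as pd

# Hypothetical rows; "L6" is deliberately missing from the mapping below.
toy = pd.DataFrame({
    "Location_Type": ["L1", "L3", "L6", "L2"],
    "#Order": [10, 20, 30, 40],
})

location_codes = {"L1": 1, "L2": 2, "L3": 3, "L4": 4, "L5": 5}
toy["Location_Type"] = toy["Location_Type"].map(location_codes)
print(toy["Location_Type"].isna().sum())  # 1: "L6" became NaN

toy.dropna()       # returns a new DataFrame; toy itself still has 4 rows
print(len(toy))    # 4

toy = toy.dropna()  # assigning the result actually removes the NaN row
print(len(toy))     # 3
```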

Now let’s split the data into a training set (80%) and a test set (20%):

In [11]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
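train_test_split shuffles the rows by default, so the training and test sets contain records from the same dates. If the aim is to forecast orders for future dates, a common alternative (not the approach used in this article) is to hold out the most recent dates. A minimal sketch, assuming Date holds date strings as in this dataset; the helper chronological_split and the toy data are hypothetical:

```python
import pandas as pd


def chronological_split(df, date_col="Date", test_frac=0.2):
    """Use the latest `test_frac` share of distinct dates as the test set."""
    dates = pd.to_datetime(df[date_col])
    unique_dates = dates.drop_duplicates().sort_values()
    n_test_dates = max(1, round(len(unique_dates) * test_frac))
    cutoff = unique_dates.iloc[-n_test_dates]  # first date of the test period
    return df[dates < cutoff], df[dates >= cutoff]


# Hypothetical data: 10 consecutive days, 2 stores per day.
days = pd.date_range("2018-01-01", periods=10, freq="D").strftime("%Y-%m-%d")
toy = pd.DataFrame({
    "Date": [d for d in days for _ in range(2)],
    "Store_id": [1, 2] * 10,
    "#Order": range(20),
})

train, test = chronological_split(toy)
print(len(train), len(test))  # 16 4
```

Every training date precedes every test date, which mimics predicting orders for days the model has not seen.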

Now I will train the model with LightGBM’s gradient boosting regressor, LGBMRegressor:

In [12]:
# Use pip install lightgbm to install it on your system
import lightgbm as ltb
model = ltb.LGBMRegressor()
model.fit(xtrain, ytrain)

LGBMRegressor()
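All four features are categorical: 4 store types × 5 location types × 2 holiday values × 2 discount values give at most 80 distinct inputs, so any model trained on x can output at most 80 different values. For each combination, the constant that minimizes squared error on the training set is that combination’s mean number of orders, which makes a per-group mean a useful baseline to compare LGBMRegressor against. A minimal sketch with hypothetical toy data; group_mean_baseline is an illustrative helper, not part of LightGBM:

```python
import pandas as pd

FEATURES = ["Store_Type", "Location_Type", "Holiday", "Discount"]


def group_mean_baseline(train_df, test_df, target="#Order"):
    """Predict each test row as the training mean of its feature combination.

    Combinations never seen in training fall back to the overall training mean.
    """
    means = train_df.groupby(FEATURES)[target].mean().rename("pred").reset_index()
    merged = test_df[FEATURES].merge(means, on=FEATURES, how="left")  # keeps row order
    return merged["pred"].fillna(train_df[target].mean()).to_numpy()


# Hypothetical encoded rows (same integer encoding as above).
train_df = pd.DataFrame({
    "Store_Type":    [1, 1, 2, 2],
    "Location_Type": [3, 3, 2, 2],
    "Holiday":       [0, 0, 1, 1],
    "Discount":      [1, 1, 0, 0],
    "#Order":        [50, 70, 20, 40],
})
test_df = pd.DataFrame({
    "Store_Type":    [1, 2, 4],
    "Location_Type": [3, 2, 5],
    "Holiday":       [0, 1, 0],
    "Discount":      [1, 0, 0],
})

baseline_pred = group_mean_baseline(train_df, test_df)
print(baseline_pred)  # [60. 30. 45.]
```

To apply it to the article’s arrays, pd.DataFrame(xtrain, columns=FEATURES) with an added #Order column from ytrain would serve as train_df, and pd.DataFrame(xtest, columns=FEATURES) as test_df.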

Now let’s have a look at the predicted values:

In [13]:
ypred = model.predict(xtest)
# Store the predictions in a new DataFrame so the original `data` is not overwritten
predictions = pd.DataFrame({"Predicted Orders": ypred})
print(predictions.head())

   Predicted Orders
0         47.351897
1         97.068717
2         66.577788
3         85.143083
4         54.451098
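Printing predictions alone does not show how accurate they are. A minimal sketch of two standard regression metrics, mean absolute error (MAE) and root mean squared error (RMSE), computed with NumPy; with the article’s variables you would pass ytest and ypred, and scikit-learn’s sklearn.metrics.mean_absolute_error and mean_squared_error compute the same quantities. The numbers below are made up for illustration:

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error: average size of a miss, in orders."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalizes large misses more."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up actual and predicted order counts.
y_true = [10, 20, 30]
y_pred = [12, 18, 33]

# Hypothetical constant baseline; in practice use the training-set mean of #Order.
baseline = [20.0, 20.0, 20.0]

print(round(mae(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3))      # 2.333 2.38
print(round(mae(y_true, baseline), 3), round(rmse(y_true, baseline), 3))  # 6.667 8.165
```

A model is only useful if its errors are clearly lower than those of such a constant baseline.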


So this is how you can train a machine learning model to predict the number of orders using Python.