In [44]:
import pandas as pd

Based on the 'retail-demand-analysis' project, Center №13 was identified as the most active center in terms of order frequency and volume. We therefore select it as the primary focus of our forecasting model, for three reasons:

1. More data → More orders mean more historical data for the model, which tends to improve forecast quality.
2. Pattern stability → Centers with high activity tend to show more predictable trends.
3. Fewer outliers → Centers with high costs but few orders may purchase irregularly, which makes forecasting more difficult.
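
The selection criterion above (order frequency and volume per center) can be sketched as follows. This is a minimal illustration, not the code used in the 'retail-demand-analysis' project: the helper name `rank_centers` and the toy rows are assumptions made for the example, and only the column names `center_id` and `num_orders` come from the dataset.

```python
import pandas as pd

def rank_centers(df: pd.DataFrame) -> pd.DataFrame:
    """Rank centers by number of order records, then by total units ordered."""
    return (
        df.groupby('center_id')['num_orders']
          .agg(order_records='size', total_units='sum')
          .sort_values(['order_records', 'total_units'], ascending=False)
          .reset_index()
    )

# Toy rows for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    'center_id':  [13, 13, 13, 10, 10, 55],
    'num_orders': [500, 300, 200, 400, 100, 50],
})

ranking = rank_centers(toy)
print(ranking)
```

On the real data, the same function could be applied to the full `df` before filtering on `center_id`.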

1. Data import and first look at the data

In [45]:
df = pd.read_csv('Food Demand Dataset/food_demand.csv')
df_orders = df[df['center_id'] == 13].reset_index(drop = True)
df_orders

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders
0,1171094,1,13,1885,135.86,122.28,0,1,2132
1,1068455,1,13,1993,134.86,122.28,0,1,2418
2,1105491,1,13,2539,133.86,133.86,0,0,474
3,1486384,1,13,2139,337.62,437.53,0,0,123
4,1345938,1,13,2631,252.23,437.47,0,0,162
...,...,...,...,...,...,...,...,...,...
7041,1385493,145,13,1543,484.03,485.03,0,0,270
7042,1076678,145,13,2304,486.03,485.03,0,0,149
7043,1012260,145,13,2664,241.59,335.62,0,0,770
7044,1268089,145,13,2569,241.53,337.56,0,0,798


In [6]:
df_orders.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7046 entries, 194 to 453545
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     7046 non-null   int64  
 1   week                   7046 non-null   int64  
 2   center_id              7046 non-null   int64  
 3   meal_id                7046 non-null   int64  
 4   checkout_price         7046 non-null   float64
 5   base_price             7046 non-null   float64
 6   emailer_for_promotion  7046 non-null   int64  
 7   homepage_featured      7046 non-null   int64  
 8   num_orders             7046 non-null   int64  
dtypes: float64(2), int64(7)
memory usage: 550.5 KB


1.1. The center description

In [40]:
df_center = pd.read_csv('Food Demand Dataset/fulfilment_center_info.csv')
df_center[df_center['center_id'] == 13]

Unnamed: 0,center_id,city_code,region_code,center_type,op_area
1,13,590,56,TYPE_B,6.7


In [42]:
num_meals = df_orders['num_orders'].sum()
print(f'Total number of products ordered by center 13: {num_meals}')

Total number of products ordered by center 13: 4296545


In [28]:
df_orders['Total costs'] = df_orders.checkout_price * df_orders.num_orders

total_costs_orders = df_orders['Total costs'].sum()
print(f'Total costs of orders for all products by center 13: {total_costs_orders}')

Total costs of orders for all products by center 13: 1127045001.1799998


Thus, Center №13, a TYPE_B center located in city №590, placed 7 046 orders for 4 296 545 units of products, worth a total of 1 127 045 001.18 conventional monetary units, over 145 weeks.
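
The figures in this summary can be gathered in one place, with the total cost rounded to two decimals so that floating-point noise such as `...1799998` does not leak into the report. The sketch below is an assumption-laden illustration: `summarize_center` is a hypothetical helper, and the toy rows are invented to keep the block runnable without the CSV file.

```python
import pandas as pd

def summarize_center(orders: pd.DataFrame) -> dict:
    """Collect order records, units, total cost and distinct weeks for one center."""
    total_cost = (orders['checkout_price'] * orders['num_orders']).sum()
    return {
        'order_records': len(orders),
        'total_units': int(orders['num_orders'].sum()),
        'total_cost': round(float(total_cost), 2),
        'weeks': int(orders['week'].nunique()),
    }

# Toy rows for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    'week':           [1, 1, 2],
    'checkout_price': [10.0, 20.5, 10.0],
    'num_orders':     [3, 2, 5],
})

summary = summarize_center(toy)
print(summary)
```

Calling `summarize_center(df_orders)` on the filtered frame would also confirm the number of distinct weeks directly from the `week` column.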

1.2. The meal description

To create a forecasting model, we will focus on a single product: the one that was ordered most often or in the largest volume.

In [34]:
meals_num_orders = df_orders.groupby('meal_id').size().reset_index(name='num_orders')
meals_num_orders = meals_num_orders.sort_values(by = 'num_orders', ascending = False).reset_index(drop = True)

top_meal_orders = meals_num_orders.head(10)
top_meal_orders

Unnamed: 0,meal_id,num_orders
0,1062,145
1,1109,145
2,1198,145
3,1445,145
4,1311,145
5,1778,145
6,1558,145
7,1727,145
8,1754,145
9,2707,145


Since the top products are tied on the number of orders (145 each), let's find out which product was ordered in the largest number of units.

In [None]:
meals_num_units = df_orders.groupby('meal_id')['num_orders'].sum().reset_index(name='num_orders')
meals_num_units = meals_num_units.sort_values(by = 'num_orders', ascending = False).reset_index(drop = True)

top_meal_units = meals_num_units.head(1)
print(f'The product with the highest number of units ordered was meal №{top_meal_units.iloc[0,0]}, amounting to {top_meal_units.iloc[0,1]} units')

The product with the highest number of units ordered was meal №1885, amounting to 334334 units


In [39]:
df_meal = pd.read_csv('Food Demand Dataset/meal_info.csv')
df_meal[df_meal['meal_id'] == 1885]

Unnamed: 0,meal_id,category,cuisine
0,1885,Beverages,Thai


To sum up, the most popular product in Center №13 was the Thai beverage №1885. Because it was ordered consistently over 145 weeks, with a total volume of 334 334 units across 145 separate orders, it shows both long-term demand stability and high order frequency.

These characteristics make it a strong candidate for time series forecasting, as the volume and regularity of historical data provide a solid foundation for building a reliable and accurate predictive model.
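
One way to check the regularity claimed above is to build the weekly series for a single meal and look at how many weeks contain orders and how much the volume varies. The sketch below is only an illustration under stated assumptions: `weekly_series` and `stability_report` are hypothetical helpers, the coefficient of variation is one possible stability measure rather than the one used in this notebook, and the toy rows are invented.

```python
import pandas as pd

def weekly_series(orders: pd.DataFrame, meal_id: int) -> pd.Series:
    """Units per week for one meal; weeks without orders are filled with 0."""
    per_week = (
        orders.loc[orders['meal_id'] == meal_id]
              .groupby('week')['num_orders']
              .sum()
    )
    all_weeks = pd.RangeIndex(
        int(orders['week'].min()), int(orders['week'].max()) + 1, name='week'
    )
    return per_week.reindex(all_weeks, fill_value=0)

def stability_report(series: pd.Series) -> dict:
    """Week coverage and coefficient of variation (std / mean) of a weekly series."""
    mean = series.mean()
    return {
        'weeks_total': len(series),
        'weeks_with_orders': int((series > 0).sum()),
        'cv': float(series.std() / mean) if mean else float('nan'),
    }

# Toy rows for illustration only (meal ids 7 and 8 are made up)
toy = pd.DataFrame({
    'week':       [1, 2, 3, 4],
    'meal_id':    [7, 7, 8, 7],
    'num_orders': [100, 120, 50, 80],
})

series = weekly_series(toy, meal_id=7)
report = stability_report(series)
print(series.tolist(), report)
```

For the real data, `weekly_series(df_orders, 1885)` would give the series to forecast; a lower coefficient of variation and full week coverage would support the stability argument.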

2. Data cleaning

From the 'retail-demand-analysis' project we already know that:
1) No column in the table contains NaN values;
2) There are no duplicate columns or rows in the table;
3) Column 'id' has only unique values.
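
These three properties are taken from the earlier project, but they are cheap to re-check on the current frame. The following is a minimal sketch, assuming a hypothetical helper `check_quality`; the toy frame is invented so the block runs without the CSV file.

```python
import pandas as pd

def check_quality(df: pd.DataFrame, id_col: str = 'id') -> dict:
    """Count NaN cells, duplicate rows and columns, and test id uniqueness."""
    return {
        'nan_cells': int(df.isna().sum().sum()),
        'duplicate_rows': int(df.duplicated().sum()),
        'duplicate_columns': int(df.columns.duplicated().sum()),
        'id_is_unique': bool(df[id_col].is_unique),
    }

# Toy rows for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    'id':         [1, 2, 3],
    'week':       [1, 1, 2],
    'num_orders': [10, 20, 10],
})

quality = check_quality(toy)
print(quality)
```

Running `check_quality(df_orders)` should report zeros and `True` if the earlier findings still hold for Center №13.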

We need to:
1) Delete unnecessary columns and rows;
2) Rename columns so that their meaning is clear.

In [7]:
# Drop the two promotion-flag columns listed as unnecessary above
df_orders = df_orders.drop(columns=['emailer_for_promotion', 'homepage_featured'])
df_orders

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,num_orders
194,1171094,1,13,1885,135.86,122.28,2132
195,1068455,1,13,1993,134.86,122.28,2418
196,1105491,1,13,2539,133.86,133.86,474
197,1486384,1,13,2139,337.62,437.53,123
198,1345938,1,13,2631,252.23,437.47,162
...,...,...,...,...,...,...,...
453541,1385493,145,13,1543,484.03,485.03,270
453542,1076678,145,13,2304,486.03,485.03,149
453543,1012260,145,13,2664,241.59,335.62,770
453544,1268089,145,13,2569,241.53,337.56,798
