# Data Mining Project: ABCDEats Inc.

Fall Semester 2024-2025 <br>

Master in Data Science and Advanced Analytics <br>

NOVA Information Management School

## Project Description

In this project, you will act as consultants for ABCDEats Inc. (ABCDE), a fictional food delivery service partnering with a range of restaurants to offer diverse meal options. Your task is to analyse customer data collected over three months from three cities to help ABCDE develop a data-driven strategy tailored to various customer segments. The description of the data is provided under the Dataset Description section of this document. <br>

We recommend segmenting customers using multiple perspectives. Examples of segmentation perspectives include value-based segmentation, which groups customers by their economic value; preference or behaviour-based segmentation, which focuses on purchasing habits; and demographic segmentation, which categorises customers by attributes like age, gender, and income to understand different interaction patterns. <br>

Ultimately, the company seeks a final segmentation that integrates these perspectives to enable them to develop a comprehensive marketing strategy.

## Expected Outcomes

* Conduct an in-depth exploration of the dataset. Summarise key statistics for the data, and discuss their possible implications. <br>

* Identify any trends, patterns, or anomalies within the dataset. Explore relationships between features. <br>

* Create new features that may help enhance your analysis. <br>

* Use visualisations to effectively communicate your findings.


In [125]:
import pandas as pd 
import numpy as np
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

In [126]:
df = pd.read_csv("../dataset/DM2425_ABCDEats_DATASET.csv")

In [127]:
df.head(10)
# is_chain is wrongly defined in the metadata -> it appears to hold the number of orders from chain restaurants

Unnamed: 0,customer_id,customer_region,customer_age,vendor_count,product_count,is_chain,first_order,last_order,last_promo,payment_method,CUI_American,CUI_Asian,CUI_Beverages,CUI_Cafe,CUI_Chicken Dishes,CUI_Chinese,CUI_Desserts,CUI_Healthy,CUI_Indian,CUI_Italian,CUI_Japanese,CUI_Noodle Dishes,CUI_OTHER,CUI_Street Food / Snacks,CUI_Thai,DOW_0,DOW_1,DOW_2,DOW_3,DOW_4,DOW_5,DOW_6,HR_0,HR_1,HR_2,HR_3,HR_4,HR_5,HR_6,HR_7,HR_8,HR_9,HR_10,HR_11,HR_12,HR_13,HR_14,HR_15,HR_16,HR_17,HR_18,HR_19,HR_20,HR_21,HR_22,HR_23
0,1b8f824d5e,2360,18.0,2,5,1,0.0,1,DELIVERY,DIGI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.88,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0,0,0,0,1,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0
1,5d272b9dcb,8670,17.0,2,2,2,0.0,1,DISCOUNT,DIGI,12.82,6.39,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0,0,0,0,1,0.0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0
2,f6d1b2ba63,4660,38.0,1,2,2,0.0,1,DISCOUNT,CASH,9.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0,0,0,0,1,0.0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,180c632ed8,4660,,2,3,1,0.0,2,DELIVERY,DIGI,0.0,13.7,0.0,0.0,0.0,0.0,0.0,0.0,17.86,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0,0,0,1,0.0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0
4,4eb37a6705,4660,20.0,2,5,0,0.0,2,-,DIGI,14.57,40.87,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0,0,0,1,0.0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,6aef2b6726,8670,40.0,2,2,0,0.0,2,FREEBIE,DIGI,0.0,24.92,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0,0,0,1,0.0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,8475ee66ef,2440,24.0,2,2,2,0.0,2,-,CARD,5.88,0.0,1.53,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0,0,0,1,0.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
7,f2f53bcc67,8670,27.0,2,3,2,0.0,2,DISCOUNT,DIGI,11.71,0.0,24.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0,0,0,1,0.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
8,5b650c89cc,2360,20.0,3,4,2,0.0,3,DISCOUNT,DIGI,2.75,0.0,0.0,0.0,0.0,0.0,0.0,4.39,0.0,0.0,0.0,0.0,7.3,0.0,0.0,0,0,1,0,0,0,2,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1
9,84775a7237,8670,20.0,2,3,0,0.0,3,DELIVERY,CARD,0.0,32.48,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,1,0,0,0,1,0.0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [128]:
df.info()
# Incorrect data types (stored as float, should be int):
# - customer_age
# - first_order
# - HR_0

# Possible duplicate entries in customer_id (to check)
# Null values in customer_age, first_order and HR_0


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31888 entries, 0 to 31887
Data columns (total 56 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   customer_id               31888 non-null  object 
 1   customer_region           31888 non-null  object 
 2   customer_age              31161 non-null  float64
 3   vendor_count              31888 non-null  int64  
 4   product_count             31888 non-null  int64  
 5   is_chain                  31888 non-null  int64  
 6   first_order               31782 non-null  float64
 7   last_order                31888 non-null  int64  
 8   last_promo                31888 non-null  object 
 9   payment_method            31888 non-null  object 
 10  CUI_American              31888 non-null  float64
 11  CUI_Asian                 31888 non-null  float64
 12  CUI_Beverages             31888 non-null  float64
 13  CUI_Cafe                  31888 non-null  float64
 14  CUI_Ch

### Duplicates

In [130]:
print(df["customer_id"].nunique())
# unique customer IDs

31875


In [131]:
id_count = df["customer_id"].value_counts()
duplicate_id_values_count = id_count[id_count > 1]
duplicate_id_values = duplicate_id_values_count.index
print(duplicate_id_values)
# customer IDs that appear more than once

Index(['742ca068fc', 'b55012ee1c', 'df91183978', '6bbf5f74cd', '24251eb7da',
       '201a13a34d', 'b8e7a643a4', 'cc08ef25ce', '8aa9bbc147', '671bf0c738',
       '06018a56be', 'fac7984c0d', 'cf563a0a98'],
      dtype='object', name='customer_id')


In [132]:
df.duplicated().sum()
# count of duplicate values

13

In [133]:
df.drop_duplicates(inplace=True)
#remove duplicates

In [134]:
df.duplicated().sum()
# confirm that no duplicate rows remain

0
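
Thirteen customer IDs repeat and `df.duplicated().sum()` also returns 13, which suggests that the repeated IDs are exact row copies. A minimal sketch of how this could be confirmed before dropping anything is shown below. The helper name `ids_are_exact_duplicates` and the toy data are illustrative assumptions, not part of the project dataset.

```python
import pandas as pd

def ids_are_exact_duplicates(frame: pd.DataFrame, id_col: str) -> bool:
    """Return True if every repeated ID corresponds to fully identical rows."""
    # All rows whose ID appears more than once (every occurrence is kept).
    repeated = frame[frame.duplicated(subset=id_col, keep=False)]
    # After removing exact row copies, each repeated ID should appear only once.
    deduped = repeated.drop_duplicates()
    return not deduped[id_col].duplicated().any()

# Toy data: the "a1" rows are identical, the "c3" rows differ in vendor_count.
toy = pd.DataFrame({
    "customer_id": ["a1", "a1", "b2", "c3", "c3"],
    "vendor_count": [2, 2, 1, 4, 5],
})

print(ids_are_exact_duplicates(toy, "customer_id"))           # False
print(ids_are_exact_duplicates(toy.iloc[:3], "customer_id"))  # True
```

If the check returns `False` on the real data, `drop_duplicates()` alone would leave conflicting records for the same customer, and those would need a separate decision.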

### Missing values

In [136]:
df.isna().sum()
# missing values per column, before the placeholder values are replaced

customer_id                    0
customer_region                0
customer_age                 727
vendor_count                   0
product_count                  0
is_chain                       0
first_order                  106
last_order                     0
last_promo                     0
payment_method                 0
CUI_American                   0
CUI_Asian                      0
CUI_Beverages                  0
CUI_Cafe                       0
CUI_Chicken Dishes             0
CUI_Chinese                    0
CUI_Desserts                   0
CUI_Healthy                    0
CUI_Indian                     0
CUI_Italian                    0
CUI_Japanese                   0
CUI_Noodle Dishes              0
CUI_OTHER                      0
CUI_Street Food / Snacks       0
CUI_Thai                       0
DOW_0                          0
DOW_1                          0
DOW_2                          0
DOW_3                          0
DOW_4                          0
DOW_5     

In [137]:
df.replace("", np.nan, inplace=True)
# replace empty strings with NaN

In [138]:
df.replace("-", np.nan, inplace=True)
# replace the "-" placeholder with NaN

In [168]:
missing_values = df.isna().sum()
missing_values
# Columns with missing values:
# - customer_region
# - customer_age
# - last_promo
# - first_order
# - HR_0

# first_order: a missing value means the customer has not ordered yet; 0 means the customer ordered on day 0

customer_id                     0
customer_region               442
customer_age                  727
vendor_count                    0
product_count                   0
is_chain                        0
first_order                   106
last_order                      0
last_promo                  16744
payment_method                  0
CUI_American                    0
CUI_Asian                       0
CUI_Beverages                   0
CUI_Cafe                        0
CUI_Chicken Dishes              0
CUI_Chinese                     0
CUI_Desserts                    0
CUI_Healthy                     0
CUI_Indian                      0
CUI_Italian                     0
CUI_Japanese                    0
CUI_Noodle Dishes               0
CUI_OTHER                       0
CUI_Street Food / Snacks        0
CUI_Thai                        0
DOW_0                           0
DOW_1                           0
DOW_2                           0
DOW_3                           0
DOW_4         

In [170]:
missing_values.sum()
# total number of missing values

19183
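
The raw counts above are easier to compare as percentages of the rows. The sketch below shows one way to build such a summary while treating `""` and `"-"` as missing, as the cells above do. The function name `missing_summary` and the toy frame are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame, placeholders=("", "-")) -> pd.DataFrame:
    """Count and percentage of missing values per column, placeholders included."""
    cleaned = frame.replace(list(placeholders), np.nan)
    counts = cleaned.isna().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(cleaned) * 100).round(2),
    })
    # Keep only the columns that actually have gaps, largest first.
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)

# Toy data with the same placeholder convention as the dataset.
toy = pd.DataFrame({
    "customer_region": ["2360", "-", "8670", "4660"],
    "customer_age": [18.0, np.nan, 38.0, np.nan],
    "last_promo": ["DELIVERY", "-", "-", "-"],
    "payment_method": ["DIGI", "CASH", "CARD", "DIGI"],
})

print(missing_summary(toy))
```

Unlike the in-place `replace` calls, this version leaves the original frame untouched, which can be convenient while the treatment of `"-"` in `last_promo` (missing value or "no promotion") is still undecided.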

##### Treat missing values (?)

###### - missing values: either impute the value, or cluster with the missing values included and try to see where they belong
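
As a rough illustration of the first option (imputing the value), the sketch below fills missing ages with the median and keeps a flag recording which rows were imputed. The choice of the median is an assumption for this example, not a decision taken in the project, and the toy values are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"customer_age": [18.0, np.nan, 38.0, 20.0, np.nan]})

# Flag the gaps first so the information "age was missing" is not lost.
toy["customer_age_missing"] = toy["customer_age"].isna()

# Assumption for this sketch: the median is a reasonable stand-in for age.
median_age = toy["customer_age"].median()
toy["customer_age_filled"] = toy["customer_age"].fillna(median_age)

print(toy)
```

Keeping the flag column makes it possible to check later whether customers with a missing age behave differently from the rest.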

##### Correct datatypes

In [140]:
df.dtypes
# Incorrect data types:
# customer_age (float -> int)
# first_order (float -> int)
# HR_0 (float -> int)

customer_id                  object
customer_region              object
customer_age                float64
vendor_count                  int64
product_count                 int64
is_chain                      int64
first_order                 float64
last_order                    int64
last_promo                   object
payment_method               object
CUI_American                float64
CUI_Asian                   float64
CUI_Beverages               float64
CUI_Cafe                    float64
CUI_Chicken Dishes          float64
CUI_Chinese                 float64
CUI_Desserts                float64
CUI_Healthy                 float64
CUI_Indian                  float64
CUI_Italian                 float64
CUI_Japanese                float64
CUI_Noodle Dishes           float64
CUI_OTHER                   float64
CUI_Street Food / Snacks    float64
CUI_Thai                    float64
DOW_0                         int64
DOW_1                         int64
DOW_2                       

In [141]:
df["customer_age"]= df["customer_age"].astype('Int64')
# convert "customer_age" from float to int

In [142]:
df["first_order"] = df["first_order"].astype("Int64")
# convert "first_order" from float to int

In [143]:
df["HR_0"] = df["HR_0"].astype("Int64")
# convert "HR_0" from float to int
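
The capitalised `"Int64"` is the pandas nullable integer dtype, which is why these casts work even though the columns still contain missing values. A small sketch of the same conversion with an explicit check that no fractional values are present is shown below. The helper `to_nullable_int` is a hypothetical name introduced for this example.

```python
import numpy as np
import pandas as pd

def to_nullable_int(series: pd.Series) -> pd.Series:
    """Cast a float column to nullable Int64, refusing to drop fractions."""
    non_null = series.dropna()
    if not (non_null == non_null.round()).all():
        raise ValueError(f"{series.name!r} contains non-integer values")
    return series.astype("Int64")

ages = pd.Series([18.0, np.nan, 38.0], name="customer_age")
converted = to_nullable_int(ages)

print(converted.dtype)         # Int64
print(converted.isna().sum())  # 1
```

The missing entry survives the conversion as `<NA>` rather than forcing the column back to float.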


### Descriptive statistics

##### Convert age to birth year (??)

In [171]:
# reasoning: Age needs to be recalculated regularly. Birth year is a constant feature - more accurate

##### DESCRIPTIVES - NUMERICAL FEATURES

In [172]:
# descriptives before replacing missing values and treating datatypes:

df.describe().T
# only includes numerical features

# FINAL INSIGHTS:

# customer_age:
# - the average customer is 27.5 years old
# - youngest customer = 15 // oldest customer = 80
# - 75% of customers are 31 or younger

# Most customers are young, with a few older individuals in the dataset.

# Further analysis:
# - oldest customer = 80 (outlier?)


# vendor_count: (number of unique vendors the customer has ordered from)
# - some entries are 0 (needs further exploration)
# - the average vendor count is 3.1
# - 75% of customers have ordered from 4 or fewer vendors
# - max vendor count is 41 (outlier?)

# Most customers ordered from few vendors, but some customers have a much higher count.

# Further analysis:
# - entries with 0 (?)
# - max vendor count is 41 (outlier?)


# product_count: (total number of products the customer has ordered)
# - some entries are 0 (no products purchased)
# - max product count is 269 (outlier?)
# - most product counts are low (median 3, 75th percentile 7)

# Further analysis:
# - max product count is 269 (outlier?)
# - entries with 0 products (no products purchased)


# is_chain: (number of orders made in chain restaurants)
# - per customer, the number of orders made in chain restaurants is relatively small (median 2, 75th percentile 3)
# - max number of orders made in chain restaurants is 83

# Most customers have few chain orders, but some have a very high number (outliers?)

# Further analysis:
# - max number of orders made in chain restaurants is 83 (outlier?)


# first_order: (number of days from the start of the dataset until the customer first placed an order)
# - on average, customers placed their first order about 28 days after the start of the dataset
# - std = 24.1 suggests a wide spread (significant variability in how long customers take to place their first order)
# - min = 0 (by the definition above, a first order on day 0 rather than no order yet; to be checked)
# - max = 90 (outlier?)
# - 75% of customers placed their first order within the first 45 days

# Further analysis:
# - first_order = 0
# - max = 90 (outlier?)


# last_order: (number of days from the start of the dataset until the customer last placed an order)
# - std = 23.2 suggests a wide spread in when customers placed their last order
# - min = 0 (may indicate an order placed on the first day of the dataset, or no order yet; compare with min first_order)

# Further analysis:
# - last_order = 0
# - first_order = last_order (?)


# CUI_American / CUI_Asian: (amount of money spent by the customer on the indicated cuisine)
# - on average, each customer spent little on American/Asian cuisine (mean 4.88 / 9.96)
# - min = 0: at least one customer did not buy American/Asian cuisine
# - median = 0: at least 50% of customers did not buy American/Asian cuisine, which reinforces that most customers are not engaging with these cuisines
# - 25% of customers spent more than 5.66 / 11.83 on American/Asian cuisine
# - max = 280.21 / 896.71: while most customers spend little on American/Asian cuisine, there are outliers who order frequently or spend heavily in these categories



# CUI_Beverages / CUI_Cafe / CUI_Chicken Dishes / CUI_Chinese / CUI_Desserts / CUI_Healthy / CUI_Indian / CUI_Italian / CUI_Japanese / CUI_Noodle Dishes / CUI_OTHER:
# - 75% of customers (most customers) spent nothing on these cuisines
# - std: high variability in spending
# - max = 229.22 (CUI_Beverages): at least one customer spent this much (outlier?)
# - the mean is low because high spenders are a small fraction of the dataset, so their spending barely raises the overall average

# Most customers spent nothing on these cuisines, while a few spent significantly more (high std and max values)

# Further analysis:
# - outliers (?)

# DOW_0-DOW_6: (number of orders on each day of the week; plot them to see peak days)
# - 75% of customers ordered at most once on a given day of the week
# - the max values indicate the presence of outliers and hint at the peak days

# Most customers ordered at most once on each day of the week.

# Further analysis:
# - max values (outliers?)


# Hours of the day
# - HR_0: no activity at midnight
# - HR_1-HR_23: 75% of customers placed no order in these hours. The max values may be outliers or unusual behaviour.

# Further analysis:
# - HR_0: no activity at midnight
# - max values (outliers?)

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
customer_age,31148.0,27.507545,7.161828,15.0,23.0,26.0,31.0,80.0
vendor_count,31875.0,3.102714,2.771753,0.0,1.0,2.0,4.0,41.0
product_count,31875.0,5.668424,6.957933,0.0,2.0,3.0,7.0,269.0
is_chain,31875.0,2.819357,3.977977,0.0,1.0,2.0,3.0,83.0
first_order,31769.0,28.469924,24.104626,0.0,7.0,22.0,45.0,90.0
last_order,31875.0,63.672376,23.227992,0.0,49.0,70.0,83.0,90.0
CUI_American,31875.0,4.877205,11.647043,0.0,0.0,0.0,5.66,280.21
CUI_Asian,31875.0,9.955306,23.561492,0.0,0.0,0.0,11.83,896.71
CUI_Beverages,31875.0,2.298224,8.475868,0.0,0.0,0.0,0.0,229.22
CUI_Cafe,31875.0,0.80149,6.428422,0.0,0.0,0.0,0.0,326.1
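
Several of the notes above end in "outlier?". One common heuristic for answering that question is Tukey's IQR rule, sketched below on made-up values. It is shown as one possible approach and is not the method chosen in this project; the function name `iqr_outlier_mask` and the multiplier `k = 1.5` are assumptions.

```python
import pandas as pd

def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Made-up product counts with one extreme value.
product_count = pd.Series([2, 3, 3, 5, 7, 2, 4, 269], name="product_count")
mask = iqr_outlier_mask(product_count)

print(product_count[mask].tolist())  # [269]
```

For zero-inflated columns such as most `CUI_*` features, where both quartiles are 0, the IQR is 0 and this rule would flag every positive value, so a different criterion (for example, applying the rule only to non-zero spenders) would probably be needed there.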


### Descriptives - categorical Data

In [147]:
df.describe(include="object").T

# customer_region:
# Customers come from 8 different regions
# The most common region is 8670 (9761 customers)
# There are missing values

# last_promo:
# There are 3 different promotion categories
# The most common last promotion is DELIVERY (6282 customers)
# There are missing values

# payment_method:
# Customers use 3 different payment methods
# Most customers use CARD as their payment method (20153 customers)

Unnamed: 0,count,unique,top,freq
customer_id,31875,31875,1b8f824d5e,1
customer_region,31433,8,8670,9761
last_promo,15131,3,DELIVERY,6282
payment_method,31875,3,CARD,20153


In [148]:
df["customer_region"][df["customer_region"].notna()].unique()
# Different regions where customers are located

array(['2360', '8670', '4660', '2440', '4140', '2490', '8370', '8550'],
      dtype=object)

In [149]:
df["last_promo"][df["last_promo"].notna()].unique()
# Different types of promotions used by customers

array(['DELIVERY', 'DISCOUNT', 'FREEBIE'], dtype=object)

In [150]:
df["payment_method"].unique()
# Different types of payment methods used by customers

array(['DIGI', 'CASH', 'CARD'], dtype=object)

### Data interpretation - NUMERICAL FEATURES

### Strange Values

##### -> Is_chain metadata - wrongly defined

In [151]:
df["is_chain"].unique()
# is_chain should be binary according to the metadata, but the values are counts (metadata wrongly defined)
# assumption: is_chain is the number of orders from chain restaurants (or could it be a chain restaurant ID?)

array([ 1,  2,  0,  3,  5,  4,  7, 12,  6, 23,  8, 11, 20, 14,  9, 10, 16,
       15, 13, 25, 17, 27, 30, 32, 24, 18, 26, 28, 22, 40, 31, 39, 21, 37,
       19, 33, 65, 38, 29, 45, 47, 73, 81, 56, 35, 46, 67, 44, 61, 34, 36,
       49, 83, 75, 43, 42, 48, 63, 54, 80], dtype=int64)

#### -> vendor_count (number of unique vendors the customer has ordered from) & product_count (total number of products the customer has ordered; all products or only unique products?)

###### VENDOR_COUNT = 0  => PRODUCT_COUNT = 0 & FIRST_ORDER = LAST_ORDER

In [152]:
# inspect customers with vendor_count = 0 (strange value)
vendor_count_zero = df.loc[df["vendor_count"]==0, ["customer_id", "vendor_count", "product_count", "first_order", "last_order"]]
vendor_count_zero


Unnamed: 0,customer_id,vendor_count,product_count,first_order,last_order
1449,4903041977,0,0,1,1
1476,c94b288475,0,0,1,1
1488,f687717dc1,0,0,1,1
2486,c6cf0b76fb,0,0,2,2
3391,1b7c34738e,0,0,3,3
3405,5ccdf6c889,0,0,3,3
6201,eff3f98046,0,0,6,6
7123,376f896388,0,0,7,7
7152,8a7b681c19,0,0,7,7
7166,a2b54d0827,0,0,7,7


In [153]:
vendor_count_zero.count()
# total amount of customers with vendor_count = 0

customer_id      138
vendor_count     138
product_count    138
first_order      138
last_order       138
dtype: int64

In [154]:
all_conditions = df.loc[(df["vendor_count"]==0) & (df["product_count"]==0) & (df["first_order"]==df["last_order"]), ["customer_id", "vendor_count", "product_count", "first_order", "last_order"]]
all_conditions

# all customers with vendor_count = 0 also have product_count = 0 and first_order = last_order

Unnamed: 0,customer_id,vendor_count,product_count,first_order,last_order
1449,4903041977,0,0,1,1
1476,c94b288475,0,0,1,1
1488,f687717dc1,0,0,1,1
2486,c6cf0b76fb,0,0,2,2
3391,1b7c34738e,0,0,3,3
3405,5ccdf6c889,0,0,3,3
6201,eff3f98046,0,0,6,6
7123,376f896388,0,0,7,7
7152,8a7b681c19,0,0,7,7
7166,a2b54d0827,0,0,7,7


In [155]:
all_conditions.count()

customer_id      138
vendor_count     138
product_count    138
first_order      138
last_order       138
dtype: int64

###### PRODUCT COUNT = 0 => FIRST_ORDER = LAST_ORDER & VENDOR_COUNT = 0 OR 1

In [156]:
product_count_zero = df.loc[df["product_count"]==0, ["customer_id", "vendor_count", "product_count", "first_order", "last_order"]]
product_count_zero
# when product_count = 0, vendor_count is 0 or 1 and first_order = last_order (verified in the next cells)

Unnamed: 0,customer_id,vendor_count,product_count,first_order,last_order
1449,4903041977,0,0,1,1
1476,c94b288475,0,0,1,1
1488,f687717dc1,0,0,1,1
2486,c6cf0b76fb,0,0,2,2
3391,1b7c34738e,0,0,3,3
3405,5ccdf6c889,0,0,3,3
6180,aed85972bb,1,0,6,6
6183,b2ebe2e6e0,1,0,6,6
6201,eff3f98046,0,0,6,6
7123,376f896388,0,0,7,7


In [157]:
product_count_zero.count()
# total amount of customers with product_count = 0

customer_id      156
vendor_count     156
product_count    156
first_order      156
last_order       156
dtype: int64

In [158]:
prdct_count_zero_frst_last_order = df.loc[(df["product_count"]==0) & (df["first_order"]==df["last_order"]) , ["customer_id", "vendor_count", "product_count", "first_order", "last_order"]]
prdct_count_zero_frst_last_order

Unnamed: 0,customer_id,vendor_count,product_count,first_order,last_order
1449,4903041977,0,0,1,1
1476,c94b288475,0,0,1,1
1488,f687717dc1,0,0,1,1
2486,c6cf0b76fb,0,0,2,2
3391,1b7c34738e,0,0,3,3
3405,5ccdf6c889,0,0,3,3
6180,aed85972bb,1,0,6,6
6183,b2ebe2e6e0,1,0,6,6
6201,eff3f98046,0,0,6,6
7123,376f896388,0,0,7,7


In [159]:
prdct_count_zero_frst_last_order.count()

customer_id      156
vendor_count     156
product_count    156
first_order      156
last_order       156
dtype: int64

In [160]:
all_condit = df.loc[(df["product_count"]==0) & (df["first_order"]==df["last_order"]) & ((df["vendor_count"]==0)|(df["vendor_count"]==1)) , ["customer_id", "vendor_count", "product_count", "first_order", "last_order"]]
all_condit

Unnamed: 0,customer_id,vendor_count,product_count,first_order,last_order
1449,4903041977,0,0,1,1
1476,c94b288475,0,0,1,1
1488,f687717dc1,0,0,1,1
2486,c6cf0b76fb,0,0,2,2
3391,1b7c34738e,0,0,3,3
3405,5ccdf6c889,0,0,3,3
6180,aed85972bb,1,0,6,6
6183,b2ebe2e6e0,1,0,6,6
6201,eff3f98046,0,0,6,6
7123,376f896388,0,0,7,7


In [161]:
all_condit.count()
# total amount of customers with product_count = 0 & first_order = last order & vendor_count = 0 or 1

customer_id      156
vendor_count     156
product_count    156
first_order      156
last_order       156
dtype: int64
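
The cells above verify each rule by comparing row counts (156 in every case). The same implication can be checked directly by looking for rows that violate it, as in the minimal sketch below. The toy frame is made up for illustration and only reuses the column names of the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "vendor_count":  [0, 0, 1, 2, 1],
    "product_count": [0, 0, 0, 5, 3],
    "first_order":   [1, 2, 6, 0, 4],
    "last_order":    [1, 2, 6, 9, 4],
})

# Premise: the customer has no products.
no_products = toy["product_count"] == 0
# Conclusion: a single order day and at most one vendor.
expected = (toy["first_order"] == toy["last_order"]) & toy["vendor_count"].isin([0, 1])

# "A implies B" fails only on rows where A holds and B does not.
violations = toy[no_products & ~expected]
print(len(violations))  # 0 means the rule holds for every row
```

An empty `violations` frame confirms the rule in one step, and a non-empty one shows exactly which customers break it.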

In [162]:
equal_first_last_orders = df.loc[df["first_order"]==df["last_order"], ["customer_id", "vendor_count", "product_count", "first_order", "last_order"]]
equal_first_last_orders.count()

# number of customers where first_order=last_order 

customer_id      7187
vendor_count     7187
product_count    7187
first_order      7187
last_order       7187
dtype: int64

### Aggregating Data

In [163]:
numerical_columns_totals = df.sum(numeric_only=True, axis=0)
numerical_columns_totals
# no order placed on HR_0 

customer_age                 856805.0
vendor_count                  98899.0
product_count                180681.0
is_chain                      89867.0
first_order                  904461.0
last_order                  2029557.0
CUI_American                 155460.9
CUI_Asian                   317325.38
CUI_Beverages                 73255.9
CUI_Cafe                     25547.48
CUI_Chicken Dishes           24493.05
CUI_Chinese                  45638.67
CUI_Desserts                 28200.43
CUI_Healthy                  30300.07
CUI_Indian                    52014.2
CUI_Italian                  103107.0
CUI_Japanese                 95498.48
CUI_Noodle Dishes            22693.92
CUI_OTHER                    95661.23
CUI_Street Food / Snacks    124643.71
CUI_Thai                     26840.02
DOW_0                         17720.0
DOW_1                         18091.0
DOW_2                         18836.0
DOW_3                         19743.0
DOW_4                         21607.0
DOW_5       

In [164]:
total_number_orders = numerical_columns_totals[["DOW_0", "DOW_1", "DOW_2", "DOW_3", "DOW_4", "DOW_5", "DOW_6"]].sum()
total_number_orders

# total number of orders

139263.0

In [165]:
prop_chain_over_total = round(numerical_columns_totals["is_chain"] / total_number_orders * 100, 2)
prop_chain_over_total

# percentage of orders made in chain restaurants

64.53
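
The overall share above uses the sum of the `DOW_*` columns as the total number of orders. The same idea can be applied per customer to create a new feature, as sketched below on made-up rows. The column names `total_orders` and `chain_share` are hypothetical and do not exist in the dataset.

```python
import numpy as np
import pandas as pd

dow_cols = [f"DOW_{d}" for d in range(7)]

# Three made-up customers; the last one has no recorded orders.
toy = pd.DataFrame(
    [[1, 0, 0, 0, 0, 0, 1],
     [0, 1, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 0, 0]],
    columns=dow_cols,
)
toy["is_chain"] = [1, 2, 0]

# Total orders per customer, assuming each order is counted in exactly one DOW column.
toy["total_orders"] = toy[dow_cols].sum(axis=1)
# Customers with no recorded orders get NaN instead of a division by zero.
toy["chain_share"] = toy["is_chain"] / toy["total_orders"].replace(0, np.nan)

overall_pct = round(toy["is_chain"].sum() / toy["total_orders"].sum() * 100, 2)
print(toy[["is_chain", "total_orders", "chain_share"]])
print(overall_pct)  # 75.0
```

A per-customer share above 1 on the real data would be a sign that `is_chain` does not count orders after all, which makes this feature a useful check on the assumption about the metadata.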

In [166]:
total_last_promo = df.groupby("last_promo").size()
total_last_promo

last_promo
DELIVERY    6282
DISCOUNT    4496
FREEBIE     4353
dtype: int64

In [167]:
total_customer_region = df.groupby("customer_region").size()
total_customer_region

customer_region
2360    8829
2440    1483
2490     445
4140     857
4660    9550
8370     495
8550      13
8670    9761
dtype: int64
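
`groupby(...).size()` silently leaves out the rows where the grouping column is missing, so the counts above do not add up to the number of customers. A small sketch of an alternative that keeps the missing group visible and adds percentages is shown below, using made-up region values.

```python
import numpy as np
import pandas as pd

regions = pd.Series(
    ["2360", "8670", "8670", np.nan, "4660", "8670"], name="customer_region"
)

# dropna=False keeps the missing group next to the real categories.
counts = regions.value_counts(dropna=False)
shares = (counts / counts.sum() * 100).round(1)

print(pd.DataFrame({"n": counts, "pct": shares}))
```

The same pattern applies to `last_promo`, where roughly half of the rows have no promotion recorded.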