# Ifood Exploratory and Descriptive Analysis Challenge

This challenge comes from the [Statistics course by Renata Biaggi](www.renatabiaggi.com).

[Ifood](https://institucional.ifood.com.br/ifood/) is a Brazilian company that connects customers, restaurants and delivery people.



In [15]:
import pandas as pd
import numpy as np
import seaborn as sns

# package versions
print('Pandas Version -> %s' % pd.__version__)
print('Numpy Version -> %s' % np.__version__)
print('Seaborn Version -> %s' % sns.__version__)

Pandas Version -> 1.3.4
Numpy Version -> 1.20.3
Seaborn Version -> 0.11.2


## Exploring the data

The dataset contains data about Ifood customers.

### Open dataset

In [16]:
df_ifood = pd.read_csv('./mkt_data.csv')
df_ifood.head(5)

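| | Unnamed: 0 | Income | Kidhome | Teenhome | Recency | MntWines | MntFruits | MntMeatProducts | MntFishProducts | MntSweetProducts | ... | education_Graduation | education_Master | education_PhD | MntTotal | MntRegularProds | AcceptedCmpOverall | marital_status | education_level | kids | expenses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 58138.0 | 0 | 0 | 58 | 635 | 88 | 546 | 172 | 88 | ... | 3.0 | NaN | NaN | 1529 | 1441 | 0 | Single | Graduation | 0 | 1529 |
| 1 | 1 | 46344.0 | 1 | 1 | 38 | 11 | 1 | 6 | 2 | 1 | ... | 3.0 | NaN | NaN | 21 | 15 | 0 | Single | Graduation | 2 | 21 |
| 2 | 2 | 71613.0 | 0 | 0 | 26 | 426 | 49 | 127 | 111 | 21 | ... | 3.0 | NaN | NaN | 734 | 692 | 0 | Together | Graduation | 0 | 734 |
| 3 | 3 | 26646.0 | 1 | 0 | 26 | 11 | 4 | 20 | 10 | 3 | ... | 3.0 | NaN | NaN | 48 | 43 | 0 | Together | Graduation | 1 | 48 |
| 4 | 4 | 58293.0 | 1 | 0 | 94 | 173 | 43 | 118 | 46 | 27 | ... | NaN | NaN | 5.0 | 407 | 392 | 0 | Married | PhD | 1 | 407 |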
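
The `Unnamed: 0` column in the preview is the kind of column pandas produces when a CSV was saved together with its row index. Whether that is what happened to `mkt_data.csv` is an assumption; the sketch below uses a small hypothetical CSV (`csv_text`) to show how `index_col=0` would keep such a column out of the data.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for mkt_data.csv. The empty first
# header is what pandas writes when a DataFrame is saved with its index.
csv_text = ",Income,Kidhome\n0,58138.0,0\n1,46344.0,1\n2,71613.0,0\n"

# Default read: the unnamed first column becomes a regular "Unnamed: 0" column.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# index_col=0 uses that first column as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```

If the column turns out to be a meaningful customer identifier rather than a saved index, keeping it as a regular column is the safer choice.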


In [17]:
df_ifood.shape

(2205, 44)

The dataset has 2205 rows and 44 columns.

### What are the types of variables in this dataset?

In [18]:
df_ifood.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2205 entries, 0 to 2204
Data columns (total 44 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Unnamed: 0            2205 non-null   int64  
 1   Income                2205 non-null   float64
 2   Kidhome               2205 non-null   int64  
 3   Teenhome              2205 non-null   int64  
 4   Recency               2205 non-null   int64  
 5   MntWines              2205 non-null   int64  
 6   MntFruits             2205 non-null   int64  
 7   MntMeatProducts       2205 non-null   int64  
 8   MntFishProducts       2205 non-null   int64  
 9   MntSweetProducts      2205 non-null   int64  
 10  MntGoldProds          2205 non-null   int64  
 11  NumDealsPurchases     2205 non-null   int64  
 12  NumWebPurchases       2205 non-null   int64  
 13  NumCatalogPurchases   2205 non-null   int64  
 14  NumStorePurchases     2205 non-null   int64  
 15  NumWebVisitsMonth    

The numeric variables are stored in 'int64' or 'float64' columns. The categorical variables are stored in 'object' columns.
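
To list the columns of each kind instead of reading them off the `info()` output, `select_dtypes` can split a frame by dtype. This is a minimal sketch on a hypothetical three-column sample (`df_demo`), not on the real dataset.

```python
import pandas as pd

# Hypothetical miniature of df_ifood with one column of each kind.
df_demo = pd.DataFrame({
    "Income": [58138.0, 46344.0],
    "Kidhome": [0, 1],
    "marital_status": ["Single", "Together"],
})

# "number" matches every integer and float dtype.
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric is treated as categorical here.
categorical_cols = df_demo.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```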

### Are there duplicate rows in the dataset?

In [19]:
df_ifood.duplicated().sum()

0

There are no duplicate rows.
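
One caveat: `duplicated()` compares whole rows, so a column that is unique per row hides repeated records. Assuming `Unnamed: 0` is such a row identifier (its summary statistics run from 0 to 2204), a stricter check leaves it out. The sketch below shows the difference on a hypothetical frame (`df_demo`); it makes no claim about what the real data contains.

```python
import pandas as pd

# Hypothetical frame: rows 0 and 2 are identical except for the identifier.
df_demo = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Income": [58138.0, 46344.0, 58138.0],
    "Kidhome": [0, 1, 0],
})

# With the identifier included, every row looks unique.
dup_with_id = int(df_demo.duplicated().sum())

# Comparing only the remaining columns exposes the repeated record.
feature_cols = df_demo.columns.drop("Unnamed: 0").tolist()
dup_without_id = int(df_demo.duplicated(subset=feature_cols).sum())

print(dup_with_id, dup_without_id)
```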

### Are there null values in the dataset?

In [20]:
df_ifood.isnull().sum()

Unnamed: 0                 0
Income                     0
Kidhome                    0
Teenhome                   0
Recency                    0
MntWines                   0
MntFruits                  0
MntMeatProducts            0
MntFishProducts            0
MntSweetProducts           0
MntGoldProds               0
NumDealsPurchases          0
NumWebPurchases            0
NumCatalogPurchases        0
NumStorePurchases          0
NumWebVisitsMonth          0
AcceptedCmp3               0
AcceptedCmp4               0
AcceptedCmp5               0
AcceptedCmp1               0
AcceptedCmp2               0
Complain                   0
Z_CostContact              0
Z_Revenue                  0
Response                   0
Age                        0
Customer_Days              0
marital_Divorced        1975
marital_Married         1351
marital_Single          1728
marital_Together        1637
marital_Widow           2129
education_2n Cycle      2007
education_Basic         2151
education_Grad

There are null values in the columns 'marital_Divorced', 'marital_Married', 'marital_Single', 'marital_Together', 'marital_Widow', 'education_2n Cycle', 'education_Basic', 'education_Graduation', 'education_Master' and 'education_PhD'.

A value is null when the customer does not have the characteristic the column describes.

Therefore, these columns can be converted into binary indicator columns, with 0 for customers who do not have the characteristic and 1 for those who do.
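
The counts above are consistent with this reading: the five `marital_*` columns have 8,820 nulls in total out of 5 × 2,205 = 11,025 cells, which leaves 2,205 filled cells, an average of one per customer. A row-level check is sketched below on a hypothetical sample (`df_demo`, with placeholder values of 1.0) and is an illustration rather than a result from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: each row has exactly one filled marital_* column.
df_demo = pd.DataFrame({
    "marital_Married": [np.nan, 1.0, np.nan],
    "marital_Single": [1.0, np.nan, np.nan],
    "marital_Together": [np.nan, np.nan, 1.0],
})

# Keep only the columns that actually contain nulls.
null_counts = df_demo.isnull().sum()
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)

# Count the filled marital_* cells in each row; one per row means the
# columns behave like a one-hot encoding of a single category.
marital_cols = [c for c in df_demo.columns if c.startswith("marital_")]
filled_per_row = df_demo[marital_cols].notna().sum(axis=1)
one_hot_like = bool((filled_per_row == 1).all())
print(one_hot_like)
```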

In [21]:
null_columns = [
    "marital_Divorced",
    "marital_Married",
    "marital_Single",
    "marital_Together",
    "marital_Widow",
    "education_2n Cycle",
    "education_Basic",
    "education_Graduation",
    "education_Master",
    "education_PhD",
]

In [22]:
for item in null_columns:
    # 0 where the value is null (characteristic absent), 1 otherwise
    df_ifood[item] = np.where(df_ifood[item].isnull(), 0, 1)
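
The same conversion can be written without a loop. The sketch below is an alternative, not the notebook's method, and it runs on a hypothetical two-column sample (`df_demo`, `flag_columns`). Like the loop, it discards the original non-null codes (such as 3.0 and 5.0) and keeps only presence or absence.

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mimics the sparse education_* columns.
df_demo = pd.DataFrame({
    "education_Graduation": [3.0, np.nan, 3.0],
    "education_PhD": [np.nan, 5.0, np.nan],
})
flag_columns = ["education_Graduation", "education_PhD"]

# Result of the loop-based approach, kept for comparison.
expected = {c: np.where(df_demo[c].isnull(), 0, 1).tolist() for c in flag_columns}

# notna() is True where a value is present; astype(int) maps True/False to 1/0.
df_demo[flag_columns] = df_demo[flag_columns].notna().astype(int)
print(df_demo)
```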

### What are the mean, median, 25th percentile, 75th percentile, minimum and maximum of each numeric column?

In [23]:
numerics_types = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numeric_columns = df_ifood.select_dtypes(include=numerics_types).columns
df_ifood[numeric_columns].describe()

| | Unnamed: 0 | Income | Kidhome | Teenhome | Recency | MntWines | MntFruits | MntMeatProducts | MntFishProducts | MntSweetProducts | ... | education_2n Cycle | education_Basic | education_Graduation | education_Master | education_PhD | MntTotal | MntRegularProds | AcceptedCmpOverall | kids | expenses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | ... | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 | 2205.0 |
| mean | 1102.0 | 51622.094785 | 0.442177 | 0.506576 | 49.00907 | 306.164626 | 26.403175 | 165.312018 | 37.756463 | 27.128345 | ... | 0.089796 | 0.02449 | 0.504762 | 0.165079 | 0.215873 | 562.764626 | 518.707483 | 0.29932 | 0.948753 | 562.764626 |
| std | 636.672993 | 20713.063826 | 0.537132 | 0.54438 | 28.932111 | 337.493839 | 39.784484 | 217.784507 | 54.824635 | 41.130468 | ... | 0.285954 | 0.154599 | 0.500091 | 0.371336 | 0.41152 | 575.936911 | 553.847248 | 0.68044 | 0.749231 | 575.936911 |
| min | 0.0 | 1730.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | -283.0 | 0.0 | 0.0 | 4.0 |
| 25% | 551.0 | 35196.0 | 0.0 | 0.0 | 24.0 | 24.0 | 2.0 | 16.0 | 3.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 56.0 | 42.0 | 0.0 | 0.0 | 56.0 |
| 50% | 1102.0 | 51287.0 | 0.0 | 0.0 | 49.0 | 178.0 | 8.0 | 68.0 | 12.0 | 8.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 343.0 | 288.0 | 0.0 | 1.0 | 343.0 |
| 75% | 1653.0 | 68281.0 | 1.0 | 1.0 | 74.0 | 507.0 | 33.0 | 232.0 | 50.0 | 34.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 964.0 | 884.0 | 0.0 | 1.0 | 964.0 |
| max | 2204.0 | 113734.0 | 2.0 | 2.0 | 99.0 | 1493.0 | 199.0 | 1725.0 | 259.0 | 262.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2491.0 | 2458.0 | 4.0 | 3.0 | 2491.0 |
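
In the `describe()` output the median is the `50%` row, and the 25th and 75th percentiles are the `25%` and `75%` rows. The sketch below shows this mapping and builds a table with only the statistics named in the question. It uses a hypothetical five-value sample (`spend`, filled with the `MntWines` values from the preview rows), so its numbers are not the dataset's statistics.

```python
import pandas as pd

# Hypothetical sample: the MntWines values of the five preview rows.
spend = pd.DataFrame({"MntWines": [635, 11, 426, 11, 173]})

# describe() reports the median under the "50%" label.
summary = spend.describe()
median_from_describe = summary.loc["50%", "MntWines"]
median_direct = spend["MntWines"].median()
print(median_from_describe == median_direct)

# A table holding only the requested statistics, one row per column.
requested = spend.agg(["mean", "median", "min", "max"]).T
requested["p25"] = spend.quantile(0.25)
requested["p75"] = spend.quantile(0.75)
print(requested)
```

Transposing with `.T` puts one variable per row, which tends to be easier to read than the wide default layout when a frame has many numeric columns.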
