# iFood Advanced Data Analyst Case

This notebook aims to offer insights based on the data provided by the iFood case.

### Table of Contents

* Introduction
    * [Objectives](#objetivos)
    * [Data](#dados)
* Getting to Know the Data
    * [Importing the Libraries](#bibliotecas)
    * [Loading the Data](#carregando)

<a id='objetivos'></a>
## Key Objectives and Deliverables


1. Explore the data – be creative and pay attention to the details. You need to provide the
marketing team a better understanding of the characteristic features of
respondents. How do variables connect with response rates? What other relationships
between variables are interesting for the business? Which actionable insights can we take out of
the EDA?

2. Propose and describe a customer segmentation based on customers’ behaviors; How
many and which profiles are there in the database? How does segmentation connect to
the campaign's financial return?

3. Create a predictive model which allows the company to maximize the profit of the next
marketing campaign. What is the best metric that correlates with the profitability of the
campaign? Simplicity and awareness of what is going on are preferred over
implementations of complex algorithms which you don’t master.

4. Make a highly effective business presentation: Remember that the case must contain a
presentation that at the same time brings technical strength, insights and actionables,
but communicates with a non-technical audience such as a CMO. Take the audience on a
journey. Help them see the story of success and what it will bring.

<a id='dados'></a>
## The Data

**Feature Description**

    AcceptedCmp1 - 1 if customer accepted the offer in the 1st campaign, 0 otherwise
    AcceptedCmp2 - 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
    AcceptedCmp3 - 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
    AcceptedCmp4 - 1 if customer accepted the offer in the 4th campaign, 0 otherwise
    AcceptedCmp5 - 1 if customer accepted the offer in the 5th campaign, 0 otherwise
    Response (target) - 1 if customer accepted the offer in the last campaign, 0 otherwise
    Complain - 1 if customer complained in the last 2 years
    DtCustomer - date of customer's enrollment with the company
    Education - customer's level of education
    Marital - customer's marital status
    Kidhome - number of small children in customer's household
    Teenhome - number of teenagers in customer's household
    Income - customer's yearly household income
    MntFishProducts - amount spent on fish products in the last 2 years
    MntMeatProducts - amount spent on meat products in the last 2 years
    MntFruits - amount spent on fruits products in the last 2 years
    MntSweetProducts - amount spent on sweet products in the last 2 years
    MntWines - amount spent on wines products in the last 2 years
    MntGoldProds - amount spent on gold products in the last 2 years
    NumDealsPurchases - number of purchases made with discount
    NumCatalogPurchases - number of purchases made using catalog
    NumStorePurchases - number of purchases made directly in stores
    NumWebPurchases - number of purchases made through company's web site
    NumWebVisitsMonth - number of visits to company's web site in the last month
    Recency - number of days since the last purchase

<a id='bibliotecas'></a>
## Importing the Libraries

In [6]:
import pandas as pd

<a id='carregando'></a>
## Loading the Data

In [7]:
dfCustomers = pd.read_csv('https://raw.githubusercontent.com/ifood/ifood-data-advanced-analytics-test/master/ml_project1_data.csv')

In [8]:
dfCustomers.head()

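| | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 2012-09-04 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 2014-03-08 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 2013-08-21 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 3 | 6182 | 1984 | Graduation | Together | 26646.0 | 1 | 0 | 2014-02-10 | 26 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4 | 5324 | 1981 | PhD | Married | 58293.0 | 1 | 0 | 2014-01-19 | 94 | 173 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |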


In [10]:
dfCustomers.columns

Index(['ID', 'Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome',
       'Teenhome', 'Dt_Customer', 'Recency', 'MntWines', 'MntFruits',
       'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
       'MntGoldProds', 'NumDealsPurchases', 'NumWebPurchases',
       'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth',
       'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1',
       'AcceptedCmp2', 'Complain', 'Z_CostContact', 'Z_Revenue', 'Response'],
      dtype='object')

The columns below were not in the data dictionary:

- Year_Birth 
- Z_CostContact
- Z_Revenue
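The comparison against the data dictionary can also be done programmatically. The sketch below is only illustrative: it hard-codes the column names shown in the `dfCustomers.columns` output, and it assumes the dictionary names map onto the CSV spellings (`Marital` to `Marital_Status`, `DtCustomer` to `Dt_Customer`). It also treats `ID` as a known row identifier rather than a feature, which is an assumption of this sketch.

```python
# Column names as printed by dfCustomers.columns
actual_columns = [
    "ID", "Year_Birth", "Education", "Marital_Status", "Income", "Kidhome",
    "Teenhome", "Dt_Customer", "Recency", "MntWines", "MntFruits",
    "MntMeatProducts", "MntFishProducts", "MntSweetProducts",
    "MntGoldProds", "NumDealsPurchases", "NumWebPurchases",
    "NumCatalogPurchases", "NumStorePurchases", "NumWebVisitsMonth",
    "AcceptedCmp3", "AcceptedCmp4", "AcceptedCmp5", "AcceptedCmp1",
    "AcceptedCmp2", "Complain", "Z_CostContact", "Z_Revenue", "Response",
]

# Features listed in the data dictionary, normalized to the CSV spellings
# (assumption: Marital -> Marital_Status, DtCustomer -> Dt_Customer)
documented = {
    "AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3", "AcceptedCmp4",
    "AcceptedCmp5", "Response", "Complain", "Dt_Customer", "Education",
    "Marital_Status", "Kidhome", "Teenhome", "Income", "MntFishProducts",
    "MntMeatProducts", "MntFruits", "MntSweetProducts", "MntWines",
    "MntGoldProds", "NumDealsPurchases", "NumCatalogPurchases",
    "NumStorePurchases", "NumWebPurchases", "NumWebVisitsMonth", "Recency",
}

# Assumption: ID is a row identifier, so it is treated as known here
known = documented | {"ID"}

# Keep the CSV order while filtering out everything the dictionary covers
undocumented = [col for col in actual_columns if col not in known]
print(undocumented)
```

With a loaded DataFrame, `actual_columns` could be replaced by `list(dfCustomers.columns)`.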

I used `describe` to check the contents of these columns:

In [16]:
dfCustomers.describe()

| | ID | Year_Birth | Income | Kidhome | Teenhome | Recency | MntWines | MntFruits | MntMeatProducts | MntFishProducts | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2240.0 | 2240.0 | 2216.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | ... | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 | 2240.0 |
| mean | 5592.159821 | 1968.805804 | 52247.251354 | 0.444196 | 0.50625 | 49.109375 | 303.935714 | 26.302232 | 166.95 | 37.525446 | ... | 5.316518 | 0.072768 | 0.074554 | 0.072768 | 0.064286 | 0.013393 | 0.009375 | 3.0 | 11.0 | 0.149107 |
| std | 3246.662198 | 11.984069 | 25173.076661 | 0.538398 | 0.544538 | 28.962453 | 336.597393 | 39.773434 | 225.715373 | 54.628979 | ... | 2.426645 | 0.259813 | 0.262728 | 0.259813 | 0.245316 | 0.114976 | 0.096391 | 0.0 | 0.0 | 0.356274 |
| min | 0.0 | 1893.0 | 1730.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 11.0 | 0.0 |
| 25% | 2828.25 | 1959.0 | 35303.0 | 0.0 | 0.0 | 24.0 | 23.75 | 1.0 | 16.0 | 3.0 | ... | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 11.0 | 0.0 |
| 50% | 5458.5 | 1970.0 | 51381.5 | 0.0 | 0.0 | 49.0 | 173.5 | 8.0 | 67.0 | 12.0 | ... | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 11.0 | 0.0 |
| 75% | 8427.75 | 1977.0 | 68522.0 | 1.0 | 1.0 | 74.0 | 504.25 | 33.0 | 232.0 | 50.0 | ... | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 11.0 | 0.0 |
| max | 11191.0 | 1996.0 | 666666.0 | 2.0 | 2.0 | 99.0 | 1493.0 | 199.0 | 1725.0 | 259.0 | ... | 20.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 11.0 | 1.0 |


Looking at the `Z_CostContact` and `Z_Revenue` columns in the `describe` output, each holds a single constant value (3 and 11 respectively, with a standard deviation of 0), so they carry no information for future analyses. We can drop them:

In [17]:
dfCustomers.drop(['Z_CostContact','Z_Revenue'], axis=1, inplace=True)

In [19]:
dfCustomers.columns

Index(['ID', 'Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome',
       'Teenhome', 'Dt_Customer', 'Recency', 'MntWines', 'MntFruits',
       'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
       'MntGoldProds', 'NumDealsPurchases', 'NumWebPurchases',
       'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth',
       'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1',
       'AcceptedCmp2', 'Complain', 'Response'],
      dtype='object')
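Rather than reading the `describe` table by eye, constant columns can be detected generically. The sketch below is one possible approach, not the notebook's own method: it uses a small toy frame (the name `df` and its rows are illustrative stand-ins for `dfCustomers`) and flags any column with at most one distinct value.

```python
import pandas as pd

# Toy frame standing in for dfCustomers (illustrative rows only)
df = pd.DataFrame({
    "Income": [58138.0, 46344.0, 71613.0],
    "Z_CostContact": [3, 3, 3],
    "Z_Revenue": [11, 11, 11],
    "Response": [1, 0, 0],
})

# A column is constant when it holds at most one distinct value;
# dropna=False counts NaN as a value, so a column mixing one value
# with missing entries is not flagged as constant
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
print(constant_cols)

# Drop the constant columns without mutating the original frame
df_clean = df.drop(columns=constant_cols)
print(list(df_clean.columns))
```

Returning a new frame instead of using `inplace=True` keeps the original available for comparison, which can help when cells are re-run out of order.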

Checking for missing data
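As a complement to `info()`, a per-column count of nulls makes gaps easier to spot at a glance. The sketch below runs on a toy frame (the name `df`, the IDs, and the pattern of missing values are made up for illustration); on the real data, the `info()` output reports 2216 non-null `Income` values out of 2240 rows.

```python
import pandas as pd

# Toy frame with hypothetical rows; None becomes NaN in the float column
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [58138.0, None, 71613.0, None],
    "Recency": [58, 38, 26, 26],
})

# Count missing values per column, then keep only columns that have any
missing = df.isna().sum()
missing = missing[missing > 0]
print(missing)
```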

In [15]:
dfCustomers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2240 non-null   int64  
 1   Year_Birth           2240 non-null   int64  
 2   Education            2240 non-null   object 
 3   Marital_Status       2240 non-null   object 
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64  
 6   Teenhome             2240 non-null   int64  
 7   Dt_Customer          2240 non-null   object 
 8   Recency              2240 non-null   int64  
 9   MntWines             2240 non-null   int64  
 10  MntFruits            2240 non-null   int64  
 11  MntMeatProducts      2240 non-null   int64  
 12  MntFishProducts      2240 non-null   int64  
 13  MntSweetProducts     2240 non-null   int64  
 14  MntGoldProds         2240 non-null   int64  
 15  NumDealsPurchases    2240 non-null   i