# iFood Advanced Data Analyst Case

This notebook aims to offer insights based on the data provided for the iFood case.







### Table of Contents

* Introduction
    * [Objectives](#objetivos)
    * [Data](#dados)
* Getting to Know the Data
    * [Importing the Libraries](#bibliotecas)
    * [Loading the Data](#carregando)

<a id='objetivos'></a>
## Key Objectives and Deliverables


1. Explore the data – be creative and pay attention to the details. You need to provide the
marketing team a better understanding of the characteristic features of
respondents; How do variables connect with response rates? What other relationships
between variables are interesting for the business? Which actionable can we take out of
the EDA?

2. Propose and describe a customer segmentation based on customers’ behaviors; How
many and which profiles are there in the database? How does segmentation connect to
the campaign's financial return?

3. Create a predictive model which allows the company to maximize the profit of the next
marketing campaign. What is the best metric that correlates with the profitability of the
campaign? Simplicity and awareness of what is going on are preferred over
implementations of complex algorithms which you don’t master.

4. Make a highly effective business presentation: Remember that the case must contain a
presentation that at the same time brings technical strength, insights and actionables,
but communicates with a non-technical audience such as a CMO. Take the audience on a
journey. Help them see the story of success and what it will bring.

<a id='dados'></a>
## The Data

**Feature Description**

    AcceptedCmp1 - 1 if customer accepted the offer in the 1st campaign, 0 otherwise
    AcceptedCmp2 - 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
    AcceptedCmp3 - 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
    AcceptedCmp4 - 1 if customer accepted the offer in the 4th campaign, 0 otherwise
    AcceptedCmp5 - 1 if customer accepted the offer in the 5th campaign, 0 otherwise
    Response (target) - 1 if customer accepted the offer in the last campaign, 0 otherwise
    Complain - 1 if customer complained in the last 2 years
    DtCustomer - date of customer's enrollment with the company
    Education - customer's level of education
    Marital - customer's marital status
    Kidhome - number of small children in customer's household
    Teenhome - number of teenagers in customer's household
    Income - customer's yearly household income
    MntFishProducts - amount spent on fish products in the last 2 years
    MntMeatProducts - amount spent on meat products in the last 2 years
    MntFruits - amount spent on fruits products in the last 2 years
    MntSweetProducts - amount spent on sweet products in the last 2 years
    MntWines - amount spent on wines products in the last 2 years
    MntGoldProds - amount spent on gold products in the last 2 years
    NumDealsPurchases - number of purchases made with discount
    NumCatalogPurchases - number of purchases made using catalog
    NumStorePurchases - number of purchases made directly in stores
    NumWebPurchases - number of purchases made through company's web site
    NumWebVisitsMonth - number of visits to company's web site in the last month
    Recency - number of days since the last purchase

<a id='bibliotecas'></a>
## Importing the Libraries

In [158]:
import pandas as pd
import numpy as np
from scipy import stats
from datetime import datetime
from sklearn import preprocessing

<a id='carregando'></a>
## Loading the Data

In [113]:
dfCustomers = pd.read_csv('https://raw.githubusercontent.com/ifood/ifood-data-advanced-analytics-test/master/ml_project1_data.csv')

In [114]:
dfCustomers.head()

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
0,5524,1957,Graduation,Single,58138.0,0,0,2012-09-04,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,3,11,1
1,2174,1954,Graduation,Single,46344.0,1,1,2014-03-08,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,3,11,0
2,4141,1965,Graduation,Together,71613.0,0,0,2013-08-21,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,3,11,0
3,6182,1984,Graduation,Together,26646.0,1,0,2014-02-10,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,3,11,0
4,5324,1981,PhD,Married,58293.0,1,0,2014-01-19,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,3,11,0


In [115]:
dfCustomers.columns

Index(['ID', 'Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome',
       'Teenhome', 'Dt_Customer', 'Recency', 'MntWines', 'MntFruits',
       'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
       'MntGoldProds', 'NumDealsPurchases', 'NumWebPurchases',
       'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth',
       'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1',
       'AcceptedCmp2', 'Complain', 'Z_CostContact', 'Z_Revenue', 'Response'],
      dtype='object')

The columns below were not in the data dictionary:

- Year_Birth 
- Z_CostContact
- Z_Revenue

I used `describe` to inspect the contents of these columns.

In [116]:
pd.set_option('display.max_columns', None)


In [117]:
dfCustomers.describe()

Unnamed: 0,ID,Year_Birth,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
count,2240.0,2240.0,2216.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0,2240.0
mean,5592.159821,1968.805804,52247.251354,0.444196,0.50625,49.109375,303.935714,26.302232,166.95,37.525446,27.062946,44.021875,2.325,4.084821,2.662054,5.790179,5.316518,0.072768,0.074554,0.072768,0.064286,0.013393,0.009375,3.0,11.0,0.149107
std,3246.662198,11.984069,25173.076661,0.538398,0.544538,28.962453,336.597393,39.773434,225.715373,54.628979,41.280498,52.167439,1.932238,2.778714,2.923101,3.250958,2.426645,0.259813,0.262728,0.259813,0.245316,0.114976,0.096391,0.0,0.0,0.356274
min,0.0,1893.0,1730.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
25%,2828.25,1959.0,35303.0,0.0,0.0,24.0,23.75,1.0,16.0,3.0,1.0,9.0,1.0,2.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
50%,5458.5,1970.0,51381.5,0.0,0.0,49.0,173.5,8.0,67.0,12.0,8.0,24.0,2.0,4.0,2.0,5.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
75%,8427.75,1977.0,68522.0,1.0,1.0,74.0,504.25,33.0,232.0,50.0,33.0,56.0,3.0,6.0,4.0,8.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,11.0,0.0
max,11191.0,1996.0,666666.0,2.0,2.0,99.0,1493.0,199.0,1725.0,259.0,263.0,362.0,15.0,27.0,28.0,13.0,20.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,11.0,1.0


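Looking at `Z_CostContact` and `Z_Revenue` in the `describe` output, both columns hold a constant value: the standard deviation is 0, and the minimum and maximum are equal (3 for `Z_CostContact`, 11 for `Z_Revenue`). A constant column carries no information for the analyses that follow, so we can drop them: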

In [118]:
dfCustomers.drop(['Z_CostContact','Z_Revenue'], axis=1, inplace=True)
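
The same check can be done programmatically rather than by reading the `describe` table. The snippet below is a minimal sketch on a toy frame; `constant_columns` is a hypothetical helper name, not part of this notebook or of pandas.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Return the names of columns that hold at most one distinct value."""
    # nunique(dropna=False) counts NaN as a value, so a column that mixes
    # a constant with missing entries is not reported as constant.
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


# Toy frame mimicking the two constant columns seen in describe()
toy = pd.DataFrame({
    "Income": [58138.0, 46344.0, 71613.0],
    "Z_CostContact": [3, 3, 3],
    "Z_Revenue": [11, 11, 11],
})
to_drop = constant_columns(toy)
print(to_drop)
```

The returned list could then be passed to `drop(columns=...)`, which avoids hard-coding column names.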

In [119]:
dfCustomers.columns

Index(['ID', 'Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome',
       'Teenhome', 'Dt_Customer', 'Recency', 'MntWines', 'MntFruits',
       'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
       'MntGoldProds', 'NumDealsPurchases', 'NumWebPurchases',
       'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth',
       'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1',
       'AcceptedCmp2', 'Complain', 'Response'],
      dtype='object')

The `describe` output also suggests possible outliers in some numeric columns. For example, the maximum `Income` is 666666, far above the 75th percentile of 68522:

In [120]:
dfCustomerOutlier = dfCustomers[dfCustomers['Income']>600000]
dfCustomers.drop(dfCustomers[dfCustomers['Income']>600000].index, axis=0, inplace=True)
dfCustomerOutlier

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response
2233,9432,1977,Graduation,Together,666666.0,1,0,2013-06-02,23,9,14,18,8,1,12,4,3,1,3,6,0,0,0,0,0,0,0
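
The cell above uses a hand-picked threshold of 600000. As a hedged alternative, the sketch below applies the common 1.5 × IQR rule to a toy series; `iqr_outliers` is a hypothetical helper and the multiplier `k` is an assumption, not something the case prescribes.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons with NaN evaluate to False, so missing values are never flagged.
    return (series < lower) | (series > upper)


# Toy incomes taken from rows shown in this notebook, plus one missing value
income = pd.Series([58138.0, 46344.0, 71613.0, 26646.0, 58293.0, 666666.0, None])
mask = iqr_outliers(income)
print(income[mask])
```

On the full dataset this rule is stricter than the manual threshold: judging by the quartiles reported by `describe`, it would probably flag a few more high incomes than the single row removed here, so the choice between the two is a judgment call.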


In [121]:
dfCustomers.describe()

Unnamed: 0,ID,Year_Birth,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response
count,2239.0,2239.0,2215.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0,2239.0
mean,5590.444841,1968.802144,51969.8614,0.443948,0.506476,49.121036,304.067441,26.307727,167.016525,37.538633,27.074587,44.036177,2.324252,4.085306,2.662796,5.791425,5.316213,0.0728,0.074587,0.0728,0.064314,0.013399,0.009379,0.149174
std,3246.372471,11.985494,21526.320095,0.53839,0.544555,28.963662,336.61483,39.781468,225.743829,54.637617,41.286043,52.1747,1.932345,2.77924,2.923542,3.251149,2.427144,0.259867,0.262782,0.259867,0.245367,0.115001,0.096412,0.356339
min,0.0,1893.0,1730.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2827.5,1959.0,35284.0,0.0,0.0,24.0,24.0,1.0,16.0,3.0,1.0,9.0,1.0,2.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,5455.0,1970.0,51373.0,0.0,0.0,49.0,174.0,8.0,67.0,12.0,8.0,24.0,2.0,4.0,2.0,5.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,8423.5,1977.0,68487.0,1.0,1.0,74.0,504.5,33.0,232.0,50.0,33.0,56.0,3.0,6.0,4.0,8.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,11191.0,1996.0,162397.0,2.0,2.0,99.0,1493.0,199.0,1725.0,259.0,263.0,362.0,15.0,27.0,28.0,13.0,20.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


Checking for missing data

In [122]:
dfCustomers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2239 entries, 0 to 2239
Data columns (total 27 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2239 non-null   int64  
 1   Year_Birth           2239 non-null   int64  
 2   Education            2239 non-null   object 
 3   Marital_Status       2239 non-null   object 
 4   Income               2215 non-null   float64
 5   Kidhome              2239 non-null   int64  
 6   Teenhome             2239 non-null   int64  
 7   Dt_Customer          2239 non-null   object 
 8   Recency              2239 non-null   int64  
 9   MntWines             2239 non-null   int64  
 10  MntFruits            2239 non-null   int64  
 11  MntMeatProducts      2239 non-null   int64  
 12  MntFishProducts      2239 non-null   int64  
 13  MntSweetProducts     2239 non-null   int64  
 14  MntGoldProds         2239 non-null   int64  
 15  NumDealsPurchases    2239 non-null   i

In [123]:
# Setting aside the rows with null Income for later analysis:
dfIncomeNull = dfCustomers[dfCustomers['Income'].isna()]
dfCustomers.drop(dfIncomeNull.index, axis=0, inplace=True)
dfIncomeNull

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response
10,1994,1983,Graduation,Married,,1,0,2013-11-15,11,5,5,6,0,2,1,1,1,0,2,7,0,0,0,0,0,0,0
27,5255,1986,Graduation,Single,,1,0,2013-02-20,19,5,1,3,3,263,362,0,27,0,0,1,0,0,0,0,0,0,0
43,7281,1959,PhD,Single,,0,0,2013-11-05,80,81,11,50,3,2,39,1,1,3,4,2,0,0,0,0,0,0,0
48,7244,1951,Graduation,Single,,2,1,2014-01-01,96,48,5,48,6,10,7,3,2,1,4,6,0,0,0,0,0,0,0
58,8557,1982,Graduation,Single,,1,0,2013-06-17,57,11,3,22,2,2,6,2,2,0,3,6,0,0,0,0,0,0,0
71,10629,1973,2n Cycle,Married,,1,0,2012-09-14,25,25,3,43,17,4,17,3,3,0,3,8,0,0,0,0,0,0,0
90,8996,1957,PhD,Married,,2,1,2012-11-19,4,230,42,192,49,37,53,12,7,2,8,9,0,0,0,0,0,0,0
91,9235,1957,Graduation,Single,,1,1,2014-05-27,45,7,0,8,2,0,1,1,1,0,2,7,0,0,0,0,0,0,0
92,5798,1973,Master,Together,,0,0,2013-11-23,87,445,37,359,98,28,18,1,2,4,8,1,0,0,0,0,0,0,0
128,8268,1961,PhD,Married,,0,1,2013-07-11,23,352,0,27,10,0,15,3,6,1,7,6,0,0,0,0,0,0,0
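
Because `info()` prints one line per column, missing values are easy to overlook. The sketch below shows a more compact summary on a toy frame; `missing_summary` is a hypothetical helper name used only for illustration.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count of missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy frame: two of the three rows lack Income
toy = pd.DataFrame({
    "ID": [1994, 5255, 5524],
    "Income": [None, None, 58138.0],
    "Kidhome": [1, 1, 0],
})
summary = missing_summary(toy)
print(summary)
```

In this dataset `describe` reports 2216 non-null `Income` values out of 2240 rows, so a summary like this should list `Income` with 24 missing entries.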


In [124]:
dfCustomers.head()

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response
0,5524,1957,Graduation,Single,58138.0,0,0,2012-09-04,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,1
1,2174,1954,Graduation,Single,46344.0,1,1,2014-03-08,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,0
2,4141,1965,Graduation,Together,71613.0,0,0,2013-08-21,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,0
3,6182,1984,Graduation,Together,26646.0,1,0,2014-02-10,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,0
4,5324,1981,PhD,Married,58293.0,1,0,2014-01-19,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,0


In [125]:
# Converting the Year_Birth and Dt_Customer columns
# into Age and Days_Customer

In [126]:
# Converting the column to datetime so we can do date arithmetic
dfCustomers['Dt_Customer'] = pd.to_datetime(dfCustomers['Dt_Customer'], format='%Y-%m-%d')

# Most recent enrollment date, used as the reference date
max_date = dfCustomers['Dt_Customer'].max()
year = max_date.year

# Days since enrollment and age, both relative to the reference date
dfCustomers['Days_Customer'] = (max_date - dfCustomers['Dt_Customer']).dt.days
dfCustomers['Age'] = year - dfCustomers['Year_Birth']
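
The derivation above can be checked on a toy frame. In the sketch below, the second date (2014-06-29) is an assumption chosen so that the first row reproduces the 663 days shown in the output; note that both features are measured against the latest enrollment date in the data, not against today's date.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year_Birth": [1957, 1954],
    "Dt_Customer": ["2012-09-04", "2014-06-29"],
})
toy["Dt_Customer"] = pd.to_datetime(toy["Dt_Customer"], format="%Y-%m-%d")

# Reference date: the most recent enrollment in the (toy) data
reference = toy["Dt_Customer"].max()

# Subtracting a datetime64 Series from a Timestamp yields a timedelta Series
toy["Days_Customer"] = (reference - toy["Dt_Customer"]).dt.days
toy["Age"] = reference.year - toy["Year_Birth"]
print(toy[["Days_Customer", "Age"]])
```

Working with `Timestamp` values directly, rather than converting to `datetime.date` objects first, keeps the result in a timedelta dtype so the `.dt.days` accessor is available.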

In [127]:
dfCustomers.head()

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response,Days_Customer,Age
0,5524,1957,Graduation,Single,58138.0,0,0,2012-09-04,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,1,663,57
1,2174,1954,Graduation,Single,46344.0,1,1,2014-03-08,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,0,113,60
2,4141,1965,Graduation,Together,71613.0,0,0,2013-08-21,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,0,312,49
3,6182,1984,Graduation,Together,26646.0,1,0,2014-02-10,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,0,139,30
4,5324,1981,PhD,Married,58293.0,1,0,2014-01-19,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,0,161,33


In [128]:
# Dropping the original date columns
dfCustomers.drop(columns=['Year_Birth', 'Dt_Customer'], inplace=True)

In [129]:
dfCustomers.head()

Unnamed: 0,ID,Education,Marital_Status,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response,Days_Customer,Age
0,5524,Graduation,Single,58138.0,0,0,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,1,663,57
1,2174,Graduation,Single,46344.0,1,1,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,0,113,60
2,4141,Graduation,Together,71613.0,0,0,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,0,312,49
3,6182,Graduation,Together,26646.0,1,0,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,0,139,30
4,5324,PhD,Married,58293.0,1,0,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,0,161,33


In [130]:
# Counting how often each category occurs

dfCustomers['Education'].value_counts()

Graduation    1115
PhD           481 
Master        365 
2n Cycle      200 
Basic         54  
Name: Education, dtype: int64

In [131]:
dfCustomers['Marital_Status'].value_counts()

Married     857
Together    572
Single      471
Divorced    232
Widow       76 
Alone       3  
Absurd      2  
YOLO        2  
Name: Marital_Status, dtype: int64

In [137]:
# Handling nonsensical answers:

otherMartials = ['Alone', 'Absurd', 'YOLO']
dfotherMartials = dfCustomers[dfCustomers['Marital_Status'].isin(otherMartials)]
dfCustomers.drop(dfotherMartials.index, axis=0, inplace=True)
dfCustomers['Marital_Status'].value_counts()

Married     857
Together    572
Single      471
Divorced    232
Widow       76 
Name: Marital_Status, dtype: int64

In [139]:
education_dummie =  pd.get_dummies(dfCustomers["Education"])
education_dummie.head()

Unnamed: 0,2n Cycle,Basic,Graduation,Master,PhD
0,0,0,1,0,0
1,0,0,1,0,0
2,0,0,1,0,0
3,0,0,1,0,0
4,0,0,0,0,1


In [140]:
education_dummie.rename(columns={
    '2n Cycle':'education-2nCycle', 
    'Basic':'education-Basic', 
    'Graduation':'education-Graduation',
    'Master': 'education-Master',
    'PhD': 'education-PhD'}, inplace=True)
education_dummie.head()

Unnamed: 0,education-2nCycle,education-Basic,education-Graduation,education-Master,education-PhD
0,0,0,1,0,0
1,0,0,1,0,0
2,0,0,1,0,0
3,0,0,1,0,0
4,0,0,0,0,1


In [141]:
marital_dummie =  pd.get_dummies(dfCustomers["Marital_Status"])
marital_dummie.head()

Unnamed: 0,Divorced,Married,Single,Together,Widow
0,0,0,1,0,0
1,0,0,1,0,0
2,0,0,0,1,0
3,0,0,0,1,0
4,0,1,0,0,0


In [142]:
marital_dummie.rename(columns={
    'Divorced':'marital-Divorced', 
    'Married':'marital-Married', 
    'Single':'marital-Single',
    'Together': 'marital-Together',
    'Widow': 'marital-Widow'}, inplace=True)
marital_dummie.head()

Unnamed: 0,marital-Divorced,marital-Married,marital-Single,marital-Together,marital-Widow
0,0,0,1,0,0
1,0,0,1,0,0
2,0,0,0,1,0
3,0,0,0,1,0
4,0,1,0,0,0


In [143]:
dfCustomers = pd.concat([dfCustomers, education_dummie], axis=1)

In [144]:
dfCustomers = pd.concat([dfCustomers, marital_dummie], axis=1)
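
The encode, rename and concatenate steps above can also be expressed in a single `pd.get_dummies` call. This is a sketch on a toy frame, not the notebook's own method; one difference to be aware of is that the generated names keep the original category text, so `2n Cycle` would become `education-2n Cycle` (with a space) rather than `education-2nCycle`.

```python
import pandas as pd

toy = pd.DataFrame({
    "Income": [58138.0, 58293.0],
    "Education": ["Graduation", "PhD"],
    "Marital_Status": ["Single", "Married"],
})

encoded = pd.get_dummies(
    toy,
    columns=["Education", "Marital_Status"],  # encoded, then removed from the result
    prefix=["education", "marital"],
    prefix_sep="-",
    dtype=int,  # 0/1 integers; recent pandas versions default to booleans
)
print(encoded.columns.tolist())
```

Because the original categorical columns are removed automatically, the later step that drops `Education` and `Marital_Status` would not be needed with this variant.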

In [145]:
dfCustomers.head()

Unnamed: 0,ID,Education,Marital_Status,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response,Days_Customer,Age,education-2nCycle,education-Basic,education-Graduation,education-Master,education-PhD,marital-Divorced,marital-Married,marital-Single,marital-Together,marital-Widow
0,5524,Graduation,Single,58138.0,0,0,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,1,663,57,0,0,1,0,0,0,0,1,0,0
1,2174,Graduation,Single,46344.0,1,1,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,0,113,60,0,0,1,0,0,0,0,1,0,0
2,4141,Graduation,Together,71613.0,0,0,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,0,312,49,0,0,1,0,0,0,0,0,1,0
3,6182,Graduation,Together,26646.0,1,0,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,0,139,30,0,0,1,0,0,0,0,0,1,0
4,5324,PhD,Married,58293.0,1,0,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,0,161,33,0,0,0,0,1,0,1,0,0,0


In [146]:
# Now I can drop the ID, Education and Marital_Status columns

In [152]:
dfCustomers.drop(['ID', 'Education', 'Marital_Status'], axis=1, inplace=True)

In [153]:
dfCustomers.head()

Unnamed: 0,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Response,Days_Customer,Age,education-2nCycle,education-Basic,education-Graduation,education-Master,education-PhD,marital-Divorced,marital-Married,marital-Single,marital-Together,marital-Widow
0,58138.0,0,0,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,1,663,57,0,0,1,0,0,0,0,1,0,0
1,46344.0,1,1,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,0,113,60,0,0,1,0,0,0,0,1,0,0
2,71613.0,0,0,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,0,312,49,0,0,1,0,0,0,0,0,1,0
3,26646.0,1,0,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,0,139,30,0,0,1,0,0,0,0,0,1,0
4,58293.0,1,0,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,0,161,33,0,0,0,0,1,0,1,0,0,0


In [154]:
cols = list(dfCustomers.columns.values)
cols

['Income',
 'Kidhome',
 'Teenhome',
 'Recency',
 'MntWines',
 'MntFruits',
 'MntMeatProducts',
 'MntFishProducts',
 'MntSweetProducts',
 'MntGoldProds',
 'NumDealsPurchases',
 'NumWebPurchases',
 'NumCatalogPurchases',
 'NumStorePurchases',
 'NumWebVisitsMonth',
 'AcceptedCmp3',
 'AcceptedCmp4',
 'AcceptedCmp5',
 'AcceptedCmp1',
 'AcceptedCmp2',
 'Complain',
 'Response',
 'Days_Customer',
 'Age',
 'education-2nCycle',
 'education-Basic',
 'education-Graduation',
 'education-Master',
 'education-PhD',
 'marital-Divorced',
 'marital-Married',
 'marital-Single',
 'marital-Together',
 'marital-Widow']

In [156]:
dfCustomers  = dfCustomers[[
 'Income',
 'Kidhome',
 'Teenhome',
 'Recency',
 'MntWines',
 'MntFruits',
 'MntMeatProducts',
 'MntFishProducts',
 'MntSweetProducts',
 'MntGoldProds',
 'NumDealsPurchases',
 'NumWebPurchases',
 'NumCatalogPurchases',
 'NumStorePurchases',
 'NumWebVisitsMonth',
 'AcceptedCmp3',
 'AcceptedCmp4',
 'AcceptedCmp5',
 'AcceptedCmp1',
 'AcceptedCmp2',
 'Complain',
 'Days_Customer',
 'Age',
 'education-2nCycle',
 'education-Basic',
 'education-Graduation',
 'education-Master',
 'education-PhD',
 'marital-Divorced',
 'marital-Married',
 'marital-Single',
 'marital-Together',
 'marital-Widow',
 'Response'
]]
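
Typing out every column name just to move `Response` to the end is error-prone. A shorter way to get the same ordering is sketched below; `move_to_end` is a hypothetical helper, shown on a toy frame.

```python
import pandas as pd


def move_to_end(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a new frame with `column` placed last and the others in their original order."""
    others = [c for c in df.columns if c != column]
    return df[others + [column]]


toy = pd.DataFrame({"Income": [58138.0], "Response": [1], "Age": [57]})
reordered = move_to_end(toy, "Response")
print(reordered.columns.tolist())
```

Keeping the target in the last position is what makes a slice such as `iloc[:, :-1]` select only the features.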

In [157]:
dfCustomers

Unnamed: 0,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Days_Customer,Age,education-2nCycle,education-Basic,education-Graduation,education-Master,education-PhD,marital-Divorced,marital-Married,marital-Single,marital-Together,marital-Widow,Response
0,58138.0,0,0,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,663,57,0,0,1,0,0,0,0,1,0,0,1
1,46344.0,1,1,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,113,60,0,0,1,0,0,0,0,1,0,0,0
2,71613.0,0,0,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,312,49,0,0,1,0,0,0,0,0,1,0,0
3,26646.0,1,0,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,139,30,0,0,1,0,0,0,0,0,1,0,0
4,58293.0,1,0,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,161,33,0,0,0,0,1,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2235,61223.0,0,1,46,709,43,182,42,118,247,2,9,3,4,5,0,0,0,0,0,0,381,47,0,0,1,0,0,0,1,0,0,0,0
2236,64014.0,2,1,56,406,0,30,0,0,8,7,8,2,5,7,0,0,0,1,0,0,19,68,0,0,0,0,1,0,0,0,1,0,0
2237,56981.0,0,0,91,908,48,217,32,12,24,1,2,3,13,6,0,1,0,0,0,0,155,33,0,0,1,0,0,1,0,0,0,0,0
2238,69245.0,0,1,8,428,30,214,80,30,61,2,6,5,10,3,0,0,0,0,0,0,156,58,0,0,0,1,0,0,0,0,1,0,0


In [162]:
dfCustomers.iloc[:,:-1]

Unnamed: 0,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,NumDealsPurchases,NumWebPurchases,NumCatalogPurchases,NumStorePurchases,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Days_Customer,Age,education-2nCycle,education-Basic,education-Graduation,education-Master,education-PhD,marital-Divorced,marital-Married,marital-Single,marital-Together,marital-Widow
0,58138.0,0,0,58,635,88,546,172,88,88,3,8,10,4,7,0,0,0,0,0,0,663,57,0,0,1,0,0,0,0,1,0,0
1,46344.0,1,1,38,11,1,6,2,1,6,2,1,1,2,5,0,0,0,0,0,0,113,60,0,0,1,0,0,0,0,1,0,0
2,71613.0,0,0,26,426,49,127,111,21,42,1,8,2,10,4,0,0,0,0,0,0,312,49,0,0,1,0,0,0,0,0,1,0
3,26646.0,1,0,26,11,4,20,10,3,5,2,2,0,4,6,0,0,0,0,0,0,139,30,0,0,1,0,0,0,0,0,1,0
4,58293.0,1,0,94,173,43,118,46,27,15,5,5,3,6,5,0,0,0,0,0,0,161,33,0,0,0,0,1,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2235,61223.0,0,1,46,709,43,182,42,118,247,2,9,3,4,5,0,0,0,0,0,0,381,47,0,0,1,0,0,0,1,0,0,0
2236,64014.0,2,1,56,406,0,30,0,0,8,7,8,2,5,7,0,0,0,1,0,0,19,68,0,0,0,0,1,0,0,0,1,0
2237,56981.0,0,0,91,908,48,217,32,12,24,1,2,3,13,6,0,1,0,0,0,0,155,33,0,0,1,0,0,1,0,0,0,0
2238,69245.0,0,1,8,428,30,214,80,30,61,2,6,5,10,3,0,0,0,0,0,0,156,58,0,0,0,1,0,0,0,0,1,0


In [167]:
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(dfCustomers)

In [168]:
df = pd.DataFrame(x_scaled)

In [169]:
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33
0,0.351086,0.0,0.0,0.585859,0.425318,0.442211,0.316522,0.664093,0.335878,0.274143,0.200000,0.296296,0.357143,0.307692,0.35,0.0,0.0,0.0,0.0,0.0,0.0,0.948498,0.378641,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
1,0.277680,0.5,0.5,0.383838,0.007368,0.005025,0.003478,0.007722,0.003817,0.018692,0.133333,0.037037,0.035714,0.153846,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.161660,0.407767,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,0.434956,0.0,0.0,0.262626,0.285332,0.246231,0.073623,0.428571,0.080153,0.130841,0.066667,0.296296,0.071429,0.769231,0.20,0.0,0.0,0.0,0.0,0.0,0.0,0.446352,0.300971,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,0.155079,0.5,0.0,0.262626,0.007368,0.020101,0.011594,0.038610,0.011450,0.015576,0.133333,0.074074,0.000000,0.307692,0.30,0.0,0.0,0.0,0.0,0.0,0.0,0.198856,0.116505,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
4,0.352051,0.5,0.0,0.949495,0.115874,0.216080,0.068406,0.177606,0.103053,0.046729,0.333333,0.185185,0.107143,0.461538,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.230329,0.145631,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2203,0.370288,0.0,0.5,0.464646,0.474883,0.216080,0.105507,0.162162,0.450382,0.769470,0.133333,0.333333,0.107143,0.307692,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.545064,0.281553,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
2204,0.387659,1.0,0.5,0.565657,0.271936,0.000000,0.017391,0.000000,0.000000,0.024922,0.466667,0.296296,0.071429,0.384615,0.35,0.0,0.0,0.0,1.0,0.0,0.0,0.027182,0.485437,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
2205,0.343885,0.0,0.0,0.919192,0.608171,0.241206,0.125797,0.123552,0.045802,0.074766,0.066667,0.074074,0.107143,1.000000,0.30,0.0,1.0,0.0,0.0,0.0,0.0,0.221745,0.145631,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2206,0.420217,0.0,0.5,0.080808,0.286671,0.150754,0.124058,0.308880,0.114504,0.190031,0.133333,0.222222,0.178571,0.769231,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.223176,0.388350,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
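
The scaled frame above has lost its column names and original index, because `fit_transform` returns a plain NumPy array. With its default range of 0 to 1, `MinMaxScaler` applies (x - min) / (max - min) to each column, so the same rescaling can be sketched in pandas alone while keeping the labels. `min_max_scale` below is a hypothetical helper, and the toy values are the minimum, one sample and the maximum `Income` reported by `describe` after the outlier was removed.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to [0, 1] via (x - min) / (max - min)."""
    col_min = df.min()
    col_range = df.max() - col_min
    # A constant column has range 0; divide by 1 instead so it maps to 0
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range


toy = pd.DataFrame(
    {"Income": [1730.0, 58138.0, 162397.0], "Kidhome": [0, 1, 2]},
    index=[10, 20, 30],
)
scaled = min_max_scale(toy)
print(scaled)
```

The middle `Income` value comes out at about 0.351086, matching the first row of the scaled output above. When staying with scikit-learn, a similar effect can be had by passing `columns=dfCustomers.columns` and `index=dfCustomers.index` to `pd.DataFrame` when wrapping the array.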
