# Data Cleaning and Data Wrangling

## Data

The dataset contains socio-demographic and firmographic features for 2,240
customers.

|Feature |Description|
|--:|---|
|AcceptedCmp1| 1 if customer accepted the offer in the 1st campaign, 0 otherwise|
|AcceptedCmp2| 1 if customer accepted the offer in the 2nd campaign, 0 otherwise|
|AcceptedCmp3| 1 if customer accepted the offer in the 3rd campaign, 0 otherwise|
|AcceptedCmp4| 1 if customer accepted the offer in the 4th campaign, 0 otherwise|
|AcceptedCmp5| 1 if customer accepted the offer in the 5th campaign, 0 otherwise|
|Response (target)| 1 if customer accepted the offer in the last campaign, 0 otherwise|
|Complain| 1 if customer complained in the last 2 years, 0 otherwise|
|Dt_Customer| date of customer's enrollment with the company|
|Education| customer's level of education|
|Marital_Status| customer's marital status|
|Kidhome| number of small children in customer's household|
|Teenhome| number of teenagers in customer's household|
|Income| customer's yearly household income|
|MntFishProducts| amount spent on fish products in the last 2 years|
|MntMeatProducts| amount spent on meat products in the last 2 years|
|MntFruits| amount spent on fruit products in the last 2 years|
|MntSweetProducts| amount spent on sweet products in the last 2 years|
|MntWines| amount spent on wine products in the last 2 years|
|MntGoldProds| amount spent on gold products in the last 2 years|
|NumDealsPurchases| number of purchases made with discount|
|NumCatalogPurchases| number of purchases made using catalog|
|NumStorePurchases| number of purchases made directly in stores|
|NumWebPurchases| number of purchases made through company's web site|
|NumWebVisitsMonth| number of visits to company's web site in the last month|
|Recency|number of days since the last purchase|
|Z_Revenue|revenue from the new gadget|
|Z_CostContact|cost of contact for the sixth campaign|
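
The table implies a simple invariant that is worth checking before any modeling: the `AcceptedCmp*`, `Complain`, and `Response` columns should contain only 0 or 1. The sketch below shows one way to test that. The helper name `non_binary_columns` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Columns described in the table above as 1/0 indicators.
FLAG_COLUMNS = [
    "AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3",
    "AcceptedCmp4", "AcceptedCmp5", "Complain", "Response",
]


def non_binary_columns(df: pd.DataFrame, columns=FLAG_COLUMNS) -> list:
    """Return the flag columns holding values other than 0 or 1.

    Hypothetical helper: columns absent from the frame are skipped,
    and missing values are ignored rather than reported.
    """
    present = [c for c in columns if c in df.columns]
    return [c for c in present if not df[c].dropna().isin([0, 1]).all()]


# Toy frame for illustration only (not rows from the real dataset).
toy = pd.DataFrame({
    "AcceptedCmp1": [0, 1, 0],
    "Complain": [0, 0, 2],  # 2 breaks the 0/1 convention
    "Response": [1, 0, 1],
})
print(non_binary_columns(toy))
```

Once the real data is loaded, calling `non_binary_columns(data)` should return an empty list if the flags follow the description above.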

```python
import numpy as np
import pandas as pd
from pathlib import Path

import warnings
warnings.filterwarnings("ignore")
```

### Data Exploration

```python
# Path to the raw CSV file
path = Path("../data/ifood_customers.csv")

# Read the CSV with pandas
data = pd.read_csv(path)

# Show a random sample of 8 rows
data.sample(8)
```

|Unnamed: 0|ID|Year_Birth|Education|Marital_Status|Income|Kidhome|Teenhome|Dt_Customer|Recency|MntWines|...|NumWebVisitsMonth|AcceptedCmp3|AcceptedCmp4|AcceptedCmp5|AcceptedCmp1|AcceptedCmp2|Complain|Z_CostContact|Z_Revenue|Response|
|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
|410|7301|1982|2n Cycle|Single|23616.0|1|0|2012-11-17|76|4|...|8|0|0|0|0|0|0|3|11|1|
|1525|2408|1976|Graduation|Married|58597.0|1|1|2012-11-20|20|490|...|7|0|0|0|0|0|0|3|11|0|
|811|5585|1972|Graduation|Single|21359.0|1|0|2013-04-20|1|12|...|8|0|0|0|0|0|0|3|11|1|
|16|9736|1980|Graduation|Married|41850.0|1|1|2012-12-24|51|53|...|8|0|0|0|0|0|0|3|11|0|
|1491|7494|1950|PhD|Divorced|42873.0|1|1|2013-01-21|11|209|...|8|0|0|0|0|0|0|3|11|0|
|2053|10509|1955|Master|Single|36927.0|1|1|2014-06-02|46|51|...|8|0|0|0|0|0|0|3|11|0|
|1286|339|1972|PhD|Married|57091.0|0|1|2012-11-23|82|462|...|7|0|1|0|0|0|0|3|11|1|
|1206|7789|1965|PhD|Married|84618.0|0|0|2013-11-22|96|684|...|2|0|0|1|0|0|0|3|11|0|
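
The sample hints at three things to clean up. The `Unnamed: 0` column looks like a row index that was written to the CSV. `Dt_Customer` holds dates that `read_csv` does not parse by default. `Z_CostContact` and `Z_Revenue` show the same value in every sampled row. The sketch below illustrates one possible way to handle these, using three rows and a subset of columns copied from the sample above in place of the real file. Whether the two `Z_` columns are constant across the full dataset is an assumption that should be verified on `data` itself.

```python
import io
import pandas as pd

# Three rows and a subset of columns copied from the sample above.
# The empty first header field mimics a saved, unnamed index.
csv_text = """\
,ID,Year_Birth,Education,Marital_Status,Income,Dt_Customer,Z_CostContact,Z_Revenue,Response
410,7301,1982,2n Cycle,Single,23616.0,2012-11-17,3,11,1
1525,2408,1976,Graduation,Married,58597.0,2012-11-20,3,11,0
811,5585,1972,Graduation,Single,21359.0,2013-04-20,3,11,1
"""

# A default read turns the unnamed index into an "Unnamed: 0" column.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns[0])

# Treat the first column as the index and parse the enrollment date.
clean = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=["Dt_Customer"],
)

# Columns with a single distinct value carry no information for a model.
constant_cols = [c for c in clean.columns if clean[c].nunique(dropna=False) == 1]
print(constant_cols)
```

For the real file, the same `index_col` and `parse_dates` arguments can be passed to `pd.read_csv(path)`. Any columns reported as constant are candidates for dropping once the check has been run on all rows.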
