### 1. Purpose of this notebook

The purpose of this notebook is to preprocess the data: removing data that is not needed for the analysis, adding new derived columns, and converting some dtypes.

### 2. Read data

#### 2.1 Import Python packages

In [1]:
import pandas as pd
import datetime

from src.paths import DATA

In [140]:
df = pd.read_csv(DATA / 'ml_project1_data_cleaned.csv')

df = (df.assign(Dt_Customer = pd.to_datetime(df['Dt_Customer'])))

### 3. New columns

For our analysis, we need some new columns:

- The age of each customer.
- The total amount spent in the last two years.
- The total number of purchases across the three channels in the last two years.
- The total number of offers accepted in the first five campaigns.
- The total number of offers accepted in the first five campaigns and the pilot campaign.
- The number of years since registration.
- The total number of children.

#### Age

In [141]:
current_year = datetime.date.today().year

df = (df
      .assign(Age = current_year - df['Year_Birth'])
     )
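
Because `current_year` is taken from the system clock, `Age` (and later `TotalYearAfterRegistration`) will change depending on when the notebook is run. If reproducible values are preferred, one option is to pin the reference year. The sketch below uses a toy DataFrame, and `REFERENCE_YEAR` with its value is a hypothetical name chosen only for illustration, not something defined in this project.

```python
import pandas as pd

# Hypothetical pinned reference year (an assumption, not taken from the dataset).
REFERENCE_YEAR = 2020

# Toy data standing in for the real DataFrame; the values are made up.
toy = pd.DataFrame({"Year_Birth": [1957, 1984, 1990]})

# Same computation as in the cell above, but independent of the run date.
toy = toy.assign(Age=REFERENCE_YEAR - toy["Year_Birth"])

print(toy)
```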

#### Total amount spent

In [142]:
df = (df
      .assign(MntTotal = df[['MntWines', 'MntFruits','MntMeatProducts',
                              'MntFishProducts', 'MntSweetProducts']]
                          .sum(axis=1))
     )

#### Total purchases

In [143]:
df = (df
      .assign(NumTotalPurchases = df[['NumWebPurchases', 'NumCatalogPurchases',
                                      'NumStorePurchases']]
                                  .sum(axis=1))
     )

#### Total accepted in the first five campaigns

In [144]:
df = (df
      .assign(AcceptedTotalFirstFiveCmps = df[['AcceptedCmp3', 'AcceptedCmp4', 
                                               'AcceptedCmp5', 'AcceptedCmp1', 'AcceptedCmp2']]
                                          .sum(axis=1))
     )

#### Total accepted in the first five campaigns + pilot campaign 

In [145]:
df = (df
      .assign(AcceptedTotalFirstFiveCmpsMorePilot = df[['AcceptedTotalFirstFiveCmps', 'Response']]
                          .sum(axis=1))
     )

#### Total years since registration

In [146]:
df = (df
      .assign(TotalYearAfterRegistration = current_year - df.Dt_Customer.dt.year)
     )

#### Total children

In [147]:
df = (df
      .assign(TotalChildren = df['Kidhome'] + df['Teenhome'])
     )

### 4. Remove columns

- The column Z_Revenue is wrong: every row contains the same value, 11. We know that the success rate of the pilot campaign was 15%, not 100%. We created a new column to fix this.
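
The observation above can be checked before dropping anything. The following is a minimal sketch on made-up toy data (the real file is not loaded here): it lists the columns that hold a single distinct value and computes the share of accepted offers from a 0/1 `Response` flag.

```python
import pandas as pd

# Toy data for illustration only; the values are not taken from the real file.
toy = pd.DataFrame({
    "Z_Revenue": [11, 11, 11, 11],
    "Response": [0, 1, 0, 0],
})

# A column with exactly one distinct value carries no information.
constant_columns = [col for col in toy.columns
                    if toy[col].nunique(dropna=False) == 1]

# The mean of a 0/1 flag is the fraction of rows where the flag is set.
response_rate = toy["Response"].mean()

print(constant_columns, response_rate)
```

On the real DataFrame, the same two expressions would be applied to `df` before the `drop` call.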

#### Z_Revenue

In [148]:
df = df.drop(columns='Z_Revenue')

### 5. Data transformation

The campaign acceptance, complaint, and response columns are flags, so we convert them to bool.

In [149]:
columns = ['AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1',
           'AcceptedCmp2', 'Complain', 'Response']

df[columns] = df[columns].astype(bool)
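
Note that `astype(bool)` turns every non-zero value into `True`, so an unexpected value such as 2 would be silently accepted. A possible safeguard, sketched here on toy data with a deliberately bad value, is to confirm that each flag column contains only 0 and 1 before casting. The names `flag_columns`, `is_binary`, and `non_binary` are illustrative only.

```python
import pandas as pd

# Toy data; the 2 in Response is deliberate, to show what the check catches.
toy = pd.DataFrame({
    "AcceptedCmp1": [0, 1, 0],
    "Complain": [0, 0, 1],
    "Response": [1, 0, 2],
})
flag_columns = ["AcceptedCmp1", "Complain", "Response"]

# For each column: True if every value is 0 or 1.
is_binary = toy[flag_columns].isin([0, 1]).all()

# Columns that would lose information if cast to bool.
non_binary = is_binary[~is_binary].index.tolist()

print(non_binary)
```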

In [150]:
df.to_csv(DATA / 'ml_project1_data_pre_processed.csv', index=False)
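
One thing to keep in mind when reading this file in a later notebook: CSV does not store dtypes. As far as we know, pandas infers `True`/`False` columns as bool again, but a datetime column comes back as plain text and has to be parsed once more. The sketch below illustrates this with a toy DataFrame and an in-memory buffer instead of the real file.

```python
import io

import pandas as pd

# Toy data mirroring two of the notebook's columns; the values are made up.
toy = pd.DataFrame({
    "Dt_Customer": pd.to_datetime(["2012-09-04", "2014-03-08"]),
    "Response": [True, False],
})

# Write to an in-memory buffer and read it back, like a CSV round trip.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# The date column is no longer datetime; parse it again, as in section 2.
reparsed = reloaded.assign(Dt_Customer=pd.to_datetime(reloaded["Dt_Customer"]))

print(reloaded.dtypes)
print(reparsed.dtypes)
```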