In [15]:
from pandas import read_csv
from scripts.dataset_preprocessing import (
    binning_date_by_period,
    label_encode_dataframe,
    drop_outliers,
    drop_shared_outliers
)

In [16]:
dataframe = read_csv("../data/raw/marketing_campaign.csv")
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2240 non-null   int64  
 1   Year_Birth           2240 non-null   int64  
 2   Education            2240 non-null   object 
 3   Marital_Status       2240 non-null   object 
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64  
 6   Teenhome             2240 non-null   int64  
 7   Dt_Customer          2240 non-null   object 
 8   Recency              2240 non-null   int64  
 9   MntWines             2240 non-null   int64  
 10  MntFruits            2240 non-null   int64  
 11  MntMeatProducts      2240 non-null   int64  
 12  MntFishProducts      2240 non-null   int64  
 13  MntSweetProducts     2240 non-null   int64  
 14  MntGoldProds         2240 non-null   int64  
 15  NumDealsPurchases    2240 non-null   i

### UNDEBATTABLE ACTIONS
As we've seen with the exploratory anaylsis, some feature are useless because of the unique values, or useless in a model.  
The **Dt_Customer** is interesting to be plot, however in a model the multiple categories are irelevant.  
The others categoricals values will be either LabelEncode or OneHotEncode, we coul LabelEncode it in a first place.  
We are also going to create merged features.
***

In [17]:
dataframe.drop(columns=["ID", "Z_Revenue", "Z_CostContact"], inplace=True)
dataframe["Dt_Customer"] = dataframe["Dt_Customer"].apply(
    binning_date_by_period, args=("Year",)
)

In [18]:
categorical_columns = ["Education", "Marital_Status", "Dt_Customer"]
dataframe = label_encode_dataframe(dataframe, categorical_columns)

In [19]:
dataframe["Purchases"] = dataframe["MntWines"]+ dataframe["MntFruits"]+ dataframe["MntMeatProducts"]+ dataframe["MntFishProducts"]+ dataframe["MntSweetProducts"]+ dataframe["MntGoldProds"]
dataframe["Is_Alone"]=dataframe["Marital_Status"].replace({"Married":False, "Together":False, "Absurd":True, "Widow":True, "YOLO":True, "Divorced":True, "Single":True,})
dataframe["Education"]=dataframe["Education"].replace({"Basic":"Undergraduate","2n Cycle":"Undergraduate", "Graduation":"Graduate", "Master":"Postgraduate", "PhD":"Postgraduate"})
dataframe["Children"]=dataframe["Kidhome"]+dataframe["Teenhome"]
dataframe["Family_Size"] = dataframe["Is_Alone"].replace({False: 1, True:2}) + dataframe["Children"]
dataframe["Is_Parent"] = dataframe['Children'] > 0
dataframe["Offers"] = dataframe["AcceptedCmp1"]+ dataframe["AcceptedCmp2"]+ dataframe["AcceptedCmp3"]+ dataframe["AcceptedCmp4"]+ dataframe["AcceptedCmp5"]


### DEBATTABLE ACTIONS  
The remaining clean to do is about : outliers & missing values.  
In our case, we'll create 3 datasets to 3 different handles : 
- Dropping missing values & dropping outliers
- Filling missing values with median & dropping outliers
- Filling missing values with median (with outliers)
***

In [20]:
outliers_columns = [
    "Year_Birth",
    "Income",
    "MntWines",
    "MntFruits",
    "MntMeatProducts",
    "MntFishProducts",
    "MntSweetProducts",
    "MntGoldProds",
    "NumDealsPurchases",
    "NumWebPurchases",
    "NumCatalogPurchases",
    "NumWebVisitsMonth",
    "Children",
    "Purchases",
    "Family_Size",
    "Offers"
]

In [21]:
# Drop NA & outliers
dopna_outliers_data = dataframe.copy()

dopna_outliers_data.dropna(inplace=True)
dopna_outliers_data = drop_shared_outliers(dopna_outliers_data, outliers_columns, 90)

dopna_outliers_data.to_csv("../data/cleaned/marketing_campaign_dropna_no_outliers.csv", index=False)

Data points considered outliers for the column Year_Birth:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
192,1900,0,2,36640.0,1,0,1,99,15,6,...,0,0,1,0,65,2,1,3,True,0
239,1893,0,4,60182.0,0,1,2,23,8,0,...,0,0,0,0,22,4,1,5,True,0
339,1899,4,5,83532.0,0,0,1,36,755,144,...,0,0,0,0,1853,5,0,5,False,1


Data points considered outliers for the column Income:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
2233,1977,2,5,666666.0,1,0,1,23,9,14,...,0,0,0,0,62,5,1,6,True,0


Data points considered outliers for the column MntWines:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column MntFruits:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column MntMeatProducts:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
21,1979,2,3,2447.0,1,0,1,42,1,1,...,0,0,0,0,1730,3,1,4,True,0
164,1973,4,3,157243.0,0,1,2,98,20,2,...,0,0,0,0,1608,3,1,4,True,0
687,1982,4,3,160803.0,0,0,0,21,55,16,...,0,0,0,0,1717,3,0,3,False,0
1653,1977,2,5,157146.0,0,0,1,13,1,0,...,0,0,0,0,1730,5,0,5,False,0


Data points considered outliers for the column MntFishProducts:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column MntSweetProducts:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
1898,1945,4,4,113734.0,0,0,2,9,6,2,...,0,0,0,0,277,4,0,4,False,0


Data points considered outliers for the column MntGoldProds:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
1975,1969,2,3,4428.0,0,1,1,0,16,4,...,0,0,0,0,359,3,1,4,True,0


Data points considered outliers for the column NumDealsPurchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
21,1979,2,3,2447.0,1,0,1,42,1,1,...,0,0,0,0,1730,3,1,4,True,0
164,1973,4,3,157243.0,0,1,2,98,20,2,...,0,0,0,0,1608,3,1,4,True,0
287,1956,3,5,50898.0,1,1,1,88,285,28,...,0,0,0,0,859,5,2,7,True,0
432,1967,0,5,67309.0,1,1,1,76,515,47,...,0,0,0,0,1082,5,2,7,True,0
687,1982,4,3,160803.0,0,0,0,21,55,16,...,0,0,0,0,1717,3,0,3,False,0
1042,1991,2,4,8028.0,0,0,0,62,73,18,...,0,0,0,0,178,4,0,4,False,0
1147,1956,2,5,54450.0,1,1,0,0,454,0,...,0,0,0,0,684,5,2,7,True,0
1161,1956,2,5,54450.0,1,1,0,0,454,0,...,0,0,0,0,684,5,2,7,True,0
1245,1971,2,2,1730.0,0,0,2,65,1,1,...,0,0,0,0,8,2,0,2,False,0
1503,1973,3,3,54108.0,1,1,0,74,539,6,...,0,0,0,0,747,3,2,5,True,0


Data points considered outliers for the column NumWebPurchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
1806,1966,4,4,7144.0,0,2,1,92,81,4,...,0,0,0,0,416,4,2,6,True,0
1898,1945,4,4,113734.0,0,0,2,9,6,2,...,0,0,0,0,277,4,0,4,False,0
1975,1969,2,3,4428.0,0,1,1,0,16,4,...,0,0,0,0,359,3,1,4,True,0


Data points considered outliers for the column NumCatalogPurchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
21,1979,2,3,2447.0,1,0,1,42,1,1,...,0,0,0,0,1730,3,1,4,True,0
164,1973,4,3,157243.0,0,1,2,98,20,2,...,0,0,0,0,1608,3,1,4,True,0
687,1982,4,3,160803.0,0,0,0,21,55,16,...,0,0,0,0,1717,3,0,3,False,0
1653,1977,2,5,157146.0,0,0,1,13,1,0,...,0,0,0,0,1730,5,0,5,False,0


Data points considered outliers for the column NumWebVisitsMonth:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
9,1950,4,5,5648.0,1,1,2,68,28,0,...,0,0,0,0,49,5,2,7,True,1
774,1957,4,5,6835.0,0,1,0,76,107,2,...,0,0,0,0,137,5,1,6,True,0
1042,1991,2,4,8028.0,0,0,0,62,73,18,...,0,0,0,0,178,4,0,4,False,0
1245,1971,2,2,1730.0,0,0,2,65,1,1,...,0,0,0,0,8,2,0,2,False,0
1846,1963,4,3,4023.0,1,1,2,29,5,0,...,0,0,0,0,9,3,2,5,True,0


Data points considered outliers for the column Children:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column Purchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column Family_Size:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column Offers:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
124,1983,2,4,101970.0,0,0,1,69,722,27,...,1,0,0,1,1135,4,0,4,False,3
203,1977,4,5,102160.0,0,0,0,54,763,29,...,1,0,0,1,1240,5,0,5,False,3
246,1972,2,3,80134.0,1,0,1,40,1218,16,...,1,0,0,1,1690,3,1,4,True,3
252,1974,2,2,102692.0,0,0,1,5,168,148,...,1,1,0,1,1112,2,0,2,False,4
336,1968,2,2,75693.0,0,0,0,10,797,153,...,1,0,0,1,1442,2,0,2,False,3
417,1994,2,5,80134.0,0,0,2,11,966,26,...,1,1,0,0,1378,5,0,5,False,4
426,1986,2,3,92910.0,0,0,2,42,551,137,...,1,0,0,0,1795,3,0,3,False,3
430,1961,4,4,84865.0,0,0,1,1,1248,16,...,1,1,0,1,1688,4,0,4,False,4
559,1959,2,5,87771.0,0,1,1,61,1492,38,...,1,1,0,1,1957,5,1,6,True,4
575,1977,4,3,61996.0,0,1,1,27,1050,12,...,1,1,0,1,1230,3,1,4,True,3


In [22]:
# Fill NA & drop outliers


fillna_outliers_data = dataframe.copy()

fillna_outliers_data["Income"].fillna(
    fillna_outliers_data["Income"].median(), inplace=True
)
fillna_outliers_data = drop_shared_outliers(fillna_outliers_data, outliers_columns, 90)

fillna_outliers_data.to_csv("../data/cleaned/marketing_campaign_fillna_no_outliers.csv", index=False)

Data points considered outliers for the column Year_Birth:


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  fillna_outliers_data["Income"].fillna(


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
192,1900,0,2,36640.0,1,0,1,99,15,6,...,0,0,1,0,65,2,1,3,True,0
239,1893,0,4,60182.0,0,1,2,23,8,0,...,0,0,0,0,22,4,1,5,True,0
339,1899,4,5,83532.0,0,0,1,36,755,144,...,0,0,0,0,1853,5,0,5,False,1


Data points considered outliers for the column Income:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
2233,1977,2,5,666666.0,1,0,1,23,9,14,...,0,0,0,0,62,5,1,6,True,0


Data points considered outliers for the column MntWines:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column MntFruits:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column MntMeatProducts:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
21,1979,2,3,2447.0,1,0,1,42,1,1,...,0,0,0,0,1730,3,1,4,True,0
164,1973,4,3,157243.0,0,1,2,98,20,2,...,0,0,0,0,1608,3,1,4,True,0
687,1982,4,3,160803.0,0,0,0,21,55,16,...,0,0,0,0,1717,3,0,3,False,0
1653,1977,2,5,157146.0,0,0,1,13,1,0,...,0,0,0,0,1730,5,0,5,False,0
2228,1978,0,5,51381.5,0,0,0,53,32,2,...,0,0,0,0,1679,5,0,5,False,1


Data points considered outliers for the column MntFishProducts:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column MntSweetProducts:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
27,1986,2,4,51381.5,1,0,1,19,5,1,...,0,0,0,0,637,4,1,5,True,0
1898,1945,4,4,113734.0,0,0,2,9,6,2,...,0,0,0,0,277,4,0,4,False,0


Data points considered outliers for the column MntGoldProds:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
27,1986,2,4,51381.5,1,0,1,19,5,1,...,0,0,0,0,637,4,1,5,True,0
1975,1969,2,3,4428.0,0,1,1,0,16,4,...,0,0,0,0,359,3,1,4,True,0


Data points considered outliers for the column NumDealsPurchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
21,1979,2,3,2447.0,1,0,1,42,1,1,...,0,0,0,0,1730,3,1,4,True,0
90,1957,4,3,51381.5,2,1,0,4,230,42,...,0,0,0,0,603,3,3,6,True,0
164,1973,4,3,157243.0,0,1,2,98,20,2,...,0,0,0,0,1608,3,1,4,True,0
287,1956,3,5,50898.0,1,1,1,88,285,28,...,0,0,0,0,859,5,2,7,True,0
432,1967,0,5,67309.0,1,1,1,76,515,47,...,0,0,0,0,1082,5,2,7,True,0
687,1982,4,3,160803.0,0,0,0,21,55,16,...,0,0,0,0,1717,3,0,3,False,0
1042,1991,2,4,8028.0,0,0,0,62,73,18,...,0,0,0,0,178,4,0,4,False,0
1147,1956,2,5,54450.0,1,1,0,0,454,0,...,0,0,0,0,684,5,2,7,True,0
1161,1956,2,5,54450.0,1,1,0,0,454,0,...,0,0,0,0,684,5,2,7,True,0
1245,1971,2,2,1730.0,0,0,2,65,1,1,...,0,0,0,0,8,2,0,2,False,0


Data points considered outliers for the column NumWebPurchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
27,1986,2,4,51381.5,1,0,1,19,5,1,...,0,0,0,0,637,4,1,5,True,0
1806,1966,4,4,7144.0,0,2,1,92,81,4,...,0,0,0,0,416,4,2,6,True,0
1898,1945,4,4,113734.0,0,0,2,9,6,2,...,0,0,0,0,277,4,0,4,False,0
1975,1969,2,3,4428.0,0,1,1,0,16,4,...,0,0,0,0,359,3,1,4,True,0


Data points considered outliers for the column NumCatalogPurchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
21,1979,2,3,2447.0,1,0,1,42,1,1,...,0,0,0,0,1730,3,1,4,True,0
164,1973,4,3,157243.0,0,1,2,98,20,2,...,0,0,0,0,1608,3,1,4,True,0
687,1982,4,3,160803.0,0,0,0,21,55,16,...,0,0,0,0,1717,3,0,3,False,0
1653,1977,2,5,157146.0,0,0,1,13,1,0,...,0,0,0,0,1730,5,0,5,False,0


Data points considered outliers for the column NumWebVisitsMonth:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
9,1950,4,5,5648.0,1,1,2,68,28,0,...,0,0,0,0,49,5,2,7,True,1
774,1957,4,5,6835.0,0,1,0,76,107,2,...,0,0,0,0,137,5,1,6,True,0
1042,1991,2,4,8028.0,0,0,0,62,73,18,...,0,0,0,0,178,4,0,4,False,0
1245,1971,2,2,1730.0,0,0,2,65,1,1,...,0,0,0,0,8,2,0,2,False,0
1846,1963,4,3,4023.0,1,1,2,29,5,0,...,0,0,0,0,9,3,2,5,True,0


Data points considered outliers for the column Children:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column Purchases:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column Family_Size:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers


Data points considered outliers for the column Offers:


Unnamed: 0,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,MntFruits,...,AcceptedCmp1,AcceptedCmp2,Complain,Response,Purchases,Is_Alone,Children,Family_Size,Is_Parent,Offers
124,1983,2,4,101970.0,0,0,1,69,722,27,...,1,0,0,1,1135,4,0,4,False,3
203,1977,4,5,102160.0,0,0,0,54,763,29,...,1,0,0,1,1240,5,0,5,False,3
246,1972,2,3,80134.0,1,0,1,40,1218,16,...,1,0,0,1,1690,3,1,4,True,3
252,1974,2,2,102692.0,0,0,1,5,168,148,...,1,1,0,1,1112,2,0,2,False,4
336,1968,2,2,75693.0,0,0,0,10,797,153,...,1,0,0,1,1442,2,0,2,False,3
417,1994,2,5,80134.0,0,0,2,11,966,26,...,1,1,0,0,1378,5,0,5,False,4
426,1986,2,3,92910.0,0,0,2,42,551,137,...,1,0,0,0,1795,3,0,3,False,3
430,1961,4,4,84865.0,0,0,1,1,1248,16,...,1,1,0,1,1688,4,0,4,False,4
559,1959,2,5,87771.0,0,1,1,61,1492,38,...,1,1,0,1,1957,5,1,6,True,4
575,1977,4,3,61996.0,0,1,1,27,1050,12,...,1,1,0,1,1230,3,1,4,True,3


In [23]:
# Fill NA
fillna_data = dataframe.copy()

fillna_data["Income"].fillna(fillna_data["Income"].median(), inplace=True)

fillna_data.to_csv("../data/cleaned/marketing_campaign_fillna.csv", index=False)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  fillna_data["Income"].fillna(fillna_data["Income"].median(), inplace=True)
