### Customer Profiling

This activity gives you practice exploring data, including building visualizations with `matplotlib`, `seaborn`, and `plotly`. The dataset contains customer demographics, purchase amounts by product category, customer engagement with promotions, and the channels where purchases happened. A complete data dictionary appears below.

Your task is to explore the data and use visualizations to answer specific questions about it. Post your questions and the resulting visualizations in the group discussion for this activity. Some example questions to explore:

-----

- Does income differentiate customers who purchase wine? 
- Which customers are more likely to participate in the last promotional campaign?
- Are customers with children more likely to purchase products online?
- Do married people purchase more wine?
- What kinds of purchases led to customer complaints?

-----
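As one way to approach the first question, income can be split into equal-sized groups and wine spending compared across them. The following is a minimal sketch with `pd.qcut` on made-up numbers; the `toy` frame, the `IncomeGroup` column, and the choice of two groups are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Made-up rows that reuse the dataset's column names
toy = pd.DataFrame({
    "Income":   [20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000],
    "MntWines": [10, 20, 30, 40, 300, 400, 500, 600],
})

# Split customers into a lower and an upper income half (hypothetical grouping)
toy["IncomeGroup"] = pd.qcut(toy["Income"], q=2, labels=["lower", "upper"])

# Median wine spending per income group
wine_by_income = toy.groupby("IncomeGroup", observed=True)["MntWines"].median()
```

A large gap between the groups would suggest that income does separate wine buyers, which a box plot or scatter plot can then show visually.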

### Data Dictionary

Attributes


```
People

ID: Customer's unique identifier
Year_Birth: Customer's birth year
Education: Customer's education level
Marital_Status: Customer's marital status
Income: Customer's yearly household income
Kidhome: Number of children in customer's household
Teenhome: Number of teenagers in customer's household
Dt_Customer: Date of customer's enrollment with the company
Recency: Number of days since customer's last purchase
Complain: 1 if customer complained in the last 2 years, 0 otherwise


Products

MntWines: Amount spent on wine in last 2 years
MntFruits: Amount spent on fruits in last 2 years
MntMeatProducts: Amount spent on meat in last 2 years
MntFishProducts: Amount spent on fish in last 2 years
MntSweetProducts: Amount spent on sweets in last 2 years
MntGoldProds: Amount spent on gold in last 2 years


Promotion

AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
Response: 1 if customer accepted the offer in the last campaign, 0 otherwise


Place

NumWebPurchases: Number of purchases made through the company’s web site
NumCatalogPurchases: Number of purchases made using a catalogue
NumStorePurchases: Number of purchases made directly in stores
NumWebVisitsMonth: Number of visits to company’s web site in the last month
```
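The sample rows shown later in this notebook store `Dt_Customer` as day-month-year strings (for example `21-08-2013`). Below is a minimal sketch of parsing that column, assuming the whole file follows the same format. The `dates` series is a small stand-in built from those sample rows, not the full dataset.

```python
import pandas as pd

# Stand-in values copied from the first rows of Dt_Customer (day-month-year strings)
dates = pd.Series(["04-09-2012", "08-03-2014", "21-08-2013", "10-02-2014", "19-01-2014"])

# An explicit format keeps ambiguous values such as 04-09-2012 from being read month-first
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

# Enrollment year and month are then available for grouping or plotting
enroll_year = parsed.dt.year
enroll_month = parsed.dt.month
```

With the real data, the same call would be applied to `df["Dt_Customer"]` after loading the file.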

In [1]:
import pandas as pd
import seaborn as sns
import plotly.express as px

In [2]:
df = pd.read_csv('data/marketing_campaign.csv', sep = '\t')

In [3]:
df.head()

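| | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 04-09-2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 08-03-2014 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21-08-2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 3 | 6182 | 1984 | Graduation | Together | 26646.0 | 1 | 0 | 10-02-2014 | 26 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4 | 5324 | 1981 | PhD | Married | 58293.0 | 1 | 0 | 19-01-2014 | 94 | 173 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |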


In [202]:
#df.info()

Post each question with an accompanying visualization in Canvas. You should generate at least three different questions and visualizations, and explain your interpretation of each visualization in complete sentences.

In [6]:
# Column groups reused below
mnt_cols = ["MntWines","MntFruits","MntMeatProducts","MntFishProducts","MntSweetProducts","MntGoldProds"]
num_cols = ["NumDealsPurchases","NumWebPurchases","NumCatalogPurchases","NumStorePurchases"]

# .copy() gives an independent frame, so adding columns below does not trigger SettingWithCopyWarning
df_lite = df[["ID","Year_Birth","Education","Marital_Status","Income","Kidhome","Teenhome","Dt_Customer","Recency"] + mnt_cols + num_cols + ["NumWebVisitsMonth","Complain"]].copy()

# Convert birth year to age (2024 is a hardcoded reference year) and drop implausible ages
df_lite["Age"] = 2024 - df_lite["Year_Birth"]
df_lite = df_lite[df_lite["Age"] < 120].copy()

# Find the total number of kids (children plus teenagers) in each house
df_lite["total_kids"] = df_lite["Kidhome"] + df_lite["Teenhome"]

df_liter = df_lite[["Income","Kidhome","Teenhome","Dt_Customer"] + mnt_cols + num_cols + ["NumWebVisitsMonth"]]
df_mnts_and_nums = df_lite[mnt_cols + num_cols + ["NumWebVisitsMonth"]]
df_mnts = df_lite[["Age","Income"] + mnt_cols]
df_nums = df_lite[["Age","Income"] + num_cols]

# Express each count and amount as a fraction of its row total.
# Note: df_nums and df_mnts also contain Age and Income, so these row totals
# include them and the *_Percent columns are not pure shares of purchases or spend.
num_sums = df_nums.sum(axis=1)
mnt_sums = df_mnts.sum(axis=1)
for col in num_cols:
    df_lite[col + "_Percent"] = df_lite[col] / num_sums
for col in mnt_cols:
    df_lite[col + "_Percent"] = df_lite[col] / mnt_sums

df_mnts_percent = df_lite[["Age","Income"] + [col + "_Percent" for col in mnt_cols]]
df_nums_percent = df_lite[["Age","Income"] + [col + "_Percent" for col in num_cols]]

#sns.heatmap(df_liter.corr(), annot=True)
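`df_mnts` and `df_nums` also carry `Age` and `Income`, so row sums taken over them are dominated by income rather than by purchases. If the goal is each category's share of what a customer actually spent, the row total should cover only the `Mnt*` columns. Below is a minimal sketch of that variant on a two-row toy frame. The `toy` data and the `_Share` suffix are illustrative assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

mnt_cols = ["MntWines", "MntFruits", "MntMeatProducts",
            "MntFishProducts", "MntSweetProducts", "MntGoldProds"]

# Toy rows (made-up numbers) standing in for df_lite
toy = pd.DataFrame(
    [[600, 100, 200, 50, 25, 25], [0, 0, 0, 0, 0, 0]],
    columns=mnt_cols,
)

# Row total over the spending columns only; a zero total is replaced by NaN
# so that customers with no purchases get NaN shares
mnt_total = toy[mnt_cols].sum(axis=1).replace(0, np.nan)

# Divide every spending column by its row total in one step
shares = toy[mnt_cols].div(mnt_total, axis=0).add_suffix("_Share")
```

The same pattern would apply to the four `Num*Purchases` columns. Shares computed this way sum to 1 for every customer with at least one purchase.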

In [200]:
#sns.pairplot(df_mnts_and_nums)

In [225]:
#print(df_mnts)
#df_mnts.plot(kind='bar', stacked=True)
px.bar(df_lite, x = "Age", y=["MntWines","MntFruits","MntMeatProducts","MntFishProducts","MntSweetProducts","MntGoldProds"])
#print(df_lite["Income"])
#sns.scatterplot(data=df_lite, x = "MntWines", y="Income", hue="Age")

#px.bar(df_lite, x = "Age", y=["MntWines_Percent","MntFruits_Percent","MntMeatProducts_Percent","MntFishProducts_Percent","MntSweetProducts_Percent","MntGoldProds_Percent"])


In [238]:
px.bar(df_lite, x = ["NumDealsPurchases","NumWebPurchases","NumCatalogPurchases","NumStorePurchases"], y="Age")

In [231]:
px.bar(df_lite, x = "Age", y=["NumDealsPurchases","NumWebPurchases","NumCatalogPurchases","NumStorePurchases"])
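With one bar segment per customer, these charts stack thousands of rows at each age. One option is to bin ages and average within each bin before plotting. The sketch below uses `pd.cut`; the bin edges, the `AgeBand` column, and the toy rows are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up rows standing in for df_lite
toy = pd.DataFrame({
    "Age":               [28, 35, 43, 47, 59, 67],
    "NumWebPurchases":   [2, 4, 6, 8, 5, 3],
    "NumStorePurchases": [3, 5, 7, 9, 10, 6],
})

# Right-closed bins: (20, 40], (40, 60], (60, 80]
toy["AgeBand"] = pd.cut(toy["Age"], bins=[20, 40, 60, 80])

# Average purchases per customer within each age band
by_band = toy.groupby("AgeBand", observed=True)[["NumWebPurchases", "NumStorePurchases"]].mean()
```

Plotly may need the interval labels converted to strings (for example with `by_band.index.astype(str)`) before the result is passed to `px.bar`.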

In [284]:
# Mean number of web purchases per customer for households with 0, 1, 2, and 3 kids.
# groupby takes the group sizes from the data instead of hardcoded counts.
avg_NumWeb_0Kids, avg_NumWeb_1Kids, avg_NumWeb_2Kids, avg_NumWeb_3Kids = (
    df_lite.groupby("total_kids")["NumWebPurchases"].mean().reindex([0, 1, 2, 3])
)


In [285]:
# Mean number of catalog purchases per customer for households with 0, 1, 2, and 3 kids
avg_NumCatalog_0Kids, avg_NumCatalog_1Kids, avg_NumCatalog_2Kids, avg_NumCatalog_3Kids = (
    df_lite.groupby("total_kids")["NumCatalogPurchases"].mean().reindex([0, 1, 2, 3])
)

In [286]:
# Mean number of store purchases per customer for households with 0, 1, 2, and 3 kids
avg_NumStore_0Kids, avg_NumStore_1Kids, avg_NumStore_2Kids, avg_NumStore_3Kids = (
    df_lite.groupby("total_kids")["NumStorePurchases"].mean().reindex([0, 1, 2, 3])
)

In [287]:
# Mean number of deal purchases per customer for households with 0, 1, 2, and 3 kids
avg_NumDeals_0Kids, avg_NumDeals_1Kids, avg_NumDeals_2Kids, avg_NumDeals_3Kids = (
    df_lite.groupby("total_kids")["NumDealsPurchases"].mean().reindex([0, 1, 2, 3])
)



#print("Average number of Deals Purchased by Kid Amount")
#print("Zero Kids: " + str(avg_NumDeals_0Kids))
#print("One Kid: " + str(avg_NumDeals_1Kids))
#print("Two Kids: " + str(avg_NumDeals_2Kids))
#print("Three Kids: " + str(avg_NumDeals_3Kids))

In [297]:
#creating a dataframe
print(df_lite["total_kids"].value_counts())
kid_qtys = [637,1126,421,53]

# One list per purchase channel, ordered by number of kids (0, 1, 2, 3)
temp_data = {"Avg Num of Deal Purchases":[avg_NumDeals_0Kids,avg_NumDeals_1Kids,avg_NumDeals_2Kids,avg_NumDeals_3Kids],
             "Avg Num of Web Purchases":[avg_NumWeb_0Kids,avg_NumWeb_1Kids,avg_NumWeb_2Kids,avg_NumWeb_3Kids],
             "Avg Num of Catalog Purchases":[avg_NumCatalog_0Kids,avg_NumCatalog_1Kids,avg_NumCatalog_2Kids,avg_NumCatalog_3Kids],
             "Avg Num of Store Purchases":[avg_NumStore_0Kids,avg_NumStore_1Kids,avg_NumStore_2Kids,avg_NumStore_3Kids]}


final_data = pd.DataFrame(temp_data)
px.line(final_data).update_layout(xaxis_title="Number of Children", yaxis_title="Number of Purchases")


1    1126
0     637
2     421
3      53
Name: total_kids, dtype: int64


Average number of Deals Purchased by Kid Amount
Zero Kids: 4.39403453689168
One Kid: 4.290408525754884
Two Kids: 3.2327790973871733
Three Kids: 2.8679245283018866
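The per-group averages can also be computed in one pass with `groupby`, which takes the group sizes from the data and returns a frame that is ready to plot. Below is a sketch on made-up rows, where `toy` stands in for `df_lite`.

```python
import pandas as pd

num_cols = ["NumDealsPurchases", "NumWebPurchases",
            "NumCatalogPurchases", "NumStorePurchases"]

# Made-up rows standing in for df_lite
toy = pd.DataFrame({
    "total_kids":          [0, 0, 1, 1, 2],
    "NumDealsPurchases":   [1, 3, 2, 4, 5],
    "NumWebPurchases":     [4, 6, 3, 5, 2],
    "NumCatalogPurchases": [6, 2, 1, 3, 0],
    "NumStorePurchases":   [8, 10, 5, 7, 4],
})

# One row per household size, one column per purchase channel
avg_by_kids = toy.groupby("total_kids")[num_cols].mean()
```

Passing a frame shaped like `avg_by_kids` to `px.line` should draw one line per channel against the number of children, since the index is used as the x-axis.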


In [None]:
print(df_lite["total_kids"].value_counts())

In [306]:
px.scatter(df_lite, x = "MntWines", y="Income", color="Age", marginal_x = "histogram", marginal_y = "histogram")

In [299]:
px.scatter(df_lite, x = "MntWines", y="Income", color="Age")

In [308]:
px.scatter(df_lite, x = "MntWines", y="Income", color="Marital_Status", marginal_x = "histogram", marginal_y = "histogram")

In [244]:
px.bar(df_lite, x="total_kids", y="MntWines")

In [243]:
px.bar(df_lite, x="total_kids", y="MntWines_Percent")

In [228]:
#Are customers with kids more likely to purchase wine?

# Average MntWines_Percent per customer for households with 0, 1, 2, and 3 kids.
# Dividing by len(group) replaces the hardcoded group sizes.
for n_kids in range(4):
    group = df_lite[df_lite["total_kids"] == n_kids]
    print(group["MntWines_Percent"].sum() / len(group))


0.009005989083800368
0.00755716009951007
0.005610728973098948
0.026103971961280722


In [210]:
print(df_lite["total_kids"].value_counts())

1    1126
0     637
2     421
3      53
Name: total_kids, dtype: int64


In [239]:
#Take two, with Amounts, not percents
# Average MntWines per customer for households with 0, 1, 2, and 3 kids
for n_kids in range(4):
    group = df_lite[df_lite["total_kids"] == n_kids]
    print(group["MntWines"].sum() / len(group))

486.88540031397173
267.4946714031972
141.5914489311164
171.37735849056602


In [230]:
px.bar(df_lite, x = "Age", y=["NumDealsPurchases_Percent","NumWebPurchases_Percent","NumCatalogPurchases_Percent","NumStorePurchases_Percent"])

In [237]:
# Select several columns by passing a list of names (double brackets)
limited_df = df_lite[["MntWines","MntFruits","MntMeatProducts","MntFishProducts","MntSweetProducts","MntGoldProds"]]
print(limited_df)
#sns.displot(data=limited_df, kind="kde", multiple="stack")

In [169]:
px.box(df_mnts_percent)

In [85]:
#Color options "ID","Year_Birth","Education","Marital_Status","Income","Kidhome","Teenhome"
#px.box(df_lite, x="NumDealsPurchases", y="Income",color="Education")
#px.box(df_lite, x="NumDealsPurchases", y="Income")
px.box(df_nums)


      NumDealsPurchases  NumWebPurchases  NumCatalogPurchases  \
0                     3                8                   10   
1                     2                1                    1   
2                     1                8                    2   
3                     2                2                    0   
4                     5                5                    3   
...                 ...              ...                  ...   
2235                  2                9                    3   
2236                  7                8                    2   
2237                  1                2                    3   
2238                  2                6                    5   
2239                  3                3                    1   

      NumStorePurchases  NumWebVisitsMonth  
0                     4                  7  
1                     2                  5  
2                    10                  4  
3                     4                

In [86]:
px.box(df_mnts)

In [81]:
px.box(df_lite, x="Year_Birth", y="NumDealsPurchases")