# Marketing Campaigns Analysis

## 1. Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
HEIGHT_PLOTLY = 500
WIDTH_PLOTLY = 900

In [3]:
market_data = pd.read_csv("H:/Marketing_Campain_Analyst/Data/marketing-data (3).csv")
market_data.head(5)

Unnamed: 0,Income,Kidhome,Teenhome,Recency,MntWines,MntFruits,MntMeatProducts,MntFishProducts,MntSweetProducts,MntGoldProds,...,marital_Together,marital_Widow,education_2n Cycle,education_Basic,education_Graduation,education_Master,education_PhD,MntTotal,MntRegularProds,AcceptedCmpOverall
0,58138.0,0,0,58,635,88,546,172,88,88,...,0,0,0,0,1,0,0,1529,1441,0
1,46344.0,1,1,38,11,1,6,2,1,6,...,0,0,0,0,1,0,0,21,15,0
2,71613.0,0,0,26,426,49,127,111,21,42,...,1,0,0,0,1,0,0,734,692,0
3,26646.0,1,0,26,11,4,20,10,3,5,...,1,0,0,0,1,0,0,48,43,0
4,58293.0,1,0,94,173,43,118,46,27,15,...,0,0,0,0,0,0,1,407,392,0


In [4]:
market_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2205 entries, 0 to 2204
Data columns (total 39 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Income                2205 non-null   float64
 1   Kidhome               2205 non-null   int64  
 2   Teenhome              2205 non-null   int64  
 3   Recency               2205 non-null   int64  
 4   MntWines              2205 non-null   int64  
 5   MntFruits             2205 non-null   int64  
 6   MntMeatProducts       2205 non-null   int64  
 7   MntFishProducts       2205 non-null   int64  
 8   MntSweetProducts      2205 non-null   int64  
 9   MntGoldProds          2205 non-null   int64  
 10  NumDealsPurchases     2205 non-null   int64  
 11  NumWebPurchases       2205 non-null   int64  
 12  NumCatalogPurchases   2205 non-null   int64  
 13  NumStorePurchases     2205 non-null   int64  
 14  NumWebVisitsMonth     2205 non-null   int64  
 15  AcceptedCmp3         

## About Dataset
## Content
  **39 columns**
+ AcceptedCmp1 - 1 if customer accepted the offer in the 1st campaign, 0 otherwise
+ AcceptedCmp2 - 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
+ AcceptedCmp3 - 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
+ AcceptedCmp4 - 1 if customer accepted the offer in the 4th campaign, 0 otherwise
+ AcceptedCmp5 - 1 if customer accepted the offer in the 5th campaign, 0 otherwise
+ Response (target) - 1 if customer accepted the offer in the last campaign, 0 otherwise
+ Complain - 1 if customer complained in the last 2 years
+ DtCustomer - date of customer’s enrolment with the company
+ Education - customer’s level of education
+ Marital - customer’s marital status
+ Kidhome - number of small children in customer’s household
+ Teenhome - number of teenagers in customer’s household
+ Income - customer’s yearly household income
+ MntFishProducts - amount spent on fish products in the last 2 years
+ MntMeatProducts - amount spent on meat products in the last 2 years
+ MntFruits - amount spent on fruit products in the last 2 years
+ MntSweetProducts - amount spent on sweet products in the last 2 years
+ MntWines - amount spent on wine products in the last 2 years
+ MntGoldProds - amount spent on gold products in the last 2 years
+ NumDealsPurchases - number of purchases made with discount
+ NumCatalogPurchases - number of purchases made using catalogue
+ NumStorePurchases - number of purchases made directly in stores
+ NumWebPurchases - number of purchases made through company’s web site
+ NumWebVisitsMonth - number of visits to company’s web site in the last month
+ Recency - number of days since the last purchase
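
The list above does not describe the derived columns `MntTotal` and `MntRegularProds`. Judging only from the five rows printed by `market_data.head(5)`, `MntTotal` appears to be the sum of the five non-gold categories, and `MntRegularProds` appears to be `MntTotal` minus `MntGoldProds`. This is an inference from those rows, not documented behavior. A minimal sketch that checks it on the printed rows:

```python
import pandas as pd

# The five rows shown by market_data.head(5), spending columns only
sample = pd.DataFrame({
    "MntWines": [635, 11, 426, 11, 173],
    "MntFruits": [88, 1, 49, 4, 43],
    "MntMeatProducts": [546, 6, 127, 20, 118],
    "MntFishProducts": [172, 2, 111, 10, 46],
    "MntSweetProducts": [88, 1, 21, 3, 27],
    "MntGoldProds": [88, 6, 42, 5, 15],
    "MntTotal": [1529, 21, 734, 48, 407],
    "MntRegularProds": [1441, 15, 692, 43, 392],
})

non_gold = ["MntWines", "MntFruits", "MntMeatProducts",
            "MntFishProducts", "MntSweetProducts"]

# Assumed relationships, inferred from the sample rows only
total_matches = bool((sample[non_gold].sum(axis=1) == sample["MntTotal"]).all())
regular_matches = bool(
    ((sample["MntTotal"] - sample["MntGoldProds"]) == sample["MntRegularProds"]).all()
)

print(total_matches, regular_matches)
```

Running the same two comparisons on the full `market_data` frame would show whether the pattern holds beyond these five rows.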

The plots below use the following columns:
| Plot | Column | Purpose | Link |
| --- | --- | --- | --- |
| Histogram, box plot | Income | Visualize the distribution of customer income | [Link](#plot1a) |
| Box plot, bar plot, pie plot, area plot | MntWines, MntMeatProducts, MntSweetProducts, MntFruits, MntFishProducts, MntGoldProds | Compare spending across product categories | [Link](#plot1b) |
| Scatter plot | Income, MntTotal | Visualize the correlation between income and total spending | [Link](#plot2b) |
| Bar plot | AcceptedCmp1-5 | Visualize the percentage of customers who accepted each campaign | [Link](#plot1c) |
| Scatter plot | NumWebVisitsMonth, NumWebPurchases | Visualize the correlation between NumWebVisitsMonth and NumWebPurchases | [Link](#plot1d) |


### a. Demographic Analysis

- What is the distribution of customer income?


<a id='plot1a'></a>


In [5]:
import plotly.express as px

# Assuming 'Income' is the column name for income data in your market_data DataFrame
fig = px.histogram(market_data, x="Income", nbins=50, title="Distribution of Income")
fig.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY)  # Set smaller dimensions
fig.show()

fig_box = px.box(market_data, x="Income", title="Boxplot of Income")
fig_box.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY)  # Set smaller dimensions
fig_box.show()

In [6]:
print(market_data["Income"].describe())

count      2205.000000
mean      51622.094785
std       20713.063826
min        1730.000000
25%       35196.000000
50%       51287.000000
75%       68281.000000
max      113734.000000
Name: Income, dtype: float64


In [7]:
market_data["Income"].mode()

0    7500.0
Name: Income, dtype: float64

## **Plot explanation**

The histogram appears to be roughly **bell-shaped**, which suggests that income may follow an approximately normal distribution. The peak lies between 40k and 60k, so incomes are most concentrated in this range. The mean (about 51.6k) and the median (about 51.3k) are close, which is consistent with a roughly symmetric shape.
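
The bell-shape reading of the histogram can be cross-checked numerically: for a roughly symmetric distribution the mean and median are close and the sample skewness is near zero. The sketch below is one way to do that. The helper name `symmetry_summary` and the demo values are hypothetical and are not part of the notebook.

```python
import pandas as pd

def symmetry_summary(values: pd.Series) -> dict:
    """Return the mean, median, and sample skewness of a numeric series."""
    return {
        "mean": values.mean(),
        "median": values.median(),
        "skew": values.skew(),
    }

# Hypothetical, perfectly symmetric values used only to exercise the helper
demo = pd.Series([10_000, 30_000, 50_000, 70_000, 90_000])
summary = symmetry_summary(demo)
print(summary)
```

On the real data the call would be `symmetry_summary(market_data["Income"])`. A skewness far from zero would argue against the normal-like reading.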

### b. Spending Behavior

- Which product category (e.g., wines, meats, sweets) has the highest and lowest
average spending?
- Which product category (e.g., wines, meats, sweets) has the highest and lowest
cumulative spending (USD) when plotted against the row (Time/Order) index on the horizontal axis?

<a id='plot1b'></a>


In [25]:
# prompt: Plot the boxplot for all product categories (wines, meats, sweets) and compare the average spending draw in Plotly

# Assuming 'MntWines', 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts', 'MntGoldProds' are the columns for product spending

# Create a new DataFrame with only the relevant columns
spending_data = market_data[['MntWines', 'MntMeatProducts', 'MntSweetProducts',
                             'MntFruits', 'MntFishProducts', 'MntGoldProds']]

# Rename columns for better readability in the plot
spending_data = spending_data.rename(columns={
    'MntWines': 'Wines',
    'MntMeatProducts': 'Meats',
    'MntSweetProducts': 'Sweets',
    'MntFruits': 'Fruits',
    'MntFishProducts': 'Fish',
    'MntGoldProds': 'Gold'
})

# Melt the DataFrame to create a long-format DataFrame for plotting
spending_melted = spending_data.melt(var_name='Product Category', value_name='Spending')


fig = px.box(spending_melted,
             x='Product Category',
             y='Spending',
             title='Spending Distribution by Product Category (in USD)',
             color='Product Category')


fig.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY)

fig.show()


# Calculate the average spending for each category
average_spending = spending_data.mean()

# Create a bar chart of the average spending
fig_bar = px.bar(average_spending,
                 x=average_spending.index,
                 y=average_spending.values,
                 title='Average Spending per Product Category (in USD)')

fig_bar.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY,
                      xaxis_title="Product Category",
                      yaxis_title="Average Spending")

fig_bar.show()

# Calculate the total average spending
total_spending = average_spending.sum()

# Calculate the percentage of average spending for each category
average_spending_percentage = (average_spending / total_spending) * 100

# Create a pie chart for the average spending distribution
fig_pie = px.pie(values=average_spending_percentage,
                 names=average_spending.index,
                 title='Average Spending Distribution by Product Category',
                 hole=0.4)  # Adjust the hole size for a donut chart (optional)

# Customize the layout
fig_pie.update_traces(textinfo='percent+label')  # Show percentage and category label
fig_pie.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY)

fig_pie.show()

# Create a cumulative sum DataFrame for plotting area chart
cumulative_spending = spending_data.cumsum()

# Plot an area chart
fig_area = px.area(cumulative_spending,
                   x=cumulative_spending.index,  # Use the DataFrame index as the x-axis
                   y=cumulative_spending.columns,  # Use product categories as the y-axis
                   title='Cumulative Spending Across Product Categories',
                   labels={'value': 'Cumulative Spending (USD)', 'index': 'Time/Order Index'},
                   color_discrete_sequence=px.colors.qualitative.Pastel)  # Optional: Set color palette

# Customize layout
fig_area.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY,
                       xaxis_title="Time/Order Index",
                       yaxis_title="Cumulative Spending (USD)")

fig_area.show()
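
The charts above answer the highest/lowest question visually. The same answer can be read off a small summary table. This is a sketch on made-up numbers; the helper name `rank_categories` is hypothetical.

```python
import pandas as pd

def rank_categories(spending: pd.DataFrame) -> pd.DataFrame:
    """Return mean and total spending per column, sorted by mean (descending)."""
    summary = pd.DataFrame({
        "mean": spending.mean(),
        "total": spending.sum(),
    })
    return summary.sort_values("mean", ascending=False)

# Made-up values for illustration only; not taken from the dataset
demo = pd.DataFrame({
    "Wines": [100, 300, 200],
    "Meats": [50, 150, 100],
    "Fruits": [10, 20, 30],
})
ranking = rank_categories(demo)

# The first row is the category with the highest average, the last row the lowest
print(ranking.index[0], ranking.index[-1])
```

Applied to the notebook's data, `rank_categories(spending_data)` would list the six categories in the order shown in the bar chart. The final value of each cumulative curve in the area chart equals the `total` column.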


- What is the total spending across all categories (MntTotal) for customers with varying income levels?

<a id='plot2b'></a>


In [10]:
# prompt: Can you display the correlation of this (income and Total Spending) and write into the plot

# Assuming 'Income' and 'MntTotal' are columns in your market_data DataFrame
fig = px.scatter(market_data, x="Income", y="MntTotal",
                 title="Total Spending (MntTotal) vs. Income",
                 labels={"MntTotal": "Total Spending", "Income": "Income"},
                 )

fig.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY)

# Calculate the correlation
correlation = market_data['Income'].corr(market_data['MntTotal'])

# Add correlation to the plot
fig.add_annotation(
    x=0.05,  # Adjust x position as needed
    y=0.9,  # Adjust y position as needed
    text=f"Correlation: {correlation:.2f}",
    showarrow=False,
    font=dict(size=15, color="red"),  # Adjust font size as needed
    xref="paper",
    yref="paper"
)

fig.show()

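The Pearson correlation between Income and MntTotal is 0.82, which indicates a strong positive linear relationship: customers with higher income tend to spend more in total.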
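
`Series.corr` computes the Pearson coefficient by default: the covariance of the two variables divided by the product of their standard deviations. The sketch below spells out that formula and compares it with the pandas result. The numbers are made up, and `pearson_r` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def pearson_r(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation: co-deviation sum over the product of deviation norms."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return float(numerator / denominator)

# Made-up values for illustration only; not taken from the dataset
income = pd.Series([20_000, 35_000, 50_000, 65_000, 80_000])
spending = pd.Series([50, 200, 400, 900, 1500])

r_manual = pearson_r(income, spending)
r_pandas = income.corr(spending)  # default method is Pearson
print(round(r_manual, 4), round(r_pandas, 4))
```

Pearson measures linear association only, so a curved but monotonic relationship can produce a coefficient below 1 even when spending always rises with income.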

### c. Campaign Effectiveness

<a id='plot1c'></a>

- What percentage of customers accepted offers in each campaign
(AcceptedCmp1-5)? Is there a trend in acceptance rates across campaigns?

In [11]:
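# Create the new column 'AcceptedCampaign' (0 means no campaign 1-5 was accepted)
market_data['AcceptedCampaign'] = 0

# Label each customer with the campaign they accepted.
# Note: later iterations overwrite earlier ones, so a customer who accepted
# several campaigns keeps only the highest-numbered one.
for i in range(1, 6):
    market_data.loc[market_data[f'AcceptedCmp{i}'] == 1, 'AcceptedCampaign'] = i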

In [12]:
# Calculate the percentage of customers per 'AcceptedCampaign' label (0 = none accepted)
campaign_acceptance_counts = market_data['AcceptedCampaign'].value_counts(normalize=True) * 100

# Create the bar chart using Plotly Express
fig = px.bar(
    campaign_acceptance_counts,
    x=campaign_acceptance_counts.index,
    y=campaign_acceptance_counts.values,
    labels={'x': 'Accepted Campaign', 'y': 'Percentage of Customers'},
    title='Percentage of Customers Who Accepted Each Campaign',
    color=campaign_acceptance_counts.index  # Use campaign number for color
)

# Update layout for better visualization
fig.update_layout(
    width=WIDTH_PLOTLY,
    height=HEIGHT_PLOTLY,
    xaxis=dict(
        tickmode='array',
        tickvals=campaign_acceptance_counts.index,
        ticktext=['Did not accept a campaign' if val == 0 else val for val in campaign_acceptance_counts.index]
    ),
    yaxis=dict(ticksuffix="%"),  # Values are already in percent, so only append a % sign
    coloraxis_colorbar=dict(
        tickvals=campaign_acceptance_counts.index,
        ticktext=['Did not accept a campaign' if val == 0 else val for val in campaign_acceptance_counts.index]
    )
)

# Add percentage values to the bars
fig.update_traces(texttemplate='%{y:.2f}%', textposition='outside')

fig.show()
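
The bar chart above is built from a single label per customer, so a customer who accepted more than one campaign is counted only under the highest-numbered one. If the goal is the acceptance rate of each campaign on its own, one option is to take the mean of each 0/1 flag column. This is a sketch on made-up flags; `acceptance_rates` is a hypothetical helper.

```python
import pandas as pd

campaign_cols = [f"AcceptedCmp{i}" for i in range(1, 6)]

def acceptance_rates(df: pd.DataFrame) -> pd.Series:
    """Percentage of customers who accepted each campaign (columns are 0/1 flags)."""
    return df[campaign_cols].mean() * 100

# Made-up flags for four customers; the second one accepted campaigns 1 and 4
demo = pd.DataFrame({
    "AcceptedCmp1": [0, 1, 0, 0],
    "AcceptedCmp2": [0, 0, 0, 0],
    "AcceptedCmp3": [1, 0, 0, 0],
    "AcceptedCmp4": [0, 1, 0, 1],
    "AcceptedCmp5": [0, 0, 0, 0],
})
rates = acceptance_rates(demo)
print(rates)
```

Because the rates are computed per column, they need not sum to 100, and the customer who accepted two campaigns contributes to both. `acceptance_rates(market_data)` could be passed to `px.bar` in the same way as `campaign_acceptance_counts`.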


### d. Channel Preferences

<a id='plot1d'></a>

- How does the number of website visits (NumWebVisitsMonth) relate to the number of website purchases (NumWebPurchases)?

In [13]:
import plotly.express as px
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression

# Assuming 'NumWebVisitsMonth' and 'NumWebPurchases' are in your market_data DataFrame

fig = px.scatter(market_data, x='NumWebVisitsMonth', y='NumWebPurchases',
                 title='Relationship between Website Visits and Purchases',
                 labels={'NumWebVisitsMonth': 'Number of Website Visits per Month',
                         'NumWebPurchases': 'Number of Website Purchases'})

fig.update_layout(width=WIDTH_PLOTLY, height=HEIGHT_PLOTLY)

# Calculate the correlation
correlation = market_data['NumWebVisitsMonth'].corr(market_data['NumWebPurchases'])

# Add correlation to the plot
fig.add_annotation(
    x=0.05,  # Adjust x position as needed
    y=0.9,  # Adjust y position as needed
    text=f"Correlation: {correlation:.2f}",
    showarrow=False,
    font=dict(size=15, color="red"),  # Adjust font size as needed
    xref="paper",
    yref="paper"
)

# Linear Regression
X = market_data[['NumWebVisitsMonth']]
y = market_data['NumWebPurchases']
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Add regression line to the plot
fig.add_trace(go.Scatter(x=X['NumWebVisitsMonth'], y=y_pred, mode='lines', name='Linear Regression'))

fig.show()
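
The regression line drawn above is easier to interpret alongside its slope, intercept, and R². The sketch below obtains them with `numpy.polyfit` instead of the `LinearRegression` model used in the cell, on made-up numbers, so the printed values say nothing about the real dataset.

```python
import numpy as np

# Made-up values for illustration only; not taken from the dataset
visits = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
purchases = np.array([6, 5, 5, 4, 4, 3, 2, 2], dtype=float)

# Degree-1 least-squares fit: purchases is approximated by slope * visits + intercept
slope, intercept = np.polyfit(visits, purchases, deg=1)

# For a one-variable linear fit, R^2 equals the squared Pearson correlation
r = np.corrcoef(visits, purchases)[0, 1]
r_squared = r ** 2

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

With the notebook's fitted model, the corresponding numbers are available as `model.coef_[0]`, `model.intercept_`, and `model.score(X, y)`.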