**Introduction**

This notebook is a practice exercise for building my Python skills in data analysis and visualization. It analyzes customer segmentation and marketing strategy optimization using a dataset from a digital marketing campaign. The primary objective is to identify patterns in conversion rates across marketing channels and customer segments.

To that end, the analysis addresses the following questions:

**Questions:**

1. Is there a significant difference in conversion rates among different marketing channels?

2. Is the conversion rate for the Email channel higher than that for the Social Media channel?

3. Is the PPC channel more effective in driving customer conversions compared to other channels?

4. How does the conversion rate of the Referral channel compare to other channels?

5. How do different marketing channels perform among different customer segments?

6. What are the conversion rate differences across age groups for different channels?

7. Do different genders respond differently to various marketing channels?

8. How do customers of different income levels engage and convert across different channels?
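
Question 2 compares two conversion rates, and one common way to check whether such a gap is larger than chance is a two-proportion z-test. The sketch below uses only the standard library; the function name `two_proportion_ztest` and the counts passed to it are made up for illustration and are not taken from the dataset. In the notebook, the real counts could come from something like `df.groupby('CampaignChannel')['Conversion'].agg(['sum', 'count'])`.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / conv_b are the numbers of converted customers,
    n_a / n_b are the group sizes. Hypothetical helper, not part of any library.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both channels convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # erfc(|z| / sqrt(2)) equals the two-sided tail probability of a standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only: 60 of 100 conversions for channel A, 40 of 100 for channel B
z, p_value = two_proportion_ztest(60, 100, 40, 100)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

Because question 2 is directional ("higher than"), a one-sided p-value may be more appropriate; when `z` has the hypothesized sign, it is half of the two-sided value.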

In [None]:
import pandas as pd

# Load the data
df = pd.read_csv('/kaggle/input/predict-conversion-in-digital-marketing-dataset/digital_marketing_campaign_dataset.csv')

# Preview the first five rows of the data
print("Preview of the data:")
print(df.head())

# Display the data types of each column
print("\nData types of each column:")
print(df.dtypes)

In [None]:
import pandas as pd

missing_values = df.isnull().sum()
print("Missing values in each column:\n", missing_values)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Use boxplot to check for outliers
plt.figure(figsize=(15, 10))
sns.boxplot(data=df)
plt.xticks(rotation=90)
plt.title('Boxplot to Check for Outliers')
plt.show()
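
A boxplot shows outliers visually, but a count per column is easier to compare. The sketch below applies the 1.5 × IQR rule, which is the default whisker rule for boxplots in matplotlib and seaborn. The helper name `count_iqr_outliers` and the toy values are assumptions for illustration; only the column names come from the notebook.

```python
import pandas as pd


def count_iqr_outliers(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons align the per-column bounds with the DataFrame columns
    outside = (numeric < lower) | (numeric > upper)
    return outside.sum()


# Made-up rows, not taken from the dataset
toy = pd.DataFrame({
    "Income": [20000, 30000, 40000, 50000, 900000],
    "Age": [22, 30, 35, 41, 48],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})
outlier_counts = count_iqr_outliers(toy)
print(outlier_counts)
```

On the real data this would be called as `count_iqr_outliers(df)`; non-numeric columns such as `Gender` are skipped.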

In [None]:
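# Define age groups. With right=False each bin is half-open, e.g. [18, 26) covers ages 18-25,
# so the bin edges are chosen to match the labels below.
age_bins = [0, 18, 26, 36, 46, 56, 66, 76]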
age_labels = ['<18', '18-25', '26-35', '36-45', '46-55', '56-65', '66-75']
df['AgeGroup'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, right=False)

# Verify the new AgeGroup column
print("\nData with AgeGroup:")
print(df[['Age', 'AgeGroup']].head(10))

# Define income groups
income_bins = [0, 25000, 50000, 75000, 100000, 125000, 150000]
income_labels = ['<25k', '25k-50k', '50k-75k', '75k-100k', '100k-125k', '125k-150k']
df['IncomeGroup'] = pd.cut(df['Income'], bins=income_bins, labels=income_labels, right=False)

# Print the first few rows to verify the change
print("\nData with IncomeGroup:")
print(df[['Income', 'IncomeGroup']].head(10))
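
With `right=False`, `pd.cut` builds bins that include their left edge and exclude their right edge, so it is worth checking which label a boundary value receives. The sketch below uses made-up ages and illustrative bin edges (not the notebook's full set) to show this. It also shows that a value equal to the last edge falls outside every bin and becomes `NaN`.

```python
import pandas as pd

# Illustrative values chosen to sit on or next to bin edges
ages = pd.Series([17, 18, 25, 26, 36])
edges = [0, 18, 26, 36]          # bins: [0, 18), [18, 26), [26, 36)
labels = ["<18", "18-25", "26-35"]

groups = pd.cut(ages, bins=edges, labels=labels, right=False)
check = pd.DataFrame({"Age": ages, "AgeGroup": groups})
print(check)

# Values outside every bin (here: 36, equal to the last edge) are left as NaN
unbinned = int(groups.isna().sum())
print("Rows without a group:", unbinned)
```

Running the same `isna().sum()` check on `df['AgeGroup']` and `df['IncomeGroup']` would reveal whether any rows in the real data fall outside the chosen bins.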

In [None]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt
import seaborn as sns

# One-hot encode gender
onehot_encoder = OneHotEncoder(sparse_output=False)
gender_onehot = onehot_encoder.fit_transform(df[['Gender']])
gender_onehot_df = pd.DataFrame(
    gender_onehot,
    columns=onehot_encoder.get_feature_names_out(['Gender']),
    index=df.index,  # keep rows aligned with df when concatenating
)

# Add one-hot encoded columns back to the original dataframe
df = pd.concat([df, gender_onehot_df], axis=1)
print("\nData with one-hot encoded Gender columns:")
print(df.head())

# Encode age group and income group as ordered integer codes.
# LabelEncoder sorts labels alphabetically (so '<18' and '<25k' would get the highest codes),
# which loses the natural order; ordered categorical codes preserve it.
age_order = ['<18', '18-25', '26-35', '36-45', '46-55', '56-65', '66-75']
income_order = ['<25k', '25k-50k', '50k-75k', '75k-100k', '100k-125k', '125k-150k']

# Codes run from 0 upward in the order given; rows without a group (NaN) get -1
df['AgeGroup_LabelEncoded'] = pd.Categorical(df['AgeGroup'], categories=age_order, ordered=True).codes
df['IncomeGroup_LabelEncoded'] = pd.Categorical(df['IncomeGroup'], categories=income_order, ordered=True).codes

print("\nData after Label Encoding AgeGroup:")
print(df[['AgeGroup', 'AgeGroup_LabelEncoded']].head())

print("\nData after Label Encoding IncomeGroup:")
print(df[['IncomeGroup', 'IncomeGroup_LabelEncoded']].head())

In [None]:
# Overall conversion rates by gender
gender_conversion = df.groupby('Gender')['Conversion'].mean()
print("\nOverall Conversion Rates by Gender:")
print(gender_conversion)

# Visualize overall conversion rates by gender
plt.figure(figsize=(8, 6))
sns.barplot(x=gender_conversion.index, y=gender_conversion.values)
plt.title('Overall Conversion Rates by Gender')
plt.xlabel('Gender')
plt.ylabel('Conversion Rate')
plt.show()

In [None]:
# Analyze conversion rates for male group
df_male = df[df['Gender'] == 'Male']
male_income_age_conversion = df_male.groupby(['IncomeGroup', 'AgeGroup'], observed=True)['Conversion'].mean()
print("\nMale - Conversion rates by Income Group and Age Group:")
print(male_income_age_conversion)

male_campaign_conversion = df_male.groupby(['IncomeGroup', 'AgeGroup', 'CampaignType'], observed=True)['Conversion'].mean().unstack()
male_channel_conversion = df_male.groupby(['IncomeGroup', 'AgeGroup', 'CampaignChannel'], observed=True)['Conversion'].mean().unstack()

# Visualize conversion rates for male group
def plot_heatmap(data, title, xlabel, ylabel):
    plt.figure(figsize=(12, 8))
    sns.heatmap(data, annot=True, fmt=".2f", cmap="YlGnBu")
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.show()

# Visualize conversion rates by campaign type
plot_heatmap(male_campaign_conversion, 'Male - Conversion Rates by Income Group, Age Group, and Campaign Type', 'Campaign Type', 'Income and Age Group')

# Visualize conversion rates by campaign channel
plot_heatmap(male_channel_conversion, 'Male - Conversion Rates by Income Group, Age Group, and Campaign Channel', 'Campaign Channel', 'Income and Age Group')

In [None]:
# Analyze conversion rates for female group
# (assumes the Gender column uses the label 'Female')
df_female = df[df['Gender'] == 'Female']
female_income_age_conversion = df_female.groupby(['IncomeGroup', 'AgeGroup'], observed=True)['Conversion'].mean()
print("\nFemale - Conversion rates by Income Group and Age Group:")
print(female_income_age_conversion)

female_campaign_conversion = df_female.groupby(['IncomeGroup', 'AgeGroup', 'CampaignType'], observed=True)['Conversion'].mean().unstack()
female_channel_conversion = df_female.groupby(['IncomeGroup', 'AgeGroup', 'CampaignChannel'], observed=True)['Conversion'].mean().unstack()

# Visualize conversion rates for female group (reuses plot_heatmap from the previous cell)

# Visualize conversion rates by campaign type
plot_heatmap(female_campaign_conversion, 'Female - Conversion Rates by Income Group, Age Group, and Campaign Type', 'Campaign Type', 'Income and Age Group')

# Visualize conversion rates by campaign channel
plot_heatmap(female_channel_conversion, 'Female - Conversion Rates by Income Group, Age Group, and Campaign Channel', 'Campaign Channel', 'Income and Age Group')

In [None]:
# Calculate conversion rates by Gender and CampaignChannel
gender_channel_conversion_rates = df.groupby(['Gender', 'CampaignChannel'])['Conversion'].mean().unstack()
print("\nConversion rates by Gender and CampaignChannel:")
print(gender_channel_conversion_rates)

# Calculate conversion rates by Gender and CampaignType
gender_campaign_conversion_rates = df.groupby(['Gender', 'CampaignType'])['Conversion'].mean().unstack()
print("\nConversion rates by Gender and CampaignType:")
print(gender_campaign_conversion_rates)

# Visualize conversion rates by Gender and CampaignType
plt.figure(figsize=(12, 8))
sns.heatmap(gender_campaign_conversion_rates, annot=True, fmt=".2f", cmap="YlGnBu")
plt.title('Conversion Rates by Gender and CampaignType')
plt.xlabel('Campaign Type')
plt.ylabel('Gender')
plt.show()

# Visualize conversion rates by Gender and CampaignChannel
plt.figure(figsize=(12, 8))
sns.heatmap(gender_channel_conversion_rates, annot=True, fmt=".2f", cmap="YlGnBu")
plt.title('Conversion Rates by Gender and CampaignChannel')
plt.xlabel('Campaign Channel')
plt.ylabel('Gender')
plt.show()

In [None]:
# Calculate conversion rates by AgeGroup and CampaignChannel
age_channel_conversion_rates = df.groupby(['AgeGroup', 'CampaignChannel'], observed=True)['Conversion'].mean().unstack()
print("\nConversion rates by AgeGroup and CampaignChannel:")
print(age_channel_conversion_rates)

# Calculate conversion rates by AgeGroup and CampaignType
age_campaign_conversion_rates = df.groupby(['AgeGroup', 'CampaignType'], observed=True)['Conversion'].mean().unstack()
print("\nConversion rates by AgeGroup and CampaignType:")
print(age_campaign_conversion_rates)

# Visualize conversion rates by AgeGroup and CampaignType
plt.figure(figsize=(12, 8))
sns.heatmap(age_campaign_conversion_rates, annot=True, fmt=".2f", cmap="YlGnBu")
plt.title('Conversion Rates by AgeGroup and CampaignType')
plt.xlabel('Campaign Type')
plt.ylabel('Age Group')
plt.show()

# Visualize conversion rates by AgeGroup and CampaignChannel
plt.figure(figsize=(12, 8))
sns.heatmap(age_channel_conversion_rates, annot=True, fmt=".2f", cmap="YlGnBu")
plt.title('Conversion Rates by AgeGroup and CampaignChannel')
plt.xlabel('Campaign Channel')
plt.ylabel('Age Group')
plt.show()

In [None]:
# Calculate conversion rates by IncomeGroup and CampaignChannel
income_channel_conversion_rates = df.groupby(['IncomeGroup', 'CampaignChannel'], observed=True)['Conversion'].mean().unstack()
print("\nConversion rates by IncomeGroup and CampaignChannel:")
print(income_channel_conversion_rates)

# Calculate conversion rates by IncomeGroup and CampaignType
income_campaign_conversion_rates = df.groupby(['IncomeGroup', 'CampaignType'], observed=True)['Conversion'].mean().unstack()
print("\nConversion rates by IncomeGroup and CampaignType:")
print(income_campaign_conversion_rates)

# Visualize conversion rates by IncomeGroup and CampaignType
plt.figure(figsize=(12, 8))
sns.heatmap(income_campaign_conversion_rates, annot=True, fmt=".2f", cmap="YlGnBu")
plt.title('Conversion Rates by IncomeGroup and CampaignType')
plt.xlabel('Campaign Type')
plt.ylabel('Income Group')
plt.show()

# Visualize conversion rates by IncomeGroup and CampaignChannel
plt.figure(figsize=(12, 8))
sns.heatmap(income_channel_conversion_rates, annot=True, fmt=".2f", cmap="YlGnBu")
plt.title('Conversion Rates by IncomeGroup and CampaignChannel')
plt.xlabel('Campaign Channel')
plt.ylabel('Income Group')
plt.show()

**Summary:**

**Male Group:**

**Highest Conversion Rates:**
* Typically appear in higher income segments.
* Perform best in the **Referral** and **Email** channels.

**Lowest Conversion Rates:**
* Usually found in lower income segments.
* Perform poorly in the **PPC** and **SEO** channels.

**Female Group:**

**Highest Conversion Rates:**
* Typically appear in higher income segments.
* Perform best in the **Retention** campaign type and the **SEO** and **Email** channels.

**Lowest Conversion Rates:**
* Usually found in lower income segments.
* Perform poorly in the **Consideration** campaign type and the **SEO** and **PPC** channels.
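
The highest and lowest cells in a heatmap can come from segments with very few customers, so it may help to rank segments by conversion rate only after filtering on sample size. The sketch below is one way to do that; the helper name `rank_segments`, the `min_count` threshold, and the toy rows are assumptions for illustration, while the column names follow the notebook.

```python
import pandas as pd


def rank_segments(frame: pd.DataFrame, by: list, min_count: int = 3) -> pd.DataFrame:
    """Conversion rate and sample size per segment, keeping segments with enough rows."""
    stats = frame.groupby(by)["Conversion"].agg(rate="mean", n="count")
    stats = stats[stats["n"] >= min_count]
    return stats.sort_values("rate", ascending=False)


# Made-up rows, not taken from the dataset
toy = pd.DataFrame({
    "IncomeGroup": ["<25k"] * 4 + ["125k-150k"] * 5,
    "CampaignChannel": ["PPC", "PPC", "PPC", "Email",
                        "Referral", "Referral", "Referral", "Referral", "Email"],
    "Conversion": [0, 1, 0, 1, 1, 1, 1, 0, 1],
})

ranked = rank_segments(toy, ["IncomeGroup", "CampaignChannel"], min_count=3)
print(ranked)
```

In the toy data, the two Email segments each have a 100% rate from a single customer and are filtered out, which is the kind of cell that could otherwise dominate a summary.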

**Recommendations:**
For all groups, focus on high-income segments. Use the **Referral** and **Email** channels for men, and **Retention** campaigns and the **SEO** channel for women.

**Personalized Emails:** Send emails that include the customer's name, recommend products based on their interests, and offer discounts.

**Exclusive Benefits for High-Income Female Customers:** Provide unique benefits such as early access to new products, dedicated customer service, and additional discounts to enhance their loyalty.