In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('bank-additional-full.csv', delimiter=';')
print(f"Dataset shape: {df.shape}")
df.head()

Dataset shape: (41188, 21)


,age,job,marital,education,default,housing,loan,contact,month,day_of_week,...,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
0,56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
1,57,services,married,high.school,unknown,no,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
2,37,services,married,high.school,no,yes,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
3,40,admin.,married,basic.6y,no,no,no,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
4,56,services,married,high.school,no,no,yes,telephone,may,mon,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no


In [3]:
# Convert target variable to binary
df['converted'] = df['y'].apply(lambda x: 1 if x == 'yes' else 0)

In [4]:
# Calculate overall conversion rate
conversion_rate = df['converted'].mean()
print(f"Overall conversion rate: {conversion_rate:.2%}")

Overall conversion rate: 11.27%


Key Drivers of Conversion

In [5]:
# Calculate conversion rates by contact method
contact_conversion = df.groupby('contact')['converted'].agg(['count', 'mean']).reset_index()
contact_conversion.columns = ['Contact Method', 'Count', 'Conversion Rate']
contact_conversion['Conversion Rate'] = contact_conversion['Conversion Rate'] * 100
contact_conversion



,Contact Method,Count,Conversion Rate
0,cellular,26144,14.737607
1,telephone,15044,5.231321


The table shows that customers contacted via cellular phones convert at a much higher rate (14.7%) than those contacted through traditional landlines (5.2%), roughly 2.8 times as often. This substantial difference suggests that:

1. Mobile users may be more receptive to marketing calls.
2. They may be more reachable and more engaged during calls.
3. The profile of mobile users may correspond more closely to individuals likely to sign a contract following initial contact.

**Recommendation:** Prioritize contacting customers on their mobile/cellular phones when possible. Consider a sequential approach where customers are first attempted via mobile, and only contacted on landlines if mobile contact isn't successful.
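Whether the cellular/landline gap is "significant" can be checked with a pooled two-proportion z-test. The sketch below uses only the standard library and reconstructs conversion counts from the table as Count × Conversion Rate; the helper name `two_proportion_ztest` is our own, not part of the original notebook.

```python
import math


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Conversion counts reconstructed from the contact table (Count x rate)
cell_n, cell_conv = 26144, round(26144 * 0.14737607)  # 3853 conversions
tel_n, tel_conv = 15044, round(15044 * 0.05231321)    # 787 conversions

z, p = two_proportion_ztest(cell_conv, cell_n, tel_conv, tel_n)
print(f"z = {z:.1f}, p = {p:.2e}")
```

With groups this large, even modest gaps test as significant, so the size of the difference (about 9.5 percentage points) matters more for prioritization than the p-value itself.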


In [6]:
# Previous Marketing Outcome Analysis
print("\n## Previous Marketing Outcome Analysis")

# Calculate conversion rates by previous outcome
poutcome_conversion = df.groupby('poutcome')['converted'].agg(['count', 'mean']).reset_index()
poutcome_conversion.columns = ['Previous Outcome', 'Count', 'Conversion Rate']
poutcome_conversion['Conversion Rate'] = poutcome_conversion['Conversion Rate'] * 100
poutcome_conversion


## Previous Marketing Outcome Analysis


,Previous Outcome,Count,Conversion Rate
0,failure,4252,14.228598
1,nonexistent,35563,8.832213
2,success,1373,65.112891


The table reveals a dramatic pattern in conversion rates based on previous marketing campaign outcomes:

1. **Previous Success (65%)**: Customers who previously subscribed to a term deposit have an extremely high conversion rate, about 6 times the overall average. These customers should be top priority targets.

2. **Previous Failure (14%)**: Interestingly, customers who were previously contacted but didn't convert still convert above the overall average (11.3%). This suggests that repeated contact across campaigns can eventually lead to conversion.

3. **Nonexistent (9%)**: Customers with no previous contact history have the lowest conversion rate (8.8%), below the overall average, but they make up the vast majority of records (35,563 of 41,188).

**Recommendation:** Implement a tiered targeting strategy that heavily prioritizes previous successful customers, followed by previous contacts who didn't convert, and finally new prospects. This should be a primary factor in the predictive model.
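The "about 6 times the overall average" figure is a lift: the segment's conversion rate divided by the baseline rate. A minimal sketch using the rates from the table and the 11.27% overall rate (the `lift` helper is illustrative):

```python
overall_rate = 11.27  # overall conversion rate (%) computed earlier in the notebook

# Conversion rates (%) by previous outcome, copied from the table above
segment_rates = {
    "success": 65.112891,
    "failure": 14.228598,
    "nonexistent": 8.832213,
}


def lift(segment_rate: float, baseline_rate: float) -> float:
    """How many times more likely a segment is to convert than the baseline."""
    return segment_rate / baseline_rate


lifts = {name: lift(rate, overall_rate) for name, rate in segment_rates.items()}
for name, value in lifts.items():
    print(f"{name:>12}: {value:.2f}x")
```

A lift above 1 marks a segment worth prioritizing; "nonexistent" falls below 1, matching the tiered order in the recommendation.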

In [7]:
# Job Type Analysis
# Calculate conversion rates by job type
job_conversion = df.groupby('job')['converted'].agg(['count', 'mean']).reset_index()
job_conversion.columns = ['Job', 'Count', 'Conversion Rate']
job_conversion['Conversion Rate'] = job_conversion['Conversion Rate'] * 100
job_conversion = job_conversion.sort_values('Conversion Rate', ascending=False)
job_conversion

,Job,Count,Conversion Rate
8,student,875,31.428571
5,retired,1720,25.232558
10,unemployed,1014,14.201183
0,admin.,10422,12.972558
4,management,2924,11.21751
11,unknown,330,11.212121
9,technician,6743,10.826042
6,self-employed,1421,10.485574
3,housemaid,1060,10.0
2,entrepreneur,1456,8.516484


In [8]:
fig_job = px.bar(
    job_conversion, 
    y='Job', 
    x='Conversion Rate',
    color='Conversion Rate',
    color_continuous_scale='Viridis',
    text=job_conversion['Conversion Rate'].apply(lambda x: f"{x:.1f}%"),
    title='Conversion Rate by Job Type',
    labels={'Conversion Rate': 'Conversion Rate (%)'},
    height=600
)

fig_job.update_layout(
    yaxis_title=None,
    xaxis_title='Conversion Rate (%)',
    coloraxis_showscale=False
)

fig_job.show()

The job type visualization reveals several interesting patterns:

1. **Students and Retired Individuals**: These groups have the highest conversion rates, at approximately 31% and 25% respectively. This may reflect groups that have more:
   - Free time to consider financial decisions
   - Interest in saving money for the future (students) or maximizing returns on existing assets (retired)
   
2. **Middle-Tier Conversion Groups**: Unemployed, admin, and management roles show moderate conversion rates between 11-14%.

3. **Lower Conversion Groups**: Blue-collar workers, entrepreneurs, and housemaids have the lowest conversion rates, possibly due to:
   - Lower disposable income
   - Less financial flexibility
   - Different financial priorities

The Count column helps us assess the reliability of these findings. For example, students have a high conversion rate but account for only 875 records, compared with 10,422 for admin. workers, so their rate is estimated less precisely.

**Recommendation:** Consider occupation as a strong predictor in the model, with special attention to student and retired segments for targeted campaigns.
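One way to put the sample-size concern into numbers is a confidence interval per group. The sketch below uses the Wilson score interval, a standard choice for binomial proportions, with conversion counts reconstructed from the job table; `wilson_interval` is a helper written for this example.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%), as fractions."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# (count, conversion rate %) taken from the job table above
groups = {"student": (875, 31.428571), "admin.": (10422, 12.972558), "unknown": (330, 11.212121)}
for job, (n, rate) in groups.items():
    conversions = round(n * rate / 100)
    low, high = wilson_interval(conversions, n)
    print(f"{job:>8}: {rate:.1f}%  (95% CI {low:.1%} to {high:.1%}, n={n})")
```

The student interval is several times wider than the admin. interval, yet its lower bound still sits well above admin.'s upper bound, so the student advantage is unlikely to be a small-sample artifact.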


In [9]:
# Monthly Patterns Analysis
# Order months correctly
month_order = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
month_map = {month: i+1  for i, month in enumerate(month_order)}

# Calculate conversion rates by month
month_conversion = df.groupby('month')['converted'].agg(['count', 'mean']).reset_index()
month_conversion.columns = ['Month', 'Count', 'Conversion Rate']
month_conversion['Conversion Rate'] = month_conversion['Conversion Rate'] * 100
month_conversion['Month_order'] = month_conversion['Month'].map(month_map)
month_conversion = month_conversion.sort_values('Month_order')
month_conversion

,Month,Count,Conversion Rate,Month_order
5,mar,546,50.549451,3
0,apr,2632,20.478723,4
6,may,13769,6.434745,5
4,jun,5318,10.51147,6
3,jul,7174,9.046557,7
1,aug,6178,10.602137,8
9,sep,570,44.912281,9
8,oct,718,43.871866,10
7,nov,4101,10.143867,11
2,dec,182,48.901099,12


In [10]:
fig_month = px.line(
    month_conversion, 
    x='Month', 
    y='Conversion Rate',
    markers=True,
    title='Conversion Rate by Month',
    labels={'Conversion Rate': 'Conversion Rate (%)'},
    category_orders={'Month': month_order},
    height=500
)

fig_month.update_layout(
    xaxis_title='Month',
    yaxis_title='Conversion Rate (%)'
)


fig_month.add_trace(
    go.Bar(
        x=month_conversion['Month'],
        y=month_conversion['Count'],
        name='Count',
        yaxis='y2',
        opacity=0.3
    )
)

fig_month.update_layout(
    yaxis2=dict(
        title='Count',
        overlaying='y',
        side='right'
    ),
    legend=dict(
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='right',
        x=1
    )
)

fig_month.show()


The monthly conversion rate visualization reveals dramatic seasonal patterns:

1. **Highest Conversion Periods**: March, September, October and December show extraordinarily high conversion rates (44-51%), roughly 4 to 4.5 times the overall average, although each of these months accounts for only 182 to 718 contacts.

2. **Moderate Conversion Period**: April converts at 20.5%, well above the average, on 2,632 contacts.

3. **Lower Conversion Periods**: The summer months (May-August) and November show the lowest conversion rates (6-11%). May is the weakest month (6.4%) despite receiving by far the most contact attempts (13,769), followed by July (9.0% on 7,174 contacts).

This pattern suggests:
- March, September, October, and December combine high conversion rates with low volumes and should be analyzed separately from other months
- Summer months may be less effective due to vacations and different financial priorities
- The start of fall (September-October) might represent a financial reset or planning period

The overlaid bar chart shows sample sizes, indicating that May has the highest number of contacts despite having the lowest conversion rate. This suggests campaign timing could be optimized, although the peak-month rates come from small volumes and may not hold if call volume in those months were scaled up.
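Rates alone hide how many conversions each month actually delivers. A short sketch that multiplies each month's Count by its Conversion Rate (values copied from the month table) and compares the peak months' share of contacts with their share of conversions:

```python
# (count, conversion rate %) per month, copied from the month table above
months = {
    "mar": (546, 50.549451), "apr": (2632, 20.478723), "may": (13769, 6.434745),
    "jun": (5318, 10.51147), "jul": (7174, 9.046557), "aug": (6178, 10.602137),
    "sep": (570, 44.912281), "oct": (718, 43.871866), "nov": (4101, 10.143867),
    "dec": (182, 48.901099),
}

# Absolute conversions per month = contacts x rate
conversions = {m: round(n * rate / 100) for m, (n, rate) in months.items()}
total_contacts = sum(n for n, _ in months.values())
peak = ["mar", "sep", "oct", "dec"]

peak_share_contacts = sum(months[m][0] for m in peak) / total_contacts
peak_share_conversions = sum(conversions[m] for m in peak) / sum(conversions.values())

print(f"May conversions: {conversions['may']}, March conversions: {conversions['mar']}")
print(f"Peak months: {peak_share_contacts:.1%} of contacts, {peak_share_conversions:.1%} of conversions")
```

May still produces more than three times as many conversions as March, and the four peak months carry under 5% of contacts but about a fifth of conversions. Shifting volume toward them looks attractive, but whether their rates survive a larger call volume is an open question the data cannot answer.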



In [11]:
# Labels must match the dataset exactly ('fri'); any unmapped label gets a NaN sort key
day_of_week_order = ['mon', 'tue', 'wed', 'thu', 'fri']
day_of_week_map = {day_of_week: i + 1 for i, day_of_week in enumerate(day_of_week_order)}

day_of_week_conversion = df.groupby('day_of_week')['converted'].agg(['count', 'mean']).reset_index()
day_of_week_conversion.columns = ['day_of_week', 'Count', 'Conversion Rate']
day_of_week_conversion['Conversion Rate'] = day_of_week_conversion['Conversion Rate'] * 100
day_of_week_conversion['day_of_week_order'] = day_of_week_conversion['day_of_week'].map(day_of_week_map)
day_of_week_conversion = day_of_week_conversion.sort_values('day_of_week_order')
day_of_week_conversion

,day_of_week,Count,Conversion Rate,day_of_week_order
1,mon,8514,9.94832,1
3,tue,8090,11.779975,2
4,wed,8134,11.667076,3
2,thu,8623,12.118752,4
0,fri,7827,10.808739,5


**There is a noticeable gap between Monday (9.9%) and Thursday (12.1%), but since the call-center team works a fixed schedule, shifting more calls to Thursdays is probably not practical.**
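The order-map approach in this cell silently assigns NaN to any label missing from the list (a misspelling such as 'fry' for 'fri', for example). An ordered categorical is a common pandas alternative: it sorts by the declared order and makes unknown labels easy to detect. A sketch on toy values standing in for `df['day_of_week']`:

```python
import pandas as pd

day_order = ["mon", "tue", "wed", "thu", "fri"]

# Toy rows standing in for df['day_of_week'] (illustrative data only)
days = pd.Series(["thu", "mon", "fri", "wed", "tue", "fri"])

# An ordered categorical sorts by the declared order instead of alphabetically
day_cat = pd.Categorical(days, categories=day_order, ordered=True)

# Labels missing from the category list become NaN, so a typo in day_order
# would trip this check instead of producing NaN sort keys later
assert not pd.isna(day_cat).any(), "unexpected day label"

ordered_days = pd.Series(day_cat).sort_values().tolist()
print(ordered_days)
```

Applied to the real column, `groupby` on the categorical (with `observed=True`) returns the days already in Monday-to-Friday order, so the separate `day_of_week_order` column is no longer needed.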

In [12]:
# Education Level Analysis
# Calculate conversion rates by education level
education_conversion = df.groupby('education')['converted'].agg(['count', 'mean']).reset_index()
education_conversion.columns = ['Education', 'Count', 'Conversion Rate']
education_conversion['Conversion Rate'] = education_conversion['Conversion Rate'] * 100
education_conversion = education_conversion.sort_values('Conversion Rate', ascending=False)
education_conversion

,Education,Count,Conversion Rate
4,illiterate,18,22.222222
7,unknown,1731,14.500289
6,university.degree,12168,13.724523
5,professional.course,5243,11.348465
3,high.school,9515,10.835523
0,basic.4y,4176,10.249042
1,basic.6y,2292,8.202443
2,basic.9y,6045,7.824648


The education level table reveals:

1. **Highest Conversion**: The 'illiterate' category shows the highest conversion rate at 22%, but it is based on only 18 customers, so it should be interpreted with caution.

2. **Education Correlation**: Broadly, there's a positive correlation between education level and conversion rate:
   - University degree holders (13.7%) and those with unknown education (14.5%) convert at higher rates
   - Professional course graduates and high school graduates show moderate conversion rates
   - Basic education levels tend to have lower conversion rates (7.8-10.2%), although within this group the order is reversed: basic.4y converts better than basic.9y

This pattern suggests that higher education levels may correlate with:
- Greater financial literacy
- Higher income potential
- More awareness of investment options
- Longer-term financial planning horizons

**Recommendation:** Consider education level as an important factor in the predictive model, with special attention to higher education segments. The model should weight university education positively.
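The "positive correlation" can be quantified with Spearman's rank correlation, computed here as the Pearson correlation of ranks so no SciPy dependency is needed. The ordinal ranking of levels below (basic.4y lowest, university.degree highest) is our assumption, and 'illiterate' and 'unknown' are left out because they do not sit on a single education scale.

```python
import pandas as pd

# Conversion rates (%) from the education table above, listed in an assumed
# low-to-high order of education level
rates = pd.Series({
    "basic.4y": 10.249042,
    "basic.6y": 8.202443,
    "basic.9y": 7.824648,
    "high.school": 10.835523,
    "professional.course": 11.348465,
    "university.degree": 13.724523,
})
level = pd.Series(range(1, len(rates) + 1), index=rates.index)  # assumed ordinal codes

# Spearman's rho = Pearson correlation of the ranks (there are no ties here)
rho = level.rank().corr(rates.rank())
print(f"Spearman rho = {rho:.2f}")
```

A rho of about 0.77 supports a positive but imperfect monotone relationship; the three basic levels, which run in the opposite direction, are what keep it from being close to 1.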


In [13]:
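# Age Distribution Analysis
# Note: pd.cut bins are right-closed, so the first bin is (17, 25] and excludes age 17,
# and the last bin (65, 100] starts at 66. The group counts below sum to 41,183
# rather than 41,188, so a handful of rows fall outside every bin.
df['age_group'] = pd.cut(df['age'], bins=[17, 25, 35, 45, 55, 65, 100], 
                        labels=['18-25', '26-35', '36-45', '46-55', '56-65', '65+'])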

age_conversion = df.groupby('age_group')['converted'].agg(['count', 'mean']).reset_index()
age_conversion.columns = ['Age Group', 'Count', 'Conversion Rate']
age_conversion['Conversion Rate'] = age_conversion['Conversion Rate'] * 100
age_conversion

,Age Group,Count,Conversion Rate
0,18-25,1661,20.89103
1,26-35,14847,11.719539
2,36-45,12844,8.50981
3,46-55,8249,8.691963
4,56-65,2963,15.22106
5,65+,619,46.849758
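The age-group counts sum to 41,183 instead of 41,188, which is consistent with the youngest customers falling outside the first bin: `pd.cut` builds right-closed intervals by default, so `(17, 25]` excludes 17, and the "65+" bin actually starts at 66. A small sketch of the edge behaviour on toy ages; the corrected labels ("17-25", "66+") are a hypothetical relabeling, not the notebook's.

```python
import pandas as pd

ages = pd.Series([17, 18, 25, 26, 65, 66])  # toy ages around the bin edges
bins = [17, 25, 35, 45, 55, 65, 100]
labels = ["18-25", "26-35", "36-45", "46-55", "56-65", "65+"]

# Default right=True: each bin is (left, right], so age 17 maps to NaN
default_groups = pd.cut(ages, bins=bins, labels=labels)

# include_lowest=True also closes the first interval on the left, so 17 is kept;
# the labels are adjusted to describe the ages each bin really contains
fixed_labels = ["17-25", "26-35", "36-45", "46-55", "56-65", "66+"]
fixed_groups = pd.cut(ages, bins=bins, labels=fixed_labels, include_lowest=True)

print(pd.DataFrame({"age": ages, "default": default_groups, "fixed": fixed_groups}))
```

Re-running the age cell with such a fix would shift the youngest group's count and rate slightly, so the table would need to be regenerated rather than edited by hand.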


In [14]:
fig_age = px.bar(
    age_conversion, 
    x='Age Group', 
    y='Conversion Rate',
    color='Conversion Rate',
    color_continuous_scale='Viridis',
    text=age_conversion['Conversion Rate'].apply(lambda x: f"{x:.1f}%"),
    title='Conversion Rate by Age Group',
    labels={'Conversion Rate': 'Conversion Rate (%)'},
    height=500
)


fig_age.show()


fig_age_dist = px.histogram(
    df, 
    x='age',
    color='y',
    marginal='box',
    title='Age Distribution by Conversion Status',
    labels={'age': 'Age', 'y': 'Converted'},
    height=500,
    category_orders={'y': ['no', 'yes']}
)

fig_age_dist.update_layout(
    xaxis_title='Age',
    yaxis_title='Count'
)

fig_age_dist.show()

The age analysis reveals a U-shaped relationship between age and conversion rate:

1. **Young Adults (18-25)**: Have a relatively high conversion rate (20.9%), which aligns with the high conversion rate we saw for students. This group may be interested in saving for future education, housing, or investment opportunities.

2. **Middle Ages (36-55)**: Show the lowest conversion rates (8.5% and 8.7%), possibly due to:
   - Higher expenses related to family formation and housing
   - Different financial priorities like mortgages or children's education
   - Less disposable income for term deposits

3. **Older Adults (56+)**: Show increasing conversion rates, from 15.2% for 56-65 to 46.8% for the 65+ group, the highest of all. This corresponds with the high conversion rate for retired individuals, who may:
   - Have accumulated savings they want to protect
   - Need fixed-income products for retirement
   - Have more financial stability


**Recommendation:** To balance both conversion rate and customer volume, I recommend prioritizing outreach to the 18-35 and 56+ age groups, as they offer the most strategic potential.

In [15]:
def plot_economic_indicator(df, indicator, indicator_name):
    bins = 10
    df[f'{indicator}_bin'] = pd.qcut(df[indicator], q=bins, duplicates='drop')
    
    econ_conversion = df.groupby(f'{indicator}_bin')['converted'].agg(['count', 'mean']).reset_index()
    econ_conversion.columns = [indicator_name, 'Count', 'Conversion Rate']
    econ_conversion['Conversion Rate'] = econ_conversion['Conversion Rate'] * 100
    
    # Convert each interval to a 'low to high' label with edges rounded to 2 decimals
    econ_conversion[indicator_name] = econ_conversion[indicator_name].astype(str)
    econ_conversion[indicator_name] = econ_conversion[indicator_name].apply(
        lambda x: str(round(float(x.split(',')[0][1:]), 2)) + ' to ' + str(round(float(x.split(',')[1][:-1]), 2))
    )
    

    fig = make_subplots(specs=[[{"secondary_y": True}]])
    

    fig.add_trace(
        go.Scatter(
            x=econ_conversion[indicator_name],
            y=econ_conversion['Conversion Rate'],
            name='Conversion Rate',
            mode='lines+markers',
            marker=dict(size=8)
        ),
        secondary_y=False
    )
    
  
    fig.add_trace(
        go.Bar(
            x=econ_conversion[indicator_name],
            y=econ_conversion['Count'],
            name='Count',
            opacity=0.3
        ),
        secondary_y=True
    )
    

    fig.update_layout(
        title=f'Conversion Rate by {indicator_name}',
        xaxis=dict(
            title=indicator_name,
            tickangle=45
        ),
        height=500,
        legend=dict(
            orientation='h',
            yanchor='bottom',
            y=1.02,
            xanchor='right',
            x=1
        )
    )
    
    fig.update_yaxes(title_text="Conversion Rate (%)", secondary_y=False)
    fig.update_yaxes(title_text="Count", secondary_y=True)
    
    return fig


print("### Employment Variation Rate vs. Conversion Rate")
fig_emp = plot_economic_indicator(df, 'emp.var.rate', 'Employment Variation Rate')
fig_emp.show()

print("### Euribor 3-Month Rate vs. Conversion Rate")
fig_euribor = plot_economic_indicator(df, 'euribor3m', 'Euribor 3-Month Rate')
fig_euribor.show()

print("### Consumer Price Index vs. Conversion Rate")
fig_cpi = plot_economic_indicator(df, 'cons.price.idx', 'Consumer Price Index')
fig_cpi.show()

print("### Consumer Confidence Index vs. Conversion Rate")
fig_cci = plot_economic_indicator(df, 'cons.conf.idx', 'Consumer Confidence Index')
fig_cci.show()



### Employment Variation Rate vs. Conversion Rate


### Euribor 3-Month Rate vs. Conversion Rate


### Consumer Price Index vs. Conversion Rate


### Consumer Confidence Index vs. Conversion Rate


The economic indicators show strong correlations with conversion rates:

1. **Employment Variation Rate**: There's a clear negative correlation: as the employment variation rate decreases (indicating economic contraction), conversion rates increase dramatically. At the lowest rates (-3.4 to -1.8), conversion rates spike to over 40%, suggesting term deposits become more attractive in uncertain job markets.

2. **Euribor 3-Month Rate**: Lower Euribor rates strongly correlate with higher conversion rates. When rates are below 1.3%, conversion rates exceed 40%. This makes sense as lower market interest rates make bank term deposits relatively more attractive compared to other investment options.

3. **Consumer Price Index (CPI)**: Lower CPI values correlate with higher conversion rates, though the relationship is less dramatic than with other indicators. This suggests people may prefer term deposits when prices are more stable.

4. **Consumer Confidence Index**: Surprisingly, there isn't a clear linear trend. Both very low and moderately high confidence can coincide with higher conversion rates, suggesting a complex relationship.
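The `plot_economic_indicator` helper builds its tick labels by parsing `str(interval)`. A slightly sturdier option is to read the bin edges directly from the `pd.Interval` objects that `pd.qcut` returns. A sketch on toy values; `interval_label` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def interval_label(interval: pd.Interval, decimals: int = 2) -> str:
    """Format a qcut bin as 'low to high' from its edges rather than its string form."""
    return f"{round(interval.left, decimals)} to {round(interval.right, decimals)}"


values = pd.Series([-3.4, -2.9, -1.8, -0.1, 1.1, 1.4])  # toy values, not the dataset
bins = pd.qcut(values, q=3, duplicates="drop")

# One label per distinct bin, in bin order
labels = [interval_label(iv) for iv in bins.cat.categories]
print(labels)
```

Inside the plotting function, the same helper could be applied to the binned column before `astype(str)`, removing the need to split strings on commas and brackets.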


In [16]:
# Housing and Loan Status Analysis
df['housing_loan'] = df['housing'] + '_' + df['loan']

# Calculate conversion rates by housing and loan status
housing_loan_conversion = df.groupby('housing_loan')['converted'].agg(['count', 'mean']).reset_index()
housing_loan_conversion.columns = ['Housing & Loan Status', 'Count', 'Conversion Rate']
housing_loan_conversion['Conversion Rate'] = housing_loan_conversion['Conversion Rate'] * 100
housing_loan_conversion = housing_loan_conversion.sort_values('Conversion Rate', ascending=False)
housing_loan_conversion

,Housing & Loan Status,Count,Conversion Rate
3,yes_no,17885,11.7305
4,yes_yes,3691,11.081008
0,no_no,16065,10.905696
2,unknown_unknown,990,10.808081
1,no_yes,2557,10.715682


**Conversion rates range only from 10.7% to 11.7% across housing and loan combinations, so these features have limited impact and can be deprioritized in favor of more influential ones.**
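A Pearson chi-square test of independence checks whether a one-point spread like this could plausibly be noise. The sketch reconstructs the 5×2 (group × converted) table from Count × Conversion Rate and, because the table has 4 degrees of freedom (an even number), computes the p-value with the closed-form chi-square tail instead of SciPy.

```python
import math

# (count, conversion rate %) per housing_loan group from the table above
groups = {
    "yes_no": (17885, 11.7305),
    "yes_yes": (3691, 11.081008),
    "no_no": (16065, 10.905696),
    "unknown_unknown": (990, 10.808081),
    "no_yes": (2557, 10.715682),
}

n_total = sum(n for n, _ in groups.values())
conv_total = sum(round(n * r / 100) for n, r in groups.values())
p_pooled = conv_total / n_total

# Pearson chi-square over both cells (converted / not converted) of each group
chi2 = 0.0
for n, rate in groups.values():
    observed_yes = round(n * rate / 100)
    expected_yes = n * p_pooled
    expected_no = n * (1 - p_pooled)
    chi2 += (observed_yes - expected_yes) ** 2 / expected_yes
    chi2 += ((n - observed_yes) - expected_no) ** 2 / expected_no

dof = len(groups) - 1  # (rows - 1) * (cols - 1) = 4
# Closed form valid for even dof: P(X > x) = exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!
half = chi2 / 2
p_value = math.exp(-half) * sum(half**k / math.factorial(k) for k in range(dof // 2))
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

The statistic comes out around 7 with p ≈ 0.13, so at the usual 5% level the differences are not distinguishable from chance, consistent with deprioritizing these features.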

In [17]:
#  Analysis of the prior Contact History
df['previously_contacted'] = (df['pdays'] != 999).astype(int)

previous_contact_conversion = df.groupby('previously_contacted')['converted'].agg(['count', 'mean']).reset_index()
previous_contact_conversion.columns = ['Previously Contacted', 'Count', 'Conversion Rate']
previous_contact_conversion['Previously Contacted'] = previous_contact_conversion['Previously Contacted'].map({0: 'No', 1: 'Yes'})
previous_contact_conversion['Conversion Rate'] = previous_contact_conversion['Conversion Rate'] * 100
previous_contact_conversion

,Previously Contacted,Count,Conversion Rate
0,No,39673,9.258186
1,Yes,1515,63.828383


**Those previously contacted (pdays != 999) were much more willing to sign than those approached for the first time (63.8% vs. 9.3%).**

In [18]:
# Create a feature for number of previous contacts
df['contact_count'] = df['previous'] + 1

# Calculate conversion rates by contact count
contact_count_conversion = df.groupby('contact_count')['converted'].agg(['count', 'mean']).reset_index()
contact_count_conversion.columns = ['Contact Count', 'Count', 'Conversion Rate']
contact_count_conversion['Conversion Rate'] = contact_count_conversion['Conversion Rate'] * 100
contact_count_conversion

,Contact Count,Count,Conversion Rate
0,1,35563,8.832213
1,2,4561,21.201491
2,3,754,46.419098
3,4,216,59.259259
4,5,70,54.285714
5,6,18,72.222222
6,7,5,60.0
7,8,1,0.0


This pattern strongly suggests that persistence pays off in this marketing context. Conversion rates rise steeply from 8.8% at the first contact to 59.3% at the fourth; beyond that the rates fluctuate and rest on very few customers (18, 5, and 1 for six, seven, and eight contacts), so they say little about further returns.
The Count column shows that most customers in the dataset (35,563 of 41,188) are being contacted for the first time, meaning there may be a large opportunity to increase overall conversion through a more persistent contact strategy.

**Recommendation:** Previous contact history should be a primary factor in the predictive model. Consider implementing a systematic re-contact strategy for non-converting customers, as the data show that repeated contact across campaigns is associated with a substantially higher conversion probability.
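Before using contact count as a model feature, the sparse tail can be pooled so that rates built on one to five customers do not get their own level. A sketch with toy values standing in for `df['previous']`; the cut-off of five contacts is our assumption (levels of five or more contacts hold 70 + 18 + 5 + 1 = 94 customers in the table above).

```python
import pandas as pd

# Toy values standing in for df['previous'] (illustrative data only)
previous = pd.Series([0, 0, 1, 2, 3, 4, 5, 7])

contact_count = previous + 1  # same definition as the notebook's contact_count

# Hypothetical cap: keep 1-4 as separate levels and pool 5 or more into "5+"
contact_bucket = contact_count.astype(str).where(contact_count < 5, "5+")
print(contact_bucket.tolist())
```

The pooled column can then be one-hot encoded or mapped to an ordinal code, depending on the model.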


In [19]:
# Campaign Calls Analysis 
# Calculate conversion rates by number of contacts in current campaign
campaign_conversion = df.groupby('campaign')['converted'].agg(['count', 'mean']).reset_index()
campaign_conversion.columns = ['Contacts in Current Campaign', 'Count', 'Conversion Rate']
campaign_conversion['Conversion Rate'] = campaign_conversion['Conversion Rate'] * 100
campaign_conversion = campaign_conversion[campaign_conversion['Count'] > 100]  # Filter small samples
campaign_conversion

,Contacts in Current Campaign,Count,Conversion Rate
0,1,17642,13.037071
1,2,10570,11.456954
2,3,5341,10.747051
3,4,2651,9.392682
4,5,1599,7.50469
5,6,979,7.660878
6,7,629,6.041335
7,8,400,4.25
8,9,283,6.007067
9,10,225,5.333333


In [20]:
fig_campaign = px.line(
    campaign_conversion, 
    x='Contacts in Current Campaign', 
    y='Conversion Rate',
    markers=True,
    title='Conversion Rate by Number of Contacts in Current Campaign',
    labels={'Conversion Rate': 'Conversion Rate (%)'},
    height=500
)

fig_campaign.update_layout(
    xaxis_title='Number of Contacts in Current Campaign',
    yaxis_title='Conversion Rate (%)',
    xaxis=dict(tickmode='linear')
)

fig_campaign.show()

### Interpretation of Campaign Calls Analysis:

The analysis of conversion rates by number of contacts in the current campaign reveals an interesting pattern:

1. **Negative Correlation**: Unlike previous campaign contacts which show a positive effect, increasing contacts within the current campaign correlates with lower conversion rates.

2. **Potential Explanations**:
   - This may indicate customer fatigue or annoyance with repeated contacts in a short timeframe
   - It could also reflect a targeting strategy where hard-to-convert customers are being called multiple times
   - The causality may be reversed - customers who don't convert quickly receive more follow-up calls

This finding contrasts with the positive effect of previous campaign contacts, suggesting that persistence works better across campaigns rather than within a single campaign.

**Recommendation:** The model should consider the number of contacts in the current campaign as a potential negative indicator. The marketing strategy should focus on quality conversations rather than repeated contacts within the same campaign, while still maintaining a long-term re-contact strategy across different campaigns.
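To summarize the within-campaign decline as a single number, one can fit a weighted least-squares line to the filtered table, giving heavily populated points more weight. This is a descriptive trend, not a causal effect of extra calls (see the reverse-causality caveat above). A sketch with NumPy:

```python
import numpy as np

# Contacts in the current campaign, counts and conversion rates (%) from the table above
calls = np.arange(1, 11)
counts = np.array([17642, 10570, 5341, 2651, 1599, 979, 629, 400, 283, 225])
rates = np.array([13.037071, 11.456954, 10.747051, 9.392682, 7.50469,
                  7.660878, 6.041335, 4.25, 6.007067, 5.333333])

# np.polyfit multiplies the unsquared residuals by w, so sqrt(count) is roughly
# proportional to 1 / standard error of each rate (inverse-variance weighting)
slope, intercept = np.polyfit(calls, rates, deg=1, w=np.sqrt(counts))
print(f"Fitted change: {slope:.2f} percentage points per additional call")
```

The fitted slope is negative, matching the table; a logistic model on the raw rows would be the more rigorous way to estimate the same trend.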


In [21]:
# Multivariate Analysis: Contact Method, Previous Outcome, and Month
# Create a cross-tabulation of contact method and previous outcome
contact_poutcome = pd.crosstab(
    df['contact'], 
    df['poutcome'],
    values=df['converted'],
    aggfunc='mean'
) * 100
contact_poutcome

poutcome,failure,nonexistent,success
contact,,,
cellular,14.448381,11.72928,65.19685
telephone,11.333333,4.692302,64.07767


In [22]:
fig_heatmap = px.imshow(
    contact_poutcome,
    text_auto='.1f',
    color_continuous_scale='Viridis',
    title='Conversion Rate (%) by Contact Method and Previous Outcome',
    labels={'x': 'Previous Outcome', 'y': 'Contact Method', 'color': 'Conversion Rate (%)'},
    height=400
)

fig_heatmap.update_layout(
    xaxis_title='Previous Outcome',
    yaxis_title='Contact Method'
)

fig_heatmap.show()

In [23]:
df['key_combo'] = df['contact'] + '_' + df['poutcome'] + '_' + df['month']

# Get conversion rates for combinations with significant sample size
combo_conversion = df.groupby('key_combo')['converted'].agg(['count', 'mean']).reset_index()
combo_conversion.columns = ['Combination', 'Count', 'Conversion Rate']
combo_conversion['Conversion Rate'] = combo_conversion['Conversion Rate'] * 100
combo_conversion = combo_conversion[combo_conversion['Count'] >= 100]
combo_conversion = combo_conversion.sort_values('Conversion Rate', ascending=False)

combo_conversion[['Contact', 'Previous Outcome', 'Month']] = combo_conversion['Combination'].str.split('_', expand=True)
combo_conversion

,Combination,Count,Conversion Rate,Contact,Previous Outcome,Month
29,cellular_success_sep,139,78.417266,cellular,success,sep
28,cellular_success_oct,127,70.866142,cellular,success,oct
24,cellular_success_jun,138,67.391304,cellular,success,jun
21,cellular_success_aug,192,67.1875,cellular,success,aug
20,cellular_success_apr,103,60.194175,cellular,success,apr
27,cellular_success_nov,148,57.432432,cellular,success,nov
15,cellular_nonexistent_mar,328,48.780488,cellular,nonexistent,mar
26,cellular_success_may,214,48.130841,cellular,success,may
3,cellular_failure_jul,106,42.45283,cellular,failure,jul
19,cellular_nonexistent_sep,222,40.990991,cellular,nonexistent,sep


In [24]:
# Create horizontal bar chart for top combinations
fig_top_combos = px.bar(
    combo_conversion.head(15), 
    y='Combination', 
    x='Conversion Rate',
    color='Conversion Rate',
    color_continuous_scale='Viridis',
    text=combo_conversion.head(15)['Conversion Rate'].apply(lambda x: f"{x:.1f}%"),
    hover_data=['Contact', 'Previous Outcome', 'Month', 'Count'],
    title='Top 15 Feature Combinations by Conversion Rate',
    labels={'Conversion Rate': 'Conversion Rate (%)'},
    height=600
)

fig_top_combos.update_layout(
    yaxis_title=None,
    xaxis_title='Conversion Rate (%)',
    coloraxis_showscale=False
)

fig_top_combos.show()

### Interpretation of Multivariate Analysis:

The multivariate analysis combines our top three individual predictors (contact method, previous outcome, and month) to identify the most powerful predictor combinations:

1. **Contact Method × Previous Outcome Heatmap**: 
   - Previous success dominates: previously successful customers convert at 65.2% via cellular and 64.1% via telephone
   - Previously unsuccessful customers still convert above the overall average (14.4%) when contacted via cellular
   - The telephone channel lags mainly for customers with no previous contact (4.7% vs. 11.7% on cellular)
   
2. **Top Feature Combinations**: 
   - The highest-converting combinations (70.9-78.4% conversion rates) involve cellular contact with previously successful customers in September and October
   - The combination of these three factors creates extraordinarily powerful targeting criteria
   - Even with a sample size filter (100+ customers per combination), the top combination (cellular, previous success, September) reaches 78.4%

This multivariate analysis shows that while individual factors have strong effects, combining them sharpens targeting considerably. The right combination of factors can identify customer segments with conversion rates around 7 times the overall average.

**Recommendation:** The predictive model should incorporate interaction terms between these key features to capture these powerful combined effects. The targeting strategy should prioritize these high-conversion combinations to maximize efficiency.
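For a linear or logistic model, one simple way to add such an interaction term is to one-hot encode a concatenated column alongside the main effects. A sketch on toy rows using the notebook's column names; the `contact_x_poutcome` name is our own.

```python
import pandas as pd

# Toy rows with the same column names as the notebook (illustrative data only)
toy = pd.DataFrame({
    "contact": ["cellular", "telephone", "cellular", "cellular"],
    "poutcome": ["success", "nonexistent", "failure", "success"],
    "month": ["sep", "may", "jul", "oct"],
})

# Interaction column: one level per observed contact x poutcome pair
toy["contact_x_poutcome"] = toy["contact"] + "_" + toy["poutcome"]

# Main effects plus the interaction, one-hot encoded as 0/1 integers
features = pd.get_dummies(
    toy[["contact", "poutcome", "month", "contact_x_poutcome"]],
    dtype=int,
)
print(features.columns.tolist())
```

For linear models, dropping a reference level (`drop_first=True`) avoids perfect collinearity; tree-based models can usually pick up such interactions without explicit columns.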


1. Contact Method
Contact via mobile phone (cellular) has a significantly higher conversion rate than contact via landline.
Recommendation: Prioritize reaching out to customers on their mobile phones and use landlines only as a backup.
2. Previous Campaign Outcome
Customers who previously subscribed to a term deposit have a conversion rate of ~65%, about 6 times the average.
Customers previously contacted but not converted still have an above-average rate (~14%).
Recommendation: Segment and prioritize customers based on previous outcomes, focusing on those who have already converted.
3. Occupation
Students and retired individuals have the highest conversion rates (31% and 25%).
Recommendation: Target these segments with dedicated campaigns.
4. Seasonality (Month of Contact)
March, December, September, and October have the highest conversion rates (40-50%).
Recommendation: Schedule main campaigns during these months for maximum efficiency.
5. Education Level
There is a positive correlation between education level and conversion rate. University graduates and those with unknown education have higher rates.
Recommendation: Give more weight to higher education segments in targeting.
6. Age
The relationship is U-shaped: young adults (18-25) and seniors (65+) have higher conversion rates.
Recommendation: Use age groups rather than raw age and focus on the youngest and oldest segments.
7. Economic Indicators
Lower employment variation rate and lower Euribor are associated with significantly higher conversion rates.
Recommendation: Adjust campaigns based on macroeconomic context and include these indicators in the predictive model.
8. Previous Contact History
Previously contacted customers have a much higher conversion rate (63.8% vs. 9.3%).
Contact frequency: Each additional contact increases conversion rate, but with diminishing returns.
Recommendation: Implement a systematic re-contact strategy.
9. Number of Contacts in the Current Campaign
More contacts within a single campaign decrease the conversion rate (possible customer fatigue or annoyance).
Recommendation: Avoid repeated contacts in a short timeframe; focus on quality conversations.
10. Multivariate Analysis (Combinations of Factors)
The most powerful combination: mobile contact + previous success + September/October → conversion rates of about 71-78%.
Recommendation: Model and target segments with these combinations for maximum efficiency.

## The most important predictive characteristics:

**1. Previous campaign outcome (poutcome):**
    Previously successful customers have an extremely high conversion rate (65%)
    Even customers previously contacted unsuccessfully have an above-average conversion rate (14%)

**2. Contact method (contact):**
    Mobile phone contact (cellular) has a significantly higher conversion rate (14.7%) than landline phone (5.2%)

**3. Month of contact (month):**
    March, September, October and December have conversion rates roughly 4 to 4.5 times the average (44-51%)
    Summer months have lower conversion rates

**4. Job:**
    Students (31.4%) and retirees (25.2%) have the highest conversion rates
    Blue-collar workers have the lowest conversion rates (6.9%)

**5. Education level (education):**
    People without formal education (22.2%, but only 18 customers) and those with higher education have above-average conversion rates
    
**6. Factor combinations:**
    The combination "mobile phone + previous successful result + month Sep/Oct" has conversion rates of over 70%