# Analyzing a Past Marketing Campaign to Improve Future Strategies

**Data Dictionary**:

| Variable Name | Role    | Type        | Demographic        | Description                                                                                                                                               | Units | Missing Values |
|---------------|---------|-------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|-------|----------------|
| age           | Feature | Integer     | Age                |                                                                                                                                                           |       | no             |
| job           | Feature | Categorical | Occupation         | Type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown') |       | no             |
| marital       | Feature | Categorical | Marital Status     | Marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)                                          |       | no             |
| education     | Feature | Categorical | Education Level    | (Categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')                              |       | no             |
| default       | Feature | Binary      |                    | Has credit in default?                                                                                                                                     |       | no             |
| balance       | Feature | Integer     |                    | Average yearly balance                                                                                                                                     | Euros | no             |
| housing       | Feature | Binary      |                    | Has housing loan?                                                                                                                                          |       | no             |
| loan          | Feature | Binary      |                    | Has personal loan?                                                                                                                                         |       | no             |
| contact       | Feature | Categorical |                    | Contact communication type (categorical: 'cellular','telephone')                                                                                           |       | yes            |
| day           | Feature | Date        |                    | Last contact day of the week                                                                                                                               |       | no             |
| month         | Feature | Date        |                    | Last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')                                                                           |       | no             |
| duration      | Feature | Integer     |                    | Last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model. |       | no             |
| campaign      | Feature | Integer     |                    | Number of contacts performed during this campaign and for this client (numeric, includes last contact)                                                     |       | no             |
| pdays         | Feature | Integer     |                    | Number of days that passed by after the client was last contacted from a previous campaign (numeric; -1 means client was not previously contacted)          |       | yes            |
| previous      | Feature | Integer     |                    | Number of contacts performed before this campaign and for this client                                                                                      |       | no             |
| poutcome      | Feature | Categorical |                    | Outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')                                                                |       | yes            |
| deposit       | Target  | Binary      |                    | Has the client subscribed to a term deposit?                                                                                                              |       |                |

**Note**: The table appears to describe a slightly different version of the dataset than `bank.csv`. In the `df.head()` and `df.dtypes` output below, `education` takes values such as 'secondary' and 'tertiary', `day` is stored as an integer (5 in the sample rows), and `contact` and `poutcome` contain 'unknown'. The "Missing Values" column therefore seems to refer to placeholder codes ('unknown', or -1 for `pdays`), not to NaN.



## Import Libraries and Data

In [19]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('bank.csv')

**Goal**:
- Find the best strategies to improve the next marketing campaign.
- Determine how the financial institution can make future marketing campaigns more effective.

**Context**:
To answer this, we analyze the last marketing campaign the bank performed and look for patterns that can guide future strategies.
The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').

**Questions**:
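1. Is gender relevant? (No gender column appears in the data dictionary or in `df.dtypes`, so this cannot be answered from `bank.csv` as loaded.)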
2. Is education relevant?
3. Are status and stability relevant?

### Exploratory Data Analysis (EDA)
Conduct a thorough EDA to find patterns and correlations between the features and the target variable (`deposit`, the term deposit subscription).

Analyze the following areas:
- Demographics:
  - Age: Identify age groups with the highest conversion rates.
  - Job, marital, education: Segment by profession, education level, and marital status to see if certain groups are more likely to subscribe.
- Contact information:
  - Day of week, month, duration: Explore if specific days or months have higher conversion rates. Is there an ideal time to call clients?
- Campaign performance:
  - Number of contacts (campaign): How does the number of contacts affect the likelihood of conversion?
  - Previous contacts (pdays, previous): Does recontacting a client or a previous campaign's outcome affect the likelihood of a positive response?
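
Most of the questions above reduce to the same operation: group clients by one column and compare the share who subscribed. The sketch below is one possible way to do that with pandas. The function name `conversion_rate_by` and the toy rows are illustrative assumptions, not part of the original analysis; only the column names `job` and `deposit` come from the data.

```python
import pandas as pd


def conversion_rate_by(df: pd.DataFrame, column: str, target: str = "deposit") -> pd.DataFrame:
    """Return client count and share of 'yes' outcomes for each value of `column`."""
    converted = df[target].eq("yes")  # True where the client subscribed
    return (
        converted.groupby(df[column])
        .agg(clients="size", conversion_rate="mean")
        .sort_values("conversion_rate", ascending=False)
    )


# Toy rows, made up purely to show the call; they are not taken from bank.csv.
toy = pd.DataFrame({
    "job": ["admin.", "admin.", "student", "student", "retired"],
    "deposit": ["yes", "no", "yes", "yes", "no"],
})
job_rates = conversion_rate_by(toy, "job")
print(job_rates)
```

In the notebook, the same function could be called as `conversion_rate_by(df, "job")`, `conversion_rate_by(df, "education")`, or `conversion_rate_by(df, "month")`. The `clients` column is worth reading alongside the rate, because a high rate in a very small group may not be reliable.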

In [20]:
df.dtypes

age           int64
job          object
marital      object
education    object
default      object
balance       int64
housing      object
loan         object
contact      object
day           int64
month        object
duration      int64
campaign      int64
pdays         int64
previous      int64
poutcome     object
deposit      object
dtype: object

In [21]:
df.head()

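|   | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit |
|---|-----|-----|---------|-----------|---------|---------|---------|------|---------|-----|-------|----------|----------|-------|----------|----------|---------|
| 0 | 59 | admin. | married | secondary | no | 2343 | yes | no | unknown | 5 | may | 1042 | 1 | -1 | 0 | unknown | yes |
| 1 | 56 | admin. | married | secondary | no | 45 | no | no | unknown | 5 | may | 1467 | 1 | -1 | 0 | unknown | yes |
| 2 | 41 | technician | married | secondary | no | 1270 | yes | no | unknown | 5 | may | 1389 | 1 | -1 | 0 | unknown | yes |
| 3 | 55 | services | married | secondary | no | 2476 | yes | no | unknown | 5 | may | 579 | 1 | -1 | 0 | unknown | yes |
| 4 | 54 | admin. | married | tertiary | no | 184 | no | no | unknown | 5 | may | 673 | 2 | -1 | 0 | unknown | yes |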


In [22]:
df.isnull().sum()

age          0
job          0
marital      0
education    0
default      0
balance      0
housing      0
loan         0
contact      0
day          0
month        0
duration     0
campaign     0
pdays        0
previous     0
poutcome     0
deposit      0
dtype: int64
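
The all-zero result only shows that there are no NaN values. The `df.head()` output contains 'unknown' in `contact` and `poutcome` and -1 in `pdays`, so missing information seems to be encoded as placeholder values instead. A minimal sketch for counting those placeholders follows; the helper name `placeholder_counts` and the toy rows are assumptions for illustration.

```python
import pandas as pd


def placeholder_counts(df: pd.DataFrame) -> pd.Series:
    """Count 'unknown' entries per column, plus pdays == -1 when that column exists."""
    counts = df.isin(["unknown"]).sum()  # numeric columns simply count zero
    if "pdays" in df.columns:
        counts["pdays (-1)"] = int(df["pdays"].eq(-1).sum())
    return counts[counts > 0]


# Toy rows for demonstration only; they mimic the shape of bank.csv but are made up.
toy = pd.DataFrame({
    "age": [59, 41, 33],
    "contact": ["unknown", "cellular", "unknown"],
    "poutcome": ["unknown", "success", "unknown"],
    "pdays": [-1, 90, -1],
})
hidden = placeholder_counts(toy)
print(hidden)
```

Running `placeholder_counts(df)` on the full data would show how much of each column is effectively missing, which matters before treating 'unknown' as an ordinary category.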

### Understand Key Metrics for Effectiveness 
Before diving into the analysis, define what "effectiveness" means in this context. For a marketing campaign, you likely want to improve:
- Conversion Rate: Percentage of clients who subscribed (`deposit` = 'yes').
- Client Retention: How many existing clients remained engaged or converted.
- Number of Contacts: How many calls were necessary to get a conversion.
- Response to Contact Methods: Analyze which contact methods were more successful (e.g., cellular vs. telephone).
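
Some of these metrics can be computed directly from the `deposit` and `campaign` columns. The sketch below is one possible definition, not the only one; the function name `campaign_metrics`, the metric names, and the toy rows are assumptions. Client retention is left out because it is not obvious which column would measure it.

```python
import pandas as pd


def campaign_metrics(df: pd.DataFrame, target: str = "deposit") -> dict:
    """Summarize conversion rate and contact effort for one campaign."""
    converted = df[target].eq("yes")
    return {
        # Share of clients who subscribed.
        "conversion_rate": float(converted.mean()),
        # Average number of calls made to clients who did / did not subscribe.
        "avg_contacts_converted": float(df.loc[converted, "campaign"].mean()),
        "avg_contacts_not_converted": float(df.loc[~converted, "campaign"].mean()),
        # Total calls made divided by total subscriptions obtained.
        "calls_per_conversion": float(df["campaign"].sum() / converted.sum()),
    }


# Toy rows, made up for demonstration; not taken from bank.csv.
toy = pd.DataFrame({
    "campaign": [1, 2, 3, 4],
    "deposit": ["yes", "no", "yes", "no"],
})
metrics = campaign_metrics(toy)
print(metrics)
```

Fixing the definitions up front makes later comparisons between segments consistent, since each segment can be passed through the same function.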

### Segment Analysis
Segment your clients to better understand where the most effective conversions are happening:
- Customer segments based on demographic factors (age, job, marital status, education).
- Engagement patterns based on previous interaction (e.g., number of contacts, days since last contact).
- Contact method preferences: cellular vs. telephone.
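
Demographic segments often combine more than one attribute. As a sketch, the code below bins `age` into groups and crosses them with `marital`. The bin edges, the labels, and the function name `segment_conversion` are hypothetical choices made for illustration; other cut points may suit the data better.

```python
import pandas as pd

# Hypothetical age bands; each bin includes its left edge and excludes its right edge.
AGE_BINS = [0, 30, 45, 60, 120]
AGE_LABELS = ["<30", "30-44", "45-59", "60+"]


def segment_conversion(df: pd.DataFrame, target: str = "deposit") -> pd.DataFrame:
    """Client count and conversion rate per (age group, marital status) segment."""
    age_group = pd.cut(df["age"], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    converted = df[target].eq("yes")
    return (
        converted.groupby([age_group, df["marital"]], observed=True)
        .agg(clients="size", conversion_rate="mean")
    )


# Toy rows, made up for demonstration; not taken from bank.csv.
toy = pd.DataFrame({
    "age": [25, 28, 41, 59, 65],
    "marital": ["single", "single", "married", "married", "married"],
    "deposit": ["yes", "no", "yes", "no", "yes"],
})
segments = segment_conversion(toy)
print(segments)
```

Passing `observed=True` keeps only the segments that actually occur, which avoids empty rows for age and marital combinations with no clients.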

### Conversion and Campaign Efficiency
Use the target variable `deposit` to investigate:
- What patterns exist for those who subscribed?
- Which demographic segments and contact strategies resulted in higher success rates?
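
One way to approach both questions is to compare the numeric profile of subscribers and non-subscribers, and to tabulate the outcome share for each value of a categorical column. The sketch below assumes the target column is `deposit`; the function names and toy rows are illustrative only.

```python
import pandas as pd


def subscriber_profile(df: pd.DataFrame, target: str = "deposit") -> pd.DataFrame:
    """Mean of every numeric column, split by outcome ('yes' / 'no')."""
    return df.groupby(target).mean(numeric_only=True)


def outcome_share(df: pd.DataFrame, column: str, target: str = "deposit") -> pd.DataFrame:
    """Row-normalized cross-tabulation: each row of the result sums to 1."""
    return pd.crosstab(df[column], df[target], normalize="index")


# Toy rows, made up for demonstration; not taken from bank.csv.
toy = pd.DataFrame({
    "contact": ["cellular", "cellular", "telephone", "telephone"],
    "balance": [100, 300, 200, 400],
    "deposit": ["yes", "yes", "yes", "no"],
})
profile = subscriber_profile(toy)
share = outcome_share(toy, "contact")
print(profile)
print(share)
```

On the real data, `outcome_share(df, "poutcome")` or `outcome_share(df, "contact")` would show how the previous campaign outcome or the contact method relates to subscription.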

### Feature Importance and Predictive Modeling
Apply classification techniques (e.g., logistic regression, decision trees, random forest) to understand which features are most predictive of a subscription:
- Feature importance: Identify the top features that predict whether a client will subscribe. This could be client demographics, number of contacts, or the outcome of previous campaigns.
- Churn/No response analysis: Analyze the features that are related to a 'no' response to help target less responsive clients more efficiently.
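
As a rough sketch of the feature-importance step, the code below one-hot encodes the categorical columns and fits a scikit-learn random forest. It drops `duration` by default, following the data dictionary's warning that call duration is not known before the call. The function name `rank_features`, the hyperparameters, and the toy rows are assumptions, and impurity-based importances are only one of several possible importance measures.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def rank_features(df: pd.DataFrame, target: str = "deposit", drop=("duration",)) -> pd.Series:
    """Fit a random forest and return impurity-based importances, largest first."""
    features = df.drop(columns=[target, *drop], errors="ignore")
    X = pd.get_dummies(features)            # one-hot encode text columns
    y = df[target].eq("yes").astype(int)    # 1 = subscribed, 0 = did not
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)


# Toy rows, made up so that `poutcome` separates the classes; not taken from bank.csv.
toy = pd.DataFrame({
    "age": [30, 40, 30, 40, 30, 40, 30, 40],
    "poutcome": ["success", "success", "failure", "failure"] * 2,
    "duration": [500, 600, 100, 120, 550, 620, 90, 110],
    "deposit": ["yes", "yes", "no", "no"] * 2,
})
importances = rank_features(toy)
print(importances)
```

For a real model, the data should be split into training and test sets and the model evaluated before the importances are interpreted; this sketch skips that step to stay short.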

### Optimize Contact Strategy
- Duration of contact: How long should calls last? Is there an optimal call length for success?
- Number of contacts: Determine the ideal number of contacts that maximizes the likelihood of conversion without diminishing returns.
- Timing: Analyze if contacting clients at specific times, days, or months leads to better conversion rates.
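
The contact-count and timing questions can be explored with grouped conversion rates. In the sketch below, contact counts above a cap are pooled so that rare, very high counts do not produce noisy groups, and months are shown in calendar order. The cap of 5, the function names, and the toy rows are illustrative assumptions. Call duration is left out because, as the data dictionary notes, it is only known after the call.

```python
import pandas as pd

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def conversion_by_contacts(df: pd.DataFrame, target: str = "deposit", cap: int = 5) -> pd.DataFrame:
    """Conversion rate per number of contacts; counts above `cap` are pooled at `cap`."""
    contacts = df["campaign"].clip(upper=cap)
    converted = df[target].eq("yes")
    return converted.groupby(contacts).agg(clients="size", conversion_rate="mean")


def conversion_by_month(df: pd.DataFrame, target: str = "deposit") -> pd.Series:
    """Conversion rate per month, ordered jan..dec, skipping months with no rows."""
    rates = df[target].eq("yes").groupby(df["month"]).mean()
    return rates.reindex([m for m in MONTHS if m in rates.index])


# Toy rows, made up for demonstration; not taken from bank.csv.
toy = pd.DataFrame({
    "campaign": [1, 1, 2, 7, 9],
    "month": ["may", "may", "jan", "jan", "dec"],
    "deposit": ["yes", "no", "yes", "no", "no"],
})
by_contacts = conversion_by_contacts(toy)
by_month = conversion_by_month(toy)
print(by_contacts)
print(by_month)
```

If the conversion rate in `by_contacts` flattens or drops after a certain number of calls, that point is a candidate limit for contacts per client.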

### Key Insights and Recommendations
After analyzing the data:
- Target high-potential groups: Identify and focus on client segments more likely to subscribe.
- Optimize the number of contacts: Balance between too many contacts and too few, aiming for maximum effectiveness.
- Improve messaging strategies: Personalize campaigns based on client characteristics and past interactions.
- Test new approaches: Based on your findings, propose changes for the next campaign and run A/B tests to validate them.
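
To judge an A/B test between the current approach and a proposed change, one common option is a two-proportion z-test on the conversion rates of the two groups. The sketch below uses only the standard library. The function name and the counts in the example call are hypothetical, and the normal approximation assumes reasonably large groups.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 40 of 100 clients converted under A, 60 of 100 under B.
z, p_value = two_proportion_z_test(40, 100, 60, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be due to chance alone, but the group sizes should be decided before the test starts.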