# Note 1
Make sure you are also using a predictive approach.

A predictive finding might include:

- How well your model is able to predict the target
- Which features are most important to your model

A predictive recommendation might include:

- The contexts/situations where the predictions made by your model would and would not be useful for your stakeholder and business problem
- Suggestions for how the business might modify certain input variables to achieve certain target results
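
The two predictive findings above can be illustrated with a short sketch, assuming scikit-learn. It fits a logistic regression on synthetic data from `make_classification` (a stand-in, not the bank's data), reports ROC-AUC on a held-out test set, and ranks features by the absolute value of their standardized coefficients. The feature names (`f0`, `f1`, ...) are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 5 features, binary target
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale features first so coefficient magnitudes are comparable
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Finding 1: how well the model predicts the target on unseen data
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC-AUC: {test_auc:.3f}")

# Finding 2: which features matter most, by |standardized coefficient|
coefs = model.named_steps["logisticregression"].coef_[0]
importance = (pd.Series(coefs, index=feature_names)
              .abs()
              .sort_values(ascending=False))
print(importance)
```

Coefficient magnitude is only one way to rank features; tree-based models expose `feature_importances_` instead.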

# Note 2
You must build multiple models. Begin with a basic model, evaluate it, and then explain why a new model is warranted before building it. After you finish refining your models, provide 1-3 paragraphs in the notebook discussing your final model.
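
One way to follow this workflow is to score the baseline and each later candidate with the same cross-validation folds and metric, so the comparison itself becomes the justification for the next model. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` in place of the bank's dataset; the two candidates (logistic regression and a depth-limited decision tree) and the ROC-AUC metric are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), not the bank's dataset
X, y = make_classification(n_samples=600, n_features=8, random_state=42)

candidates = {
    "baseline_logreg": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

# Evaluate every model with the same 5-fold CV and the same metric
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name}: mean ROC-AUC = {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(results, key=results.get)
print("Best so far:", best)
```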

# Note 3
With the additional techniques you have learned in Phase 3, be sure to explore:

- Model features and preprocessing approaches
- Different kinds of models (logistic regression, decision trees, etc.)
- Different model hyperparameters

# Note 4
At minimum, you must build two models:

- A simple, interpretable baseline model (logistic regression or a single decision tree)
- A version of the simple model with tuned hyperparameters
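
A minimal sketch of the two required models, assuming scikit-learn: a single decision tree with default settings as the interpretable baseline, and the same model type tuned with `GridSearchCV`. The data is synthetic (`make_classification`), and the grid values for `max_depth` and `min_samples_leaf` are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), not the bank's dataset
X, y = make_classification(n_samples=800, n_features=10, n_informative=4,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Model 1: simple, interpretable baseline with default hyperparameters
baseline = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

# Model 2: the same model type with tuned hyperparameters
param_grid = {
    "max_depth": [2, 3, 4, 6, 8],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)  # refits the best tree on all training data

# Compare both models on the same held-out test set and metric
baseline_auc = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
tuned_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("Best hyperparameters:", search.best_params_)
print(f"Baseline test ROC-AUC: {baseline_auc:.3f}")
print(f"Tuned test ROC-AUC:    {tuned_auc:.3f}")
```

The tuned tree is not guaranteed to beat the baseline; if it does not, that comparison is still worth reporting.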


## Business Problem
A bank has provided data from its marketing campaign aimed at encouraging customers to opt into its insurance coverage. The goal of this project is to analyze the data and present findings to a non-technical team, enabling them to make data-driven decisions that improve the effectiveness of the bank's insurance marketing strategies.


# 1. Business Understanding
The bank is running a marketing campaign to promote its insurance products. The primary objective is to encourage existing and potential customers to enroll in these insurance plans. Understanding customer behavior and identifying the factors that influence their decision to opt into insurance coverage are critical to the success of the campaign.

The main goal of the campaign is to maximize the number of customers who purchase the bank's insurance products. By analyzing customer data, the bank aims to identify the customer segments most likely to respond positively to the campaign.

The project will provide clear recommendations that the bank's non-technical team can use to improve their insurance marketing campaign, leading to more customer enrollments and better resource management.

# 2. Data Understanding


```python
# Import the necessary libraries
import pandas as pd
```

```python
# Load the dataset into a dataframe
df = pd.read_csv('dataset.csv')
df
```

|  | occupation | age | education_level | marital_status | communication_channel | call_month | call_day | call_duration | call_frequency | previous_campaign_outcome | conversion_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | administrative_staff | 28 | high_school | married | unidentified | September | 9 | 1 | 1 | successful | not_converted |
| 1 | administrative_staff | 58 | unidentified | married | unidentified | June | 5 | 307 | 2 | unidentified | not_converted |
| 2 | jobless | 40 | high_school | divorced | mobile | February | 4 | 113 | 1 | unidentified | not_converted |
| 3 | retired_worker | 63 | high_school | married | mobile | April | 7 | 72 | 5 | unidentified | not_converted |
| 4 | business_owner | 43 | college | married | landline | July | 29 | 184 | 4 | unidentified | not_converted |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45206 | administrative_staff | 50 | high_school | divorced | mobile | May | 6 | 114 | 1 | unsuccessful | not_converted |
| 45207 | independent_worker | 49 | college | married | unidentified | May | 13 | 98 | 1 | unidentified | not_converted |
| 45208 | executive | 30 | college | married | mobile | June | 12 | 175 | 2 | other_outcome | not_converted |
| 45209 | retired_worker | 59 | elementary_school | married | landline | July | 15 | 41 | 5 | unidentified | not_converted |
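
The preview suggests that `unidentified` stands in for missing values in several categorical columns (`education_level`, `communication_channel`, `previous_campaign_outcome`); because these are ordinary strings, `df.isna()` would not count them. The sketch below measures the share of that token per column. It is demonstrated on the nine preview rows above (a subset of columns), so the resulting shares say nothing about the full dataset; on the full dataframe the call would be `placeholder_share(df)`. The function name is hypothetical.

```python
import pandas as pd

def placeholder_share(frame: pd.DataFrame, token: str = "unidentified") -> pd.Series:
    """Fraction of rows in each column equal to `token`, largest first."""
    # Comparing numeric columns to a string simply yields False
    return (frame == token).mean().sort_values(ascending=False)

# The nine rows shown in the preview above (subset of columns)
preview = pd.DataFrame({
    "age": [28, 58, 40, 63, 43, 50, 49, 30, 59],
    "education_level": ["high_school", "unidentified", "high_school",
                        "high_school", "college", "high_school", "college",
                        "college", "elementary_school"],
    "communication_channel": ["unidentified", "unidentified", "mobile",
                              "mobile", "landline", "mobile", "unidentified",
                              "mobile", "landline"],
    "previous_campaign_outcome": ["successful", "unidentified", "unidentified",
                                  "unidentified", "unidentified", "unsuccessful",
                                  "unidentified", "other_outcome",
                                  "unidentified"],
})

shares = placeholder_share(preview)
print(shares)
```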


```python
# Print the column names in the dataframe
df.columns
```

```
Index(['occupation', 'age', 'education_level', 'marital_status',
       'communication_channel', 'call_month', 'call_day', 'call_duration',
       'call_frequency', 'previous_campaign_outcome', 'conversion_status'],
      dtype='object')
```
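
The column listing shows ten predictors and one target, `conversion_status`. Below is a minimal sketch of separating them and grouping the predictors by dtype, which is what most preprocessing steps need. It assumes every target value other than `not_converted` counts as a conversion (the preview only shows `not_converted`, so the positive label's exact spelling is not confirmed here). It is demonstrated on the first two preview rows; the helper name `split_features` is hypothetical.

```python
import pandas as pd

TARGET = "conversion_status"

def split_features(frame: pd.DataFrame, target: str = TARGET):
    """Separate predictors from the target and group predictors by dtype."""
    X = frame.drop(columns=target)
    # Assumption: any value other than 'not_converted' is a conversion (1)
    y = (frame[target] != "not_converted").astype(int)
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    return X, y, numeric_cols, categorical_cols

columns = ["occupation", "age", "education_level", "marital_status",
           "communication_channel", "call_month", "call_day", "call_duration",
           "call_frequency", "previous_campaign_outcome", "conversion_status"]
rows = [  # first two rows of the preview
    ["administrative_staff", 28, "high_school", "married", "unidentified",
     "September", 9, 1, 1, "successful", "not_converted"],
    ["administrative_staff", 58, "unidentified", "married", "unidentified",
     "June", 5, 307, 2, "unidentified", "not_converted"],
]
preview = pd.DataFrame(rows, columns=columns)

X, y, numeric_cols, categorical_cols = split_features(preview)
print(numeric_cols)
print(categorical_cols)
```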

```python
# Print the dimensions of the dataframe
df.shape
```

```
(45211, 11)
```

```python
# Brief summary of the numeric columns in the dataframe
df.describe()
```

|  | age | call_day | call_duration | call_frequency |
|---|---|---|---|---|
| count | 45211.0 | 45211.0 | 45211.0 | 45211.0 |
| mean | 40.93621 | 15.806419 | 258.16308 | 2.763841 |
| std | 10.618762 | 8.322476 | 257.527812 | 3.098021 |
| min | 18.0 | 1.0 | 0.0 | 1.0 |
| 25% | 33.0 | 8.0 | 103.0 | 1.0 |
| 50% | 39.0 | 16.0 | 180.0 | 2.0 |
| 75% | 48.0 | 21.0 | 319.0 | 3.0 |
| max | 95.0 | 31.0 | 4918.0 | 63.0 |
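
In the summary, `call_duration` has a mean (258.2) well above its median (180.0) and a maximum of 4918.0, which points to a long right tail. Tukey's IQR rule gives a quick way to quantify this from the reported quartiles alone: values outside Q1 - 1.5·IQR and Q3 + 1.5·IQR are flagged. The sketch below applies it to the quartiles copied from the `df.describe()` output; the helper name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple:
    """Return Tukey's (lower, upper) outlier fences: Q1 - k*IQR, Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# 25% and 75% quartiles copied from the df.describe() output
quartiles = {
    "age": (33.0, 48.0),
    "call_duration": (103.0, 319.0),
    "call_frequency": (1.0, 3.0),
}

for column, (q1, q3) in quartiles.items():
    low, high = iqr_fences(q1, q3)
    print(f"{column}: flag values below {low} or above {high}")
```

This yields upper fences of 70.5 for `age`, 643.0 for `call_duration`, and 6.0 for `call_frequency`. Each reported maximum (95.0, 4918.0, 63.0) exceeds its fence, so all three columns contain at least one flagged value; whether to cap, transform, or keep them is a modeling choice.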
