# Analyzing Bank Marketing Data
### This notebook walks through an analysis of a bank marketing dataset, covering four metrics: the term-deposit subscription rate, the distribution of job types, the average call duration, and the share of calls made in each month. Each step pairs a short explanation with the corresponding Python code.

# Importing necessary libraries for data analysis and visualization

### `import pandas as pd`: for data manipulation and analysis
### `import numpy as np`: for numerical operations on arrays
### `import matplotlib.pyplot as plt`: for creating static, animated, and interactive plots
### `import seaborn as sn`: for statistical data visualization built on top of Matplotlib


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn

### Loading data into a pandas DataFrame (generic example; 'data.csv' is a placeholder file name)
df = pd.read_csv('data.csv')

In [3]:
df = pd.read_csv(r"C:\Users\sirmu\OneDrive\Desktop\my notebook\bank statement.csv")

##### `df` holds the dataset as a pandas DataFrame, a labeled two-dimensional table that makes large datasets efficient to manipulate and analyze. From here you can access, filter, group, sort, and save the data as needed.
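The operations mentioned above can be sketched on a tiny, hypothetical DataFrame. The values and the `sample_out.csv` file name below are illustrative assumptions, not part of the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the bank data; values are illustrative only.
sample = pd.DataFrame({
    "age": [56, 57, 37, 40],
    "job": ["housemaid", "services", "services", "admin."],
    "y": ["no", "no", "yes", "no"],
})

# Access: select a single column as a Series
ages = sample["age"]

# Filter: keep only rows that satisfy a boolean condition
services_only = sample[sample["job"] == "services"]

# Group: aggregate one column per category of another
mean_age_by_job = sample.groupby("job")["age"].mean()

# Sort: order rows by a column, largest first
oldest_first = sample.sort_values("age", ascending=False)

# Save: write the table to disk (placeholder file name, left commented out)
# sample.to_csv("sample_out.csv", index=False)
```

The same calls apply unchanged to the full `df`, since they depend only on column names.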

In [4]:
df

Unnamed: 0,age,job,marital,education,default,housing,loan,contact,month,day_of_week,...,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y
0,56,housemaid,married,Lower Basic,no,no,no,telephone,may,Monday,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
1,57,services,married,High School,unknown,no,no,telephone,may,Monday,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
2,37,services,married,High School,no,yes,no,telephone,may,Monday,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
3,40,admin.,married,Intermediate Basic,no,no,no,telephone,may,Monday,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
4,56,services,married,High School,no,no,yes,telephone,may,Monday,...,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41183,73,retired,married,professional.course,no,yes,no,cellular,nov,Friday,...,1,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6,yes
41184,46,blue-collar,married,professional.course,no,no,no,cellular,nov,Friday,...,1,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6,no
41185,56,retired,married,university.degree,no,yes,no,cellular,nov,Friday,...,2,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6,no
41186,44,technician,married,professional.course,no,no,no,cellular,nov,Friday,...,1,999,0,nonexistent,-1.1,94.767,-50.8,1.028,4963.6,yes


# 1. Proportion of People Who Subscribed to a Term Deposit
### The goal is to calculate the proportion of people who subscribed to a term deposit (y). Using the value_counts(normalize=True) method, we normalize the counts to display proportions instead of raw numbers. This helps in understanding the relative frequency of each category (e.g., "yes" or "no").



In [13]:
y_proportion = df["y"].value_counts(normalize=True)

In [14]:
y_proportion

y
no     0.887346
yes    0.112654
Name: proportion, dtype: float64

##### Interpretation:
##### Proportion of 'no': 0.887346, so approximately 88.7% of the records did not result in a subscription.
##### Proportion of 'yes': 0.112654, so approximately 11.3% of the records resulted in a subscription. The two classes are therefore heavily imbalanced, at roughly 8 to 1.
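To make the effect of `normalize=True` concrete, here is a minimal sketch on a hypothetical outcome column (8 "no" and 2 "yes", chosen only for illustration). It shows that the normalized result matches dividing each count by the total, and how to turn the proportions into rounded percentages.

```python
import pandas as pd

# Hypothetical outcome column: 8 "no" and 2 "yes" (not the real data)
y = pd.Series(["no"] * 8 + ["yes"] * 2, name="y")

counts = y.value_counts()                     # raw counts per category
proportions = y.value_counts(normalize=True)  # counts divided by the total

# Manual equivalent of normalize=True
manual = counts / counts.sum()

# Proportions expressed as percentages, rounded to one decimal place
percentages = (proportions * 100).round(1)
```

Multiplying by 100 and rounding is only a presentation choice; keep the unrounded proportions for any further calculation.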

# 2. Distribution of Job Types

### To determine the distribution of job types, the value_counts() method is applied to the job column. This provides a count of occurrences for each unique job type, giving insights into the composition of customers' occupations.

In [7]:
job_distribution = df["job"].value_counts()

In [8]:
job_distribution

job
admin.           10422
blue-collar       9254
technician        6743
services          3969
management        2924
retired           1720
entrepreneur      1456
self-employed     1421
housemaid         1060
unemployed        1014
student            875
unknown            330
Name: count, dtype: int64

##### The output lists the number of records for each job category, sorted from most to least frequent. 'admin.' is the largest category (10,422 records), followed by 'blue-collar' (9,254) and 'technician' (6,743); together these three account for about 64% of the 41,188 records. 'unknown' is the smallest category (330). This breakdown shows which occupations dominate the customer base and which are too sparsely represented to support strong conclusions on their own.
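Raw counts are often easier to compare when shown next to their share of the total. The following is a minimal sketch on a hypothetical `job` column; the category sizes are made up for illustration.

```python
import pandas as pd

# Hypothetical job column (not the real data)
jobs = pd.Series(
    ["admin."] * 5 + ["blue-collar"] * 3 + ["technician"] * 2,
    name="job",
)

# Start from the raw counts, then add the percentage share as a second column.
# Column assignment aligns on the index (the job labels), so rows stay matched.
summary = jobs.value_counts().to_frame("count")
summary["percent"] = (jobs.value_counts(normalize=True) * 100).round(1)
```

With the real data, replacing `jobs` by `df["job"]` should produce the same two-column layout.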

# 3. Average Duration of Calls
### The average call duration is computed using the mean() method on the duration column. This metric helps in understanding how long, on average, the calls last.


In [9]:
average_duration = df["duration"].mean()

In [10]:
average_duration

np.float64(258.2850101971448)

##### The mean call duration is approximately 258.29. Assuming `duration` is recorded in seconds, as it is in the public UCI Bank Marketing dataset that this file appears to be based on, that corresponds to about 4 minutes 18 seconds per call. The `np.float64(...)` wrapper in the output only indicates that pandas returns the mean as a NumPy double-precision float; it does not change the interpretation.
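A mean can be pulled upward by a few very long calls, so it is often reported alongside the median. The sketch below uses hypothetical durations and assumes they are measured in seconds, which the notebook itself does not state.

```python
import pandas as pd

# Hypothetical call durations, assumed to be in seconds (not the real data)
durations = pd.Series([60, 120, 180, 240, 900], name="duration")

mean_seconds = durations.mean()      # arithmetic mean, sensitive to the 900 s call
median_seconds = durations.median()  # middle value, robust to that outlier

# Convert to minutes for readability (valid only if the unit is seconds)
mean_minutes = mean_seconds / 60

# Plain Python float, rounded, for display without the np.float64 wrapper
mean_display = round(float(mean_seconds), 1)
```

In this toy example the mean (300 s) is well above the median (180 s), which is the pattern to look for when a few long calls skew the average.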

# 4. Proportion of Calls Made Each Month

### To find the proportion of calls made in each month, the value_counts(normalize=True) method is applied to the month column. This normalizes the data to show the relative frequency of calls for each month, aiding in identifying seasonal trends.


In [11]:
month_proportion = df["month"].value_counts(normalize=True)

In [12]:
month_proportion

month
may    0.334296
jul    0.174177
aug    0.149995
jun    0.129115
nov    0.099568
apr    0.063902
oct    0.017432
sep    0.013839
mar    0.013256
dec    0.004419
Name: proportion, dtype: float64

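##### The output shows the share of calls made in each month, sorted from highest to lowest. May has the highest share at 33.43%, while December has the lowest at 0.44%. Only ten months appear; January and February are absent from the data. Calls are concentrated in May through August, which together account for roughly 79% of the total, so any seasonal comparison should take the very small sample sizes in March, September, October, and December into account.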
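Because `value_counts` sorts by frequency, the months come out in order of size rather than calendar order. A minimal sketch of reordering them chronologically follows; the `month` values are hypothetical and assume the same three-letter lowercase abbreviations as the dataset.

```python
import pandas as pd

# Hypothetical month column (not the real data)
months = pd.Series(
    ["may"] * 4 + ["jul"] * 3 + ["mar"] * 2 + ["dec"] * 1,
    name="month",
)
proportions = months.value_counts(normalize=True)

calendar_order = ["jan", "feb", "mar", "apr", "may", "jun",
                  "jul", "aug", "sep", "oct", "nov", "dec"]

# Keep only the months that actually occur, in calendar order,
# so absent months do not show up as NaN rows
present = [m for m in calendar_order if m in proportions.index]
by_calendar = proportions.reindex(present)
```

Listing the months chronologically makes seasonal patterns easier to read than the frequency-sorted default.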