## Purpose of this kernel: visual data analysis of a banking dataset

1.  Visualize a banking dataset with the Matplotlib, Seaborn, and Plotly libraries.
2.  Visually analyze single features.
3.  Visually analyze feature interactions.
4.  Perform interactive visual analysis.


Import the libraries needed for this lab.
We also set a default figure size for later plots and suppress warnings.

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
plt.rcParams["figure.figsize"] = (8, 6)

import warnings
warnings.filterwarnings('ignore')

import plotly
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot
init_notebook_mode(connected=True)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Set the `display.precision` option to 2 so that floats are displayed with two decimal places (instead of the default 6).

In [3]:
pd.set_option("display.precision", 2)  # full option name; the bare "precision" alias can fail on recent pandas
pd.options.display.float_format = '{:.2f}'.format
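As a minimal sketch of what these display settings do, the snippet below renders a small, made-up frame (the `balance` values are hypothetical, not taken from the banking data) with and without the options. It uses `pd.option_context` so the global settings are left untouched.

```python
import pandas as pd

# Hypothetical toy frame, not the banking dataset.
df = pd.DataFrame({"balance": [1234.56789, 0.5, -3.14159]})

# Rendering with the current settings.
default_view = df.to_string()

# Temporarily limit floats to two decimal places via display.precision.
with pd.option_context("display.precision", 2):
    precision_view = df.to_string()

# Temporarily apply an explicit format string to every float instead.
with pd.option_context("display.float_format", "{:.2f}".format):
    formatted_view = df.to_string()

print(precision_view)
print(formatted_view)
```

Only the rendering changes; the values stored in the frame keep their full precision.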

## Load the Dataset


In this section you will load the source dataset.

In [4]:
train = pd.read_csv("/kaggle/input/banking-dataset-marketing-targets/train.csv", sep = ';')
test = pd.read_csv("/kaggle/input/banking-dataset-marketing-targets/test.csv", sep = ';')

## Attribute Information


In [5]:
train.head()

In [6]:
train.shape , test.shape

In [7]:
train.isna().sum() , test.isna().sum()

Use the `map` function to replace the values in the target column `y` ("no" becomes 0, "yes" becomes 1).

In [8]:
d = {"no": 0, "yes": 1}
train["y"] = train["y"].map(d)
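One detail worth knowing about `map` with a dictionary: any label that is not a key of the dictionary becomes `NaN`. The sketch below shows this on a made-up series (the `"unknown"` label is a hypothetical value added for illustration).

```python
import pandas as pd

# Hypothetical target values; "unknown" is not a key of the mapping.
target = pd.Series(["no", "yes", "no", "unknown"])
mapping = {"no": 0, "yes": 1}

encoded = target.map(mapping)

# Labels missing from the dictionary turn into NaN.
n_unmapped = int(encoded.isna().sum())

print(encoded.tolist())
print("unmapped labels:", n_unmapped)
```

For the same reason, running the mapping cell a second time on an already encoded column would turn every value into `NaN`, because 0 and 1 are not keys of the dictionary.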


In [9]:
train.columns

## Visual data analysis for a single feature

For each feature, you can build a separate histogram with the `hist` function.

Use a histogram and a box plot to analyze numerical features.

In [10]:
train["age"].hist()

The histogram shows that most of our clients are between the ages of 25 and 50.
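A reading like this can be cross-checked numerically. The sketch below uses a short, made-up list of ages (hypothetical values, not the real `age` column) to compute the share of clients in a range and the bin counts that a 10-bin histogram would draw.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, used only to illustrate the check.
ages = pd.Series([22, 27, 31, 35, 38, 42, 45, 49, 55, 63])

# Share of clients aged 25 to 50 (both bounds inclusive).
share_25_50 = ages.between(25, 50).mean()

# Counts per bin for 10 equal-width bins.
counts, edges = np.histogram(ages, bins=10)

print(f"share aged 25-50: {share_25_50:.0%}")
print(counts)
```

On the real data, the same check would be `train["age"].between(25, 50).mean()`.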

In [26]:
sns.boxplot(x=train["duration"])  # pass the column by keyword; positional use differs across seaborn versions

Use a count plot (`countplot`) to analyze categorical features.

In [27]:
sns.countplot(x=train["y"]);

In [29]:
sns.countplot(x=train["marital"])

## Visual analysis for the feature interaction

Let's plot the average client age by marital status.

In [11]:
train[["age", "marital"]].groupby("marital").mean().plot();

The plot shows that the average age of single clients is noticeably lower than that of the other clients.
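The numbers behind such a plot come from a plain `groupby`. Here is a minimal sketch on a made-up frame (the rows are hypothetical); adding the median and the group size helps judge whether a difference in means is driven by a few clients.

```python
import pandas as pd

# Hypothetical clients, used only to illustrate the aggregation.
df = pd.DataFrame({
    "marital": ["single", "single", "married", "married", "divorced"],
    "age": [28, 32, 41, 45, 50],
})

# Mean, median, and group size of age per marital status.
summary = df.groupby("marital")["age"].agg(["mean", "median", "count"])

# Marital status with the lowest average age.
youngest = summary["mean"].idxmin()

print(summary.sort_values("mean"))
print("lowest average age:", youngest)
```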

Let's change the plot type, for example, to a bar chart. 

In [12]:
train[["age", "marital"]].groupby(
    "marital"
).mean().plot(kind="bar", rot=45);

A pair plot (scatter plot matrix) shows the pairwise relationships between several features in a single figure.

In [13]:
sns.pairplot(train[["age", "duration", "campaign"]]);

This visualization suggests an inverse relationship between `campaign` and `duration`.
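A visual impression like this is worth confirming with a correlation coefficient. The sketch below uses made-up `campaign` and `duration` values (hypothetical, chosen so that one falls as the other rises) and computes the Pearson correlation with pandas.

```python
import pandas as pd

# Hypothetical values, used only to illustrate the check.
df = pd.DataFrame({
    "campaign": [1, 1, 2, 3, 5, 8],
    "duration": [600, 450, 300, 320, 120, 60],
})

# Pearson correlation between the two columns; a negative value
# indicates an inverse (linear) relationship.
corr = df["campaign"].corr(df["duration"])

# Full correlation matrix, the numeric counterpart of a pair plot.
corr_matrix = df.corr()

print(f"campaign vs duration: {corr:.2f}")
print(corr_matrix)
```

On the real data, `train[["age", "duration", "campaign"]].corr()` would give the corresponding matrix.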

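Seaborn can also plot the distribution of a single feature, for example client age.
Here the graph shows a histogram together with a kernel density estimate (KDE).

In [14]:
# distplot is deprecated in recent seaborn; histplot with kde=True draws a comparable plot
sns.histplot(train.age, kde=True, stat="density");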

Use `jointplot` to explore the relationship between two numerical features; it combines a scatter plot with histograms.

In [15]:
sns.jointplot(x="age", y="duration", data=train, kind="scatter")

Use a box plot to compare client ages across the five most common job types.

In [16]:
top_jobs = train.job.value_counts().head(5).index  # value_counts already sorts in descending order
sns.boxplot(y="job", x="age", data=train[train.job.isin(top_jobs)], orient="h")

The plot shows that among the top five job categories, management has the oldest clients, while admin. and technician have the largest number of outliers.
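The outliers in a box plot are, by default, the points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. As a rough sketch, they can be counted per group as shown below; the frame and the helper `count_outliers` are hypothetical and only illustrate the rule.

```python
import pandas as pd

# Hypothetical ages for two made-up job groups.
df = pd.DataFrame({
    "job": ["a"] * 6 + ["b"] * 6,
    "age": [30, 31, 32, 33, 34, 80, 40, 41, 42, 43, 44, 45],
})

def count_outliers(values: pd.Series) -> int:
    """Count points outside the 1.5 * IQR whiskers (hypothetical helper)."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    return int(mask.sum())

# Number of outliers and median age per job group.
outliers = df.groupby("job")["age"].apply(count_outliers)
medians = df.groupby("job")["age"].median()

print(outliers)
print(medians)
```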

A heat map shows how a numerical value is distributed across two categorical features. Here we visualize the number of attracted clients by marital status and job type.

In [17]:
job_marital_y = train.pivot_table(index="job", columns="marital", values="y", aggfunc="sum")  # "sum" as a string avoids a warning on recent pandas
sns.heatmap(job_marital_y, annot=True, fmt="d", linewidths=0.5);

The plot shows that among administrative workers, married clients account for the largest number of attracted clients (681), while clients with an unknown marital status account for the smallest number.
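The table behind a heat map like this is an ordinary pivot table. The sketch below builds one from a made-up frame (the rows are hypothetical) and reads a single cell from it, which is how a specific number on the heat map can be looked up.

```python
import pandas as pd

# Hypothetical clients; y is 1 for an attracted client, 0 otherwise.
df = pd.DataFrame({
    "job": ["admin.", "admin.", "admin.", "technician", "technician", "technician"],
    "marital": ["married", "married", "single", "married", "single", "single"],
    "y": [1, 1, 0, 0, 1, 0],
})

# Sum of y per (job, marital) pair; fill_value=0 replaces empty combinations with 0.
table = df.pivot_table(index="job", columns="marital", values="y",
                       aggfunc="sum", fill_value=0)

# Look up one cell and the best marital status per job.
admin_married = table.loc["admin.", "married"]
best_per_job = table.idxmax(axis=1)

print(table)
print(best_per_job)
```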

## Interactive Visual Analysis

Let's build a line plot showing the total number of clients and the number of attracted clients by age.

In [20]:
age_df = (train.groupby("age")[["y"]].sum().join(train.groupby("age")[["y"]].count(), rsuffix='_count'))
age_df.columns = ["Attracted", "Total Number"]

trace0 = go.Scatter(x=age_df.index, y=age_df["Attracted"], name="Attracted")
trace1 = go.Scatter(x=age_df.index, y=age_df["Total Number"], name="Total Number")

data = [trace0, trace1]
layout = {"title": "Statistics by client age"}

fig = go.Figure(data=data, layout=layout)

iplot(fig, show_link=False)
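The same two columns can also be produced in a single aggregation, which makes it easy to add the share of attracted clients per age. This is a sketch on a made-up frame; the `Share` column is an addition for illustration and is not part of the plot above.

```python
import pandas as pd

# Hypothetical clients; y is 1 for an attracted client, 0 otherwise.
df = pd.DataFrame({"age": [30, 30, 30, 40, 40], "y": [1, 0, 1, 0, 0]})

# Sum and count of y per age in one pass.
age_df = df.groupby("age")["y"].agg(["sum", "count"])
age_df.columns = ["Attracted", "Total Number"]

# Share of attracted clients per age (hypothetical extra column).
age_df["Share"] = age_df["Attracted"] / age_df["Total Number"]

print(age_df)
```

Raw counts tend to be dominated by the most common ages, so the share can be a useful complement when comparing age groups.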

Build a bar chart to see the distribution of clients by month.

In [22]:
month_index = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
month_df = (train.groupby("month")[["y"]].sum().join(train.groupby("month")[["y"]].count(), rsuffix='_count')).reindex(month_index)
month_df.columns = ["Attracted", "Total Number"]

trace0 = go.Bar(x=month_df.index, y=month_df["Attracted"], name="Attracted")
trace1 = go.Bar(x=month_df.index, y=month_df["Total Number"], name="Total Number")

data = [trace0, trace1]
layout = {"title": "Share of months"}

fig = go.Figure(data=data, layout=layout)

iplot(fig, show_link=False)
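The `reindex` call is what puts the months in calendar order, since `groupby` sorts the labels alphabetically. A minimal sketch with made-up counts (hypothetical values) shows the effect, including what happens to months that are absent from the data.

```python
import pandas as pd

month_index = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]

# Hypothetical counts for three months, in alphabetical order as groupby would return them.
counts = pd.Series({"aug": 2, "jan": 1, "may": 3})

# Calendar order; months without data become NaN.
ordered = counts.reindex(month_index)

# Same order, but months without data are filled with 0.
ordered_filled = counts.reindex(month_index, fill_value=0)

print(ordered.dropna())
print(ordered_filled.tolist())
```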

Box plot: compare the client age across marital statuses.

In [24]:
data = []

for status in train.marital.unique():
    data.append(go.Box(y=train[train.marital == status].age, name=status))
iplot(data, show_link=False)

Thanks for reading!