# Interactive Data Dashboards Tutorial👨🏼‍🏫

The goal of this repository is to explain how ipywidgets widgets can be linked to matplotlib charts to make them interactive. In exploratory data analysis, it's quite common to look at the data from various perspectives to understand it better. The [data](https://data.gov.ro/dataset/date-climatologice-de-la-cele-23-de-statii-esentiale-pentru-anul-2016) used in this tutorial were obtained from a Kaggle dataset.

✏️**About Dataset.**

The insurance.csv file includes 1,338 examples of beneficiaries currently enrolled in the insurance plan, with features indicating characteristics of the patient as well as the total medical expenses charged to the plan for the calendar year. The features are:

- age: This is an integer indicating the age of the primary beneficiary (excluding those above 64 years, since they are generally covered by the government).
- sex: This is the policy holder's gender, either male or female.
- bmi: This is the body mass index (BMI), which provides a sense of how over or under-weight a person is relative to their height. BMI is equal to weight (in kilograms) divided by height (in meters) squared. An ideal BMI is within the range of 18.5 to 24.9.
- children: This is an integer indicating the number of children / dependents covered by the insurance plan.
- smoker: This is yes or no depending on whether the insured regularly smokes tobacco.
- region: This is the beneficiary's place of residence in the U.S., divided into four geographic regions: northeast, southeast, southwest, or northwest.
- charges: These are the individual medical costs billed by health insurance.
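
The BMI definition in the list above can be checked with a short calculation. This is only an illustrative sketch: the function names `bmi` and `in_ideal_range`, and the sample weight and height, are assumptions and do not come from the dataset.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value: float, low: float = 18.5, high: float = 24.9) -> bool:
    """True when the BMI lies inside the 18.5 to 24.9 range quoted above."""
    return low <= value <= high


# Hypothetical person: 70 kg and 1.75 m tall.
example = bmi(70, 1.75)
print(round(example, 2), in_ideal_range(example))
```

The same function can be applied to any weight and height pair to see on which side of the quoted range a `bmi` value from the table would fall.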

We will investigate these variables combining matplotlib charts with ipywidgets widgets to generate interactive charts that update whenever the widget values change.

✏️**Import the libraries.**

In [123]:
import numpy as np # linear algebra
import pandas as pd # data manipulation and analysis
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization
sns.set_style('whitegrid') # set style for visualization
from ipywidgets import *  # widgets commonly used in web apps and dashboards
import warnings
warnings.filterwarnings('ignore')  # disable the display of all warning messages

✏️**Reading data**

In [124]:
db=pd.read_csv('insurance.csv')

In [125]:
db

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.900,0,yes,southwest,16884.92400
1,18,male,33.770,1,no,southeast,1725.55230
2,28,male,33.000,3,no,southeast,4449.46200
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.880,0,no,northwest,3866.85520
...,...,...,...,...,...,...,...
1333,50,male,30.970,3,no,northwest,10600.54830
1334,18,female,31.920,0,no,northeast,2205.98080
1335,18,female,36.850,0,no,southeast,1629.83350
1336,21,female,25.800,0,no,southwest,2007.94500


In [126]:
print('The size of the table:',db.shape)
print('The total number of cells in the table:',db.size)

The size of the table: (1338, 7)
The total number of cells in the table: 9366
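
The two numbers are related: `size` is the number of rows multiplied by the number of columns, which is why 1338 × 7 gives 9366. A minimal sketch on a made-up three-row table (the name `toy` and its values are assumptions) shows the same relationship.

```python
import pandas as pd

# Hypothetical three-row table standing in for insurance.csv.
toy = pd.DataFrame({
    "age": [19, 18, 28],
    "smoker": ["yes", "no", "no"],
    "charges": [16884.92, 1725.55, 4449.46],
})

rows, columns = toy.shape            # shape is a (rows, columns) tuple
assert toy.size == rows * columns    # size counts every cell in the table
print(rows, columns, toy.size)
```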


✏️**Cleaning Data.**

In [127]:
db.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB


- We ensure that the data types for each column are correct.

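We notice that the numeric data uses the float64 or int64 format. Since the values do not need that much range or precision, we convert the floating-point columns to float32 and the integer columns to int32. Note that the cell below casts `children` to float32 even though it holds whole numbers; int32 would also fit.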

In [128]:
db=db.astype({'age': 'int32', 'bmi': 'float32', 'children': 'float32', 'charges': 'float32'})
db.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int32  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float32
 3   children  1338 non-null   float32
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float32
dtypes: float32(3), int32(1), object(3)
memory usage: 52.4+ KB
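
Before narrowing a dtype, it is worth checking that the values fit the smaller type. The sketch below does that on a small made-up table (the name `toy` and its values are assumptions) and compares the memory used before and after the conversion. Unlike the cell above, it casts `children` to int32, which is an assumption about what suits a column of whole numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same numeric columns as the tutorial table.
toy = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "bmi": [27.9, 33.77, 33.0, 22.705],
    "children": [0, 1, 3, 0],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061],
})

# Integer columns are safe to narrow when their range fits inside int32.
int32_limits = np.iinfo(np.int32)
for column in ["age", "children"]:
    assert int32_limits.min <= toy[column].min()
    assert toy[column].max() <= int32_limits.max

before = toy.memory_usage(index=False).sum()
small = toy.astype({"age": "int32", "bmi": "float32",
                    "children": "int32", "charges": "float32"})
after = small.memory_usage(index=False).sum()

# float32 keeps roughly 7 significant digits, so a small rounding error appears.
max_error = (small["charges"].astype("float64") - toy["charges"]).abs().max()
print(before, after, max_error)
```

The rounding error is the price of the smaller footprint; for charges stored to a few decimals it is far below one cent in this sample.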


- Strip leading and trailing spaces for string data.

In [129]:
db['sex'] = db['sex'].str.strip()
db['smoker'] = db['smoker'].str.strip()
db['region'] = db['region'].str.strip()
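
Stray spaces matter because pandas treats `' male'` and `'male'` as different categories. A minimal sketch on a made-up column (the series `raw` is an assumption, not taken from the dataset) shows the effect of `str.strip()`.

```python
import pandas as pd

# Hypothetical untidy column: the same two categories with stray spaces.
raw = pd.Series([" male", "male ", "female", " female "])

print(raw.nunique())             # the spaces make these look like four categories
clean = raw.str.strip()
print(sorted(clean.unique()))    # only the two real categories remain
```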

- Let's check the unique values in the nominal data columns. We should see clear categories.

In [130]:
db['sex'].unique()

array(['female', 'male'], dtype=object)

In [131]:
db['smoker'].unique()

array(['yes', 'no'], dtype=object)

In [132]:
db['region'].unique()

array(['southwest', 'southeast', 'northwest', 'northeast'], dtype=object)

- Check for duplicates.

In [133]:
db.loc[db.duplicated(keep=False), :]

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
195,19,male,30.59,0.0,no,northwest,1639.56311
581,19,male,30.59,0.0,no,northwest,1639.56311


In [134]:
db.drop_duplicates(inplace=True)#remove duplicates from original dataset
db.loc[db.duplicated(keep=False), :]

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
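
The difference between `duplicated(keep=False)`, the default `duplicated()`, and `drop_duplicates()` is easier to see on a tiny table. The sketch below uses made-up rows (the name `toy` and its values are assumptions).

```python
import pandas as pd

# Hypothetical table in which rows 0 and 2 are identical.
toy = pd.DataFrame({
    "age": [19, 23, 19],
    "region": ["northwest", "southeast", "northwest"],
    "charges": [1639.56, 2200.00, 1639.56],
})

all_copies = toy[toy.duplicated(keep=False)]   # every member of a duplicated group
later_copies = toy[toy.duplicated()]           # default keep='first': only the repeats
deduplicated = toy.drop_duplicates()           # keeps the first occurrence of each row

print(list(all_copies.index), list(later_copies.index), list(deduplicated.index))
```

Dropping rows leaves gaps in the index labels; `reset_index(drop=True)` renumbers them if a contiguous index is needed.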


✏️**Statistical Aspects.**

- Descriptive statistics of the numerical data.

In [135]:
db.describe()

Unnamed: 0,age,bmi,children,charges
count,1337.0,1337.0,1337.0,1337.0
mean,39.222139,30.663443,1.095737,13279.123047
std,14.044333,6.100469,1.205575,12110.362305
min,18.0,15.96,0.0,1121.873901
25%,27.0,26.290001,0.0,4746.344238
50%,39.0,30.4,1.0,9386.161133
75%,51.0,34.700001,2.0,16657.716797
max,64.0,53.130001,5.0,63770.429688


- Descriptive statistics of the categorical data.

In [136]:
db.select_dtypes(include=['object']).describe()

Unnamed: 0,sex,smoker,region
count,1337,1337,1337
unique,2,2,4
top,male,no,southeast
freq,675,1063,364


- Minimum, maximum and mean charge by gender.

In [137]:
data = db.groupby('sex').aggregate({'sex':'count','charges':['min','max','mean']})
data

sex,count,charges min,charges max,charges mean
female,662,1607.510132,63770.429688,12569.579102
male,675,1121.873901,62592.875,13974.999023
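
The dictionary form of `aggregate` produces two-level column headers. As an alternative sketch, pandas named aggregation gives flat, self-describing column names. The rows below are made up, and the output column names (`count`, `min_charge`, and so on) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names match the tutorial table.
toy = pd.DataFrame({
    "sex": ["female", "male", "female", "male"],
    "charges": [1000.0, 2000.0, 3000.0, 6000.0],
})

# Each keyword is an output column: (input column, aggregation function).
summary = toy.groupby("sex").agg(
    count=("charges", "size"),
    min_charge=("charges", "min"),
    max_charge=("charges", "max"),
    mean_charge=("charges", "mean"),
)
print(summary)
```

The same pattern works for the `smoker` and `region` groupings by changing the grouping key.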


- Minimum, maximum and mean charge by smoking status.

In [138]:
data = db.groupby('smoker').aggregate({'smoker':'count','charges':['min','max','mean']})
data

smoker,count,charges min,charges max,charges mean
no,1063,1121.873901,36910.609375,8440.660156
yes,274,12829.455078,63770.429688,32050.232422


- Minimum, maximum and mean charge by region.

In [139]:
data = db.groupby('region').aggregate({'region':'count','charges':['min','max','mean']})
data

region,count,charges min,charges max,charges mean
northeast,324,1694.796387,58571.074219,13406.384766
northwest,324,1621.34021,60021.398438,12450.84082
southeast,364,1121.873901,63770.429688,14735.411133
southwest,325,1241.564941,52590.828125,12346.9375


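- Count records by region. Note that `count` on the `smoker` column tallies non-null values, so the result is the number of beneficiaries in each region, not the number of smokers.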

In [140]:
data = db.groupby('region').aggregate({'smoker':'count'})
data

region,smoker
northeast,324
northwest,324
southeast,364
southwest,325
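
Because `count` tallies non-null entries, the table above holds the number of records per region rather than the number of smokers. One way to count only the smokers is sketched below on made-up rows (the name `toy` and its values are assumptions); `pd.crosstab` gives the full breakdown in one call.

```python
import pandas as pd

# Hypothetical rows: two in the northeast, three in the southeast.
toy = pd.DataFrame({
    "region": ["northeast", "northeast", "southeast", "southeast", "southeast"],
    "smoker": ["yes", "no", "yes", "yes", "no"],
})

# count() ignores the values and simply counts the rows in each group.
rows_per_region = toy.groupby("region")["smoker"].count()

# Summing a boolean mask counts only the rows where smoker == 'yes'.
smokers_per_region = (toy["smoker"] == "yes").groupby(toy["region"]).sum()

# Cross-tabulation: one row per region, one column per smoker value.
breakdown = pd.crosstab(toy["region"], toy["smoker"])
print(rows_per_region, smokers_per_region, breakdown, sep="\n\n")
```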


✏️**Data overview.**

- Data separation.

In [141]:
smoker_male=db[(db.sex == 'male')&(db.smoker == 'yes')].reset_index()  # smoking men
no_smoker_male=db[(db.sex == 'male')&(db.smoker == 'no')].reset_index()  # non-smoking men
smoker_female=db[(db.sex == 'female')&(db.smoker == 'yes')].reset_index()  # smoking women
no_smoker_female=db[(db.sex == 'female')&(db.smoker == 'no')].reset_index()  # non-smoking women

In [142]:
smoker_male

Unnamed: 0,index,age,sex,bmi,children,smoker,region,charges
0,14,27,male,42.130001,0.0,yes,southeast,39611.757812
1,19,30,male,35.299999,0.0,yes,southwest,36837.468750
2,29,31,male,36.299999,2.0,yes,southwest,38711.000000
3,30,22,male,35.599998,0.0,yes,southwest,35585.574219
4,34,28,male,36.400002,1.0,yes,southwest,51194.558594
...,...,...,...,...,...,...,...,...
154,1301,62,male,30.875000,3.0,yes,northwest,46718.164062
155,1303,43,male,27.799999,0.0,yes,southwest,37829.722656
156,1304,42,male,24.605000,2.0,yes,northeast,21259.378906
157,1307,32,male,28.120001,4.0,yes,northwest,21472.478516


In [143]:
no_smoker_male

Unnamed: 0,index,age,sex,bmi,children,smoker,region,charges
0,1,18,male,33.770000,1.0,no,southeast,1725.552246
1,2,28,male,33.000000,3.0,no,southeast,4449.461914
2,3,33,male,22.705000,0.0,no,northwest,21984.470703
3,4,32,male,28.879999,0.0,no,northwest,3866.855225
4,8,37,male,29.830000,2.0,no,northeast,6406.410645
...,...,...,...,...,...,...,...,...
511,1324,31,male,25.934999,1.0,no,northwest,4239.892578
512,1325,61,male,33.535000,0.0,no,northeast,13143.336914
513,1327,51,male,30.030001,1.0,no,southeast,9377.904297
514,1329,52,male,38.599998,2.0,no,southwest,10325.206055


In [144]:
smoker_female

Unnamed: 0,index,age,sex,bmi,children,smoker,region,charges
0,0,19,female,27.900000,0.0,yes,southwest,16884.923828
1,11,62,female,26.290001,0.0,yes,southeast,27808.724609
2,23,34,female,31.920000,1.0,yes,northeast,37701.875000
3,58,53,female,22.879999,1.0,yes,southeast,23244.791016
4,64,20,female,22.420000,0.0,yes,northwest,14711.744141
...,...,...,...,...,...,...,...,...
110,1308,25,female,30.200001,0.0,yes,southwest,33900.652344
111,1313,19,female,34.700001,2.0,yes,southwest,36397.574219
112,1314,30,female,23.655001,3.0,yes,northwest,18765.875000
113,1323,42,female,40.369999,2.0,yes,southeast,43896.375000


In [145]:
no_smoker_female

Unnamed: 0,index,age,sex,bmi,children,smoker,region,charges
0,5,31,female,25.740000,0.0,no,southeast,3756.621582
1,6,46,female,33.439999,1.0,no,southeast,8240.589844
2,7,37,female,27.740000,3.0,no,northwest,7281.505371
3,9,60,female,25.840000,0.0,no,northwest,28923.136719
4,13,56,female,39.820000,0.0,no,southeast,11090.717773
...,...,...,...,...,...,...,...,...
542,1331,23,female,33.400002,0.0,no,southwest,10795.937500
543,1332,52,female,44.700001,3.0,no,southwest,11411.684570
544,1334,18,female,31.920000,0.0,no,northeast,2205.980713
545,1335,18,female,36.849998,0.0,no,southeast,1629.833496


- Interactive Dashboard.

We create an interactive dashboard for each table in order to understand the relationship between the dataset variables.

In [146]:
#Creation of the interactive dashboard for the entire database
from ipywidgets import interact
@interact
def create_plot(variable = db.drop(['charges','bmi'], axis =1).columns):
    sns.set( rc = {'figure.figsize' : ( 12, 8 ), 
               'axes.labelsize' : 14 })
    chart=sns.barplot(data = db, x = variable, y ='charges')
    chart.set_title(f'Mean Bar Plot of the medical costs grouped by the {variable}',fontsize=14)

interactive(children=(Dropdown(description='variable', options=('age', 'sex', 'children', 'smoker', 'region'),…
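
Each bar in this chart is the mean of `charges` for one category of the selected variable, since seaborn's `barplot` uses the mean as its default estimator. The bar heights can be cross-checked with a plain `groupby`. The sketch below uses made-up rows, and the helper name `mean_charges_by` is an assumption.

```python
import pandas as pd

# Hypothetical rows; only the column names match the tutorial table.
toy = pd.DataFrame({
    "smoker": ["yes", "no", "yes", "no"],
    "region": ["southwest", "southeast", "southwest", "northwest"],
    "charges": [30000.0, 4000.0, 34000.0, 6000.0],
})


def mean_charges_by(table, variable):
    """Mean of `charges` per category of `variable`, i.e. the bar heights."""
    return table.groupby(variable)["charges"].mean()


print(mean_charges_by(toy, "smoker"))
```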

In [147]:
#Creation of the interactive dashboard for the 'smoker_male' table
from ipywidgets import interact
@interact
def create_plot(variable = smoker_male.drop(['charges', 'bmi','index','smoker','sex'], axis =1).columns):
    sns.set( rc = {'figure.figsize' : ( 12, 8 ), 
               'axes.labelsize' : 14 })
    chart=sns.barplot(data = smoker_male, x = variable, y ='charges')
    chart.set_title(f'Mean Bar Plot of the medical costs grouped by the {variable}',fontsize=14)

interactive(children=(Dropdown(description='variable', options=('age', 'children', 'region'), value='age'), Ou…

In [148]:
#Creation of the interactive dashboard for the 'no_smoker_male' table
from ipywidgets import interact
@interact
def create_plot(variable = no_smoker_male.drop(['charges', 'bmi','index','smoker','sex'], axis =1).columns):
    sns.set( rc = {'figure.figsize' : ( 12, 8 ), 
               'axes.labelsize' : 14 })
    chart=sns.barplot(data = no_smoker_male, x = variable, y ='charges')
    chart.set_title(f'Mean Bar Plot of the medical costs grouped by the {variable}',fontsize=14)

interactive(children=(Dropdown(description='variable', options=('age', 'children', 'region'), value='age'), Ou…

In [149]:
#Creation of the interactive dashboard for the 'smoker_female' table
from ipywidgets import interact
@interact
def create_plot(variable = smoker_female.drop(['charges', 'bmi','index','smoker','sex'], axis =1).columns):
    sns.set( rc = {'figure.figsize' : ( 12, 8 ), 
               'axes.labelsize' : 14 })
    chart=sns.barplot(data = smoker_female, x = variable, y ='charges')
    chart.set_title(f'Mean Bar Plot of the medical costs grouped by the {variable}',fontsize=14)

interactive(children=(Dropdown(description='variable', options=('age', 'children', 'region'), value='age'), Ou…

In [150]:
#Creation of the interactive dashboard for the 'no_smoker_female' table
from ipywidgets import interact
@interact
def create_plot(variable = no_smoker_female.drop(['charges', 'bmi','index','smoker','sex'], axis =1).columns):
    sns.set( rc = {'figure.figsize' : ( 12, 8 ), 
               'axes.labelsize' : 14 })
    chart=sns.barplot(data = no_smoker_female, x = variable, y ='charges')
    chart.set_title(f'Mean Bar Plot of the medical costs grouped by the {variable}',fontsize=14)

interactive(children=(Dropdown(description='variable', options=('age', 'children', 'region'), value='age'), Ou…
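
The four subset dashboards above differ only in the table they read. As a possible simplification, a second dropdown can select the subset, so one function covers all of them. This is a sketch under assumptions: the table `toy`, the dictionary `tables`, and the functions `plot_mean_charges` and `show_dashboard` are hypothetical names, and the chart is drawn with plain matplotlib rather than seaborn.

```python
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact

# Hypothetical stand-in for the tutorial table `db`.
toy = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "smoker": ["yes", "no", "yes", "no"],
    "children": [0, 2, 1, 0],
    "region": ["southwest", "southeast", "northeast", "northwest"],
    "charges": [30000.0, 4000.0, 28000.0, 5000.0],
})

# One entry per subset, keyed by the label shown in the dropdown.
tables = {
    f"{sex}, smoker={smoker}": group
    for (sex, smoker), group in toy.groupby(["sex", "smoker"])
}


def plot_mean_charges(table_name, variable):
    """Bar chart of mean charges per category for the chosen subset."""
    means = tables[table_name].groupby(variable)["charges"].mean()
    fig, ax = plt.subplots(figsize=(8, 4))
    means.plot.bar(ax=ax)
    ax.set_ylabel("charges")
    ax.set_title(f"Mean charges by {variable} ({table_name})")


def show_dashboard():
    """Call from a notebook cell to display both dropdowns and the chart."""
    return interact(plot_mean_charges,
                    table_name=list(tables),
                    variable=["children", "region"])
```

Calling `show_dashboard()` in a notebook cell displays the two dropdowns; passing a list to `interact` is what turns each argument into a dropdown, as in the cells above.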

We define a function called *create_scatter()* that takes two column names and draws a scatter plot of one against the other. `interact` then calls it with the values chosen in two dropdowns.

In [151]:
#Creation of the interactive dashboard (a scatter plot) for the db table
db_scatter= db[['bmi', 'age', 'charges']]
def create_scatter(variable1, variable2):
    with plt.style.context("ggplot"):
        fig = plt.figure(figsize=(8,4))

        plt.scatter(x =db_scatter[variable1],
                    y =db_scatter[variable2],
                    s=5
                   )
        plt.xticks(fontsize=15)
        plt.yticks(fontsize=15)
        plt.xlabel(variable1.capitalize())
        plt.ylabel(variable2.capitalize())

        plt.title("%s vs %s"%(variable1.capitalize(), variable2.capitalize()))


In [152]:
interact(create_scatter, variable1=db_scatter.columns, variable2=db_scatter.columns)


interactive(children=(Dropdown(description='variable1', options=('bmi', 'age', 'charges'), value='bmi'), Dropd…

<function __main__.create_scatter(variable1, variable2)>

Distribution of variables (histogram with a kernel density estimate).

In [153]:
db_new = db[['bmi', 'charges']]
def create_distribution(variable):
    with plt.style.context("ggplot"):
        fig = plt.figure(figsize=(8,4))
        # histplot replaces distplot, which is deprecated in recent seaborn releases
        sns.histplot(db_new[variable], kde=True, stat='density')
        plt.title('Distribution of %s'%(variable.capitalize()))
        

In [154]:
interact(create_distribution, variable=db_new.columns)


interactive(children=(Dropdown(description='variable', options=('bmi', 'charges'), value='bmi'), Output()), _d…

<function __main__.create_distribution(variable)>

Medical expenses for 18-year-old smokers vs. nonsmokers (box plot).

In [155]:
def medical_expenses(gender):
    with plt.style.context("ggplot"):
        fig = plt.figure(figsize=(10,4))
        sns.boxplot(y="smoker", x="charges", data = db[(db.age==18)&(db.sex == gender)])
        plt.title('Medical expenses for 18-year-old %s smokers vs. nonsmokers'%gender)

In [156]:
interact(medical_expenses, gender=['male','female'])

interactive(children=(Dropdown(description='gender', options=('male', 'female'), value='male'), Output()), _do…

<function __main__.medical_expenses(gender)>
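
A box plot marks the median and quartiles of each group rather than the mean, so it helps to have both numbers side by side. The sketch below computes them on made-up rows; the helper name `expense_summary` and all values are assumptions.

```python
import pandas as pd

# Hypothetical rows: six 18-year-old men and one older man who is filtered out.
toy = pd.DataFrame({
    "age": [18, 18, 18, 18, 18, 18, 40],
    "sex": ["male"] * 7,
    "smoker": ["yes", "yes", "yes", "no", "no", "no", "no"],
    "charges": [15000.0, 17000.0, 34000.0, 1100.0, 1700.0, 2200.0, 9000.0],
})


def expense_summary(table, gender, age=18):
    """Median and mean charges per smoking status for one age and gender."""
    subset = table[(table.age == age) & (table.sex == gender)]
    return subset.groupby("smoker")["charges"].agg(["median", "mean"])


summary = expense_summary(toy, "male")
print(summary)
```

In this made-up sample the smokers' mean is pulled above the median by a single large bill, which is the kind of skew a box plot makes visible.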

Insurance costs by region boxplot.

In [157]:
def medical_expenses(region):
    with plt.style.context("ggplot"):
        fig = plt.figure(figsize=(8,4))
        sns.boxplot(y="charges", x="region", data = db[db.region == region])
        plt.title('Insurance costs in the %s region'%region)

In [158]:
interact(medical_expenses, region=['southeast', 'northeast','southwest','northwest'])

interactive(children=(Dropdown(description='region', options=('southeast', 'northeast', 'southwest', 'northwes…

<function __main__.medical_expenses(region)>