__Bokeh Introduction__

Bokeh is an interactive visualization library for modern web browsers. Its goal is to provide elegant, concise construction of versatile visualizations, and to extend this capability with high-performance interactivity over very large or streaming datasets (https://bokeh.org/).

For this demo we will use the Karnataka State (India) education dataset from data.gov.in.

__EDA Bokeh - Karnataka Education dataset__

The dataset includes data about:
- Education Infrastructure;
- Education Awareness;
- Demographic features.

__Goal__

The goal of this notebook is to:
- Explain the data;
- Identify features that can be used for modeling;
- Analyze the dataset using the following methodology:
    1. Data fetch;
    2. Data cleansing (data preparation);
    3. Exploratory data analysis (EDA);
    4. Summary and Data Visualization.

__Import Libraries__

First install pandas and Bokeh. Installing the pandas-bokeh package pulls in both as dependencies.

In [273]:
!pip install pandas-bokeh



In [274]:
import warnings

import numpy as np
import pandas as pd

# Bokeh: notebook output, plotting functions and model classes
from bokeh.io import output_notebook, show
from bokeh.models import (CategoricalColorMapper, ColumnDataSource,
                          HoverTool, Legend, LinearInterpolator)
from bokeh.palettes import Spectral8
from bokeh.plotting import figure

warnings.filterwarnings('ignore')  # silence library warnings in the notebook
output_notebook()                  # render Bokeh plots inline

__Data Fetching__

In [275]:
data = pd.read_csv('Town-wise-education - Karnataka.csv.xls')

__Exploratory Data Analysis__

Let's take a look at the data by inspecting the first and last few rows.

In [276]:
data.head()

Unnamed: 0,Table Name,State Code,District Code,Town Code,Total/ Rural/ Urban,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
0,C2308,29,1,40117000,Urban,Belgaum (M Corp.),All ages,399653,204598,195055,...,362,7143,5210,1933,41152,26488,14664,3,2,1
1,C2308,29,1,40117000,Urban,Belgaum (M Corp.),0-6,47642,24768,22874,...,0,0,0,0,0,0,0,0,0,0
2,C2308,29,1,40117000,Urban,Belgaum (M Corp.),7,6759,3495,3264,...,0,0,0,0,0,0,0,0,0,0
3,C2308,29,1,40117000,Urban,Belgaum (M Corp.),8,8067,4152,3915,...,0,0,0,0,0,0,0,0,0,0
4,C2308,29,1,40117000,Urban,Belgaum (M Corp.),9,6948,3559,3389,...,0,0,0,0,0,0,0,0,0,0


In [277]:
data.tail()

Unnamed: 0,Table Name,State Code,District Code,Town Code,Total/ Rural/ Urban,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
807,C2308,29,26,42604000,Urban,Mysore (M Corp.),65-69,13904,6773,7131,...,9,289,228,61,1898,1585,313,0,0,0
808,C2308,29,26,42604000,Urban,Mysore (M Corp.),70-74,10754,5501,5253,...,3,149,118,31,1177,1012,165,0,0,0
809,C2308,29,26,42604000,Urban,Mysore (M Corp.),75-79,5359,2728,2631,...,2,73,62,11,574,503,71,0,0,0
810,C2308,29,26,42604000,Urban,Mysore (M Corp.),80+,6236,2848,3388,...,2,57,42,15,410,344,66,0,0,0
811,C2308,29,26,42604000,Urban,Mysore (M Corp.),Age not stated,539,267,272,...,1,12,4,8,44,28,16,0,0,0


Let's examine the shape of this dataset.

In [278]:
data.shape

(812, 46)

This means that the dataset has 812 records (rows) and 46 features (columns).

Now let's explore the data types of the dataset.

In [279]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 812 entries, 0 to 811
Data columns (total 46 columns):
 #   Column                                                                                     Non-Null Count  Dtype 
---  ------                                                                                     --------------  ----- 
 0   Table Name                                                                                 812 non-null    object
 1   State Code                                                                                 812 non-null    int64 
 2   District Code                                                                              812 non-null    int64 
 3   Town Code                                                                                  812 non-null    int64 
 4   Total/ Rural/ Urban                                                                        812 non-null    object
 5   Area Name                                                

This shows that there are categorical (object) and numerical features (int64) in the dataset. 

Let's explore further...

Let's examine if there are any nulls in the dataset.

In [280]:
data.isnull().sum()

Table Name                                                                                   0
State Code                                                                                   0
District Code                                                                                0
Town Code                                                                                    0
Total/ Rural/ Urban                                                                          0
Area Name                                                                                    0
Age-Group                                                                                    0
Total - Persons                                                                              0
Total - Males                                                                                0
Total - Females                                                                              0
Illiterate - Persons                              

Let's also look at the summary statistics (count, mean, standard deviation, etc.) for every column.

In [281]:
data.describe(include = 'all')

Unnamed: 0,Table Name,State Code,District Code,Town Code,Total/ Rural/ Urban,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
count,812,812.0,812.0,812.0,812,812,812,812.0,812.0,812.0,...,812.0,812.0,812.0,812.0,812.0,812.0,812.0,812.0,812.0,812.0
unique,1,,,,1,28,29,,,,...,,,,,,,,,,
top,C2308,,,,Urban,Belgaum (M Corp.),All ages,,,,...,,,,,,,,,,
freq,812,,,,812,29,28,,,,...,,,,,,,,,,
mean,,29.0,15.035714,41509820.0,,,,27507.64,14222.53,13285.11,...,26.519704,482.08867,371.315271,110.773399,3043.32266,1905.359606,1137.963054,0.054187,0.03202,0.022167
std,,0.0,6.714906,671824.5,,,,164098.3,85432.87,78673.9,...,239.305779,3034.946141,2344.507666,693.711738,22771.068866,13770.026347,9031.48477,0.429515,0.279085,0.220975
min,,29.0,1.0,40117000.0,,,,50.0,23.0,27.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,,29.0,11.25,41127000.0,,,,2916.0,1466.75,1425.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,,29.0,16.5,41655000.0,,,,5657.5,2880.5,2710.0,...,0.0,12.0,9.0,2.0,7.5,4.5,2.0,0.0,0.0,0.0
75%,,29.0,20.0,42009250.0,,,,13917.5,7124.5,6696.0,...,4.0,199.75,153.5,48.25,1205.0,844.0,252.75,0.0,0.0,0.0


Now let's look at the categorical features in detail. To do this, we extract all the categorical features into a separate DataFrame.

In [282]:
categorical_features = data.select_dtypes(include=[object])
categorical_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 812 entries, 0 to 811
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Table Name           812 non-null    object
 1   Total/ Rural/ Urban  812 non-null    object
 2   Area Name            812 non-null    object
 3   Age-Group            812 non-null    object
dtypes: object(4)
memory usage: 25.5+ KB


Let's count the unique categories for each of the object variables.

In [283]:
for column_name in data.columns:
    if data[column_name].dtypes == 'object':
        # Fill any missing values with the most frequent category before counting
        data[column_name] = data[column_name].fillna(data[column_name].mode().iloc[0])
        unique_category = data[column_name].nunique()
        print(f"Feature '{column_name}' has '{unique_category}' unique categories")

Feature 'Table Name' has '1' unique categories
Feature 'Total/ Rural/ Urban' has '1' unique categories
Feature 'Area Name' has '28' unique categories
Feature 'Age-Group' has '29' unique categories


'Table Name' and 'Total/ Rural/ Urban' each hold a single value across all rows, so they carry no information and can be eliminated.

We can also eliminate 'State Code', as we are dealing only with Karnataka.

From the above observations, we can also conclude that no imputation is needed, as there are no missing values.
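The redundancy check above can also be automated. The following is a minimal sketch on a small, made-up frame (the `toy` DataFrame and its values are illustrative assumptions, not the real dataset) that lists every column holding a single distinct value:

```python
import pandas as pd

# Toy stand-in for the real dataset (values are illustrative only)
toy = pd.DataFrame({
    'Table Name': ['C2308', 'C2308', 'C2308'],
    'State Code': [29, 29, 29],
    'Area Name': ['Belgaum (M Corp.)', 'Belgaum (M Corp.)', 'Mysore (M Corp.)'],
    'Age-Group': ['All ages', '0-6', 'All ages'],
})

# A column with exactly one distinct value carries no information
constant_columns = [col for col in toy.columns if toy[col].nunique() == 1]
print(constant_columns)
```

On the real data, the same list comprehension over `data.columns` should flag the single-valued columns identified above.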

__Data Cleansing__

Now let's drop the columns mentioned above:

- 'Table Name';
- 'State Code';
- 'Total/ Rural/ Urban'.

In [284]:
# Drop the three uninformative columns in a single call
data.drop(columns=['Table Name', 'State Code', 'Total/ Rural/ Urban'], inplace=True)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 812 entries, 0 to 811
Data columns (total 43 columns):
 #   Column                                                                                     Non-Null Count  Dtype 
---  ------                                                                                     --------------  ----- 
 0   District Code                                                                              812 non-null    int64 
 1   Town Code                                                                                  812 non-null    int64 
 2   Area Name                                                                                  812 non-null    object
 3   Age-Group                                                                                  812 non-null    object
 4   Total - Persons                                                                            812 non-null    int64 
 5   Total - Males                                            

Let's get more insight into the data by looking at the first few records for a single district and town.

In [285]:
data.head()

Unnamed: 0,District Code,Town Code,Area Name,Age-Group,Total - Persons,Total - Males,Total - Females,Illiterate - Persons,Illiterate - Males,Illiterate - Females,...,Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons,Educational Level - Technical Diploma or Certificate Not Equal to Degree Males,Educational Level - Technical Diploma or Certificate Not Equal to Degree Females,Educational Level - Graduate & Above Persons,Educational Level - Graduate & Above Males,Educational Level - Graduate & Above Females,Unclassified - Persons,Unclassified - Males,Unclassified - Females
0,1,40117000,Belgaum (M Corp.),All ages,399653,204598,195055,91358,36857,54501,...,362,7143,5210,1933,41152,26488,14664,3,2,1
1,1,40117000,Belgaum (M Corp.),0-6,47642,24768,22874,47642,24768,22874,...,0,0,0,0,0,0,0,0,0,0
2,1,40117000,Belgaum (M Corp.),7,6759,3495,3264,1375,662,713,...,0,0,0,0,0,0,0,0,0,0
3,1,40117000,Belgaum (M Corp.),8,8067,4152,3915,568,292,276,...,0,0,0,0,0,0,0,0,0,0
4,1,40117000,Belgaum (M Corp.),9,6948,3559,3389,275,137,138,...,0,0,0,0,0,0,0,0,0,0


__Few Data Observations__

General observations for each district:

- There are 29 unique 'Age-Group' values for each 'Town Code';
    - The 'All ages' category is the sum over all the individual age groups;
- The following categories are associated with Persons, Males and Females, where 'Persons' is the sum of males and females:
    - Illiterate;
    - Literate;
    - Educational Level - Literate without Educational Level;
    - Educational Level - Below Primary;
    - Educational Level - Primary;
    - Educational Level - Middle;
    - Educational Level - Matric/Secondary;
    - Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary;
    - Educational Level - Non-technical Diploma or Certificate Not Equal to Degree;
    - Educational Level - Technical Diploma or Certificate Not Equal to Degree;
    - Educational Level - Graduate & Above;
    - Unclassified.
- 'Total - Persons', 'Total - Males' and 'Total - Females' give the overall population counts for each row.
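These relationships can be checked rather than assumed. Below is a minimal sketch on made-up numbers (the `toy` frame is hypothetical; on the real data the same checks would run against `data` before any rows are removed):

```python
import pandas as pd

# Made-up rows for a single town: one aggregate row and two age groups
toy = pd.DataFrame({
    'Town Code': [1, 1, 1],
    'Age-Group': ['All ages', '0-6', '7'],
    'Illiterate - Persons': [30, 20, 10],
    'Illiterate - Males': [14, 9, 5],
    'Illiterate - Females': [16, 11, 5],
})

# Check 1: 'Persons' equals 'Males' + 'Females' on every row
persons_ok = (toy['Illiterate - Males'] + toy['Illiterate - Females']
              == toy['Illiterate - Persons']).all()

# Check 2: the 'All ages' row equals the sum of the individual age groups
is_total = toy['Age-Group'] == 'All ages'
by_age = toy[~is_total].groupby('Town Code')['Illiterate - Persons'].sum()
all_ages = toy[is_total].set_index('Town Code')['Illiterate - Persons']
all_ages_ok = by_age.eq(all_ages).all()

print(persons_ok, all_ages_ok)
```

If either check fails on the real data, the corresponding observation should be treated as approximate.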
  
__Current Focus__
 
In this notebook we focus on 'Primary Education', so the analysis covers only the following categories:

- Illiterate
- Educational Level - Literate without Educational Level
- Educational Level - Below Primary
- Educational Level - Primary
- Unclassified

This means that the following features can be eliminated from the dataset because they are not relevant to our EDA:

- Total - Persons
- Total - Males
- Total - Females
- Literate - Persons
- Literate - Males
- Literate - Females
- Educational Level - Middle Persons
- Educational Level - Middle Males
- Educational Level - Middle Females
- Educational Level - Matric/Secondary Persons
- Educational Level - Matric/Secondary Males
- Educational Level - Matric/Secondary Females
- Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Persons
- Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Males
- Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Females
- Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Persons
- Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Males
- Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females
- Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons
- Educational Level - Technical Diploma or Certificate Not Equal to Degree Males
- Educational Level - Technical Diploma or Certificate Not Equal to Degree Females
- Educational Level - Graduate & Above Persons
- Educational Level - Graduate & Above Males
- Educational Level - Graduate & Above Females

__Key Note__

The 'All ages' rows are aggregates of the individual age groups. We remove them so that each remaining row represents a single age group and nothing is counted twice.

In [286]:
data = data[data['Age-Group'] != 'All ages']

Now let's drop the features listed above, as they are not relevant in our context.

In [287]:
columns = [ 'Total - Persons',
           'Total - Males',
           'Total - Females',
           'Literate - Persons',
           'Literate - Males',
           'Literate - Females',
           'Educational Level - Middle Persons',
           'Educational Level - Middle Males',
           'Educational Level - Middle Females',
           'Educational Level - Matric/Secondary Persons',
           'Educational Level - Matric/Secondary Males',
           'Educational Level - Matric/Secondary Females',
           'Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Persons',
           'Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Males',
           'Educational Level - Higher Secondary/Intermediate Pre-University/Senior Secondary Females',
           'Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Persons',
           'Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Males',
           'Educational Level - Non-technical Diploma or Certificate Not Equal to Degree Females',
           'Educational Level - Technical Diploma or Certificate Not Equal to Degree Persons',
           'Educational Level - Technical Diploma or Certificate Not Equal to Degree Males',
           'Educational Level - Technical Diploma or Certificate Not Equal to Degree Females',
           'Educational Level - Graduate & Above Persons',
           'Educational Level - Graduate & Above Males',
           'Educational Level - Graduate & Above Females']                                                                            

data.drop(columns,axis =1,inplace = True)
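A long hand-written drop list is easy to get wrong. As an alternative sketch, the columns to keep can be selected by prefix instead. The `keep_prefixes` tuple, the `id_columns` list and the toy frame below are assumptions for illustration, chosen to mirror the features named under 'Current Focus':

```python
import pandas as pd

# Empty toy frame with a handful of the real column names
toy = pd.DataFrame(columns=[
    'District Code', 'Town Code', 'Area Name', 'Age-Group',
    'Total - Persons', 'Literate - Persons', 'Illiterate - Persons',
    'Educational Level - Primary Persons',
    'Educational Level - Middle Persons',
    'Unclassified - Persons',
])

id_columns = ['District Code', 'Town Code', 'Area Name', 'Age-Group']
keep_prefixes = (
    'Illiterate - ',
    'Educational Level - Literate without Educational Level',
    'Educational Level - Below Primary',
    'Educational Level - Primary',
    'Unclassified - ',
)

# str.startswith accepts a tuple, so one test covers all prefixes
feature_columns = [c for c in toy.columns if c.startswith(keep_prefixes)]
focused = toy[id_columns + feature_columns]
print(list(focused.columns))
```

Selecting by prefix keeps the Persons, Males and Females variants together, so a new column with a matching prefix is picked up without editing the list.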

__Exploratory Data Analysis (EDA)__

In this section, we visualize the data related to 'Primary Education', as mentioned above, using the Bokeh interactive visualization library. Hovering over a circle shows the details for that area and age group, and the size of each circle is proportional to the value of the plotted feature.

Each plot shows one feature broken down by district and age group.
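To make the size encoding concrete: the `LinearInterpolator` in the cells below is given the minimum and maximum of a feature and a range of circle sizes, and values in between are scaled linearly. The sketch below reproduces that arithmetic with NumPy's `np.interp` rather than with Bokeh itself; the sample values are made up.

```python
import numpy as np

values = np.array([0, 250, 500, 1000])   # made-up feature values
size_range = [2, 100]                    # same size range as the first plot

# Linear map: min(values) -> 2, max(values) -> 100
sizes = np.interp(values, [values.min(), values.max()], size_range)
print(sizes)
```

For example, a value halfway between the minimum and the maximum gets a size of 2 + 0.5 * (100 - 2) = 51.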

_Illiterate:_

In [288]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Illiterate - Persons'],
    area = data['Area Name'],
    illerate = data['Illiterate - Persons'],
    illerate_male = data['Illiterate - Males'],
    illerate_female = data['Illiterate - Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Illiterate - Persons'].min(),data['Illiterate - Persons'].max()],
    y = [2,100]
)

color_mapper = CategoricalColorMapper(
    factors = list(data['Area Name'].unique()),
    palette = Spectral8
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,100000))

p = figure(title = 'Illiteracy District/Area Wise',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Illiterate - Total Persons ','@illerate'),
                           ('Illiterate - Total Males ','@illerate_male'),
                           ('Illiterate - Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Illiterates',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         source = source)

show(p,notebook_handle=True)
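The plotting cells in this section differ only in the column being plotted, the axis range and the labels, so the shared logic can be factored into one function. The following is a sketch, not the notebook's original code: `bubble_plot` and its parameters are hypothetical names, `scatter` is used in place of `circle`, and `viridis(n)` replaces `Spectral8` so that every area gets its own color (`Spectral8` has only eight entries for 28 area names, and Bokeh appears to fall back to the mapper's `nan_color` for the rest). The demo frame is made up.

```python
import pandas as pd
from bokeh.models import (CategoricalColorMapper, ColumnDataSource,
                          HoverTool, LinearInterpolator)
from bokeh.palettes import viridis
from bokeh.plotting import figure


def bubble_plot(df, prefix, title, y_max, max_size=100):
    """Scatter one feature by district; circle size tracks the Persons count."""
    persons = df[f'{prefix} Persons']
    source = ColumnDataSource(dict(
        x=df['District Code'], y=persons, area=df['Area Name'],
        males=df[f'{prefix} Males'], females=df[f'{prefix} Females'],
        age=df['Age-Group'],
    ))
    areas = list(df['Area Name'].unique())
    size_mapper = LinearInterpolator(x=[persons.min(), persons.max()],
                                     y=[2, max_size])
    # One color per area, so no area falls outside the palette
    color_mapper = CategoricalColorMapper(factors=areas,
                                          palette=viridis(len(areas)))
    hover = HoverTool(tooltips=[('Area', '@area'), ('Persons', '@y'),
                                ('Males', '@males'), ('Females', '@females'),
                                ('Age Group', '@age')])
    p = figure(title=title, height=800, width=800, tools=[hover],
               x_range=(1, 30), y_range=(10, y_max),
               x_axis_label='District Code', y_axis_label=title)
    p.scatter(x='x', y='y', source=source, alpha=0.7,
              size={'field': 'y', 'transform': size_mapper},
              color={'field': 'area', 'transform': color_mapper})
    return p


# Made-up rows, only to show the call
demo = pd.DataFrame({
    'District Code': [1, 26],
    'Area Name': ['Belgaum (M Corp.)', 'Mysore (M Corp.)'],
    'Age-Group': ['0-6', '0-6'],
    'Illiterate - Persons': [100, 400],
    'Illiterate - Males': [50, 190],
    'Illiterate - Females': [50, 210],
})
plot = bubble_plot(demo, 'Illiterate -', 'Illiteracy District/Area Wise', 100000)
```

Passing the returned figure to `show` renders it in the notebook. Under these assumptions, each of the other features would need one call with a different `prefix`, `title` and `y_max`.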



_Educational Level - Literate without Educational Level:_

In [289]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Educational Level - Literate without Educational Level Persons'],
    area = data['Area Name'],
    illerate = data['Educational Level - Literate without Educational Level Persons'],
    illerate_male = data['Educational Level - Literate without Educational Level Males'],
    illerate_female = data['Educational Level - Literate without Educational Level Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Educational Level - Literate without Educational Level Persons'].min(),
         data['Educational Level - Literate without Educational Level Persons'].max()],
    y = [2,100]
)

color_mapper = CategoricalColorMapper(
    factors = list(data['Area Name'].unique()),
    palette = Spectral8
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,6000))

p = figure(title = 'Educational Level - Literate without Educational Level (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Educational Level - Literate without Educational Level - Total Persons ','@illerate'),
                           ('Educational Level - Literate without Educational Level - Total Males ','@illerate_male'),
                           ('Educational Level - Literate without Educational Level - Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Educational Level - Literate without Educational Level',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         source = source)
p.legend.border_line_color = None
show(p,notebook_handle=True)



_Educational Level - Below Primary:_

In [290]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Educational Level - Below Primary Persons'],
    area = data['Area Name'],
    illerate = data['Educational Level - Below Primary Persons'],
    illerate_male = data['Educational Level - Below Primary Males'],
    illerate_female = data['Educational Level - Below Primary Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Educational Level - Below Primary Persons'].min(),
         data['Educational Level - Below Primary Persons'].max()],
    y = [2,50]
)

color_mapper = CategoricalColorMapper(
    factors = list(data['Area Name'].unique()),
    palette = Spectral8
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,100000))

p = figure(title = 'Educational Level - Below Primary (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Educational Level - Below Primary Total Persons ','@illerate'),
                           ('Educational Level - Below Primary Total Males ','@illerate_male'),
                           ('Educational Level - Below Primary Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Educational Level - Below Primary Persons',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         source = source)
p.legend.border_line_color = None
show(p,notebook_handle=True)



_Educational Level - Primary:_

In [291]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Educational Level - Primary Persons'],
    area = data['Area Name'],
    illerate = data['Educational Level - Primary Persons'],
    illerate_male = data['Educational Level - Primary Males'],
    illerate_female = data['Educational Level - Primary Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Educational Level - Primary Persons'].min(),
         data['Educational Level - Primary Persons'].max()],
    y = [2,50]
)

color_mapper = CategoricalColorMapper(
    factors = list(data['Area Name'].unique()),
    palette = Spectral8
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,100000))

p = figure(title = 'Educational Level - Primary (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Educational Level - Primary Total Persons ','@illerate'),
                           ('Educational Level - Primary Total Males ','@illerate_male'),
                           ('Educational Level - Primary Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'No of Educational Level - Primary Persons',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         source = source)
p.legend.border_line_color = None
show(p,notebook_handle=True)



_Unclassified:_

In [292]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Unclassified - Persons'],
    area = data['Area Name'],
    illerate = data['Unclassified - Persons'],
    illerate_male = data['Unclassified - Males'],
    illerate_female = data['Unclassified - Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Unclassified - Persons'].min(),
         data['Unclassified - Persons'].max()],
    y = [1,100]
)

color_mapper = CategoricalColorMapper(
    factors = list(data['Area Name'].unique()),
    palette = Spectral8
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,400))

p = figure(title = 'Unclassified (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Unclassified - Total Persons ','@illerate'),
                           ('Unclassified - Total Males ','@illerate_male'),
                           ('Unclassified - Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'Unclassified - Persons',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         source = source)
p.legend.border_line_color = None
show(p,notebook_handle=True)



_Summary (Total)_

We want the combined count across the five categories above, so we create three new features by summing them:

- Total
- Total_Males
- Total_Females

In [293]:
# Feature prefixes that make up the primary-education target population
prefixes = ['Illiterate -',
            'Educational Level - Below Primary',
            'Educational Level - Literate without Educational Level',
            'Educational Level - Primary',
            'Unclassified -']

# Row-wise sum of the five features for persons, males and females
for new_column, suffix in [('Total', 'Persons'), ('Total_Males', 'Males'), ('Total_Females', 'Females')]:
    data[new_column] = data[[f'{prefix} {suffix}' for prefix in prefixes]].sum(axis=1)

In [294]:
source = ColumnDataSource(dict(
    x = data['District Code'],
    y = data['Total'],
    area = data['Area Name'],
    illerate = data['Total'],
    illerate_male = data['Total_Males'],
    illerate_female = data['Total_Females'],
    age = data['Age-Group']
)       
)

size_mapper = LinearInterpolator(
    x = [data['Total'].min(),
         data['Total'].max()],
    y = [5,100]
)

color_mapper = CategoricalColorMapper(
    factors = list(data['Area Name'].unique()),
    palette = Spectral8
)

PLOT_OPTS = dict(height = 800,width = 800,x_range = (1,30),y_range=(10,120000))

p = figure(title = 'Summary (District vs. Age Wise)',
           toolbar_location = 'above',
           tools = [HoverTool(
               tooltips = [('Area ','@area'),
                           ('Summary','@illerate'),
                           ('Total Males ','@illerate_male'),
                           ('Total Females ','@illerate_female'),
                           ('Age Group ','@age'),
                        ],show_arrow = False)],
           x_axis_label = 'District Code',
           y_axis_label = 'Total Population needing Primary Education',
           **PLOT_OPTS)

p.circle(x='x',
         y='y', 
         size = {'field': 'illerate','transform':size_mapper},
         color = {'field': 'area','transform':color_mapper},
         alpha = 0.7,
         source = source)
p.legend.border_line_color = None
show(p,notebook_handle=True)



__Conclusions__

- The summary visualization above shows that the target districts needing the most attention are:
    - Hubli-Dharwad;
    - Mysore;
    - Bangalore;
    - Belgaum;
    - Gulbarga;
    - Bellary;
    - Davanagere;
    - Mangalore.
- The age groups of 0-6 years and 30-45 years need more attention.
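
These conclusions are read off the chart by eye. A ranking computed from the `Total` feature would make them reproducible. The sketch below shows the idea on a made-up frame (the area names and numbers are illustrative and do not come from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    'Area Name': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Age-Group': ['0-6', '30-34', '0-6', '30-34', '0-6', '30-34'],
    'Total': [500, 300, 900, 400, 100, 50],
})

# Areas ranked by the summed target population, largest first
by_area = toy.groupby('Area Name')['Total'].sum().sort_values(ascending=False)

# Age groups ranked the same way, across all areas
by_age = toy.groupby('Age-Group')['Total'].sum().sort_values(ascending=False)

print(by_area.head(2).index.tolist())
print(by_age.index.tolist())
```

Running the same two `groupby` calls on `data` would give the ordering of areas and age groups behind the lists above.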