<font size = 5><center>Introduction to Python for Data Science</center></font><br>
<font size = 5><center><strong>Data Visualization</strong></center></font><br>
<font size = 5><center>(Interactive Data Visualization with hvplot)</center></font><br>
<center><font size = 4>Rodrigo C. Belleza Jr.</font><br/>College of Computing and Multimedia Studies<br/>Manuel S. Enverga University Foundation</center>

#### Installing the hvplot package

Run **pip install hvplot** in the Anaconda Prompt.

**Documentation**

https://hvplot.holoviz.org/

#### Import the Pandas package

In [1]:
import pandas as pd

#### Open the diseases.csv file and display the first 5 rows

In [4]:
diseases_data = pd.read_csv('diseases.csv')
diseases_data.head()

Unnamed: 0,Year,Week,State,measles,pertussis
0,1928,1,Alabama,3.67,
1,1928,2,Alabama,6.25,
2,1928,3,Alabama,7.95,
3,1928,4,Alabama,12.58,
4,1928,5,Alabama,8.03,


#### Group according to year, apply sum to measles column

In [5]:
diseases_by_year = diseases_data[["Year","measles"]].groupby("Year").sum()

In [6]:
diseases_by_year.head()

Year,measles
1928,16924.34
1929,12060.96
1930,14575.11
1931,15427.67
1932,14481.11
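
To see what the group-and-sum step does, here is a minimal sketch on a tiny hand-made table. The rows and values are invented for illustration and are not taken from diseases.csv; only the column names match.

```python
import pandas as pd

# Toy data (made up for illustration): two weekly values per year
toy = pd.DataFrame({
    "Year": [1928, 1928, 1929, 1929],
    "Week": [1, 2, 1, 2],
    "measles": [1.0, 2.5, 4.0, 0.5],
})

# Keep only the needed columns, group rows by Year, and add up each group
toy_by_year = toy[["Year", "measles"]].groupby("Year").sum()
print(toy_by_year)
```

After grouping, `Year` is the index of the result and `measles` holds one total per year.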


### <font color = 'blue'>PART III : Interactive Data Visualization with hvplot</font>

#### Import the hvplot package and display a curve plot.

Unlike matplotlib, which produces static plots, hvplot produces interactive plots with pan, hover, and zoom tools. hvplot returns a HoloViews object (here a HoloViews Curve), which is displayed as a Bokeh plot. HoloViews plots are much richer and make it easy to capture your understanding while exploring the data.

In [8]:
import hvplot.pandas

**Visualize using line chart**

In [9]:
diseases_by_year.hvplot()

In [11]:
diseases_by_year.hvplot(line_color = 'red', line_width = 3, title = 'Measles Cases from 1928-2011', 
                       ylabel = 'No. of Measles Cases', width = 800, height = 500, 
                       fontsize={'xticks':11,'yticks':11,'ylabel':12,'xlabel':12,'title':16})    

In [12]:
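# Hook that HoloViews calls with the rendered plot; plot.handles gives
# access to the underlying Bokeh axis objects so they can be restyled.
def hook(plot,element):
    # x-axis label: normal (non-italic) font style, blue text
    plot.handles['xaxis'].axis_label_text_font_style = 'normal'
    plot.handles['xaxis'].axis_label_text_color = 'blue'
    # y-axis label: same styling
    plot.handles['yaxis'].axis_label_text_font_style = 'normal'
    plot.handles['yaxis'].axis_label_text_color = 'blue'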

diseases_by_year.hvplot(line_color = 'red', line_width = 2, title = 'Measles Cases', 
                       ylabel = 'No. of Measles Cases', width = 800, height = 500, 
                       fontsize={'xticks':12,'yticks':12,'ylabel':14,'xlabel':14,'title':16}).opts(hooks=[hook])    

**Visualize using Table**

In [18]:
data = diseases_data[["Year","measles"]].groupby("Year").sum()
data

Year,measles
1928,16924.34
1929,12060.96
1930,14575.11
1931,15427.67
1932,14481.11
...,...
2007,0.00
2008,0.00
2009,0.00
2010,0.00


In [17]:
data.hvplot.table()

In [19]:
by_year = diseases_data[["Year","measles"]].groupby("Year",as_index=False).sum()
by_year

Unnamed: 0,Year,measles
0,1928,16924.34
1,1929,12060.96
2,1930,14575.11
3,1931,15427.67
4,1932,14481.11
...,...,...
79,2007,0.00
80,2008,0.00
81,2009,0.00
82,2010,0.00


In [20]:
by_year.hvplot.table()
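
The two grouped frames above differ only in where `Year` ends up. A small sketch on invented values (not from diseases.csv) shows the difference in plain pandas: with the default grouping `Year` becomes the index, while `as_index=False` keeps it as an ordinary column.

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame({"Year": [1928, 1928, 1929], "measles": [1.0, 2.0, 3.0]})

indexed = toy.groupby("Year").sum()                # Year becomes the index
flat = toy.groupby("Year", as_index=False).sum()   # Year stays a column

print(indexed.index.name, list(indexed.columns))
print(list(flat.columns))
```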

#### Placing vertical line and text

The following example marks an important point on the plot itself with a vertical line and a text annotation. The year 1963 was important with respect to measles because the vaccine was introduced then, so we record it on the graph. This also helps us compare the number of measles cases before and after the vaccine was introduced.

In [21]:
import holoviews as hv

vaccination_introduced = diseases_by_year.hvplot(line_color = 'red', line_width = 2, title = 'Measles Cases from 1928-2011', 
                       ylabel = 'No. of Measles Cases', width = 800, height = 500, 
                       fontsize={'xticks':11,'yticks':11,'ylabel':12,'xlabel':12,'title':16}) * \
hv.VLine(1963).options(color='green') * \
hv.Text(1964, 27000, "Measles Vaccine Introduced", halign='left') * \
hv.Text(1964, 25000, "Resulted in a decline in measles cases", halign='left')


vaccination_introduced

#### Integrating Filter (dropdown) in the Plot

Here we reuse the data behind the plot, this time aggregated by both year and state. Passing `groupby='State'` to hvplot adds a dropdown for choosing the state, which makes it easy to break the data down in different ways.

In [22]:
measles_agg = diseases_data.groupby(['Year', 'State'])[['measles']].sum()
measles_agg

Year,State,measles
1928,Alabama,334.99
1928,Alaska,0.00
1928,Arizona,200.75
1928,Arkansas,481.77
1928,California,69.22
...,...,...
2011,Virginia,0.00
2011,Washington,0.00
2011,West Virginia,0.00
2011,Wisconsin,0.00


In [23]:
by_state = measles_agg.hvplot('Year', groupby='State', width=500, dynamic=False)
by_state
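
Choosing a state in the dropdown shows the yearly series for that state only. In plain pandas, a comparable slice of a (Year, State) index can be taken with `xs`. The sketch below uses invented values, not diseases.csv.

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame({
    "Year": [1928, 1928, 1929, 1929],
    "State": ["Alabama", "Florida", "Alabama", "Florida"],
    "measles": [3.0, 1.0, 2.0, 4.0],
})
toy_agg = toy.groupby(["Year", "State"])[["measles"]].sum()

# One state's values, indexed by Year (roughly what one dropdown choice plots)
alabama = toy_agg.xs("Alabama", level="State")
print(alabama)
```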

Instead of a dropdown, we can place charts side by side for better comparison.

In [24]:
by_state["Alabama"].relabel('Alabama')  + by_state["Florida"].relabel('Florida')

We can also change the type of plot, say to a bar chart. Let us look at the measles pattern from 1980 to 1990, first for all states combined and then across four states.

In [25]:
measles_year = diseases_data.groupby(diseases_data.Year)[['measles']].sum()

measles_year.loc[1980:1990].hvplot.bar('Year', rot=45,color = 'coral',line_color=None)

In [26]:
measles_year.hvplot.scatter('Year',rot=90,cmap='viridis',size=20)

In [33]:
measles_agg = diseases_data.groupby(['Year', 'State'])['measles'].sum()

In [34]:
states = ['New York', 'Alabama', 'California', 'Florida']
#measles_agg.loc[1980:1990,states].hvplot.bar('Year', by='State', stacked = True, rot=90,cmap=['red','green','blue','yellow'])
measles_agg.loc[1980:1990,states].hvplot.bar('Year', by='State', stacked = True, rot=90,cmap='tab20',line_color=None)

In [35]:
states = ['New York', 'Alabama', 'California', 'Florida']
measles_agg.loc[1980:1990, states].hvplot.bar('Year', by='State', rot=90)
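
The selection `measles_agg.loc[1980:1990, states]` slices the first index level by a range of years and the second level by a list of states. The sketch below applies the same indexing to invented values; `unstack` is used only to print the result as a Year-by-State table.

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame({
    "Year": [1979, 1980, 1980, 1981, 1981, 1981],
    "State": ["Alabama", "Alabama", "Florida", "Alabama", "Florida", "Texas"],
    "measles": [9.0, 1.0, 2.0, 3.0, 4.0, 5.0],
})
# Selecting a single column gives a Series with a two-level (Year, State) index
toy_agg = toy.groupby(["Year", "State"])["measles"].sum()

states = ["Alabama", "Florida"]
subset = toy_agg.loc[1980:1981, states]   # years 1980 to 1981, listed states only

print(subset.unstack("State"))
```

The 1979 row and the Texas row fall outside the selection, so they do not appear in `subset`.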