In [None]:
import pandas as pd 
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import cufflinks as cf
import plotly
import panel as pn
pn.extension()
cf.go_offline()
%matplotlib inline
plt.style.use('ggplot')

<img src="../img/logo_white_bkg_small.png" align="right" />

# Worksheet 4 - Data Visualization
This worksheet will walk you through the basic process of preparing a visualization using Python/Pandas/Matplotlib/Seaborn/Cufflinks.  

For this exercise, we will be creating a line plot comparing the number of hosts infected by the Bedep and ConfickerAB Bot Families in the Government/Politics sector.

## Prepare the data
The data we will be using is in the `dailybots.csv` file, which can be found in the `data` folder.  As is common, we will have to do some data wrangling to get it into a format that we can visualize.  To do that, we'll need to:
1.  Read in the data
2.  Filter the data by industry and botnet
3.  Merge the filtered results on the date

The result should look something like this:

<table>
    <tr>
        <th></th>
        <th>date</th>
        <th>ConfickerAB</th>
        <th>Bedep</th>
    </tr>
    <tr>
        <td>0</td>
        <td>2016-06-01</td>
        <td>255</td>
        <td>430</td>
    </tr>
    <tr>
        <td>1</td>
        <td>2016-06-02</td>
        <td>431</td>
        <td>453</td>
    </tr>
</table>

The way I chose to do this might be a little more complex than strictly necessary, but I wanted you to see all the steps involved.

### Step 1: Read in the Data
Using the `pd.read_csv()` function, you can read in the data.

In [None]:
DATA_HOME = '../data/'
data = pd.read_csv(DATA_HOME + 'dailybots.csv')
data.head()

In [None]:
data['botfam'].value_counts()

### Step 2:  Filter the Data
The next step is to filter both by industry and by botfam.  In order to get the data into the format I wanted, I did this in separate steps.  First, I created a new DataFrame called `filteredData` which only contains the rows from the `Government/Politics` industry.
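
As a rough illustration of row filtering (not the worksheet solution), here is a minimal sketch on invented toy data.  The column names `industry`, `botfam`, and `hosts` are assumptions about the CSV layout, and the `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed, values are invented.
toy = pd.DataFrame({
    "date": ["2016-06-01", "2016-06-01", "2016-06-02", "2016-06-02"],
    "botfam": ["Bedep", "Zeus", "Bedep", "Zeus"],
    "industry": ["Retail", "Finance", "Finance", "Retail"],
    "hosts": [10, 20, 30, 40],
})

# A comparison produces a boolean Series; indexing with it keeps the True rows.
is_retail = toy["industry"] == "Retail"
retail_only = toy[is_retail]
print(retail_only)
```

The same pattern should apply to the real data once you substitute the industry you care about.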

In [None]:
# Your code here...

Next, I created another DataFrame which only contains the rows from the `ConfickerAB` botnet.  I also reduced the columns to the date and host count.  You'll need to rename the host count column so that it stays distinct when you merge in the other data set later.
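
A minimal sketch of selecting two columns and renaming one, again on invented toy data with assumed column names (`date`, `botfam`, `hosts`).  Renaming matters because two frames that both carry a `hosts` column would otherwise come out of a merge with suffixed names such as `hosts_x` and `hosts_y`.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed, values are invented.
toy = pd.DataFrame({
    "date": ["2016-06-01", "2016-06-01", "2016-06-02"],
    "botfam": ["Zeus", "Sality", "Zeus"],
    "hosts": [5, 7, 9],
})

# Keep only the Zeus rows, then only the date and host count columns.
zeus = toy[toy["botfam"] == "Zeus"][["date", "hosts"]]

# Rename the count column after the bot family it now represents.
zeus = zeus.rename(columns={"hosts": "Zeus"})
print(zeus)
```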

In [None]:
# Your code here...

Repeat this process for the `Bedep` botfam in a separate DataFrame.  

### Step 3: Merge the DataFrames.
Next, you'll need to merge the DataFrames so that you end up with a single DataFrame with three columns: the date, the `ConfickerAB` count, and the `Bedep` count.  Pandas has a `.merge()` function which is documented here: http://pandas.pydata.org/pandas-docs/stable/merging.html
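
A minimal sketch of a merge on a shared key, using two invented frames (`left` and `right` are hypothetical names).  It assumes both frames have a `date` column with matching values.

```python
import pandas as pd

# Two hypothetical per-botnet frames that share a "date" column.
left = pd.DataFrame({"date": ["2016-06-01", "2016-06-02"], "Zeus": [5, 9]})
right = pd.DataFrame({"date": ["2016-06-01", "2016-06-02"], "Sality": [7, 11]})

# Rows are matched on "date"; the default is an inner join.
merged = pd.merge(left, right, on="date")
print(merged)
```

With the default inner join, dates that appear in only one of the frames are dropped, so it is worth checking the row count after merging.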


In [None]:
# Your code here...

##  Create the first chart
Using the `.plot()` method, plot your dataframe and see what you get.  

In [None]:
#Your code here...

## Customizing your plot
The default plot doesn't look horrible, but there are certainly some improvements which can be made.  Try the following:
1.  Put real dates on the x-axis by converting the date column to a datetime type.
2.  Move the legend to the upper center of the graph.
3.  Make the figure size larger.
4.  Instead of rendering both lines on one graph, split them up into two plots.
5.  Add axis labels.

There are many examples in the pandas visualization documentation, which is available here: http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html
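
For the first item in the list above, a minimal sketch of converting a string column to datetimes with `pd.to_datetime()`.  The toy frame is invented, and using the dates as the index is one possible choice rather than a requirement.

```python
import pandas as pd

# Hypothetical toy data with dates stored as plain strings.
toy = pd.DataFrame({"date": ["2016-06-01", "2016-06-02"], "Zeus": [5, 9]})

# Parse the strings into datetime values.
toy["date"] = pd.to_datetime(toy["date"])

# With a DatetimeIndex, .plot() uses the dates for the x-axis.
toy = toy.set_index("date")
print(toy.index)
```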

In [None]:
#Your code here...

### Move the Legend to the Upper Center of the Graph
For this, you'll have to assign the object returned by `.plot()` to a variable and then call the formatting methods on it. 
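
A minimal sketch of that pattern on invented toy data.  `DataFrame.plot()` returns a matplotlib `Axes` object, so legend placement and axis labels can be set on it afterwards; the `figsize` value and the label text here are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data indexed by date; values are invented.
toy = pd.DataFrame(
    {"Zeus": [5, 9, 4], "Sality": [7, 11, 6]},
    index=pd.to_datetime(["2016-06-01", "2016-06-02", "2016-06-03"]),
)

# Keep a handle on the Axes that .plot() returns; figsize is in inches.
ax = toy.plot(figsize=(10, 5))

# Formatting methods are called on the Axes object.
ax.legend(loc="upper center")
ax.set_xlabel("Date")
ax.set_ylabel("Infected hosts")
```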

In [None]:
# Your code here...

### Make the Figure Size Larger:


In [None]:
# Your code here... 

### Adding Subplots
The first thing you'll need to do is call `plt.subplots(nrows=<rows>, ncols=<cols>)` to create a figure and an array of axes.
Next, plot your charts using the `.plot()` method.  To place each plot on its own axes, pass `ax=axes[n]` to the `.plot()` method.
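
A minimal sketch of the subplot pattern on invented toy data.  The two-row layout, the figure size, and the titles are illustrative assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data indexed by date; values are invented.
toy = pd.DataFrame(
    {"Zeus": [5, 9, 4], "Sality": [7, 11, 6]},
    index=pd.to_datetime(["2016-06-01", "2016-06-02", "2016-06-03"]),
)

# One figure holding two stacked axes; axes is an array of length 2.
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(10, 8))

# Draw each series on its own axes by passing ax=.
toy["Zeus"].plot(ax=axes[0], title="Zeus")
toy["Sality"].plot(ax=axes[1], title="Sality")

# Reduce overlap between the two panels.
fig.tight_layout()
```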

In [None]:
# Your code here...

## Making it Interactive

Using `cufflinks`, plot an interactive time series chart of this data. Plot each series on a separate line.  The documentation for `cufflinks` can be found here: https://plot.ly/ipython-notebooks/cufflinks/.   

In [None]:
# Your code here...

## Building Dashboards with Interactive Widgets
In this last example, you are going to create a chart to visualize the breakdown of bots attacking each industry.  To do this, we will be using the `panel` module.  The complete documentation for the `panel` module is available here: http://panel.pyviz.org/index.html

The first thing you will need to do is define a function which takes an industry as its argument and returns a figure.  To do that, you will have to do a bit of data wrangling as well.  Specifically, you will need to:
1.  Filter your data by the user supplied industry
2.  Remove extraneous columns
3.  Aggregate the data by the `botfam` column
4.  Calculate a `sum` of the `hosts` column
5.  Set the index to the `botfam` column.

Ultimately your data will need to be formatted like this:

<table>
    <tr>
        <th>botfam</th>
        <th>hosts</th>
    </tr>
    <tr>
        <td>Bedep</td>
        <td>52049</td>
    </tr>
    <tr>
        <td>ConfickerAB</td>
        <td>321373</td>
    </tr>
    <tr>  
        <td>Necurs</td>
        <td>48037</td>
    </tr>
    <tr>
        <td>Olmasco</td>	
        <td>1572</td>
    </tr>
    <tr>
        <td>PushDo</td>
        <td>62485</td>
    </tr>
    <tr>
        <td>Ramnit</td>
        <td>78753</td>
    </tr>
    <tr>
        <td>Sality</td>
        <td>56600</td>
    </tr>
    <tr>
        <td>Zeus</td>
        <td>16156</td>
    </tr>
    <tr>
        <td>Zusy</td>
        <td>45648</td>
    </tr>
    <tr>
        <td>zeroaccess</td>
        <td>24456</td>
    </tr>
</table>
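
The wrangling steps listed above can be sketched on invented toy data as follows.  The column names (`industry`, `botfam`, `hosts`, `orgs`) are assumptions about the CSV layout, and the numbers are made up.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed, values are invented.
toy = pd.DataFrame({
    "date": ["2016-06-01", "2016-06-01", "2016-06-02", "2016-06-02", "2016-06-02"],
    "botfam": ["Zeus", "Sality", "Zeus", "Sality", "Zeus"],
    "industry": ["Retail", "Retail", "Retail", "Retail", "Finance"],
    "hosts": [5, 7, 9, 11, 100],
    "orgs": [1, 2, 3, 4, 5],
})

# 1. Filter by the chosen industry.
selected = toy[toy["industry"] == "Retail"]

# 2-4. Drop the extra columns, group by bot family, and sum the host counts.
totals = selected[["botfam", "hosts"]].groupby("botfam").sum()
print(totals)
```

In this sketch `groupby("botfam")` already leaves `botfam` as the index, so no separate `set_index()` call is needed; that step applies if you aggregate in a way that keeps `botfam` as an ordinary column.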



In [None]:
industry_list = ['Manufacturing', 'Retail', 'Education', 'Healthcare/Wellness',
                 'Government/Politics', 'Finance']
def plot_industry_bar_chart(selected_industry):
    #Your code here...
    return fig

Next, use the `pn.interact()` function to actually render the widget and graph.  This function takes two arguments:
1.  A function which renders the chart.  (This is the function that you wrote in the previous step)
2.  A list of inputs, in this case the industries, that you want to pass to the charting function

Documentation available here: http://panel.pyviz.org/user_guide/Interact.html


In [None]:
#Your code here...