# Interactive Visualizations

Use **Code** cells to write and run any code you need to answer the question and **Markdown** cells to write out answers in words. After you are finished with the assignment, remember to download it as an **HTML file** and submit it in **ELMS**.

In [1]:
from requests import get

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
import plotly.express as px

## Dynamic Visualizations

So far, we have only worked with static visualizations. That is, there were no moving pieces or changes that you would make to the graphs after they were made. These are the more traditional visualizations and most commonly used within reports (after all, you can't animate figures on a piece of paper ... at least not yet!).

However, now that presentations and reports are distributed and accessed online much more frequently, the use of interactive, dynamic visualizations has grown. Dynamic visualizations can open up a whole new dimension and allow you to show relationships and trends more clearly than with a static visualization. In addition, they are more flexible: a single interactive figure can offer many customized views of the data, which is much easier than creating a separate static visualization for each view.

These interactive visualizations are typically used for:
- Flexible custom views of the data
- Creating dashboards and other data tools for non-technical users
- Showing a time dimension that would be otherwise difficult to show
- Making adjustments in real-time as data comes in
- And more ...

To start, we'll bring in some datasets to use within our figures.

In [2]:
us_data = pd.read_csv('us_df.csv')
census_data = pd.read_csv('county_df.csv')


In [4]:
md_va = census_data[(census_data.state == 'Maryland') | (census_data.state == 'Virginia')]
md_va.head()

Unnamed: 0,county,num_households,mean_income,percent_employed,percent_bachelors,percent_graduate,percent_hoh_married,percent_hh_over65,percent_born_us,percent_broadband,state
306,"Allegany County, Maryland",27588.0,81756,54.8,21.6,8.7,39.4,37.9,98.3,86.4,Maryland
307,"Anne Arundel County, Maryland",223871.0,180257,70.0,46.7,21.6,51.5,31.9,87.3,96.7,Maryland
308,"Baltimore County, Maryland",337395.0,143702,66.6,43.9,20.3,43.3,35.5,84.5,93.2,Maryland
309,"Calvert County, Maryland",32921.0,168314,63.7,38.6,16.0,62.0,31.5,92.7,96.1,Maryland
310,"Carroll County, Maryland",65133.0,155906,66.4,40.7,14.9,60.7,35.3,92.5,94.0,Maryland


In [5]:
data_file = '201807-CAH_PulseOfTheNation_Raw.csv'
potn = pd.read_csv(data_file)
potn.head()

Unnamed: 0,gender,age,age_range,political_party,rep_change,dem_change,political_leaning,education,race,race_other,...,stereotypes,respect_for_authority,handout,climate_change,epa,biz_regulations,nuclear_emissions,voter_turnout_help,american_heartland,south_racist
0,Female,55,55-64,Independent,,,Moderate,Some college,White,,...,Yes,Strongly Agree,Strongly Disagree,Well-informed,No,Yes,Yes,Help Democrats more,Yes,Yes
1,Female,34,25-34,Strong Democrat,,No,Strong Liberal,Graduate degree,White,,...,Yes,Strongly Disagree,Strongly Disagree,Well-informed,No,No,Yes,Help Democrats more,No,DK/REF
2,Male,49,45-54,DK/REF,,,Moderate,College degree,White,,...,DK/REF,Strongly Agree,Somewhat Agree,Well-informed,No,No,Yes,DK/REF,Yes,No
3,Male,41,35-44,Independent,,,Strong Conservative,High school or less,White,,...,Yes,Strongly Agree,Strongly Agree,Not Very Well-informed,Yes,Yes,DK/REF,Make no Difference,Yes,No
4,Female,65,65+,Independent,,,Moderate,Graduate degree,White,,...,Yes,Somewhat Disagree,Somewhat Disagree,Well-informed,No,No,Yes,Help Democrats more,Yes,Yes


## Plotly

The `plotly` package provides an easy way to create quick interactive visualizations. Here, we'll go over how to use the "express" functions, which create quick interactive visualizations without needing lots of code or customization. The `plotly` package also allows for more complicated animations and dynamic aspects, including maps and 3-D plots. For more guides on how to use `plotly`, see the tutorials at https://plotly.com/python/#animations.  

The `scatter` function from plotly uses syntax similar to seaborn, but creates a scatterplot with points that you can hover over to get information about them. 

In [6]:
fig = px.scatter(census_data, x = 'percent_bachelors', y = 'mean_income')
fig.show()

Note that the hover information only has the values of the variables that are plotted. This is useful if we want to get the exact values, but we might also want to know what each observation represents. For example, which county is the one that had the highest percent of people with a Bachelor's degree? Or the lowest? We can add this information to the graph by adding the `hover_data` argument.

In [7]:
fig = px.scatter(census_data, x = 'percent_bachelors', y = 'mean_income', hover_data = ['county'])
fig.show()

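Finally, we'll generally want to add at least some basic annotations to this plot to make it easier to interpret. We can assign x- and y-axis labels with the `labels` argument, and we can add a title with the `title` argument. In this example, I've also added a subtitle by using some HTML tags: `<br>` starts a new line, and `<sup>` gives us <sup>superscript text</sup>. The `size` argument scales each point by the number of households in the county, and `dropna()` removes rows with missing values first, since a point cannot be sized from a missing value.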

In [8]:
fig = px.scatter(census_data.dropna(), 
                 x = 'percent_bachelors', 
                 y = 'mean_income', 
                 hover_data = ['county'], 
                 size = 'num_households',
                 # x and y axis labels as a key:value dictionary
                 labels={"mean_income": "Mean Income",
                         "percent_bachelors": "% Bachelor's Degree"},
                 # title + subtitle (note the HTML tags)
                 title="Mean Income by % Bachelor's degree<br><sup>Points scaled by number of households</sup>",
                 
                )

fig.show()

Plotly has sensible default settings for most of the color options, but you can make additional modifications to an existing figure using the `update_layout` method (which controls figure-level settings such as the background) and the `update_traces` method (which controls the points, bars, boxes, etc.). Here, I'm making a transparent background and modifying the color of the points.

In [9]:

fig.update_layout({
    # all zeros here makes a transparent background
    'plot_bgcolor': 'rgba(0, 0, 0, 0)',
    'paper_bgcolor': 'rgba(0, 0, 0, 0)'

})

fig.update_traces(
    # red markers with a white border
    marker_line_color="white", 
    marker_color="red"
    )

<font color ='red'>**Question 1: Create a visualization that plots the percent of people with a bachelor's degree with the percent of people employed within a county, with the size of the observation scaled to the mean income of that county. Add a title and axis labels to the plot**</font>

## Interactive Bar Charts

We can use `plotly` to make interactive bar charts as well. First, we start by using `crosstab` in order to make the table that contains the underlying data in the bar chart. We'll use `normalize = 'index'` in order to get proportions rather than raw counts so that we can see the relationship between two variables.

In this example, we look at political party and gender.

In [None]:
party_by_gender = pd.crosstab(potn.political_party, potn.gender, normalize = 'index')
party_by_gender
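To see what `normalize = 'index'` does, here is a small sketch with made-up survey responses (not the Pulse of the Nation data). Each row of the normalized table is divided by its row total, so every row sums to 1.

```python
import pandas as pd

# Hypothetical responses: three Democrats and two Republicans
toy = pd.DataFrame({
    'political_party': ['Democrat', 'Democrat', 'Democrat', 'Republican', 'Republican'],
    'gender': ['Female', 'Female', 'Male', 'Female', 'Male'],
})

counts = pd.crosstab(toy.political_party, toy.gender)                       # raw counts
shares = pd.crosstab(toy.political_party, toy.gender, normalize = 'index')  # row proportions

print(counts)
print(shares)
```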

As before, we need to reorder the variables so that they are in a more intuitive order. 

In [None]:
party_order = ['Strong Democrat', 'Not Very strong Democrat', 'Independent',
              'Not very Strong Republican', 'Strong Republican', 'DK/REF']
party_by_gender = party_by_gender.loc[party_order, :]

Now that the data is in the form we need, calling `px.bar` with the x and y variables gives us the bar chart we want. The chart shows the relationship between gender and political party, and hovering over a bar shows its exact value. This keeps the image clean for comparing groups while still making exact values available without cluttering the graph.

In [None]:
fig = px.bar(party_by_gender, x = party_by_gender.index, y = party_by_gender.columns)
fig.show()

We could also have made this a horizontal bar graph by adding the `orientation = 'h'` argument. Note that this requires you to switch the `x` and `y` arguments, because the categories now go on the y-axis and the values go on the x-axis.

In [None]:
fig = px.bar(party_by_gender, x = party_by_gender.columns, y = party_by_gender.index, orientation = 'h')
fig.show()

Finally, we would probably want to add descriptive labels to this plot. I'm also modifying the x-axis to show percentages in the tick labels instead of proportions.

In [None]:
fig = px.bar(party_by_gender, x = party_by_gender.columns, y = party_by_gender.index, orientation = 'h',
             # x and y axis labels as a key:value dictionary
             labels={"political_party": "Party ID",
                     "value": "Percent"},
             title="Gender by Party ID"
            )
# show tick labels as whole-number percentages instead of proportions from 0 to 1
fig.layout.xaxis.tickformat = '.0%'
fig.show()

### Boxplots

You can make boxplots in the same way. Hovering over a box shows the summary statistics that were computed to draw it, such as the median and quartile values. As with seaborn, we provide the DataFrame and specify the `x` and `y` variables. We can also provide a `color` argument, which further splits the data into groups for more comparisons.

In [None]:
fig = px.box(potn, x = 'gender', y = 'age', color = 'race',
            title = "Distribution of age by gender and race"             
             # leaving x and y axis labels as-is, since they're already fairly descriptive
            )
fig.show()

<font color ='red'>**Question 2: Create a visualization that compares the boxplot of mean income for counties in Maryland and in Virginia. Be sure to add descriptive labels and a title**</font>

In [None]:
# try adding points = 'all' as an extra argument to px.box. What do you see?



## Sliders and Animated Plots

The `plotly` package also provides the ability to add sliders and animations to graphs. This is most useful for when you want to show changes over time or want to look at different cuts of the data according to some categorical variable.

Let's take a look at an example using the built-in Gapminder dataset within `plotly`.

In [None]:
gm = px.data.gapminder()
gm.head()


This data contains country-level information about characteristics such as life expectancy, population, and GDP per capita. The Gapminder website (https://www.gapminder.org) also has resources to find data on other characteristics for countries around the world, such as fertility and child mortality. We can make a scatterplot similar to before using the `scatter` function, but also add an `animation_frame` argument so that the plot animates by year. The `animation_group` argument identifies which observations are the same across time points, so that each one moves smoothly from frame to frame.

In [None]:
fig = px.scatter(gm, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                 size="pop", color="continent", hover_name="country",
                 log_x=True, size_max=55, range_x=[100,100000], range_y=[25,90])
fig.show()

We could have done this with the ACS data too, but we would first need data for multiple years. Having the data pull defined as a separate function makes this easier. Using a `for` loop, we can call the `get_county_data` function for a range of years, then use `pd.concat` to combine the datasets. Note that `get_county_data` and `census_key` are not defined in the cells above, so they need to be available in your session before you run the cell below.

In [None]:
year_range = range(2013,2020)
acs = []

for year in year_range:
    df = get_county_data(year, census_key)
    df['year'] = year
    acs.append(df) 
    
acs_over_years = pd.concat(acs, ignore_index = True)
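The loop-and-concatenate pattern is easier to follow on a stand-in function. In the sketch below, `fake_county_data` is a hypothetical placeholder that returns a made-up two-row DataFrame; it is not the real `get_county_data` and does not call the Census API.

```python
import pandas as pd

def fake_county_data(year):
    # hypothetical stand-in for get_county_data: returns made-up values
    return pd.DataFrame({
        'county': ['County A', 'County B'],
        'mean_income': [50000 + (year - 2013) * 1000, 60000 + (year - 2013) * 1000],
    })

frames = []
for year in range(2013, 2016):
    df = fake_county_data(year)
    df['year'] = year          # tag each row with the year it came from
    frames.append(df)

# ignore_index = True gives the stacked rows a fresh 0..n-1 index
stacked = pd.concat(frames, ignore_index = True)
print(stacked)
```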

To avoid issues with some counties having the same name across states, we'll combine the `county` and `state` variables and create a new variable that has a unique county-state combination. 

In [None]:
acs_over_years['county_state'] = acs_over_years.county + ', ' + acs_over_years.state
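A quick sketch of why this matters, using made-up rows and assuming the `county` column holds only the county name: two counties with the same name in different states look like a single group until the state is attached.

```python
import pandas as pd

# Hypothetical rows; the county column is assumed to hold the name only
toy = pd.DataFrame({
    'county': ['Montgomery County', 'Montgomery County', 'Fairfax County'],
    'state':  ['Maryland', 'Virginia', 'Virginia'],
})

toy['county_state'] = toy.county + ', ' + toy.state

print(toy.county.nunique())        # distinct county names
print(toy.county_state.nunique())  # distinct county-state combinations
```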

Finally, we use `dropna()` to remove any rows with missing values, then create the visualization. Here, we are graphing the `percent_employed` variable against `mean_income`. Note that some of the variable names changed over the years, so it would take a bit more work to get the appropriate `percent_bachelors` variable.

In [None]:
fig = px.scatter(acs_over_years.dropna(), x="percent_employed", y="mean_income", 
                 animation_frame="year", animation_group="county_state",
                 size="num_households", hover_name="county",
                 range_x=[25,90], range_y=[0,300000])
fig.show()