## Importing Plotly

Plotly is an extensive plotting library that can be as simple or as complex as you want.  As a result, there are two primary ways of importing Plotly:

```python
# The easier but less powerful version
import plotly.express as px

# The more complex but more powerful version
import plotly.graph_objects as go
```


In an effort to make Plotly more approachable, the developers at Plotly created Plotly Express.  Express allows us to create beautiful graphs with Python without having to write as much code.  However, some features of Plotly are not available when using Express.  In those situations, using `plotly.graph_objects` becomes necessary.

For the purposes of this class, we will be using Plotly Express.  The concepts covered in this module are written for Plotly Express, and most of the function calls shown do not carry over directly to `plotly.graph_objects`.

For more information on Plotly and how to use it with Python, please reference [their website](https://plotly.com/python/).  There are a ton of different plots available with Plotly that we don't have time to cover in this course.    

## Scatter Plot

Scatter plots are used to display data points along the x-axis and the y-axis of a plot.  We can pass lists directly to the `x` and `y` arguments, as shown below.

In [None]:
import plotly.express as px

fig = px.scatter(x=[0, 4, 5, 7, 8], y=[3, 6, 3, 5, 8])
fig.show()

Notice above that we save our scatter plot to the `fig` variable.  From there, we display the plot by running `fig.show()`.  The name `fig` is just an example variable name.  We could have just as easily called the variable `figure`, in which case we would call `figure.show()`.  **Make sure you call the `.show()` method.  Otherwise the plot may not appear.**

We can also pass Pandas data frames into Plotly to generate our scatter plot.  The data frame is the first argument when creating the scatter plot.  We set `x=` to the name of the data frame column we want to plot along the x-axis, and `y=` to the column we want to plot along the y-axis.  Additionally, we can use the `color=` argument to color the scatter plot points according to the value in a given column.

Finally, the `hover_data=` argument lets us attach additional data to each point, which is displayed when we hover over it.  We pass a list of the relevant column names to `hover_data=`.  Plotly plots are really nice since they are interactive.  We can zoom into the plot as well as hover over the points to get more information.

## Exercise 1

In the exercise below, we are trying to create a scatter plot of the Ballotpedia campaign contributions data that we collected in module 4.  We want to see if there is anything interesting going on with Indiana's campaign contributions.  Granted, a scatter plot is not the best plot type in this scenario, but this is just a demonstration.  When we run the code below, the plot doesn't appear.  Fix the code below so a plot appears.

**Hint: You need to add one line of code**

In [None]:
import pandas as pd
import plotly.express as px

data = pd.read_csv("https://raw.githubusercontent.com/nalderto/POL300-Public/master/All-Campaign-Contributions.csv")

data = data[data['Year'] != 'Total'] # Remove rows with "Total" in the "Year"

data = data[data['State'] == 'Indiana'] # Only include data points from Indiana

fig = px.scatter(data, x='Year', y='Average', color='Chamber') # Column choices here are an assumption


## Histogram

According to Plotly, a histogram is a "representation of the distribution of numerical data, where the data are binned and the count for each bin is represented."  They are commonly used in statistics.  A histogram can be created using:

```python
fig = px.histogram(data_frame, x="column_name")
```

The `x=` argument is the data frame column we want to study.  An optional argument is `nbins=`, which specifies how many bars (bins) we want in our histogram.  If we don't provide this argument, Plotly will make this decision for us.  Plotly may adjust the final count slightly so that the bin edges fall on round values.  So, if we want about 20 bars, we would write something similar to below:

```python
fig = px.histogram(data_frame, x="column_name", nbins=20)
```

## Exercise 2

As you can see below, we have a histogram of the average spending for State Senate campaigns in Washington state from 2000 to 2018.  While Plotly did a pretty good job with selecting the number of bars, we want a few more.

**Change the plot to display 50 bars in the histogram.**

In [None]:
import pandas as pd
import plotly.express as px

data = pd.read_csv("https://raw.githubusercontent.com/nalderto/POL300-Public/master/All-Campaign-Contributions.csv")

data = data[data['Year'] != 'Total'] # Remove rows with "Total" in the "Year"

data = data[data['State'] == 'Washington'] # Only include data points from Washington

data = data[data['Chamber'] == 'State Senate'] # Only include State Senate data points

fig = px.histogram(data, x='Average')
fig.show()

## Maps

One of the biggest strengths of Plotly is the number of different types of plots available.  Scatter plots, line graphs, histograms, and so on are available, of course, but so are some other neat plots like sunburst charts and Sankey diagrams.  A personal favorite of many is the ability to create map-based visualizations.  This is an especially useful plot in political science, as many things we investigate are geography-based.

To create a US state map plot in Plotly, we use the following function:

```python
fig = px.choropleth(averages, locations="state_abbreviations", 
                    locationmode="USA-states", 
                    color="lifeExp", 
                    hover_name="state_name", scope="usa")
```

To create a global map, we use the following function:

```python
fig = px.choropleth(df, locations="countries",
                    color="lifeExp",
                    hover_name="country")
```

The `locations` argument for the state map must be a column containing the two-letter abbreviation of each state.  In the world map function, the `locations` argument must be a column of [3-letter ISO country codes](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3).

The `color` argument is the data column you would like to see visualized.  The `lifeExp` column is just an example.  You would replace it with whatever you are interested in studying, like GDP or smoking rates.

The `hover_name` argument should be a column containing the name of the respective state or country.  However, you can change it to something else if you like.

In [None]:
import pandas as pd
import plotly.express as px

data = pd.read_csv("https://raw.githubusercontent.com/nalderto/POL300-Public/master/All-Campaign-Contributions.csv")

# We need to add the two-letter state abbreviations to our data frame
state = pd.read_csv("https://worldpopulationreview.com/static/states/name-abbr.csv", names=["State", "Abbr"])
data = data.merge(state, on="State")

data = data[data['Year'] == '2018'] # We only want data from 2018
data = data[data['Chamber'] == 'State Senate'] # We only want State Senate data

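# We need to convert the average from a currency string to a number (i.e. $4,321 => 4321)
# Assumes "Average" holds strings like "$4,321"; values that cannot be parsed become NaN
data['Average'] = pd.to_numeric(data['Average'].astype(str).str.replace(r'[$,]', '', regex=True), errors='coerce')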

# Group the states by abbreviation and get the average cost across all their districts
averages = data.groupby(['Abbr', 'State'], as_index=False)['Average'].mean()

# Create the map plot
fig = px.choropleth(averages, locations="Abbr", locationmode="USA-states", color="Average", hover_name="State", scope="usa")

fig.show() # Our data set was not complete, which is why some states are grey
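
The currency conversion step in the cell above can be tried in isolation.  This is a minimal sketch on a made-up Series of dollar strings; the values are invented, and it assumes every entry is a clean string such as `$4,321`.

```python
import pandas as pd

# Made-up dollar amounts stored as text
amounts = pd.Series(["$4,321", "$12,500", "$987"])

# Strip the dollar sign and thousands separators, then convert to integers
cleaned = amounts.str.replace(r"[$,]", "", regex=True).astype(int)

print(cleaned.tolist())
```

If a column contains missing or malformed entries, `astype(int)` raises an error, so wrapping the cleaned strings in `pd.to_numeric(..., errors="coerce")` is a more forgiving alternative.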

## Exercise 3
According to the map plot above, which state had the most expensive state senate races on average in 2018?  Why do you theorize this is the case? 

*Hint: think about the state population and the number of state senate seats*

Type your answer in the comment below.

In [None]:
# Type your answer on the line below
# 
# Type your answer on the line above