# Let's Get Visual
We saw in the last notebook how to extract numerical information from our data, like the survival rates for men and women. In this notebook we'll see how to display that kind of information visually. We'll use some techniques we saw with the gapminder data, and learn some new ones as well.

### Imports
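Always remember to start your notebook by importing any libraries you will need. In this case we need pandas and Plotly Express. I'll also load Colab's `data_table` extension so our data displays a little more nicely. That extension only exists in Google Colab, so skip the `%load_ext` line if you are running the notebook somewhere else.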

In [0]:
import pandas as pd
import plotly.express as px

%load_ext google.colab.data_table

### Data
We always need to load our data as well. In this case we will be looking at energy data for the US states.

Our data is usually in csv format, so we will usually load our data with `pd.read_csv`, but if our data is in another format we might need a different pandas function.
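
For example, a file in JSON format would be read with `pd.read_json` instead of `pd.read_csv`. The cell below is a minimal sketch. It stores the same two invented rows once as CSV text and once as JSON text, then reads each one with its matching function. `io.StringIO` lets pandas treat a string as if it were a file, so nothing needs to be downloaded.

```python
import io

import pandas as pd

# The same two made-up rows, stored as CSV text and as JSON text
csv_text = "State,Year\nUtah,1960\nVermont,1961\n"
json_text = '[{"State": "Utah", "Year": 1960}, {"State": "Vermont", "Year": 1961}]'

# io.StringIO wraps a string so pandas can read it like a file
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text), orient='records')

# Both give a table with 2 rows and 2 columns
print(from_csv.shape, from_json.shape)
```

Whichever function we use, the result is an ordinary DataFrame, so everything else in this notebook works the same way.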

We pass the location and name of our file to `pd.read_csv` as a string. In this case our data is found at 
```
https://think.cs.vt.edu/corgis/datasets/csv/energy/energy.csv
```
So our call is:
```
pd.read_csv('https://think.cs.vt.edu/corgis/datasets/csv/energy/energy.csv')
```
Finally, we always need to save it to a variable so we can work with the data once it is loaded.

In [0]:
energy=pd.read_csv('https://think.cs.vt.edu/corgis/datasets/csv/energy/energy.csv')

We now have our data loaded and saved to our variable `energy`. To see our data, we just type `energy` on its own.

In [4]:
energy

Unnamed: 0,State,Year,Production.Coal,Consumption.Commercial.Coal,Consumption.Commercial.Distillate Fuel Oil,Consumption.Commercial.Geothermal,Consumption.Commercial.Hydropower,Consumption.Commercial.Kerosene,Consumption.Commercial.Liquefied Petroleum Gases,Consumption.Commercial.Natural Gas,Consumption.Commercial.Solar,Consumption.Commercial.Wind,Consumption.Commercial.Wood,Consumption.Electric Power.Coal,Consumption.Electric Power.Distillate Fuel Oil,Consumption.Electric Power.Natural Gas,Consumption.Electric Power.Wood,Consumption.Industrial.Coal,Consumption.Industrial.Distillate Fuel Oil,Consumption.Industrial.Geothermal,Consumption.Industrial.Hydropower,Consumption.Industrial.Kerosene,Consumption.Industrial.Liquefied Petroleum Gases,Consumption.Industrial.Natural Gas,Consumption.Industrial.Other Petroleum Products,Consumption.Industrial.Solar,Consumption.Industrial.Wind,Consumption.Industrial.Wood,Consumption.Refinery.Coal,Consumption.Refinery.Distillate Fuel Oil,Consumption.Refinery.Liquefied Petroleum Gases,Consumption.Refinery.Natural Gas,Consumption.Residential.Coal,Consumption.Residential.Distillate Fuel Oil,Consumption.Residential.Geothermal,Consumption.Residential.Kerosene,Consumption.Residential.Liquefied Petroleum Gases,Consumption.Residential.Natural Gas,Consumption.Residential.Wood,Consumption.Transportation.Coal,...,Expenditure.Commercial.Kerosene,Expenditure.Commercial.Liquefied Petroleum Gases,Expenditure.Commercial.Natural Gas,Expenditure.Electric Power.Coal,Expenditure.Electric Power.Distillate Fuel Oil,Expenditure.Electric Power.Natural Gas,Expenditure.Industrial.Coal,Expenditure.Industrial.Distillate Fuel Oil,Expenditure.Industrial.Kerosene,Expenditure.Industrial.Liquefied Petroleum Gases,Expenditure.Industrial.Natural Gas,Expenditure.Industrial.Other Petroleum Products,Expenditure.Residential.Coal,Expenditure.Residential.Distillate Fuel Oil,Expenditure.Residential.Kerosene,Expenditure.Residential.Liquefied Petroleum Gases,Expenditure.Residential.Natural Gas,Expenditure.Residential.Wood,Expenditure.Transportation.Coal,Expenditure.Transportation.Distillate Fuel Oil,Expenditure.Transportation.Liquefied Petroleum Gases,Expenditure.Transportation.Natural Gas,Price.Commercial.Coal,Price.Commercial.Distillate Fuel Oil,Price.Commercial.Kerosene,Price.Commercial.Liquefied Petroleum Gases,Price.Commercial.Natural Gas,Price.Electric Power.Coal,Price.Electric Power.Distillate Fuel Oil,Price.Electric Power.Natural Gas,Price.Industrial.Coal,Price.Industrial.Distillate Fuel Oil,Price.Industrial.Kerosene,Price.Industrial.Liquefied Petroleum Gases,Price.Industrial.Natural Gas,Price.Industrial.Other Petroleum Products,Price.Transportation.Coal,Price.Transportation.Distillate Fuel Oil,Price.Transportation.Liquefied Petroleum Gases,Price.Transportation.Natural Gas
0,Utah,1960,114689.0,2644.0,2110.0,0.0,0.0,33.0,450.0,10482.0,0.0,0.0,35.0,12840.0,67.0,3806.0,0.0,70489.0,5768.0,0.0,2.0,167.0,515.0,34662.0,11571.0,0.0,0.0,344.0,0.0,0.0,0.0,0.0,3805.0,580.0,0.0,5.0,673.0,23398.0,1840.0,1188.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00
1,Vermont,1960,0.0,769.0,2434.0,0.0,0.0,245.0,367.0,0.0,0.0,0.0,65.0,516.0,49.0,0.0,0.0,1096.0,1361.0,0.0,689.0,423.0,413.0,0.0,253.0,0.0,0.0,4420.0,0.0,0.0,0.0,0.0,1107.0,11906.0,0.0,3973.0,799.0,0.0,3457.0,19.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00
2,Rhode Island,1960,0.0,210.0,8046.0,0.0,0.0,95.0,224.0,1753.0,0.0,0.0,20.0,16146.0,76.0,356.0,0.0,95.0,2139.0,0.0,7.0,560.0,128.0,2969.0,1301.0,0.0,0.0,1789.0,0.0,0.0,0.0,0.0,302.0,32079.0,0.0,4367.0,448.0,6946.0,1049.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00
3,Michigan,1960,0.0,24324.0,18710.0,0.0,0.0,3206.0,735.0,44506.0,0.0,0.0,418.0,256299.0,451.0,5391.0,0.0,331969.0,41303.0,0.0,2285.0,15543.0,2181.0,121345.0,23881.0,0.0,0.0,14775.0,0.0,0.0,0.0,0.0,35003.0,101236.0,0.0,4336.0,8018.0,209046.0,22068.0,5547.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00
4,Missouri,1960,61027.0,11149.0,6411.0,0.0,0.0,8546.0,4272.0,33826.0,0.0,0.0,490.0,80476.0,1040.0,31310.0,0.0,62181.0,33332.0,0.0,0.0,1928.0,1819.0,81694.0,12971.0,0.0,0.0,7276.0,0.0,0.0,0.0,0.0,16044.0,7750.0,0.0,1359.0,16878.0,115001.0,25869.0,1056.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2800,Louisiana,2014,35541.0,0.0,2971.0,852.0,0.0,19.0,896.0,32136.0,0.0,0.0,321.0,207043.0,466.0,272974.0,0.0,2931.0,42250.0,42.0,0.0,155.0,614632.0,1165861.0,720939.0,0.0,0.0,113265.0,0.0,289.0,405.0,158453.0,0.0,12.0,948.0,2.0,1976.0,45711.0,2713.0,0.0,...,0.5,20.6,281.2,611.4,9.3,1254.5,15.2,947.8,3.2,9184.7,3772.8,8540.0,0.0,0.3,0.0,61.4,483.4,12.9,0.0,3904.3,12.1,0.4,0.00,21.48,25.33,23.03,8.75,2.95,19.94,4.60,5.20,22.59,20.88,14.95,4.54,24.53,0.0,26.57,29.52,8.08
2801,North Dakota,2014,389673.0,1279.0,6977.0,445.0,0.0,5.0,1898.0,14881.0,0.0,0.0,63.0,304585.0,300.0,2094.0,0.0,93331.0,71386.0,0.0,0.0,2.0,2465.0,44838.0,9608.0,0.0,0.0,121.0,0.0,6.0,152.0,3717.0,0.0,893.0,533.0,7.0,6077.0,13293.0,530.0,0.0,...,0.2,43.7,108.4,465.2,6.4,7.7,396.8,1652.9,0.0,54.6,136.1,0.0,0.0,24.1,0.2,170.7,110.8,2.1,0.0,1850.2,18.7,0.0,3.13,22.61,32.56,23.04,7.28,1.53,21.20,3.68,4.25,23.16,26.84,23.62,5.28,0.00,0.0,27.22,34.54,9.27
2802,Nebraska,2014,0.0,0.0,1894.0,720.0,0.0,2.0,692.0,33673.0,0.0,0.0,353.0,254573.0,569.0,4340.0,0.0,21971.0,25998.0,0.0,0.0,3.0,2590.0,90337.0,623.0,0.0,0.0,108.0,0.0,0.0,0.0,0.0,0.0,102.0,492.0,7.0,6589.0,43794.0,2981.0,0.0,...,0.1,16.0,235.6,355.5,12.5,24.5,40.1,604.9,0.1,61.5,494.3,18.7,0.0,2.8,0.2,185.9,369.6,11.5,0.0,2255.5,16.7,0.9,0.00,22.72,32.72,23.15,7.00,1.40,21.92,5.64,1.82,23.27,26.97,23.73,5.48,30.04,0.0,27.48,34.91,16.23
2803,District of Columbia,2014,0.0,48.0,575.0,0.0,0.0,0.0,2.0,18145.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,110.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,805.0,22.0,0.0,10.0,14769.0,28.0,0.0,...,0.0,0.1,213.1,0.0,0.0,0.0,0.0,2.5,0.0,0.2,0.0,0.0,0.0,22.6,0.0,0.4,185.9,0.1,0.0,59.8,0.1,12.3,3.00,23.16,30.35,25.50,11.75,0.00,0.00,0.00,0.00,22.66,0.00,31.56,0.00,0.00,0.0,26.45,27.59,11.75


We can see that our data has 2,805 rows and 85 columns. Each row holds the data for a particular state in a particular year. The columns cover many different measures of energy production, consumption, expenditure, and price. Read the table at https://think.cs.vt.edu/corgis/csv/energy/ to see what each column represents.
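
You don't have to count rows and columns by eye. A DataFrame's `shape` attribute gives `(rows, columns)`, and `columns` lists the column names. The energy columns share prefixes such as `Consumption.` and `Price.`, so `str.startswith` can pick out one group of them. The sketch below uses a small made-up frame so it runs on its own. With the real data you would use `energy` in place of the hypothetical `toy`.

```python
import pandas as pd

# Small made-up frame standing in for `energy` (invented values)
toy = pd.DataFrame({
    'State': ['Utah', 'Vermont'],
    'Year': [1960, 1960],
    'Consumption.Commercial.Coal': [10.0, 20.0],
    'Price.Commercial.Coal': [1.5, 2.5],
})

# (number of rows, number of columns)
print(toy.shape)  # (2, 4)

# Names of the columns that start with 'Consumption.'
consumption_cols = [col for col in toy.columns if col.startswith('Consumption.')]
print(consumption_cols)  # ['Consumption.Commercial.Coal']
```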

## Visualizations

We now need to decide how we are going to visualize our data. The kind of graph you want will depend on what you are trying to display. Plotly has a large number of graphs we can make, but the most common are:
- **Line Graph:** We usually use line graphs when we are showing the change in something over time. We use time on our x-axis, and the value we are looking at on our y-axis. Sometimes we use non-time variables on our x-axis, but it is less common.
- **Scatter Plot:** We use scatter plots to show the relationship between two variables, one on the x-axis and one on the y-axis.
- **Bar Chart:** We use a bar chart to show the amount (how much stuff there is) of a variable in different categories. The categories may be time-based, or they may be something else entirely.
- **Pie Chart:** We use a pie chart to show how a "whole" is broken up into parts. For instance, we might show how the whole population of Sky Islands is broken up into freshmen, sophomores, juniors, seniors, and staff.

Let's see how we might analyze the `energy` data with each of these graph types.



### Line Graph
A natural question might be to find out how the consumption of a particular energy source has changed over time. To simplify our analysis, let's just look at Arizona.

In [0]:
# I'm just going to look at Arizona, so let's make an Arizona sub-frame
az_mask=energy['State']=='Arizona'
az_energy=energy[az_mask]
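
A mask is just a column of `True`/`False` values, one per row. Indexing a DataFrame with it keeps only the rows marked `True`. Here is a minimal sketch on a made-up frame. It also shows pandas' `query` method as an equivalent alternative that some people find easier to read.

```python
import pandas as pd

# Made-up rows standing in for the energy data
df = pd.DataFrame({
    'State': ['Arizona', 'Utah', 'Arizona'],
    'Year': [1960, 1960, 1961],
})

# One True/False value per row
az_mask = df['State'] == 'Arizona'
print(az_mask.tolist())  # [True, False, True]

# Keep only the rows where the mask is True
az = df[az_mask]

# The same filter written with query
az_query = df.query("State == 'Arizona'")
print(az.equals(az_query))  # True
```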

Now let's make a line graph showing how the consumption of natural gas as a source of energy for electric power has changed in Arizona:

In [6]:
px.line(az_energy,x='Year',y='Consumption.Electric Power.Natural Gas')

We can see that consumption was fairly low until around 2000, when it began to rise quickly before eventually falling again.
#### Improving it
Before we move on to our next graph, let's learn how to make this one a bit better. When we make a graph to show information, we always want a title that describes it. Our y-axis label is also just the raw column name, which is hard to read.
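
If you want to find the peak year without reading it off the graph, `idxmax` returns the row label of the largest value, and `.loc` then looks up the year in that row. We use `.loc` because a sub-frame like `az_energy` keeps the row labels from the full data, so a row's label is not the same as its position. This sketch uses invented numbers and a hypothetical `Gas` column. On the real data you would use `az_energy` and the `'Consumption.Electric Power.Natural Gas'` column.

```python
import pandas as pd

# Invented yearly values (not real Arizona data)
df = pd.DataFrame({
    'Year': [1998, 1999, 2000, 2001, 2002],
    'Gas': [10, 12, 30, 45, 40],
})

# Row label of the largest value, then the year stored in that row
peak_row = df['Gas'].idxmax()
peak_year = df.loc[peak_row, 'Year']
print(peak_year)  # 2001
```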

Luckily, Plotly makes it easy to give a graph a title, though changing the labels is slightly trickier. For the title, we just add `title='Our Title'` to our function call.

For our y-axis label, or any time we want to change the label for values from a column, we pass a dictionary of the form `{'name of column':'new label'}`. We haven't covered dictionaries yet, so just copy my syntax. We can rename as many columns this way as we want; we just separate the entries with commas:

`{'name of column1':'new label1','name of column2':'new label2'}`
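
A dictionary pairs each key (here, a column name) with a value (the new label). The same kind of dictionary also works with pandas' `rename` method. That is useful if you ever want to rename the columns in the data itself rather than only in a graph. Here is a minimal sketch on a made-up frame. The label text is just an example.

```python
import pandas as pd

# Keys are existing column names, values are the new names
labels = {
    'Consumption.Electric Power.Natural Gas': 'Natural gas consumption',
    'Year': 'Year of record',
}

# Invented values, just for illustration
df = pd.DataFrame({
    'Year': [2000, 2001],
    'Consumption.Electric Power.Natural Gas': [5.0, 6.0],
})

# Look up one label by its key
print(labels['Year'])  # Year of record

# rename returns a new DataFrame; df itself is unchanged
renamed = df.rename(columns=labels)
print(list(renamed.columns))  # ['Year of record', 'Natural gas consumption']
```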

In [8]:
# making our graph
px.line(az_energy, x='Year', y='Consumption.Electric Power.Natural Gas',
        # changing our title
        title='Natural Gas Consumption for electric power in Arizona',
        # changing our y label (which has data from the column Consumption.Electric Power.Natural Gas)
        labels={'Consumption.Electric Power.Natural Gas':'Natural gas consumption (billion BTUs)'})



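To rename more than one column, we just add more entries to the dictionary. Here we also rename the x-axis label:

In [9]: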
# making our graph
px.line(az_energy, x='Year', y='Consumption.Electric Power.Natural Gas',
        # changing our title
        title='Natural Gas Consumption for electric power in Arizona',
        # changing our y label (column Consumption.Electric Power.Natural Gas) and our x label (column Year)
        labels={'Consumption.Electric Power.Natural Gas':'Natural gas consumption (billion BTUs)', 'Year':'Hello, I am your x-label'})

### Scatter plot

We use a scatter plot when we want to see correlations between two variables. Let's compare the price of coal for electric power in Arizona with the amount of coal consumed for electric power there, to see if there's a correlation.




In [19]:
px.scatter(az_energy,x='Price.Electric Power.Coal',y='Consumption.Electric Power.Coal')

We see that there's a strong positive correlation: as price goes up, consumption goes up. This is unusual. Normally, as prices go up people buy less, not more. Why do we see the opposite in this case?
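
A scatter plot lets us see a correlation, but pandas can also put a number on it. `Series.corr` computes the Pearson correlation coefficient, which runs from -1 (perfect negative relationship) to 1 (perfect positive relationship). A strong correlation still does not tell us *why* two things move together. The sketch below uses invented values and hypothetical column names. With the real data you would call `corr` on the two coal columns of `az_energy`.

```python
import pandas as pd

# Invented price/consumption pairs (not real data)
df = pd.DataFrame({
    'price': [1.0, 2.0, 3.0, 4.0],
    'consumption': [10.0, 20.0, 30.0, 40.0],
})

# Pearson correlation: consumption rises exactly with price here, so r is (almost exactly) 1
r = df['price'].corr(df['consumption'])

# Flipping the sign of one column gives a perfect negative correlation, close to -1
r_neg = df['price'].corr(-df['consumption'])
print(r, r_neg)
```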

#### Improving it
Let's add a title again, and change the labels to more readable ones that also include our units. Notice that when we change our x and y labels, the labels shown when we hover over a point change too.

Let's also add `hover_name='Year'` so that the year is displayed when we hover over a data point. Perhaps this additional information will help us understand our data better.

In [25]:
px.scatter(az_energy,x='Price.Electric Power.Coal',y='Consumption.Electric Power.Coal',
            title='Arizona Coal consumption vs. Coal Price',
            labels={'Consumption.Electric Power.Coal': 'Coal Consumption (billion BTUs)',
                    'Price.Electric Power.Coal':'Cost of coal (Dollars/million BTU)'},
            hover_name='Year')

It seems like the older data points are in the lower left corner and the more recent data points are in the upper right. If we color the points by year, we can check whether this pattern holds.

In [27]:
px.scatter(az_energy,x='Price.Electric Power.Coal',y='Consumption.Electric Power.Coal',
            title='Arizona Coal consumption vs. Coal Price',
            labels={'Consumption.Electric Power.Coal': 'Coal Consumption (billion BTUs)',
                    'Price.Electric Power.Coal':'Cost of coal (Dollars/million BTU)'},
            hover_name='Year', color='Year')

That looks great! Coloring by year helps explain some of the oddities in our data, because we can now see how the points move as time passes.

### Bar Chart
Bar charts are good at displaying the quantity of something in different categories. To illustrate this and the pie chart example, let's switch from looking at one state to looking at one year:

In [0]:
mask_2000=energy['Year']==2000
# sorting by the State column so our states are in alphabetical order when we graph them
energy2000=energy[mask_2000].sort_values('State')

Now let's make a bar chart demonstrating coal consumption for electric power in each state:

In [36]:
px.bar(energy2000, x='State',y='Consumption.Electric Power.Coal',
        title='Coal Consumption for electric power by state in 2000',
        labels={'Consumption.Electric Power.Coal': 'Coal Consumption (billion BTUs)'})

We can see that the biggest consumer in 2000 was Texas, followed by Ohio, Indiana, and Pennsylvania.
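
Reading exact rankings off a bar chart can be tricky when bars are close in height. pandas' `nlargest` returns the rows with the largest values in a column, which is a quick way to double-check. This sketch uses invented numbers and a hypothetical `Coal` column. On the real data you would use `energy2000` and the `'Consumption.Electric Power.Coal'` column.

```python
import pandas as pd

# Invented values (not the real 2000 figures)
df = pd.DataFrame({
    'State': ['A', 'B', 'C', 'D', 'E'],
    'Coal': [50, 90, 10, 70, 30],
})

# The three rows with the largest Coal values, biggest first
top3 = df.nlargest(3, 'Coal')
print(top3['State'].tolist())  # ['B', 'D', 'A']
```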

### Pie Chart
Pie Charts are good at showing the parts of a whole. In this case let's look at how total coal consumption for electric power is split up among the states. We'll look at the year 2000 again.



In [39]:
px.pie(energy2000, values='Consumption.Electric Power.Coal', names='State',
       title='Coal consumption in 2000 divided by state')

Et voilà!
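
The percentage on each pie slice is that slice's value divided by the total of all slices. As a sketch with invented numbers and a hypothetical `Coal` column, pandas can compute the same shares. That is handy if you want to quote a state's exact share.

```python
import pandas as pd

# Invented values for three states
df = pd.DataFrame({
    'State': ['A', 'B', 'C'],
    'Coal': [50.0, 30.0, 20.0],
})

# Each state's share of the total, as a percentage (what the pie slices show)
df['Share (%)'] = df['Coal'] / df['Coal'].sum() * 100
print(df)
```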

There are numerous other features of the graph you can change: the fonts, the color palettes, the background color, the tick marks, and the axes. Almost anything, really. If you want to get fancy, check out Plotly's documentation at https://plot.ly/python/