# 08 Graphs
__Math 3080: Fundamentals of Data Science__

Reading:
* McKinney, Chapter 9 Plotting and Visualization

Outline:
1. Summary of the different plots we can use
2. Overview of Matplotlib
3. Overview of Seaborn
4. Interactive plots
    * Plotly

-----
## Interactive Plots
Graphs have always been an effective way to convey a lot of information by simple means. After all, a picture is worth a thousand words! The better the picture, the more it says.

But what if we can add another dimension to these graphs? What if users can interact with them? For example, a user can look at a graph to compare a number of variables, then look closer at one particular variable and get exact numbers and more details about it.

A number of Python libraries support this kind of interaction. Plots made with these libraries are called __interactive plots__. These are some of the libraries with interactive capabilities:
* Bokeh
* Plotly
* Altair
* mpld3
* matplotlib + ipywidgets
* Streamlit
* pygal
* bqplot

We will not go through all of these. Many reviews compare these libraries (e.g. [Northwestern University - Research Computing Services](https://sites.northwestern.edu/researchcomputing/2022/02/03/what-is-the-best-interactive-plotting-package-in-python/)), and they tend to agree that Bokeh, Plotly, and Altair are the top three. We are going to use the Plotly package:
* [Plotly Website](https://plotly.com/python/)

In [7]:
import plotly.express as px
import seaborn as sns

In [8]:
tips = sns.load_dataset('tips')
tips.head()

,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [9]:
fig = px.scatter(tips, x='total_bill', y='tip')
fig.show()

In [10]:
fig = px.scatter(tips, x='total_bill', y='tip',
                 color='smoker', 
                 size='size',
                 symbol='time',
                 marginal_x='histogram', marginal_y='rug',
                 hover_data=['day','sex'])
fig.show()

In [11]:
fig = px.density_heatmap(tips, x='total_bill', y='tip',
                         marginal_x='box', marginal_y='violin',
                         hover_data=['day','sex'])
fig.show()

### Time Series (Stocks)

In [158]:
import requests
import pandas as pd

url = 'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=IBM&outputsize=full&apikey=Q7N3TQXDW2FE59QN'
r = requests.get(url)
data = r.json()

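# Uncomment to inspect the raw response and its top-level keys
#print(data)
#list(data)

# The daily prices sit under 'Time Series (Daily)': each date becomes a column
# and each price field (e.g. '4. close') becomes a row
stock_data = pd.DataFrame(data['Time Series (Daily)'])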

In [159]:
# Keep only the closing prices, as a one-column DataFrame indexed by date
stock = pd.DataFrame(stock_data.loc['4. close'])
stock.columns = ['IBM']
stock

,IBM
2023-03-21,126.57
2023-03-20,125.94
2023-03-17,123.69
2023-03-16,124.7
2023-03-15,123.28
...,...
1999-11-05,90.25
1999-11-04,91.56
1999-11-03,94.37
1999-11-02,94.81


In [160]:
stock_codes = ['F','TSLA','AMZN','DIS']

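# Download each ticker and add its closing prices as a new column.
# The left join keeps IBM's dates, so tickers with a shorter history get NaN.
for code in stock_codes: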
    url = "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={0}&outputsize=full&apikey=Q7N3TQXDW2FE59QN".format(code)
    r = requests.get(url)
    data = r.json()
    stock2 = pd.DataFrame(pd.DataFrame(data['Time Series (Daily)']).loc['4. close'])
    stock2.columns = [code]
    stock = stock.join(stock2, how='left')

stock

,IBM,F,TSLA,AMZN,DIS
2023-03-21,126.57,11.72,197.58,100.61,96.54
2023-03-20,125.94,11.18,183.25,97.71,94.22
2023-03-17,123.69,11.3,180.13,98.95,93.2
2023-03-16,124.7,11.82,184.13,100.04,94.29
2023-03-15,123.28,11.71,180.45,96.2,93.1
...,...,...,...,...,...
1999-11-05,90.25,53.69,,64.94,24.31
1999-11-04,91.56,53.06,,63.06,26.5
1999-11-03,94.37,53.75,,65.81,26.88
1999-11-02,94.81,54.38,,66.44,26.25


In [None]:
# Now, plot the time series

### Image Progression
* [Plotly imshow documentation](https://plotly.com/python/imshow/)

### Geographical

In [149]:
import numpy as np
import pandas as pd

# Load Data
votes = pd.read_csv('../Datasets/1976-2020-senate.csv', encoding="ISO-8859-1")

# Create a "Percent Votes" column
votes['percentvotes'] = np.round(votes['candidatevotes'] * 100 / votes['totalvotes'], 1)
#print(votes.head())

# Pivot table to get the percent votes by party in the 2020 election
results2020 = pd.pivot_table(votes[votes['year'] == 2020],
                             index='state_po',
                             columns='party_detailed',
                             values='percentvotes')
results2020['OTHER'] = np.round(100 - results2020['REPUBLICAN'] - results2020['DEMOCRAT'],1)

# Add state names to pivot table for labels
states = votes[['state','state_po']].drop_duplicates()
states.set_index('state_po', inplace=True)
results2020 = results2020.join(states)

# Replace NaN values with 0
results2020.replace(np.nan, 0, inplace=True)

In [150]:
fig = px.choropleth(results2020,
                    locationmode='USA-states',
                    scope='usa',
                    locations=results2020.index,
                    color='REPUBLICAN',
                    color_continuous_scale='RdBu_r',
                    range_color = [0,100],
                    hover_name='state',
                    hover_data = ['REPUBLICAN','DEMOCRAT','OTHER']
                    )
fig.show()

Data to use:
* Tips dataset
* Iris dataset
* Energy Production Data (EIA)

Plots to show:
* A couple basic/statistical charts
* Financial charts
* Maps
* 3-dimensional charts