# Introduction to plotly

Some of you may have used plotly in previous courses.

The plotly website is available at https://plot.ly/

The first step is to create an account on the plotly website and log in.

Then, install the plotly package in Python by running the following cell (note that if you already have plotly installed, you'll want to change "install" to "update"):

In [None]:
!conda install -y plotly

# <font color='red'> Q1: Record your plotly username (but NOT your API Key) after completing your set-up (2 points)</font>

To complete your setup, follow the steps after "Installation" at https://plot.ly/python/getting-started/

If your local credentials file (which stores your API key) hasn't been created, you will see several errors in the remaining parts of the notebook.
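
As a quick sanity check before running the rest of the notebook, the sketch below tests whether a credentials file is present. It assumes that plotly stores credentials in `~/.plotly/.credentials`; the helper name `has_plotly_credentials` is hypothetical and is not part of the plotly package.

```python
from pathlib import Path


def has_plotly_credentials(home=None):
    """Return True if a non-empty plotly credentials file exists.

    Assumption: credentials live in <home>/.plotly/.credentials.
    `home` defaults to the current user's home directory.
    """
    home = Path(home) if home is not None else Path.home()
    cred = home / ".plotly" / ".credentials"
    # The file must exist and contain something to be useful.
    return cred.is_file() and cred.stat().st_size > 0
```

Calling `has_plotly_credentials()` with no arguments checks your own home directory. A result of `False` suggests the setup steps have not been completed yet.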

#enter your username here

# END Q1

Let's start by loading some sample data:

In [None]:
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")

In [None]:
df.head()

# <font color='red'>Q2: Create some basic descriptive stats and visualizations using this dataset (2 points)</font>

In [None]:
# your code and results go here

# END Q2

Now let's get going with plotly.  Working with the same small dataset, we can generate a prettier table using plotly:

In [None]:
import plotly.plotly as py
import plotly.figure_factory as ff


table = ff.create_table(df)
py.iplot(table, filename='jupyter-table1')

Now let's look at a potentially more useful chart:  let's see a ranked plot of the income gaps from these colleges.  It's pretty easy using plotly:

In [None]:
import plotly.plotly as py
import plotly.graph_objs as go

data = [go.Bar(x=df.School,
            y=df.Gap)]

py.iplot(data, filename='jupyter-basic_bar')
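
The bars only appear ranked if the rows are already ordered by `Gap`. If they were not, one option is to sort explicitly before plotting. This is a minimal sketch using made-up rows in the same shape as the sample data (the school names and values are hypothetical):

```python
import pandas as pd

# Hypothetical rows with the same columns used in the bar chart above.
toy = pd.DataFrame({
    "School": ["School A", "School B", "School C"],
    "Gap": [20, 58, 33],
})

# Sort largest gap first so the bars come out in ranked order.
ranked = toy.sort_values("Gap", ascending=False).reset_index(drop=True)
print(ranked)
```

You would then pass `ranked.School` and `ranked.Gap` to `go.Bar` in place of the unsorted columns.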

That's great, but it's not a lot different from what we can do using seaborn (how would you do that in seaborn?).  One of the things that plotly is particularly good at is plotting data on maps.  Let's load another data set, this one showing data on nuclear waste sites on American campuses:

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/Nuclear%20Waste%20Sites%20on%20American%20Campuses.csv')
site_lat = df.lat
site_lon = df.lon
location_names = list(df.text)

In [None]:
df.head()

In [None]:
import plotly.plotly as py
import plotly.graph_objs as go

data = [
    go.Scattermapbox(
        lat=site_lat,
        lon=site_lon,
        mode='markers',
        marker=dict(
            size=17,
            color='rgb(255, 0, 0)',
            opacity=0.7
        ),
    ),
    go.Scattermapbox(
        lat=site_lat,
        lon=site_lon,
        mode='markers',
        marker=dict(
            size=8,
            color='rgb(242, 177, 172)',
            opacity=0.7
        ),
        hoverinfo='none'
    )]


layout = go.Layout(
    title='Nuclear Waste Sites on Campus',
    autosize=True,
    hovermode='closest',
    showlegend=False,
    mapbox=dict(
        bearing=0,
        center=dict(
            lat=38,
            lon=-94
        ),
        pitch=0,
        zoom=3,
        style='light'
    ),
)

fig = dict(data=data, layout=layout)

py.iplot(fig, filename='jupyter-Nuclear Waste Sites on American Campuses')

Not bad, but let's modify the above code to show the site name when you hover over it.

# <font color='red'> Q3: Add hover text (2 points)</font>

In the second `go.Scattermapbox` trace, change ```hoverinfo='none'``` to 
```
text=location_names,
hoverinfo='text',
``` 
and show the new map below.

In [None]:
# insert code for new map here

# END Q3

# Mapping World Development Indicators

Now that we've got some mapping basics under our belts, let's turn to an even more interesting dataset: the World Bank's World Development Indicators.

Why is it more interesting? Just ask Hans Rosling:

In [None]:
from IPython.display import YouTubeVideo
# Hans Rosling's "The Best Stats You've Ever Seen"
YouTubeVideo('hVimVzgtD6w')

We're not going to get quite that fancy, but let's start exploring some World Development Indicators visually.
We're going to focus on non-animated maps, at least for now.

First, read the data (from https://www.kaggle.com/worldbank/world-development-indicators):

In [None]:
indicators = pd.read_csv("Indicators.csv")

# <font color='red'> Q4: How many indicators are there? For what range of years? (2 points)</font>


In [None]:
indicators.head()

In [None]:
#insert your code here

# END Q4

There are a lot of indicators to choose from, so let's filter our dataset down to something more manageable: a single indicator for a single year.

In [None]:
indicatorName = "Life expectancy at birth, total (years)"
indicatorYear = 2013

filtered = indicators[(indicators.IndicatorName==indicatorName) & (indicators.Year==indicatorYear)]

In [None]:
filtered.head()
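
The filter above combines two boolean masks with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `==`. The sketch below shows the same pattern on a tiny made-up table (the country names and values are hypothetical, not taken from `Indicators.csv`):

```python
import pandas as pd

# Toy stand-in for the indicators table; all values are invented.
toy = pd.DataFrame({
    "CountryName": ["Aland", "Aland", "Borduria", "Borduria"],
    "IndicatorName": ["Life expectancy at birth, total (years)",
                      "Population, total"] * 2,
    "Year": [2013, 2013, 2013, 2012],
    "Value": [70.0, 1000.0, 65.0, 2000.0],
})

# One mask per condition; a row is kept only when both are True.
mask = (toy.IndicatorName == "Life expectancy at birth, total (years)") & (toy.Year == 2013)

# .copy() gives an independent DataFrame that is safe to modify later.
toy_filtered = toy[mask].copy()
print(toy_filtered)
```

Only the two life-expectancy rows for 2013 survive the filter.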

Not surprisingly, the data needs some cleanup:

In [None]:
correction = {"Antigua and Barbuda":"Antigua", "Bahamas, The":"Bahamas", "Brunei Darussalam":"Brunei",
"Cabo Verde":"Cape Verde", "Congo, Dem. Rep.":"Democratic Republic of the Congo", "Congo, Rep.":"Republic of Congo", 
"Cote d'Ivoire":"Ivory Coast", "Egypt, Arab Rep.":"Egypt", "Faeroe Islands":"Faroe Islands", "Gambia, The":"Gambia", 
"Iran, Islamic Rep.":"Iran", "Korea, Dem. Rep.":"North Korea", "Korea, Rep.":"South Korea", "Kyrgyz Republic":"Kyrgyzstan",
"Lao PDR":"Laos", "Macedonia, FYR":"Macedonia", "Micronesia, Fed. Sts.":"Micronesia", "Russian Federation":"Russia",
"Slovak Republic":"Slovakia", "St. Lucia":"Saint Lucia", "St. Martin (French part)":"Saint Martin", 
"St. Vincent and the Grenadines":"Saint Vincent", "Syrian Arab Republic":"Syria", "Trinidad and Tobago":"Trinidad", 
"United Kingdom":"UK", "United States":"USA", "Venezuela, RB":"Venezuela", "Virgin Islands (U.S.)":"Virgin Islands", 
"Yemen, Rep.":"Yemen"}

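# Reassign rather than using inplace=True: `filtered` is a slice of `indicators`,
# and modifying it in place can trigger a SettingWithCopyWarning.
filtered = filtered.replace(correction)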
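
To see what the dictionary-based cleanup does, here is a minimal sketch on a three-row toy frame. It uses a two-entry subset of the `correction` mapping, and the numbers in `Value` are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    "CountryName": ["United States", "Russian Federation", "France"],
    "Value": [1.0, 2.0, 3.0],  # invented values, for illustration only
})

# Two entries copied from the correction dictionary above.
correction_subset = {"United States": "USA", "Russian Federation": "Russia"}

# replace() swaps any cell equal to a key for the matching value
# and returns a new DataFrame; `toy` itself is left unchanged.
cleaned = toy.replace(correction_subset)
print(cleaned)
```

Names that do not appear as keys (here, "France") pass through untouched.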

And finally, let's generate a map of our filtered data:

In [None]:
scl = [[0.0, 'rgb(242,240,247)'],[0.2, 'rgb(218,218,235)'],[0.4, 'rgb(188,189,220)'],
            [0.6, 'rgb(158,154,200)'],[0.8, 'rgb(117,107,177)'],[1.0, 'rgb(84,39,143)']]

data = [ dict(
        type='choropleth',
        colorscale = scl,
        autocolorscale = False,
        locations = filtered.CountryCode.values,
        z = filtered.Value.values,
        text = filtered.CountryName,
        marker = dict(
            line = dict (
                color = 'rgb(255,255,255)',
                width = 2
            ) ),
        colorbar = dict(
            title = "Years")
        ) ]

layout = dict(
        title = '{} in {}'.format(filtered.IndicatorName.unique()[0],filtered.Year.unique()[0]),
        geo = dict(
            scope='world',
            projection=dict( type='natural earth' ),
            showlakes = True,
            lakecolor = 'rgb(255, 255, 255)'),
             )
    
fig = dict( data=data, layout=layout )
py.iplot( fig, filename='d3-choropleth-map' )
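
Each entry in `scl` pairs a position between 0 and 1 with an RGB color, and values that fall between two positions get a blended color. The sketch below illustrates that idea with simple linear interpolation in pure Python. It is a conceptual illustration only, not plotly's actual implementation, and the helper names `parse_rgb` and `color_at` are hypothetical.

```python
import re

scl = [[0.0, 'rgb(242,240,247)'], [0.2, 'rgb(218,218,235)'], [0.4, 'rgb(188,189,220)'],
       [0.6, 'rgb(158,154,200)'], [0.8, 'rgb(117,107,177)'], [1.0, 'rgb(84,39,143)']]


def parse_rgb(s):
    """Turn 'rgb(r,g,b)' into a tuple of three ints."""
    return tuple(int(n) for n in re.findall(r"\d+", s))


def color_at(scale, t):
    """Linearly interpolate the color at position t (clamped to [0, 1])."""
    t = min(max(t, 0.0), 1.0)
    for (p0, c0), (p1, c1) in zip(scale, scale[1:]):
        if p0 <= t <= p1:
            frac = (t - p0) / (p1 - p0)
            a, b = parse_rgb(c0), parse_rgb(c1)
            return tuple(round(x + frac * (y - x)) for x, y in zip(a, b))
    return parse_rgb(scale[-1][1])


print(color_at(scl, 0.1))  # halfway between the first two stops
```

Low values land near the pale end of the scale and high values near the dark purple end, which is why the map's shading tracks the indicator.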

# <font color='red'> Q5: Your turn! (2 points)</font>

You might look at another indicator, or perhaps the same indicator but a different year. You might also want to change the map projection to be something different.

Suggested steps:
1. Refilter your data.
2. Change the color scheme (look up how to work with RGB color codes).
3. Change the title.
4. Change the scope.
5. Regenerate the map.
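
For the color scheme step, one possible approach is to build the colorscale list from a handful of RGB triples instead of typing each `'rgb(...)'` string by hand. This is a minimal sketch; the helper name `make_colorscale` and the example colors are assumptions chosen for illustration, not part of plotly.

```python
def make_colorscale(colors):
    """Spread RGB triples evenly over [0, 1] in the [position, 'rgb(r,g,b)'] format."""
    n = len(colors)
    if n < 2:
        raise ValueError("need at least two colors")
    return [[i / (n - 1), 'rgb({},{},{})'.format(*c)] for i, c in enumerate(colors)]


# Example: a pale yellow to teal to dark blue ramp (colors picked arbitrarily).
my_scl = make_colorscale([(255, 255, 204), (65, 182, 196), (37, 52, 148)])
print(my_scl)
```

The resulting list has the same shape as `scl` in the map cell above, so it can be assigned to `colorscale` in its place.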

In [None]:
# insert your code here

If you're done, generate one or two more.  We'll share our work at the end of today's class.

# END Q5

# <font color="green">END OF NOTEBOOK</font>
## Remember to submit HTML and IPYNB files via Canvas.