<img src="../figures/HeaDS_logo_large_withTitle.png" width="300">

<img src="../figures/tsunami_logo.PNG" width="600">


# Plotly

## Python Open Source Graphing Library

Plotly's Python graphing library makes interactive, publication-quality graphs. It supports line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple axes, polar charts, and bubble charts.

![gallery](https://miro.medium.com/max/1458/1*qKpV3vkPZYoffsvFSEuw8A.png)


Plotly comes with an easy-to-use interface called [Plotly Express](https://plotly.com/python/plotly-express/), which makes plotting with Plotly very easy. Plotly Express works nicely with pandas dataframes as input: we just need to specify which columns to plot.

## Import modules

In [1]:
import pandas as pd
import plotly.express as px

## Exercise 1: Line graphs

1) Create a line graph of life expectancy per year for the three countries Denmark, Romania and Ghana.


In [46]:
df = px.data.gapminder().query("country=='Denmark' | country=='Ghana' | country=='Romania'")
df.head()

| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 408 | Denmark | Europe | 1952 | 70.78 | 4334000 | 9692.385245 | DNK | 208 |
| 409 | Denmark | Europe | 1957 | 71.81 | 4487831 | 11099.65935 | DNK | 208 |
| 410 | Denmark | Europe | 1962 | 72.35 | 4646899 | 13583.31351 | DNK | 208 |
| 411 | Denmark | Europe | 1967 | 72.96 | 4838800 | 15937.21123 | DNK | 208 |
| 412 | Denmark | Europe | 1972 | 73.47 | 4991596 | 18866.20721 | DNK | 208 |


In [47]:
px.line(df, x = 'year', y = 'lifeExp')

2) Color each line by the country. What do you observe when comparing to the previous plot?

In [48]:
px.line(df, x = 'year', y = 'lifeExp', color = 'country')

In the first plot we got extra line segments that are strangely straight. Plotly knows which life expectancy and year belong together (they are in the same row in the dataframe), but it has no information about which points form a line. Without that information it joins all rows into a single line in row order, so the last point of one country is connected to the first point of the next. As you can see, that produces segments that should not be there.

The `color` argument helps us avoid that, since it tells Plotly which points belong to the same line.

3) Change the template of the plot.

In [50]:
#first, we save the plot into an object called 'fig'
fig = px.line(df, x = 'year', y = 'lifeExp', color = 'country')
#then we apply 'update_layout'
fig.update_layout(template = 'plotly_white')

In [51]:
#and again for 'simple_white'
fig.update_layout(template = 'simple_white')

## Quiz 1

What would you do if instead of a line chart you wanted to show the data in a scatter plot?

In [52]:
px.scatter(df, y = 'lifeExp', x = 'year')

## Exercise 2: Scatter plots and trendlines

1) Using the data from 'Africa', create a scatter plot using GDP and population. Try to make the countries as distinguishable as possible.

In [59]:
#subset the data
df = px.data.gapminder().query("continent == 'Africa'")
df.head()

| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 24 | Algeria | Africa | 1952 | 43.077 | 9279525 | 2449.008185 | DZA | 12 |
| 25 | Algeria | Africa | 1957 | 45.685 | 10270856 | 3013.976023 | DZA | 12 |
| 26 | Algeria | Africa | 1962 | 48.303 | 11000948 | 2550.81688 | DZA | 12 |
| 27 | Algeria | Africa | 1967 | 51.407 | 12760499 | 3246.991771 | DZA | 12 |
| 28 | Algeria | Africa | 1972 | 54.518 | 14760787 | 4182.663766 | DZA | 12 |


In [55]:
fig = px.scatter(df, x = 'gdpPercap', y = 'pop', color = 'country', symbol = 'country')
fig.show()

2) With an OLS fit

In [58]:
fig = px.scatter(df, x = 'gdpPercap', y = 'pop',
                 color = 'country', symbol = 'country',
                trendline = 'ols')
fig.show()

3) With a lowess fit

In [57]:
fig = px.scatter(df, x = 'gdpPercap', y = 'pop',
                 color = 'country', symbol = 'country',
                trendline = 'lowess')
fig.show()

You can see that the lowess fit is extremely sensitive to local deviations and overfits the data.

## Exercise 3: Bar charts and histograms

1) Use the gapminder data for Oceania and show the GDP for each year in a bar plot.

In [65]:
df = px.data.gapminder().query("continent == 'Oceania'")
px.bar(df, x='year', y='gdpPercap')

2) Now separate the bars into countries and put them next to each other instead of stacked on top of each other.

In [66]:
px.bar(df, x='year', y='gdpPercap', color='country',barmode='group')

3) Plot a histogram that shows the life expectancy for countries in Europe, colored by the year. Display the count inside the bar.

In [67]:
#create dataframe
gapminder_data = px.data.gapminder()
df = gapminder_data.loc[(gapminder_data["continent"] == 'Europe') & (gapminder_data['year'].isin([1987,2007]))]
df.head()

| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 19 | Albania | Europe | 1987 | 72.0 | 3075321 | 3738.932735 | ALB | 8 |
| 23 | Albania | Europe | 2007 | 76.423 | 3600523 | 5937.029526 | ALB | 8 |
| 79 | Austria | Europe | 1987 | 74.94 | 7578903 | 23687.82607 | AUT | 40 |
| 83 | Austria | Europe | 2007 | 79.829 | 8199783 | 36126.4927 | AUT | 40 |
| 115 | Belgium | Europe | 1987 | 75.35 | 9870200 | 22525.56308 | BEL | 56 |


In [70]:
#histogram of life expectancy, colored by year
# Technically those are two histograms, one for each year.
# we use barmode='group' to put the bars next to each other. You can check how it looks without that
px.histogram(df, x="lifeExp", color = 'year',barmode='group', text_auto=True)

Bonus: Using the tips dataset, create a chart that displays the average total bill depending on the day of the week with a histfunc.

In [71]:
df = px.data.tips()
df.head()

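| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |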


In [72]:
px.histogram(df, y="total_bill", x = 'day', histfunc='avg')

## Exercise 4: Boxplots and violin plots



In [73]:
#the dataframe
gapminder_data = px.data.gapminder()
df = gapminder_data.loc[(gapminder_data["continent"] == 'Europe') & (gapminder_data['year'].isin([1987,2007]))]
df.head()

| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 19 | Albania | Europe | 1987 | 72.0 | 3075321 | 3738.932735 | ALB | 8 |
| 23 | Albania | Europe | 2007 | 76.423 | 3600523 | 5937.029526 | ALB | 8 |
| 79 | Austria | Europe | 1987 | 74.94 | 7578903 | 23687.82607 | AUT | 40 |
| 83 | Austria | Europe | 2007 | 79.829 | 8199783 | 36126.4927 | AUT | 40 |
| 115 | Belgium | Europe | 1987 | 75.35 | 9870200 | 22525.56308 | BEL | 56 |


1) Make a boxplot of life expectancy versus the year.

In [76]:
px.box(df, y="lifeExp", x="year")

2) Make a violin plot

In [74]:
px.violin(df, y="lifeExp", x="year")

## Exercise 5: Heatmaps

1) Plot correlations of gapminder data for Europe

In [83]:
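#get the dataframe
df = px.data.gapminder().query("continent == 'Europe'")
#numeric_only=True skips the text columns; recent pandas versions raise an error without it
df.corr(numeric_only=True)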





| | year | lifeExp | pop | gdpPercap | iso_num |
|---|---|---|---|---|---|
| year | 1.0 | 0.706021 | 0.086338 | 0.608753 | 4.244928e-14 |
| lifeExp | 0.7060212 | 1.0 | 0.062315 | 0.780783 | -0.02575329 |
| pop | 0.08633815 | 0.062315 | 1.0 | 0.109724 | 0.1947089 |
| gdpPercap | 0.6087531 | 0.780783 | 0.109724 | 1.0 | 0.06331104 |
| iso_num | 4.244928e-14 | -0.025753 | 0.194709 | 0.063311 | 1.0 |


In [84]:
px.imshow(df.corr(numeric_only=True), text_auto=True, color_continuous_scale='RdBu_r', range_color=[-1,1])





2) The same for Africa

In [85]:
df = px.data.gapminder().query("continent == 'Africa'")
px.imshow(df.corr(numeric_only=True), text_auto=True, color_continuous_scale='RdBu_r', range_color=[-1,1])





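This choice of color scheme helps us to immediately see which columns are strongly positively correlated (darker red), which are strongly negatively correlated (darker blue), and which have a low correlation (pale colors close to white).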

## Exercise 6: Marginals and facets

1) Scatter plot with a histogram

In [86]:
#data
df = px.data.iris()
df.head()

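| | sepal_length | sepal_width | petal_length | petal_width | species | species_id |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 1 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 |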


In [87]:
fig = px.scatter(df,
                 x="sepal_length",
                 y="sepal_width",
                 color="species",
                 marginal_x="box",
                 marginal_y="histogram",
                 size='petal_width')
fig.show()

2) Divided with a facet on the species variable.

In [88]:
fig = px.scatter(df,
                 x="sepal_length",
                 y="sepal_width",
                 color="species",
                 marginal_x = "box",
                 facet_col = 'species',
                 size='petal_width')
fig.show()

## Exercise 7: Range sliders

1) Using the gapminder data for Africa, create a scatter plot with a [range selector](https://plotly.com/python/reference/layout/xaxis/#layout-xaxis-rangeselector).

In [89]:
df = px.data.gapminder().query("continent=='Africa'")
fig = px.scatter(df,
              x="year",
              y="pop",
              color='country',
              title="Population per year")

fig.update_xaxes(rangeselector=dict(
            buttons=list([
                dict(count=10,
                     label="10y",
                     step="year")])),
                visible = True, type = 'date',
                rangeslider_visible=True)

fig.show()

2) Modify the tooltip so that hovering over a point shows its life expectancy, population, GDP and country code.

In [90]:
#let's check the column names
df.columns

Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap',
       'iso_alpha', 'iso_num'],
      dtype='object')

In [None]:
fig = px.scatter(df,
              x="year",
              y="pop",
              color='country',
              title="Population per year",
              hover_data = {"country" : False, "gdpPercap":True,
                           'lifeExp' : True, 'pop' : True, 'iso_num' : True})

fig.update_xaxes(rangeselector=dict(
            buttons=list([
                dict(count=10,
                     label="10y",
                     step="year")])),
                visible = True, type = 'date',
                rangeslider_visible=True)

fig.show()

## Exercise 8: Animations

Animate a bar plot of the African gapminder data so that we see the evolution of life expectancy over time. Remember to separate the countries.

In [91]:
df = px.data.gapminder().query("continent=='Africa'")

fig = px.bar(df,
             y="country",
             x="lifeExp",
             orientation="h",
             color="country",
             animation_frame="year",
             animation_group="country",
             title="Evolution of life expectancy",
             text="gdpPercap",  # bar labels show GDP per capita
             range_x=[0, 100])  # fixed x-range so bars are comparable across frames

fig.update_traces(texttemplate='%{text:.2f}')
fig.show()