# Interactive Visualization of a Gapminder Data Set

This is a short review of plotting with Matplotlib and pandas, after which we add Altair and Plotly to our set of plotting tools.

This notebook works best with up-to-date versions of Altair and Seaborn, and it also needs Plotly Express (which ships with the `plotly` package).  If needed, run the following cell to install them, then restart the notebook by shutting it down and re-opening it.

In [None]:
!pip install -U altair seaborn plotly

## First step:  import the libraries and data

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import altair as alt
import plotly.express as px

We are going to look at Plotly's subset of data from Gapminder. 

[You may find it very interesting to explore https://www.gapminder.org/; Gapminder is an independent educational non-profit fighting global misconceptions.]

In [None]:
gp = px.data.gapminder()

In [None]:
gp.head()

In [None]:
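# .copy() makes gp_usa an independent DataFrame, so changing its columns
# later does not trigger a SettingWithCopyWarning
gp_usa = gp.loc[gp['country']=='United States'].copy()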

In [None]:
gp_usa

## Matplotlib

Tutorials: https://matplotlib.org/stable/tutorials/index.html

In [None]:
plt.plot(gp_usa['year'], gp_usa['lifeExp'])

In [None]:
plt.plot(gp_usa['year'], gp_usa['lifeExp'], 'o')

In [None]:
plt.scatter(gp_usa['year'], gp_usa['lifeExp'])
# plt.bar(gp_usa['year'], gp_usa['lifeExp'])
# plt.barh(gp_usa['year'], gp_usa['lifeExp'])

In [None]:
plt.scatter(gp_usa['year'], gp_usa['lifeExp'], color='black')
plt.plot(gp_usa['year'], gp_usa['lifeExp'], color='black')

plt.title('Life Expectancy vs Year for the Gapminder Data about USA', fontsize=16)
plt.xlabel('Year', fontsize = 12)
plt.ylabel('Life Expectancy', fontsize = 12)

## Pandas plotting

In [None]:
# instead of :
# plt.plot(gp_usa['year'], gp_usa['lifeExp'])
# we use the "plot" method of the dataframe itself

gp_usa.plot(x = 'year', y = 'lifeExp')

In [None]:
gp_usa.plot(x = 'year', y = 'lifeExp')
# gp_usa.plot(x = 'year', y = 'lifeExp', kind = 'scatter')
# gp_usa.plot.scatter(x = 'year', y = 'lifeExp')

In [None]:
# we can take the pandas plotting:
gp_usa.plot(x = 'year', y = 'lifeExp', kind = 'scatter')

# and combine it with lower level customization via matplotlib
plt.title('Life Expectancy vs Year for the Gapminder Data about USA', fontsize=16)
plt.xlabel('Year', fontsize = 12)
plt.ylabel('Life Expectancy', fontsize = 12)

In [None]:
# To combine plots, we can use the ax objects
ax = gp_usa.plot(x = 'year', 
                 y = 'lifeExp', 
                 kind = 'scatter', 
                 color='black')
gp_usa.plot(x = 'year', 
            y = 'lifeExp', 
            kind = 'line', 
            color='black',
            ax=ax)

# and combine it with lower level customization via matplotlib
plt.title('Life Expectancy vs Year for the Gapminder Data about USA', fontsize=16)
plt.xlabel('Year', fontsize = 12)
plt.ylabel('Life Expectancy', fontsize = 12)

plt.show()

## New libraries for interactive visualization:  Altair and Plotly

In [None]:
points = alt.Chart(gp_usa).mark_point().encode(
    x='year',
    y='lifeExp'
)
points

In [None]:
# convert the integer years to datetimes so that Altair treats 'year' as a date
gp_usa['year'] = pd.to_datetime(gp_usa['year'], format = '%Y')

In [None]:
gp_usa.info()

In [None]:
gp_usa

In [None]:
points = alt.Chart(gp_usa).mark_point().encode(
    x='year',
    y='lifeExp'
)
points

In [None]:
points = alt.Chart(gp_usa).mark_point().encode(
    x='gdpPercap',
    y='lifeExp'
)
points

Branching out:  Let's look at all countries

In [None]:
points = alt.Chart(gp).mark_point().encode(
    x='gdpPercap',
    y='lifeExp'
)
points

The GDP values are heavily skewed: most points are crowded at the low end of the x-axis.  Rescaling the axis (here, by taking the logarithm) spreads them out, so that we can more easily distinguish the values of different countries.

In [None]:
import math

In [None]:
# make a new column holding the natural log of gdpPercap

loggdp = []

for i in gp['gdpPercap']:
    loggdp.append(math.log(i))

gp['gdpPercapLog'] = loggdp


# .... or, equivalently, with apply ....


gp['gdpPercapLog'] = gp['gdpPercap'].apply(lambda x: math.log(x))
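
The same column can also be computed without an explicit loop or `apply`. The sketch below assumes NumPy is available (it is not imported elsewhere in this notebook) and uses a small made-up table, `toy`, in place of `gp`. `np.log` is the natural logarithm, matching `math.log`.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for gp with a few illustrative values
toy = pd.DataFrame({'gdpPercap': [500.0, 5000.0, 50000.0]})

# np.log is applied to the whole column at once (natural log, like math.log)
toy['gdpPercapLog'] = np.log(toy['gdpPercap'])

# The element-wise version from the cell above gives the same numbers
elementwise = toy['gdpPercap'].apply(math.log)
print(np.allclose(toy['gdpPercapLog'], elementwise))
```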

In [None]:
points = alt.Chart(gp).mark_point().encode(
    x='gdpPercapLog',
    y='lifeExp'
)
points

Altair compiles our Python code into a Vega-Lite specification that the browser renders with JavaScript, which lets us bring in interactivity!

In [None]:
selopac = alt.selection_point(nearest=True, on='mouseover', fields=['year'])

points = alt.Chart(gp).mark_point().encode(
    x='gdpPercapLog',
    y='lifeExp',
    opacity=alt.condition(selopac, alt.value(1), alt.value(0.2))
).add_params(
    selopac
)

points

In [None]:
selopac = alt.selection_point(fields=['country'], bind='legend')

points = alt.Chart(gp.loc[gp['pop']>100000000]).mark_point().encode(
    x='gdpPercapLog',
    y='lifeExp',
    color=alt.Color('country'),
    opacity=alt.condition(selopac, alt.value(1), alt.value(0.2))
).add_params(
    selopac
)

points

We can go a step further and link interactive elements across several plots.

Let's make a scatter plot that also allows us to visualize the distribution of points within a region of space.

In [None]:
bars = alt.Chart(gp).mark_bar().encode(
    x='count(lifeExp)',
    y='lifeExp'
)
bars

In [None]:
bars = alt.Chart(gp).mark_bar().encode(
    alt.X('lifeExp', bin=True),
    y='count()'
)
bars

In [None]:
bars = alt.Chart(gp).mark_bar().encode(
    alt.X('gdpPercap', bin=True),
    y='count()'
)
bars

In [None]:
bars = alt.Chart(gp).mark_bar().encode(
    alt.X('gdpPercapLog', bin=True),
    y='count()'
)
bars

In [None]:
points = alt.Chart(gp).mark_point().encode(
    x='gdpPercapLog',
    y='lifeExp'
)

barsX = alt.Chart(gp).mark_bar().encode(
    alt.X('gdpPercapLog',bin=True),
    y='count()'
)

barsY = alt.Chart(gp).mark_bar().encode(
    alt.Y('lifeExp',bin=True),
    x='count()'
)

chart = alt.vconcat(barsX,
            alt.hconcat(points,barsY))

chart

In [None]:
my_si = alt.selection_interval()

points = alt.Chart(gp).mark_point().encode(
    x='gdpPercapLog',
    y='lifeExp'
).add_params(
    my_si
)

barsX = alt.Chart(gp).mark_bar().encode(
    alt.X('gdpPercapLog',bin=True),
    y='count()'
).transform_filter(
    my_si
)

barsY = alt.Chart(gp).mark_bar().encode(
    alt.Y('lifeExp',bin=True),
    x='count()'
).transform_filter(
    my_si
)

chart = alt.vconcat(barsX,
            alt.hconcat(points,barsY))

chart
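
Conceptually, `transform_filter(my_si)` keeps only the rows that fall inside the dragged rectangle, and the bar charts then count whatever survives. A rough pandas equivalent is sketched below. The table `toy`, the brush bounds, and the bin edges are all made up for illustration, and Altair picks its own bins.

```python
import pandas as pd

# Hypothetical stand-in for gp with a few illustrative values
toy = pd.DataFrame({
    'gdpPercapLog': [6.0, 6.5, 8.0, 9.5, 10.5],
    'lifeExp': [40.0, 48.0, 62.0, 71.0, 79.0],
})

# Made-up brush rectangle: the x and y ranges a user might drag out
x_lo, x_hi = 6.2, 10.0
y_lo, y_hi = 45.0, 75.0

# Keep only the rows inside the rectangle (the filtering step)
inside = toy[toy['gdpPercapLog'].between(x_lo, x_hi)
             & toy['lifeExp'].between(y_lo, y_hi)]

# Bin the surviving rows and count them (what the bars display)
life_bins = pd.cut(inside['lifeExp'], bins=[40, 50, 60, 70, 80])
counts = life_bins.value_counts(sort=False)
print(counts)
```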

In [None]:
points = alt.Chart(gp).mark_point().encode(
    alt.X('gdpPercapLog',scale=alt.Scale(domain=[4, 12])),
    alt.Y('lifeExp',scale=alt.Scale(domain=[20, 90]))
).add_params(
    my_si
)

barsX = alt.Chart(gp).mark_bar().encode(
    alt.X('gdpPercapLog',bin=True,scale=alt.Scale(domain=[4, 12])),
    y='count()'
).transform_filter(
    my_si
)

barsY = alt.Chart(gp).mark_bar().encode(
    alt.Y('lifeExp',bin=True,scale=alt.Scale(domain=[20, 90])),
    x='count()'
).transform_filter(
    my_si
)

chart = alt.vconcat(barsX,
            alt.hconcat(points,barsY))

chart

In [None]:
chart.save('/home/jovyan/lifeExpVsGDP_chart.html')

## Telling stories with Plotly & animation

* Reviewing our previously shown steps to build an animated visualization

In [None]:
px.scatter(gp_usa, 
           x="gdpPercap", 
           y="lifeExp")

The above is just for the USA.  We're going to expand to all countries now:

In [None]:
px.scatter(gp, 
           x="gdpPercap", 
           y="lifeExp")

Wait... not only do we not know which point is which country, we also don't know how the points evolve in time.

In [None]:
px.scatter(gp, 
           x="year", 
           y="lifeExp")

We could look at a plot of all values for a given year.

In [None]:
years = gp.year.unique()

i = years[0]

px.scatter(gp.loc[gp['year']==i], 
           x="gdpPercap", 
           y="lifeExp")

What's that outlier?

Let's add 'hover_name' so that we can more easily get information about points by simply moving our mouse to them.

In [None]:
i = years[0]

px.scatter(gp.loc[gp['year']==i], 
           x="gdpPercap", 
           y="lifeExp",
           hover_name='country')

This will benefit from further customization:

* At the moment, most points are bunched at low gdpPercap values.  Putting the x-axis on a log scale stretches that region out and makes the separation more visible.
* How do we know what's evolving where?
  * Add color so we can keep track of individual points
  * Lots of colors.... so also scale the dot size by population to help distinguish the dots
* Change the axes' ranges to keep all points within the visualized space
* Change the axis ratio to spread out the points
* Change the size of the points to make it easier on our eyes to see smaller points

* **Change how we look at time:**
  * One way to look at time -> manually change what time you are plotting
  * Another way to visualize change over time -> dynamically change the plot in real time
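
The manual approach in the list above amounts to filtering the table to one year at a time and re-plotting. A minimal sketch, where `frame_for_year` is a hypothetical helper and `toy` is a made-up stand-in for `gp`:

```python
import pandas as pd

# Hypothetical stand-in for gp with a few illustrative values
toy = pd.DataFrame({
    'country': ['A', 'B', 'A', 'B'],
    'year': [1952, 1952, 1957, 1957],
    'lifeExp': [40.0, 60.0, 42.0, 63.0],
})

def frame_for_year(df, year):
    """Return only the rows of df recorded in the given year."""
    return df.loc[df['year'] == year]

# Step through the years by hand; each subset would be plotted on its own
for year in sorted(toy['year'].unique()):
    subset = frame_for_year(toy, year)
    print(year, len(subset))
```

An animation automates this loop: it shows one such subset per step.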


In [None]:
px.scatter(gp, 
           x="gdpPercap", 
           y="lifeExp",
           hover_name='country', color='country', size='pop',
           log_x=True,
           range_x=[100,100000], 
           range_y=[25,90],
           width=800, 
           height=600,
           size_max=60,
           template='simple_white',
           animation_frame="year",
          )