# Animating Plots With Python

The aim of this notebook is to explore techniques for animating plots with Python.

To do so, we use data from [Gapminder](https://www.gapminder.org/df/) on CO$_2$ emissions, population and GDP per capita.
 
# Data preparation

Gapminder data are often used in examples of how to animate graphs with Python. This time, however, I wanted to add a personal touch, so I added some extra features to the data to produce a plot that differs from the ones usually found in this type of demonstration.

Hence, some data preparation steps are needed first. You can find a detailed description of the data wrangling process [in this notebook]().

In [None]:
import pandas as pd

df = pd.read_csv("prepared.csv")

# Animating Graphs Using `matplotlib`

First of all, we must load the necessary packages. When using matplotlib in Jupyter notebooks, I find it useful to add a small tweak to the config, `%config InlineBackend.figure_format='retina'`, which renders the output plots at a higher resolution.

In [19]:
import matplotlib.pyplot as plt
import seaborn as sns
%config InlineBackend.figure_format = 'retina'

I like tweaking a few parameters of the seaborn style to get an eye-catching result, and setting an adequate resolution for the plot.

In [20]:
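sns.set_style('whitegrid', {'grid.color': '.8'})  # white background with a light-grey grid
my_dpi = 100  # dots per inch, used for the figure size and when saving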

To group by `continent` later on, I found it necessary to convert this variable into a categorical type.

In [27]:
df['continent']=pd.Categorical(df['continent'])
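
As a quick illustration of what this conversion does, here is a minimal sketch on a few hypothetical continent labels (not the real data): `pd.Categorical` stores each distinct label once, in sorted order, plus a small integer code per row.

```python
import pandas as pd

# Hypothetical labels shaped like df['continent']
labels = pd.Series(["Europe", "Asia", "Europe", "Africa"])
continent = pd.Categorical(labels)

# Distinct labels, stored once and sorted alphabetically
print(list(continent.categories))  # ['Africa', 'Asia', 'Europe']
# One integer per row, pointing into the categories above
print(continent.codes.tolist())    # [2, 1, 2, 0]
```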

I also had to play with the axis limits. As the output below shows, the maximum value on the Y axis (247) is far higher than the limit I set afterwards (30). Data science is about making decisions. Here I chose to tell the story this way to highlight the linear relationship between GDP per capita and CO$_2$ emissions that stands out over time; otherwise, this relationship is harder to appreciate.

Furthermore, only a few countries exceed the CO$_2$ emissions limit of 30, and they do so only occasionally. In a professional context, the best practice would be to investigate whether these data points are outliers or errors.

In [28]:
xmin = int(df['gdp_per_capita'].min())
xmax = int(df['gdp_per_capita'].max())

ymin = int(df['co2_per_capita'].min())
ymax = int(df['co2_per_capita'].max())

(xmin,xmax),(ymin,ymax)

((247, 177522), (0, 247))
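
Following up on the suggestion to investigate the points above the limit, a first step could be simply to list them. The sketch below uses a small hypothetical data frame with the same column names as the prepared file (the values are made up); on the real data you would filter `df` the same way.

```python
import pandas as pd

# Hypothetical rows using the column names of the prepared data
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2000, 2000, 2001, 2001],
    "co2_per_capita": [4.2, 31.5, 12.0, 45.0],
})

Y_LIMIT = 30  # upper y-axis limit used in the plots

# Rows that would fall outside the plotted range
above = toy[toy["co2_per_capita"] > Y_LIMIT]
share = len(above) / len(toy)

print(f"{len(above)} of {len(toy)} rows ({share:.0%}) are above {Y_LIMIT}")  # 2 of 4 rows (50%) are above 30
print(above.sort_values("co2_per_capita", ascending=False))
```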

One drawback I found with `matplotlib` is that it is awkward to set a fixed size for the markers in the legend. [I got the solution from this Stack Overflow answer](https://stackoverflow.com/a/47116009/11597692).

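Since the marker sizes in the plot scale with population, the legend markers inherit those sizes by default. If you don't want them sticking out of the legend, this step is mandatory.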

So we need a function that updates the properties of each legend handle ([see the `handles` parameter in `help(plt.Axes.legend)`](https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend)) when `plt.legend()` builds the legend.

In [22]:
from matplotlib.legend_handler import HandlerPathCollection

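marker_size = 36  # fixed legend marker area (points^2), independent of population

def update_prop(handle, orig):
    # Copy colors, alpha and edges from the plotted scatter collection...
    handle.update_from(orig)
    # ...but overwrite its sizes with a single fixed value
    handle.set_sizes([marker_size])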
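
To see the effect of this kind of handler in isolation, here is a minimal self-contained sketch (not part of the original notebook) with two hypothetical scatter series whose marker sizes differ wildly; the names `fixed_size_handler` and `LEGEND_MARKER_SIZE` are illustrative. With the custom `update_func`, both legend markers end up with the same area. It selects the non-interactive `Agg` backend so it runs as a plain script; drop that line inside a notebook. The legend's handle list is named `legend_handles` from matplotlib 3.7 on and `legendHandles` before that.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerPathCollection

LEGEND_MARKER_SIZE = 36  # hypothetical fixed legend marker area, in points^2

def fixed_size_handler(handle, orig):
    # Copy colors, alpha and edges from the plotted collection, then force one size
    handle.update_from(orig)
    handle.set_sizes([LEGEND_MARKER_SIZE])

fig, ax = plt.subplots()
ax.scatter([1, 2], [1, 2], s=[20, 40], label="small markers")
big = ax.scatter([3, 4], [3, 4], s=[2000, 5000], label="huge markers")

# Every PathCollection (i.e. every scatter) in the legend uses the custom update function
leg = ax.legend(handler_map={type(big): HandlerPathCollection(update_func=fixed_size_handler)})

# The handle list is `legend_handles` in matplotlib >= 3.7 and `legendHandles` before that
handles = getattr(leg, "legend_handles", None) or leg.legendHandles
print([h.get_sizes().tolist() for h in handles])
plt.close(fig)
```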

Then, we pass a dict to the `handler_map` parameter of `plt.legend()`, as you can see in the code chunk below.

In [23]:
import os

# Some colors chosen from http://www.visibone.com/colorlab/
cdict = {
    "Asia": "#999900",
    "Europe": "#333399",
    "Africa": "#996600",
    "America": "#660000",
    "Oceania": "#669900"
}

# Make sure the output folder for the frames exists
os.makedirs('images', exist_ok=True)

# For each year:
for i in df.year.unique():

    # Initialize a single figure with the desired size and resolution
    fig, ax = plt.subplots(figsize=(680/my_dpi, 480/my_dpi), dpi=my_dpi)

    # Log scale, titles (main and on axes) and fixed limits shared by every frame
    plt.xscale('log')
    plt.xlabel("GDP per Capita")
    plt.ylabel("CO2 emissions per capita")
    plt.title("Year: " + str(i))
    plt.xlim(100, 200e3)
    plt.ylim(-1, 30)

    # Plot according to the year and continent
    for continent in df.continent.unique():
        year_and_continent = (df.year == i) & (df.continent == continent)
        x = df.gdp_per_capita[year_and_continent]
        y = df.co2_per_capita[year_and_continent]
        s = df.population[year_and_continent] / 2e+5  # marker area scales with population

        sc = plt.scatter(x, y, s=s, c=cdict[continent], label=continent,
                         alpha=0.95, edgecolors="#eeee", linewidth=1)

    # Use the custom handler so the legend markers keep a fixed size
    plt.legend(loc="upper left",
               handler_map={type(sc): HandlerPathCollection(update_func=update_prop)})

    # Save it
    filename = 'images/step_' + str(i) + '.png'
    plt.savefig(filename, dpi=my_dpi)
    plt.close("all")  # to avoid displaying plots for every year

The resulting set of `.png` files is then combined into a GIF. For this I used the `PIL` package (installed as Pillow) to build the animation and `glob` to read the files.

In [24]:
from PIL import Image
import glob

# Create the frames in file-name order (four-digit years sort chronologically as strings)
images = sorted(glob.glob("images/step_*.png"))
frames = [Image.open(path) for path in images]

# Save into a GIF file that loops forever (duration is in milliseconds per frame)
frames[0].save('png_to_gif.gif', format='GIF',
               append_images=frames[1:], duration=500,
               save_all=True, loop=0)
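
The approach above writes one `.png` per year and stitches the files together. As an alternative that the original notebook does not use, matplotlib's own `matplotlib.animation.FuncAnimation` can update a single figure frame by frame and write the GIF directly through `PillowWriter`. Below is a minimal sketch on hypothetical toy data (three years, two points per year), leaving out the legend and color handling above; the file name `funcanimation_sketch.gif` is illustrative, and the `Agg` line should be dropped inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.animation import FuncAnimation, PillowWriter
from PIL import Image

# Hypothetical toy data standing in for the prepared Gapminder frame
toy = pd.DataFrame({
    "year": [2000, 2000, 2010, 2010, 2020, 2020],
    "gdp_per_capita": [1_000, 20_000, 1_500, 25_000, 2_000, 30_000],
    "co2_per_capita": [0.5, 8.0, 0.8, 9.0, 1.0, 7.5],
})
years = sorted(toy["year"].unique())

# The figure is created once and reused for every frame
fig, ax = plt.subplots(figsize=(6.8, 4.8), dpi=100)
ax.set_xscale("log")
ax.set_xlim(100, 200e3)
ax.set_ylim(-1, 30)
ax.set_xlabel("GDP per Capita")
ax.set_ylabel("CO2 emissions per capita")
points = ax.scatter([], [])

def draw_year(year):
    # Move the existing points to this year's values and update the title
    rows = toy[toy["year"] == year]
    points.set_offsets(np.column_stack([rows["gdp_per_capita"], rows["co2_per_capita"]]))
    ax.set_title(f"Year: {year}")
    return (points,)

anim = FuncAnimation(fig, draw_year, frames=years)
anim.save("funcanimation_sketch.gif", writer=PillowWriter(fps=2))  # 2 fps = 500 ms per frame
plt.close(fig)

# Check how many frames ended up in the GIF
with Image.open("funcanimation_sketch.gif") as gif:
    n_frames = gif.n_frames
print(n_frames, "frames")
```

This avoids the intermediate image folder, at the cost of restructuring the plotting code around an update function.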

## Pros of `matplotlib`
* Fully customizable data representation.

## Cons of `matplotlib` 
* Requires a considerable amount of code.
* Relies on extra dependencies (`seaborn`, `PIL`) and on intermediate `.png` files to build the GIF.
* Not interactive.


## The result

Finally, we can display the created GIF with Markdown, as in the snippet below:

```markdown
![Figure caption](path/to/the/image/file.extension)
```
![Plot animated with python](png_to_gif.gif)

# Animating Graphs Using `plotly.express`

In [25]:
import plotly.express as px

fig = px.scatter(df, x="gdp_per_capita", y="co2_per_capita", animation_frame="year",
                 animation_group="country", title = 'Time Series Plot Test', color="continent",
                 hover_name="country", log_x=True, range_y=[-1,30], range_x=[100,200e3], 
                 size="population", size_max=50)

fig.update_xaxes(title_text='GDP per Capita')
fig.update_yaxes(title_text='CO2 emissions per capita');  # trailing ';' keeps the notebook from rendering the figure here

When writing a post like this one, you need to save the result in order to display it. I found this was not an easy task. After searching through several sources, I found [this great post by Matteo Guzzo](https://matteoguzzo.com/blog/embed-html-graphs-plotly/), which helped me a lot to complete this part.

To sum up, after creating the plot, we must save it into an `.html` file in order to display it. The file needs access to the `plotly.js` library, which is why we pass `include_plotlyjs='cdn'` when saving the plot: it adds a `<script>` tag that loads `plotly.js` from the Plotly CDN instead of embedding the whole library in the file. See Matteo's blog post for more detail on how this works.

In [26]:
with open('plotly_graph.html', 'w') as f:
    f.write(fig.to_html(include_plotlyjs='cdn'))

## Pros of `plotly`
* Produces a beautiful data representation with relatively little code.
* Creates interactive plots.

## Cons of `plotly`
* There's no easy way to put the result in a blog post.
* Rendering depends on the `plotly.js` library, loaded here from Plotly's CDN.