# An interactive line graph showing differences in global temperature from the 1961-1990 average using Plotly Express

Inspired by a graph I saw at [Our World in Data](https://ourworldindata.org/), this visualization will show the upward trend in global temperatures from 1850 to 2022. 

I have used Plotly Express and a number of its features to make the graph interactive, so that it can be used both to see the trends and to explore the results without having to look up values in the original table.

The data is sourced from [NASA and the Goddard Institute for Space Studies (GISS)](https://data.giss.nasa.gov/gistemp/).

### Import the data

In [1]:
# begin with installing Plotly
!pip install plotly



In [2]:
# basic imports
import pandas as pd
import plotly.express as px

In [68]:
# read the csv data
df = pd.read_csv('data/temperature-anomaly.csv')
df.head()

| | Entity | Code | Year | Global average temperature anomaly relative to 1961-1990 | Upper bound (95% confidence interval) of the annual temperature anomaly | Lower bound (95% confidence interval) of the annual temperature anomaly |
|---|---|---|---|---|---|---|
| 0 | Global | NaN | 1850 | -0.417659 | -0.246115 | -0.589203 |
| 1 | Global | NaN | 1851 | -0.23335 | -0.054832 | -0.411868 |
| 2 | Global | NaN | 1852 | -0.229399 | -0.049416 | -0.409382 |
| 3 | Global | NaN | 1853 | -0.270354 | -0.1107 | -0.430009 |
| 4 | Global | NaN | 1854 | -0.29163 | -0.150436 | -0.432824 |


### EDA and data preparation

In [69]:
# begin the EDA
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 519 entries, 0 to 518
Data columns (total 6 columns):
 #   Column                                                                   Non-Null Count  Dtype  
---  ------                                                                   --------------  -----  
 0   Entity                                                                   519 non-null    object 
 1   Code                                                                     0 non-null      float64
 2   Year                                                                     519 non-null    int64  
 3   Global average temperature anomaly relative to 1961-1990                 519 non-null    float64
 4   Upper bound (95% confidence interval) of the annual temperature anomaly  519 non-null    float64
 5   Lower bound (95% confidence interval) of the annual temperature anomaly  519 non-null    float64
dtypes: float64(4), int64(1), object(1)
memory usage: 24.5+ KB


It is nice to see no missing values, except in the *Code* field, which is entirely empty.

In [70]:
# drop Code which holds no data (0 non-null rows)
df = df.drop(columns=['Code'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 519 entries, 0 to 518
Data columns (total 5 columns):
 #   Column                                                                   Non-Null Count  Dtype  
---  ------                                                                   --------------  -----  
 0   Entity                                                                   519 non-null    object 
 1   Year                                                                     519 non-null    int64  
 2   Global average temperature anomaly relative to 1961-1990                 519 non-null    float64
 3   Upper bound (95% confidence interval) of the annual temperature anomaly  519 non-null    float64
 4   Lower bound (95% confidence interval) of the annual temperature anomaly  519 non-null    float64
dtypes: float64(3), int64(1), object(1)
memory usage: 20.4+ KB
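When a dataset has several empty columns, naming each one by hand becomes tedious. A minimal sketch of an alternative, using a small made-up frame (`toy`, not the real CSV), is to let `dropna` remove every column that contains no values at all:

```python
import pandas as pd

# toy frame for illustration only (not the real dataset)
toy = pd.DataFrame({
    'Entity': ['Global', 'Global'],
    'Code': [None, None],  # entirely empty, like Code in the real data
    'Year': [1850, 1851],
})

# drop every column in which all values are missing
toy_clean = toy.dropna(axis='columns', how='all')
print(list(toy_clean.columns))
```

Columns with even one value survive, so this only removes fields that are completely unused.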


In [71]:
df.shape

(519, 5)

In [72]:
# 519 rows, but are they all 'Global'?
df['Entity'].unique()

array(['Global', 'Northern Hemisphere', 'Southern Hemisphere'],
      dtype=object)

In [77]:
# no, so let's filter for 'Global' only
df_global = df.loc[df['Entity']=='Global'].copy() # copy so later edits do not touch df
df_global.shape

(173, 5)
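The row counts add up: 1850 to 2022 inclusive is 173 years, and three entities give 519 rows. A small sketch of how that could be checked per entity follows. The helper name `year_coverage` and the three-year toy frame are my own assumptions for illustration; on the real data you would pass `df` instead.

```python
import pandas as pd

def year_coverage(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the first year, last year and row count for each entity."""
    return frame.groupby('Entity')['Year'].agg(['min', 'max', 'count'])

# toy frame holding only the Entity and Year columns, for three years
entities = ['Global', 'Northern Hemisphere', 'Southern Hemisphere']
toy = pd.DataFrame(
    [(entity, year) for entity in entities for year in range(1850, 1853)],
    columns=['Entity', 'Year'],
)
coverage = year_coverage(toy)
print(coverage)

# the arithmetic behind the shapes seen above
expected_rows_per_entity = 2022 - 1850 + 1
print(expected_rows_per_entity, expected_rows_per_entity * len(entities))
```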

### Basic line graph using Plotly Express

In [74]:
# basic line graph using Plotly Express
fig = px.line(
    df_global, 
    x="Year", 
    y="Global average temperature anomaly relative to 1961-1990",
    title='Global average temperature anomaly relative to 1961-1990',
)
fig.show()

Okay, so we have our first line graph with no customization. It is clear, makes the upward trend perfectly visible, and has some neat interactive features which you can see by hovering the mouse over it.

However, there are a few aspects we will want to improve on:
* it only shows the median
* it has no units
* it is really bland
* it doesn't emphasise the starting point (0 degrees)

Let's work on all these points.

### Include lower and upper bounds

In [78]:
# first, let's look at more than the median
fig = px.line(
    df_global, 
    x="Year", 
    y=[
        "Global average temperature anomaly relative to 1961-1990",
        "Upper bound (95% confidence interval) of the annual temperature anomaly",
        "Lower bound (95% confidence interval) of the annual temperature anomaly",
    ],
    title='Global average temperature anomaly relative to 1961-1990',
)
fig.show()

In [79]:
# the column names are very long, so let's shorten them
df_global = df_global.rename(
    columns={
        'Global average temperature anomaly relative to 1961-1990' : 'Median',
        'Upper bound (95% confidence interval) of the annual temperature anomaly' : 'U.Bound',
        'Lower bound (95% confidence interval) of the annual temperature anomaly' : 'L.Bound',
    },
)

In [80]:
# round to 2 d.p.
columns_to_round = ['Median','U.Bound','L.Bound']
df_global[columns_to_round] = df_global[columns_to_round].round(2)
# reorder to put median in between bounds
df_global = df_global[['Year','U.Bound','Median','L.Bound']]
df_global.head()

| | Year | U.Bound | Median | L.Bound |
|---|---|---|---|---|
| 0 | 1850 | -0.25 | -0.42 | -0.59 |
| 1 | 1851 | -0.05 | -0.23 | -0.41 |
| 2 | 1852 | -0.05 | -0.23 | -0.41 |
| 3 | 1853 | -0.11 | -0.27 | -0.43 |
| 4 | 1854 | -0.15 | -0.29 | -0.43 |


The data is how we want it now. Let's look at the line graph again.

In [84]:
# create the basic line graph
fig = px.line(
    df_global, 
    x="Year", 
    y=[
        "U.Bound",
        "Median",
        "L.Bound",
    ],
    title='Global average temperature anomaly relative to 1961-1990',
)

# display
fig.show()

And a small tweak to the appearance...

In [89]:
# create the basic line graph
fig = px.line(
    df_global, 
    x="Year", 
    y=[
        "U.Bound",
        "Median",
        "L.Bound",
    ],
    color_discrete_sequence=['red', 'blue', 'green'], # set colours
    title='Global average temperature anomaly relative to 1961-1990',
)

# set the overall layout
fig.update_layout(
    font=dict(family='Times New Roman'), # set font
    plot_bgcolor='rgb(242, 246, 252)', # set background colour as light grey
)

# display
fig.show()

I'm really happy with this so far. 

You can see the trend clearly, and it's neat to see the 95% confidence interval widening as you go back in time, which reflects how the accuracy of global temperature measurement has improved over the years.
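The spread between the bounds can also be put into numbers. As a minimal sketch, here is the gap between `U.Bound` and `L.Bound` for the first five rows shown earlier; the `CI width` column name is my own choice, and on the full data the same line would be applied to `df_global`.

```python
import pandas as pd

# first five rows of df_global, copied from the output shown earlier
sample = pd.DataFrame({
    'Year': [1850, 1851, 1852, 1853, 1854],
    'U.Bound': [-0.25, -0.05, -0.05, -0.11, -0.15],
    'Median': [-0.42, -0.23, -0.23, -0.27, -0.29],
    'L.Bound': [-0.59, -0.41, -0.41, -0.43, -0.43],
})

# gap between the upper and lower bound for each year
sample['CI width'] = (sample['U.Bound'] - sample['L.Bound']).round(2)
print(sample[['Year', 'CI width']])
```

Comparing these early widths with the widths for recent decades would show how much the uncertainty has narrowed.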

However there are still some rough edges, and plenty that can be customized.

### Make the headings, axes and legend more meaningful

In [94]:
# set up the line graph
fig = px.line(
    df_global, 
    x="Year", 
    y=[
        "U.Bound",
        "Median",
        "L.Bound",
    ], 
    color_discrete_sequence=['red', 'blue', 'green'], # set colours
    title='Global average difference in temperature relative to 1961-1990', # I think difference is a better word than anomaly here
)

# set the overall layout
fig.update_layout(
    font=dict(family='Times New Roman'), # set font
    plot_bgcolor='rgb(242, 246, 252)', # set background colour as light grey
    legend_title=None, # remove unnecessary legend title
)

# set up x-axis
fig.update_xaxes(
    title_font=dict(family='Times New Roman'),
)
    
# set up y-axis
fig.update_yaxes(
    title_font=dict(family='Times New Roman'),
    title_text=f'Temperature ({chr(176)}C)',
)

#display
fig.show()

### Change the mouse hover event to a vertical trace

If you move the mouse over the graph, it gives you a reading from each line individually, which makes it awkward to compare the three values for a given year.

To address this we will add a vertical trace on mouse hover, with a single view showing both bounds and median.

In [95]:
fig.update_layout(
    hovermode="x unified", # adds a vertical line trace on mouse hover
    hoverlabel= # formats the label on mouse hover
        dict(
            bgcolor="white",
            font_size=14,
        ),
)

# also needed to set up the unified label on mouse hover
fig.update_traces(
    mode="lines",  # draw each series as a line only (no markers)
    hovertemplate=None,  # clear the per-trace hover template set by Plotly Express
)

# re-apply earlier settings as a safeguard (update_layout only changes the properties passed to it)
fig.update_layout(
    legend_title=None, # remove unnecessary legend title
)
fig.update_xaxes(
    title_font=dict(family='Times New Roman'),
)
fig.update_yaxes(
    title_font=dict(family='Times New Roman'),
    title_text=f'Temperature ({chr(176)}C)',
)

#display
fig.show()

### Make the key aspects of the graph's narrative visible

To tell the story about the increase in global temperature we will use a horizontal line which shows the starting point (no change in temperature), and a vertical line to show the reference point against which the temperature differential has been calculated (the centre of the 1961-1990 period).

In [96]:
horizontal_line = dict( # add a horizontal line at 0 degrees
    type='line',
    yref='y',
    y0=0,
    y1=0,
    xref='paper',
    x0=0,
    x1=1,
    line=dict(color='black', width=2),
)
vertical_line = dict( # add a vertical line at the centre of the reference period
    type='line',
    yref='paper',
    y0=0,
    y1=1,
    xref='x',
    x0=1975.5,
    x1=1975.5,
    line=dict(color='grey', width=2, dash='dot')
)
vertical_line_label = dict( # label for centre of years 1961-1990
    xref='x',
    x=1976,
    yref='y',
    y=-0.5,   
    text=f'Centre = 0{chr(176)}C',
    font=dict(size=16),
    showarrow=True,
    arrowhead=1,
    ax=80,
    ay=0,
)

# add to graph
fig.update_layout(
    shapes=[ 
        horizontal_line,
        vertical_line,
    ],
    annotations=[
        vertical_line_label,
    ],
)

# re-apply earlier settings as a safeguard (update_layout only changes the properties passed to it)
fig.update_layout(
    legend_title=None, # remove unnecessary legend title
)
fig.update_xaxes(
    title_font=dict(family='Times New Roman'),
)
fig.update_yaxes(
    title_font=dict(family='Times New Roman'),
    title_text=f'Temperature ({chr(176)}C)',
)

#display
fig.show()

### What do we have now?

At this point I am pleased with how this graph tells a key story about our current global situation. A rise of nearly 1 degree over the last 50 years or so is dramatic.

I like the interactivity from the mouse hover event, which allows values to be read off without referring to the tabular data. You can also zoom in on a sub-region by clicking and dragging.

### What next with the graph?

Having done some further research, I have maxed out some of the customization features available through Plotly Express itself. For example, the ticks on the y-axis could do with having degrees Celsius as their units, not just the axis title, but as far as I can tell this is set on the figure's axes after creation rather than through an argument to Plotly Express.

It would be great to apply some branding and a more stylish title, something I may do if I deploy the graph online in the future.

### What next with the data?

Regarding the data, our next step might be a regression model to see where global temperatures might be in 10 or 20 years' time if these trends continued unabated. It is clearly not a linear relationship, so a polynomial regression would be needed.
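To give a feel for what that might involve, here is a rough sketch of a polynomial fit with NumPy. The helper `fit_polynomial_trend`, the default degree of 2 and the toy series are all assumptions of mine; the toy values are made up and exactly quadratic, purely so the helper can be checked, and say nothing about real temperatures.

```python
import numpy as np

def fit_polynomial_trend(years, values, degree=2):
    """Least-squares polynomial fit; returns a function mapping year to fitted value."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    offset = years.mean()  # centre the years to keep the fit well conditioned
    coefficients = np.polyfit(years - offset, values, degree)
    return lambda year: np.polyval(
        coefficients, np.asarray(year, dtype=float) - offset
    )

# made-up, exactly quadratic series used only to check the helper
toy_years = np.arange(2000, 2011)
toy_values = 0.01 * (toy_years - 2000) ** 2

trend = fit_polynomial_trend(toy_years, toy_values)
prediction = float(trend(2012))  # two years beyond the toy series
print(round(prediction, 2))
```

On the real data, `df_global['Year']` and `df_global['Median']` would take the place of the toy arrays. Polynomial fits can behave wildly outside the fitted range, so any extrapolation from such a curve should be read with caution.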

However, many predictive models are already available which take into account potential changes in countries' economic and energy policies, as well as changes in population, economic and social development, and the energy use arising from them, for better or worse. Those predictions would be of much greater value than anything I could produce from this limited data.

This United Nations news story would be a good place to start: [news.un.org/en/story/2023/05/1136732](https://news.un.org/en/story/2023/05/1136732#:~:text=There%20is%20a%2066%20per,for%20at%20least%20one%20year.)
