# Final project for Social Data Analysis and Visualization (02806)

## Motivation

We chose two main datasets focusing on energy consumption and CO2 emissions:
 - Data on Energy by Our World in Data
   - https://github.com/owid/energy-data
 - Data on CO2 and Greenhouse Gas Emissions by Our World in Data
   - https://github.com/owid/co2-data

We chose these two datasets because they contain the data needed to explore global warming. They also cover a wide range of years and countries for several important topics, such as energy consumption, energy production, and emissions.

The goal for the user experience was an article-style website where users can explore current data on global warming and look for trends that show the general direction we are heading in.
We wanted to spark the users' interest by encouraging them to interact with the data, and we deliberately avoided drawing concrete conclusions for them: each user should have their own experience and interpret the data in their own way.



## Basic stats
  
 - choices in cleaning and preprocessing
 - discuss dataset stats and key points/plots from the exploratory data analysis

We have worked with four main data sources from Our World in Data. First, we looked at the average temperature anomaly, a collection of 681 records of the yearly average temperature in various parts of the world (Global, Southern Hemisphere, Northern Hemisphere, Tropics). Here we focus on global temperatures.

Secondly, we looked at the sources of CO2 emissions, which is the data behind the pie chart on the website. We had to reshape the data into the format the chart requires, but beyond that we used the values as provided.
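
As an illustration of that reshaping step, here is a minimal sketch of how a flat table could be turned into the `ids` / `labels` / `parents` / `share` layout that the sunburst chart reads. The input column names `sector` and `sub_sector`, the function name, and the numbers are assumptions made for the example; they are not taken from `energy_distribution.csv`.

```python
import pandas as pd


def to_sunburst_table(flat: pd.DataFrame) -> pd.DataFrame:
    """Build ids/labels/parents/share rows from a flat (sector, sub_sector, share) table."""
    # Leaf rows: one per sub-sector, attached to its sector.
    leaves = pd.DataFrame({
        "ids": flat["sector"] + "/" + flat["sub_sector"],
        "labels": flat["sub_sector"],
        "parents": flat["sector"],
        "share": flat["share"],
    })
    # Parent rows: one per sector, whose share is the sum of its children.
    # A sunburst with branchvalues="total" expects parent values to cover their children.
    totals = flat.groupby("sector", sort=False)["share"].sum().reset_index()
    parents = pd.DataFrame({
        "ids": totals["sector"],
        "labels": totals["sector"],
        "parents": "",  # empty parent marks a top-level wedge
        "share": totals["share"],
    })
    return pd.concat([parents, leaves], ignore_index=True)


# Placeholder numbers for illustration only, not real emission shares.
flat_example = pd.DataFrame({
    "sector": ["Energy", "Energy", "Agriculture"],
    "sub_sector": ["Transport", "Buildings", "Livestock"],
    "share": [10.0, 5.0, 20.0],
})
sunburst_table = to_sunburst_table(flat_example)
```

The resulting table has one row per wedge, so it can be passed column by column to the chart.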

Last, but not least, for our main dataset we combined the CO2 emissions data and Energy data from Our World in Data. The resulting table consists of 181 columns covering a wide range of parameters.

In [None]:
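from pathlib import Path

import pandas as pd

# Folder holding the locally stored CSV files.
PATH = Path(__file__).resolve().parents[0] / 'data'


def data_loader():
    # Energy and CO2 data are read directly from the Our World in Data repositories.
    df_energy = pd.read_csv('https://raw.githubusercontent.com/owid/energy-data/master/owid-energy-data.csv')
    df_co2 = pd.read_csv('https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv')
    # Temperature anomalies and the country lookup table are local files.
    df_temp = pd.read_csv(PATH / 'temperature-anomaly.csv')
    df_countries = pd.read_csv(PATH / 'continents2.csv', index_col=2)
    # get_continent is a helper defined elsewhere in the project.
    df_energy['continent'] = df_energy['iso_code'].apply(lambda x: get_continent(x, df_countries))
    df_co2['continent'] = df_co2['iso_code'].apply(lambda x: get_continent(x, df_countries))
    # Data behind the emissions pie chart.
    df_energy_dist = pd.read_csv(PATH / 'energy_distribution.csv')
    # Drop the columns both tables share before merging so they are not duplicated.
    df = pd.merge(df_energy, df_co2.drop(['gdp', 'population', 'continent', 'country'], axis=1), on=['iso_code', 'year'])

    return df, df_temp, df_energy_dist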
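
The loader relies on a `get_continent` helper that is not shown in this notebook. A minimal sketch of what such a lookup could look like follows, assuming the lookup table is indexed by ISO code and stores the continent in a column called `region`. Both the column name and the function body are assumptions, not the project's actual implementation.

```python
import pandas as pd


def get_continent(iso_code, df_countries, column="region"):
    """Return the continent for an ISO code, or None when the code is unknown."""
    # Aggregate rows such as "World" have a missing or non-standard iso_code.
    if pd.isna(iso_code) or iso_code not in df_countries.index:
        return None
    # Assumes the index holds unique ISO codes, so .at returns a single value.
    return df_countries.at[iso_code, column]


# Tiny hypothetical lookup table for illustration.
countries_example = pd.DataFrame({"region": ["Europe", "Asia"]}, index=["DNK", "JPN"])
```

Under the same assumptions, `df['iso_code'].map(df_countries['region'])` would give a vectorised alternative to calling the helper row by row.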



## Data analysis
 - Describe your data analysis and explain what you've learned about the dataset.
   - The data analysis consists of a set of plots, each visualising a different aspect of the current state of continents and countries with regard to energy consumption and CO2 emissions. Many conclusions can be drawn; the most notable comes from the plot of annual production-based CO2 emissions, measured in tonnes per person. This graph makes it evident that the world as a whole has barely moved. We can also see that even though electricity generation from oil and gas is beginning to decline, these are still by far the biggest sources of energy. By comparison, the "green" energy sources combined are still far too small to compete with the fossil ones.
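
The comparison between fossil and "green" sources can be made concrete with a small calculation. The sketch below assumes a table with a `year` column and the consumption columns used by the energy consumption chart; the grouping into fossil and non-fossil sources and the function name are choices made for this example.

```python
import pandas as pd

FOSSIL = ["coal_consumption", "gas_consumption", "oil_consumption"]
NON_FOSSIL = [
    "biofuel_consumption", "hydro_consumption", "nuclear_consumption",
    "other_renewable_consumption", "solar_consumption", "wind_consumption",
]


def fossil_share_by_year(df: pd.DataFrame) -> pd.Series:
    """Percentage of the summed consumption per year that comes from coal, gas and oil."""
    # Sum every source over all rows (countries) of the same year.
    yearly = df.groupby("year")[FOSSIL + NON_FOSSIL].sum()
    fossil = yearly[FOSSIL].sum(axis=1)
    total = yearly.sum(axis=1)
    return (100 * fossil / total).rename("fossil_share_pct")
```

Aggregate rows such as "World" should be filtered out before calling the function, otherwise their values are counted in addition to the individual countries.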


## Genre

We chose a magazine or article style that incorporates annotated charts for our storytelling. To support the visual narrative we used a consistent visual platform with a small, select number of visualization types that become familiar objects to the user. Towards the end of the article we created a sandbox section that lets users explore the data on their own, using the provided filtering and selection options, and draw their own conclusions. The website follows a user-directed path: it sets the stage with stimulating default views and many hover details so that users can absorb the information in their own fashion. We also included introductory text to orient the user and a summary that wraps everything up and encourages further reflection on what they have read.


 - 3 Visual narrative tools used
   - visual structuring - consistent visual platform
   - highlighting - filtering/selection/search
   - transition guidance - familiar objects
  
 - 3 Narrative structure tools used
   - ordering - user directed path
   - interactivity - stimulating default views and hover highlighting/details
   - messaging - introductory text and summary/synthesis



## Visualizations

 - Explain the visualizations you've chosen
     - We have chosen a number of different visualisations for this part. We will not go into detail about what they show, since that is stated on the website. The main reason for choosing these visualisations over others is that we feel they showcase the data in the most nuanced way possible. Since the human mind is very good at spotting size differences and outliers, we have tried to convey the information in a way that utilises this part of human perception. This is exemplified by the first two plots, where the data is broken down primarily into squares or triangles. We chose the pie chart even though we know pie charts are frowned upon, because the visual split helped convey the message clearly. Combined with the tooltip, this makes the charts quite interpretable, since the actual numbers can be seen alongside the size relative to other data points.

 - Why are they right for the story you want to tell?
     - They are right for the story because they convey what we intend as precisely as possible. At the same time they invite interaction, which helps us create a better UX where the user feels compelled to engage with the plots.


In [None]:
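import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import streamlit as st

# df, df_energy_dist, LABELS, utils and get_last_frame are defined elsewhere in the project.

#Label formatter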
def format_labels(fig):
    fig.update_layout(
        font_family="Times New Roman",
        font_color="black",
        title_font_family="Times New Roman",
        title_font_color="black",
        legend_title_font_color="black",
        title_font_size=12,
        template='plotly_white',
    )
    fig.update_layout(title_x=0.1)
    return fig

#Scatterplot
def create_scatter_plot(x, y, hover, color='continent', size='population', year_min=2018):
    _df = df[[x, y, size, color, 'year', 'iso_code']]
    # Reassign instead of dropna(inplace=True) to avoid modifying a slice of df.
    _df = _df[_df['year'] >= year_min].dropna()
    fig = px.scatter(
        _df, x=x, y=y,
        color=color,
        size=size,
        size_max=45,
        log_x=True, log_y=True,
        hover_name='iso_code',
        animation_frame='year',
        title=f"{utils.get_label(LABELS, y)} <br>vs {utils.get_label(LABELS, x)}"
    )
    fig = get_last_frame(fig)
    fig.update_traces(hovertemplate=hover)
    
    return st.plotly_chart(format_labels(fig), use_container_width=True)

#choropleth plots with animation
def choropleth_plot(y, year_min=2018):
    _df = df[df['year'] >= year_min]
    # Drop aggregate rows (regions and groups) that are not individual countries.
    aggregates = ['World', 'Asia Pacific', 'OECD', 'CIS', 'Middle East', 'Non-OECD']
    _df = _df[~_df['country'].isin(aggregates)]
    fig = px.choropleth(
        _df.sort_values('year'),
        locations="iso_code",
        color=y,
        hover_name="country",
        color_continuous_scale='RdBu_r',
        range_color=[np.min(_df[y]), np.max(_df[y])],
        animation_frame='year',
    )
    fig.update_layout(
        title_text=utils.get_label(LABELS, y),

        geo=dict(
            showframe=False,
            showcoastlines=False,
            projection_type='equirectangular'
        ),
    )
    #start with last frame
    fig = get_last_frame(fig)
    
    return st.plotly_chart(format_labels(fig), use_container_width=True)

#treemap plots
def create_tree_plot(x, y, year=2018):
    _df = df.query(f'year == {year}')[[y, x, 'iso_code', 'country', 'continent']].dropna()
    fig = px.treemap(
        _df, 
        path=[px.Constant("world"), 'continent', 'country'], 
        values=y,
        color=x, color_continuous_scale='RdBu_r',
    )
    
    fig.update_traces(hovertemplate='<b>%{label} </b> <br>Box Color: %{color:.2f} <br>Box Size: %{value:.2f}') #
    fig.update_layout(margin = dict(t=0, l=0, r=0, b=0))
    
    return st.plotly_chart(format_labels(fig), use_container_width=True)

#line plots to show change
def create_lineplot_change(y, current_year, window_size):
    _df = df.query(f'year in {[current_year-window_size, current_year]}')[[y, 'year', 'iso_code', 'country', 'population', 'continent']]

    fig = go.Figure()
    fig.add_vline(x=current_year - window_size, line_width=2, line_color="black")
    fig.add_vline(x=current_year, line_width=2, line_color="black")

    changes = []

    for country in _df['country'].unique():
        df_c = _df[_df['country'] == country].sort_values('year')

        try:
            change = np.round(df_c.iloc[1][y] - df_c.iloc[0][y], 3)
        except IndexError:
            # Fewer than two rows: the country lacks data for one of the two years.
            change = np.nan

        changes.append(change)
        
        # Create and style traces
        if country == 'World':
            c = 'red'
            w = 3
            world = True
        else:
            c = 'white'
            w = 1
            world = False
        fig.add_trace(go.Scatter(x=df_c['year'], y=df_c[y],
                                 line=dict(color=c, width=w), name=country, hovertext=change))
        if world:
            fig.add_annotation(x=df_c['year'].iloc[1], y=df_c[y].iloc[1],
                               text="World",
                               showarrow=False,
                               xshift=20)


    fig.update_layout(
        showlegend=False,
        title_text=f"{utils.get_label(LABELS, y)} - global change: {np.nanmean(changes):.3f}",
    )
    return st.plotly_chart(format_labels(fig), use_container_width=True)

#emissions pie chart
def create_emission_pie():
    _df = df_energy_dist.copy()

    fig = go.Figure(go.Sunburst(
        name = "",
        ids = _df['ids'],
        labels = _df['labels'],
        parents = _df['parents'],
        values=_df['share'],
        branchvalues="total",
        marker=dict(
            colors=_df['share'],
            colorscale='RdBu_r',
            cmid=_df['share'].mean()),
        hovertemplate='<b>%{label}: %{value:.2f}%</b>',
        hoverinfo="none"
    ))

    fig.update_layout(margin = dict(t=0, l=0, r=0, b=0))

    return st.plotly_chart(format_labels(fig), use_container_width=True)

#energy consumption chart
def create_energy_consumption_source():
    labels = ['biofuel_consumption', 'coal_consumption', 'gas_consumption', 'hydro_consumption', 'nuclear_consumption', 'oil_consumption', 'other_renewable_consumption', 'solar_consumption', 'wind_consumption']
    _df = df[['year', 'country'] + labels].dropna()
    # Drop aggregate rows (regions and groups) so they are not summed on top of the countries.
    aggregates = ['World', 'Asia Pacific', 'OECD', 'CIS', 'Middle East', 'Non-OECD']
    _df = _df[~_df['country'].isin(aggregates)]
    _df = _df.drop(['country'], axis=1)
    _df = _df.groupby('year').sum().reset_index()
    _df.columns = ['year'] + [f"{('_'.join(x.split('_')[:-1])).capitalize()}" for x in _df.columns.to_list()[1:]]
    _df = pd.melt(_df, id_vars=['year'], value_vars=_df.columns.to_list()[1:])
    fig = px.area(_df, x="year", y="value", color="variable", line_group="variable")
    fig.update_layout(
        hovermode="x unified",
        xaxis_title="Year",
        yaxis_title="Energy Consumption (TWh)",
        legend_title='Source',
        legend_traceorder="reversed"
    )
    fig.update_traces(hovertemplate='%{y:.2f} TWh')

    return st.plotly_chart(format_labels(fig), use_container_width=True)

## Discussion

 - what went well?
   - Using Streamlit to create the website went really well. It was easy to use, and we could apply what we had learned throughout the course without much hassle. This meant that we could focus on creating the visualisations instead of building the website.

 - what is still missing/ what could be improved? Why?
   - We could have gone deeper into specific columns to gain more detailed knowledge, and maybe even used machine learning to predict how the data will develop. This could have given better insight into what to expect. For instance, we could have predicted CO2 emissions in 2030 based on the historical data and a number of the other features. We chose not to, because the data comes with uncertainties and our focus was on letting users draw their own conclusions rather than providing conclusions for them.
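
For reference, the simplest version of such a prediction would be a straight-line extrapolation of a yearly series. The sketch below is only an illustration of that idea and not something the project implements; the function name is made up, and a linear trend is a strong assumption that ignores the uncertainties mentioned above.

```python
import numpy as np


def extrapolate_linear(years, values, target_year):
    """Fit a straight line to (years, values) and evaluate it at target_year."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    # Centre the years so the fit stays numerically well behaved.
    x0 = x.mean()
    slope, intercept = np.polyfit(x - x0, y, deg=1)
    return slope * (target_year - x0) + intercept
```

A call such as `extrapolate_linear(series.index, series.values, 2030)` would give a first rough estimate for a single yearly series, which could then be compared against models that use more of the available features.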

## Contributions

![image info](./data/contributions.png)