# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from matplotlib_venn import venn2
import plotly.express as px


# Uncomment if you haven't installed the following packages
# %pip install matplotlib-venn
# %pip install plotly


# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject


# Read and clean data

### Temperature data
*We start by importing and cleaning temperature data from the NASA Goddard Institute for Space Studies (GISS).*  
Link to data: https://data.giss.nasa.gov/gistemp/

In [3]:
#Loading dataset on global temperature
GlobalTemp = 'GlobalTemp.csv'
pd.read_csv(GlobalTemp).head()

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Land-Ocean: Global Means
Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D,D-N,DJF,MAM,JJA,SON
1880,-.17,-.24,-.08,-.16,-.09,-.20,-.17,-.10,-.13,-.23,-.21,-.17,-.16,***,***,-.11,-.16,-.19
1881,-.19,-.13,.04,.06,.06,-.18,.01,-.03,-.15,-.21,-.18,-.07,-.08,-.09,-.16,.05,-.07,-.18
1882,.17,.14,.04,-.16,-.14,-.23,-.16,-.07,-.14,-.23,-.16,-.35,-.11,-.08,.08,-.09,-.15,-.18
1883,-.29,-.36,-.12,-.18,-.17,-.08,-.06,-.13,-.21,-.10,-.22,-.10,-.17,-.19,-.33,-.15,-.09,-.18


In [4]:
#Skipping the first row to make months headers
GlobalTemp = pd.read_csv(GlobalTemp, skiprows=1)
GlobalTemp.head()

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D,D-N,DJF,MAM,JJA,SON
0,1880,-0.17,-0.24,-0.08,-0.16,-0.09,-0.2,-0.17,-0.1,-0.13,-0.23,-0.21,-0.17,-0.16,***,***,-0.11,-0.16,-0.19
1,1881,-0.19,-0.13,0.04,0.06,0.06,-0.18,0.01,-0.03,-0.15,-0.21,-0.18,-0.07,-0.08,-.09,-.16,0.05,-0.07,-0.18
2,1882,0.17,0.14,0.04,-0.16,-0.14,-0.23,-0.16,-0.07,-0.14,-0.23,-0.16,-0.35,-0.11,-.08,.08,-0.09,-0.15,-0.18
3,1883,-0.29,-0.36,-0.12,-0.18,-0.17,-0.08,-0.06,-0.13,-0.21,-0.1,-0.22,-0.1,-0.17,-.19,-.33,-0.15,-0.09,-0.18
4,1884,-0.12,-0.07,-0.35,-0.39,-0.33,-0.35,-0.29,-0.27,-0.26,-0.24,-0.33,-0.3,-0.28,-.26,-.10,-0.36,-0.3,-0.28


In [5]:
#Removing unused data
drop_these = ['J-D','D-N','DJF','MAM','JJA','SON']

GlobalTemp.drop(drop_these, axis=1, inplace=True)
GlobalTemp.head()

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
0,1880,-0.17,-0.24,-0.08,-0.16,-0.09,-0.2,-0.17,-0.1,-0.13,-0.23,-0.21,-0.17
1,1881,-0.19,-0.13,0.04,0.06,0.06,-0.18,0.01,-0.03,-0.15,-0.21,-0.18,-0.07
2,1882,0.17,0.14,0.04,-0.16,-0.14,-0.23,-0.16,-0.07,-0.14,-0.23,-0.16,-0.35
3,1883,-0.29,-0.36,-0.12,-0.18,-0.17,-0.08,-0.06,-0.13,-0.21,-0.1,-0.22,-0.1
4,1884,-0.12,-0.07,-0.35,-0.39,-0.33,-0.35,-0.29,-0.27,-0.26,-0.24,-0.33,-0.3


In [6]:
#Removing the year 2024, since there are a limited number of observations for temperature
GlobalTemp = GlobalTemp[GlobalTemp['Year'] != 2024]

In [7]:
#Calculate annual average temperature
GlobalTemp = GlobalTemp.apply(pd.to_numeric, errors='coerce')  #Convert all non-numeric values to NaN
GlobalTemp['Mean'] = GlobalTemp.iloc[:, 1:].mean(axis=1)       #Mean over the twelve month columns (Year is excluded)
GlobalTemp.head()

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Mean
0,1880,-0.17,-0.24,-0.08,-0.16,-0.09,-0.2,-0.17,-0.1,-0.13,-0.23,-0.21,-0.17,-0.1625
1,1881,-0.19,-0.13,0.04,0.06,0.06,-0.18,0.01,-0.03,-0.15,-0.21,-0.18,-0.07,-0.080833
2,1882,0.17,0.14,0.04,-0.16,-0.14,-0.23,-0.16,-0.07,-0.14,-0.23,-0.16,-0.35,-0.1075
3,1883,-0.29,-0.36,-0.12,-0.18,-0.17,-0.08,-0.06,-0.13,-0.21,-0.1,-0.22,-0.1,-0.168333
4,1884,-0.12,-0.07,-0.35,-0.39,-0.33,-0.35,-0.29,-0.27,-0.26,-0.24,-0.33,-0.3,-0.275
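
The loading and conversion steps above could also be folded into a single read. The following is a minimal sketch, assuming the file marks missing values with `***` (as the preview above suggests). It uses a small in-memory sample in place of `GlobalTemp.csv`, truncated to a few columns; the names `sample_csv`, `temp` and `months` are illustrative only.

```python
import io
import pandas as pd

# In-memory stand-in for GlobalTemp.csv (rows copied from the preview above, only a few columns kept)
sample_csv = io.StringIO(
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb,Mar,D-N\n"
    "1880,-.17,-.24,-.08,***\n"
    "1881,-.19,-.13,.04,-.09\n"
)

# skiprows=1 drops the title line; na_values turns the '***' markers into NaN while reading
temp = pd.read_csv(sample_csv, skiprows=1, na_values="***")

# Average over the month columns only, selected by name rather than by position
months = ["Jan", "Feb", "Mar"]
temp["Mean"] = temp[months].mean(axis=1)
print(temp)
```

Selecting the month columns by name keeps the result unchanged if the cell is run twice, whereas a positional slice would then include the `Mean` column itself.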


### GDP Data
*We load data for global GDP from the World Bank.*  
Link to data: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD?locations=1W 

In [8]:
#Load dataset
GDP = 'GDP.xls'
#%pip install xlrd
pd.read_excel(GDP, sheet_name=0).head()

Unnamed: 0,Data Source,World Development Indicators,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67
0,Last Updated Date,2024-03-28 00:00:00,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,Country Name,Country Code,Indicator Name,Indicator Code,1960.0,1961.0,1962.0,1963.0,1964.0,1965.0,...,2014.0,2015.0,2016.0,2017.0,2018.0,2019.0,2020.0,2021.0,2022.0,2023.0
3,Aruba,ABW,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,2981501000.0,2962907000.0,3013858000.0,3226291000.0,3303132000.0,3227067000.0,2453133000.0,3131163000.0,3458630000.0,
4,Africa Eastern and Southern,AFE,GDP (constant 2015 US$),NY.GDP.MKTP.KD,154776400000.0,155170900000.0,167531600000.0,176156400000.0,184223200000.0,194072200000.0,...,905660100000.0,932513500000.0,953206100000.0,977722000000.0,1002081000000.0,1022529000000.0,993908200000.0,1036651000000.0,1072261000000.0,


In [9]:
#Skip the first 3 rows (metadata) so that the row with country and year labels becomes the header
GDP = pd.read_excel(GDP, skiprows=3)
GDP.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,Aruba,ABW,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,2981501000.0,2962907000.0,3013858000.0,3226291000.0,3303132000.0,3227067000.0,2453133000.0,3131163000.0,3458630000.0,
1,Africa Eastern and Southern,AFE,GDP (constant 2015 US$),NY.GDP.MKTP.KD,154776400000.0,155170900000.0,167531600000.0,176156400000.0,184223200000.0,194072200000.0,...,905660100000.0,932513500000.0,953206100000.0,977722000000.0,1002081000000.0,1022529000000.0,993908200000.0,1036651000000.0,1072261000000.0,
2,Afghanistan,AFG,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,18860500000.0,19134220000.0,19566720000.0,20084650000.0,20323500000.0,21118470000.0,20621960000.0,16345200000.0,,
3,Africa Western and Central,AFW,GDP (constant 2015 US$),NY.GDP.MKTP.KD,105891200000.0,107858400000.0,111927800000.0,120073100000.0,126572600000.0,131742700000.0,...,748211900000.0,769263200000.0,770356300000.0,787968700000.0,810337800000.0,836276000000.0,828430400000.0,861371400000.0,893813700000.0,
4,Angola,AGO,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,89650500000.0,90496420000.0,88161510000.0,88031780000.0,86872970000.0,86262880000.0,81399190000.0,82375340000.0,84884000000.0,


In [10]:
#Remove space in column names
GDP = GDP.rename(columns=lambda x: x.replace(' ', '_'))
GDP.head()

Unnamed: 0,Country_Name,Country_Code,Indicator_Name,Indicator_Code,1960,1961,1962,1963,1964,1965,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
0,Aruba,ABW,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,2981501000.0,2962907000.0,3013858000.0,3226291000.0,3303132000.0,3227067000.0,2453133000.0,3131163000.0,3458630000.0,
1,Africa Eastern and Southern,AFE,GDP (constant 2015 US$),NY.GDP.MKTP.KD,154776400000.0,155170900000.0,167531600000.0,176156400000.0,184223200000.0,194072200000.0,...,905660100000.0,932513500000.0,953206100000.0,977722000000.0,1002081000000.0,1022529000000.0,993908200000.0,1036651000000.0,1072261000000.0,
2,Afghanistan,AFG,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,18860500000.0,19134220000.0,19566720000.0,20084650000.0,20323500000.0,21118470000.0,20621960000.0,16345200000.0,,
3,Africa Western and Central,AFW,GDP (constant 2015 US$),NY.GDP.MKTP.KD,105891200000.0,107858400000.0,111927800000.0,120073100000.0,126572600000.0,131742700000.0,...,748211900000.0,769263200000.0,770356300000.0,787968700000.0,810337800000.0,836276000000.0,828430400000.0,861371400000.0,893813700000.0,
4,Angola,AGO,GDP (constant 2015 US$),NY.GDP.MKTP.KD,,,,,,,...,89650500000.0,90496420000.0,88161510000.0,88031780000.0,86872970000.0,86262880000.0,81399190000.0,82375340000.0,84884000000.0,


In [11]:
#Select only World
GDP = GDP.loc[GDP.Country_Name == 'World'].copy()  #Copy so the in-place drop below does not act on a slice

In [12]:
#Keep only years
drop_these = ['Country_Name','Country_Code', 'Indicator_Name', 'Indicator_Code']
GDP.drop(drop_these, axis=1, inplace=True)
GDP

Unnamed: 0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
259,10918270000000.0,11330200000000.0,11939240000000.0,12559480000000.0,13383940000000.0,14130420000000.0,14941560000000.0,15557760000000.0,16479040000000.0,17432950000000.0,...,73041200000000.0,75283270000000.0,77381230000000.0,80009140000000.0,82630470000000.0,84771250000000.0,82179100000000.0,87297710000000.0,89994660000000.0,


In [13]:
#Transpose the table
GDP = GDP.transpose()
GDP['Year'] = GDP.index
GDP

Unnamed: 0,259,Year
1960,1.091827e+13,1960
1961,1.133020e+13,1961
1962,1.193924e+13,1962
1963,1.255948e+13,1963
1964,1.338394e+13,1964
...,...,...
2019,8.477125e+13,2019
2020,8.217910e+13,2020
2021,8.729771e+13,2021
2022,8.999466e+13,2022


In [14]:
#Rename the columns
GDP.rename(columns = {259:'GDP'}, inplace=True)
GDP.reset_index(inplace = True, drop = True)
GDP

Unnamed: 0,GDP,Year
0,1.091827e+13,1960
1,1.133020e+13,1961
2,1.193924e+13,1962
3,1.255948e+13,1963
4,1.338394e+13,1964
...,...,...
59,8.477125e+13,2019
60,8.217910e+13,2020
61,8.729771e+13,2021
62,8.999466e+13,2022


In [15]:
#Removing the year 2023, since there is no GDP data for it (NaN >= 0 is False, so the row is dropped)
GDP = GDP.loc[GDP.GDP >= 0]
GDP

Unnamed: 0,GDP,Year
0,1.091827e+13,1960
1,1.133020e+13,1961
2,1.193924e+13,1962
3,1.255948e+13,1963
4,1.338394e+13,1964
...,...,...
58,8.263047e+13,2018
59,8.477125e+13,2019
60,8.217910e+13,2020
61,8.729771e+13,2021
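
The transpose, rename and filter steps above can alternatively be written as one wide-to-long reshape. This is only a sketch of that alternative, not the approach used in this notebook. It runs on a toy table in the same layout; the `World` values are taken from the output above, and the names `wide`, `world` and `gdp_long` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy wide table in the World Bank layout (one row per country, one column per year)
wide = pd.DataFrame({
    "Country_Name": ["Aruba", "World"],
    "1960": [np.nan, 1.091827e13],
    "1961": [np.nan, 1.133020e13],
    "2023": [np.nan, np.nan],
})

world = wide.loc[wide["Country_Name"] == "World"]

# melt turns the year columns into rows; dropna removes years without data (here 2023)
gdp_long = (
    world.melt(id_vars=["Country_Name"], var_name="Year", value_name="GDP")
    .drop(columns="Country_Name")
    .dropna(subset=["GDP"])
    .assign(Year=lambda d: d["Year"].astype(int))
    .reset_index(drop=True)
)
print(gdp_long)
```

This avoids renaming a column by its row label (`259`), which would change if the World Bank reorders the rows in the file.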


## Explore each data set

#### Temperature Data

In [22]:
# Creating an interactive line plot with plotly
fig = px.line(GlobalTemp, x='Year', y='Mean', title='Global Mean Temperature over years')

# Adjusting layout to make the plot area taller and set y-axis range
fig.update_layout(
    height=800,  # Adjust height of the entire figure
    yaxis=dict(range=[-0.5, 1.25], title='Degrees Celsius')  # Set y-axis range and title
)

# Add vertical lines
fig.add_shape(type="line", x0=1940, y0=-0.5, x1=1940, y1=1.25, line=dict(color="Red", width=1, dash="dash"), name="1940")
fig.add_shape(type="line", x0=1980, y0=-0.5, x1=1980, y1=1.25, line=dict(color="Green", width=1, dash="dash"), name="1980")

# Update hover template
fig.update_traces(mode='lines+markers', hovertemplate='Year: %{x}<br>Temperature: %{y}°C')

fig.show()

**Description of temperature data:**  
We can see that the average annual world temperature has been volatile throughout the entire period. Note that the GISTEMP series measures temperature anomalies, i.e. deviations from the 1951-1980 average, rather than absolute temperatures.  
We also see that the temperature seems to have been roughly stationary within each of the periods 1880 to 1940 and 1940 to 1980, after which it has been increasing slowly throughout the remaining period. The three periods are separated in the figure by vertical dashed lines.  

**Note**: It is possible to hover over the figure to highlight specific yearly observations
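
The visual impression of three periods can be checked numerically by comparing the mean and spread within each period. This is a minimal sketch, assuming bin edges at the two vertical lines in the figure. It uses only the few annual means printed in the tables of this notebook, so the sample covers two of the three periods; `sample`, `periods` and `period_stats` are illustrative names.

```python
import pandas as pd

# Annual means copied from the tables in this notebook (only a few years, for illustration)
sample = pd.DataFrame({
    "Year": [1880, 1881, 1882, 1883, 1884, 1960, 1961, 1962, 1963, 1964],
    "Mean": [-0.1625, -0.080833, -0.1075, -0.168333, -0.275,
             -0.025, 0.056667, 0.030833, 0.055, -0.198333],
})

# Bin edges follow the vertical lines in the figure (1940 and 1980); the labels are illustrative
periods = pd.cut(
    sample["Year"],
    bins=[1879, 1939, 1979, 2023],
    labels=["1880-1939", "1940-1979", "1980-2023"],
)

# Mean and standard deviation per period; observed=True skips periods with no rows in this sample
period_stats = sample.groupby(periods, observed=True)["Mean"].agg(["mean", "std"])
print(period_stats)
```

Applied to the full `GlobalTemp` table, the same grouping would give one row per period.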

#### GDP 

In [21]:
# Creating an interactive line plot with plotly
fig = px.line(GDP, x='Year', y='GDP', title='World GDP over years')

# Adjusting layout to make the plot area taller and set y-axis range
fig.update_layout(
    height=800,  # Adjust height of the entire figure
    yaxis=dict(title='Constant 2015 USD')  # Set y-axis title (values are in USD, not trillions)
)


# Update hover template
fig.update_traces(mode='lines+markers', hovertemplate='Year: %{x}<br>GDP: %{y} USD')

fig.show()

**Description of GDP data:**  
We note that world GDP has been increasing in an almost linear fashion throughout the entire period.  
Also note that GDP takes a dive in both 2009 (financial crisis) and 2020 (Covid).

**Note**: It is possible to hover over the figure to highlight specific yearly observations
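
Dips such as the one in 2020 are easier to pin down from year-over-year growth rates than from the level plot. The sketch below computes them with `pct_change` on the last five GDP values shown in the tables above; `gdp_tail`, `Growth_pct` and `contractions` are illustrative names, and 2009 is not covered by this small sample.

```python
import pandas as pd

# World GDP (constant 2015 USD) for 2018-2022, copied from the tables above
gdp_tail = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021, 2022],
    "GDP": [8.263047e13, 8.477125e13, 8.217910e13, 8.729771e13, 8.999466e13],
})

# Percentage change relative to the previous year (the first year has no predecessor, so it is NaN)
gdp_tail["Growth_pct"] = gdp_tail["GDP"].pct_change() * 100

# Years in which GDP fell compared to the year before
contractions = gdp_tail.loc[gdp_tail["Growth_pct"] < 0, "Year"].tolist()
print(gdp_tail.round(2))
print("Years with negative growth:", contractions)
```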

**Interactive plot:**

Explain what you see when moving elements of the interactive plot around. 

# Merge data sets

In [None]:

GDP['Year'] = GDP['Year'].astype(int)
GlobalTemp['Year'] = GlobalTemp['Year'].astype(int)
MERGED = pd.merge(GlobalTemp,GDP,on='Year', how='inner')
MERGED.head()

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Mean,GDP
0,1960,0.0,0.13,-0.35,-0.15,-0.08,-0.04,-0.04,0.02,0.07,0.06,-0.11,0.19,-0.025,10918270000000.0
1,1961,0.07,0.19,0.09,0.13,0.12,0.11,0.01,0.01,0.08,0.0,0.03,-0.16,0.056667,11330200000000.0
2,1962,0.05,0.15,0.1,0.05,-0.06,0.03,0.02,-0.01,0.0,0.01,0.06,-0.03,0.030833,11939240000000.0
3,1963,-0.03,0.18,-0.14,-0.07,-0.06,0.05,0.06,0.23,0.18,0.14,0.15,-0.03,0.055,12559480000000.0
4,1964,-0.09,-0.1,-0.21,-0.32,-0.25,-0.04,-0.04,-0.22,-0.29,-0.31,-0.21,-0.3,-0.198333,13383940000000.0
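
An inner merge silently drops every year that appears in only one of the two tables. One way to see which years are lost is an outer merge with `indicator=True`, and `validate` guards against duplicated years. The sketch below uses toy frames with a few values from the tables above; `temp`, `gdp`, `check` and `dropped` are illustrative names.

```python
import pandas as pd

# Toy versions of the two tables (values copied from the outputs above)
temp = pd.DataFrame({"Year": [1884, 1960, 1961], "Mean": [-0.275, -0.025, 0.056667]})
gdp = pd.DataFrame({"Year": [1960, 1961, 2022], "GDP": [1.091827e13, 1.133020e13, 8.999466e13]})

# The outer merge keeps all years; the _merge column records where each row came from.
# validate raises an error if a year occurs more than once in either table.
check = pd.merge(temp, gdp, on="Year", how="outer", validate="one_to_one", indicator=True)

# Rows that an inner merge would discard
dropped = check.loc[check["_merge"] != "both", ["Year", "_merge"]]
print(dropped)
```

In this notebook the temperature series starts in 1880 and the GDP series in 1960, so the inner merge is expected to keep only the years from 1960 onwards.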


# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
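
One candidate for such an aggregation is the decade. The following sketch shows the mechanics on the GDP values printed earlier in this notebook, so only a few years are covered; the column names `Decade` and `GDP_trillion` and the frame `by_decade` are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# GDP values (constant 2015 USD) copied from the tables earlier in this notebook
gdp_sample = pd.DataFrame({
    "Year": [1960, 1961, 1962, 1963, 1964, 2018, 2019, 2020, 2021, 2022],
    "GDP": [1.091827e13, 1.133020e13, 1.193924e13, 1.255948e13, 1.338394e13,
            8.263047e13, 8.477125e13, 8.217910e13, 8.729771e13, 8.999466e13],
})

# Integer division maps each year to the first year of its decade, e.g. 1964 -> 1960
gdp_sample["Decade"] = gdp_sample["Year"] // 10 * 10
gdp_sample["GDP_trillion"] = gdp_sample["GDP"] / 1e12

# Summary statistics per decade
by_decade = gdp_sample.groupby("Decade")["GDP_trillion"].agg(["count", "mean", "min", "max"])
print(by_decade.round(2))
```

The same grouping could be applied to `MERGED` to summarize temperature and GDP side by side.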

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.