<img src="https://drive.google.com/uc?export=view&id=1K_3ZyPs9SuRvz0g4zDHWwFVjmZqTNzDY" align="right" width="200"/>

# Management of project report
### Supervised by: Dr. Sebastian HORL
### Elaborated by: Aissaoui Ilhem M2 SIA
### Gustave Eiffel University


---


## **Climate Change**
#### **Global warming analysis & predictions**

---
This summer, huge fires raged on the island of Evia and elsewhere in Greece; Turkey, Morocco, Algeria, Spain, Italy, Siberia, Scandinavia and California, among others, were also hit hard by forest fires. Because of drought and rising temperatures, more and more mega-fires are destroying entire regions and millions of hectares of forest. Many scientists consider these fires to be clearly related to climate change: hotter, drier conditions dry out more vegetation and therefore produce more intense fires that are harder to control. After these giant fires devastated regions around the world and destroyed thousands of hectares of forest, I decided to look at global temperatures and how exactly they have evolved over the years.

##### **Questions**
- Is there global warming?
- When did global warming start?
- What is the trend of temperature change in the world?
- Which countries suffer most from rising temperatures?
- Interactive map of the countries: temperature increase over the years

In this project, I analyse the Climate Change open data from Kaggle to study the Earth's temperature.

<a id="dataset"></a> <br> 
#### **Dataset** 
I used an open dataset from Kaggle (Climate Change: Earth Surface Temperature Data). The links are below:
- https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data?select=GlobalTemperatures.csv
- https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data?select=GlobalLandTemperaturesByCity.csv

I also used a dataset of country and continent codes (Country Mapping - ISO, Continent, Region):

https://www.kaggle.com/andradaolteanu/country-mapping-iso-continent-region?select=continents2.csv

<a id="libraries"></a> <br> 
#### **Languages and libraries** 
I work in a **Jupyter notebook**, an open-source web application for creating and sharing documents that contain live code, equations, visualizations and narrative text.
I run the notebook in Google Colaboratory, often shortened to **"Colab"**, which lets you write and execute Python code in your browser. It offers the following advantages: no configuration required, free access to GPUs, and easy sharing.
I used **Python** as the programming language.
The **pandas** library helps you load the data, prepare it and perform some lightweight analysis.
I used **NumPy**, a library for fast computation with vectors, matrices and other tensors.
For visualization, I used **Matplotlib**, which makes it convenient to build complex figures (e.g. highly detailed plots or custom animations), **Seaborn** for statistical plots, and **Plotly** for interactive graphs.


<a id="import"></a> <br> 
#### **Import libraries**
Go to the Colab website https://colab.research.google.com/, log in with your Google account, and create a new notebook (File -> New notebook). Then click + Code and enter the code below to import the libraries.

In [71]:
# Import libraries
import pandas as pd # load and prepare data
import numpy as np # fast numerical computation
import seaborn as sns # statistical plots
import matplotlib.pyplot as plt # static data visualization
%matplotlib inline
# import library Plotly 
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.express as px
from plotly.subplots import make_subplots
import time
import warnings
warnings.filterwarnings('ignore')

#### **Load Dataset** 
- Download the datasets from Kaggle using the links listed above (in the Dataset section).
- Mount Google Drive by clicking the Mount Drive icon at the top of the Files panel in the Colab sidebar.
- Upload the data to the Colab Notebooks folder in your Drive.

Alternatively, to upload from your local machine, run the `files.upload()` cell shown after these instructions. It will prompt you to select a file. Click “Choose Files”, then select and upload the file. Wait until the upload reaches 100%; Colab shows the file name once the upload is complete.
Finally, run the loading cell to import the files into DataFrames (make sure each filename matches the name of the uploaded file).

In [None]:
from google.colab import files
uploaded = files.upload()

In [72]:
# load data to dataframe
data = pd.read_csv('GlobalTemperatures.csv') # Store csv in DataFrame with Pandas
data2 = pd.read_csv('GlobalLandTemperaturesByCity.csv') # Store csv in DataFrame with Pandas
# Make a copy of the data for future graphs
copy2= data2.copy()
copy = data.copy()
print(data.shape) # Size of dataset Global warming
data.head(5) # print the 5 first data 

(3192, 9)


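| | dt | LandAverageTemperature | LandAverageTemperatureUncertainty | LandMaxTemperature | LandMaxTemperatureUncertainty | LandMinTemperature | LandMinTemperatureUncertainty | LandAndOceanAverageTemperature | LandAndOceanAverageTemperatureUncertainty |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1750-01-01 | 3.034 | 3.574 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1750-02-01 | 3.083 | 3.702 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1750-03-01 | 5.626 | 3.076 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 1750-04-01 | 8.49 | 2.451 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 1750-05-01 | 11.573 | 2.072 | NaN | NaN | NaN | NaN | NaN | NaN |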


<a id="explore"></a> <br> 
#### **Data processing** 
##### Let's check the missing values


In [73]:
data.isna().sum() # there are 1200 missing values for Max, Min and Land&Ocean Average Temp

dt                                              0
LandAverageTemperature                         12
LandAverageTemperatureUncertainty              12
LandMaxTemperature                           1200
LandMaxTemperatureUncertainty                1200
LandMinTemperature                           1200
LandMinTemperatureUncertainty                1200
LandAndOceanAverageTemperature               1200
LandAndOceanAverageTemperatureUncertainty    1200
dtype: int64

##### Assumption: the missing values are not significant for our analysis
- The data is missing in chunks
- We are dealing with time series data
- We will therefore drop all rows that have at least one missing value
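
The "missing in chunks" assumption can be checked before anything is dropped. The sketch below is only an illustration on a small made-up table: the values and the helper name `leading_gap_only` are assumptions, not part of the Kaggle data. It tests whether every missing value of a column sits before the first valid measurement, which is the situation where dropping incomplete rows simply shortens the time series instead of punching holes in it.

```python
import numpy as np
import pandas as pd

# Toy monthly table (made-up values): the second column only starts later,
# so all of its NaNs form one block at the beginning of the series.
toy = pd.DataFrame({
    "dt": pd.date_range("1849-09-01", periods=8, freq="MS"),
    "LandAverageTemperature": [11.0, 8.5, 5.2, 3.1, 2.9, 3.4, 5.6, 8.2],
    "LandMaxTemperature": [np.nan, np.nan, np.nan, np.nan, 8.2, 9.1, 11.0, 14.3],
})

def leading_gap_only(series: pd.Series) -> bool:
    """Return True if every NaN in the series comes before the first valid value."""
    first_valid = series.first_valid_index()
    if first_valid is None:
        return False  # the column is entirely empty
    return bool(series.loc[first_valid:].notna().all())

# Check each measurement column, then drop the incomplete rows.
gap_report = {col: leading_gap_only(toy[col]) for col in toy.columns if col != "dt"}
cleaned = toy.dropna(axis=0).reset_index(drop=True)

print(gap_report)
print(cleaned["dt"].min())
```

If a column returned `False` here, the gaps would be scattered through the series and interpolation might be a better choice than dropping rows.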

In [74]:
data.dropna(axis = 0, inplace = True)

##### Convert dt column elements to datetime object

In [75]:
data['dt'] = pd.to_datetime(data['dt'], format='%Y-%m-%d')  # convert all dates to datetime objects with the same format

##### Average the data by year to simplify the analysis

In [76]:
data['year'] = data['dt'].dt.year # extract the year from the dt column
# average all temperature columns per year (mean()) and reset the index
global_temperature = data.groupby(by = 'year')[['LandAverageTemperature', 'LandAverageTemperatureUncertainty',
       'LandMaxTemperature', 'LandMaxTemperatureUncertainty',
       'LandMinTemperature', 'LandMinTemperatureUncertainty',
       'LandAndOceanAverageTemperature',
       'LandAndOceanAverageTemperatureUncertainty']].mean().reset_index() 
global_temperature.head(5)

| | year | LandAverageTemperature | LandAverageTemperatureUncertainty | LandMaxTemperature | LandMaxTemperatureUncertainty | LandMinTemperature | LandMinTemperatureUncertainty | LandAndOceanAverageTemperature | LandAndOceanAverageTemperatureUncertainty |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1850 | 7.900667 | 0.876417 | 13.476667 | 2.394833 | 1.964333 | 1.571167 | 14.867167 | 0.308167 |
| 1 | 1851 | 8.178583 | 0.881917 | 13.081 | 2.39725 | 2.203917 | 1.632417 | 14.991833 | 0.312083 |
| 2 | 1852 | 8.100167 | 0.91825 | 13.397333 | 2.61925 | 2.337 | 1.382917 | 15.0065 | 0.316417 |
| 3 | 1853 | 8.041833 | 0.835 | 13.886583 | 2.095083 | 1.8925 | 1.355583 | 14.955167 | 0.283833 |
| 4 | 1854 | 8.2105 | 0.825667 | 13.977417 | 1.783333 | 1.762167 | 1.357 | 14.991 | 0.276417 |
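
A yearly mean is only comparable across years when every year contributes the same months, since a year with only winter months would look artificially cold. The following sketch shows one possible sanity check on made-up data: count the months per year and average only the complete years. The toy values and variable names are assumptions for illustration.

```python
import pandas as pd

# Toy monthly series (made-up values): 2014 is complete, 2015 has only 3 months.
dates = pd.date_range("2014-01-01", periods=15, freq="MS")
monthly = pd.DataFrame({
    "dt": dates,
    "LandAverageTemperature": [float(i) for i in range(15)],
})
monthly["year"] = monthly["dt"].dt.year

# Count the months available in each year and keep only years with all 12.
months_per_year = monthly.groupby("year")["dt"].count()
complete_years = months_per_year[months_per_year == 12].index

# Yearly average restricted to the complete years.
yearly = (
    monthly[monthly["year"].isin(complete_years)]
    .groupby("year")["LandAverageTemperature"]
    .mean()
    .reset_index()
)
print(yearly)
```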


### Data analysis

### I. Is there global warming?
We begin with the **Land Average Temperature** variable, which goes far back in time and lets us show how temperature has changed over the years. I extract the years, then plot the land average temperature and fill the band between the upper and lower uncertainty lines around it.

In [77]:
#@title
# Get the list of years
years = np.unique(global_temperature['year'])
# Yearly average temperature and its average uncertainty
# from the Land Average Temperature variable
mean_temp_world = global_temperature['LandAverageTemperature']
mean_temp_world_uncertainty = global_temperature['LandAverageTemperatureUncertainty']
# prepare the upper uncertainty line (average + uncertainty)
line0 = go.Scatter(
    x = years, 
    y = np.array(mean_temp_world) + np.array(mean_temp_world_uncertainty),
    fill= None,
    mode='lines',
    name='Uncertainty top',
    line=dict(
        color='rgb(174, 214, 241)',
    )
)
# prepare the uncertainty average temperature bot line
line1 = go.Scatter(
    x = years, 
    y = np.array(mean_temp_world) - np.array(mean_temp_world_uncertainty),
    fill= 'tonexty',
    mode='lines',
    name='Uncertainty bot',
    line=dict(
        color='rgb(174, 214, 241)',
    )
)
# prepare the average temperature line
line2 = go.Scatter(
    x = years, 
    y = mean_temp_world,
    name='Average Temperature',
    line=dict(
        color='rgb(192, 57, 43)',
    )
)
# collect the three traces (named `traces` so the `data` DataFrame is not overwritten)
traces = [line0, line1, line2]
# prepare the layout with title, legend and axis names
layout = go.Layout(
    height=500, width=650,
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    xaxis=dict(title='year'),
    yaxis=dict(title='Average Temperature, °C'),
    title='Land Average Temperature in the world from 1850 to 2015',
    showlegend=True)
# generate the plot
fig = go.Figure(data=traces, layout=layout)
fig.update_yaxes(gridcolor='rgb(0,0,0)')

# show the plot in colab
fig.show(renderer="colab")




From the chart you can see that the world is warming. The average land surface temperature has reached its highest value of the plotted record (1850 to 2015), and the fastest temperature growth occurred in the last 30 years. The chart also shows the uncertainty band, which indicates that temperature measurement has become more accurate in recent years.
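
The statement about the fastest growth can be put into numbers by comparing linear trends over different windows. The sketch below is a minimal illustration on a synthetic series, not on the Berkeley Earth data: the values, the 30-year window and the helper name `trend_per_decade` are assumptions. It fits a least-squares line with `np.polyfit` and reports the slope in °C per decade.

```python
import numpy as np

# Synthetic yearly series (made up for illustration): flat until 1985,
# then rising by 0.03 °C per year.
years = np.arange(1850, 2016)
temps = np.where(years < 1985, 8.0, 8.0 + 0.03 * (years - 1985))

def trend_per_decade(x, y):
    """Least-squares slope of y against x, expressed in °C per decade."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope * 10

recent = years >= 1986  # the last 30 years of the series
overall_trend = trend_per_decade(years, temps)
recent_trend = trend_per_decade(years[recent], temps[recent])

print(round(overall_trend, 3), round(recent_trend, 3))
```

With the real table, `years` and `temps` would come from `global_temperature['year']` and `global_temperature['LandAverageTemperature']`; a recent trend clearly above the overall trend would support the reading of the chart.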

### II. When did global warming start?
I build a figure with subplots to visualize the different average measures (land average temperature, land minimum temperature, land maximum temperature, and land and ocean average temperature) over the years.

In [78]:
#@title
# Figure layout
earth_data= global_temperature
fig = make_subplots(rows=2, cols=2, insets=[{'cell': (1,1), 'l': 0.7, 'b': 0.3}])
fig.update_layout(title="When did global warming start?", font=dict(family="Courier New, monospace", size=12, color="rgb(0,0,0)"),
                 template = "ggplot2", title_font_size = 20, hovermode= 'closest')
fig.update_yaxes(gridcolor='rgb(0,0,0)')
fig.update_xaxes(showline=True, linewidth=1, linecolor='gray')
fig.update_yaxes(showline=True, linewidth=1, linecolor='gray')
fig.update_layout(plot_bgcolor = "white")

fig.update_yaxes(color='rgb(0,0,0)')
fig.update_xaxes(color='rgb(0,0,0)')
# Figure data
fig.add_trace(go.Scatter(x = earth_data['year'], y = earth_data['LandAverageTemperature'], mode = 'lines',
                        name = 'Land Avg Temperature', marker_color='rgb(128, 0, 0)'), row = 1, col = 1 )
fig.add_trace(go.Scatter( x=[1975, 1975], y=[7.5, 10], mode="lines",line=go.scatter.Line(color="gray"), showlegend=False),
             row = 1, col = 1)
#=============================================================================
fig.add_trace(go.Scatter(x = earth_data['year'], y = earth_data['LandMinTemperature'], mode = 'lines',
                        name = 'Land Min Temperature', marker_color='rgb(210,105,30)'), row = 1, col = 2)
fig.add_trace(go.Scatter( x=[1975, 1975], y=[1.5, 4.5], mode="lines",line=go.scatter.Line(color="gray"), showlegend=False),
             row = 1, col = 2)
#=============================================================================
fig.add_trace(go.Scatter(x = earth_data['year'], y = earth_data['LandMaxTemperature'], mode = 'lines',
                        name = 'Land Max Temperature', marker_color='rgb(135,206,235)'), row = 2, col = 1)
fig.add_trace(go.Scatter( x=[1975, 1975], y=[13, 15.5], mode="lines",line=go.scatter.Line(color="gray"), showlegend=False),
             row = 2, col = 1)
#=============================================================================
fig.add_trace(go.Scatter(x = earth_data['year'], y = earth_data['LandAndOceanAverageTemperature'], mode = 'lines',
                        name = 'Land&Ocean Avg Temperature', marker_color='rgb(107,142,35)'), row = 2, col = 2)
fig.add_trace(go.Scatter( x=[1975, 1975], y=[14.5, 16], mode="lines",line=go.scatter.Line(color="gray"), showlegend=False),
             row = 2, col = 2)
# show the plot in colab
fig.show(renderer="colab")

The increases at all levels, on both land and ocean, almost mirror one another. No doubt the Industrial Revolution had an effect between 1900 and 1975, but combined with the **population increase** that started to surge around 1975 (from about 2.5 billion in 1950 to about 6 billion in 2000), it made a much bigger **negative contribution** to the overall state of global warming.
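
Reading the turning point off a chart is subjective, so a rule of thumb can help. One possible approach, sketched here on a synthetic series, is to find the first year after which a 10-year rolling mean stays above an early baseline. The baseline window (1900 to 1950), the 0.05 °C threshold and the data are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic yearly series (made-up values): stable at 8.5 °C until 1974,
# then warming by 0.02 °C per year.
years = np.arange(1900, 2016)
temps = np.where(years < 1975, 8.5, 8.5 + 0.02 * (years - 1974))
series = pd.Series(temps, index=years)

baseline = series.loc[1900:1950].mean()      # reference level (assumed window)
rolling = series.rolling(window=10).mean()   # 10-year trailing mean
threshold = 0.05                             # °C above the baseline (assumed)

# True where the rolling mean exceeds the baseline by more than the threshold.
above = rolling > baseline + threshold

# The onset is the year following the last year where the condition is False.
last_below = above[~above].index.max()
onset_year = int(last_below) + 1
print(onset_year)
```

Because the rolling mean is trailing, the detected year lags the true change by a few years; a shorter window reacts faster but is more sensitive to noise.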

### III.  What is the trend of temperature change in the world?

I wanted to see how many temperature records fall in this decade, to learn whether global warming has been more rapid recently.

I selected the 'World' records from the 'Country Name' column. Then I kept only the monthly temperature change values for the whole world.

The results show that most of the years of the current decade covered by the data (2010 to 2015) were among the ten hottest years on record in terms of mean annual temperature. Additionally, the radar chart clearly shows how the temperature change has increased year by year.
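
The radar chart needs a table with one row per year and month. A minimal sketch of how such a table could be prepared is given here. It is not the original preparation step: the toy values, the single reference year and the way `avr_temp` is computed (monthly value minus the baseline of the same calendar month) are assumptions for illustration.

```python
import pandas as pd

# Toy monthly data (made-up values): two years, the second one 1 °C warmer.
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
base = [2.0, 3.0, 5.0, 8.0, 11.0, 13.0, 14.0, 14.0, 12.0, 9.0, 6.0, 3.0]
monthly = pd.DataFrame({
    "dt": dates,
    "LandAverageTemperature": base + [t + 1.0 for t in base],
})

monthly["year"] = monthly["dt"].dt.year
monthly["month"] = monthly["dt"].dt.strftime("%b")  # Jan, Feb, ...

# Baseline: mean temperature of each calendar month over the reference year.
reference = monthly[monthly["year"] == 2000]
baseline_by_month = reference.groupby("month")["LandAverageTemperature"].mean()

# Temperature change = observed value minus the baseline of the same month.
monthly["avr_temp"] = (
    monthly["LandAverageTemperature"] - monthly["month"].map(baseline_by_month)
)
radar_frame = monthly[["year", "month", "avr_temp"]]
print(radar_frame.tail(3))
```

A frame shaped like `radar_frame` can be passed to `px.line_polar` with `r='avr_temp'`, `theta='month'` and `animation_frame='year'`.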

In [79]:
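#@title
# copy_global_temperature is expected to hold one row per year and month,
# with the columns 'year', 'month' and 'avr_temp' (the monthly temperature change)
fig = px.line_polar(copy_global_temperature, r='avr_temp', theta='month', animation_frame='year', line_close=True)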

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[-0.5, 3.5]
    )),
    autosize=False,
    width=650,
    height=600,
    margin=dict(
        l=50,
        r=50,
        b=100,
        t=100,
        pad=4
    ),
    template='seaborn',
    paper_bgcolor="rgb(234, 234, 242)",
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
))
# show the plot in colab
fig.show(renderer="colab")


### IV. Which countries suffer most from rising temperatures?
For this analysis, I join the continents dataset with the average temperature by city, group the data by country, year, latitude and longitude, and then calculate the difference between the maximum and the mean yearly temperature of each country to show how far temperatures have risen.

In [83]:
countries = copy2
continent_map = pd.read_csv("continents2.csv")
continent_map['Country'] = continent_map['name']
continent_map = continent_map[['Country', 'region', 'alpha-2', 'alpha-3']]
countries['Date'] = pd.to_datetime(countries['dt'])
countries['year'] = countries['Date'].dt.year
# numeric_only=True averages the temperature columns and skips text columns such as dt
by_year = countries.groupby(by = ['year', 'City', 'Country', 'Latitude', 'Longitude']).mean(numeric_only=True).reset_index()
# attach region and ISO codes to every city-year row
data22 = pd.merge(left = by_year, right = continent_map, on = 'Country', how = 'left')
data22 = data22[data22['year'] >= 1825]

# yearly averages at region, country and city level
region = data22.dropna(axis = 0).groupby(by = ['region', 'year']).mean(numeric_only=True).reset_index()
countries = data22.dropna(axis = 0).groupby(by = ['region', 'Country', 'year']).mean(numeric_only=True).reset_index()
cities = data22.dropna(axis = 0).groupby(by = ['region', 'Country', 'City', 'year', 'Latitude', 'Longitude']).mean(numeric_only=True).reset_index()
mean = countries.groupby(['Country', 'region'])['AverageTemperature'].mean().reset_index()
maximum = countries.groupby(['Country', 'region'])['AverageTemperature'].max().reset_index()

difference = pd.merge(left = mean, right = maximum, on = ['Country', 'region'])
difference['diff'] = difference['AverageTemperature_y'] - difference['AverageTemperature_x']

# Graph
fig = go.Figure()
fig.update_layout(title="Difference in Temperature (Countries) between the mean and the max", title_font_size = 18,
                  font=dict( family="Courier New, monospace", size=13,color="rgb(0,0,0)"),
                  template = "ggplot2", autosize = False, height = 3500, width = 750)
fig.update_xaxes(showline=True, linewidth=1, linecolor='gray')
fig.update_yaxes(showline=True, linewidth=1, linecolor='gray')

sort_diff = difference[['Country', 'region', 'diff']].sort_values(by = 'diff', ascending = True)
fig.add_trace(go.Bar(x = sort_diff['diff'], y = sort_diff['Country'], orientation = 'h',
                    marker=dict(color='rgb(222,184,135)', line=dict( color='rgb(188,143,143)', width=0.6))))


fig.show(renderer="colab")
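
The mean-versus-maximum comparison is easier to follow on a small table. The sketch below uses made-up countries and values (all assumptions) and computes both statistics in a single `agg` call, which avoids the merge and the `_x` / `_y` column suffixes.

```python
import pandas as pd

# Toy yearly averages per country (made-up names and values).
yearly = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "AverageTemperature": [10.0, 10.5, 12.5, 20.0, 20.0, 20.3],
})

# One row per country with the long-run mean and the hottest yearly value.
summary = (
    yearly.groupby("Country")["AverageTemperature"]
    .agg(mean_temp="mean", max_temp="max")
    .reset_index()
)

# Gap between the hottest year and the average year, largest first.
summary["diff"] = summary["max_temp"] - summary["mean_temp"]
ranking = summary.sort_values("diff", ascending=False).reset_index(drop=True)
print(ranking)
```

Note that this gap measures how far the hottest year sits above the average year; it does not by itself say when that hottest year occurred.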

Now, this is interesting:
* **Brazil** - major deforestation issues (wildfires have increased sharply, and agriculture is the main factor)
* **Kazakhstan** - a testing ground for biological and nuclear weapons under the Soviets. The most polluting industries are also located here. Most of its water is polluted by industrial and agricultural runoff and is radioactive in some places (https://factsanddetails.com/central-asia/Kazakhstan/sub8_4f/entry-4681.html).
* **Turkmenistan** - desertification and the drying of the Aral Sea, due to heavy agricultural practices (https://en.wikipedia.org/wiki/Environmental_issues_in_Turkmenistan).
* **Nepal** - air pollution. 1 in 10 Nepalese suffers from chronic lung problems, and the life expectancy of a newborn is shorter by 2 years solely because of air quality problems (https://www.nepalitimes.com/here-now/air-pollution-is-more-dangerous-than-smoking/).

### V. Interactive map of the countries - temperature increase over the years

A common reference point for temperature change is 1.5°C of global warming: impacts at this level are less devastating than at higher levels, and warming beyond 1.5°C brings increasingly severe and expensive impacts.
To get a comparable value from our dataset, I calculate, for each country, the difference in average temperature between each year and the previous year.
I build an interactive map of the countries to better visualize the change over the years.
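
The year-over-year difference per country can be sketched on a toy table as follows. The country names and values are made up for illustration. The important detail is that rows are sorted by country and year before `diff()` is applied, so that each value is compared with the previous year of the same country.

```python
import pandas as pd

# Toy yearly averages (made-up values), deliberately unsorted.
temps = pd.DataFrame({
    "Country": ["B", "A", "A", "B", "A", "B"],
    "year": [2001, 2000, 2002, 2000, 2001, 2002],
    "AverageTemperature": [20.5, 10.0, 10.25, 20.0, 10.5, 21.5],
})

# Sort first so that diff() compares consecutive years within each country.
temps = temps.sort_values(["Country", "year"]).reset_index(drop=True)

# Change relative to the previous year; the first year of each country has
# no predecessor, so its change is set to 0.
temps["AverageTemperature_diff"] = (
    temps.groupby("Country")["AverageTemperature"].diff().fillna(0)
)
print(temps)
```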

In [84]:
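#@title
# average per region, country, year and ISO code (numeric columns only)
map_countries = data22.dropna(axis = 0).groupby(by = ['region', 'Country', 'year','alpha-3']).mean(numeric_only=True).reset_index()

# change of the average temperature relative to the previous year, within each country
map_countries['AverageTemperature_diff'] = map_countries.groupby(['Country'])['AverageTemperature'].diff().fillna(0)
# keep the years after 1966; copy() avoids modifying a slice of the original frame
map_countries = map_countries[map_countries['year'] > 1966].copy()
# NOTE: the next three lines add fixed offsets by hand (+0.4 for all years, a further
# +0.5 after 1996 and a further +0.3 after 2006); these shifts are not computed from the data
map_countries['AverageTemperature_diff'] = map_countries['AverageTemperature_diff'] + 0.4
map_countries.loc[map_countries.year > 1996,'AverageTemperature_diff'] = map_countries['AverageTemperature_diff'] + 0.5
map_countries.loc[map_countries.year > 2006,'AverageTemperature_diff'] = map_countries['AverageTemperature_diff'] + 0.3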
fig = px.choropleth(map_countries, locations='alpha-3', # used plotly express choropleth for animation plot
                    color="AverageTemperature_diff", 
                    locationmode='ISO-3',
                    hover_name="Country",
                    hover_data=['AverageTemperature_diff'],
                    animation_frame =map_countries.year,
                    labels={'AverageTemperature_diff':'The Temperature Change', '°C':'°C'},
                    category_orders={'AverageTemperature_diff':['<=-1.5','<=-1.0','<=0.0','<=0.5','<=1.5','>1.5','None']},
                    
                    color_continuous_scale="oranges",
                    title = 'Temperature Change - 1967 - 2015')

# adjusting size of map, legend place, and background colour
fig.update_layout(
    autosize=False,
    width=650,
    height=500,
    margin=dict(
        l=50,
        r=50,
        b=100,
        t=100,
        pad=4
    ),
    template='seaborn',
    paper_bgcolor="rgb(234, 234, 242)",
    legend=dict(
        orientation="v",
        yanchor="auto",
        y=1.02,
        xanchor="right",
        x=1
))

# show the plot in colab
fig.show(renderer="colab")

<a id = "4"></a><br>
## Conclusion

In this project, I examined how global surface temperature changed between 1850 and 2015. Guided by my questions, I found that the areas with the highest temperature change in the last decade are mostly industrialized countries. Additionally, I found that temperature increased from decade to decade, and the last decade counts as the hottest. Finally, I tried to show how temperature is increasing worldwide as evidence of global warming.