About Dataset
The Global Environmental Emissions Dataset is a comprehensive collection of environmental indicators and emissions data from various countries around the world. This dataset is specifically designed for a hackathon challenge focused on understanding and addressing global environmental issues.

The dataset contains a wide range of columns, covering key aspects of greenhouse gas emissions, pollutant levels, and other environmental factors. It provides valuable insights into the impact of human activities on the environment and serves as a foundation for identifying potential solutions and policy interventions.

The dataset includes carbon dioxide (CO2) emissions from different sectors, including energy production, manufacturing, transportation, and residential/commercial buildings. It also provides data on methane and nitrous oxide emissions from both agricultural activities and the energy sector. Other greenhouse gases, such as hydrofluorocarbons (HFCs), perfluorocarbons (PFCs), and sulfur hexafluoride (SF6), are covered as well.

Furthermore, the dataset provides indicators related to adjusted savings, which include economic estimates of carbon dioxide damage in terms of Gross National Income (GNI) and current US dollars. This allows for the assessment of the economic implications of environmental damage caused by carbon dioxide emissions.

The dataset also includes information on emissions intensity, CO2 emissions per capita, and the percentage of different fuel types contributing to total emissions. Additionally, it provides data on greenhouse gas emissions and removals resulting from Land Use, Land-Use Change, and Forestry (LUCF) activities.

This dataset is a valuable resource for hackathon participants seeking to develop innovative solutions to mitigate climate change, reduce emissions, and promote sustainable practices. It enables participants to analyze historical trends, identify patterns, and uncover insights that can inform the development of effective strategies and policies for environmental conservation and sustainability.

Note: Participants are encouraged to explore and analyze the dataset creatively, using various statistical and machine-learning techniques to derive meaningful insights and propose data-driven solutions for a more sustainable future.

In [8]:
#Setting Up Notebook
#Importing Libs

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as offline
from plotly.offline import init_notebook_mode, iplot

Import Dataset

In [10]:
df=pd.read_csv('D:\\vscode\\Global Environmental Emissions\\World CO2 Emission Data.csv')
df.head()

Unnamed: 0,Country Name,Country Code,Series Name,Series Code,1960 [YR1960],1961 [YR1961],1962 [YR1962],1963 [YR1963],1964 [YR1964],1965 [YR1965],...,2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020],2021 [YR2021],2022 [YR2022]
0,Afghanistan,AFG,Adjusted savings: carbon dioxide damage (% of ...,NY.ADJ.DCO2.GN.ZS,..,..,..,..,..,..,...,1.36338448111679,1.28490840585691,1.44307895621429,1.44782682979766,1.4533391998403,1.59805470965195,1.55643923193288,1.40187835275711,..,..
1,Afghanistan,AFG,Adjusted savings: carbon dioxide damage (curre...,NY.ADJ.DCO2.CD,..,..,..,..,..,..,...,275631280.588653,263338827.184962,278618004.122999,264910281.973801,276138145.017514,291498572.368998,297253521.47533,284648920.796122,..,..
2,Afghanistan,AFG,Agricultural methane emissions (thousand metri...,EN.ATM.METH.AG.KT.CE,..,..,..,..,..,..,...,11284.75,11476.1975,10834.435,10617.2325,10314.9575,10549.4125,10222.785,10679.11,..,..
3,Afghanistan,AFG,Agricultural nitrous oxide emissions (thousand...,EN.ATM.NOXE.AG.KT.CE,..,..,..,..,..,..,...,4440.1106,4744.2494,4702.3804,4680.2688,4892.1766,4289.9782,4258.4498,4465.977,..,..
4,Afghanistan,AFG,CO2 emissions (kg per 2015 US$ of GDP),EN.ATM.CO2E.KD.GD,..,..,..,..,..,..,...,0.489964677117366,0.470845775464021,0.50292618070602,0.454516147902539,0.477468930733004,0.516563214552466,0.50918979595916,0.404094523787425,..,..
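
In the output above, missing values appear as `..`, which forces the year columns to load as text. The sketch below shows one possible way to convert that marker to NaN at load time with the `na_values` argument of `read_csv`. The inline CSV sample and its numbers are invented for illustration, and it assumes `..` is the only missing-value marker in the file.

```python
import io
import pandas as pd

# Tiny made-up sample in the same shape as the export; ".." marks a missing value.
sample_csv = io.StringIO(
    "Country Name,Series Code,2019 [YR2019],2020 [YR2020]\n"
    "Afghanistan,EN.ATM.CO2E.KD.GD,0.509,..\n"
    "Albania,EN.ATM.CO2E.KD.GD,..,0.312\n"
)

# na_values tells pandas to read ".." as NaN, so the year columns load as floats.
sample = pd.read_csv(sample_csv, na_values="..")
print(sample.dtypes)
print(sample.isna().sum())
```

Loading the data this way would make the later numeric filtering step largely unnecessary, at the cost of relying on the marker being consistent.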


Information Class

In [11]:
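class Info:
    """Small helper that summarizes a DataFrame."""

    def __init__(self, df):
        self.df = df

    def in_fo(self):
        # DataFrame.info() prints its summary and returns None
        self.df.info()

    def null_values(self):
        # Percentage of missing values in each column
        null__values = self.df.isnull().mean() * 100
        return null__values

df1 = Info(df)
df1.in_fo()
print(df1.null_values())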


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8251 entries, 0 to 8250
Data columns (total 67 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Country Name   8248 non-null   object
 1   Country Code   8246 non-null   object
 2   Series Name    8246 non-null   object
 3   Series Code    8246 non-null   object
 4   1960 [YR1960]  8246 non-null   object
 5   1961 [YR1961]  8246 non-null   object
 6   1962 [YR1962]  8246 non-null   object
 7   1963 [YR1963]  8246 non-null   object
 8   1964 [YR1964]  8246 non-null   object
 9   1965 [YR1965]  8246 non-null   object
 10  1966 [YR1966]  8246 non-null   object
 11  1967 [YR1967]  8246 non-null   object
 12  1968 [YR1968]  8246 non-null   object
 13  1969 [YR1969]  8246 non-null   object
 14  1970 [YR1970]  8246 non-null   object
 15  1971 [YR1971]  8246 non-null   object
 16  1972 [YR1972]  8246 non-null   object
 17  1973 [YR1973]  8246 non-null   object
 18  1974 [YR1974]  8246 non-null
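
A count of missing cells and a percentage of missing cells are easy to confuse. The toy frame below (made-up values, not taken from the dataset) sketches the difference: `isnull().sum()` gives the count per column, while `isnull().mean() * 100` gives the percentage.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values, used only to illustrate the two summaries.
toy = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", None, "Algeria"],
    "1990": [1.0, np.nan, np.nan, 4.0],
})

null_counts = toy.isnull().sum()          # absolute number of missing cells
null_percent = toy.isnull().mean() * 100  # share of missing cells, in percent

print(null_counts)
print(null_percent)
```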

Rename Columns

In [12]:
anni = ["Country Name", "Country Code", "Series Name", "Series Code"]
anni += [str(anno) for anno in range(1960, 2023)]

df.columns = anni
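
Assigning a positional list works as long as the column order and count never change. As a sketch of an alternative, the year labels can be derived from the existing names by stripping the ` [YR....]` suffix. The toy frame below is an assumption for illustration and only mirrors the naming pattern shown by `df.info()`.

```python
import pandas as pd

# Empty toy frame that only mimics the column naming pattern of the export.
toy = pd.DataFrame(columns=["Country Name", "Series Code", "1960 [YR1960]", "1961 [YR1961]"])

# Keep the text before " [" so "1960 [YR1960]" becomes "1960"; other names are unchanged.
toy.columns = [name.split(" [")[0] for name in toy.columns]
print(list(toy.columns))
```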

Define a function that tests whether a value is numeric

In [13]:
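def isnumber(x):
    """Return True if x can be converted to a float, False otherwise."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        # float() raises ValueError for text such as '..' and TypeError for None
        return False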

Save the text columns that the numeric filter would otherwise blank out:

1 : Country Name

2 : Series Name

In [14]:
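# Keep copies of the text columns; the mask below replaces every non-numeric cell with NaN
save_country_name=df[['Country Name']]
save_series_name=df[['Series Name']]
# applymap tests each cell individually (newer pandas releases provide DataFrame.map for this)
df=df[df.applymap(isnumber)]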

In [15]:
df['Country Name']=save_country_name
df['Series Name']=save_series_name
df.head()

Unnamed: 0,Country Name,Country Code,Series Name,Series Code,1960,1961,1962,1963,1964,1965,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,Afghanistan,,Adjusted savings: carbon dioxide damage (% of ...,,,,,,,,...,1.36338448111679,1.28490840585691,1.44307895621429,1.44782682979766,1.4533391998403,1.59805470965195,1.55643923193288,1.40187835275711,,
1,Afghanistan,,Adjusted savings: carbon dioxide damage (curre...,,,,,,,,...,275631280.588653,263338827.184962,278618004.122999,264910281.973801,276138145.017514,291498572.368998,297253521.47533,284648920.796122,,
2,Afghanistan,,Agricultural methane emissions (thousand metri...,,,,,,,,...,11284.75,11476.1975,10834.435,10617.2325,10314.9575,10549.4125,10222.785,10679.11,,
3,Afghanistan,,Agricultural nitrous oxide emissions (thousand...,,,,,,,,...,4440.1106,4744.2494,4702.3804,4680.2688,4892.1766,4289.9782,4258.4498,4465.977,,
4,Afghanistan,,CO2 emissions (kg per 2015 US$ of GDP),,,,,,,,...,0.489964677117366,0.470845775464021,0.50292618070602,0.454516147902539,0.477468930733004,0.516563214552466,0.50918979595916,0.404094523787425,,
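
As the output above shows, masking the whole frame also blanks `Country Code` and `Series Code`. One possible alternative, sketched here on a toy frame with invented values, is to coerce only the year columns with `pd.to_numeric(..., errors="coerce")`. Selecting year columns with `str.isdigit` is an assumption that holds only after the columns have been renamed to plain years.

```python
import pandas as pd

# Toy frame with made-up values; ".." stands in for the export's missing marker.
toy = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania"],
    "Country Code": ["AFG", "ALB"],
    "2019": ["0.509", ".."],
    "2020": ["..", "0.312"],
})

# Year columns are the ones whose names consist only of digits.
year_cols = [c for c in toy.columns if c.isdigit()]

# errors="coerce" turns anything that cannot be parsed (here "..") into NaN.
toy[year_cols] = toy[year_cols].apply(pd.to_numeric, errors="coerce")
print(toy)
```

With this approach the text columns never need to be saved and restored.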


Collect the columns in which more than 60% of the values are NaN, then drop them.

In [16]:
columns_to_drop=[]
for element in df.columns:
    colonna = df[element]
    percentuale_na = colonna.isna().sum()/(len(df))*100
    if percentuale_na > 60:
        columns_to_drop.append(element)

In [17]:
df.drop(columns_to_drop,inplace=True,axis=1)

In [18]:
df.head()

Unnamed: 0,Country Name,Series Name,1990,1991,1992,1993,1994,1995,1996,1997,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
0,Afghanistan,Adjusted savings: carbon dioxide damage (% of ...,,,,,,,,,...,,,1.339220034831,1.56708303900069,2.05329938038888,1.63878128180552,1.36338448111679,1.28490840585691,1.44307895621429,1.44782682979766
1,Afghanistan,Adjusted savings: carbon dioxide damage (curre...,35990789.6764766,35244502.1600685,19296048.1118209,19173399.2334259,19176964.8576402,19225447.4694438,18919823.5726077,18222891.9453545,...,61270054.2528804,118144478.969927,165805569.8644,248925028.917687,365193644.716076,326751092.146682,275631280.588653,263338827.184962,278618004.122999,264910281.973801
2,Afghanistan,Agricultural methane emissions (thousand metri...,5361.61,5603.93,5668.51,5712.9575,5925.005,6246.1625,7039.0725,7710.3425,...,8796.81,9801.9825,9993.155,11514.525,11533.2525,11379.2825,11284.75,11476.1975,10834.435,10617.2325
3,Afghanistan,Agricultural nitrous oxide emissions (thousand...,2705.2738,2786.5384,2735.0142,2767.526,2592.9576,2704.8566,2938.3992,3242.389,...,3324.4582,3597.9626,3863.868,4273.8564,4369.0078,4398.5098,4440.1106,4744.2494,4702.3804,4680.2688
4,Afghanistan,CO2 emissions (kg per 2015 US$ of GDP),,,,,,,,,...,0.256024455726987,0.378522813158763,0.46741357131646,0.534410058262647,0.742222918218816,0.561765919038987,0.489964677117366,0.470845775464021,0.50292618070602,0.454516147902539
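
The same 60% rule can also be written without an explicit loop. This is a minimal sketch on a toy frame with invented values; `max_missing_percent` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values.
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D", "E"],
    "1960": [np.nan, np.nan, np.nan, np.nan, 1.0],  # 80% missing
    "1990": [1.0, 2.0, np.nan, 4.0, 5.0],           # 20% missing
})

max_missing_percent = 60  # hypothetical name for the threshold used in the loop above

# Percentage of NaN per column, then keep the columns at or below the threshold.
missing_percent = toy.isna().mean() * 100
kept = toy.loc[:, missing_percent <= max_missing_percent]
print(list(kept.columns))
```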


Reshape the table to long format for plotting

In [19]:
df_long = pd.melt(df, id_vars=['Country Name', 'Series Name'], var_name='Year', value_name='Value')


In [20]:
df_long

Unnamed: 0,Country Name,Series Name,Year,Value
0,Afghanistan,Adjusted savings: carbon dioxide damage (% of ...,1990,
1,Afghanistan,Adjusted savings: carbon dioxide damage (curre...,1990,35990789.6764766
2,Afghanistan,Agricultural methane emissions (thousand metri...,1990,5361.61
3,Afghanistan,Agricultural nitrous oxide emissions (thousand...,1990,2705.2738
4,Afghanistan,CO2 emissions (kg per 2015 US$ of GDP),1990,
...,...,...,...,...
222772,,,2016,
222773,,,2016,
222774,,,2016,
222775,Data from database: World Development Indicators,,2016,
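
The tail of the output above shows that footer rows from the export (empty rows and the "Data from database" line) are carried into the long table. The sketch below shows how `pd.melt` reshapes a small wide frame and one possible way to drop such rows, assuming they can be recognized by a missing `Series Name`. All values in the toy frame are invented.

```python
import numpy as np
import pandas as pd

# Toy wide frame with made-up values, including one footer-like row.
wide = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Data from database: World Development Indicators"],
    "Series Name": ["CO2 emissions (metric tons per capita)"] * 2 + [np.nan],
    "1990": [0.19, 1.84, np.nan],
    "1991": [0.18, 1.26, np.nan],
})

# Each year column becomes rows with a 'Year' label and a 'Value'.
long = pd.melt(wide, id_vars=["Country Name", "Series Name"], var_name="Year", value_name="Value")

# Footer rows have no series name, so they can be dropped on that column.
long = long.dropna(subset=["Series Name"]).reset_index(drop=True)
print(long)
```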


In [21]:
df_long = df_long[df_long["Series Name"]=='CO2 emissions (metric tons per capita)']
df_long.head(2)

Unnamed: 0,Country Name,Series Name,Year,Value
8,Afghanistan,CO2 emissions (metric tons per capita),1990,0.191389344873899
39,Albania,CO2 emissions (metric tons per capita),1990,1.84403546341413


In [22]:
# Create an empty container for the slider data
data_slider = []

# Build one choropleth trace per year
for year in df_long.Year.unique():
    df_selected_year = df_long[df_long["Year"] == year].copy()  # rows for this year

    # Convert 'Value' to numeric; anything that cannot be parsed becomes NaN
    df_selected_year['Value'] = pd.to_numeric(df_selected_year['Value'], errors='coerce')

    # Replace infinite values with NaN
    df_selected_year['Value'] = df_selected_year['Value'].replace([np.inf, -np.inf], np.nan)

    # data_one_year holds the plot data for a single year
    data_one_year = dict(
        type='choropleth',
        locations=df_selected_year['Country Name'],
        z=df_selected_year['Value'],
        locationmode='country names',
        colorscale='YlOrRd',
        visible=(len(data_slider) == 0),  # show only the first year until the slider is moved
        colorbar=dict(
            len=0.7,           # length of the colorbar
            ticklen=0.5,       # length of the tick marks on the colorbar
            tickformat=".1f",  # one decimal place
            thickness=30       # thickness of the colorbar
        ))

    # Add this year's trace to the data_slider list
    data_slider.append(data_one_year)

## Create the steps for the slider

years = df_long.Year.unique()  # same order that was used to build data_slider
steps = []

for i, year in enumerate(years):
    step = dict(method='restyle',
                args=['visible', [False] * len(data_slider)],
                label=str(year))  # label displayed for each step (year)
    step['args'][1][i] = True  # show only the trace for this year
    steps.append(step)



## Create the 'sliders' object from the 'steps'

sliders = [dict(active=0, pad={"t": 1}, steps=steps)]
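
Each slider step is a `restyle` instruction that sets the `visible` flag of every trace, with exactly one flag set to True. The standalone sketch below isolates that pattern in a helper. `build_slider_steps` is a hypothetical name introduced here, not part of Plotly, and the labels are example values.

```python
def build_slider_steps(labels):
    """Return one 'restyle' step per label; step i makes only trace i visible."""
    steps = []
    for i, label in enumerate(labels):
        visible = [False] * len(labels)  # hide every trace ...
        visible[i] = True                # ... except the one for this step
        steps.append(dict(method="restyle", args=["visible", visible], label=str(label)))
    return steps


example_steps = build_slider_steps(["1990", "1991", "1992"])
print(example_steps[1])
```

Passing the actual year labels, rather than computing them from a start year, keeps the slider correct if the set of retained year columns changes.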
layout = dict(
    geo=dict(
        scope='world',
        projection={'type': 'mercator'},
        lataxis_range=[-32, 80],
        lonaxis_range=[-168, 184],
        showlakes=False,
        showrivers=False
    ),
    title=dict(
        text='CO2 emissions (metric tons per capita)',
        font=dict(family="Times New Roman", size=40),
        x=0.02,
        y=0.90
    ),
    autosize=False,
    margin=dict(l=20, r=10, t=50, b=20),
    paper_bgcolor="white",
    width=800,
    height=600,
    sliders=[dict(active=0, pad={"t": 1}, steps= steps, x=0, y=0.1)],
)


fig = dict(data=data_slider, layout=layout)

# To plot in the notebook fixing the zoom
plotly.offline.iplot(fig, config={'scrollZoom': False, 'modeBarButtonsToRemove': ['pan2d', 'zoomIn2d', 'zoomOut2d', 'autoScale2d', 'resetScale2d']})


In [23]:
#THANK YOU
