# **Global Methane Flux Analysis**

## Objectives

* Load the Kaggle "Methane Emissions" dataset and inspect its shape, data types, missing values, duplicates and unique values before transformation.<br />

The Kaggle dataset "Methane Emissions" was used as raw data and saved under the raw folder. It is available on Kaggle [here](https://www.kaggle.com/datasets/ashishraut64/global-methane-emissions).

## Inputs

* Dataset/raw/Methane_final.csv, the raw "Methane Emissions" dataset downloaded from Kaggle <br />


## Outputs

* Write here which files, code or artefacts you generate by the end of the notebook 

## Additional Comments

* If you have any additional comments that don't fit in the previous bullets, please state them here. 



---

# Change working directory

* We are assuming you will store the notebooks in a subfolder, therefore when running the notebook in the editor, you will need to change the working directory

We need to change the working directory from its current folder to its parent folder
* We access the current directory with os.getcwd()

In [2]:
import os
current_dir = os.getcwd()
current_dir

'/Users/danielledelouw/Documents/code_institute/vscode-projects/Global_Methane_Flux_Analysis/Global_Methane_Flux_Analysis/jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [3]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [4]:
current_dir = os.getcwd()
current_dir

'/Users/danielledelouw/Documents/code_institute/vscode-projects/Global_Methane_Flux_Analysis/Global_Methane_Flux_Analysis'
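Re-running the cell that calls os.chdir() moves up one more folder each time. The following is a minimal sketch of a guard against that, assuming the notebooks live in a folder named jupyter_notebooks as in the path above. The helper name `move_to_project_root` is hypothetical and not part of the original notebook.

```python
import os


def move_to_project_root(notebook_dir_name="jupyter_notebooks"):
    """Move to the parent folder only when currently inside the notebook folder."""
    current_dir = os.getcwd()
    # Assumption: the notebook folder is called "jupyter_notebooks".
    if os.path.basename(current_dir) == notebook_dir_name:
        os.chdir(os.path.dirname(current_dir))
    return os.getcwd()


# Safe to run more than once: the second call leaves the directory unchanged.
print(move_to_project_root())
```

With this guard, running the cell a second time does not change the working directory again.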

# Section 1

Section 1 Extraction:
Load the Dataset

In [8]:
import pandas as pd

df = pd.read_csv('Dataset/raw/Methane_final.csv')
df

,Unnamed: 0,region,country,emissions,type,segment,reason,baseYear,notes
0,0,Africa,Algeria,257.611206,Agriculture,Total,All,2019-2021,Average based on United Nations Framework Conv...
1,1,Africa,Algeria,0.052000,Energy,Bioenergy,All,2022,Estimates from end-uses are for 2020 or 2021 (...
2,2,Africa,Algeria,130.798996,Energy,Gas pipelines and LNG facilities,Fugitive,2022,Not available
3,3,Africa,Algeria,69.741898,Energy,Gas pipelines and LNG facilities,Vented,2022,Not available
4,4,Africa,Algeria,213.987000,Energy,Onshore gas,Fugitive,2022,Not available
...,...,...,...,...,...,...,...,...,...
1543,1543,World,World,3102.500000,Energy,Satellite-detected large oil and gas emissions,All,2022,Not available
1544,1544,World,World,30296.500000,Energy,Steam coal,All,2022,Not available
1545,1545,World,World,133350.984375,Energy,Total,All,2022,Estimates from end-uses are for 2020 or 2021 (...
1546,1546,World,World,9737.874023,Other,Total,All,2019-2021,Average based on United Nations Framework Conv...
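The "Unnamed: 0" column duplicates the row index. One possible way to avoid creating it is to pass index_col=0 to pd.read_csv. The sketch below assumes the first column of the CSV file is an unnamed saved index, and uses an in-memory sample built from two rows of the output above instead of the real file.

```python
import io

import pandas as pd

# Two rows copied from the output above. The empty first header field mimics
# the unnamed index column that pandas would otherwise label "Unnamed: 0".
sample_csv = io.StringIO(
    ",region,country,emissions,type,segment,reason,baseYear,notes\n"
    "2,Africa,Algeria,130.798996,Energy,Gas pipelines and LNG facilities,Fugitive,2022,Not available\n"
    "3,Africa,Algeria,69.741898,Energy,Gas pipelines and LNG facilities,Vented,2022,Not available\n"
)

# index_col=0 uses the first column as the row index instead of a data column.
sample_df = pd.read_csv(sample_csv, index_col=0)
print(sample_df.columns.tolist())
```

If the column is instead dropped later in the Transform step, the result should be equivalent.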


Check the Shape of dataset:

In [11]:
df.shape

(1548, 9)

Check Information of dataset:

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1548 entries, 0 to 1547
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  1548 non-null   int64  
 1   region      1548 non-null   object 
 2   country     1548 non-null   object 
 3   emissions   1548 non-null   float64
 4   type        1548 non-null   object 
 5   segment     1548 non-null   object 
 6   reason      1548 non-null   object 
 7   baseYear    1548 non-null   object 
 8   notes       1548 non-null   object 
dtypes: float64(1), int64(1), object(7)
memory usage: 109.0+ KB


Check for missing values:

In [None]:
df.isnull().sum() # the output reveals that there are no missing values

Unnamed: 0    0
region        0
country       0
emissions     0
type          0
segment       0
reason        0
baseYear      0
notes         0
dtype: int64

Check for duplicates:

In [None]:
df.duplicated().sum()
# no duplicate rows found

0
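Because "Unnamed: 0" holds a different value in every row, df.duplicated() cannot flag any row as a duplicate while that column is included. The sketch below shows a stricter check that ignores the column. The small frame `toy_df` is made up for illustration and its values are not taken from the dataset.

```python
import pandas as pd

# Illustrative data only: rows 0 and 1 are identical apart from "Unnamed: 0".
toy_df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "country": ["A", "A", "B"],
    "segment": ["Total", "Total", "Total"],
    "emissions": [1.0, 1.0, 2.0],
})

# With the index-like column included, every row is unique.
duplicates_with_index = toy_df.duplicated().sum()

# Without it, the repeated record is detected.
duplicates_without_index = toy_df.drop(columns=["Unnamed: 0"]).duplicated().sum()

print(duplicates_with_index, duplicates_without_index)
```

Applying the same drop(columns=["Unnamed: 0"]) before duplicated() on df would show whether the dataset contains repeated records.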

Check for unique values:

In [None]:
df.nunique()

Unnamed: 0    1548
region           9
country        105
emissions     1531
type             4
segment         12
reason           4
baseYear         2
notes            3
dtype: int64

Check the contents of a few columns:


In [None]:
contents_of_columns = ['region','country','type','segment','reason','baseYear','notes']
for col in contents_of_columns:
    print(f"\n{col} ({df[col].nunique()} unique values):")
    print(df[col].unique())

# Comments:

# "region" consists of continents plus two general values, "World" and "Other".
# "country" returned 105 unique values, while there are 193 countries in the world, so there is a discrepancy.
# "country" also contains values that are not countries, for example 'Other countries in Europe',
# 'Other EU17 countries', 'Other EU7 countries' and 'World'.
# "type" consists of the broad source category of the methane emission; "Other" stands out.
# "segment" consists of the specific sources of methane emissions.
# 'Coking coal', also known as metallurgical coal, is a type of coal used to produce coke, a key ingredient in the steelmaking process.
# "reason" describes how the gas is released: 'Fugitive' is uncontrolled, 'Vented' and 'Flared' are controlled, and 'All' is a mix.
# "baseYear" has two values and can be treated as categorical, so it needs converting from object to category.
# "notes" is a reference to the UN Framework or the Greenhouse Gas Data Interface, with a link. This column can be dropped.
# The 9th column, "Unnamed: 0", is not listed above. It is an internal index; the DataFrame already has an index, so it will be dropped.


region (9 unique values):
['Africa' 'Asia Pacific' 'Central and South America' 'Europe'
 'Middle East' 'North America' 'Other' 'Russia & Caspian' 'World']

country (105 unique values):
['Algeria' 'Angola' 'Benin' 'Botswana' 'Cameroon'
 'Central African Republic' 'Chad' 'Congo' "Cote d'Ivoire"
 'Democratic Republic of Congo' 'Egypt' 'Equatorial Guinea' 'Eritrea'
 'Ethiopia' 'Gabon' 'Gambia' 'Ghana' 'Guinea' 'Guinea-Bissau' 'Kenya'
 'Liberia' 'Libya' 'Morocco' 'Mozambique' 'Namibia' 'Niger' 'Nigeria'
 'Senegal' 'Seychelles' 'Sierra Leone' 'Somalia' 'South Africa'
 'South Sudan' 'Sudan' 'Tanzania' 'Togo' 'Tunisia' 'Australia'
 'Bangladesh' 'Brunei' 'China' 'India' 'Indonesia' 'Japan' 'Korea'
 'Malaysia' 'Mongolia' 'New Zealand' 'Other countries in Southeast Asia'
 'Pakistan' 'Philippines' 'Thailand' 'Vietnam' 'Argentina' 'Bolivia'
 'Brazil' 'Colombia' 'Cuba' 'Ecuador' 'Guyana' 'Paraguay' 'Peru'
 'Trinidad and Tobago' 'Uruguay' 'Venezuela' 'Denmark' 'Estonia'
 'European Union' 'France' 'G
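Aggregate entries such as 'World' or 'Other EU17 countries' would be double counted if emissions were summed per country. The sketch below shows one way to separate them, assuming aggregates are 'World', 'European Union', or any value starting with "Other". That rule is an assumption based on the visible part of the output, and `toy_df` uses made-up emission values.

```python
import pandas as pd

# Illustrative data only: the emission values are not taken from the dataset.
toy_df = pd.DataFrame({
    "country": ["Algeria", "European Union", "Other EU17 countries", "World"],
    "emissions": [1.0, 2.0, 3.0, 6.0],
})

# Assumption: aggregates are "World", "European Union", or start with "Other".
is_aggregate = (
    toy_df["country"].isin(["World", "European Union"])
    | toy_df["country"].str.startswith("Other")
)

countries_only = toy_df[~is_aggregate]
print(countries_only["country"].tolist())
```

The rule should be checked against the full list of unique values before it is applied to df.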

Transform:
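The comments in the previous cell suggest dropping "Unnamed: 0" and "notes" and converting "baseYear" to a categorical type. A minimal sketch of those steps follows, using a small stand-in frame rather than the full dataset. The names `toy_df` and `cleaned_df` are hypothetical.

```python
import pandas as pd

# Stand-in frame with only the columns needed for the illustration.
toy_df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "country": ["Algeria", "Algeria"],
    "baseYear": ["2019-2021", "2022"],
    "notes": ["Not available", "Not available"],
})

# Drop the redundant index column and the free-text notes column.
cleaned_df = toy_df.drop(columns=["Unnamed: 0", "notes"])

# Convert baseYear from object to category, since it has only two values.
cleaned_df["baseYear"] = cleaned_df["baseYear"].astype("category")

print(cleaned_df.dtypes)
```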

---

# Section 2

Section 2 content

---

NOTE

* You may add as many sections as you want, as long as they support your project workflow.
* All notebook cells should run top-down. Avoid a workflow in which, at a given point, you need to go back to a previous cell to execute a task, such as refreshing a variable's content.

---

# Push files to Repo

* In cases where you don't need to push files to Repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create your folder here
  # os.makedirs(name='')
  pass  # placeholder so the try block is valid until a folder is created
except Exception as e:
  print(e)
