<div align="center"> <h1>Country Other-Agricultural-Soil-Emissions Emissions CSV</h1></div>

### About This Notebook
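This notebook explores the contents of the `country_other-agricultural-soil-emissions_emissions.csv` file.
It uses pandas to inspect the data's shape, columns, data types and missing values, and performs any cleanup needed along the way (such as dropping an empty column).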

In [1]:
# Import the libraries used for the exploration and analysis
import csv
import pandas as pd
import matplotlib.pyplot as plt

# Show up to 200 columns when displaying a DataFrame, so wide frames are not truncated
pd.set_option('display.max_columns', 200)

In [2]:
# Read the CSV file into a pandas DataFrame
df = pd.read_csv("../data/agriculture/country_other-agricultural-soil-emissions_emissions.csv")

In [3]:
# Show the shape of the DataFrame as a (rows, columns) tuple
df.shape

(8785, 10)

In [4]:
# Show the first 5 rows with the head() method (5 is its default)
df.head()

|  | iso3_country | start_time | end_time | original_inventory_sector | gas | emissions_quantity | emissions_quantity_units | temporal_granularity | created_date | modified_date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2 | 0.0 | tonnes | NaN | 2022-09-06 12:39:52.76017 | NaN |
| 1 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | n2o | 0.0 | tonnes | NaN | 2022-09-06 12:39:52.76017 | NaN |
| 2 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | ch4 | 0.0 | tonnes | NaN | 2022-09-06 12:39:52.76017 | NaN |
| 3 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_20yr | 0.0 | tonnes | NaN | 2022-09-06 12:39:52.76017 | NaN |
| 4 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_100yr | 0.0 | tonnes | NaN | 2022-09-06 12:39:52.76017 | NaN |


In [5]:
# List the columns with their positional index
print("Column Index\t Column Name")
for index, column in enumerate(df.columns):
    print(f"{index}\t\t {column}")

Column Index	 Column Name
0		 iso3_country
1		 start_time
2		 end_time
3		 original_inventory_sector
4		 gas
5		 emissions_quantity
6		 emissions_quantity_units
7		 temporal_granularity
8		 created_date
9		 modified_date


In [6]:
# Check the data type of each column
df.dtypes

iso3_country                  object
start_time                    object
end_time                      object
original_inventory_sector     object
gas                           object
emissions_quantity           float64
emissions_quantity_units      object
temporal_granularity         float64
created_date                  object
modified_date                 object
dtype: object

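#### Key to the pandas data types shown above

- `object`: generic Python objects; in this dataset, strings. Note that the timestamp columns (`start_time`, `end_time`, `created_date`, `modified_date`) were read as strings, not as dates.
- `float64`: 64-bit floating-point numbers. `temporal_granularity` is `float64` even though it holds no values, because pandas stores a completely empty column as `NaN` floats.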
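Because the four timestamp columns are stored as `object` (string) columns, date arithmetic or grouping by year is awkward. Below is a minimal sketch of converting them with `pd.to_datetime`. To keep it self-contained it runs on two rows copied from the `df.head()` output. The `sample` frame and the added `year` column are illustrative only; in the notebook you would convert the columns of `df` directly.

```python
import pandas as pd

# Illustrative rows copied from the df.head() output (not the full dataset)
sample = pd.DataFrame({
    "iso3_country": ["SHN", "SLB"],
    "start_time": ["2019-01-01 00:00:00", "2019-01-01 00:00:00"],
    "end_time": ["2019-12-31 00:00:00", "2019-12-31 00:00:00"],
    "created_date": ["2022-09-06 12:39:52.76017", "2022-09-06 12:39:52.76017"],
    "modified_date": [None, None],
})

# pd.to_datetime turns the timestamp strings into datetime64 values;
# missing entries (None/NaN) become NaT ("not a time")
date_columns = ["start_time", "end_time", "created_date", "modified_date"]
for column in date_columns:
    sample[column] = pd.to_datetime(sample[column])

# With real datetimes, the .dt accessor exposes parts such as the year
sample["year"] = sample["start_time"].dt.year

print(sample.dtypes)
```

Alternatively, `pd.read_csv` accepts a `parse_dates` list that performs the same conversion at load time. If a column turns out to mix timestamp formats, pandas 2.x may refuse to infer a single format. Passing `format="mixed"` to `pd.to_datetime` is one option in that case.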

In [7]:
# Summarize the distribution of the numeric columns (count, mean, std, min, quartiles, max)
df.describe()

|  | emissions_quantity | temporal_granularity |
|---|---|---|
| count | 7357.0 | 0.0 |
| mean | 1783291.0 | NaN |
| std | 8110952.0 | NaN |
| min | 0.0 | NaN |
| 25% | 0.0 | NaN |
| 50% | 991.2 | NaN |
| 75% | 351278.4 | NaN |
| max | 103844800.0 | NaN |
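The `df.describe()` summary above pools all five `gas` values. It mixes individual gases (`co2`, `n2o`, `ch4`) with CO2-equivalent totals (`co2e_20yr`, `co2e_100yr`), so its mean and quartiles are hard to interpret. The 25th percentile of 0.0 also suggests that roughly a quarter or more of the non-null values are zero.

Below is a minimal sketch of a per-gas summary with `groupby`. It uses the ten rows printed by `df.head(10)` later in the notebook as stand-in data; `sample` is illustrative only. On the real data the equivalent call would be `df.groupby("gas")["emissions_quantity"].describe()`.

```python
import pandas as pd

# Stand-in data: the SHN and SLB 2019 rows from the df.head(10) output
sample = pd.DataFrame({
    "iso3_country": ["SHN"] * 5 + ["SLB"] * 5,
    "gas": ["co2", "n2o", "ch4", "co2e_20yr", "co2e_100yr"] * 2,
    "emissions_quantity": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 34.5, 0.0, 9108.0, 9142.5],
})

# One row of summary statistics per gas instead of one pooled summary
by_gas = sample.groupby("gas")["emissions_quantity"].describe()
print(by_gas[["count", "mean", "max"]])
```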


In [9]:
# Print a concise summary of the DataFrame: row count, column dtypes, non-null counts and memory usage
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8785 entries, 0 to 8784
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   iso3_country               8785 non-null   object 
 1   start_time                 8785 non-null   object 
 2   end_time                   8785 non-null   object 
 3   original_inventory_sector  8785 non-null   object 
 4   gas                        8785 non-null   object 
 5   emissions_quantity         7357 non-null   float64
 6   emissions_quantity_units   8785 non-null   object 
 7   temporal_granularity       0 non-null      float64
 8   created_date               8785 non-null   object 
 9   modified_date              1194 non-null   object 
dtypes: float64(2), object(8)
memory usage: 686.5+ KB
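The non-null counts above imply how much is missing per column. A more direct way is to count nulls with `isna()`. The sketch below runs on a small `toy` frame whose values are made up and which only reuses the notebook's column names; on the real data you would call the same methods on `df`.

```python
import numpy as np
import pandas as pd

# Toy data with the notebook's column names; the values are illustrative only
toy = pd.DataFrame({
    "emissions_quantity": [0.0, 34.5, np.nan, np.nan],
    "temporal_granularity": [np.nan] * 4,
    "modified_date": [None, None, None, "2022-09-07 09:56:37.90632"],
})

missing = toy.isna().sum()                          # null count per column
missing_pct = toy.isna().mean().mul(100).round(1)   # share of null rows, in percent
summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
print(summary)
```

Applied to the full `df`, the counts follow from the `info()` output. `emissions_quantity` is missing 1428 values (8785 − 7357), `temporal_granularity` is missing all 8785, and `modified_date` is missing 7591 (8785 − 1194).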


In [10]:
# Drop the temporal_granularity column, which contains no values at all (0 non-null entries)
# drop() returns a new DataFrame, so reassign the result to df to keep the change
df = df.drop(["temporal_granularity"], axis=1)
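The cell above names the empty column explicitly. A generic alternative is to let pandas find and drop only the columns that are entirely null. Columns that are just partly null, like `emissions_quantity` and `modified_date`, are kept. This is a minimal sketch on a made-up `toy` frame:

```python
import numpy as np
import pandas as pd

# Toy data: one fully null column and two partly null ones (values are illustrative)
toy = pd.DataFrame({
    "emissions_quantity": [0.0, np.nan, 34.5],
    "temporal_granularity": [np.nan, np.nan, np.nan],
    "modified_date": [None, "2022-09-07 09:56:37.90632", None],
})

# List the columns in which every value is null
all_null = toy.columns[toy.isna().all()].tolist()

# how="all" drops a column only if every value is null; the default how="any"
# would also drop columns that have a single missing value
cleaned = toy.dropna(axis=1, how="all")
print(all_null, cleaned.columns.tolist())
```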

In [11]:
# Confirm that the column was dropped (each column of a DataFrame is a pandas Series)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8785 entries, 0 to 8784
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   iso3_country               8785 non-null   object 
 1   start_time                 8785 non-null   object 
 2   end_time                   8785 non-null   object 
 3   original_inventory_sector  8785 non-null   object 
 4   gas                        8785 non-null   object 
 5   emissions_quantity         7357 non-null   float64
 6   emissions_quantity_units   8785 non-null   object 
 7   created_date               8785 non-null   object 
 8   modified_date              1194 non-null   object 
dtypes: float64(1), object(8)
memory usage: 617.8+ KB


In [12]:
# Display the first 10 rows to get a glimpse of what the DataFrame looks like
# The number passed to df.head() sets how many rows are returned (the default is 5)
df.head(10)

|  | iso3_country | start_time | end_time | original_inventory_sector | gas | emissions_quantity | emissions_quantity_units | created_date | modified_date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2 | 0.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 1 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | n2o | 0.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 2 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | ch4 | 0.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 3 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_20yr | 0.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 4 | SHN | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_100yr | 0.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 5 | SLB | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2 | 0.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 6 | SLB | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | n2o | 34.5 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 7 | SLB | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | ch4 | 0.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 8 | SLB | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_20yr | 9108.0 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
| 9 | SLB | 2019-01-01 00:00:00 | 2019-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_100yr | 9142.5 | tonnes | 2022-09-06 12:39:52.76017 | NaN |
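In the SLB rows above, `co2` and `ch4` are zero, so both CO2-equivalent rows should come entirely from N2O. Dividing each by the N2O quantity gives 264 and 265. These match the IPCC AR5 20-year and 100-year global warming potentials for N2O. That suggests the `co2e_*` rows are derived from the individual gases rather than reported separately, although the dataset's own documentation would need to confirm it. If so, they should not be summed together with the individual gases.

A minimal sketch of the check, using only the SLB values shown above:

```python
import pandas as pd

# The SLB 2019 rows from the df.head(10) output
slb = pd.DataFrame({
    "gas": ["co2", "n2o", "ch4", "co2e_20yr", "co2e_100yr"],
    "emissions_quantity": [0.0, 34.5, 0.0, 9108.0, 9142.5],
})

# Index by gas so each quantity can be looked up by name
by_gas = slb.set_index("gas")["emissions_quantity"]

# CO2 and CH4 are zero here, so each CO2e total divided by the N2O mass
# gives the multiplier implied for N2O over that time horizon
implied_gwp20 = by_gas["co2e_20yr"] / by_gas["n2o"]
implied_gwp100 = by_gas["co2e_100yr"] / by_gas["n2o"]
print(implied_gwp20, implied_gwp100)  # 264.0 265.0
```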


In [13]:
# Display the last 10 rows to see what the end of the DataFrame looks like
# The number passed to df.tail() sets how many rows are returned (the default is 5)
df.tail(10)

|  | iso3_country | start_time | end_time | original_inventory_sector | gas | emissions_quantity | emissions_quantity_units | created_date | modified_date |
|---|---|---|---|---|---|---|---|---|---|
| 8775 | TUV | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_100yr | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8776 | UMI | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_20yr | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8777 | UMI | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_100yr | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8778 | VAT | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_20yr | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8779 | VAT | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_100yr | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8780 | WLF | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2 | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8781 | WLF | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | n2o | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8782 | WLF | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | ch4 | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8783 | WLF | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_20yr | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
| 8784 | WLF | 2021-01-01 00:00:00 | 2021-12-31 00:00:00 | other-agricultural-soil-emissions | co2e_100yr | NaN | tonnes | 2022-09-07 09:56:37.90632 | NaN |
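Every 2021 row in the tail above has a missing `emissions_quantity`, while the 2019 rows in the head are filled in. Whether the missing values cluster in particular years can be checked by counting nulls per reporting year. Below is a minimal sketch on four illustrative rows shaped like the head and tail output; `sample` is not the real data.

```python
import numpy as np
import pandas as pd

# Illustrative rows shaped like the head()/tail() output (2019 filled, 2021 missing)
sample = pd.DataFrame({
    "iso3_country": ["SLB", "SLB", "WLF", "WLF"],
    "start_time": pd.to_datetime(["2019-01-01", "2019-01-01", "2021-01-01", "2021-01-01"]),
    "gas": ["n2o", "co2e_100yr", "n2o", "co2e_100yr"],
    "emissions_quantity": [34.5, 9142.5, np.nan, np.nan],
})

# Count missing emissions values per reporting year
missing_by_year = (
    sample["emissions_quantity"].isna()
    .groupby(sample["start_time"].dt.year)
    .sum()
)
print(missing_by_year)
```

In the notebook, `start_time` is still a string column, so the grouping key there would be `pd.to_datetime(df["start_time"]).dt.year` instead of `sample["start_time"].dt.year`.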
