<div align="center"> <h1>Asset Enteric-Fermentation Emissions CSV</h1></div>

### About This Notebook
This notebook explores the contents of `asset_enteric-fermentation_emissions.csv`.
It inspects the shape, columns, data types, and summary statistics of the data, then drops the columns that are empty or not needed.

In [35]:
# Import necessary libraries to use for the exploration & analysis
import csv 
import pandas as pd
import matplotlib.pyplot as plt

# Show up to 200 columns when pandas displays a data frame
pd.set_option('display.max_columns', 200)

In [36]:
# Import the csv file and convert it to a pandas data frame 
df = pd.read_csv("../data/agriculture/asset_enteric-fermentation_emissions.csv")

In [37]:
# Show the shape of the data frame as a (rows, columns) tuple
df.shape

(15305, 21)

In [38]:
# Show the first 5 rows by using the head command
df.head()

Unnamed: 0,asset_id,iso3_country,original_inventory_sector,start_time,end_time,temporal_granularity,lat_lon,gas,emissions_quantity,emissions_factor,emissions_factor_units,capacity,capacity_units,capacity_factor,activity,activity_units,created_date,modified_date,asset_name,asset_type,st_astext
0,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,,co2,0.0,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,,9000,total_animal_head_count,2022-09-13 19:15:49.594347,2023-04-11 18:56:31.292708,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
1,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,,ch4,1152.0,0.128,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,,9000,total_animal_head_count,2022-09-13 19:15:49.594347,2023-04-11 18:56:31.292708,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
2,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,,n2o,0.0,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,,9000,total_animal_head_count,2022-09-13 19:15:49.594347,2023-04-11 18:56:31.292708,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
3,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,,co2e_100yr,51163.2,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,,9000,total_animal_head_count,2022-09-13 19:15:49.594347,2023-04-11 18:56:31.292708,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
4,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,,co2e_20yr,151984.8,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,,9000,total_animal_head_count,2022-09-13 19:15:49.594347,2023-04-11 18:56:31.292708,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
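
The five rows above all describe the same asset (`asset_id` 5100086), with one row per value of `gas`. A minimal sketch of reshaping that long layout into one row per asset follows. The `sample` frame is a hypothetical stand-in built from the values shown above, not the full file.

```python
import pandas as pd

# Stand-in frame: the five rows shown above, reduced to three columns
sample = pd.DataFrame({
    "asset_id": [5100086] * 5,
    "gas": ["co2", "ch4", "n2o", "co2e_100yr", "co2e_20yr"],
    "emissions_quantity": [0.0, 1152.0, 0.0, 51163.2, 151984.8],
})

# One row per asset, one column per gas
wide = sample.pivot(index="asset_id", columns="gas", values="emissions_quantity")
print(wide)
```

On the full data frame an asset may appear for more than one reporting year, so the index would probably need to include `start_time` as well for `pivot` to succeed.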


In [39]:
# List the columns together with their positional index
print("Column Index\t Column Name")
for index, column in enumerate(df.columns):
    print(f"{index}\t\t {column}")

Column Index	 Column Name
0		 asset_id
1		 iso3_country
2		 original_inventory_sector
3		 start_time
4		 end_time
5		 temporal_granularity
6		 lat_lon
7		 gas
8		 emissions_quantity
9		 emissions_factor
10		 emissions_factor_units
11		 capacity
12		 capacity_units
13		 capacity_factor
14		 activity
15		 activity_units
16		 created_date
17		 modified_date
18		 asset_name
19		 asset_type
20		 st_astext


In [40]:
# Check the data type of each column
df.dtypes

asset_id                       int64
iso3_country                  object
original_inventory_sector     object
start_time                    object
end_time                      object
temporal_granularity          object
lat_lon                      float64
gas                           object
emissions_quantity           float64
emissions_factor             float64
emissions_factor_units        object
capacity                     float64
capacity_units                object
capacity_factor              float64
activity                       int64
activity_units                object
created_date                  object
modified_date                 object
asset_name                    object
asset_type                    object
st_astext                     object
dtype: object

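#### Key to the pandas data types shown above
int64 => Integer

object => Generic Python object; in this data frame these columns hold strings

float64 => Float (also the type pandas assigns to a column that is entirely null, such as lat_lon)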
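
The `start_time`, `end_time`, and `created_date` columns are loaded as `object`, so they are plain strings rather than dates. A minimal sketch of converting such columns with `pd.to_datetime` follows. The `sample` frame is a hypothetical stand-in that uses timestamps in the same format as the data.

```python
import pandas as pd

# Stand-in frame with timestamps in the same format as the CSV
sample = pd.DataFrame({
    "start_time": ["2021-01-01 00:00:00", "2020-01-01 00:00:00"],
    "end_time": ["2021-12-31 00:00:00", "2020-12-31 00:00:00"],
})

# Convert the string columns to datetime64 so the .dt accessor becomes available
for col in ["start_time", "end_time"]:
    sample[col] = pd.to_datetime(sample[col])

# The reporting year can then be read directly from the timestamp
sample["year"] = sample["start_time"].dt.year
print(sample.dtypes)
```

An alternative, if the conversion is always wanted, is to pass `parse_dates=[...]` to `pd.read_csv` when loading the file.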

In [41]:
# Check the overall distribution of all numerical columns
df.describe()

Unnamed: 0,asset_id,lat_lon,emissions_quantity,emissions_factor,capacity,capacity_factor,activity
count,15305.0,0.0,15305.0,12266.0,15305.0,0.0,15305.0
mean,7492681.0,,8079.948491,0.022444,0.337352,,5830.718719
std,66147240.0,,32241.502397,0.043159,0.627582,,14354.706544
min,5099010.0,,0.0,0.0,6e-06,,100.0
25%,5099427.0,,0.0,0.0,0.002141,,785.0
50%,5099871.0,,140.8,0.0,0.020713,,1600.0
75%,5100636.0,,4550.656,0.0,0.961318,,3605.0
max,1836078000.0,,851531.0,0.128,10.540186,,188000.0
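
The `count` row above shows that `lat_lon` and `capacity_factor` have no values at all and that `emissions_factor` is only partly filled (12266 of 15305). A minimal sketch of reading the same information directly as missing-value counts follows. The `sample` frame is a hypothetical stand-in, not the real data.

```python
import numpy as np
import pandas as pd

# Stand-in frame: one empty column, one partly filled, one complete
sample = pd.DataFrame({
    "lat_lon": [np.nan, np.nan, np.nan, np.nan],
    "emissions_factor": [0.0, 0.128, np.nan, 0.0],
    "activity": [9000, 9000, 1286, 1169],
})

# Number of missing values per column, and the share of rows they represent
missing = sample.isna().sum()
share = sample.isna().mean()

summary = pd.DataFrame({"missing": missing, "share": share})
print(summary)
```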


In [42]:
# Show a technical summary of the data frame (non-null counts and data types)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15305 entries, 0 to 15304
Data columns (total 21 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   asset_id                   15305 non-null  int64  
 1   iso3_country               15305 non-null  object 
 2   original_inventory_sector  15305 non-null  object 
 3   start_time                 15305 non-null  object 
 4   end_time                   15305 non-null  object 
 5   temporal_granularity       15305 non-null  object 
 6   lat_lon                    0 non-null      float64
 7   gas                        15305 non-null  object 
 8   emissions_quantity         15305 non-null  float64
 9   emissions_factor           12266 non-null  float64
 10  emissions_factor_units     15285 non-null  object 
 11  capacity                   15305 non-null  float64
 12  capacity_units             15305 non-null  object 
 13  capacity_factor            0 non-null      flo

In [43]:
# Drop the columns that contain only null values (lat_lon and capacity_factor)
# drop() returns a new data frame, so reassign it to df to keep the change
df = df.drop(["lat_lon", "capacity_factor"], axis=1)
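
The cell above names the empty columns by hand. As a sketch of an alternative, `dropna(axis=1, how="all")` removes every column whose values are all missing without listing them. The `sample` frame is a hypothetical stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Stand-in frame: two fully empty columns and one partly filled column
sample = pd.DataFrame({
    "asset_id": [5100086, 5099329],
    "lat_lon": [np.nan, np.nan],
    "emissions_factor": [0.128, np.nan],
    "capacity_factor": [np.nan, np.nan],
})

# how="all" drops a column only when every value in it is missing,
# so a partly filled column such as emissions_factor is kept
cleaned = sample.dropna(axis=1, how="all")
print(list(cleaned.columns))
```

Naming the columns explicitly, as the notebook does, has the advantage that the cell fails loudly if a future version of the file no longer contains them.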

In [44]:
# Confirm that the columns (Series, in pandas terms) have been dropped
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15305 entries, 0 to 15304
Data columns (total 19 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   asset_id                   15305 non-null  int64  
 1   iso3_country               15305 non-null  object 
 2   original_inventory_sector  15305 non-null  object 
 3   start_time                 15305 non-null  object 
 4   end_time                   15305 non-null  object 
 5   temporal_granularity       15305 non-null  object 
 6   gas                        15305 non-null  object 
 7   emissions_quantity         15305 non-null  float64
 8   emissions_factor           12266 non-null  float64
 9   emissions_factor_units     15285 non-null  object 
 10  capacity                   15305 non-null  float64
 11  capacity_units             15305 non-null  object 
 12  activity                   15305 non-null  int64  
 13  activity_units             15305 non-null  obj

In [45]:
# Drop the columns that are not needed for this analysis
df = df.drop(["modified_date"], axis=1)

In [46]:
# Confirm the unnecessary column(s) have been successfully removed
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15305 entries, 0 to 15304
Data columns (total 18 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   asset_id                   15305 non-null  int64  
 1   iso3_country               15305 non-null  object 
 2   original_inventory_sector  15305 non-null  object 
 3   start_time                 15305 non-null  object 
 4   end_time                   15305 non-null  object 
 5   temporal_granularity       15305 non-null  object 
 6   gas                        15305 non-null  object 
 7   emissions_quantity         15305 non-null  float64
 8   emissions_factor           12266 non-null  float64
 9   emissions_factor_units     15285 non-null  object 
 10  capacity                   15305 non-null  float64
 11  capacity_units             15305 non-null  object 
 12  activity                   15305 non-null  int64  
 13  activity_units             15305 non-null  obj

In [47]:
# Display the first 10 rows to get a glimpse of the data frame
# The number passed to `df.head()` is the number of rows returned
df.head(10)

Unnamed: 0,asset_id,iso3_country,original_inventory_sector,start_time,end_time,temporal_granularity,gas,emissions_quantity,emissions_factor,emissions_factor_units,capacity,capacity_units,activity,activity_units,created_date,asset_name,asset_type,st_astext
0,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2,0.0,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,9000,total_animal_head_count,2022-09-13 19:15:49.594347,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
1,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,ch4,1152.0,0.128,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,9000,total_animal_head_count,2022-09-13 19:15:49.594347,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
2,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,n2o,0.0,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,9000,total_animal_head_count,2022-09-13 19:15:49.594347,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
3,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_100yr,51163.2,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,9000,total_animal_head_count,2022-09-13 19:15:49.594347,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
4,5100086,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_20yr,151984.8,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,9000,total_animal_head_count,2022-09-13 19:15:49.594347,USA_CA_dairy_1296,enteric_fermentation_dairy,POINT(-119.76448059 36.16134652)
5,5100088,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2,0.0,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,10776,total_animal_head_count,2022-09-13 19:15:49.594383,USA_CA_dairy_1298,enteric_fermentation_dairy,POINT(-119.39844329 35.97133785)
6,5100088,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,ch4,1379.328,0.128,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,10776,total_animal_head_count,2022-09-13 19:15:49.594383,USA_CA_dairy_1298,enteric_fermentation_dairy,POINT(-119.39844329 35.97133785)
7,5100088,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,n2o,0.0,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,10776,total_animal_head_count,2022-09-13 19:15:49.594383,USA_CA_dairy_1298,enteric_fermentation_dairy,POINT(-119.39844329 35.97133785)
8,5100088,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_100yr,61259.4048,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,10776,total_animal_head_count,2022-09-13 19:15:49.594383,USA_CA_dairy_1298,enteric_fermentation_dairy,POINT(-119.39844329 35.97133785)
9,5100088,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_20yr,181976.4672,0.0,tonnes_gas_per_animal_head,1.0,area_sq_meters_of_intensive_farm,10776,total_animal_head_count,2022-09-13 19:15:49.594383,USA_CA_dairy_1298,enteric_fermentation_dairy,POINT(-119.39844329 35.97133785)
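
In the rows above, the `ch4` value of `emissions_quantity` equals `activity` multiplied by `emissions_factor` (for example, 9000 × 0.128 = 1152.0). Whether that holds for the whole file is an assumption worth checking. A minimal sketch of such a check follows, using a hypothetical `sample` frame built from the rows shown.

```python
import numpy as np
import pandas as pd

# Stand-in frame with values copied from the rows above
sample = pd.DataFrame({
    "gas": ["ch4", "ch4", "co2e_100yr"],
    "emissions_quantity": [1152.0, 1379.328, 51163.2],
    "emissions_factor": [0.128, 0.128, 0.0],
    "activity": [9000, 10776, 9000],
})

# Restrict the check to methane rows; the co2e rows shown above carry a factor of 0.0
ch4 = sample[sample["gas"] == "ch4"]
expected = ch4["activity"] * ch4["emissions_factor"]

# Compare with a tolerance, because the product is a floating-point value
matches = np.isclose(ch4["emissions_quantity"], expected)
print(matches.all())
```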


In [48]:
# Display the last 10 rows to get a glimpse of the data frame
# The number passed to `df.tail()` is the number of rows returned
df.tail(10)

Unnamed: 0,asset_id,iso3_country,original_inventory_sector,start_time,end_time,temporal_granularity,gas,emissions_quantity,emissions_factor,emissions_factor_units,capacity,capacity_units,activity,activity_units,created_date,asset_name,asset_type,st_astext
15295,5099329,USA,enteric-fermentation,2020-01-01 00:00:00,2020-12-31 00:00:00,annual,co2,0.0,,tonnes_gas_per_animal_head,0.000539,area_sq_km_of_intensive_farm,1286,total_animal_head_count,2022-09-13 19:15:49.577877,USA_TX_beef_136,enteric_fermentation_beef,POINT(-102.6015472 34.45882395)
15296,5101236,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,n2o,0.0,0.0,tonnes_gas_per_animal_head,0.007023,area_sq_km_of_intensive_farm,1169,total_animal_head_count,2022-09-13 19:15:49.624807,USA_TX_dairy_85,enteric_fermentation_dairy,POINT(-102.5121073 36.23801092)
15297,5101236,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_100yr,4069.9904,0.0,tonnes_gas_per_animal_head,0.007023,area_sq_km_of_intensive_farm,1169,total_animal_head_count,2022-09-13 19:15:49.624807,USA_TX_dairy_85,enteric_fermentation_dairy,POINT(-102.5121073 36.23801092)
15298,5101236,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_20yr,12090.2656,0.0,tonnes_gas_per_animal_head,0.007023,area_sq_km_of_intensive_farm,1169,total_animal_head_count,2022-09-13 19:15:49.624807,USA_TX_dairy_85,enteric_fermentation_dairy,POINT(-102.5121073 36.23801092)
15299,5099868,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,n2o,0.0,0.0,tonnes_gas_per_animal_head,1.0,area_sq_km_of_intensive_farm,115,total_animal_head_count,2022-09-13 19:15:49.590226,USA_CA_dairy_11,enteric_fermentation_dairy,POINT(-124.173725 40.524349)
15300,5099868,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_100yr,400.384,0.0,tonnes_gas_per_animal_head,1.0,area_sq_km_of_intensive_farm,115,total_animal_head_count,2022-09-13 19:15:49.590226,USA_CA_dairy_11,enteric_fermentation_dairy,POINT(-124.173725 40.524349)
15301,5099868,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,co2e_20yr,1189.376,0.0,tonnes_gas_per_animal_head,1.0,area_sq_km_of_intensive_farm,115,total_animal_head_count,2022-09-13 19:15:49.590226,USA_CA_dairy_11,enteric_fermentation_dairy,POINT(-124.173725 40.524349)
15302,5099316,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,ch4,63.706,0.053,tonnes_gas_per_animal_head,0.00045,area_sq_km_of_intensive_farm,1202,total_animal_head_count,2022-09-13 19:15:49.577669,USA_TX_beef_124,enteric_fermentation_beef,POINT(-100.3350449 35.99106317)
15303,5100715,USA,enteric-fermentation,2021-01-01 00:00:00,2021-12-31 00:00:00,annual,ch4,141.312,0.128,tonnes_gas_per_animal_head,1.0,area_sq_km_of_intensive_farm,1104,total_animal_head_count,2022-09-13 19:15:49.618992,USA_CA_dairy_694,enteric_fermentation_dairy,POINT(-121.01243 37.4324)
15304,5099525,USA,enteric-fermentation,2020-01-01 00:00:00,2020-12-31 00:00:00,annual,co2e_100yr,7765.8992,0.0,tonnes_gas_per_animal_head,0.014637,area_sq_km_of_intensive_farm,5387,total_animal_head_count,2022-09-13 19:15:49.580862,USA_TX_beef_312,enteric_fermentation_beef,POINT(-102.3937642 35.1093388)
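
With `lat_lon` empty, the only location information left is the `st_astext` column, which holds text of the form `POINT(x y)`. A minimal sketch of splitting it into numeric columns follows. It assumes that the first number is the longitude and the second the latitude, which matches the California and Texas assets shown above. The column names `longitude` and `latitude` are illustrative.

```python
import pandas as pd

# Stand-in frame with two st_astext values copied from the rows above
sample = pd.DataFrame({
    "st_astext": [
        "POINT(-119.76448059 36.16134652)",
        "POINT(-124.173725 40.524349)",
    ]
})

# Capture the two numbers inside POINT(...) as named groups, then cast to float
pattern = r"POINT\(\s*(?P<longitude>-?\d+(?:\.\d+)?)\s+(?P<latitude>-?\d+(?:\.\d+)?)\s*\)"
coords = sample["st_astext"].str.extract(pattern).astype(float)

# Attach the new numeric columns to the original frame
sample = sample.join(coords)
print(sample[["longitude", "latitude"]])
```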
