<div align="center"> <h1>Asset Manure-Management Emissions CSV</h1></div>

### About This Notebook
This notebook explores the contents of the `asset_manure-management_emissions.csv` file.
It applies several exploratory techniques to analyze the data and perform any necessary cleanup.

In [1]:
# Import necessary libraries to use for the exploration & analysis
import csv 
import pandas as pd
import matplotlib.pyplot as plt

# Raise the maximum number of columns pandas displays so that wide data frames are shown in full
pd.set_option('display.max_columns', 200)

In [2]:
# Load the CSV file into a pandas data frame
df = pd.read_csv("../data/agriculture/asset_manure-management_emissions.csv")

In [3]:
# Show the shape of the data frame (rows, columns)
# The returned value is the number of rows and columns in the dataframe
df.shape

(15245, 21)

In [4]:
# Show the first 5 rows by using the head command
df.head()

Unnamed: 0,asset_id,iso3_country,original_inventory_sector,start_time,end_time,temporal_granularity,lat_lon,gas,emissions_quantity,emissions_factor,emissions_factor_units,capacity,capacity_units,capacity_factor,activity,activity_units,created_date,modified_date,asset_name,asset_type,st_astext
0,5096624,USA,manure-management,2020-01-01 00:00:00,2020-12-31 00:00:00,annual,,co2,0.0,0.0,tonnes_gas_per_animal_head,0.001023,area_sq_meters_of_intensive_farm,,1654,total_animal_head_count,2022-09-13 19:15:27.534582,2022-10-14 21:11:31.952288,USA_CA_beef_100,manure_management_beef,POINT(-121.14280701 37.69197101)
1,5096624,USA,manure-management,2020-01-01 00:00:00,2020-12-31 00:00:00,annual,,ch4,3.308,0.002,tonnes_gas_per_animal_head,0.001023,area_sq_meters_of_intensive_farm,,1654,total_animal_head_count,2022-09-13 19:15:27.534582,2022-10-14 21:11:31.952288,USA_CA_beef_100,manure_management_beef,POINT(-121.14280701 37.69197101)
2,5096624,USA,manure-management,2020-01-01 00:00:00,2020-12-31 00:00:00,annual,,n2o,16.375634,0.009901,tonnes_gas_per_animal_head,0.001023,area_sq_meters_of_intensive_farm,,1654,total_animal_head_count,2022-09-13 19:15:27.534582,2022-10-14 21:11:31.952288,USA_CA_beef_100,manure_management_beef,POINT(-121.14280701 37.69197101)
3,5096624,USA,manure-management,2020-01-01 00:00:00,2020-12-31 00:00:00,annual,,co2e_100yr,4415.79131,0.009901,tonnes_gas_per_animal_head,0.001023,area_sq_meters_of_intensive_farm,,1654,total_animal_head_count,2022-09-13 19:15:27.534582,2022-10-14 21:11:31.952288,USA_CA_beef_100,manure_management_beef,POINT(-121.14280701 37.69197101)
4,5096624,USA,manure-management,2020-01-01 00:00:00,2020-12-31 00:00:00,annual,,co2e_20yr,4617.414944,0.009901,tonnes_gas_per_animal_head,0.001023,area_sq_meters_of_intensive_farm,,1654,total_animal_head_count,2022-09-13 19:15:27.534582,2022-10-14 21:11:31.952288,USA_CA_beef_100,manure_management_beef,POINT(-121.14280701 37.69197101)
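
In the rows above, `lat_lon` is empty while `st_astext` holds the asset location as a WKT string such as `POINT(-121.14280701 37.69197101)`. The following is a minimal sketch of how those strings could be split into numeric columns. It assumes every value is a simple 2D `POINT` and that the first number is the longitude, as WKT lists x before y. The `sample` frame and the `lon`/`lat` column names are hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical sample: the first value is copied from the output above,
# the second is invented purely for illustration.
sample = pd.DataFrame({
    "st_astext": [
        "POINT(-121.14280701 37.69197101)",
        "POINT(10.5 -3.25)",
    ]
})

# WKT points list x (longitude) first, then y (latitude).
# Named groups become the column names of the extracted frame.
pattern = r"POINT\s*\(\s*(?P<lon>-?\d+(?:\.\d+)?)\s+(?P<lat>-?\d+(?:\.\d+)?)\s*\)"
coords = sample["st_astext"].str.extract(pattern).astype(float)

# Attach the parsed coordinates next to the original text column.
sample = sample.join(coords)
print(sample[["lon", "lat"]])
```

Rows that do not match the pattern would come back as `NaN`, which makes malformed geometries easy to spot before relying on the parsed values.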


In [5]:
# List the columns together with their positional index
columns = df.columns

print("Column Index\t Column Name")
for index, column in enumerate(columns):
    print(f"{index}\t\t {column}")

Column Index	 Column Name
0		 asset_id
1		 iso3_country
2		 original_inventory_sector
3		 start_time
4		 end_time
5		 temporal_granularity
6		 lat_lon
7		 gas
8		 emissions_quantity
9		 emissions_factor
10		 emissions_factor_units
11		 capacity
12		 capacity_units
13		 capacity_factor
14		 activity
15		 activity_units
16		 created_date
17		 modified_date
18		 asset_name
19		 asset_type
20		 st_astext


In [6]:
# Check the data type of each column
df.dtypes

asset_id                       int64
iso3_country                  object
original_inventory_sector     object
start_time                    object
end_time                      object
temporal_granularity          object
lat_lon                      float64
gas                           object
emissions_quantity           float64
emissions_factor             float64
emissions_factor_units        object
capacity                     float64
capacity_units                object
capacity_factor              float64
activity                       int64
activity_units                object
created_date                  object
modified_date                 object
asset_name                    object
asset_type                    object
st_astext                     object
dtype: object

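#### Key to the pandas data types shown above
int64 => 64-bit integer

object => generic Python object; in this data frame, text (string) values, including the date and time columns

float64 => 64-bit floating-point number; pandas also assigns this type to columns that contain no values, such as lat_lon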
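
The `start_time`, `end_time`, `created_date`, and `modified_date` columns are listed as `object`, so pandas is treating them as plain text. One possible cleanup, sketched here on a hypothetical one-row `sample` frame built from the values shown in the `head()` output, is to convert them with `pd.to_datetime`. Whether the full file parses cleanly is an assumption that would need to be checked against the real data.

```python
import pandas as pd

# Hypothetical one-row sample using the timestamp strings shown in the head() output.
sample = pd.DataFrame({
    "start_time": ["2020-01-01 00:00:00"],
    "end_time": ["2020-12-31 00:00:00"],
    "created_date": ["2022-09-13 19:15:27.534582"],
    "modified_date": ["2022-10-14 21:11:31.952288"],
})

date_columns = ["start_time", "end_time", "created_date", "modified_date"]

# Convert each text column to a datetime column, one at a time.
for col in date_columns:
    sample[col] = pd.to_datetime(sample[col])

print(sample.dtypes)
```

After the conversion, the columns support date arithmetic and accessors such as `sample["start_time"].dt.year`.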

In [7]:
# Check the overall distribution of the numerical columns
df.describe()

Unnamed: 0,asset_id,lat_lon,emissions_quantity,emissions_factor,capacity,capacity_factor,activity
count,15245.0,0.0,15245.0,15245.0,15245.0,0.0,15245.0
mean,18309050.0,,7626.359461,0.014695,0.322632,,5486.555592
std,154973700.0,,26798.956391,0.019653,0.544975,,13196.685771
min,5096622.0,,0.0,0.0,6e-06,,100.0
25%,5097043.0,,4.532,0.002,0.002122,,784.0
50%,5097501.0,,59.285,0.009901,0.02044,,1593.0
75%,5098268.0,,5704.71524,0.015458,0.915607,,3582.0
max,1836078000.0,,851531.0,0.128,4.694419,,188000.0
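
The `count` row above is 0 for `lat_lon` and `capacity_factor`, which means neither column contains any values. A minimal sketch of how such columns could be detected and dropped follows. The `sample` frame is a hypothetical stand-in for `df`, and dropping the columns is only one possible cleanup choice, not necessarily the one this notebook takes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: two populated columns and two entirely empty ones.
sample = pd.DataFrame({
    "asset_id": [5096624, 5096624],
    "lat_lon": [np.nan, np.nan],
    "capacity_factor": [np.nan, np.nan],
    "activity": [1654, 1654],
})

# isna().all() is True for a column only when every value in it is missing.
empty_columns = sample.columns[sample.isna().all()].tolist()
print(empty_columns)  # ['lat_lon', 'capacity_factor']

# Drop the empty columns into a new frame, leaving the original untouched.
cleaned = sample.drop(columns=empty_columns)
print(cleaned.columns.tolist())  # ['asset_id', 'activity']
```

`sample.dropna(axis=1, how="all")` would give the same result in a single call; listing the columns first makes it explicit what is being removed.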


In [8]:
# Show a concise summary of the data frame: column dtypes and non-null counts
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15245 entries, 0 to 15244
Data columns (total 21 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   asset_id                   15245 non-null  int64  
 1   iso3_country               15245 non-null  object 
 2   original_inventory_sector  15245 non-null  object 
 3   start_time                 15245 non-null  object 
 4   end_time                   15245 non-null  object 
 5   temporal_granularity       15245 non-null  object 
 6   lat_lon                    0 non-null      float64
 7   gas                        15245 non-null  object 
 8   emissions_quantity         15245 non-null  float64
 9   emissions_factor           15245 non-null  float64
 10  emissions_factor_units     15135 non-null  object 
 11  capacity                   15245 non-null  float64
 12  capacity_units             15135 non-null  object 
 13  capacity_factor            0 non-null      flo