<style>
*
{
	text-align: justify;
	line-height: 1.5;
	font-family: "Arial", sans-serif;
	font-size: 12px;
}

h2, h3, h4, h5, h6
{
	font-family: "Arial", sans-serif;
	font-size: 12px;
	font-weight: bold;
}
h2
{
	font-size: 14px;
}
h1
{
	font-family: "Wingdings", sans-serif;
	font-size: 16px;
}
</style>

## EDA of the cattle and beef exports (1930 - 2020)

<!--
import data_analytics.github as github
print(github.create_jupyter_notebook_header("markcrowe-com", "agriculture-data-analytics", "notebooks/notebook-1-01-eda-irish-beef-exports.ipynb", "master"))
-->
<table style="margin: auto;"><tr><td><a href="https://mybinder.org/v2/gh/markcrowe-com/agriculture-data-analytics/master?filepath=notebooks/notebook-1-01-eda-irish-beef-exports.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open In Binder"/></a></td><td>online editors</td><td><a href="https://colab.research.google.com/github/markcrowe-com/agriculture-data-analytics/blob/master/notebooks/notebook-1-01-eda-irish-beef-exports.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a></td></tr></table>

### Objective
The objective is to provide an Exploratory Data Analysis (EDA) of the `cso-tsa04-exports-of-cattle-and-beef-1930-2020-2022-01Jan-13.csv` file provided by the <a href="https://data.cso.ie/table/TSA04" target="_new">CSO: TSA04 Table</a>. The EDA investigates the data, cleans it, and checks it for anomalies.  
### Setup
Import the required third-party Python libraries and supporting functions, and set up the data source file paths.

In [1]:
# Local
#!pip install -r script/requirements.txt --quiet
# Remote option
#!pip install -r https://github.com/markcrowe-com/data-analytics-project-template/blob/master/notebooks/script/requirements.txt?raw=true --quiet

In [2]:
from agriculture_data_analytics.project_manager import ProjectArtifactManager, ProjectAssetManager
import data_analytics.github as github
import data_analytics.exploratory_data_analysis_reports as eda_reports
import os
import pandas

In [3]:
artifact_manager = ProjectArtifactManager()
asset_manager = ProjectAssetManager()
artifact_manager.is_remote = asset_manager.is_remote = True
github.display_jupyter_notebook_data_sources([asset_manager.get_cattle_beef_exports_filepath()])
artifact_manager.is_remote = asset_manager.is_remote = False

https://github.com/markcrowe-com/agriculture-data-analytics/assets/cso-tsa04-exports-of-cattle-and-beef-1930-2020-2022-01Jan-13.csv?raw=true


### Working with the cattle and beef exports CSV file
#### Create Data Frames

In [4]:
beef_export_dataframe = pandas.read_csv(asset_manager.get_cattle_beef_exports_filepath())
beef_export_dataframe.sample(5)

| | Statistic | Year | State | UNIT | VALUE |
|---|---|---|---|---|---|
| 83 | Exports of Cattle | 2013 | State | Thousand | 180.9 |
| 88 | Exports of Cattle | 2018 | State | Thousand | 191.98 |
| 173 | Exports of Beef | 2012 | State | 000 Tonnes | 298.0 |
| 17 | Exports of Cattle | 1947 | State | Thousand | 482.77 |
| 101 | Exports of Beef | 1940 | State | 000 Tonnes | 0.3 |


#### Renaming Columns

In [7]:
# Rename the VALUE column.
# Note: VALUE also holds the "Exports of Beef" rows, which are measured in 000 tonnes.
old_to_new_column_names_dictionary = {"VALUE": "Thousand Cattle"}
beef_export_dataframe = beef_export_dataframe.rename(columns = old_to_new_column_names_dictionary)
beef_export_dataframe.head(0)

| | Statistic | Year | State | UNIT | Thousand Cattle |
|---|---|---|---|---|---|


### Data Type Analysis Quick View
Print an analysis report of each dataset.  
- Show the top five rows of the data frame as a quick sample.
- Show the data types of each column.
- Report the count of any duplicate rows.
- Report the counts of any missing values.
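
The four checks listed above can be sketched with plain pandas calls. This is a minimal illustration only: `quick_view` is a hypothetical helper, not the implementation of `eda_reports.print_dataframe_analysis_report`, and the sample rows are copied from the outputs in this notebook.

```python
import pandas


def quick_view(dataframe: pandas.DataFrame, name: str) -> dict:
    """Print a quick-view report and return the counts (hypothetical helper)."""
    summary = {
        # Number of rows that exactly repeat an earlier row.
        "duplicate_rows": int(dataframe.duplicated().sum()),
        # Number of missing values in each column.
        "missing_values": {
            column: int(count)
            for column, count in dataframe.isnull().sum().items()
        },
    }
    print(name)
    print(dataframe.head(5))  # top five rows as a quick sample
    print(dataframe.dtypes)  # data type of each column
    print("Duplicate rows:", summary["duplicate_rows"])
    print("Missing values:", summary["missing_values"])
    return summary


# Small sample taken from rows shown in this notebook.
sample_dataframe = pandas.DataFrame({
    "Statistic": ["Exports of Cattle", "Exports of Cattle", "Exports of Beef"],
    "Year": [1930, 1931, 1940],
    "VALUE": [857.88, 765.95, 0.3],
})
summary = quick_view(sample_dataframe, "sample")
```

Returning the counts as a dictionary, as well as printing them, makes it possible to assert on them later in the notebook.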

In [5]:
filename = os.path.basename(asset_manager.get_cattle_beef_exports_filepath())
eda_reports.print_dataframe_analysis_report(beef_export_dataframe, filename)

| | Statistic | Year | State | UNIT | VALUE |
|---|---|---|---|---|---|
| 0 | Exports of Cattle | 1930 | State | Thousand | 857.88 |
| 1 | Exports of Cattle | 1931 | State | Thousand | 765.95 |
| 2 | Exports of Cattle | 1932 | State | Thousand | 645.18 |
| 3 | Exports of Cattle | 1933 | State | Thousand | 589.86 |
| 4 | Exports of Cattle | 1934 | State | Thousand | 511.1 |


Statistic     object
Year           int64
State         object
UNIT          object
VALUE        float64
dtype: object

Statistic    0
Year         0
State        0
UNIT         0
VALUE        0
dtype: int64


In [11]:
beef_export_dataframe = beef_export_dataframe.drop(["UNIT"], axis="columns")
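
Dropping `UNIT` is only lossless if each `Statistic` is always reported in a single unit. The following is a minimal sketch of how that could be checked before the drop. It uses a small sample built from rows shown earlier in this notebook, and the variable names are illustrative.

```python
import pandas

# Sample rows copied from the outputs shown earlier in this notebook.
sample_dataframe = pandas.DataFrame({
    "Statistic": ["Exports of Cattle", "Exports of Cattle",
                  "Exports of Beef", "Exports of Beef"],
    "Year": [2013, 2018, 2012, 1940],
    "UNIT": ["Thousand", "Thousand", "000 Tonnes", "000 Tonnes"],
    "VALUE": [180.9, 191.98, 298.0, 0.3],
})

# Count the distinct units recorded for each statistic.
units_per_statistic = sample_dataframe.groupby("Statistic")["UNIT"].nunique()

# UNIT carries no extra information only if every statistic has exactly one unit.
unit_is_redundant = bool((units_per_statistic == 1).all())

# Keep the statistic-to-unit mapping so the unit can still be reported later.
unit_by_statistic = (
    sample_dataframe.drop_duplicates(["Statistic", "UNIT"])
    .set_index("Statistic")["UNIT"]
    .to_dict()
)
```

On the full data frame the same check would run against `beef_export_dataframe` before the `drop` call.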

### Restructure table
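
One possible restructuring, sketched here under assumptions, is to pivot the long table so that each statistic becomes its own column with one row per year. The wide column names below are hypothetical, and the sample rows are copied from the outputs earlier in this notebook. Because the sample has no year with both statistics, the result contains missing values.

```python
import pandas

# Long-format sample: one row per (Statistic, Year), as in the source CSV.
long_dataframe = pandas.DataFrame({
    "Statistic": ["Exports of Cattle", "Exports of Cattle", "Exports of Beef",
                  "Exports of Cattle", "Exports of Beef"],
    "Year": [2013, 2018, 2012, 1947, 1940],
    "VALUE": [180.9, 191.98, 298.0, 482.77, 0.3],
})

# Hypothetical column names that keep the unit visible after UNIT is dropped.
wide_column_names = {
    "Exports of Cattle": "Cattle Exports (Thousand)",
    "Exports of Beef": "Beef Exports (000 Tonnes)",
}

# Pivot so each statistic becomes a column, indexed by year.
wide_dataframe = (
    long_dataframe.pivot(index="Year", columns="Statistic", values="VALUE")
    .rename(columns=wide_column_names)
    .reset_index()
)
# Remove the leftover "Statistic" label from the column index.
wide_dataframe.columns.name = None
```

Placing the unit in each column name avoids labelling beef tonnage as cattle counts once the two statistics share a table.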

### Save Artifact
Saving the output of the notebook.

In [13]:
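# Save the cattle and beef exports data frame built above (population_dataframe is not defined in this notebook).
# The filepath getter is unchanged from the original cell; its name suggests a population artifact, so confirm it is the intended path.
beef_export_dataframe.to_csv(artifact_manager.get_population_eda_filepath(), index=False)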

Author &copy; 2021 <a href="https://github.com/markcrowe-com" target="_parent">Mark Crowe</a>. All rights reserved.