### A small example of how the data cleaning works, using a single CSV file

In [2]:
%load_ext autoreload

%autoreload 2

In [3]:
from fao_ada.pre_processing.load import load_and_clean_df
DIR = "../"
csv_file = DIR + "data/emissions_agriculture/Emissions_Agriculture_Rice_Cultivation_E_All_Data_(Normalized).csv"
# Item-group file (disabled here); the item-grouping cell below needs `item_groups` to be defined.
#item_groups = DIR + "data/item_groups/Emissions_Agriculture_Crop_residues.csv"
country_groups = DIR + "data/country_groups.csv"

df = load_and_clean_df(csv_file, country_groups, None)

In [5]:
df.to_csv("../data_cleaned/emissions_agriculture/Emissions_Agriculture_Rice_Cultivation_E_All_Data_(Normalized).csv")

The dataframe now contains only `itemcode` values that are not themselves `itemgroups`, and every remaining `itemcode` belongs to at least one `itemgroup`.
It also contains no `countrygroup` rows.
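
One way to sanity-check the two properties above is to test whether any group code still appears in the cleaned dataframe. The helper below is a hypothetical sketch, not part of `fao_ada`; the toy frame and the group codes (420 for a country group, 1712 for an item group) are only illustrative, and the real group codes would come from the group CSV files.

```python
import pandas as pd

def contains_group_rows(df, group_codes, column):
    """Return True if any value in `df[column]` is one of `group_codes`."""
    return bool(df[column].isin(group_codes).any())

# Toy stand-in for a cleaned dataframe (hypothetical values).
cleaned = pd.DataFrame({"areacode": [2, 3], "itemcode": [44, 15]})

has_country_groups = contains_group_rows(cleaned, {420}, "areacode")
has_item_groups = contains_group_rows(cleaned, {1712}, "itemcode")
```

Both results should be `False` for a dataframe that was cleaned as described.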

### 1. Obtaining item groups
Grouping sums the values of all items that belong to an `itemgroup`. For some elements this is a problem, because their values are not really "addable"; to handle this, you can pass an optional list of `elementcode` values to drop.
The `flag` column takes one of two values:
- `M`: not all items of the group are present for this datapoint (i.e. a value may be missing)
- `C`: all items of the group are present for this datapoint and have a value
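
The summing and flagging rule can be sketched in a few lines of pandas. This is a minimal illustration under assumptions, not the actual `groupby_item_groups` implementation: the function `sum_item_group`, its parameters, and the toy data are hypothetical, and only the column names are taken from the dataframes shown in this notebook.

```python
import pandas as pd

def sum_item_group(df, members, group_code, group_name, drop_elements=()):
    """Sum `value` over the member items of one group and flag completeness."""
    # Keep rows for member items only, minus elements that should not be summed.
    rows = df[df["itemcode"].isin(members) & ~df["elementcode"].isin(drop_elements)]
    # A member only counts as present if it actually has a value.
    rows = rows.dropna(subset=["value"])
    out = rows.groupby(["areacode", "elementcode", "year"], as_index=False).agg(
        value=("value", "sum"),
        n_items=("itemcode", "nunique"),
    )
    # C: every member item contributed a value; M: at least one is missing.
    n_members = len(set(members))
    out["flag"] = ["C" if n == n_members else "M" for n in out["n_items"]]
    out["itemcode"] = group_code
    out["item"] = group_name
    return out.drop(columns="n_items")

# Toy data: Barley (44) and Wheat (15) in 1961, only Barley in 1962.
toy = pd.DataFrame({
    "areacode": [2, 2, 2],
    "itemcode": [44, 15, 44],
    "elementcode": [72392, 72392, 72392],
    "year": [1961, 1961, 1962],
    "value": [10.0, 5.0, 7.0],
})
result = sum_item_group(toy, members=[44, 15], group_code=1712, group_name="All Crops")
```

In this toy case the 1961 row is flagged `C` because both members contribute, while the 1962 row is flagged `M` because Wheat is missing.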

In [4]:
from fao_ada.pre_processing.load import load_dataframe
from fao_ada.pre_processing.grouping import groupby_country_groups, groupby_item_groups

In [5]:
grouped_by_item = groupby_item_groups(df, load_dataframe(item_groups))

In [6]:
df.head()

Unnamed: 0,areacode,area,itemcode,item,elementcode,element,year,unit,value
0,2,Afghanistan,44,Barley,72392,Residues (Crop residues),1961,kg of nutrients,5925706.0
1,2,Afghanistan,44,Barley,72392,Residues (Crop residues),1962,kg of nutrients,5925706.0
2,2,Afghanistan,44,Barley,72392,Residues (Crop residues),1963,kg of nutrients,5925706.0
3,2,Afghanistan,44,Barley,72392,Residues (Crop residues),1964,kg of nutrients,5946927.0
4,2,Afghanistan,44,Barley,72392,Residues (Crop residues),1965,kg of nutrients,5946927.0


In [7]:
grouped_by_item

Unnamed: 0,areacode,area,elementcode,element,unit,year,value,itemcode,item,flag
0,1,Armenia,72292,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1992,8.590000e-02,1712,All Crops,M
1,1,Armenia,72292,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1993,8.610000e-02,1712,All Crops,M
2,1,Armenia,72292,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1994,8.600000e-02,1712,All Crops,M
3,1,Armenia,72292,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1995,8.590000e-02,1712,All Crops,M
4,1,Armenia,72292,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1996,8.590000e-02,1712,All Crops,M
...,...,...,...,...,...,...,...,...,...,...
83843,351,China,72392,Residues (Crop residues),kg of nutrients,2015,6.564785e+09,1712,All Crops,C
83844,351,China,72392,Residues (Crop residues),kg of nutrients,2016,6.540011e+09,1712,All Crops,C
83845,351,China,72392,Residues (Crop residues),kg of nutrients,2017,6.569586e+09,1712,All Crops,C
83846,351,China,72392,Residues (Crop residues),kg of nutrients,2030,5.446948e+09,1712,All Crops,C


### 2. Obtaining country groups
Country grouping works the same way as item grouping, but aggregates over country groups. The same optional `drop_elements` argument can be passed to drop elements that are not addable.
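
To illustrate why some elements are dropped before aggregating, consider an implied emission factor: it is a ratio, so summing it across countries gives a meaningless number. The sketch below filters such elements out and then sums by group. The `membership` table layout, the `groupcode` column, and all codes other than the element codes are hypothetical; the real layout of `country_groups.csv` and the internals of `groupby_country_groups` are not shown in this notebook.

```python
import pandas as pd

# Hypothetical membership table: one row per (country group, member country).
membership = pd.DataFrame({
    "groupcode": [9000, 9000],
    "areacode": [1, 2],
})

toy = pd.DataFrame({
    "areacode": [1, 2, 1, 2],
    "itemcode": [15, 15, 15, 15],
    "elementcode": [72392, 72392, 72292, 72292],
    "year": [1961, 1961, 1961, 1961],
    "value": [10.0, 5.0, 0.2, 0.3],
})

# 72292 is the implied emission factor: a ratio, not a quantity to sum.
drop_elements = [72292]

addable = toy[~toy["elementcode"].isin(drop_elements)]
by_group = (
    addable.merge(membership, on="areacode")
    .groupby(["groupcode", "itemcode", "elementcode", "year"], as_index=False)["value"]
    .sum()
)
```

Only the residues element (72392) survives the filter, so `by_group` holds a single row with the summed value for the hypothetical group.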

In [8]:
grouped_by_country = groupby_country_groups(df, load_dataframe(country_groups))

In [9]:
grouped_by_country.head()

Unnamed: 0,itemcode,item,elementcode,element,unit,year,value,areacode,area,flag,index
0,15.0,Wheat,72292.0,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1961.0,0.2334,420,Sub-Saharan Africa,M,
1,15.0,Wheat,72292.0,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1962.0,0.2331,420,Sub-Saharan Africa,M,
2,15.0,Wheat,72292.0,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1963.0,0.2331,420,Sub-Saharan Africa,M,
3,15.0,Wheat,72292.0,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1964.0,0.2332,420,Sub-Saharan Africa,M,
4,15.0,Wheat,72292.0,Implied emission factor for N2O (Crop residues),kg N2O-N/kg N,1965.0,0.2334,420,Sub-Saharan Africa,M,


The same `M`/`C` flagging system applies here.