# Harmonized subnational crop statistics of the EU

## Data access

The data are available from the [Agri4Cast Data Portal](https://agri4cast.jrc.ec.europa.eu/DataPortal/RequestDataResource.aspx?idResource=36&o=&r=n) of the European Commission's Joint Research Centre. You need to create an account to access them.

## Data exploration

In [5]:
import pandas as pd

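# Local folder containing the downloaded CSV; adjust to your environment.
path_to_data = "C:/Users/paude006/Documents/AgML/Data"
filename = "crop_statistics_EU.csv"

# The file is semicolon-delimited, with column names in the first row.
crop_stats_df = pd.read_csv(path_to_data + "/" + filename,
                            delimiter=";",
                            header=0)
# Drop columns that are not used below.
crop_stats_df = crop_stats_df.drop(columns=["REGION_TRANSFORMATION", "CROP_TRANSFORMATION", "CALCULATED_VALUE"])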
crop_area_df = crop_stats_df[crop_stats_df["TYPE"] == "Area"]
print(crop_area_df.head(5).to_string())
print("\n")

crop_yield_df = crop_stats_df[crop_stats_df["TYPE"] == "Yield"]
crop_yield_df = crop_yield_df.dropna(subset=["VALUE"])
print(crop_yield_df.head(5).to_string())
print("\n")

# The first two characters of IDREGION are the country code (e.g. "AT").
countries = crop_yield_df["IDREGION"].str[:2].unique()
for cn in countries:
  crop_yield_cn_df = crop_yield_df[crop_yield_df["IDREGION"].str[:2] == cn]
  # Skip countries with at most one yield record.
  if len(crop_yield_cn_df.index) <= 1:
    continue

  min_year = crop_yield_cn_df["YEAR"].min()
  max_year = crop_yield_cn_df["YEAR"].max()
  # Number of records (one per region and crop) in the most recent year.
  num_regions = crop_yield_cn_df[crop_yield_cn_df["YEAR"] == max_year]["IDREGION"].count()
  num_data_points = crop_yield_cn_df["YEAR"].count()
  # Summary line: country, first year, last year, records in last year, total records.
  print(cn, min_year, max_year, num_regions, num_data_points)
  print(crop_yield_cn_df[crop_yield_cn_df["YEAR"] == max_year].head(5).to_string())

   YEAR IDREGION CROP_NAME  TYPE    VALUE            SOURCE COHERENCE_BETWEEN_A_P_Y ZERO_SET_AS_NULL COHERENCE_TOTAL_WHEAT
0  1990     AT11     C1110  Area  36703.0          National                       Y              NaN                   NaN
1  1991     AT11     C1110  Area  35572.0          National                       Y              NaN                   NaN
2  1992     AT11     C1110  Area  28300.0  Eurostat (REGIO)                       N              NaN                     Y
3  1993     AT11     C1110  Area  29300.0  Eurostat (REGIO)                       N              NaN                     Y
4  1994     AT11     C1110  Area  28800.0  Eurostat (REGIO)                       N              NaN                     Y


    YEAR IDREGION CROP_NAME   TYPE  VALUE    SOURCE COHERENCE_BETWEEN_A_P_Y ZERO_SET_AS_NULL COHERENCE_TOTAL_WHEAT
56  1990     AT11     C1110  Yield   4.59  National                       Y              NaN                   NaN
57  1991     AT11     C1110  Y
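
The per-country loop above can also be written as a single `groupby` aggregation, which returns the summary as a DataFrame instead of printing it. The following is a minimal sketch, assuming the column names shown in the output above. The helper `summarize_by_country` and the toy data are hypothetical and only serve as illustration.

```python
import pandas as pd

# Toy stand-in for crop_yield_df; values are made up for illustration.
toy_yield_df = pd.DataFrame({
    "YEAR": [1990, 1991, 1991, 1990, 1991],
    "IDREGION": ["AT11", "AT11", "AT12", "BE10", "BE10"],
    "VALUE": [4.5, 4.7, 5.1, 8.0, 8.2],
})

def summarize_by_country(yield_df):
    """Hypothetical helper: one summary row per country code."""
    # The first two characters of IDREGION are the country code.
    df = yield_df.assign(COUNTRY=yield_df["IDREGION"].str[:2])
    summary = df.groupby("COUNTRY").agg(
        min_year=("YEAR", "min"),
        max_year=("YEAR", "max"),
        num_data_points=("YEAR", "count"),
    )
    # Count the records in each country's most recent year.
    in_last_year = df["YEAR"] == df.groupby("COUNTRY")["YEAR"].transform("max")
    summary["num_regions"] = df[in_last_year].groupby("COUNTRY")["IDREGION"].count()
    return summary

country_summary = summarize_by_country(toy_yield_df)
print(country_summary)
```

Like the loop, this counts rows rather than distinct regions, so a region with several crops in the last year is counted once per crop.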

## Data preparation

Filter the yield data based on:
* Coherence tests (e.g. COHERENCE_BETWEEN_A_P_Y, COHERENCE_TOTAL_WHEAT)
* Crops of interest
* Countries of interest (excluding countries with too few data points)
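
The crop and country criteria in the list above could be applied along the following lines. This is only a sketch: the helper `filter_crops_and_countries`, the placeholder crop code `"OTHER"`, the placeholder region `"XX01"`, and the threshold of 2 data points are all assumptions made for illustration, not values taken from the dataset or this notebook.

```python
import pandas as pd

# Toy stand-in for crop_yield_df; values and some codes are made up.
toy_df = pd.DataFrame({
    "YEAR": [2016, 2017, 2016, 2017, 2017],
    "IDREGION": ["AT13", "AT13", "BE10", "BE10", "XX01"],
    "CROP_NAME": ["C1110", "C1110", "C1110", "OTHER", "C1110"],
    "VALUE": [3.9, 3.8, 8.7, 8.8, 5.0],
})

def filter_crops_and_countries(yield_df, crops, min_data_points):
    """Hypothetical helper: keep selected crops and countries with enough records."""
    # Keep only the crops of interest.
    df = yield_df[yield_df["CROP_NAME"].isin(crops)]
    # Country code = first two characters of IDREGION.
    country = df["IDREGION"].str[:2]
    # Map each row to the number of records its country has.
    records_per_country = country.map(country.value_counts())
    return df[records_per_country >= min_data_points]

filtered = filter_crops_and_countries(toy_df, crops=["C1110"], min_data_points=2)
print(filtered.to_string())
```

A suitable threshold depends on how the data will be used; the value here is arbitrary.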

In [6]:
# Keep only rows that pass both coherence checks.
# Rows where either flag is NaN are dropped as well, because NaN == "Y" is False.
crop_yield_df = crop_yield_df[(crop_yield_df["COHERENCE_BETWEEN_A_P_Y"] == "Y") &
                              (crop_yield_df["COHERENCE_TOTAL_WHEAT"] == "Y")]
countries = crop_yield_df["IDREGION"].str[:2].unique()
for cn in countries:
  crop_yield_cn_df = crop_yield_df[crop_yield_df["IDREGION"].str[:2] == cn]
  # Skip countries with at most one remaining yield record.
  if len(crop_yield_cn_df.index) <= 1:
    continue

  min_year = crop_yield_cn_df["YEAR"].min()
  max_year = crop_yield_cn_df["YEAR"].max()
  num_regions = crop_yield_cn_df[crop_yield_cn_df["YEAR"] == max_year]["IDREGION"].count()
  num_data_points = crop_yield_cn_df["YEAR"].count()
  # Summary line: country, first year, last year, records in last year, total records.
  print(cn, min_year, max_year, num_regions, num_data_points)
  print(crop_yield_cn_df[crop_yield_cn_df["YEAR"] == max_year].head(5).to_string())

AT 1993 2017 1 62
     YEAR IDREGION CROP_NAME   TYPE  VALUE    SOURCE COHERENCE_BETWEEN_A_P_Y ZERO_SET_AS_NULL COHERENCE_TOTAL_WHEAT
167  2017     AT13     C1110  Yield   3.86  National                       Y              NaN                     Y
BE 1993 2017 8 261
      YEAR IDREGION CROP_NAME   TYPE  VALUE            SOURCE COHERENCE_BETWEEN_A_P_Y ZERO_SET_AS_NULL COHERENCE_TOTAL_WHEAT
970   2017     BE10     C1110  Yield   8.77  Eurostat (REGIO)                       Y              NaN                     Y
1035  2017     BE22     C1110  Yield   8.85          National                       Y              NaN                     Y
1075  2017     BE21     C1110  Yield   7.66          National                       Y              NaN                     Y
1188  2017     BE24     C1110  Yield   8.81          National                       Y              NaN                     Y
1343  2017     BE31     C1110  Yield   9.09          National                       Y              NaN    