# Aggregation


Using the data from the file **product_prices_cleaned.csv**, aggregate the prices (**value** column) for each product by month and compute the statistics `min, max, median, mean, std`:

1. exclude the national-level data from the analysis,
1. compute the statistics directly on the object returned by `groupby`,
1. write a loop that calculates these values for each province separately.

To complete the exercise, use the `agg` method and group the data by the `'product', 'date'` columns.
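As a rough illustration of the `groupby` plus `agg` pattern described above, here is a minimal sketch on a small invented frame. The column names mirror the exercise, but the `toy` DataFrame and its values are assumptions made up for the example and do not come from the CSV file.

```python
import pandas as pd

# Toy data: column names follow the exercise, values are invented.
toy = pd.DataFrame({
    "product": ["bread", "bread", "bread", "ham", "ham"],
    "date": ["2019-1", "2019-1", "2019-2", "2019-1", "2019-1"],
    "value": [2.0, 4.0, 3.0, 20.0, 22.0],
})

# Group by product and month, then compute all statistics in one agg call.
stats = (
    toy.groupby(["product", "date"])["value"]
       .agg(["min", "max", "median", "mean", "std"])
)
print(stats)
```

Note that `std` is the sample standard deviation, so a group with a single observation yields `NaN` in that column.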

In [1]:
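import pandas as pd
from IPython.display import display

# Load the cleaned price data (semicolon-separated, comma as decimal mark)
df = pd.read_csv(
    '../../01_Data/product_prices_cleaned.csv',
    sep=';',
    decimal=','
)

# Make sure prices are numeric; entries that cannot be parsed become NaN
df['value'] = pd.to_numeric(df['value'], errors='coerce')

df.head()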

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product2,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg,Italian head cheese - per 1kg


In [2]:
# Exclude the national-level rows (province == 'POLAND')

df_out_poland = df.loc[df['province'] != 'POLAND']

display(df_out_poland)

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product2,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg,Italian head cheese - per 1kg
...,...,...,...,...,...,...,...,...,...
128498,SILESIA,,PLN,2,smoked bacon with ribs - per 1kg,15.95,2015-9,smoked bacon with ribs - per 1kg,smoked bacon with ribs - per 1kg
128499,SILESIA,,PLN,2,barley groats sausage - per 1kg,4.50,2004-8,barley groats sausage - per 1kg,barley groats sausage - per 1kg
128500,KUYAVIA-POMERANIA,,PLN,2,pork meat (raw bacon) - per 1kg,12.15,2016-11,pork meat (raw bacon) - per 1kg,pork meat (raw bacon) - per 1kg
128501,ŁÓDŹ,"beet sugar white, bagged - per 1kg",PLN,3,,0.00,2012-5,"beet sugar white, bagged - per 1kg","beet sugar white, bagged - per 1kg"


In [3]:
# Statistics per product and province, computed directly on the groupby object
aggregation_statistics = (
    df_out_poland.groupby(['product', 'province'])['value']
    .agg(["min", "max", "median", "mean", "std"])
)

display(aggregation_statistics)

product,province,min,max,median,mean,std
30% tomato concentrate - per 1kg,GREATER POLAND,1.14,15.59,10.230,10.357809,2.477641
30% tomato concentrate - per 1kg,HOLY CROSS,0.00,3.81,0.000,0.080518,0.435785
30% tomato concentrate - per 1kg,KUYAVIA-POMERANIA,0.00,9.57,0.000,1.861753,2.809606
30% tomato concentrate - per 1kg,LESSER POLAND,0.00,9.62,7.410,6.177888,3.040344
30% tomato concentrate - per 1kg,LOWER SILESIA,0.00,7.09,0.000,1.461315,2.503964
...,...,...,...,...,...,...
whole pickled cucumbers 0.9l - per 1pc.,SILESIA,0.38,3.37,2.725,2.673452,0.440439
whole pickled cucumbers 0.9l - per 1pc.,SUBCARPATHIA,1.72,3.92,2.575,2.689405,0.481247
whole pickled cucumbers 0.9l - per 1pc.,WARMIA-MASURIA,0.00,2.11,0.000,0.097262,0.352599
whole pickled cucumbers 0.9l - per 1pc.,WEST POMERANIA,0.00,2.63,0.000,0.408770,0.824889


In [4]:
provinces = df_out_poland['province'].unique()
provinces

array(['SUBCARPATHIA', 'ŁÓDŹ', 'KUYAVIA-POMERANIA', 'LOWER SILESIA',
       'WARMIA-MASURIA', 'HOLY CROSS', 'WEST POMERANIA', 'PODLASKIE',
       'GREATER POLAND', 'POMERANIA', 'LESSER POLAND', 'SILESIA',
       'MASOVIA', 'LUBLIN', 'LUBUSZ', 'OPOLE'], dtype=object)
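The loop over provinces requested in the exercise could look roughly like the sketch below. It runs on a small invented frame so that it is self-contained: the `toy` data and the `stats_by_province` dictionary are hypothetical names introduced only for illustration. With the real data, `df_out_poland` and `provinces` from the cells above would take the place of `regional` and the `unique()` call.

```python
import pandas as pd

# Toy data: column names follow the exercise, values are invented.
toy = pd.DataFrame({
    "province": ["SILESIA", "SILESIA", "OPOLE", "OPOLE", "POLAND"],
    "product": ["bread", "bread", "bread", "bread", "bread"],
    "date": ["2019-1", "2019-1", "2019-1", "2019-1", "2019-1"],
    "value": [2.0, 4.0, 3.0, 5.0, 3.5],
})

# Drop the national-level rows, as in the exercise.
regional = toy.loc[toy["province"] != "POLAND"]

# One table of statistics per province, keyed by province name.
stats_by_province = {}
for province in regional["province"].unique():
    subset = regional.loc[regional["province"] == province]
    stats_by_province[province] = (
        subset.groupby(["product", "date"])["value"]
              .agg(["min", "max", "median", "mean", "std"])
    )

for province, table in stats_by_province.items():
    print(province)
    print(table)
```

Storing the results in a dictionary keeps each province's table separately accessible. Adding `'province'` to the `groupby` keys would give the same numbers in a single table without a loop.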