# Grouping products

Using the province-level data from the **product_prices_cleaned.csv** file, answer the following questions:

1. What was the average monthly price of each commodity?
1. Which product had the highest price volatility over the years?

Use the **product** and **value** columns for analysis.

Additionally:

1. Consider whether any further assumptions are needed for these tasks.
1. Why can this analysis be done only now, after the data has been cleaned? Do you think any further operations are needed?

In [3]:
import pandas as pd

# The file is semicolon-delimited.
df = pd.read_csv(
    '../../01_Data/product_prices_cleaned.csv',
    sep=';',
    decimal=',',
)

df.head()

Unnamed: 0,province,product_types,currency,product_group_id,product_line,value,date,product2,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg,Italian head cheese - per 1kg


In [7]:
df['value'] = pd.to_numeric(df['value'], errors='coerce')
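
With `errors='coerce'`, any entry that cannot be parsed silently becomes `NaN`, so it may be worth counting how many values were lost in the conversion. The sketch below uses a small made-up series rather than the real file; `raw`, `converted` and `n_coerced` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Made-up sample values; the real 'value' column may look different.
raw = pd.Series(["21.37", "3,55", "n/a", "6.14"])

# Unparseable entries ("3,55" and "n/a" here) are turned into NaN.
converted = pd.to_numeric(raw, errors="coerce")

# Entries that were present before the conversion but are NaN afterwards.
n_coerced = int(converted.isna().sum() - raw.isna().sum())
print(f"{n_coerced} value(s) could not be parsed")
```

A non-zero count on the real data would suggest that some prices use a different format and need further cleaning before they are aggregated.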

In [9]:
average_monthly_prices = df.groupby(['product', 'date'])['value'].mean()
average_monthly_prices

product                                  date   
30% tomato concentrate - per 1kg         1999-1     3.455882
                                         1999-10    3.157059
                                         1999-11    3.272353
                                         1999-12    3.031765
                                         1999-2     3.276471
                                                      ...   
whole pickled cucumbers 0.9l - per 1pc.  2019-5     2.254118
                                         2019-6     2.134706
                                         2019-7     1.823529
                                         2019-8     2.107647
                                         2019-9     2.145882
Name: value, Length: 7559, dtype: float64
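
In the output above the months are sorted as text (`1999-1`, `1999-10`, `1999-11`, ...), because `date` is stored as a string. One possible way to get chronological order, sketched here on made-up data, is to split `date` into numeric year and month parts before grouping. The `year` and `month` columns are assumptions added for this example and are not part of the original file.

```python
import pandas as pd

# Made-up rows that mimic the 'YYYY-M' layout of the 'date' column.
sample = pd.DataFrame({
    "product": ["bread - per 1kg"] * 4,
    "date": ["1999-1", "1999-10", "1999-2", "1999-10"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Split 'YYYY-M' into integer year and month so that sorting is numeric.
parts = sample["date"].str.split("-", expand=True).astype(int)
sample["year"] = parts[0]
sample["month"] = parts[1]

# Mean price per product and month, averaged over all rows (provinces).
monthly_mean = sample.groupby(["product", "year", "month"])["value"].mean()
print(monthly_mean)
```

If the question is read as "average price per calendar month across all years", grouping by `['product', 'month']` alone would be the corresponding variant. Which reading is intended is one of the assumptions worth stating.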

In [16]:
# Standard deviation serves as the volatility measure; most volatile products first.
price_volatility = (
    df.groupby('product')['value']
    .agg(['min', 'max', 'mean', 'std'])
    .sort_values(by='std', ascending=False)
)
price_volatility

product,min,max,mean,std
natural chocolate plain - per 1kg,0.0,25.96,6.063683,7.958616
boneless beef (sirloin) - per 1kg,10.27,36.68,22.039316,7.044737
haddock fillets frozen - per 1kg,0.0,23.03,4.846429,6.503833
beef with bone (rump steak) - per 1kg,7.4,33.05,18.092197,6.313232
Hunter's sausage dried - per 1kg,0.0,29.82,18.433434,6.275892
fresh non-dressed trout - per 1kg,0.0,17.81,4.125824,5.903562
30% tomato concentrate - per 1kg,0.0,15.59,4.019039,4.302276
"salted herring, non-dressed - per 1kg",0.0,12.59,4.070728,3.830786
fresh non-dressed carp - per 1kg,0.0,8.78,2.575275,3.41011
smoked bacon with ribs - per 1kg,0.0,20.94,12.906772,3.206077
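
Several products in the table above have a minimum price of `0.0`, which may be a placeholder for a missing observation rather than a real price. Such zeros inflate the standard deviation. The raw standard deviation also favours expensive products. The sketch below, on made-up data, treats zeros as missing (an assumption that should be checked against the data source) and ranks products by the coefficient of variation, i.e. `std / mean`. The `cv` column is a hypothetical addition, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Made-up prices for two hypothetical products.
sample = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B", "B"],
    "value": [0.0, 10.0, 12.0, 20.0, 22.0, 24.0],
})

# Assumption: a price of exactly 0 marks a missing observation.
cleaned = sample.assign(value=sample["value"].replace(0, np.nan))

# groupby aggregations skip NaN, so the zeros no longer affect the statistics.
stats = cleaned.groupby("product")["value"].agg(["min", "max", "mean", "std"])

# Coefficient of variation: volatility relative to the average price level.
stats["cv"] = stats["std"] / stats["mean"]
stats = stats.sort_values(by="cv", ascending=False)
print(stats)
```

On this toy data product B has the larger standard deviation, yet product A ranks first by `cv`, which shows that the choice of volatility measure can change the answer.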
