# Grouping products

Using the province-level data in the **product_prices_cleaned.csv** file, answer the following:

1. What was the average monthly price of each commodity?
1. Which product had the highest price volatility over the years?

Use the **product** and **value** columns for analysis.

Additionally:

1. Consider whether the tasks require any further assumptions.
1. Why is this task feasible only now, after the data has been cleaned? Are any further cleaning operations needed?
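One assumption worth stating explicitly: the sample rows printed later in this notebook contain both missing values and prices of 0.00, and a price of exactly zero is presumably a missing reading rather than a real quote. The sketch below shows how much that choice can move an average. The `sample` frame and its numbers are made up for illustration and are not taken from the file.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the real data comes from product_prices_cleaned.csv
sample = pd.DataFrame({
    'product': ['bread', 'bread', 'bread', 'bread'],
    'value': [2.0, 0.0, np.nan, 4.0],
})

# Assumption: a price of exactly 0 is a missing reading, not a real price
cleaned = sample.assign(value=sample['value'].replace(0.0, np.nan))

# mean() skips NaN, so only the zero changes the result
mean_with_zeros = sample['value'].mean()
mean_without_zeros = cleaned['value'].mean()

print(mean_with_zeros, mean_without_zeros)
```

If zeros are left in place, every mean and minimum computed below is pulled toward zero, so this decision should be made before answering either question.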

In [10]:
import pandas as pd

In [11]:
data = pd.read_csv(
  '../../01_Data/product_prices_cleaned.csv',
  sep=';',
  encoding='UTF-8',
  decimal='.'
)

In [12]:
data_by_date = data.groupby('date')

In [13]:
data_by_date

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000019F667A2D20>

In [14]:
for (key, data_date) in data_by_date:
    print(key)
    print(data_date)

1999-1
                 province                    product_types currency  \
133        WARMIA-MASURIA                              NaN      PLN   
301     KUYAVIA-POMERANIA                              NaN      PLN   
1109            PODLASKIE                              NaN      PLN   
1514               POLAND                              NaN      PLN   
1636                OPOLE                              NaN      PLN   
...                   ...                              ...      ...   
126270      LOWER SILESIA                              NaN      PLN   
126697               ŁÓDŹ                              NaN      PLN   
126707              OPOLE  fresh chicken eggs - per 10pcs.      PLN   
127579      LOWER SILESIA      apple juice, boxed - per 1l      PLN   
127721         HOLY CROSS                              NaN      PLN   

        product_group_id                              product_line  value  \
133                    2                 pork ham cooked - per 

**Display the available groups:**

In [15]:
data_by_date.groups.keys()

dict_keys(['1999-1', '1999-10', '1999-11', '1999-12', '1999-2', '1999-3', '1999-4', '1999-5', '1999-6', '1999-7', '1999-8', '1999-9', '2000-1', '2000-10', '2000-11', '2000-12', '2000-2', '2000-3', '2000-4', '2000-5', '2000-6', '2000-7', '2000-8', '2000-9', '2001-1', '2001-10', '2001-11', '2001-12', '2001-2', '2001-3', '2001-4', '2001-5', '2001-6', '2001-7', '2001-8', '2001-9', '2002-1', '2002-10', '2002-11', '2002-12', '2002-2', '2002-3', '2002-4', '2002-5', '2002-6', '2002-7', '2002-8', '2002-9', '2003-1', '2003-10', '2003-11', '2003-12', '2003-2', '2003-3', '2003-4', '2003-5', '2003-6', '2003-7', '2003-8', '2003-9', '2004-1', '2004-10', '2004-11', '2004-12', '2004-2', '2004-3', '2004-4', '2004-5', '2004-6', '2004-7', '2004-8', '2004-9', '2005-1', '2005-10', '2005-11', '2005-12', '2005-2', '2005-3', '2005-4', '2005-5', '2005-6', '2005-7', '2005-8', '2005-9', '2006-1', '2006-10', '2006-11', '2006-12', '2006-2', '2006-3', '2006-4', '2006-5', '2006-6', '2006-7', '2006-8', '2006-9', '2007
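Note that the keys above are strings, so they sort lexicographically: `'1999-10'` comes before `'1999-2'`. A minimal sketch of one way to get chronological order, by splitting each key into integer year and month parts and building a timestamp for the first day of the month. The variable names are illustrative.

```python
import pandas as pd

# A few keys in the lexicographic order shown in the output above
keys = ['1999-1', '1999-10', '1999-11', '1999-12', '1999-2', '1999-3']

# Split 'YYYY-M' into integer year and month columns
parts = pd.Series(keys).str.split('-', expand=True).astype(int)
parts.columns = ['year', 'month']

# Assemble a timestamp for the first day of each month
month_start = pd.to_datetime(parts.assign(day=1))

# Sort the original keys by their timestamp
ordered = pd.Series(keys, index=pd.DatetimeIndex(month_start)).sort_index()
print(ordered.tolist())
```

The same conversion could be applied to the `date` column before grouping, so that grouped results come out in calendar order.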

**`get_group()` selects the subset we are interested in:**

In [16]:
data_by_date.get_group('1999-1')

,province,product_types,currency,product_group_id,product_line,value,date,product
133,WARMIA-MASURIA,,PLN,2,pork ham cooked - per 1kg,15.43,1999-1,pork ham cooked - per 1kg
301,KUYAVIA-POMERANIA,,PLN,4,buckwheat groats roasted whole - per 1kg,0.00,1999-1,buckwheat groats roasted whole - per 1kg
1109,PODLASKIE,,PLN,4,plain mixed bread (wheat-rye) - per 1kg,,1999-1,plain mixed bread (wheat-rye) - per 1kg
1514,POLAND,,PLN,4,buckwheat groats roasted whole - per 1kg,2.99,1999-1,buckwheat groats roasted whole - per 1kg
1636,OPOLE,,PLN,2,Hunter's sausage dried - per 1kg,16.61,1999-1,Hunter's sausage dried - per 1kg
...,...,...,...,...,...,...,...,...
126270,LOWER SILESIA,,PLN,2,barley groats sausage - per 1kg,3.27,1999-1,barley groats sausage - per 1kg
126697,ŁÓDŹ,,PLN,2,fresh non-dressed carp - per 1kg,0.00,1999-1,fresh non-dressed carp - per 1kg
126707,OPOLE,fresh chicken eggs - per 10pcs.,PLN,3,,0.29,1999-1,fresh chicken eggs - per 10pcs.
127579,LOWER SILESIA,"apple juice, boxed - per 1l",PLN,1,,1.84,1999-1,"apple juice, boxed - per 1l"


In [17]:
data

,province,product_types,currency,product_group_id,product_line,value,date,product
0,SUBCARPATHIA,,PLN,2,pork ham cooked - per 1kg,21.37,2013-3,pork ham cooked - per 1kg
1,ŁÓDŹ,,PLN,4,bread - per 1kg,,2018-2,bread - per 1kg
2,KUYAVIA-POMERANIA,,PLN,2,barley groats sausage - per 1kg,3.55,2019-12,barley groats sausage - per 1kg
3,LOWER SILESIA,,PLN,2,dressed chickens - per 1kg,6.14,2019-2,dressed chickens - per 1kg
4,WARMIA-MASURIA,,PLN,2,Italian head cheese - per 1kg,5.63,2002-3,Italian head cheese - per 1kg
...,...,...,...,...,...,...,...,...
128515,SILESIA,,PLN,2,smoked bacon with ribs - per 1kg,15.95,2015-9,smoked bacon with ribs - per 1kg
128516,SILESIA,,PLN,2,barley groats sausage - per 1kg,4.50,2004-8,barley groats sausage - per 1kg
128517,KUYAVIA-POMERANIA,,PLN,2,pork meat (raw bacon) - per 1kg,12.15,2016-11,pork meat (raw bacon) - per 1kg
128518,ŁÓDŹ,"beet sugar white, bagged - per 1kg",PLN,3,,0.00,2012-5,"beet sugar white, bagged - per 1kg"


In [18]:
data_by_product = data.groupby(by=["product","date"])

In [19]:
data_by_product.groups.keys()

dict_keys([('30% tomato concentrate - per 1kg', '1999-1'), ('30% tomato concentrate - per 1kg', '1999-10'), ('30% tomato concentrate - per 1kg', '1999-11'), ('30% tomato concentrate - per 1kg', '1999-12'), ('30% tomato concentrate - per 1kg', '1999-2'), ('30% tomato concentrate - per 1kg', '1999-3'), ('30% tomato concentrate - per 1kg', '1999-4'), ('30% tomato concentrate - per 1kg', '1999-5'), ('30% tomato concentrate - per 1kg', '1999-6'), ('30% tomato concentrate - per 1kg', '1999-7'), ('30% tomato concentrate - per 1kg', '1999-8'), ('30% tomato concentrate - per 1kg', '1999-9'), ('30% tomato concentrate - per 1kg', '2000-1'), ('30% tomato concentrate - per 1kg', '2000-10'), ('30% tomato concentrate - per 1kg', '2000-11'), ('30% tomato concentrate - per 1kg', '2000-12'), ('30% tomato concentrate - per 1kg', '2000-2'), ('30% tomato concentrate - per 1kg', '2000-3'), ('30% tomato concentrate - per 1kg', '2000-4'), ('30% tomato concentrate - per 1kg', '2000-5'), ('30% tomato concentrat

In [20]:
data_by_product.get_group(('30% tomato concentrate - per 1kg', '1999-1'))['value'].mean()

3.4558823529411766

In [21]:
data_by_product.get_group(("Backpacker's canned pork meat - per 300 g", '2008-4'))['value'].median()

2.26

In [22]:
for (key, data_by) in data_by_product:
    print(key, data_by['value'].mean()) 

('30% tomato concentrate - per 1kg', '1999-1') 3.4558823529411766
('30% tomato concentrate - per 1kg', '1999-10') 3.157058823529412
('30% tomato concentrate - per 1kg', '1999-11') 3.2723529411764707
('30% tomato concentrate - per 1kg', '1999-12') 3.031764705882353
('30% tomato concentrate - per 1kg', '1999-2') 3.2764705882352945
('30% tomato concentrate - per 1kg', '1999-3') 3.0511764705882354
('30% tomato concentrate - per 1kg', '1999-4') 2.9288235294117646
('30% tomato concentrate - per 1kg', '1999-5') 3.2405882352941173
('30% tomato concentrate - per 1kg', '1999-6') 2.9058823529411764
('30% tomato concentrate - per 1kg', '1999-7') 3.07
('30% tomato concentrate - per 1kg', '1999-8') 3.0358823529411763
('30% tomato concentrate - per 1kg', '1999-9') 3.192941176470588
('30% tomato concentrate - per 1kg', '2000-1') 2.831764705882353
('30% tomato concentrate - per 1kg', '2000-10') 2.420588235294118
('30% tomato concentrate - per 1kg', '2000-11') 2.6805882352941177
('30% tomato concentrate
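The loop above can also be written as a single grouped aggregation, which returns the monthly averages as a Series indexed by product and date instead of printing them. This is a sketch on a small made-up frame that only borrows the column names `product`, `date` and `value` from the data.

```python
import pandas as pd

# Illustrative rows only; column names follow the notebook's data
sample = pd.DataFrame({
    'product': ['bread', 'bread', 'bread', 'eggs', 'eggs'],
    'date': ['1999-1', '1999-1', '1999-2', '1999-1', '1999-1'],
    'value': [2.0, 4.0, 5.0, 0.5, 1.5],
})

# One mean per (product, date) pair, averaged over provinces
monthly_mean = sample.groupby(['product', 'date'])['value'].mean()
print(monthly_mean)

# Optional: pivot the months into columns for a product-by-month table
monthly_table = monthly_mean.unstack('date')
print(monthly_table)
```

Months with no observations for a product appear as `NaN` in the pivoted table.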

In [23]:
data_by_product[['value']].agg(['min', 'max', 'mean'])

,,value,value,value
product,date,min,max,mean
30% tomato concentrate - per 1kg,1999-1,0.0,8.03,3.455882
30% tomato concentrate - per 1kg,1999-10,0.0,7.46,3.157059
30% tomato concentrate - per 1kg,1999-11,0.0,8.21,3.272353
30% tomato concentrate - per 1kg,1999-12,0.0,7.87,3.031765
30% tomato concentrate - per 1kg,1999-2,0.0,7.73,3.276471
...,...,...,...,...
whole pickled cucumbers 0.9l - per 10pcs.,2019-5,0.0,4.83,2.254118
whole pickled cucumbers 0.9l - per 10pcs.,2019-6,0.0,5.37,2.134706
whole pickled cucumbers 0.9l - per 10pcs.,2019-7,0.0,3.76,1.823529
whole pickled cucumbers 0.9l - per 10pcs.,2019-8,0.0,3.92,2.107647
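The two-level column header in the result above can be avoided with pandas named aggregation, which produces flat column names. This is a sketch on made-up rows, and the output column names `min_price`, `max_price` and `mean_price` are illustrative choices.

```python
import pandas as pd

# Illustrative rows only; column names follow the notebook's data
sample = pd.DataFrame({
    'product': ['bread', 'bread', 'bread', 'eggs'],
    'date': ['1999-1', '1999-1', '1999-2', '1999-1'],
    'value': [2.0, 4.0, 5.0, 1.0],
})

# Named aggregation: new_column=(source_column, aggregation)
stats = sample.groupby(['product', 'date']).agg(
    min_price=('value', 'min'),
    max_price=('value', 'max'),
    mean_price=('value', 'mean'),
)
print(stats)
```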


In [24]:
max_price_per_product = data_by_product['value'].max()

In [25]:
max_price_product = max_price_per_product.idxmax()

In [26]:
max_price_product

('30% tomato concentrate - per 1kg', '2003-1')

In [27]:
max_price_value = max_price_per_product.max()

In [28]:
max_price_value

3000.0

In [29]:
print(f"Highest price: {max_price_value} - product: {max_price_product}")

Highest price: 3000.0 - product: ('30% tomato concentrate - per 1kg', '2003-1')
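A price of 3000.0 for a product whose monthly averages are around 3 looks like a data-entry error rather than a real quote, so it is worth inspecting before trusting any maximum or spread. The sketch below flags observations that are far above their product's median. The rows are made up, and the threshold `RATIO_LIMIT` is a hypothetical value chosen only for illustration.

```python
import pandas as pd

# Illustrative rows only, with one implausible price
sample = pd.DataFrame({
    'product': ['tomato concentrate'] * 5 + ['bread'] * 4,
    'value': [3.0, 3.2, 2.9, 3.1, 3000.0, 2.0, 2.1, 1.9, 2.2],
})

# Median price of each product, repeated on every row of that product
product_median = sample.groupby('product')['value'].transform('median')

# Hypothetical rule: flag prices more than 10 times the product median
RATIO_LIMIT = 10
suspicious = sample[sample['value'] > RATIO_LIMIT * product_median]
print(suspicious)
```

Whether flagged rows should be corrected, dropped, or kept is a judgment call that depends on how the source data was collected.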


In [30]:
# Spread of prices across provinces within each (product, month) group
price_volatility = data_by_product['value'].std()

In [31]:
# Spread of all observations of a product, across every province and month
volatility_by_product = data.groupby('product')['value'].std()

In [32]:
most_volatile_product = volatility_by_product.idxmax()
highest_volatility = volatility_by_product.max()

In [39]:
print(f"Highest volatility: {highest_volatility} - product: {most_volatile_product}")

Highest volatility: 188.42514141537558 - product: 30% tomato concentrate - per 1kg
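The standard deviation of raw observations mixes differences between provinces with changes over time, and it favours expensive products because it is measured in PLN. One possible alternative, sketched here on made-up rows, is to average over provinces first and then compare products by the coefficient of variation (standard deviation divided by mean) of their monthly averages. This is a different definition of volatility from the one used above, not a correction of it.

```python
import pandas as pd

# Illustrative rows only: two provinces per month, two months per product
sample = pd.DataFrame({
    'product': ['ham'] * 4 + ['eggs'] * 4,
    'date': ['1999-1', '1999-1', '1999-2', '1999-2'] * 2,
    'value': [19.0, 21.0, 20.0, 22.0, 0.2, 0.2, 0.4, 0.4],
})

# Step 1: one average price per product and month
monthly_mean = sample.groupby(['product', 'date'])['value'].mean()

# Step 2: spread of those monthly averages over time, per product
by_product = monthly_mean.groupby(level='product')
absolute_spread = by_product.std()

# Step 3: divide by the mean so cheap and expensive products are comparable
cv = absolute_spread / by_product.mean()

print(absolute_spread.idxmax(), cv.idxmax())
```

In this toy example the absolute spread points at the expensive product while the relative measure points at the cheap one, which shows that the answer to the volatility question depends on the definition chosen.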
