# Pivot tables

A pivot table is a well-known tool from spreadsheet applications (Excel, Google Sheets). It reshapes data into a clearer form that is better suited for analysis.

A pivot table can aggregate data using the mean, the sum, or another statistic.

## Pivot in Pandas

Pandas offers two methods for working with pivot tables:

1. **`pivot`** - reshapes a DataFrame, using the values of the given columns as the new index and column labels. It does not aggregate; it only reorganizes the data.

2. **`pivot_table`** - a generalized version of `pivot`, similar to the pivot tables in spreadsheet applications. It supports aggregation, so the aggregated values have to suit the aggregation function; with the default mean this means numeric values.
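
A minimal sketch of the difference, on a small made-up DataFrame (the columns `city`, `month` and `price` are hypothetical and exist only for this illustration): `pivot` only moves existing values into cells, while `pivot_table` first combines the rows that share a key.

```python
import pandas as pd

# Hypothetical toy data: exactly one price per (city, month) pair.
unique_rows = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "month": ["2020-01", "2020-02", "2020-01", "2020-02"],
    "price": [10.0, 12.0, 20.0, 22.0],
})

# The same data plus one extra row that repeats the ("A", "2020-01") key.
duplicated_rows = pd.concat(
    [unique_rows, pd.DataFrame({"city": ["A"], "month": ["2020-01"], "price": [14.0]})],
    ignore_index=True,
)

# pivot only reshapes: every (city, month) pair must occur at most once.
reshaped = unique_rows.pivot(index="city", columns="month", values="price")

# pivot_table aggregates rows that share a key (the mean by default).
aggregated = duplicated_rows.pivot_table(index="city", columns="month", values="price")

print(reshaped)
print(aggregated)
```

In `aggregated`, the cell for city `A` and month `2020-01` holds the mean of 10.0 and 14.0.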

---
## Loading the data

```python
import pandas as pd
```

```python
# Load the data from the file
df = pd.read_csv(
    '../Data/product_prices_cleaned.csv',  # path to the data file
    sep=';',  # column separator
)
```

```python
# Convert the date column to datetime
df['date'] = pd.to_datetime(df['date'], format='%Y-%m')
```
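
A self-contained sketch of the same two steps on an inline sample. It assumes, based on the `'%Y-%m'` format string, that the raw file stores dates as year-month strings such as `2019-12`; the sample rows are made up.

```python
import io

import pandas as pd

# Hypothetical inline sample in the assumed file layout:
# semicolon-separated columns, dates stored as "YYYY-MM".
csv_text = "province;value;date\nŁÓDŹ;3.55;2019-12\nOPOLE;6.14;2019-02\n"

sample = pd.read_csv(io.StringIO(csv_text), sep=";")

# "%Y-%m" has no day component, so each date becomes the 1st of its month.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m")
print(sample.dtypes)
print(sample)
```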

```python
df.head()
```

| | province | product_types | currency | product_group_id | product_line | value | date | product |
|---|---|---|---|---|---|---|---|---|
| 0 | SUBCARPATHIA | NaN | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-03-01 | pork ham cooked - per 1kg |
| 1 | ŁÓDŹ | NaN | PLN | 4 | bread - per 1kg | NaN | 2018-02-01 | bread - per 1kg |
| 2 | KUYAVIA-POMERANIA | NaN | PLN | 2 | barley groats sausage - per 1kg | 3.55 | 2019-12-01 | barley groats sausage - per 1kg |
| 3 | LOWER SILESIA | NaN | PLN | 2 | dressed chickens - per 1kg | 6.14 | 2019-02-01 | dressed chickens - per 1kg |
| 4 | WARMIA-MASURIA | NaN | PLN | 2 | Italian head cheese - per 1kg | 5.63 | 2002-03-01 | Italian head cheese - per 1kg |


```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128503 entries, 0 to 128502
Data columns (total 8 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   province          128503 non-null  object
 1   product_types     34255 non-null   object
 2   currency          128503 non-null  object
 3   product_group_id  128503 non-null  int64
 4   product_line      94248 non-null   object
 5   value             119935 non-null  float64
 6   date              128503 non-null  datetime64[ns]
 7   product           128503 non-null  object
dtypes: datetime64[ns](1), float64(1), int64(1), object(5)
memory usage: 7.8+ MB
```


---
## The `pivot` method

The `pivot` method reshapes data. It does not aggregate, but it can work with values of any data type.

**Parameters:**
- `data` - the DataFrame to reshape
- `index` - the column(s) whose values become the rows (index)
- `columns` - the column(s) whose values become the column labels
- `values` - the column(s) whose values fill the table (at the intersection of a row and a column)

**Important:** The columns passed to `index` and `columns` must together form a unique key (no duplicates). Otherwise pandas raises a `ValueError`.
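
To see what the parameters do, and where the extra `value` column level in the example output comes from, here is a sketch on hypothetical toy data: passing `values` as a list adds a top column level named after the value column, while a plain string does not.

```python
import pandas as pd

# Hypothetical long-format data: one row per (province, product, date).
long_df = pd.DataFrame({
    "province": ["OPOLE", "OPOLE", "LUBLIN", "LUBLIN"],
    "product": ["bread", "bread", "bread", "bread"],
    "date": ["2019-01", "2019-02", "2019-01", "2019-02"],
    "value": [2.5, 2.6, 2.4, 2.7],
})

# values as a plain string -> single-level column labels (the dates)
flat = long_df.pivot(index=["province", "product"], columns="date", values="value")

# values as a list -> an extra top column level named after the value column
nested = long_df.pivot(index=["province", "product"], columns=["date"], values=["value"])

print(flat.columns.tolist())    # ['2019-01', '2019-02']
print(nested.columns.tolist())  # [('value', '2019-01'), ('value', '2019-02')]
```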

### Example: the value of a product in a province by date

```python
# Show the value of each product in each province over time
pd.pivot(
    data=df,  # the DataFrame we want to "rotate"
    index=['province', 'product'],  # columns shown as rows
    columns=['date'],  # column shown as columns
    values=['value']  # what to show as values
)
```

| province | product | (value, 1999-01-01) | (value, 1999-02-01) | (value, 1999-03-01) | (value, 1999-04-01) | (value, 1999-05-01) | (value, 1999-06-01) | (value, 1999-07-01) | (value, 1999-08-01) | (value, 1999-09-01) | (value, 1999-10-01) | ... | (value, 2019-03-01) | (value, 2019-04-01) | (value, 2019-05-01) | (value, 2019-06-01) | (value, 2019-07-01) | (value, 2019-08-01) | (value, 2019-09-01) | (value, 2019-10-01) | (value, 2019-11-01) | (value, 2019-12-01) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GREATER POLAND | 30% tomato concentrate - per 1kg | 7.56 | 7.09 | 7.32 | 7.84 | 6.84 | 7.41 | 7.28 | 8.27 | 7.23 | 7.02 | ... | 6.78 | 5.67 | 1.14 | 3.69 | 9.71 | 5.78 | 4.96 | 6.12 | 8.33 | 6.03 |
| GREATER POLAND | Backpacker's canned pork meat - per 300 g | 2.92 | 2.88 | 2.92 | 2.85 | 2.81 | 3.05 | 3.20 | 3.06 | 2.76 | 3.08 | ... | 2.97 | 2.24 | 2.50 | 3.06 | 2.67 | 3.25 | 2.54 | 3.26 | 2.57 | 2.87 |
| GREATER POLAND | Hunter's sausage dried - per 1kg | 15.50 | 16.31 | 15.91 | 15.80 | 15.52 | 16.49 | 15.92 | 16.27 | 16.27 | 16.42 | ... | 21.59 | 17.41 | 21.49 | 17.95 | 19.25 | 19.69 | 19.45 | 22.95 | 18.12 | 20.48 |
| GREATER POLAND | Italian head cheese - per 1kg | 6.14 | 5.78 | 5.84 | 6.00 | 5.97 | 6.07 | 6.02 | 6.02 | 6.33 | 5.67 | ... | 10.13 | 5.56 | 8.75 | 7.90 | 9.52 | 7.97 | 7.56 | 9.28 | 8.94 | 10.67 |
| GREATER POLAND | Masurian barley groats - per 1kg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 2.44 | 1.57 | 3.59 | 2.33 | 2.39 | 1.84 | 1.96 | 1.19 | 1.56 | 3.25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ŁÓDŹ | pork meat with bone (shoulder) - per 1kg | 8.07 | 8.62 | 8.39 | 8.28 | 8.29 | 8.49 | 8.21 | 8.64 | 8.44 | 8.11 | ... | 10.24 | 10.57 | 10.52 | 8.79 | 9.99 | 9.28 | 11.14 | 10.49 | 9.39 | 9.67 |
| ŁÓDŹ | salted herring, non-dressed - per 1kg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 5.32 | 5.28 | 6.99 | 4.91 | 5.94 | 4.98 | 7.18 | 5.36 | 5.25 | 7.32 |
| ŁÓDŹ | smoked bacon with ribs - per 1kg | 8.01 | 8.93 | 9.01 | 8.12 | 8.74 | 8.36 | 8.91 | 8.98 | 8.23 | 8.94 | ... | 16.80 | 14.94 | 12.34 | 7.98 | 7.38 | 7.16 | 7.71 | 14.79 | 7.96 | 7.67 |
| ŁÓDŹ | white table salt bagged - per 1kg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |


### Question to think about

What happens if the combination of `index` and `columns` is not unique? Which error does Pandas raise?
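
One way to explore the question, sketched on hypothetical toy data: `DataFrame.duplicated` checks the key before pivoting, and `pivot` itself refuses to reshape a duplicated key with a `ValueError`. The exact message text may differ between pandas versions, so the sketch only captures it instead of relying on it.

```python
import pandas as pd

# Hypothetical data in which the (province, date) key appears twice for OPOLE.
dup_df = pd.DataFrame({
    "province": ["OPOLE", "OPOLE", "LUBLIN"],
    "date": ["2019-01", "2019-01", "2019-01"],
    "value": [2.5, 2.9, 2.4],
})

# Check the key for duplicates before calling pivot.
key = ["province", "date"]
has_duplicates = dup_df.duplicated(subset=key).any()
print("duplicate keys:", has_duplicates)

try:
    dup_df.pivot(index="province", columns="date", values="value")
    error_message = None
except ValueError as exc:
    # pivot cannot put two values into one cell, so it refuses to reshape
    error_message = str(exc)

print(error_message)
```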

---
## The `pivot_table` method

A method similar to the pivot tables in spreadsheet applications. It computes summary statistics of a DataFrame.

Unlike `pivot`, the (index, columns) pairs do not have to be unique: all rows that share a pair are combined by the aggregation function. The aggregated columns therefore have to suit that function; for the default mean they must be numeric.

**Parameters:**
- `data` - the DataFrame to process
- `values` - the column(s) to aggregate
- `index` - the column(s) whose values become the rows (index)
- `columns` - the column(s) whose values become the column labels
- `aggfunc` - the function used to aggregate the values; the default is the mean (`'mean'`)
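
A sketch of these parameters on a hypothetical sample with repeated keys (all names and numbers are made up). It also shows that a text column can be aggregated too, as long as the chosen `aggfunc` (here `'first'`) makes sense for text.

```python
import pandas as pd

# Hypothetical data: two price observations per (product, province) key.
prices = pd.DataFrame({
    "product": ["bread", "bread", "bread", "bread", "salt", "salt"],
    "province": ["OPOLE", "OPOLE", "LUBLIN", "LUBLIN", "OPOLE", "OPOLE"],
    "value": [2.0, 3.0, 4.0, 6.0, 1.0, 1.5],
    "currency": ["PLN"] * 6,
})

# Default aggfunc: the mean of all rows that share the same key.
mean_table = pd.pivot_table(prices, index="product", columns="province", values="value")

# An explicit aggfunc: the number of observations in each cell.
count_table = pd.pivot_table(
    prices, index="product", columns="province", values="value", aggfunc="count"
)

# A text column aggregated with a function that works on text.
currency_table = pd.pivot_table(
    prices, index="product", columns="province", values="currency", aggfunc="first"
)

print(mean_table, count_table, currency_table, sep="\n\n")
```

Keys with no rows at all (here salt in LUBLIN) end up as `NaN`.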

### Example: average product price by province

```python
# Compute the average price of each product in each province
pd.pivot_table(
    data=df,
    index=['product'],
    columns=['province'],
    values=['value']
)
```

| product | (value, GREATER POLAND) | (value, HOLY CROSS) | (value, KUYAVIA-POMERANIA) | (value, LESSER POLAND) | (value, LOWER SILESIA) | (value, LUBLIN) | (value, LUBUSZ) | (value, MASOVIA) | (value, OPOLE) | (value, PODLASKIE) | (value, POLAND) | (value, POMERANIA) | (value, SILESIA) | (value, SUBCARPATHIA) | (value, WARMIA-MASURIA) | (value, WEST POMERANIA) | (value, ŁÓDŹ) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30% tomato concentrate - per 1kg | 10.357809 | 0.080518 | 1.861753 | 6.177888 | 1.461315 | 0.0 | 0.118486 | 8.097689 | 0.0 | 1.652869 | 9.203745 | 7.732191 | 5.289482 | 7.934303 | 2.722948 | 0.0 | 5.632669 |
| Backpacker's canned pork meat - per 300 g | 2.203135 | 0.594127 | 1.250675 | 0.725952 | 0.911905 | 2.605159 | 0.269802 | 2.044484 | 0.185595 | 2.463968 | 2.51246 | 0.926667 | 1.889802 | 2.372222 | 2.765714 | 1.103135 | 2.444365 |
| Hunter's sausage dried - per 1kg | 19.209762 | 12.305635 | 19.539405 | 19.095 | 20.095238 | 19.861389 | 18.509325 | 22.085556 | 19.293571 | 21.580714 | 20.179325 | 22.230397 | 18.401667 | 0.914802 | 20.232897 | 20.227897 | 19.605794 |
| Italian head cheese - per 1kg | 8.272698 | 8.178968 | 8.205992 | 7.780159 | 8.894841 | 7.812659 | 8.394484 | 9.08754 | 9.156349 | 8.154206 | 8.407262 | 8.787579 | 9.005079 | 8.956627 | 7.518135 | 8.107579 | 8.812778 |
| Masurian barley groats - per 1kg | 1.311667 | 1.577024 | 1.086468 | 1.021151 | 0.195556 | 1.597937 | 0.072381 | 1.289405 | 0.0 | 1.503651 | 1.557421 | 1.000397 | 0.029762 | 0.022302 | 0.095754 | 1.60369 | 0.078492 |
| Poznan wheat flour, bagged - per 1kg | 0.833056 | 0.585159 | 0.461429 | 1.079484 | 0.535913 | 0.992222 | 0.334881 | 0.197063 | 0.058175 | 0.019048 | 1.031349 | 0.687302 | 0.041984 | 0.745 | 0.815794 | 0.607381 | 0.567817 |
| apple juice, boxed - per 1l | 0.342897 | 0.045119 | 0.228254 | 1.606429 | 0.343452 | 0.065754 | 0.040952 | 2.373016 | 0.0 | 0.155675 | 1.913929 | 0.359683 | 1.450714 | 0.051349 | 0.039841 | 0.245675 | 1.199365 |
| barley groats sausage - per 1kg | 6.053611 | 5.195635 | 5.633095 | 5.518532 | 6.397103 | 6.060992 | 6.018016 | 6.553135 | 6.547024 | 6.571746 | 6.028651 | 6.079008 | 6.106468 | 5.196905 | 5.954167 | 6.128056 | 5.96254 |
| beef with bone (rump steak) - per 1kg | 19.776587 | 15.786548 | 18.552222 | 16.687976 | 17.608452 | 18.37627 | 18.77627 | 18.408571 | 18.699563 | 16.713849 | 18.134286 | 17.736548 | 17.759127 | 17.563413 | 19.441111 | 20.450714 | 17.095833 |
| beet sugar white, bagged - per 1kg | 2.100952 | 0.822063 | 2.343452 | 0.856706 | 2.359286 | 1.177738 | 0.422897 | 1.541627 | 1.078175 | 0.484563 | 2.338492 | 0.499246 | 1.128929 | 1.111429 | 0.034405 | 0.590437 | 1.086825 |


### Example: median prices by province

```python
# Compute the median prices in each province
pd.pivot_table(
    data=df,
    index=['product'],
    columns=['province'],
    values=['value'],
    aggfunc='median'
)
```

| product | (value, GREATER POLAND) | (value, HOLY CROSS) | (value, KUYAVIA-POMERANIA) | (value, LESSER POLAND) | (value, LOWER SILESIA) | (value, LUBLIN) | (value, LUBUSZ) | (value, MASOVIA) | (value, OPOLE) | (value, PODLASKIE) | (value, POLAND) | (value, POMERANIA) | (value, SILESIA) | (value, SUBCARPATHIA) | (value, WARMIA-MASURIA) | (value, WEST POMERANIA) | (value, ŁÓDŹ) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30% tomato concentrate - per 1kg | 10.23 | 0.0 | 0.0 | 7.41 | 0.0 | 0.0 | 0.0 | 8.93 | 0.0 | 0.0 | 9.17 | 9.19 | 5.72 | 8.53 | 0.0 | 0.0 | 6.39 |
| Backpacker's canned pork meat - per 300 g | 2.17 | 0.0 | 0.185 | 0.0 | 0.0 | 3.15 | 0.0 | 2.325 | 0.0 | 2.325 | 2.53 | 0.0 | 2.435 | 2.69 | 2.65 | 0.0 | 2.535 |
| Hunter's sausage dried - per 1kg | 18.725 | 18.15 | 20.925 | 20.1 | 20.085 | 19.4 | 18.285 | 21.75 | 18.96 | 22.075 | 19.79 | 22.06 | 17.825 | 0.0 | 20.025 | 20.02 | 20.345 |
| Italian head cheese - per 1kg | 7.965 | 7.97 | 7.47 | 7.22 | 7.98 | 7.785 | 7.72 | 9.05 | 8.745 | 8.215 | 7.925 | 8.655 | 8.165 | 8.285 | 7.05 | 7.49 | 8.925 |
| Masurian barley groats - per 1kg | 0.325 | 1.595 | 0.955 | 1.38 | 0.0 | 1.915 | 0.0 | 1.27 | 0.0 | 1.345 | 1.415 | 1.21 | 0.0 | 0.0 | 0.0 | 1.61 | 0.0 |
| Poznan wheat flour, bagged - per 1kg | 0.93 | 0.835 | 0.405 | 1.05 | 0.61 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.02 | 0.81 | 0.0 | 0.91 | 1.02 | 0.735 | 0.76 |
| apple juice, boxed - per 1l | 0.0 | 0.0 | 0.0 | 1.7 | 0.0 | 0.0 | 0.0 | 2.355 | 0.0 | 0.0 | 1.92 | 0.0 | 1.62 | 0.0 | 0.0 | 0.0 | 1.615 |
| barley groats sausage - per 1kg | 5.97 | 5.205 | 5.25 | 5.18 | 6.375 | 5.855 | 5.785 | 6.675 | 6.7 | 6.035 | 6.135 | 5.96 | 5.755 | 4.85 | 6.12 | 5.565 | 6.155 |
| beef with bone (rump steak) - per 1kg | 20.39 | 16.315 | 18.79 | 17.105 | 16.775 | 17.64 | 20.06 | 18.66 | 18.785 | 17.285 | 18.22 | 17.305 | 17.375 | 17.51 | 19.84 | 19.735 | 17.48 |
| beet sugar white, bagged - per 1kg | 2.18 | 0.0 | 2.31 | 0.0 | 2.35 | 1.315 | 0.0 | 2.105 | 0.405 | 0.0 | 2.29 | 0.0 | 0.53 | 0.335 | 0.0 | 0.0 | 0.0 |

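Conceptually, `pivot_table` groups the rows by the `index` and `columns` keys, aggregates each group, and then spreads the `columns` key into columns. The sketch below computes medians both ways on made-up numbers for comparison; it illustrates the idea and does not describe the exact pandas internals.

```python
import pandas as pd

# Hypothetical sample with two observations per (product, province) key.
obs = pd.DataFrame({
    "product": ["bread"] * 4 + ["salt"] * 4,
    "province": ["OPOLE", "OPOLE", "LUBLIN", "LUBLIN"] * 2,
    "value": [2.0, 4.0, 3.0, 9.0, 1.0, 2.0, 0.5, 1.5],
})

via_pivot = pd.pivot_table(
    obs, index="product", columns="province", values="value", aggfunc="median"
)

# The same numbers step by step: group, aggregate, spread provinces into columns.
via_groupby = obs.groupby(["product", "province"])["value"].median().unstack("province")

print(via_pivot)
print(via_groupby.equals(via_pivot))
```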

---
## Exercise: fix the bug in the code

The following code contains a bug. Find it and fix it.

```python
# FIX THE BUG: we want a pivot table with the average price of products by year
pd.pivot_table(
    data=df,
    index='product',
    columns='province',
    values='value',
    aggfunc='mean'
)
```

| product | GREATER POLAND | HOLY CROSS | KUYAVIA-POMERANIA | LESSER POLAND | LOWER SILESIA | LUBLIN | LUBUSZ | MASOVIA | OPOLE | PODLASKIE | POLAND | POMERANIA | SILESIA | SUBCARPATHIA | WARMIA-MASURIA | WEST POMERANIA | ŁÓDŹ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30% tomato concentrate - per 1kg | 10.357809 | 0.080518 | 1.861753 | 6.177888 | 1.461315 | 0.0 | 0.118486 | 8.097689 | 0.0 | 1.652869 | 9.203745 | 7.732191 | 5.289482 | 7.934303 | 2.722948 | 0.0 | 5.632669 |
| Backpacker's canned pork meat - per 300 g | 2.203135 | 0.594127 | 1.250675 | 0.725952 | 0.911905 | 2.605159 | 0.269802 | 2.044484 | 0.185595 | 2.463968 | 2.51246 | 0.926667 | 1.889802 | 2.372222 | 2.765714 | 1.103135 | 2.444365 |
| Hunter's sausage dried - per 1kg | 19.209762 | 12.305635 | 19.539405 | 19.095 | 20.095238 | 19.861389 | 18.509325 | 22.085556 | 19.293571 | 21.580714 | 20.179325 | 22.230397 | 18.401667 | 0.914802 | 20.232897 | 20.227897 | 19.605794 |
| Italian head cheese - per 1kg | 8.272698 | 8.178968 | 8.205992 | 7.780159 | 8.894841 | 7.812659 | 8.394484 | 9.08754 | 9.156349 | 8.154206 | 8.407262 | 8.787579 | 9.005079 | 8.956627 | 7.518135 | 8.107579 | 8.812778 |
| Masurian barley groats - per 1kg | 1.311667 | 1.577024 | 1.086468 | 1.021151 | 0.195556 | 1.597937 | 0.072381 | 1.289405 | 0.0 | 1.503651 | 1.557421 | 1.000397 | 0.029762 | 0.022302 | 0.095754 | 1.60369 | 0.078492 |
| Poznan wheat flour, bagged - per 1kg | 0.833056 | 0.585159 | 0.461429 | 1.079484 | 0.535913 | 0.992222 | 0.334881 | 0.197063 | 0.058175 | 0.019048 | 1.031349 | 0.687302 | 0.041984 | 0.745 | 0.815794 | 0.607381 | 0.567817 |
| apple juice, boxed - per 1l | 0.342897 | 0.045119 | 0.228254 | 1.606429 | 0.343452 | 0.065754 | 0.040952 | 2.373016 | 0.0 | 0.155675 | 1.913929 | 0.359683 | 1.450714 | 0.051349 | 0.039841 | 0.245675 | 1.199365 |
| barley groats sausage - per 1kg | 6.053611 | 5.195635 | 5.633095 | 5.518532 | 6.397103 | 6.060992 | 6.018016 | 6.553135 | 6.547024 | 6.571746 | 6.028651 | 6.079008 | 6.106468 | 5.196905 | 5.954167 | 6.128056 | 5.96254 |
| beef with bone (rump steak) - per 1kg | 19.776587 | 15.786548 | 18.552222 | 16.687976 | 17.608452 | 18.37627 | 18.77627 | 18.408571 | 18.699563 | 16.713849 | 18.134286 | 17.736548 | 17.759127 | 17.563413 | 19.441111 | 20.450714 | 17.095833 |
| beet sugar white, bagged - per 1kg | 2.100952 | 0.822063 | 2.343452 | 0.856706 | 2.359286 | 1.177738 | 0.422897 | 1.541627 | 1.078175 | 0.484563 | 2.338492 | 0.499246 | 1.128929 | 1.111429 | 0.034405 | 0.590437 | 1.086825 |

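One possible reading of the bug: the comment asks for prices *by year*, but the code uses `columns='province'`. Assuming that is the intended fix, here is a sketch on a made-up miniature of the data: it derives a `year` column first (the same step used in Exercise 1) and pivots on it.

```python
import pandas as pd

# Hypothetical miniature of the price data (made-up numbers).
mini = pd.DataFrame({
    "product": ["bread", "bread", "bread", "salt"],
    "province": ["OPOLE", "LUBLIN", "OPOLE", "OPOLE"],
    "date": pd.to_datetime(["2018-01", "2018-06", "2019-03", "2019-03"], format="%Y-%m"),
    "value": [2.0, 4.0, 3.0, 1.0],
})

# Possible fix: derive the year and use it as the columns instead of 'province'.
mini["year"] = mini["date"].dt.year
by_year = pd.pivot_table(
    data=mini, index="product", columns="year", values="value", aggfunc="mean"
)
print(by_year)
```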

---
## Exercise: complete the code

Fill in the missing parameters in the following code so that the pivot table shows the **minimum prices** of **products** in the individual **provinces**.

```python
df.columns
```

```text
Index(['province', 'product_types', 'currency', 'product_group_id',
       'product_line', 'value', 'date', 'product'],
      dtype='object')
```

```python
# FILL IN: create a pivot table with the minimum price of products in provinces
pd.pivot_table(
    data=df,
    index=___,  # products as rows
    columns=___,  # provinces as columns
    values=___,  # the price value
    aggfunc=___  # the minimum value
)
```


```python
# Solution: the minimum price of each product in each province
pd.pivot_table(
    data=df,
    index='product',
    columns='province',
    values='value',
    aggfunc='min'
)
```

| product | GREATER POLAND | HOLY CROSS | KUYAVIA-POMERANIA | LESSER POLAND | LOWER SILESIA | LUBLIN | LUBUSZ | MASOVIA | OPOLE | PODLASKIE | POLAND | POMERANIA | SILESIA | SUBCARPATHIA | WARMIA-MASURIA | WEST POMERANIA | ŁÓDŹ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30% tomato concentrate - per 1kg | 1.14 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.89 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Backpacker's canned pork meat - per 300 g | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.09 | 0.0 | 0.0 | 0.0 | 1.9 | 0.0 | 0.0 |
| Hunter's sausage dried - per 1kg | 15.12 | 0.0 | 0.0 | 0.0 | 14.53 | 14.68 | 14.88 | 17.29 | 3.02 | 15.43 | 16.13 | 15.32 | 0.61 | 0.0 | 15.64 | 16.56 | 14.42 |
| Italian head cheese - per 1kg | 5.56 | 5.14 | 5.53 | 4.54 | 5.43 | 4.92 | 4.94 | 5.49 | 5.55 | 4.84 | 5.47 | 4.8 | 5.42 | 5.27 | 0.0 | 5.11 | 5.46 |
| Masurian barley groats - per 1kg | 0.0 | 1.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.01 | 0.98 | 0.0 | 0.0 | 0.0 | 0.0 | 0.89 | 0.0 |
| Poznan wheat flour, bagged - per 1kg | 0.0 | 0.0 | 0.0 | 0.71 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| apple juice, boxed - per 1l | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.68 | 0.0 | 0.0 | 1.34 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| barley groats sausage - per 1kg | 3.18 | 3.13 | 3.32 | 3.18 | 2.96 | 3.13 | 3.35 | 3.41 | 3.27 | 3.28 | 3.12 | 3.16 | 3.46 | 2.88 | 3.1 | 3.48 | 3.47 |
| beef with bone (rump steak) - per 1kg | 8.32 | 7.42 | 7.74 | 8.41 | 8.4 | 8.49 | 7.53 | 8.07 | 8.4 | 7.42 | 8.13 | 7.4 | 7.76 | 8.2 | 7.79 | 8.17 | 8.11 |
| beet sugar white, bagged - per 1kg | 0.0 | 0.0 | 1.46 | 0.0 | 1.48 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.48 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


---
## Practical exercises

Use the data from the file **product_prices_cleaned.csv** and solve the following exercises:

### Exercise 1

Create a pivot table whose rows are **years** and whose columns are the product names (the **product** column). Use the **average** product price as the value.

**Hint:** To extract the year from a date you can use `df['date'].dt.year`
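
A quick sketch of the `.dt` accessor on a few hypothetical dates:

```python
import pandas as pd

# Hypothetical dates, converted to the datetime type first.
dates = pd.to_datetime(pd.Series(["2013-03-01", "2018-02-01", "2019-12-01"]))

years = dates.dt.year  # integer year of each timestamp
print(years.tolist())  # [2013, 2018, 2019]
```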

```python
# Your code here:
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128503 entries, 0 to 128502
Data columns (total 8 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   province          128503 non-null  object
 1   product_types     34255 non-null   object
 2   currency          128503 non-null  object
 3   product_group_id  128503 non-null  int64
 4   product_line      94248 non-null   object
 5   value             119935 non-null  float64
 6   date              128503 non-null  datetime64[ns]
 7   product           128503 non-null  object
dtypes: datetime64[ns](1), float64(1), int64(1), object(5)
memory usage: 7.8+ MB
```


```python
# Extract the year of each date into a new column
df['year'] = df['date'].dt.year
```

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128503 entries, 0 to 128502
Data columns (total 9 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   province          128503 non-null  object
 1   product_types     34255 non-null   object
 2   currency          128503 non-null  object
 3   product_group_id  128503 non-null  int64
 4   product_line      94248 non-null   object
 5   value             119935 non-null  float64
 6   date              128503 non-null  datetime64[ns]
 7   product           128503 non-null  object
 8   year              128503 non-null  int32
dtypes: datetime64[ns](1), float64(1), int32(1), int64(1), object(5)
memory usage: 8.3+ MB
```


```python
df.head(2)
```

| | province | product_types | currency | product_group_id | product_line | value | date | product | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | SUBCARPATHIA | NaN | PLN | 2 | pork ham cooked - per 1kg | 21.37 | 2013-03-01 | pork ham cooked - per 1kg | 2013 |
| 1 | ŁÓDŹ | NaN | PLN | 4 | bread - per 1kg | NaN | 2018-02-01 | bread - per 1kg | 2018 |


```python
# Average price of each product per year
pd.pivot_table(
    data=df,
    index='year',
    columns='product',
    values='value'
)
```

| year | 30% tomato concentrate - per 1kg | Backpacker's canned pork meat - per 300 g | Hunter's sausage dried - per 1kg | Italian head cheese - per 1kg | Masurian barley groats - per 1kg | Poznan wheat flour, bagged - per 1kg | apple juice, boxed - per 1l | barley groats sausage - per 1kg | beef with bone (rump steak) - per 1kg | beet sugar white, bagged - per 1kg | ... | plain mixed bread (wheat-rye) - per 1kg | pork meat (raw bacon) - per 1kg | pork with bone (center-cut pork chop) - per 1kg | pork belly cooked - per 1kg | pork ham cooked - per 1kg | pork meat with bone (shoulder) - per 1kg | salted herring, non-dressed - per 1kg | smoked bacon with ribs - per 1kg | white table salt bagged - per 1kg | whole pickled cucumbers 0.9l - per 1pc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1999 | 3.134902 | 2.450294 | 15.425245 | 5.783922 | 0.58902 | 0.68402 | 1.243676 | 3.525147 | 8.45098 | 1.716324 | ... | NaN | 7.323971 | 11.557549 | 14.304412 | 15.812892 | 8.61701 | 1.454804 | 8.450588 | 0.064461 | 1.503627 |
| 2000 | 2.673333 | 2.530049 | 17.312843 | 6.268725 | 0.556912 | 0.738235 | 1.121275 | 3.9425 | 9.179853 | 2.083284 | ... | 1.940098 | 8.219657 | 12.521618 | 15.14049 | 16.610784 | 9.45951 | 1.324363 | 9.283529 | 0.065931 | 1.3825 |
| 2001 | 3.011814 | 2.456324 | 17.284363 | 6.392108 | 0.385392 | 0.634657 | 0.539804 | 4.262304 | 9.526667 | 1.858039 | ... | 2.332598 | 8.724853 | 12.619363 | 15.310392 | 16.90902 | 9.800392 | 1.307598 | 10.262843 | 0.060735 | 1.438971 |
| 2002 | 3.223529 | 2.216912 | 16.579412 | 6.135539 | 0.418578 | 0.563676 | 0.287549 | 4.100882 | 9.685784 | 1.847451 | ... | 2.281127 | 8.22299 | 11.639412 | 14.188725 | 15.961373 | 8.547647 | 1.403186 | 10.274363 | 0.056176 | 1.322745 |
| 2003 | 3.299465 | 1.950343 | 17.163627 | 6.012598 | 0.512549 | 0.574069 | 0.409951 | 4.130441 | 11.117157 | 1.715294 | ... | 2.15402 | 8.122451 | 11.87402 | 13.984755 | 15.504216 | 8.181863 | 1.818627 | 10.446324 | 0.053382 | 1.378824 |
| 2004 | 3.956029 | 1.84848 | 16.726176 | 6.390196 | 0.581078 | 0.568971 | 0.576029 | 4.484118 | 13.681422 | 1.690588 | ... | 2.220343 | 8.83 | 12.650245 | 14.700147 | 16.209461 | 9.105098 | 2.206324 | 11.441667 | 0.05201 | 1.61348 |
| 2005 | 3.830049 | 1.738284 | 15.726716 | 6.615931 | 0.491961 | 0.426912 | 0.673088 | 4.555735 | 15.042206 | 1.530833 | ... | 2.295196 | 8.866569 | 12.201765 | 14.38799 | 15.986667 | 8.969363 | 2.932892 | 11.482549 | 0.067255 | 1.703725 |
| 2006 | 3.531176 | 1.692794 | 16.168775 | 6.557206 | 0.482353 | 0.568824 | 0.59098 | 4.467745 | 15.861765 | 1.638578 | ... | 2.37299 | 8.731961 | 12.095294 | 14.325784 | 16.057598 | 8.727255 | 3.163235 | 10.883824 | 0.066127 | 1.598676 |
| 2007 | 3.964216 | 1.565294 | 17.688529 | 6.851667 | 0.647549 | 0.736765 | 0.643578 | 4.785196 | 16.501814 | 1.458235 | ... | 2.762353 | 9.100637 | 12.589902 | 14.702451 | 16.50701 | 8.71049 | 3.197647 | 11.099167 | 0.07201 | 1.678088 |
| 2008 | 3.841618 | 1.674657 | 19.051029 | 7.636667 | 0.791863 | 0.764118 | 0.687941 | 5.545588 | 17.692892 | 1.048088 | ... | 3.107206 | 10.135637 | 13.24902 | 15.429118 | 17.344853 | 9.465539 | 4.438676 | 12.146373 | 0.075049 | 1.978088 |


### Exercise 2

Rewrite the code that uses the `pivot` function as a solution based on `pivot_table` with a `lambda`.

**Code:**
```python
pd.pivot(data=df, index=['province', 'product'], columns=['date'], values=['value'])
```

Analyze the result of the reconstruction: what can you say about the values that were passed to the function?

```python
# Your code here:
pd.pivot_table(
    data=df,
    index=['province', 'product'],
    columns=['date'],
    values=['value'],
    aggfunc=lambda x: x
)
```

| province | product | (value, 1999-01-01) | (value, 1999-02-01) | (value, 1999-03-01) | (value, 1999-04-01) | (value, 1999-05-01) | (value, 1999-06-01) | (value, 1999-07-01) | (value, 1999-08-01) | (value, 1999-09-01) | (value, 1999-10-01) | ... | (value, 2019-03-01) | (value, 2019-04-01) | (value, 2019-05-01) | (value, 2019-06-01) | (value, 2019-07-01) | (value, 2019-08-01) | (value, 2019-09-01) | (value, 2019-10-01) | (value, 2019-11-01) | (value, 2019-12-01) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GREATER POLAND | 30% tomato concentrate - per 1kg | 7.56 | 7.09 | 7.32 | 7.84 | 6.84 | 7.41 | 7.28 | 8.27 | 7.23 | 7.02 | ... | 6.78 | 5.67 | 1.14 | 3.69 | 9.71 | 5.78 | 4.96 | 6.12 | 8.33 | 6.03 |
| GREATER POLAND | Backpacker's canned pork meat - per 300 g | 2.92 | 2.88 | 2.92 | 2.85 | 2.81 | 3.05 | 3.20 | 3.06 | 2.76 | 3.08 | ... | 2.97 | 2.24 | 2.50 | 3.06 | 2.67 | 3.25 | 2.54 | 3.26 | 2.57 | 2.87 |
| GREATER POLAND | Hunter's sausage dried - per 1kg | 15.50 | 16.31 | 15.91 | 15.80 | 15.52 | 16.49 | 15.92 | 16.27 | 16.27 | 16.42 | ... | 21.59 | 17.41 | 21.49 | 17.95 | 19.25 | 19.69 | 19.45 | 22.95 | 18.12 | 20.48 |
| GREATER POLAND | Italian head cheese - per 1kg | 6.14 | 5.78 | 5.84 | 6.00 | 5.97 | 6.07 | 6.02 | 6.02 | 6.33 | 5.67 | ... | 10.13 | 5.56 | 8.75 | 7.90 | 9.52 | 7.97 | 7.56 | 9.28 | 8.94 | 10.67 |
| GREATER POLAND | Masurian barley groats - per 1kg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 2.44 | 1.57 | 3.59 | 2.33 | 2.39 | 1.84 | 1.96 | 1.19 | 1.56 | 3.25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ŁÓDŹ | pork meat with bone (shoulder) - per 1kg | 8.07 | 8.62 | 8.39 | 8.28 | 8.29 | 8.49 | 8.21 | 8.64 | 8.44 | 8.11 | ... | 10.24 | 10.57 | 10.52 | 8.79 | 9.99 | 9.28 | 11.14 | 10.49 | 9.39 | 9.67 |
| ŁÓDŹ | salted herring, non-dressed - per 1kg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 5.32 | 5.28 | 6.99 | 4.91 | 5.94 | 4.98 | 7.18 | 5.36 | 5.25 | 7.32 |
| ŁÓDŹ | smoked bacon with ribs - per 1kg | 8.01 | 8.93 | 9.01 | 8.12 | 8.74 | 8.36 | 8.91 | 8.98 | 8.23 | 8.94 | ... | 16.80 | 14.94 | 12.34 | 7.98 | 7.38 | 7.16 | 7.71 | 14.79 | 7.96 | 7.67 |
| ŁÓDŹ | white table salt bagged - per 1kg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
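
A sketch on hypothetical toy data that makes the answer visible. Counting the values each call of the aggregation function receives gives 1 in every cell, and taking that single value with `x.iloc[0]` reproduces the output of `pivot`. This relies on every (index, columns) key being unique, which is exactly the condition `pivot` requires.

```python
import pandas as pd

# Hypothetical long-format data in which every (province, product, date) key is unique.
long_df = pd.DataFrame({
    "province": ["OPOLE", "OPOLE", "LUBLIN", "LUBLIN"],
    "product": ["bread"] * 4,
    "date": ["2019-01", "2019-02", "2019-01", "2019-02"],
    "value": [2.5, 2.6, 2.4, 2.7],
})
keys = dict(index=["province", "product"], columns="date", values="value")

# How many values does the aggregation function receive for each cell?
sizes = pd.pivot_table(long_df, aggfunc=lambda x: len(x), **keys)

# Picking the single value of each group reproduces the output of pivot.
via_table = pd.pivot_table(long_df, aggfunc=lambda x: x.iloc[0], **keys)
via_pivot = long_df.pivot(**keys)

print(sizes)
print(via_table.equals(via_pivot))
```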


### Exercise 3

Using `pivot_table`, find out how the average and median product prices developed in the individual regions over the years.

**Hint:** The `aggfunc` parameter also accepts a list of functions, e.g. `['mean', 'median']`

```python
# Your code here:
pd.pivot_table(
    data=df,
    index=['product', 'province'],
    columns=['year'],
    values=['value'],
    aggfunc=['mean', 'median']
).round(2)
```

| product | province | (mean, value, 1999) | (mean, value, 2000) | (mean, value, 2001) | (mean, value, 2002) | (mean, value, 2003) | (mean, value, 2004) | (mean, value, 2005) | (mean, value, 2006) | (mean, value, 2007) | (mean, value, 2008) | ... | (median, value, 2010) | (median, value, 2011) | (median, value, 2012) | (median, value, 2013) | (median, value, 2014) | (median, value, 2015) | (median, value, 2016) | (median, value, 2017) | (median, value, 2018) | (median, value, 2019) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30% tomato concentrate - per 1kg | GREATER POLAND | 7.50 | 7.97 | 9.06 | 9.98 | 9.43 | 8.32 | 8.62 | 8.39 | 8.49 | 9.49 | ... | 13.62 | 14.48 | 15.30 | 13.36 | 10.79 | 11.33 | 11.82 | 11.15 | 10.58 | 5.90 |
| 30% tomato concentrate - per 1kg | HOLY CROSS | 1.68 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 30% tomato concentrate - per 1kg | KUYAVIA-POMERANIA | 6.69 | 2.98 | 0.00 | 0.00 | 0.00 | 2.92 | 3.26 | 2.94 | 3.59 | 0.00 | ... | 8.72 | 5.18 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 30% tomato concentrate - per 1kg | LESSER POLAND | 3.14 | 0.00 | 3.33 | 3.39 | 0.00 | 3.65 | 6.39 | 6.52 | 6.64 | 7.50 | ... | 7.62 | 8.26 | 8.34 | 8.78 | 8.98 | 8.28 | 8.85 | 8.09 | 9.04 | 3.20 |
| 30% tomato concentrate - per 1kg | LOWER SILESIA | 6.04 | 6.44 | 6.18 | 5.55 | 2.90 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.56 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| whole pickled cucumbers 0.9l - per 1pc. | SILESIA | 2.21 | 2.41 | 2.79 | 2.48 | 2.29 | 2.31 | 2.57 | 2.25 | 2.55 | 2.90 | ... | 3.02 | 2.71 | 2.70 | 2.76 | 3.13 | 3.05 | 3.21 | 2.96 | 3.00 | 1.74 |
| whole pickled cucumbers 0.9l - per 1pc. | SUBCARPATHIA | 2.46 | 2.51 | 2.58 | 2.19 | 1.93 | 2.26 | 2.57 | 2.16 | 2.17 | 2.47 | ... | 2.66 | 2.41 | 2.58 | 2.84 | 3.21 | 3.20 | 3.24 | 3.17 | 3.56 | 3.45 |
| whole pickled cucumbers 0.9l - per 1pc. | WARMIA-MASURIA | 1.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.27 |
| whole pickled cucumbers 0.9l - per 1pc. | WEST POMERANIA | 1.35 | 0.00 | 0.00 | 0.00 | 1.04 | 2.36 | 2.45 | 1.38 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
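
A smaller sketch of the same idea on made-up data. Passing `values` as a plain string (unlike the list `['value']` in the cell above) leaves two column levels, the statistic name and the year, so one statistic can be selected directly with `stats['median']`.

```python
import pandas as pd

# Hypothetical observations of one product in two provinces.
obs = pd.DataFrame({
    "product": ["bread"] * 6,
    "province": ["OPOLE", "OPOLE", "OPOLE", "LUBLIN", "LUBLIN", "LUBLIN"],
    "year": [2018, 2018, 2018, 2018, 2018, 2019],
    "value": [1.0, 2.0, 6.0, 3.0, 5.0, 4.0],
})

stats = pd.pivot_table(
    obs,
    index=["product", "province"],
    columns="year",
    values="value",
    aggfunc=["mean", "median"],
)

# The top column level holds the statistic names.
print(stats.columns.get_level_values(0).unique().tolist())  # ['mean', 'median']
print(stats["median"])
```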


---
## Overview of the methods and functions used

| Method/Function | Description |
|---------------|-------|
| `pd.read_csv()` | Loads data from a CSV file into a DataFrame |
| `pd.to_datetime()` | Converts a column to the datetime type |
| `df.head()` | Shows the first rows of a DataFrame |
| `df.info()` | Shows information about a DataFrame (column types, non-null counts, memory usage) |
| `pd.pivot()` | Reshapes data without aggregation; requires a unique index+columns combination |
| `pd.pivot_table()` | Pivot table with aggregation; allows duplicate keys |
| `df['col'].dt.year` | Extracts the year from a datetime column |