# Week 9 Lab: Grouping Data
The data for today's lab session came from the [ONS Consumer Price Index (CPIH)](https://www.ons.gov.uk/datasets/cpih01/editions/time-series/versions/63):

> The Consumer Prices Index including owner occupiers’ housing costs (CPIH) and Consumer Prices Index (CPI) are consumer inflation or pure price indices defined as an average measure of change in the prices of goods and services bought within the domestic territory for consumption by households in the UK and foreign visitors to the UK.
>  - average measure: a single figure that combines, or averages, all the price changes covered 
>  - change: its purpose is to measure how prices change over time rather than the absolute level of prices at a point in time

In this dataset, changes in prices are measured over time. The `cpih` column holds an index value expressed relative to the same category in 2015, which is set to `100`. A value of `110`, for example, means prices are 10% higher than they were in 2015.


## CPIH Categories
The aggregate ID (or [COICOP](https://en.wikipedia.org/wiki/Classification_of_Individual_Consumption_According_to_Purpose) code) of each category of the CPI figure follows a hierarchical structure. The following diagram from the ONS Technical Manual demonstrates how the prices of apples are aggregated into the different categories used in the dataset:




Have a look at the [methodology](https://www.ons.gov.uk/economy/inflationandpriceindices/methodologies/consumerpriceinflationincludesall3indicescpihcpiandrpiqmi) for the CPIH measure of inflation before you continue so that you can understand how to use the data. You can also read the [technical manual](https://www.ons.gov.uk/economy/inflationandpriceindices/methodologies/consumerpricesindicestechnicalmanual2019) to learn more about how the ONS measures inflation.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Start by importing `cpi-time-series-63.csv` into a DataFrame.
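
As a rough sketch of how `pd.read_csv` behaves, the example below reads a few made-up rows from an in-memory string instead of a file on disk. The rows and the name `toy` are illustrative assumptions, not the contents of the real file.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk (values are illustrative only).
csv_text = """month,category,price
2020-01,fruit,1.0
2020-02,fruit,1.5
"""

# read_csv accepts a file path (as a string) or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.head())    # first rows, to check the data loaded as expected
print(toy.dtypes)    # column types inferred by pandas
```

Checking `head()` and `dtypes` straight after loading is a quick way to confirm the column names before you start grouping.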

In [None]:
# Write your code here

## Grouping
Use a `groupby` to calculate the mean value of the `cpih` column for each aggregate in the dataset.
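
The general pattern is: group by one column, select another, then aggregate. Here is a minimal sketch on made-up data; the column names `category` and `price` are placeholders, not the columns of the CPIH file.

```python
import pandas as pd

# Made-up long-format data: several price observations per category.
toy = pd.DataFrame({
    "category": ["fruit", "fruit", "veg", "veg"],
    "price": [1.0, 3.0, 2.0, 6.0],
})

# Group the rows by category, select the price column, and take the mean.
mean_price = toy.groupby("category")["price"].mean()
print(mean_price)
```

The result is a Series whose index holds the group labels.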

In [None]:
# Write your code here

## Pivoting
Plots with multiple lines are created using the wide-data format: each individual variable you want to plot should be in a separate column.

 - Use `pivot_table()` to convert the DataFrame from long to wide format.
   - Use the `"aggregate"` column for the `columns` argument
   - Set the `"yyyy-mm"` column as the `index` argument
   - Set the `values` argument to `"cpih"`
   - Assign the name `wide` to this pivoted DataFrame.
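
To see what a long-to-wide pivot does, here is a small sketch on made-up data. The names `long_df`, `toy_wide`, `month`, `category` and `price` are illustrative assumptions.

```python
import pandas as pd

# Long format: one row per (month, category) observation.
long_df = pd.DataFrame({
    "month": ["2020-01", "2020-01", "2020-02", "2020-02"],
    "category": ["fruit", "veg", "fruit", "veg"],
    "price": [1.0, 2.0, 1.5, 2.5],
})

# Wide format: one row per month, one column per category.
# pivot_table averages duplicates by default (aggfunc="mean").
toy_wide = long_df.pivot_table(index="month", columns="category", values="price")
print(toy_wide)
```

Each distinct value in the `columns` column becomes its own column, which is the shape that line plots expect.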

In [None]:
# Write your code here

Plot all of the columns as a line chart.
- Hint: set the `legend` argument to `False`
- Remember that the index starts with 2015 values at `100`
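
A hedged sketch of plotting a wide DataFrame, using made-up values. The dashed reference line at `100` is an optional extra added here for illustration, not a requirement of the task.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook; not needed in Jupyter
import pandas as pd

# Made-up wide data: one column per series, indexed by month.
toy_wide = pd.DataFrame(
    {"fruit": [100.0, 102.0, 105.0], "veg": [100.0, 101.0, 99.0]},
    index=["2015-01", "2015-02", "2015-03"],
)

# DataFrame.plot draws one line per column; legend=False hides the legend.
ax = toy_wide.plot(legend=False)

# Optional: mark the base value so rises and falls are easy to read.
ax.axhline(100, color="grey", linestyle="--")
ax.set_ylabel("index (base = 100)")
```

Hiding the legend helps when there are many columns, since the legend would otherwise cover most of the chart.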

In [None]:
# Write your code here

It might be more useful to select some of the data that you can compare. Select the `Overall Index` and `04 Housing, water, electricity, gas and other fuels` columns and create a line plot comparing the two.

In [None]:
# Write your code here

Which of the aggregate measures are most closely correlated with `04 Housing, water, electricity, gas and other fuels`?
- Use the `corr()` method to calculate the correlation matrix
- Select the row with `"04 Housing, water, electricity, gas and other fuels"` using `loc`
- Sort the values
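
The three steps above can be sketched on made-up data as follows. The column names `a` to `d` are placeholders chosen so that the ranking is easy to predict.

```python
import pandas as pd

toy_wide = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # rises exactly in step with "a"
    "c": [4.0, 3.0, 2.0, 1.0],  # falls exactly as "a" rises
    "d": [1.0, 3.0, 2.0, 4.0],  # loosely follows "a"
})

# Pairwise Pearson correlation between every pair of columns.
corr_matrix = toy_wide.corr()

# Take the row for "a", then rank the other columns against it.
ranked = corr_matrix.loc["a"].sort_values(ascending=False)
print(ranked)
```

A column always correlates perfectly with itself, so the column you selected will appear at the top of the ranking with a value of `1.0`.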

In [None]:
# Write your code here

## Aggregating
Calculate the yearly mean for the `Overall Index` category.
- Ensure that you are grouping the original DataFrame, not the `wide` one you created in task two
- Start by creating a new `year` column in the DataFrame, which holds the first four characters of the `yyyy-mm` column as a string
- There is a method to automatically group by aggregating dates, but we will cover this later in the course. For now we can stick to string manipulation
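
String slicing on a whole column goes through the `.str` accessor. A minimal sketch on made-up data, where `month` is a placeholder column name:

```python
import pandas as pd

toy = pd.DataFrame({"month": ["2019-11", "2019-12", "2020-01"]})

# .str[:4] slices every string in the column, like "2019-11"[:4] on one value.
toy["year"] = toy["month"].str[:4]
print(toy)
```

The new column still holds strings, so `"2019"` (not the integer `2019`) is the value to compare or index with later.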

In [None]:
# Write your code here

- Select the rows where the `"Aggregate"` column matches `"Overall Index"`
- Group the matching rows by the `"year"` column using a `groupby`
- Select the `"cpih"` column and calculate its mean
- Create a plot of the resulting Series
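
Filtering before grouping might look like the sketch below, which uses made-up data and placeholder column names (`category`, `price`).

```python
import pandas as pd

toy = pd.DataFrame({
    "year": ["2019", "2019", "2020", "2020", "2019"],
    "category": ["fruit", "fruit", "fruit", "fruit", "veg"],
    "price": [1.0, 2.0, 3.0, 5.0, 9.0],
})

# A boolean mask keeps only the rows for one category.
fruit = toy[toy["category"] == "fruit"]

# Group the remaining rows by year and average the price.
yearly = fruit.groupby("year")["price"].mean()
print(yearly)
```

Calling `yearly.plot()` on the result would draw one point per year.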

In [None]:
# Write your code here

## Aggregating and Using a MultiIndex
Group the data by both the `year` column you created in the last exercise and the `Aggregate` column, then calculate the mean of the `cpih` column. Assign the result the name `grouped_means`.
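
Passing a list of columns to `groupby` produces a result with a MultiIndex. A minimal sketch on made-up data (`toy_means`, `category` and `price` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({
    "year": ["2019", "2019", "2019", "2020", "2020"],
    "category": ["fruit", "fruit", "veg", "fruit", "veg"],
    "price": [1.0, 2.0, 9.0, 4.0, 10.0],
})

# Grouping by two columns gives a Series indexed by (year, category).
toy_means = toy.groupby(["year", "category"])["price"].mean()
print(toy_means)

# Selecting on the first level returns the means for that year only.
print(toy_means["2019"])
```

The order of the columns in the list decides the order of the index levels, so `year` is the outer level here.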

In [None]:
# Write your code here

You can access the mean data for each year from the row index (i.e. `["2018"]` will return the means for the year `2018`). Calculate the difference between years `2024` and `2015` to show how prices changed between these years.
- Set the `kind` of the plot to `barh`
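
Subtracting one year's slice from another aligns the values on the remaining index level. A sketch on made-up, already-aggregated data (the names and numbers are illustrative):

```python
import pandas as pd

# Made-up yearly means with a (year, category) MultiIndex.
toy_means = pd.Series(
    [1.5, 9.0, 4.0, 10.0],
    index=pd.MultiIndex.from_tuples(
        [("2019", "fruit"), ("2019", "veg"), ("2020", "fruit"), ("2020", "veg")],
        names=["year", "category"],
    ),
)

# Each slice is indexed by category, so the subtraction pairs up matching labels.
diff = toy_means["2020"] - toy_means["2019"]
print(diff)
```

Because the result is indexed by category, a horizontal bar chart gives each category its own labelled bar.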

In [None]:
# Write your code here

## Percentage Change
We can pivot data that has a MultiIndex using the `unstack()` method. Use `unstack()` on the `grouped_means` data you created in the last task, applying it to the `aggregate` level of the index.
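
As a rough illustration of `unstack()`, the sketch below moves one index level into the columns. The data and the level names `year` and `category` are made up.

```python
import pandas as pd

toy_means = pd.Series(
    [1.5, 9.0, 4.0, 10.0],
    index=pd.MultiIndex.from_tuples(
        [("2019", "fruit"), ("2019", "veg"), ("2020", "fruit"), ("2020", "veg")],
        names=["year", "category"],
    ),
)

# unstack takes the name (or position) of the index level to turn into columns.
unstacked = toy_means.unstack("category")
print(unstacked)
```

The remaining level (`year` here) stays as the row index, which gives the same wide shape that `pivot_table()` produces.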

In [None]:
# Write your code here

Use the pivoted table to calculate the percentage change for each column with `pct_change()`.
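
`pct_change()` compares each row with the one before it, column by column. A minimal sketch on made-up yearly values:

```python
import pandas as pd

# Made-up yearly values, one column per category.
toy_yearly = pd.DataFrame(
    {"fruit": [100.0, 110.0, 121.0], "veg": [50.0, 50.0, 75.0]},
    index=["2019", "2020", "2021"],
)

# Fractional change from the previous row: (current - previous) / previous.
change = toy_yearly.pct_change()
print(change)

# Multiply by 100 to express the change as a percentage.
print(change * 100)
```

The first row has no previous row to compare against, so it is filled with `NaN`.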

In [None]:
# Write your code here

Use this data to create a bar chart which shows the percentage change in the `01 Food and non-alcoholic beverages` column.
- Hint: set `kind="bar"` inside the plot

In [None]:
# Write your code here