**Package imports**

In [1]:
# Set up our Notebook with required packages
import pandas as pd
import altair as alt

## Altair has a default limit of 5000 rows for rendering charts (if our dataframe has more than this, we'll get an error when making the chart).
##  To override this limit, we can set the max_rows option.
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

# Seminar: the Long-Run Prices Dataset

### BONUS EXAMPLES

A notebook using the LRPD, described [here](https://cep.lse.ac.uk/pubs/download/occasional/op055.pdf).

</br></br></br></br>


**What is our goal?**

- Take a large dataset, extract and aggregate down to a level of data we can visualise.

Example workflow:
1. Load the `prices` and `items` datasets. Item descriptions are stored separately from the price observations to reduce file size; a numerical `item_id` links the two.
2. Search the `items` dataset for the item (or items) we want to analyse, and record the `item_id` value(s).
3. Use the `item_id` value(s) to filter the `prices` dataframe.
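
As a rough sketch of this workflow, the snippet below runs steps 2 and 3 on two tiny hand-made tables. The table contents and the names `items`, `prices`, `ids` and `selected` are invented for illustration; only the column names `item_id`, `description` and `price` mirror the real data.

```python
import pandas as pd

# Toy stand-ins for the real `items` and `prices` tables (values invented for illustration)
items = pd.DataFrame({
    "item_id": [1, 2, 3],
    "description": ["WHITE LOAF", "FROZEN PEAS", "FROZEN CHIPS"],
})
prices = pd.DataFrame({
    "item_id": [1, 2, 2, 3],
    "price": [1.10, 0.95, 1.05, 1.50],
})

# Step 2: search the descriptions and record the matching item_id values
matches = items[items["description"].str.contains("frozen", case=False)]
ids = matches["item_id"].tolist()

# Step 3: keep only the price rows for those item_id values
selected = prices[prices["item_id"].isin(ids)]
print(ids)
print(selected)
```

The real tables are far larger, but the same two lines (search, then filter) do the work.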

<br>

### Step 1. Loading data

#### 1a. Load prices dataset 

Note: the file in cloud storage linked here is the same `db_prices` file available in the Dropbox and linked in the lecture slides. Don't worry about the `.parquet` file type: it is just another file format, like CSV and JSON, that works well with large datasets. The only change is to use `pd.read_parquet()` instead of `pd.read_csv()`; the loaded dataframe is the same.

In [2]:
prices_df = pd.read_parquet('https://autocpi-public.s3.eu-west-2.amazonaws.com/lrpd/db_prices.parquet')

# What columns do we have?
print(prices_df.columns, "\n")

# Get summary statistics and round the results for better readability
prices_df.describe().round()

Index(['quote_date', 'shop_code', 'item_id_raw', 'region', 'price',
       'indicator_box', 'item_id'],
      dtype='object') 



Unnamed: 0,quote_date,shop_code,item_id_raw,region,price,item_id
count,48368958.0,48368958.0,48368973.0,48368958.0,48368958.0,48368973.0
mean,200776.0,477.0,388041.0,7.0,50.0,388398.0
std,1060.0,1532.0,146756.0,3.0,206.0,146672.0
min,198802.0,1.0,210101.0,1.0,0.0,210101.0
25%,199811.0,39.0,212917.0,3.0,1.0,212918.0
50%,200805.0,88.0,430128.0,7.0,5.0,430132.0
75%,201707.0,802.0,510406.0,9.0,20.0,510407.0
max,202510.0,20071.0,640406.0,13.0,44000.0,640406.0


View the dataframe

In [3]:
# Print 5 random rows from the dataframe
prices_df.sample(5)

Unnamed: 0,quote_date,shop_code,item_id_raw,region,price,indicator_box,item_id
27151785,199110.0,61.0,430410,9.0,2.5,,430410
37115597,201807.0,9.0,510419,8.0,3.5,,510419
1765570,200704.0,4.0,210413,2.0,6.59,,210413
32871653,200401.0,937.0,510212,2.0,25.0,C,510212
438350,200209.0,941.0,210202,6.0,0.53,,210202


To make visualising the data easier later, we can convert the `quote_date` column into a standard datetime format. 
> We tell the function what format the dates are currently in: 202510 is YYYYMM, which corresponds to `%Y%m` in date notation (`%Y`=YYYY, `%m`=MM). The same notation is used when formatting dates in Vega-Lite, which uses [D3 time formats](https://d3js.org/d3-time-format).

In [4]:
# Using pandas `to_datetime` function to convert 'quote_date' column to datetime format
prices_df['date'] = pd.to_datetime(prices_df['quote_date'], format='%Y%m')
prices_df.head(1)

Unnamed: 0,quote_date,shop_code,item_id_raw,region,price,indicator_box,item_id,date
0,200102.0,808.0,210101,12.0,0.35,Q,210101,2001-02-01
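
To see what the `%Y%m` format does in isolation, here is a minimal sketch on two hand-written values. It assumes the dates are held as strings, which is a simplification: in `prices_df` the `quote_date` column is numeric. The names `raw` and `parsed` are made up for this example.

```python
import pandas as pd

# Hand-written YYYYMM values, stored as strings for this sketch
raw = pd.Series(["198802", "202510"])

# %Y = four-digit year, %m = two-digit month; the day defaults to the 1st
parsed = pd.to_datetime(raw, format="%Y%m")
print(parsed)

# The same directives work in the other direction with strftime
print(parsed.dt.strftime("%Y%m").tolist())
```

Because the source data has no day component, every parsed date falls on the first of the month.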


<br>
<br>
<br>

#### **1b.** Load items data

In [5]:
items_df = pd.read_parquet('https://autocpi-public.s3.eu-west-2.amazonaws.com/lrpd/db_item.parquet')

# Print the number of rows (i.e. products)
print(f"Number of rows in items_df: {len(items_df):,} \n")

items_df.head()

Number of rows in items_df: 1,387 



Unnamed: 0,item_id,description,date_quote_s,date_quote_e,n_obs
0,210101,LARGE LOAF-WHITE-SLICED-800G,198802,200401,36039
1,210102,LARGE LOAF-WHITE-UNSLICED-800G,198802,202510,56917
2,210105,LARGE WHOLEMEAL LOAF-UNSLICED,198802,200301,27161
3,210106,SIX BREAD ROLLS-WHITE/BROWN,198802,202510,67469
4,210107,"BROWN LOAF,400G,SLICED-GRAN",198903,200401,29361


The items data contains a description for each `item_id`, along with the number of price observations and the earliest and latest dates for which that product has a price record.

There are 1,387 unique items in the long-run price dataset (as of Nov 2025). Keeping the descriptions out of the main prices dataset avoids unnecessarily duplicating this information for every price observation.

> NOTE: Statistical agencies update the basket of goods used to sample inflation as consumer spending habits change over time - often the result of technological change. This is why not all items have data up to today.

**Optional** Filter to only 'current' items, i.e. those whose latest observation matches the most recent date in the dataset.

In [6]:
# Filter items_df to only include rows where 'date_quote_e' is equal to the maximum value of 'date_quote_e'
# (By searching for the max, we don't need to hard-code a date, which may become outdated over time)
items_df_current = items_df[items_df['date_quote_e'] == items_df['date_quote_e'].max()].copy()
# same as items_df[items_df['date_quote_e'] == 202510]

print(f"Number of current items: {len(items_df_current):,} \n")
items_df_current.head()

Number of current items: 584 



Unnamed: 0,item_id,description,date_quote_s,date_quote_e,n_obs
1,210102,LARGE LOAF-WHITE-UNSLICED-800G,198802,202510,56917
3,210106,SIX BREAD ROLLS-WHITE/BROWN,198802,202510,67469
8,210111,WHITE SLICED LOAF BRANDED 750G,200402,202510,36848
10,210113,WHOLEMEAL SLICED LOAF BRANDED,200402,202510,35696
11,210114,CHILLED GARLIC BREAD,201002,202510,41672


<br>

#### **1c.** Searching for a product

Look in the items data for some products we're interested in.

In [7]:
# Print unique values in the 'description' column
items_df_current['description'].unique()

array(['LARGE LOAF-WHITE-UNSLICED-800G', 'SIX BREAD ROLLS-WHITE/BROWN',
       'WHITE SLICED LOAF BRANDED 750G', 'WHOLEMEAL SLICED LOAF BRANDED',
       'CHILLED GARLIC BREAD', 'WRAP / TORTILLA PACK 6-8',
       'GLUTEN FREE BREAD LF 300-550G', 'FLOUR-SELF-RAISING-1.5KG',
       'DRY SPAGHETTI OR PASTA 500G', 'CORN SNACK SINGLE PACK MAX 50G',
       'BASMATI RICE 500G-1KG', 'BREAKFAST CEREAL 1',
       'BREAKFAST CEREAL 2', 'CEREAL BAR', 'HOT OAT CEREAL',
       'RICE MICRO POUCH/TRAY 220-280G', 'BREAKFAST CEREAL GLUTEN FREE',
       'COUSCOUS PLAIN/FLAVOURED', 'RICE CAKES PACK 100-180G',
       'CREAM CRACKERS PACK 200G-300G', 'PLAIN BISCUITS-200-300G',
       'WHOLE SPONGE CAKE NOT FROZEN', 'PACK OF 5-6 INDIVIDUAL CAKES',
       'BISCUITS HALF CHOC 260-400G', 'CRUMPETS PACK 6-9 SPEC NUMBER',
       'HOME KILLED BEEF-LEAN MINCE KG', 'HOME KLD BEEF-RUMP/POPES STEAK',
       'FROZEN BEEFBURGERS PACK OF 4', 'BEEF ROASTING JOINT PER KG',
       'HK LAMB LOIN CHOP/STEAK PER KG', 'HOME KILL

It's hard to check hundreds of values like this. Instead, we can perform a string search for rows that match some text we choose.

In [37]:
# Use the 'str.contains()' function to filter the 'description' column for the keyword 'frozen'.
# Adding `case=False` makes the search case-insensitive.

items_df_current[items_df_current['description'].str.contains('frozen', case=False)]

Unnamed: 0,item_id,description,date_quote_s,date_quote_e,n_obs
43,210320,WHOLE SPONGE CAKE NOT FROZEN,200002,202510,39549
53,210414,FROZEN BEEFBURGERS PACK OF 4,199202,202510,50005
118,211106,FROZEN PRAWNS PER KG,200402,202510,36945
125,211210,FROZEN FISH FINGERS 8-12 PK,200402,202510,53776
126,211211,FROZEN BREAD/BAT FISH 400-550G,201002,202510,40004
246,212405,FROZEN CHIPS 900G-1.5KG,199202,202510,89845
288,212609,FROZEN GARDEN PEAS 800G-1KG,199402,202510,61860
291,212612,FROZEN PRE-PREPARED VEGTABLES,202102,202510,17685
332,212809,BERRIES FROZEN PACK,202302,202510,10118
367,212943,YORKSHIRE PUDDING FROZEN,202202,202510,9752
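
The search above also returns "WHOLE SPONGE CAKE NOT FROZEN", which matches the keyword but is not a frozen product. One possible way to handle this, sketched on four descriptions copied from the results, is to combine two boolean masks. The variable names here are invented for the example.

```python
import pandas as pd

# A few descriptions copied from the search results above
desc = pd.Series([
    "WHOLE SPONGE CAKE NOT FROZEN",
    "FROZEN PRAWNS PER KG",
    "CHILLED GARLIC BREAD",
    "BERRIES FROZEN PACK",
])

# '|' works as OR because str.contains treats the pattern as a regular expression by default
frozen_or_chilled = desc[desc.str.contains("frozen|chilled", case=False)]

# Combine two masks to keep 'frozen' matches while dropping the 'NOT FROZEN' false positive
is_frozen = desc.str.contains("frozen", case=False)
is_not_frozen = desc.str.contains("not frozen", case=False)
truly_frozen = desc[is_frozen & ~is_not_frozen]
print(truly_frozen.tolist())
```

Keyword searches are only a starting point, so it is worth reading the matched descriptions before choosing an `item_id`.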


<br>

### Step 2. Aggregating

Before plotting, we need to filter and aggregate the data.

#### 2a. Filter prices on `item_id`.

Let's chart the price of frozen prawns - from the search above, we know the item ID is `211106`

<br>

How do we filter?
`df[df['columnx'] == 'xyz']`:
- `df['columnx'] == 'xyz'` -> Call a column and test against a condition (this returns a column of true/false values)
- `df[...]` -> Wrap the test with the main dataframe to filter to rows where our condition is true.
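
The two steps can be seen separately on a three-row toy dataframe (the rows are illustrative, not taken from the full dataset, and `mask` and `filtered` are names chosen for this sketch):

```python
import pandas as pd

df = pd.DataFrame({"item_id": [211106, 210101, 211106], "price": [22.5, 0.35, 9.25]})

# Step 1: the comparison returns one True/False value per row
mask = df["item_id"] == 211106
print(mask.tolist())

# Step 2: indexing with the mask keeps the rows where it is True
filtered = df[mask].copy()
print(filtered)
```

Writing `df[df['item_id'] == 211106]` simply does both steps in one line.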

In [None]:
# Filter prices_df to only include rows where 'item_id' is equal to 211106 and save to new dataframe
prawn_prices = prices_df[prices_df['item_id'] == 211106].copy()

# NOTE: You might also see dataframe filtering / querying done using the `.query()` method, e.g.:
# prawn_prices = prices_df.query('item_id == 211106').copy()

# View
prawn_prices

Unnamed: 0,quote_date,shop_code,item_id_raw,region,price,indicator_box,item_id,date
3901601,201205.0,814.0,211106,9.0,22.500000,,211106,2012-05-01
3901602,201610.0,961.0,211106,6.0,9.250000,,211106,2016-10-01
3901603,202006.0,941.0,211106,3.0,19.959999,,211106,2020-06-01
3901604,201104.0,802.0,211106,12.0,17.450001,,211106,2011-04-01
3901605,202011.0,953.0,211106,5.0,15.280000,S,211106,2020-11-01
...,...,...,...,...,...,...,...,...
3938541,201909.0,941.0,211106,7.0,19.959999,,211106,2019-09-01
3938542,200805.0,803.0,211106,3.0,6.730000,,211106,2008-05-01
3938543,201810.0,802.0,211106,3.0,19.440001,,211106,2018-10-01
3938544,202208.0,953.0,211106,5.0,18.330000,R,211106,2022-08-01


What happens if we visualise this?

In [39]:
alt.Chart(prawn_prices).mark_point().encode(
    x=alt.X('date:T'),
    y=alt.Y('price:Q')
)

Trying to chart all observations (nearly 37,000 for frozen prawns) results in a messy chart. 

<br>
<br>

#### 2b. Aggregate

What are you interested in? If we just care about the average price, we could calculate a mean or median across each month's observations. If we care about the distribution, we might want to calculate a percentile instead.

Let's calculate a mean monthly price. To do this, we use the `.groupby()` method to group observations by the monthly date, then call an aggregation method on each set of monthly prices.
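
To make the choice of aggregation concrete, the sketch below compares a mean, a median and two percentiles on a five-row toy dataset. The prices are invented, and `toy`, `summary` and `quartiles` are names used only here.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01"] * 3 + ["2024-02-01"] * 2),
    "price": [1.0, 2.0, 9.0, 4.0, 6.0],
})

# Mean and median per month: one outlier (9.0) pulls the January mean above the median
summary = toy.groupby("date")["price"].agg(["mean", "median"])

# 25th and 75th percentiles per month, one column per percentile
quartiles = toy.groupby("date")["price"].quantile([0.25, 0.75]).unstack()
print(summary)
print(quartiles)
```

With skewed price data the median is often less sensitive to a few very expensive observations than the mean.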

In [42]:
# Group by `date`, calculate mean `price`. Reset index to turn the grouped index back into a column.
avg_prawn_prices = prawn_prices.groupby(['date']).agg({'price': 'mean'}).reset_index()
avg_prawn_prices

Unnamed: 0,date,price
0,2004-02-01,10.657721
1,2004-03-01,10.847142
2,2004-04-01,10.319797
3,2004-05-01,10.100473
4,2004-06-01,10.010844
...,...,...
256,2025-06-01,18.726025
257,2025-07-01,18.371227
258,2025-08-01,18.500553
259,2025-09-01,18.120621


Since we selected one item and only performed one aggregation, we don't need any additional steps before charting our data -- such as transforming the data to long (`tidy`) format.

<br>

### Step 3. Visualise

In [None]:
chart = alt.Chart(avg_prawn_prices).mark_line().encode(
    x=alt.X('date:T'),
    y=alt.Y('price:Q')
).properties(
    width=400,
    height=250,
    title=alt.TitleParams(
        text="Frozen Prawns: Price History",
        subtitle="Mean average monthly price"
    )
)

chart.display()

# chart.save("w8_chart_prawn_prices.json")

<br>
<br>
<br>

---

### Bonus content: Additional examples of data aggregation and charting

### Chart examples

Multiple examples:
1. One item ID, two aggregations.
2. Several item IDs, one aggregation.

The searches below are repeated here to help find items without scrolling back to the top.

In [35]:
# Tip: adding '|' between keywords in the string (e.g. 'bread|frozen') acts as an OR operator, so we can search for multiple keywords at once.
items_df_current[items_df_current['description'].str.contains('bread', case=False)]

Unnamed: 0,item_id,description,date_quote_s,date_quote_e,n_obs
3,210106,SIX BREAD ROLLS-WHITE/BROWN,198802,202510,67469
11,210114,CHILLED GARLIC BREAD,201002,202510,41672
13,210116,GLUTEN FREE BREAD LF 300-550G,202402,202510,2205
126,211211,FROZEN BREAD/BAT FISH 400-550G,201002,202510,40004


In [28]:
items_df_current[items_df_current['description'].str.contains('frozen', case=False)]

Unnamed: 0,item_id,description,date_quote_s,date_quote_e,n_obs
43,210320,WHOLE SPONGE CAKE NOT FROZEN,200002,202510,39549
53,210414,FROZEN BEEFBURGERS PACK OF 4,199202,202510,50005
118,211106,FROZEN PRAWNS PER KG,200402,202510,36945
125,211210,FROZEN FISH FINGERS 8-12 PK,200402,202510,53776
126,211211,FROZEN BREAD/BAT FISH 400-550G,201002,202510,40004
246,212405,FROZEN CHIPS 900G-1.5KG,199202,202510,89845
288,212609,FROZEN GARDEN PEAS 800G-1KG,199402,202510,61860
291,212612,FROZEN PRE-PREPARED VEGTABLES,202102,202510,17685
332,212809,BERRIES FROZEN PACK,202302,202510,10118
367,212943,YORKSHIRE PUDDING FROZEN,202202,202510,9752


In [25]:
items_df_current[items_df_current['description'].str.contains('chicken', case=False)]

Unnamed: 0,item_id,description,date_quote_s,date_quote_e,n_obs
77,210905,FRESH/CHILLED CHICKEN PER KG,198903,202510,105251
82,210910,FRESH BONELESS CHICKEN BREAST,200302,202510,65366
103,211019,FROZ CHICKEN NUGGETS 220-600G,200502,202510,28024
109,211026,CHICKEN KIEV 2 PACK 240-325G,201502,202510,26070
427,220326,TAKEAWAY CHICKEN & CHIPS,201202,202510,19909


In [51]:
items_df_current[items_df_current['description'].str.contains('beef', case=False)]

Unnamed: 0,item_id,description,date_quote_s,date_quote_e,n_obs
48,210403,HOME KILLED BEEF-LEAN MINCE KG,198802,202510,121745
50,210406,HOME KLD BEEF-RUMP/POPES STEAK,198802,202510,117178
53,210414,FROZEN BEEFBURGERS PACK OF 4,199202,202510,50005
55,210416,BEEF ROASTING JOINT PER KG,202002,202510,12885


In [None]:
# 210106 Bread, 211210 frozen fish, 212405 frozen chips, 210905 Fresh chicken, 210406 Beef Steak

<br>
<br>
<br>

### Example 1. Mean & Median

Item: 220107 - "PUB -HOT MEAL"

In [None]:
# 1. Filter prices
meal_prices = prices_df[prices_df['item_id'] == 220107].copy()

# 2. Group by date, and calculate mean and median price
meal_price_stats = meal_prices.groupby('date').agg({'price': ['mean', 'median']}).reset_index()

# 3. Renaming columns (to flatten MultiIndex columns names)
# meal_price_stats.head()   # Uncomment to check columns
meal_price_stats.columns = ['date', 'mean', 'median']       # NOTE: order matters here and must match number of columns

# 4. Transform the dataframe to long format using `pd.melt()`
meal_price_stats_long = pd.melt(meal_price_stats, id_vars=['date'], value_vars=['mean', 'median'], var_name='statistic', value_name='price')
meal_price_stats_long

Unnamed: 0,date,statistic,price
0,1990-02-01,mean,2.463983
1,1990-03-01,mean,2.451328
2,1990-04-01,mean,2.475906
3,1990-05-01,mean,2.526981
4,1990-06-01,mean,2.521971
...,...,...,...
837,2025-06-01,median,12.500000
838,2025-07-01,median,12.500000
839,2025-08-01,median,12.500000
840,2025-09-01,median,12.495000
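
Step 3 above renames the columns by hand to flatten the MultiIndex. A possible alternative is pandas named aggregation, which sets the output column names directly. The sketch below uses a three-row toy dataframe with invented values.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-02-01"]),
    "price": [2.0, 4.0, 5.0],
})

# Named aggregation: each keyword becomes an output column, given as (input column, function)
stats = toy.groupby("date").agg(
    mean=("price", "mean"),
    median=("price", "median"),
).reset_index()

print(stats.columns.tolist())
```

The result has flat column names, so it can be passed to `pd.melt()` without the manual renaming step.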


**Visualise (long-format)**

In [63]:
alt.Chart(meal_price_stats_long).mark_line().encode(
    x=alt.X('date:T').title(''),
    y=alt.Y('price:Q').title('Price (GBP)'),
    color=alt.Color('statistic:N')
).properties(
    width=400,
    height=250,
    title={
        "text": "Pub Meal Prices",
        "subtitle": ["Mean and median prices for pub meals", "Source: ONS microdata via Davies (2021)"],
        "fontSize": 16
    }
)

<br>
<br>

**Visualise (wide-format)**

If we kept the data in wide-format, it is still possible to visualise it, but we need to use **layers**:

In [None]:
# NOTE, we didn't overwrite the wide-format `meal_price_stats` dataframe, so we can use it here.
c1_mean = alt.Chart(meal_price_stats).mark_line(color='blue').encode(
    x=alt.X('date:T'),
    y=alt.Y('mean:Q')
)

c2_median = alt.Chart(meal_price_stats).mark_line(color='red').encode(
    x=alt.X('date:T'),
    y=alt.Y('median:Q')
)

c1_mean + c2_median     # Layering is simple with the `+` operator. Or use `alt.layer(c1_mean, c2_median)`

Notice how we have to manually set the colour of each line, as there is no field to *encode* colour on. Because of this, we also don't get a legend to distinguish the lines. It is therefore better to plot from a single long-format dataset where possible, and to reserve layering for when we want to visualise different data structures on the same chart.

<br>

---

<br>
<br>
<br>

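### Example 2. Multiple items.

Items:
- 210106 - "SIX BREAD ROLLS-WHITE/BROWN"
- 211210 - "FROZEN FISH FINGERS 8-12 PK"
- 212405 - "FROZEN CHIPS 900G-1.5KG"
- 210905 - "FRESH/CHILLED CHICKEN PER KG"
- 210403 - "HOME KILLED BEEF-LEAN MINCE KG"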

Our flow is the same, with a couple of differences in the filtering and groupby aggregation:
- Using `.isin()`, we filter prices based on a list of item_ids. 
- Group by date **and** item id, before calling our price aggregation.
    - Each *group* then contains the price observations for a single item_id in a single month. We then calculate the mean of the prices in each of these groups.
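
Both differences can be tried out on a toy dataframe first. The item IDs and prices below are invented, and `toy`, `subset` and `monthly` are names used only in this sketch.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01"] * 4 + ["2024-02-01"] * 2),
    "item_id": [1, 1, 2, 3, 1, 2],
    "price": [1.0, 3.0, 10.0, 99.0, 4.0, 12.0],
})

# Keep rows whose item_id appears in the list (item 3 is dropped)
subset = toy[toy["item_id"].isin([1, 2])]

# One group per (date, item_id) pair, then the mean price within each group
monthly = subset.groupby(["date", "item_id"]).agg({"price": "mean"}).reset_index()
print(monthly)
```

The result has one row per item per month, which is already the long format that Altair expects for a multi-line chart.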

In [44]:
# 1. Filter prices based on a list of item IDs.
# (we can use `.isin()` to filter by checking against multiple values in a list)
takeaway_prices = prices_df[prices_df['item_id'].isin([210106, 211210, 212405, 210905, 210403])].copy()

# 2. Group by date and item_id, calculate mean price, store in new dataframe
takeaway_price_stats = takeaway_prices.groupby(['date', 'item_id']).agg({'price': 'mean'}).reset_index()

# We have columns 'date', 'item_id', 'price'
takeaway_price_stats

Unnamed: 0,date,item_id,price
0,1988-02-01,210106,0.477151
1,1988-02-01,210403,2.806472
2,1988-03-01,210106,0.469620
3,1988-03-01,210403,2.891031
4,1988-04-01,210106,0.474181
...,...,...,...
2003,2025-10-01,210106,1.272340
2004,2025-10-01,210403,11.601636
2005,2025-10-01,210905,4.610573
2006,2025-10-01,211210,3.027621


**Visualise**

In [45]:
alt.Chart(takeaway_price_stats).mark_line().encode(
    x=alt.X("date:T"),
    y=alt.Y("price"),
    color=alt.Color("item_id:N")
).properties(
    width=400,
    height=250,
    title=alt.TitleParams(
        text="Takeaway Items: Price History",
        subtitle=["Mean average price", "Source: ONS microdata via Davies (2021)"],
        anchor="start",
        frame='group'
    )
)

<br>

**How do we replace our item_id with the descriptions?**
We could manually set the values in the chart spec, but it is better to either:
1. Merge the `description` from items_df into our filtered prices.
2. Set our own label for each item_id, and add a new column mapping item_id to that label.

In [47]:
# Option 1. Call `.merge` method on our filtered dataframe to add descriptions. 
#  match `on` item_id column in each dataframe. `how='left'` to keep all rows in takeaway_price_stats.
takeaway_price_stats.merge(items_df[['item_id', 'description']], on='item_id', how='left')

Unnamed: 0,date,item_id,price,description
0,1988-02-01,210106,0.477151,SIX BREAD ROLLS-WHITE/BROWN
1,1988-02-01,210403,2.806472,HOME KILLED BEEF-LEAN MINCE KG
2,1988-03-01,210106,0.469620,SIX BREAD ROLLS-WHITE/BROWN
3,1988-03-01,210403,2.891031,HOME KILLED BEEF-LEAN MINCE KG
4,1988-04-01,210106,0.474181,SIX BREAD ROLLS-WHITE/BROWN
...,...,...,...,...
2003,2025-10-01,210106,1.272340,SIX BREAD ROLLS-WHITE/BROWN
2004,2025-10-01,210403,11.601636,HOME KILLED BEEF-LEAN MINCE KG
2005,2025-10-01,210905,4.610573,FRESH/CHILLED CHICKEN PER KG
2006,2025-10-01,211210,3.027621,FROZEN FISH FINGERS 8-12 PK


In [None]:
# Option 2. Define our own mapping of item_id to description, and add new column.
item_labels = {210106: 'Bread', 211210: 'frozen fish', 212405: 'frozen chips', 210905: 'Fresh chicken', 210403: 'Beef'} 

# Use the `.map()` method to create new column 'label' based on mapping dictionary. It finds the value in the dictionary for each item_id.
takeaway_price_stats['label'] = takeaway_price_stats['item_id'].map(item_labels)
takeaway_price_stats

Unnamed: 0,date,item_id,price,label
0,1988-02-01,210106,0.477151,Bread
1,1988-02-01,210403,2.806472,Beef
2,1988-03-01,210106,0.469620,Bread
3,1988-03-01,210403,2.891031,Beef
4,1988-04-01,210106,0.474181,Bread
...,...,...,...,...
2003,2025-10-01,210106,1.272340,Bread
2004,2025-10-01,210403,11.601636,Beef
2005,2025-10-01,210905,4.610573,Fresh chicken
2006,2025-10-01,211210,3.027621,frozen fish
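
One thing to watch with the `.map()` approach is that any `item_id` missing from the dictionary silently becomes `NaN`. A quick check, sketched here on toy data with one ID deliberately left out of the mapping (prices invented), is to list the unmapped IDs before charting:

```python
import pandas as pd

stats = pd.DataFrame({"item_id": [210106, 210403, 212405], "price": [1.27, 11.60, 2.10]})

# A mapping that is missing one of the IDs (212405 is left out on purpose)
labels = {210106: "Bread", 210403: "Beef"}
stats["label"] = stats["item_id"].map(labels)

# IDs without a dictionary entry become NaN, so list them before charting
missing = stats.loc[stats["label"].isna(), "item_id"].unique().tolist()
print(missing)
```

An empty list means every item has a label and will appear under a proper name in the chart legend.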


<br>

Visualise again, encoding colour on our new label / description column

In [53]:
alt.Chart(takeaway_price_stats).mark_line().encode(
    x=alt.X("date:T"),
    y=alt.Y("price"),
    color=alt.Color("label:N")
).properties(
    width=400,
    height=250,
    title=alt.TitleParams(
        text="Takeaway Items: Price History",
        subtitle=["Mean average monthly price", "Source: ONS microdata via Davies (2021)"],
        anchor="start",
        frame='group'
    )
)



<br>
<br>

Improving the chart 1.

In [63]:
alt.Chart(takeaway_price_stats).transform_filter(
    # Filter to after 2005
    alt.datum.date >= pd.to_datetime('2005-01-01')  # Convert string to datetime format for comparison
).mark_line().encode(
    x=alt.X("date:T"),
    y=alt.Y("price"),
    color=alt.Color("label:N")
).properties(
    width=400,
    height=250,
    title=alt.TitleParams(
        text="Takeaway Items: Price History",
        subtitle=["Mean average monthly price", "Source: ONS microdata via Davies (2021)"],
        anchor="start",
        frame='group'
    )
)



In [61]:
lines = alt.Chart(takeaway_price_stats).transform_filter(
    # Filter to after 2005
    alt.datum.date >= pd.to_datetime('2005-01-01')  # Convert string to datetime format for comparison
).mark_line().encode(
    x=alt.X("date:T").title('').axis(grid=False),
    y=alt.Y("price").title('').axis(
        labelExpr="'£' + datum.value"
    ),
    color=alt.Color("label:N").legend(title='').scale(
        range=['#70B0FA', '#F54927']  # Custom colors for lines
    )
).properties(
    width=400,
    height=250,
    title=alt.TitleParams(
        text="Takeaway Items: Price History",
        subtitle=["Mean average monthly price", "Source: ONS microdata via Davies (2021)"],
        anchor="start",
        frame='group'
    )
).configure_view(
    strokeWidth=0  # Remove border around chart area
)

lines.display()

<br>
<br>

Improving the chart 2.

In [64]:
### Same as before
lines = alt.Chart(takeaway_price_stats).transform_filter(
    # Filter to after 2005
    alt.datum.date >= pd.to_datetime('2005-01-01')  # Convert string to datetime format for comparison
).mark_line().encode(
    x=alt.X("date:T").title('').axis(grid=False),
    y=alt.Y("price:Q").title('').axis(
        labelExpr="'£' + datum.value"
    ),
    color=alt.Color("label:N").legend(None).scale(  # NEW, set legend to None
        range=['#70B0FA', '#F54927']  # Custom colors for lines
    )
).properties(
    width=400,
    height=250,
    title=alt.TitleParams(
        text="Takeaway Items: Price History",
        subtitle=["Mean average monthly price", "Source: ONS microdata via Davies (2021)"],
        anchor="start",
        frame='group'
    )
)


### NEW

# Add a text layer with our label values. Using an aggregate, we'll place the text at the end of the lines
text = lines.mark_text(
    align='left',
    dx=5,  # Nudges text to right so it doesn't overlap with line endpoint
    fontSize=12,
    fontWeight='bold'
).encode(
    x=alt.X("date:T").aggregate('max'),
    y=alt.Y("price:Q").aggregate({'argmax': 'date'}),
    text=alt.Text('label:N')
)

chart = (lines + text).configure_view(
    strokeWidth=0  # Remove border around chart area (config has to be applied after layering the charts)
)
chart.display()

