# UK Food CPI (MoM%) Forecast


## Project Summary

This analysis uses the **UK Consumer Price Index (CPI) - Food and Non-Alcoholic Beverages** monthly rate (MoM%) time series. The data is cleaned and explored through **EDA**, then used to produce a **short-term prediction** with a set of selected models **(Naive, Seasonal-Naive, SARIMA,** and **Prophet).**
Prediction outputs are saved and visualized in a **Streamlit** app.

## Skills Demonstrated 
- Data Cleaning (using official statistics / monthly time series)
- Exploratory Data Analysis **(EDA).**
- Prediction **(Naive, Seasonal-Naive, SARIMA,** and **Prophet)**
- Model evaluation **(MAPE, RMSE)**
- Logical communication and thinking

## 1) Load and clean the dataset into a straightforward, usable format

The dataset is cleaned and saved to a new CSV file so that the original dataset is not altered. The new CSV file contains the date (properly formatted), category, and value (MoM%) for the **UK Consumer Price Index (CPI) - Food and Non-Alcoholic Beverages.**

In [1]:
# Importing necessary libraries
import pandas as pd, numpy as np
import re
import matplotlib.pyplot as plt
from pathlib import Path

In [4]:
# Assigns the original file to a variable
original_file = "series-270925.csv"

df_original = pd.read_csv(original_file, names = ['label', 'value'], header = 0)

print (df_original.head(10))

               label            value
0               CDID             D7JH
1  Source dataset ID             MM23
2            PreUnit              NaN
3               Unit                %
4       Release date       17-09-2025
5       Next release  22 October 2025
6    Important notes              NaN
7           1988 FEB              0.6
8           1988 MAR              0.3
9           1988 APR              0.6


In [15]:
# str.match applies a regular expression to flag rows whose label matches the YYYY MON format
mask = df_original['label'].str.match(r"^\d{4}\s+[A-Z]{3}$", na = False) 

# Gets the first index where the required data starts
first_idx = mask.idxmax()
print (f'First index data row = row {first_idx}')

# The working dataframe starts from there
cpi_data = df_original.loc[first_idx:].copy()

First index data row = row 7
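Slicing from the first matching index keeps every later row, whether or not it matches the pattern. As a minimal sketch of an alternative, the same mask can be used to keep only the matching rows. The `sample` frame below is hypothetical illustration data, not taken from the ONS file, and the non-monthly labels in it are assumptions about what such a file might contain.

```python
import pandas as pd

# Hypothetical sample mimicking the raw layout: metadata rows first, then data.
# The "1988 Q1" and "1988" labels are invented to show what the pattern rejects.
sample = pd.DataFrame({
    "label": ["CDID", "Unit", "1988 FEB", "1988 MAR", "1988 Q1", "1988"],
    "value": ["D7JH", "%", "0.6", "0.3", "0.4", "0.5"],
})

# Same pattern as in the cell above: four digits, whitespace, three capital letters
monthly_mask = sample["label"].str.match(r"^\d{4}\s+[A-Z]{3}$", na=False)

# Boolean indexing keeps only rows that match, wherever they sit in the file
monthly_rows = sample.loc[monthly_mask].copy()
print(monthly_rows)
```

If every row after the first match is already monthly, as the output above suggests for this file, both approaches give the same result.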


In [16]:
# The label (e.g. '1988 FEB') is parsed into a proper datetime and stored in a date column
cpi_data['date'] = pd.to_datetime(cpi_data['label'], format='%Y %b')

# The value column is converted to a numeric format
cpi_data['value'] = pd.to_numeric(cpi_data['value'], errors = 'coerce')

# The old label column is dropped
cpi_data = cpi_data.drop(columns = ['label'])
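Because `errors = 'coerce'` silently turns unparseable entries into `NaN`, it can help to count how many values were coerced. The following is a minimal sketch on a hypothetical `raw` series, not on the actual dataset.

```python
import pandas as pd

# Hypothetical raw values; the last two entries are deliberately non-numeric
raw = pd.Series(["0.6", "0.3", "n/a", "x"])

converted = pd.to_numeric(raw, errors="coerce")

# Entries that were not missing before conversion but are NaN afterwards
n_coerced = int(converted.isna().sum() - raw.isna().sum())
print(f"Values coerced to NaN: {n_coerced}")
```

A non-zero count on the real series would be a prompt to inspect those rows before modelling.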

In [17]:
# A new category label is added to clarify what this series represents
cpi_data['category'] = 'Consumer Food (MoM%)'

# Columns are rearranged into a standard usable format
cpi_data = cpi_data[['date', 'category', 'value']]

# Rows are sorted by date
cpi_data = cpi_data.sort_values('date').reset_index(drop = True)

In [18]:
# The cleaned data is saved to a new CSV file, leaving the original file untouched
output_path = 'cleaned_cpi.csv'
cpi_data.to_csv(output_path, index = False)

print (f'Cleaning Completed and saved as, {output_path}')
print (cpi_data.head())

Cleaning Completed and saved as, cleaned_cpi.csv
        date              category  value
0 1988-02-01  Consumer Food (MoM%)    0.6
1 1988-03-01  Consumer Food (MoM%)    0.3
2 1988-04-01  Consumer Food (MoM%)    0.6
3 1988-05-01  Consumer Food (MoM%)    0.3
4 1988-06-01  Consumer Food (MoM%)    0.2
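Before forecasting, a quick integrity check on the cleaned series can confirm that no months are missing or duplicated. The sketch below uses a hypothetical `sample` frame with a deliberate gap; on the real data, `sample` would be replaced by `cpi_data`.

```python
import pandas as pd

# Hypothetical cleaned data with April 1988 deliberately left out
sample = pd.DataFrame({
    "date": pd.to_datetime(["1988-02-01", "1988-03-01", "1988-05-01"]),
    "value": [0.6, 0.3, 0.3],
})

# Every month start between the first and last observation
expected = pd.date_range(sample["date"].min(), sample["date"].max(), freq="MS")

# Months that should exist but are absent from the data
missing_months = expected.difference(pd.DatetimeIndex(sample["date"]))

has_duplicates = bool(sample["date"].duplicated().any())
n_missing_values = int(sample["value"].isna().sum())

print(f"Missing months: {list(missing_months.strftime('%Y-%m'))}")
print(f"Duplicate dates: {has_duplicates}, NaN values: {n_missing_values}")
```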
