# London Housing Market Analysis (1995–2025)

## Project Overview

This project analyses the evolution of the London housing market over the last three decades using official UK House Price Index (HPI) data (https://data.london.gov.uk/dataset/uk-house-price-index-2zwx6/).

The objectives are to:
- Understand long-term price trends across London and the UK
- Compare property types and regional performance
- Evaluate the relationship between transaction volume and property prices
- Identify boroughs with the highest long-term price appreciation

The analysis is designed to demonstrate **practical data analysis skills** using Python, beyond theoretical coursework.


## Data Sources

The data come from the UK House Price Index (HPI) and are distributed as an Excel workbook with multiple sheets:

- **By type**:  
  Monthly average prices and price indices by property type (Detached, Flat, etc.) for London and the UK.

- **Average price**:  
  Monthly average property prices for each London borough.

- **Sales Volume**:  
  Monthly transaction volumes for each London borough.

Each sheet has a different structure, requiring custom preprocessing before analysis.


## Data Cleaning & Preparation Strategy

Before analysis, the data undergoes a structured cleaning process:

1. Column names are standardised to remove whitespace inconsistencies.
2. Dates are normalised:
   - The **By type** sheet requires combining Year and Month into a datetime object.
   - Other sheets already contain a Date column but require parsing and validation.
3. Non-numeric values (e.g. codes, placeholders) are coerced to NaN and dropped.
4. Data is reshaped from wide to long format to enable flexible grouping and merging.
5. Only valid observations with complete price and volume information are retained.

This ensures consistency, reproducibility, and analytical reliability.
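The cleaning steps above can be sketched in pandas. This is a minimal sketch, not the project's actual code. It assumes the borough sheets have already been loaded into DataFrames with a `Date` column plus one column per borough, and that the By type sheet stores `Year` and `Month` as numbers. The helper names `clean_borough_sheet` and `add_date_from_year_month`, and the toy values, are hypothetical.

```python
import pandas as pd


def clean_borough_sheet(raw: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Tidy a wide borough sheet (Date + one column per borough) into long format."""
    df = raw.copy()
    # Standardise column names: trim stray whitespace such as " Camden ".
    df.columns = df.columns.astype(str).str.strip()
    # Parse dates; anything unparseable becomes NaT instead of raising.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    # Reshape wide -> long: one row per (Date, Borough) pair.
    long = df.melt(id_vars="Date", var_name="Borough", value_name=value_name)
    # Coerce codes and placeholders (e.g. "..") to NaN.
    long[value_name] = pd.to_numeric(long[value_name], errors="coerce")
    # Keep only complete observations.
    return long.dropna(subset=["Date", value_name]).reset_index(drop=True)


def add_date_from_year_month(raw: pd.DataFrame) -> pd.DataFrame:
    """Combine numeric Year and Month columns (By type layout) into a Date column."""
    df = raw.copy()
    df.columns = df.columns.astype(str).str.strip()
    # pd.to_datetime can assemble dates from year/month/day columns;
    # the day is fixed to 1 because the data are monthly.
    parts = pd.DataFrame({"year": df["Year"], "month": df["Month"], "day": 1})
    df["Date"] = pd.to_datetime(parts, errors="coerce")
    return df


# Illustrative toy input, not real HPI figures.
raw_prices = pd.DataFrame(
    {
        "Date": ["1995-01-01", "1995-02-01"],
        " Camden ": [100000, ".."],
        "Hackney": [80000, 81000],
    }
)
prices = clean_borough_sheet(raw_prices, "Average_Price")
```

In the toy input, the `".."` placeholder is coerced to NaN and dropped, leaving three complete rows. The same function could be reused for the Sales Volume sheet by passing `value_name="Sales_Volume"`.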


## Construction of the Unified London Dataset

To analyse borough-level dynamics, average prices and sales volumes are merged into a single dataset:

- Merge keys: **Date** and **Borough**
- Merge type: inner join to ensure aligned observations
- Final variables:
  - Date
  - Borough
  - Average_Price
  - Sales_Volume

This unified dataset forms the core of the exploratory and statistical analysis.
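A minimal sketch of the merge, assuming both inputs are already in the long format described above. The function name `build_unified_dataset` and the toy values are hypothetical; `validate="one_to_one"` is an optional pandas safeguard added here, not necessarily part of the original pipeline.

```python
import pandas as pd


def build_unified_dataset(prices: pd.DataFrame, volumes: pd.DataFrame) -> pd.DataFrame:
    """Inner-join long-format prices and sales volumes on Date and Borough."""
    merged = pd.merge(
        prices[["Date", "Borough", "Average_Price"]],
        volumes[["Date", "Borough", "Sales_Volume"]],
        on=["Date", "Borough"],
        how="inner",  # keep only (Date, Borough) pairs present in both tables
        validate="one_to_one",  # raise if either table has duplicate keys
    )
    return merged.sort_values(["Borough", "Date"]).reset_index(drop=True)


# Illustrative toy input, not real HPI figures.
prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-01-01"]),
        "Borough": ["Camden", "Camden", "Hackney"],
        "Average_Price": [850000.0, 860000.0, 600000.0],
    }
)
volumes = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
        "Borough": ["Camden", "Hackney"],
        "Sales_Volume": [150, 200],
    }
)
london = build_unified_dataset(prices, volumes)
```

Because the join is inner, Camden's February row is dropped: it has a price but no matching sales volume.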


## Price vs Transaction Volume Relationship

To investigate the relationship between housing prices and transaction activity:

- Aggregate rows (e.g. Greater London totals), whose sales volumes are far larger than any single borough's, are excluded by filtering out unusually high volumes.
- A second-order polynomial regression is used to capture potential non-linear behaviour.
- Scatter-point transparency (alpha) is applied to reduce overplotting.

This approach allows a clearer interpretation of market dynamics without introducing misleading linear assumptions.
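The filtering and second-order fit can be sketched with `numpy.polyfit`. This is a sketch under assumptions: the volume cap `max_volume` is a hypothetical parameter whose real value would have to be chosen from the data, and the toy rows are constructed to lie exactly on a known quadratic so the fit can be checked.

```python
import numpy as np
import pandas as pd


def fit_price_volume_curve(df: pd.DataFrame, max_volume: float, degree: int = 2):
    """Drop aggregate-sized rows, then fit Average_Price as a polynomial in Sales_Volume."""
    # Rows such as Greater London totals have far larger volumes than any
    # single borough; a volume cap removes them before fitting.
    subset = df[df["Sales_Volume"] <= max_volume]
    # Least-squares fit of price = a*v**2 + b*v + c (for degree=2).
    coeffs = np.polyfit(subset["Sales_Volume"], subset["Average_Price"], deg=degree)
    return subset, np.poly1d(coeffs)


# Illustrative toy data: five borough-sized rows plus one aggregate-sized row.
volumes = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 10000.0])
prices = 2 * volumes**2 + 3 * volumes + 5
prices[-1] = 1.0  # the aggregate row would distort the fit if kept
toy = pd.DataFrame({"Sales_Volume": volumes, "Average_Price": prices})

subset, curve = fit_price_volume_curve(toy, max_volume=1000)
```

For plotting, the filtered points could be drawn with matplotlib using a low `alpha` (e.g. `plt.scatter(x, y, alpha=0.2)`) and the fitted curve evaluated over a sorted grid of volumes with `curve(xs)`.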


## Correlation Analysis

Correlation is computed at two levels:

- **Global correlation**:  
  Measures the overall relationship between transaction volume and average price across all boroughs.

- **Borough-level correlation**:  
  Identifies boroughs where price movements are most sensitive to changes in transaction volume.

It is important to note that correlation does not imply causation; results are interpreted as descriptive indicators rather than predictive signals.
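Both levels of correlation can be computed with pandas' Pearson correlation. This is a minimal sketch, assuming the unified dataset's `Borough`, `Sales_Volume` and `Average_Price` columns. Ranking boroughs by the absolute value of their coefficient is one possible reading of "most sensitive", not necessarily the author's. The borough labels `A`, `B`, `C` are toy placeholders.

```python
import pandas as pd


def global_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between volume and price across all rows."""
    return df["Sales_Volume"].corr(df["Average_Price"])


def borough_correlations(df: pd.DataFrame) -> pd.Series:
    """Per-borough Pearson correlation, ordered by absolute strength."""
    # groupby(...).corr() yields a 2x2 matrix per borough, stacked under a
    # (Borough, variable) MultiIndex; pick the volume row, price column.
    matrices = df.groupby("Borough")[["Sales_Volume", "Average_Price"]].corr()
    corr = matrices.xs("Sales_Volume", level=1)["Average_Price"]
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Illustrative toy data, not real HPI figures.
toy = pd.DataFrame(
    {
        "Borough": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
        "Sales_Volume": [1, 2, 3, 4] * 3,
        "Average_Price": [10, 20, 30, 40, 40, 30, 20, 10, 10, 30, 20, 40],
    }
)
overall = global_correlation(toy)
ranked = borough_correlations(toy)
```

The toy data illustrate why both levels matter: each borough shows a strong within-borough correlation (1.0, -1.0 and 0.8), yet the pooled coefficient is only about 0.27 because opposite-signed boroughs partly cancel out.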


## London vs UK Housing Price Comparison

This section compares London housing prices with the UK average:

- Detached properties and flats are analysed separately.
- UK prices are treated as a benchmark reference.
- Long-term trends are prioritised over short-term volatility.

The goal is to highlight structural price divergence rather than cyclical fluctuations.
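One way to express London prices against the UK benchmark while damping short-term volatility is a London-to-UK price ratio smoothed with a 12-month rolling mean. This is a sketch of that idea, not the project's confirmed method. It assumes a hypothetical long-format version of the By type sheet with `Date`, `Region` (values `"London"` and `"United Kingdom"`), `Property_Type` and `Average_Price` columns; the function name, window length and toy values are also assumptions.

```python
import pandas as pd


def london_to_uk_ratio(by_type: pd.DataFrame, property_type: str, window: int = 12) -> pd.Series:
    """London/UK average-price ratio for one property type, smoothed over `window` months."""
    subset = by_type[by_type["Property_Type"] == property_type]
    # One column per region, one row per month.
    wide = subset.pivot(index="Date", columns="Region", values="Average_Price").sort_index()
    # Ratio > 1 means London is more expensive than the UK benchmark.
    ratio = wide["London"] / wide["United Kingdom"]
    # Rolling mean damps short-term volatility; the first window-1 months are NaN.
    return ratio.rolling(window=window, min_periods=window).mean()


# Illustrative toy data: London flats priced at exactly twice the UK level.
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
rows = []
for i, d in enumerate(dates):
    uk_price = 100.0 + i
    rows.append({"Date": d, "Region": "United Kingdom", "Property_Type": "Flat", "Average_Price": uk_price})
    rows.append({"Date": d, "Region": "London", "Property_Type": "Flat", "Average_Price": 2 * uk_price})
toy = pd.DataFrame(rows)

flat_ratio = london_to_uk_ratio(toy, "Flat")
```

Calling the function once with `"Detached"` and once with `"Flat"` keeps the two property types separate, as the section describes.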


## Long-Term Price Appreciation by Borough

To evaluate long-term investment performance:

1. Prices are extracted for the earliest and latest available months.
2. Percentage growth is calculated for each borough.
3. Boroughs are ranked based on total price appreciation.

This approach provides an intuitive and transparent measure of long-term housing market performance.
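The three steps above amount to computing $(P_\text{end} / P_\text{start} - 1) \times 100$ per borough and sorting. A minimal pandas sketch follows. It assumes the unified long-format dataset and takes the earliest and latest months across the whole dataset, so every borough is compared over the same period. The function name and toy values are hypothetical.

```python
import pandas as pd


def price_appreciation(df: pd.DataFrame) -> pd.Series:
    """Percentage growth between the earliest and latest months, ranked high to low."""
    start, end = df["Date"].min(), df["Date"].max()
    # One row per month, one column per borough.
    wide = df.pivot(index="Date", columns="Borough", values="Average_Price")
    # growth % = (P_end / P_start - 1) * 100
    growth = (wide.loc[end] / wide.loc[start] - 1) * 100
    # Boroughs missing a price at either endpoint are dropped.
    return growth.dropna().sort_values(ascending=False).rename("Growth_pct")


# Illustrative toy data, not real HPI figures.
toy = pd.DataFrame(
    {
        "Date": pd.to_datetime(["1995-01-01"] * 3 + ["2025-01-01"] * 3),
        "Borough": ["Camden", "Hackney", "Lambeth"] * 2,
        "Average_Price": [100.0, 80.0, 200.0, 250.0, 240.0, 260.0],
    }
)
ranking = price_appreciation(toy)
```

With these toy values, Hackney ranks first (80 to 240, i.e. 200%), ahead of Camden (150%) and Lambeth (30%).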


## Key Insights & Limitations

### Key Insights
- London property prices have significantly outperformed the UK average over the long term.
- Certain boroughs exhibit exceptionally strong appreciation driven by structural changes.
- Transaction volume shows a weak, non-linear relationship with prices.

### Limitations
- Inflation is not adjusted for; all prices are nominal.
- Borough boundary changes over time are not considered.
- The analysis is descriptive and not intended for forecasting.

Future work could include inflation adjustment, affordability metrics, or predictive modelling.


## Conclusion

This project demonstrates an end-to-end data analysis workflow:

- Data ingestion and cleaning
- Feature engineering and reshaping
- Exploratory data analysis
- Statistical interpretation
- Professional visual communication

It complements formal training by showcasing practical analytical reasoning and implementation using Python.
