# The Challenge Questions

Firstly, I will need to load my cleaned data into this notebook.

In [8]:
import polars as pl
# load parquet into clean_df
clean_df = pl.read_parquet("../data/clean_property_data.parquet")
# validate that has worked
clean_df

id,price,date,postcode,property_type,new,duration,paon,saon,street,locality,town_city,district,county,lat,long,year,postal_sector
str,f64,date,str,str,str,str,str,str,str,str,str,str,str,f64,f64,i32,str
"""{D93B27B1-CBD4…",140000.0,2022-02-16,"""SK16 4DT""","""Terraced""","""Old Build""","""Freehold""","""70""","""""","""CHAPEL STREET""","""""","""DUKINFIELD""","""TAMESIDE""","""GREATER MANCHE…",53.477466,-2.091735,2022,"""SK16 4"""
"""{2ACACE8D-02B4…",278000.0,2024-11-29,"""SK6 1QW""","""Semi Detached""","""Old Build""","""Freehold""","""1""","""""","""BRIAR GROVE""","""WOODLEY""","""STOCKPORT""","""STOCKPORT""","""GREATER MANCHE…",53.427355,-2.102642,2024,"""SK6 1"""
"""{01EB45EF-F1B0…",345448.0,2022-02-11,"""M1 2EY""","""Flat""","""New Build""","""Leasehold""","""72""","""FLAT 902""","""CHAPELTOWN STR…","""""","""MANCHESTER""","""MANCHESTER""","""GREATER MANCHE…",53.478898,-2.224708,2022,"""M1 2"""
"""{879537F9-FB6B…",93000.0,2004-09-22,"""BL8 2RR""","""Terraced""","""Old Build""","""Leasehold""","""43""","""""","""NEWBOLD STREET…","""BURY""","""BURY""","""BURY""","""GREATER MANCHE…",53.594024,-2.318071,2004,"""BL8 2"""
"""{7011B10A-2B91…",80000.0,2018-05-03,"""BL3 4HE""","""Terraced""","""Old Build""","""Freehold""","""243""","""""","""WILLOWS LANE""","""""","""BOLTON""","""BOLTON""","""GREATER MANCHE…",53.56682,-2.458227,2018,"""BL3 4"""
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
"""{B83BB2E2-7A4C…",94130.0,2013-05-31,"""M14 4EZ""","""Terraced""","""Old Build""","""Freehold""","""34""","""""","""GREAT SOUTHERN…","""""","""MANCHESTER""","""MANCHESTER""","""GREATER MANCHE…",53.454883,-2.233151,2013,"""M14 4"""
"""{42A5A709-54F7…",142000.0,2016-09-30,"""BL4 8NU""","""Detached""","""Old Build""","""Leasehold""","""35""","""""","""GREENMOUNT PAR…","""KEARSLEY""","""BOLTON""","""BOLTON""","""GREATER MANCHE…",53.54187,-2.375203,2016,"""BL4 8"""
"""{98C75471-F554…",265000.0,2019-10-04,"""M16 0HU""","""Terraced""","""Old Build""","""Leasehold""","""70""","""""","""WARWICK ROAD S…","""""","""MANCHESTER""","""TRAFFORD""","""GREATER MANCHE…",53.452352,-2.281641,2019,"""M16 0"""
"""{43A28C2A-CC22…",74950.0,2001-11-23,"""M15 5TE""","""Terraced""","""Old Build""","""Leasehold""","""15""","""""","""OLD YORK STREE…","""HULME""","""MANCHESTER""","""MANCHESTER""","""GREATER MANCHE…",53.466086,-2.253427,2001,"""M15 5"""


### Q1. What are the top 10 most expensive detached homes sold as Freehold in Manchester City after 2010?

The goal of this question is to find the top 10 most expensive detached properties sold as Freehold in Manchester City after 2010. I will filter the dataframe for detached, freehold properties, in manchester after 2010. Then I will sort price by decsending and limit by 10. 

(think about how i want to present this finding, what data is relevant)

In [44]:

# Filter for the constraints given in the question
filtered_df = clean_df.filter(
    (pl.col("property_type") == "Detached") &
    (pl.col("duration") == "Freehold") &
    (pl.col("town_city").str.to_lowercase() == "manchester") &
    (pl.col("year") > 2010)
)

# Sort by price and show top 10
top_10_df = filtered_df.sort("price", descending=True).head(10)
top_10_df

q1_clean = top_10_df.select([
    "price", "date", "postcode", "street", "town_city", "district", "county", "paon", "id"
])
q1_clean

q1_clean.write_csv("../data/q1_answer.csv")



### Q2. What are the top 5 most expensive postal sectors (postal sectors are the section of the postal code after the area sector: examples include M1 3, M1 4) in the Salford district between March 2012 and September 2015?

To answer this question I will filter the data set for isolated properties in the Salford district between the dates March 1st 2012 and September 30th 2015.  I will group by the derived postal sector column I created during cleaning and calculate the average price per sector, rounded to two decimal places. Finally, return the top 5 sectors. 

This outlines the most expensive postal sector in Salford by average price during the timeframe provided. 

used cast(str) method to turn float into str and map_elements(lamba s: f"£{s}") to prepend the pound sign. Making it display ready.

In [45]:

from datetime import date

# Filter for Salford district and date range
salford_df = clean_df.filter(
    (pl.col("district").str.to_lowercase() == "salford") &
    (pl.col("date") >= date(2012, 3, 1)) &
    (pl.col("date") <= date(2015, 9, 30))
)

# Group by postal sector and calculate average price
top_sectors = (
    salford_df.group_by("postal_sector")
    .agg(pl.col("price").mean().round(2).cast(str).map_elements(lambda s: f"£{s}").alias("avg_price"))
    .sort("avg_price", descending=True)
    .head(5)
)

top_sectors
#top_sectors.write_csv("../data/q2_answer.csv")


postal_sector,avg_price
str,str
"""M30 0""","""£97422.03"""
"""M5 5""","""£90654.8"""
"""M50 1""","""£87926.18"""
"""M38 0""","""£86810.21"""
"""M6 6""","""£85652.37"""


### Q3: Where in Greater Manchester Are the Most Residential Homes Being Built Over the Past 10 Years?

To address this question, I filtered the dataset for transactions between **2014 and 2024**, focusing on properties marked as `"New Build"` in the `new` column. I excluded `"Other"` property types to approximate residential homes. Grouping by `district` and counting the number of new build transactions provides a reasonable proxy for where residential construction has been most active across Greater Manchester over the past decade.


In [38]:

from datetime import date

# Filter for new residential builds sold between 2014 and 2024
new_builds_10yr = clean_df.filter(
    (pl.col("date") >= date(2014, 1, 1)) &
    (pl.col("date") <= date(2024, 12, 31)) &
    (pl.col("new") == "New Build") &
    (pl.col("property_type") != "Other")
)

# Count new build transactions per district
top_build_districts = (
    new_builds_10yr.group_by("district")
    .agg(pl.len().alias("new_build_sales"))
    .sort("new_build_sales", descending=True)
)

top_build_districts.head(10)

top_build_districts.write_csv("../data/q3_answer.csv")


### Q4: What was the most expensive non-residential property sold in Greater Manchester in 2017? Find and share context about this sale.

For this question, I focused on transactions from **2017**, specifically looking at non-residential properties categorized as "Other" in the Greater Manchester area. I sorted the prices in descending order and identified "The Printworks" as the top property. After conducting some research, I discovered that it was sold to DTZ investors in 2017 for £108 billion.

In [40]:
from datetime import date

# Filter for non-residential properties sold in Greater Manchester in 2017
non_res_2017 = clean_df.filter(
    (pl.col("year") == 2017) &
    (pl.col("county").str.to_lowercase() == "greater manchester") &
    (pl.col("property_type") == "Other (Non Residential)")
)

# Find the most expensive sale
top_non_res = (
    non_res_2017.sort("price", descending=True)
    .head(1)
)

q4_answer = top_non_res.select([
    "price", "date", "postcode", "street", "paon", "town_city", "district"
])

q4_answer.write_csv("../data/q4_answer.csv")


### Q5: Which Residential Building Types Are Experiencing the Most Significant Proportional Increase in Prices?

This question is looking at trend analysis. It's not focused on which property type is most expensive but instead which types have grown most in value, it's about relative change, not price. 

To address this question, I Excluded `"Other"` property types to concentrate solely on residential buildings, grouped the data by `property_type` and `year` to compute the **average price per type for each year**, pivoted the data for a clearer comparison of prices across different years, measured the **proportional change** in average price between the earliest and most recent years, and sorted the outcomes to determine which building types experienced the largest relative growth.

This analysis highlights which residential categories, such as `"Flat"`, `"Detached"`, and `"Terraced"` have appreciated the most over time.

In [46]:

# Filter out non-residential properties
res_df = clean_df.filter(pl.col("property_type") != "Other (Non residential)")

# Group by property type and year, calculate average price
avg_price_by_type_year = (
    res_df.group_by(["property_type", "year"])
    .agg(pl.col("price").mean().alias("avg_price"))
)

# Pivot to wide format
pivoted = avg_price_by_type_year.pivot(
    values="avg_price",
    index="property_type",
    columns="year"
)


# Extract year columns (they may be strings!)
year_cols = [col for col in pivoted.columns if str(col).isdigit()]
year_cols = sorted([int(col) for col in year_cols])

# Calculate proportional change
start_year, end_year = year_cols[0], year_cols[-1]

price_change = pivoted.with_columns(
    (
        (pl.col(str(end_year)) - pl.col(str(start_year))) / pl.col(str(start_year))
    ).alias("proportional_change")
)

# Sort and display
price_change.sort("proportional_change", descending=True).select(["property_type", "proportional_change"])

price_change.write_csv("../data/q5_answer.csv")
price_change



property_type,2021,2013,2011,2018,2014,1997,2019,2007,2000,2020,2022,2001,2003,2002,2008,2016,2009,2006,1999,2004,2017,2005,2012,2023,2015,2024,2010,1996,1998,proportional_change
str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""Terraced""",171554.792736,110705.145826,106783.707825,138342.397906,113051.850552,35322.560875,141784.167079,116089.159481,41173.366392,152571.167637,185396.111002,44692.734081,190780000.0,51891.311015,114855.329225,530070000.0,108498.559644,107070.110059,38407.736305,81780.275771,131200.037207,94607.465121,109042.616306,186512.844729,120269.148806,193688.821384,108701.984397,33421.896121,37171.572781,4.795267
"""Detached""",402523.636554,264351.721037,276285.755531,343944.032287,274200.884622,102924.000405,342840.281123,288504.119433,125853.889858,359218.095506,433191.432621,141006.950907,198760.303002,166777.329302,289958.505648,307581.463318,271449.693489,269262.843617,116077.458123,234221.306007,325859.812353,260358.531337,265948.81194,441489.06364,285218.5542,439431.507797,290195.379225,96281.872014,106811.736397,3.564011
"""Semi Detached""",242468.335143,156773.672456,153196.438533,199330.45444,163583.96045,55135.859989,2561600000.0,163353.104231,67073.089483,220359.551639,264097.122314,74228.008179,105859.13613,87592.904123,157661.128806,180221.510394,152143.137035,153262.083294,61256.458684,129444.712446,189292.459269,144142.398415,155251.947723,264596.988643,171017.350698,1644200000.0,158038.284683,51892.181523,57748.912675,31683.856447
"""Other (Non Res…",981599.580965,1307600.0,657326.818182,850035.501104,1536100.0,470.0,683704.927107,248066.666667,,819943.07318,900376.620977,204.0,900000.0,180.0,85262.666667,872681.082618,200256.25,1050.0,110.0,98640.0,774288.548128,43957.333333,523022.0,866643.585928,1255700.0,804821.805794,72500.0,200.0,,4023.109029
"""Flat""",206560.531562,119849.53731,121326.238147,157237.262454,171327.945373,49636.85881,164151.132139,148473.174611,74443.135777,196593.796317,211673.183528,83872.392633,113899.974438,103527.916642,135021.862961,147949.885926,121194.813742,144081.580659,62582.307579,132231.399693,158068.65131,137921.983974,122371.411944,199456.59963,132979.451617,203965.842264,124278.871129,45915.696867,54477.571429,3.442181


### Q6: Since 1996, what property sale(s) in Trafford was the furthest south‑east?

In order to answer this question I needed to work out how to find south east using lat and long. To begin, I filtered the dataset for properties sold in Trafford since 1996. I used lat and long columns to Identify the furthest south east point, south = lowest latitude, east = highest longitude. Finally I returned the top result to inspect. 



In [42]:
# Filter for Trafford sales since 1996
trafford_sales = clean_df.filter(
    (pl.col("year") >= 1996) &
    (pl.col("district").str.to_lowercase() == "trafford")
)

# Sort by latitude ascending (south) and longitude descending (east)
furthest_se = (
    trafford_sales.sort(
        by=["lat", "long"],
        descending=[False, True]  # south = low lat, east = high long
    )
    .head(1)
)

# Select key columns for context
furthest_se

q6_clean = furthest_se.select([
    "price", "date", "postcode", "street", "paon", "town_city", "district", "lat", "long"
])

q6_clean.write_csv("../data/q6_answer.csv")