# Assessing the Impact of Building Attributes and Energy Efficiency Ratings on Housing Price Fluctuations in Reading

## Introduction
The housing market plays a significant role in every nation's economy (Amin and Al-Din, 2018), and housing itself is a basic necessity for human existence under Maslow's hierarchy of needs. Changes in housing prices can have a major impact on individuals, communities, and the broader economy. Policymakers, real estate professionals, and homeowners therefore need to understand the factors that drive these changes in different geographical regions. In Reading, a thriving town in Berkshire, United Kingdom, house prices fluctuate in response to several factors, such as location, physical property features, and neighbourhood amenities. According to Zancanella et al. (2018), the physical characteristics of buildings and their energy efficiency ratings are important determinants of house prices. While several determinants of house price variation have been extensively studied, empirical examination of the impact of key building attributes such as legal interest and energy efficiency rating remains scarce in this study area. This research aims to bridge this gap by investigating how major building attributes and energy efficiency ratings influence housing price fluctuations in Reading, providing valuable insights into the dynamics of the local housing market.


## Literature Review
House prices are influenced by myriad factors, including micro- and macroeconomic conditions, demographic trends, and the supply and demand dynamics of the market segment (Abate and Anselin, 2016). Recent studies document that building attributes and energy efficiency ratings play a crucial role in determining housing prices (Zuo and Zhao, 2014; Zhang et al., 2017; Zancanella et al., 2018). Green building certification, which encompasses features such as sustainable construction materials and energy-efficient designs, has gained attention for its positive impact on house prices (Huang, 2023). Zhang et al. (2023) emphasised that energy-efficient homes tend to command higher prices in the housing market because of their lower utility costs and environmental benefits. However, the specific impact of green building ratings on housing price fluctuations in Reading requires further exploration, as the dynamics of the local market may differ from broader market trends.

Although most studies have focused on traditional factors such as property location, features, and amenities in determining housing prices, the impact of legal interest structures, such as leasehold and freehold arrangements, remains relatively understudied. Legal interest influences property values by shaping ownership rights, maintenance responsibilities, and future development opportunities (Caesar et al., 2019). While several studies have explored broader determinants of housing price disparities, few have examined the unique features of local housing markets and their relationship with building attributes and energy efficiency ratings. We therefore aim to address this gap by analysing the housing market in Reading, examining how essential building attributes like freehold/leasehold interests and energy efficiency ratings affect fluctuations in housing prices. By examining these factors within the context of Reading's housing market, this research aims to contribute valuable insights for policymakers, professionals, and homeowners looking to understand and navigate the intricacies of the local housing market.


## Research Question 
What is the impact of building attributes, such as freehold/leasehold interests, and energy efficiency ratings on housing price fluctuations in Reading?


## Presentation of data 
The dataset used for this study was accessed from the UCL database via the London Datastore. It was created and is maintained by Bin Chi, Adam Dennett, Thomas Oleron-Evans, and Robin Morphet (all from UCL) for non-commercial purposes.
The dataset (hpm la 2023.zip) is available **[here](https://data.london.gov.uk/dataset/house-price-per-square-metre-in-england-and-wales)**. The house price per square metre dataset was generated through complex address-based matching procedures, aligning information from the Land Registry's Price Paid Data (LR-PPD) with property size details sourced from the Domestic Energy Performance Certificates (EPC) data, which is publicly available through the Department for Levelling Up, Housing and Communities (DLUHC, previously known as MHCLG).
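
As a rough illustration of what a price-per-square-metre figure represents, the sketch below divides a sale price by a total floor area. It assumes the measure is simply price divided by floor area; the column names `price`, `tfarea`, and `priceper` follow the dataset, while the rows themselves are illustrative values, not records drawn from the source file.

```python
import numpy as np
import pandas as pd

# Illustrative rows only (assumed values, not taken from the dataset)
sample = pd.DataFrame({
    "price": [233000.0, 184950.0, 150000.0],
    "tfarea": [81.0, 81.0, 0.0],  # the last row has a zero floor area
})

# Assumed definition: price per square metre = sale price / total floor area
sample["priceper"] = sample["price"] / sample["tfarea"]

# Dividing by a zero floor area yields inf, so treat that result as missing
sample["priceper"] = sample["priceper"].replace([np.inf, -np.inf], np.nan)

print(sample.round(2))
```

Under this assumption, a record with a floor area of zero has no meaningful price per square metre, which is why infinite values are best converted to missing values before summarising.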



## Import Libraries

In [1]:
#This tells python to draw the graphs "inline" - in the notebook
%matplotlib inline  
import matplotlib.pyplot as plt
import statsmodels.api as sm
from math import sqrt
from numpy.random import seed
from numpy.random import randn
from numpy import mean
from scipy.stats import sem
import statistics 
import seaborn as sns
from IPython.display import display, Math, Latex, display_latex
import plotly.express as px
import pylab
import pandas as pd
import numpy as np
# make the plots (graphs) a little wider by default
pylab.rcParams['figure.figsize'] = (10., 8.)
sns.set(font_scale=1.5)
sns.set_style("white")


## Loading Data

In [2]:

# Read the CSV file from the GitHub URL and report any parsing errors
try:
    df = pd.read_csv('https://github.com/VincentBEDU/DSS/raw/main/data/Reading_link_02122023.csv')
    print(df)
except pd.errors.ParserError as e:
    print("Error parsing CSV:", e)


          priceper  year dateoftransfer propertytype duration     price  \
0      2876.543210  2015     2015-03-20            S        F  233000.0   
1      2283.333333  2006     2006-08-23            S        F  184950.0   
2      1728.333333  2003     2003-04-29            S        F  139995.0   
3      1349.380015  1997     1997-06-06            D        F  185000.0   
4      4609.589041  2016     2016-04-22            T        F  336500.0   
...            ...   ...            ...          ...      ...       ...   
69651  6583.333333  2017     2017-07-26            F        L  237000.0   
69652  1077.922078  1998     1998-09-25            S        F   83000.0   
69653  1157.754813  1998     1998-06-19            F        L   81000.0   
69654   814.606742  1998     1998-01-28            T        F   72500.0   
69655  2633.333333  2006     2006-07-24            S        F  395000.0   

       postcode    lad21cd                           transactionid       id  \
0       RG2 8PP  E06

The information above shows that the dataset contains **69656** rows and **16** columns. It is necessary to look at the column names and what they represent. 
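
The row and column counts can also be checked directly instead of being read off the printed frame. The snippet below is a minimal sketch on a small stand-in frame named `demo` (a hypothetical name with illustrative values); the same two calls would apply to `df`.

```python
import numpy as np
import pandas as pd

# Small stand-in for the full dataset (illustrative values only)
demo = pd.DataFrame({
    "price": [233000.0, 184950.0, 139995.0],
    "numberrooms": [5.0, np.nan, 4.0],
    "duration": ["F", "F", "L"],
})

n_rows, n_cols = demo.shape   # shape is a (rows, columns) tuple
missing = demo.isna().sum()   # number of missing values in each column

print(f"{n_rows} rows, {n_cols} columns")
print(missing)
```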

## Printing column names

In [8]:
# prints column names
df.columns

Index(['priceper', 'year', 'dateoftransfer', 'propertytype', 'duration',
       'price', 'postcode', 'lad21cd', 'transactionid', 'id', 'tfarea',
       'numberrooms', 'classt', 'CURRENT_ENERGY_EFFICIENCY',
       'POTENTIAL_ENERGY_EFFICIENCY', 'CONSTRUCTION_AGE_BAND'],
      dtype='object')

## Column Names Interpretation 

| Column Name                 | Interpretation                            |
|-----------------------------|-------------------------------------------|
| priceper                    | Price per square metre                    |
| year                        | Year of transaction                       |
| dateoftransfer              | Transfer date                             |
| propertytype                | Property type                             |
| duration                    | Property tenure                           |
| price                       | Price of property                         |
| postcode                    | Property postcode                         |
| lad21cd                     | 2021 Local authority code                 |
| transactionid               | Transaction identifier                    |
| id                          | Domestic EPCs identifier                  |
| tfarea                      | Total floor area                          |
| numberrooms                 | Number of rooms                           |
| classt                      | Class matching type                       |
| CURRENT_ENERGY_EFFICIENCY   | Current energy efficiency rating          |
| POTENTIAL_ENERGY_EFFICIENCY | Potential energy efficiency rating        |
| CONSTRUCTION_AGE_BAND       | Age band when the building part was built |


For property types, D = Detached, S = Semi-Detached, T = Terraced, F = Flats.

For property tenure, F = Freehold and L = Leasehold.
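
The single-letter codes can be turned into readable labels before plotting or tabulating. The following is a minimal sketch using the codes listed above; the dictionary names, the `demo` frame, and the two new label columns are hypothetical and introduced here only for illustration.

```python
import pandas as pd

# Code-to-label lookups taken from the definitions above
PROPERTY_TYPES = {"D": "Detached", "S": "Semi-Detached", "T": "Terraced", "F": "Flats"}
TENURES = {"F": "Freehold", "L": "Leasehold"}

# Illustrative rows only; the real frame would be df
demo = pd.DataFrame({
    "propertytype": ["S", "D", "T", "F"],
    "duration": ["F", "F", "F", "L"],
})

# Series.map looks each code up in the dictionary; unknown codes become NaN
demo["propertytype_label"] = demo["propertytype"].map(PROPERTY_TYPES)
demo["tenure_label"] = demo["duration"].map(TENURES)

print(demo)
```

Note that the letter F means Flats in `propertytype` but Freehold in `duration`, so the two columns need separate lookups.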

## Data Description 

In [15]:

# Replace infinite values in the priceper column with NaN so they are excluded from the summary
df['priceper'] = df['priceper'].replace([np.inf, -np.inf], np.nan)

summary = df.describe().round(2)  # generate summary statistics, rounded to 2 decimal places
summary = summary.T  # .T transposes the table (rows become columns and vice versa)
summary

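| Variable                    | count   | mean       | std       | min       | 25%        | 50%       | 75%       | max        |
|-----------------------------|---------|------------|-----------|-----------|------------|-----------|-----------|------------|
| priceper                    | 69649.0 | 2721.78    | 1368.41   | 8.0       | 1735.31    | 2619.05   | 3564.1    | 34029.04   |
| year                        | 69656.0 | 2008.35    | 8.04      | 1995.0    | 2001.0     | 2007.0    | 2015.0    | 2023.0     |
| price                       | 69656.0 | 217018.39  | 144124.17 | 400.0     | 125000.0   | 185000.0  | 273000.0  | 3500000.0  |
| id                          | 69656.0 | 3162285.27 | 200365.7  | 3059586.0 | 3140804.25 | 3158094.0 | 3175483.0 | 21488281.0 |
| tfarea                      | 69656.0 | 81.66      | 36.4      | 0.0       | 60.01      | 75.0      | 93.0      | 1322.0     |
| numberrooms                 | 64717.0 | 4.28       | 1.7       | 0.0       | 3.0        | 4.0       | 5.0       | 83.0       |
| classt                      | 69656.0 | 11.28      | 0.45      | 11.0      | 11.0       | 11.0      | 12.0      | 12.0       |
| CURRENT_ENERGY_EFFICIENCY   | 69656.0 | 63.55      | 12.58     | 1.0       | 57.0       | 64.0      | 72.0      | 109.0      |
| POTENTIAL_ENERGY_EFFICIENCY | 69656.0 | 77.03      | 9.96      | 1.0       | 72.0       | 79.0      | 84.0      | 115.0      |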


The data description above covers a total sample of **69,656** property transactions. The average price per square metre and the average property price in Reading are **£2,722** and **£217,018** respectively. The dependent (endogenous) variable for the analysis is the property sale price (price). Its standard deviation of **£144,124.17** shows how widely property prices are spread across Reading. Moreover, the large gap between the minimum price of **£400.00** and the maximum of **£3,500,000.00** signals the presence of outliers in the dataset. Because outliers distort the mean, they will be examined and dealt with during the analysis.
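
One common way to examine outliers is the interquartile range (IQR) rule, which flags values lying more than 1.5 IQRs below the first quartile or above the third. The sketch below illustrates that rule only; the text does not state which method the analysis uses, so the rule itself, the helper name `iqr_bounds`, the multiplier `k`, and the sample prices are all assumptions.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the IQR rule (hypothetical helper)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Illustrative prices spanning the reported minimum and maximum
prices = pd.Series([400.0, 125000.0, 185000.0, 217000.0, 273000.0, 3500000.0])

lower, upper = iqr_bounds(prices)
is_outlier = (prices < lower) | (prices > upper)

print(f"Fences: {lower:,.0f} to {upper:,.0f}")
print(prices[is_outlier])
```

Applied to the quartiles reported in the summary table (£125,000 and £273,000), this rule would give fences of about -£97,000 and £495,000. The £3,500,000 maximum would be flagged, but the £400 minimum would not, so implausibly low prices may need a separate check.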
