# Housing Price Prediction Using Machine Learning Models

## Content 
- [Introduction](#intro)
- [Importance](#importance)
- [Data source and methods](#sourse)
- [Timeline](#timeline)
- [References](#refs)

# Introduction

Ireland is currently experiencing a severe housing crisis, which is one of the main problems facing its economy. Housing prices are a big part of a country's economy and affect many other sectors. The issue has deep roots: as Nowlan (2015) and Scuffil (2022) mention, the crisis goes back to the 1900s.

Lyons (2018) points out that the property market is a regular topic of national attention in Ireland. For the majority of homeowners, their house is their most valuable asset.

Ireland suffered one of the most severe housing busts of the global financial crisis (Norris and Byrne, 2017); however, in recent years it has been represented as having recovered economically (Nowicki et al., 2019). According to Byrne (2020), Irish house prices have increased by 50% since 2013, while rents have grown by over 60%.

House prices are affected by many different factors, such as the location of the house, its features and its condition (Phan, 2019).

As Bourassa, Cantoni and Hoesli (2011) note, other parameters may also affect housing prices, such as the materials used for building, number of bedrooms, living area, location, upcoming projects and proximity.

The predicted sale price of housing may be considered an essential economic metric. Because the value of a house grows over time, an up-to-date estimate has to be calculated whenever the value is needed for a sale, a purchase or even a mortgage (Shinde and Gawande, 2018).

## Literature review
Before getting started, we should investigate recent research, methods and results. This will help us to understand which methods we can use and which results we should expect.

1. Aswin (2017) applied six machine learning algorithms to predict house prices in a dataset with 2000 records and 10 features: Random Forest, Neural Networks, Gradient Boosting, Bagging, Support Vector Machine and Multiple Regression. Random Forest performed best, with an R-squared value of 90%.
2. Hujia Yu and Jiafu Wu (2016) also worked on a price prediction model. They created regression and classification models that estimate the price of a house given its features. They concluded that the best classification model is a Support Vector Classifier with a linear kernel, which showed an accuracy of 0.6740, rising to 0.6913 after PCA was performed on the dataset. For the regression problem, the best model is Support Vector Regression with a Gaussian kernel, with an RMSE of 0.5271.
3. Ng (2015) explored the use of machine learning methods for London house price prediction. The approach creates local models by comparing various regression methods. The Gaussian method was found to be the most efficient because of its probabilistic approach to learning and model selection.
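
The studies above report their results with three different metrics (R-squared, accuracy and RMSE), so the figures are not directly comparable with each other. As a minimal sketch of what each metric measures, the functions below compute them in plain Python. The function names and the sample values are hypothetical and are not taken from any of the cited papers.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(actual_labels, predicted_labels):
    """Share of correctly predicted classes (classification only)."""
    hits = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return hits / len(actual_labels)


# Hypothetical prices (thousand euro) and predictions, for illustration only.
actual = [200.0, 250.0, 300.0, 350.0]
predicted = [210.0, 240.0, 310.0, 340.0]

print("R-squared:", round(r_squared(actual, predicted), 3))
print("RMSE:", rmse(actual, predicted))
print("Accuracy:", accuracy(["low", "high", "high"], ["low", "high", "low"]))
```

R-squared and RMSE apply to regression models, while accuracy applies only when prices are grouped into classes.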


## Research question

Each dataset behaves differently. A machine learning model may achieve high accuracy on one dataset but perform poorly on another, even when the two datasets are similar. Social and economic data is very dynamic, in contrast to physical or chemical data, where an accepted theory is revised only in exceptional circumstances.
This implies that analyses of economic or social data should preferably be reviewed as soon as new data becomes available.

This study is going to focus on applying different machine learning algorithms to Irish housing price datasets, in order to determine which model gives the best accuracy.
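
As a rough sketch of how such a comparison might be carried out, the example below scores two candidate models on the same data with k-fold cross-validation and picks the one with the lower error. Everything in it is an assumption made for illustration: the data is synthetic, the two models (a mean baseline and a one-feature least-squares line) are simple stand-ins for the algorithms the study will actually use, and RMSE is used as the measure of accuracy.

```python
import math
import random


def fit_mean(xs, ys):
    """Baseline: always predict the mean training price."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """One-feature least-squares line: y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_val_rmse(fit, xs, ys, k=5):
    """Average RMSE over k folds; each fold is held out once for testing."""
    fold_scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_scores.append(math.sqrt(sum(errors) / len(errors)))
    return sum(fold_scores) / k


# Synthetic data, for illustration only: price grows with area plus noise.
rng = random.Random(0)
areas = [rng.uniform(40, 200) for _ in range(100)]
prices = [50 + 3 * a + rng.gauss(0, 20) for a in areas]

scores = {
    "mean baseline": cross_val_rmse(fit_mean, areas, prices),
    "linear regression": cross_val_rmse(fit_linear, areas, prices),
}
best = min(scores, key=scores.get)  # lower RMSE is better
print(scores, "->", best)
```

Because every model is scored on the same held-out folds, the comparison can be repeated whenever a new dataset arrives.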




## Data source and methods

### Data Source

### EDA
Exploratory Data Analysis (EDA) will help us to understand the structure of the dataset, including its size, shape, properties and types of variables, and to identify patterns and relationships between variables. Additionally, EDA allows us to select appropriate techniques and models for our further analysis.
To reach those objectives we are going to use the following approaches:
#### 1. Summary statistics
Mean, median, mode, standard deviation.
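
A minimal sketch of these statistics, using only Python's standard `statistics` module. The price list is hypothetical and stands in for a column of the real dataset.

```python
import statistics

# Hypothetical sale prices in euro, for illustration only.
prices = [250_000, 310_000, 310_000, 420_000, 1_200_000]

summary = {
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),
    "mode": statistics.mode(prices),
    "std": statistics.stdev(prices),  # sample standard deviation (n - 1)
}
for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

In this sample the mean lies well above the median because of a single expensive property, which is the kind of skew these statistics are meant to reveal.
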
#### 2. Data visualization
Histograms, box plots, scatter plots.
#### 3. Correlation analysis
Pearson correlation coefficient.
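
The coefficient can be sketched directly from its definition: the covariance of two variables divided by the product of their spreads. The helper name `pearson` and the sample values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical living areas (square metres) and prices (thousand euro).
living_area = [50, 70, 90, 110, 130]
price = [180, 240, 310, 360, 450]

r = pearson(living_area, price)
print(round(r, 3))
```

The result lies between -1 and 1. Values close to 1 indicate a strong positive linear relationship, as in this sample.
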
#### 4. Outlier detection
Interquartile range (IQR).
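
A minimal sketch of IQR-based outlier detection. It assumes the conventional multiplier of 1.5, which the text above does not specify, and uses hypothetical prices. Quartile conventions differ slightly between libraries, so the exact bounds may vary with the tool used.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sale prices in euro, for illustration only.
prices = [250_000, 280_000, 310_000, 330_000, 350_000, 420_000, 1_200_000]
print(iqr_outliers(prices))  # [1200000]
```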

### Data preparation
Data preparation is a very important stage of data analysis; it includes several steps to ensure that the data is in an appropriate format for analysis and modeling. The stages of data preparation may vary depending on the project. In this project we are going to perform the following stages:

1. Data cleaning:
Removing duplicate records and handling missing values using techniques such as mean, median or mode imputation, or the K-nearest neighbors method. Also fixing formatting issues.
2. Data transformation:
Converting categorical variables into numerical representations.
3. Handling outliers:
Detecting and dealing with outliers that may affect our analysis.
4. Feature engineering:
Creating new features using existing ones to capture additional information.
5. Feature selection:
Identifying and selecting only the most relevant features for the analysis.
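
Some of the stages above can be sketched in plain Python as follows. The records, column names and helper functions are all hypothetical, and median imputation is only one of the options listed. K-nearest neighbors imputation, outlier handling and feature selection are not shown.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
rows = [
    {"county": "Dublin", "bedrooms": 3, "area_m2": 95.0, "price": 420_000},
    {"county": "Cork", "bedrooms": 2, "area_m2": None, "price": 260_000},
    {"county": "Dublin", "bedrooms": 3, "area_m2": 95.0, "price": 420_000},
    {"county": "Galway", "bedrooms": 4, "area_m2": 130.0, "price": 390_000},
]


def drop_duplicates(records):
    """Data cleaning: keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def impute_median(records, column):
    """Data cleaning: fill missing values with the median of observed ones."""
    observed = [r[column] for r in records if r[column] is not None]
    median = statistics.median(observed)
    return [
        {**r, column: median if r[column] is None else r[column]}
        for r in records
    ]


def one_hot(records, column):
    """Data transformation: one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            row[f"{column}_{category}"] = int(r[column] == category)
        encoded.append(row)
    return encoded


def add_area_per_bedroom(records):
    """Feature engineering: derive a new feature from existing columns."""
    return [
        {**r, "area_per_bedroom": r["area_m2"] / r["bedrooms"]}
        for r in records
    ]


clean = drop_duplicates(rows)
clean = impute_median(clean, "area_m2")
clean = one_hot(clean, "county")
clean = add_area_per_bedroom(clean)
for row in clean:
    print(row)
```

Each helper returns new records instead of modifying its input, so the stages can be reordered or tested separately.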

### Machine Learning Models

# References 

1. Nowlan, B. (2015). Housing Supply in Ireland: Perennial Problems and Sustainable Solutions. Sunday Business Property Conference Paper, 25.09.15.
2. Scuffil, C. (2022). Dublin’s Housing Crisis in Troubled Times. Dublin City Council Libraries Blog. Accessed on 03.09.22 at https://www.dublincity.ie/library/blog/dublins-housing-crisis-troubled-times
3. Phan, T. D. (2019) ‘Housing price prediction using machine learning algorithms: The case of Melbourne city, Australia’, Proceedings - International Conference on Machine Learning and Data Engineering, iCMLDE 2018. IEEE, pp. 8–13. DOI: 10.1109/iCMLDE.2018.00017.
4. Norris, M. & Byrne, M. (2017) A tale of two busts (and a boom): Irish social housing before and after the global financial crisis, Critical Housing Analysis, 4, pp. 19–28.
5. Nowicki, M., Brickell, K. & Harris, E. (2019) The hotelisation of the housing crisis: Experiences of family homelessness in Dublin hotels, The Geographical Journal, 185, pp. 313–324.
6. Byrne, M. (2020) Generation rent and the financialization of housing: A comparative exploration of the growth of the private rental sector in Ireland, the UK and Spain, Housing Studies, 35, pp. 743–765.
7. Bourassa, S. C., Cantoni, E. and Hoesli, M. (2011) ‘Predicting House Prices with Spatial Dependence: Impacts of Alternative Submarket Definitions’, SSRN Electronic Journal. DOI: 10.2139/ssrn.1090147.
8. Aswin, S. R. (2017) ‘Real Estate Price Prediction Using Machine Learning’. https://norma.ncirl.ie/3096/1/aswinsivamravikumar.pdf
9. Hujia Yu and Jiafu Wu (2016) ‘Real Estate Price Prediction with Regression and Classification’. https://cs229.stanford.edu/proj2016/report/WuYu_HousingPrice_report.pdf
10. Ng, A. (2015). Machine Learning for a London Housing Price Prediction Mobile Application. Imperial College London. http://www.doc.ic.ac.uk/~mpd37/theses/2015_beng_aaron-ng.pdf


In [1]:
import nbformat as nbf

# Load the notebook so that its cells can be inspected.
doc_path = "CA1 Machine learning.ipynb"
with open(doc_path, "r", encoding="utf-8") as doc_file:
    doc_content = nbf.read(doc_file, as_version=4)

# Count the words in every markdown cell, skipping any cell whose
# text contains "References".
total_words = 0
references = ["References"]
for cell in doc_content['cells']:
    if cell['cell_type'] == 'markdown':
        exclude = any(reference in cell['source'] for reference in references)

        if not exclude:
            words = len(cell['source'].split())
            total_words += words
print("Total words:", total_words)

Total words: 1593
