In [1]:
import os
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import keras
import sklearn.decomposition
import sklearn.model_selection
import sklearn.preprocessing
import sklearn.linear_model
import sklearn.metrics


- [1. Introduction](#1-introduction)



# 1. Introduction

Predicting the price at which a house will sell helps current and prospective homeowners navigate the cluttered real estate market.

# 2. Dataset

The provided dataset contains many parameters on which the price of a house can depend. Two examples, with their usual effect on the price based on general knowledge and experience:

- LotArea: the area of the lot on which the house stands. In general, the larger the lot, the higher the price.
- Bedroom: the number of bedrooms above ground. In general, the more bedrooms, the higher the price.

The dataset contains many more parameters. They are described in detail in the data_description.txt file.
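
The "higher value, higher price" expectations above can be checked with a simple Pearson correlation against the sale price. The snippet below is only a sketch on a toy frame: the `SalePrice` and bedroom numbers are made up for illustration, the helper `price_correlations` is hypothetical, and we assume the bedroom column is named `BedroomAbvGr` in the CSV header.

```python
import pandas as pd


def price_correlations(df, target="SalePrice", features=("LotArea", "BedroomAbvGr")):
    """Pearson correlation of each feature with the target, highest first."""
    corr = {name: df[name].corr(df[target]) for name in features}
    return pd.Series(corr).sort_values(ascending=False)


# Toy data: the SalePrice and BedroomAbvGr values are invented for this example
# and are NOT taken from train.csv.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "BedroomAbvGr": [3, 3, 3, 3, 4],
    "SalePrice": [200000, 180000, 220000, 140000, 250000],
})

correlations = price_correlations(toy)
print(correlations)
```

On the real data the same call would be `price_correlations(train_csv)` once the CSV has been loaded. A positive value supports the expectation, but correlation only captures linear relationships.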

# 3. Analysing the dataset

To analyse the data, we first open the CSV files that contain it. The output below shows that some columns hold numeric data and others do not, and that some values are missing. Before any model can be trained on the dataset, it has to be converted so that it contains only numeric values. The data_description.txt file lists the possible values of each non-numeric column.

In [11]:
# accessing the csv files

train_csv = pd.read_csv("./house-prices-advanced-regression-techniques/train.csv")
test_csv = pd.read_csv("./house-prices-advanced-regression-techniques/test.csv")

data_fields = train_csv.columns.values

print(train_csv.head())
print(data_fields)

   Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0   1          60       RL         65.0     8450   Pave   NaN      Reg   
1   2          20       RL         80.0     9600   Pave   NaN      Reg   
2   3          60       RL         68.0    11250   Pave   NaN      IR1   
3   4          70       RL         60.0     9550   Pave   NaN      IR1   
4   5          60       RL         84.0    14260   Pave   NaN      IR1   

  LandContour Utilities  ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold  \
0         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      2   
1         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      5   
2         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      9   
3         Lvl    AllPub  ...        0    NaN   NaN         NaN       0      2   
4         Lvl    AllPub  ...        0    NaN   NaN         NaN       0     12   

  YrSold  SaleType  SaleCondition  SalePrice  
0   2008        WD   
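
Because the printed head is truncated, it is hard to see which columns are numeric and where values are missing. A per-column summary makes this explicit. This is a minimal sketch: `summarize_columns` is a hypothetical helper, and the toy frame only imitates a few columns of the real data.

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, a numeric flag and the missing-value count for every column."""
    is_numeric = pd.Series(
        [pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes],
        index=df.columns,
    )
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "is_numeric": is_numeric,
        "n_missing": df.isna().sum(),
    })


# Toy frame with illustrative rows (not the full dataset)
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, np.nan, 68.0],
    "Alley": [np.nan, np.nan, "Grvl"],
})

summary = summarize_columns(toy)
print(summary)
```

Calling `summarize_columns(train_csv)` on the loaded data would give the same overview for all columns.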

In [30]:
# formatting the data

# replace missing values (NaN) with zero;
# note that this also puts 0 into non-numeric columns such as Alley
train_csv = train_csv.mask(pd.isna(train_csv), 0)
test_csv = test_csv.mask(pd.isna(test_csv), 0)

print(train_csv)
print(test_csv)

        Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0        1          60       RL         65.0     8450   Pave     0      Reg   
1        2          20       RL         80.0     9600   Pave     0      Reg   
2        3          60       RL         68.0    11250   Pave     0      IR1   
3        4          70       RL         60.0     9550   Pave     0      IR1   
4        5          60       RL         84.0    14260   Pave     0      IR1   
...    ...         ...      ...          ...      ...    ...   ...      ...   
1455  1456          60       RL         62.0     7917   Pave     0      Reg   
1456  1457          20       RL         85.0    13175   Pave     0      Reg   
1457  1458          70       RL         66.0     9042   Pave     0      Reg   
1458  1459          20       RL         68.0     9717   Pave     0      Reg   
1459  1460          20       RL         75.0     9937   Pave     0      Reg   

     LandContour Utilities  ... PoolArea PoolQC  Fe
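
Filling the NaN values still leaves text columns such as MSZoning and Street in the data, so it is not yet solely numeric. One common option is one-hot encoding with `pd.get_dummies`. The sketch below is not necessarily the approach this notebook ends up using. It assumes the NaN values were already replaced by 0 as above, `one_hot_encode` is a hypothetical helper, and the toy rows are illustrative only.

```python
import pandas as pd


def one_hot_encode(df: pd.DataFrame) -> pd.DataFrame:
    """Replace every non-numeric column by 0/1 indicator columns."""
    non_numeric = list(df.select_dtypes(exclude="number").columns)
    # Cast to str so that the filled-in 0 and the text labels share one type.
    df = df.astype({name: str for name in non_numeric})
    return pd.get_dummies(df, columns=non_numeric, dtype=int)


# Toy frames (illustrative rows; missing values already replaced by 0)
train_toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Pave", "Grvl"],
    "Alley": [0, 0, "Grvl"],
})
test_toy = pd.DataFrame({
    "LotArea": [7000],
    "Street": ["Pave"],
    "Alley": [0],
})

train_encoded = one_hot_encode(train_toy)
# The test set may lack some categories, so align it to the training columns.
test_encoded = one_hot_encode(test_toy).reindex(
    columns=train_encoded.columns, fill_value=0
)

print(train_encoded)
print(test_encoded)
```

Aligning the test columns to the training columns matters because a model trained on the encoded training data expects exactly the same features at prediction time.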

# N. References

- Predicting House Prices (Keras - ANN), Tomas Mantero, https://www.kaggle.com/code/tomasmantero/predicting-house-prices-keras-ann/notebook