## Project: Zillow's market value estimation

**Introduction:**\
A Zestimate is Zillow's estimated market value for a home. It is computed with a proprietary formula from public and user-submitted data: details about the home (bedrooms, bathrooms, home age, etc.), its location, property tax assessment information, and the sales histories of the subject home and of other homes recently sold in the area.

**Objective:**\
In this competition, Zillow is asking you to predict the log error between their Zestimate and the actual sale price, given all the features of a home. The log error is defined as\
$\text{logerror} = \log(\text{Zestimate}) - \log(\text{SalePrice})$\
and it is recorded in the transactions file `train.csv`. The goal is to predict the logerror for the months in Fall 2017.
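
As a quick illustration of the definition above, the following sketch computes the log error for two made-up homes. The function name `log_error` and the prices are hypothetical, and the natural logarithm is assumed. A positive value means the Zestimate was above the sale price, a negative value means it was below.

```python
import math

def log_error(zestimate: float, sale_price: float) -> float:
    """Log error as defined above (natural logarithm assumed)."""
    return math.log(zestimate) - math.log(sale_price)

# Hypothetical prices, for illustration only
overestimate = log_error(330_000, 300_000)   # Zestimate 10% above the sale price
underestimate = log_error(270_000, 300_000)  # Zestimate 10% below the sale price

print(f"overestimate:  {overestimate:+.4f}")
print(f"underestimate: {underestimate:+.4f}")
```

Because the error is measured on a log scale, it depends only on the ratio Zestimate / SalePrice, not on the absolute price level.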

### Import of Python libraries

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import tensorflow as tf

### Import of the data

In [9]:
# Locate the processed data, one level above the working directory.
# os.path.join keeps the path portable across operating systems.
path = os.getcwd()
parent_path = os.path.abspath(os.path.join(path, os.pardir))
data_path = os.path.join(parent_path, 'data', 'processed_data')

# Load the train/test features and targets; the first CSV column holds the row index
X_train = pd.read_csv(os.path.join(data_path, 'X_train.csv'), sep=",", index_col=0)
X_test = pd.read_csv(os.path.join(data_path, 'X_test.csv'), sep=",", index_col=0)
y_train = pd.read_csv(os.path.join(data_path, 'y_train.csv'), sep=",", index_col=0)
y_test = pd.read_csv(os.path.join(data_path, 'y_test.csv'), sep=",", index_col=0)

In [8]:
X_train.head()

| | yearbuilt | regionidcity | regionidcounty | regionidzip | roomcnt | bathroomcnt | bedroomcnt | calculatedfinishedsquarefeet | lotsizesquarefeet | assessmentyear |
|---|---|---|---|---|---|---|---|---|---|---|
| 29578 | 2009.0 | 50677.0 | 3101.0 | 96531.0 | 0.0 | 2.0 | 3.0 | 1360.0 | 7000.0 | 2016.0 |
| 978580 | 1955.0 | 39306.0 | 3101.0 | 96488.0 | 0.0 | 2.0 | 3.0 | 1388.0 | 5953.0 | 2016.0 |
| 1940748 | 1941.0 | 12447.0 | 3101.0 | 96412.0 | 0.0 | 3.0 | 3.0 | 2240.0 | 6503.0 | 2016.0 |
| 1994608 | 1955.0 | 16764.0 | 1286.0 | 97023.0 | 7.0 | 2.0 | 3.0 | 2320.0 | 7200.0 | 2016.0 |
| 2144800 | 1913.0 | 47568.0 | 1286.0 | 97001.0 | 6.0 | 1.0 | 3.0 | 1292.0 | 5750.0 | 2016.0 |
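
The columns above sit on very different scales (for example `bathroomcnt` around 2 versus `regionidzip` around 96,000), which is why the next cell standardizes them before training. The sketch below shows, on hypothetical toy arrays, roughly what `StandardScaler` does: the mean and standard deviation are estimated on the training set only and then reused for the test set. The array names and values are illustrative assumptions, not part of the project data.

```python
import numpy as np

# Hypothetical toy data: 3 training rows, 1 test row, 2 features
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_toy = np.array([[4.0, 20.0]])

# Statistics are estimated on the training set only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

X_train_toy_scaled = (X_train_toy - mean) / std
X_test_toy_scaled = (X_test_toy - mean) / std  # reuse the training statistics

print(X_train_toy_scaled.mean(axis=0), X_train_toy_scaled.std(axis=0))
print(X_test_toy_scaled)
```

Reusing the training statistics for the test set avoids leaking information from the test data into preprocessing, so the scaled test values need not have zero mean or unit variance.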


In [17]:
# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build the neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)  # Output layer
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train_scaled, y_train, epochs=10, batch_size=1024, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2278cd447f0>

In [18]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (None, 64)                704       
                                                                 
 dense_9 (Dense)             (None, 32)                2080      
                                                                 
 dense_10 (Dense)            (None, 16)                528       
                                                                 
 dense_11 (Dense)            (None, 1)                 17        
                                                                 
Total params: 3,329
Trainable params: 3,329
Non-trainable params: 0
_________________________________________________________________
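
The parameter counts in the summary can be checked by hand: a `Dense` layer with bias has one weight per input-unit pair plus one bias per unit. The sketch below reproduces the numbers, assuming the 10 feature columns shown in `X_train.head()`. The helper `dense_params` is a hypothetical name introduced for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Parameters of a fully connected layer with bias: weights plus biases."""
    return n_inputs * n_units + n_units

# 10 input features, followed by the units of each Dense layer in the model
layer_sizes = [10, 64, 32, 16, 1]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [704, 2080, 528, 17]
print(total)   # 3329
```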


In [19]:
# Make predictions
y_pred = model.predict(X_test_scaled)



In [20]:
from sklearn.metrics import r2_score
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error (RMSE):", rmse)
print("R2 score:", r2_score(y_test, y_pred))

Root Mean Squared Error (RMSE): 774979.1418979057
R2 score: 0.1963013390664624
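
For reference, both metrics can be written out directly from their definitions. The sketch below uses hypothetical toy values, and the helper names `rmse_from_definition` and `r2_from_definition` are illustrative; on matching inputs they should agree with `mean_squared_error` and `r2_score` from scikit-learn.

```python
import numpy as np

def rmse_from_definition(y_true, y_hat) -> float:
    """Square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

def r2_from_definition(y_true, y_hat) -> float:
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical toy values, for illustration only
y_true_toy = [1.0, 2.0, 3.0, 4.0]
y_hat_toy = [1.5, 2.0, 2.5, 4.0]

toy_rmse = rmse_from_definition(y_true_toy, y_hat_toy)
toy_r2 = r2_from_definition(y_true_toy, y_hat_toy)
print(toy_rmse, toy_r2)
```

RMSE is expressed in the units of the target, so its magnitude depends on the scale of `y_test`. R2 is unitless: a value of 0 corresponds to always predicting the mean of the target, and 1 to a perfect fit.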
