<img src="https://news.illinois.edu/files/6367/543635/116641.jpg" alt="University of Illinois" width="250"/>

## HW: Deep Learning ##

HW submission by group (up to 4 people)
* John Doe <johndoe@illinois.edu>
* Jane Roe <janeroe@illinois.edu>

**Redfin Price Prediction**: Download property data from Redfin <https://www.redfin.com/> for several neighborhoods of Chicago. Use multilayer neural networks to predict the price from the following feature set:
* Square feet
* Property type
* Number of beds
* Number of baths
* Year built
* HOA/Month

In [None]:
import numpy
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F

In [None]:
def getfile(location_pair, **kwargs):
    """Read a CSV from a local path, falling back to a Google Drive sharing link."""
    (loc, gdrive) = location_pair
    try:
        out = pd.read_csv(loc, **kwargs)
    except FileNotFoundError:
        print("local file not found; accessing Google Drive")
        # Build a direct-download URL from the file ID embedded in the sharing link
        loc = 'https://drive.google.com/uc?export=download&id=' + gdrive.split('/')[-2]
        out = pd.read_csv(loc, **kwargs)
    return out

In [None]:
url="https://www.redfin.com"
fname=("redfin_data.csv","https://drive.google.com/file/d/1BFgKwV58YkPX_PRWMuKRHQoT6T0de_Qf/view?usp=sharing")
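The fallback in `getfile` turns a Google Drive sharing link into a direct-download URL by pulling out the file ID, which is the second-to-last segment of the link when it is split on `/`. A minimal sketch of that string handling, using the sharing link from this notebook (the variable names `share_link`, `file_id` and `download_url` are illustrative only):

```python
# Sharing link as given in this notebook
share_link = "https://drive.google.com/file/d/1BFgKwV58YkPX_PRWMuKRHQoT6T0de_Qf/view?usp=sharing"

# Splitting on "/" leaves the file ID as the second-to-last segment
file_id = share_link.split("/")[-2]

# Same URL pattern that getfile passes to pandas.read_csv
download_url = "https://drive.google.com/uc?export=download&id=" + file_id

print(file_id)
print(download_url)
```

This relies on the link ending in `/<file id>/view?...`; a sharing link in another shape would need different parsing.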


In [None]:
data_raw=getfile(fname)
data_raw

local file not found; accessing Google Drive


| | SALE TYPE | SOLD DATE | PROPERTY TYPE | ADDRESS | CITY | STATE OR PROVINCE | ZIP OR POSTAL CODE | PRICE | BEDS | BATHS | ... | STATUS | NEXT OPEN HOUSE START TIME | NEXT OPEN HOUSE END TIME | URL (SEE https://www.redfin.com/buy-a-home/comparative-market-analysis FOR INFO ON PRICING) | SOURCE | MLS# | FAVORITE | INTERESTED | LATITUDE | LONGITUDE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MLS Listing | | Single Family Residential | 333 Dodge Ave | Evanston | IL | 60202 | 339900 | 3.0 | 2.0 | ... | Active | | | https://www.redfin.com/IL/Evanston/333-Dodge-A... | MRED | 11818689 | N | Y | 42.024226 | -87.699124 |
| 1 | MLS Listing | | Condo/Co-op | 2254 Sherman Ave #2 | Evanston | IL | 60201 | 199000 | 2.0 | 1.0 | ... | Active | | | https://www.redfin.com/IL/Evanston/2254-Sherma... | MRED | 11831019 | N | Y | 42.059131 | -87.682292 |
| 2 | MLS Listing | | Single Family Residential | 2701 Noyes St | Evanston | IL | 60201 | 1075000 | 5.0 | 3.5 | ... | Active | August-5-2023 11:00 AM | August-5-2023 01:00 PM | https://www.redfin.com/IL/Evanston/2701-Noyes-... | MRED | 11850222 | N | Y | 42.058239 | -87.710956 |
| 3 | MLS Listing | | Townhouse | 1507 Wilder St | Evanston | IL | 60202 | 825000 | 4.0 | 2.5 | ... | Active | | | https://www.redfin.com/IL/Evanston/1507-Wilder... | MRED | 11849960 | N | Y | 42.040571 | -87.693225 |
| 4 | MLS Listing | | Single Family Residential | 9409 Crawford Ave | Evanston | IL | 60203 | 525000 | 3.0 | 1.5 | ... | Active | August-6-2023 02:00 PM | August-6-2023 04:00 PM | https://www.redfin.com/IL/Evanston/9409-Crawfo... | MRED | 11849554 | N | Y | 42.052003 | -87.727151 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 103 | MLS Listing | | Vacant Land | 1821 Lake St | Evanston | IL | 60201 | 350000 | | | ... | Active | | | https://www.redfin.com/IL/Evanston/1821-Lake-S... | MRED | 11675879 | N | Y | 42.044480 | -87.698591 |
| 104 | MLS Listing | | Condo/Co-op | 1508 Hinman Ave Unit 4B | Evanston | IL | 60201 | 344900 | 2.0 | 2.0 | ... | Active | | | https://www.redfin.com/IL/Evanston/1508-Hinman... | MRED | 11651979 | N | Y | 42.045088 | -87.678878 |
| 105 | MLS Listing | | Single Family Residential | 90 Kedzie St | Evanston | IL | 60202 | 5750000 | 5.0 | 5.0 | ... | Active | | | https://www.redfin.com/IL/Evanston/90-Kedzie-S... | MRED | 11649944 | N | Y | 42.031828 | -87.669248 |
| 106 | MLS Listing | | Vacant Land | 1815 Hovland Ct | Evanston | IL | 60201 | 110000 | | | ... | Active | | | https://www.redfin.com/IL/Evanston/1815-Hovlan... | MRED | 11385578 | N | Y | 42.050800 | -87.700945 |


In [None]:
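data = data_raw.copy()

# Express square footage in thousands of square feet
data["SQUARE FEET/1000"]=data["SQUARE FEET"]/1000
# Price in millions of dollars (not selected below; the raw PRICE column serves as the label)
data["PRICE/$1M"]=data["PRICE"]/1.0E6
# Keep only the label and the six features listed in the assignment
data = data[["PRICE", "SQUARE FEET/1000", "PROPERTY TYPE", "BEDS", "BATHS", "YEAR BUILT", "HOA/MONTH"]]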

In [None]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler

In [None]:
data.head()

| | PRICE | SQUARE FEET/1000 | PROPERTY TYPE | BEDS | BATHS | YEAR BUILT | HOA/MONTH |
|---|---|---|---|---|---|---|---|
| 0 | 339900 | 1.52 | Single Family Residential | 3.0 | 2.0 | 1958.0 | |
| 1 | 199000 | 0.9 | Condo/Co-op | 2.0 | 1.0 | 1959.0 | 452.0 |
| 2 | 1075000 | 3.135 | Single Family Residential | 5.0 | 3.5 | 2023.0 | |
| 3 | 825000 | 3.219 | Townhouse | 4.0 | 2.5 | 2019.0 | |
| 4 | 525000 | 1.6 | Single Family Residential | 3.0 | 1.5 | 1949.0 | |


# Preprocessing and building model

In [None]:
# Min-max scale the label (price) and the numeric features so that all values lie in [0, 1]
df_preprocessed = data.copy()

minmax_scaler_label = MinMaxScaler()
minmax_scaler_features = MinMaxScaler()
label = "PRICE"
numerical_vars = ["SQUARE FEET/1000", "BEDS", "BATHS", "YEAR BUILT", "HOA/MONTH"]
# Replace missing numeric values with the column mean before scaling
df_preprocessed[numerical_vars] = df_preprocessed[numerical_vars].fillna(df_preprocessed[numerical_vars].mean())
df_preprocessed[numerical_vars] = minmax_scaler_features.fit_transform(df_preprocessed[numerical_vars])

# The label gets its own scaler so that predictions can be converted back to dollars later
df_preprocessed[label] = minmax_scaler_label.fit_transform(df_preprocessed[label].values.reshape(-1,1))


# One-hot encode PROPERTY TYPE: one binary indicator column per category
# (dtype=int keeps the columns as 0/1 on pandas versions whose default is bool)
df_preprocessed = pd.get_dummies(df_preprocessed, columns=['PROPERTY TYPE'], dtype=int)
df_preprocessed.head()


| | PRICE | SQUARE FEET/1000 | BEDS | BATHS | YEAR BUILT | HOA/MONTH | PROPERTY TYPE_Condo/Co-op | PROPERTY TYPE_Multi-Family (2-4 Unit) | PROPERTY TYPE_Single Family Residential | PROPERTY TYPE_Townhouse | PROPERTY TYPE_Vacant Land |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.04449 | 0.088436 | 0.285714 | 0.181818 | 0.591195 | 0.153929 | 0 | 0 | 1 | 0 | 0 |
| 1 | 0.019604 | 0.028838 | 0.142857 | 0.0 | 0.597484 | 0.074283 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0.17432 | 0.24368 | 0.571429 | 0.454545 | 1.0 | 0.153929 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0.130166 | 0.251754 | 0.428571 | 0.272727 | 0.974843 | 0.153929 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0.077181 | 0.096126 | 0.285714 | 0.090909 | 0.534591 | 0.153929 | 0 | 0 | 1 | 0 | 0 |
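With its default range, `MinMaxScaler` maps each column to [0, 1] via `(x - min) / (max - min)`, and its inverse multiplies back by the range and adds the minimum. The sketch below works through that arithmetic by hand with NumPy. It uses only the five prices shown by `data.head()`, so its scaled values differ from the notebook's, where the scaler is fitted on every row.

```python
import numpy as np

# Toy input: the five prices displayed by data.head(), not the full dataset
prices = np.array([339900.0, 199000.0, 1075000.0, 825000.0, 525000.0])

lo, hi = prices.min(), prices.max()

# Forward transform: smallest price maps to 0, largest to 1
scaled = (prices - lo) / (hi - lo)

# Inverse transform: recover dollar amounts from scaled values
restored = scaled * (hi - lo) + lo

print(scaled)
print(restored)
```

Because the label has its own scaler, the same inverse step can later turn the network's scaled predictions back into dollars.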


In [None]:
# ReLU activations, including one on the output layer, keep the predicted (scaled) price non-negative
class FeedForwardNN(nn.Module):
    def __init__(self, input_size, layer_size1=64, layer_size2=32):
        super(FeedForwardNN, self).__init__()

        self.fc1 = nn.Linear(input_size, layer_size1)

        self.fc2 = nn.Linear(layer_size1, layer_size2)

        self.fc3 = nn.Linear(layer_size2, 1)

    def forward(self, x):

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return F.relu(self.fc3(x))

input_size = 10  # 5 numeric features + 5 one-hot PROPERTY TYPE columns
model = FeedForwardNN(input_size)

print(model)

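To make the architecture concrete, here is a NumPy-only sketch of the same 10 → 64 → 32 → 1 layout with a ReLU after every layer. The weights are random placeholders rather than trained values, and the names `sizes`, `forward` and `n_params` are illustrative, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [10, 64, 32, 1]  # input, two hidden layers, output (as in FeedForwardNN)

# Random weights and biases stand in for trained parameters (illustration only)
weights = [rng.normal(size=(n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in sizes[1:]]

def forward(x):
    # Affine map followed by ReLU at every layer, including the last one
    for w, b in zip(weights, biases):
        x = np.maximum(x @ w + b, 0.0)
    return x

# Count trainable parameters: one weight matrix and one bias vector per layer
n_params = sum(w.size + b.size for w, b in zip(weights, biases))

batch = rng.random((4, 10))  # four fake rows of scaled features
out = forward(batch)

print(n_params)
print(out.shape, out.min())
```

The layout has 2,817 parameters (704 + 2,080 + 33). The output ReLU guarantees non-negative predictions, with one trade-off: whenever the last layer's pre-activation is negative, the output is exactly 0 and passes no gradient back for that row.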

In [None]:
X_tensor = torch.FloatTensor(df_preprocessed.drop(label, axis=1).values)
y_tensor = torch.FloatTensor(df_preprocessed[label].values).view(-1, 1)

In [None]:
BATCHSIZE=50
EPOCHS=1000
learningRate= 0.01
# Pair features with labels and serve them in shuffled mini-batches
mydataset=torch.utils.data.TensorDataset(X_tensor, y_tensor)
mydataloader=torch.utils.data.DataLoader(mydataset,batch_size=BATCHSIZE,shuffle=True)

In [None]:
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learningRate)  # learning rate chosen by experimentation


for epoch in range(EPOCHS):
    for iter_ctr, (trainingfeatures, traininglabels) in enumerate(mydataloader):
        optimizer.zero_grad()
        # Forward pass and loss on the current mini-batch only
        outputs = model(trainingfeatures)
        loss = loss_function(outputs, traininglabels)

        # Backpropagate and update the weights
        loss.backward()
        optimizer.step()

        # Log once per mini-batch on every 100th epoch
        if (epoch+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{EPOCHS}], Loss: {loss.item():.4f}')

Epoch [100/1000], Loss: 0.0023
Epoch [100/1000], Loss: 0.0023
Epoch [100/1000], Loss: 0.0023
Epoch [200/1000], Loss: 0.0023
Epoch [200/1000], Loss: 0.0023
Epoch [200/1000], Loss: 0.0023
Epoch [300/1000], Loss: 0.0023
Epoch [300/1000], Loss: 0.0023
Epoch [300/1000], Loss: 0.0023
Epoch [400/1000], Loss: 0.0023
Epoch [400/1000], Loss: 0.0023
Epoch [400/1000], Loss: 0.0023
Epoch [500/1000], Loss: 0.0023
Epoch [500/1000], Loss: 0.0023
Epoch [500/1000], Loss: 0.0023
Epoch [600/1000], Loss: 0.0023
Epoch [600/1000], Loss: 0.0024
Epoch [600/1000], Loss: 0.0024
Epoch [700/1000], Loss: 0.0023
Epoch [700/1000], Loss: 0.0023
Epoch [700/1000], Loss: 0.0023
Epoch [800/1000], Loss: 0.0023
Epoch [800/1000], Loss: 0.0024
Epoch [800/1000], Loss: 0.0024
Epoch [900/1000], Loss: 0.0023
Epoch [900/1000], Loss: 0.0023
Epoch [900/1000], Loss: 0.0023
Epoch [1000/1000], Loss: 0.0023
Epoch [1000/1000], Loss: 0.0023
Epoch [1000/1000], Loss: 0.0023
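Each logged epoch prints three lines because the logging statement sits inside the loop over mini-batches, and the data loader yields three batches per epoch. A short sketch of that arithmetic, assuming the dataset has 107 rows (index 0 to 106 in the tables above) and that the loader keeps the final partial batch, which is the `DataLoader` default:

```python
import math

n_rows = 107      # assumed dataset size (rows 0 to 106)
batch_size = 50   # BATCHSIZE in the notebook

# Number of mini-batches per epoch when the last partial batch is kept
n_batches = math.ceil(n_rows / batch_size)

# Size of each mini-batch within one epoch
batch_sizes = [min(batch_size, n_rows - i * batch_size) for i in range(n_batches)]

print(n_batches, batch_sizes)
```

Under these assumptions the third batch holds only seven rows, so its loss is a noisier estimate than the loss on the two full batches.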


# Testing performance

In [None]:
model.eval()
with torch.no_grad():
    predictions = model(X_tensor)

In [None]:
# Predictions are still on the min-max scale (0 to 1), not in dollars
predicted_prices = predictions.numpy()
for i, price in enumerate(predicted_prices[:20]):
    print(f"Row {i + 1}: Predicted Price = {price[0]:.2f}")

Row 1: Predicted Price = 0.04
Row 2: Predicted Price = 0.02
Row 3: Predicted Price = 0.17
Row 4: Predicted Price = 0.13
Row 5: Predicted Price = 0.08
Row 6: Predicted Price = 0.06
Row 7: Predicted Price = 0.08
Row 8: Predicted Price = 0.22
Row 9: Predicted Price = 0.12
Row 10: Predicted Price = 0.14
Row 11: Predicted Price = 0.21
Row 12: Predicted Price = 0.13
Row 13: Predicted Price = 0.02
Row 14: Predicted Price = 0.04
Row 15: Predicted Price = 0.14
Row 16: Predicted Price = 0.03
Row 17: Predicted Price = 0.04
Row 18: Predicted Price = 0.04
Row 19: Predicted Price = 0.05
Row 20: Predicted Price = 0.34


In [None]:
df_preprocessed["Predicted_price"] = predicted_prices

In [None]:
df_results = df_preprocessed[["PRICE", "Predicted_price"]]
# Convert both columns from the [0, 1] scale back to dollars using the label's scaler
df_results = pd.DataFrame(minmax_scaler_label.inverse_transform(df_results))
df_results.columns = ["PRICE", "Predicted_price"]
df_results

| | PRICE | Predicted_price |
|---|---|---|
| 0 | 339900.0 | 3.419339e+05 |
| 1 | 199000.0 | 1.987347e+05 |
| 2 | 1075000.0 | 1.075120e+06 |
| 3 | 825000.0 | 8.250103e+05 |
| 4 | 525000.0 | 5.252635e+05 |
| ... | ... | ... |
| 103 | 350000.0 | 8.722287e+05 |
| 104 | 344900.0 | 3.447372e+05 |
| 105 | 5750000.0 | 5.750187e+06 |
| 106 | 110000.0 | 8.722287e+05 |


In [None]:
from matplotlib import pyplot as plt

def scatter_plots(df, colname_pairs, figscale=1, alpha=.8):
    """Draw one scatter plot per (x, y) column pair, side by side."""
    plt.figure(figsize=(len(colname_pairs) * 6 * figscale, 6 * figscale))
    for plot_i, (x_colname, y_colname) in enumerate(colname_pairs, start=1):
        ax = plt.subplot(1, len(colname_pairs), plot_i)
        df.plot(kind='scatter', x=x_colname, y=y_colname, s=(32 * figscale), alpha=alpha, ax=ax)
        ax.spines[['top', 'right']].set_visible(False)
    plt.tight_layout()
    plt.show()

# Actual price on the x-axis, predicted price on the y-axis
scatter_plots(df_results, [['PRICE', 'Predicted_price']])

In [None]:
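# Mean absolute error, in dollars
mae = numpy.mean(numpy.abs(df_results.Predicted_price - df_results.PRICE))
# Mean of the absolute error expressed as a fraction of the actual price
mean_relative_error = numpy.mean(numpy.abs(df_results.Predicted_price - df_results.PRICE)/df_results.PRICE)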

print(mae)
print(mean_relative_error)

71367.7234394087
0.19753600627486256
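The two metrics are the mean of `|predicted - actual|` and the mean of `|predicted - actual| / actual`. As a worked illustration, the sketch below recomputes both on only the nine rows visible in the results table above, so its numbers are not the full-dataset values printed by the notebook. The 50% cut-off used to flag large misses is an arbitrary choice for this example.

```python
import numpy as np

# The nine rows displayed in the results table (rows 0-4 and 103-106), not the full dataset
actual = np.array([339900.0, 199000.0, 1075000.0, 825000.0, 525000.0,
                   350000.0, 344900.0, 5750000.0, 110000.0])
predicted = np.array([3.419339e+05, 1.987347e+05, 1.075120e+06, 8.250103e+05, 5.252635e+05,
                      8.722287e+05, 3.447372e+05, 5.750187e+06, 8.722287e+05])

abs_err = np.abs(predicted - actual)
mae = abs_err.mean()                             # mean absolute error, in dollars
mean_relative_error = (abs_err / actual).mean()  # unitless fraction of the actual price

# Positions whose prediction misses by more than 50% of the actual price
outliers = np.flatnonzero(abs_err / actual > 0.5)

print(mae, mean_relative_error, outliers)
```

In this subset the only flagged positions are 5 and 8, which correspond to rows 103 and 106. Those two rows account for almost all of the error, which shows how a few large misses can dominate both averages.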


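The graph shows that most predictions are close to the listed prices. The exception is about half a dozen points whose predictions sit near $900k regardless of the actual price (probably property types that are scarce in the training dataset). Rows 103 and 106 in the table above, both vacant land, are each predicted at about $872k.
The mean absolute error is about $71k and the mean relative error is about 19.8%, which is not optimal but still reasonable given the small size of the dataset. Both errors are measured on the same rows the model was trained on, so they describe the fit to the training data rather than performance on unseen listings.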
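One way to estimate the error on unseen listings would be to hold out part of the rows before training. The sketch below only builds the index split with NumPy. The 107-row dataset size, the 20% test fraction and the seed are assumptions for illustration, not choices made in this notebook.

```python
import numpy as np

n_rows = 107                     # assumed dataset size (rows 0 to 106)
rng = np.random.default_rng(42)  # fixed seed so the split is reproducible

# Shuffle the row indices, then cut off the first 20% as the test set
perm = rng.permutation(n_rows)
n_test = int(0.2 * n_rows)
test_idx, train_idx = perm[:n_test], perm[n_test:]

print(len(train_idx), len(test_idx))
```

The model would then be fitted on the rows in `train_idx` and the error metrics computed on the rows in `test_idx`. For the test error to stay unbiased, the mean imputation and the scalers would also need to be fitted on the training rows only.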