# Predicting Monopoly Hotel Prices

If you've played the game
[Monopoly](https://en.wikipedia.org/wiki/Monopoly_(game)),
you will know that some properties cost more than others, and
that the rent charged for a hotel differs from property to property too.

There's a relationship between a property's figures. Let's find out what it is by predicting the hotel rent from the basic site rent.

The set of properties is on an HTML table here:
http://www.jdawiseman.com/papers/trivia/monopoly-rents.html

In [35]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

import sklearn.cross_validation
import sklearn.linear_model
import sklearn.metrics
import sklearn.dummy
import sklearn.tree

In [26]:
url = 'http://www.jdawiseman.com/papers/trivia/monopoly-rents.html'
dfs = pd.read_html(url, header=1, index_col=0)
monopoly = dfs[0]
monopoly

| Property | Cost | M’tg | Site | 1 hse | 2 hses | 3 hses | 4 hses | Hotel |
|---|---|---|---|---|---|---|---|---|
| Old Kent Road | 60 | 30 | 2 | 10.0 | 30.0 | 90.0 | 160.0 | 250.0 |
| Whitechapel Road | 60 | 30 | 4 | 20.0 | 60.0 | 180.0 | 320.0 | 450.0 |
| Kings Cross Station | 200 | 100 | 25 or 50 or 100 or 200 | NaN | NaN | NaN | NaN | NaN |
| The Angel Islington | 100 | 50 | 6 | 30.0 | 90.0 | 270.0 | 400.0 | 550.0 |
| Euston Road | 100 | 50 | 6 | 30.0 | 90.0 | 270.0 | 400.0 | 550.0 |
| Pentonville Road | 120 | 60 | 8 | 40.0 | 100.0 | 300.0 | 450.0 | 600.0 |
| Pall Mall | 140 | 70 | 10 | 50.0 | 150.0 | 450.0 | 625.0 | 750.0 |
| Electric Company | 150 | 75 | 4×dice or 10×dice | NaN | NaN | NaN | NaN | NaN |
| Whitehall | 140 | 70 | 10 | 50.0 | 150.0 | 450.0 | 625.0 | 750.0 |
| Northumberland Avenue | 160 | 80 | 12 | 60.0 | 180.0 | 500.0 | 700.0 | 900.0 |


In [27]:
# pd.read_html returns a list of DataFrames; this page should yield only one.
# Drop all the rows with NaN values, in place (these are the railway
# stations and the utilities).
monopoly.dropna(inplace=True)

In [54]:
# The index is the name of the property.
monopoly.index, len(monopoly.index)

(Index([u'Old Kent Road', u'Whitechapel Road', u'The Angel Islington',
        u'Euston Road', u'Pentonville Road', u'Pall Mall', u'Whitehall',
        u'Northumberland Avenue', u'Bow Street', u'Marlborough Street',
        u'Vine Street', u'The Strand', u'Fleet Street', u'Trafalgar Square',
        u'Leicester Square', u'Coventry Street', u'Piccadilly',
        u'Regent Street', u'Oxford Street', u'Bond Street', u'Park Lane',
        u'Mayfair'],
       dtype='object', name=u'Property'), 22)

In [79]:
# Have a look at the column names
#  - "1 hse" means the rent when there is one house on the property
#  - "M’tg" means the mortgage value of the property (note the curly
#    apostrophe in the column name)

monopoly.columns

Index([u'Cost', u'M’tg', u'Site', u'1 hse', u'2 hses', u'3 hses', u'4 hses',
       u'Hotel'],
      dtype='object')

In [98]:
# Make a DataFrame x which contains only the "Site" column.
# The double brackets keep it two-dimensional, as scikit-learn expects for features.
x = monopoly[['Site']]
print type(x)

<class 'pandas.core.frame.DataFrame'>


In [99]:
# Make a Series y which contains the "Hotel" column (the prediction target)
y = monopoly['Hotel']
print type(y)

<class 'pandas.core.series.Series'>


In [100]:
sklearn.cross_validation.train_test_split?

In [101]:
# Use sklearn.cross_validation.train_test_split to split x and y
# into train and test sets (by default a quarter of the rows is held out)

test_split = sklearn.cross_validation.train_test_split

(train_x, test_x, train_y, test_y) = test_split(x,y)


In [103]:
# Check train_x to see which properties ended up in the training set
# (train_x.index lists them). How many are there?

len(train_x)


16

In [104]:
# Check test_x to see which properties ended up in the test set
# (test_x.index lists them). How many are there?

len(test_x)


6

In [105]:
# Do a train_test_split again, but this time, specify test_size

(train_x, test_x, train_y, test_y) = test_split(x,y,test_size=0.1)

print len(train_x), len(test_x)
print len(train_y), len(test_y)

19 3
19 3
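
The sizes printed above (16 and 6 by default, 19 and 3 with `test_size=0.1`) are consistent with the test share being rounded up to a whole number of rows. The following is a minimal sketch of that arithmetic in plain Python, not scikit-learn's actual implementation; `split_sizes` and `shuffle_split` are hypothetical helper names introduced only for illustration.

```python
import math
import random


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def shuffle_split(rows, test_size=0.25, seed=0):
    """Shuffle a copy of rows and cut it into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled), test_size)
    return shuffled[:n_train], shuffled[n_train:]


# 22 street properties remain after dropping stations and utilities.
print(split_sizes(22))       # sizes with the default quarter held out
print(split_sizes(22, 0.1))  # sizes with test_size=0.1

train, test = shuffle_split(range(22), test_size=0.1)
print(len(train), len(test))
```

Because the split is random, the properties that land in each set change from run to run unless a seed is fixed, which is why the sketch takes a `seed` argument.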


# OLS

In [106]:
# Make a LinearRegression object, and fit it to the training data
# (train_x as the feature, train_y as the target)

osl = sklearn.linear_model.LinearRegression()
osl.fit(train_x, train_y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
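
With a single feature, an ordinary least squares fit reduces to a slope and an intercept that can be computed by hand. The sketch below does this in plain Python for the eight street rows visible in the table above. It is an illustration under that assumption: the notebook's model was fitted on a random training split of all 22 streets, so its coefficients will differ. `fit_line` is a hypothetical helper name.

```python
# Site rent and hotel rent for the eight street rows shown in the table.
sites = [2, 4, 6, 6, 8, 10, 10, 12]
hotels = [250, 450, 550, 550, 600, 750, 750, 900]


def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sites, hotels)
print(slope, intercept)

# Predicted hotel rent for a hypothetical site rent of 14.
print(slope * 14 + intercept)
```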

In [111]:
osl.score?

In [110]:
# Your LinearRegression object has a .score() method, which returns R^2.
# Try it on train_x and train_y.
# The score will be high, but it is optimistic: the model has already
# seen this data, so it says little about accuracy on new properties.

osl.score(train_x, train_y)


0.96234300849276555

In [112]:
# Make predictions for the held-out test_x data

prediction = osl.predict(test_x)

In [113]:
# Zip the correct answers (test_y) with these predictions and compare
# them visually. You could graph them too if you want to.

zip(test_y, prediction)

[(1275.0, 1278.7450515508767),
 (1100.0, 1073.4779440553357),
 (550.0, 594.52135989907333)]

In [114]:
# Use sklearn.metrics.median_absolute_error to get a sense of how
# far off the predictions were (the median of |actual - predicted|)

sklearn.metrics.median_absolute_error(test_y, prediction)

26.522055944664316
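
The figure above can be checked by hand: take the absolute difference for each pair printed earlier and pick the middle value. This sketch uses only the standard library and the three (actual, predicted) pairs from the output above.

```python
import statistics

# The three held-out hotel rents and the linear model's predictions.
test_y = [1275.0, 1100.0, 550.0]
prediction = [1278.7450515508767, 1073.4779440553357, 594.52135989907333]

# Absolute error for each property, then the median of those errors.
abs_errors = [abs(actual - predicted)
              for actual, predicted in zip(test_y, prediction)]
median_error = statistics.median(abs_errors)

print(sorted(abs_errors))
print(median_error)
```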


In [None]:
# Try the same with a sklearn.dummy.DummyRegressor() and 
# sklearn.tree.DecisionTreeRegressor()

# Dummy Test

In [116]:
# Dummy baseline: fit, score on the training data, then predict the test set

dummy = sklearn.dummy.DummyRegressor()
dummy.fit(train_x, train_y)
print dummy.score(train_x, train_y)
print

prediction = dummy.predict(test_x)
print zip(test_y, prediction)
print

print sklearn.metrics.median_absolute_error(test_y, prediction)


0.0

[(1275.0, 996.0526315789474), (1100.0, 996.0526315789474), (550.0, 996.0526315789474)]

278.947368421
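
The constant prediction and the training score of 0.0 both follow from how a mean baseline works: with its default settings, `DummyRegressor` predicts the mean of the training targets for every input. The sketch below mimics that behaviour in plain Python. `MeanBaseline` is a hypothetical stand-in, not the scikit-learn class, and the data are the eight street rows from the table above rather than the notebook's training split.

```python
class MeanBaseline:
    """Predicts the mean of the training targets, ignoring the features."""

    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


sites = [2, 4, 6, 6, 8, 10, 10, 12]
hotels = [250, 450, 550, 550, 600, 750, 750, 900]

baseline = MeanBaseline().fit(sites, hotels)
predicted = baseline.predict(sites)

# R^2 on the training data: the residuals equal the deviations from
# the mean, so the two sums of squares cancel.
ss_res = sum((y - p) ** 2 for y, p in zip(hotels, predicted))
ss_tot = sum((y - baseline.mean_) ** 2 for y in hotels)
train_score = 1 - ss_res / ss_tot

print(predicted[0], train_score)
```

Any useful model should beat this baseline, which is what makes it a handy sanity check.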


# Decision Tree

In [117]:
# Decision tree: fit, score on the training data, then predict the test set

tree = sklearn.tree.DecisionTreeRegressor()
tree.fit(train_x, train_y)
print tree.score(train_x, train_y)
print

prediction = tree.predict(test_x)
print zip(test_y, prediction)
print

print sklearn.metrics.median_absolute_error(test_y, prediction)


0.999433651532

[(1275.0, 1275.0), (1100.0, 1050.0), (550.0, 550.0)]

0.0
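
A median absolute error of 0.0 does not mean every prediction was right: with only three test rows, the median ignores the one property that was off by 50. This sketch, using the pairs printed above and the standard library only, compares the median with the mean of the same errors.

```python
import statistics

# Held-out hotel rents and the decision tree's predictions from above.
test_y = [1275.0, 1100.0, 550.0]
tree_prediction = [1275.0, 1050.0, 550.0]

abs_errors = [abs(actual - predicted)
              for actual, predicted in zip(test_y, tree_prediction)]

median_error = statistics.median(abs_errors)  # middle value of the errors
mean_error = statistics.mean(abs_errors)      # average, sensitive to the miss

print(abs_errors)
print(median_error, mean_error)
```

On a test set this small, it may be worth looking at both numbers, or at the individual errors, before ranking the models.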


#### Gridsearch