# RentHop: Rental Listing Inquiries

url = https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/

### Understanding the Question

Given a set of features for a rental listing, we are to predict how much interest (low, medium, high) a rental listing will receive.  We are given labels for our data.  Our predictions should be represented as class probability (as per the competition rules).

This is a supervised classification problem.

### Getting Started - Load & Inspect Data

The data is available on kaggle at https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data.  We are given 14 features in our data set and the label column is called 'interest_level'.

In [1]:
import pandas as pd
import numpy as np

train_df = pd.read_json('train.json')
train_df.head()

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,interest_level,latitude,listing_id,longitude,manager_id,photos,price,street_address
10,1.5,3,53a5b119ba8f7b61d4e010512e0dfc85,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,[],medium,40.7145,7211212,-73.9425,5ba989232d0489da1b5f2c45f6688adc,[https://photos.renthop.com/2/7211212_1ed4542e...,3000,792 Metropolitan Avenue
10000,1.0,2,c5c8a357cba207596b04d1afd1e4f130,2016-06-12 12:19:27,,Columbus Avenue,"[Doorman, Elevator, Fitness Center, Cats Allow...",low,40.7947,7150865,-73.9667,7533621a882f71e25173b27e3139d83d,[https://photos.renthop.com/2/7150865_be3306c5...,5465,808 Columbus Avenue
100004,1.0,1,c3ba40552e2120b0acfc3cb5730bb2aa,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,"[Laundry In Building, Dishwasher, Hardwood Flo...",high,40.7388,6887163,-74.0018,d9039c43983f6e564b1482b273bd7b01,[https://photos.renthop.com/2/6887163_de85c427...,2850,241 W 13 Street
100007,1.0,1,28d9ad350afeaab8027513a3e52ac8d5,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,"[Hardwood Floors, No Fee]",low,40.7539,6888711,-73.9677,1067e078446a7897d2da493d2f741316,[https://photos.renthop.com/2/6888711_6e660cee...,3275,333 East 49th Street
100013,1.0,4,0,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,[Pre-War],low,40.8241,6934781,-73.9493,98e13ad4b495b9613cef886d79a6291f,[https://photos.renthop.com/2/6934781_1fa4b41a...,3350,500 West 143rd Street


In [2]:
train_df.shape

(49352, 15)

In [4]:
train_df.dtypes

bathrooms          float64
bedrooms             int64
building_id         object
created             object
description         object
display_address     object
features            object
interest_level      object
latitude           float64
listing_id           int64
longitude          float64
manager_id          object
photos              object
price                int64
street_address      object
dtype: object

In [5]:
#Check for NaNs
train_df.isnull().sum()

bathrooms          0
bedrooms           0
building_id        0
created            0
description        0
display_address    0
features           0
interest_level     0
latitude           0
listing_id         0
longitude          0
manager_id         0
photos             0
price              0
street_address     0
dtype: int64

### Feature Engineering

Before I start to visualized and look at descriptive statistics of the data, I want to convert some of the data fields into numerical features.

In [9]:
#Create # of Photos Column
train_df['NumPhotos'] = train_df.photos.str.len()
#Create # of Features Column
train_df['NumFeatures'] = train_df.features.str.len()

In [11]:
#Use sklearn LabelEncoder to label building_id and manager_id
from sklearn.preprocessing import LabelEncoder

#Encode building_id
le_building = LabelEncoder()
le_building.fit(train_df['building_id'])
train_df['BuildingID'] = le_building.transform(train_df['building_id'])

#Encode manager_id
le_manager = LabelEncoder()
le_manager.fit(train_df['manager_id'])
train_df['ManagerID'] = le_manager.transform(train_df['manager_id'])

#Inspect Changes
train_df.head()

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,interest_level,latitude,listing_id,longitude,manager_id,photos,price,street_address,NumPhotos,NumFeatures,BuildingID,ManagerID
10,1.5,3,53a5b119ba8f7b61d4e010512e0dfc85,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,[],medium,40.7145,7211212,-73.9425,5ba989232d0489da1b5f2c45f6688adc,[https://photos.renthop.com/2/7211212_1ed4542e...,3000,792 Metropolitan Avenue,5,0,2431,1239
10000,1.0,2,c5c8a357cba207596b04d1afd1e4f130,2016-06-12 12:19:27,,Columbus Avenue,"[Doorman, Elevator, Fitness Center, Cats Allow...",low,40.7947,7150865,-73.9667,7533621a882f71e25173b27e3139d83d,[https://photos.renthop.com/2/7150865_be3306c5...,5465,808 Columbus Avenue,11,5,5862,1583
100004,1.0,1,c3ba40552e2120b0acfc3cb5730bb2aa,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,"[Laundry In Building, Dishwasher, Hardwood Flo...",high,40.7388,6887163,-74.0018,d9039c43983f6e564b1482b273bd7b01,[https://photos.renthop.com/2/6887163_de85c427...,2850,241 W 13 Street,8,4,5806,2965
100007,1.0,1,28d9ad350afeaab8027513a3e52ac8d5,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,"[Hardwood Floors, No Fee]",low,40.7539,6888711,-73.9677,1067e078446a7897d2da493d2f741316,[https://photos.renthop.com/2/6888711_6e660cee...,3275,333 East 49th Street,3,2,1201,225
100013,1.0,4,0,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,[Pre-War],low,40.8241,6934781,-73.9493,98e13ad4b495b9613cef886d79a6291f,[https://photos.renthop.com/2/6934781_1fa4b41a...,3350,500 West 143rd Street,3,1,0,2081


### Investigating Correlations

For my first attempt I want to make a rather simple model.  I will select only a handful of features that I think are relevant (based on my domain knowldge of the problem) and any that show a high correlation to the 'interest_level'.

In [17]:
#Select Features I want
feature_cols = ['bathrooms', 'bedrooms', 'price', 'latitude', 'longitude', 'NumPhotos', 'NumFeatures', 'BuildingID', 'ManagerID']
train_features = train_df[feature_cols]

#Describe for subsets of training based on 'interest_level'
train_features[train_df['interest_level'] == 'high'].describe()

Unnamed: 0,bathrooms,bedrooms,price,latitude,longitude,NumPhotos,NumFeatures,BuildingID,ManagerID
count,3839.0,3839.0,3839.0,3839.0,3839.0,3839.0,3839.0,3839.0,3839.0
mean,1.116176,1.546496,2700.293045,40.748007,-73.964613,5.738474,5.158635,3622.783537,1741.772597
std,0.341725,1.112187,2080.554641,0.051965,0.040286,2.689384,4.167181,2280.917604,1054.697401
min,0.0,0.0,700.0,40.5758,-74.1598,0.0,0.0,0.0,11.0
25%,1.0,1.0,1850.0,40.7219,-73.99035,4.0,2.0,1618.0,763.5
50%,1.0,2.0,2400.0,40.7465,-73.9763,5.0,4.0,3640.0,1745.0
75%,1.0,2.0,3163.0,40.7738,-73.9486,7.0,8.0,5589.0,2743.0
max,4.0,5.0,111111.0,41.0868,-73.7142,26.0,32.0,7578.0,3480.0


In [18]:
train_features[train_df['interest_level'] == 'medium'].describe()

Unnamed: 0,bathrooms,bedrooms,price,latitude,longitude,NumPhotos,NumFeatures,BuildingID,ManagerID
count,11229.0,11229.0,11229.0,11229.0,11229.0,11229.0,11229.0,11229.0,11229.0
mean,1.163906,1.62205,3158.767388,40.745567,-73.965033,5.813251,5.888681,3748.008816,1805.258527
std,0.388318,1.122604,1243.693856,0.388466,0.698923,2.654043,4.108768,2270.037277,1042.308607
min,0.0,0.0,695.0,0.0,-75.1773,0.0,0.0,0.0,0.0
25%,1.0,1.0,2300.0,40.7265,-73.9918,4.0,3.0,1708.0,843.0
50%,1.0,2.0,2895.0,40.7488,-73.9781,5.0,5.0,3827.0,1816.0
75%,1.0,2.0,3650.0,40.7724,-73.9533,7.0,9.0,5718.0,2786.0
max,4.0,7.0,15000.0,44.6038,0.0,28.0,39.0,7581.0,3480.0


In [19]:
train_features[train_df['interest_level'] == 'low'].describe()

Unnamed: 0,bathrooms,bedrooms,price,latitude,longitude,NumPhotos,NumFeatures,BuildingID,ManagerID
count,34284.0,34284.0,34284.0,34284.0,34284.0,34284.0,34284.0,34284.0,34284.0
mean,1.238741,1.514759,4176.599,40.739504,-73.951667,5.524647,5.307957,3006.765459,1781.091996
std,0.544946,1.111595,26449.32,0.732933,1.355388,3.981574,3.820158,2482.302496,988.233997
min,0.0,0.0,43.0,0.0,-118.271,0.0,0.0,0.0,0.0
25%,1.0,1.0,2625.0,40.7297,-73.9918,4.0,2.0,328.0,933.0
50%,1.0,1.0,3300.0,40.7538,-73.9779,5.0,4.0,2926.5,1788.0
75%,1.0,2.0,4400.0,40.774725,-73.956,7.0,8.0,5150.0,2679.0
max,10.0,8.0,4490000.0,44.8835,0.0,68.0,31.0,7584.0,3480.0


Just a quick look at these basic descriptive statistics and it seems 'price' is very influential in the interest level.  The high interest subset has a much lower mean and median price than the rest of the data.

### Prepare Data for ML & Transform Test Features

With this smaller set of features, I want to scale it and then run it through a few classification algorithims to see what intial results I can get from a simple model. 

In order to do that I will first need to apply the same transforms to the test features.  Then I will scale the training and test features so they can be better utilized by a machine learning algorithm.

#### Apply same transforms to test features

In [23]:
#Load test data
test_df = pd.read_json('test.json')
#Create # of Photos Column
test_df['NumPhotos'] = test_df.photos.str.len()
#Create # of Features Column
test_df['NumFeatures'] = test_df.features.str.len()
#Encode manager_id -already trained, just fit
test_df['ManagerID'] = le_manager.transform(test_df['manager_id'])
#Encode building_id -already trained, just fit
test_df['BuildingID'] = le_building.transform(test_df['building_id'])

#Inspect to verify
test_df.head()

ValueError: y contains new labels: [u'0115c7abb8f1d3b50673f3af09fffd0e' u'011f9959a21777007f3cfed4d9138d15'
 u'01a202519a4a0e384ac97a1bf5f435b9' u'01b8e7749ec880170b33dd6a01216f14'
 u'01dab36b00b81746903c9577164f223a' u'02083c95193790b1acf5428e55d53947'
 u'0220efae47fbe40afbfbac95bbc7f239' u'0270de3c58a9285cef164e611aa0e79b'
 u'0279ea9e992f8108e39c9d2318dc2b98' u'028564cb4b3172a81589960afa143aac'
 u'032d78195212687c678990b2a7785b9d' u'0347071240c5d5eb0ed3c438e8f41a18'
 u'03a56d25f4fa1845d84325013b8e1cd2' u'03bb0ee3e78614a9dca4229451adfd71'
 u'04b3769d9e4676d9c22dbf8df4a63541' u'04c332720bb43160801867c75b6ba345'
 u'05604c02fd5fdf849464fa94e974faf0' u'056aea3f29ceb159af1d8edb5b4f3564'
 u'057a1cdfa6bf51d55a5baf9109521bfb' u'057e0cb0cec05f29190c9985614546f5'
 u'05959ca788e96cab3a88a7650a8cd183' u'05d06180ba2b7e04f411c4889488861a'
 u'0693b188538f4c0d32d40d49493eee24' u'07329b910729d73d5ba945f0e03e77c0'
 u'0764f425b4186cae92225bb562541456' u'078e957430b1ee48a4ba8c49b1709f4f'
 u'0839b83ec3989f907746982531e8ed47' u'0853d071f7747ec4798302c75af55a52'
 u'08a5661b55c41a019645bd242ec54238' u'08bd94fbc5552b2d02e66b16bd346f26'
 u'08fc50d1885b488811ea6e8b0cb53f12' u'091f3bcab992bf3b6ab62e3a609f9cb7'
 u'0955908eda99b0d7bd56be7586ab1a6c' u'096cdf6b1a313f43468597e18710bdea'
 u'0a33148743b6f6f28c7d5de2f8afb033' u'0a9a859ddb38b1e739ee4d7de0e6afdd'
 u'0a9ba4a32f7b01925fdecdf13a413e65' u'0b08b2dbae50f9f6eb6e9da350c07b1c'
 u'0b43a597cece3bb636a2e3604ccc4cf7' u'0b62f91cb5b16415409db1f5da45902f'
 u'0b662f3ec653ef734e7f613d20e9dd30' u'0bbf16a8bae967988546486456f7b361'
 u'0be563fa08961aa41338372a477e55ed' u'0c2f112b1fb52f888a9510a3ec924464'
 u'0ceda25392884b9459b2f62576a1a14f' u'0d2d59026316a3b27b66515160118686'
 u'0d622c04c35eae9500892eae6f060401' u'0d639f6b4eb0c964d25e8205c9ab0074'
 u'0d6f8e28ba9e45bf9007a3da478cf20e' u'0da7968d05a11dd693550798c600e53e'
 u'0e0f68a16b5bf5253dc2d4575c193082' u'0ead4ce211989abed5452c6d12cee62e'
 u'0eb915604720389709a038089ef98738' u'0efd956a1ef482b461d7896f64047f09'
 u'0fd81102356c0c1f69d4198d7594bcc4' u'0fe875fcc0f133e68772285472f39637'
 u'10249c170a5a4ceb7248546a67a184d2' u'10dec9ee95dd21d8fd2ba6542ca7dc76'
 u'11817ce55992ee93d0a9315f7909e051' u'11d246f0575cbd894eb603b8b195c24a'
 u'123687447fb39c75e17a15f08bf8baf5' u'1262a0eb387137387b41720a7b89cf3f'
 u'127645b395991da1b4571a78984fab58' u'127f76a258419b4eecf8edfff6707482'
 u'129fade0d8a3ea3858a2cf6d83c34643' u'12aa03a0335f8bf8cc6175be06c82ed8'
 u'12ee8746b2b7658a4570acfd31031f77' u'1303eda4de5ca59c6e35ddd4562a5983'
 u'130ef7693f74e1a91d17593666538fa3' u'134d0f1618c4b29df8d6d2e17995d0e1'
 u'1361abf3f9c7f3d469f8bdab5f428d34' u'13b6b00d8e5bf92dd7b32bce3102554d'
 u'13c15a78a6384521c5660e29051b3016' u'1432dd193efe7fd0031f2ab9a0465621'
 u'143640f3248aa1e97729f3f9b2a8c3fd' u'143840b407e0f87447e6b04e21374eec'
 u'14b6b420e08aa7ac2df6b55638a2b801' u'14e5733015528cf3bead6f4a3e2d5d95'
 u'15b15d04548e97fc5e8694ab09f34965' u'160166561c65c70a9c18a53b027f407a'
 u'1616e8da02f24a54d5731d1d9ba38804' u'16c62cbef11c0308c6783cdf2b0622e0'
 u'17dba57a37d0cee6f0dd02c24c73902b' u'1829268f39ef105ffc3c69836a583a2d'
 u'183d591baa7e38a9a1a4af30b267213f' u'186ba716d17d0c92c9fd41e642f2272c'
 u'191b1419bfeeea8286c2bf22bbb0618c' u'19276e720b1e49184d2db61f77030038'
 u'19d0b8bcd5af14c26d64a12dfb477995' u'19ed394c67c424350d2be32ac18acc29'
 u'1a939a7d9f8d87ef6493ce126c8a0dce' u'1aac258c92d352cd46e785b11272dab3'
 u'1af8a54fd0c2a18130fa3f53ca408395' u'1b336b9ab831fca69b521f13d40b1aa3'
 u'1b86a7cdc1f5f7f9ee3a2163dc1714ad' u'1b8b3bf93184b7b6de33140339de049f'
 u'1bfb02affc0dca2246441d275413273a' u'1c361000566a1e659098dd03669c3fb7'
 u'1c3e80a9c9d3e1556c2026b46c0d2688' u'1c739815fbf4baa995e26a76a15c1d33'
 u'1cb7c29985d4f5be2797cbd7c72f0030' u'1cef11354daece939e10040c7fbc8879'
 u'1db232ad6b533e19c8ff81726b58cc91' u'1dd27b40bccc2fe060eb780d00846a22'
 u'1eefef2b6d2675e31cca342b1159f78d' u'1f2bfc5c7e251fb2c650d2711c7c89d1'
 u'1f54ae1c64e4c1bfff6e6519db918ed3' u'1f61863e3bc8d20de94521b240aa7b82'
 u'2045b0b2db92b9daf7dcf04cfa017c9f' u'204daa6812e156915f3d614606e5e15a'
 u'2051eae7013b10dbda66fbcdd1739ae5' u'2056b2bdfa9069d7b28465a414ad3dca'
 u'2057da6e46df4419c737b3eb0a563ed1' u'20e8a822361f8addb3c0d2bd7a73b46c'
 u'211f050563ed76c42fabb20f2af8e8cb' u'212527b1b173301b7aeaa666310358cb'
 u'217992d21a1cbd4008e24726b8af7180' u'21841afdc4b66a9b142f5115ea7ebc28'
 u'218fb96d30e8d74cc50ef036a651c49a' u'21c2c4505f35d5f704438624bbed58a4'
 u'21f8c6d8c4ee7a5276ecba0060189b34' u'235390924e2a837b5cfd7b8b971468c1'
 u'238a7665e5c865c5bd87f67046c169b8' u'24452d37ae97f702052c07a3fd1a07dd'
 u'246766f43e0a8845ecda9e32ecaa54c5' u'246fc44226730951341fd699a6caa6e0'
 u'24b8e12cabb717693f94ff0dd04f01cc' u'24e83109c9b77ba1bece2a66239b88f2'
 u'251c46798211df0598691f3ace197de4' u'2533a018ffcf0c3f5450317f4d9b363b'
 u'25e5cdbb3ae80bce9608cb6d52a00abb' u'261c5727dccae342fded967114f4b721'
 u'262f6a74824e28c3165213d60ea95c5f' u'26426b4e169ee193904b14d3416751b7'
 u'266cd20d7ad5c7ff26b8fe19ea32cb88' u'2684476d22577e662cba919aa31118ed'
 u'26a2c8f454c1f60bf6767f11d0fe79d2' u'26f77295af2bfa42f38dda35b64b3bd1'
 u'270e8bcd787f96209e2bb0f66ce8197f' u'27beb38f7b41de74d39107695de55753'
 u'27c0e83b2307b122203406de2a331be0' u'27de1014084435885472a637c561bd39'
 u'27def945d4a749af825f0c0f402e76ee' u'283076d83c887e3bb29416c746f1f21c'
 u'28925f8032b474b611aed2a398a1cba6' u'28951e190adf6b1f57ce3abbdd82f3c2'
 u'28a4ed53a2c61a97689d18f4e5448932' u'28b05de60a3fb3530c38f9fae806674b'
 u'28b2735b5d6bbc6712bbed3a142a7122' u'2947e7d74de31e0b333c3da19206ca68'
 u'29922598e87657ae627fa0b7d2c79ffd' u'2a467e4cb1b901cf59ee9013099b3201'
 u'2a7365e44e01278df84c6ab3a32a2cb4' u'2add15eabecec463947b06b1ac121f3d'
 u'2b0fcf29e7c36b461938f58ff669ca12' u'2b38d4d9ecbb489cdb4d53112eadedcd'
 u'2b7c01540de48a398238c343f498aa5f' u'2bc6440c1fcd685ddfa0a244e58c0b28'
 u'2c41546526ed88da34d62ac477f4ae65' u'2c5ed81897a2033aec337a7435ce4d01'
 u'2c63a856cae946b87d5b0c3778833279' u'2d633036a1d0a77543cfdfd22e8ca62f'
 u'2d944ec1e03ebdcb260564ae6d54ea73' u'2daef8e53525becdac87369286195a13'
 u'2e275003f2c453844267e2b8bf794f90' u'2e35525ff96b21b6080e2b3a1052692a'
 u'2e5a052a9add5c8683e4b76d886589b0' u'2e8b60eb2adf00a4992609be9ed9d313'
 u'2f125ee54f7d5187d4b3e5e31ee42e77' u'2f82a35f6747a48e75fbad867f293c30'
 u'2fdd91c7a98ed94871f2162b0008ad35' u'30450eedd4c1b34fd0614e9ea5290114'
 u'307db6bdcfe043059cebfaf477073120' u'30b2031b43f33821908fb898b14c7111'
 u'30dee38f17985292d109bea9015e5e1b' u'310b11c25c4f78d2f9c886027b9eb66b'
 u'317b1a583deb8d12cf4b4d432d5a4d67' u'3198764323e3dd22f28df6164f551598'
 u'31eeb12e041f49d1a75e55b2fd87e92c' u'3207d94e8a06a26ea3c4992567ea7200'
 u'3291d8ec433feb49b61985b8047f3af3' u'32b9ef1527fda27a8289cd00535f7ba3'
 u'32f43670a4d8fdb3c79194abb07e0c03' u'334d1380c5ff41e59c2f0a3e6e0d4712'
 u'336f55d5953915007cfa287a6b8255b5' u'33905537ad237ffc5db52a5fcd0e854a'
 u'33bff098f543aceaeb27fceef2cd57f1' u'33f0c0b7d7780ae944a08a6c282282d0'
 u'343e08d949fc0a9e0811b586cad4e8ed' u'346756e42b04b4ae82a270a21561bbd5'
 u'34b4ab4e221b4ebc039f44608abe7d2f' u'354ba0a3e055a12641f240af66b6908f'
 u'35593e315bc1302026cfdbf709ebfc9b' u'357494379abde9f724c95ff77af8de0c'
 u'3575f0ff54123447d769eadeef9a7de0' u'35d2fd50f8dd7e1c137d981f0593884d'
 u'366538758502bba1c2c49b92fd314afa' u'37196cdcd149732aa7f63f5ac99714a8'
 u'376b6c33a993dd319ed1cb2fa18de156' u'377f8be37224a464a19fea2bf85281c7'
 u'37cfdd04b921dc8f00d263daabb27fe5' u'392d493e7924dea7042b89519990209d'
 u'39a44c6daff6f872345fa5660b42aed3' u'39c59f3c6ad67947eb560b36b2eb4f67'
 u'3a01e75cf5f6f8c78870b9203eb39f22' u'3a0957c1d3debdf6d1dba6e8b65c1860'
 u'3a4537910f3d87d31d5f90718c94dc51' u'3a4ac1544533e01d779e82abbe6daa24'
 u'3ac4510ae4393d49c7cc664069b3beab' u'3ae9999635529ede75033b7219085ae6'
 u'3b105fd4d7e3dec658eb405feafe63e0' u'3b5c9961d55f163cad2111506dd0bba9'
 u'3b75a7be2156913c618209c99b7d13a5' u'3bdd5c8442a7993e68e37d1ce0da6825'
 u'3c3dd2e1a40a7c2b137b88e15d6f7bd6' u'3c85465a1ca45501c7265c2ae0be10cd'
 u'3c862d3890bd91247f5bac43dd4cdd92' u'3c955be8357481ed1e34091ba616a65a'
 u'3cd6c3d4d6138b0accb8f97fab31efc8' u'3d84ad179ee8a00828f0ccb113c2fb5f'
 u'3db7d1a521bd86067ad5807f1868c62e' u'3e0c8bb01c7f8025adde469709d672c4'
 u'3e323ef7092780263ce3781716dc99f7' u'3e538d04e79e87beb1ea6bacb9c6b641'
 u'3e641113b6cde6a936670ca7fd12d253' u'3eea9029bc4bc1f984d4693e56fde17b'
 u'3f4df38174690d5f7fe46ed5bc331147' u'3faeaa7f217b5411721eae120cbc0882'
 u'3fb6615555c013aa6c108d6b8e6357d4' u'3fc063f885ca5e71cd4ed0277a282bad'
 u'4050348a3f1efeabf930dbd983ccbafc' u'406052a933b178cb91f0a1b1dd0080c9'
 u'408733cc28fa4ed1546b69edde856ab2' u'40aec709c0841aba13bd9979456aed4f'
 u'40b65f4c29d795ae9ca535a5bb0f63bf' u'40c55255e049777f6f2e584dfbad63e9'
 u'414ec106a6b86fcb151b9a7aad5e2002' u'41ed7ea47ebc5cf0d2b279a6e1a14236'
 u'41f955d8232afda2e663c3baa51f7a99' u'4218df4de335f464bc8a3106f674d7c4'
 u'421b1056a15f792a0b5f6e18ef5e66d1' u'42ac7b753ddd4fe643a688ef2b20aa7b'
 u'42adba3edbded51d1420e1c3476b5c98' u'432e74cb6f7b6a416becfb3943d102cf'
 u'435a100030acab048f38bd01ef856040' u'436068e396e299184d6fbc9af7f81916'
 u'43a0f072d2d96d740532f423446fc5d6' u'43d1e419c9b677aefca73b56ce6c44fd'
 u'43e7e0813771a32f72c6f3ea9995b726' u'44654a5318fc97b0e732ba7d110c9baf'
 u'4469386c5c131976933a57873c799387' u'4469c764cc75cd201695f9916eb251fd'
 u'449341ab11708764ed54fc591d9e9d29' u'449d84e9a7ab8ab2d68a1c79b1abe2f9'
 u'4540dfdfce04ac5f71d274e96a8e2a7e' u'4550f423d2b5ee7e1976cc47acf1103e'
 u'456d87c0eb0187ede0cbe70687b7818e' u'4665c58a10bba1ddeecc26b88549d880'
 u'466d486b5167a38fa9386b9ea0d6f571' u'46707de035121c341f67549e68b4d346'
 u'46b3fcad1a21099193e5f337cccf2225' u'471934f74dd40bf81bf7ed03da9f48eb'
 u'472106fa26cd6d713dddaf953ace0a8a' u'47754a36bcc3dabd4823097c89cdc93e'
 u'489dbda071884ecedfab779887d4c314' u'49166036181f5ea8d34730aa670115d6'
 u'4a833e0e9fe09924c6f251abf2fadca3' u'4a89e85ba944df683dd62fcee1f51d45'
 u'4abd2dfdaec1c0c115f1bd46cd7aeab7' u'4ac7d0a801591a013a8ece68bf4d50db'
 u'4bdb098d9305ef099883ddc273a72463' u'4be588b73440d4ac4dea8d2a207fb7d0'
 u'4bef72ebb9cec6a08b6a739221cec082' u'4c08514661b5a1b0faeb3c673b6f1790'
 u'4c96a8fd116ff4adb79636684a030c6d' u'4d9f15adbd1aa3dc67d3199b8855b091'
 u'4dfb03a30d8cb1714c4c1b900254da61' u'4dfbfe80a27752a8962b7b34617b4268'
 u'4e3abb21ace5c898ea3eeff6c1d7b136' u'4e9dbff3475adf80b3d7c48900edc0fe'
 u'4fd688a0b3eeecc1ec512f90a8fcc9a8' u'4fd71dca8466819f5f1d21c43ee6b851'
 u'4fdc1c66a7c0e5ecfb14213de7717d63' u'4ff759aaa2a63e01f98bec0e25485387'
 u'500b734f6db4f93b2e2acedd969fc984' u'508b10f6c438e27205dba35b4e9abb20'
 u'50c66def5f99d9035e9afb7262fe2f29' u'50c6f7ec0bf00715d2008190e56f9ecc'
 u'50dc6d3f880665bbcb0c30df15640541' u'511db27bb764e2b0573c21d75f33589e'
 u'513b775c06454f0a6a4280351d61b2c9' u'51955db16e6cb66d2560829b4273b429'
 u'520c01565b2a104599cf56e60919f167' u'524df93a878f71e8696b418812d69eca'
 u'527554cb47ae594a001b3a0eef41dd45' u'52b196381eca7709466d18adafcc334d'
 u'52d2503f076cdb2ef202e36e5f045341' u'534685b1550efe95b70bb51d9b61e40f'
 u'5381749d19c41e94735edcf619f96155' u'538a4e9606cb9a591d221283a7315a5b'
 u'5442a8928e63b5933033dc06db2dd7cb' u'54467df959d46c30b1a3be08ef1b1721'
 u'54d214f1d90b51ba468789a5c81101d1' u'54d23c0ed9f6937b9ea8e4204e9c8f49'
 u'55b43a1da692e6d6b808345d168b5938' u'55c6fdd46f3863a82b40acb441da0541'
 u'5615060b3438f0df7915403dbf81721b' u'5637d7fb8eb91aefd6a34ea0aa503c97'
 u'568e0979c0fc1b636599d15176904ccc' u'56ecc362b335bb45279fa01ce6736532'
 u'570f564156e60850f4d25ee61bf73db5' u'573365a322d9082754e5f9918884142f'
 u'576c2be711983e56beb7e734aabfd6a5' u'57802c4f410cb0616da57edf60ef727d'
 u'57b42288d5fa7243157cf7dd1a5a1daa' u'58856009e4a87cc9844fc42c279dddf8'
 u'589153893745e314977eb227aa571b55' u'58b7af211c5f45749544e2360ff8c7b2'
 u'5949e2f9c9914878355bcc3583e40b89' u'596931ba97faf831aee69b6bdd55181e'
 u'59c7296504eb0a969ee5a2f21048dec3' u'59f5e969ef21d0525c6451648ac56622'
 u'5a3a889c96ad6f6ffe696c78704bfb9d' u'5ab4b80381fcda86f842d715dc4f7bb3'
 u'5acc5f8f491104a9db22a8d5b4ca86f7' u'5af3bd1c271abd3bdc9a0cfd1e596989'
 u'5b5efb4dba6a518bf835bc7b9dbeb0b2' u'5b910923b2a2f256d9924730168c3ba5'
 u'5b9aed0dae7267e7656a62ec726c1eff' u'5bd6ec68715a25c919da380391afcbcb'
 u'5c58ff074b902df526b452d7d1b96818' u'5ccbc3e9f8b78f7f7251338a4083e396'
 u'5d283ac380f73ee1355241c5132ade45' u'5d44b2d3470cf205430d5b83f341f0f4'
 u'5d7a03646db0ba1010246910d7c891da' u'5db1783915b62f56d57a7f889b12718a'
 u'5e0d7188914476a0bbfa86a6fa61125a' u'5e802dcec3cec26092aa4658d84cf1ba'
 u'5e80d63cbd12262b76e95ae10a636019' u'5e8bb5f6eef85e87d40ecb424031a22a'
 u'5e9217d24e580d201cbd6e3a5a9686a7' u'5ec8552dc7ac7d409c8e76ba677ffbc2'
 u'5ed4f591fe6da8e85ced0cdac9d0a0b0' u'5f0ee8e6d18e11145ddfdb4c8ee72639'
 u'5fa60e4e0f8bd04a7a6fe09117bf48f0' u'5fcb42dc98fc0cc051ed403546ef057c'
 u'608388b780cc8e51aca9e0841f1eae9d' u'6106a7ddf7b399b174bc8143f3cfda0d'
 u'619e36c0938d0f5850939e67b2fac58d' u'61ea7514812b4e480b5cb1fea94fd808'
 u'63d1613e510bfb82b952024acda7e180' u'644412373f48247dbe136cedce607aab'
 u'6446fb7c3cbc5bcc99c6ba5d1b60c3d1' u'648fda2aebabd45d1391ca8942ce75b4'
 u'6527bc7924864720acf65e5108bd7524' u'657cd11ff40bd535503a9aec0dcda113'
 u'65d7fdc252d12654944ce80026efbf5b' u'665dd23ce39ba7a34b6b2edd4418a7d7'
 u'669c57ef158cc472a7a5161369e3ab76' u'66fd9c3f5e4b642350809c34d5b661c1'
 u'6724e4a1bd4d09f549739780abb86163' u'67754c942177a6e8f736ade9db382227'
 u'67855144a993c19e50da70981267a98b' u'67d2c99ed85fa12041f582e5218165cf'
 u'67fa5b9ee34a00c4265379e456c460b2' u'6814d8112b777a07339f4c27d48f028a'
 u'6857c0a54ba7b907b90c40eb0e3f9f32' u'68b93389c9372646c7387b37f5df992e'
 u'68d0819d4213a58e714e00ae396054cd' u'68e1b3ca7027e3932eb19bcee3f7b2b8'
 u'6925117212e105b7131f063a040c9fd3' u'697adfe2959cf265e3068ab4a688cc3f'
 u'6a3059d8e14528269e13838a0a676b74' u'6a3b9f9cea6e4abbbe692321ca5a72aa'
 u'6a94a5e8acbb156e488d268005a847b6' u'6ab66a6b90123e03340cd924adf03192'
 u'6ae9df4c3dce6a97438b1ecac287d620' u'6c22644ea36f9ce0f247354010b785e2'
 u'6cb52f3e6938b6abd0eed5db77b8368d' u'6cf6e9e605ebdf3fe520b0a39a921182'
 u'6d0726ffaa0a9615855879963900ea82' u'6dbd66a2b0c8633fea1e086938c54ab6'
 u'6dc993aabba2bb0fa55d0a5fef01874b' u'6e5a47081ff20597eeadd869ae9ce55f'
 u'6e67a1dc52b67686eff2f0b63933328b' u'6e7b18d0e13c7d8ab2f25f19fc206f15'
 u'6e99ac1b1483531ab42bde8f03cb13d0' u'6ecc9524b1541835dde92fb12365f975'
 u'6ed36735d64e055bbc75d65ed1257c28' u'6f3860de5ebe8ce5efc83b3412381bd5'
 u'6f8bf08d0ccdb63dcdca06fb9e4d5348' u'70e85e96079270d608397aed58004ada'
 u'71f0df61ef1de8fa5b3f0e6a5e52b023' u'7235f1e845afe8ae2eb0aafee03f4f4e'
 u'72519f14211aa323c52016870aa7708b' u'726b4c6c736933a7724ff3dc22cd9d29'
 u'728ac8d61b839c957a4ab98590a36dc5' u'72fe734731955b714a809ada63dcd7a3'
 u'731c172a86142fb3a6bde2c27fa235d1' u'734be551be60ff1a3e7c81d78f58f20b'
 u'736054adbcec1f5847bc6901bebb74ba' u'73d3fdfbb989991405841436e9300d35'
 u'73dde03e5ad5d1a0098fb876f1c45480' u'73ddfdd5b02e5c0439a305e3d8cf11a0'
 u'73e015b6d3a444f12972fa11f2fcf66f' u'756054ef020dbe662334f2236ad05e34'
 u'756c5cd2c68fc34b390f06a891379a75' u'758bcf56644c1e275dd4d4b5e9869d5c'
 u'75b1298f8897582b55b54fe7b9d4ec69' u'75cb641d0728ea83981e145b975b677f'
 u'75d0158480fc85bd749f03f036d20c0f' u'75e7f9aeb80c5b83cb6db3686247ebb3'
 u'7639603306d5141456069bffe53cd877' u'767305c4324b16bbbfaa06a8173a80b9'
 u'76b84a63a0ef597108c88429aecc3d76' u'76e1555858c4731e2c2b8475b619da85'
 u'7716e2bbe241c47e533d981dc86ca9e0' u'773c5998dacd0d4d07794deb1e736b6c'
 u'775fa4726d157f99663484edea294f8d' u'7760a2b248da798020e02e22592d780b'
 u'7767134fe3048e7d6f7d4741b5ca7057' u'77d04b99520d657d67200237bfc54541'
 u'792f63afa6e10b07b53a4448758f3356' u'79629ccbd79ac4780c7ded1e6bb9acca'
 u'79810e1b0e29e4eed2efe17453e4e1ca' u'799d6cf108cb6350e10eb2a4d77dc075'
 u'7a35b9c5f2b2f7e8049b34f98dc7901c' u'7a91dcd208e00282473d1c0cdc42dade'
 u'7be3f061fa36d2e4d59d66218bd7da78' u'7bfd6a691494b487bf1efe17f44e98ad'
 u'7c8454aab6021ac9b1057bccc8ba8925' u'7cead8c8db7cedc3014c14f0c2bf9931'
 u'7d1a14cb8338573ea0640c165b5e323a' u'7d66b18d2fb0d0025bb1f79ad02deaf2'
 u'7e35482ea6365917c3af08c225f73695' u'7e53d70e9d3788ed73f6f510e6e4556b'
 u'7e799f6e0e9f6ebe6b54f53cee213636' u'7e86dbe05a9f6a73de79540f2bcac77b'
 u'7eee9863fbaf84352bc302574fed97d0' u'7f5372643721147bd9c6c2f2476000b0'
 u'7fa38ebfd60d9c01f33301d5dfe23c39' u'7fd644fe7337e7398370e111f39fcf56'
 u'80c6a5c1a5db29a6e723cd366423f6fb' u'81212ad0d5897b867381b66e192d7f24'
 u'8152a75d88728f18be677f6f57e31337' u'8162a23118381ac2c39d99fd67987866'
 u'821ddd7fea3059472e9513e724d4babe' u'8221e41237bbd14f529ffa463d7b8921'
 u'8266917ea31e48e462e4fb06cbc01732' u'83571108b82eebf2375c4db88ccac83e'
 u'8374793571663ffe6b86ac1576c46649' u'83f7213e65b92814a8a92731228e03ee'
 u'8488075188c3346191c65ebd97a9075b' u'84a791745d2377d963f05ebf45bee94b'
 u'84fd3417b7616ccb413b468d915f990b' u'85050a3feb38d3a42773bf40d14b0c12'
 u'850aa2c690c8907f7b4e41a232bd3e5b' u'855b0c0ec885ab3529b9b67d2d96a999'
 u'857168cab798cf7f08c3f7068bfb8ab1' u'85aec4fedd882ddff4888e4cc3f8821d'
 u'8617ab8786ff869ee79efae03dce5c97' u'8639fa3ee7e992840d2cf31501682d17'
 u'8674d945b4c26fecf2482fcc5e13333b' u'869c61946db6dd1a31c175182dc34c63'
 u'86a04d3ee3035f79e3bf2266a09fe20d' u'86abed87ba288e3e073c1a0935007b5a'
 u'86cf007b2625e5f2a29cd9e3c90c561b' u'87091284e5082e6939df72fb46703505'
 u'8710ef74f21ac7d71a1a07f1fb3a7f1c' u'871186e8ffead1f4e78746ac8d4d9d04'
 u'875fc797b7b8751cb78c469061d91fd1' u'8764066afda737c2eedee9aaa54d7a6d'
 u'876e97d32cd9eecad28acab921c7f3d5' u'87bdcc0659092197eb0a6f48a97af4c6'
 u'8801420b36ef120f965eddd317e7dff6' u'8844f37c1261c107bbe4aac3ec2f71ad'
 u'8988e6a97480716a192bd6247938679a' u'898b57d147110f42e6b8b2f263030067'
 u'89bad25463e719e25ffb8691d1eed4f4' u'8a870f1ac93f1c72cf137a3fd0b27590'
 u'8aba4f8fb138a284336621ad6c95f20c' u'8ace79d7302ef10177b5518e49cf8e71'
 u'8afba12ff986f83a735ed7b65c3f312c' u'8b459b50fdbe1461f318bc251cdcde68'
 u'8b48583df07d0d90be17664949c86dd5' u'8b49a6dd6ea14c3da98db80fde8deb8d'
 u'8b9baa46a5d87c440d8c636e554d2c74' u'8bb5d03daee3d7a6e36039489a0d227a'
 u'8bbd1ca5130e2959fb907ea41d840a39' u'8bdfeda54f2fc97b78b51492c14ff6b5'
 u'8c24eec279eb85913236b2a51f117a54' u'8c6ebbd6c31ad5fc0c61a300437d2c12'
 u'8c91541ceac67ec47e455be621acb7d1' u'8cb4611ecf4870f03adb397c94b626f6'
 u'8cbbb094bfda1177c2939c4eb92c37b0' u'8e1f1e03dbfc598415c137dceee7adca'
 u'8eadb78d65834ee2d7119dc6d02dd1d4' u'8f4382215598dc2750bed01da7fa62e6'
 u'90137e9ff97715fe6209bfbdea7c611e' u'9039c53e24b2807159260638932846c0'
 u'908a7f27867feddc19356ec76e152ac7' u'908e7bd4d57d0fecf4d22ee6934e25a7'
 u'90abdd37f53239eafbe462974c1fd49a' u'90ac709e7a7a48a965e6885064998b20'
 u'91722cd421546517a05fdcb93ad48abb' u'91b24c6215794d06d05eea62e6d24e5d'
 u'91fc1d3eadd4c2d21f20f2f365adea3e' u'923237d38b87b0ebcc811bfd5aeda955'
 u'92c211e6c722434772c1cada69f7a555' u'92caa6e3adfc99d20933d23245a18c4c'
 u'92e3caa232f6af2d47975721ff7386d6' u'92f1712ce0504ecb549308061642d468'
 u'9310630d76ea82586562ff7203fba4f8' u'932be8be3c97afb4c86db2914a8b4891'
 u'9358c447f0af7533eb787823fc464a61' u'935f53ef03a5e8f3deef789843df0954'
 u'937ac689db4bcdb783e149e27c6fb126' u'93c1d2f2e35ecb709590cf41074ae523'
 u'9403e72c44646f86d8ea7ab0f1d3843a' u'947ae5d7d68d4e3dd500e60563065da7'
 u'9516493e0d4d57bf50afc677b53bf381' u'958525f99e0467640e22263e86a638e0'
 u'958f3d88cad7504ffb327bcb9b4ca3c7' u'95ba6a150f85044803b97fdff53a7668'
 u'9695af0aacc1a9a48564e738d2e37c5f' u'96aa764b9561611fb599277a374d4840'
 u'96ed7d59fc44c6b2dabbfa2aa786a154' u'971575ceff0091ebc16530d2ab940558'
 u'97376d8766343860431d390c1239e4d6' u'974ec58e06e27d205b2b7706dd96823f'
 u'9779376ab1d9850433800ad7ad55dbf0' u'980088a28ecb18a93406c6ef8599de24'
 u'98181b2f42441416c352761f2f40c6b0' u'98194694313f4fc424ecd4ba4df48a7e'
 u'983e6a7878ba717555b90d718b827864' u'986e734a5dff26009b5a0f57c650046a'
 u'9878f937fdf8cc8391676237387dab7a' u'98f66fb9a83803a14ba4e82828e4dfe4'
 u'991fe135841bfc07e413bff4d5204231' u'9947bf1837b4c739a8c3859df9f65e34'
 u'995de3e5e88a4e244dde2c50f505fd04' u'99a8042aa61625724a542f19ee32af12'
 u'99af44a8a6ab4219e02898ca0af0d460' u'99f0e9e1427678b13fd5299419eaed72'
 u'9a0eb2faa3be79953441098c66251179' u'9a1bb72686badc66c332b12cee3b8263'
 u'9a501754a64b72840c59709f351a04db' u'9b03a7df976695e0d0cd2393300134df'
 u'9b42a29a05618c3ecd5752f9456ccd4b' u'9b4c1bc2d92c566075d3ce7fa6633454'
 u'9b7e4e7938f33e34f25f1589cf8d0d90' u'9bb78c4be3d09283330ba7aa56561697'
 u'9c070a938afa8dab6d23ec9360f32e52' u'9c13220dfcef066dc82d596d1498b0d2'
 u'9c3b8fbd481a2b2a208082fb78ff4e92' u'9c53533af139d4100645b7235ac335d1'
 u'9c656ad77d672cccc9edbee3d6df6ce6' u'9cada6cd517ae94265a83fbf95c34c9a'
 u'9cc9d6d0644b93d7fdd5a72b417b60ac' u'9cd75800c21a896f726da6bdc147ac51'
 u'9cd9c2900979504f2c7318e7ab508cad' u'9d2e1d11d5a4b6280eb3c81102afa268'
 u'9d2f213ebc85d7ff65ee477facd14a58' u'9d4eaa32ff338e8f291470e941bc3dad'
 u'9d77074c7e4a9f91be26083ae463c425' u'9d8ee61aee91f6315551afc064ca5892'
 u'9d99f8201a1c12eba6857159f371fa9e' u'9dc60ff2a9e42481ccf78d77af5d4331'
 u'9dc9233172be1c747fa1eb6159dd70c3' u'9e1f94639d10e290656072aa86cbb2eb'
 u'9e3a7f42e30342b4aa5a98ef58e2425e' u'9e3e6b975123a2f185900e80d9598e31'
 u'9f7ca8810dfbc0f1006b254e250e4369' u'9fcedf0b1d5fc737f4c6cf1e8c53c60c'
 u'a04a771c59a25bf52474f18670cbda84' u'a12a5a5551ce1949d9c9d495f1c2f6f6'
 u'a1385269e4ce491734f61f942a593397' u'a2433c9ad3866e0280933470226f84f3'
 u'a2507bd9cbe1aca47bd84dbdfc732b24' u'a28186d52f8173af13a69a24d1a28823'
 u'a2ddd91ba2397810199441568f764302' u'a2ffd8e2938bb287ca93a76e491dd750'
 u'a3241a6863977d44add9dbfb347fa21d' u'a35645d1c4ccb4ac84cf055f312cfc00'
 u'a38448d28889bb6b28f31ebcb3f15ab1' u'a4055473844e096bcfa94a661b83c930'
 u'a423512036b1273100fb07983f3d3f77' u'a4a1ae265f799f28865eed812d783d73'
 u'a4ccc42705a4efcdd4a3ad37f3d2ba24' u'a55b89419bbbaf8fc28fd4ec95ad5898'
 u'a55cd2612b069833af2088d0de1d9fe0' u'a5c161b782e8679fedaaaf5052eb7fcb'
 u'a5d8f11d91ddeaa57b6e64b5bd2e0fbb' u'a5f76a28f4f55cb82353688d7f74cacc'
 u'a5f82fc3bb9b7186652f7f47f9f4bed2' u'a61eb842510f8124bbef323dcd4a34d0'
 u'a74f2948a813316eae6e535ec49098d8' u'a76dd59e57d8d812aac2c27a62fe9307'
 u'a780febaff3ca14ca2884b4ee652ea9d' u'a84ba63e529f65762fe2e359d087ffa1'
 u'a8551321b5b57a06218deeed3231bacf' u'a93a0cfd777d13dc94f6fdc1da7d6d12'
 u'a97c3ce331300fa6b95d97b4ece44542' u'a9a901ffa4e1904715461202d516e153'
 u'aa71d46809b0168ae3f52660115795b1' u'aa840f8dd5a59e512dba72816fc48eaf'
 u'aa9ab83cf242846df357aac619de33a9' u'aac107c3669f2178af554d631c36697a'
 u'aae1a1102cba3574b524266239f93c5a' u'ab01d605707d7910288be73b9756d3aa'
 u'ab36d2891f5fb9c78a73cf03cfacb23c' u'ab4b5d685b91957ec80b3f9ce5debfa8'
 u'ab7a97154439a527a8a1d3967986f56b' u'abc0fdd481e004865623d7d8a3abef7e'
 u'abef0101508e288a25519adef93536eb' u'ac08dccd9ab49401213e595c9aba42f4'
 u'ac78748388ef396bed82fef6a54867e1' u'ac82d15772fdb234678ab7e91ccf53c1'
 u'ad98d8533e88823fe715b8f5ad8e3e26' u'adac6814289b888b99b535b9934b1bce'
 u'adb276a3fcfabfb2838ab915a4ced3a1' u'ade007986bc410461f41df122befe094'
 u'ae683573a070ba23c137df0becc4486e' u'ae7d130947f07c0c119821205fab1177'
 u'aecf82f2694b5097a6d9164c6d38e235' u'aed0d25675bcee271eb5cd2d5826b695'
 u'aeda202ad8dbc3e66cd0bd898c07a1e7' u'aedb4da641a63f4715ca86d4031b621a'
 u'aee781fcc0502dd77e0f2eacc8ef6ef6' u'af610f745661f7090237639ca04043b3'
 u'b03c5dc904110934ddca6e5c1a1dbae1' u'b07d2874c01373acabe406e06f862e44'
 u'b085dbcf5e9c4c0c05f44605b8557fb9' u'b11371368da5ab5d21a1c0473377d94d'
 u'b11828abcd6a697c6b840fd9f3336fe9' u'b12ca4f94e847ce59fea9034ad315e1c'
 u'b1a1f5e1c098d50cd0aa22fc4f998efb' u'b1ed0dcae3cba8e23af548073796f99c'
 u'b25e9d309adf4f0f04c73b2f2e49aaec' u'b26a22d85206b958fa1510791de570cd'
 u'b285956f676f695033c872f45e2b295f' u'b358e9aff7c115e9fd994b76e184d904'
 u'b3639ca9f27a3b6771daec100130b123' u'b3a3aae7392093169b83436f8491adc0'
 u'b3ba7015052434969e709a1e6ad5cdb2' u'b3d28a9c173ee265446e08193958aa9e'
 u'b3eddfdc768e405ffaa8e4d847f6be82' u'b43c6b0d1866a8b57305776066f257c9'
 u'b43d432cacbd03f35ddfd512bb0fe773' u'b4ddb2e1f39a27fcff9220274a267018'
 u'b5469dfdc4505acc8a6784cc38636f55' u'b54d04dcaac5c760a5914401145d714b'
 u'b5ea85fc1e840f97116e94359538feda' u'b5f58e012e3e7d5a9225d2eadce65fa2'
 u'b607949f721881bb08a31a615b94541f' u'b616d1bec7e0bb2457d19d144209752f'
 u'b648ad312767d16d4620d81c9150c267' u'b6902d573ca864c20ab2a6fd72cd2a00'
 u'b70160cda455362c8f4804c2b507d39e' u'b83c029c3cfc0a010155ea67d0157fd5'
 u'b898b0fa027d2945ed0e789781b83e22' u'b8db1eaabe00c9cc383a146a4748feb6'
 u'b95255b8078ae4f68965ae57c3f35092' u'b95521727bbfd9cdcd5c83b9d558901f'
 u'b992175c3a58b6f674e929d1618d6a36' u'b9c9bd09c2c3e6b24294210d48637b71'
 u'ba8b413aa05ca3e8180f1a2d61ae6244' u'bb2c9cd86840e7dca41ceb3854efe88e'
 u'bbce5a5a51916fab3f573c9c5cde694d' u'bc58ee86be7c8b1ecbcc6e56d56ed0bf'
 u'bc8627082152796c69bfefdfdd72dba3' u'bd7d304a6e17030f67a240f7019c6166'
 u'bd8bc88fc48105e3a8c8f58b82479d73' u'bdc65c5c352209c4e597cba4904e7bc1'
 u'bdd01850f5d97cdf9425db290ef9fd8a' u'bdd0d7591c2d5cfb59d0ca5d9ec3bf6e'
 u'bde764444d7a14d317cda35d404f9e8d' u'bf224df219131a02492a7df88fefbdeb'
 u'bf577f91582d75ad327e8858dc76e428' u'bf5a09946730410abc226ef26472ca69'
 u'bf8588d52844a13f3c92d9510c66ab87' u'bfc686eb1bae65884686be12b36a9b70'
 u'c05c3ccc3c59c868261cee1f11815290' u'c09bbe926d247aad4856fb44169f06f2'
 u'c1036a834fe6b3538006c33f06093aed' u'c104df74a9a69b5483ee3ab110c6a63c'
 u'c1cac7b0359ac577ce708084083c2dc1' u'c2034dea1a8320f85f0c6c8e48114a87'
 u'c216c3cdec2af57a57ef8ac80368d6d5' u'c21abd7b04726677db5882cc33a1597f'
 u'c22211124b96b333461c6278f79b3b9c' u'c2634c41b8e689bf582358badf69055f'
 u'c26c70107c040df5819fcc1b82b541ae' u'c2c591f54d120bd812ac125aacffd494'
 u'c2cb2930a0bf742a25b1640c4d89ded7' u'c2fc97b522bc2d69a9ec8da64be29214'
 u'c3154f4dd4f561f11fae79128a383941' u'c33191421e613414ae75e61da036ee9f'
 u'c33780b4c00782d97938327172d2937b' u'c44460689808552b3cf88bda8961202f'
 u'c48479f7581f3616db6d6bc35fdaff54' u'c49a322db44affee7f2d3d2aa50bdf46'
 u'c5608634c396fbd6b5867cd46670a58f' u'c5e1ec10027918036efef5b935ddfd0b'
 u'c61f9c96ebe370541fecee1b5ed45d12' u'c6df7ec8c4b64c543ea6f761973988fe'
 u'c6f9cfc3a5bc53b60e1b9f493eede474' u'c7cebcb50de7b294eaf7c1ff54d0ec6d'
 u'c7dd07fe06627f8f0ab90c630abe3aa9' u'c80901e684c78f3e8abb867a8b4c5b4b'
 u'c8b56795fb3c5dfa0e5b68b618988288' u'c92113432c225d91221f3d65eb2a2d5b'
 u'c96f39e319536f6a6ce7e2dbbbbd345f' u'ca02ddabf704a9944f568026c7224e8f'
 u'ca463b68261185673d69e424d43768ed' u'ca4742606e29aa58c65ac7ecfbd6b9df'
 u'ca685b3e987e5ba2f5783106862ed426' u'cab83161423d1476d49661230b8f05aa'
 u'cafc039caabea1284a818bcd156f47bb' u'cafdb10e8707e46c0c6e56cb08338103'
 u'cb08b5236556ce441b389d3938541abb' u'cb21cdcba45913c694c9d1da44f54ca7'
 u'cb2d4a3415d1660a8837377190d8578d' u'cb38ee8c17e61e76801a82a7e376623c'
 u'cb4a00d3144e6a7fa88dbf6aabd4ce28' u'cb726795b6f6b9755f0d2123181ece7a'
 u'cbcdfb3f166c581e781242c59bbdafb9' u'cbda865c6a0aa85013aa8dfcbf8a9f26'
 u'cceda31510723a07967e7834eddccd73' u'cd0287ca89a7623922bf52faf1ae2cd9'
 u'cd5a2c5f9b52e116b0f3a68167681438' u'cd9d1ba33fced04eacbb4c8c4be9b1d0'
 u'ce76c58710c7948e9906360383d1b9ca' u'ce9465a6203a7afd14519cd73252e1dc'
 u'ceb957627050d59fdb0730c4625c0637' u'cf1029314f0cf5ea79b941984524aa5c'
 u'cf2c9589577ad50b5946c7fbddfe6315' u'cf3161c0eb6572bfd0cbd965c2d88568'
 u'd02d8acf7209d57254fbfa4c88fe5737' u'd13298426c31050df931fb52c5b87f8e'
 u'd1ace17bf184cb115ef791f9e111acfa' u'd1f1b8bd2001bb7c094996eb9cb860b0'
 u'd2528c9ccf535e55700633f44c1d2153' u'd262211f588659eb0822cee80a8dceb5'
 u'd2b74edd1e056f0c76b8ce12bcf5dfab' u'd43d385a1e2bfada5d4f88c4b8b34f54'
 u'd44b1bbd66bd8af74c0c5f40d1146796' u'd4f711cce34afabea5fdcc5f6cb5d8d4'
 u'd50067f27d87ee1ee9c6ecf07291c72d' u'd5bc047acb1f014e69660128b59ef410'
 u'd5d6622380aac0d152c09e8c041e0981' u'd5f15ff2fe422aa5626583d7f10c7068'
 u'd5ff01e18f41c7f864389071dbbb251c' u'd66bfec42def153f8e0628576a566ca0'
 u'd6cfe704b06dfda53b30dc1d71cf70d2' u'd6d710d38b6b5e889686210a531f2997'
 u'd7b49ee57aca5f1da5ff64504af34ce2' u'd7b8f9d930f023b7dd1c9c91bdd88884'
 u'd7bfcc3f5d7ec344755cb9e61fca38a2' u'd7c99cd3b6464d06c56c4b2d2909201c'
 u'd7f40bcd73a168318fe9a0bd6034e8be' u'd7faa77076a80ea91a581abf6b76160e'
 u'd8075fd2161931f9438c97e93863a240' u'd82f4984ac5aa451db0ad5601ab14583'
 u'd83d5e1e44f1c7b2ec04a37db8240407' u'd865cc9d92483fa9c83b534b01d4beac'
 u'd8d9f5c98d2a6a998960d86f0b54df25' u'd8e6a9957ff0a4bf0ccb9cff02b42cb8'
 u'd9247f3cce96f304f93f9f6684cefec2' u'd93abffc0bfeddde0260f647030040ee'
 u'd963aebe14c81922133dd951bb2d64dd' u'da2d5dc86e36b81fe3b35f99b02b61d4'
 u'daa3372e048a25b48d3ec81c592a3db2' u'daa78e1bc8b6233a9962df5f2c888aea'
 u'dab1457a8dd4ce8aa92e8f06e0b1c723' u'dbab828631a0785bd9bbf23dafb123d6'
 u'dbee96157ba90af940e0cefbe770fb7a' u'dc2111613f42c8caf830c98e57b24da0'
 u'dc2334bae77a743fb87cd94064f83f62' u'dc522a62fce65ebd98962f8e0014722a'
 u'dc76397e8371e14319f698767eb492ee' u'dcaed47d0410a60f1793edbae7976a95'
 u'dcbd8524519afd72abc05bb1b0c858fa' u'dcc52a3aded026b21794f6ef85e55025'
 u'dcc702156b13b63ca86b80a7d9602aeb' u'dd0c799d62e94738b49d15593cee449f'
 u'dd2b79da328f8800d4d262a58380f2c0' u'dd340b77a87fc9d0a5ace83f058e2527'
 u'dd4a1039345ec3f48eb95551bf655f02' u'dd519f4699cc1aa9aef077c438b98f22'
 u'dd81df755bbc1b0d58b4397b09e38c7f' u'dde0d3166e17deb6cc0936808f66b29a'
 u'def388f35254065c196a177251ce0151' u'df1b52e3cb6a0943e203f51e81f57af8'
 u'df79f1df511bb9f333ef05209fe989a6' u'dfdfb24d15dd3fd97b3e0d9cb7cc27ec'
 u'e020786536bff420c068ca511b095806' u'e02c01aec10e2c3de981163df560b370'
 u'e0b58945fdf42866278744687b39c438' u'e1486ec09e1d3d78533ef3a2c3bc92f9'
 u'e154b8236eea72f76df025ce5fa80baf' u'e15d58a1453fd91a5ecfad532afdd0a6'
 u'e17246ef2cd23b5754a4436e65e3bc46' u'e17f5eb2e3794f6bad8b19fe24272b3d'
 u'e17fc8348b1c181746010ca970639462' u'e1b2635ca9888c252d0358a6bcd671c7'
 u'e25d3a24a697a1b0dd021f846ad9d9fa' u'e2ad2dc4d2b7c96a69b94ab524cca498'
 u'e2d2576d07efdc1ed08cca7c43368f37' u'e2ebaec913fc401c37f10f646f948e3c'
 u'e3a2a5f2625a4e357f9a1b93fd80d379' u'e3b18231fa0e8c1976a487102ec2ab0b'
 u'e3db24f11376d8ad0d0d3904ac285608' u'e3dfda46dde3481d5c86e905b6e33dd6'
 u'e412df6c967b338e29d1c43529e87f19' u'e43786df278d5253b013b91b24217312'
 u'e49da2b83fef7fd5d93262eaecd8995b' u'e5203ee354e5bb06f52b0eabe97f23ff'
 u'e521e169f0812b42a0a6f34e540cadb4' u'e547ef4fd36c36604b5dbf6610be281d'
 u'e5545030d9a2374cdfc4754414579da5' u'e5843396bfc5df4affc9fa0daacec80d'
 u'e58e6e64ef6292e8cbeaab71dc7d003b' u'e5d4690737962906e33a70cf23bd04fd'
 u'e6d5fb6534dafd6565ba7776630a5de1' u'e6ec2e035beef871012b868a887d625b'
 u'e6fa02d68f17f25edbd537f1053eaed9' u'e781f6911a3794c7a0e258864aaac88e'
 u'e78d1b64915b3b0fe022a126be5c4115' u'e7f738fb726be461675e217af0d35edc'
 u'e97ebde5869c52462de0a8887846df10' u'ea1ca0257448aabd6f18ff12537b720e'
 u'ea96f9144ffbbb093906babd01ee818e' u'eaa1bc4d15d8135d72e325e3d3d55cb6'
 u'eab50d7b5731f4fa34b6938c149239e9' u'eacf1eafa6bbcf5c0f728c75282fdf3b'
 u'ead4f395a2c251bda9b02043f5f21234' u'eadd1325f94bd5e5c503d6a621c1c1a6'
 u'eb27ce96cedd6c6d5e3e081fdb6554ad' u'eb7af1ef5716067aba021e5712435d61'
 u'eb7d6ca6d330781ed0e99822797cca9e' u'eb96f60605896e7b4c899e5bbaf7aa60'
 u'ebd67869b43e4fae2c2895197c549a29' u'ec4f222a5bda84a4dd65b5e9d20f82a1'
 u'ec538f404208e985c179c50f2ea36871' u'ec620629c3861b3f825c6019fddce133'
 u'ec96814f5bd6b9a584f0b1ba4c8b53ad' u'ecb6ed4c3cc7699ecf3b8066dcd63cc9'
 u'ed3e904bb5b365e2f2c72c9ed70c7de2' u'ed49f49f3b7ae4c4847a8af799053ef5'
 u'ed78b4d27e568185ede7e9a98130ed21' u'ed8367b3745aa86cc28ccc82d9275753'
 u'ed8b245943c843b3ada1e39e5f9d5cc5' u'ed9bd2dca13b8f28c0a6bddc7859d32e'
 u'edb13d4e7d31c5182593a2e192a347e3' u'edcebc194bede34a0466ca1f21a89956'
 u'ee0fd16067ce96d8b8421577df650fe7' u'ee3cd472089ffc87e714662505ef1ec8'
 u'ee5863a02b16a3a9b1971c49b47df934' u'ee736eabf554a0948bb9ac2555a9f94f'
 u'eeca33e934e14b65f93a55fa64403536' u'eee26cac5b28cda56141bc9a1be7ff87'
 u'ef3dc834bca086c506452157c53e3d60' u'ef5d1b950d71247d322b390f1019e8e2'
 u'ef9b7d8546ffe996122f4d8a608e2772' u'f00fdfb4f7402b9dd2f0a7cb8d12bc13'
 u'f0477eae3130e3e19633aa6352e1fff4' u'f09279e1716c474a6783788fa70a09fd'
 u'f148a69c5a1bd8dee985e24ae8188e7f' u'f20209d3a7487a25cc70223b8b00fbdc'
 u'f25c61e11c3df37ae1266641c8c5067e' u'f2f8f992df7dc5b5eca7358399daf3c1'
 u'f2fc7f38cabf2af18311f5022a257c22' u'f349e0af7774d3afc1042db863ee2af6'
 u'f380858d8202b1ad6917ac1790eba87a' u'f3a73ddc26accc039aed9f20276ecd36'
 u'f48a82480915518dc6c8324c0a60a5e2' u'f4bd7455ae08a861a3c265b18235a63f'
 u'f50cb21490ec0c6027c5272ce67ae7cc' u'f5207a383825fe5cab97c627ac72729a'
 u'f543bf0cfd06ac5174cb42da73ce16ae' u'f5450f632e45db53dd52a28b0a037f24'
 u'f5dd92856afd876eb56238e8c8533fb6' u'f5e19ded3ecc4da04b44aebea81bc2c0'
 u'f64126e0fd2d593730266e245271c476' u'f775e7c4d1fc52edde330a17cc186c44'
 u'f77f0681e7b36a62f69bae06436ec6cc' u'f782957562386f0dc50af837ab9bd737'
 u'f7a6fb34b7029d81e4c8954a552e3523' u'f7e00391086319de04fcec17502fe9dc'
 u'f881c49eecc8f22856c3841a3d2dfd1b' u'f8d7032889cd2ed2e4d7e24a5dd6680b'
 u'f93a9f152ebb6eb8b1dbeef4759e08c1' u'f9a9bc5966077ddce251109ab1cfcd66'
 u'fa13b2b7c5f8380072539bf8f7e48ca7' u'fa2235130ce8fdc4494c89a3e6935c93'
 u'fa50b1c170f1729593e6026be79fa647' u'fa5b99fadb4ac7be5bc1bc0c07ad6016'
 u'fac1162a15cf052d953f72afc527df8a' u'fb56e251763c7e7dd7da5a8c972a3d3c'
 u'fbce5c35e4c4386a5f6b5c212204aa7b' u'fc34c21444cbe033d9ebd010db9d7438'
 u'fc433e8e9fd5da960974b46c5d68ce38' u'fc499cd8c8ba2b39dfef4cb5489a69f5'
 u'fcbfeab90a035f46e31b2484ca637471' u'fcfb27b1cd5786fc85ff79715949a3d6'
 u'fd49f8c2891a8a2a3bf4d404f75fd5a2' u'fd7c724f84bf0141ff286db6eb0627f3'
 u'fdd0587fa2d1063093f34b36a7189063' u'fdd2d2b034f2841f10c2a7c17ee45487'
 u'fe17afcdc6302e7d8a854df2ecd82270' u'fef8a58456972f8ed571945efcdbfa34'
 u'feff881a1d5e2772fb5462f41e8d1213' u'ff202918c045777a17e37ccac75badef'
 u'ff2ba6c711e660e119674b8d2ca2977a' u'ff927f945609736b00378a5066c4063e'
 u'ffc6f1453ee006d834a0447b7c8ee791' u'fffdb8b4c7b05e16f7b4ef10c2640836']

#### Fixing this Encoding Error

This error is basically saying that in the test set there are new buildings/ managers that were not in the training set and therefore not encoded.  To fix this we need to combine these columns from the train and test set and fit our encoder to that set.  Then apply the transform to both the train and test set.

In [29]:
#Combine the train and test columns
manager_combo = train_df['manager_id'].append(test_df['manager_id'])
building_combo = train_df['building_id'].append(test_df['building_id'])

#Encode building_id
le_building = LabelEncoder()
le_building.fit(building_combo)
#Transform Train & Test set
train_df['BuildingID'] = le_building.transform(train_df['building_id'])
test_df['BuildingID'] = le_building.transform(test_df['building_id'])

#Encode manager_id
le_manager = LabelEncoder()
le_manager.fit(manager_combo)
#Transform Train & Test set
train_df['ManagerID'] = le_manager.transform(train_df['manager_id'])
test_df['ManagerID'] = le_manager.transform(test_df['manager_id'])

#Inspect to verify
train_df.head()

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,interest_level,latitude,listing_id,longitude,manager_id,photos,price,street_address,NumPhotos,NumFeatures,BuildingID,ManagerID,IL
10,1.5,3,53a5b119ba8f7b61d4e010512e0dfc85,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,[],medium,40.7145,7211212,-73.9425,5ba989232d0489da1b5f2c45f6688adc,[https://photos.renthop.com/2/7211212_1ed4542e...,3000,792 Metropolitan Avenue,5,0,3797,1568,2
10000,1.0,2,c5c8a357cba207596b04d1afd1e4f130,2016-06-12 12:19:27,,Columbus Avenue,"[Doorman, Elevator, Fitness Center, Cats Allow...",low,40.7947,7150865,-73.9667,7533621a882f71e25173b27e3139d83d,[https://photos.renthop.com/2/7150865_be3306c5...,5465,808 Columbus Avenue,11,5,8986,1988,1
100004,1.0,1,c3ba40552e2120b0acfc3cb5730bb2aa,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,"[Laundry In Building, Dishwasher, Hardwood Flo...",high,40.7388,6887163,-74.0018,d9039c43983f6e564b1482b273bd7b01,[https://photos.renthop.com/2/6887163_de85c427...,2850,241 W 13 Street,8,4,8889,3733,0
100007,1.0,1,28d9ad350afeaab8027513a3e52ac8d5,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,"[Hardwood Floors, No Fee]",low,40.7539,6888711,-73.9677,1067e078446a7897d2da493d2f741316,[https://photos.renthop.com/2/6888711_6e660cee...,3275,333 East 49th Street,3,2,1848,282,1
100013,1.0,4,0,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,[Pre-War],low,40.8241,6934781,-73.9493,98e13ad4b495b9613cef886d79a6291f,[https://photos.renthop.com/2/6934781_1fa4b41a...,3350,500 West 143rd Street,3,1,0,2618,1


In [30]:
from sklearn.preprocessing import StandardScaler

#Scale & Set Feature Values
X_train = StandardScaler().fit_transform(train_features.values)
X_test = StandardScaler().fit_transform(test_df[feature_cols].values)


#Encode 'interest_level' to numerical
le_interest = LabelEncoder()
train_df['IL'] = le_interest.fit_transform(train_df['interest_level'])
#Set Train Y
Y = train_df['IL'].values
#Inspect to verify
Y [:10]

array([2, 1, 0, 1, 1, 2, 1, 1, 2, 1], dtype=int64)

In [46]:
#Get Label encodings for reference later
le_interest.classes_

array([u'high', u'low', u'medium'], dtype=object)

### Test a few ML Algorithims

Now the data is prepared to test a few different ML approaches.  Using cross-validation on a smaller set of the training data I will test a few different models.  Since the competition is being evaluated by using log loss, we will set the cross validation scorer in sklearn to use that method as well.

#### Naive Bayes

In [32]:
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

#Initialize Model
nb_model = GaussianNB()
#Create KFold
kfold = KFold(n_splits=7, random_state=5)
cross_val_results = cross_val_score(nb_model, X_train, Y, cv=kfold, scoring='neg_log_loss')
print cross_val_results.mean()

-4.57676082504


#### Logistic Regression

In [33]:
from sklearn.linear_model import LogisticRegression

#Initialize Model
log_model = LogisticRegression()
#Create KFold
kfold = KFold(n_splits=7, random_state=5)
cross_val_results = cross_val_score(log_model, X_train, Y, cv=kfold, scoring='neg_log_loss')
print cross_val_results.mean()

  np.exp(prob, prob)


-0.71154068797


### Train Model & Make First Submission


In [36]:
#Logistic Regression
log_model = LogisticRegression()
log_model.fit(X_train, Y)
prediction_probabilites = log_model.predict_proba(X_test)
prediction_probabilites[:10, :]

array([[  1.54128357e-01,   4.96697636e-01,   3.49174007e-01],
       [  1.74563954e-01,   5.10360311e-01,   3.15075735e-01],
       [  2.61561754e-02,   8.60331585e-01,   1.13512239e-01],
       [  1.54292599e-01,   4.59562735e-01,   3.86144666e-01],
       [  8.69627589e-03,   8.72590474e-01,   1.18713251e-01],
       [  5.54461164e-06,   9.98125304e-01,   1.86915136e-03],
       [  2.52257363e-01,   3.57603482e-01,   3.90139155e-01],
       [  3.59407483e-01,   2.81115697e-01,   3.59476820e-01],
       [  1.49331328e-01,   5.41461157e-01,   3.09207515e-01],
       [  2.55397888e-01,   4.20177473e-01,   3.24424639e-01]])

In [55]:
#Submission must be - listing_id, high, medium, low
#The index of our probabilties is from the label encoder earlier (0=high, 1=low, medium=2)
submission_df = pd.DataFrame({'listing_id':test_df['listing_id'], 'high':prediction_probabilites[:, 0],
                             'medium':prediction_probabilites[:, 2], 'low':prediction_probabilites[:, 1]})
#Re-Order Columns for submission
cols = ['listing_id', 'high', 'medium', 'low']
submission_df = submission_df[cols]
submission_df.head()

Unnamed: 0,listing_id,high,medium,low
0,7142618,0.154128,0.349174,0.496698
1,7210040,0.174564,0.315076,0.51036
100,7103890,0.026156,0.113512,0.860332
1000,7143442,0.154293,0.386145,0.459563
100000,6860601,0.008696,0.118713,0.87259


In [50]:
#Verify all is well (no NaNs)
submission_df.isnull().sum()

high          0
listing_id    0
low           0
medium        0
dtype: int64

In [54]:
#Write to CSV for submission
submission_df.to_csv('log_reg.csv', index=False)

### Conclusion

This simple Logistic Regression scored .8516 on Kaggle.  Not very good, a lot of room for improvement.  The first thing to do is tune the hyper-parameters for our logistic regression.