## Objective

#### How much interest will a new rental listing on RentHop receive?

In this <a href='https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data'>competition</a>, the task is to predict how popular an apartment rental listing will be from its content: text description, photos, number of bedrooms, price, and so on. The data comes from renthop.com, an apartment listing website, and all the apartments are located in New York City.

### File descriptions

- **```train.json```** - the training set
- **```test.json```** - the test set


### Data fields

- **```bathrooms:```** number of bathrooms
- **```bedrooms:```** number of bedrooms
- **```building_id```**
- **```created```**
- **```description```**
- **```display_address```**
- **```features:```** a list of features about this apartment
- **```latitude```**
- **```listing_id```**
- **```longitude```**
- **```manager_id```**
- **```photos:```** a list of photo links. You can download the pictures from RentHop's site yourself, but they are the same as those in imgs.zip.
- **```price:```** in USD
- **```street_address```**
- **```interest_level:```** this is the target variable. It has 3 categories: 'high', 'medium', 'low'


## Import data

In [2]:
from __future__ import print_function, division

In [3]:
# %load pp_tools.py
from IPython.display import display, HTML

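def pp_dict(d):
    """Display a dict as a two-column HTML table (key, value)."""
    display(HTML(
        u'<table>{}</table>'.format(
            u''.join(u'<tr><td><b>{}</b></td><td>{}</td></tr>'.format(k, d[k]) for k in d))))

def pp_dictOflist(d):
    """Display a dict of lists as an HTML table, one row per key."""
    display(HTML(
        u'<table>{}</table>'.format(
            u''.join(u'<tr><td><b>{}</b></td>{}</tr>'.format(k,
                u''.join(u'<td>{}</td>'.format(v) for v in d[k])) for k in d.keys()))))

def pp_listOflist(l):
    """Display a list of lists as an HTML table, one row per sublist."""
    display(HTML(
        u'<table>{}</table>'.format(
            u''.join(u'<tr>{}</tr>'.format(
                u''.join(u'<td>{}</td>'.format(v) for v in sublist)) for sublist in l))))

def pp_bold(text):
    """Display text in bold (parameter renamed to avoid shadowing the built-in str)."""
    display(HTML(u'<b>{}</b>'.format(text)))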

def pp_dfinfo(df, width=4):
    """Display column name, non-null count and dtype for each DataFrame column."""
    ncols = len(df.columns)
    width = min(width, ncols)
    depth = -(-ncols // width)  # ceiling division: rows needed for `width` column groups
    i = 0
    list_ = [[] for _ in range(depth)]
    for _ in range(width):
        for row in range(depth):
            if i < ncols:
                col = df.columns[i]
                # .iloc makes the positional lookup explicit
                list_[row].extend(['<b>{}</b>'.format(col), df[col].count(), df.dtypes.iloc[i]])
            i += 1

    print('{} entries, {} columns'.format(len(df), ncols))
    pp_listOflist(list_)


In [4]:
import json

In [5]:
%%time
with open('data/train.json', 'r') as fd:
    dataset = json.load(fd)

CPU times: user 1 s, sys: 788 ms, total: 1.79 s
Wall time: 1.79 s


The data set is a 15-item dictionary that maps each field name (e.g., ```bedrooms```) to a dictionary holding that field's value for every listing.

In [6]:
print(dataset.keys())
print(type(list(dataset.values())[0]))  # list(...) keeps this working on Python 3 too

[u'listing_id', u'interest_level', u'display_address', u'description', u'created', u'price', u'bedrooms', u'longitude', u'photos', u'manager_id', u'latitude', u'bathrooms', u'building_id', u'street_address', u'features']
<type 'dict'>
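
To make the layout concrete, here is a minimal sketch using a toy dictionary with the same column-oriented shape: each field name maps to an inner dictionary with one entry per listing. The keys `'10'` and `'42'` and all the values are made up for illustration; only the field names come from the data set. Counting the target categories then needs nothing more than the `interest_level` column.

```python
from collections import Counter

# Toy stand-in for the parsed train.json (hypothetical keys and values).
toy = {
    'bedrooms': {'10': 1, '42': 0},
    'price': {'10': 1000, '42': 2000},
    'interest_level': {'10': 'low', '42': 'medium'},
}

# Every field holds one entry per listing, so any column gives the listing count.
n_listings = len(next(iter(toy.values())))

# All columns should share exactly the same listing keys.
keys = set(toy['price'])
assert all(set(column) == keys for column in toy.values())

# Class balance of the target variable.
counts = Counter(toy['interest_level'].values())

print(n_listings)
print(sorted(counts.items()))
```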


There are just under 50,000 listings in the dataset. Here I define helper functions to gather the data for a single listing into one object.

In [7]:
def get_id(data, i):
    """Returns id of ith listing."""
    # list(...) makes the indexing work on Python 3 as well as Python 2
    return list(list(data.values())[0].keys())[i]

def get_listing(data, id):
    """Returns dictionary with data for one listing"""
    listing = {'id' : id}
    for k, v in data.items():
        value = v[id]
        if isinstance(value, list):
            value = '\n'.join(value)
        listing[k] = value
    return listing
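
As a quick check of what these helpers produce, the sketch below repeats `get_listing` (so the block is self-contained) and applies it to a toy dictionary. The key `'10'` and the field values are hypothetical. The point to notice is that list-valued fields such as `features` are joined into a single newline-separated string.

```python
def get_listing(data, listing_key):
    """Return a flat dictionary with every field of one listing."""
    listing = {'id': listing_key}
    for field, column in data.items():
        value = column[listing_key]
        # List-valued fields (features, photos) become one string.
        if isinstance(value, list):
            value = '\n'.join(value)
        listing[field] = value
    return listing

# Hypothetical toy data in the same column-oriented layout.
toy = {
    'price': {'10': 1000},
    'features': {'10': ['Elevator', 'Dishwasher']},
}

listing = get_listing(toy, '10')
print(listing)
```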

Let's print one of them.

In [8]:
pp_bold('{} listings'.format(len(list(dataset.values())[0])))
pp_dict(get_listing(dataset, get_id(dataset, 0)))

listing_id,6818139
building_id,f7fc4fd7b6b80615ebfce3e212e17cce
display_address,Hausman Street
description,"This one WON'T LAST!! Here is a stunning 3 Bedroom, 2 Full Bathroom apartment measuring approximately 1100 sqft! The layout is NOT a railroad with a King Sized master bedroom complete with on suite full bathroom and bay-windows. The 2nd and 3rd bedrooms can fit a Full or Queen sized bed and have over-sized windows. Thee is a second full bathroom with soaking tub off of the main hallway. Open concept kitchen with large island finished in real stone counters and soft close cabinetry plus Stainless Steel industrial sized appliances! Heated hardwood flooring throughout with tons of sunlight and space. Central HVAC means that you never have to be too hot or too cold again! One of the best locations for street parking in Greenpoint! Don't have a car? No Problem, the Nassau G Train and Grand St L train are within reach. Call, Text, Email Taylor now to schedule your private showing. Not exactly what you are looking for? Email me your search criteria, I Specialize in North Brooklyn! -------------Listing courtesy of Miron Properties. All material herein is intended for information purposes only and has been compiled from sources deemed reliable. Though information is believed to be correct, it is presented subject to errors, omissions, changes or withdrawal without notice. Miron Properties is a licensed Real Estate Broker. www.MironProperties.com"
created,2016-04-03 02:22:45
price,2995
bedrooms,3
longitude,-73.9389
photos,https://photos.renthop.com/2/6818139_a50e80ff79c71a1ced4ec103985abdd9.jpg https://photos.renthop.com/2/6818139_a2be68c008aeff88347f97dc0350f85d.jpg https://photos.renthop.com/2/6818139_63166d3d69203d7bd63e8e8bb47db8c6.jpg https://photos.renthop.com/2/6818139_1aba230faec5ed91372bb3996b98acf1.jpg https://photos.renthop.com/2/6818139_af28eb103bab59ec9d929c3657d846e6.jpg https://photos.renthop.com/2/6818139_3637c3737c1286332310476fa4d5b1d8.jpg https://photos.renthop.com/2/6818139_83f4bfb1928581fb9feec154629f62e9.jpg
manager_id,198d2e96429920ff71cd06ddff323713


Next, import all the listings into a MongoDB database, one document per listing. The helper function below loads listings into the database, skipping those already present.

In [9]:
def load_data(collection, dataset):

    # Get ids of existing and new listings

    cursor = collection.find({}, projection={'_id': 0, 'id': 1})
    old = [doc['id'] for doc in cursor]  # iterate the cursor directly
    new = list(dataset.values())[0].keys()
   
    # If nothing new, punt
    
    todo = set(new) - set(old)
    if not todo:
        return None
    
    # Add new listings
    
    bulk = collection.initialize_ordered_bulk_op()
    for id in todo:
        bulk.insert(get_listing(dataset, id))
    return bulk.execute()
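
The bulk-operation builder used above belongs to older PyMongo releases and was removed in PyMongo 4. A minimal sketch of the same skip-duplicates logic with `insert_many` might look like the following. The names `build_listing` and `load_new_listings` are hypothetical, and `collection` is assumed to be a PyMongo collection (any object with compatible `find` and `insert_many` methods would work).

```python
def build_listing(data, listing_key):
    """Gather every field of one listing into a single document."""
    doc = {'id': listing_key}
    for field, column in data.items():
        value = column[listing_key]
        doc[field] = '\n'.join(value) if isinstance(value, list) else value
    return doc

def load_new_listings(collection, data):
    """Insert listings whose 'id' is not stored yet; return how many were added."""
    # Ids already in the collection (only the 'id' field is fetched).
    existing = {doc['id'] for doc in
                collection.find({}, projection={'_id': 0, 'id': 1})}
    # Ids present in the parsed JSON; every column shares the same keys.
    incoming = set(next(iter(data.values())))
    todo = sorted(incoming - existing)
    if not todo:
        return 0
    result = collection.insert_many([build_listing(data, k) for k in todo])
    return len(result.inserted_ids)
```

Calling `load_new_listings` a second time with the same data returns 0. This sketch does not handle concurrent loaders; a unique index on `id` would be one way to guard against that.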

In [10]:
from pymongo import MongoClient
client = MongoClient('ec2-34-198-246-43.compute-1.amazonaws.com', 27017)
db = client.renthop
collection = db.listings

In [11]:
%%time
result = load_data(collection, dataset)
print(result)

None
CPU times: user 120 ms, sys: 44 ms, total: 164 ms
Wall time: 1.9 s


Verify that the listings are in the collection and print one of them.

In [12]:
pp_bold('{} listings'.format(collection.count()))
cursor = collection.find({ 'listing_id' : get_listing(dataset, get_id(dataset, 1))['listing_id']})
pp_dict(cursor.next())

listing_id,7088908
_id,58ac97520b0203c6b6436f4d
display_address,E 78 Street
description,"Exposed brick wall w/gas electric & heat included: Immaculate, Naturally well-lite studio apartment in vibrant Upper East Side. Close & convenient access to the M 31/79 bus lines, shops, cafes, enticing nightlife/restaurants, & a endless array of sites to see on York Ave. Alluring hardwood flooring throughout, accommodating closet space, spacious granite counter top kitchen w/white appliances & captivating mahogany cabinetry. Laundry/video intercom, pets-friendly. $1,950/mo. Immediate occupancy. call Aubyn 264-906-2321 kagglemanager@renthop.com to secure a viewing. Bond New York is a real estate broker that supports equal housing opportunity."
created,2016-06-01 02:46:47
price,1950
bedrooms,0
interest_level,medium
longitude,-73.9501
photos,https://photos.renthop.com/2/7088908_f931ca0d83f33e08b6373602c15a2d08.jpg https://photos.renthop.com/2/7088908_9bbdc2ef9f8c8d42fe09ff99190e52ef.jpg https://photos.renthop.com/2/7088908_c1e72e3a8c228290172d793d33a3ab5e.jpg https://photos.renthop.com/2/7088908_c5a4045aa176b6664d7d0814a502e412.jpg https://photos.renthop.com/2/7088908_f5b90044815d1c5b1e1db4ec23ee0163.jpg https://photos.renthop.com/2/7088908_59507afd1e75b9d5f621a87f425d810d.jpg https://photos.renthop.com/2/7088908_858d353bdf4cf0ca228ac8ebe0e54204.jpg https://photos.renthop.com/2/7088908_752a4559b3199e9f0082754e5792faea.jpg
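
With PyMongo 3.7 or later, the same check can be written with `count_documents` and `find_one`, since `Collection.count()` is no longer available in PyMongo 4. The helper name `summarize_collection` is hypothetical, and `collection` is assumed to be the `listings` collection used above.

```python
def summarize_collection(collection, listing_id):
    """Return (total document count, the document with this listing_id or None)."""
    total = collection.count_documents({})
    doc = collection.find_one({'listing_id': listing_id})
    return total, doc
```

For the listing printed above, the call would be `summarize_collection(collection, 7088908)`.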
