# Moving to the Big Apple

In September I began a Master's program in Data Science in New York City. Moving from Hartford, CT, I was not really prepared for the New York housing market, despite many warnings from friends. I also did not take as much time off as I should have after leaving my previous job, which made me lazy about the apartment search. The main issue in the New York housing market is that places come and go so quickly: if a place is any good, it will probably be off the market in a few days. After realizing this, I wanted to be the first to know when something opened up, but I didn't want to sit at a computer and watch.


## The code

First, I'll need to scrape the listings from StreetEasy. In addition to all of the major information someone looks for in an apartment, such as price, size, and neighborhood, the most important piece of information for me is how far the apartment is from Columbia, so first I'll write a function which uses the geopy library to tell me that.



In [None]:
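# vincenty was removed in geopy 2.0; geodesic is its replacement
from geopy.distance import geodesic


def getDistFromColumbia(coords):
    # given a "lat,lon" string, how far in miles is that location from Columbia
    columbia = (40.807511, -73.9647077)
    # convert the two string halves to floats before measuring
    lat, lon = (float(c) for c in coords.split(","))
    dist = geodesic(columbia, (lat, lon)).miles

    return dist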
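
For a quick sanity check of these distances without installing geopy, the haversine formula is a reasonable stand-in. This is only a rough sketch: it treats the Earth as a sphere with an assumed mean radius of 3958.8 miles, so its result will differ slightly from geopy's ellipsoidal one, and `haversineFromColumbia` is a hypothetical helper name, not part of geopy.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius; the Earth is not a perfect sphere
COLUMBIA = (40.807511, -73.9647077)


def haversineFromColumbia(coords):
    # coords is a "lat,lon" string, the same shape getDistFromColumbia receives
    lat, lon = (float(c) for c in coords.split(","))
    lat1, lon1, lat2, lon2 = map(radians, (COLUMBIA[0], COLUMBIA[1], lat, lon))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Times Square (approximate coordinates) is a few miles south of campus
print(round(haversineFromColumbia("40.7580,-73.9855"), 1))
```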



Next, I'll build the main scraping function, which gets me all of the information I want, including the distance from Columbia.

In [None]:
from bs4 import BeautifulSoup
import re

def getPlace(pl):
    # scrape relevant info for an apartment, return as dict
    apt = {'price':pl.select_one("span.price").text, 'loc':pl["se:map:point"], "id":pl["data-id"]}
    apt["addr"] = pl.select_one("div.details-title a").text
    
    # does it have a broker's fee?
    apt["broker_fee"] = pl.select_one("div.banner.no_fee") is None
    
    # how far from Columbia?
    apt["dist"] = getDistFromColumbia(apt['loc'])
    
    # sometimes the size is not available
    apt_size = pl.select_one("span.last_detail_cell")
    if apt_size is not None:
        apt["size"] = apt_size.text
        
    # what neighborhood?
    apt["nbh"] = pl(text=re.compile(r' in '))[0]
    
    # url
    apt["href"] = "http://streeteasy.com" + pl.select_one("div.details-title a")["href"]
    
    return apt
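
Because `getPlace` stores the price as the raw text scraped from the page, it cannot be compared numerically as-is. Below is a minimal sketch of a converter, assuming the text looks something like `$1,850` in whole dollars. The exact format on the site is an assumption, and `parsePrice` is a hypothetical helper.

```python
import re


def parsePrice(price_text):
    # keep only the digits, so "$1,850" becomes 1850 (assumes whole-dollar amounts);
    # return None when the text contains no digits at all
    digits = re.sub(r"[^\d]", "", price_text)
    return int(digits) if digits else None


# made-up listings in the same dict shape getPlace returns
sample = [{"id": "a", "price": "$1,850"}, {"id": "b", "price": "$1,400"}]
cheapest = min(sample, key=lambda apt: parsePrice(apt["price"]))
print(cheapest["id"])
```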



## Scraping time

Now it's time to scrape. To get the URL, I just went to StreetEasy, set the parameters I wanted, and copied the resulting URL here. It probably would have been better to make those parameters variables set here, but for this example they were never going to change, so I'll save that work for the next time I move.



In [None]:
import requests

steasy = "http://streeteasy.com/for-rent/nyc/price:1000-2000%7Carea:154,148,153,147%7Cbeds%3C1?sort_by=listed_desc"
st = requests.get(steasy)
soup = BeautifulSoup(st.text, "html.parser")
stplaces = soup.select("div.item")

places = [getPlace(s) for s in stplaces]
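
For anyone who does want the search parameters as variables, here is one way it might look. This is a sketch, not something taken from StreetEasy's documentation. The parameter names (`min_price`, `max_price`, `areas`, `max_beds`) are my guesses at what each filter in the copied URL means, and `buildSearchUrl` is a hypothetical helper.

```python
from urllib.parse import quote


def buildSearchUrl(min_price, max_price, areas, max_beds):
    # areas are the numeric area codes that appear in the copied URL
    filters = "price:{}-{}|area:{}|beds<{}".format(
        min_price, max_price, ",".join(str(a) for a in areas), max_beds)
    # percent-encode "|" and "<" but leave ":" and "," readable, as in the copied URL
    return ("http://streeteasy.com/for-rent/nyc/" + quote(filters, safe=":,")
            + "?sort_by=listed_desc")


print(buildSearchUrl(1000, 2000, [154, 148, 153, 147], 1))
```

With the values shown, this should reproduce the URL used above, so the rest of the scraping code would not need to change.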


## Using email as push notification

I would like to get an email when a new place is found, so I'll use smtplib to send myself one. I'm using Yahoo to send the email because I was okay with downgrading the security on that account so that it would allow automatically generated emails.



In [None]:
import smtplib

def sendMail(apt):
    # send an email with the apartment information
    fromMy = ''
    to  = ''
    subj='New Apt: ' + apt["price"] + ", Dist: "+str(apt["dist"])
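    # drop non-ASCII characters, then decode back to str so the body is not formatted as b'...'
    message_text = str(apt).encode('ascii', 'ignore').decode('ascii')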
    msg = "From: %s\nTo: %s\nSubject: %s\n\n%s" % ( fromMy, to, subj, message_text )

    username = ''  # fill in your own credentials
    password = ''
    server = smtplib.SMTP("smtp.mail.yahoo.com",587)
    server.ehlo()
    server.starttls()
    server.login(username,password)
    server.sendmail(fromMy, to,msg)
    server.quit()
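
As an alternative to formatting the headers by hand, the standard library's `email.message.EmailMessage` can assemble the message. This is a sketch under assumptions: `buildMessage` is a hypothetical helper, the addresses are placeholders, and the listing is made up.

```python
from email.message import EmailMessage


def buildMessage(apt, sender, recipient):
    # builds the message only; sending is still left to smtplib
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "New Apt: {}, Dist: {:.1f} mi".format(apt["price"], apt["dist"])
    # one "key: value" line per field is easier to read than the raw dict
    msg.set_content("\n".join("{}: {}".format(k, v) for k, v in apt.items()))
    return msg


# made-up listing and placeholder addresses, for illustration only
demo = {"price": "$1,850", "dist": 1.234, "addr": "123 Example St"}
print(buildMessage(demo, "me@example.com", "me@example.com"))
```

The result could then be passed to `server.send_message(msg)` in place of the `server.sendmail(...)` call.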
    


## No duplicates

Finally, I don't want to get any duplicates, so I'll keep a text file with the ids of all the places I've already sent. An email is only sent for a place if its id isn't in that file.

In [None]:

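# 'a+' creates places.txt on the first run, and writes always append to the end
with open('places.txt', 'a+') as f:
    f.seek(0)  # rewind so the existing ids can be read
    pSplit = set(f.read().split(","))
    for p in places:
        if p["id"] not in pSplit:
            print("FOUND A NEW SPOT", p["id"])
            sendMail(p)
            f.write(p["id"] + ",")
            pSplit.add(p["id"])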
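
The same idea can be written as a small function that is easy to try out on its own. This variant is a sketch, not the code in find_apt.py: it stores one id per line instead of comma-separated, copes with the file not existing yet, and `recordNewIds` is a hypothetical name.

```python
import os
import tempfile


def recordNewIds(ids, path):
    # return the ids not yet listed in `path`, and append them to it
    seen = set()
    if os.path.exists(path):
        with open(path) as f:
            seen = set(line.strip() for line in f if line.strip())
    fresh = []
    for i in ids:
        if i not in seen:
            fresh.append(i)
            seen.add(i)
    with open(path, "a") as f:
        for i in fresh:
            f.write(i + "\n")
    return fresh


# demo against a throwaway file rather than the real places.txt
demo_path = os.path.join(tempfile.mkdtemp(), "places.txt")
print(recordNewIds(["101", "102"], demo_path))
print(recordNewIds(["102", "103"], demo_path))
```

On the second call only the id that was not seen before should come back, which is the point at which an email would be sent.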


## Update time

I also write the time of the last run to a text file, so I can confirm the cron task is still running periodically.

In [None]:
import datetime

# keep track of the last updated time to verify it's still running
with open("time.txt", 'w') as wr:
    fmt = '%Y-%m-%d-%H-%M-%S'
    wr.write(datetime.datetime.now().strftime(fmt))
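
To check that timestamp automatically instead of opening the file by hand, something like the following could work. It is a sketch: `isStale` is a hypothetical helper, the 30-minute threshold is an assumption (it should be longer than whatever interval the cron task uses), and the dates in the demo are arbitrary.

```python
import datetime

FMT = '%Y-%m-%d-%H-%M-%S'  # same format the script writes to time.txt


def isStale(stamp, now, max_age_minutes=30):
    # True if the recorded run time is older than the allowed gap
    last_run = datetime.datetime.strptime(stamp.strip(), FMT)
    return now - last_run > datetime.timedelta(minutes=max_age_minutes)


now = datetime.datetime(2016, 9, 1, 12, 0, 0)  # fixed time so the demo is repeatable
print(isStale("2016-09-01-11-45-00", now))
print(isStale("2016-09-01-10-00-00", now))
```

In practice `stamp` would be the contents of time.txt and `now` would be `datetime.datetime.now()`.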


# That's it!


I've included the Python module (find_apt.py) which I used to set up an Ubuntu cron task so this ran automatically. I found this to be a lot of help in my apartment hunt, and I'm hoping it can be of use to someone else.