# The Battle of Neighborhoods - Week 2

## Case Scenario: Searching for a student apartment in NYC - The problem of student housing in big cities

## 1. Introduction / Business Problem

To pursue both academic and professional opportunities, we examine the scenario of relocating to New York, USA.
In this project we consider a student who wants to continue with postgraduate studies at one of the universities in NY that offer a specific field of study (e.g. civil / structural engineering).
The decision on which university to apply to first depends on a combination of reputation and the feasibility of affordable transportation.
The main question that immediately arises is whether we can find an apartment within a viable distance of the university.
The requirements for selecting the best housing in this project are therefore:

* Apartment must have at least 1 bedroom
* Desired location is in the wider NYC area, within 1.0 km of a metro station
* Rent must not exceed $2400 per month
* Top amenities in the selected neighborhood shall be similar to current residence
* Desirable to have venues such as coffee shops, bakery, fast food, snack place, bar, gym and food shops
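
The hard constraints in the list above can be written as a simple filter. The following is a minimal sketch, not the notebook's own code: the `Listing` record, its field names and the sample listings are hypothetical and only illustrate how the three numeric requirements combine.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements list above.
MIN_BEDROOMS = 1
MAX_RENT_USD = 2400
MAX_STATION_DISTANCE_KM = 1.0


@dataclass
class Listing:
    """Hypothetical record for one apartment listing."""
    title: str
    bedrooms: int
    rent_usd: float
    station_distance_km: float


def meets_requirements(listing: Listing) -> bool:
    """Return True when a listing satisfies all three hard constraints."""
    return (
        listing.bedrooms >= MIN_BEDROOMS
        and listing.rent_usd <= MAX_RENT_USD
        and listing.station_distance_km <= MAX_STATION_DISTANCE_KM
    )


# Made-up listings, for illustration only.
candidates = [
    Listing("1BR near campus", 1, 2100, 0.4),
    Listing("Studio downtown", 0, 1900, 0.2),
    Listing("2BR far from subway", 2, 2300, 1.6),
]
shortlist = [c.title for c in candidates if meets_requirements(c)]
print(shortlist)
```

The amenity requirements are softer preferences and are better handled by comparing venue categories than by a yes/no filter.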


Venue data will be collected using the Foursquare API.

This is an optimal selection problem that thousands of people who relocate for business or studies face every day.
Various techniques taught in the course can be applied, such as clustering, optimization and even machine learning.

## 2. Data

Data that need to be acquired may include:

* List of Boroughs and neighborhoods of Manhattan with their geodata
* List of Subway metro stations in Manhattan with their address location
* List of Universities and their address location
* List of apartments for rent in Manhattan area with their addresses and price (crawled from provider)
* List of apartments for rent with additional information, such as price, address, area and number of bedrooms
* Optional: Venues for each Manhattan neighborhood (that can be clustered)

Data to be sourced are:

* Distance between university and nearby metro stations (Wikipedia)
* List of apartments fitting the requirements (Craigslist or similar real estate websites)
* Distance of apartments fitting the requirements from metro stations
* Amenities in each neighborhood of the selected apartments (clustering)
* Any other data needed
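
The distance items in the list above need a way to measure how far apart two coordinates are. One common choice, assumed here rather than taken from the notebook, is the haversine great-circle formula. The helper names `haversine_km` and `stations_within` are illustrative, and the 6371 km Earth radius is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stations_within(point, stations, max_km=1.0):
    """Return (name, distance_km) pairs for stations within max_km, nearest first."""
    lat, lon = point
    hits = [(name, haversine_km(lat, lon, s_lat, s_lon)) for name, s_lat, s_lon in stations]
    return sorted((h for h in hits if h[1] <= max_km), key=lambda h: h[1])


# Placeholder coordinates, not real stations.
demo_stations = [("Station A", 40.005, -74.0), ("Station B", 40.02, -74.0)]
nearby = stations_within((40.0, -74.0), demo_stations)
print(nearby)
```

With real data, the station tuples could be built from the `Stop Name`, `GTFS Latitude` and `GTFS Longitude` columns of the MTA stations table.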

Data Providers:

* NYC OpenData - https://opendata.cityofnewyork.us/
* MTA - The Metropolitan Transportation Authority
* https://educatingengineers.com/states/new-york/civil-engineering
* Wikipedia
* Craigslist

## 3. Methodology

In [1]:
import sys
#You need Java installed for tabula-py to work.
#Reference: https://tabula-py.readthedocs.io/en/latest/getting_started.html#get-tabula-py-working-windows-10
!{sys.executable} -m pip install tabula-py
!{sys.executable} -m pip install geopy
!{sys.executable} -m pip install folium
!{sys.executable} -m pip install selenium
!{sys.executable} -m pip install webdriver-manager
!{sys.executable} -m pip install PyPDF2


from re import sub
from decimal import Decimal
import numpy as np
from webdriver_manager.chrome import ChromeDriverManager
import tabula
import time
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
from time import sleep
import requests
import matplotlib.pyplot as plt
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json path is deprecated
from geopy.geocoders import Nominatim
import folium
from folium import plugins
import matplotlib.cm as cm
from IPython.display import HTML, display
import matplotlib.colors as colors
import seaborn as sns
import random
from selenium import webdriver
from sklearn.cluster import KMeans
from scipy.stats import gaussian_kde
print('Libraries Successfully imported.')

Libraries Successfully imported.


In [2]:
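#Foursquare API
# Credentials are read from environment variables (the variable names are placeholders)
# instead of being hard-coded in the notebook.
import os
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION="20190707"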

In [3]:
address = 'Georgiou Papandreou, Thessaloniki, Greece'

geolocator = Nominatim(user_agent="BOTN-App-001")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geo coords of current home are {}, {}.'.format(latitude, longitude))

The geo coords of current home are 40.7276938, 22.6945875.


In [4]:
#Initial Coords Data (Existing Residence example)
neighborhood_latitude=40.5997456
neighborhood_longitude=22.9515123

In [5]:
i=1
LIMIT = 500
radius = 1500*i

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
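
The request above builds its query string by hand with `str.format`. As a hedged alternative, the same URL can be assembled with the standard library so that every parameter is encoded consistently. The function name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; real ones would come from the notebook's variables.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20190707", 40.5997456, 22.9515123, 1500, 100)
print(url)
```

Keeping the URL construction in one function also makes it easy to issue the same query for several coordinates.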

In [6]:
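def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # Rows produced by json_normalize use the prefixed column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']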

In [7]:
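nearby_venues_all = pd.DataFrame()

# Pentagon Crunch: walk five points around the starting location.
# Note: `results` was fetched once in the previous cell, so every pass below
# re-processes that same response; the shifted coordinates are only printed.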
while (i*100) <= LIMIT:
    
    neighborhood_latitude = (np.sin(i*1.256)*0.001) + neighborhood_latitude
    
    neighborhood_longitude = (np.cos(i*1.256)*0.001) + neighborhood_longitude
    
    venues = results['response']['groups'][0]['items']
    
    nearby_venues = json_normalize(venues)

    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]

    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    
    nearby_venues_all = pd.concat([nearby_venues_all, nearby_venues])
    
    nearby_venues_all.drop_duplicates(inplace=True)
    
    print("Pentagon point", i, neighborhood_latitude,',', neighborhood_longitude)
            
    i=i+1

nearby_venues_all.head(10)

Pentagon point 1 40.60069645946051 , 22.951821922813057
Pentagon point 2 40.60128527502248 , 22.951013655385786
Pentagon point 3 40.600699037023304 , 22.950203516503663
Pentagon point 4 40.59974719614449 , 22.95051010897163
Pentagon point 5 40.5997440108427 , 22.951510103898546




Unnamed: 0,name,categories,lat,lng
0,Νέα Παραλία - Ποσειδώνιο,Park,40.600627,22.949517
1,Πεζόδρομος Νέας Παραλίας Θεσσαλονίκης,Park,40.599218,22.949013
2,BizBize,Meze Restaurant,40.598252,22.951265
3,Asian House,Asian Restaurant,40.601313,22.950223
4,Από τη σχάρα στη λαδόκολλα,Grilled Meat Restaurant,40.600563,22.950494
5,Garden of Water (Κήπος του Νερού),Park,40.602718,22.950266
6,Thessaloniki Concert Hall (Μέγαρο Μουσικής Θεσ...,Concert Hall,40.598053,22.947822
7,Peinirly's,Pizza Place,40.600808,22.953672
8,Thria,Mediterranean Restaurant,40.601216,22.949868
9,Παιδικό πάρκο Ν. παραλίας,Park,40.600179,22.949407
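
The loop above shifts the query point five times but keeps reading the single response fetched before the loop. A minimal sketch of how each point could be queried on its own follows. `pentagon_points` reproduces the walk printed above, while `collect_venues` and the `fetch` callback are hypothetical names. `fake_fetch` stands in for a real API call so the example runs offline.

```python
import math

STEP_DEG = 0.001   # step length in degrees, as in the loop above
ANGLE_RAD = 1.256  # roughly 2*pi/5, so five steps trace a pentagon


def pentagon_points(lat, lng, n_points=5):
    """Reproduce the walk used above: step i moves STEP_DEG at angle i * ANGLE_RAD."""
    points = []
    for i in range(1, n_points + 1):
        lat = math.sin(i * ANGLE_RAD) * STEP_DEG + lat
        lng = math.cos(i * ANGLE_RAD) * STEP_DEG + lng
        points.append((lat, lng))
    return points


def collect_venues(lat, lng, fetch):
    """Call fetch(lat, lng) once per pentagon point and merge the results.

    fetch is a caller-supplied function returning a list of
    (name, category, venue_lat, venue_lng) tuples for one query point.
    """
    seen = {}
    for p_lat, p_lng in pentagon_points(lat, lng):
        for venue in fetch(p_lat, p_lng):
            seen.setdefault(venue, None)  # dict keys keep first-seen order, no duplicates
    return list(seen)


def fake_fetch(lat, lng):
    """Stand-in for a real API request: always returns the same two venues."""
    return [("Cafe A", "Café", 40.6, 22.95), ("Park B", "Park", 40.601, 22.951)]


points = pentagon_points(40.5997456, 22.9515123)
venues = collect_venues(40.5997456, 22.9515123, fake_fetch)
print(points[0], len(venues))
```

In a real run, `fetch` would build the explore URL for the given point and return the parsed venues, so that the merged table can actually grow beyond a single response.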


In [8]:
nearby_venues_all.shape

(100, 4)

In [9]:
nearby_venues_all['categories'].value_counts().nlargest(12)

Dessert Shop            8
Park                    7
Café                    7
Bar                     7
Coffee Shop             5
Bakery                  4
Meze Restaurant         4
Restaurant              3
Greek Restaurant        3
Fast Food Restaurant    2
Cocktail Bar            2
Gym / Fitness Center    2
Name: categories, dtype: int64

### NYC Neighbourhood data

In [10]:
# Read the CSV file of NYC neighborhoods with their geodata
nyc_data  = pd.read_csv('https://raw.githubusercontent.com/tvitalis/Coursera-Capstone/master/nynta.csv') 
nyc_geojson = 'https://raw.githubusercontent.com/tvitalis/Coursera-Capstone/master/NeighborhoodAreas.geojson'
# drop_duplicates must be called and its result kept; the bare attribute does nothing
nyc_neigh = nyc_data['NTAName'].drop_duplicates()
nyc_data.head()

Unnamed: 0,BoroName,the_geom,CountyFIPS,BoroCode,NTACode,NTAName,Shape_Leng,Shape_Area
0,Brooklyn,MULTIPOLYGON (((-73.97604935657381 40.63127590...,47,3,BK88,Borough Park,39247.228028,54005020.0
1,Queens,MULTIPOLYGON (((-73.80379022888246 40.77561011...,81,4,QN51,Murray Hill,33266.904995,52488280.0
2,Queens,MULTIPOLYGON (((-73.8610972440186 40.763664477...,81,4,QN27,East Elmhurst,19816.712293,19726850.0
3,Queens,MULTIPOLYGON (((-73.75725671509139 40.71813860...,81,4,QN07,Hollis,20976.335574,22887770.0
4,Manhattan,MULTIPOLYGON (((-73.94607828674226 40.82126321...,61,1,MN06,Manhattanville,17040.685413,10647080.0


In [11]:
nyc_data.tail()

Unnamed: 0,BoroName,the_geom,CountyFIPS,BoroCode,NTACode,NTAName,Shape_Leng,Shape_Area
190,Brooklyn,MULTIPOLYGON (((-73.93213397515774 40.72815960...,47,3,BK76,Greenpoint,29047.573201,35333580.0
191,Manhattan,MULTIPOLYGON (((-73.96236596889439 40.72420906...,61,1,MN50,Stuyvesant Town-Cooper Village,12021.790416,5582283.0
192,Bronx,MULTIPOLYGON (((-73.8312915777183 40.855434104...,5,2,BX37,Van Nest-Morris Park-Westchester Square,42870.392803,36302380.0
193,Bronx,MULTIPOLYGON (((-73.90958727269663 40.84275637...,5,2,BX14,East Concourse-Concourse Village,27223.847106,18222400.0
194,Bronx,MULTIPOLYGON (((-73.9119181232027 40.843257886...,5,2,BX63,West Concourse,28499.044417,19379820.0


In [12]:
unidata = pd.read_csv('https://raw.githubusercontent.com/tvitalis/Coursera-Capstone/master/COLLEGE_UNIVERSITY.csv') 
unidata

Unnamed: 0,Longitude,Latitude,NAME,HOUSENUM,STREETNAME,CITY,ZIP,URL,BIN,BBL
0,-73.882627,40.767812,"College Of Aeronautics, Laguardia Airport",86-01,23 AVENUE,East Elmhurst,11369,http://www.aero.edu/,4437065,4010640002
1,-73.987663,40.695521,Nyc Technical College Cuny,300,JAY STREET,Brooklyn,11201,http://www.nyctc.cuny.edu,3335891,3001280001
2,-73.983083,40.692306,Institute Of Design And Construction,141,WILLOUGHBY STREET,Brooklyn,11201,http://www.idcbrooklyn.org/,3058246,3020600001
3,-73.985658,40.694632,Polytechnic University / Brooklyn-Metrotech Ca...,6,METROTECH CENTER,Brooklyn,11201,http://www.poly.edu/index_ie.cfm,3331744,3001427501
4,-73.961667,40.806815,Columbia University,1130,AMSTERDAM AVENUE,New York,10027,http://www.columbia.edu/,1084846,1018860001
5,-73.997878,40.732075,New York University,22,WASHINGTON SQUARE NORTH,New York,10011,http://www.nyu.edu/,1080111,1005510011
6,-73.950263,40.819401,The City College of New York,160,CONVENT AVENUE,New York,10031,http://www1.ccny.cuny.edu/,1084081,1019570200


In [13]:
mtastations = pd.read_csv('https://raw.githubusercontent.com/tvitalis/Coursera-Capstone/master/Stations.csv')
mtastations.tail()

Unnamed: 0,Station ID,Complex ID,GTFS Stop ID,Division,Line,Stop Name,Borough,Daytime Routes,Structure,GTFS Latitude,GTFS Longitude,North Direction Label,South Direction Label
491,517,517,S15,SIR,Staten Island,Prince's Bay,SI,SIR,Open Cut,40.525507,-74.200064,St George,Tottenville
492,518,518,S14,SIR,Staten Island,Pleasant Plains,SI,SIR,Embankment,40.52241,-74.217847,St George,Tottenville
493,519,519,S13,SIR,Staten Island,Richmond Valley,SI,SIR,Open Cut,40.519631,-74.229141,St George,Tottenville
494,522,522,S09,SIR,Staten Island,Tottenville,SI,SIR,At Grade,40.512764,-74.251961,St George,
495,523,523,S11,SIR,Staten Island,Arthur Kill,SI,SIR,At Grade,40.516578,-74.242096,St George,Tottenville


In [14]:
print('These are all NYC (5 boroughs) neighbourhoods:')
nyc_neigh

These are all NYC (5 boroughs) neighbourhoods:


0                                           Borough Park
1                                            Murray Hill
2                                          East Elmhurst
3                                                 Hollis
4                                         Manhattanville
5                              Springfield Gardens North
6                                              Homecrest
7                                                Erasmus
8                                               Longwood
9                                  Westchester-Unionport
10                                  Fresh Meadows-Utopia
11                                          Clinton Hill
12                                            St. Albans
13                                                Corona
14                   Pelham Bay-Country Club-City Island
15                                       Cambria Heights
16                            Jamaica Estates-Holliswood
17                             

In [15]:
# Read pdf into DataFrame
pdf_path = "https://github.com/tvitalis/Coursera-Capstone/raw/master/postal.pdf"

postal_neigh = tabula.read_pdf(pdf_path, stream=True)
# read_pdf returns list of DataFrames
print(len(postal_neigh))
postal_neigh[0].dropna(axis=0, inplace=True)
postal_neigh[0]

'pages' argument isn't specified.Will extract only from page 1 by default.


1


Unnamed: 0.1,Bronx,Unnamed: 0
0,Central Bronx,"10453, 10457, 10460"
1,Bronx Park and Fordham,"10458, 10467, 10468"
2,High Bridge and Morrisania,"10451, 10452, 10456"
3,Hunts Point and Mott Haven,"10454, 10455, 10459, 10474"
4,Kingsbridge and Riverdale,"10463, 10471"
5,Northeast Bronx,"10466, 10469, 10470, 10475"
6,Southeast Bronx,"10461, 10462,10464, 10465, 10472, 10473"
8,Central Brooklyn,"11212, 11213, 11216, 11233, 11238"
9,Southwest Brooklyn,"11209, 11214, 11228"
10,Borough Park,"11204, 11218, 11219, 11230"


In [16]:
postal_boro = pd.read_csv('https://raw.githubusercontent.com/tvitalis/Coursera-Capstone/master/zip_borough.csv')
postal_boro.tail()

Unnamed: 0,zip,borough
235,11691,Queens
236,11692,Queens
237,11693,Queens
238,11694,Queens
239,11697,Queens


In [17]:
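# Generate Craigslist search links, one per university ZIP code.
# Note: range(0, 6) covers rows 0-5 only, so the last university in the table
# (The City College of New York, ZIP 10031) gets no link.

base_links = []
for i in range(0, 6):
    link = "https://newyork.craigslist.org/search/aap?postal={}".format(unidata['ZIP'].iloc[i])
    base_links.append(link)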

In [18]:
base_links = list(dict.fromkeys(base_links))
base_links

['https://newyork.craigslist.org/search/aap?postal=11369',
 'https://newyork.craigslist.org/search/aap?postal=11201',
 'https://newyork.craigslist.org/search/aap?postal=10027',
 'https://newyork.craigslist.org/search/aap?postal=10011']
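
The two cells above first build one link per university and then remove repeated links. The same result can be sketched in a single helper that removes repeated ZIP codes before formatting. The name `build_search_links` is illustrative; the URL pattern and the ZIP codes are those shown above.

```python
BASE_URL = "https://newyork.craigslist.org/search/aap?postal={}"


def build_search_links(zip_codes):
    """Return one search link per distinct ZIP code, keeping first-seen order."""
    unique_zips = list(dict.fromkeys(zip_codes))  # dict keys preserve insertion order
    return [BASE_URL.format(z) for z in unique_zips]


# ZIP codes of the first six universities in the table above.
links = build_search_links([11369, 11201, 11201, 11201, 10027, 10011])
print(links)
```

Using `dict.fromkeys` instead of `set` keeps the links in the order the universities appear in the table.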

In [19]:
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

def getZipListings(link):
    """Scrape one Craigslist search page and return its listings as a DataFrame."""
    # Newer Selenium releases expect the driver path wrapped in a Service object.
    driver = webdriver.Chrome(ChromeDriverManager().install())
    driver.get(link)

    def read_attr(item, class_name, attribute):
        # Attribute of the first matching child element, or "" when it is absent
        try:
            return item.find_element(By.CLASS_NAME, class_name).get_attribute(attribute)
        except NoSuchElementException:
            return ""

    rows = []
    for item in driver.find_elements(By.CLASS_NAME, 'result-info'):
        rows.append({
            'Title': read_attr(item, 'result-title', 'innerText'),
            'Date': read_attr(item, 'result-date', 'datetime'),
            'Price': read_attr(item, 'result-price', 'innerText'),
            'Bedrooms': read_attr(item, 'housing', 'innerText'),
            'Link': read_attr(item, 'result-title', 'href'),
        })

    driver.close()
    df = pd.DataFrame(rows, columns=['Title', 'Date', 'Price', 'Bedrooms', 'Link'])
    df['Zipcode'] = int(link[-5:])  # the ZIP code is the last five characters of the link

    return df

In [20]:
# Work on a copy so the original `unidata` table keeps all of its columns
suggested = unidata.drop(columns=['HOUSENUM', 'URL', 'BIN', 'BBL'])
suggested

Unnamed: 0,Longitude,Latitude,NAME,STREETNAME,CITY,ZIP
0,-73.882627,40.767812,"College Of Aeronautics, Laguardia Airport",23 AVENUE,East Elmhurst,11369
1,-73.987663,40.695521,Nyc Technical College Cuny,JAY STREET,Brooklyn,11201
2,-73.983083,40.692306,Institute Of Design And Construction,WILLOUGHBY STREET,Brooklyn,11201
3,-73.985658,40.694632,Polytechnic University / Brooklyn-Metrotech Ca...,METROTECH CENTER,Brooklyn,11201
4,-73.961667,40.806815,Columbia University,AMSTERDAM AVENUE,New York,10027
5,-73.997878,40.732075,New York University,WASHINGTON SQUARE NORTH,New York,10011
6,-73.950263,40.819401,The City College of New York,CONVENT AVENUE,New York,10031


### The following code was used to scrape the listings with Selenium and save them to the CSV file loaded below
housing = pd.DataFrame()

for link in base_links:
    time.sleep(2)
    temp = getZipListings(link)
    housing = pd.concat([housing, temp])
    
housing = housing[['Zipcode', 'Date', 'Price', 'Bedrooms', 'Title', 'Link']]
housing.head()

housing.to_csv("housing.csv", sep=',', encoding='utf-8')

In [21]:
# on_bad_lines='skip' (pandas >= 1.3) replaces the removed error_bad_lines=False
housing = pd.read_csv('https://raw.githubusercontent.com/tvitalis/Coursera-Capstone/master/housing.csv', on_bad_lines='skip')
housing.tail(10)

Unnamed: 0.1,Unnamed: 0,Zipcode,Date,Price,Bedrooms,Title,Link
349,110,10011,2019-06-25 15:56,$3775,1br -,The Elusive West Village 1 bedroom. Renoed & B...,https://newyork.craigslist.org/mnh/fee/d/new-y...
350,111,10011,2019-06-24 19:38,$4500,1br -,SPECTACULAR CHELSEA 1BR-LAUNDRY-PRIV TERRACE-S...,https://newyork.craigslist.org/mnh/fee/d/new-y...
351,112,10011,2019-06-24 18:23,$2995,1br - 500ft2 -,Large Studio Alcove on 19th & 9th No-Fee July ...,https://newyork.craigslist.org/mnh/nfb/d/new-y...
352,113,10011,2019-06-24 17:05,$2900,,**BEAUTY IN THE HEART OF CHELSEA** **STUDIO W/...,https://newyork.craigslist.org/mnh/fee/d/new-y...
353,114,10011,2019-06-24 17:04,$7695,1br -,WEST VILLAGE- CHRISTOPHER ST-LUXURY LIFESTYLE,https://newyork.craigslist.org/mnh/nfb/d/new-y...
354,115,10011,2019-06-23 19:13,$5995,3br -,AMAZING/SPACIOUS/RENOVATED/PRIVATE BALCONY 3 B...,https://newyork.craigslist.org/mnh/nfb/d/new-y...
355,116,10011,2019-06-23 10:36,$2950,,"77 W.15St, SS Kit, Lg STUDIO, Balc, SS Kit, Gy...",https://newyork.craigslist.org/mnh/nfb/d/new-y...
356,117,10011,2019-06-22 10:19,$2950,,"77 W.15St, SS Kit, Lg STUDIO, Balc, SS Kit, Gy...",https://newyork.craigslist.org/mnh/nfb/d/new-y...
357,118,10011,2019-06-19 10:31,$1895,,!!GREAT DEAL!! !!STUDIO HEART OF CHELSEA!!,https://newyork.craigslist.org/mnh/fee/d/new-y...
358,119,10011,2019-06-18 16:08,$2995,1br -,2 x's as Big as any others. 700 Squarefeet in ...,https://newyork.craigslist.org/mnh/fee/d/new-y...


In [22]:
housing.shape

(359, 7)

In [23]:
# Strip currency symbols and separators, then convert each price to Decimal
housing['Price'] = housing['Price'].apply(lambda p: Decimal(sub(r'[^\d.]', '', p)))
    
housing.sort_values(by=['Price'], inplace=True)
price = housing['Price']
housing



Unnamed: 0.1,Unnamed: 0,Zipcode,Date,Price,Bedrooms,Title,Link
114,82,11201,2019-07-06 16:06,700,700ft2 -,Room for rent $700 ncludes all!,https://newyork.craigslist.org/mnh/fee/d/new-y...
80,48,11201,2019-07-12 00:42,745,1br -,Beautiful Available now Lovely One bedroom Apa...,https://newyork.craigslist.org/brk/nfb/d/brook...
201,49,10027,2019-07-04 23:27,995,2br -,great for Columbia and City College students,https://newyork.craigslist.org/mnh/nfb/d/new-y...
165,13,10027,2019-07-13 23:24,995,2br -,great for Columbia and City College students,https://newyork.craigslist.org/mnh/nfb/d/new-y...
192,40,10027,2019-07-09 17:44,1200,,Room for rent,https://newyork.craigslist.org/mnh/abo/d/room-...
20,20,11369,2019-07-08 12:42,1200,,large 2 bedroom for rent,https://newyork.craigslist.org/que/fee/d/east-...
237,85,10027,2019-06-14 23:14,1200,1br - 9ft2 -,4 BR apartment looks for the forth roomate,https://newyork.craigslist.org/mnh/nfb/d/new-y...
171,19,10027,2019-07-12 12:01,1250,3br -,1Bedroom AVAILABLE August 1st in 3bedroom/2bath,https://newyork.craigslist.org/mnh/abo/d/new-y...
153,1,10027,2019-07-14 12:52,1250,3br -,Spacious Bedroom Available in Harlem 3br/2bath,https://newyork.craigslist.org/mnh/abo/d/new-y...
10,10,11369,2019-07-13 10:26,1300,1br -,excelente apartamento de un cuarto/parted stud...,https://newyork.craigslist.org/que/abo/d/east-...


In [24]:
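# First character of the 'Bedrooms' text, e.g. '1br -' -> '1'.
# Caveat: area-only entries such as '1000ft2 -' also start with a digit.
housing['Bedroom_count'] = housing['Bedrooms'].str[0]
housingclear = housing.dropna().copy()
housingclear['Bedroom_count'] = housingclear['Bedroom_count'].astype(int)
housingclear = housingclear.sort_values(by=['Price'])
# Remove hand-picked rows by index label
housingclear.drop([114,256,108,147,144,136,166,71,240,347,267,249], inplace=True)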
housingclear



Unnamed: 0.1,Unnamed: 0,Zipcode,Date,Price,Bedrooms,Title,Link,Bedroom_count
80,48,11201,2019-07-12 00:42,745,1br -,Beautiful Available now Lovely One bedroom Apa...,https://newyork.craigslist.org/brk/nfb/d/brook...,1
201,49,10027,2019-07-04 23:27,995,2br -,great for Columbia and City College students,https://newyork.craigslist.org/mnh/nfb/d/new-y...,2
165,13,10027,2019-07-13 23:24,995,2br -,great for Columbia and City College students,https://newyork.craigslist.org/mnh/nfb/d/new-y...,2
237,85,10027,2019-06-14 23:14,1200,1br - 9ft2 -,4 BR apartment looks for the forth roomate,https://newyork.craigslist.org/mnh/nfb/d/new-y...,1
171,19,10027,2019-07-12 12:01,1250,3br -,1Bedroom AVAILABLE August 1st in 3bedroom/2bath,https://newyork.craigslist.org/mnh/abo/d/new-y...,3
153,1,10027,2019-07-14 12:52,1250,3br -,Spacious Bedroom Available in Harlem 3br/2bath,https://newyork.craigslist.org/mnh/abo/d/new-y...,3
10,10,11369,2019-07-13 10:26,1300,1br -,excelente apartamento de un cuarto/parted stud...,https://newyork.craigslist.org/que/abo/d/east-...,1
218,66,10027,2019-06-20 13:31,1300,1000ft2 -,NO FEE DORM STYLE STUDENT SUITE,https://newyork.craigslist.org/mnh/nfb/d/new-y...,1
236,84,10027,2019-06-15 10:38,1300,1000ft2 -,NO FEE DORM STYLE STUDENT SUITE,https://newyork.craigslist.org/mnh/nfb/d/new-y...,1
9,9,11369,2019-07-13 14:25,1350,1br -,$1350 for one bedroom Apt in east Elmhurst nea...,https://newyork.craigslist.org/que/abo/d/east-...,1


In [25]:
latitude= 40.7308619
longitude= -73.9871558 
map_points = folium.Map(location=[latitude, longitude], zoom_start=14)

#folium.GeoJson(
#    nyc_geojson,
#    name='nyc_geojson'
#).add_to(map_points)

for lat, lng, label in zip(mtastations['GTFS Latitude'], mtastations['GTFS Longitude'], mtastations['Stop Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_points)

map_points

In [26]:
latitude= 40.7308619
longitude= -73.9871558 
map_points2 = folium.Map(location=[latitude, longitude], zoom_start=14)

#folium.GeoJson(
#    nyc_geojson,
#    name='nyc_geojson'
#    ).add_to(map_points2)

for lat, lon, poi, cluster in zip(unidata['Latitude'], unidata['Longitude'], unidata['NAME'], unidata['ZIP']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='white',
        fill_opacity=0.7).add_to(map_points2)

map_points2

In [27]:
#from Nominatim
US11369 = [40.7624964954897, -73.8727912890767]
US11201 = [40.6928067, -73.9887616]
US10027 = [40.8088437, -73.9658566]
US10011 = [40.7408412934514, -73.9994557773656]

In [28]:
latitude= 40.7308619
longitude= -73.9871558

map_manhattan_rent = folium.Map(location=[latitude, longitude], zoom_start=12.4)

folium.CircleMarker(
    US11369,
    radius=10,
    popup="US11369",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_manhattan_rent)
    
folium.CircleMarker(
    US11201,
    radius=10,
    popup="US11201",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_manhattan_rent)

folium.CircleMarker(
    US10027,
    radius=10,
    popup="US10027",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_manhattan_rent) 
    
folium.CircleMarker(
    US10011,
    radius=10,
    popup="US10011",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(map_manhattan_rent)
    

map_manhattan_rent

In [29]:
latitude= 40.7308619
longitude= -73.9871558

nyc = folium.Map(location=[latitude, longitude], zoom_start=12.4)

folium.CircleMarker(
    US11369,
    radius=10,
    popup="US11369",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(nyc)
    
folium.CircleMarker(
    US11201,
    radius=10,
    popup="US11201",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(nyc)

folium.CircleMarker(
    US10027,
    radius=10,
    popup="US10027",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(nyc) 
    
folium.CircleMarker(
    US10011,
    radius=10,
    popup="US10011",
    color='yellow',
    fill=True,
    fill_color='purple',
    fill_opacity=0.7,
    parse_html=False).add_to(nyc)

for lat, lon, poi, cluster in zip(unidata['Latitude'], unidata['Longitude'], unidata['NAME'], unidata['ZIP']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='white',
        fill_opacity=0.7).add_to(nyc)
    
for lat, lng, label in zip(mtastations['GTFS Latitude'], mtastations['GTFS Longitude'], mtastations['Stop Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(nyc)
    

nyc

## Looking at the map produced, we see that apartments in ZIP codes 10027 (near Columbia University) and 11201 (near CUNY and MetroTech) are the best options

The prices fetched from Craigslist should also be taken into consideration. There are more apartments in the cheaper price range (under $2400) near 10027 (Columbia University), which makes it an ideal location for studying and renting.
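
The comparison of affordable listings per ZIP code can be made explicit with a short count. The sketch below is only an illustration: `affordable_counts` is a hypothetical helper, and the sample pairs are a handful of rows copied from the tables shown earlier, not the full dataset of 359 listings.

```python
from collections import Counter

MAX_RENT_USD = 2400  # budget from the requirements


def affordable_counts(listings, max_rent=MAX_RENT_USD):
    """Count listings per ZIP code whose rent is at or below max_rent."""
    return Counter(zip_code for zip_code, rent in listings if rent <= max_rent)


# (ZIP code, monthly rent) pairs taken from a few displayed rows; a partial sample only.
sample = [
    (11201, 745), (10027, 995), (10027, 1250), (11369, 1300),
    (10011, 3775), (10011, 2995), (10011, 1895),
]
counts = affordable_counts(sample)
print(counts.most_common())
```

Running the same count on the cleaned listings table would give the per-ZIP figures behind the statement above.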

Finally, we should take a look at the neighbourhood's points of interest (POIs).

In [30]:
i=1
LIMIT = 500
radius = 1500*i
neighborhood_latitude=40.6928067
neighborhood_longitude=-73.9887616
nearby_venues_all_11201=[]
nearby_venues_all_11201= pd.DataFrame()

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

# Pentagon Crunch
# Note: `results` was fetched once above, so each pass re-processes the same response.
while (i*100) <= LIMIT:
    
    neighborhood_latitude = (np.sin(i*1.256)*0.001) + neighborhood_latitude
    
    neighborhood_longitude = (np.cos(i*1.256)*0.001) + neighborhood_longitude
    
    venues = results['response']['groups'][0]['items']
    
    nearby_venues = json_normalize(venues)

    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]

    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    
    nearby_venues_all_11201 = pd.concat([nearby_venues_all_11201, nearby_venues])
    
    nearby_venues_all_11201.drop_duplicates(inplace=True)
    
    print("Pentagon point", i, neighborhood_latitude,',', neighborhood_longitude)
            
    i=i+1

nearby_venues_all_11201.head(10)

Pentagon point 1 40.69375755946051 , -73.98845197718694
Pentagon point 2 40.69434637502248 , -73.98926024461421
Pentagon point 3 40.693760137023304 , -73.99007038349635
Pentagon point 4 40.69280829614449 , -73.98976379102838
Pentagon point 5 40.6928051108427 , -73.98876379610147




Unnamed: 0,name,categories,lat,lng
0,SoulCycle Brooklyn Heights,Cycle Studio,40.692253,-73.991042
1,Xi'an Famous Foods,Chinese Restaurant,40.69217,-73.98673
2,Shake Shack,Burger Joint,40.692122,-73.988606
3,New York Transit Museum,History Museum,40.690469,-73.989963
4,Perelandra Natural Foods,Grocery Store,40.69338,-73.991341
5,Equinox Brooklyn Heights,Gym,40.69253,-73.991587
6,Borough Hall Greenmarket,Farmers Market,40.693707,-73.990321
7,Sophies Cuban Cuisine,Cuban Restaurant,40.690602,-73.9877
8,Yoga Pole Studio,Yoga Studio,40.690993,-73.9918
9,Damascus Bread & Pastry Shop,Bakery,40.690047,-73.993054


In [31]:
nearby_venues_all_11201['categories'].value_counts().nlargest(12)

Coffee Shop             8
Cocktail Bar            5
Grocery Store           5
Bar                     4
Italian Restaurant      3
Pizza Place             3
Bakery                  3
Yoga Studio             3
Gym / Fitness Center    3
Chinese Restaurant      2
Japanese Restaurant     2
French Restaurant       2
Name: categories, dtype: int64

In [32]:
i=1
LIMIT = 500
radius = 1500*i
neighborhood_latitude=40.8088437
neighborhood_longitude=-73.9658566
nearby_venues_all_10027=[]
nearby_venues_all_10027= pd.DataFrame()

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()

# Pentagon Crunch
# Note: `results` was fetched once above, so each pass re-processes the same response.
while (i*100) <= LIMIT:
    
    neighborhood_latitude = (np.sin(i*1.256)*0.001) + neighborhood_latitude
    
    neighborhood_longitude = (np.cos(i*1.256)*0.001) + neighborhood_longitude
    
    venues = results['response']['groups'][0]['items']
    
    nearby_venues = json_normalize(venues)

    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]

    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    
    nearby_venues_all_10027 = pd.concat([nearby_venues_all_10027, nearby_venues])
    
    nearby_venues_all_10027.drop_duplicates(inplace=True)
    
    print("Pentagon point", i, neighborhood_latitude,',', neighborhood_longitude)
            
    i=i+1

nearby_venues_all_10027.head(10)

Pentagon point 1 40.80979455946051 , -73.96554697718693
Pentagon point 2 40.810383375022475 , -73.9663552446142
Pentagon point 3 40.8097971370233 , -73.96716538349634
Pentagon point 4 40.80884529614449 , -73.96685879102837
Pentagon point 5 40.8088421108427 , -73.96585879610146




Unnamed: 0,name,categories,lat,lng
0,Riverside Park,Park,40.806809,-73.968651
1,Riverside Park @ 115th St.,Park,40.80664,-73.966514
2,Book Culture,Bookstore,40.806629,-73.96494
3,Blue Bottle Coffee,Coffee Shop,40.80629,-73.965524
4,Shake Shack,Burger Joint,40.807933,-73.964013
5,Alma Mater Statue,Outdoor Sculpture,40.807726,-73.962252
6,Milano Market,Sandwich Place,40.805848,-73.965424
7,Hex & Company,Coffee Shop,40.805266,-73.966203
8,Columbia Greenmarket,Farmers Market,40.807195,-73.964335
9,Riverside Park 119th Street Tennis Courts,Tennis Court,40.811358,-73.965748


In [33]:
nearby_venues_all_10027['categories'].value_counts().nlargest(12)

Coffee Shop            9
Italian Restaurant     6
Park                   5
American Restaurant    4
Mexican Restaurant     4
Playground             3
Bakery                 3
Wine Shop              3
Bar                    3
Farmers Market         2
Gastropub              2
Indian Restaurant      2
Name: categories, dtype: int64
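
One requirement is that the top amenities of the new neighbourhood resemble those of the current residence. A simple way to quantify this, assumed here and not used in the notebook, is the Jaccard similarity of the top-12 category lists printed above. It ignores the counts and is only one signal next to price and subway access.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Top-12 categories copied from the value_counts outputs above.
home = ["Dessert Shop", "Park", "Café", "Bar", "Coffee Shop", "Bakery",
        "Meze Restaurant", "Restaurant", "Greek Restaurant",
        "Fast Food Restaurant", "Cocktail Bar", "Gym / Fitness Center"]
zip_11201 = ["Coffee Shop", "Cocktail Bar", "Grocery Store", "Bar",
             "Italian Restaurant", "Pizza Place", "Bakery", "Yoga Studio",
             "Gym / Fitness Center", "Chinese Restaurant",
             "Japanese Restaurant", "French Restaurant"]
zip_10027 = ["Coffee Shop", "Italian Restaurant", "Park", "American Restaurant",
             "Mexican Restaurant", "Playground", "Bakery", "Wine Shop", "Bar",
             "Farmers Market", "Gastropub", "Indian Restaurant"]

for label, categories in [("11201", zip_11201), ("10027", zip_10027)]:
    print(label, round(jaccard(home, categories), 3))
```

A count-weighted measure, such as cosine similarity on the full category counts, would be a natural refinement.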

## 4. Results

In today's rising market, students need a holistic approach to choosing a university for undergraduate or postgraduate studies.
Life in large cities requires daily use of transit and easy access to one's own home.
This can strongly affect quality of life, especially in crowded cities with long distances and for people with demanding schedules. This project is dedicated to them.
Based on the results above, Columbia University, located in Upper Manhattan (ZIP code 10027), is the clear winner.
The location provides the best combination of neighbourhood POIs, wide availability of cheap housing and good access to MTA subway stations.
All these features make this location ideal for studying and living.

## 5. Discussion

##### This is the discussion pane!

#### Note 1: The algorithm for parsing Foursquare API data was optimized using a five-point pentagon data crunch. The code uses a predefined circular radius to collect data from points within the circle and combines them into a single table. Duplicates are removed from the results. In this case the algorithm performed similarly to or better than using a single point (due to the extended range it covers). Geodata acquisition is an interesting topic, and approaches like this one could yield considerably better results.

## 6. Conclusion

This section concludes the Coursera Capstone project. With this project I have been able to understand and make progress in Data Science. The capstone project was a great opportunity to formulate and analyze a real-life problem that troubles thousands of students worldwide every day.
The code in this project can easily be applied to other problems, and adapted to model student housing in other cities.