# Applied Data Science Capstone
Zeiad Wael Sabra 4/5/2019

## Analysis of New York Airbnb Prices

![](https://raw.githubusercontent.com/OZWSO/Coursera_Capstone/master/NY/NY.PNG)


### Table of Contents
1. [Introduction](#introduction)
2. [Business Plan](#business_paln)
3. [Data Selection](#data_selection)
4. [Data Exploration](#data_exploration)
5. [Next](#next)

### Introduction <a name="introduction"></a>
New York City is a major tourist destination visited by millions of people each year, which makes finding a place to stay difficult and expensive.
RightPrice is a website that recommends prices for Airbnb listings. Is the place you plan to stay at overpriced? Head to RightPrice to check whether the price is right or whether you are being scammed.

### Business Plan <a name=business_paln></a>

RightPrice wants us to build a model that advises tourists visiting New York City on the optimal price for a place to stay.
A tourist provides us with information about the place, and we provide them with the optimal price using our model.
Our goal is to build a model that gives an estimate of the rent of a place in New York City using available data.  

The desired outcomes are:   
* A model for calculating rental prices.
* A description of the most relevant features of the model.
* A clustering of the neighbourhoods based on rent, venues, and location.
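
As a rough illustration of the first outcome, one simple baseline (an assumption here, not the model this project will build) is the median price per neighbourhood, against which an asking price can be compared. The helper names `median_price_by_neighbourhood` and `price_ratio` and the sample prices are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_price_by_neighbourhood(listings):
    """Return {neighbourhood: median price} for (neighbourhood, price) pairs."""
    prices = defaultdict(list)
    for neighbourhood, price in listings:
        prices[neighbourhood].append(price)
    return {name: median(values) for name, values in prices.items()}


def price_ratio(asking_price, neighbourhood, baseline):
    """Ratio of the asking price to the baseline (> 1 means above the median)."""
    return asking_price / baseline[neighbourhood]


# Made-up prices for illustration only; not taken from the Airbnb dataset.
sample = [("Allerton", 60.0), ("Allerton", 80.0), ("Allerton", 100.0),
          ("Arrochar", 90.0), ("Arrochar", 110.0)]
baseline = median_price_by_neighbourhood(sample)
print(baseline)
print(price_ratio(120.0, "Allerton", baseline))
```

A ratio well above 1 would suggest that a listing is priced above what is typical for its neighbourhood. A real model would also account for features such as room type and nearby venues.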

### Data Selection <a name=data_selection></a>
We used http://data.beta.nyc as the source of our data.

1. Geospatial Data
   * The data for the geolocations and boundaries of New York's neighbourhoods was downloaded from <a href=http://data.beta.nyc//dataset/0ff93d2d-90ba-457c-9f7e-39e47bf2ac5f/resource/35dd04fb-81b3-479b-a074-a27a37888ce7/download/d085e2f8d0b54d4590b1e7d1f35594c1pediacitiesnycneighborhoods.geojson>here</a>.
   * The data consists of neighbourhood names, borough names, neighbourhood boundaries, and some other columns.  
   
   
2. Airbnb Rental data
   * The rental data was downloaded from <a href= http://data.insideairbnb.com/united-states/ny/new-york-city/2015-05-01/data/listings.csv.gz>here</a>  
   * We chose the listings for May 2015, as it is the latest data available on <a href=http://data.beta.nyc>data.beta.nyc</a>    
   
   
3. Foursquare API  
   * We are going to use the Foursquare API to explore the nearby venues available around each listing of the Airbnb dataset and see how they affect the price of the listing.


### Data Exploration <a name=data_exploration></a>
In this early stage, we are just going to load the data, look at it, and make some visualizations.

In [1]:
# Import pandas, NumPy, and the mapping and HTTP libraries used below
import pandas as pd
import numpy as np
from urllib.request import urlopen
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
import matplotlib.colors as colors
import geopandas as gpd
import requests


Loading the geospatial data of New York City.

In [2]:
gdf = gpd.read_file("pediacitiesnycneighborhoods.geojson")
gdf.head()

| | neighborhood | boroughCode | borough | @id | geometry |
|---|---|---|---|---|---|
| 0 | Allerton | 2 | Bronx | http://nyc.pediacities.com/Resource/Neighborho... | POLYGON ((-73.84859700000018 40.87167000000012... |
| 1 | Alley Pond Park | 4 | Queens | http://nyc.pediacities.com/Resource/Neighborho... | POLYGON ((-73.74333268196389 40.7388830992604,... |
| 2 | Arden Heights | 5 | Staten Island | http://nyc.pediacities.com/Resource/Neighborho... | POLYGON ((-74.169827 40.56107800000017, -74.16... |
| 3 | Arlington | 5 | Staten Island | http://nyc.pediacities.com/Resource/Neighborho... | POLYGON ((-74.15974815874296 40.64141652579018... |
| 4 | Arrochar | 5 | Staten Island | http://nyc.pediacities.com/Resource/Neighborho... | POLYGON ((-74.06077989345394 40.59318800468343... |


Assigning a color to each borough

In [3]:
x = {"Bronx":'red', "Manhattan":'blue', "Brooklyn":'green', "Queens":'orange', "Staten Island":'yellow'}
gdf["color"] = gdf["borough"].apply(lambda i : x[i])

Getting the latitude and longitude of New York City

In [4]:
address = 'New York, NY'
geolocator = Nominatim(user_agent="UK_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

Creating a map of the neighbourhoods of New York City, color-coded by borough

In [5]:
m = folium.Map(location=[latitude, longitude], zoom_start=10, control_scale=False)
get_color = lambda x: x['properties']['color']
folium.GeoJson(
    gdf,
    style_function=lambda C: {
        'fillColor': get_color(C),
        'color': 'white',
        'weight': 1,
        'fillOpacity': 0.5,
    },
    tooltip=folium.features.GeoJsonTooltip(
        fields=['neighborhood', 'borough'],
        aliases=['', ''],
    ),
).add_to(m)
m

Loading the Airbnb rental data.

In [6]:
airbnb_rental = pd.read_csv("NY/listings.csv.gz", compression='gzip')
airbnb_rental.head()

| | id | listing_url | scrape_id | last_scraped | name | summary | space | description | picture_url | host_id | ... | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | requires_license | license | jurisdiction_names | calculated_host_listings_count | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1533652 | https://www.airbnb.com/rooms/1533652 | 20150501110800 | 2015-05-02 | Charming Studio - Central Park | Stay at my charming Central Park studio apartm... | Charming, sunny studio apartment 1/2 block fro... | Stay at my charming Central Park studio apartm... | https://a1.muscache.com/ic/pictures/66278406/8... | 8178950 | ... | 10.0 | 10.0 | 10.0 | 10.0 | 9.0 | f | | | 1 | 1.1 |
| 1 | 3423077 | https://www.airbnb.com/rooms/3423077 | 20150501110800 | 2015-05-02 | Rockaway Bungalow by the Bay | Situated on a quiet block in the Rockaways our... | This is a real home lovingly re-built after Hu... | Situated on a quiet block in the Rockaways our... | https://a2.muscache.com/ic/pictures/43320896/6... | 17253913 | ... | 8.0 | 10.0 | 9.0 | 9.0 | 9.0 | f | | | 1 | 1.5 |
| 2 | 326908 | https://www.airbnb.com/rooms/326908 | 20150501110800 | 2015-05-03 | Cozy Mexican Inspired Private Room | | Hi There, I'm Michelle and I am excited to sh... | Hi There, I'm Michelle and I am excited to sh... | https://a1.muscache.com/ic/pictures/3547451/b3... | 1288422 | ... | 9.0 | 10.0 | 10.0 | 10.0 | 9.0 | f | | | 1 | 2.4 |
| 3 | 4625178 | https://www.airbnb.com/rooms/4625178 | 20150501110800 | 2015-05-02 | Modern 1BD with exposed brick | Newly renovated 1BD features dark hardwood, e... | | Newly renovated 1BD features dark hardwood, e... | https://a0.muscache.com/ic/pictures/60151768/8... | 8315139 | ... | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | f | | | 1 | 0.5 |
| 4 | 3614041 | https://www.airbnb.com/rooms/3614041 | 20150501110800 | 2015-05-02 | Manhattan Cozy 1BR Apartment $60 | Cozy apartment, top of Manhattan 225th and Bro... | | Cozy apartment, top of Manhattan 225th and Bro... | https://a1.muscache.com/ic/pictures/45521304/2... | 18210143 | ... | 7.0 | 7.0 | 9.0 | 9.0 | 8.0 | f | | | 1 | 0.2 |


In [7]:
airbnb_rental.shape

(27319, 68)

In [8]:
airbnb_rental.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
       'space', 'description', 'picture_url', 'host_id', 'host_url',
       'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_picture_url', 'street', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'city',
       'state', 'zipcode', 'market', 'country', 'latitude', 'longitude',
       'is_location_exact', 'property_type', 'room_type', 'accommodates',
       'bathrooms', 'bedrooms', 'beds', 'bed_type', 'square_feet', 'price',
       'weekly_price', 'monthly_price', 'guests_included', 'extra_people',
       'minimum_nights', 'maximum_nights', 'calendar_updated',
       'availability_30', 'availability_60', 'availability_90',
       'availability_365', 'calendar_last_scraped', 'number_of_reviews',
       'first_review', 'last_review', 'review_scores_rating',
       

In [9]:
airbnb_rental.latitude[0]

40.7815607857965

Plotting only the first 100 listings, as there are far too many listings to put them all on the map.

In [10]:
for i in range(100):
    row = airbnb_rental.iloc[i, :]
    lat, long = row["latitude"], row["longitude"]
    folium.Circle(
            radius=10,
            location=[lat, long],
            popup=row["price"],
            color='black',
            fill=True
    ).add_to(m)
m
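
The first 100 rows are not necessarily spread evenly across the city. One possible alternative, sketched here as an assumption rather than something the notebook does, is to draw 100 row positions at random with a fixed seed. The helper name `sample_row_positions` is hypothetical.

```python
import random


def sample_row_positions(n_rows, k=100, seed=0):
    """Pick up to k distinct row positions uniformly at random from range(n_rows)."""
    rng = random.Random(seed)  # fixed seed so the same sample is drawn on every run
    return sorted(rng.sample(range(n_rows), min(k, n_rows)))


# 27319 is the row count reported by airbnb_rental.shape
positions = sample_row_positions(27319)
print(len(positions), positions[:5])
```

The plotting loop could then iterate over `airbnb_rental.iloc[positions]` instead of the first 100 rows.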

Foursquare credentials

In [11]:
CLIENT_ID = '#' # your Foursquare ID
CLIENT_SECRET = '#' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 15

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # print each listing name to track progress
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Place', 
                  'Place Latitude', 
                  'Place Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
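
`getNearbyVenues` indexes directly into the decoded response, so a response without `groups` or a venue with an empty `categories` list would raise an exception. The following is a minimal sketch of a more defensive extraction step. The helper name `extract_venue_rows` is hypothetical, and the payload layout is inferred only from the keys that `getNearbyVenues` reads, not from the Foursquare documentation.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one decoded 'explore' response into row tuples.

    Missing keys yield None (or an empty result) instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Fall back to None when a venue has no category.
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written payload for illustration; not a real API response.
fake = {"response": {"groups": [{"items": [
    {"venue": {"name": "Hayden Planetarium",
               "location": {"lat": 40.781718, "lng": -73.973239},
               "categories": [{"name": "Planetarium"}]}},
    {"venue": {"name": "Venue without category",
               "location": {"lat": 40.78, "lng": -73.97},
               "categories": []}},
]}]}}
rows = extract_venue_rows(fake, "Charming Studio - Central Park", 40.781561, -73.971238)
print(rows)
print(extract_venue_rows({}, "x", 0.0, 0.0))  # no groups: empty list
```

Each tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues`.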

A sample of the data to be used from the Foursquare API

In [13]:
df_venues = getNearbyVenues(
    names=airbnb_rental['name'].iloc[0:10],
    latitudes=airbnb_rental['latitude'].iloc[0:10],
    longitudes=airbnb_rental['longitude'].iloc[0:10],
)
df_venues.head()

Charming Studio - Central Park
Rockaway Bungalow by the Bay
Cozy Mexican Inspired Private Room
Modern 1BD with exposed brick
Manhattan Cozy 1BR Apartment $60
NYC West Side Apt Share*mini hostel
X - Large Studio Apt on the UES 
CENTRAL PARK SOUTH Cheap and Chic
Chic BDR 1 block from Central Park
Modern 2BR overlooking Central Park


| | Place | Place Latitude | Place Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Charming Studio - Central Park | 40.781561 | -73.971238 | American Museum of Natural History | 40.781282 | -73.973238 | Science Museum |
| 1 | Charming Studio - Central Park | 40.781561 | -73.971238 | Hayden Planetarium | 40.781718 | -73.973239 | Planetarium |
| 2 | Charming Studio - Central Park | 40.781561 | -73.971238 | American Museum of Natural History Museum Shop | 40.780973 | -73.973028 | Souvenir Shop |
| 3 | Charming Studio - Central Park | 40.781561 | -73.971238 | Rose Center for Earth and Space | 40.781741 | -73.973127 | Planetarium |
| 4 | Charming Studio - Central Park | 40.781561 | -73.971238 | Shakespeare Garden | 40.779755 | -73.969976 | Garden |


### Next <a name=next></a>

Since there are so many listings, and some of them are very close together, requesting nearby venues from the Foursquare API for every listing would be difficult. Instead, we will request the available venues for each neighbourhood and then assign each listing to its neighbourhood. We will then create a model that predicts the optimal price for that neighbourhood.
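
Assigning a listing to a neighbourhood amounts to a point-in-polygon test against the neighbourhood boundaries. The following is a minimal sketch using ray casting in plain Python, assuming each boundary is a single ring of (longitude, latitude) vertices with no holes. The helper names and the two rectangular "neighbourhoods" are hypothetical.

```python
def point_in_polygon(lng, lat, ring):
    """Ray-casting test: is (lng, lat) inside the ring of (lng, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def assign_neighbourhood(lng, lat, neighbourhoods):
    """Return the first neighbourhood whose ring contains the point, else None."""
    for name, ring in neighbourhoods.items():
        if point_in_polygon(lng, lat, ring):
            return name
    return None


# Two made-up rectangles; real boundaries would come from the GeoJSON file.
toy = {
    "West": [(-74.0, 40.7), (-73.9, 40.7), (-73.9, 40.8), (-74.0, 40.8)],
    "East": [(-73.9, 40.7), (-73.8, 40.7), (-73.8, 40.8), (-73.9, 40.8)],
}
print(assign_neighbourhood(-73.95, 40.75, toy))
print(assign_neighbourhood(-73.85, 40.75, toy))
print(assign_neighbourhood(-73.70, 40.75, toy))
```

Since the boundaries are already loaded into a GeoDataFrame, a spatial join with `geopandas.sjoin` would likely be the more practical way to do this for all listings at once. The sketch only shows the underlying idea.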