# The Battle of Neighborhoods (Week 1)

## Part 1 : Introduction and Data Sections

In [1]:
import numpy as np # library to handle data in a vectorized manner
import time
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import requests # library to handle HTTP requests
try:
    from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize # fallback for older pandas versions

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


**REPORT CONTENT**

**1. Introduction section**: Discussion of the business problem and the audience interested in this project.

**2. Data section**: Description of the data that will be used to solve the problem, and its sources.

**3. Methodology section**: Discussion and description of the exploratory data analysis carried out, any inferential statistical testing performed, and any machine learning methods used, establishing the strategy and purposes.

**4. Results section**: Discussion of the results.

**5. Discussion section**: Elaboration and discussion of any observations noted and any recommendations suggested based on the results.

**6. Conclusion section**: Report conclusion.


## 1. Introduction Section :

**Discussion of the business problem and the audience who would be interested in this project.**

**Description of the Problem and Background**

**Scenario:**

I am a business analyst living in Southbank, Melbourne. My home is within walking distance of the Melbourne CBD and Flinders Street station, and I enjoy the many amenities and venues in the area, such as international restaurants, cafes, shopping malls and entertainment. I currently work at a bank within walking distance of my home. However, I have been offered a role at a leading consulting firm in Manhattan, NY, where my business analysis skills and newly acquired data science skills would support their business analytics team. I am unsure about taking the role, so I am dedicating my final Coursera project to investigating the advantages and disadvantages of accepting the offer.

The key question is: how much would I need to spend, compared to Melbourne and considering the salary difference, to find an apartment with my girlfriend that is as convenient and affordable as my current one? I could use real estate websites and Google, but the idea is to apply the tools learned during the course myself. To compare and evaluate rental options in Manhattan, I need a baseline, so the apartment in Manhattan must meet the following requirements:

- The apartment must have 2 or 3 bedrooms.
- The desired location is near a subway (metro) station in Manhattan, within a 1.5 km radius or less than 30 minutes' walk.
- Rent must not exceed $6,000 per month.
- The top amenities in the selected neighborhood should be similar to those around my current residence.
- It is desirable to have venues such as coffee shops, Asian and Japanese/Korean restaurants, a gym and grocery stores nearby.
- As a reference, I have included a map of venues near my current residence in Melbourne.
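The hard requirements above can be expressed as a simple filter. The following is only a minimal sketch: the `Listing` fields, the constant names and the sample listings are all hypothetical placeholders, not data used in this project.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements list above.
MAX_RENT_USD = 6000
MAX_SUBWAY_KM = 1.5
ALLOWED_BEDROOMS = {2, 3}


@dataclass
class Listing:
    """Hypothetical shape of one rental listing."""
    address: str
    bedrooms: int
    monthly_rent_usd: float
    km_to_nearest_subway: float


def meets_requirements(listing):
    """Return True if the listing satisfies bedrooms, rent and subway distance."""
    return (
        listing.bedrooms in ALLOWED_BEDROOMS
        and listing.monthly_rent_usd <= MAX_RENT_USD
        and listing.km_to_nearest_subway <= MAX_SUBWAY_KM
    )


# Made-up listings, for illustration only.
listings = [
    Listing("Example St 1", 2, 5500, 0.4),
    Listing("Example St 2", 1, 3200, 0.2),   # too few bedrooms
    Listing("Example St 3", 3, 7200, 0.9),   # over budget
    Listing("Example St 4", 3, 5900, 2.1),   # too far from the subway
]

shortlist = [item.address for item in listings if meets_requirements(item)]
print(shortlist)
```

The softer criteria (similar amenities, nearby coffee shops and restaurants) are not captured by this filter; they depend on the venue data described in the Data section.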

**Business Problem:**

The challenge is to find a suitable apartment for rent in Manhattan NY that complies with the demands on location, price and venues. The data required to resolve this challenge is described in section 2 below.

**Interested Audience**

I believe this is a relevant challenge, with valid questions for anyone moving to another large city in the US, EU or Asia. The same methodology can be applied with the requirements adjusted as needed. The case is also applicable to anyone exploring where to start or locate a new business in any city. Lastly, it can serve as a good practical exercise for developing data science skills.


## 2. Data Section:

**Description of the data and its sources that will be used to solve the problem <br> Description of the Data:**

**The following data is required to answer the issues of the problem:**

- List of boroughs and neighborhoods of Manhattan with their geodata (latitude and longitude)
- List of subway (metro) stations in Manhattan with their locations
- List of apartments for rent in the Manhattan area with their addresses and prices
- Preferably, a list of apartments for rent with additional information, such as price, address, area and number of bedrooms
- Venues for each Manhattan neighborhood (which can be clustered)
- Venues for subway stations

**How the data will be used to solve the problem**

**The data will be used as follows:**

- Use Foursquare and geopy data to map the top 10 venues for all Manhattan neighborhoods, clustered in groups (as per the course lab).
- Use Foursquare and geopy data to map the location of subway stations, both separately and on top of the clustered map above, in order to identify the venues and amenities near each station or to explore each subway location separately.
- Use Foursquare and geopy data to map the location of rental places, linked in some form to the subway locations.
- Create a map that depicts, for instance, the average rental price per square foot within a 1.5 km radius of each subway station, or a similar metric. Pointing to the popups will then quickly show the relative price per subway area.
- Addresses of rental locations will be converted to geodata (lat, long) using geopy's Nominatim geocoder, with geopy.distance available for measuring distances between points.
- Data will be sought from open data sources where available: real estate sites if they permit reading, libraries, or government agencies such as the New York MTA.
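Several of the steps above depend on deciding whether a point lies within 1.5 km of another. A minimal sketch of that check follows, using the haversine great-circle formula from the standard library rather than geopy. The function names are illustrative, and the two sample points are the geocoded and manually set Southbank coordinates that appear later in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(point, center, radius_km=1.5):
    """Return True if `point` is at most `radius_km` from `center`."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


geocoded_home = (-37.8210248, 144.965629)
query_point = (-37.8212898, 144.9644173)
print(round(haversine_km(*geocoded_home, *query_point), 3), "km apart")
print(within_radius(query_point, geocoded_home))
```

The haversine formula treats the Earth as a sphere, which is accurate enough at neighborhood scale; geopy's distance module offers ellipsoidal distances if more precision is needed.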

**Processing this data will make it possible to answer the key questions behind the decision:**

- What is the cost of rent (per square foot) within a one-mile radius of each subway station?
- Which area of Manhattan has the best rental pricing that meets the established criteria?
- What is the distance between the workplace (Park Ave and 53rd St) and the tentative future home?
- What are the venues around the two best places to live? How do the prices compare?
- How are venues distributed among Manhattan neighborhoods and around subway stations?
- Are there tradeoffs between size, price and location?
- Are there any other interesting statistical findings in the real estate and overall data?
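The first question, rent per square foot around each station, reduces to a grouped average once every listing has been assigned to a station. The sketch below assumes that assignment has already been made; the dictionary keys, station names and figures are hypothetical.

```python
from collections import defaultdict


def mean_rent_per_sqft(listings):
    """Average the per-listing rent/area ratio for each station."""
    ratios_by_station = defaultdict(list)
    for item in listings:
        ratios_by_station[item["station"]].append(item["rent_usd"] / item["area_sqft"])
    return {
        station: sum(ratios) / len(ratios)
        for station, ratios in ratios_by_station.items()
    }


# Made-up listings, for illustration only.
sample_listings = [
    {"station": "Station A", "rent_usd": 5000, "area_sqft": 1000},
    {"station": "Station A", "rent_usd": 6000, "area_sqft": 1000},
    {"station": "Station B", "rent_usd": 4500, "area_sqft": 900},
]

averages = mean_rent_per_sqft(sample_listings)
print(averages)
```

This averages the ratio of each listing, so small and large apartments weigh equally. Dividing total rent by total area per station is an alternative that weights larger apartments more heavily.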



## Reference of venues around current residence in Melbourne for comparison to Manhattan place

In [2]:
# Riverside Quay Southbank (Riverside Quay) Southbank VIC 3006 Australia
address = 'Riverside Quay, Southbank'

# recent geopy versions require a user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='melbourne_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Melbourne home are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Melbourne home are -37.8210248, 144.965629.


In [3]:
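# coordinates used for the Foursquare query, set manually (close to, but not identical to, the geocoded values above)
neighborhood_latitude = -37.8212898
neighborhood_longitude = 144.9644173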

In [4]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API
radius = 500 # search radius in meters
CLIENT_ID = 'CETGLD3C3BHQLWQYAVNIGTTHRNOCSMO3GDQ1KYN3V0RPQZVA' # My Foursquare ID
CLIENT_SECRET = 'X5CYNQD03LMAYUQNXYVHWUSI3D24LHWL1MN12OVWN5HKJP3K' # My Foursquare Secret
VERSION = '20180604'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: CETGLD3C3BHQLWQYAVNIGTTHRNOCSMO3GDQ1KYN3V0RPQZVA
CLIENT_SECRET: X5CYNQD03LMAYUQNXYVHWUSI3D24LHWL1MN12OVWN5HKJP3K
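Printing API credentials in a shared notebook exposes them to every reader. One common alternative, sketched here under assumptions, is to read them from environment variables and mask them before display. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping (the process environment by default)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(secret, visible=4):
    """Show only the first `visible` characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-client-id",
    "FOURSQUARE_CLIENT_SECRET": "s3cr3t-value",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In a real session the function would be called without arguments, after the two variables have been set outside the notebook.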


In [5]:
# build the request URL from the variables defined above instead of hard-coding the values
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=CETGLD3C3BHQLWQYAVNIGTTHRNOCSMO3GDQ1KYN3V0RPQZVA&client_secret=X5CYNQD03LMAYUQNXYVHWUSI3D24LHWL1MN12OVWN5HKJP3K&v=20180604&ll=-37.8212898,144.9644173&radius=500&limit=100'
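The query string can also be assembled with the standard library's `urlencode`, which handles escaping and keeps each parameter visible by name. This is a sketch only: `build_explore_url` is a hypothetical helper, and the credentials shown are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180604", radius=500, limit=100):
    """Assemble a venues/explore URL with the same parameters used in this notebook."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", -37.8212898, 144.9644173
)
print(example_url)
```

Alternatively, `requests.get` accepts the same dictionary through its `params` argument and builds the query string itself.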

In [6]:
results = requests.get(url).json()
#results
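Before indexing into the response, it can help to confirm that the request succeeded. The sketch below assumes the parsed JSON carries a status under `meta.code` and the venues under `response.groups[0].items`, which is the path used in the next cells. `extract_venue_items` is a hypothetical helper, and the sample payloads are made up.

```python
def extract_venue_items(payload):
    """Return the list of venue items from a parsed explore response.

    Raises ValueError if the reported status code is not 200, and returns
    an empty list if the response contains no groups.
    """
    meta_code = payload.get("meta", {}).get("code")
    if meta_code != 200:
        raise ValueError("Foursquare request failed with meta code {}".format(meta_code))
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    return groups[0].get("items", [])


# Made-up payloads, for illustration only.
ok_payload = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [{"venue": {"name": "Example Venue"}}]}]},
}
empty_payload = {"meta": {"code": 200}, "response": {"groups": []}}

print(len(extract_venue_items(ok_payload)))
print(extract_venue_items(empty_payload))
```

With this helper, `venues = extract_venue_items(results)` would fail with a readable message on a bad request instead of a `KeyError` deep in the response.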

In [7]:
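# function that extracts the category of the venue
def get_category_type(row):
    """Return the name of the venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # rows from the flattened JSON use the prefixed column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']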

In [8]:
venues = results['response']['groups'][0]['items']

SBnearby_venues = json_normalize(venues) # flatten JSON

# keep only the columns of interest
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
SBnearby_venues = SBnearby_venues.loc[:, filtered_columns]

# replace each row's category list with the name of its first category
SBnearby_venues['venue.categories'] = SBnearby_venues.apply(get_category_type, axis=1)

# clean columns
SBnearby_venues.columns = [col.split(".")[-1] for col in SBnearby_venues.columns]

SBnearby_venues.head(10)

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Southbank Promenade | Pedestrian Plaza | -37.819959 | 144.965467 |
| 1 | Yarra River | River | -37.819684 | 144.965115 |
| 2 | Broad Bean Organic Grocer | Grocery Store | -37.822588 | 144.966912 |
| 3 | Arbory Bar & Eatery | Bar | -37.81902 | 144.966004 |
| 4 | Candela Nuevo | Cocktail Bar | -37.818614 | 144.962604 |
| 5 | Crown Promenade | Plaza | -37.821669 | 144.960354 |
| 6 | Arbory Afloat | Bar | -37.81943 | 144.966338 |
| 7 | Melbourne Recital Centre | Performing Arts Venue | -37.823685 | 144.967599 |
| 8 | Boccata Italian Deli | Café | -37.822355 | 144.962986 |
| 9 | Arts Centre Melbourne | Opera House | -37.82162 | 144.968808 |
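To compare this reference profile with Manhattan neighborhoods later, the venue categories can be tallied. The sketch below uses only the ten rows shown above. The `wanted` set is an assumption about which Foursquare category names correspond to the amenities listed in the introduction.

```python
from collections import Counter

# Categories of the ten Southbank venues listed above.
categories = [
    "Pedestrian Plaza", "River", "Grocery Store", "Bar", "Cocktail Bar",
    "Plaza", "Bar", "Performing Arts Venue", "Café", "Opera House",
]

counts = Counter(categories)

# Assumed category names for the amenities named in the requirements.
wanted = {"Coffee Shop", "Café", "Grocery Store", "Gym"}
present = sorted(wanted & set(counts))

print(counts.most_common(3))
print("Desired categories present:", present)
```

With the full result set in `SBnearby_venues`, the same tally is available through `SBnearby_venues['categories'].value_counts()`.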


## Map of Southbank in Melbourne with venues near residence place - for reference

In [9]:


# create map of Southbank Melbourne using latitude and longitude values
map_sb = folium.Map(location=[latitude, longitude], zoom_start=20)

# add markers to map
for lat, lng, label in zip(SBnearby_venues['lat'], SBnearby_venues['lng'], SBnearby_venues['name']):
    label = folium.Popup(label, parse_html=True)
    folium.RegularPolygonMarker(
        [lat, lng],
        number_of_sides=4,
        radius=10,
        popup=label,
        color='blue',
        fill_color='#0f0f0f',
        fill_opacity=0.7,
    ).add_to(map_sb)  
    
map_sb

