## 1. Introduction

In this example, a client wants to buy an apartment in Manhattan but needs specific venues in the vicinity, so he hires me to find a neighborhood that meets all of his requirements. The task is to find the neighborhoods where all three required venue categories [“Supermarket”, “Gym”, “Pharmacy”] are located within a radius of 500 meters.

## 2. Data

	1. Manhattan neighborhood location dataset,
	2. Manhattan Venues location dataset.

I’m going to need the locations of all neighborhoods in Manhattan.
I will use the Foursquare API to get the locations of all venues in each neighborhood.
Then I will explore and modify the data to find out which neighborhoods have all three desired venues.

## 3. Methodology

Import all the dependencies that will be needed.

In [1]:
!conda install -c conda-forge geopy --yes
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 20)
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
print('Libraries imported.')

### Explore and Download Datasets

In [2]:
df_ny = pd.read_csv("new_york.csv")
df_ny

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
...,...,...,...,...
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.805530
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631


Since the client wants to buy an apartment in Manhattan, keep only the Manhattan neighborhoods: slice the original dataframe and create a new dataframe of the Manhattan data.

In [3]:
manhattan_data = df_ny[df_ny['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Get the geographical coordinates of Manhattan.

In [4]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


### Use the Foursquare API to explore the neighborhoods

Define Foursquare Credentials and Version

In [5]:
CLIENT_ID = 'NQHE5XNYXFU1QDN2OWLAAZV0ALW3FX1J4F20ET10RAZQLRIJ'
CLIENT_SECRET = 'WJM3RVOJIY0DZZ4NEDSPSBR3ZMNSHXXDIGCC1AQPZ3PCA2LH'
VERSION = '20180605'
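
Hard-coding API credentials in a shared notebook exposes them to anyone who reads it. A minimal sketch of one alternative, assuming the credentials are stored in environment variables: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical and not part of the original notebook.

```python
import os

def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names below are hypothetical; use whichever names you set.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

With the variables set, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would replace the two literal assignments.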

Now, let's set the search parameters: up to 100 venues per neighborhood (starting with Marble Hill) within a radius of 500 meters.

In [6]:
radius = 500
LIMIT = 100
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
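
The 500-meter radius is measured from each neighborhood's center coordinates. As a rough sanity check, the great-circle distance between a neighborhood center and a returned venue can be computed with the haversine formula. This is a sketch and not part of the original notebook: the function name `haversine_m` is hypothetical, and the Earth is assumed to be a sphere with a mean radius of 6,371 km. The coordinates are those of Marble Hill and its Rite Aid venue as they appear in the results tables of this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Marble Hill center and the Rite Aid venue listed for it
distance = haversine_m(40.876551, -73.910660, 40.875467, -73.908906)
print(f"{distance:.0f} m from the neighborhood center")
```

Any venue whose distance exceeds `radius` would suggest a mismatch between the request and the response.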

From the Foursquare lab in the previous module, we know that all the information is in the items key.

Let's create a function called **getNearbyVenues** to repeat the process for all the neighborhoods in Manhattan.

In [7]:
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
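
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` if a venue came back with an empty category list. The sketch below separates the parsing step into a hypothetical helper, `extract_venues`, so it can be checked offline against a hand-written payload. The payload shape (`response`, `groups`, `items`, `venue`) is the one the function above relies on; the sample venue data and the `"Uncategorized"` fallback label are assumptions made for illustration.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into rows of venue data."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fallback label is an assumption, not something the API returns
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written sample payload, for illustration only
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Gym", "location": {"lat": 40.0, "lng": -73.0},
               "categories": [{"name": "Gym"}]}},
    {"venue": {"name": "No Category Venue", "location": {"lat": 40.1, "lng": -73.1},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Sample Neighborhood", 40.05, -73.05)
```

Keeping the network call and the parsing apart also makes it possible to cache raw responses and re-run the analysis without calling the API again.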

Now I'm going to write the code to run the above function on each neighborhood and create a new dataframe called **"manhattan_venues"**.

In [10]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

manhattan_venues = pd.DataFrame(columns = column_names)

# radius has no default value in getNearbyVenues, so it must be passed explicitly
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],
                                   radius=radius
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Let's look at the resulting dataframe and check its size.

In [9]:
manhattan_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.910660,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.910660,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.910660,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.910660,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.910660,Dunkin',40.877136,-73.906666,Donut Shop
...,...,...,...,...,...,...,...
3146,Hudson Yards,40.756658,-74.000111,Cachet Boutique Hotel,40.759773,-73.996460,Hotel
3147,Hudson Yards,40.756658,-74.000111,StarDust,40.759869,-73.996460,Nightclub
3148,Hudson Yards,40.756658,-74.000111,Jake's,40.757954,-74.002296,American Restaurant
3149,Hudson Yards,40.756658,-74.000111,Gray Line New York Sightseeing Cruises - Pier 78,40.759721,-74.003982,Harbor / Marina


### Extract required categories in Manhattan

The client only needs three categories of venues in the vicinity:

**[Supermarket, Gym, Pharmacy]**

In [10]:
required_venue_categories = ["Supermarket", "Gym", "Pharmacy"]

Drop all other rows

In [11]:
manhattan_required_venues = manhattan_venues[manhattan_venues["Venue Category"].isin(required_venue_categories)].reset_index(drop=True)
manhattan_required_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.910660,Rite Aid,40.875467,-73.908906,Pharmacy
1,Marble Hill,40.876551,-73.910660,Astral Fitness & Wellness Center,40.876705,-73.906372,Gym
2,Marble Hill,40.876551,-73.910660,Blink Fitness,40.877271,-73.905595,Gym
3,Chinatown,40.715618,-73.994279,Hong Kong Supermarket 香港超級市場,40.717596,-73.996173,Supermarket
4,Chinatown,40.715618,-73.994279,Stanley's Pharmacy,40.715782,-73.990544,Pharmacy
...,...,...,...,...,...,...,...
88,Flatiron,40.739673,-73.990947,Equinox Gramercy,40.740749,-73.985771,Gym
89,Flatiron,40.739673,-73.990947,Rowgatta,40.736900,-73.995094,Gym
90,Hudson Yards,40.756658,-74.000111,Brooklyn Fare,40.756130,-73.996614,Supermarket
91,Hudson Yards,40.756658,-74.000111,505W37 Gym,40.757275,-73.997797,Gym


Checking how many venues were returned for each neighborhood

In [12]:
manhattan_required_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,3,3,3,3,3,3
Carnegie Hill,4,4,4,4,4,4
Central Harlem,1,1,1,1,1,1
Chelsea,3,3,3,3,3,3
Chinatown,2,2,2,2,2,2
...,...,...,...,...,...,...
Turtle Bay,3,3,3,3,3,3
Upper West Side,1,1,1,1,1,1
Washington Heights,5,5,5,5,5,5
West Village,2,2,2,2,2,2
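
`count()` repeats the same number in every column because it counts non-null values column by column. When only the number of rows per neighborhood is of interest, `size()` returns a single value per group. A small sketch, using a few category rows taken from the Marble Hill and Chinatown entries in the filtered table; the name `venue_counts` is hypothetical:

```python
import pandas as pd

# A few rows taken from the manhattan_required_venues table
venues = pd.DataFrame({
    "Neighborhood": ["Marble Hill", "Marble Hill", "Marble Hill", "Chinatown", "Chinatown"],
    "Venue Category": ["Pharmacy", "Gym", "Gym", "Supermarket", "Pharmacy"],
})

# size() counts rows per group and returns a single Series
venue_counts = venues.groupby("Neighborhood").size().rename("Venue Count")
print(venue_counts)
```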


### Analyze Each Neighborhood

One-hot encode the **Venue Category** column.

In [13]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_required_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_required_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Gym,Pharmacy,Supermarket
0,Marble Hill,0,1,0
1,Marble Hill,1,0,0
2,Marble Hill,1,0,0
3,Chinatown,0,0,1
4,Chinatown,0,1,0


Next, group the rows by neighborhood and sum the occurrences of each category in each neighborhood.

In [14]:
grouped_neighborhood = manhattan_onehot.groupby('Neighborhood').sum().reset_index()
grouped_neighborhood

Unnamed: 0,Neighborhood,Gym,Pharmacy,Supermarket
0,Battery Park City,3,0,0
1,Carnegie Hill,3,0,1
2,Central Harlem,1,0,0
3,Chelsea,1,1,1
4,Chinatown,0,1,1
...,...,...,...,...
29,Turtle Bay,1,2,0
30,Upper West Side,1,0,0
31,Washington Heights,2,1,2
32,West Village,1,0,1


The client needs all three categories, so keep only the neighborhoods where every category count is greater than 0.

In [15]:
grouped_neighborhood = grouped_neighborhood[grouped_neighborhood['Supermarket'] > 0]
grouped_neighborhood = grouped_neighborhood[grouped_neighborhood['Gym'] > 0]
grouped_neighborhood = grouped_neighborhood[grouped_neighborhood['Pharmacy'] > 0]
grouped_neighborhood = grouped_neighborhood.reset_index(drop=True)
grouped_neighborhood

Unnamed: 0,Neighborhood,Gym,Pharmacy,Supermarket
0,Chelsea,1,1,1
1,Washington Heights,2,1,2
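
The one-hot encoding, grouping, and three separate filters can also be expressed in fewer steps. A sketch of one alternative, assuming the same `Neighborhood` and `Venue Category` columns: it uses `pd.crosstab` rather than the notebook's `get_dummies` approach. The category rows below reproduce the per-neighborhood counts shown in the tables for Marble Hill, Chinatown, and Chelsea.

```python
import pandas as pd

required = ["Supermarket", "Gym", "Pharmacy"]

venues = pd.DataFrame({
    "Neighborhood": ["Marble Hill"] * 3 + ["Chinatown"] * 2 + ["Chelsea"] * 3,
    "Venue Category": ["Pharmacy", "Gym", "Gym",
                       "Supermarket", "Pharmacy",
                       "Gym", "Pharmacy", "Supermarket"],
})

# Count venues per neighborhood and category in one step
counts = pd.crosstab(venues["Neighborhood"], venues["Venue Category"])

# reindex guarantees a column for every required category, even if none was found
counts = counts.reindex(columns=required, fill_value=0)

# Keep neighborhoods where every required category appears at least once
has_all = counts[(counts > 0).all(axis=1)]
print(has_all.index.tolist())
```

The `reindex` step matters when a required category is absent from the data altogether: filtering on a missing column would otherwise raise a `KeyError`.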


### So the two neighborhoods that our client is interested in are **"Chelsea"** and **"Washington Heights"**

Now, in order to find the latitude and longitude of these two neighborhoods, we need to merge them with the original **"manhattan_data"**.

Set the index to Neighborhood in order to join with another dataframe.

In [16]:
grouped_neighborhood.set_index("Neighborhood", inplace=True)
grouped_neighborhood

Neighborhood,Gym,Pharmacy,Supermarket
Chelsea,1,1,1
Washington Heights,2,1,2


Create a list of the two neighborhoods' names.

In [17]:
neighborhoods = grouped_neighborhood.index.values
neighborhoods = list(neighborhoods)
neighborhoods

['Chelsea', 'Washington Heights']

From the original dataset, keep only the rows for these two neighborhoods, and call the result **manhattan_data_2**.

In [18]:
manhattan_data_2 = manhattan_data[manhattan_data['Neighborhood'].isin(neighborhoods)]
manhattan_data_2 = manhattan_data_2.reset_index(drop=True)
manhattan_data_2

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Washington Heights,40.851903,-73.9369
1,Manhattan,Chelsea,40.744035,-74.003116


Now join the two dataframes: **manhattan_data_2** & **grouped_neighborhood**

and call it **manhattan_merged**

In [19]:
manhattan_merged = manhattan_data_2.join(grouped_neighborhood, on='Neighborhood')
manhattan_merged

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Gym,Pharmacy,Supermarket
0,Manhattan,Washington Heights,40.851903,-73.9369,2,1,2
1,Manhattan,Chelsea,40.744035,-74.003116,1,1,1
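
`DataFrame.join` with `on=` matches a column of the left frame against the index of the right one, which is why `grouped_neighborhood` was re-indexed first. A sketch of an equivalent that uses `merge` on a shared column and so avoids the `set_index` step; the frames are rebuilt here from the values shown in the tables, and the names `locations`, `counts`, and `merged` are hypothetical:

```python
import pandas as pd

locations = pd.DataFrame({
    "Borough": ["Manhattan", "Manhattan"],
    "Neighborhood": ["Washington Heights", "Chelsea"],
    "Latitude": [40.851903, 40.744035],
    "Longitude": [-73.9369, -74.003116],
})
counts = pd.DataFrame({
    "Neighborhood": ["Chelsea", "Washington Heights"],
    "Gym": [1, 2],
    "Pharmacy": [1, 1],
    "Supermarket": [1, 2],
})

# how="inner" keeps only neighborhoods present in both frames
merged = locations.merge(counts, on="Neighborhood", how="inner")
print(merged)
```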


Finished. This is the final result that the client wanted.

### Map the two neighborhoods

## 4. Result

In [21]:
manhattan_merged

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Gym,Pharmacy,Supermarket
0,Manhattan,Washington Heights,40.851903,-73.9369,2,1,2
1,Manhattan,Chelsea,40.744035,-74.003116,1,1,1


This is the dataset that the client was looking for, and now he knows where to search for his apartment!

Lastly, I wanted to map where these two neighborhoods are located, so the client gets an idea of where he would be buying the apartment.

In [20]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

## 5. Discussion

This is a rather simple example, in which a client wants to buy an apartment in the vicinity of desired venues.

## 6. Conclusion

This was one of many ideas about problems that can be solved using location data in addition to other datasets. There are many other practicable approaches to such problems.

For example, if you want to open a restaurant and are looking for the most promising area, you can combine location data with population density, age structure, education level, gender structure, and purchasing power to determine the best place for your price level and the cuisine you wish to offer.