# The Battle of Neighborhoods - Capstone Project
#### Please Use https://nbviewer.jupyter.org/ to copy paste notebook URL to view maps

## 1) Business Problem

The goal of the business case here, is to understand the similarities and differences between 2 cities' venues (Specifically Downtown), and to be able to have a better insight of the demographic and decide on what neighbourhoods will be the most suitable to open a specific venue downtown.

The Two cities' downtown neighbourhoods will be compared to each other, based on the clusters they fall within. The City of NEW YORK will be compared to the city of TORONTO, to better understand the style of their venues, and how they are similar or dissimilar. 

This business case is aimed towards new business owners to allow them to decide on what Neighbourhood is the most suitable for their new venue investment such as; Restaurants, coffee shops or other entertainment venues.

## 2) Analytics Approach

K means clustering will be used to segment the cities' neighbourhoods and give an idea of how some neighbourhoods are similar or dissimilar to others, based on the venues' categories that exist in each of these neighbourhoods.

## 3) Data Sourcing and Requirements 

1. The first Data Set to be used is of the NEW YORK city - Including different cities, boroughs and Neighbourhoods within NY
2. The second Data Set will be from the Wikipedia page for Toronto city and its neighbourhoods, boroughs and Neighbourhoods
3. FourSquare Location Data

Both Data sets will utilize the Foursquare location Data, and all of the venues for each neighbourhood will be displayed. Then both Datasets will be merged together within a bigger dataframe that inlcudes different cities (NY and Toronto) and their borughs and Neighbourhoods

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import scipy as sc
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
import seaborn as sns


import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

!pip install pandas==1.0.3

import pandas as pd

#!pip install geocoder
#import geocoder # import geocoder

# initialize your variable to None
#lat_lng_coords = None

# loop until you get the coordinates
#while(lat_lng_coords is None):
#  g = geocoder.google('{}, Toronto, Ontario'.format(new_data['Postal Code']))
#  lat_lng_coords = g.latlng

#latitude = lat_lng_coords[0]
#longitude = lat_lng_coords[1]

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.8.3
  latest version: 4.8.4

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


## 4) Data Collection  

### 1 - Collecting Toronto Data

#### A) WEB SCRAPING

In [2]:
!pip install lxml



In [3]:
raw_data = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M", header=0)
raw_data = raw_data[0]
raw_data.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [4]:
raw_data.columns = ["Postal Code", "Borough","Neighborhood"]

#### B) Filtering only valid Boroughs - Removing "Not Assigned" Boroughs

In [5]:
new_data = raw_data[raw_data['Borough']!='Not assigned']
new_data.shape

(103, 3)

In [6]:
new_data.reset_index(drop=True,inplace=True)

In [7]:
new_data.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [8]:
new_data[new_data['Neighborhood']=='Not Assigned']

# No Neighbourhoods Exist with NOT ASSIGNED if they have a borough

Unnamed: 0,Postal Code,Borough,Neighborhood


In [9]:
print("The Data Frame has {} rows and {} columns".format(new_data.shape[0], new_data.shape[1]))

The Data Frame has 103 rows and 3 columns


#### C) Including the Latitude and Longitude

#### Getting Latitude and Longitude of Neighbourhoods - Google Geocoder is not complying - There is ARC GIS Option but not used

In [10]:
latlong = pd.read_csv('Geospatial_Coordinates.csv') 
latlong.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### Joining Neighbourhoods with their Latitudes and Longitudes

In [11]:
toron_with_latlong = new_data.merge(right=latlong,on='Postal Code')
toron_with_latlong.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [12]:
# Getting Latitude and Longitude using ARC GIS

#!pip install geocoder
#import geocoder # Import geocoder package
#postal_code = new_data['Postal Code'] # Postal code for each neighborhood in Toronto, Canada

# Initialize your variable to 'None'
#lat_lng_coords = None

# Create an empty list to append the Latitude values
#lat_toronto = []

# Create an empty list to append the Longitude values
#lon_toronto = []

# Loop until getting the geographical coordinates
#for postal in postal_code:
#    g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal))
#    lat_lng_coords = g.latlng
#    lat_toronto.append(lat_lng_coords[0])
#    lon_toronto.append(lat_lng_coords[1])

#neigh_with_latlong = new_data.copy()

#neigh_with_latlong['Latitude'] = lat_toronto
#neigh_with_latlong['Longitude'] = lon_toronto
#neigh_with_latlong.head(12)

#### D) Exploring Neighbourhoods and Venues in Toronto

In [13]:
len(toron_with_latlong['Borough'].unique())

10

#### > We have 10 unique boroughs

In [14]:
toron_with_latlong['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East York            5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

In [15]:
len(toron_with_latlong['Neighborhood'])

103

In [16]:
print('The Toronto dataframe has {} boroughs and {} neighborhoods.'.format(
        len(toron_with_latlong['Borough'].unique()),
        len(toron_with_latlong['Neighborhood'])
    )
)

The Toronto dataframe has 10 boroughs and 103 neighborhoods.


In [17]:
toron_with_latlong[toron_with_latlong[['Neighborhood']].duplicated()]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
13,M3C,North York,Don Mills,43.7259,-79.340923
46,M3L,North York,Downsview,43.739015,-79.506944
53,M3M,North York,Downsview,43.728496,-79.495697
60,M3N,North York,Downsview,43.761631,-79.520999


#### > We have 99 unique neighbourhoods and 103 total neighbourhoods with different postal codes

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [18]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto City are 43.6534817, -79.3839347.


#### E) lets visualize ALL Boroughs and Neighbourhoods around TORONTO - Using Follium - Please use NB VIEWER WEBSITE and copy paste the project link

In [20]:
# create map of New York using latitude and longitude values
map_tor = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toron_with_latlong['Latitude'], toron_with_latlong['Longitude'], toron_with_latlong['Borough'], toron_with_latlong['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tor)  
    
map_tor

#### F) lets visualize ONLY Boroughs around Downtown toronto and nearby Boroughs - Using Follium - Please use NB VIEWER WEBSITE and copy paste the project link

In [21]:
toronto_df = toron_with_latlong[toron_with_latlong['Borough'].str.contains('Toronto')]
toronto_df.head(3)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [24]:
# create map of New York using latitude and longitude values
map_tor = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tor)  
    
map_tor

#### G) Exploring Venues around Downtown Toronto and Nearby Boroughs

In [44]:
# @hidden_cell
CLIENT_ID = '0IRSU1PXCTADSJ0GAWTDUVNWMXS352WY4JU4XJN4XBLW4BV1' # your Foursquare ID
CLIENT_SECRET = 'TCNAWWPCZ22X34PMOHGEA2O0S1BILYL3BS3TQRD1Y5YQDAUG' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT= 100
radius = 500

Defining a function that retrieves a list venues for each neighbourhood and creates a dataframe

In [28]:
def getNearbyVenues(postal, borough, names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for postal, borough, name, lat, lng in zip(postal, borough, names, latitudes, longitudes):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            postal,
            borough,
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
                  'Postal Code',
                  'Borough',
                  'Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


In [49]:
toronto_venues = getNearbyVenues(  postal = toronto_df['Postal Code'],
                                   borough = toronto_df['Borough'],
                                   names = toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )

#### H) The Compelete DataFrame containing Downtown Toronto Boroughs, Nearby Boroughs and their VENUES

In [50]:
print(toronto_venues.shape)
toronto_venues.head(5)

(1636, 9)


Unnamed: 0,Postal Code,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [51]:
toronto_venues['City'] = 'TORONTO'
toronto_columns = [toronto_venues.columns[-1]]+ toronto_venues.columns[:-2].tolist()
toronto_venues = toronto_venues[toronto_columns]
toronto_venues.head(4)

Unnamed: 0,City,Postal Code,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,TORONTO,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017
1,TORONTO,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809
2,TORONTO,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008
3,TORONTO,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698


## 2 - Collecting NEW York Data

In [32]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### A) Load and explore the data

Next, let's load the data.

In [52]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [53]:
#newyork_data

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [54]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [55]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### B) Tranform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [56]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [57]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [58]:
for data in neighborhoods_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Quickly examine the resulting dataframe.

In [59]:
ny_with_latlong = neighborhoods

In [60]:
ny_with_latlong.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [61]:
print('The NEWYORK dataframe has {} boroughs and {} neighborhoods.'.format(
        len(ny_with_latlong['Borough'].unique()),
        ny_with_latlong.shape[0]
    )
)

The NEWYORK dataframe has 5 boroughs and 306 neighborhoods.


In [62]:
ny_with_latlong['Borough'].value_counts()

Queens           81
Brooklyn         70
Staten Island    63
Bronx            52
Manhattan        40
Name: Borough, dtype: int64

#### C) Use geopy library to get the latitude and longitude values of New York City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [63]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 40.7127281, -74.0060152.


#### D) Create a map of New York with neighborhoods superimposed on top.

In [64]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(ny_with_latlong['Latitude'], ny_with_latlong['Longitude'], ny_with_latlong['Borough'], ny_with_latlong['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Now let's simplify the above map and segment and cluster only the neighborhoods in Manhattan. So let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [65]:
manhattan_data = ny_with_latlong[ny_with_latlong['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Let's get the geographical coordinates of Manhattan.

In [66]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Manhattan are 40.7896239, -73.9598939.


As we did with all of New York City, let's visualizat Manhattan the neighborhoods in it.

In [67]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### E) Define Foursquare Credentials and Version

In [68]:
# @hidden_cell

CLIENT_ID = '0IRSU1PXCTADSJ0GAWTDUVNWMXS352WY4JU4XJN4XBLW4BV1' # your Foursquare ID
CLIENT_SECRET = 'TCNAWWPCZ22X34PMOHGEA2O0S1BILYL3BS3TQRD1Y5YQDAUG' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 0IRSU1PXCTADSJ0GAWTDUVNWMXS352WY4JU4XJN4XBLW4BV1
CLIENT_SECRET:TCNAWWPCZ22X34PMOHGEA2O0S1BILYL3BS3TQRD1Y5YQDAUG


<a id='item2'></a>

####  F) Explore Neighborhoods in Manhattan

#### Let's create a function to repeat the same process to all the neighborhoods in Manhattan

In [71]:
def getNearbyVenues(borough,names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for borough, name, lat, lng in zip(borough,names, latitudes, longitudes):
        
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            borough,
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
                  'Borough',
                  'Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Another way to understand the NEST LIST COMPREHENSION - EACH VENUE LIST IS A LIST OF VENUES FOR EACH NEIGHBOURHOOD
#l=[]

#for venue_list in venues_list:
#    for item in venue_list:
#        l.append(item)
    
#nearby_venues = pd.DataFrame(l)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *manhattan_venues*.

In [72]:
# type your answer here

manhattan_venues = getNearbyVenues( borough = manhattan_data['Borough'],
                                    names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

#### Let's check the size of the resulting dataframe

In [73]:
print(manhattan_venues.shape)
manhattan_venues.head()

(3192, 8)


Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Manhattan,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Manhattan,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Manhattan,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Manhattan,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
4,Manhattan,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop


In [74]:
manhattan_venues['City'] = 'NEW YORK'
manhattan_columns = [manhattan_venues.columns[-1]]+ manhattan_venues.columns[:-2].tolist()
manhattan_venues = manhattan_venues[manhattan_columns]
manhattan_venues.head(4)

Unnamed: 0,City,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,NEW YORK,Manhattan,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271
1,NEW YORK,Manhattan,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204
2,NEW YORK,Manhattan,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937
3,NEW YORK,Manhattan,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582


### MERGING NEW YORK (MANHATTEN) DATA WITH (DOWNTOWN) TORONTO DATA 

In [75]:
toronto_venues.drop("Postal Code",axis=1,inplace=True)

In [76]:
toronto_venues.head(3)

Unnamed: 0,City,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,TORONTO,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017
1,TORONTO,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809
2,TORONTO,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008


In [77]:
manhattan_venues.head(3)

Unnamed: 0,City,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,NEW YORK,Manhattan,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271
1,NEW YORK,Manhattan,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204
2,NEW YORK,Manhattan,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937


In [101]:
cities = [toronto_venues,manhattan_venues]
two_cities = pd.concat(cities)

print("The DataFrame of two cities have %.f rows and %.f Columns" %(two_cities.shape[0],two_cities.shape[1]))
two_cities.head(3)

The DataFrame of two cities have 4828 rows and 8 Columns


Unnamed: 0,City,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,TORONTO,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017
1,TORONTO,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809
2,TORONTO,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008
