<h1 align=center><font size = 5>Finding Coffee Shops in Manhattan's Neighborhoods</font></h1>

## Introduction

Tomorrow I'm going to the borough of Manhattan in New York City for a short trip. As an ardent coffee fan, I would like to know which neighborhood in Manhattan has the most coffee shops, so that I can try as many as possible during my visit.

To find out, I'll first download and transform the necessary dataset. Then I'll plot all of Manhattan's neighborhoods on a map to get familiar with the area. Next, I'll use the Foursquare **explore** endpoint to get the venue categories in each neighborhood, and use this information to sort the neighborhoods by the number of coffee shops in each one.

Finally, I'll pick the top neighborhood in the final dataframe as the destination to visit.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. Download and Transform Dataset

2. Plotting to visualize Manhattan's neighborhoods

3. Use Foursquare to get all neighborhoods' venues information

4. Analyze the neighborhoods to get the number of coffee shops for each one
   
</font>
</div>

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np      # library to handle data in a vectorized manner

import pandas as pd     # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json             # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans



#!conda install -c conda-forge geopy --yes           # uncomment this line if geopy is not installed yet
from geopy.geocoders import Nominatim               # convert an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes    # uncomment this line if folium is not installed yet
import folium                                       # map rendering library


import requests                                     # library to handle requests
#from pandas.io.json import json_normalize           # transform JSON file into a pandas dataframe


print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

# 1. Download and Transform Dataset

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


## Load and explore the data

Next, let's load the data.

In [3]:
with open('newyork_data.json') as dulieu:    # open the downloaded JSON file
    newyork_data = json.load(dulieu)

Let's take a quick look at the data.

In [None]:
newyork_data

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [5]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
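
The fields needed later all sit inside `properties` and `geometry`. As a minimal sketch (the helper name `feature_to_row` is hypothetical and not part of the original notebook), a single feature could be flattened as shown here. Note that GeoJSON stores point coordinates as `[longitude, latitude]`, which is why latitude is the second element.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON feature into the four fields used in this notebook."""
    lon, lat = feature['geometry']['coordinates']  # GeoJSON order: [lon, lat]
    props = feature['properties']
    return {
        'Borough': props['borough'],
        'Neighborhood': props['name'],
        'Latitude': lat,
        'Longitude': lon,
    }


# Trimmed copy of the first feature shown above
sample_feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}

row = feature_to_row(sample_feature)
print(row)
```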

## Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe.

__So let's start by creating an empty dataframe.__

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Then let's loop through the data and build the dataframe __one row at a time__.

In [9]:
rows = []   # one dict per neighborhood

for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    neighborhood_latlon = data['geometry']['coordinates']   # GeoJSON order: [longitude, latitude]
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)

Quickly examine the resulting dataframe.

In [19]:
print(neighborhoods.shape)
neighborhoods.head()

(306, 4)


| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |



### Let's __slice__ the original New York dataframe and create a new dataframe of the Manhattan data.

In [22]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

print(manhattan_data.shape)
manhattan_data.head()

(40, 4)


| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


# 2. Get the geographical coordinates of Manhattan and plot all of its neighborhoods.

In [16]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)

latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.
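
One thing to keep in mind is that `geolocator.geocode()` returns `None` when the address cannot be resolved, in which case `location.latitude` raises an error. The following is a minimal sketch of a fallback, assuming we are happy to center the map on the mean of the neighborhood coordinates instead. The helper `resolve_center` and the stand-in class `FakeLocation` are hypothetical names used only for illustration; they are not part of geopy.

```python
from collections import namedtuple

# Stand-in for the object returned by geolocator.geocode(); only the two
# attributes used in this notebook are modeled (illustration only).
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def resolve_center(location, fallback_points):
    """Return (lat, lon) from a geocode result, or the mean of fallback_points.

    location: result of geolocator.geocode(...), which may be None.
    fallback_points: iterable of (lat, lon) pairs, e.g. neighborhood centers.
    """
    if location is not None:
        return location.latitude, location.longitude
    points = list(fallback_points)
    mean_lat = sum(p[0] for p in points) / len(points)
    mean_lon = sum(p[1] for p in points) / len(points)
    return mean_lat, mean_lon


# Two neighborhood centers taken from the Manhattan dataframe above
neighborhood_points = [(40.876551, -73.91066), (40.715618, -73.994279)]

geocoded = resolve_center(FakeLocation(40.7896239, -73.9598939), neighborhood_points)
fallback = resolve_center(None, neighborhood_points)
print(geocoded)
print(fallback)
```

In the notebook itself, the call would look like `resolve_center(location, zip(manhattan_data['Latitude'], manhattan_data['Longitude']))`.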


## Let's visualize Manhattan and all of the neighborhoods in it.

In [24]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

# 3. Utilizing the Foursquare API to explore all neighborhoods' venues information.


<a id='item2'></a>

In [62]:
import numpy as np
import pandas as pd
import requests    

CLIENT_ID = 'Q1SRIERBH0HDZQ3JHT2DBQPHHZUNCAKGVOJPRHX0IHOFGCXK' # your Foursquare ID
CLIENT_SECRET = 'W22UMOKGFOEX0BRKNZ2EBRUXX3CBLB5BWSJS2C0EB3GUCFOL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT=400

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    


def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
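
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue with an empty `categories` list would raise an `IndexError`. As a hedged sketch, the row extraction could be pulled into its own helper that tolerates that case. The name `extract_venue_rows` is hypothetical, and the sample items below are hand-written to mimic the response shape that the function above assumes; they are not real API output.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat row tuples.

    Venues with an empty 'categories' list get None as their category
    instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written items; the second venue is made up to exercise the empty case
sample_items = [
    {'venue': {'name': 'Starbucks',
               'location': {'lat': 40.877531, 'lng': -73.905582},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabeled Place',
               'location': {'lat': 40.8770, 'lng': -73.9060},
               'categories': []}},
]

rows = extract_venue_rows('Marble Hill', 40.876551, -73.91066, sample_items)
for r in rows:
    print(r)
```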

### Now run the function above to create a new dataframe called *manhattan_venues*.

In [63]:

manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

# 4. Analyze the neighborhoods to get the number of coffee shops for each one

In [99]:
print(manhattan_venues.shape)
print("Unique Neighborhoods:" , len(manhattan_data['Neighborhood'].unique()))
manhattan_venues.head()

(3169, 7)
Unique Neighborhoods: 40


| | Neighborhood | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |
| 4 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
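
Each venue was requested within a 500 m radius of its neighborhood's coordinates. As a quick sanity check, the sketch below computes the great-circle distance between the Marble Hill center and the Starbucks in row 4 using the haversine formula. The function name `haversine_m` and the mean Earth radius of 6,371 km are choices made for this illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius (assumed spherical Earth)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Marble Hill center and the Starbucks from row 4 of the table above
distance = haversine_m(40.876551, -73.91066, 40.877531, -73.905582)
print(round(distance), 'meters')
```

The result should fall below the 500 m radius used in the request.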


## Select rows that have Coffee Shop as the Venue Category attribute

In [117]:
selecteddf=manhattan_venues.loc[manhattan_venues['Venue Category']=='Coffee Shop']

print(selecteddf.shape)
selecteddf.head()

(146, 7)


| | Neighborhood | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 4 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 9 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.873755 | -73.908613 | Coffee Shop |
| 21 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.873234 | -73.90873 | Coffee Shop |
| 72 | Chinatown | 40.715618 | -73.994279 | Little Canal | 40.714317 | -73.990361 | Coffee Shop |
| 100 | Chinatown | 40.715618 | -73.994279 | Cafe Grumpy | 40.715069 | -73.989952 | Coffee Shop |


In [118]:
# chain reset_index so that coffeedf is a new dataframe rather than a modified slice
coffeedf = selecteddf[['Neighborhood', 'Venue Category']].reset_index(drop=True)

print(coffeedf.shape)
coffeedf.head()

(146, 2)


| | Neighborhood | Venue Category |
|---|---|---|
| 0 | Marble Hill | Coffee Shop |
| 1 | Marble Hill | Coffee Shop |
| 2 | Marble Hill | Coffee Shop |
| 3 | Chinatown | Coffee Shop |
| 4 | Chinatown | Coffee Shop |


## Count the number of coffee shops in each neighborhood

In [123]:
countdf=coffeedf.groupby(['Neighborhood']).count()
countdf.head(10)

| Neighborhood | Venue Category |
|---|---|
| Battery Park City | 4 |
| Carnegie Hill | 7 |
| Chelsea | 9 |
| Chinatown | 2 |
| Civic Center | 8 |
| Clinton | 4 |
| East Village | 3 |
| Financial District | 11 |
| Flatiron | 2 |
| Gramercy | 4 |
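
The filter and count steps can also be written more compactly. The sketch below uses a small made-up dataframe (`toy_venues` is not real Foursquare data) and `Series.value_counts()`, which counts rows per neighborhood and sorts the result in descending order. As with the `groupby` approach above, neighborhoods without any coffee shop simply do not appear in the result.

```python
import pandas as pd

# Toy data with the same two columns used in the analysis (made-up rows)
toy_venues = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Marble Hill',
                     'Chinatown', 'Chinatown', 'Chinatown'],
    'Venue Category': ['Coffee Shop', 'Pizza Place',
                       'Coffee Shop', 'Coffee Shop', 'Diner'],
})

# Boolean mask for coffee shops, then count occurrences per neighborhood
is_coffee = toy_venues['Venue Category'] == 'Coffee Shop'
coffee_counts = toy_venues.loc[is_coffee, 'Neighborhood'].value_counts()
print(coffee_counts)
```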


<a id='item3'></a>

## Find the neighborhood(s) with the highest number of coffee shops by sorting the dataframe on the *Venue Category* count column

In [122]:
finaldf=countdf.sort_values(by=['Venue Category'],ascending=False)

finaldf.head(10)

| Neighborhood | Venue Category |
|---|---|
| Financial District | 11 |
| Chelsea | 9 |
| Civic Center | 8 |
| Carnegie Hill | 7 |
| Lenox Hill | 6 |
| Midtown | 6 |
| Yorkville | 5 |
| Upper East Side | 5 |
| Turtle Bay | 5 |
| Sutton Place | 5 |
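
Several neighborhoods in this ranking share the same count, so the order within a tie carries no meaning. As a small sketch, the helper below (the name `neighborhoods_with_rank` is hypothetical) returns every neighborhood at a given rank of distinct counts, using the first six rows of the sorted table as input.

```python
# Counts copied from the first six rows of the sorted table
top_counts = {
    'Financial District': 11,
    'Chelsea': 9,
    'Civic Center': 8,
    'Carnegie Hill': 7,
    'Lenox Hill': 6,
    'Midtown': 6,
}


def neighborhoods_with_rank(counts, rank=1):
    """Return all neighborhoods whose count is the rank-th highest distinct count."""
    distinct = sorted(set(counts.values()), reverse=True)
    target = distinct[rank - 1]
    return sorted(name for name, n in counts.items() if n == target)


best = neighborhoods_with_rank(top_counts, rank=1)
fifth = neighborhoods_with_rank(top_counts, rank=5)
print(best)
print(fifth)
```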


## RESULT: The dataframe above shows that the Financial District has the highest number of coffee shops, with 11 found within 500 m of its coordinates.

## We should go there and explore the area!