# Introduction/Business Problem

For this project I have chosen a variation of the idea proposed for the course, which was to compare Toronto and New York City, two cities we already have some experience with.

## Problem 

We have a firm that wants to expand by opening a new coffee shop in New York City (let's call it "Bad Morning Coffee", just for the fun of it).<br/>
To do so, the firm wants an overview of how tough the competition is in each neighborhood of the city.<br/>
So the problem is "basically" to find out which district has the right number of coffee shops: enough that it is not a "dead" district, but not so many that it is overcrowded.<br/>

# Data section

So what data do we need to do this?<br/>
First, we need to obtain the geolocation data for New York.<br/>
Then we explore the venues in the different areas, with a focus on coffee shops, using Foursquare.<br/>
Finally, from the data obtained we should be able to draw a conclusion on where to put the new coffee shop.

### New York Neighborhoods

The New York neighborhood data is taken from the website: https://geo.nyu.edu/catalog/nyu_2451_34572?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork-21253531&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ

### New York Neighborhood Geodata

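The neighborhood coordinates are included in the dataset above; the coordinates of the city (and later of Manhattan) are looked up with geopy's Nominatim geocoder, and Folium is used to render the maps.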

### Foursquare

Foursquare will be used to explore the venues around each neighborhood, using the neighborhood coordinates from the dataset.

## Methodology  

In this section we prepare all the data needed for the <b> Results and Discussion </b> section.<br>
Let's start by installing Folium in the environment and importing the needed packages:

```python
!pip install folium
```



```python
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium  # map rendering library

print('Libraries imported.')
```

```
Libraries imported.
```


<b> New York data </b>

```python
# download the New York neighborhood dataset and load it as JSON
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
```
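`!wget` only works where the notebook's shell has wget installed. A minimal pure-Python alternative, assuming only the standard library: `load_json_cached` is a hypothetical helper that downloads the same file once and reuses the local copy on later runs.

```python
import json
import os
import urllib.request

# Same URL as in the notebook cell above.
DATA_URL = ("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/"
            "IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json")

def load_json_cached(url: str, path: str) -> dict:
    """Hypothetical helper: download `url` to `path` unless it exists, then parse it as JSON."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)  # network access only on the first run
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# newyork_data = load_json_cached(DATA_URL, "newyork_data.json")
```

Checking for the file first keeps repeated notebook runs from re-downloading the dataset.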

As the data cannot be used in its current state, I have made a separate section where the data relevant to the problem is extracted for the city.

### New York Preparations

For the New York dataset, most of the interesting information is in the <b> features </b> key, so let's isolate that:

```python
neighborhoods_dataNY = newyork_data['features']
```

Now let's transform the data into a <b> pandas DataFrame </b>, which makes the extraction easier. Each point stores its coordinates as `[longitude, latitude]`, so they are read in reverse order:

```python
column_names_NY = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Collect one row per neighborhood and build the DataFrame once
# (DataFrame.append was removed in pandas 2.0).
rows = []
for data in neighborhoods_dataNY:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    neighborhood_latlon = data['geometry']['coordinates']  # [longitude, latitude]
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods_NY = pd.DataFrame(rows, columns=column_names_NY)
```
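The same flattening can also be done without an explicit loop. A minimal sketch using `pd.json_normalize`, assuming each feature has the `properties.borough`, `properties.name` and `geometry.coordinates` keys read above; `features_to_frame` and the two sample features are hypothetical (the coordinates are copied from the Manhattan results tables further below).

```python
import pandas as pd

def features_to_frame(features):
    """Hypothetical helper: flatten GeoJSON-style features into the four columns used above."""
    flat = pd.json_normalize(features)     # nested keys become dotted column names
    coords = flat["geometry.coordinates"]  # each entry is [longitude, latitude]
    return pd.DataFrame({
        "Borough": flat["properties.borough"],
        "Neighborhood": flat["properties.name"],
        "Latitude": coords.map(lambda c: c[1]),
        "Longitude": coords.map(lambda c: c[0]),
    })

# Two hypothetical features shaped like newyork_data['features'] (only the keys used here).
sample_features = [
    {"properties": {"borough": "Manhattan", "name": "Marble Hill"},
     "geometry": {"type": "Point", "coordinates": [-73.91066, 40.876551]}},
    {"properties": {"borough": "Manhattan", "name": "Soho"},
     "geometry": {"type": "Point", "coordinates": [-74.000657, 40.722184]}},
]
print(features_to_frame(sample_features))
```

`json_normalize` leaves list values such as the coordinate pair intact, which is why the latitude and longitude are still picked out by index.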

Now we need to create a <b> geocoder </b> instance, using `ny_explorer` as the user agent, and look up the coordinates of New York City:

```python
address = 'New York City, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
```

We can now plot all the New York neighborhoods in the dataset on a map, although at this scale the map is rather crowded.

```python
# create a map of New York centered on the geocoded city coordinates
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per neighborhood, labelled "Neighborhood, Borough"
for lat, lng, borough, neighborhood in zip(neighborhoods_NY['Latitude'], neighborhoods_NY['Longitude'], neighborhoods_NY['Borough'], neighborhoods_NY['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)

map_newyork
```

However, let's focus solely on Manhattan, as it is the borough with the highest density.

```python
manhattan_data = neighborhoods_NY[neighborhoods_NY['Borough'] == 'Manhattan'].reset_index(drop=True)
```

```python
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Manhattan are 40.7896239, -73.9598939.
```


Next, we create the same map, this time for Manhattan only:

```python
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)

map_manhattan
```

For the next step we need access to the Foursquare API:

```python
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version
LIMIT = 100  # A default Foursquare API limit value
```

Now let's define a function that retrieves the venues around each neighborhood center, which makes the next step easier.

```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare v2 explore endpoint around each neighborhood center."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the API request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # a venue without a category gets None instead of raising an IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue'].get('categories') else None)
            for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```
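Because the real request needs valid credentials, it can help to check the parsing logic offline first. A minimal sketch, assuming the v2 explore response shape that `getNearbyVenues` already relies on (`response` → `groups[0]` → `items[]` → `venue`); `build_explore_url`, `flatten_explore_items` and the mocked venues are hypothetical, not real Foursquare data.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: build the explore URL without sending a request."""
    query = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    })
    return f"{EXPLORE_ENDPOINT}?{query}"

def flatten_explore_items(name, lat, lng, items):
    """Hypothetical helper: turn response['groups'][0]['items'] into rows like getNearbyVenues."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Mocked items with invented values, used only to exercise the parsing.
mock_items = [
    {"venue": {"name": "Example Cafe", "location": {"lat": 40.7312, "lng": -73.9745},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Example Lot", "location": {"lat": 40.7305, "lng": -73.9738},
               "categories": []}},
]
rows = flatten_explore_items("Stuyvesant Town", 40.731, -73.974052, mock_items)
```

Saving one real response to disk and feeding it to `flatten_explore_items` would test the same code path without spending API quota.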

Now we can get the venues in Manhattan using this function (it prints each neighborhood name as it is queried):

```python
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'])
```

```
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
```


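#### Analysis of Each Neighborhood

With the venues of all neighborhoods retrieved, it is time to look at the venue categories (keeping in mind that we are interested in coffee shops). We one-hot encode the categories and average them per neighborhood, which gives the share of each category among the venues returned for that neighborhood: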

```python
# one-hot encode the venue categories (one column per category)
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add the neighborhood column back and move it to the front
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood']
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]
```

```python
# share of each venue category among the venues of each neighborhood
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
```
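To see what the grouped table contains, here is a toy version of the two cells above. A minimal sketch with invented venue data for two hypothetical neighborhoods "A" and "B": each value is the fraction of a neighborhood's venues in that category, so when every venue has a category, each row sums to 1.

```python
import pandas as pd

# Invented venues for two hypothetical neighborhoods.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Park", "Coffee Shop", "Park", "Bar"],
})

# Same steps as above: one-hot encode, then average per neighborhood.
onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="")
onehot["Neighborhood"] = toy_venues["Neighborhood"]
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Neighborhood "A" gets a Coffee Shop share of 2/3 and "B" a share of 0, which is the kind of signal the clustering below works with.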

Let's define another function that returns the most common venue categories of a neighborhood:

```python
def return_most_common_venues(row, num_top_venues):
    # skip the 'Neighborhood' column and sort the category shares in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

Then we build a table with the 10 most common venue categories of each neighborhood:

```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... (only the first three positions use the suffixes above)
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

# fill in the top venue categories for each neighborhood
for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)
```
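The `try`/`except` trick gives the right suffixes for ten columns, but it would label the 21st column "21th". A minimal sketch of a general ordinal helper, assuming English ordinal rules (11th to 13th are exceptions); `ordinal` is a hypothetical helper, not part of pandas.

```python
def ordinal(n: int) -> str:
    """Hypothetical helper: 1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th (and the rest of the teens)
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Produces the same column names as the cell above for num_top_venues = 10.
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
```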


Now we can use k-means clustering (with `kclusters = 5` clusters) to group neighborhoods with similar venue profiles, and then attach the cluster labels to the neighborhood coordinates:

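```python
kclusters = 5

# cluster on the category shares only
manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# n_init=10 matches the scikit-learn default before version 1.4
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(manhattan_grouped_clustering)
```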
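The choice of `kclusters = 5` is not motivated in the text. One common heuristic, offered here as an assumption rather than the method used above, is the elbow method: fit k-means for several values of k and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. `inertia_by_k` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans

def inertia_by_k(X, ks, random_state=0):
    """Hypothetical helper: fit k-means for each k in `ks` and return {k: inertia}."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in ks
    }

# Usage on the notebook data (needs manhattan_grouped_clustering from the cell above):
# inertias = inertia_by_k(manhattan_grouped_clustering, range(1, 11))
```

Plotting the returned values against k and picking the bend is a judgment call, so it supports the choice of k rather than determining it.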

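```python
# add the cluster label to each neighborhood's top-venue table
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# attach the labels and top venues to the neighborhood coordinates
manhattan_merged = manhattan_data
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
```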

Lastly, we can plot the clusters on the map; each marker's popup shows the neighborhood name and its cluster label:

```python
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels run from 0 to kclusters - 1)
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        tooltip=' Cluster ' + str(cluster),
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

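To see how prominent coffee shops are in each neighborhood, we filter the merged table by the position of "Coffee Shop" among the most common venues, starting with the 1st most common venue:

```python
manhattan_merged[manhattan_merged['1st Most Common Venue'].str.contains('Coffee Shop')]
```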

```python
manhattan_merged[manhattan_merged['2nd Most Common Venue'].str.contains('Coffee Shop')]
```

|  | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 | 0 | Pizza Place | Coffee Shop | Café | Mexican Restaurant | Cocktail Bar | Indian Restaurant | Liquor Store | Sushi Restaurant | Park | Deli / Bodega |
| 5 | Manhattan | Manhattanville | 40.816934 | -73.957385 | 3 | Seafood Restaurant | Coffee Shop | Bar | Deli / Bodega | Italian Restaurant | Park | Mexican Restaurant | Spanish Restaurant | Gastropub | Lounge |
| 9 | Manhattan | Yorkville | 40.77593 | -73.947118 | 0 | Italian Restaurant | Coffee Shop | Gym | Bar | Deli / Bodega | Sushi Restaurant | Japanese Restaurant | Wine Shop | Diner | Mexican Restaurant |
| 15 | Manhattan | Midtown | 40.754691 | -73.981669 | 1 | Hotel | Coffee Shop | Bakery | Clothing Store | Theater | Steakhouse | Sporting Goods Shop | Sushi Restaurant | Bookstore | Pizza Place |
| 16 | Manhattan | Murray Hill | 40.748303 | -73.978332 | 1 | Japanese Restaurant | Coffee Shop | Hotel | Gym / Fitness Center | Sandwich Place | American Restaurant | Bar | Restaurant | Pizza Place | Italian Restaurant |
| 25 | Manhattan | Manhattan Valley | 40.797307 | -73.964286 | 1 | Bar | Coffee Shop | Yoga Studio | Pizza Place | Playground | Thai Restaurant | Mexican Restaurant | Cosmetics Shop | Gym / Fitness Center | Latin American Restaurant |


```python
manhattan_merged[manhattan_merged['3rd Most Common Venue'].str.contains('Coffee Shop')]
```

|  | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 | 4 | Gym | Discount Store | Coffee Shop | Sandwich Place | Yoga Studio | Pizza Place | Deli / Bodega | Department Store | Pharmacy | Diner |
| 8 | Manhattan | Upper East Side | 40.775639 | -73.960508 | 1 | Exhibit | Italian Restaurant | Coffee Shop | Bakery | Gym / Fitness Center | Yoga Studio | Cosmetics Shop | French Restaurant | Juice Bar | Spa |
| 23 | Manhattan | Soho | 40.722184 | -74.000657 | 0 | Clothing Store | Italian Restaurant | Coffee Shop | Boutique | Mediterranean Restaurant | Shoe Store | Bakery | Café | French Restaurant | Pizza Place |
| 26 | Manhattan | Morningside Heights | 40.808 | -73.963896 | 3 | Park | American Restaurant | Coffee Shop | Bookstore | Burger Joint | Café | Deli / Bodega | Pub | Paper / Office Supplies Store | Seafood Restaurant |


```python
manhattan_merged[manhattan_merged['4th Most Common Venue'].str.contains('Coffee Shop')]
```

|  | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Manhattan | Lenox Hill | 40.768113 | -73.95886 | 0 | Italian Restaurant | Sushi Restaurant | Pizza Place | Coffee Shop | Cocktail Bar | Gym | Gym / Fitness Center | Café | Burger Joint | Salon / Barbershop |
| 14 | Manhattan | Clinton | 40.759101 | -73.996119 | 1 | Theater | American Restaurant | Gym / Fitness Center | Coffee Shop | Cocktail Bar | Sandwich Place | Gym | Hotel | Italian Restaurant | Spa |
| 28 | Manhattan | Battery Park City | 40.711932 | -74.016869 | 1 | Park | Hotel | Gym | Coffee Shop | Memorial Site | Shopping Mall | Plaza | Burger Joint | Gourmet Shop | Playground |


```python
manhattan_merged[manhattan_merged['5th Most Common Venue'].str.contains('Coffee Shop')]
```

|  | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | Manhattan | Sutton Place | 40.76028 | -73.963556 | 1 | Italian Restaurant | Gym / Fitness Center | Furniture / Home Store | Pizza Place | Coffee Shop | Park | Gym | Bar | Thai Restaurant | Beer Bar |
| 37 | Manhattan | Stuyvesant Town | 40.731 | -73.974052 | 2 | Park | Pet Service | Gym / Fitness Center | Harbor / Marina | Coffee Shop | Baseball Field | Bar | Bistro | Heliport | Farmers Market |
| 39 | Manhattan | Hudson Yards | 40.756658 | -74.000111 | 1 | Hotel | Italian Restaurant | Gym / Fitness Center | American Restaurant | Coffee Shop | Café | Boat or Ferry | Nightclub | Thai Restaurant | Gym |
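The cells above repeat the same filter once per position. A minimal sketch of doing it in one pass, assuming the `... Most Common Venue` column names built earlier; `category_rank` is a hypothetical helper, and it uses an exact match instead of `str.contains`.

```python
import pandas as pd

def category_rank(merged: pd.DataFrame, category: str, top_n: int = 10) -> pd.Series:
    """Hypothetical helper: position (1..top_n) of `category` per neighborhood, NaN if absent."""
    rank_cols = [c for c in merged.columns if c.endswith("Most Common Venue")][:top_n]
    ranks = {}
    for _, row in merged.iterrows():
        # first position whose venue category equals `category`, or None if it never appears
        position = next((i + 1 for i, col in enumerate(rank_cols) if row[col] == category), None)
        ranks[row["Neighborhood"]] = position
    return pd.Series(ranks, name=f"{category} rank", dtype="float")

# Usage on the notebook data:
# category_rank(manhattan_merged, "Coffee Shop").sort_values()
```

Sorting the result lists every neighborhood by how high coffee shops rank, including those where they do not appear in the top ten at all.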


## Results and Discussion

Now that we have all the data we need, we can pick a suitable location for the coffee shop.

Since we want a pleasant area, the ideal neighborhood has a moderate number of coffee shops and also has a park among its most common venues.<br>
Looking at the last table, we see that in Stuyvesant Town a park is the most common venue and a coffee shop is the 5th most common venue.<br>
There is also a harbor/marina nearby (the 4th most common venue), which gives guests who take their coffee to go a place to walk to.
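The selection criterion above can be written down as an explicit rule, which makes it repeatable. A minimal sketch, assuming "a moderate number of coffee shops" means Coffee Shop ranks 3rd to 5th and "a park nearby" means Park ranks in the top 3; both thresholds and the helper `shortlist` are illustrative assumptions. Applied to the rows in the tables above, these thresholds keep Morningside Heights, Battery Park City and Stuyvesant Town, so the rule narrows the candidates while the final pick still rests on the argument in the text.

```python
import pandas as pd

def shortlist(merged, anchor="Park", target="Coffee Shop",
              target_positions=range(3, 6), anchor_top=3):
    """Hypothetical rule: `anchor` in the top `anchor_top` venues, `target` at one of `target_positions`."""
    rank_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]

    def position(row, category):
        # 1-based position of `category` among the ranked venue columns, or None
        for i, col in enumerate(rank_cols, start=1):
            if row[col] == category:
                return i
        return None

    keep = []
    for _, row in merged.iterrows():
        anchor_pos = position(row, anchor)
        target_pos = position(row, target)
        if anchor_pos is not None and anchor_pos <= anchor_top and target_pos in target_positions:
            keep.append(row["Neighborhood"])
    return keep

# Usage on the notebook data:
# shortlist(manhattan_merged)
```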

Let's look at the map to see where Stuyvesant Town is located relative to the rest of the clusters. It lies on the east side of Manhattan, and among the neighborhoods in the tables above it is the only one in cluster 2.<br>
This could indicate that it is cheaper to set up shop here, and since a park is the most common venue close by, the coffee shop should focus a lot on take-away orders.
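To put a number on how far apart neighborhoods are, rather than reading it off the map, the great-circle distance between neighborhood centers can be computed with the haversine formula. A minimal sketch assuming a spherical Earth with a radius of 6,371 km; `haversine_km` is a hypothetical helper, and the coordinates in the usage line are the Stuyvesant Town and Marble Hill centers from the tables above.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, earth_radius_km=6371.0):
    """Hypothetical helper: great-circle distance in km between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# Stuyvesant Town -> Marble Hill, using the centers listed in the tables above.
distance = haversine_km(40.731, -73.974052, 40.876551, -73.91066)
```

Computing this for every pair of rows in `manhattan_data` and taking the minimum per row would give each neighborhood's distance to its nearest neighbor.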

## Conclusion

Through this fairly simple procedure, it was possible to see where the coffee shop could be placed, using a few tables to get an overview of where there is a moderate number of coffee shops while staying near something to see for the people who prefer to walk and talk.<br>
For the case selected, the coffee shop should focus on selling most of its coffee to go, as there is some beautiful scenery close by.
