
<h1 align=center><font size = 5>Segmenting Starbucks in New York City</font></h1>

## Table of Contents

1. Introduction
2. Data acquisition and cleaning
3. Methodology
4. Results
5. Discussion
6. Conclusion

# 1. Introduction

## 1.1 Description of the problem and a discussion of the background

I am looking for a new apartment in New York. My problem is that I am addicted to coffee, so I would like to find a location where I can get the best coffee. My favourite coffee shop is **Starbucks**, so **I would like to find an apartment that is very close to a Starbucks coffee shop** using a data science approach.

Of course, this approach is valid for anyone who is a fan of a similar store or company and wants to find a location close to it.

I will use the Foursquare API to explore the neighborhoods of New York City and obtain the distribution of the Starbucks locations. Finally, I will check whether the location where I am looking for an apartment has a Starbucks nearby or whether I should look elsewhere.

# 2. Data acquisition and cleaning

## 2.1 Description of the data and how it will be used to solve the problem <a id="4"></a>

This project uses the New York neighborhood dataset available at the following link.

Link: https://geo.nyu.edu/catalog/nyu_2451_34572

A sample apartment location will also be used in order to assess the distance between the apartment and the nearest Starbucks coffee shop.

Once the New York dataset has been extracted and processed, the Starbucks coffee shops of New York will be displayed on a map, and the distance between the apartment and the nearest Starbucks coffee shop will then be assessed from that map.

## 2.2 Data acquisition

Before we get the data and start exploring it, we will import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

The data is in .json format and is already available on a server (link: https://cocl.us/new_york_dataset), so it will be downloaded by running a `wget` command.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Once the data is downloaded, it will be loaded into a Python dictionary.

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

We know that all the relevant data is in the *features* key, which is basically a list of the neighborhoods, so we will define a new variable that holds this data.

In [5]:
neighborhoods_data = newyork_data['features']

Taking a look at the first item in the list confirms that it contains the required information.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

## 2.3 Data cleaning

Once the data has been extracted, the next task is to transform this data of nested Python dictionaries into a *pandas* dataframe. We will start by creating an empty dataframe.

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

The dataframe will have the following structure:

In [8]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Once the dataframe is initialized, the next step is to collect one row per neighborhood and load the rows into the dataframe.

In [9]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so the dataframe is built in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

We will examine the head of the resulting dataframe.

In [10]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


Since we know that New York contains 5 boroughs and 306 neighborhoods, we will check that our dataframe contains the same information.

In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
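The same check can be broken down per borough. The following is only a minimal sketch: it uses a handful of rows copied from the tables in this notebook rather than the full dataset, and the variable names `sample`, `per_borough` and `duplicated_names` are illustrative.

```python
import pandas as pd

# Sample rows copied from the tables shown in this notebook (not the full dataset)
sample = pd.DataFrame(
    [
        ("Bronx", "Wakefield"),
        ("Bronx", "Co-op City"),
        ("Bronx", "Eastchester"),
        ("Manhattan", "Marble Hill"),
        ("Manhattan", "Chinatown"),
    ],
    columns=["Borough", "Neighborhood"],
)

# Number of neighborhoods per borough
per_borough = sample["Borough"].value_counts()
print(per_borough)

# Neighborhood names that occur more than once (none in this sample)
duplicated_names = sample.loc[
    sample["Neighborhood"].duplicated(keep=False), "Neighborhood"
]
print(duplicated_names.tolist())
```

Run on the full `neighborhoods` dataframe, the same two calls would show how the 306 rows are spread over the 5 boroughs and whether any neighborhood name is repeated.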


# 3. Methodology

## 3.1 Exploratory Data Analysis

First, the dataset will be shown on a map in order to confirm that the information is correct.
The map will show the location of every neighborhood of New York.

We will use the Foursquare API to explore neighborhoods in New York City and extract the relevant information about Starbucks coffee shops in each of them.

We will use the **explore** endpoint to retrieve the venues around each neighborhood and then keep only the Starbucks coffee shops.
These shops will then be shown on a map and finally compared with the location of our apartment.

#### Use of geopy library to get the latitude and longitude values of New York City.

The **geopy** library will be used to get the latitude and longitude values of New York City.
In order to define an instance of the **geocoder**, we need to define a user_agent. We will name our agent <em>ny_explorer</em>.

In [12]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.


#### Create a map of New York with neighborhoods superimposed on top.

After that a map of New York will be created superimposing all the neighborhoods on top of it.

In [13]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle mark to reveal the name of the neighborhood and its respective borough.

However, for illustration purposes, let's simplify the above map and segment and cluster only the neighborhoods in Manhattan. So let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [14]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Let's get the geographical coordinates of Manhattan.

In [15]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


As we did with all of New York City, let's visualize Manhattan and the neighborhoods in it.

In [16]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

We will initialize Foursquare with our credentials.

In [17]:
CLIENT_ID = '3Y2Z21EYGPKDKK30YNMF4VKKOMRZOWHZWXCPJPRLXH13OLPP' # your Foursquare ID
CLIENT_SECRET = 'HK3LBKMREJ4TY35FUURH0LRUCA5LK4KMTZDNYOJ42MGXBSZ1' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 3Y2Z21EYGPKDKK30YNMF4VKKOMRZOWHZWXCPJPRLXH13OLPP
CLIENT_SECRET:HK3LBKMREJ4TY35FUURH0LRUCA5LK4KMTZDNYOJ42MGXBSZ1


#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [18]:
manhattan_data.loc[0, 'Neighborhood']

'Marble Hill'

Get the neighborhood's latitude and longitude values.

In [19]:
neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


#### Now, let's get the top 100 venues that are in Marble Hill within a radius of 500 meters.

First, let's create the GET request URL.

In [20]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL




'https://api.foursquare.com/v2/venues/explore?&client_id=3Y2Z21EYGPKDKK30YNMF4VKKOMRZOWHZWXCPJPRLXH13OLPP&client_secret=HK3LBKMREJ4TY35FUURH0LRUCA5LK4KMTZDNYOJ42MGXBSZ1&v=20180605&ll=40.87655077879964,-73.91065965862981&radius=500&limit=100'
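As an alternative to formatting the URL by hand, the query string can be built from a dictionary, with the credentials read from the environment so that they do not appear in the notebook. This is a sketch under assumptions: the environment variable names and the helper `build_explore_url` are hypothetical, and placeholders are used when the variables are not set.

```python
import os
from urllib.parse import urlencode

# Assumed environment variable names; the placeholders keep the sketch runnable
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")


def build_explore_url(lat, lng, radius=500, limit=100, version="20180605"):
    """Build the explore URL used above from a dictionary of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma of the ll parameter unescaped
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")


url = build_explore_url(40.87655077879964, -73.91065965862981)
# Print only the part after the credentials
print(url.split("&ll=")[1])
```

Printing only the tail of the URL avoids echoing the secret into the cell output.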


Send the GET request and examine the results.

In [21]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c94ace6db04f53b135d2cf0'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Marble Hill',
  'headerFullLocation': 'Marble Hill, New York',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 25,
  'suggestedBounds': {'ne': {'lat': 40.88105078329964,
    'lng': -73.90471933917806},
   'sw': {'lat': 40.87205077429964, 'lng': -73.91659997808156}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b4429abf964a52037f225e3',
       'name': "Arturo's",
       'location': {'address': '5198 Broadway',
        'crossStreet': 'at 225th St.',
        'lat': 40.87441177110231,
        'lng': -73.91027100981574,
        'labeledLatLngs': [{'label'

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [23]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Arturo's,Pizza Place,40.874412,-73.910271
1,Bikram Yoga,Yoga Studio,40.876844,-73.906204
2,Tibbett Diner,Diner,40.880404,-73.908937
3,Dunkin' Donuts,Donut Shop,40.877136,-73.906666
4,Starbucks,Coffee Shop,40.877531,-73.905582
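To make the flattening step easier to follow, here is a minimal offline sketch. The `items` list is a hand-made sample shaped like the response shown earlier; its second entry is hypothetical and only illustrates a venue with no category. It uses `pd.json_normalize`, which is available at the top level of pandas from version 1.0.

```python
import pandas as pd

# Hand-made sample shaped like entries of the "items" list shown above
items = [
    {"venue": {"name": "Arturo's",
               "categories": [{"name": "Pizza Place"}],
               "location": {"lat": 40.874412, "lng": -73.910271}}},
    {"venue": {"name": "Unnamed venue",  # hypothetical entry without a category
               "categories": [],
               "location": {"lat": 40.876844, "lng": -73.906204}}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names
print(sorted(flat.columns))

# Keep the first category name, or None when the list is empty
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted column name
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```

Lists such as `categories` are left untouched by the flattening, which is why a separate step is needed to pull out the category name.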


And how many venues were returned by Foursquare?

In [24]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

25 venues were returned by Foursquare.


<a id='item2'></a>

## 3.2 Explore Neighborhoods in New York

#### Let's create a function to repeat the same process to all the neighborhoods in New York

We will create a function to explore all the venues in New York and extract this information to a dataframe.

In [25]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
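The function above assumes that every venue has at least one category; an empty `categories` list would raise an `IndexError`. The parsing step could be separated out and made tolerant of that case, roughly as follows. `extract_venue_rows` is a hypothetical helper, and the sample items are hand-made, with the second one invented to show the missing-category case.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when the venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-made sample; the second venue is hypothetical and has no category
sample_items = [
    {"venue": {"name": "Starbucks",
               "location": {"lat": 40.877531, "lng": -73.905582},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed venue",
               "location": {"lat": 40.876844, "lng": -73.906204},
               "categories": []}},
]

venue_rows = extract_venue_rows("Marble Hill", 40.876551, -73.91066, sample_items)
for row in venue_rows:
    print(row)
```

Keeping the parsing in its own function also makes it possible to test it without calling the API.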

#### Run the above function on each neighborhood and create a new dataframe called *new_york_venues*.

We will use this function to get the venues of all the neighborhoods of New York.

In [None]:
# get the venues of every neighborhood in New York

new_york_neigh = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )
new_york_venues=new_york_neigh

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park



#### Let's check the size of the resulting dataframe

In [32]:
print(new_york_venues.shape)
new_york_venues.head()

(10250, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Cooler Runnings Jamaican Restaurant Inc,40.898276,-73.850381,Caribbean Restaurant
4,Wakefield,40.894705,-73.847201,SUBWAY,40.890656,-73.849192,Sandwich Place


In [33]:
new_york_Starbucks = new_york_venues[new_york_venues['Venue']=='Starbucks']
print(new_york_Starbucks.shape)

(40, 7)


In [34]:
new_york_Starbucks.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
131,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
135,Marble Hill,40.876551,-73.91066,Starbucks,40.873755,-73.908613,Coffee Shop
832,Belmont,40.857277,-73.888452,Starbucks,40.860636,-73.89027,Coffee Shop
1382,Brighton Beach,40.576825,-73.965094,Starbucks,40.577841,-73.961204,Coffee Shop
1896,Brooklyn Heights,40.695864,-73.993782,Starbucks,40.692469,-73.990971,Coffee Shop


## Dataset with all the Starbucks coffee shops

This table shows the location of the Starbucks coffee shops found in New York.
It will be used to identify the best location for our apartment.

In [35]:
new_york_Starbucks

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
131,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
135,Marble Hill,40.876551,-73.91066,Starbucks,40.873755,-73.908613,Coffee Shop
832,Belmont,40.857277,-73.888452,Starbucks,40.860636,-73.89027,Coffee Shop
1382,Brighton Beach,40.576825,-73.965094,Starbucks,40.577841,-73.961204,Coffee Shop
1896,Brooklyn Heights,40.695864,-73.993782,Starbucks,40.692469,-73.990971,Coffee Shop
2478,Bath Beach,40.599519,-73.998752,Starbucks,40.595227,-74.000017,Coffee Shop
3048,Georgetown,40.623845,-73.916075,Starbucks,40.625874,-73.91746,Coffee Shop
3547,Washington Heights,40.851903,-73.9369,Starbucks,40.850961,-73.93833,Coffee Shop
3937,Upper East Side,40.775639,-73.960508,Starbucks,40.773533,-73.95981,Coffee Shop
4045,Yorkville,40.77593,-73.947118,Starbucks,40.772356,-73.949984,Coffee Shop
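Because each neighborhood is queried with a 500 meter radius, two neighborhood centres that are close together could in principle return the same shop twice. A possible check, sketched here on a few rows copied from the table above plus one hypothetical repeated row, is to drop duplicate coordinates and to count shops per neighborhood.

```python
import pandas as pd

starbucks = pd.DataFrame(
    [
        ("Marble Hill", "Starbucks", 40.877531, -73.905582),
        ("Marble Hill", "Starbucks", 40.873755, -73.908613),
        ("Belmont", "Starbucks", 40.860636, -73.890270),
        # Hypothetical repeat of the first shop under another neighborhood
        ("Kingsbridge", "Starbucks", 40.877531, -73.905582),
    ],
    columns=["Neighborhood", "Venue", "Venue Latitude", "Venue Longitude"],
)

# One row per physical shop, identified by its coordinates
unique_shops = starbucks.drop_duplicates(subset=["Venue Latitude", "Venue Longitude"])
print(len(starbucks), "rows,", len(unique_shops), "distinct shops")

# Number of Starbucks rows returned for each neighborhood
per_neighborhood = starbucks.groupby("Neighborhood").size().sort_values(ascending=False)
print(per_neighborhood)
```

Applied to `new_york_Starbucks`, the per-neighborhood counts would give a simple ranking of neighborhoods by number of nearby shops.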


## Map with all Starbucks coffee shops

Once we have extracted the location of the Starbucks coffee shops in New York, we can use this information to display them.
The location of each venue will be displayed on a map of New York.

In [53]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.


In [54]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each Starbucks, using the venue's own coordinates
for lat, lng, label in zip(new_york_Starbucks['Venue Latitude'], new_york_Starbucks['Venue Longitude'], new_york_Starbucks['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  

map_newyork

## Find the location of our apartments with regard to coffee shops

Once we have obtained the distribution of the Starbucks coffee shops in New York, we will check which of our two possible apartments is located nearer to a coffee shop.

We will use two dummy addresses for our apartments:
- Apartment 1: 198-100 W 76th Street
- Apartment 2: 26 Tinton Avenue, New York

First, we will depict the location of apartment 1 together with the Starbucks locations.

We will first obtain the coordinates of the apartment:

In [55]:
address = '298 Mulberry Street, New York'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_Myapp = location.latitude
longitude_Myapp = location.longitude
# report the coordinates just geocoded for this address
print('The geographical coordinates of apartment 1 are {}, {}.'.format(latitude_Myapp, longitude_Myapp))


In [56]:
folium.Marker([latitude_Myapp, longitude_Myapp], popup='Apartment 1').add_to(map_newyork)
 
map_newyork

Second, we will depict the location of apartment 2 together with the Starbucks locations.

We will first obtain the coordinates of the apartment:

In [57]:
address = '26 Tinton avenue, new york'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_Myapp = location.latitude
longitude_Myapp = location.longitude
# report the coordinates just geocoded for this address
print('The geographical coordinates of apartment 2 are {}, {}.'.format(latitude_Myapp, longitude_Myapp))


Once we know the coordinates of our apartment, we will depict it within the Starbucks distribution to get an idea of whether this neighborhood contains this great coffee shop.

In [58]:

folium.Marker([latitude_Myapp, longitude_Myapp], popup='Apartment 2').add_to(map_newyork)
 
map_newyork

Looking at the map, the second apartment, located on Tinton Avenue, appears to have more Starbucks shops near it than the one located on W 76th Street.

# 4. Results

The map shows the two possible locations for our apartment together with the locations of the Starbucks coffee shops.
It gives a first impression of whether the selected location is appropriate given our initial requirements.

On this basis, the apartment on Tinton Avenue is the better placed of the two with respect to Starbucks shops.

In [59]:
map_newyork

**So the selected apartment will be the one on Tinton Avenue.**

# 5. Discussion

The results show the location of our apartments among the other places that we are interested in, in this case Starbucks coffee shops.

The map provides a good first impression of whether the selected location is adequate given our initial premises.
These initial requirements can be extended to include any other type of venue that we would like to have close by.

# 6. Conclusion

This project has used two main applications of data science: the management and the visualization of large amounts of data. In this case, all the venues in New York have been processed in order to find the most suitable location for our apartment.

As a continuation, this project could be extended with more automatic means of computing the real distance between our location and the places that we are interested in, and even with clustering of these places in order to identify the neighborhoods most populated with this kind of venue.
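One possible way to compute such a distance is the haversine formula, which gives the great-circle distance between two latitude/longitude pairs. The sketch below is illustrative only: the shop coordinates are copied from the Starbucks table above, the apartment coordinates are hypothetical, and the helper names `haversine_km` and `nearest_shop` are not part of the notebook.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def nearest_shop(lat, lon, shops):
    """Return (label, distance_km) of the closest shop; shops holds (label, lat, lon)."""
    return min(
        ((label, haversine_km(lat, lon, s_lat, s_lon)) for label, s_lat, s_lon in shops),
        key=lambda pair: pair[1],
    )


# Shop coordinates copied from the Starbucks table above
shops = [
    ("Upper East Side", 40.773533, -73.95981),
    ("Yorkville", 40.772356, -73.949984),
    ("Marble Hill", 40.877531, -73.905582),
]

# Hypothetical apartment coordinates, for illustration only
apartment_lat, apartment_lon = 40.78, -73.98

label, distance_km = nearest_shop(apartment_lat, apartment_lon, shops)
print("Nearest Starbucks: {} ({:.2f} km)".format(label, distance_km))
```

This is a straight-line distance, so it will understate the actual walking distance along the street grid.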