# Data Science Capstone - Battle of the neighborhoods

## Which major city should you visit if you love Japanese food?

## Table of contents

- Introduction
- Data
- Methodology
- Analysis
- Results and Discussion
- Conclusion
- Resources

### *Introduction*

So, you love Japanese food so much that you could eat it for every meal. You would like to visit a major city for a nice vacation and indulge in your favorite cuisine, but you don't know where to go. This is the problem I aim to solve in this report by analyzing the density of Japanese restaurants in some major US cities. While not important to everyone, this report should help the tourist with a huge appetite for sushi and the like who wants to visit a popular destination in the US.

### *Data*

I will use the FourSquare API to collect data about the locations of Japanese restaurants in five major US cities: New York, San Francisco, Las Vegas, Seattle and Chicago. These are some of the most popular US cities to visit according to TripAdvisor, and I believe they have some of the best Japanese food the United States can offer. I will pass each city's name to the API request to pull back the name, address and geospatial data of Japanese restaurants in the area. We will use this data to determine which city would be the best one to visit and book a hotel room in.

### *Methodology*

We will be using statistics to solve the majority of our problem. We will obtain the mean location of the Japanese restaurants and then take the average of the distance of the venues to the mean coordinates in order to illustrate the density. 
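Written out, with $p_i = (\mathrm{lat}_i, \mathrm{lng}_i)$ for each of the $n$ venues returned for a city, the measure described above looks roughly like this. The symbols are introduced here for illustration and do not appear in the notebook code:

$$
\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} \lVert p_i - \bar{p} \rVert_2
$$

A smaller $\bar{d}$ means the venues sit closer to their common center, which is what this report treats as a higher density.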

We will then plot our results on a map so we can visualize our data in order to make the results clear.

Machine learning doesn't appear to be required for this particular issue as knowing the density of the restaurants should suffice to reach our answer.

### *Analysis*

Before we analyze any data, we need to import the libraries required for the analysis:

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
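try:
    from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas 1.0 and later)
except ImportError:
    from pandas.io.json import json_normalize # older pandas versions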

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


##### The cell below is hidden to secure FourSquare credentials

In [3]:
# The code was removed by Watson Studio for sharing.
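
The hidden cell is not shown, but the later cells rely on three names: `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`. One way to define them without writing secrets into the notebook is to read them from environment variables. The sketch below is an assumption, not the removed code: the `FOURSQUARE_*` variable names, the helper name and the default version date are all hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read credentials from (hypothetical) FOURSQUARE_* environment variables."""
    required = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # The v2 API expects the version as a YYYYMMDD date string;
    # the default used here is only an example value.
    version = env.get("FOURSQUARE_VERSION", "20180605")
    return env[required[0]], env[required[1]], version


# Usage in the notebook (commented out so the sketch runs without real credentials):
# CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials()
```

Failing early with a clear message is a deliberate choice here: a missing credential would otherwise only show up later as a confusing API error.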

##### This is where we will make a request to the FourSquare API for each city and store the JSON response in a dictionary

In [4]:
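LIMIT = 500  # maximum number of venues requested per city
CATEGORY_ID = "4bf58dd8d48988d111941735"  # This is the Category ID for Japanese Restaurant
cities = ["New York, NY", 'Chicago, IL', 'San Francisco, CA', 'Las Vegas, NV', 'Seattle, WA']
results = {}
for city in cities:
    # passing params lets requests URL-encode values such as "New York, NY"
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        'categoryId': CATEGORY_ID,
    }
    response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
    response.raise_for_status()  # stop early on an HTTP error
    results[city] = response.json()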

##### This is where we will normalize the JSON results into one dataframe per city so that we can map them

In [5]:
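df_venues = {}
# map the nested JSON field names to short column names
columns = {'venue.name': 'Name', 'venue.location.address': 'Address',
           'venue.location.lat': 'Lat', 'venue.location.lng': 'Lng'}
for city in cities:
    # flatten the nested venue records into a flat table
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # select and rename in one step to avoid assigning column names on a slice
    df_venues[city] = venues[list(columns)].rename(columns=columns)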
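
To make the nested structure concrete, here is a plain-Python sketch of the same flattening, without pandas. The payload is a mock with made-up venues, shaped only like the fields the cell above reads, and `extract_venues` is a hypothetical helper rather than part of the notebook or the API.

```python
def extract_venues(payload):
    """Return one flat dict per venue from an explore-style response dict."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue.get("location", {})
        rows.append({
            "Name": venue.get("name"),
            "Address": location.get("address"),  # None when a venue has no address
            "Lat": location.get("lat"),
            "Lng": location.get("lng"),
        })
    return rows


# Mock payload (made-up venues), not real API output.
mock_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Sushi",
                           "location": {"address": "1 Example St",
                                        "lat": 40.73, "lng": -73.99}}},
                {"venue": {"name": "Example Ramen",
                           "location": {"lat": 40.74, "lng": -73.98}}},
            ]
        }]
    }
}

rows = extract_venues(mock_payload)
for row in rows:
    print(row)
```

The second mock venue has no address on purpose, to show that a missing field becomes an empty value instead of an error.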

##### Let's use Folium to plot the results we have for our Japanese restaurant geospatial data on a map of their respective cities.

We can also take a look at how many Japanese restaurants are located in each city at the same time.

In [20]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of Japanese restaurants in {city} = ", results[city]['response']['totalResults'])
    print("The Map below will show the Top 100 results")

Total number of Japanese restaurants in New York, NY =  328
The Map below will show the Top 100 results
Total number of Japanese restaurants in Chicago, IL =  260
The Map below will show the Top 100 results
Total number of Japanese restaurants in San Francisco, CA =  270
The Map below will show the Top 100 results
Total number of Japanese restaurants in Las Vegas, NV =  262
The Map below will show the Top 100 results
Total number of Japanese restaurants in Seattle, WA =  237
The Map below will show the Top 100 results


##### Map of New York

In [19]:
maps[cities[0]]

##### Map of Chicago

In [21]:
maps[cities[1]]

##### Map of San Francisco

In [22]:
maps[cities[2]]

##### Map of Las Vegas

In [23]:
maps[cities[3]]

##### Map of Seattle

In [24]:
maps[cities[4]]

Now that we can visualize the Japanese restaurants in each city, New York appears to be the best option because of the high number of Japanese restaurants in close proximity to each other. This makes vacation planning easy: with the restaurants so tightly clustered, our eatery of choice will not limit the search for a hotel that suits any tourist's price point.

However, let's make sure we are correct by getting a more accurate measure of the restaurant density in each city and then plotting it on a map.

We will use statistics for this part of the problem. We will obtain the mean location of the Japanese restaurants and then take the average distance from each venue to that mean coordinate as a measure of density. The distance is computed directly on the latitude and longitude values, so it is expressed in degrees, and a smaller value means a tighter cluster.

In [25]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="red", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])
    
    # Let's see what our results are without plotting them
    print(city)
    print("Mean Distance from Mean coordinates")
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))

New York, NY
Mean Distance from Mean coordinates
0.01809379278802192
Chicago, IL
Mean Distance from Mean coordinates
0.04497019707422612
San Francisco, CA
Mean Distance from Mean coordinates
0.02742247138198929
Las Vegas, NV
Mean Distance from Mean coordinates
0.0719585335788342
Seattle, WA
Mean Distance from Mean coordinates
0.037010855726453955
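
These values are Euclidean distances taken directly on latitude and longitude. A degree of longitude covers less ground than a degree of latitude, and the gap grows with distance from the equator, so the figures are not strictly comparable between cities at different latitudes. A possible refinement, not used in this notebook, is the haversine great-circle distance in kilometers. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function names and the sample coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_to_centroid_km(points):
    """Average haversine distance from each point to the mean coordinate."""
    n = len(points)
    # Averaging raw lat/lng is a reasonable centroid at city scale.
    mean_lat = sum(lat for lat, _ in points) / n
    mean_lng = sum(lng for _, lng in points) / n
    return sum(haversine_km(lat, lng, mean_lat, mean_lng) for lat, lng in points) / n


# Made-up sample coordinates, only to show the call.
sample_points = [(40.73, -73.99), (40.74, -73.98), (40.75, -74.00)]
print(round(mean_distance_to_centroid_km(sample_points), 2))
```

With the notebook's data, the input could be built as `list(zip(df_venues[city]['Lat'], df_venues[city]['Lng']))`.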


##### Map of New York

In [26]:
maps[cities[0]]

##### Map of Chicago

In [27]:
maps[cities[1]]

##### Map of San Francisco

In [28]:
maps[cities[2]]

##### Map of Las Vegas

In [29]:
maps[cities[3]]

##### Map of Seattle

In [30]:
maps[cities[4]]

### *Results and Discussion*

So what is the result of our analysis? Measuring the mean distance from the mean coordinates of the Japanese restaurants in each city shows that New York has the smallest value (about 0.018 degrees) and therefore the highest density of restaurants. That appears to make it ideal for a tourist who must have sushi or tempura to survive the day.

The next best city to visit for those who see themselves as connoisseurs of Japanese food would be San Francisco. It has the second highest density of Japanese restaurants, with a mean distance of about 0.027 degrees.
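
The full ranking can be read straight off the printed values. A short sketch that sorts them, using the mean distances from the output above rounded to four decimal places (the variable names are illustrative):

```python
# Mean distance from the mean coordinate, in degrees, rounded from the notebook output.
mean_distance_deg = {
    "New York, NY": 0.0181,
    "Chicago, IL": 0.0450,
    "San Francisco, CA": 0.0274,
    "Las Vegas, NV": 0.0720,
    "Seattle, WA": 0.0370,
}

# Smaller mean distance means a tighter cluster, so sort ascending.
ranking = sorted(mean_distance_deg, key=mean_distance_deg.get)
for rank, city in enumerate(ranking, start=1):
    print(rank, city, mean_distance_deg[city])
```

Sorted this way, Seattle, Chicago and Las Vegas follow New York and San Francisco, in that order.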

One thing these two great cities have in common when it comes to Japanese cuisine is that both are major port cities, which may help explain a steady supply of fresh seafood.

### *Conclusion*

You really can't go wrong with any of these cities as they each have something great to offer outside of Japanese cuisine. However, the clear winner is good ol' New York City.

### *Resources*

- FourSquare API
- TripAdvisor's Top 25 most popular travel destinations in America