# Capstone Project - The Battle of the Neighborhoods - Week 1
#### Applied Data Science Capstone by IBM/Coursera

## Table of Contents

* [Introduction: Business Problem](#Introduction)
* [Data](#Data)

# 1. Introduction: Business Problem

The company I work for is owned by a parent company in Japan, so upper management from Japan makes frequent business trips to Dallas. In my interactions with a few of them, I am often asked where to find a good ramen or sushi place to get a bite. Although there are several such places, I was curious whether we could find an optimal location for a new ramen/sushi shop. Specifically, this report is for those interested in opening a **Ramen/Sushi Restaurant** in **Dallas, Texas**.

Dallas is one of the five largest cities in Texas, so it already has a great many restaurants. Our focus will be on **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Ramen/Sushi restaurants nearby**. Ideally, the new restaurant would open **as close to the center of Dallas as possible**, assuming that the first two conditions are met.

We will use data science tools to fetch the raw data, visualize it, and then generate a few of the most promising areas based on the above criteria. The advantages of each area will then be clearly described so that stakeholders can choose the best possible final location.

# 2. Data

## 2.a What data is used and how will the problem be solved?
* We will work entirely with Foursquare data to explore and locate a spot for our new ramen/sushi restaurant. As stated before, the spot should not already be crowded with similar restaurants. In addition, being close to hotels would be beneficial.
* We will look for the central area of our chosen venues to locate a new spot for our restaurant. Before we do so, we will first pull all relevant venues in and around central Dallas.
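
The two criteria above (few similar restaurants nearby, hotels close by) can be checked with plain great-circle distances. The following is a minimal sketch, not necessarily the method this notebook ends up using: `haversine_m`, `count_within`, and `nearest_m` are hypothetical helper names, and any radius passed to `count_within` is an assumed threshold to be tuned.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(point, venues, radius_m):
    """Number of venues (lat, lng tuples) within radius_m of point."""
    return sum(1 for v in venues if haversine_m(*point, *v) <= radius_m)


def nearest_m(point, venues):
    """Distance in metres from point to the closest venue."""
    return min(haversine_m(*point, *v) for v in venues)
```

A candidate spot would then score well if `count_within(candidate, restaurants, radius_m)` is low and `nearest_m(candidate, hotels)` is small, where `restaurants` and `hotels` are lists of `(lat, lng)` tuples.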

## 2.b Importing Libraries

In [1]:
# Import libraries
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.display import HTML

from bs4 import BeautifulSoup # scraping library

# transforming JSON into a pandas dataframe is done with pd.json_normalize (pandas >= 1.0),
# so the deprecated `from pandas.io.json import json_normalize` import is not needed
import json # JSON files manipulation

from sklearn.cluster import KMeans # clustering algorithm

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

#! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


## 2.c Credentials and Core location

#### Credentials:

In [2]:
CLIENT_ID = 'XXXXXX' # your Foursquare ID
CLIENT_SECRET = 'XXXXXX' # your Foursquare Secret
ACCESS_TOKEN = 'XXXXXX' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 150

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent foursquare_agent, as shown below.

In [3]:
address = "Dallas, TX"

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

dal = 'Dallas location : {},{}'.format(latitude, longitude)

print(dal)

Dallas location : 32.7762719,-96.7968559
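
geopy's `geocode` returns `None` when the address cannot be resolved, in which case `location.latitude` above would raise an `AttributeError`. A minimal guard could look like the sketch below; `coordinates_or_raise` is a hypothetical helper name, not part of geopy.

```python
def coordinates_or_raise(location, address):
    """Return (latitude, longitude) from a geopy Location.

    Raises ValueError if geocoding returned None (no match found).
    """
    if location is None:
        raise ValueError('Could not geocode address: {!r}'.format(address))
    return location.latitude, location.longitude


# Usage with the objects defined in the cell above:
# latitude, longitude = coordinates_or_raise(geolocator.geocode(address), address)
```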


## 2.d Search for Ramen and Sushi Restaurants within an 8 km (5 Mile) Radius

#### Since we're looking in Dallas, TX, let's find out whether there are any ramen and sushi spots (and hotels) within 8 km

In [4]:
search_query_hotel = 'hotel'
search_query_ramen = 'Ramen'
search_query_sushi = 'Sushi'

radius = 8000
print(search_query_hotel + ' .... OK!')
print(search_query_ramen + ' .... OK!')
print(search_query_sushi + ' .... OK!')

hotel .... OK!
Ramen .... OK!
Sushi .... OK!


#### Define the corresponding URL

In [5]:
url_hotel = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_hotel, radius, LIMIT)
url_ramen = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_ramen, radius, LIMIT)
url_sushi = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_sushi, radius, LIMIT)
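
The three URLs above differ only in the search term, so they could also be built by one small function. The sketch below is one possible refactoring, assuming the same endpoint and parameters as the cell above; `build_search_url` is a hypothetical helper name. It uses `urllib.parse.urlencode`, which also escapes characters such as spaces in the query.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(query, latitude, longitude, radius, limit,
                     client_id, client_secret, version):
    """Build a venue search URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Usage with the variables defined earlier in the notebook:
# url_ramen = build_search_url(search_query_ramen, latitude, longitude,
#                              radius, LIMIT, CLIENT_ID, CLIENT_SECRET, VERSION)
```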

#### Send the GET Request and examine the results

In [6]:
results_hotel = requests.get(url_hotel).json()
results_ramen = requests.get(url_ramen).json()
results_sushi = requests.get(url_sushi).json()
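
The requests above assume that every call succeeds. Foursquare v2 responses carry a `meta` object with a status `code`, so a failed call (for example, bad credentials or an exceeded quota) can be detected before the venues are read. The following is a minimal sketch; `extract_venues` is a hypothetical helper name.

```python
def extract_venues(payload):
    """Return the list of venues from a search response dict.

    Raises RuntimeError if the response's meta code is not 200.
    """
    meta = payload.get('meta', {})
    code = meta.get('code')
    if code != 200:
        raise RuntimeError('Foursquare request failed: code={}, detail={}'.format(
            code, meta.get('errorDetail')))
    return payload.get('response', {}).get('venues', [])


# Usage with the results fetched above:
# venues_ramen = extract_venues(results_ramen)
```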

#### Get relevant part of JSON and transform it into a pandas dataframe

In [7]:
# assign relevant part of JSON to venues
venues_hotel = results_hotel['response']['venues']
venues_ramen = results_ramen['response']['venues']
venues_sushi = results_sushi['response']['venues']

# transform venues into dataframes and merge all three result sets
dataframe_hotel = pd.json_normalize(venues_hotel)
dataframe_ramen = pd.json_normalize(venues_ramen)
dataframe_sushi = pd.json_normalize(venues_sushi)

dataframe = pd.concat([dataframe_hotel, dataframe_ramen, dataframe_sushi])

# the combined dataframe holds the hotel results as well as the ramen and sushi results
print("There are {} hotel, ramen, and sushi venues in Dallas".format(dataframe.shape[0]))

There are 87 hotel, ramen, and sushi venues in Dallas
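
Because the hotel results are concatenated with the restaurant results, the row count above covers all three searches. One way to keep them distinguishable is to tag each venue with the search term that produced it before building the dataframe. This is a minimal sketch; `tag_venues` and the `query` key are hypothetical names introduced for illustration.

```python
def tag_venues(venues, query):
    """Return copies of the venue dicts with an added 'query' key."""
    return [dict(venue, query=query) for venue in venues]


# Usage with the lists defined in the cell above:
# tagged = (tag_venues(venues_hotel, 'hotel')
#           + tag_venues(venues_ramen, 'ramen')
#           + tag_venues(venues_sushi, 'sushi'))
# dataframe = pd.json_normalize(tagged)
# restaurants = dataframe[dataframe['query'] != 'hotel']
```

A venue that matches both the ramen and the sushi search would still appear twice; dropping duplicates on the `id` column is one option if unique venues are needed.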


#### Define information of interest and filter dataframe

In [8]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the column assignment below does not touch the original

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

#dataframe_filtered
df=dataframe_filtered[['name','categories','distance','lat','lng','id']]
df

| | name | categories | distance | lat | lng | id |
|---|---|---|---|---|---|---|
| 0 | Hotel Indigo Dallas Downtown | Hotel | 673 | 32.781939 | -96.794337 | 4b74b9aff964a520bdee2de3 |
| 1 | AC Hotel by Marriott Dallas Downtown | Hotel | 457 | 32.780342 | -96.796207 | 59dc1293a0215b62b5c7fe52 |
| 2 | Omni Dallas Hotel | Hotel | 709 | 32.775332 | -96.804357 | 4eaf21a68b81d80bd7a1922b |
| 3 | Magnolia Hotel | Hotel | 472 | 32.780029 | -96.799215 | 4b1aa47cf964a52031ee23e3 |
| 4 | Lorenzo Hotel | Hotel | 405 | 32.772853 | -96.795352 | 589c9d716c682b5414768746 |
| 5 | Sheraton Dallas Hotel | Hotel | 1001 | 32.785128 | -96.794956 | 4ab8e946f964a520527d20e3 |
| 6 | Cambria Hotel Downtown Dallas | Hotel | 691 | 32.782402 | -96.795681 | 59bb0324da5ede214149e1e9 |
| 7 | Hotel Pool | Hotel Pool | 746 | 32.775824 | -96.804817 | 4fe771a4e4b0ed119cfb9225 |
| 8 | Sweettooth Hotel | Pop-Up Shop | 1567 | 32.785702 | -96.809288 | 5b0081ff345cbe0038c03a43 |
| 9 | Hotel ZaZa | Hotel | 2031 | 32.79409 | -96.801524 | 4b0679d2f964a52027ec22e3 |


#### Let's visualize the hotels and the ramen and sushi spots that are nearby

In [9]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Dallas

# add a red circle marker to represent the center of Dallas, TX
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Dallas',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add all venues in df (hotels as well as ramen/sushi restaurants) as blue circle markers
for lat, lng, label in zip(df.lat, df.lng, df.name):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)
# display map
venues_map
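
Since `df` contains hotels alongside restaurants, every marker on the map above has the same color. A small helper could pick the color from the venue category so the two groups are easier to tell apart. This is a sketch under assumptions: `marker_color` is a hypothetical name, the green/blue scheme is arbitrary, and matching on the word "hotel" is a simple heuristic rather than a Foursquare feature.

```python
def marker_color(category):
    """Pick a marker color: green for hotel-like categories, blue otherwise.

    Non-string values (e.g. None for venues without a category) fall back to blue.
    """
    if isinstance(category, str) and 'hotel' in category.lower():
        return 'green'
    return 'blue'


# Usage inside the marker loop above:
# for lat, lng, label, category in zip(df.lat, df.lng, df.name, df.categories):
#     color = marker_color(category)
#     folium.CircleMarker([lat, lng], radius=5, color=color, popup=label,
#                         fill=True, fill_color=color, fill_opacity=0.6).add_to(venues_map)
```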