# Capstone Project - Battle of the Neighborhoods
### IBM/Coursera

## Table of Contents
- Introduction 
- Data
- Methodology
- Results and Discussion
- Conclusion

## Introduction

For this capstone project, we will try and find the optimal location to open a coffee shop in Berkeley, California, specifically targetting the area around the UC Berkeley campus. As this is an area that usually has a lot of foot traffic due to the university campus, it is a popular place for stakeholders looking to open a new location. 

In order to avoid overcrowding of coffee shops, we will be looking for areas with varying density of previously existing coffee shops in order to determine where the best location would be. We will also be measuring and taking into account the distance from the university of each prospective location, putting higher priority on nearby locations. 

Through this analysis, we will be able to find several locations to choose from that would fit the criteria of a good spot to open a new coffee shop. 

## Data

As stated in the introduction, the factors that we will be taking into account for our analysis are: 
- density of nearby restuarants and shops, coffee shops and others
- distance to UC Berkeley campus

We will be using the Foursquare API to obtain this information, both for the density of nearby restuarants and coffee shops, as well as for the distance to the UC Berkeley campus. 

In [154]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

import folium # plotting library

import math

### Foursquare Credentials
Hidden below

In [156]:
address = 'Berkeley, CA'
geolocator = Nominatim(user_agent="foursquare_agent")
latitude = 37.871794
longitude = -122.259988
def new_query(query, limit= 100): 
    LIMIT = limit
    search_query = query
    radius = 2000
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()
    # assign relevant part of JSON to venues
    venues = results['response']['venues']

    # tranform venues into a dataframe
    dataframe = json_normalize(venues)

    # keep only columns that include venue name, and anything that is associated with location
    filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
    dataframe_filtered = dataframe.loc[:, filtered_columns]

    # function that extracts the category of the venue
    def get_category_type(row):
        try:
            categories_list = row['categories']
        except:
            categories_list = row['venue.categories']
            
        if len(categories_list) == 0:
            return None
        else:
            return categories_list[0]['name']

    # filter the category for each row
    dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean column names by keeping only last term
    dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

    return dataframe_filtered

## Methodology

For this capstone project, we are specifically looking at areas around the University of California, Berkeley campus that have low density of coffee shops, but relatively higher density of other restaurants. The idea behind these parameters is to find various locations that would be ideal to open up a new coffee shop. 

We have setup a function to gather the necessary data through the Foursquare API above, primarily consisting of both coffee shop and restaurant name, type, and location in latitude and longitude values. 

We will next be exploring the locations surrounding our area of interest to detect the blocks or neighborhoods which are ideal to open a new shop. 

## Analysis

We are first querying to populate dataframes with both coffee shops and restaurants in the surrounding area.

In [157]:
coffee = new_query("Coffee")
restaurant = new_query("Restaurant")

In [158]:
coffee.head()

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Peet's Coffee & Tea,Coffee Shop,"2501 Telegraph Ave.,",at Dwight Way,37.865053,-122.258319,"[{'label': 'display', 'lat': 37.865053, 'lng':...",764,94704,US,Berkeley,CA,United States,"[2501 Telegraph Ave., (at Dwight Way), Berkele...",,4af355c3f964a520b4ec21e3
1,Coffee Bean & Tea Leaf,Coffee Shop,,,37.86888,-122.259319,"[{'label': 'display', 'lat': 37.86888020816575...",329,94704,US,Berkeley,CA,United States,"[Berkeley, CA 94704]",,4f8b5202e4b0c71a7382c40f
2,Tully's Coffee,Coffee Shop,2475 U C Berkeley,at Telegraph Ave,37.868832,-122.25928,"[{'label': 'display', 'lat': 37.86883210583114...",335,94720,US,Berkeley,CA,United States,"[2475 U C Berkeley (at Telegraph Ave), Berkele...",,4cb7389d9c7ba35dd7879706
3,Peet's Coffee & Tea,Coffee Shop,"2255 Shattuck Avenue,",,37.868454,-122.267605,"[{'label': 'display', 'lat': 37.868454, 'lng':...",765,94704,US,Berkeley,CA,United States,"[2255 Shattuck Avenue,, Berkeley, CA 94704]",,4af357a4f964a520bfec21e3
4,Peet's Coffee & Tea,Coffee Shop,Sutardja Dai Hall,Hearst,37.874699,-122.258876,"[{'label': 'display', 'lat': 37.87469914631602...",337,94609,US,Berkeley,CA,United States,"[Sutardja Dai Hall (Hearst), Berkeley, CA 94609]",,4c6973982c29d13a493d0b41


In [159]:
restaurant.head()

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,neighborhood,city,state,country,formattedAddress,crossStreet,id
0,Chang Luong Restaurant,Chinese Restaurant,2517 Durant Ave Ste D,37.868199,-122.258291,"[{'label': 'display', 'lat': 37.868199, 'lng':...",427,94704,US,Southside,Berkeley,CA,United States,"[2517 Durant Ave Ste D, Berkeley, CA 94704]",,4a8705e2f964a520260220e3
1,Saul's Restaurant & Deli,Deli / Bodega,1475 Shattuck Ave,37.880686,-122.269057,"[{'label': 'display', 'lat': 37.88068565173106...",1270,94709,US,,Berkeley,CA,United States,"[1475 Shattuck Ave (at Vine St), Berkeley, CA ...",at Vine St,40ff0380f964a520430b1fe3
2,Amanda's Feel Good Fresh Food Restaurant,Burger Joint,2122 Shattuck Ave,37.870582,-122.268249,,738,94704,US,,Berkeley,CA,United States,"[2122 Shattuck Ave (btw Center & Addison), Ber...",btw Center & Addison,4b342081f964a5204d2525e3
3,Venus Restaurant,American Restaurant,2327 Shattuck Ave,37.867137,-122.267488,"[{'label': 'display', 'lat': 37.86713681801792...",838,94704,US,,Berkeley,CA,United States,[2327 Shattuck Ave (btwn Bancroft Way & Durant...,btwn Bancroft Way & Durant Ave,441692f2f964a52005311fe3
4,Fat Apple's Restaurant & Bakery,Breakfast Spot,1346 Martin Luther King Jr Way,37.881505,-122.274085,"[{'label': 'display', 'lat': 37.88150479043963...",1644,94709,US,,Berkeley,CA,United States,"[1346 Martin Luther King Jr Way (at Rose St), ...",at Rose St,4a62327af964a52057c31fe3


We then add these values to a Folium map to visualize the locations.

In [160]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Berkeley, CA',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the coffee shops as blue circle markers
for lat, lng, label in zip(coffee.lat, coffee.lng, coffee.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# add the restaurants as orange circle markers
for lat, lng, label in zip(restaurant.lat, restaurant.lng, restaurant.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='orange',
        popup=label,
        fill = True,
        fill_color='orange',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

Just visually, there already appear to be locations with a higher density of restaurants (orange), with no coffee shops in the immediate vicinity (blue).

In [161]:
def calc_distance(lat1, lng1, lat2, lng2): 
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    return math.sqrt(dlat*dlat + dlng*dlng) 

We then calculate the distances between the shops to isolate the locations of interest.

In [162]:
dists = []
iso = []
for i in range(len(restaurant)):
    restaurant_entry = restaurant.loc[i]
    lat1, lng1 = restaurant_entry.lat, restaurant_entry.lng
    minDist = float("inf")
    minIsoDist = float("inf")
    for k in range(len(restaurant)): 
        if k == i: 
            break
        next_rest = restaurant.loc[k]
        lat_next, lng_next = next_rest.lat, next_rest.lng
        iso_dist = calc_distance(lat1, lng1, lat_next, lng_next)
        minIsoDist = min(minIsoDist, iso_dist)
    iso.append(minIsoDist)
    for j in range(len(coffee)):
        coffee_entry = coffee.loc[j]
        lat2, lng2 = coffee_entry.lat, coffee_entry.lng
        dist = calc_distance(lat1, lng1, lat2, lng2)
        minDist = min(minDist, dist)
    dists.append(minDist)
restaurant['dist to closest coffee shop'] = dists
restaurant['iso'] = iso

We now have the top 5 locations of interest.

In [163]:
top5 = restaurant[restaurant['iso'] < 0.002].sort_values('dist to closest coffee shop', ascending=False).head(5)
top5

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,neighborhood,city,state,country,formattedAddress,crossStreet,id,dist to closest coffee shop,iso
28,eVe Restaurant,New American Restaurant,1960 University Ave,37.871712,-122.2715,"[{'label': 'entrance', 'lat': 37.871621, 'lng'...",1011,94704.0,US,,Berkeley,CA,United States,"[1960 University Ave, Berkeley, CA 94704]",,4b9b2f72f964a52099f835e3,0.001908,0.000903
26,Mt. Everest Restaurant,Indian Restaurant,2598 Telegraph Ave,37.863341,-122.258875,"[{'label': 'display', 'lat': 37.86334111399608...",946,94704.0,US,,Berkeley,CA,United States,"[2598 Telegraph Ave (Parker St), Berkeley, CA ...",Parker St,560ee0ee498ef7d03fa7a084,0.0018,0.000774
9,Celia's Mexican Restaurant,Mexican Restaurant,1841 Euclid Ave,37.875565,-122.260166,"[{'label': 'display', 'lat': 37.87556465383228...",420,94709.0,US,,Berkeley,CA,United States,"[1841 Euclid Ave (btw Hearst and Ridge), Berke...",btw Hearst and Ridge,4b5f5095f964a52088b329e3,0.001554,0.000321
14,Asiana Garden Restaurant,Food,1841 Euclid Ave,37.875713,-122.260032,"[{'label': 'display', 'lat': 37.875713, 'lng':...",436,94709.0,US,,Berkeley,CA,United States,"[1841 Euclid Ave, Berkeley, CA 94709]",,4f32374b19836c91c7c191b3,0.001538,0.0002
17,Ennor's Restaurant Building,Building,2130 Center,37.870411,-122.267048,"[{'label': 'display', 'lat': 37.870411, 'lng':...",639,,US,,Berkeley,CA,United States,"[2130 Center, Berkeley, CA]",,5644c3e7498ea34252339cde,0.001514,0.001068


In [164]:
for lat, lng, label in zip(top5.lat, top5.lng, top5.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='purple',
        popup=label,
        fill = True,
        fill_color='purple',
        fill_opacity=0.6
    ).add_to(venues_map)
venues_map

Creating a clean map below to more clearly show the final results.

In [165]:
final_map = folium.Map(location=[latitude, longitude], zoom_start=13)
for lat, lng, label in zip(top5.lat, top5.lng, top5.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='purple',
        popup=label,
        fill = True,
        fill_color='purple',
        fill_opacity=0.6
    ).add_to(final_map)
final_map

## Results and Discussion

Our analysis shows that there are few prime locations that have a high density of restaurants with no coffee shops nearby, indicating a location in which a new coffee shop could be sustained. The 4 distinct locations that were chosen are relatively spread out in relation to the university campus, being north, west, and south of campus respectively. This range of locations is not too surprising, as it makes sense that such undertapped areas are spread out from each other. It also gives any prospective stakeholders a wider variety of locations to choose from, in case any particular neighborhood is not to their liking. 

## Conclusion 

The purpose of this capstone project was to detect prime target locations to open a new coffee shop around the UC Berkeley campus area. By leveraging the data made available through the Foursquare API, we were able to identify the restaurants and shops surrounding the campus and calculate their distance from both campus and from each other. Using this location data, we were able to find several locations of interest, which had high densities of restaurants yet low densities of coffee shops, making them prime targets for the construction of a new coffee shop. These stops varied a decent amount in their location relative to the campus, providing a suitable amount of options to stakeholders wanting to pursue the building of a new coffee shop in the area. 