# Capstone Data Science Project - with Foursquare API

## Introduction/Business Problem

Although modern vegan and vegetarian cuisine is already popular in San Diego, CA, and is becoming more so, there still seems to be plenty of room for fine-dining plant-based restaurants, and some areas in San Diego County could be a good fit for that idea.

The purpose of this project is to analyze selected areas in San Diego County to find out where it would be best to open an upscale vegetarian/vegan restaurant focused on breakfast and brunch, with a cafe section, to encourage customers not only to come in for lunch or dinner but also to spend more time on sunny, laid-back days hanging out in high-quality, environmentally friendly surroundings.

While searching for a good area to start this kind of business, we consider:
+ the number of restaurants nearby
+ the number of vegetarian/vegan restaurants nearby
+ the median household income in the neighborhood
+ other interesting attractions around

## Data

The following resources will be used to extract the information needed:

+ **Google Maps Geocoding API** to find the geolocations of points of interest
+ **Foursquare API** for exploring neighborhoods, their venues, restaurants and attractions
+ **Median Household Income for San Diego County from the Census Bureau, via the datausa.io** website - a CSV including the census geoid and median household income
+ **FCC API** to convert geolocations to census geoids, used to extract the neighborhoods of interest from the Median Household Income CSV


In [535]:
import requests
import pandas as pd
import numpy as np
import folium
from pandas import json_normalize
from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

from functools import reduce

In [124]:
GOOGLE_API_KEY=''

## Methodology
#### Gathering data about neighborhoods of interest

We list the neighborhoods chosen as potential areas for opening the business described above.

In [423]:
areas = ['West F Street, Encinitas, California',
'13th St, Del Mar, California',
'Girard Avenue, Village of La Jolla, California',
'Mission Blvd, Pacific Beach, California',
'University Av, Hillcrest, San Diego, California',
'Orange Ave, Coronado, California',
'North Park Way, North Park, San Diego, California',
'Rosecrans St, Point Loma, San Diego, California',
'Plaza St, Solana Beach, California',
'300 Mission Ave, Oceanside, California',
'600 Carlsbad Village Drive, Carlsbad, California',
'600 Fifth Avenue, San Diego, California',
'1900 India Street, San Diego, California']

#### Getting the areas' geolocations using the Google Geocoding API

In [424]:
def get_coords(areas):
    # Geocode every address and collect its first address components and coordinates.
    rows = []
    for area in areas:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        response = requests.get(url, params={'key': GOOGLE_API_KEY, 'address': area}).json()
        result = response['results'][0]
        components = result['address_components']
        # The zipcode is read by position: index 5 for short results, index 6 otherwise.
        zipcode = components[5]['short_name'] if len(components) < 7 else components[6]['short_name']
        rows.append({
            'col1': components[0]['short_name'],
            'col2': components[1]['short_name'],
            'col3': components[2]['short_name'],
            'col4': components[3]['short_name'],
            'col5': components[4]['short_name'],
            'col6': components[5]['short_name'],
            'col7': zipcode,
            'lat': result['geometry']['location']['lat'],
            'lng': result['geometry']['location']['lng']
        })
    # DataFrame.append was removed in pandas 2.0, so the frame is built once from the rows.
    return pd.DataFrame(rows)

In [426]:
df_neighborhoods = get_coords(areas)
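
Reading address components by position is fragile: when a geocoding result carries no postal code, a fixed index can land on a different field, which is a likely reason some rows in the resulting table show `US` as the zipcode. A minimal sketch of a lookup by component type follows. The helper name `find_component` and the sample data are hypothetical; the sketch relies only on each component carrying a `types` list, as Geocoding API results do.

```python
def find_component(components, wanted_type, default=None):
    """Return the short_name of the first component tagged with wanted_type."""
    for component in components:
        if wanted_type in component.get('types', []):
            return component.get('short_name', default)
    return default


# Hand-written sample shaped like a geocoding result (not real API output).
sample_components = [
    {'short_name': 'University Ave', 'types': ['route']},
    {'short_name': 'Hillcrest', 'types': ['neighborhood', 'political']},
    {'short_name': 'San Diego', 'types': ['locality', 'political']},
    {'short_name': 'CA', 'types': ['administrative_area_level_1', 'political']},
    {'short_name': 'US', 'types': ['country', 'political']},
]

# No postal code in this sample, so the lookup yields None instead of 'US'.
zipcode = find_component(sample_components, 'postal_code')
route = find_component(sample_components, 'route')
print(zipcode, route)
```

With a lookup like this, a missing postal code shows up as a missing value that can be filled in by hand, rather than as a wrong value.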

#### Removing unnecessary columns and organizing the gathered data

In [430]:
df_neighborhoods.drop(['col6', 'col5'], axis=1, inplace=True)

In [435]:
df_neighborhoods.iat[9, 0] = '300 Mission Ave'
df_neighborhoods.iat[9, 1] = 'Oceanside'
df_neighborhoods.iat[10, 0] = '600 Carlsbad Village Dr'
df_neighborhoods.iat[10, 1] = 'Carlsbad'
df_neighborhoods.iat[11, 0] = '600 Fifth Ave'
df_neighborhoods.iat[11, 1] = 'Gaslamp Quarter'
df_neighborhoods.iat[12, 0] = '1900 India St'
df_neighborhoods.iat[12, 1] = 'Little Italy'

In [None]:
df_neighborhoods.drop(['col3', 'col4'], axis=1, inplace=True)

In [437]:
df_neighborhoods.rename(columns={'col1': 'street', 'col2': 'neighborhood', 'col7': 'zipcode'}, inplace=True)

#### Income data
We also want information about yearly household income in each area, so we can use it as an indicator when comparing neighborhoods: a more expensive venue needs more of its target customers nearby. 
We first look up the census block for each latitude and longitude with the Federal Communications Commission API and keep the first 11 digits of its FIPS code, which identify the census tract. We then find the median household income for that tract in the Census Bureau data from the datausa.io website.  


In [439]:
df_neighborhoods['geoid'] = 0
for i, hood in enumerate(df_neighborhoods.iterrows()):
    area_url = 'https://geo.fcc.gov/api/census/area?lat={}&lon={}&format=json'.format(hood[1]['lat'], hood[1]['lng'])
    results = requests.get(area_url).json()
    
    df_neighborhoods.iat[i,5] = results['results'][0]['block_fips'][:11]
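
The slice `[:11]` works because a 15-digit census block FIPS code is a concatenation of state (2 digits), county (3), tract (6) and block (4). The sketch below makes that layout explicit. The function name is hypothetical, and the block suffix `1000` in the example is made up for illustration.

```python
def split_block_fips(block_fips):
    """Split a 15-digit census block FIPS code into its parts."""
    if len(block_fips) != 15 or not block_fips.isdigit():
        raise ValueError('expected a 15-digit block FIPS code')
    return {
        'state': block_fips[:2],
        'county': block_fips[2:5],
        'tract': block_fips[5:11],
        'block': block_fips[11:],
        # State + county + tract together form the 11-digit tract GEOID.
        'tract_geoid': block_fips[:11],
    }


# Illustrative code: California (06), San Diego County (073), tract 017501.
parts = split_block_fips('060730175011000')
print(parts['tract_geoid'])
```

Note that the tract GEOID starts with a zero for California, so it should be kept as a string; stored as an integer it loses the leading zero, as in the `geoid` column of the table.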

In [440]:
df_income = pd.read_csv('income.csv')

In [456]:
df_neighborhoods['income'] = 0
for i, hood in enumerate(df_neighborhoods.iterrows()):
    geoid = str(hood[1]['geoid'])
    area_income = df_income[df_income['ID Geography'].str.contains(geoid)]
    income = area_income[area_income['Year'] == 2018]['Household Income by Race']
    df_neighborhoods.iat[i,6] = income.iloc[0]  # store the matching value, not the whole Series
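
The same lookup can be expressed as a dictionary keyed by tract GEOID, which avoids substring matching and makes missing tracts explicit. This is only a sketch: the column names come from the CSV used above, but the `14000US` prefix in `ID Geography` and the 2017 value are assumptions made for illustration.

```python
# Hypothetical rows shaped like the income CSV (values are illustrative).
income_rows = [
    {'ID Geography': '14000US06073017501', 'Year': 2018, 'Household Income by Race': 112770},
    {'ID Geography': '14000US06073017501', 'Year': 2017, 'Household Income by Race': 108000},
    {'ID Geography': '14000US06073017200', 'Year': 2018, 'Household Income by Race': 110625},
]


def income_by_tract(rows, year):
    """Map the 11-digit tract GEOID to the income reported for the given year."""
    lookup = {}
    for row in rows:
        if row['Year'] == year:
            # Keep only the trailing 11 digits: state + county + tract.
            lookup[row['ID Geography'][-11:]] = row['Household Income by Race']
    return lookup


lookup = income_by_tract(income_rows, 2018)
income = lookup.get('06073017501')
missing = lookup.get('06073999999')  # unknown tract -> None
print(income, missing)
```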

In [457]:
df_neighborhoods

| | street | neighborhood | zipcode | lat | lng | geoid | income |
|---|---|---|---|---|---|---|---|
| 0 | W F St | Encinitas | 92024 | 33.043216 | -117.294944 | 6073017501 | 112770 |
| 1 | 13th St | Del Mar | 92014 | 32.957393 | -117.265067 | 6073017200 | 110625 |
| 2 | Girard Ave | Village of La Jolla | 92037 | 32.843179 | -117.273384 | 6073008200 | 83878 |
| 3 | Mission Blvd | Pacific Beach | 92109 | 32.79368 | -117.254593 | 6073007910 | 70074 |
| 4 | University Ave | Hillcrest | US | 32.748499 | -117.154809 | 6073000600 | 65089 |
| 5 | Orange Ave | Coronado | 92118 | 32.690154 | -117.177271 | 6073010900 | 115987 |
| 6 | North Park Way | North Park | 92104 | 32.747411 | -117.127709 | 6073001500 | 76887 |
| 7 | Rosecrans St | Point Loma | US | 32.724732 | -117.229103 | 6073021400 | 72992 |
| 8 | Plaza St | Solana Beach | 92075 | 32.991698 | -117.272541 | 6073017303 | 162500 |
| 9 | 300 Mission Ave | Oceanside | US | 33.195067 | -117.381072 | 6073018400 | 48004 |


We now have all the information about the areas needed for our analysis.

#### Let's take a look at the map to see where the areas are situated

In [471]:
map_sd = folium.Map(location=[df_neighborhoods.iloc[3]['lat'], df_neighborhoods.iloc[3]['lng']], zoom_start=9)

for lat, lng, neighborhood in zip(df_neighborhoods['lat'], df_neighborhoods['lng'], df_neighborhoods['neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#008000',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sd)  
    
map_sd

## Venues data
The next step is to gather information about venues near our points of interest. We search within a radius of 1500 m, the default `radius` of the `getNearbyVenues` function defined below.
For that purpose we use the Foursquare API.

In [236]:
# credentials and consts for Foursquare API
CLIENT_ID = ''  # set your own Foursquare client ID
CLIENT_SECRET = ''  # set your own Foursquare client secret
VERSION = '20180605'
LIMIT = 1000

In [239]:
# function from the course Foursquare tutorial to get a venue's category type
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [482]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['neighborhood', 
                  'neighborhood_lat', 
                  'neighborhood_lng', 
                  'venue', 
                  'venue_lat', 
                  'venue_lng', 
                  'venue_category']
    
    return nearby_venues 
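
Because every area is searched with the same radius, two nearby areas can share venues when their search circles overlap. A small haversine sketch can check this. The coordinates are the Hillcrest and North Park values from the neighborhood table; the 1500 m radius is the function default above, and the spherical-Earth radius of 6,371 km is an approximation.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    earth_radius_m = 6371000.0  # mean Earth radius, spherical approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


radius_m = 1500
hillcrest = (32.748499, -117.154809)
north_park = (32.747411, -117.127709)

distance_m = haversine_m(*hillcrest, *north_park)
# Two circles of equal radius overlap when their centers are closer than 2 * radius.
circles_overlap = distance_m < 2 * radius_m
print(round(distance_m), circles_overlap)
```

For this pair the centers are roughly 2.5 km apart, so with a 1500 m radius some venues may be counted for both neighborhoods.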

In [483]:
sd_venues = getNearbyVenues(names=df_neighborhoods['neighborhood'], 
                                latitudes=df_neighborhoods['lat'], 
                                longitudes=df_neighborhoods['lng'])

Encinitas
Del Mar
Village of La Jolla
Pacific Beach
Hillcrest
Coronado
North Park
Point Loma
Solana Beach
Oceanside
Carlsbad
Gaslamp Quarter
Little Italy


In [492]:
f'Foursquare returned {len(sd_venues)} venues'

'Foursquare returned 1134 venues'

We are mostly interested in how many venues of the following categories are in each neighborhood:
+ restaurants of any kind: to see whether the neighborhood is a popular area for foodies and how much potential competition is already there
+ coffee shops and cafes: as the business is meant to have a partly cafe vibe, where people would like to hang out for extended periods during the day
+ breakfast spots: for the same reasons
+ vegetarian restaurants


In [540]:
def get_count_df(category):
    return sd_venues[sd_venues['venue_category'].str.contains(category)].groupby('neighborhood')["venue_category"].count().reset_index(name=category)
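
Note that `str.contains` does substring matching, so the `Restaurant` count includes every category whose name contains that word, vegetarian restaurants among them. The pure-Python sketch below mirrors that behavior on a handful of hypothetical venues; the helper name and the sample rows are illustrative only.

```python
from collections import Counter

# Hypothetical (neighborhood, category) pairs, not real Foursquare output.
venues = [
    ('Encinitas', 'Vegetarian / Vegan Restaurant'),
    ('Encinitas', 'Italian Restaurant'),
    ('Encinitas', 'Coffee Shop'),
    ('Del Mar', 'Seafood Restaurant'),
    ('Del Mar', 'Café'),
]


def count_by_neighborhood(venues, keyword):
    """Count venues per neighborhood whose category contains the keyword."""
    counts = Counter()
    for neighborhood, category in venues:
        if keyword in category:
            counts[neighborhood] += 1
    return dict(counts)


restaurant_counts = count_by_neighborhood(venues, 'Restaurant')
vegetarian_counts = count_by_neighborhood(venues, 'Vegetarian / Vegan Restaurant')
print(restaurant_counts, vegetarian_counts)
```

Neighborhoods with no match are simply absent from the result, which is why the merged counts above are filled with zeros.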


In [558]:
restaurants = get_count_df('Restaurant')
coffee_shops = get_count_df('Coffee Shop')
cafes = get_count_df('Café')
breakfast_spots = get_count_df('Breakfast Spot')
vegetarian_restaurants = get_count_df('Vegetarian / Vegan Restaurant')

dfs = [restaurants, coffee_shops, cafes, breakfast_spots, vegetarian_restaurants]

df_counts = reduce(lambda left,right: pd.merge(left, right,on='neighborhood',
                                            how='outer'), dfs).fillna(0)

df_counts = df_counts.astype({'Café': int, 'Breakfast Spot': int, 'Vegetarian / Vegan Restaurant': int})

In [559]:
df_neighborhoods = df_neighborhoods.merge(df_counts, on='neighborhood')

In [560]:
df_neighborhoods

| | street | neighborhood | zipcode | lat | lng | geoid | income | Restaurant | Coffee Shop | Café | Breakfast Spot | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W F St | Encinitas | 92024 | 33.043216 | -117.294944 | 6073017501 | 112770 | 27 | 8 | 2 | 2 | 2 |
| 1 | 13th St | Del Mar | 92014 | 32.957393 | -117.265067 | 6073017200 | 110625 | 17 | 2 | 1 | 1 | 0 |
| 2 | Girard Ave | Village of La Jolla | 92037 | 32.843179 | -117.273384 | 6073008200 | 83878 | 18 | 8 | 3 | 4 | 0 |
| 3 | Mission Blvd | Pacific Beach | 92109 | 32.79368 | -117.254593 | 6073007910 | 70074 | 20 | 4 | 4 | 4 | 0 |
| 4 | University Ave | Hillcrest | US | 32.748499 | -117.154809 | 6073000600 | 65089 | 28 | 6 | 2 | 3 | 1 |
| 5 | Orange Ave | Coronado | 92118 | 32.690154 | -117.177271 | 6073010900 | 115987 | 22 | 2 | 2 | 0 | 0 |
| 6 | North Park Way | North Park | 92104 | 32.747411 | -117.127709 | 6073001500 | 76887 | 16 | 7 | 6 | 3 | 1 |
| 7 | Rosecrans St | Point Loma | US | 32.724732 | -117.229103 | 6073021400 | 72992 | 24 | 5 | 0 | 1 | 0 |
| 8 | Plaza St | Solana Beach | 92075 | 32.991698 | -117.272541 | 6073017303 | 162500 | 15 | 5 | 3 | 2 | 0 |
| 9 | 300 Mission Ave | Oceanside | US | 33.195067 | -117.381072 | 6073018400 | 48004 | 21 | 3 | 1 | 3 | 0 |


Next we pick the venue categories that interest us most: not only the potential competition for the business but also other points of interest, so that the attractiveness of each area enters the comparison between neighborhoods.

In [504]:
sd_onehot = pd.get_dummies(sd_venues[['venue_category']], prefix="", prefix_sep="")
sd_onehot['neighborhood'] = sd_venues['neighborhood'] 

# adjust order
fixed_columns = [sd_onehot.columns[-1]] + list(sd_onehot.columns[:-1])
sd_onehot = sd_onehot[fixed_columns]

In [506]:
sd_onehot = sd_onehot[['neighborhood', 'American Restaurant', 'Argentinian Restaurant', 'Art Gallery', 
                       'Arts & Crafts Store', 'Asian Restaurant', 'Bagel Shop', 'Bakery', 
                       'Brazilian Restaurant', 'Breakfast Spot','Café', 'Cocktail Bar', 'Coffee Shop', 
                       'French Restaurant', 'Italian Restaurant', 'Japanese Restaurant','Latin American Restaurant', 
                       'New American Restaurant','Park','Other Great Outdoors', 'Restaurant', 'Seafood Restaurant', 
                       'South American Restaurant', 'Steakhouse', 'Sushi Restaurant', 'Tapas Restaurant', 
                       'Thai Restaurant', 'Theater','Vegetarian / Vegan Restaurant','Whisky Bar', 'Wine Bar']]

In [507]:
sd_onehot.head()

Unnamed: 0,neighborhood,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,Brazilian Restaurant,Breakfast Spot,...,Seafood Restaurant,South American Restaurant,Steakhouse,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Vegetarian / Vegan Restaurant,Whisky Bar,Wine Bar
0,Encinitas,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Encinitas,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Encinitas,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Encinitas,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
4,Encinitas,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [509]:
sd_grouped = sd_onehot.groupby('neighborhood').mean().reset_index()
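
Taking the mean of one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues falling into it. The sketch below reproduces the idea without pandas on a few hypothetical venues and then ranks the categories; the function names and the alphabetical tie-break are assumptions of this sketch, not part of the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, category) pairs for illustration.
venues = [
    ('Encinitas', 'Coffee Shop'),
    ('Encinitas', 'Coffee Shop'),
    ('Encinitas', 'Café'),
    ('Encinitas', 'Park'),
]


def category_frequencies(venues):
    """Share of each category among a neighborhood's venues (mean of one-hot rows)."""
    per_hood = defaultdict(Counter)
    for neighborhood, category in venues:
        per_hood[neighborhood][category] += 1
    frequencies = {}
    for neighborhood, counter in per_hood.items():
        total = sum(counter.values())
        frequencies[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return frequencies


def top_categories(hood_frequencies, n):
    """Most frequent categories first; ties are broken alphabetically."""
    return sorted(hood_frequencies, key=lambda cat: (-hood_frequencies[cat], cat))[:n]


frequencies = category_frequencies(venues)
top_two = top_categories(frequencies['Encinitas'], 2)
print(frequencies['Encinitas'], top_two)
```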


Let's take a look at the 10 most common venues in each area:

In [561]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [520]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['neighborhood'] = sd_grouped['neighborhood']

for ind in np.arange(sd_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sd_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Carlsbad,Breakfast Spot,Café,Coffee Shop,Italian Restaurant,Wine Bar,Restaurant,Asian Restaurant,American Restaurant,Cocktail Bar,Bakery
1,Coronado,Park,Seafood Restaurant,American Restaurant,Bakery,Café,Coffee Shop,Art Gallery,Asian Restaurant,Bagel Shop,French Restaurant
2,Del Mar,American Restaurant,Seafood Restaurant,Italian Restaurant,Coffee Shop,Park,Breakfast Spot,Café,Latin American Restaurant,Wine Bar,Sushi Restaurant
3,Encinitas,Coffee Shop,American Restaurant,Italian Restaurant,New American Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Bakery,Breakfast Spot,Café,Art Gallery
4,Gaslamp Quarter,Steakhouse,American Restaurant,Café,Coffee Shop,Breakfast Spot,Theater,Brazilian Restaurant,Sushi Restaurant,Seafood Restaurant,Park
5,Hillcrest,Coffee Shop,American Restaurant,Thai Restaurant,Breakfast Spot,Restaurant,Café,Other Great Outdoors,Arts & Crafts Store,Bagel Shop,Bakery
6,Little Italy,Italian Restaurant,Coffee Shop,New American Restaurant,American Restaurant,Breakfast Spot,Park,Seafood Restaurant,Café,French Restaurant,Wine Bar
7,North Park,Coffee Shop,Café,American Restaurant,Breakfast Spot,Park,French Restaurant,Wine Bar,Sushi Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant
8,Oceanside,American Restaurant,Coffee Shop,Breakfast Spot,Seafood Restaurant,Sushi Restaurant,Italian Restaurant,Thai Restaurant,Steakhouse,Restaurant,Bakery
9,Pacific Beach,Seafood Restaurant,Café,Breakfast Spot,Coffee Shop,Park,Sushi Restaurant,American Restaurant,Asian Restaurant,Restaurant,Thai Restaurant


#### Clustering neighborhoods

Now let's cluster areas by the venues of interest and look at the data we gathered

In [615]:
n_clusters = 3

sd_grouped_clustering = sd_grouped.drop('neighborhood', axis=1)

kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(sd_grouped_clustering)

#cluster labels
kmeans.labels_[0:10] 

array([0, 1, 1, 0, 2, 0, 2, 1, 0, 1], dtype=int32)
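
K-means alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. The pure-Python sketch below shows that loop on hypothetical two-feature vectors (for example, the coffee shop share and the seafood restaurant share of a neighborhood). It starts from the first `k` points for determinism, which is an assumption of this sketch; scikit-learn's `KMeans` uses a different initialization by default, so its labels need not match.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical frequency vectors: two coffee-heavy areas, two seafood-heavy ones.
points = [(0.5, 0.1), (0.1, 0.5), (0.45, 0.15), (0.15, 0.45)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only which points share a label matters.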

In [616]:
neighborhoods_venues_sorted.drop(columns='cluster_labels', errors='ignore', inplace=True)

In [617]:
neighborhoods_venues_sorted.insert(0, 'cluster_labels', kmeans.labels_)

In [618]:
sd_merged = df_neighborhoods
sd_merged = sd_merged.merge(neighborhoods_venues_sorted.set_index('neighborhood'), on='neighborhood')
sd_merged

Unnamed: 0,street,neighborhood,zipcode,lat,lng,geoid,income,Restaurant,Coffee Shop,Café,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,W F St,Encinitas,92024,33.043216,-117.294944,6073017501,112770,27,8,2,...,Coffee Shop,American Restaurant,Italian Restaurant,New American Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Bakery,Breakfast Spot,Café,Art Gallery
1,13th St,Del Mar,92014,32.957393,-117.265067,6073017200,110625,17,2,1,...,American Restaurant,Seafood Restaurant,Italian Restaurant,Coffee Shop,Park,Breakfast Spot,Café,Latin American Restaurant,Wine Bar,Sushi Restaurant
2,Girard Ave,Village of La Jolla,92037,32.843179,-117.273384,6073008200,83878,18,8,3,...,Coffee Shop,Seafood Restaurant,Italian Restaurant,Breakfast Spot,Café,New American Restaurant,Park,American Restaurant,Art Gallery,Bagel Shop
3,Mission Blvd,Pacific Beach,92109,32.79368,-117.254593,6073007910,70074,20,4,4,...,Seafood Restaurant,Café,Breakfast Spot,Coffee Shop,Park,Sushi Restaurant,American Restaurant,Asian Restaurant,Restaurant,Thai Restaurant
4,University Ave,Hillcrest,US,32.748499,-117.154809,6073000600,65089,28,6,2,...,Coffee Shop,American Restaurant,Thai Restaurant,Breakfast Spot,Restaurant,Café,Other Great Outdoors,Arts & Crafts Store,Bagel Shop,Bakery
5,Orange Ave,Coronado,92118,32.690154,-117.177271,6073010900,115987,22,2,2,...,Park,Seafood Restaurant,American Restaurant,Bakery,Café,Coffee Shop,Art Gallery,Asian Restaurant,Bagel Shop,French Restaurant
6,North Park Way,North Park,92104,32.747411,-117.127709,6073001500,76887,16,7,6,...,Coffee Shop,Café,American Restaurant,Breakfast Spot,Park,French Restaurant,Wine Bar,Sushi Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant
7,Rosecrans St,Point Loma,US,32.724732,-117.229103,6073021400,72992,24,5,0,...,Coffee Shop,Seafood Restaurant,Sushi Restaurant,American Restaurant,Thai Restaurant,Italian Restaurant,Park,Bakery,Breakfast Spot,New American Restaurant
8,Plaza St,Solana Beach,92075,32.991698,-117.272541,6073017303,162500,15,5,3,...,Coffee Shop,Seafood Restaurant,Café,American Restaurant,Breakfast Spot,Other Great Outdoors,Bakery,Italian Restaurant,Sushi Restaurant,Thai Restaurant
9,300 Mission Ave,Oceanside,US,33.195067,-117.381072,6073018400,48004,21,3,1,...,American Restaurant,Coffee Shop,Breakfast Spot,Seafood Restaurant,Sushi Restaurant,Italian Restaurant,Thai Restaurant,Steakhouse,Restaurant,Bakery


In [619]:
map_clusters = folium.Map(location=[df_neighborhoods.iloc[3]['lat'], df_neighborhoods.iloc[3]['lng']], zoom_start=9)

x = np.arange(n_clusters)
ys = [i + x + (i*x)**2 for i in range(n_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(sd_merged['lat'], sd_merged['lng'], sd_merged['neighborhood'], sd_merged['cluster_labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results
We look at every cluster and consider how the areas are grouped, their most common venues, the number of potentially competing venues of interest and the median household income in the area.

In [620]:
sd_merged.loc[sd_merged['cluster_labels'] == 0, sd_merged.columns[[1] + list(range(5, sd_merged.shape[1]))]]


Unnamed: 0,neighborhood,geoid,income,Restaurant,Coffee Shop,Café,Breakfast Spot,Vegetarian / Vegan Restaurant,cluster_labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Encinitas,6073017501,112770,27,8,2,2,2,0,Coffee Shop,American Restaurant,Italian Restaurant,New American Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Bakery,Breakfast Spot,Café,Art Gallery
4,Hillcrest,6073000600,65089,28,6,2,3,1,0,Coffee Shop,American Restaurant,Thai Restaurant,Breakfast Spot,Restaurant,Café,Other Great Outdoors,Arts & Crafts Store,Bagel Shop,Bakery
9,Oceanside,6073018400,48004,21,3,1,3,0,0,American Restaurant,Coffee Shop,Breakfast Spot,Seafood Restaurant,Sushi Restaurant,Italian Restaurant,Thai Restaurant,Steakhouse,Restaurant,Bakery
10,Carlsbad,6073017900,70500,16,3,4,4,0,0,Breakfast Spot,Café,Coffee Shop,Italian Restaurant,Wine Bar,Restaurant,Asian Restaurant,American Restaurant,Cocktail Bar,Bakery


In [621]:
sd_merged.loc[sd_merged['cluster_labels'] == 1, sd_merged.columns[[1] + list(range(5, sd_merged.shape[1]))]]


Unnamed: 0,neighborhood,geoid,income,Restaurant,Coffee Shop,Café,Breakfast Spot,Vegetarian / Vegan Restaurant,cluster_labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Del Mar,6073017200,110625,17,2,1,1,0,1,American Restaurant,Seafood Restaurant,Italian Restaurant,Coffee Shop,Park,Breakfast Spot,Café,Latin American Restaurant,Wine Bar,Sushi Restaurant
2,Village of La Jolla,6073008200,83878,18,8,3,4,0,1,Coffee Shop,Seafood Restaurant,Italian Restaurant,Breakfast Spot,Café,New American Restaurant,Park,American Restaurant,Art Gallery,Bagel Shop
3,Pacific Beach,6073007910,70074,20,4,4,4,0,1,Seafood Restaurant,Café,Breakfast Spot,Coffee Shop,Park,Sushi Restaurant,American Restaurant,Asian Restaurant,Restaurant,Thai Restaurant
5,Coronado,6073010900,115987,22,2,2,0,0,1,Park,Seafood Restaurant,American Restaurant,Bakery,Café,Coffee Shop,Art Gallery,Asian Restaurant,Bagel Shop,French Restaurant
6,North Park,6073001500,76887,16,7,6,3,1,1,Coffee Shop,Café,American Restaurant,Breakfast Spot,Park,French Restaurant,Wine Bar,Sushi Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant
7,Point Loma,6073021400,72992,24,5,0,1,0,1,Coffee Shop,Seafood Restaurant,Sushi Restaurant,American Restaurant,Thai Restaurant,Italian Restaurant,Park,Bakery,Breakfast Spot,New American Restaurant
8,Solana Beach,6073017303,162500,15,5,3,2,0,1,Coffee Shop,Seafood Restaurant,Café,American Restaurant,Breakfast Spot,Other Great Outdoors,Bakery,Italian Restaurant,Sushi Restaurant,Thai Restaurant


In [622]:
sd_merged.loc[sd_merged['cluster_labels'] == 2, sd_merged.columns[[1] + list(range(5, sd_merged.shape[1]))]]


Unnamed: 0,neighborhood,geoid,income,Restaurant,Coffee Shop,Café,Breakfast Spot,Vegetarian / Vegan Restaurant,cluster_labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Gaslamp Quarter,6073005300,54879,21,4,4,3,0,2,Steakhouse,American Restaurant,Café,Coffee Shop,Breakfast Spot,Theater,Brazilian Restaurant,Sushi Restaurant,Seafood Restaurant,Park
12,Little Italy,6073005800,104696,29,5,2,3,1,2,Italian Restaurant,Coffee Shop,New American Restaurant,American Restaurant,Breakfast Spot,Park,Seafood Restaurant,Café,French Restaurant,Wine Bar


Looking at the clusters, we can see that the Downtown neighborhoods were grouped together, with mostly restaurants, coffee shops and parks. 
The biggest cluster consists of places with a more significant occurrence of coffee shops, more outdoor spaces and parks, and some art galleries.
The middle-sized cluster looks similar, although it appears to have more shops, such as bakeries and wine bars, as well as some art-related places. 

Income does not really reflect the way the areas were grouped. Beyond that, the areas in the second cluster (label 1), and Encinitas from the first one (label 0), look most interesting considering the targeted customers.

Unfortunately, Foursquare does not return many restaurants categorized as vegetarian. We can see only five of those: two in Encinitas and one each in North Park, Hillcrest and Little Italy.

The largest numbers of restaurants are in Encinitas, Hillcrest and Little Italy. 
We observe a fairly similar number of breakfast spots in all places, except Coronado, where there are none according to Foursquare, and Del Mar and Point Loma, which have only one each.

## Discussion

Considering our requirements and the data, we can draw the following conclusions.
Since Foursquare provides little data about vegetarian and vegan restaurants, we cannot rely too much on that information; in particular, we can't assume that there is no demand for that kind of restaurant. 
Encinitas looks like an interesting area for our business: art galleries are fairly common there, it has several breakfast places, and it has the most vegetarian spots, which suggests that there is demand for them. Its median income also suggests that it might have a lot of potential customers for an upscale but relaxed restaurant.
We can make similar observations about Little Italy, although in both places the number of restaurants may make it a highly competitive area for the restaurant business.

Let's have a closer look at Del Mar, Coronado and Solana Beach, as their median household income suggests that they might be good picks.
Coronado shows a lot of restaurants but no breakfast spots and fewer coffee places. It does have a lot of parks and some art galleries, but overall it doesn't really seem like the best pick: we would like to see some similar businesses already thriving nearby, but not to the extent that we would have to spend extra resources on marketing because of strong competition.
Del Mar and Solana Beach look most promising for us. Both have a high median household income and a good number of restaurants already up and running, but not as many as in other areas. There are some breakfast spots and coffee shops around, but not too many, so our business, offering the same with a fresh idea, might be interesting and competitive against the few that already exist there.
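
The trade-off discussed here, high income against heavy competition, can be made explicit with a simple score. The sketch below is only one possible way to do it and is not the method used in this notebook: it min-max scales income and restaurant count (values taken from the tables above) and subtracts the second from the first with equal weights, which is an arbitrary assumption.

```python
# Income and restaurant counts copied from the tables above.
neighborhoods = {
    'Encinitas': {'income': 112770, 'restaurants': 27},
    'Del Mar': {'income': 110625, 'restaurants': 17},
    'Coronado': {'income': 115987, 'restaurants': 22},
    'Solana Beach': {'income': 162500, 'restaurants': 15},
    'Little Italy': {'income': 104696, 'restaurants': 29},
}


def min_max(values):
    """Scale a dict of numbers to the 0..1 range (assumes they are not all equal)."""
    lo, hi = min(values.values()), max(values.values())
    return {name: (value - lo) / (hi - lo) for name, value in values.items()}


income_scaled = min_max({n: d['income'] for n, d in neighborhoods.items()})
competition_scaled = min_max({n: d['restaurants'] for n, d in neighborhoods.items()})

# Higher income raises the score, more nearby restaurants lower it (equal weights).
scores = {n: income_scaled[n] - competition_scaled[n] for n in neighborhoods}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Under these assumptions Solana Beach and Del Mar come out on top, in line with the reading above; different weights or additional indicators could change the order.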


## Conclusion

After analyzing the neighborhoods, we think that Solana Beach and Del Mar seem like the best picks for opening a fine-dining vegetarian restaurant focused on breakfast, with a relaxed venue for spending days with friends or family. 
Both are more expensive neighborhoods, which makes them better places for an upscale restaurant, but they are not overcrowded with other dining venues.
We think, though, that the areas need further investigation. The vegetarian options around should be researched further, perhaps with another API that provides more specific or multi-category labels for restaurants. Other restaurants and their menus and prices should also be considered: how many pricier venues are already in the area, and whether they have satisfying vegetarian options. 
Other data to consider could be the average age of residents, their education level, and their occupations or the industries they work in, as all of those can say something about how environmentally oriented the neighborhood is.