# IBM DS Capstone Project  
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [1. Introduction](#introduction)
* [1.1. Business Problem](#business_problem)
* [1.2. Problem Discussion](#problem_discussion)
* [1.3. Target Audience](#target_audience)
* [2. Data](#data)
* [3. Methodology](#methodology)
* [4. Results and Discussion](#results)
* [5. Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

Belgium is known all over the world for making unbeatable chocolates. It is paradise for the chocolate lovers. The country has a long and illustrious history of chocolate making. With around 2,000 chocolate companies and shops all over Belgium, the country remains one of the reigning producers and exporters of chocolate in the world. Based on available figures, Belgium exports more than 400,000 tons of chocolate with an annual turnover of over 4 billion euros.  
Behind every top chocolate brand, stands a team of top chocolatiers. They use their knowledge, experience and craftsmanship to create the finest and sophisticated pralines, using the best products: high quality Belgian chocolate. They don’t shy away from the latest innovation and technological developments in the chocolate sector. And that makes them award-winning in several international competitions like the Patisserie World Cup.

### 1.1. Business Problem <a name="business_problem"></a>

A successful Belgian chocolatier is going to expand his business into the United States. Los Angeles is decided to be the starting point to open a new Belgian coffee shop combined with chocolate shop. Since Los Angeles is so big and has lots of different coffee shops and chocolate chops developed by famous brands, my client needs deeper insight from available data in order to decide where to establish his first Belgian coffee shop in the US. Another problem is that LA has very high lease rents for retail property.  
To solve this business problem, we are going to cluster LA neighborhoods in order to recommend venues and the current average rent of lease in order business owner could make a decision to start a coffee shop. For this purpose, we will try to find the optimal solution in terms of competitive location, comfortable lease rents, as well as surrounding venues.

### 1.2. Problem Discussion <a name="problem_discussion"></a>

Let's discuss the above mentioned problem statements. First of all, we know that our client, famous Belgian chocolatier, wants to lease a retail place for his unique coffee shop combined with chocolate shop. Also he needs to find out the level of competition - how many coffee shops and restaurants are there in different neighborhoods. If there are more than 2-5 coffee shops / café / dessert Shop in a neighborhood, then that would be a great risk to open new coffee shops in that neighborhood. Selecting a place where there is less or no coffee shops / café / dessert shop would be of great choice, considering the lease rent of neighborhood too. Places like Downtown, Movie theatre, Parks, Malls & Gas stations would help his business running.

### 1.3. Target Audience  <a name="target_audience"></a>

The target audience is broad, it ranges from any company which is going to open new business entity in LA, tourists and those who are passionate about coffe shops with wide range of Belgian chocolate.

## 2. Data <a name="data"></a>

This project will rely on public data from real estate agencies and Foursquare.



For this project we just need to analyse the current lease rent range. So I collect the lease rent data from open sources like https://www.rentcafe.com/average-rent-market-trends/us/ca/san-francisco/ and https://www.zillow.com/research/data/ according to neighborhoods, so that it's easy for us to check the lease rent data. Prepared data I have uploaded on my github repository.  

Los Angeles is really large city (has more than 100 neighborhoods) and due to the limitations in the number of calls for the Foursquare API, we're going to analyze only 50 neighborhoods excluding known in adance the most expensive locations like Santa Monica, North of Montana, Pacific Palisades, etc.

The Foursquare API will be used to obtain the geographical location data for Los Angeles. These will be used to explore the venues in the neighbourhoods of LA. The venues will provide the categories needed for the analysis and eventually, these will be used to determine the viability of selected locations for the Belgian coffee shop.

The venues will provide the categories needed for the analysis and eventually, these will be used to determine the viability of selected locations for the Belgian coffee shop.

The data from the lease rent dataset and location, as well as Foursquare will be explored by considering the venues within the neighborhoods of LA. These neighborhoods' coffee shops / restaurants would be checked in terms of the types of coffee shops / café / dessert Shop within a certain mile radius and the size of lease rent. Due to Foursquare restrictions, the number of venues will be limited to 100 venues. The proximity to Downtown, Movie theatre, Parks, Malls & Gas stations and other amenities would be considered.

In [2]:
import numpy as np # library for vectorized computation

import pandas as pd # library to process data as dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim     # convert an address into latitude and longitude values

!pip -q install geocoder
import geocoder

import requests # library to handle requests
from pandas.io.json import json_normalize    # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt # plotting library
import matplotlib.cm as cm
import matplotlib.colors as colors

# map rendering library
!conda install -c conda-forge folium=0.5.0 --yes
import folium                                    

from sklearn.cluster import KMeans

from bs4 import BeautifulSoup # html parsing library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    geopy-1.19.0               |             py_0          53 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0         conda-forge
    geopy:           1.19.0-py_0       conda-forge

The following packages will be UPDATED:

   

#### Explore and Understand Data

After importing the necessary libraries, we download the data from my Github repository as follows:

In [3]:
git = 'https://raw.githubusercontent.com/tarastsukarev/Coursera_Capstone/master/rentdata.csv'
LA_rentdata = pd.read_csv(git)
LA_rentdata.head(10)

Unnamed: 0,State,City,Neighborhood,Average Rent (per SqFoot)
0,CA,Los Angeles,Reseda,2.03
1,CA,Los Angeles,Eagle Rock,2.05
2,CA,Los Angeles,Vermont - Slauson,2.06
3,CA,Los Angeles,Van Nuys,2.1
4,CA,Los Angeles,Tarzana,2.11
5,CA,Los Angeles,Gramercy Park,2.13
6,CA,Los Angeles,Mount Washington,2.14
7,CA,Los Angeles,Baldwin Hills - Crenshaw,2.15
8,CA,Los Angeles,Montecio Heights,2.2
9,CA,Los Angeles,West Hills,2.21


In obtaining the location data of the locations, the Geocoder package is used with the arcgis_geocoder to obtain the latitude and longitude of the needed locations.

These will help to create a new dataframe that will be used subsequently for LA neighborhoods.

In [3]:
def get_latlng(arcgis_geocoder):
    
    # Initialize the Location (lat. and long.) to "None"
    lat_lng_coords = None
    
    # While loop helps to create a continous run until all the location coordinates are geocoded
    
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Los Angeles, United States'.format(arcgis_geocoder))
        lat_lng_coords = g.latlng
    return lat_lng_coords

Then we proceed to store the location data - latitude and longitude as follows. The obtained coordinates are then joined to LA_rentdata to create new data frame.

In [4]:
coord = LA_rentdata['Neighborhood']    
coordinates = [get_latlng(coord) for coord in coord.tolist()]

# This will store LA dataframe with coordinates
df_LA = LA_rentdata
df_LA_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_LA['Latitude'] = df_LA_coordinates['Latitude']
df_LA['Longitude'] = df_LA_coordinates['Longitude']
df_LA.head(10)

Unnamed: 0,State,City,Neighborhood,Average Rent (per SqFoot),Latitude,Longitude
0,CA,Los Angeles,Reseda,2.03,34.19384,-118.54754
1,CA,Los Angeles,Eagle Rock,2.05,34.13927,-118.21087
2,CA,Los Angeles,Vermont - Slauson,2.06,33.989175,-118.237705
3,CA,Los Angeles,Van Nuys,2.1,34.18439,-118.44652
4,CA,Los Angeles,Tarzana,2.11,34.17529,-118.5501
5,CA,Los Angeles,Gramercy Park,2.13,34.0339,-118.31258
6,CA,Los Angeles,Mount Washington,2.14,34.09904,-118.21134
7,CA,Los Angeles,Baldwin Hills - Crenshaw,2.15,34.01157,-118.33646
8,CA,Los Angeles,Montecio Heights,2.2,34.09198,-118.20101
9,CA,Los Angeles,West Hills,2.21,34.20036,-118.62933


In [5]:
#Let's now take only Neighorhood, Average Rent (per SqFoot) and coordinates
df_LA = df_LA[['Neighborhood','Average Rent (per SqFoot)', 'Latitude', 'Longitude']]
df_LA.head(10)

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Latitude,Longitude
0,Reseda,2.03,34.19384,-118.54754
1,Eagle Rock,2.05,34.13927,-118.21087
2,Vermont - Slauson,2.06,33.989175,-118.237705
3,Van Nuys,2.1,34.18439,-118.44652
4,Tarzana,2.11,34.17529,-118.5501
5,Gramercy Park,2.13,34.0339,-118.31258
6,Mount Washington,2.14,34.09904,-118.21134
7,Baldwin Hills - Crenshaw,2.15,34.01157,-118.33646
8,Montecio Heights,2.2,34.09198,-118.20101
9,West Hills,2.21,34.20036,-118.62933


Let's get the geographical coordinates of Los Angeles.

In [6]:
address = 'Los Angeles, United States'

geolocator = Nominatim(user_agent="LA_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Los Angeles are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Los Angeles are 34.0536909, -118.2427666.


Let's visualize Los Angeles neighborhoods.

In [7]:
# create map of Los Angeles using latitude and longitude values
map_LA = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df_LA['Latitude'], df_LA['Longitude'], df_LA['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_LA)  
    
map_LA

In [None]:
# The code was removed by Watson Studio for sharing.

In [9]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']


Let's create a function to get the top 100 venues in every neighborhood within a radius of 500 meters.

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now let's run the above function on each neighborhood and create a new dataframe called toronto_venues.

In [11]:
LA_venues = getNearbyVenues(names = df_LA['Neighborhood'],
                                   latitudes = df_LA['Latitude'],
                                   longitudes = df_LA['Longitude']
                                  )

Reseda
Eagle Rock
Vermont - Slauson
Van Nuys
Tarzana
Gramercy Park
Mount Washington
Baldwin Hills - Crenshaw
Montecio Heights
West Hills
Lake Balboa
West Adams
Valley Glen
Northridge
Vermont Square
South Park
Chatsworth
Lincoln Heights
Canoga Park
Highland Park
Koreatown
Encino
Porter Ranch
Windsor Square
Los Feliz
North Hollywood
East Hollywood
Sherman Oaks
Atwater Village
Silver Lake
Westlake
Griffith Park
Larchmont
Mid-City
Echo Park
Woodland Hills
Hollywood Hills
Beverlywood
Studio City
Palms
Valley Village
Rancho Park
Cheviot Hills
Hancock Park
Hollywood
Sunkist Park
Studio Village
Park West
Fox Hills
Elysian Park


In [13]:
print(LA_venues.shape)
LA_venues.head(10)

(1263, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Reseda,34.19384,-118.54754,YMCA West Valley,34.193438,-118.543146,Gym
1,Eagle Rock,34.13927,-118.21087,Milkfarm,34.138996,-118.212384,Deli / Bodega
2,Eagle Rock,34.13927,-118.21087,The Oinkster,34.139458,-118.210484,American Restaurant
3,Eagle Rock,34.13927,-118.21087,Four Cafe,34.139047,-118.212857,American Restaurant
4,Eagle Rock,34.13927,-118.21087,One Down Dog,34.139031,-118.213691,Yoga Studio
5,Eagle Rock,34.13927,-118.21087,Taco Spot,34.139144,-118.210796,Mexican Restaurant
6,Eagle Rock,34.13927,-118.21087,5 Line Tavern,34.138892,-118.213333,Bar
7,Eagle Rock,34.13927,-118.21087,Room 31,34.138766,-118.213341,Speakeasy
8,Eagle Rock,34.13927,-118.21087,Snow Station,34.139026,-118.212525,Ice Cream Shop
9,Eagle Rock,34.13927,-118.21087,Leanna Lin's Wonderland,34.137762,-118.214294,Gift Shop


Let's check how many venues were returned for each neighborhood.

In [14]:
LA_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Atwater Village,38,38,38,38,38,38
Baldwin Hills - Crenshaw,48,48,48,48,48,48
Beverlywood,8,8,8,8,8,8
Canoga Park,37,37,37,37,37,37
Chatsworth,17,17,17,17,17,17
Cheviot Hills,2,2,2,2,2,2
Eagle Rock,38,38,38,38,38,38
East Hollywood,20,20,20,20,20,20
Echo Park,46,46,46,46,46,46
Elysian Park,4,4,4,4,4,4


Now let's analyze each neighborhood.

In [15]:
# one hot encoding
LA_onehot = pd.get_dummies(LA_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
LA_onehot['Neighborhood'] = LA_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [LA_onehot.columns[-1]] + list(LA_onehot.columns[:-1])
LA_onehot = LA_onehot[fixed_columns]

LA_onehot.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Adult Boutique,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Assisted Living,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Beer Store,Big Box Store,Bike Shop,Board Shop,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Café,Cajun / Creole Restaurant,Candy Store,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Donburi Restaurant,Dongbei Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Film Studio,Financial or Legal Service,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Historic Site,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Mountain,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,Optical Shop,Paper / Office Supplies Store,Park,Pet Service,Pet Store,Pharmacy,Piercing Parlor,Pilates Studio,Pizza Place,Playground,Poke Place,Pool,Post Office,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Reseda,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Eagle Rock,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Eagle Rock,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Eagle Rock,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Eagle Rock,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [16]:
LA_onehot.shape

(1263, 229)

Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category.

In [66]:
LA_grouped = LA_onehot.groupby('Neighborhood').mean().reset_index()
LA_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Adult Boutique,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Assisted Living,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Beer Store,Big Box Store,Bike Shop,Board Shop,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Café,Cajun / Creole Restaurant,Candy Store,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Donburi Restaurant,Dongbei Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Film Studio,Financial or Legal Service,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Historic Site,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Mountain,Movie Theater,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Optical Shop,Paper / Office Supplies Store,Park,Pet Service,Pet Store,Pharmacy,Piercing Parlor,Pilates Studio,Pizza Place,Playground,Poke Place,Pool,Post Office,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Atwater Village,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.026316,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.026316,0.0,0.0,0.0
1,Baldwin Hills - Crenshaw,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.020833,0.0,0.0,0.0,0.041667,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.020833,0.0,0.041667,0.020833,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.020833,0.020833
2,Beverlywood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Canoga Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.054054,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.162162,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chatsworth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's print each neighborhood along with the top 5 most common venues.

In [18]:
num_top_venues = 5

for hood in LA_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = LA_grouped[LA_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Atwater Village----
                     venue  freq
0              Coffee Shop  0.08
1    Vietnamese Restaurant  0.05
2               Taco Place  0.03
3  New American Restaurant  0.03
4                Bookstore  0.03


----Baldwin Hills - Crenshaw----
                             venue  freq
0                 Department Store  0.06
1             Fast Food Restaurant  0.06
2               Mexican Restaurant  0.04
3  Southern / Soul Food Restaurant  0.04
4               Chinese Restaurant  0.04


----Beverlywood----
                venue  freq
0   Mobile Phone Shop  0.12
1      Cosmetics Shop  0.12
2  Seafood Restaurant  0.12
3         Coffee Shop  0.12
4            Pharmacy  0.12


----Canoga Park----
                 venue  freq
0    Indian Restaurant  0.16
1  Rental Car Location  0.05
2               Bakery  0.05
3         Burger Joint  0.05
4            Pet Store  0.05


----Chatsworth----
                  venue  freq
0  Fast Food Restaurant  0.18
1   Japanese Restaurant  0.12


Let's put that into a pandas dataframe.  
First, let's write a function to sort the venues in descending order.

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood:

In [54]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = LA_grouped['Neighborhood']

for ind in np.arange(LA_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(LA_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Atwater Village,Coffee Shop,Vietnamese Restaurant,Juice Bar,Pub,Italian Restaurant,Mexican Restaurant,Bookstore,Boutique,Mediterranean Restaurant,Sporting Goods Shop
1,Baldwin Hills - Crenshaw,Fast Food Restaurant,Department Store,Shoe Store,Southern / Soul Food Restaurant,Sandwich Place,Lingerie Store,Chinese Restaurant,Mexican Restaurant,Women's Store,Mobile Phone Shop
2,Beverlywood,Museum,Seafood Restaurant,Mobile Phone Shop,Cosmetics Shop,Pharmacy,Grocery Store,Japanese Restaurant,Coffee Shop,Electronics Store,Dumpling Restaurant
3,Canoga Park,Indian Restaurant,Pet Store,Rental Car Location,Burger Joint,Asian Restaurant,Bakery,Theater,Mexican Restaurant,Fried Chicken Joint,Big Box Store
4,Chatsworth,Fast Food Restaurant,Japanese Restaurant,Breakfast Spot,Assisted Living,Pharmacy,Diner,Mexican Restaurant,Food & Drink Shop,Sporting Goods Shop,Sushi Restaurant


#### Modeling

Cluster Neighborhoods.  
Run k-means to cluster the neighborhood into 5 clusters:

In [55]:
# set number of clusters
kclusters = 5

LA_grouped_clustering = LA_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(LA_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 0, 1, 1, 1, 2], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [56]:
# add clustering labels
neighborhoods_venues_sorted.insert(1, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted
LA_merged = df_LA

# merge LA_grouped with LA data to add latitude/longitude for each neighborhood
LA_merged = LA_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how = 'right')

LA_merged # check the last columns!

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Reseda,2.03,34.19384,-118.54754,4,Gym,Donut Shop,Flower Shop,Flea Market,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant,Fast Food Restaurant,Farmers Market
1,Eagle Rock,2.05,34.13927,-118.21087,1,Mexican Restaurant,Dessert Shop,Coffee Shop,American Restaurant,Pet Store,South American Restaurant,Burger Joint,Café,Chinese Restaurant,Ramen Restaurant
2,Vermont - Slauson,2.06,33.989175,-118.237705,1,Convenience Store,Japanese Restaurant,Park,Food,Women's Store,Drugstore,Flea Market,Fish Market,Financial or Legal Service,Film Studio
3,Van Nuys,2.1,34.18439,-118.44652,1,Convenience Store,Chinese Restaurant,Hot Dog Joint,BBQ Joint,Spanish Restaurant,Skating Rink,Mexican Restaurant,Shoe Store,Latin American Restaurant,Shipping Store
4,Tarzana,2.11,34.17529,-118.5501,1,Sushi Restaurant,Pizza Place,Mediterranean Restaurant,Restaurant,Convenience Store,Coffee Shop,Middle Eastern Restaurant,Breakfast Spot,Café,Motel
5,Gramercy Park,2.13,34.0339,-118.31258,1,Pizza Place,Food Truck,Market,Drugstore,Flower Shop,Flea Market,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant
6,Mount Washington,2.14,34.09904,-118.21134,3,Trail,Donut Shop,Light Rail Station,Playground,Pizza Place,Sandwich Place,Grocery Store,Drugstore,Dumpling Restaurant,Eastern European Restaurant
7,Baldwin Hills - Crenshaw,2.15,34.01157,-118.33646,1,Fast Food Restaurant,Department Store,Shoe Store,Southern / Soul Food Restaurant,Sandwich Place,Lingerie Store,Chinese Restaurant,Mexican Restaurant,Women's Store,Mobile Phone Shop
8,Montecio Heights,2.2,34.09198,-118.20101,2,Park,Market,Convenience Store,Business Service,Women's Store,Dumpling Restaurant,Flea Market,Fish Market,Financial or Legal Service,Film Studio
9,West Hills,2.21,34.20036,-118.62933,1,Home Service,Heliport,Burrito Place,Sandwich Place,Women's Store,Donut Shop,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant


In [57]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(LA_merged['Latitude'], LA_merged['Longitude'], LA_merged['Neighborhood'], LA_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster.

#### Cluster 1

In [61]:
LA_merged.loc[LA_merged['Cluster Labels'] == 0, LA_merged.columns[[0] + [1] + list(range(4, LA_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,Cheviot Hills,3.07,0,Irish Pub,New American Restaurant,Women's Store,Drugstore,Flower Shop,Flea Market,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant


#### Cluster 2

In [62]:
LA_merged.loc[LA_merged['Cluster Labels'] == 1, LA_merged.columns[[0] + [1] + list(range(4, LA_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Eagle Rock,2.05,1,Mexican Restaurant,Dessert Shop,Coffee Shop,American Restaurant,Pet Store,South American Restaurant,Burger Joint,Café,Chinese Restaurant,Ramen Restaurant
2,Vermont - Slauson,2.06,1,Convenience Store,Japanese Restaurant,Park,Food,Women's Store,Drugstore,Flea Market,Fish Market,Financial or Legal Service,Film Studio
3,Van Nuys,2.1,1,Convenience Store,Chinese Restaurant,Hot Dog Joint,BBQ Joint,Spanish Restaurant,Skating Rink,Mexican Restaurant,Shoe Store,Latin American Restaurant,Shipping Store
4,Tarzana,2.11,1,Sushi Restaurant,Pizza Place,Mediterranean Restaurant,Restaurant,Convenience Store,Coffee Shop,Middle Eastern Restaurant,Breakfast Spot,Café,Motel
5,Gramercy Park,2.13,1,Pizza Place,Food Truck,Market,Drugstore,Flower Shop,Flea Market,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant
7,Baldwin Hills - Crenshaw,2.15,1,Fast Food Restaurant,Department Store,Shoe Store,Southern / Soul Food Restaurant,Sandwich Place,Lingerie Store,Chinese Restaurant,Mexican Restaurant,Women's Store,Mobile Phone Shop
9,West Hills,2.21,1,Home Service,Heliport,Burrito Place,Sandwich Place,Women's Store,Donut Shop,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant
10,Lake Balboa,2.21,1,Mexican Restaurant,Thai Restaurant,Donut Shop,Furniture / Home Store,Fast Food Restaurant,Diner,Liquor Store,Theater,Film Studio,Filipino Restaurant
11,West Adams,2.23,1,Wine Bar,Fried Chicken Joint,Fast Food Restaurant,Bus Stop,Sandwich Place,Women's Store,Donut Shop,Fish Market,Financial or Legal Service,Film Studio
12,Valley Glen,2.25,1,Pizza Place,Pharmacy,Mexican Restaurant,Pet Store,Coffee Shop,Fast Food Restaurant,Middle Eastern Restaurant,Cajun / Creole Restaurant,Supermarket,Shipping Store


#### Cluster 3

In [63]:
LA_merged.loc[LA_merged['Cluster Labels'] == 2, LA_merged.columns[[0] + [1] + list(range(4, LA_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Montecio Heights,2.2,2,Park,Market,Convenience Store,Business Service,Women's Store,Dumpling Restaurant,Flea Market,Fish Market,Financial or Legal Service,Film Studio
15,South Park,2.31,2,Park,Women's Store,Donut Shop,Flea Market,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant,Fast Food Restaurant,Farmers Market
49,Elysian Park,3.19,2,Park,Record Shop,Disc Golf,Women's Store,Donut Shop,Flea Market,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant


#### Cluster 4

In [64]:
LA_merged.loc[LA_merged['Cluster Labels'] == 3, LA_merged.columns[[0] + [1] + list(range(4, LA_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Mount Washington,2.14,3,Trail,Donut Shop,Light Rail Station,Playground,Pizza Place,Sandwich Place,Grocery Store,Drugstore,Dumpling Restaurant,Eastern European Restaurant
22,Porter Ranch,2.5,3,Supermarket,Pharmacy,Trail,Gym,Women's Store,Donut Shop,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant
31,Griffith Park,2.66,3,Trail,Automotive Shop,Mountain,Bus Stop,Scenic Lookout,Donut Shop,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant


#### Cluster 5

In [65]:
LA_merged.loc[LA_merged['Cluster Labels'] == 4, LA_merged.columns[[0] + [1] + list(range(4, LA_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Average Rent (per SqFoot),Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Reseda,2.03,4,Gym,Donut Shop,Flower Shop,Flea Market,Fish Market,Financial or Legal Service,Film Studio,Filipino Restaurant,Fast Food Restaurant,Farmers Market


## 3. Methodology  <a name="methodology"></a>

In this project we will direct our efforts on detecting areas of Los Angeles that have low coffee shops / café / dessert shop density. We will limit our analysis to area 500 meters around center of neighborhood as well as get the top 100 venues in every neighborhood.

In first step we have collected the required data: location and type of every venue within 500m around center of neighborhood. 

Second step in our analysis will be exploration of 'coffee shops density' across different neighborhoods of LA - we will identify a few promising neighborhoods with low number of shops / café / dessert shop in general and focus our attention on those areas.

In third and final step we will focus on most promising neighborhoods and within those create clusters of locations that meet some basic requirements established in discussion with our client: we will take into consideration locations with no more than 2-5 coffee shops / café / dessert shop in radius of 500 meters, and we want appropriate lease rent. Then we will present map of all such locations but also create clusters (using k-means clustering) of those locations.

## 4. Results and Discussion <a name="results"></a>

Our analysis shows that although there is a great number of coffee shops / café / dessert shop in Los Angeles, in some areas it was found that there are of low coffee shop density. The highest concentration of coffee shops / café / dessert shop as well as different kinds of restaurants was detected in Cluster 2. At the same time not all neighborhoods in Cluster 2 have enough quantity of coffee shops but they have necessary amenities for creating coffee shops (parks, hotels, hostels, etc.). Considering the various amenities in Cluster 2, you must also consider the amount of lease rent. So, the most attractive neighbohoods in Cluster 2 are Vermont-Slauson, West Hills, Vermont Square, Canoga Park. The average lease rents in this areas are acceptable (in range from 2.06 till 2.38). 
  
In Cluster 5 we identified potentially interesting neighborhood, Reseda, which offer a combination of interesting venues - Gym, Flower Shop, Flea Market, Financial or Legal Service, Film Studio. The average lease rent in Reseda is very attractive - 2.03.

Another attractive areas were found in Cluster 3 - Montecio Heights, South Park. In these neighborhoods there are park zones, markets, stores, and almost no any coffee shops / café / dessert shops. The average lease rents are 2.20 and 2.31 respectively.

Cluster 4 also has two acceptable neighborhoods - Mount Washington and Porter Ranch. In these neighborhoods we didn't detected many coffee shops. So, this is a very good result.

Finally, Cheviot Hills from Cluster 1 due to its very high lease rent. 

Result of all this is 9 neighborhoods containing largest number of potential new coffee shops locations based on number of and distance to existing venues - Downtown, Movie theatre, Parks, Malls & Gas stations. This, of course, does not imply that those neighborhoodss are actually optimal locations for a new Belgian coffee shop! Purpose of this analysis was to only provide info on areas with acceptible lease rents but not crowded with existing coffee shops / café / dessert shops - it is entirely possible that there is a very good reason for small number of coffee shops in any of those areas, reasons which would make them unsuitable for a new coffee shop regardless of lack of competition in the neighborhood. Recommended neighborhoods should therefore be considered only as a starting point for more detailed analysis which could eventually result in location which has not only no nearby competition but also other factors taken into account and all other relevant conditions met.

## 5. Conclusion  <a name="conclusion"></a>

Purpose of this project was to identify Los Angeles neighborhoods with low number of coffee shops / café / dessert shop in order to aid client in narrowing down the search for optimal location for a new Belgian coffee shops combined with Belgian chocolate shop. By analysing coffee shops / café / dessert shop density distribution from Foursquare data we have first identified general neighborhoods that justify further analysis, and then generated extensive collection of locations which satisfy some basic requirements regarding existing nearby restaurants. Clustering of those locations was then performed in order to create major zones of interest (containing greatest number of potential locations) and addresses of those zone centers were created to be used as starting points for final exploration by client.

Final decision on optimal coffee shop location will be made by the client based on specific characteristics of neighborhoods in every recommended zone, taking into consideration additional factors like attractiveness of each location (proximity to Downtown, Movie theatre, Parks, Malls & Gas), proximity to major roads, real estate availability, prices, social and economic dynamics of every neighborhood etc.