# **Coursera Capstone Project - Downtown Los Angeles Office Relocation**
Use of FourSquare location data to recommend a new office location in Downtown Los Angeles

## **Table of Contents**

* [Introduction & Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### **Introduction & Business Problem** <a name="introduction"></a>

Downtown Los Angeles (DTLA) is a diverse neighborhood in Los Angeles, divided into 13 districts. A company currently has an office in DTLA within the Arts District. Their lease is ending soon and they have decided to find a new office location for various reasons. They would like to stay in DTLA so as not to relocate or inconvenience any of their employees, but they are open to any district within DTLA. Their employees often comment on how much they love where their office is located because of the nearby venues for meals and entertainment. The company wants its employees to remain happy with the office location so they are encouraged to come in from time to time. As such, they would like to understand which other districts in DTLA are most similar to the Arts District. With this information, they will then look into available office space to explore their options.

### **Data** <a name="data"></a>

Geographical location data for each district, as well as data about the venues in each district, will be used to create clusters of the districts in Downtown Los Angeles (DTLA).

*Downtown Los Angeles Districts and geographical location data:*

A csv file containing DTLA district names and geographical coordinates will be used to create a Pandas dataframe including the following data points:

* district: Name of district
* latitude: District latitude
* longitude: District longitude

*Venue data:*

The district geographical location data will be used to get local venue information, using the FourSquare API. The 'explore' endpoint will be used to pull a list of recommended venues within each district. The following data will be captured into a dataframe, along with the district data:

* venue: Venue name
* venue_category: Venue category
* venue_lat: Venue latitude
* venue_lng: Venue longitude

##### *Importing libraries*
Before we get started, we need to import the necessary libraries for our analysis. The install commands for packages already present on my computer are commented out.

In [2]:
#Import necessary libraries
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests # library to handle HTTP requests
from bs4 import BeautifulSoup
import numpy as np # library to handle data in a vectorized manner
#!conda install -c conda-forge geopy --yes # uncomment this line if geopy is not installed yet
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't downloaded before
import folium # map rendering library
print('Libraries imported.')

Libraries imported.


##### *Downtown Los Angeles Districts and geographical location data*
Downtown Los Angeles districts and the geographical location data will be read from a csv file and used to create a Pandas dataframe. 


In [3]:
#Import the libraries needed to upload a csv file and load it into a Pandas dataframe
from google.colab import files
import io
#Initiate file upload
uploaded = files.upload()

Saving DTLA Districts.csv to DTLA Districts.csv


In [4]:
#Read csv file to Pandas dataframe to obtain lat/long coordinates
DTLA_data = pd.read_csv(io.BytesIO(uploaded['DTLA Districts.csv']))
DTLA_data

Unnamed: 0,District,Latitude,Longitude
0,Arts District,34.04117,-118.23298
1,Bunker Hill,34.052035,-118.250347
2,Civic Center,34.054139,-118.24465
3,Fashion District,34.037168,-118.256404
4,Financial District,34.050833,-118.255
5,Flower District,34.040268,-118.249826
6,Gallery Row,34.048161,-118.247371
7,Historic Core,34.05349,-118.245319
8,Jewelry District,34.045833,-118.254444
9,Little Tokyo,34.050556,-118.239444


##### *Venue Data*
Using the latitude and longitude data for each of the districts, the FourSquare API ('explore' endpoint) will be used to get recommended venues within each district. 


In [5]:
CLIENT_ID = '*****' # your Foursquare ID
CLIENT_SECRET = '*****' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: *****
CLIENT_SECRET:*****


In [6]:
#Create a function to explore the venues for all districts in Downtown Los Angeles
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
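
The function above indexes straight into the JSON payload, so a venue without any category (or an empty response) would raise an error partway through the loop. The following is a minimal sketch of a more defensive parsing step, assuming the same payload layout that `getNearbyVenues` relies on. The helper name `parse_explore_items` and the sample payload are hypothetical and only illustrate the idea; the venue without a category is made up for the example.

```python
def parse_explore_items(payload, district, lat, lng):
    """Flatten one 'explore' JSON payload into row tuples (hypothetical helper).

    Assumes the layout used in getNearbyVenues:
    payload['response']['groups'][0]['items'][i]['venue'].
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Keep venues that have no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((district, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Illustrative payload: the first venue mirrors a row from the output,
# the second one is invented to show the missing-category case.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Bavel",
                   "location": {"lat": 34.041506, "lng": -118.232955},
                   "categories": [{"name": "Mediterranean Restaurant"}]}},
        {"venue": {"name": "Venue without category",
                   "location": {"lat": 34.0415, "lng": -118.2330},
                   "categories": []}},
    ]}]}
}

rows = parse_explore_items(sample_payload, "Arts District", 34.04117, -118.23298)
empty_rows = parse_explore_items({}, "Arts District", 34.04117, -118.23298)
for row in rows:
    print(row)
```

Rows with a missing category could then be dropped or labeled before the one-hot encoding step, depending on how they should count toward a district's profile.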

In [7]:
#Call the function for every district in Downtown Los Angeles
DTLA_venues = getNearbyVenues(names=DTLA_data['District'],
                                   latitudes=DTLA_data['Latitude'],
                                   longitudes=DTLA_data['Longitude']
                                  )
DTLA_venues.head()

Arts District
Bunker Hill
Civic Center
Fashion District
Financial District
Flower District
Gallery Row
Historic Core
Jewelry District
Little Tokyo
Skid Row
South Park
Toy District


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Arts District,34.04117,-118.23298,Bavel,34.041506,-118.232955,Mediterranean Restaurant
1,Arts District,34.04117,-118.23298,Mr. Speedy Plumbing & Rooter Inc.,34.042538,-118.233864,Home Service
2,Arts District,34.04117,-118.23298,Zinc Café & Market,34.039425,-118.232631,Café
3,Arts District,34.04117,-118.23298,Verve Roastery Del Sur,34.041433,-118.232694,Coffee Shop
4,Arts District,34.04117,-118.23298,Urth Caffé,34.041916,-118.235218,Coffee Shop


### **Methodology** <a name="methodology"></a>

The goal is to identify other Downtown Los Angeles (DTLA) districts that are similar to the Arts District with respect to meals and entertainment, so the analysis is based on venue data for each district.

The first part of our analysis will be an exploration of the districts and venue information to identify trends and/or anything that could impact the next steps in the analysis. 

The second part of the analysis will be to use the venue information, along with additional insights drawn from the previous exploratory analysis, to cluster similar districts based on local venues. A k-means algorithm will be used to cluster the districts. 

Finally, the clusters will be reviewed and explored to understand why they were clustered together before making a list of recommended districts to consider for the new office. 

### **Analysis** <a name="analysis"></a>

##### *Plot the DTLA districts on a map*
Before any clustering is done, the districts are plotted on a map to visualize and understand the physical proximity of the Arts District to other districts in DTLA. 

In [8]:
#Import the geocoder used to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

#Get geographical coordinates of Downtown Los Angeles which will be used to create a map
LA_address = 'Downtown Los Angeles, Los Angeles'

geolocator = Nominatim(user_agent="dtt_explorer")
location = geolocator.geocode(LA_address)
LA_lat = location.latitude
LA_lon = location.longitude
print('The geographical coordinates of Downtown Los Angeles are {}, {}.'.format(LA_lat, LA_lon))

The geographical coordinates of Downtown Los Angeles are 34.0708781, -118.44684973165106.


In [10]:
#Unfortunately the lat/lon returned for DTLA are not correct; they point to the UCLA area instead.
#Manually enter values to create a map of the Downtown Los Angeles districts, offering a visual of how the districts are dispersed
map_DTLA = folium.Map(location=[34.05, -118.25], zoom_start=14)

# add markers to map
for lat, lng, label in zip(DTLA_data['Latitude'], DTLA_data['Longitude'], DTLA_data['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_DTLA)  
    
map_DTLA

As is shown on the map, the Arts District is a bit removed from the other districts, which means they are unlikely to share common venues in the venue data set. This means that physical proximity should not impact which districts are identified to be similar to the Arts District. 
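
The visual impression can be checked with a quick distance calculation. The sketch below applies the haversine formula to the ten district coordinates printed in the table earlier (the remaining districts are not shown there, so they are left out). The reasoning is an assumption on my part: two 500 m search circles, the default `radius` in `getNearbyVenues`, can only overlap if their centers are less than 1 km apart. The function name `haversine_km` and the Earth radius of 6371 km are choices made for this illustration.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0  # mean Earth radius, assumed for this sketch
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# Coordinates copied from the district table shown earlier (first ten rows only)
districts = {
    "Arts District": (34.04117, -118.23298),
    "Bunker Hill": (34.052035, -118.250347),
    "Civic Center": (34.054139, -118.24465),
    "Fashion District": (34.037168, -118.256404),
    "Financial District": (34.050833, -118.255),
    "Flower District": (34.040268, -118.249826),
    "Gallery Row": (34.048161, -118.247371),
    "Historic Core": (34.05349, -118.245319),
    "Jewelry District": (34.045833, -118.254444),
    "Little Tokyo": (34.050556, -118.239444),
}

arts = districts["Arts District"]
distances = {
    name: haversine_km(arts[0], arts[1], lat, lng)
    for name, (lat, lng) in districts.items()
    if name != "Arts District"
}

SEARCH_RADIUS_KM = 0.5  # matches radius=500 in getNearbyVenues
nearest = min(distances, key=distances.get)
overlapping = [name for name, d in distances.items() if d < 2 * SEARCH_RADIUS_KM]

for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:.2f} km")
print("Districts whose search circle overlaps the Arts District:", overlapping)
```

Among the ten listed districts, the nearest one is Little Tokyo at roughly 1.2 km, so none of their search circles overlap with the Arts District's circle.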

##### *Explore the venue data*


In [11]:
#Check how many venues were returned for each district
DTLA_venues.groupby('District').count()

District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Arts District,32,32,32,32,32,32
Bunker Hill,76,76,76,76,76,76
Civic Center,18,18,18,18,18,18
Fashion District,26,26,26,26,26,26
Financial District,82,82,82,82,82,82
Flower District,44,44,44,44,44,44
Gallery Row,79,79,79,79,79,79
Historic Core,43,43,43,43,43,43
Jewelry District,100,100,100,100,100,100
Little Tokyo,100,100,100,100,100,100


There is only one district with very few venues, which is Skid Row. Skid Row contains one of the largest stable populations of homeless people in the United States, which is likely why there are so few venues. While Skid Row will not likely be listed in the recommendations, more analysis is needed before eliminating the district from the analysis. 

In [17]:
#Check number of unique categories
print('There are {} unique categories.'.format(len(DTLA_venues['Venue Category'].unique())))

There are 156 unique categories.


The venues span a wide range of categories, which will help differentiate the districts when clustering them based on venues.


##### *Explore each district*

As we will be clustering the districts using the k-means algorithm, and k-means only works with numeric values, we need to transform the data using one-hot encoding.

In [18]:
# transform using one hot encoding
DTLA_onehot = pd.get_dummies(DTLA_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
DTLA_onehot['District'] = DTLA_venues['District'] 

# move district column to the first column
DTLA_onehot = DTLA_onehot.set_index('District').reset_index()

DTLA_onehot.head()

Unnamed: 0,District,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Garden,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Candy Store,Cheese Shop,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Coworking Space,Cycle Studio,Deli / Bodega,Dessert Shop,Diner,Dive Bar,Dog Run,Donut Shop,Escape Room,Event Space,Fabric Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden,Gastropub,Gay Bar,General Entertainment,Gift Shop,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Health & Beauty Service,Hobby Shop,Home Service,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Other Nightlife,Outdoor Sculpture,Park,Performing Arts Venue,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Plaza,Poke Place,Ramen Restaurant,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Smoke Shop,Smoothie Shop,Snack Place,Spa,Speakeasy,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tech Startup,Thai Restaurant,Theater,Toy / Game Store,Train,Train Station,Tunnel,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Whisky Bar,Wine Bar,Women's Store,Yoga Studio,Yoshoku Restaurant
0,Arts District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Arts District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Arts District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Arts District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Arts District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
#Check the shape
DTLA_onehot.shape

(779, 157)

With the one-hot encoding done, the data can now be grouped by district using the mean of the frequency of each category. This is the first step in exploring the most common venue types in each district. 
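
To make the "mean of the frequency" concrete: because every venue row has a 1 in exactly one category column, averaging the rows of a district gives the share of that district's venues in each category. For the Arts District, with 32 venues, a single venue contributes 1/32 = 0.03125 and three venues of the same category contribute 3/32 = 0.09375. The toy example below is a minimal sketch with made-up districts `A` and `B`; it is not the project data.

```python
import pandas as pd

# Made-up venue list: district A has 4 venues, district B has 2
toy_venues = pd.DataFrame({
    "District": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Art Gallery", "Bar", "Bar", "Bar"],
})

# One-hot encode the category; dtype=float keeps the columns numeric
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "District", toy_venues["District"])

# The mean of the 0/1 columns per district is the share of venues per category
toy_grouped = toy_onehot.groupby("District").mean()
print(toy_grouped)
```

Each row of the grouped table sums to 1, so districts with very different venue counts are compared on the same scale.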

In [27]:
#Group rows by district, taking the mean of the frequency of occurrence of each category
DTLA_grouped = DTLA_onehot.groupby('District').mean().reset_index()
print(DTLA_grouped.shape) #check shape to make sure we still have 13 districts and 156 unique categories (plus the District column)
DTLA_grouped

(13, 157)


Unnamed: 0,District,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Beer Bar,Beer Garden,Beer Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Café,Cajun / Creole Restaurant,Candy Store,Cheese Shop,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Coworking Space,Cycle Studio,Deli / Bodega,Dessert Shop,Diner,Dive Bar,Dog Run,Donut Shop,Escape Room,Event Space,Fabric Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden,Gastropub,Gay Bar,General Entertainment,Gift Shop,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Health & Beauty Service,Hobby Shop,Home Service,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Other Nightlife,Outdoor Sculpture,Park,Performing Arts Venue,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Plaza,Poke Place,Ramen Restaurant,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Smoke Shop,Smoothie Shop,Snack Place,Spa,Speakeasy,Sports Bar,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tech Startup,Thai Restaurant,Theater,Toy / Game Store,Train,Train Station,Tunnel,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Whisky Bar,Wine Bar,Women's Store,Yoga Studio,Yoshoku Restaurant
0,Arts District,0.0,0.09375,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0625,0.09375,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.09375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0
1,Bunker Hill,0.013158,0.013158,0.026316,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.026316,0.0,0.013158,0.013158,0.0,0.0,0.026316,0.0,0.0,0.039474,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.039474,0.0,0.013158,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.026316,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.026316,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.026316,0.013158,0.013158,0.013158,0.0,0.0,0.039474,0.013158,0.013158,0.0,0.026316,0.0,0.0,0.013158,0.013158,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.039474,0.013158,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.026316,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0
2,Civic Center,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Fashion District,0.038462,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.038462,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0
4,Financial District,0.012195,0.0,0.012195,0.0,0.012195,0.0,0.012195,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.012195,0.0,0.02439,0.0,0.0,0.0,0.012195,0.0,0.012195,0.0,0.060976,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.036585,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.012195,0.0,0.0,0.02439,0.036585,0.0,0.0,0.0,0.0,0.060976,0.02439,0.0,0.012195,0.04878,0.012195,0.0,0.012195,0.012195,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.012195,0.0,0.012195,0.0,0.0,0.012195,0.0,0.036585,0.02439,0.0,0.012195,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.012195,0.012195,0.0,0.012195,0.0,0.0,0.012195,0.0,0.060976,0.012195,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.036585,0.0,0.0,0.0,0.0,0.0,0.012195,0.02439,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0
5,Flower District,0.022727,0.022727,0.0,0.068182,0.022727,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.045455,0.0,0.068182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.022727,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.068182,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.045455,0.022727,0.0
6,Gallery Row,0.012658,0.012658,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.050633,0.0,0.0,0.0,0.025316,0.0,0.025316,0.012658,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.075949,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.0,0.012658,0.012658,0.0,0.0,0.0,0.012658,0.0,0.012658,0.012658,0.0,0.0,0.0,0.0,0.037975,0.0,0.0,0.0,0.0,0.025316,0.012658,0.0,0.0,0.0,0.0,0.012658,0.025316,0.0,0.012658,0.0,0.012658,0.0,0.0,0.025316,0.0,0.037975,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.025316,0.0,0.012658,0.0,0.0,0.0,0.050633,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.012658,0.0,0.025316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.025316,0.025316,0.0,0.0,0.0,0.0,0.025316,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.025316,0.0,0.0,0.012658,0.025316,0.0,0.0,0.0,0.0,0.012658,0.025316,0.0,0.012658,0.0,0.0,0.0,0.025316,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0
7,Historic Core,0.023256,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.023256,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.046512,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.0,0.023256,0.0,0.023256,0.023256,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.023256,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Jewelry District,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.01,0.02,0.0,0.01,0.02,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.04,0.02,0.02,0.0,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.0
9,Little Tokyo,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.02,0.0,0.0,0.0,0.01,0.03,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.01,0.01,0.12,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.06,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.08,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01


The data is now sorted to identify the top 15 venue types for each district. This table will be used to interpret the clusters; the k-means algorithm itself runs on the full frequency table.

In [44]:
#function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#Create df for top 15 venues
num_top_venues = 15

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # only 1st, 2nd and 3rd have a special suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
districts_venues_sorted = pd.DataFrame(columns=columns)
districts_venues_sorted['District'] = DTLA_grouped['District']

for ind in np.arange(DTLA_grouped.shape[0]):
    districts_venues_sorted.iloc[ind, 1:] = return_most_common_venues(DTLA_grouped.iloc[ind, :], num_top_venues)

print(districts_venues_sorted.shape)
districts_venues_sorted

(13, 16)


Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Arts District,Art Gallery,Coffee Shop,Italian Restaurant,Cocktail Bar,Event Space,Gym,Climbing Gym,Fruit & Vegetable Store,Comic Shop,Café,Food Truck,Smoothie Shop,Mediterranean Restaurant,Bridge,Brewery
1,Bunker Hill,Mexican Restaurant,Hotel,Coffee Shop,Café,Sandwich Place,French Restaurant,Bookstore,Italian Restaurant,Train Station,Building,Art Museum,Farmers Market,Gym,Hobby Shop,Hotel Bar
2,Civic Center,American Restaurant,Lounge,Performing Arts Venue,Park,Plaza,Opera House,Concert Hall,Coffee Shop,Sandwich Place,School,Music Venue,Building,Speakeasy,Bookstore,Dog Run
3,Fashion District,Mediterranean Restaurant,Italian Restaurant,Event Space,American Restaurant,Pizza Place,Hotel,Flea Market,Restaurant,Coworking Space,Coffee Shop,Clothing Store,Chinese Restaurant,Moving Target,Shoe Store,Movie Theater
4,Financial District,Sandwich Place,Coffee Shop,Hotel,Italian Restaurant,Gym / Fitness Center,New American Restaurant,Sushi Restaurant,French Restaurant,Whisky Bar,Vegetarian / Vegan Restaurant,Bakery,Seafood Restaurant,Train Station,Hotel Bar,Café
5,Flower District,Arts & Crafts Store,Italian Restaurant,Coffee Shop,Clothing Store,Women's Store,Flower Shop,Men's Store,Food Court,Bar,American Restaurant,Café,Cajun / Creole Restaurant,Mexican Restaurant,Salon / Barbershop,French Restaurant
6,Gallery Row,Coffee Shop,Mexican Restaurant,Bar,Italian Restaurant,French Restaurant,Bookstore,Lounge,Sandwich Place,Nightclub,Smoke Shop,Breakfast Spot,Speakeasy,Residential Building (Apartment / Condo),Taco Place,Restaurant
7,Historic Core,Mexican Restaurant,Coffee Shop,American Restaurant,Ramen Restaurant,Latin American Restaurant,Music Venue,School,Sandwich Place,Cheese Shop,Lounge,Market,Concert Hall,Building,Plaza,Pizza Place
8,Jewelry District,Bar,Coffee Shop,New American Restaurant,Hotel,Theater,French Restaurant,American Restaurant,Gym,Salon / Barbershop,Hotel Bar,Salad Place,Italian Restaurant,Gym / Fitness Center,Bookstore,Gastropub
9,Little Tokyo,Japanese Restaurant,Sushi Restaurant,Ramen Restaurant,Gift Shop,Ice Cream Shop,Coffee Shop,Boutique,Dessert Shop,Theater,Shopping Mall,Supermarket,Bubble Tea Shop,Café,Bar,Bakery


It is not surprising to see that 'Art Gallery' is the most common venue in the Arts District. The next step is clustering all of the districts to see what other commonalities there are. 
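
Several categories can share the same frequency within a district. In the frequency table above, the Arts District row shows Art Gallery, Coffee Shop and Italian Restaurant all at 0.09375 (3 of 32 venues), so the order among them comes down to tie-breaking. The sketch below shows one way to make that order explicit, using `Series.nlargest` with `keep='first'`, which favors the categories that appear first in the column order. The five values are copied from the Arts District row; the variable names are hypothetical.

```python
import pandas as pd

# A few frequencies copied from the Arts District row of the grouped table
arts_row = pd.Series({
    "Art Gallery": 0.09375,
    "Arts & Crafts Store": 0.03125,
    "Cocktail Bar": 0.0625,
    "Coffee Shop": 0.09375,
    "Italian Restaurant": 0.09375,
})

# keep='first' resolves ties in favor of the earlier column
top_four = arts_row.nlargest(4, keep="first")
print(top_four)
```

With the alphabetical column order produced by the one-hot encoding, this means tied categories are ranked alphabetically, which is worth keeping in mind when reading the "1st Most Common Venue" column.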

##### *Cluster districts using k-means*

One important component of the k-means algorithm is deciding on the number of clusters. Given there are only 13 districts, the number of clusters needs to be large enough to separate dissimilar districts, yet small enough that the Arts District shares its cluster with a few other districts the company can consider.
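
In this notebook the number of clusters is chosen by inspecting the resulting labels. A common complementary check, which is not part of the original analysis, is the silhouette score: it is computed for several values of k and higher values suggest better separated clusters. The sketch below runs on random stand-in data with the same number of rows as districts (13); the array `X_demo` is an assumption for illustration, and in the notebook the frequency table without the `District` column would take its place.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Random stand-in for the district frequency table: 13 rows, 20 "categories"
rng = np.random.default_rng(0)
X_demo = rng.random((13, 20))
X_demo = X_demo / X_demo.sum(axis=1, keepdims=True)  # rows sum to 1, like frequencies

silhouette_by_k = {}
for k in range(2, 7):
    # n_init is set explicitly because its default differs between scikit-learn versions
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X_demo)
    silhouette_by_k[k] = silhouette_score(X_demo, labels)

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
for k, score in silhouette_by_k.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Highest silhouette at k =", best_k)
```

With only 13 districts the score is noisy, so it is better treated as a sanity check next to the business requirement than as a replacement for it.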

In [40]:
#Run k-means to cluster into 4 clusters

# set number of clusters
kclusters = 4

# drop the district column as the k-means algorithm only works with numerical values
DTLA_grouped_clustering = DTLA_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(DTLA_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 1, 1, 1, 1, 1, 1, 3], dtype=int32)

With 4 clusters, nearly all of the districts fall into a single cluster, so another value needs to be tried.

In [41]:
#Run k-means to cluster into 6 clusters

# set number of clusters
kclusters = 6

# drop the district column as the k-means algorithm only works with numerical values
DTLA_grouped_clustering = DTLA_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(DTLA_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 1, 0, 5, 1, 5, 1, 1, 1, 3], dtype=int32)

Using 6 clusters, the Arts District is in its own cluster and almost half of the districts are in one cluster. Let's try again with 5 clusters.

In [45]:
#Run k-means to cluster into 5 clusters

# set number of clusters
kclusters = 5

# drop the district column as the k-means algorithm only works with numerical values
DTLA_grouped_clustering = DTLA_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(DTLA_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 3, 1, 0, 1, 0, 0, 0, 2], dtype=int32)
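The three trials above repeat the same cell with a different `kclusters` value. A minimal sketch of how they could be condensed into one loop that reports cluster sizes is shown below. The helper `cluster_sizes` and the random `demo_features` matrix are assumptions made for this example; in the notebook, `DTLA_grouped_clustering` would be passed instead.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_sizes(features, k_values, random_state=0):
    """Return {k: cluster sizes, largest first} for each candidate k."""
    sizes = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        # count how many rows were assigned to each cluster label
        counts = np.bincount(labels, minlength=k)
        sizes[k] = sorted(counts.tolist(), reverse=True)
    return sizes


# Hypothetical stand-in for DTLA_grouped_clustering: 13 rows (districts)
# by 20 columns (venue-category frequencies that sum to 1 per row).
rng = np.random.default_rng(0)
demo_features = rng.dirichlet(np.ones(20), size=13)

demo_sizes = cluster_sizes(demo_features, k_values=[4, 5, 6])
for k, counts in demo_sizes.items():
    print(f"k={k}: cluster sizes {counts}")
```

Listing the sizes side by side makes it easier to spot a value of k where one cluster absorbs most districts or where the district of interest ends up alone.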

Now that the clusters have been formed, a new dataframe will be created that includes the district, cluster, top 15 venues, and latitude/longitude coordinates of the district. 

In [46]:
# add clustering labels to the venues dataframe
districts_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# create a new DTLA_merged dataframe using the DTLA_data dataframe as a starting point
DTLA_merged = DTLA_data

# join the cluster labels and top venues onto DTLA_data, which holds latitude/longitude for each district
DTLA_merged = DTLA_merged.join(districts_venues_sorted.set_index('District'), on='District')

DTLA_merged

| Unnamed: 0 | District | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | 11th Most Common Venue | 12th Most Common Venue | 13th Most Common Venue | 14th Most Common Venue | 15th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arts District | 34.04117 | -118.23298 | 1 | Art Gallery | Coffee Shop | Italian Restaurant | Cocktail Bar | Event Space | Gym | Climbing Gym | Fruit & Vegetable Store | Comic Shop | Café | Food Truck | Smoothie Shop | Mediterranean Restaurant | Bridge | Brewery |
| 1 | Bunker Hill | 34.052035 | -118.250347 | 0 | Mexican Restaurant | Hotel | Coffee Shop | Café | Sandwich Place | French Restaurant | Bookstore | Italian Restaurant | Train Station | Building | Art Museum | Farmers Market | Gym | Hobby Shop | Hotel Bar |
| 2 | Civic Center | 34.054139 | -118.24465 | 3 | American Restaurant | Lounge | Performing Arts Venue | Park | Plaza | Opera House | Concert Hall | Coffee Shop | Sandwich Place | School | Music Venue | Building | Speakeasy | Bookstore | Dog Run |
| 3 | Fashion District | 34.037168 | -118.256404 | 1 | Mediterranean Restaurant | Italian Restaurant | Event Space | American Restaurant | Pizza Place | Hotel | Flea Market | Restaurant | Coworking Space | Coffee Shop | Clothing Store | Chinese Restaurant | Moving Target | Shoe Store | Movie Theater |
| 4 | Financial District | 34.050833 | -118.255 | 0 | Sandwich Place | Coffee Shop | Hotel | Italian Restaurant | Gym / Fitness Center | New American Restaurant | Sushi Restaurant | French Restaurant | Whisky Bar | Vegetarian / Vegan Restaurant | Bakery | Seafood Restaurant | Train Station | Hotel Bar | Café |
| 5 | Flower District | 34.040268 | -118.249826 | 1 | Arts & Crafts Store | Italian Restaurant | Coffee Shop | Clothing Store | Women's Store | Flower Shop | Men's Store | Food Court | Bar | American Restaurant | Café | Cajun / Creole Restaurant | Mexican Restaurant | Salon / Barbershop | French Restaurant |
| 6 | Gallery Row | 34.048161 | -118.247371 | 0 | Coffee Shop | Mexican Restaurant | Bar | Italian Restaurant | French Restaurant | Bookstore | Lounge | Sandwich Place | Nightclub | Smoke Shop | Breakfast Spot | Speakeasy | Residential Building (Apartment / Condo) | Taco Place | Restaurant |
| 7 | Historic Core | 34.05349 | -118.245319 | 0 | Mexican Restaurant | Coffee Shop | American Restaurant | Ramen Restaurant | Latin American Restaurant | Music Venue | School | Sandwich Place | Cheese Shop | Lounge | Market | Concert Hall | Building | Plaza | Pizza Place |
| 8 | Jewelry District | 34.045833 | -118.254444 | 0 | Bar | Coffee Shop | New American Restaurant | Hotel | Theater | French Restaurant | American Restaurant | Gym | Salon / Barbershop | Hotel Bar | Salad Place | Italian Restaurant | Gym / Fitness Center | Bookstore | Gastropub |
| 9 | Little Tokyo | 34.050556 | -118.239444 | 2 | Japanese Restaurant | Sushi Restaurant | Ramen Restaurant | Gift Shop | Ice Cream Shop | Coffee Shop | Boutique | Dessert Shop | Theater | Shopping Mall | Supermarket | Bubble Tea Shop | Café | Bar | Bakery |


As expected, Skid Row is in its own cluster. Two other districts (Fashion District and Flower District) share a cluster with the Arts District and are therefore potential options to consider. A review of the above dataframe shows that the Fashion and Flower Districts likely have more in common with each other than with the Arts District, though they share some venue types with the Arts District as well (for example, 'Coffee Shop' and 'Italian Restaurant' appear in the top 15 for all three).
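Rather than reading the cluster labels off the dataframe by eye, the districts that share a cluster with the Arts District can be selected programmatically. The sketch below is illustrative only: the helper `districts_in_same_cluster` is an assumption made for this example, and `demo_merged` reproduces just the district names and cluster labels from the ten rows of output shown above, not the full 13 districts.

```python
import pandas as pd


def districts_in_same_cluster(merged, district,
                              district_col='District',
                              label_col='Cluster Labels'):
    """Return the other districts that share a cluster label with `district`."""
    # look up the cluster label of the district of interest
    target = merged.loc[merged[district_col] == district, label_col].iloc[0]
    # keep rows with the same label, excluding the district itself
    same = merged[(merged[label_col] == target)
                  & (merged[district_col] != district)]
    return same[district_col].tolist()


# District names and labels copied from the ten rows displayed above
demo_merged = pd.DataFrame({
    'District': ['Arts District', 'Bunker Hill', 'Civic Center',
                 'Fashion District', 'Financial District', 'Flower District',
                 'Gallery Row', 'Historic Core', 'Jewelry District',
                 'Little Tokyo'],
    'Cluster Labels': [1, 0, 3, 1, 0, 1, 0, 0, 0, 2],
})

arts_peers = districts_in_same_cluster(demo_merged, 'Arts District')
print(arts_peers)  # ['Fashion District', 'Flower District']
```

In the notebook, passing `DTLA_merged` instead of `demo_merged` would apply the same filter to all districts.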






##### *Visualize the clusters*
The clusters can be added to the DTLA map to visualize their physical proximity to each other, thus informing the analysis further. 

In [49]:
# create map using the DTLA coordinates
map_clusters = folium.Map(location=[34.05, -118.25], zoom_start=14)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(DTLA_merged['Latitude'], DTLA_merged['Longitude'], DTLA_merged['District'], DTLA_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Visualizing the clusters makes it clear that districts in the same cluster tend to be physically close to one another. However, the two districts in the same cluster as the Arts District are not the closest to it in distance, so location did not overly influence the clustering.
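The observation about distance can also be checked numerically. The sketch below computes great-circle (haversine) distances from the Arts District to the other districts, using only the coordinates of the ten districts listed in the dataframe output earlier in this section. The function `haversine_km` and the dictionary `district_coords` are names introduced for this example, and the Earth radius of 6371 km is the usual mean-radius approximation.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Coordinates copied from the ten districts shown in the dataframe output
district_coords = {
    'Arts District': (34.04117, -118.23298),
    'Bunker Hill': (34.052035, -118.250347),
    'Civic Center': (34.054139, -118.24465),
    'Fashion District': (34.037168, -118.256404),
    'Financial District': (34.050833, -118.255),
    'Flower District': (34.040268, -118.249826),
    'Gallery Row': (34.048161, -118.247371),
    'Historic Core': (34.05349, -118.245319),
    'Jewelry District': (34.045833, -118.254444),
    'Little Tokyo': (34.050556, -118.239444),
}

origin = district_coords['Arts District']
distances_km = {
    name: haversine_km(*origin, *coords)
    for name, coords in district_coords.items()
    if name != 'Arts District'
}

# print districts from nearest to farthest
for name, dist in sorted(distances_km.items(), key=lambda item: item[1]):
    print(f"{name}: {dist:.2f} km")

nearest = min(distances_km, key=distances_km.get)
```

Among these ten districts, the nearest to the Arts District is Little Tokyo rather than the Fashion or Flower District, which is consistent with the point made above.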

### **Results and Discussion** <a name="results"></a>

Our analysis uncovered two other districts with venue types similar to those of the Arts District: the Fashion District and the Flower District. A review of the top venue types across the three districts revealed that the latter two have more in common with each other, though they share some venue types with the Arts District, as would be expected. Unfortunately, the most common venue type in the Arts District, 'Art Gallery', is not included in the list for any other district. This is likely a unique characteristic of the Arts District, hence the name, and is unlikely to be found in another area of Downtown Los Angeles. This will need to be highlighted in the recommendation.

One shortcoming of the k-means approach used here is that it cannot recognize the similarity between related venue types, because each venue type is treated as a separate, unrelated feature. For example, 'Art Museum' is similar to 'Art Gallery'. There are also other venues associated with the arts (e.g., museums, music venues) that may appeal to the employees, but this similarity is not factored into the clustering. Fortunately, this information can be explored using the dataframe created to show the top venue types for each district, and it can be included in the recommendation.
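One way to explore this is to group related venue types by hand and count how many appear in each district's top 15. The sketch below is a rough illustration, not part of the original analysis: the `ARTS_CATEGORIES` set is a manual, assumed grouping, `count_arts_venues` is a helper introduced for this example, and `top_venues` copies the top 15 venue types for four districts from the dataframe output in the Analysis section.

```python
# Assumed, hand-picked grouping of arts-related venue types
ARTS_CATEGORIES = frozenset({
    'Art Gallery', 'Art Museum', 'Performing Arts Venue',
    'Opera House', 'Concert Hall', 'Music Venue',
})


def count_arts_venues(top_venues, arts_categories=ARTS_CATEGORIES):
    """Return {district: number of top venue types in the arts grouping}."""
    return {
        district: sum(venue in arts_categories for venue in venues)
        for district, venues in top_venues.items()
    }


# Top 15 venue types copied from the dataframe output for four districts
top_venues = {
    'Arts District': [
        'Art Gallery', 'Coffee Shop', 'Italian Restaurant', 'Cocktail Bar',
        'Event Space', 'Gym', 'Climbing Gym', 'Fruit & Vegetable Store',
        'Comic Shop', 'Café', 'Food Truck', 'Smoothie Shop',
        'Mediterranean Restaurant', 'Bridge', 'Brewery'],
    'Bunker Hill': [
        'Mexican Restaurant', 'Hotel', 'Coffee Shop', 'Café',
        'Sandwich Place', 'French Restaurant', 'Bookstore',
        'Italian Restaurant', 'Train Station', 'Building', 'Art Museum',
        'Farmers Market', 'Gym', 'Hobby Shop', 'Hotel Bar'],
    'Civic Center': [
        'American Restaurant', 'Lounge', 'Performing Arts Venue', 'Park',
        'Plaza', 'Opera House', 'Concert Hall', 'Coffee Shop',
        'Sandwich Place', 'School', 'Music Venue', 'Building', 'Speakeasy',
        'Bookstore', 'Dog Run'],
    'Historic Core': [
        'Mexican Restaurant', 'Coffee Shop', 'American Restaurant',
        'Ramen Restaurant', 'Latin American Restaurant', 'Music Venue',
        'School', 'Sandwich Place', 'Cheese Shop', 'Lounge', 'Market',
        'Concert Hall', 'Building', 'Plaza', 'Pizza Place'],
}

arts_counts = count_arts_venues(top_venues)
print(arts_counts)
```

A different grouping (for example, one that also includes 'Theater') would change the counts, so the set should be agreed with the company before it is used to rank districts.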

Finally, a review of the clusters visualized on a map revealed that though the districts in the same cluster as the Arts District are not closest in physical proximity, they are also not too far from the current office location, which may be favorable from a commuting perspective. 

### **Conclusion** <a name="conclusion"></a>

As a reminder, the purpose of this project was to help a company find a new office location by identifying other Downtown Los Angeles (DTLA) districts that are similar to the Arts District, where their current office is located. Through a combination of exploring top venue types per district and the use of clustering, the following districts are recommended as areas to explore further, factoring in the additional information provided in line with the company's priorities. 

*Recommendations based on clustering (similar restaurants and close physical proximity)* 

These districts have similar restaurants and are physically close to each other. Unfortunately, neither has 'Art Gallery' as a top venue, or many other arts venues. As such, if access to such venues is important for employees, these may not be ideal locations for the new office.
*   Fashion District
*   Flower District

*Recommendations based on arts venues*

Though these districts do not have 'Art Gallery' as a top venue either, they have other arts venues that may appeal to the employees if such venues are deemed important.
*   Bunker Hill - Art Museum, which is similar to an art gallery, is in the top 15 venue types
*   Civic Center - 4 performing arts and music venues are in the top 15 venue types
*   Historic Core - 2 music venues are in the top 15 venue types

Note that Civic Center and Historic Core are very close to each other, thus they may share some venues. 

*Conclusion*

Using the above information, the company will be able to prioritize the similarities that are most important (restaurant type, physical proximity, arts venues) when deciding which other districts are acceptable alternatives to their current location. 




