# Air Travel to Southern California

## 1. Introduction

### 1.1 Background

In 2017, 48.5 million tourists traveled to California, and that number continues to grow annually[1]. Visitors are drawn by the warm weather year round, the many attractions and beaches, and the rapidly growing corporate sector. Whether one is traveling for business or leisure, the attractions across Southern California are easy to reach through the numerous airports in the region.


### 1.2 Problem

Although the large number of airports in Southern California makes traveling there easier, choosing the right airport for one's specific needs can be frustrating and daunting. This project aims to simplify that choice and help travelers pick the best airport so that they can focus on enjoying their trip.

### 1.3 Interest

Those interested in this report are air travelers heading to Southern California for either business or leisure.

## 2. Data

To solve this problem, we need the locations of the airports in Southern California and a list of the top attractions in the region.
For this project, I will use the Foursquare API to locate the airports across Southern California and to find the most popular attractions, so that visitors will know which airport to fly into depending on their itineraries. I will then populate a hypothetical dataframe with this information and use a segmentation and clustering algorithm to determine which airport to fly into for a given attraction (assuming the visitor stays close to that attraction).
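The core idea, matching a destination to its closest airport, can be illustrated before any clustering is involved. The snippet below is only a sketch under stated assumptions: it uses the haversine great-circle formula rather than a clustering algorithm, the `AIRPORTS` dictionary and the `haversine_km` and `nearest_airport` helpers are hypothetical names introduced here, and only four of the airports from Section 2.1 are included.

```python
import math

# Approximate (latitude, longitude) pairs taken from the airport table in Section 2.1.
AIRPORTS = {
    'BUR': (34.1983, -118.3574),
    'SNA': (33.675556, -117.868333),
    'LGB': (33.817778, -118.151667),
    'SAN': (32.733611, -117.189722),
}

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_airport(lat, lng, airports=AIRPORTS):
    """Return the code of the airport closest to (lat, lng)."""
    return min(airports, key=lambda code: haversine_km(lat, lng, *airports[code]))

# Example call with the coordinates of a venue in Anaheim
print(nearest_airport(33.800684, -117.918432))
```

Straight-line distance ignores traffic and road layout, so this should be read as a first approximation of "closest", not as a travel-time estimate.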


### 2.1 Locating and Mapping Airports 

In [None]:
import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library
# create dataframe of airports: (code, city, latitude, longitude), coordinates stored as floats
airport_data = [('LAX', 'Los Angeles', 34.0522, -118.2437),
                ('BUR', 'Burbank', 34.1983, -118.3574),
                ('SNA', 'Santa Ana', 33.675556, -117.868333),
                ('LGB', 'Long Beach', 33.817778, -118.151667),
                ('ONT', 'Ontario', 34.056111, -117.601111),
                ('SAN', 'San Diego', 32.733611, -117.189722)]
labels = ['airport_code','city','latitude','longitude']
df_airports = pd.DataFrame.from_records(airport_data, columns = labels)
df_airports
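Because the coordinates are typed in by hand, a quick range check can catch malformed entries before they reach a map or a distance calculation. The sketch below is one possible check and not part of the original notebook: `find_bad_coordinates` is a hypothetical helper, the bounding box for Southern California is a rough assumption, and the second sample record is made up to show a failure.

```python
def find_bad_coordinates(records, lat_range=(32.0, 35.5), lng_range=(-121.0, -114.0)):
    """Return the codes whose latitude/longitude cannot be parsed or fall outside the box."""
    bad = []
    for code, _city, lat, lng in records:
        try:
            lat_f, lng_f = float(lat), float(lng)
        except (TypeError, ValueError):
            bad.append(code)  # value is not a single number
            continue
        inside = (lat_range[0] <= lat_f <= lat_range[1]
                  and lng_range[0] <= lng_f <= lng_range[1])
        if not inside:
            bad.append(code)
    return bad

sample = [
    ('SNA', 'Santa Ana', '33.675556', '-117.868333'),
    ('XXX', 'Made-up entry', '34.0', '-118.15-117.60'),  # two numbers fused together
]
print(find_bad_coordinates(sample))
```

Calling `find_bad_coordinates(airport_data)` on the list above would apply the same check to the real table.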

### 2.2 Locating Popular Southern California Attractions

In [None]:
#Use Foursquare API to obtain venues
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# function for transforming JSON into a pandas dataframe
# (pandas >= 1.0 exposes it at the top level; the pandas.io.json path is deprecated)
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

In [None]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (placeholder; keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180604'
LIMIT = 100
print('Your credentials')

print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

In [None]:
# Searching for venues near LAX (query: Hilton Hotels)
address = '1 World Way, Los Angeles, CA 90045'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

search_query = 'Hilton Hotels'
radius = 100000
print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

results = requests.get(url).json()
results
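The request URL in the cell above is assembled with string formatting. A possible alternative, sketched here, lets the standard library encode each parameter so that spaces and commas are escaped consistently. The helper name `build_search_url` and the `'MY_ID'` and `'MY_SECRET'` placeholders are assumptions for illustration; the endpoint and parameter names are the ones already used in the cell.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, query,
                     version='20180604', radius=100000, limit=100):
    """Build a Foursquare v2 venues/search URL with URL-encoded parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials; the coordinates are an example latitude/longitude pair
url = build_search_url('MY_ID', 'MY_SECRET', 34.1674848, -118.3468132, 'Hilton Hotels')
print(url)
```

With `requests`, the same dictionary can also be passed directly as `requests.get(base_url, params=params)`, which performs the encoding itself.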

In [12]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df_LAX = json_normalize(venues)
df_LAX.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4aaed4b0f964a520786320e3,1707 4th St,US,Santa Monica,United States,,11167,"[1707 4th St, Santa Monica, CA 90401, United S...","[{'label': 'display', 'lat': 34.0117436, 'lng'...",34.011744,-118.489113,,90401,CA,DoubleTree Suites by Hilton Hotel Santa Monica,v-1568526784,
1,[],False,4c9638c301dd76b08592c148,9336 Civic Center Dr,US,"California,",United States,"Beverly Hills,",14585,"[9336 Civic Center Dr (Beverly Hills,), Califo...","[{'label': 'display', 'lat': 34.07628764404890...",34.076288,-118.392676,,90210,CA,Hilton Hotels Corp,v-1568526784,
2,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4b234130f964a520b25424e3,2800 Via Cabrillo Marina,US,San Pedro,United States,,27435,"[2800 Via Cabrillo Marina, San Pedro, CA 90731...","[{'label': 'display', 'lat': 33.718573, 'lng':...",33.718573,-118.282252,,90731,CA,DoubleTree by Hilton Hotel San Pedro - Port of...,v-1568526784,
3,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4b1b5060f964a520c3fa23e3,5711 W Century Blvd,US,Los Angeles,United States,at Aviation Blvd,1554,"[5711 W Century Blvd (at Aviation Blvd), Los A...","[{'label': 'display', 'lat': 33.9461767, 'lng'...",33.946177,-118.381616,West Los Angeles,90045,CA,Hilton Los Angeles Airport,v-1568526784,
4,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",False,52ea6af4498e29d798d144a2,,US,Los Angeles,United States,,1540,"[Los Angeles, CA 90045, United States]","[{'label': 'display', 'lat': 33.94614991344601...",33.94615,-118.381765,,90045,CA,Hilton Los Angeles Airport Hotel Health Center,v-1568526784,


In [None]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_LAX.columns if col.startswith('location.')] + ['id']
df_LAX_filtered = df_LAX.loc[:, filtered_columns].copy()  # copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_LAX_filtered['categories'] = df_LAX_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_LAX_filtered.columns = [column.split('.')[-1] for column in df_LAX_filtered.columns]
df_LAX_filtered

In [None]:
#subset of dataframe
df1_LAX = df_LAX_filtered[['name','city', 'distance','lat','lng']]
df1_LAX
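The filtering steps above reduce each venue to a name, a category, and a handful of location fields. As a library-free illustration of the same idea, the sketch below flattens a single venue dictionary by hand. The `flatten_venue` name is hypothetical, and the sample record is abbreviated from the "Hilton Hotels Corp" row of the LAX results; the notebook itself relies on `json_normalize`.

```python
def flatten_venue(venue):
    """Reduce one Foursquare venue dict to the fields used in this analysis."""
    location = venue.get('location', {})
    categories = venue.get('categories', [])
    return {
        'name': venue.get('name'),
        # first category name, or None when the category list is empty
        'category': categories[0]['name'] if categories else None,
        'city': location.get('city'),
        'distance': location.get('distance'),
        'lat': location.get('lat'),
        'lng': location.get('lng'),
    }

sample_venue = {
    'name': 'Hilton Hotels Corp',
    'categories': [],
    'location': {'city': 'California,', 'distance': 14585,
                 'lat': 34.076288, 'lng': -118.392676},
}
print(flatten_venue(sample_venue))
```

Using `.get` with defaults means a venue that lacks a city or a category still produces a row instead of raising an error.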

In [15]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around LAX

# add a red circle marker to represent LAX
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='LAX',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Hilton Hotels as blue circle markers
for lat, lng, label in zip(df_LAX_filtered.lat, df_LAX_filtered.lng, df_LAX_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map


In [16]:
# Searching for venues near BUR (query: Hilton Hotels)
address = '2627 N Hollywood Wy, Burbank, CA 91505'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)
search_query = 'Hilton Hotels'
radius = 100000
print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df_BUR = json_normalize(venues)
df_BUR.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_BUR.columns if col.startswith('location.')] + ['id']
df_BUR_filtered = df_BUR.loc[:, filtered_columns].copy()  # copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_BUR_filtered['categories'] = df_BUR_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_BUR_filtered.columns = [column.split('.')[-1] for column in df_BUR_filtered.columns]
#subset of dataframe
df1_BUR = df_BUR_filtered[['name','city', 'distance','lat','lng']]
df1_BUR

34.1674848 -118.3468132
Hilton Hotels .... OK!


Unnamed: 0,name,city,distance,lat,lng
0,DoubleTree Suites by Hilton Hotel Santa Monica,Santa Monica,21740,34.011744,-118.489113
1,Hilton Hotels Corp,"California,",10996,34.076288,-118.392676
2,DoubleTree by Hilton Hotel San Pedro - Port of...,San Pedro,50326,33.718573,-118.282252
3,Hilton Los Angeles/Universal City,Universal City,3602,34.136646,-118.358679
4,Hilton Los Angeles North/Glendale & Executive ...,Glendale,8373,34.158749,-118.256519
5,DoubleTree by Hilton Hotel Santa Ana - Orange ...,Santa Ana,68363,33.700441,-117.866176
6,Hilton Garden Inn Los Angeles/Hollywood,Los Angeles,6824,34.106633,-118.337792
7,Hilton Fitness Center,Universal City,3615,34.136373,-118.358065
8,Hilton Garden Inn Burbank Downtown,Burbank,3966,34.178007,-118.305662
9,The Beverly Hilton,Beverly Hills,12783,34.066407,-118.412652


In [17]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around BUR

# add a red circle marker to represent BUR
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='BUR',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Hilton Hotels as blue circle markers
for lat, lng, label in zip(df_BUR_filtered.lat, df_BUR_filtered.lng, df_BUR_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

In [131]:
# Searching for venues near SNA (query: Hilton Hotels)
address = '18601 Airport Way, Santa Ana, CA 92707'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

search_query = 'Hilton Hotels'
radius = 100000
print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df_SNA = json_normalize(venues)
df_SNA.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_SNA.columns if col.startswith('location.')] + ['id']
df_SNA_filtered = df_SNA.loc[:, filtered_columns].copy()  # copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_SNA_filtered['categories'] = df_SNA_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_SNA_filtered.columns = [column.split('.')[-1] for column in df_SNA_filtered.columns]
#subset of dataframe
df1_SNA = df_SNA_filtered[['name','city', 'distance','lat','lng']]
df1_SNA

33.682329 -117.859163
Hilton Hotels .... OK!


Unnamed: 0,name,city,distance,lat,lng
0,DoubleTree by Hilton Hotel Santa Ana - Orange ...,Santa Ana,2118,33.700441,-117.866176
1,DoubleTree Suites by Hilton Hotel Santa Monica,Santa Monica,68823,34.011744,-118.489113
2,DoubleTree by Hilton Hotel San Pedro - Port of...,San Pedro,39390,33.718573,-118.282252
3,Hilton Irvine/Orange County Airport,Irvine,853,33.674785,-117.860783
4,Hilton Hotels Corp,"California,",65987,34.076288,-118.392676
5,Hilton Garden Inn,Irvine,997,33.680504,-117.848625
6,Hilton Hotel- Monarch Ballroom,Irvine,842,33.674906,-117.86091
7,Hilton Orange County/Costa Mesa,Costa Mesa,2337,33.683279,-117.884372
8,Hilton Hotel - Crystal Ballroom,Irvine,846,33.67484,-117.860725
9,Hilton Anaheim,Anaheim,14271,33.800684,-117.918432


In [18]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around SNA

# add a red circle marker to represent SNA
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='SNA',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Hilton Hotels as blue circle markers
for lat, lng, label in zip(df_SNA_filtered.lat, df_SNA_filtered.lng, df_SNA_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map


In [133]:
# Searching for venues near LGB (query: Hilton Hotels)
address = '4100 Donald Douglas Dr. Long Beach, CA 90808'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

search_query = 'Hilton Hotels'
radius = 100000
print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df_LGB = json_normalize(venues)
df_LGB.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_LGB.columns if col.startswith('location.')] + ['id']
df_LGB_filtered = df_LGB.loc[:, filtered_columns].copy()  # copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_LGB_filtered['categories'] = df_LGB_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_LGB_filtered.columns = [column.split('.')[-1] for column in df_LGB_filtered.columns]
#subset of dataframe
df1_LGB = df_LGB_filtered[['name','city', 'distance','lat','lng']]
df1_LGB

33.8172681 -118.1418025
Hilton Hotels .... OK!


Unnamed: 0,name,city,distance,lat,lng
0,DoubleTree Suites by Hilton Hotel Santa Monica,Santa Monica,38705,34.011744,-118.489113
1,DoubleTree by Hilton Hotel San Pedro - Port of...,San Pedro,17018,33.718573,-118.282252
2,Hilton Hotels Corp,"California,",36987,34.076288,-118.392676
3,DoubleTree by Hilton Hotel Santa Ana - Orange ...,Santa Ana,28632,33.700441,-117.866176
4,Hilton Long Beach & Executive Meeting Center,Long Beach,7765,33.768026,-118.201263
5,Hilton Anaheim,Anaheim,20742,33.800684,-117.918432
6,Homewood Suites by Hilton Long Beach Airport,Long Beach,1115,33.827158,-118.143755
7,Hotel Maya - a DoubleTree by Hilton Hotel,Long Beach,8544,33.75679,-118.198665
8,Hilton Los Angeles Airport,Los Angeles,26402,33.946177,-118.381616
9,Hilton Hotel Pool & Spa,Anaheim,20751,33.800965,-117.918306


In [134]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around LGB
# add a red circle marker to represent LGB
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='LGB',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Hilton Hotels as blue circle markers
for lat, lng, label in zip(df_LGB_filtered.lat, df_LGB_filtered.lng, df_LGB_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

In [135]:
# Searching for venues near ONT (query: Hilton Hotels)
address = '2500 E Airport Dr. Ontario, CA 91761'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

search_query = 'Hilton Hotels'
radius = 100000
print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df_ONT = json_normalize(venues)
df_ONT.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_ONT.columns if col.startswith('location.')] + ['id']
df_ONT_filtered = df_ONT.loc[:, filtered_columns].copy()  # copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_ONT_filtered['categories'] = df_ONT_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_ONT_filtered.columns = [column.split('.')[-1] for column in df_ONT_filtered.columns]
#subset of dataframe
df1_ONT = df_ONT_filtered[['name','city', 'distance','lat','lng']]
df1_ONT


34.062531 -117.600645
Hilton Hotels .... OK!


Unnamed: 0,name,city,distance,lat,lng
0,DoubleTree Suites by Hilton Hotel Santa Monica,Santa Monica,82153,34.011744,-118.489113
1,DoubleTree by Hilton Hotel Santa Ana - Orange ...,Santa Ana,47189,33.700441,-117.866176
2,DoubleTree by Hilton Hotel Ontario Airport,Ontario,942,34.066027,-117.609952
3,Hilton Hotels Corp,"California,",73050,34.076288,-118.392676
4,DoubleTree by Hilton Hotel San Bernardino,San Bernardino,29509,34.065231,-117.28067
5,Hilton Garden Inn,Rancho Cucamonga,4637,34.078965,-117.554427
6,DoubleTree by Hilton Hotel San Pedro - Port of...,San Pedro,73709,33.718573,-118.282252
7,Poolside Ontario Doubletree By Hilton,Ontario,893,34.06576,-117.609518
8,Embassy Suites by Hilton Ontario Airport,Ontario,2585,34.065463,-117.572827
9,DoubleTree by Hilton Hotel Claremont,Claremont,12423,34.10759,-117.723924


In [136]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around ONT

# add a red circle marker to represent ONT
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='ONT',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Hilton Hotels as blue circle markers
for lat, lng, label in zip(df_ONT_filtered.lat, df_ONT_filtered.lng, df_ONT_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

In [137]:
# Searching for venues near SAN (query: Hilton Hotels)
address = '3225 N Harbor Dr San Diego, CA 92101'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

search_query = 'Hilton Hotels'
radius = 100000
print(search_query + ' .... OK!')

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

results = requests.get(url).json()
results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df_SAN = json_normalize(venues)
df_SAN.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df_SAN.columns if col.startswith('location.')] + ['id']
df_SAN_filtered = df_SAN.loc[:, filtered_columns].copy()  # copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_SAN_filtered['categories'] = df_SAN_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_SAN_filtered.columns = [column.split('.')[-1] for column in df_SAN_filtered.columns]
#subset of dataframe
df1_SAN = df_SAN_filtered[['name','city', 'distance','lat','lng']]
df1_SAN

32.7292147341561 -117.190403029551
Hilton Hotels .... OK!


Unnamed: 0,name,city,distance,lat,lng
0,DoubleTree by Hilton Hotel San Diego Downtown,San Diego,2480,32.722373,-117.165196
1,DoubleTree by Hilton Hotel San Diego - Hotel C...,San Diego,3499,32.758521,-117.17687
2,DoubleTree by Hilton Hotel San Diego - Mission...,San Diego,5351,32.770011,-117.160175
3,DoubleTree by Hilton Hotel San Diego - Del Mar,San Diego,23388,32.93573,-117.236432
4,Hilton San Diego Bayfront,San Diego,4146,32.70337,-117.158518
5,Hilton Garden Inn - San Diego Downtown/Bayside,San Diego,1780,32.72625,-117.17172
6,Hilton San Diego Airport/Harbor Island,San Diego,1803,32.725182,-117.209054
7,Hilton San Diego Mission Valley,San Diego,4988,32.763896,-117.156665
8,Hilton San Diego Gaslamp Quarter,San Diego,3641,32.708026,-117.160777
9,Hilton Garden Inn San Diego Old Town/SeaWorld ...,San Diego,3319,32.75834,-117.198


In [81]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around SAN

# add a red circle marker to represent SAN
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='SAN',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Hilton Hotels as blue circle markers
for lat, lng, label in zip(df_SAN_filtered.lat, df_SAN_filtered.lng, df_SAN_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
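The per-airport tables above list many of the same hotels, each with a different distance. One way to turn them into a recommendation is to keep, for every hotel, the airport that reports the smallest distance. The sketch below does this on a few rows copied from the outputs above; the nested-dictionary layout and the `closest_airport_per_hotel` helper are assumptions made for illustration, not part of the original notebook.

```python
# Distances in metres copied from the result tables: {airport: {hotel: distance}}
distances = {
    'LAX': {'DoubleTree Suites by Hilton Hotel Santa Monica': 11167,
            'Hilton Hotels Corp': 14585},
    'BUR': {'DoubleTree Suites by Hilton Hotel Santa Monica': 21740,
            'Hilton Hotels Corp': 10996},
    'SNA': {'DoubleTree Suites by Hilton Hotel Santa Monica': 68823,
            'Hilton Anaheim': 14271},
    'LGB': {'DoubleTree Suites by Hilton Hotel Santa Monica': 38705,
            'Hilton Anaheim': 20742},
}

def closest_airport_per_hotel(distances):
    """Map each hotel to the (airport, distance) pair with the smallest distance."""
    best = {}
    for airport, hotels in distances.items():
        for hotel, metres in hotels.items():
            if hotel not in best or metres < best[hotel][1]:
                best[hotel] = (airport, metres)
    return best

for hotel, (airport, metres) in closest_airport_per_hotel(distances).items():
    print('{}: {} ({} m)'.format(hotel, airport, metres))
```

The distances are measured from the geocoded airport address used in each search, so they are straight-line values rather than driving distances.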

### K-means Clustering

In [None]:
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs  # the samples_generator submodule was removed in newer scikit-learn releases

print('Libraries imported.')
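To make the clustering step concrete, here is a minimal sketch of the k-means idea applied to hotel coordinates. It is a plain-Python version of Lloyd's algorithm, not scikit-learn's `KMeans`, and it makes several assumptions: the `kmeans` helper is hypothetical, the starting centroids are the SAN and SNA airport coordinates from Section 2.1, the four hotel points are copied from the result tables above, and Euclidean distance on raw degrees is used as a simplification.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on (lat, lng) tuples, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # assignment step: attach every point to its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids

hotels = [
    (32.722373, -117.165196),  # San Diego
    (32.70337, -117.158518),   # San Diego
    (33.674785, -117.860783),  # Irvine
    (33.680504, -117.848625),  # Irvine
]
start = [(32.733611, -117.189722), (33.675556, -117.868333)]  # SAN, SNA
labels, centers = kmeans(hotels, start)
print(labels)
```

Seeding the centroids with airport locations keeps each cluster tied to a specific airport; with random initialization, as scikit-learn uses by default, the clusters would still form but would need to be matched to airports afterwards.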