### Installing and Importing required Libraries

In [1]:
!pip install beautifulsoup4
!pip install lxml
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import csv
print("Libraries imported.")

Collecting beautifulsoup4
  Downloading https://files.pythonhosted.org/packages/d1/41/e6495bd7d3781cee623ce23ea6ac73282a373088fcd0ddc809a047b18eae/beautifulsoup4-4.9.3-py3-none-any.whl (115kB)
Collecting soupsieve>1.2; python_version >= "3.0" (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.9.3 soupsieve-2.0.1
Collecting lxml
  Downloading https://files.pythonhosted.org/packages/64/28/0b761b64ecbd63d272ed0e7a6ae6e4402fc37886b59181bfdf274424d693/lxml-4.6.1-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Installing collected packages: lxml
Successfully

### Setting `max_colwidth` to 800 for better readability

In [2]:
pd.set_option('display.max_colwidth', 800)

### Web scraping of Wikipedia page to get table of Neighbourhoods in the city of San Diego

Fetching the source webpage, assigning its text to the variable `source`, and initializing the BeautifulSoup object `soup`

In [3]:
source = requests.get('https://en.wikipedia.org/wiki/Category:Neighborhoods_in_San_Diego').text 
soup = BeautifulSoup(source, 'lxml')

### Initializing the csv_writer object and writing the name of the columns on it as the first row

In [4]:
csv_file = open('San-Diego.csv', 'w', newline='')  # same file name that pd.read_csv loads later
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Neighbourhood'])

15

### Scraping the page to extract the list of neighbourhoods in San Diego

In [5]:
mwcg = soup.find_all(class_ = "mw-category-group")

length = len(mwcg) # number of `mw-category-group` blocks on the page

for i in range(1, length):  # skips the first group, then collects every neighbourhood link
    links = mwcg[i].find_all('a')
    for link in links:  # `link` avoids shadowing the built-in `list`
        nbd = link.get('title') # Gets the title of the neighbourhood
        csv_writer.writerow([nbd]) # Writes the name of the neighbourhood in the csv file


### Closing the csv file

In [6]:
csv_file.close()
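
The open, write, and close steps above can also be expressed with a `with` block, which closes the file even if an exception is raised while writing. The following is a minimal sketch, not the notebook's own code: the helper name `write_neighbourhoods`, the example file name, and the two sample rows are assumptions made for illustration.

```python
import csv
import os
import tempfile

def write_neighbourhoods(path, names):
    """Write a one-column CSV with a 'Neighbourhood' header row."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['Neighbourhood'])
        for name in names:
            csv_writer.writerow([name])
    # the file is closed once the with block ends, so no explicit close() is needed

# hypothetical output location and sample rows
example_path = os.path.join(tempfile.gettempdir(), 'San-Diego-example.csv')
write_neighbourhoods(example_path, ['Allied Gardens, San Diego', 'Alta Vista, San Diego'])
```

With this pattern the separate "Closing the csv file" step becomes unnecessary.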

### Creating the pandas dataframe

In [7]:
df = pd.read_csv('San-Diego.csv')

In [8]:
df1 = df.drop([df.index[0], df.index[1]])

In [9]:
df1.reset_index(drop=True, inplace=True)

In [10]:
df1.head()

Unnamed: 0,Neighbourhood
0,"Allied Gardens, San Diego"
1,"Alta Vista, San Diego"
2,"Alvarado Estates, San Diego"
3,"Azalea Park, San Diego"
4,"Bankers Hill, San Diego"


In [11]:
df1.shape

(127, 1)

### Importing other libraries

In [12]:
!conda install -c conda-forge geopy --yes
!pip install geocoder
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import geocoder

# library for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 


from IPython.display import display_html
    
# transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.0.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-2.0.0          | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ################################

### Get the geographical coordinates

In [13]:
# define a function to get coordinates
def get_latlng(neighbourhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, San Diego, United States'.format(neighbourhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
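
The `while` loop above never terminates if the geocoder keeps returning `None` for a neighbourhood. One possible alternative is a bounded retry, sketched below under stated assumptions: `get_latlng_with_retry`, `max_attempts`, `delay_seconds`, and the stand-in `flaky_geocode` are hypothetical names, and the geocoding call is passed in as a plain function so the sketch runs without network access.

```python
import time

def get_latlng_with_retry(neighbourhood, geocode, max_attempts=3, delay_seconds=1.0):
    """Return [lat, lng], or None if every attempt fails.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] or None (an assumption for this sketch).
    """
    query = '{}, San Diego, United States'.format(neighbourhood)
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # short pause before the next attempt
    return None

# stand-in geocoder for illustration: fails once, then succeeds
calls = {'n': 0}

def flaky_geocode(query):
    calls['n'] += 1
    return None if calls['n'] < 2 else [32.79633, -117.09451]

print(get_latlng_with_retry('Allied Gardens', flaky_geocode, delay_seconds=0))
```

In the notebook, the callable could be something like `lambda q: geocoder.arcgis(q).latlng`. Neighbourhoods that still return `None` would then need to be handled before building the coordinates dataframe.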

In [14]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighbourhood) for neighbourhood in df1["Neighbourhood"].tolist() ]

In [15]:
coords

[[32.79633000000007, -117.09450999999996],
 [32.693160000000034, -117.06778999999995],
 [32.77722977470506, -117.05725645116517],
 [32.73284000000007, -117.10775999999998],
 [32.72849000000008, -117.16141999999996],
 [32.69467000000003, -117.13807999999995],
 [32.69166000000007, -117.04083999999995],
 [32.813120000000026, -117.26510999999999],
 [32.793150000000026, -117.15431999999998],
 [32.99950000000007, -117.14514999999994],
 [32.73105000000004, -117.05379999999997],
 [32.73375974652356, -117.12734315331029],
 [32.97883842044885, -117.08523692633462],
 [32.715050000000076, -117.09302999999994],
 [32.749690000000044, -117.10714999999999],
 [32.777372655863424, -117.14773216733681],
 [32.83457000000004, -117.19462999999996],
 [32.791557606429606, -117.06606530901291],
 [32.72231709163447, -117.16740310892881],
 [32.71568000000008, -117.16170999999997],
 [32.72316477734785, -117.15877165368126],
 [32.78762000000006, -117.06416999999999],
 [32.948390000000074, -117.25991999999997],
 [3

In [16]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [17]:
# merge the coordinates into the original dataframe
df1['Latitude'] = df_coords['Latitude']
df1['Longitude'] = df_coords['Longitude']

In [18]:
# check the neighborhoods and the coordinates
print(df1.shape)
df1.head()

(127, 3)


Unnamed: 0,Neighbourhood,Latitude,Longitude
0,"Allied Gardens, San Diego",32.79633,-117.09451
1,"Alta Vista, San Diego",32.69316,-117.06779
2,"Alvarado Estates, San Diego",32.77723,-117.057256
3,"Azalea Park, San Diego",32.73284,-117.10776
4,"Bankers Hill, San Diego",32.72849,-117.16142


In [19]:
# save the DataFrame as CSV file
df1.to_csv("San-Diego_neighbourhoods.csv", index=False)

### Creating a map of San Diego with neighborhoods superimposed on top

In [20]:
# get the coordinates of San Diego
address = 'San Diego, United States'

geolocator = Nominatim(user_agent="default")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Diego, United States are {}, {}.'.format(latitude, longitude))

The geographical coordinates of San Diego, United States are 32.7174209, -117.1627714.


In [21]:
# create map of San Diego using latitude and longitude values
map_sandiego = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighbourhood in zip(df1['Latitude'], df1['Longitude'], df1['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sandiego)  
    
map_sandiego

In [22]:
# save the map as HTML file
map_sandiego.save('map_sandiego.html')

### Using the Foursquare API to explore the neighborhoods

In [23]:
# define Foursquare credentials and version (placeholders; never publish real keys)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20201029' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


### Getting up to 100 venues within a radius of 2000 meters of each neighbourhood

In [24]:
import json

radius = 2000
LIMIT = 100

venues = []

for lat, long, neighbourhood in zip(df1['Latitude'], df1['Longitude'], df1['Neighbourhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()['response']['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighbourhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
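
The loop above indexes straight into `['response']['groups'][0]['items']` and `['categories'][0]`, so a response without groups or a venue without a category would raise an exception. The sketch below shows one defensive way to flatten a response. The function name `extract_venues` and the sample payload are hypothetical; the payload only mimics the keys the loop above reads, and the second venue is made up to exercise the empty-category case.

```python
def extract_venues(payload, neighbourhood, lat, lng):
    """Flatten one explore response into
    (neighbourhood, lat, lng, name, venue_lat, venue_lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when a venue has no category (assumed edge case)
        category = categories[0]['name'] if categories else None
        rows.append((neighbourhood, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hand-written payload with the same shape the loop above expects
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': "Troy's Greek Restaurant",
               'location': {'lat': 32.792591, 'lng': -117.09886},
               'categories': [{'name': 'Greek Restaurant'}]}},
    {'venue': {'name': 'Uncategorised venue (made up)',
               'location': {'lat': 32.79, 'lng': -117.09},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Allied Gardens, San Diego', 32.79633, -117.09451)
print(rows)
```

In the notebook this could be called as `venues.extend(extract_venues(requests.get(url).json(), neighbourhood, lat, long))`, keeping the request itself unchanged.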

In [25]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighbourhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(9819, 7)


Unnamed: 0,Neighbourhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"Allied Gardens, San Diego",32.79633,-117.09451,Emiliano's Mexican Restaraunt,32.794619,-117.097013,Mexican Restaurant
1,"Allied Gardens, San Diego",32.79633,-117.09451,Cuppa Cuppa Drive-Thru Espresso Bar,32.793145,-117.097884,Coffee Shop
2,"Allied Gardens, San Diego",32.79633,-117.09451,Troy's Greek Restaurant,32.792591,-117.09886,Greek Restaurant
3,"Allied Gardens, San Diego",32.79633,-117.09451,Gaglione Brothers,32.791799,-117.099091,Sandwich Place
4,"Allied Gardens, San Diego",32.79633,-117.09451,Einstein Bros Bagels,32.792202,-117.098305,Bagel Shop


### Checking how many venues were returned for each neighbourhood

In [26]:
venues_df.groupby(["Neighbourhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Allied Gardens, San Diego",94,94,94,94,94,94
"Alta Vista, San Diego",50,50,50,50,50,50
"Alvarado Estates, San Diego",86,86,86,86,86,86
"Azalea Park, San Diego",73,73,73,73,73,73
"Bankers Hill, San Diego",100,100,100,100,100,100
...,...,...,...,...,...,...
"University Heights, San Diego",100,100,100,100,100,100
"Valencia Park, San Diego",48,48,48,48,48,48
Village of La Jolla,100,100,100,100,100,100
"Webster, San Diego",99,99,99,99,99,99


### Finding how many unique categories appear among the returned venues

In [27]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 351 unique categories.


In [28]:
# show the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Mexican Restaurant', 'Coffee Shop', 'Greek Restaurant',
       'Sandwich Place', 'Bagel Shop', 'Vietnamese Restaurant',
       'Pet Store', 'Golf Course', 'Poke Place', 'American Restaurant',
       'Pilates Studio', 'Indian Restaurant', 'Seafood Restaurant',
       'Vegetarian / Vegan Restaurant', 'Garden', 'Brewery', 'Juice Bar',
       'Butcher', 'Hawaiian Restaurant', 'Pharmacy',
       'Fried Chicken Joint', 'Donut Shop', 'Italian Restaurant',
       'Liquor Store', 'Church', 'Martial Arts School', 'Burger Joint',
       'Sushi Restaurant', 'Farmers Market', 'Convenience Store',
       'Thai Restaurant', 'Furniture / Home Store', 'Breakfast Spot',
       'Climbing Gym', 'Outdoor Supply Store', 'BBQ Joint', 'Nightclub',
       'Gym / Fitness Center', 'Grocery Store', 'Pizza Place', 'Park',
       'Steakhouse', 'Pub', 'Fast Food Restaurant', 'ATM',
       'Performing Arts Venue', 'Electronics Store', 'Cocktail Bar',
       'Pool', 'Video Store'], dtype=object)

In [29]:
# check if the results contain "Indian Restaurant"
"Indian Restaurant" in venues_df['VenueCategory'].unique()

True

### One-hot Encoding and analyzing each Neighborhood

In [30]:
# one hot encoding
sandiego_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sandiego_onehot['Neighbourhood'] = venues_df['Neighbourhood'] 

# move neighborhood column to the first column
col_name="Neighbourhood"
first_col = sandiego_onehot.pop(col_name)
sandiego_onehot.insert(0, col_name, first_col)

print(sandiego_onehot.shape)
sandiego_onehot.head()

#cols = list(df.columns)
#cols = [cols[-1]] + cols[:-1]
#df = df[cols]

(9819, 352)


Unnamed: 0,Neighbourhood,ATM,Accessories Store,Adult Boutique,Airport,Airport Lounge,Airport Service,American Restaurant,Amphitheater,Antique Shop,...,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Allied Gardens, San Diego",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Allied Gardens, San Diego",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Allied Gardens, San Diego",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Allied Gardens, San Diego",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Allied Gardens, San Diego",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Grouping rows by neighbourhood and taking the mean frequency of occurrence of each category

In [31]:
sandiego_grouped = sandiego_onehot.groupby(["Neighbourhood"]).mean().reset_index()

print(sandiego_grouped.shape)
sandiego_grouped

(127, 352)


Unnamed: 0,Neighbourhood,ATM,Accessories Store,Adult Boutique,Airport,Airport Lounge,Airport Service,American Restaurant,Amphitheater,Antique Shop,...,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Allied Gardens, San Diego",0.010638,0.0,0.00,0.0,0.0,0.0,0.031915,0.00,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.00,0.0,0.00
1,"Alta Vista, San Diego",0.000000,0.0,0.00,0.0,0.0,0.0,0.000000,0.00,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.020000,0.0,0.00,0.0,0.00
2,"Alvarado Estates, San Diego",0.011628,0.0,0.00,0.0,0.0,0.0,0.034884,0.00,0.0,...,0.000000,0.0,0.011628,0.011628,0.0,0.023256,0.0,0.00,0.0,0.00
3,"Azalea Park, San Diego",0.013699,0.0,0.00,0.0,0.0,0.0,0.013699,0.00,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.027397,0.0,0.00,0.0,0.00
4,"Bankers Hill, San Diego",0.000000,0.0,0.00,0.0,0.0,0.0,0.040000,0.01,0.0,...,0.000000,0.0,0.010000,0.010000,0.0,0.000000,0.0,0.00,0.0,0.13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
122,"University Heights, San Diego",0.000000,0.0,0.01,0.0,0.0,0.0,0.080000,0.00,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.00,0.0,0.00
123,"Valencia Park, San Diego",0.000000,0.0,0.00,0.0,0.0,0.0,0.000000,0.00,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.00,0.0,0.00
124,Village of La Jolla,0.000000,0.0,0.00,0.0,0.0,0.0,0.010000,0.00,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.01,0.0,0.00
125,"Webster, San Diego",0.000000,0.0,0.00,0.0,0.0,0.0,0.000000,0.00,0.0,...,0.010101,0.0,0.000000,0.000000,0.0,0.000000,0.0,0.00,0.0,0.00
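
Because each venue contributes a single 1 in its category column, the mean of a one-hot column within a neighbourhood equals the share of that neighbourhood's venues in that category. Allied Gardens, for example, returned 94 venues, and 1/94 is approximately 0.010638, which matches the values shown above. The toy data below is invented purely to illustrate the arithmetic; `dtype=float` is passed only so the dummy columns are numeric regardless of pandas version.

```python
import pandas as pd

# invented toy data: neighbourhood A has 3 venues, B has 1
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B'],
    'VenueCategory': ['Indian Restaurant', 'Coffee Shop', 'Coffee Shop', 'Coffee Shop'],
})

# one-hot encode the category column, then put the neighbourhood back in front
onehot = pd.get_dummies(toy[['VenueCategory']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Neighbourhood', toy['Neighbourhood'])

# the mean of each 0/1 column is the fraction of venues in that category
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Here neighbourhood A gets one third for `Indian Restaurant` (1 of 3 venues) and B gets zero, so the values are proportions of returned venues, not restaurant counts.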


In [32]:
len(sandiego_grouped[sandiego_grouped["Indian Restaurant"] > 0])

25

### Create a new DataFrame for Indian Restaurant data only

In [33]:
sandiego_res = sandiego_grouped[["Neighbourhood","Indian Restaurant"]]
sandiego_res.head()

Unnamed: 0,Neighbourhood,Indian Restaurant
0,"Allied Gardens, San Diego",0.010638
1,"Alta Vista, San Diego",0.0
2,"Alvarado Estates, San Diego",0.011628
3,"Azalea Park, San Diego",0.0
4,"Bankers Hill, San Diego",0.0


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods in San Diego into 3 clusters.

In [34]:
# set number of clusters
kclusters = 3

sandiego_clustering = sandiego_res.drop(columns=["Neighbourhood"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sandiego_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 2, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
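
The label numbers that k-means assigns are arbitrary identifiers, so a higher label does not imply a higher Indian restaurant frequency. If ordered labels are wanted, the clusters can be renumbered by their centre values. The helper below is a hypothetical sketch in plain Python; the `labels` and `centers` values are toy numbers, not the notebook's fitted centres.

```python
def relabel_by_center(labels, centers):
    """Renumber cluster labels so that 0 is the cluster with the smallest centre."""
    # order[i] is the old label of the cluster with the i-th smallest centre
    order = sorted(range(len(centers)), key=lambda k: centers[k])
    new_label = {old: new for new, old in enumerate(order)}
    return [new_label[label] for label in labels]

# toy values for illustration only
labels = [2, 0, 2, 0, 1]
centers = [0.0, 0.066, 0.012]
print(relabel_by_center(labels, centers))  # -> [1, 0, 1, 0, 2]
```

With the fitted model, the inputs would be `kmeans.labels_` and `kmeans.cluster_centers_.ravel()`; after relabelling, 0, 1 and 2 would correspond to the lowest, middle and highest frequency groups.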

In [35]:
# create a new dataframe that holds the Indian Restaurant frequency and the cluster label for each neighbourhood
sandiego_merged = sandiego_res.copy()

# add clustering labels
sandiego_merged["Cluster Labels"] = kmeans.labels_

In [36]:
#sandiego_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
sandiego_merged.head()

Unnamed: 0,Neighbourhood,Indian Restaurant,Cluster Labels
0,"Allied Gardens, San Diego",0.010638,2
1,"Alta Vista, San Diego",0.0,0
2,"Alvarado Estates, San Diego",0.011628,2
3,"Azalea Park, San Diego",0.0,0
4,"Bankers Hill, San Diego",0.0,0


In [37]:
# join sandiego_merged with df1 to add latitude/longitude for each neighbourhood
sandiego_merged = sandiego_merged.join(df1.set_index("Neighbourhood"), on="Neighbourhood")

print(sandiego_merged.shape)
sandiego_merged.head() # check the last columns!

(127, 5)


Unnamed: 0,Neighbourhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,"Allied Gardens, San Diego",0.010638,2,32.79633,-117.09451
1,"Alta Vista, San Diego",0.0,0,32.69316,-117.06779
2,"Alvarado Estates, San Diego",0.011628,2,32.77723,-117.057256
3,"Azalea Park, San Diego",0.0,0,32.73284,-117.10776
4,"Bankers Hill, San Diego",0.0,0,32.72849,-117.16142


In [38]:
# sort the results by Cluster Labels
print(sandiego_merged.shape)
sandiego_merged.sort_values(["Cluster Labels"], inplace=True)
sandiego_merged

(127, 5)


Unnamed: 0,Neighbourhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
126,"Wooded Area, San Diego",0.000000,0,32.714520,-117.244550
73,"North City, San Diego",0.000000,0,33.045690,-117.278740
72,"Normal Heights, San Diego",0.000000,0,32.763140,-117.110050
71,"Nestor, San Diego",0.000000,0,32.576490,-117.088410
70,"Navajo, San Diego",0.000000,0,32.801818,-117.049937
...,...,...,...,...,...
86,"Point Loma Heights, San Diego",0.012821,2,32.733062,-117.249529
88,"Rancho Bernardo, San Diego",0.011765,2,33.024470,-117.085050
92,"Rolando Park, San Diego",0.010000,2,32.763420,-117.060240
54,"Loma Portal, San Diego",0.010101,2,32.740200,-117.238660


### Finally, let's visualize the resulting clusters

In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sandiego_merged['Latitude'], sandiego_merged['Longitude'], sandiego_merged['Neighbourhood'], sandiego_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [40]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### Examine Clusters

#### Cluster 0

In [41]:
sandiego_merged.loc[sandiego_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighbourhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
126,"Wooded Area, San Diego",0.0,0,32.714520,-117.244550
73,"North City, San Diego",0.0,0,33.045690,-117.278740
72,"Normal Heights, San Diego",0.0,0,32.763140,-117.110050
71,"Nestor, San Diego",0.0,0,32.576490,-117.088410
70,"Navajo, San Diego",0.0,0,32.801818,-117.049937
...,...,...,...,...,...
120,"Tri-City, San Diego County, California",0.0,0,32.785446,-117.079706
28,"Egger Highlands, San Diego",0.0,0,32.581310,-117.096990
27,"East Village, San Diego",0.0,0,32.711530,-117.149690
37,"Golden Hill, San Diego",0.0,0,32.715860,-117.131910


#### Cluster 1

In [42]:
sandiego_merged.loc[sandiego_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighbourhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
105,"Sorrento Valley, San Diego",0.055556,1,32.89198,-117.19538
63,"Miramar, San Diego",0.077778,1,32.89361,-117.1334


#### Cluster 2

In [43]:
sandiego_merged.loc[sandiego_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighbourhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
104,"Sorrento Mesa, San Diego",0.023256,2,32.727955,-117.251791
2,"Alvarado Estates, San Diego",0.011628,2,32.77723,-117.057256
16,"Clairemont, San Diego",0.011765,2,32.83457,-117.19463
25,Dryden Historic District (San Diego),0.01,2,32.75366,-117.19448
26,"East Elliott, San Diego",0.01,2,32.742984,-117.220945
29,"El Cerrito, San Diego",0.01,2,32.75467,-117.07291
30,El Pueblo Ribera,0.02,2,32.79574,-117.25351
34,Fairway Village,0.01,2,32.837568,-117.262388
39,"Grantville, San Diego",0.01,2,32.78747,-117.09773
42,"Hillcrest, San Diego",0.01,2,32.74996,-117.16511


## Observations

As the cluster map in the Results section shows, Indian restaurants are most concentrated in the
cluster 1 neighborhoods (Sorrento Valley and Miramar, where they account for roughly 6% and 8% of the
returned venues), with a moderate share of about 1% to 2% in cluster 2. Cluster 0, by contrast, has
few or no Indian restaurants among its returned venues. These neighborhoods represent an opportunity
for new Indian restaurants because they face little to no competition from existing ones.
Meanwhile, Indian restaurants in cluster 1 are likely to face intense competition because of the high
concentration there. The results also suggest that Indian restaurants are concentrated mostly in the
city's central area, while the suburbs still have very few. This project therefore recommends that
business owners open new Indian restaurants in cluster 0 neighborhoods, where there is little to no
competition. Owners with a unique selling proposition that sets them apart can also consider
cluster 2 neighborhoods, where competition is moderate. Finally, business owners are advised to avoid
cluster 1 neighborhoods, which already have a high concentration of Indian restaurants and intense
competition.
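
The comparison between clusters can be checked by summarising the `Indian Restaurant` frequency per cluster label. The sketch below uses plain Python and only a handful of rows copied from the cluster outputs above, so it illustrates the calculation and is not a summary of all 127 neighbourhoods.

```python
from collections import defaultdict
from statistics import mean

# (neighbourhood, Indian Restaurant frequency, cluster label), copied from the outputs above
rows = [
    ('Wooded Area, San Diego', 0.0, 0),
    ('North City, San Diego', 0.0, 0),
    ('Sorrento Valley, San Diego', 0.055556, 1),
    ('Miramar, San Diego', 0.077778, 1),
    ('Alvarado Estates, San Diego', 0.011628, 2),
    ('Clairemont, San Diego', 0.011765, 2),
]

# collect the frequencies belonging to each cluster
by_cluster = defaultdict(list)
for _, freq, cluster in rows:
    by_cluster[cluster].append(freq)

# cluster label -> (number of rows, mean frequency)
summary = {c: (len(v), round(mean(v), 6)) for c, v in sorted(by_cluster.items())}
print(summary)
```

On the full dataframe, the equivalent would be `sandiego_merged.groupby('Cluster Labels')['Indian Restaurant'].agg(['count', 'mean'])`.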