# Segmenting and Clustering Neighborhoods in Santo Domingo, Dominican Republic

### Junior Peña

#### Explore neighborhoods in Santo Domingo.

In [71]:
!pip install lxml
import lxml
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


##### We are going to import a local csv file containing the following columns: Postal Code, Borough and Neighborhood.

In [2]:
df = pd.read_csv('SD_Neighborhoods.csv',  encoding='latin-1')
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,10101,Nuevo Distrito Nacional,Centro de Los Heroes
1,10102,Nuevo Distrito Nacional,Mata Hambre
2,10103,Nuevo Distrito Nacional,Zona Universitaria
3,10104,Nuevo Distrito Nacional,San Geronimo
4,10105,Nuevo Distrito Nacional,Ciudad Universitaria


##### Now, we are going to check if there are null values in the columns.

In [3]:
df.isnull().sum()

Postal Code     0
Borough         0
Neighborhood    0
dtype: int64

##### Let us now check how many rows and columns the csv file has.

In [4]:
df.shape

(546, 3)

##### More than one neighborhood can share a postal code. When that happens, we combine them into one row, with the neighborhood names separated by commas.

In [5]:
# Combine all neighborhoods that share the same postal code and borough
df = df.groupby(['Postal Code', 'Borough'])['Neighborhood'].agg(', '.join).reset_index()
df.head(100)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,10101,Nuevo Distrito Nacional,Centro de Los Heroes
1,10102,Nuevo Distrito Nacional,Mata Hambre
2,10103,Nuevo Distrito Nacional,Zona Universitaria
3,10104,Nuevo Distrito Nacional,San Geronimo
4,10105,Nuevo Distrito Nacional,Ciudad Universitaria
5,10106,Nuevo Distrito Nacional,Los Robles
6,10107,Nuevo Distrito Nacional,"Esperilla, El Vergel"
7,10108,Nuevo Distrito Nacional,"Esperilla, El Manguito"
8,10109,Nuevo Distrito Nacional,La Julia
9,10110,Nuevo Distrito Nacional,El Embajador
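
The combining step can be checked on a tiny frame before trusting it on the full file. This is a minimal sketch with invented rows (they are not taken from SD_Neighborhoods.csv); it only illustrates that rows sharing a postal code and borough collapse into one comma-separated row.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    'Postal Code': [1, 1, 2],
    'Borough': ['A', 'A', 'B'],
    'Neighborhood': ['North', 'South', 'East'],
})

# One row per (Postal Code, Borough); names joined with ', '.
combined = (
    toy.groupby(['Postal Code', 'Borough'])['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```

Comparing `combined.shape[0]` with the number of distinct postal codes is a quick sanity check that no group was lost.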


##### Now, we need the latitude and longitude coordinates of each postal code area in order to use the Foursquare location data. First, we are going to load the csv file with the coordinates.

In [6]:
postalcodes = pd.read_csv('SD_PostalCodes.csv',  encoding='latin-1')
postalcodes.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,10101,18.44938,-69.926
1,10102,18.4552,-69.92749
2,10103,18.46051,-69.91781
3,10104,18.46982,-69.96397
4,10105,18.46114,-69.91707


##### Now, we are going to create a new dataframe with the neighborhoods and coordinates.

In [7]:
import io

santodomingo = df.join(postalcodes.set_index('Postal Code'), on='Postal Code')
santodomingo.head(20)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,10101,Nuevo Distrito Nacional,Centro de Los Heroes,18.44938,-69.926
1,10102,Nuevo Distrito Nacional,Mata Hambre,18.4552,-69.92749
2,10103,Nuevo Distrito Nacional,Zona Universitaria,18.46051,-69.91781
3,10104,Nuevo Distrito Nacional,San Geronimo,18.46982,-69.96397
4,10105,Nuevo Distrito Nacional,Ciudad Universitaria,18.46114,-69.91707
5,10106,Nuevo Distrito Nacional,Los Robles,18.4781,-69.93069
6,10107,Nuevo Distrito Nacional,"Esperilla, El Vergel",18.48061,-69.98381
7,10108,Nuevo Distrito Nacional,"Esperilla, El Manguito",18.48061,-69.98381
8,10109,Nuevo Distrito Nacional,La Julia,18.46149,-69.92675
9,10110,Nuevo Distrito Nacional,El Embajador,18.45683,-69.93451
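
A join like this silently leaves NaN coordinates for any postal code missing from the coordinates file. The sketch below shows one possible way to surface such rows with `merge` and its `indicator` flag. The first two rows reuse values shown above; the 99999 row is a hypothetical postal code added only to trigger a mismatch.

```python
import pandas as pd

left = pd.DataFrame({
    'Postal Code': [10101, 10102, 99999],  # 99999 is hypothetical
    'Neighborhood': ['Centro de Los Heroes', 'Mata Hambre', 'Hypothetical'],
})
right = pd.DataFrame({
    'Postal Code': [10101, 10102],
    'Latitude': [18.44938, 18.4552],
    'Longitude': [-69.926, -69.92749],
})

# validate raises if a postal code appears twice in the coordinates table;
# indicator adds a _merge column that tells us where each row was found.
merged = left.merge(right, on='Postal Code', how='left',
                    validate='many_to_one', indicator=True)
missing = merged[merged['_merge'] == 'left_only']
print(missing[['Postal Code', 'Neighborhood']])
```

Rows listed in `missing` would need coordinates from another source before they can be sent to Foursquare.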


In [8]:
print('The Santo Domingo dataframe has {} boroughs and {} neighborhoods.'.format(
len(santodomingo['Borough'].unique()),
santodomingo.shape[0]))

The Santo Domingo dataframe has 4 boroughs and 207 neighborhoods.


#### Cluster the neighborhoods in Santo Domingo.

##### Now, we are going to use the geopy library to get the latitude and longitude values of Santo Domingo. In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent to_explorer, as shown below.

In [9]:
address = 'Santo Domingo, DR'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Santo Domingo are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Santo Domingo are 18.4801972, -69.942111.


##### Let's create a map of Santo Domingo with neighborhoods superimposed on top.

In [10]:
# create map of Santo Domingo using latitude and longitude values
map_santodomingo = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(santodomingo['Latitude'], santodomingo['Longitude'], santodomingo['Borough'], santodomingo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_santodomingo)  
    
map_santodomingo

##### Define Foursquare Credentials and Version

In [11]:
CLIENT_ID = 'TK4EDFVI5PLNCRV0TTWQDBEQUK2DD4S0OPRLEOUKZW05YJGU' # your Foursquare ID
CLIENT_SECRET = 'KTR0PVEOO3QW1AMNY45T4B5MTCDAZWRKKNU01D3UEYDMOCT4' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: TK4EDFVI5PLNCRV0TTWQDBEQUK2DD4S0OPRLEOUKZW05YJGU
CLIENT_SECRET:KTR0PVEOO3QW1AMNY45T4B5MTCDAZWRKKNU01D3UEYDMOCT4
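
One way to keep the ID and secret out of the notebook is to read them from environment variables and print only a masked form. This is a sketch under assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of any Foursquare tooling.

```python
import os

def load_foursquare_credentials(env=None):
    """Read the ID and secret from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demo with placeholder values passed in explicitly instead of real credentials.
demo_env = {'FOURSQUARE_CLIENT_ID': 'ABCDEFGH', 'FOURSQUARE_CLIENT_SECRET': 'IJKLMNOP'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
```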


##### Let's explore the first neighborhood in our dataframe.

In [12]:
santodomingo.loc[0, 'Neighborhood']

'Centro de Los Heroes'

##### Get the neighborhood's latitude and longitude values.

In [13]:
neighborhood_latitude = santodomingo.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = santodomingo.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = santodomingo.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Centro de Los Heroes are 18.449379999999998, -69.926.


##### Now, let's get up to 50 venues within a radius of 1000 meters of Centro de Los Heroes. First, we need to create the GET request URL, which we store in a variable named url.

In [14]:
LIMIT = 50 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=TK4EDFVI5PLNCRV0TTWQDBEQUK2DD4S0OPRLEOUKZW05YJGU&client_secret=KTR0PVEOO3QW1AMNY45T4B5MTCDAZWRKKNU01D3UEYDMOCT4&v=20180605&ll=18.449379999999998,-69.926&radius=1000&limit=50'
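
The same URL can also be assembled from a dictionary of query parameters, which handles escaping and keeps each parameter on its own line. A small sketch using only the standard library; `build_explore_url` is a hypothetical helper name, and `'ID'` and `'SECRET'` are placeholders rather than real credentials.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from keyword-style parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url('ID', 'SECRET', '20180605', 18.44938, -69.926, 1000, 50)

# Parse the query string back to confirm the parameters survived encoding.
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['radius'], query['limit'])
```

`urlencode` percent-encodes the comma in `ll`; parsing the query string back shows the original value is preserved.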

##### Send the GET request and examine the results.

In [15]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f0276c9d4c9e93750ca5059'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Mata Hambre',
  'headerFullLocation': 'Mata Hambre, Santo Domingo',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 47,
  'suggestedBounds': {'ne': {'lat': 18.458380009000006,
    'lng': -69.91653007518448},
   'sw': {'lat': 18.44037999099999, 'lng': -69.93546992481552}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c071c6e8b4520a1df898697',
       'name': 'Bella Italia',
       'location': {'address': 'José Contreras 165',
        'crossStreet': 'Av. Jiménez Moya',
        'lat': 18.456010629452443,
        'lng': -69.92958814616736,
        'labele

##### Let's borrow the get_category_type function from the Foursquare lab.

In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened frames use the prefixed column name instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

##### Now we are ready to clean the json and structure it into a pandas dataframe.

In [17]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Bella Italia,Italian Restaurant,18.456011,-69.929588
1,VIP Room,Nightclub,18.454348,-69.923552
2,Salt & Pepper,BBQ Joint,18.443298,-69.930927
3,Meson de la Cava,Spanish Restaurant,18.453167,-69.93318
4,Albert's licores,Wine Bar,18.45507,-69.930221


In [18]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

47 venues were returned by Foursquare.


##### Let's create a function to repeat the same process for all the neighborhoods in Santo Domingo.

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
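
The function above indexes `categories[0]` and `groups[0]` directly, so a venue without a category or a response without groups would raise an exception partway through the loop. The sketch below shows one possible defensive way to turn a single response into rows. `extract_venue_rows` is a hypothetical helper, and the second venue in the sample payload is invented to exercise the empty-category case.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Return one tuple per venue; tolerate missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Sample payload: the first venue mirrors the response shown earlier,
# the second is hypothetical and has no category.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Bella Italia',
               'location': {'lat': 18.456010629452443, 'lng': -69.92958814616736},
               'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'name': 'Hypothetical Kiosk',
               'location': {'lat': 18.45, 'lng': -69.93},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Centro de Los Heroes', 18.44938, -69.926, sample)
print(rows)
```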

##### Now we run the above function on each neighborhood and create a new dataframe called santodomingo_venues.

In [21]:
santodomingo_venues = getNearbyVenues(names=santodomingo['Neighborhood'],
                                   latitudes=santodomingo['Latitude'],
                                   longitudes=santodomingo['Longitude']
                                  )

Centro de Los Heroes
Mata Hambre
Zona Universitaria
San Geronimo
Ciudad Universitaria
Los Robles
Esperilla, El Vergel
Esperilla, El Manguito
La Julia
El Embajador
Bella Vista
Bella Vista
Sarasota
Mirador Norte
Rocamar, El Portal, Atala
Ciudad Gandera, 30 de Mayo
Hoduras, Ensanche La Paz
Los Arrecifes, El Cacique, Costa Brava, Antillas
Naco
La Yuca
Arboleda (naco)
Naco, Centro Olimpico
Naco
Naco
Lopez de Vega
Paraiso
Piantini
Carmelita
Urbanizacion Fernandez
Los Praditos, Julieta Morales
Los Prados
Los Prados
Castellana
San Geronimo
Urbanizacion Las Praderas, Encarnacion, Ciudad Moderna
Residencial Rosmil
Los Restauradores
Manganagua
Los Millones
Los Millones
Los Millones
Los Millones
Estela Marina (los Millones)
Los Millones
Ensanche Quisqueya
Ensanche Quisqueya
Evaristo Morales
Piantini
El Millon
Piantini
Don Bosco
Don Bosco
Miraflores
Gazcue (plaza de La Cultura)
Gazcue
Ensanche Independencia
La Primavera
Ciudad Nueva
Ensanche Lugo
Zona Colonial
San Lazaro
San Miguel
Borojol
Villa Fr

##### Let's check the size of the resulting dataframe

In [22]:
print(santodomingo_venues.shape)
santodomingo_venues.head()

(6440, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro de Los Heroes,18.44938,-69.926,Bella Italia,18.456011,-69.929588,Italian Restaurant
1,Centro de Los Heroes,18.44938,-69.926,VIP Room,18.454348,-69.923552,Nightclub
2,Centro de Los Heroes,18.44938,-69.926,Salt & Pepper,18.443298,-69.930927,BBQ Joint
3,Centro de Los Heroes,18.44938,-69.926,Meson de la Cava,18.453167,-69.93318,Spanish Restaurant
4,Centro de Los Heroes,18.44938,-69.926,Albert's licores,18.45507,-69.930221,Wine Bar


##### Let's check how many venues were returned for each neighborhood.

In [23]:
santodomingo_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"12 de Haina, Residencial Canaan, Villas del Cafe",16,16,16,16,16,16
"16 de Agosto, Aesa, Atlantida, Barrio Invi",50,50,50,50,50,50
"2 de Enero, Altos de Sabana Perdida, Cerros de Sabana Perdida, Colinas del Ozama, Ensanche Cristal, La Barquita, Sabana Centro, Sabana Perdida",4,4,4,4,4,4
24 de Abril,50,50,50,50,50,50
"24 de Abril, Barrio Las Mercedes, El Chucho, Savica, Urbanizacion Las Mercedes, Varia",50,50,50,50,50,50
"Alameda, Barrio Antillano, Batey Bienvenido, Bella Colina, Buenas Noches, Hato Nuevo, La Venta, Manoguayabo, Residencial Alameda, Residencial Almendra, San Miguel, Villa Peravia",9,9,9,9,9,9
"Alma Rosa II, El Rosal, Ivette, Urbanizacion Italia",33,33,33,33,33,33
Altos de Arroyo Hondo II,50,50,50,50,50,50
"Altos del Oeste, El Catorce, La Cienaga, La Concordia, Villas Naco",50,50,50,50,50,50
"Altos del Parque, Barrio Hermanas Mirabal, Barrio Nuevo, Jacagua, Las Palmeras, Los Casabes, Ponce, Proyecto Bnv",24,24,24,24,24,24


##### Let's find out how many unique categories can be curated from all the returned venues.

In [24]:
print('There are {} unique categories.'.format(len(santodomingo_venues['Venue Category'].unique())))

There are 232 unique categories.


#### Analyze Each Neighborhood

In [25]:
# one hot encoding
santodomingo_onehot = pd.get_dummies(santodomingo_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
santodomingo_onehot['Neighborhood'] = santodomingo_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [santodomingo_onehot.columns[-1]] + list(santodomingo_onehot.columns[:-1])
santodomingo_onehot = santodomingo_onehot[fixed_columns]

santodomingo_onehot.head()

Unnamed: 0,Zoo,Accessories Store,Airport,American Restaurant,Aquarium,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Bridal Shop,Buffet,Burger Joint,Bus Station,Business Service,Butcher,Cable Car,Cafeteria,Café,Car Wash,Caribbean Restaurant,Casino,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Cafeteria,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Event Space,Fabric Shop,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Travel,German Restaurant,Gift Shop,Go Kart Track,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,High School,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Light Rail Station,Liquor Store,Lounge,Market,Massage Studio,Medical Lab,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mobility Store,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,Nightclub,Non-Profit,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Outlet Store,Paella Restaurant,Paintball Field,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Post Office,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Resort,Restaurant,River,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soccer Field,Social Club,South American Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toy / Game Store,Vegetarian / Vegan Restaurant,Veterinarian,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Centro de Los Heroes,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Centro de Los Heroes,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Centro de Los Heroes,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Centro de Los Heroes,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Centro de Los Heroes,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0


##### And let's examine the new dataframe size.

In [26]:
santodomingo_onehot.shape

(6440, 232)
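
In the one-hot table, Zoo is the first column and Neighborhood sits in the middle, and the shape has 232 columns for 232 categories. This suggests that a venue category literally named "Neighborhood" exists and was overwritten when the neighborhood names were assigned to the same column name. The sketch below shows one possible way to avoid that collision on invented toy rows; the replacement column name `Neighborhood (category)` is an arbitrary choice.

```python
import pandas as pd

# Toy venues, invented for illustration; one category is literally 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['Gazcue', 'Gazcue', 'Naco'],
    'Venue Category': ['Zoo', 'Neighborhood', 'Bakery'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)

# Rename the colliding category before attaching the neighborhood names.
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# insert() places the names in the first position without touching other columns.
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(list(onehot.columns))

# Mean of the 0/1 columns per neighborhood gives the category frequencies.
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)
```

With the rename in place, the frame has one column per category plus the name column, so none of the category counts is lost.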

##### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [27]:
santodomingo_grouped = santodomingo_onehot.groupby('Neighborhood').mean().reset_index()
santodomingo_grouped

Unnamed: 0,Neighborhood,Zoo,Accessories Store,Airport,American Restaurant,Aquarium,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Brewery,Bridal Shop,Buffet,Burger Joint,Bus Station,Business Service,Butcher,Cable Car,Cafeteria,Café,Car Wash,Caribbean Restaurant,Casino,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Cafeteria,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Event Space,Fabric Shop,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Travel,German Restaurant,Gift Shop,Go Kart Track,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,High School,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Light Rail Station,Liquor Store,Lounge,Market,Massage Studio,Medical Lab,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mobility Store,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nightclub,Non-Profit,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoors & Recreation,Outlet Store,Paella Restaurant,Paintball Field,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pier,Pizza Place,Playground,Plaza,Pool,Pool Hall,Post Office,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Resort,Restaurant,River,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soccer Field,Social Club,South American Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toy / Game Store,Vegetarian / Vegan Restaurant,Veterinarian,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"12 de Haina, Residencial Canaan, Villas del Cafe",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"16 de Agosto, Aesa, Atlantida, Barrio Invi",0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.02
2,"2 de Enero, Altos de Sabana Perdida, Cerros de...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,24 de Abril,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.08,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0
4,"24 de Abril, Barrio Las Mercedes, El Chucho, S...",0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.02
5,"Alameda, Barrio Antillano, Batey Bienvenido, B...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Alma Rosa II, El Rosal, Ivette, Urbanizacion I...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.121212,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.121212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.060606,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Altos de Arroyo Hondo II,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.08,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0
8,"Altos del Oeste, El Catorce, La Cienaga, La Co...",0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.02
9,"Altos del Parque, Barrio Hermanas Mirabal, Bar...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### Let's confirm the new size.

In [28]:
santodomingo_grouped.shape

(160, 232)

##### Let's print each neighborhood along with the top 5 most common venues.

In [29]:
num_top_venues = 5

for hood in santodomingo_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = santodomingo_grouped[santodomingo_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----12 de Haina, Residencial Canaan, Villas del Cafe----
                venue  freq
0          Restaurant  0.12
1                Park  0.12
2               Motel  0.12
3  Italian Restaurant  0.06
4                Bank  0.06


----16 de Agosto, Aesa, Atlantida, Barrio Invi----
              venue  freq
0            Bakery  0.08
1     Shopping Mall  0.04
2              Café  0.04
3  Department Store  0.04
4     Jewelry Store  0.02


----2 de Enero, Altos de Sabana Perdida, Cerros de Sabana Perdida, Colinas del Ozama, Ensanche Cristal, La Barquita, Sabana Centro, Sabana Perdida----
                venue  freq
0  Salon / Barbershop  0.25
1   Food & Drink Shop  0.25
2          Hookah Bar  0.25
3           Cable Car  0.25
4     Paintball Field  0.00


----24 de Abril----
                      venue  freq
0  Mediterranean Restaurant  0.08
1        Italian Restaurant  0.06
2                Restaurant  0.06
3                Food Truck  0.04
4               Pizza Place  0.04


----24 de Abril, 

##### Let's put that into a pandas dataframe. First, let's write a function to sort the venues in descending order.

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
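
A quick way to see what this function returns is to call it on a small made-up row. The sketch below repeats the function so it runs on its own; the neighborhood name and frequencies are illustrative only and are not taken from the Santo Domingo data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), then sort frequencies high to low
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# toy row with made-up values (hypothetical, for illustration only)
toy_row = pd.Series({'Neighborhood': 'Toy Hood', 'Bakery': 0.5, 'Park': 0.2, 'Bank': 0.3})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

The function returns the column labels (venue categories), not the frequencies, which is why the result can be written straight into the "Most Common Venue" columns.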

##### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [58]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past 'rd' fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = santodomingo_grouped['Neighborhood']

for ind in np.arange(santodomingo_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(santodomingo_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"12 de Haina, Residencial Canaan, Villas del Cafe",Motel,Park,Restaurant,Pharmacy,Italian Restaurant,Shoe Store,Department Store,Bank,Toll Booth,Pier
1,"16 de Agosto, Aesa, Atlantida, Barrio Invi",Bakery,Shopping Mall,Department Store,Café,Yoga Studio,Pharmacy,Cocktail Bar,Peruvian Restaurant,Pet Store,Pizza Place
2,"2 de Enero, Altos de Sabana Perdida, Cerros de...",Food & Drink Shop,Hookah Bar,Cable Car,Salon / Barbershop,Yoga Studio,Event Space,Food,Flower Shop,Flea Market,Film Studio
3,24 de Abril,Mediterranean Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Bakery,Supermarket,Sushi Restaurant,Food Truck,Pizza Place,Wine Shop
4,"24 de Abril, Barrio Las Mercedes, El Chucho, S...",Bakery,Shopping Mall,Department Store,Café,Yoga Studio,Pharmacy,Cocktail Bar,Peruvian Restaurant,Pet Store,Pizza Place


#### Cluster Neighborhoods

##### Run k-means to cluster the neighborhoods into 5 clusters.

In [59]:
# set number of clusters
kclusters = 5

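# keep only the numeric frequency columns (recent pandas versions no longer accept a positional axis in drop)
santodomingo_grouped_clustering = santodomingo_grouped.drop(columns='Neighborhood')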

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(santodomingo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0], dtype=int32)
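
Because k-means can produce clusters of very different sizes, it may help to count how many rows fall under each label before interpreting them. This is a minimal sketch that uses only the ten labels printed above; in the notebook itself, `kmeans.labels_` would presumably be passed instead.

```python
import numpy as np

# the first ten labels printed above; in the notebook, use kmeans.labels_ instead
labels = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0])
kclusters = 5

# counts[i] is the number of rows assigned to cluster label i
counts = np.bincount(labels, minlength=kclusters)
print(counts)
```

Labels that end up with only one or two rows are worth noting, since a cluster that small says little about a general pattern.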

##### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [60]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

santodomingo_merged = santodomingo

# join the cluster labels and top venues onto santodomingo, which holds latitude/longitude for each neighborhood
santodomingo_merged = santodomingo_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

santodomingo_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10101,Nuevo Distrito Nacional,Centro de Los Heroes,18.44938,-69.926,0.0,Bank,Casino,Ice Cream Shop,Food Truck,Beer Garden,Italian Restaurant,Fast Food Restaurant,Hotel,Sandwich Place,Beer Store
1,10102,Nuevo Distrito Nacional,Mata Hambre,18.4552,-69.92749,0.0,Spanish Restaurant,Ice Cream Shop,Italian Restaurant,Restaurant,Bank,Argentinian Restaurant,Food Truck,Pizza Place,Empanada Restaurant,Latin American Restaurant
2,10103,Nuevo Distrito Nacional,Zona Universitaria,18.46051,-69.91781,0.0,Fast Food Restaurant,Pizza Place,Ice Cream Shop,Italian Restaurant,Empanada Restaurant,Nightclub,Coffee Shop,Steakhouse,Food Court,Mediterranean Restaurant
3,10104,Nuevo Distrito Nacional,San Geronimo,18.46982,-69.96397,0.0,Ice Cream Shop,Pizza Place,Park,Gym,Bakery,Fast Food Restaurant,Supermarket,Sandwich Place,Italian Restaurant,Empanada Restaurant
4,10105,Nuevo Distrito Nacional,Ciudad Universitaria,18.46114,-69.91707,0.0,Fast Food Restaurant,Pizza Place,Ice Cream Shop,Steakhouse,Italian Restaurant,Department Store,Coffee Shop,Hotel,Indian Restaurant,Food Court


In [61]:
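# drop neighborhoods without a cluster label; .copy() avoids a SettingWithCopyWarning in the next cell
santodomingo_merged = santodomingo_merged[~santodomingo_merged['Cluster Labels'].isnull()].copy()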

In [62]:
santodomingo_merged['Cluster Labels'] = santodomingo_merged['Cluster Labels'].astype(int)
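
The two cells above are needed because the join leaves `NaN` for any neighborhood that has no matching row in the venue table, which also turns the label column into floats (hence the `0.0` values in the output). A small sketch with toy frames shows the effect; `hoods` and `labeled` are hypothetical stand-ins for `santodomingo` and `neighborhoods_venues_sorted`.

```python
import pandas as pd

# toy stand-ins (hypothetical) for `santodomingo` and `neighborhoods_venues_sorted`
hoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
labeled = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Cluster Labels': [0, 1]})

# same join pattern as in the notebook
merged = hoods.join(labeled.set_index('Neighborhood'), on='Neighborhood')

# 'B' has no match, so its label is NaN and the whole column becomes float
print(merged['Cluster Labels'].tolist())

# drop the unmatched rows, then restore integer labels
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned['Cluster Labels'].tolist())
```

Integer labels matter later because they are used to index the color list when drawing the map.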

##### Finally, let's visualize the resulting clusters.

In [63]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(santodomingo_merged['Latitude'], santodomingo_merged['Longitude'], santodomingo_merged['Neighborhood'], santodomingo_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examine Clusters

##### Now, we can examine each cluster and determine the venue categories that distinguish it.

##### Cluster 1

In [64]:
santodomingo_merged.loc[santodomingo_merged['Cluster Labels'] == 0, santodomingo_merged.columns[[1] + list(range(5, santodomingo_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Nuevo Distrito Nacional,0,Bank,Casino,Ice Cream Shop,Food Truck,Beer Garden,Italian Restaurant,Fast Food Restaurant,Hotel,Sandwich Place,Beer Store
1,Nuevo Distrito Nacional,0,Spanish Restaurant,Ice Cream Shop,Italian Restaurant,Restaurant,Bank,Argentinian Restaurant,Food Truck,Pizza Place,Empanada Restaurant,Latin American Restaurant
2,Nuevo Distrito Nacional,0,Fast Food Restaurant,Pizza Place,Ice Cream Shop,Italian Restaurant,Empanada Restaurant,Nightclub,Coffee Shop,Steakhouse,Food Court,Mediterranean Restaurant
3,Nuevo Distrito Nacional,0,Ice Cream Shop,Pizza Place,Park,Gym,Bakery,Fast Food Restaurant,Supermarket,Sandwich Place,Italian Restaurant,Empanada Restaurant
4,Nuevo Distrito Nacional,0,Fast Food Restaurant,Pizza Place,Ice Cream Shop,Steakhouse,Italian Restaurant,Department Store,Coffee Shop,Hotel,Indian Restaurant,Food Court
8,Nuevo Distrito Nacional,0,Bakery,Gym / Fitness Center,Ice Cream Shop,Italian Restaurant,Spanish Restaurant,Fast Food Restaurant,Supermarket,Restaurant,Furniture / Home Store,Gift Shop
9,Nuevo Distrito Nacional,0,Pizza Place,Gym,Spanish Restaurant,Ice Cream Shop,Argentinian Restaurant,Food Truck,Italian Restaurant,Gym / Fitness Center,Coffee Shop,American Restaurant
10,Nuevo Distrito Nacional,0,Pizza Place,Gym,American Restaurant,Bakery,Spanish Restaurant,Argentinian Restaurant,Ice Cream Shop,Coffee Shop,Post Office,Pie Shop
11,Nuevo Distrito Nacional,0,Pizza Place,Gym,American Restaurant,Bakery,Spanish Restaurant,Argentinian Restaurant,Ice Cream Shop,Coffee Shop,Post Office,Pie Shop
12,Nuevo Distrito Nacional,0,Sandwich Place,Supermarket,Pizza Place,Ice Cream Shop,American Restaurant,Gym,Restaurant,Pie Shop,Food Court,Food & Drink Shop


##### Cluster 2

In [65]:
santodomingo_merged.loc[santodomingo_merged['Cluster Labels'] == 1, santodomingo_merged.columns[[1] + list(range(5, santodomingo_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Nuevo Distrito Nacional,1,Mediterranean Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Bakery,Supermarket,Sushi Restaurant,Food Truck,Pizza Place,Wine Shop
6,Nuevo Distrito Nacional,1,Park,BBQ Joint,Hardware Store,Shopping Mall,Nightclub,Bank,Gym,Snack Place,Diner,Dim Sum Restaurant
7,Nuevo Distrito Nacional,1,Park,BBQ Joint,Hardware Store,Shopping Mall,Nightclub,Bank,Gym,Snack Place,Diner,Dim Sum Restaurant
14,Nuevo Distrito Nacional,1,Mediterranean Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Bakery,Supermarket,Sushi Restaurant,Food Truck,Pizza Place,Wine Shop
15,Nuevo Distrito Nacional,1,Mediterranean Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Bakery,Supermarket,Sushi Restaurant,Food Truck,Pizza Place,Wine Shop
17,Nuevo Distrito Nacional,1,Mediterranean Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Bakery,Supermarket,Sushi Restaurant,Food Truck,Pizza Place,Wine Shop
24,Nuevo Distrito Nacional,1,Mediterranean Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Bakery,Supermarket,Sushi Restaurant,Food Truck,Pizza Place,Wine Shop
25,Nuevo Distrito Nacional,1,Shopping Mall,Gym,Coffee Shop,Café,Restaurant,Mediterranean Restaurant,Bistro,Furniture / Home Store,Bakery,Clothing Store
26,Nuevo Distrito Nacional,1,Bakery,Mediterranean Restaurant,Bistro,Fast Food Restaurant,Wine Shop,Hotel,Italian Restaurant,Salon / Barbershop,Ice Cream Shop,Gym
28,Nuevo Distrito Nacional,1,Department Store,Pie Shop,Gym / Fitness Center,Bakery,Dance Studio,Wings Joint,Souvenir Shop,Shopping Mall,Middle Eastern Restaurant,Café


##### Cluster 3

In [66]:
santodomingo_merged.loc[santodomingo_merged['Cluster Labels'] == 2, santodomingo_merged.columns[[1] + list(range(5, santodomingo_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
94,Nuevo Distrito Nacional,2,River,Yoga Studio,Empanada Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Film Studio,Fast Food Restaurant


##### Cluster 4

In [67]:
santodomingo_merged.loc[santodomingo_merged['Cluster Labels'] == 3, santodomingo_merged.columns[[1] + list(range(5, santodomingo_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
77,Nuevo Distrito Nacional,3,Baseball Field,Pharmacy,Metro Station,Big Box Store,Gym / Fitness Center,Coffee Shop,Auto Garage,Sports Bar,Fast Food Restaurant,Falafel Restaurant
81,Nuevo Distrito Nacional,3,Big Box Store,Ice Cream Shop,Coffee Shop,Pharmacy,Gym,Food Truck,Deli / Bodega,Food Court,Cupcake Shop,Food & Drink Shop
82,Nuevo Distrito Nacional,3,Baseball Field,Pharmacy,Metro Station,Big Box Store,Gym / Fitness Center,Coffee Shop,Auto Garage,Sports Bar,Fast Food Restaurant,Falafel Restaurant
83,Nuevo Distrito Nacional,3,Baseball Field,Pharmacy,BBQ Joint,Coffee Shop,Big Box Store,Farmers Market,Park,Food Truck,Dim Sum Restaurant,Diner
84,Nuevo Distrito Nacional,3,Bank,Metro Station,Farmers Market,Park,Food Truck,Yoga Studio,Fabric Shop,Food Court,Food & Drink Shop,Food
85,Nuevo Distrito Nacional,3,Bank,BBQ Joint,Food Truck,Baseball Field,Farmers Market,Latin American Restaurant,Clothing Store,Pharmacy,Flea Market,Film Studio
86,Nuevo Distrito Nacional,3,Bank,BBQ Joint,Food Truck,Baseball Field,Farmers Market,Latin American Restaurant,Clothing Store,Pharmacy,Flea Market,Film Studio
108,Nuevo Distrito Nacional,3,Bank,BBQ Joint,Food Truck,Baseball Field,Farmers Market,Latin American Restaurant,Clothing Store,Pharmacy,Flea Market,Film Studio
110,Nuevo Distrito Nacional,3,Pharmacy,Go Kart Track,Gym,Ice Cream Shop,Restaurant,Photography Studio,Italian Restaurant,Shopping Mall,Department Store,Fried Chicken Joint
121,Santo Domingo Oeste,3,Women's Store,Pharmacy,Basketball Court,Bus Station,Fabric Shop,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market


##### Cluster 5

In [68]:
santodomingo_merged.loc[santodomingo_merged['Cluster Labels'] == 4, santodomingo_merged.columns[[1] + list(range(5, santodomingo_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
148,Santo Domingo Norte,4,Go Kart Track,Yoga Studio,Event Space,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Film Studio,Fast Food Restaurant


#### Results

The following were the results obtained in each of the five clusters created:

•	First Cluster. The most common venues are food businesses: restaurants of various kinds (Spanish, Fast Food, Vegetarian and Caribbean), Ice Cream Shops, Pizza Places, Food Trucks, and BBQ and Burger Joints, along with Metro Stations. This cluster essentially represents the food market.

•	Second Cluster. The most common businesses in this cluster are places where people gather or that offer a particular cuisine, such as Mediterranean and Chinese Restaurants, Bakeries, Bars, Cafés, Nightclubs, Sports Clubs and Lounges.

•	Third Cluster. This cluster contains a single neighborhood, whose most common venue is River.

•	Fourth Cluster. The most common venues in this cluster relate to health, money transactions and large gatherings, such as Pharmacies, Banks and Baseball Fields.

•	Fifth Cluster. This cluster contains a single neighborhood, whose most common venue is Go Kart Track.
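
Summaries like these can also be backed by a simple count. The sketch below is one possible approach, not the method used above: it takes the first-most-common venues from the rows displayed earlier for cluster labels 0 and 3 (the first and fourth clusters; the displayed rows may not be the complete clusters) and finds the most frequent one per cluster.

```python
import pandas as pd

# first-most-common venues copied from the rows displayed above for labels 0 and 3
# (the displayed rows may not cover the full clusters)
sample = pd.DataFrame({
    'Cluster Labels': [0] * 10 + [3] * 10,
    '1st Most Common Venue': [
        'Bank', 'Spanish Restaurant', 'Fast Food Restaurant', 'Ice Cream Shop',
        'Fast Food Restaurant', 'Bakery', 'Pizza Place', 'Pizza Place',
        'Pizza Place', 'Sandwich Place',
        'Baseball Field', 'Big Box Store', 'Baseball Field', 'Baseball Field',
        'Bank', 'Bank', 'Bank', 'Bank', 'Pharmacy', "Women's Store",
    ],
})

# most frequent top venue within each cluster
top_venue = (
    sample.groupby('Cluster Labels')['1st Most Common Venue']
          .agg(lambda s: s.value_counts().idxmax())
)
print(top_venue)
```

On the full data, the same `groupby` could be run on `santodomingo_merged` directly.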


#### Discussion

Josh is trying to determine the best place to open a bakery in Santo Domingo. From the k-means clustering, we can see that the second cluster contains the neighborhoods where bakeries are most common. Interestingly, these bakeries are not in the borough of Nuevo Distrito Nacional (the most populous and most developed borough) but in Santo Domingo Oeste (one of the least developed boroughs in Santo Domingo).

This suggests that people in the neighborhoods of this borough tend to buy much more from bakeries than people in the neighborhoods of other boroughs. If Josh wants to maximize his chances of success, the neighborhoods in Santo Domingo Oeste are the most promising.
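
A claim of this kind can be checked by counting, per borough and cluster, the neighborhoods whose most common venue is Bakery. The sketch below shows the idea on a hypothetical stand-in for `santodomingo_merged`; the borough names and rows are placeholders, not results from this study.

```python
import pandas as pd

# hypothetical stand-in for santodomingo_merged; values are illustrative only
toy = pd.DataFrame({
    'Borough': ['Borough X', 'Borough X', 'Borough Y', 'Borough Y', 'Borough Y'],
    'Cluster Labels': [1, 0, 1, 1, 1],
    '1st Most Common Venue': ['Bakery', 'Bank', 'Bakery', 'Bakery', 'Park'],
})

# keep neighborhoods whose top venue is Bakery, then count by borough and cluster
is_bakery = toy['1st Most Common Venue'] == 'Bakery'
bakery_counts = toy[is_bakery].groupby(['Borough', 'Cluster Labels']).size()
print(bakery_counts)
```

Running the same filter on the real merged dataframe would show how the bakery-heavy neighborhoods are actually distributed across boroughs.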

#### Conclusion

In this study, I analyzed the most common venues of all the neighborhoods in the city of Santo Domingo, Dominican Republic, to determine the best place to open a bakery. After preparing the data, I used k-means clustering to divide the city into five groups and determined which one had bakeries among its most common venues. The same approach could be applied to the data of any city or country to identify specific characteristics and answer similar questions.