# Where should I go?

## Analysis of the state cities of Colima

### Our database is composed, in Excel format, of 16 columns and 564 rows. But this database shows us the real estate assets of the state of Colima in the year 2018.


In [11]:
# Download our excel data
import pandas as pd

#Initial Data
df = pd.read_csv('Postcode Colima.csv')
df.head(2) #Initial Data

Unnamed: 0,clave_catastral,FolioReal,Descripcion,Nombre,SuperficieTerrenoM2,SuperficieTerrenoHA,IdEstado,IdMunicipio,NombreAsentamientoHumano,NombreLocalidad,CodigoPostal,NumExterior,NumInterior,Latitud,Longitud,Georeferencia (poligono)
0,01-01-01-030-001-000,68653,SALA DE USOS MULTIPLES CIUDAD DE ARMERÍA,SALA DE USOS MULTIPLES,900.0,0.09,COLIMA,ARMERIA,SALA DE USOS MULTIPLES,CIUDAD DE ARMERÍA,28300,S/N,S/N,18.936781,-103.964283,"POLYGON ((-103.9643650779 18.9367822416,-103.9..."
1,01-01-92-090-456-000,96263,INSTALACIONES RADAR METEOROLOGICO DE CONAGUA C...,INSTALACIONES RADAR METEOROLOGICO DE CONAGUA,1456.3,0.14563,COLIMA,ARMERIA,INSTALACIONES RADAR METEOROLOGICO DE CONAGUA,CUYUTLÁN,28350,S/N,S/N,18.936226,-104.099957,"POLYGON ((-104.1002111171 18.9360290722,-104.1..."


### Given the origin of the information we had repeated values, because obviously the state government can have more than one real estate in the same city and with an address associated with the same postal code. 

### So, it was necessary removed all the columns and rows that did not contain relevant information for the analysis and only kept the columns that had the name of the city, municipality, postal code, latitude and longitude.

In [12]:
# Removed all the columns that did not contain relevant information for the analysis and only kept the columns that had the name of the city, municipality, postal code, latitude and longitude
df.drop(['clave_catastral','FolioReal','Descripcion','Nombre','SuperficieTerrenoM2','SuperficieTerrenoHA','IdEstado'
         ,'NombreAsentamientoHumano','NumExterior','NumInterior','Georeferencia (poligono)'] ,axis=1, inplace = True)

df.rename(columns={'IdMunicipio' : 'Municipality', 'NombreLocalidad': 'City', 'Latitud': 'Latitude',
                   'Longitud' : 'Longitude', 'CodigoPostal' : 'PostalCode'}, inplace = True)
df.head()

Unnamed: 0,Municipality,City,PostalCode,Latitude,Longitude
0,ARMERIA,CIUDAD DE ARMERÍA,28300,18.936781,-103.964283
1,ARMERIA,CUYUTLÁN,28350,18.936226,-104.099957
2,ARMERIA,LOS REYES (ZORRILLOS),28340,18.980788,-104.062704
3,COLIMA,COLIMA,28000,19.243405,-103.726111
4,COLIMA,PISCILA,28600,19.172035,-103.697902


### More than one city can exist in one postal code area. These rows will be combined into one row with the neighborhoods separated with a comma

### To solve this feature we will do the following:

#### We create a table with the account of each different postal code

In [15]:
#Number of rows of our dataframe
N = df.shape[0] 

#We have a table call 'postal' that tells us how many elements there are for each different postal code
postal = df['PostalCode'].value_counts().to_frame()
A = postal.shape[0]
B = list(range(0,A,1))

postal[''] = B
postal['Postcodename']=postal.index
postal.set_index('', inplace=True)

### In two empty lists, called 'Caja1' and 'Caja2', we put the elements of the table that meet the conditions that have the repetition of the postal code greater than 1 or only have a one postal code

In [16]:
# Lists that include postal codes that are repeated more than once and those that do not
Caja1 = []
Caja2 = []

for i in range(0,A):
    if postal.loc[i,'PostalCode'] > 1:
        Caja1.append(postal.loc[i,'Postcodename'])

for i in range(0,A):
    if postal.loc[i,'PostalCode'] == 1:
        Caja2.append(postal.loc[i,'Postcodename'])

### We transform those lists to tables (df_1, df_2)

In [17]:
#Now we have a table that returns the postal codes that appear more than once in the data        
df_1 = pd.DataFrame(Caja1)
df_1.rename(columns={0: 'Code'}, inplace = True)
J = df_1.shape[0]

#Now we have a table that returns the postal codes that appear only once in the data
df_2 = pd.DataFrame(Caja2)
df_2.rename(columns={0: 'Code'}, inplace = True)
P = df_2.shape[0]

### It is time to create the table that shows only the elements that have their postal code repeated

In [18]:
# We create the table called table1 which contains all the elements that have their postal code repeated at least twice
Vacio1 = []
tabla1 = pd.DataFrame(Vacio1)

for i in range(0,N):
    for j in range(0,J):
        if Caja1[j] == df.loc[i,'PostalCode']:
            tabla1 = tabla1.append(df.loc[i,].to_frame().transpose(), ignore_index=True)
        
W = tabla1.shape[0]
tabla1.head() 

Unnamed: 0,Municipality,City,PostalCode,Latitude,Longitude
0,COLIMA,PISCILA,28600,19.172,-103.698
1,COLIMA,LOS TEPAMES,28600,19.0939,-103.623


### It is time to create the table that shows only the elements that do NOT have their postal code repeated

In [19]:
# We create the table called table2 which contains all the elements that have their postal code unique
Vacio2 = []
tabla2 = pd.DataFrame(Vacio2)
for i in range(0,N):
    for j in range(0,P):
        if Caja2[j] == df.loc[i,'PostalCode']:
            tabla2 = tabla2.append(df.loc[i,].to_frame().transpose(), ignore_index=True)

K = tabla2.shape[0]
tabla2.head()

Unnamed: 0,Municipality,City,PostalCode,Latitude,Longitude
0,ARMERIA,CIUDAD DE ARMERÍA,28300,18.9368,-103.964
1,ARMERIA,CUYUTLÁN,28350,18.9362,-104.1
2,ARMERIA,LOS REYES (ZORRILLOS),28340,18.9808,-104.063
3,COLIMA,COLIMA,28000,19.2434,-103.726
4,COMALA,COMALA,28450,19.3298,-103.752


### Since we have both tables it will be necessary to concatenate the elements that have the same postal code

### For that, a table is created for each type of repetition. (Note that the maximum of repetitions  in this case are 2 times)

### And with the following iterations, the elements of the City columns are concatenated and added to a different table 

In [21]:
#We get the table without the repeated postal codes of the case in which they were repeated only 2 times
Vacio3 = []
C2 = tabla1.copy()
T2 = C2.shape[0]
Buena2 = pd.DataFrame(Vacio3)

for i in range(0,T2-1):
    if tabla1.loc[i,'PostalCode'] == tabla1.loc[i+1,'PostalCode']:
        C2.loc[i,'City'] = C2.loc[i,'City'] + ',' + ' ' + C2.loc[i+1,'City']
        Buena2 = Buena2.append(C2.loc[i,], ignore_index=True)

Buena2.head()

Unnamed: 0,City,Latitude,Longitude,Municipality,PostalCode
0,"PISCILA, LOS TEPAMES",19.172035,-103.697902,COLIMA,28600.0


### Finally, since we have all the concatenated and clean tables, it would only be necessary to join them to obtain the final table

In [22]:

Colima_df = tabla2.append(Buena2,ignore_index=True, sort =True)

#Fixing the last details in the final dataframe
Latitude = Colima_df['Latitude']
Longitude = Colima_df['Longitude']
cols = Colima_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
Colima_df = Colima_df[cols]
Colima_df.drop(['Latitude','Longitude'], axis=1, inplace =True)
Colima_df['Latitude'] = Latitude
Colima_df['Longitude'] = Longitude

Colima_df.head()

Unnamed: 0,PostalCode,City,Municipality,Latitude,Longitude
0,28300,CIUDAD DE ARMERÍA,ARMERIA,18.9368,-103.964
1,28350,CUYUTLÁN,ARMERIA,18.9362,-104.1
2,28340,LOS REYES (ZORRILLOS),ARMERIA,18.9808,-104.063
3,28000,COLIMA,COLIMA,19.2434,-103.726
4,28450,COMALA,COMALA,19.3298,-103.752


### Now we can start our clustering analysis to Colima State

In [23]:
# importing the necessary libraries
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge geopy --yes 
!conda install -c conda-forge folium=0.5.0 --yes 

import numpy as np 
import json 
import requests 
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium 

from sklearn.cluster import KMeans
from pandas.io.json import json_normalize 
from geopy.geocoders import Nominatim 

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         238 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.50-py_0        conda-forge
    geopy:         1.20.0-py_0      conda-forge

The following packages will be UPDATED:

    certifi:       2019.6.

In [24]:
CLIENT_ID = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # your Foursquare ID
CLIENT_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

### Let's explore our cities in our Colima dataframe and let's get the top 100 venues that are in our municipalities within a radius of 10 000 meters.

In [25]:
# we know that all the information is in the items key. Before we proceed, let's borrow the get_category_type
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

def getNearbyVenues(names, latitudes, longitudes):
    
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    radius = 10000 # define radius

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [27]:
Colima_venues = getNearbyVenues(names = Colima_df['City'],
                                   latitudes = Colima_df['Latitude'],
                                   longitudes = Colima_df['Longitude'])

CIUDAD DE ARMERÍA
CUYUTLÁN
LOS REYES (ZORRILLOS)
COLIMA
COMALA
COFRADÍA DE SUCHITLÁN
NOGUERAS
COQUIMATLÁN
CUAUHTÉMOC
BUENAVISTA
QUESERÍA
EL TRAPICHE
IXTLAHUACÁN
MANZANILLO
EL COLOMO
CHANDIABLO
TECOMÁN
MADRID
CIUDAD DE VILLA DE ÁLVAREZ
PISCILA, LOS TEPAMES


In [28]:
Colima_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,CIUDAD DE ARMERÍA,18.936781,-103.964283,Las Morenas,18.93673,-103.966937,Mexican Restaurant
1,CIUDAD DE ARMERÍA,18.936781,-103.964283,Playa El Paraíso,18.875615,-103.992122,Beach
2,CIUDAD DE ARMERÍA,18.936781,-103.964283,Restaurant Las Hamacas del Mayor,18.857309,-103.962682,Seafood Restaurant
3,CIUDAD DE ARMERÍA,18.936781,-103.964283,Los Carrizos,18.928527,-103.885801,Steakhouse
4,CIUDAD DE ARMERÍA,18.936781,-103.964283,El Charco de las Ranas,18.919262,-103.883821,Taco Place


### It's time to analyze each municipality and group rows by municipality and by taking the mean of the frequency of occurrence of each category

In [30]:
# one hot encoding
Colima_onehot = pd.get_dummies(Colima_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Colima_onehot['City'] = Colima_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Colima_onehot.columns[-1]] + list(Colima_onehot.columns[:-1])
Colima_onehot = Colima_onehot[fixed_columns]

In [32]:
Colima_grouped = Colima_onehot.groupby('City').mean().reset_index()
Colima_grouped.head()

Unnamed: 0,City,Wings Joint,Airport,American Restaurant,Aquarium,Argentinian Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boarding House,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Café,Candy Store,Casino,Cave,Cheese Shop,Cocktail Bar,Coffee Shop,Convenience Store,Creperie,Department Store,Diner,Electronics Store,Farm,Fast Food Restaurant,Flea Market,Food Truck,French Restaurant,Fried Chicken Joint,Garden,Garden Center,Gastropub,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Historic Site,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Juice Bar,Lake,Lighthouse,Lingerie Store,Liquor Store,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Movie Theater,Nightclub,Outdoor Sculpture,Paella Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Resort,Restaurant,River,Road,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sports Bar,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Swiss Restaurant,Taco Place,Theater,Toll Plaza,Vegetarian / Vegan Restaurant,Warehouse Store
0,BUENAVISTA,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.238095,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,CHANDIABLO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.172414,0.034483,0.034483,0.0,0.0,0.034483,0.0,0.0,0.068966,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.103448,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0
2,CIUDAD DE ARMERÍA,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.035714,0.035714,0.071429,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.0,0.0
3,CIUDAD DE VILLA DE ÁLVAREZ,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.11,0.01,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.05,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.1,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.13,0.01,0.0,0.0,0.0
4,COFRADÍA DE SUCHITLÁN,0.0,0.0,0.0,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.016129,0.032258,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.064516,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.016129,0.016129,0.0,0.016129,0.0,0.0,0.016129,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.016129,0.0,0.0,0.0,0.274194,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.016129,0.064516,0.032258,0.016129,0.048387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.016129,0.064516,0.0,0.0,0.0,0.0


### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['City'] = Colima_grouped['City']

for ind in np.arange(Colima_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Colima_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BUENAVISTA,Mexican Restaurant,Brewery,Convenience Store,Seafood Restaurant,Garden,Hotel,Farm,Lounge,Park,Plaza
1,CHANDIABLO,Beach,Resort,Mexican Restaurant,Breakfast Spot,Seafood Restaurant,Japanese Restaurant,Café,Burger Joint,Plaza,Liquor Store
2,CIUDAD DE ARMERÍA,Convenience Store,Taco Place,Beach,Seafood Restaurant,Hotel,Steakhouse,Mexican Restaurant,Restaurant,Sandwich Place,Sculpture Garden
3,CIUDAD DE VILLA DE ÁLVAREZ,Taco Place,Mexican Restaurant,Seafood Restaurant,Park,Pizza Place,Ice Cream Shop,Bar,Argentinian Restaurant,Hotel,Restaurant
4,COFRADÍA DE SUCHITLÁN,Mexican Restaurant,Pizza Place,Taco Place,Coffee Shop,Restaurant,Plaza,Convenience Store,Café,Garden Center,Lake
5,COLIMA,Taco Place,Mexican Restaurant,Seafood Restaurant,Park,Ice Cream Shop,Pizza Place,Argentinian Restaurant,Bar,Sandwich Place,BBQ Joint
6,COMALA,Mexican Restaurant,Taco Place,Pizza Place,Seafood Restaurant,Café,Fast Food Restaurant,Plaza,Restaurant,Italian Restaurant,Park
7,COQUIMATLÁN,Mexican Restaurant,Taco Place,Park,Convenience Store,Sandwich Place,Hotel,Bar,Pedestrian Plaza,Outdoor Sculpture,River
8,CUAUHTÉMOC,Mexican Restaurant,Steakhouse,Brewery,Plaza,Seafood Restaurant,Grocery Store,Garden,Hotel,Convenience Store,Lounge
9,CUYUTLÁN,Garden,Hotel,Seafood Restaurant,Lingerie Store,Surf Spot,Beach,Aquarium,Cheese Shop,Cave,Cocktail Bar


### With the neighborhoods_venues_sorted dataframe we can start our clustering analysis as it shows us the 10 most common venue in each neighborhood

### It's time to start making clusters

In [36]:
# set number of clusters
kclusters = 6

Colima_grouped_clustering = Colima_grouped.drop('City', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Colima_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Colima_merged = Colima_df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
Colima_merged = Colima_merged.join(neighborhoods_venues_sorted.set_index('City'), on='City')

Colima_merged.head()

Unnamed: 0,PostalCode,City,Municipality,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,28300,CIUDAD DE ARMERÍA,ARMERIA,18.9368,-103.964,5,Convenience Store,Taco Place,Beach,Seafood Restaurant,Hotel,Steakhouse,Mexican Restaurant,Restaurant,Sandwich Place,Sculpture Garden
1,28350,CUYUTLÁN,ARMERIA,18.9362,-104.1,1,Garden,Hotel,Seafood Restaurant,Lingerie Store,Surf Spot,Beach,Aquarium,Cheese Shop,Cave,Cocktail Bar
2,28340,LOS REYES (ZORRILLOS),ARMERIA,18.9808,-104.063,1,Surf Spot,Beach,Aquarium,Hotel,Seafood Restaurant,Lingerie Store,Warehouse Store,Casino,Cave,Cheese Shop
3,28000,COLIMA,COLIMA,19.2434,-103.726,2,Taco Place,Mexican Restaurant,Seafood Restaurant,Park,Ice Cream Shop,Pizza Place,Argentinian Restaurant,Bar,Sandwich Place,BBQ Joint
4,28450,COMALA,COMALA,19.3298,-103.752,2,Mexican Restaurant,Taco Place,Pizza Place,Seafood Restaurant,Café,Fast Food Restaurant,Plaza,Restaurant,Italian Restaurant,Park


### Since each City was assigned a Cluster Label we can create our map and thus see better what is really happening

In [37]:
# create map
address = 'Colima, Col'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10.1)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Colima_merged['Latitude'], Colima_merged['Longitude'], Colima_merged['City'], Colima_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### We can see that 6 clusters were created and the most numerous cluster is represented by the color blue, then the cluster represented by the color red and color orange, followed by purple and finally we have two individual clusters

### An early analysis would lead us to think that the cities near the coast (beach) would have more things in common and we may come to think that they should belong to the same cluster.It is clear with the map that this does not happen and in fact it has a very reasonable justification. 

### It is time to analyze each cluster one by one.

### Cluster 1: The first cluster represented by the color red has four cities, where all located in the upper part of the state. Among the most famous places stand out the Mexican restaurants, Convenience Stores,Plazas, Breweries and Parks.

In [47]:
Colima_merged.loc[Colima_merged['Cluster Labels'] == 0, Colima_merged.columns[[2] + list(range(5, Colima_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,COFRADÍA DE SUCHITLÁN,0,Mexican Restaurant,Pizza Place,Taco Place,Coffee Shop,Restaurant,Plaza,Bar,Convenience Store,Café,Lake
8,CUAUHTÉMOC,0,Mexican Restaurant,Steakhouse,Brewery,Plaza,Airport,Seafood Restaurant,Grocery Store,Garden,Hotel,Convenience Store
9,BUENAVISTA,0,Mexican Restaurant,Convenience Store,Brewery,Airport,Seafood Restaurant,Garden,Hotel,Farm,Lounge,Park
10,QUESERÍA,0,Mexican Restaurant,Steakhouse,Plaza,Department Store,Convenience Store,Seafood Restaurant,Road,BBQ Joint,Restaurant,Taco Place


### Cluster 2: The second cluster represented by the color purple has two cities, both located at the bottom of the state. Among the most famous places stand out Surf Spots, Hotels, Beaches and Seafood Restaurants.

In [48]:
Colima_merged.loc[Colima_merged['Cluster Labels'] == 1, Colima_merged.columns[[2] + list(range(5, Colima_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,CUYUTLÁN,1,Surf Spot,Hotel,Beach,Seafood Restaurant,Garden,Lingerie Store,Aquarium,City,Cheese Shop,Diner
2,LOS REYES (ZORRILLOS),1,Surf Spot,Hotel,Beach,Seafood Restaurant,Lingerie Store,Aquarium,Cheese Shop,City,Cocktail Bar,Electronics Store


### Cluster 3: The third cluster represented by the color blue has 7 cities, where all are located in the center and upper part of the state. Among the most famous places stand out Mexican food restaurant, Taco Places,  Parks and Seafood Restaurants.

In [49]:
Colima_merged.loc[Colima_merged['Cluster Labels'] == 2, Colima_merged.columns[[2] + list(range(5, Colima_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,COLIMA,2,Mexican Restaurant,Taco Place,Seafood Restaurant,Park,Ice Cream Shop,Pizza Place,Argentinian Restaurant,Sandwich Place,Bar,Fast Food Restaurant
4,COMALA,2,Mexican Restaurant,Taco Place,Pizza Place,Seafood Restaurant,Café,Fast Food Restaurant,Park,Plaza,Restaurant,Italian Restaurant
6,NOGUERAS,2,Taco Place,Mexican Restaurant,Seafood Restaurant,Pizza Place,Fast Food Restaurant,Park,Ice Cream Shop,Argentinian Restaurant,Café,Brewery
7,COQUIMATLÁN,2,Mexican Restaurant,Taco Place,Park,Convenience Store,Bar,Sandwich Place,Hotel,Theater,Burger Joint,Outdoor Sculpture
11,EL TRAPICHE,2,Mexican Restaurant,Seafood Restaurant,Taco Place,Park,Restaurant,Pizza Place,Ice Cream Shop,Argentinian Restaurant,Brewery,Steakhouse
18,CIUDAD DE VILLA DE ÁLVAREZ,2,Taco Place,Mexican Restaurant,Seafood Restaurant,Park,Pizza Place,Ice Cream Shop,Bar,Argentinian Restaurant,Gastropub,Fast Food Restaurant
19,"PISCILA, LOS TEPAMES",2,Mexican Restaurant,Park,Taco Place,Seafood Restaurant,Sandwich Place,Hotel,Pizza Place,Bar,Coffee Shop,Fast Food Restaurant


### Cluster 4: The fourth cluster represented by the Light Blue color has only one city. Among the most famous places in the city stand out rivers, Mexican food restaurants, IT Services and Casinos.

In [50]:
Colima_merged.loc[Colima_merged['Cluster Labels'] == 3, Colima_merged.columns[[2] + list(range(5, Colima_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,MADRID,3,River,Taco Place,IT Services,Mexican Restaurant,Seafood Restaurant,Basketball Court,Wings Joint,Department Store,Casino,Cave


### Cluster 5: The fifth cluster represented by the color green has only one city. Among the most famous places in the city stand out caves, parks, Mexican food restaurants and  Wings Joints.

In [51]:
Colima_merged.loc[Colima_merged['Cluster Labels'] == 4, Colima_merged.columns[[2] + list(range(5, Colima_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,IXTLAHUACÁN,4,Cave,Park,City,Mexican Restaurant,Wings Joint,Diner,Casino,Cheese Shop,Cocktail Bar,Coffee Shop


### Cluster 6 : The sixth and last cluster represented by the color orange has five cities. Among the most famous places stand out Beaches,  Seafood Restaurants, Hotels and Mexican food restaurants.

In [52]:
Colima_merged.loc[Colima_merged['Cluster Labels'] == 5, Colima_merged.columns[[2] + list(range(5, Colima_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CIUDAD DE ARMERÍA,5,Convenience Store,Taco Place,Seafood Restaurant,Beach,Steakhouse,Hotel,Mexican Restaurant,Soup Place,Restaurant,Sandwich Place
13,MANZANILLO,5,Beach,Seafood Restaurant,Mexican Restaurant,Taco Place,Resort,Restaurant,Hotel,Bistro,Pizza Place,Breakfast Spot
14,EL COLOMO,5,Taco Place,Seafood Restaurant,Mexican Restaurant,Beach,Restaurant,Hotel,Pizza Place,Bistro,Resort,Fast Food Restaurant
15,CHANDIABLO,5,Beach,Resort,Mexican Restaurant,Breakfast Spot,Seafood Restaurant,Golf Course,Japanese Restaurant,Café,Burger Joint,Convenience Store
16,TECOMÁN,5,Convenience Store,Seafood Restaurant,Restaurant,Hotel,Steakhouse,Ice Cream Shop,Sculpture Garden,Burger Joint,Soup Place,Plaza


### After this small individual analysis, we can separate the cities of Colima into two large groups. Those that are tourist cities and those that are not so much, since, if we analyze the most popular places we will realize that in clusters 1,3, 4 and 5 the most popular places are in general parks, convenience stores, caves, rivers and restaurants of Mexican food.

### On the other hand in clusters two and six the most popular places are Hotels, Beaches, Seafood Restaurants and Surf Spots.

### Trying to answer the initial question, we can now infer that the cities that we are going to recommend to open a diving business must belong to the second or sixth cluster; since, they are those that present a greater tourist activity in comparison with the other clusters. But to answer the question more accurately we must be able to choose in which cluster (2 or 6) it is more convenient to open a diving business and why?

### In order to answer this question we must analyze both clusters a little more thoroughly. 

### From the position in which most of the cities are in the sixth cluster, it can be said that these cities are tourist but do not have tourism related to aquatic activities and the reason is essentially the type of cities they are. they are cities like manzanillo, with a lot of history and related to trade and not necessarily to tourism for pleasure, but rather business tourism.

### Therefore, it is not uncommon that although they have coasts ,they are tourist centers oriented equally to cultural or business tourism and not only to “aquatic” tourism.

### Finally, if we analyze in depth the cluster two, we can see that both cities are very close to the coast and for both cities their most popular places are all related to aquatic and beach tourism.


### We can now recommend with certainty, to our client, that the group of cities that he must have in mind, to open his dive business, must belong to cluster number two, since the development of the local economy matches the type of deal.