# Finding alternative cities for emigrating from Venezuela

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

#### **Background**
A socioeconomic and political crisis that began in Venezuela during the presidency of Hugo Chávez has continued into the presidency of Nicolás Maduro (ongoing as of 2020). It is marked by hyperinflation and by escalating starvation, disease, crime and mortality rates, resulting in massive emigration from the country. Over the past two decades, many Venezuelans (around 4-6 million) have fled the country in search of a better life than the precarious one the Venezuelan regime offers. Such rapid growth in the number of emigrants, however, has caused a general sense of crowding in certain countries (such as Colombia and Peru).


#### **Problem**
The steep rise in the cost of living, insecurity and low wages are pushing Venezuelans to seek alternative places to live. The question for those who leave the country is **how to even get started browsing prospective places to move to**, as there are 22 Spanish-speaking countries alone
(Mexico, Colombia, Spain, Argentina, Peru, **Venezuela**, Chile, Guatemala, Ecuador, Cuba, Bolivia, Dominican Republic, Honduras, El Salvador, Paraguay, Nicaragua, Costa Rica, Puerto Rico, Panama, Uruguay, Equatorial Guinea and Belize).

To answer this question, we'll start with the assumption that potential Venezuelan emigrants are still interested in living in a Spanish-speaking country and want to find alternative cities with amenities similar to those of their current one (in this case, Maracaibo). Given this scope, we can sample the biggest cities in the Spanish-speaking countries to create a kind of "fingerprint" of popular venues (such as certain types of restaurants, stores and natural areas) for each city, and then use these fingerprints to identify similarities between cities. The findings of this exercise could then be used as a recommendation guide for further, in-person demographic research.

#### **Audience**
The primary audience of this study might include Venezuelan emigrants as well as other Latin Americans planning to leave their country. The findings could also be used by Latin American entrepreneurs looking to open new businesses, or even as a way of fostering outreach and partnerships among Spanish-speaking municipal chambers of commerce.

## Data <a name="data"></a>

#### **Sources**

To obtain a list of the biggest cities in all Spanish-speaking countries, we'll scrape the Wikipedia page of each country. Then we'll use the Foursquare venue recommendation API to obtain a list of the most popular venues for each city, and query location data (latitude/longitude) using the Nominatim geocoder in order to map all the cities and visualize the clusters.

#### Tools

We'll use the following Python libraries as commented below.

In [None]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests 
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans # k-means for the clustering stage
# !conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library
print('Libraries imported.')

Libraries imported.


#### Preparation

First, obtain a list of cities by scraping the Wikipedia page of each country. The lists of largest cities on those pages are structured as tables, so we can use Pandas to read in the HTML tables and convert them to dataframes. For each country we'll set up a new dataframe to store the location data of its cities.

In [None]:
# Get a list of cities of Argentina and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Argentina")
argentina = tables[1][:-1]
argentina.columns = ['1','2','City','4','5','6','City1','8','9','10']
argentina = argentina[['City']]
argentina['Country']='Argentina'
argentina['Latitude']=''
argentina['Longitude']=''

In [None]:
# Get a list of cities of Mexico and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Mexico")
mexico = tables[4][:-1]
mexico.columns = ['1','2','City','4','5','6','City1','8','9','10']
mexico = mexico[['City']]
mexico['Country']='Mexico'
mexico['Latitude']=''
mexico['Longitude']=''

In [None]:
# Get a list of cities of Colombia and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Colombia")
colombia = tables[4][:-1]
colombia.columns = ['1','2','City','4','5','6','City1','8','9','10']
colombia = colombia[['City']]
colombia['Country']='Colombia'
colombia['Latitude']=''
colombia['Longitude']=''

In [None]:
# Get a list of cities of Spain and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Spain")
spain = tables[3][:-1]
spain.columns = ['1','2','City','4','5','6','City1','8','9','10']
spain = spain[['City']]
spain['Country']='Spain'
spain['Latitude']=''
spain['Longitude']=''

In [None]:
# Get a list of cities of Peru and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Peru")
peru = tables[2][:-1]
peru.columns = ['1','2','City','4','5','6','City1','8','9','10']
peru = peru[['City']]
peru['Country']='Peru'
peru['Latitude']=''
peru['Longitude']=''

In [None]:
# Get a list of cities of Chile and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Chile")
chile = tables[3][:-1]
chile.columns = ['1','2','City','4','5','6','City1','8','9','10']
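chile = chile[['City']].copy()
# Overwrite the scraped names for these rows (label-based, avoids chained assignment)
chile.loc[1, 'City'] = "Valparaíso"
chile.loc[2, 'City'] = "Concepción"
chile.loc[3, 'City'] = "La Serena"
chile.loc[5, 'City'] = "Temuco"
chile.loc[6, 'City'] = "Rancagua"
chile.loc[9, 'City'] = "Chillán"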
chile['Country']='Chile'
chile['Latitude']=''
chile['Longitude']=''

In [None]:
# Get a list of cities of Guatemala and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Guatemala")
guatemala = tables[8][:-1]
guatemala.columns = ['1','2','City','4','5','6','City1','8','9','10']
guatemala = guatemala[['City']]
guatemala['Country']='Guatemala'
guatemala['Latitude']=''
guatemala['Longitude']=''

In [None]:
# Get a list of cities of Ecuador and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Ecuador")
ecuador = tables[5][:-1]
ecuador.columns = ['1','2','City','4','5','6','City1','8','9','10']
ecuador = ecuador[['City']]
ecuador['Country']='Ecuador'
ecuador['Latitude']=''
ecuador['Longitude']=''

In [None]:
# Get a list of cities of Cuba and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Cuba")
cuba = tables[5][:-1]
cuba.columns = ['1','2','City','4','5','6','City1','8','9','10']
cuba = cuba[['City']]
cuba['Country']='Cuba'
cuba['Latitude']=''
cuba['Longitude']=''

In [None]:
# Get a list of cities of Bolivia and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Bolivia")
bolivia = tables[6][:-1]
bolivia.columns = ['1','2','City','4','5','6','City1','8','9','10']
bolivia = bolivia[['City']]
bolivia['Country']='Bolivia'
bolivia['Latitude']=''
bolivia['Longitude']=''

In [None]:
# Get a list of cities of the Dominican Republic and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Dominican_Republic")
dominican_republic = tables[3][:-1]
dominican_republic.columns = ['1','2','City','4','5','6','City1','8','9','10']
dominican_republic = dominican_republic[['City']]
dominican_republic['Country']='Dominican Republic'
dominican_republic['Latitude']=''
dominican_republic['Longitude']=''

In [None]:
# Get a list of cities of Honduras and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Honduras")
honduras = tables[2][:-1]
honduras.columns = ['1','2','City','4','5','6','City1','8','9','10']
honduras = honduras[['City']]
honduras['Country']='Honduras'
honduras['Latitude']=''
honduras['Longitude']=''

In [None]:
# Get a list of cities of El Salvador and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/El_salvador")
el_salvador = tables[6][:-1]
el_salvador.columns = ['1','2','City','4','5','6','City1','8','9','10']
el_salvador = el_salvador[['City']]
el_salvador['Country']='El Salvador'
el_salvador['Latitude']=''
el_salvador['Longitude']=''

In [None]:
# Get a list of cities of Paraguay and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Paraguay")
paraguay = tables[7][:-1]
paraguay.columns = ['1','2','City','4','5','6','City1','8','9','10']
paraguay = paraguay[['City']]
paraguay['Country']='Paraguay'
paraguay['Latitude']=''
paraguay['Longitude']=''

In [None]:
# Get a list of cities of Nicaragua and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Nicaragua")
nicaragua = tables[6][:-1]
nicaragua.columns = ['1','2','City','4','5','6','City1','8','9','10']
nicaragua = nicaragua[['City']]
nicaragua['Country']='Nicaragua'
nicaragua['Latitude']=''
nicaragua['Longitude']=''

In [None]:
# Get a list of cities of Costa Rica and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Costa_rica")
costa_rica = tables[3][:-1]
costa_rica.columns = ['1','2','City','4','5','6','City1','8','9','10']
costa_rica = costa_rica[['City']]
costa_rica['Country']='Costa Rica'
costa_rica['Latitude']=''
costa_rica['Longitude']=''

In [None]:
# Get a list of cities of Puerto Rico and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Puerto_rico")
puerto_rico = tables[11][:-1]
puerto_rico.columns = ['1','2','City','4','5','6','City1','8','9','10']
puerto_rico = puerto_rico[['City']]
puerto_rico['Country']='Puerto Rico'
puerto_rico['Latitude']=''
puerto_rico['Longitude']=''

In [None]:
# Get a list of cities of Panama and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Panama")
panama = tables[5][:-1]
panama.columns = ['1','2','City','4','5','6','City1','8','9','10']
panama = panama[['City']]
panama['Country']='Panama'
panama['Latitude']=''
panama['Longitude']=''

In [None]:
# Get a list of cities of Uruguay and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Uruguay")
uruguay = tables[3][:-1]
uruguay.columns = ['1','2','City','4','5','6','City1','8','9','10']
uruguay = uruguay[['City']]
uruguay['Country']='Uruguay'
uruguay['Latitude']=''
uruguay['Longitude']=''

In [None]:
# Get a list of cities of Equatorial Guinea and prepare a dataframe
tables = pd.read_html("https://es.wikipedia.org/wiki/Guinea_Ecuatorial")
equatorial_guinea = tables[6][:-2][1:]
equatorial_guinea.columns = ['1','2','City','4','5','6','7','City1','9','10','11']
equatorial_guinea = equatorial_guinea[['City']]
equatorial_guinea['Country']='Equatorial Guinea'
equatorial_guinea['Latitude']=''
equatorial_guinea['Longitude']=''

In [None]:
# Get a list of cities of Belize and prepare a dataframe
tables = pd.read_html("https://en.wikipedia.org/wiki/Belize")
belize = tables[5][:-1]
belize.columns = ['1','2','City','4','5','6','City1','8','9','10']
belize = belize[['City']]
belize['Country']='Belize'
belize['Latitude']=''
belize['Longitude']=''

In [None]:
# Get a list of cities of Venezuela and prepare a dataframe
d = {'City': ['Caracas']}
venezuela = pd.DataFrame(data=d)
venezuela['Country']='Venezuela'
venezuela['Latitude']=''
venezuela['Longitude']=''
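
The per-country cells above all follow the same pattern: pick a table, drop its last row, keep the city column and add empty location columns. As a minimal sketch of how that pattern could be factored out, the hypothetical helper `prepare_city_frame` below takes one table from `pd.read_html()` and a country name. It assumes the city names sit in the third column, as in most of the cells above; the table index still has to be chosen per page, and the demo table is made-up stand-in data rather than a real scrape.

```python
import pandas as pd

def prepare_city_frame(table, country, city_col=2):
    """Build a City/Country/Latitude/Longitude frame from one scraped table.

    `table` is one element of the list returned by pd.read_html();
    `city_col` is the position of the city-name column (assumed to be the
    third column, as in the cells above).
    """
    cities = table.iloc[:-1, city_col]  # drop the trailing footer row
    frame = pd.DataFrame({'City': cities.to_numpy()})
    frame['Country'] = country
    frame['Latitude'] = ''
    frame['Longitude'] = ''
    return frame

# Stand-in for a scraped table: three data rows plus a footer row
demo_table = pd.DataFrame(
    [[1, 'x', 'Caracas'], [2, 'x', 'Maracaibo'], [3, 'x', 'Valencia'], ['v', 't', 'e']]
)
demo = prepare_city_frame(demo_table, 'Venezuela')
print(demo)
```

With a helper like this, each country cell would reduce to a single call such as `prepare_city_frame(tables[4], 'Mexico')`, with special cases (Chile's renamed rows, Equatorial Guinea's different layout) still handled separately.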

Consolidate all the dataframes into a single one.

In [None]:
# Stack the per-country dataframes in one call (DataFrame.append was removed in pandas 2.0)
country_frames = [mexico, argentina, colombia, spain, peru, chile, guatemala,
                  ecuador, cuba, bolivia, dominican_republic, honduras,
                  el_salvador, paraguay, nicaragua, costa_rica, puerto_rico,
                  panama, uruguay, equatorial_guinea, belize, venezuela]
df_spanish = pd.concat(country_frames)

df_spanish.reset_index(inplace=True,drop=True)
print(df_spanish.columns)
df_spanish.shape

Index(['City', 'Country', 'Latitude', 'Longitude'], dtype='object')


(211, 4)

Next, use the geolocator to look up the location (in terms of longitude and latitude) of each city.

In [None]:
geolocator = Nominatim(user_agent="ny_explorer")
for index, row in df_spanish.iterrows():
    address = '{}, {}'.format(row['City'], row['Country'])
    try:
        location = geolocator.geocode(address)
        df_spanish.at[index, 'Latitude'] = location.latitude
        df_spanish.at[index, 'Longitude'] = location.longitude
    except Exception:
        # geocode() returned None or the request failed: mark the city as missing
        df_spanish.at[index, 'Latitude'] = np.nan
        df_spanish.at[index, 'Longitude'] = np.nan
# Drop cities that could not be geocoded and renumber the remaining rows
df_spanish.dropna(subset=["Latitude"], inplace=True)
df_spanish.reset_index(drop=True, inplace=True)
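
The public Nominatim service asks clients to send at most one request per second, and the loop above issues its requests back to back. The sketch below shows one way to space the calls out and to treat failed lookups uniformly. `geocode_all`, `FakeLocation` and `fake_geocode` are hypothetical names introduced for illustration; in the notebook, `geolocator.geocode` would be passed in place of the stand-in.

```python
import time

def geocode_all(addresses, geocode, delay_seconds=1.0):
    """Look up each address with `geocode`, pausing between calls.

    `geocode` is any callable that returns an object with .latitude and
    .longitude (or None). Returns a list of (latitude, longitude) tuples,
    with (None, None) for addresses that could not be resolved.
    """
    coords = []
    for i, address in enumerate(addresses):
        if i > 0:
            time.sleep(delay_seconds)  # space out consecutive requests
        try:
            location = geocode(address)
        except Exception:  # network error, timeout, etc.
            location = None
        if location is None:
            coords.append((None, None))
        else:
            coords.append((location.latitude, location.longitude))
    return coords

# Offline demonstration with a stand-in geocoder (no network access)
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

def fake_geocode(address):
    known = {'Caracas, Venezuela': FakeLocation(10.5, -66.9)}
    return known.get(address)

coords = geocode_all(['Caracas, Venezuela', 'Nowhere, Atlantis'],
                     fake_geocode, delay_seconds=0)
print(coords)
```

Passing the geocoder in as a callable keeps the throttling logic testable without touching the network.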

Check the master dataframe.

In [None]:
df_spanish

| | City | Country | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Mexico City | Mexico | 19.4326 | -99.1332 |
| 1 | Ecatepec | Mexico | 19.5868 | -99.0341 |
| 2 | Guadalajara | Mexico | 20.672 | -103.338 |
| 3 | Puebla | Mexico | 18.8333 | -98.0 |
| 4 | Juárez | Mexico | 19.2935 | -100.465 |
| 5 | Tijuana | Mexico | 32.501 | -116.965 |
| 6 | León | Mexico | 21.1219 | -101.683 |
| 7 | Monterrey | Mexico | 25.6802 | -100.315 |
| 8 | Zapopan | Mexico | 20.7032 | -103.426 |
| 9 | Nezahualcóyotl | Mexico | 19.4021 | -99.017 |


Map out the cities to ensure location data looks correct.

In [None]:
# create a world map centered on latitude 0, longitude 0
world_map = folium.Map(location=[0, 0], zoom_start=2.5, tiles='OpenStreetMap')

# add markers to map
for lat, lng, name in zip(df_spanish['Latitude'], df_spanish['Longitude'], df_spanish['City']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#f7ad52',
        fill=True,
        fill_color='#fcc786',
        fill_opacity=0.7,
        parse_html=False).add_to(world_map)  
    
world_map

Now we're ready to query the Foursquare API for the top venues of each city.

In [None]:
from datetime import datetime
# To run the remainder of this notebook yourself, obtain Foursquare developer credentials and add them here
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = datetime.today().strftime('%Y%m%d') # Foursquare API version
# VERSION = '20180605' # Foursquare API version
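
Rather than pasting credentials into the notebook, they can be read from the environment so they never end up in a shared copy. The following is a minimal sketch of that idea; the function name and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this example, not names required by Foursquare.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of variables.

    The variable names used here are hypothetical; pick whatever names
    suit your own setup.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret

# Demonstration with an in-memory mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id',
            'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook this could be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, which fails early with a clear message when the variables are missing.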

Let's explore the first city in the dataframe to make sure everything is working correctly.

In [None]:
city_latitude = df_spanish.loc[0, 'Latitude'] # City latitude value
city_longitude = df_spanish.loc[0, 'Longitude'] # City longitude value

city_name = df_spanish.loc[0, 'City'] # Name
city_state = df_spanish.loc[0, 'Country'] # Country

print('Latitude and longitude values of {}, {} are: {}, {}.'.format(city_name,
                                                               city_state,
                                                               city_latitude, 
                                                               city_longitude))

Latitude and longitude values of Mexico City, Mexico are: 19.4326296, -99.1331785.


In [None]:
# Get the top 100 venues within the default city radius
LIMIT = 100 # limit of number of venues returned by Foursquare API

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    city_latitude, 
    city_longitude, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ef2ccd5564b600a12855ff3'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b058700f964a520827a22e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/religious_church_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d132941735',
         'name': 'Church',
         'pluralName': 'Churches',
         'primary': True,
         'shortName': 'Church'}],
       'id': '4b058700f964a520827a22e3',
       'location': {'address': 'Plaza de la Constitución S/N',
        'cc': 'MX',
        'city': 'Cuauhtemoc',
        'country': 'México',
        'distance': 99,
        'formattedAddress': ['Plaza de la Constitución S/N',
         '06000 Cuauhtémoc, Distrito Federal',
         'México'],
        'lat': 19.433526472529614,
  

Looks good so far. Let's define a function to extract the category of a given venue.

In [None]:
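def get_category_type(row):
    # The column is named 'categories' or 'venue.categories' depending on the source
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # Return the name of the first (primary) category, or None if there is none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']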

Next we'll structure the returned venue data into a dataframe and filter based on category.

In [None]:
# Clean the data and structure it as a dataframe
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Catedral Metropolitana de la Asunción de María | Church | 19.433526 | -99.133204 |
| 1 | Gran Hotel Ciudad de México | Hotel | 19.432137 | -99.134468 |
| 2 | Murales de Diego Rivera en la Secretaría de Ed... | Art Museum | 19.432621 | -99.131642 |
| 3 | Plaza de la Constitución (Zócalo) | Plaza | 19.432745 | -99.133658 |
| 4 | Mercaderes | Restaurant | 19.433979 | -99.134825 |


Looks good. Now let's set up a function to do this across all the cities in the Spanish-speaking countries.

In [None]:
# Create a function to repeat the same process to all cities
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue,
        # skipping venues that have no category assigned
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
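
The parsing step in the function above relies on every response containing at least one group. A more defensive parser might look like the sketch below. `extract_venue_rows` and the sample payload are hypothetical, and the field names are the ones visible in the response printed earlier; this is an illustration under those assumptions, not a description of the full Foursquare response format.

```python
def extract_venue_rows(payload, city, lat, lng):
    """Turn one explore-style response dict into a list of row tuples.

    Returns an empty list when the response has no groups, and skips
    venues whose category list is empty.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        if not categories:
            continue  # no category means nothing to profile the city with
        rows.append((city, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Made-up payload that mimics the nesting of the real response
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Venue A',
                   'location': {'lat': 19.43, 'lng': -99.13},
                   'categories': [{'name': 'Church'}]}},
        {'venue': {'name': 'Venue B',
                   'location': {'lat': 19.44, 'lng': -99.14},
                   'categories': []}},
    ]}]}
}
rows = extract_venue_rows(sample_payload, 'Mexico City', 19.4326, -99.1332)
empty = extract_venue_rows({'response': {}}, 'Nowhere', 0.0, 0.0)
print(rows)
```

Separating parsing from the HTTP call also makes it possible to check the parsing logic against saved responses without spending API quota.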

And run it on the full list of cities in the Spanish-speaking countries.

In [None]:
# Run the function on each city and store in new dataframe
# CAUTION: You only get 950 Foursquare API calls per day with a "Sandbox Tier" (free) account
spanish_venues = getNearbyVenues(names=df_spanish['City'],
                                   latitudes=df_spanish['Latitude'],
                                   longitudes=df_spanish['Longitude'])

Mexico City
Ecatepec
Guadalajara
Puebla
Juárez
Tijuana
León
Monterrey
Zapopan
Nezahualcóyotl
Buenos Aires
Córdoba
Rosario
Mendoza
San Miguel de Tucumán
La Plata
Mar del Plata
Salta
Santa Fe
San Juan
Bogotá
Medellín
Cali
Barranquilla
Cartagena
Cúcuta
Soacha
Soledad
Bucaramanga
Bello
Madrid
Barcelona
Valencia
Seville
Zaragoza
Málaga
Murcia
Palma
Las Palmas
Bilbao
Lima
Arequipa
Trujillo
Chiclayo
Huancayo
Iquitos
Piura
Cusco
Chimbote
Tacna
Santiago Metropolis
Valparaíso
Concepción
La Serena
Antofagasta
Temuco
Rancagua
Talca
Arica
Chillán
Guatemala City
Mixco
Villa Nueva
Cobán
Quetzaltenango
Jalapa
Escuintla
San Juan Sacatepéquez
Jutiapa
Petapa
Quito
Guayaquil
Cuenca
Santo Domingo
Ambato
Portoviejo
Durán
Machala
Loja
Manta
Havana
Santiago de Cuba
Camagüey
Holguín
Santa Clara
Guantánamo
Victoria de Las Tunas
Bayamo
Cienfuegos
Pinar del Río
Santa Cruz de la Sierra
El Alto
La Paz
Cochabamba
Oruro
Sucre
Tarija
Potosí
Sacaba
Quillacollo
Santo Domingo
Santiago
La Vega
San Cristóbal
San Pedro de M

In [None]:
# Cache the results in case we need to reload the dataframe (for debugging purposes)
spanish_venues.to_csv('SpanishVenues.csv', sep=',',index=False)

In [None]:
spanish_venues = pd.read_csv('SpanishVenues.csv')
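
Because the free tier limits the number of API calls per day, it can help to make the cache automatic: load the CSV when it exists and only query the API otherwise. The sketch below illustrates the idea. `load_or_fetch` and `fake_fetch` are hypothetical names, and the demo writes to a temporary directory rather than to `SpanishVenues.csv`.

```python
import os
import tempfile

import pandas as pd

def load_or_fetch(path, fetch):
    """Return the DataFrame cached at `path`, or call `fetch()` and cache it."""
    if os.path.exists(path):
        return pd.read_csv(path)
    frame = fetch()
    frame.to_csv(path, index=False)
    return frame

# Stand-in for the expensive API call; records how often it is invoked
calls = []

def fake_fetch():
    calls.append(1)
    return pd.DataFrame({'City': ['Caracas'], 'Venue': ['Some Plaza']})

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, 'venues_demo.csv')
    first = load_or_fetch(cache_path, fake_fetch)   # fetches and writes the cache
    second = load_or_fetch(cache_path, fake_fetch)  # reads the cache instead

print(len(calls))
```

In the notebook, `fetch` could be a small function that wraps the `getNearbyVenues(...)` call.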

Sanity check on our dataframe of city venues to ensure everything looks in order.

In [None]:
# Check the size of dataframe
print(spanish_venues.shape)
spanish_venues.head()

(6624, 7)


| | City | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Mexico City | 19.43263 | -99.133178 | Catedral Metropolitana de la Asunción de María | 19.433526 | -99.133204 | Church |
| 1 | Mexico City | 19.43263 | -99.133178 | Gran Hotel Ciudad de México | 19.432137 | -99.134468 | Hotel |
| 2 | Mexico City | 19.43263 | -99.133178 | Murales de Diego Rivera en la Secretaría de Ed... | 19.432621 | -99.131642 | Art Museum |
| 3 | Mexico City | 19.43263 | -99.133178 | Plaza de la Constitución (Zócalo) | 19.432745 | -99.133658 | Plaza |
| 4 | Mexico City | 19.43263 | -99.133178 | Centro Histórico | 19.430583 | -99.13449 | Plaza |


Now we can count the number of venues for each city. Some cities have very few venue entries on Foursquare. I tested different limits and found that a city requires at least about 50 venue entries in order to have an adequate venue "profile" for meaningful clustering results with other cities. Given that, we'll drop cities with fewer than 50 venues for the remainder of this study. 

In [None]:
# Tally up the total venues per city
grouped = spanish_venues.groupby('City').count()
print('Original count of cities: {}'.format(len(grouped.index)))

# Drop cities with inadequate amount of venue data (they skew the clustering results)
grouped = grouped[grouped.Venue < 50]
spanish_venues = spanish_venues[~spanish_venues['City'].isin(list(grouped.index.values))]
grouped2 = spanish_venues.groupby('City').count()

print('Count of cities with at least 50 venues: {}'.format(len(grouped2.index)))

Original count of cities: 158
Count of cities with at least 50 venues: 62
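
The same filter can be written more compactly with a grouped `transform`, which attaches each city's venue count to every one of its rows. The sketch below uses a hypothetical helper name and a toy dataframe with a threshold of 2 instead of 50, purely to keep the example small.

```python
import pandas as pd

def keep_cities_with_enough_venues(venues, min_venues):
    """Keep only rows whose city has at least `min_venues` venue rows."""
    counts = venues.groupby('City')['Venue'].transform('count')
    return venues[counts >= min_venues]

# Toy data: city A has 3 venues, B has 1, C has 2
toy = pd.DataFrame({
    'City':  ['A', 'A', 'A', 'B', 'C', 'C'],
    'Venue': ['a1', 'a2', 'a3', 'b1', 'c1', 'c2'],
})
kept = keep_cities_with_enough_venues(toy, min_venues=2)
print(sorted(set(kept['City'])))
```

Note that grouping on the `City` column alone treats cities that share a name in different countries as one city; grouping on both city and country would keep them apart, provided the country is carried along in the venue dataframe.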


How many unique categories are there among the returned venues?

In [None]:
print('There are {} unique categories.'.format(len(spanish_venues['Venue Category'].unique())))

There are 326 unique categories.


## Methodology <a name="methodology"></a>

Now that we've gathered and prepped all the data we need, we're ready to analyze it. We'll use a popular unsupervised machine learning algorithm called [k-means clustering](https://en.wikipedia.org/wiki/K-means_clustering), which partitions observations into a specified number of clusters in order to discover underlying patterns. For our data, we'll find the top 5 venue categories for each city (based on occurrences in the dataset), and use them as each city's vector profile for finding similarities with other cities.

First we need to calculate the average frequency of each venue category in each city. We can quickly do this with a Pandas dataframe by converting each venue category into a boolean (yes/no) column using [one-hot](https://en.wikipedia.org/wiki/One-hot) encoding.

In [None]:
# one hot encoding
spanish_venues_onehot = pd.get_dummies(spanish_venues[['Venue Category']], prefix="", prefix_sep="")

# Add city column back to dataframe
spanish_venues_onehot['City'] = spanish_venues['City'] 

# Move city column to the first column
fixed_columns = [spanish_venues_onehot.columns[-1]] + list(spanish_venues_onehot.columns[:-1])
spanish_venues_onehot = spanish_venues_onehot[fixed_columns]

# Check size of new dataframe
spanish_venues_onehot.shape

(5297, 327)

The dataframe shape looks correct: its 327 columns are the 326 unique venue categories we counted earlier plus the `City` column.
Next we'll group the rows by city and take the mean of each category column, which gives the relative frequency of each category in each city.

In [None]:
spanish_grouped = spanish_venues_onehot.groupby('City').mean().reset_index()
spanish_grouped

Unnamed: 0,City,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auditorium,Austrian Restaurant,Auto Garage,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Trail,Bistro,Board Shop,Boarding House,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Business Service,Cable Car,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Capitol Building,Caribbean Restaurant,Casino,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Dairy Store,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Dive Bar,Doctor's Office,Donut Shop,Electronics Store,Empada House,Empanada Restaurant,Event Space,Fabric Shop,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Fishing Store,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Health & 
Beauty Service,Health Food Store,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Internet Cafe,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Leather Goods Store,Library,Lighthouse,Liquor Store,Locksmith,Lounge,Market,Martial Arts Dojo,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monastery,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,National Park,Nature Preserve,Neighborhood,Nightclub,Nightlife Spot,Non-Profit,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoor Sculpture,Outdoors & Recreation,Outlet Store,Paella Restaurant,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pier,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Port,Pub,Public Art,Ramen Restaurant,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Road,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan 
Restaurant,Stadium,Stationery Store,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Tour Provider,Toy / Game Store,Track Stadium,Trail,Train Station,Tram Station,Tree,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Antofagasta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0,0.034091,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034091,0.0,0.0,0.011364,0.011364,0.0,0.0,0.0,0.0,0.0,0.056818,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.011364,0.0,0.0,0.0,0.011364,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034091,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.011364,0.0,0.0,0.0,0.045455,0.0,0.0,0.034091,0.011364,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.102273,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.034091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arequipa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031915,0.0,0.0,0.0,0.0,0.0,0.053191,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.031915,0.010638,0.0,0.0,0.010638,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031915,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.031915,0.0,0.234043,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06383,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Arica,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.016129,0.016129,0.064516,0.016129,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.032258,0.0,0.016129,0.0,0.080645,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.016129,0.032258,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0
3,Asunción,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.08,0.02,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Barcelona,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.08,0.01,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.02,0.0
5,Bilbao,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0
6,Bogotá,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.03,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.0,0.0,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
7,Bucaramanga,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.025641,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.025641,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.025641,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.089744,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064103,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.012821,0.012821,0.0,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Buenos Aires,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.08,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Cali,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0


In [None]:
# New dataframe size
spanish_grouped.shape

(62, 327)

Now we can find the five most common venues for each city.

In [None]:
num_top_venues = 5

for city in spanish_grouped['City']:
    print("----"+city+"----")
    temp = spanish_grouped[spanish_grouped['City'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Antofagasta----
                venue  freq
0          Restaurant  0.10
1  Chinese Restaurant  0.06
2         Pizza Place  0.05
3      Sandwich Place  0.05
4                 Bar  0.03


----Arequipa----
                       venue  freq
0                      Hotel  0.23
1        Peruvian Restaurant  0.06
2            Bed & Breakfast  0.05
3  South American Restaurant  0.04
4         Italian Restaurant  0.04


----Arica----
                           venue  freq
0                            Pub  0.08
1                          Hotel  0.06
2                     Restaurant  0.05
3  Vegetarian / Vegan Restaurant  0.03
4                 History Museum  0.03


----Asunción----
            venue  freq
0           Hotel  0.08
1             Bar  0.07
2          Bakery  0.05
3             Gym  0.04
4  Ice Cream Shop  0.04


----Barcelona----
                venue  freq
0               Hotel  0.08
1    Tapas Restaurant  0.07
2      Ice Cream Shop  0.06
3               Plaza  0.06
4  Italian

The raw data looks good. Now let's sort and structure it for further processing.

In [None]:
# Function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
# Create a dataframe with top 5 venues for each city
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Positions past the third fall outside `indicators` and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new dataframe
cities_venues_sorted = pd.DataFrame(columns=columns)
cities_venues_sorted['City'] = spanish_grouped['City']

for ind in np.arange(spanish_grouped.shape[0]):
    cities_venues_sorted.iloc[ind, 1:] = return_most_common_venues(spanish_grouped.iloc[ind, :], num_top_venues)

cities_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Antofagasta,Restaurant,Chinese Restaurant,Sandwich Place,Pizza Place,Café
1,Arequipa,Hotel,Peruvian Restaurant,Bed & Breakfast,Pizza Place,Sandwich Place
2,Arica,Pub,Hotel,Restaurant,History Museum,Vegetarian / Vegan Restaurant
3,Asunción,Hotel,Bar,Bakery,Historic Site,Gym
4,Barcelona,Hotel,Tapas Restaurant,Plaza,Ice Cream Shop,Italian Restaurant
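
The column labels above come from a three-item suffix list with a fallback to "th". That is enough for five venues, but it would label position 21 as "21th" if `num_top_venues` were ever raised that far. As a minimal sketch, a small helper (the name `ordinal` is hypothetical, not part of the notebook) could build the same labels for any count:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # Numbers ending in 11, 12 or 13 take 'th' regardless of their last digit
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


num_top_venues = 5
# Same column layout as the notebook: 'City' followed by one label per rank
columns = ['City'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns)
```

With `num_top_venues = 5` this should produce the same six column names as the cell above, so it could be dropped in without changing the rest of the analysis.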


Now we're ready to apply the k-means clustering algorithm. After trying out different values of `k` (the number of clusters), I found the clusters to be most meaningful and interesting at around `k=10`. The output of the algorithm is an array with one cluster assignment for each row in our dataframe.

In [None]:
# Run K-means to break up into clusters
kclusters = 10

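# Keep only the numeric venue-frequency columns (positional `axis` is not accepted by recent pandas)
spanish_grouped_clustering = spanish_grouped.drop(columns='City')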

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(spanish_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:400]

array([9, 0, 2, 2, 1, 1, 5, 2, 3, 2, 6, 2, 9, 9, 3, 9, 2, 0, 9, 3, 8, 7,
       2, 4, 2, 5, 9, 2, 3, 1, 8, 2, 7, 3, 5, 9, 7, 8, 8, 2, 1, 1, 8, 1,
       6, 9, 2, 9, 1, 3, 2, 7, 2, 7, 6, 0, 9, 9, 2, 6, 2, 1], dtype=int32)
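
The value of `k` above was chosen by inspecting the clusters. One common way to make that comparison less subjective is to compute a silhouette score for each candidate `k`. The sketch below is not what the notebook did: it runs on synthetic data from `make_blobs` as a stand-in for `spanish_grouped_clustering`, and the range of `k` values is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the venue-frequency matrix: three well-separated blobs
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

for k, score in scores.items():
    print(f'k={k}: silhouette={score:.3f}')

# Higher is better; the score ranges from -1 to 1
best_k = max(scores, key=scores.get)
print('Best k by silhouette score:', best_k)
```

On the real data, `X` would be replaced by `spanish_grouped_clustering`. With sparse frequency vectors the score may not single out one `k` clearly, so it is best treated as a complement to manual inspection rather than a replacement.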

In [None]:
cities_venues_sorted.columns

Index(['City', '1st Most Common Venue', '2nd Most Common Venue',
       '3rd Most Common Venue', '4th Most Common Venue',
       '5th Most Common Venue'],
      dtype='object')

Now we can stitch the cluster labels back into our dataframe and also combine city location data. With all this info combined we'll be ready to visualize the results.

In [None]:
# Create dataframe that includes the cluster and top 5 venues

# Add clustering labels
cities_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

spanish_merged = df_spanish

# Join cities_venues_sorted onto df_spanish so each city keeps its country, latitude and longitude
spanish_merged = spanish_merged.join(cities_venues_sorted.set_index('City'), on='City')

# Drop cities with no venue data
spanish_merged = spanish_merged.dropna()

spanish_merged

Unnamed: 0,City,Country,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Mexico City,Mexico,19.4326,-99.1332,8.0,Art Museum,Mexican Restaurant,Museum,Ice Cream Shop,Hotel
2,Guadalajara,Mexico,20.672,-103.338,8.0,Mexican Restaurant,Ice Cream Shop,Plaza,Hotel,Café
6,León,Mexico,21.1219,-101.683,8.0,Mexican Restaurant,Hotel,Bar,Latin American Restaurant,Farmers Market
7,Monterrey,Mexico,25.6802,-100.315,8.0,Mexican Restaurant,Taco Place,Seafood Restaurant,Hotel,Candy Store
9,Nezahualcóyotl,Mexico,19.4021,-99.017,8.0,Mexican Restaurant,Taco Place,Bar,Pizza Place,Coffee Shop
10,Buenos Aires,Argentina,-34.6076,-58.4371,3.0,Argentinian Restaurant,Bakery,Café,Ice Cream Shop,Pizza Place
11,Córdoba,Argentina,-31.4173,-64.1833,9.0,Hotel,Coffee Shop,Café,Sandwich Place,Restaurant
15,La Plata,Argentina,-34.9207,-57.9538,9.0,Ice Cream Shop,Brewery,Café,Coffee Shop,Pizza Place
16,Mar del Plata,Argentina,-37.9977,-57.5483,9.0,Café,Restaurant,Pizza Place,Ice Cream Shop,Plaza
19,San Juan,Argentina,-30.7054,-69.1988,1.0,Bar,Caribbean Restaurant,Plaza,Historic Site,Coffee Shop


## Results and Discussion <a name="results"></a>

Now we're ready to map out the data to get a feel for the results. We'll use the Python [Folium](https://python-visualization.github.io/folium/) library to render our clusters, using a distinct color for each.

In [None]:
# Create map
map_clusters = folium.Map(location=[0, 0], zoom_start=2.5)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(spanish_merged['Latitude'], spanish_merged['Longitude'], spanish_merged['City'], spanish_merged['Cluster Labels']):
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The results look promising, in that they appear to capture some patterns in our dataset. The clusters seem generally dispersed geographically and balanced in terms of member count. Let's examine the individual clusters to try to discern how and why they broke out the way they did.
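
A note on the marker colors in the map cell above: indexing with `rainbow[cluster-1]` still gives each cluster its own color, because label 0 wraps around to the last entry of the list. Indexing by the label itself is easier to follow. A minimal sketch, assuming the same `matplotlib` colormap as the notebook (the names `palette` and `color_for` are hypothetical):

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 10

# One hex color per cluster, sampled evenly from the rainbow colormap
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Map each cluster label (0 .. kclusters-1) directly to its color
color_for = {label: palette[label] for label in range(kclusters)}

for label, hex_color in color_for.items():
    print(label, hex_color)
```

In the marker loop, `color_for[cluster]` could then be passed as both `color` and `fill_color`.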

#### Caracas' Cluster 6: Plaza City
Every city in this cluster has "Plaza" among its top three venues, so one conclusion could be that these cities offer a variety of public places to go out.

In [None]:
spanish_merged.loc[spanish_merged['Cluster Labels'] == 6, spanish_merged.columns[[0] + [1] + list(range(5, spanish_merged.shape[1]))]]

Unnamed: 0,City,Country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
32,Valencia,Spain,Plaza,Spanish Restaurant,Italian Restaurant,Restaurant,Tapas Restaurant
33,Seville,Spain,Tapas Restaurant,Plaza,Hotel,Ice Cream Shop,Spanish Restaurant
70,Quito,Ecuador,Restaurant,Hotel,Plaza,History Museum,Church
210,Caracas,Venezuela,Plaza,Bakery,Pharmacy,Theater,Historic Site
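
Cluster membership alone does not say which of these cities is closest to Caracas. Since k-means groups rows by Euclidean distance, one possible follow-up is to rank the cluster members by their distance to the reference city's venue-frequency vector. The sketch below illustrates the idea on made-up vectors with placeholder city names. It is an assumption about how such a ranking could be done, not part of the original analysis.

```python
import numpy as np

# Hypothetical venue-frequency vectors (each row sums to 1); not notebook data
cities = ['Reference', 'City A', 'City B', 'City C']
freqs = np.array([
    [0.50, 0.30, 0.20, 0.00],
    [0.45, 0.35, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.50, 0.25, 0.25, 0.00],
])

# Euclidean distance from every city to the reference city (row 0)
distances = np.linalg.norm(freqs - freqs[0], axis=1)

# Sort by distance and skip the reference itself, which sits at distance 0
order = np.argsort(distances)[1:]
for i in order:
    print(f'{cities[i]}: {distances[i]:.3f}')
```

On the real data, the rows would come from `spanish_grouped` restricted to the cities in cluster 6, with the Caracas row as the reference.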


## Conclusion <a name="conclusion"></a>

Starting from a list of 210 cities across all Spanish-speaking countries, we found 158 cities with Foursquare venue data. A Foursquare query of venues in those cities yielded 5935 venues. However, it was necessary to filter out cities with fewer than 50 venues, as their data profiles proved insufficient for meaningful clustering.
After filtering out those cities, only 62 remained, less than 30% of our original group of cities. The 62 cities used in the final analysis represented 4272 venues and 326 unique venue types. We used the k-means clustering algorithm to group them into ten distinct clusters. The cluster containing Caracas, the capital of Venezuela, has three other cities: Valencia and Seville in Spain, and Quito in Ecuador. Based on this study, these three cities could be the best candidates for a Venezuelan emigrant.