<h1> 1. Introduction/Business Problem </h1>


<p>
    
<h3>1.1 Entrepreneurial Ecosystem:</h3>    
An entrepreneurial ecosystem consists of stakeholders such as nonprofits, government agencies, academia, private practitioners, financial institutions, and other support organizations. These stakeholders provide support in areas such as education, access to capital, prototyping, and business planning. To make the most of their resources, support organizations should know the business stage, size, and industry of each client they help. In particular, they should consider where the business is located, since location adds a further layer of context.


<h3>1.2 Location:</h3>
In Puerto Rico, a territory of 3.1 million people, there are more than 250 entrepreneurial support organizations. Sometimes, however, these organizations lack data in an accessible, digestible form that would let them make data-driven decisions to improve their services.

<h3>1.3 Use Cases:</h3>
This project explores geographic data to give support organizations input that could help them:
<ul>
    <li>provide better, more personalized service</li>
    <li>design projects and develop grants based on the data</li>
    <li>hire specialized personnel to serve the geographic area in which they are located</li>
    <li>develop public policy projects that help businesses in their area.</li>
</ul>
    
</p>



<h1> 2. Data </h1>

<h3>2.1 Data Sources: </h3>
<p> This project uses Foursquare's Places API under the Sandbox membership tier, which includes:
<ul>
    <li>950 Regular Calls/Day</li>
    <li>50 Premium Calls/Day</li>
    <li>1 Photo per Venue</li>
    <li>1 Tip per Venue.</li>
</ul>

The API can search and explore venues and return trends and recommendations for them. It can also return user data such as profile details, check-ins, and venue history.

The list of Puerto Rico ZIP codes was obtained from https://www.zipdatamaps.com/list-of-zip-codes-in-puerto-rico.php.

<h3>2.2 Feature Selection:</h3>

In this project the data is limited to venues in Puerto Rico, and the feature explored is the venue category. With this information, entrepreneurial support organizations can identify business clusters in different locations across Puerto Rico, along with areas of opportunity for new business owners or for those looking to expand established businesses.

<h3>2.3 Data Merging:</h3>

The ZIP code data from ZipDataMaps will be combined with results from Foursquare's Places API in order to retrieve information about venues across Puerto Rico.

</p>


<h1> 3. Methodology </h1>

In [1]:
!pip install beautifulsoup4



In [2]:
import pandas as pd

# read the scraped ZIP code table, keeping ZIP as text so leading zeros survive
zips_df = pd.read_csv("PRZipCodes.csv", dtype={'tablescraper-selected-row': object})

# drop the scraper's link and helper columns
zips_df = zips_df.drop(columns=['tablescraper-selected-row href', 'tablescraper-selected-row 2', 'tablescraper-selected-row href 2', 'tablescraper-selected-row href 3'])

# give the remaining columns readable names
zips_df = zips_df.rename(columns={'tablescraper-selected-row': 'ZIP', 'tablescraper-selected-row 3': 'Borough', 'tablescraper-selected-row 4': 'County'})
zips_df['ZIP'] = zips_df['ZIP'].astype(str)

zips_df.head()


Unnamed: 0,ZIP,Borough,County
0,601,Adjuntas,Adjuntas
1,602,Aguada,Aguada
2,603,Aguadilla,Aguadilla
3,604,Aguadilla,Aguadilla
4,605,Aguadilla,Aguadilla


In [3]:
# read ZIP as text, then left-pad to five digits so the key matches zips_df
LatLong_df = pd.read_csv("codes_latlong.csv", dtype={'ZIP': str})

LatLong_df['ZIP'] = LatLong_df['ZIP'].str.zfill(5)

LatLong_df.head()

Unnamed: 0,ZIP,LAT,LNG
0,601,18.180555,-66.749961
1,602,18.361945,-67.175597
2,603,18.455183,-67.119887
3,606,18.158345,-66.932911
4,610,18.295366,-67.125135
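
Puerto Rico ZIP codes begin with `00`, so a parser that infers an integer type silently drops the leading zeros and the two tables no longer share a key format. The following is a minimal sketch of the issue and one way to normalize the key. The inline CSV is a small made-up stand-in for the real files, and the variable names are illustrative only.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for codes_latlong.csv.
csv_text = (
    "ZIP,LAT,LNG\n"
    "00601,18.180555,-66.749961\n"
    "00602,18.361945,-67.175597\n"
    "99923,56.002315,-130.041026\n"
)

# Default type inference parses ZIP as an integer and loses the leading zeros.
as_int = pd.read_csv(io.StringIO(csv_text))

# Reading ZIP as a string keeps the zeros.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"ZIP": str})

# zfill(5) repairs values whose zeros were already stripped.
repaired = as_int["ZIP"].astype(str).str.zfill(5)

print(as_int["ZIP"].tolist())
print(as_str["ZIP"].tolist())
print(repaired.tolist())
```

Either route gives five-character string keys, which is what the later merge on `ZIP` needs.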


In [4]:
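# drop_duplicates returns a new DataFrame, so assign the result back
zips_df = zips_df.drop_duplicates()
LatLong_df = LatLong_df.drop_duplicates()

LatLong_df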

Unnamed: 0,ZIP,LAT,LNG
0,00601,18.180555,-66.749961
1,00602,18.361945,-67.175597
2,00603,18.455183,-67.119887
3,00606,18.158345,-66.932911
4,00610,18.295366,-67.125135
...,...,...,...
33139,99923,56.002315,-130.041026
33140,99925,55.550204,-132.945933
33141,99926,55.138352,-131.470424
33142,99927,56.239062,-133.457924


In [5]:
join_df = pd.merge(zips_df, LatLong_df, on='ZIP', how='inner')

join_df.head()

Unnamed: 0,ZIP,Borough,County,LAT,LNG
0,601,Adjuntas,Adjuntas,18.180555,-66.749961
1,602,Aguada,Aguada,18.361945,-67.175597
2,603,Aguadilla,Aguadilla,18.455183,-67.119887
3,606,Maricao,Maricao,18.158345,-66.932911
4,610,Anasco,Anasco,18.295366,-67.125135
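
An inner join silently drops ZIP codes that have no coordinates. A left join with `indicator=True` is one way to see which keys fail to match. The sketch below uses small made-up frames rather than the real files, so the names `zips_toy` and `latlong_toy` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for zips_df and LatLong_df.
zips_toy = pd.DataFrame({
    "ZIP": ["00601", "00602", "00603", "00604", "00605"],
    "County": ["Adjuntas", "Aguada", "Aguadilla", "Aguadilla", "Aguadilla"],
})
latlong_toy = pd.DataFrame({
    "ZIP": ["00601", "00602", "00603", "00606"],
    "LAT": [18.180555, 18.361945, 18.455183, 18.158345],
    "LNG": [-66.749961, -67.175597, -67.119887, -66.932911],
})

# The _merge column records whether each left row found a partner on the right.
check = zips_toy.merge(latlong_toy, on="ZIP", how="left", indicator=True)
missing = check.loc[check["_merge"] == "left_only", "ZIP"].tolist()

print(missing)
```

ZIP codes listed in `missing` would be absent from an inner join and therefore never queried against the API.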


In [6]:
!pip install folium



In [7]:
import folium 
# create map of Puerto Rico using latitude and longitude values
map_pr = folium.Map(location=[18.303910000000002, -66.32617900000001], zoom_start=9)

# add markers to map
for lat, lng, borough, county in zip(join_df['LAT'], join_df['LNG'], join_df['Borough'], join_df['County']):
    label = '{}, {}'.format(borough, county)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_pr)  
    
map_pr

In [8]:
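import os

# read credentials from the environment rather than hard-coding them
# (the variable names are an assumed convention, not a Foursquare requirement)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret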
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

radius = 5000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    18.303910000000002, 
    -66.32617900000001,
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=[REDACTED]&client_secret=[REDACTED]&v=20180605&ll=18.303910000000002,-66.32617900000001&radius=5000&limit=100'
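
As an alternative to formatting the query string by hand, the parameters can be collected in a dictionary and encoded with the standard library. This is a sketch under assumptions: the environment variable names and the placeholder fallbacks are hypothetical, and only the endpoint and parameter names come from the request above.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; the fallbacks are placeholders, not real credentials.
params = {
    "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
    "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
    "v": "20180605",
    "ll": "18.30391,-66.326179",
    "radius": 5000,
    "limit": 100,
}

# safe="," keeps the comma in the ll pair readable instead of percent-encoding it.
url = "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")
```

Keeping the credentials out of the notebook source also keeps them out of any URL that gets displayed or shared.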

In [9]:
import requests


results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '610b2aa154f76f571f962930'},
 'response': {'headerLocation': 'Padilla',
  'headerFullLocation': 'Padilla',
  'headerLocationGranularity': 'city',
  'totalResults': 10,
  'suggestedBounds': {'ne': {'lat': 18.348910045000046,
    'lng': -66.2788692956976},
   'sw': {'lat': 18.258909954999957, 'lng': -66.37348870430242}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c10e6eda6c19521746636dd',
       'name': 'El Yagrumo',
       'location': {'address': 'PR-159',
        'lat': 18.326534114706522,
        'lng': -66.34989312233385,
        'labeledLatLngs': [{'label': 'display',
          'lat': 18.326534114706522,
          'lng': -66.34989312233385}],
        'distance': 3552,
        'cc': 'PR',
        'city': 'Corozal',
        'sta

In [10]:
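def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened responses store the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']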

In [11]:
# pandas.io.json.json_normalize is deprecated; import it from the top-level package
from pandas import json_normalize

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,El Yagrumo,Caribbean Restaurant,18.326534,-66.349893
1,El Gran Cafe Restaurant,Breakfast Spot,18.340957,-66.316987
2,El Limo Viejo,Bar,18.338022,-66.322291
3,Balalaika Restaurant,Burger Joint,18.341657,-66.305472
4,El Rancho en Corozal,Pool,18.319297,-66.292644
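
The flattening step is easier to follow on a small example. The sketch below applies `pd.json_normalize` to a made-up payload shaped like the `items` list in the response. The second venue and all coordinates are illustrative, not real API output.

```python
import pandas as pd

# Toy payload shaped like the 'items' list shown above; values are illustrative only.
items = [
    {"venue": {"name": "El Yagrumo",
               "categories": [{"name": "Caribbean Restaurant"}],
               "location": {"lat": 18.3265, "lng": -66.3499}}},
    {"venue": {"name": "Venue without category",
               "categories": [],
               "location": {"lat": 18.3400, "lng": -66.3100}}},
]

# Nested keys become dotted column names; list-valued fields such as categories stay as lists.
flat = pd.json_normalize(items)
print(sorted(flat.columns))

# Take the first category name, or None when the list is empty.
flat["category"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
print(flat[["venue.name", "category"]])
```

The dotted names explain why the notebook later splits each column name on `.` and keeps the last part.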


In [12]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

10 venues were returned by Foursquare.


In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
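
The function above assumes every response contains at least one group. A request that fails, for example after the daily call quota is used up, may not. The following is a minimal sketch of a more defensive parsing step. `extract_venues` is a hypothetical helper, not part of the notebook or of any Foursquare library, and it works on an already-decoded response dictionary so it needs no network access.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        # Error responses and empty areas carry no groups.
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        cats = venue.get("categories", [])
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            cats[0]["name"] if cats else None,  # tolerate venues without a category
        ))
    return rows


# Toy payloads for illustration only.
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Bar"}]}},
    {"venue": {"name": "B", "location": {"lat": 3.0, "lng": 4.0},
               "categories": []}},
]}]}}
error_payload = {"meta": {"code": 429}}

print(extract_venues(ok_payload))
print(extract_venues(error_payload))
```

Returning an empty list for a failed call lets the loop over ZIP codes continue instead of stopping with a `KeyError` or `IndexError`.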

In [14]:
PR_venues = getNearbyVenues(names=join_df['County'],
                                   latitudes=join_df['LAT'],
                                   longitudes=join_df['LNG']
                                  )

Adjuntas
Aguada
Aguadilla
Maricao
Anasco
Arecibo
Arecibo
Barceloneta
Cabo Rojo
Cabo Rojo
Penuelas
Camuy
Lares
Sabana Grande
Ciales
Utuado
Dorado
Guanica
Florida
Arecibo
Guanica
Guayanilla
Hatillo
Hormigueros
Isabela
Jayuya
Lajas
Lares
Las Marias
Manati
Moca
Rincon
Quebradillas
Mayaguez
Mayaguez
San German
San Sebastian
Morovis
Arecibo
Aguadilla
Vega Alta
Vega Baja
Vega Baja
Yauco
Aguas Buenas
Salinas
Aibonito
Maunabo
Arroyo
Ponce
Ponce
Ponce
Naguabo
Naranjito
Orocovis
Patillas
Caguas
Caguas
Ponce
Canovanas
Ponce
Ponce
Ceiba
Cayey
Fajardo
Cidra
Fajardo
Humacao
Rio Grande
Salinas
San Lorenzo
Santa Isabel
Vieques
Villalba
Yabucoa
Coamo
Las Piedras
Loiza
Luquillo
Culebra
Juncos
Gurabo
Ponce
Comerio
Corozal
Guayama
Aibonito
Humacao
Barranquitas
Juana Diaz
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
San Juan
Guaynabo
San Juan
Toa Baja
Dorado
Toa Baja
Toa Baja
Toa Alta
Bayamon
Bayamon
Bayamon
B

In [15]:
print(PR_venues.shape)
PR_venues.head()

(554, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Adjuntas,18.180555,-66.749961,Aeropuerto Ruyán,18.179777,-66.75437,Airport
1,Aguada,18.361945,-67.175597,El Sarten Criollo,18.360852,-67.172921,Caribbean Restaurant
2,Aguada,18.361945,-67.175597,Club 4x4,18.364132,-67.176337,Bar
3,Aguada,18.361945,-67.175597,Panaderia Reposteria Lorenzo,18.364824,-67.176945,Bakery
4,Aguada,18.361945,-67.175597,El Tal-Ivan,18.360521,-67.172333,Bar


In [16]:
PR_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adjuntas,1,1,1,1,1,1
Aguada,5,5,5,5,5,5
Aguadilla,7,7,7,7,7,7
Aibonito,5,5,5,5,5,5
Barceloneta,3,3,3,3,3,3
Barranquitas,1,1,1,1,1,1
Bayamon,48,48,48,48,48,48
Cabo Rojo,6,6,6,6,6,6
Caguas,18,18,18,18,18,18
Carolina,34,34,34,34,34,34


In [17]:
print('There are {} unique categories.'.format(len(PR_venues['Venue Category'].unique())))

There are 156 unique categories.


In [18]:
# one hot encoding
PR_onehot = pd.get_dummies(PR_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
PR_onehot['Neighborhood'] = PR_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [PR_onehot.columns[-1]] + list(PR_onehot.columns[:-1])
PR_onehot = PR_onehot[fixed_columns]

PR_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Lounge,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vegetarian / Vegan Restaurant,Veterinarian,Video Store,Volleyball Court,Warehouse Store,Whisky Bar,Wine Shop,Winery,Wings Joint,Women's Store
0,Adjuntas,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Aguada,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Aguada,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Aguada,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Aguada,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
PR_grouped = PR_onehot.groupby('Neighborhood').mean().reset_index()
PR_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Lounge,Airport Terminal,American Restaurant,Argentinian Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Vegetarian / Vegan Restaurant,Veterinarian,Video Store,Volleyball Court,Warehouse Store,Whisky Bar,Wine Shop,Winery,Wings Joint,Women's Store
0,Adjuntas,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aguada,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aguadilla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aibonito,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Barceloneta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Barranquitas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bayamon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.020833,...,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.020833
7,Cabo Rojo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Caguas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Carolina,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
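
Averaging one-hot columns within a group gives the share of that group's venues in each category. A small sketch on made-up rows shows the arithmetic, along with `pd.crosstab` as an equivalent shortcut. The frame `toy` is illustrative and not the notebook's data.

```python
import numpy as np
import pandas as pd

# Toy venue list: five venues in one county, one in another.
toy = pd.DataFrame({
    "Neighborhood": ["Aguada"] * 5 + ["Adjuntas"],
    "Venue Category": ["Bar", "Bar", "Hotel", "Caribbean Restaurant", "Bakery", "Airport"],
})

# One indicator column per category, then the per-county mean of each column.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])
freq = onehot.groupby("Neighborhood").mean()

# The same table in one call: counts normalized within each row.
ct = pd.crosstab(toy["Neighborhood"], toy["Venue Category"], normalize="index")

print(freq.loc["Aguada", "Bar"])  # two bars out of five venues
print(np.allclose(freq.values, ct[freq.columns].values))
```

Each row of `freq` sums to 1, so a county with a single venue always shows one category at 1.0 and every other category at 0.0.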


In [20]:
num_top_venues = 5

for hood in PR_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = PR_grouped[PR_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adjuntas----
                  venue  freq
0               Airport   1.0
1             Nightclub   0.0
2          Noodle House   0.0
3  Other Great Outdoors   0.0
4     Paella Restaurant   0.0


----Aguada----
                  venue  freq
0                   Bar   0.4
1                 Hotel   0.2
2  Caribbean Restaurant   0.2
3                Bakery   0.2
4     Accessories Store   0.0


----Aguadilla----
                venue  freq
0      Sandwich Place  0.29
1              Bakery  0.14
2  Italian Restaurant  0.14
3        Cocktail Bar  0.14
4    Asian Restaurant  0.14


----Aibonito----
                  venue  freq
0        Scenic Lookout   0.2
1                 Diner   0.2
2  Caribbean Restaurant   0.2
3      Basketball Court   0.2
4                   Bar   0.2


----Barceloneta----
               venue  freq
0      Auto Workshop  0.33
1               Food  0.33
2      Grocery Store  0.33
3           Pharmacy  0.00
4  Paella Restaurant  0.00


----Barranquitas----
            

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [22]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = PR_grouped['Neighborhood']

for ind in np.arange(PR_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(PR_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adjuntas,Airport,Women's Store,Dive Bar,Food,Flower Shop,Fast Food Restaurant,Farmers Market,Fabric Shop,Electronics Store,Donut Shop
1,Aguada,Bar,Hotel,Caribbean Restaurant,Bakery,Women's Store,Donut Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Fabric Shop
2,Aguadilla,Sandwich Place,Italian Restaurant,Asian Restaurant,Burger Joint,Bakery,Cocktail Bar,Argentinian Restaurant,Art Museum,Food,Flower Shop
3,Aibonito,Basketball Court,Caribbean Restaurant,Scenic Lookout,Bar,Diner,Women's Store,Dive Bar,Fast Food Restaurant,Farmers Market,Fabric Shop
4,Barceloneta,Food,Auto Workshop,Grocery Store,Flower Shop,Fast Food Restaurant,Farmers Market,Fabric Shop,Electronics Store,Donut Shop,Dive Bar
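
In the table above, Adjuntas has a single venue, yet ten "most common" categories are listed. Everything after the first has a frequency of zero, so its rank is an artifact of sort order. One possible refinement, sketched here on made-up frequencies, is to keep only categories that actually occur. `top_categories` is a hypothetical helper.

```python
import pandas as pd

# Toy frequency table: rows are counties, columns are categories.
freq = pd.DataFrame(
    {
        "Airport": [1.0, 0.0],
        "Bar": [0.0, 0.4],
        "Hotel": [0.0, 0.2],
        "Bakery": [0.0, 0.2],
        "Caribbean Restaurant": [0.0, 0.2],
    },
    index=["Adjuntas", "Aguada"],
)


def top_categories(row, n):
    """Return up to n categories with nonzero frequency, most frequent first."""
    nonzero = row[row > 0].sort_values(ascending=False)
    return nonzero.index.tolist()[:n]


top = {name: top_categories(row, 3) for name, row in freq.iterrows()}
print(top)
```

Counties with few venues then yield shorter lists, which makes the sparsity of the underlying data visible instead of hiding it behind zero-frequency entries.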


In [23]:
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 10

PR_grouped_clustering = PR_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(PR_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([5, 0, 5, 5, 5, 6, 5, 5, 5, 5], dtype=int32)
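
The value `kclusters = 10` above is set by hand. One common way to sanity-check such a choice, which this notebook does not do, is to compare silhouette scores across candidate values of k. The sketch below runs on synthetic data: the array `X` is a made-up stand-in with three well-separated groups, not the Puerto Rico frequency matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the county-by-category matrix: three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score gives the most compact, best-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k)
```

Applied to `PR_grouped_clustering`, the same loop could indicate whether ten clusters is a reasonable number for 42 counties, although sparse data may leave the scores inconclusive.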

In [24]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels 2', kmeans.labels_)

PR_merged = join_df
neighborhoods_venues_sorted= neighborhoods_venues_sorted.rename(columns = {"Neighborhood":"County"})

# merge the cluster labels and top venues into the ZIP-level table, which holds latitude/longitude for each county
PR_merged = PR_merged.join(neighborhoods_venues_sorted.set_index('County'), on='County')

PR_merged.head() # check the last columns!



Unnamed: 0,ZIP,Borough,County,LAT,LNG,Cluster Labels 2,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,601,Adjuntas,Adjuntas,18.180555,-66.749961,5.0,Airport,Women's Store,Dive Bar,Food,Flower Shop,Fast Food Restaurant,Farmers Market,Fabric Shop,Electronics Store,Donut Shop
1,602,Aguada,Aguada,18.361945,-67.175597,0.0,Bar,Hotel,Caribbean Restaurant,Bakery,Women's Store,Donut Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Fabric Shop
2,603,Aguadilla,Aguadilla,18.455183,-67.119887,5.0,Sandwich Place,Italian Restaurant,Asian Restaurant,Burger Joint,Bakery,Cocktail Bar,Argentinian Restaurant,Art Museum,Food,Flower Shop
3,606,Maricao,Maricao,18.158345,-66.932911,,,,,,,,,,,
4,610,Anasco,Anasco,18.295366,-67.125135,,,,,,,,,,,


In [25]:
from matplotlib import cm
import matplotlib.colors as colors

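# drop counties for which Foursquare returned no venues; filling their missing
# label with 0 would wrongly assign them to cluster 0
PR_merged = PR_merged.dropna(subset=['Cluster Labels 2']).copy()
PR_merged['Cluster Labels 2'] = PR_merged['Cluster Labels 2'].astype(int)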

# create map
map_clusters = folium.Map(location=[18.303910000000002, -66.32617900000001], zoom_start=9)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(PR_merged['LAT'], PR_merged['LNG'], PR_merged['County'], PR_merged['Cluster Labels 2']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h1> 4. Results </h1>

<p>The Foursquare Places API returned 554 venues for the Puerto Rico ZIP codes queried. These venues belong to 156 unique categories and are spread across 42 counties.
    
The following counties had the greatest number of venues:
<ol>
    <li>San Juan - 276 </li>    
    <li>Guaynabo - 61 </li>
    <li>Bayamon - 48 </li>
    <li>Carolina - 34 </li>
    <li>Caguas - 18</li>
</ol>
    
These counties are located in the metro area of Puerto Rico.


The following counties had the fewest venues, with just one each:
<ol>
    <li>Adjuntas</li>
    <li>Barranquitas</li>
    <li>Catano</li>
    <li>Corozal</li>
    <li>Juncos</li>
    <li>Naguabo</li>
    <li>Quebradillas</li>
    <li>Rincon</li>
    <li>Rio Grande</li>
    <li>Salinas</li>
    <li>Toa Baja</li>
    <li>Utuado</li>
    <li>Vega Alta</li>
    <li>Yauco</li>
</ol>
    
Most of these counties are located outside the metro area of Puerto Rico.


</p>
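
Rankings like the ones above can be derived from the venue table with `value_counts`. The sketch below uses a made-up frame, `toy_venues`, in place of `PR_venues`, so the counts are illustrative only.

```python
import pandas as pd

# Toy stand-in for PR_venues with one row per venue.
toy_venues = pd.DataFrame({
    "Neighborhood": ["San Juan"] * 4 + ["Guaynabo"] * 2 + ["Adjuntas", "Rincon"],
})

# Venues per county, sorted from most to fewest.
counts = toy_venues["Neighborhood"].value_counts()

# Counties with the most venues, and counties with exactly one.
top = counts.head(2)
singletons = sorted(counts[counts == 1].index)

print(top)
print(singletons)
```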

<h1> 5. Discussion </h1>



<p>There are some clear patterns in the clusters:
    
<ul>
    <li>Cluster 1 is composed of San Lorenzo and Juncos, which makes it an area for sports venues such as baseball fields.</li>
    <li>Cluster 8 is composed of Vega Alta, which makes it an area for music stores.</li>
    <li>Cluster 9 is composed of Rincon, which makes it an area for Latin American restaurants.</li>

</ul>
</p>
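
Cluster membership of this kind can be read off the merged table by grouping on the label column. The sketch below uses a small made-up frame that mirrors the three clusters described above. With the real data, `PR_merged` would take the place of `toy_clusters`.

```python
import pandas as pd

# Toy stand-in for PR_merged, mirroring the clusters described in the discussion.
toy_clusters = pd.DataFrame({
    "County": ["San Lorenzo", "Juncos", "Vega Alta", "Rincon"],
    "Cluster Labels 2": [1, 1, 8, 9],
    "1st Most Common Venue": [
        "Baseball Field", "Baseball Field", "Music Store", "Latin American Restaurant",
    ],
})

# Counties in each cluster.
members = toy_clusters.groupby("Cluster Labels 2")["County"].apply(list).to_dict()

# Most frequent top venue within each cluster.
dominant = (
    toy_clusters.groupby("Cluster Labels 2")["1st Most Common Venue"]
    .agg(lambda s: s.mode().iloc[0])
    .to_dict()
)

print(members)
print(dominant)
```

Clusters that contain a single county, as two of these do, describe that county rather than a shared regional pattern.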

<h1> 6. Conclusion</h1>

<p> Although this data analysis strategy is a sound and useful approach to decision making in a business ecosystem, the project has limitations. The main one is the low representation of Puerto Rican businesses in the API used, which makes the resulting clusters less reliable. Before decisions are made based on these results, the effort should be paired with a strategy to get more businesses to complete their profiles on the Foursquare platform.

</p>