# Capstone Project (Week 2)
### Applied Data Science Capstone by IBM/Coursera

# Title: Opening of a new Bar in Lima Central 

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

#### Description & Discussion of the Background

Lima Metropolitan is one of the largest metropolises in the world where over 10 million people live and it has a population density of 11.000 people per square kilometer. The Metropolis is divided into 49 districts and they are all grouped in 4 mean regions: North Lima, East Lima, South Lima, Callao, Lima Central. At the same time, Lima Central contains 15 districts and is well known to be the tourist spot of Lima Metropolitan. As a resident of Lima Central and being aware of its popularity, I decided to use this region in my project.  However, the fact that the districts inside Lima Central are squeezed into an area of approximately 95 square kilometers and have a population density of 14 000 people per square kilometer overall, causes the city to have a very intertwined and mixed structure.

As you can see from the figures, Lima Central is a city with a high population density. Being such a crowded city leads the owners of shops, bars, restaurants and social sharing places in the city where the population is dense. When we think of it by the investor, we expect from them to prefer the districts where there is high traffic of people with a high income. If we think of the city residents, they may want to choose the regions where real rates values are higher, too. At the same time, they may want to choose the district according to the density of the social place and quality either social or environmental. However, it is difficult to obtain information that will guide investors in this direction, nowadays.

When we consider all these problems, we can create a map and information chart where the real estate index is placed on Lima Central and each district is clustered according to the venue density and popular type of business.

## Data <a name="data"></a>

To consider the problem we can list the data as below:

1. I found the zip/postal codes and location of each district of Lima Central from GeoPostcodes and Vectomap from GeoData Limited.
2. The CSV file was created by myself and it contains coordinates of all districts of Lima Central. 
3. I used Foursquare API to get the most common venues of given Borough of Lima Central.

## Methodology <a name="methodology"></a>

To determine which location an entrepreneur can open his bar so as to succeed, we need first to analyze which locations are the most likely people go to bars. We will do that by extracting data in a JSON format from Foursquare API that shows us the top 100 places that people go the most with high rates within a radio of 500 meters from each district location. Once we have our top list, we will cluster our district with similar characteristics by using unsupervised machine learning algorithm K-means and will search which is the best places to open our bar. During the process of this analysis, we will be able to extract and analyze some other important data so that we can understand other places where people tend to go most in each district. Furthermore, we will be able to determine which districts share the same interests.

In resume:
1. Read the data frame in our notebook.
2. Sear for the most 100 commons venues in a radio of 500 meters from each district location by using  Foursquare API.
3. Organize the data and show a table of the first 10 most popular venues of each district.
4. Use unsupervised machine learning algorithm K-means clustering to group our district into clusters that share common characteristics.
5. Analyze our result and give conclusions.

## Analysis <a name="analysis"></a>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [5]:
#Importing packages
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
import types

from botocore.client import Config
import ibm_boto3
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ca-certificates-2019.6.16  |       hecc5488_0         145 KB  conda-forge
    openssl-1.1.1c             |       h516909a_0         2.1 MB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    certifi-2019.6.16          |           py36_1         149 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

### Load and Explore Dataset

The data set is in CSV format and contains 15 districts of Lima Central with their locations (latitude and longitude). The data has been downloaded from: https://www.geopostcodes.com/Lima_provincia 

In [12]:
#Reading CSV data file from the IBM cloud

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_8df9ce27c7874208bdd6e2fd9695d3af = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='fq92Ve7iMYFU-piVXatS3NctPXduvL3RWfcrpaiou1Ic',
    ibm_auth_endpoint="https://iam.eu-gb.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-geo.objectstorage.service.networklayer.com')

body = client_8df9ce27c7874208bdd6e2fd9695d3af.get_object(Bucket='project1course9-donotdelete-pr-mupbkjfmqoeqei',Key='LimaData1.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df = pd.read_csv(body)
df.head()

#data downloaded from: https://www.geopostcodes.com/Lima_provincia

Unnamed: 0,Postcode,Department,District,Latitude,Longitude
0,lima33,Lima,Santiago de Surco,-12.145406,-77.004753
1,lima34,Lima,Surquillo,-12.117202,-77.020622
2,lima4,Lima,Barranco,-12.149707,-77.021276
3,lima5,Lima,Breña,-12.058713,-77.045959
4,lima11,Lima,Jesús María,-12.076734,-77.043904


In [13]:
df.shape

(15, 5)

#### Use geopy library to get the latitude and longitude values of Lima City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

Finding the coordinates of Lima city

In [14]:
address = 'Lima'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Lima City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Lima City are -12.0621065, -77.0365256.


#### Create a map of New York with neighborhoods superimposed on top.

In [15]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Department'], df['District']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Next, we are going to start utilizing the Foursquare API to explore the districts and segment them.

### Define Foursquare Credentials and Version

In [16]:
CLIENT_ID = 'EPMNER5ZQHRKWHKNT20E5KRGIMR4EJ1LJBXI0WYIGKLBAO1U' # your Foursquare ID
CLIENT_SECRET = 'C3IGSFDHTTQGXGQBRAGS1BFFLULAENNXG2EV2KMH04QBNN1W' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: EPMNER5ZQHRKWHKNT20E5KRGIMR4EJ1LJBXI0WYIGKLBAO1U
CLIENT_SECRET:C3IGSFDHTTQGXGQBRAGS1BFFLULAENNXG2EV2KMH04QBNN1W


#### Let's explore the first neighborhood in our dataframe.

In [17]:
neighbourhood_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = df.loc[0, 'District'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Santiago de Surco are -12.145406, -77.00475300000001.


In [18]:
df.loc[0, 'District']

'Santiago de Surco'

#### Now, let's get the top 100 venues that are in Santiago de Surco within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [19]:
# type your answer here

LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
url # display URL



'https://api.foursquare.com/v2/venues/explore?&client_id=EPMNER5ZQHRKWHKNT20E5KRGIMR4EJ1LJBXI0WYIGKLBAO1U&client_secret=C3IGSFDHTTQGXGQBRAGS1BFFLULAENNXG2EV2KMH04QBNN1W&v=20180605&ll=-12.145406,-77.00475300000001&radius=500&limit=100'

Send the GET request and examine the resutls

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d63c5d200bad70025b8cdc5'},
 'response': {'headerLocation': 'Santiago De Surco',
  'headerFullLocation': 'Santiago De Surco, Lima',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 14,
  'suggestedBounds': {'ne': {'lat': -12.140905995499995,
    'lng': -77.00015855928541},
   'sw': {'lat': -12.149906004500004, 'lng': -77.0093474407146}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '514f618fe4b066e101c45665',
       'name': 'El Loco del Ceviche',
       'location': {'lat': -12.146681021804822,
        'lng': -77.0073193656087,
        'labeledLatLngs': [{'label': 'display',
          'lat': -12.146681021804822,
          'lng': -77.0073193656087}],
        'distance': 313,
        'cc': 'PE',
        'country': 'Perú

#### Let's explore the first neighborhood in our dataframe.

From the Foursquare lab in the previous module, we know that all the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [21]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [52]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,El Loco del Ceviche,Seafood Restaurant,-12.146681,-77.007319
1,La Granja de Surco,Restaurant,-12.143982,-77.004145
2,Restaurant La Plaza de Surco,Restaurant,-12.144969,-77.004694
3,El Señorio - Surco,South American Restaurant,-12.144084,-77.003758
4,Veterinaria Surco Vet,Pet Store,-12.143442,-77.002368


In [23]:
nearby_venues.shape

(14, 4)

### Exploring all the Districts in Lima Central

#### Let's create a function to repeat the same process to all the districts in Lima Central

In [53]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *lima_venues*.

In [54]:
# type your answer here

lima_venues = getNearbyVenues(names=df['District'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )



Santiago de Surco
Surquillo
Barranco
Breña
Jesús María
La Victoria
Lima
Lince
Magdalena del Mar
Miraflores
Pueblo Libre
Rímac
San Borga
San Isidro
San Miguel


#### Let's check the size of the resulting dataframe

In [55]:
print(lima_venues.shape)
lima_venues.head()

(679, 7)


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Santiago de Surco,-12.145406,-77.004753,El Loco del Ceviche,-12.146681,-77.007319,Seafood Restaurant
1,Santiago de Surco,-12.145406,-77.004753,La Granja de Surco,-12.143982,-77.004145,Restaurant
2,Santiago de Surco,-12.145406,-77.004753,Restaurant La Plaza de Surco,-12.144969,-77.004694,Restaurant
3,Santiago de Surco,-12.145406,-77.004753,El Señorio - Surco,-12.144084,-77.003758,South American Restaurant
4,Santiago de Surco,-12.145406,-77.004753,Veterinaria Surco Vet,-12.143442,-77.002368,Pet Store


Let's check how many venues were returned for each neighborhood

In [56]:
lima_venues.groupby('District').count()

Unnamed: 0_level_0,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Barranco,100,100,100,100,100,100
Breña,23,23,23,23,23,23
Jesús María,30,30,30,30,30,30
La Victoria,14,14,14,14,14,14
Lima,51,51,51,51,51,51
Lince,49,49,49,49,49,49
Magdalena del Mar,51,51,51,51,51,51
Miraflores,100,100,100,100,100,100
Pueblo Libre,68,68,68,68,68,68
Rímac,9,9,9,9,9,9


#### Let's find out how many unique categories can be curated from all the returned venues

In [57]:
print('There are {} uniques categories.'.format(len(lima_venues['Venue Category'].unique())))

There are 153 uniques categories.


### Analyze Each District

In [58]:
# one hot encoding
lima_onehot = pd.get_dummies(lima_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
lima_onehot['District'] = lima_venues['District'] 

# move neighborhood column to the first column
fixed_columns = [lima_onehot.columns[-1]] + list(lima_onehot.columns[:-1])
lima_onehot = lima_onehot[fixed_columns]

lima_onehot.head()

Unnamed: 0,District,Arcade,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Casino,Chinese Restaurant,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Hobby Shop,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Latin American Restaurant,Lounge,Market,Martial Arts Dojo,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monastery,Motorcycle Shop,Museum,Music School,Music Venue,Nail Salon,Nightclub,Office,Organic Grocery,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plaza,Pub,Public Art,Restaurant,Rock Climbing Spot,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Trail,Vegetarian / Vegan Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Santiago de Surco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Santiago de Surco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Santiago de Surco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Santiago de Surco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Santiago de Surco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [30]:
lima_onehot.shape

(679, 154)

#### Next, let's group rows by district and by taking the mean of the frequency of occurrence of each category

In [59]:
lima_grouped = lima_onehot.groupby('District').mean().reset_index()
lima_grouped

Unnamed: 0,District,Arcade,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Casino,Chinese Restaurant,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Hobby Shop,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Latin American Restaurant,Lounge,Market,Martial Arts Dojo,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monastery,Motorcycle Shop,Museum,Music School,Music Venue,Nail Salon,Nightclub,Office,Organic Grocery,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plaza,Pub,Public Art,Restaurant,Rock Climbing Spot,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Trail,Vegetarian / Vegan Restaurant,Wine Bar,Wings Joint,Yoga Studio
0,Barranco,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.02,0.13,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.03,0.08,0.0,0.0,0.0,0.02,0.02,0.01,0.01,0.04,0.0,0.0,0.0,0.01,0.02,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0
1,Breña,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Jesús María,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.1,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0
3,La Victoria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.0,0.0,0.142857,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Lima,0.0,0.0,0.078431,0.0,0.0,0.0,0.0,0.058824,0.019608,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.058824,0.0,0.019608,0.0,0.058824,0.019608,0.098039,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0
5,Lince,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.081633,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.020408,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.040816,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.020408,0.040816,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.122449,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.040816,0.0,0.102041,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.0,0.0
6,Magdalena del Mar,0.0,0.0,0.0,0.0,0.019608,0.0,0.058824,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.039216,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.098039,0.0,0.098039,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Miraflores,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.08,0.0,0.01,0.02,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.01,0.05,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.07,0.0,0.0,0.03,0.01,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01
8,Pueblo Libre,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.044118,0.132353,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.014706,0.0,0.014706,0.0,0.0,0.0,0.073529,0.0,0.0,0.014706,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.044118,0.014706,0.029412,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.044118,0.0,0.0,0.029412,0.0,0.029412,0.0,0.029412,0.029412,0.029412,0.0,0.058824,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Rímac,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [60]:
lima_grouped.shape

(15, 154)

#### Let's print each district along with the top 5 most common venues

In [61]:
num_top_venues = 5

for hood in lima_grouped['District']:
    print("----"+hood+"----")
    temp = lima_grouped[lima_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barranco----
                 venue  freq
0                  Bar  0.13
1  Peruvian Restaurant  0.08
2   Seafood Restaurant  0.04
3           Restaurant  0.04
4                 Café  0.04


----Breña----
                 venue  freq
0           Restaurant  0.13
1  Fried Chicken Joint  0.13
2   Seafood Restaurant  0.09
3   Chinese Restaurant  0.09
4    Convenience Store  0.04


----Jesús María----
                 venue  freq
0  Peruvian Restaurant  0.17
1               Bakery  0.10
2                  Gym  0.07
3            BBQ Joint  0.07
4  Fried Chicken Joint  0.07


----La Victoria----
                venue  freq
0  Seafood Restaurant  0.21
1           Nightclub  0.14
2       Shopping Mall  0.14
3  Miscellaneous Shop  0.07
4           Cafeteria  0.07


----Lima----
                 venue  freq
0   Seafood Restaurant  0.10
1          Art Gallery  0.08
2           Restaurant  0.06
3               Bakery  0.06
4  Peruvian Restaurant  0.06


----Lince----
                 venue  freq

#### Let's put that into a *pandas* dataframe


First, let's write a function to sort the venues in descending order.

In [62]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [63]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['District'] = lima_grouped['District']

for ind in np.arange(lima_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(lima_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barranco,Bar,Peruvian Restaurant,Restaurant,Café,Seafood Restaurant,Cocktail Bar,Art Gallery,Performing Arts Venue,Ice Cream Shop,Japanese Restaurant
1,Breña,Fried Chicken Joint,Restaurant,Seafood Restaurant,Chinese Restaurant,South American Restaurant,Italian Restaurant,Dessert Shop,Breakfast Spot,Taco Place,High School
2,Jesús María,Peruvian Restaurant,Bakery,Gym,Park,BBQ Joint,Fried Chicken Joint,Chinese Restaurant,Convenience Store,Beer Garden,Burger Joint
3,La Victoria,Seafood Restaurant,Nightclub,Shopping Mall,Big Box Store,Peruvian Restaurant,Soccer Stadium,Latin American Restaurant,Miscellaneous Shop,Cafeteria,Furniture / Home Store
4,Lima,Seafood Restaurant,Art Gallery,Bakery,Sandwich Place,Restaurant,Peruvian Restaurant,Historic Site,Plaza,Indie Theater,Cocktail Bar


## Results and Discussion <a name="results"></a>

### Cluster Districts

Run *k*-means to cluster the neighborhood into 5 clusters.

In [67]:
# set number of clusters
kclusters = 5

lima_grouped_clustering = lima_grouped.drop('District', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lima_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 2, 1, 0, 0, 0, 0, 0, 3], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each District.

In [68]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

lima_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
lima_merged = lima_merged.join(neighborhoods_venues_sorted.set_index('District'), on='District')

lima_merged.head() # check the last columns!

Unnamed: 0,Postcode,Department,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,lima33,Lima,Santiago de Surco,-12.145406,-77.004753,4,Restaurant,Soccer Field,Plaza,Latin American Restaurant,Seafood Restaurant,Soccer Stadium,South American Restaurant,Supermarket,Pet Store,Cajun / Creole Restaurant
1,lima34,Lima,Surquillo,-12.117202,-77.020622,2,Seafood Restaurant,Peruvian Restaurant,Chinese Restaurant,Bakery,Burger Joint,Restaurant,Convenience Store,Gourmet Shop,BBQ Joint,Mobile Phone Shop
2,lima4,Lima,Barranco,-12.149707,-77.021276,0,Bar,Peruvian Restaurant,Restaurant,Café,Seafood Restaurant,Cocktail Bar,Art Gallery,Performing Arts Venue,Ice Cream Shop,Japanese Restaurant
3,lima5,Lima,Breña,-12.058713,-77.045959,2,Fried Chicken Joint,Restaurant,Seafood Restaurant,Chinese Restaurant,South American Restaurant,Italian Restaurant,Dessert Shop,Breakfast Spot,Taco Place,High School
4,lima11,Lima,Jesús María,-12.076734,-77.043904,2,Peruvian Restaurant,Bakery,Gym,Park,BBQ Joint,Fried Chicken Joint,Chinese Restaurant,Convenience Store,Beer Garden,Burger Joint


Finally, let's visualize the resulting clusters

In [69]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(lima_merged['Latitude'], lima_merged['Longitude'], lima_merged['District'], lima_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

In [70]:
lima_merged.loc[lima_merged['Cluster Labels'] == 0, lima_merged.columns[[1] + [2] + list(range(5, lima_merged.shape[1]))]]

Unnamed: 0,Department,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Lima,Barranco,0,Bar,Peruvian Restaurant,Restaurant,Café,Seafood Restaurant,Cocktail Bar,Art Gallery,Performing Arts Venue,Ice Cream Shop,Japanese Restaurant
6,Lima,Lima,0,Seafood Restaurant,Art Gallery,Bakery,Sandwich Place,Restaurant,Peruvian Restaurant,Historic Site,Plaza,Indie Theater,Cocktail Bar
7,Lima,Lince,0,Peruvian Restaurant,Seafood Restaurant,Chinese Restaurant,Bar,Restaurant,Japanese Restaurant,Sandwich Place,Hobby Shop,Café,Park
8,Lima,Magdalena del Mar,0,Italian Restaurant,Seafood Restaurant,Sandwich Place,Fried Chicken Joint,BBQ Joint,Park,Café,Chinese Restaurant,Peruvian Restaurant,Bakery
9,Lima,Miraflores,0,Bar,Hotel,Coffee Shop,Peruvian Restaurant,Cocktail Bar,Café,Sandwich Place,Ice Cream Shop,Italian Restaurant,Deli / Bodega
10,Lima,Pueblo Libre,0,Bar,Chinese Restaurant,Restaurant,Park,Japanese Restaurant,Bakery,Coffee Shop,Juice Bar,Peruvian Restaurant,Pharmacy
12,Lima,San Borga,0,Fast Food Restaurant,Park,Seafood Restaurant,Pastry Shop,Chinese Restaurant,Coffee Shop,Sandwich Place,Snack Place,Candy Store,Gym
13,Lima,San Isidro,0,Hotel,Peruvian Restaurant,Café,Italian Restaurant,Seafood Restaurant,Coffee Shop,Latin American Restaurant,Sushi Restaurant,Restaurant,Art Gallery


In [71]:
lima_merged.loc[lima_merged['Cluster Labels'] == 1, lima_merged.columns[[1] + [2] + list(range(5, lima_merged.shape[1]))]]

Unnamed: 0,Department,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Lima,La Victoria,1,Seafood Restaurant,Nightclub,Shopping Mall,Big Box Store,Peruvian Restaurant,Soccer Stadium,Latin American Restaurant,Miscellaneous Shop,Cafeteria,Furniture / Home Store


In [72]:
lima_merged.loc[lima_merged['Cluster Labels'] == 2, lima_merged.columns[[1] + [2] + list(range(5, lima_merged.shape[1]))]]

Unnamed: 0,Department,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Lima,Surquillo,2,Seafood Restaurant,Peruvian Restaurant,Chinese Restaurant,Bakery,Burger Joint,Restaurant,Convenience Store,Gourmet Shop,BBQ Joint,Mobile Phone Shop
3,Lima,Breña,2,Fried Chicken Joint,Restaurant,Seafood Restaurant,Chinese Restaurant,South American Restaurant,Italian Restaurant,Dessert Shop,Breakfast Spot,Taco Place,High School
4,Lima,Jesús María,2,Peruvian Restaurant,Bakery,Gym,Park,BBQ Joint,Fried Chicken Joint,Chinese Restaurant,Convenience Store,Beer Garden,Burger Joint
14,Lima,San Miguel,2,Bakery,Park,Chinese Restaurant,Fried Chicken Joint,Performing Arts Venue,Peruvian Restaurant,Concert Hall,Diner,Sandwich Place,Skate Park


In [73]:
lima_merged.loc[lima_merged['Cluster Labels'] == 3, lima_merged.columns[[1] + [2] + list(range(5, lima_merged.shape[1]))]]

Unnamed: 0,Department,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Lima,Rímac,3,Historic Site,Bakery,Seafood Restaurant,Sandwich Place,Museum,Bar,Department Store,Cocktail Bar,Park,Cupcake Shop


### Potential places of opening a new bar in Lima Central

Looking at our results, we percieve that the districts where are well known as to have crowed places for Bar Business and another related are: 

In [80]:
bar = lima_merged.loc[lima_merged['Cluster Labels'] == 0, lima_merged.columns[[1] + [2] + list(range(5, lima_merged.shape[1]))]]
bar

Unnamed: 0,Department,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Lima,Barranco,0,Bar,Peruvian Restaurant,Restaurant,Café,Seafood Restaurant,Cocktail Bar,Art Gallery,Performing Arts Venue,Ice Cream Shop,Japanese Restaurant
6,Lima,Lima,0,Seafood Restaurant,Art Gallery,Bakery,Sandwich Place,Restaurant,Peruvian Restaurant,Historic Site,Plaza,Indie Theater,Cocktail Bar
7,Lima,Lince,0,Peruvian Restaurant,Seafood Restaurant,Chinese Restaurant,Bar,Restaurant,Japanese Restaurant,Sandwich Place,Hobby Shop,Café,Park
8,Lima,Magdalena del Mar,0,Italian Restaurant,Seafood Restaurant,Sandwich Place,Fried Chicken Joint,BBQ Joint,Park,Café,Chinese Restaurant,Peruvian Restaurant,Bakery
9,Lima,Miraflores,0,Bar,Hotel,Coffee Shop,Peruvian Restaurant,Cocktail Bar,Café,Sandwich Place,Ice Cream Shop,Italian Restaurant,Deli / Bodega
10,Lima,Pueblo Libre,0,Bar,Chinese Restaurant,Restaurant,Park,Japanese Restaurant,Bakery,Coffee Shop,Juice Bar,Peruvian Restaurant,Pharmacy
12,Lima,San Borga,0,Fast Food Restaurant,Park,Seafood Restaurant,Pastry Shop,Chinese Restaurant,Coffee Shop,Sandwich Place,Snack Place,Candy Store,Gym
13,Lima,San Isidro,0,Hotel,Peruvian Restaurant,Café,Italian Restaurant,Seafood Restaurant,Coffee Shop,Latin American Restaurant,Sushi Restaurant,Restaurant,Art Gallery


The districts where the mean activity business is Bar are:

In [85]:
Bar = bar.loc[[2,9,10]]
Bar

Unnamed: 0,Department,District,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Lima,Barranco,0,Bar,Peruvian Restaurant,Restaurant,Café,Seafood Restaurant,Cocktail Bar,Art Gallery,Performing Arts Venue,Ice Cream Shop,Japanese Restaurant
9,Lima,Miraflores,0,Bar,Hotel,Coffee Shop,Peruvian Restaurant,Cocktail Bar,Café,Sandwich Place,Ice Cream Shop,Italian Restaurant,Deli / Bodega
10,Lima,Pueblo Libre,0,Bar,Chinese Restaurant,Restaurant,Park,Japanese Restaurant,Bakery,Coffee Shop,Juice Bar,Peruvian Restaurant,Pharmacy


It seems that Miraflores is a excepcional place where you can accommodate and party near your hotel.
On the other hand, Barranco and Pueblo Libre are exceptional in 4

## Conclusion <a name="conclusion"></a>

As a result, people are turning to big cities to start a business or work. For this reason, people can achieve better outcomes through their access to the platforms where such information is provided.

Not only for investors but also city managers can manage the city more regularly by using similar data analysis types or platforms.

To the future,