# Applied Data Science Capstone Project #
## Real-estate: Helping an undecided customer. ##




In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import; pandas.io.json.json_normalize is deprecated)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Introduction.

#### Business problem.

A customer emailed us about buying a holiday property in the Algarve. He was not sure which area he would like to buy in, but he described what he is looking for:

* The client stated that he would like to live very close to a tourist area that had several outdoor leisure options.
* The client said that he would not like to live near bars or loud places at night.
* The client would not like to live in isolation or in places that are far from restaurants and other leisure options.
* The client would like to live in a luxury area.

As a real estate office supervisor, one of my duties is to support the real estate agents by delivering a clear report on the client and his requirements.


In [3]:
address = 'algarve'

geolocator = Nominatim(user_agent="pt_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Algarve are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Algarve are 37.2454248, -8.150925273079235.


Algarve Map

In [4]:
map_algarve = folium.Map(location=[latitude, longitude], zoom_start=9)

map_algarve

### Data Section.
#### Data Sources.
Foursquare API.

In [5]:
# @hidden_cell

CLIENT_ID = 'FBLHKZM4GHR104ZQSPDBIUP2HASHOQOLCSIP0MYCUW2KAJYP' # your Foursquare ID
CLIENT_SECRET = 'UWQ0NO1MZMSX0O3WNMSCXO2UXTFMPWBBW1QQ5O3GGKORDGLV' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FBLHKZM4GHR104ZQSPDBIUP2HASHOQOLCSIP0MYCUW2KAJYP
CLIENT_SECRET:UWQ0NO1MZMSX0O3WNMSCXO2UXTFMPWBBW1QQ5O3GGKORDGLV
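
Hard-coding and printing credentials makes them visible to anyone who reads the notebook. A minimal sketch of one alternative, assuming the keys are stored in environment variables named `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (hypothetical names, not part of the original notebook):

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * (len(secret) - visible)
```

With this in place, a cell could print `mask(CLIENT_SECRET)` instead of the full secret.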


#### Data Requirements.
To make a good report, we will first collect information from Foursquare about all outdoor leisure places in the Algarve. Next, we will clean this data and identify the cities with the most outdoor leisure options, then discard every outdoor leisure place that is not in one of these cities. With the cities and their outdoor leisure places defined, we will collect more Foursquare data about what surrounds each of these places, analyze it, and determine which places are the best reference points for a property search based on the client's requirements.

#### Data Cleaning.
Defining the desired locations for the analysis.

1.	Perform a “Get Request” in Foursquare API for all “outdoor fun” venues in Algarve.

In [6]:
query = 'Outdoor%20Fun' # "Outdoor fun" search filter.
sw = '35.92242%2C-9.765472' # SW Limit of the venues search square.
ne = '37.352693%2C-7.463837' # NE Limit of the venues search square.
LIMIT = '200' # limit of number of venues returned by Foursquare API

In [7]:
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ne={}&sw={}&query={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    ne,
    sw,
    query,
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=FBLHKZM4GHR104ZQSPDBIUP2HASHOQOLCSIP0MYCUW2KAJYP&client_secret=UWQ0NO1MZMSX0O3WNMSCXO2UXTFMPWBBW1QQ5O3GGKORDGLV&v=20180605&ne=37.352693%2C-7.463837&sw=35.92242%2C-9.765472&query=Outdoor%20Fun&limit=200'
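
The `%20` and `%2C` sequences in the parameters above are percent-encoded spaces and commas. As a sketch, the same query string can be built from plain values with the standard library's `urllib.parse.urlencode`. The helper name `build_explore_url` and the placeholder credentials are illustrative assumptions:

```python
from urllib.parse import urlencode, quote

def build_explore_url(client_id, client_secret, version, ne, sw, query, limit):
    """Build a venues/explore URL from unencoded parameter values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'ne': ne,        # e.g. '37.352693,-7.463837'
        'sw': sw,        # e.g. '35.92242,-9.765472'
        'query': query,  # e.g. 'Outdoor Fun'
        'limit': limit,
    }
    # quote_via=quote encodes spaces as %20 (the default quote_plus would use '+')
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, quote_via=quote)

url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                        '37.352693,-7.463837', '35.92242,-9.765472',
                        'Outdoor Fun', 200)
```

Keeping the raw values unencoded makes the search box and the query text easier to read and change.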

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eb2fdc5d03993001b8ca0b6'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'query': 'outdoor fun',
  'totalResults': 139,
  'suggestedBounds': {'ne': {'lat': 37.352693, 'lng': -7.463837},
   'sw': {'lat': 35.92242, 'lng': -9.765472}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d1f48a9bdd7a0934093e4ce',
       'name': 'Ponta da Piedade',
       'location': {'address': 'EM536',
        'lat': 37.080517008015015,
        'lng': -8.669393062591553,
        'labeledLatLngs': [{'label': 'display',
          'lat': 37.080517008015015,
     

2.	Create a table with the name of the outdoor_venue, the city of the venue, o_latitude and o_longitude.

In [9]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [10]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.city', 'venue.location.state', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]


In [11]:
nearby_venues.dropna(axis=0, inplace=True)  # drop rows with missing values
# rename in place; rename(..., inplace=True) returns None, so its result is not assigned
nearby_venues.rename(columns={"name": "outdoor_venue", "lat": "o_latitude", "lng": "o_longitude"}, inplace=True)
nearby_venues['city'] = nearby_venues['city'].str.replace('Armação de Pêra', 'Armação de Pera')
nearby_venues['state'] = 'Algarve'  # set every state value to 'Algarve'
outdoor_venues = nearby_venues.drop(['state', 'categories'], axis=1)
outdoor_venues.drop(98, inplace=True)  # drop the row with index label 98
outdoor_venues

Unnamed: 0,outdoor_venue,city,o_latitude,o_longitude
0,Ponta da Piedade,Lagos,37.080517,-8.669393
1,Praia Prainha,Alvor,37.118533,-8.578629
2,Praia de Vale Centianes,Carvoeiro,37.091755,-8.454089
3,Praia da Arrifana,Arrifana,37.294429,-8.865646
5,Praia do Ancão,Loulé,37.034009,-8.039484
6,Praia da Fuseta,Olhão,37.051619,-7.743901
7,Praia do Pinhão,Lagos,37.095606,-8.670456
8,Praia Vale do Garrão,Loulé,37.041268,-8.051114
9,Praia de São Rafael,Albufeira,37.074662,-8.28058
10,Ilha do Farol,Faro,36.976794,-7.865472


3.	Count the venues per city to find the 5 cities with the most venues.

In [12]:
df = outdoor_venues.groupby('city').count()
df['outdoor_venue'].sort_values(ascending=False).head(5)

city
Albufeira    14
Portimão      8
Lagoa         6
Lagos         6
Sagres        5
Name: outdoor_venue, dtype: int64

4.	Filter the table from step 2 to keep only the cities found in step 3.

In [13]:
filter_list = ['Albufeira', 'Portimão', 'Lagoa', 'Lagos', 'Sagres']
outdoor_venues = outdoor_venues[outdoor_venues.city.isin(filter_list)]
outdoor_venues

Unnamed: 0,outdoor_venue,city,o_latitude,o_longitude
0,Ponta da Piedade,Lagos,37.080517,-8.669393
7,Praia do Pinhão,Lagos,37.095606,-8.670456
9,Praia de São Rafael,Albufeira,37.074662,-8.28058
13,Praia Dona Ana,Lagos,37.091669,-8.669655
14,Praia dos Três Castelos,Portimão,37.118081,-8.548125
16,Farol do Cabo de São Vicente,Sagres,37.023229,-8.995989
17,Praia Porto de Mós,Lagos,37.085577,-8.689277
18,Praia do Camilo,Lagos,37.087583,-8.668481
20,Praia da Rocha,Portimão,37.11922,-8.540215
24,Praia da Galé,Albufeira,37.081561,-8.316407


Creating the analysis table.
5.	Perform a “Get Request” in Foursquare API for each “outdoor venue” from the table of step 4.

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['outdoor_venue', 
                  'o_latitude', 
                  'o_longitude', 
                  'n_venue', 
                  'n_venue_latitude', 
                  'n_venue_longitude', 
                  'n_venue_Category']
    
    return(nearby_venues)
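
Note that `v['venue']['categories'][0]['name']` raises an `IndexError` for any venue returned without a category. A hedged sketch of a helper that tolerates this case follows. `first_category_name` is a hypothetical name, and the sample dictionaries only mimic the shape of the response items shown earlier:

```python
def first_category_name(item, default=None):
    """Return the name of the first category of an explore item, or `default`."""
    categories = item.get('venue', {}).get('categories') or []
    return categories[0]['name'] if categories else default

sample_items = [
    {'venue': {'name': 'Ponta da Piedade', 'categories': [{'name': 'Scenic Lookout'}]}},
    {'venue': {'name': 'Uncategorized place', 'categories': []}},  # made-up entry
]
names = [first_category_name(item, default='Unknown') for item in sample_items]
```

Inside `getNearbyVenues`, such a helper could stand in for the direct index lookup so that one uncategorized venue does not abort the whole request loop.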

In [15]:
Algave_selected_venues = getNearbyVenues(names=outdoor_venues['outdoor_venue'],
                                   latitudes=outdoor_venues['o_latitude'],
                                   longitudes=outdoor_venues['o_longitude']
                                  )

Ponta da Piedade
Praia do Pinhão
Praia de São Rafael
Praia Dona Ana
Praia dos Três Castelos
Farol do Cabo de São Vicente
Praia Porto de Mós
Praia do Camilo
Praia da Rocha
Praia da Galé
Praia dos Três Irmãos
Praia da Falésia
Praia do Vau
Praia do Beliche
Praia Barranco das Belharucas
Praia do Tonel
Cabo de Sao Vicente
Praia do Alemão
Praia do Peneco
Praia Senhora Da Rocha
Praia do Carvoeiro
Praia do Lourenço
Meia Praia
Praia dos Salgados
Forte de Santa Catarina
Praia da Restinga
Praia dos Arrifes
Praia da Coelha
Praia Grande
Praia do Pintadinho
Praia do Castelo
Praia da Mareta
Praia de Benagil
Praia dos Pescadores
Herdade dos Salgados Resort
Praia dos Caneiros
Miradouro 3 Castelos
Praia Maria Luisa
Praia dos Aveiros


6.	Create a table with each outdoor venue's name and coordinates, plus the name, category, and coordinates of every venue found around it.

In [16]:
Algave_selected_venues

Unnamed: 0,outdoor_venue,o_latitude,o_longitude,n_venue,n_venue_latitude,n_venue_longitude,n_venue_Category
0,Ponta da Piedade,37.080517,-8.669393,Ponta da Piedade,37.080517,-8.669393,Scenic Lookout
1,Ponta da Piedade,37.080517,-8.669393,Farol da Ponta da Piedade,37.080953,-8.669381,Lighthouse
2,Ponta da Piedade,37.080517,-8.669393,Restaurante Bar Sol Nascente,37.08122,-8.66943,Bar
3,Ponta da Piedade,37.080517,-8.669393,Grotto Boat Trip,37.081252,-8.669457,Boat or Ferry
4,Ponta da Piedade,37.080517,-8.669393,Praia da Balanca,37.083013,-8.667874,Beach
5,Ponta da Piedade,37.080517,-8.669393,Praia dos Pinheiros,37.084232,-8.667274,Beach
6,Praia do Pinhão,37.095606,-8.670456,Praia do Pinhão,37.095606,-8.670456,Beach
7,Praia do Pinhão,37.095606,-8.670456,Iberlagos,37.093055,-8.669962,Resort
8,Praia do Pinhão,37.095606,-8.670456,Pro Putting Garden,37.097574,-8.673841,Mini Golf
9,Praia do Pinhão,37.095606,-8.670456,Qwazi - Natura Snack Bar,37.094568,-8.672263,Lounge


### Exploratory Data Analysis.

1. Create the analysis table.

In [17]:
# one hot encoding
Algave_selected_venues_onehot = pd.get_dummies(Algave_selected_venues[['n_venue_Category']], prefix="", prefix_sep="")

# add Outdoor Fun venue column back to dataframe
Algave_selected_venues_onehot['outdoor_venue'] = Algave_selected_venues['outdoor_venue'] 

# move Outdoor Fun venue column to the first column
fixed_columns = [Algave_selected_venues_onehot.columns[-1]] + list(Algave_selected_venues_onehot.columns[:-1])
Algave_selected_venues_onehot = Algave_selected_venues_onehot[fixed_columns]

Algave_selected_venues_onehot.head(5)

Unnamed: 0,outdoor_venue,American Restaurant,Arts & Crafts Store,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Beach Bar,Beer Garden,Bistro,Board Shop,Boat or Ferry,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Casino,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie,Diner,Dive Shop,Food,Food Truck,Fountain,French Restaurant,Gastropub,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hookah Bar,Hostel,Hotel,Hotel Pool,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Laundry Service,Lighthouse,Lounge,Mediterranean Restaurant,Mexican Restaurant,Mini Golf,Modern European Restaurant,Nightclub,Outdoors & Recreation,Park,Pizza Place,Plaza,Pool,Portuguese Restaurant,Resort,Restaurant,Rock Club,Roof Deck,Sandwich Place,Scenic Lookout,Seafood Restaurant,Snack Place,Soup Place,Speakeasy,Sports Bar,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Tour Provider
0,Ponta da Piedade,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
1,Ponta da Piedade,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ponta da Piedade,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ponta da Piedade,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ponta da Piedade,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


2.	Transform the analysis table into a frequency table.

In [18]:
Algave_selected_venues_grouped = Algave_selected_venues_onehot.groupby('outdoor_venue').mean().reset_index()
print(Algave_selected_venues_grouped.shape)
Algave_selected_venues_grouped.head()

(39, 81)


Unnamed: 0,outdoor_venue,American Restaurant,Arts & Crafts Store,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Beach Bar,Beer Garden,Bistro,Board Shop,Boat or Ferry,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Casino,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie,Diner,Dive Shop,Food,Food Truck,Fountain,French Restaurant,Gastropub,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hookah Bar,Hostel,Hotel,Hotel Pool,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Laundry Service,Lighthouse,Lounge,Mediterranean Restaurant,Mexican Restaurant,Mini Golf,Modern European Restaurant,Nightclub,Outdoors & Recreation,Park,Pizza Place,Plaza,Pool,Portuguese Restaurant,Resort,Restaurant,Rock Club,Roof Deck,Sandwich Place,Scenic Lookout,Seafood Restaurant,Snack Place,Soup Place,Speakeasy,Sports Bar,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Tour Provider
0,Cabo de Sao Vicente,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Farol do Cabo de São Vicente,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Forte de Santa Catarina,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.1,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.1,0.0,0.266667,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Herdade dos Salgados Resort,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.357143,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0
4,Meia Praia,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
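
To see what the frequency table measures, here is a minimal sketch on made-up data (the venue names and categories below are invented for illustration). Each value is the share of a category among all venues found around one outdoor venue, so every row sums to 1.

```python
import pandas as pd

# toy stand-in for Algave_selected_venues (invented rows)
toy = pd.DataFrame({
    'outdoor_venue': ['Beach A', 'Beach A', 'Beach A', 'Beach A', 'Beach B', 'Beach B'],
    'n_venue_Category': ['Bar', 'Bar', 'Hotel', 'Beach', 'Beach', 'Restaurant'],
})

# one column per category, 1.0 where the row belongs to that category
onehot = pd.get_dummies(toy['n_venue_Category'], dtype=float)
onehot.insert(0, 'outdoor_venue', toy['outdoor_venue'])

# the mean of 0/1 indicators is the relative frequency of each category
freq = onehot.groupby('outdoor_venue').mean()
print(freq)
```

In this toy case half of the venues around "Beach A" are bars, which is the kind of signal the cluster analysis later relies on.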


3. Check the 5 most common venue categories around each outdoor leisure location.

In [19]:
num_top_venues = 5

for hood in Algave_selected_venues_grouped['outdoor_venue']:
    print("----"+hood+"----")
    temp = Algave_selected_venues_grouped[Algave_selected_venues_grouped['outdoor_venue'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Cabo de Sao Vicente----
                      venue  freq
0                Food Truck  0.33
1            Scenic Lookout  0.33
2                Lighthouse  0.33
3  Mediterranean Restaurant  0.00
4                      Park  0.00


----Farol do Cabo de São Vicente----
                      venue  freq
0                Food Truck  0.33
1            Scenic Lookout  0.33
2                Lighthouse  0.33
3  Mediterranean Restaurant  0.00
4                      Park  0.00


----Forte de Santa Catarina----
                   venue  freq
0             Restaurant  0.27
1                 Lounge  0.10
2                  Hotel  0.10
3  Portuguese Restaurant  0.10
4            Pizza Place  0.07


----Herdade dos Salgados Resort----
                venue  freq
0               Hotel  0.36
1          Restaurant  0.14
2         Coffee Shop  0.07
3    Sushi Restaurant  0.07
4  Italian Restaurant  0.07


----Meia Praia----
                      venue  freq
0                Hotel Pool  0.25
1         

### Clustering using K-Means method for analysis.

#### Clustering.

1. Create a DataFrame that ranks the venue categories around each outdoor venue by frequency.

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['outdoor_venue']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue near'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue near'.format(ind+1))

# create a new dataframe
Algave_selected_venues_sorted = pd.DataFrame(columns=columns)
Algave_selected_venues_sorted['outdoor_venue'] = Algave_selected_venues_grouped['outdoor_venue']

for ind in np.arange(Algave_selected_venues_grouped.shape[0]):
    Algave_selected_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Algave_selected_venues_grouped.iloc[ind, :], num_top_venues)

Algave_selected_venues_sorted.head()

Unnamed: 0,outdoor_venue,1st Most Common Venue near,2nd Most Common Venue near,3rd Most Common Venue near,4th Most Common Venue near,5th Most Common Venue near,6th Most Common Venue near,7th Most Common Venue near,8th Most Common Venue near,9th Most Common Venue near,10th Most Common Venue near
0,Cabo de Sao Vicente,Lighthouse,Food Truck,Scenic Lookout,Tour Provider,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie
1,Farol do Cabo de São Vicente,Lighthouse,Food Truck,Scenic Lookout,Tour Provider,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie
2,Forte de Santa Catarina,Restaurant,Hotel,Lounge,Portuguese Restaurant,Pizza Place,Harbor / Marina,Bakery,Bar,Dive Shop,Seafood Restaurant
3,Herdade dos Salgados Resort,Hotel,Restaurant,Italian Restaurant,Golf Course,Sushi Restaurant,Beach,Resort,Café,Coffee Shop,Creperie
4,Meia Praia,Gym,Hotel Pool,Beach,Beach Bar,Tour Provider,Dive Shop,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop


2.	Dividing the data into clusters.

In [22]:
# set number of clusters
kclusters = 5

Algave_selected_venues_grouped_clustering = Algave_selected_venues_grouped.drop('outdoor_venue', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Algave_selected_venues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([4, 4, 0, 1, 3, 0, 3, 1, 1, 0, 1, 1, 1, 3, 3, 1, 0, 0, 0, 0, 1, 2,
       0, 1, 0, 3, 2, 0, 0, 3, 3, 1, 1, 1, 3, 0, 1, 1, 1])
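
The value `kclusters = 5` is set by hand. One common way to sanity-check such a choice, not used in the original analysis, is to compare silhouette scores for several values of k. A minimal sketch on synthetic data, assuming scikit-learn's `silhouette_score`; on the real data, `X` would be `Algave_selected_venues_grouped_clustering`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data: three tight, well-separated groups of 2-D points
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.1 * rng.randn(10, 2) for c in centers])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

With only 39 outdoor venues, such a score is a rough guide at best, and the clusters still need to be read against the client's requirements.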

In [23]:
# add clustering labels
Algave_selected_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Algave_selected_venues_merged = outdoor_venues

# join the cluster labels and ranked categories onto outdoor_venues, which holds the city and latitude/longitude of each outdoor venue
Algave_selected_venues_merged = Algave_selected_venues_merged.join(Algave_selected_venues_sorted.set_index('outdoor_venue'), on='outdoor_venue')

In [24]:
Algave_selected_venues

Unnamed: 0,outdoor_venue,o_latitude,o_longitude,n_venue,n_venue_latitude,n_venue_longitude,n_venue_Category
0,Ponta da Piedade,37.080517,-8.669393,Ponta da Piedade,37.080517,-8.669393,Scenic Lookout
1,Ponta da Piedade,37.080517,-8.669393,Farol da Ponta da Piedade,37.080953,-8.669381,Lighthouse
2,Ponta da Piedade,37.080517,-8.669393,Restaurante Bar Sol Nascente,37.08122,-8.66943,Bar
3,Ponta da Piedade,37.080517,-8.669393,Grotto Boat Trip,37.081252,-8.669457,Boat or Ferry
4,Ponta da Piedade,37.080517,-8.669393,Praia da Balanca,37.083013,-8.667874,Beach
5,Ponta da Piedade,37.080517,-8.669393,Praia dos Pinheiros,37.084232,-8.667274,Beach
6,Praia do Pinhão,37.095606,-8.670456,Praia do Pinhão,37.095606,-8.670456,Beach
7,Praia do Pinhão,37.095606,-8.670456,Iberlagos,37.093055,-8.669962,Resort
8,Praia do Pinhão,37.095606,-8.670456,Pro Putting Garden,37.097574,-8.673841,Mini Golf
9,Praia do Pinhão,37.095606,-8.670456,Qwazi - Natura Snack Bar,37.094568,-8.672263,Lounge


#### Cluster analysis.

##### Cluster 0:

In [25]:
Algave_selected_venues_merged.loc[Algave_selected_venues_merged['Cluster Labels'] == 0, Algave_selected_venues_merged.columns[[0,1] + list(range(5, Algave_selected_venues_merged.shape[1]))]]

Unnamed: 0,outdoor_venue,city,1st Most Common Venue near,2nd Most Common Venue near,3rd Most Common Venue near,4th Most Common Venue near,5th Most Common Venue near,6th Most Common Venue near,7th Most Common Venue near,8th Most Common Venue near,9th Most Common Venue near,10th Most Common Venue near
7,Praia do Pinhão,Lagos,Hotel,Portuguese Restaurant,Restaurant,Beach,Breakfast Spot,Bar,Pizza Place,Seafood Restaurant,Lounge,Burger Joint
20,Praia da Rocha,Portimão,Bar,Portuguese Restaurant,Hotel,Restaurant,Ice Cream Shop,Nightclub,Bistro,Italian Restaurant,Pizza Place,Cocktail Bar
31,Praia do Beliche,Sagres,Portuguese Restaurant,Beach,Coffee Shop,Snack Place,Tour Provider,Dive Shop,Castle,Cave,Chinese Restaurant,Cocktail Bar
37,Praia do Peneco,Albufeira,Portuguese Restaurant,Bar,Restaurant,Hotel,Mediterranean Restaurant,Seafood Restaurant,Plaza,Scenic Lookout,Italian Restaurant,Nightclub
43,Praia do Carvoeiro,Lagoa,Portuguese Restaurant,Bar,Seafood Restaurant,Restaurant,Beach,Ice Cream Shop,Mediterranean Restaurant,Coffee Shop,Diner,Café
53,Forte de Santa Catarina,Portimão,Restaurant,Hotel,Lounge,Portuguese Restaurant,Pizza Place,Harbor / Marina,Bakery,Bar,Dive Shop,Seafood Restaurant
59,Praia da Restinga,Portimão,Seafood Restaurant,Portuguese Restaurant,Bar,Restaurant,Café,Mediterranean Restaurant,Italian Restaurant,Snack Place,Boat or Ferry,Food
63,Praia Grande,Lagoa,Tapas Restaurant,Bar,Beach,Beach Bar,Tour Provider,Food,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop
69,Praia da Mareta,Sagres,Portuguese Restaurant,Pizza Place,Burger Joint,Café,Cocktail Bar,Bar,Restaurant,Hotel,Board Shop,Laundry Service
71,Praia de Benagil,Lagoa,Seafood Restaurant,Beach,Cave,Portuguese Restaurant,Tour Provider,BBQ Joint,Golf Course,Diner,Gym,Chinese Restaurant


Due to the high number of bars in the top positions of the ranking, this cluster is not suitable for the client.

##### Cluster 1:

In [26]:
Algave_selected_venues_merged.loc[Algave_selected_venues_merged['Cluster Labels'] == 1, Algave_selected_venues_merged.columns[[0,1] + list(range(5, Algave_selected_venues_merged.shape[1]))]]

Unnamed: 0,outdoor_venue,city,1st Most Common Venue near,2nd Most Common Venue near,3rd Most Common Venue near,4th Most Common Venue near,5th Most Common Venue near,6th Most Common Venue near,7th Most Common Venue near,8th Most Common Venue near,9th Most Common Venue near,10th Most Common Venue near
9,Praia de São Rafael,Albufeira,Beach,Mediterranean Restaurant,Gym,Modern European Restaurant,Portuguese Restaurant,Hotel,Diner,Cave,Chinese Restaurant,Cocktail Bar
13,Praia Dona Ana,Lagos,Hotel,Restaurant,Beach,Portuguese Restaurant,Chinese Restaurant,Scenic Lookout,Seafood Restaurant,Snack Place,Pizza Place,Resort
14,Praia dos Três Castelos,Portimão,Hotel,Italian Restaurant,Portuguese Restaurant,Restaurant,Seafood Restaurant,Mediterranean Restaurant,Pizza Place,Japanese Restaurant,Scenic Lookout,Café
17,Praia Porto de Mós,Lagos,Seafood Restaurant,Hotel,Resort,Beach,Dive Shop,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop
18,Praia do Camilo,Lagos,Hotel,Beach,Seafood Restaurant,Bar,Snack Place,Scenic Lookout,Food,Cave,Chinese Restaurant,Cocktail Bar
24,Praia da Galé,Albufeira,Hotel,Beach,Pizza Place,Portuguese Restaurant,Italian Restaurant,Café,Seafood Restaurant,Supermarket,French Restaurant,Fountain
26,Praia dos Três Irmãos,Portimão,Hotel,Seafood Restaurant,Beach,Portuguese Restaurant,Bowling Alley,Mediterranean Restaurant,Hotel Pool,Resort,Restaurant,Tour Provider
30,Praia do Vau,Portimão,Hotel,Beach,Café,Mediterranean Restaurant,Dive Shop,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie
32,Praia Barranco das Belharucas,Albufeira,Hotel,Bar,Restaurant,Gastropub,Beach,Fountain,Pool,Seafood Restaurant,Breakfast Spot,Resort
40,Praia Senhora Da Rocha,Lagoa,Bar,Beach,Hotel,Snack Place,Ice Cream Shop,Resort,Italian Restaurant,Restaurant,Tour Provider,Diner


Given the high number of hotels, this cluster may not appeal to the client, but it offers many other options for outdoor leisure.

##### Cluster 2:

In [27]:
Algave_selected_venues_merged.loc[Algave_selected_venues_merged['Cluster Labels'] == 2, Algave_selected_venues_merged.columns[[0,1] + list(range(5, Algave_selected_venues_merged.shape[1]))]]

Unnamed: 0,outdoor_venue,city,1st Most Common Venue near,2nd Most Common Venue near,3rd Most Common Venue near,4th Most Common Venue near,5th Most Common Venue near,6th Most Common Venue near,7th Most Common Venue near,8th Most Common Venue near,9th Most Common Venue near,10th Most Common Venue near
36,Praia do Alemão,Portimão,Beach,Hotel,Restaurant,Tour Provider,Dive Shop,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie
46,Praia do Lourenço,Albufeira,Beach,Portuguese Restaurant,Tour Provider,Dive Shop,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie


Given the many outdoor leisure options, this cluster is recommended for our client.

##### Cluster 3:

In [28]:
Algave_selected_venues_merged.loc[Algave_selected_venues_merged['Cluster Labels'] == 3, Algave_selected_venues_merged.columns[[0,1] + list(range(5, Algave_selected_venues_merged.shape[1]))]]

Unnamed: 0,outdoor_venue,city,1st Most Common Venue near,2nd Most Common Venue near,3rd Most Common Venue near,4th Most Common Venue near,5th Most Common Venue near,6th Most Common Venue near,7th Most Common Venue near,8th Most Common Venue near,9th Most Common Venue near,10th Most Common Venue near
0,Ponta da Piedade,Lagos,Beach,Scenic Lookout,Bar,Lighthouse,Boat or Ferry,Tour Provider,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie
29,Praia da Falésia,Albufeira,Beach,Restaurant,Seafood Restaurant,Thai Restaurant,Lounge,Speakeasy,Beer Garden,Diner,Cave,Chinese Restaurant
33,Praia do Tonel,Sagres,Seafood Restaurant,Beach,Ice Cream Shop,Castle,Arts & Crafts Store,Grocery Store,Dive Shop,Cave,Chinese Restaurant,Cocktail Bar
51,Meia Praia,Lagos,Gym,Hotel Pool,Beach,Beach Bar,Tour Provider,Dive Shop,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop
61,Praia da Coelha,Albufeira,Beach,Restaurant,Portuguese Restaurant,Hotel Pool,Resort,Tour Provider,Creperie,Castle,Cave,Chinese Restaurant
65,Praia do Pintadinho,Lagoa,Beach,Surf Spot,Cocktail Bar,Lighthouse,Tour Provider,Dive Shop,Cave,Chinese Restaurant,Coffee Shop,Creperie
67,Praia do Castelo,Albufeira,Beach,Restaurant,Seafood Restaurant,Portuguese Restaurant,Resort,Diner,Castle,Cave,Chinese Restaurant,Cocktail Bar
79,Praia dos Caneiros,Lagoa,Seafood Restaurant,Surf Spot,Beach,Food,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie


With many leisure options and few bars, this cluster seems to be the best choice.

##### Cluster 4:

In [29]:
Algave_selected_venues_merged.loc[Algave_selected_venues_merged['Cluster Labels'] == 4, Algave_selected_venues_merged.columns[[0,1] + list(range(5, Algave_selected_venues_merged.shape[1]))]]

Unnamed: 0,outdoor_venue,city,1st Most Common Venue near,2nd Most Common Venue near,3rd Most Common Venue near,4th Most Common Venue near,5th Most Common Venue near,6th Most Common Venue near,7th Most Common Venue near,8th Most Common Venue near,9th Most Common Venue near,10th Most Common Venue near
16,Farol do Cabo de São Vicente,Sagres,Lighthouse,Food Truck,Scenic Lookout,Tour Provider,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie
35,Cabo de Sao Vicente,Sagres,Lighthouse,Food Truck,Scenic Lookout,Tour Provider,Castle,Cave,Chinese Restaurant,Cocktail Bar,Coffee Shop,Creperie


Given the lack of restaurants and other infrastructure, this cluster is not recommended.
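
The cluster assessments above are made by reading the ranked tables. As a sketch of how the same judgement could be quantified, the mean share of nightlife and dining categories can be computed per cluster from the frequency table. The data below is invented, and the grouping of categories into `nightlife` and `dining` is an assumption, not part of the original analysis:

```python
import pandas as pd

# toy frequency table with cluster labels (invented values)
freq = pd.DataFrame({
    'outdoor_venue': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 0, 3, 3],
    'Bar': [0.30, 0.20, 0.05, 0.00],
    'Nightclub': [0.10, 0.00, 0.00, 0.00],
    'Beach': [0.10, 0.20, 0.50, 0.40],
    'Restaurant': [0.50, 0.60, 0.45, 0.60],
})

nightlife = ['Bar', 'Nightclub']   # assumed "loud at night" categories
dining = ['Restaurant']            # assumed dining categories

# per-venue shares, then averaged within each cluster
summary = pd.DataFrame({
    'Cluster Labels': freq['Cluster Labels'],
    'nightlife_share': freq[nightlife].sum(axis=1),
    'dining_share': freq[dining].sum(axis=1),
}).groupby('Cluster Labels').mean()
print(summary)
```

A cluster with a low `nightlife_share` and a non-trivial `dining_share` would match the client's wish to avoid loud places without being isolated.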

### Conclusion.

Analyzing the clusters, we can determine that the best reference locations to search for properties for the client are:

In [30]:
Recomendations = Algave_selected_venues_merged[['outdoor_venue','city','o_latitude','o_longitude']][Algave_selected_venues_merged['Cluster Labels'] == 3]
Recomendations

Unnamed: 0,outdoor_venue,city,o_latitude,o_longitude
0,Ponta da Piedade,Lagos,37.080517,-8.669393
29,Praia da Falésia,Albufeira,37.07502,-8.130998
33,Praia do Tonel,Sagres,37.005243,-8.947999
51,Meia Praia,Lagos,37.108176,-8.662033
61,Praia da Coelha,Albufeira,37.073975,-8.293565
65,Praia do Pintadinho,Lagoa,37.107763,-8.518518
67,Praia do Castelo,Albufeira,37.073481,-8.298519
79,Praia dos Caneiros,Lagoa,37.105142,-8.513445


Map of Locations

In [31]:
Algarve_conclusion_map = folium.Map(
    location=[37.105142 , -8.513445],
    zoom_start=10,
    tiles='Stamen Terrain'
)


for lat, lng, label in zip(Recomendations['o_latitude'], Recomendations['o_longitude'], Recomendations['outdoor_venue']):
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(Algarve_conclusion_map)

Algarve_conclusion_map