# Report on Beach Venues

## Introduction
During summer vacation, it is common to visit beaches and their seaside venues to enjoy the warm weather. Many famous beaches have been developed over many years and are in a continuous process of renewing and improving their venues.<br>
This report aims to evaluate the top beaches of the world and the kinds of venues they have in common, to help guide future beach developers and city planners to invest in the venues most likely to increase tourism.

## Data Description
Data was collected from different sources:
* To identify the best beaches in the world, data was collected from the TripAdvisor website. Annually, TripAdvisor publishes the Travellers' Choice awards, which rank the best tourist attractions around the world. This report uses the Travellers' Choice 2021 top beaches in the world and in each region [1].
* To geocode the beach locations, GoogleMapsAPI [2] was used.
* To identify the best venues at each beach, the Foursquare API [3] was used.
* Beaches are grouped by venue similarity, creating different beach categories that developers can choose from to attract tourists.

## Methodology
### Importing CSV data
The beaches database was a previously created CSV file containing the TripAdvisor Travellers' Choice 2021 top beaches in the world and in each region [1]. It was loaded into a Pandas DataFrame.<br>
<img src="original database.png">
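
The import step can be sketched as follows. The column names (`Beach`, `Location`, `Region`) and the sample rows are hypothetical placeholders, since the real CSV layout is only shown in the figure above; the `Address` column reflects the "beach name plus location" string used later for geocoding.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real file and its column names may differ.
SAMPLE_CSV = """Beach,Location,Region
Beach A,"Town A, Country A",World
Beach B,"Town B, Country B",Europe
"""


def load_beaches(source):
    """Read the beaches CSV and build the address string used for geocoding."""
    df = pd.read_csv(source)
    # Address = beach name plus location, as described in the report.
    df["Address"] = df["Beach"].str.strip() + ", " + df["Location"].str.strip()
    return df


beaches = load_beaches(io.StringIO(SAMPLE_CSV))
print(beaches[["Beach", "Address"]])
```

In practice `load_beaches` would be called with the path of the CSV file instead of an in-memory string.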

### Geocoding data
As the database did not include coordinates, GoogleMapsAPI [2] was used to geocode the addresses (beach name plus location). The coordinates were checked for empty/NaN values, and the geocoding found coordinates for all beaches.<br>
<img src="database com gps.png">
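
A minimal sketch of the geocoding and missing-value check is shown below. It assumes the standard Google Geocoding response shape (a list of results, each with `geometry.location.lat` and `geometry.location.lng`). The `fake_geocode` function and its coordinates are hypothetical stand-ins so the example runs offline; with the `googlemaps` Python client, the callable would be the `geocode` method of a `googlemaps.Client` created with an API key.

```python
import math


def extract_coordinates(results):
    """Return (lat, lng) from a geocoding result list, or (nan, nan) if it is empty."""
    if not results:
        return (math.nan, math.nan)
    location = results[0]["geometry"]["location"]
    return (location["lat"], location["lng"])


def geocode_addresses(addresses, geocode):
    """Geocode each address with the supplied callable and collect the coordinates."""
    return {address: extract_coordinates(geocode(address)) for address in addresses}


def fake_geocode(address):
    """Hypothetical offline stand-in for a real geocoding call."""
    known = {
        "Beach A, Town A": [{"geometry": {"location": {"lat": -20.28, "lng": 149.04}}}],
    }
    return known.get(address, [])


coords = geocode_addresses(["Beach A, Town A", "Nowhere"], fake_geocode)
# Addresses whose coordinates are NaN would need manual review.
missing = [address for address, (lat, _) in coords.items() if math.isnan(lat)]
print(coords, missing)
```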

Once every beach had coordinates, the folium library was used to create a world map of the beach locations.<br>
<img src="mapa1.png">

As expected, most beaches on the list are in or near the tropics. There are two large clusters of world-renowned beaches (Caribbean and Mediterranean). It is also possible to see that beaches in the Southern Hemisphere (South America, Africa and Oceania) are underrepresented in this dataset.<br>

### Exploring venues on Foursquare API
Using the Foursquare API [3], the venues at each beach were explored within a 1000 m radius of the beach's central point, with a limit of 100 venues. This distance was selected because 1 km is a comfortable walking distance for most people and because almost half of the search area around any coastal point is water.<br>
<img src="foursquare1.png">
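
The query parameters described above can be sketched as follows. The endpoint (`/v2/venues/explore` of the legacy Foursquare Places API) and the version date are assumptions, because the report does not state which endpoint was called; the credentials are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" call.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(lat, lng, client_id, client_secret,
                         version="20210101", radius=1000, limit=100):
    """Build the query parameters for one beach: 1000 m radius, at most 100 venues."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (assumed value)
        "ll": f"{lat},{lng}",  # central point of the beach
        "radius": radius,      # metres
        "limit": limit,        # maximum number of venues returned
    }


# Placeholder credentials and coordinates, for illustration only.
params = build_explore_params(12.5, -70.0, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
url = f"{EXPLORE_URL}?{urlencode(params)}"
print(url)
```

The resulting URL would then be requested once per beach, and the venue name and category extracted from each response.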

Analysing the venue data extracted from the Foursquare API, it was possible to see that many venues are the beaches themselves and that some beaches are so isolated that they did not return any venue at all (141 beaches in the dataset versus 133 beaches with venues found). Therefore, all venue categories that are beaches were excluded, and all beaches from the original dataset that have no venues were grouped under a venue category named Only Beaches.<br>
It was also possible to analyse the venue categories and exclude some that do not fit our problem. For example, accommodation (such as hotels or resorts) and hotel venues (such as hotel bars or hotel pools) were excluded because these venues are generally exclusive to guests and tourists usually do not change hotels during a single vacation.<br>
<img src="foursquare2.png">
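
The cleaning rules above can be sketched with pandas as follows. The exclusion list, column names and sample rows are hypothetical; the actual list of excluded categories used in the analysis is not given in this report.

```python
import pandas as pd

# Hypothetical exclusion list: beach categories plus accommodation and hotel venues.
EXCLUDED_CATEGORIES = {"Beach", "Hotel", "Resort", "Hotel Bar", "Hotel Pool"}


def clean_venues(venues, all_beaches):
    """Drop excluded categories and label beaches left without venues as 'Only Beaches'."""
    kept = venues.loc[~venues["Venue Category"].isin(EXCLUDED_CATEGORIES)].copy()
    # Beaches with no remaining venue get a single placeholder category.
    without_venues = sorted(set(all_beaches) - set(kept["Beach"]))
    placeholder = pd.DataFrame({"Beach": without_venues, "Venue Category": "Only Beaches"})
    return pd.concat([kept, placeholder], ignore_index=True)


# Illustrative data only.
venues = pd.DataFrame({
    "Beach": ["Beach A", "Beach A", "Beach A", "Beach B", "Beach C"],
    "Venue Category": ["Beach", "Seafood Restaurant", "Hotel Bar", "Resort", "Beach"],
})
all_beaches = ["Beach A", "Beach B", "Beach C", "Beach D"]

cleaned = clean_venues(venues, all_beaches)
print(cleaned)
```

Note that a beach whose only venues are excluded categories (Beach B and Beach C here) ends up in Only Beaches together with beaches that returned nothing at all (Beach D).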

After data cleaning, 119 of the 141 beaches had some kind of commercial venue. These venues are divided into 301 unique categories, showing a great diversity of venues.<br>

### Clustering beaches
To help classify and cluster the beaches, the top 10 venues from each beach were selected.<br>
<img src="top10.png">
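
One way to obtain such a ranking is sketched below, assuming that "top" means the most frequent venue categories at each beach (the report does not state the ranking criterion). Column names and sample rows are hypothetical.

```python
import pandas as pd


def category_frequencies(venues):
    """Share of each venue category per beach (each row sums to 1)."""
    return pd.crosstab(venues["Beach"], venues["Venue Category"], normalize="index")


def top_categories(frequencies, n=10):
    """For each beach, list up to n categories ordered from most to least frequent."""
    top = {}
    for beach, row in frequencies.iterrows():
        present = row[row > 0].sort_values(ascending=False)
        top[beach] = list(present.index[:n])
    return top


# Illustrative data only.
venues = pd.DataFrame({
    "Beach": ["Beach A"] * 6 + ["Beach B"] * 3,
    "Venue Category": [
        "Restaurant", "Restaurant", "Restaurant", "Bar", "Bar", "Ice Cream Shop",
        "Seafood Restaurant", "Seafood Restaurant", "Pizza Place",
    ],
})

frequencies = category_frequencies(venues)
top = top_categories(frequencies, n=10)
print(top)
```

The frequency table, with one row per beach and one column per category, is also a natural input for the clustering step.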

To cluster the beaches, the unsupervised K-means algorithm was used. The elbow method was used to determine the optimal value of k.<br>
<img src="elbow.png">
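
The elbow method can be sketched with scikit-learn as follows: K-means is fitted for a range of k values and the inertia (within-cluster sum of squared distances) is recorded for each; the "elbow" is the k after which the inertia stops dropping sharply. The synthetic data, the range of k and the `random_state` are assumptions made for illustration, not the settings used in this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(features, k_values, random_state=0):
    """Fit K-means for each k and return the inertia of each fit."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(features)
        inertias[k] = model.inertia_
    return inertias


# Synthetic data with three well-separated groups, for illustration only.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

inertias = inertia_by_k(features, range(1, 7))
for k, value in inertias.items():
    print(k, round(value, 2))
```

Plotting `inertias` against k gives the kind of curve shown in the figure above.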

With this method, a k value of 3 was chosen for the K-means algorithm, which resulted in the following distribution:<br>
<img src="clusters.png">
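
A minimal sketch of fitting K-means with k = 3 and checking the size of each cluster is given below. The frequency table is made up for illustration, and the cluster numbers produced by scikit-learn are arbitrary, so they need not match the numbering used in this report.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical category frequencies per beach (each row sums to 1).
frequencies = pd.DataFrame(
    {
        "Restaurant": [0.6, 0.5, 0.1, 0.2, 0.0],
        "Seafood Restaurant": [0.1, 0.2, 0.5, 0.6, 0.0],
        "Clothing Store": [0.0, 0.0, 0.0, 0.0, 1.0],
        "Bar": [0.3, 0.3, 0.4, 0.2, 0.0],
    },
    index=["Beach A", "Beach B", "Beach C", "Beach D", "Beach E"],
)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = pd.Series(
    model.fit_predict(frequencies.to_numpy()),
    index=frequencies.index,
    name="Cluster",
)

# Number of beaches per cluster; single-member clusters are outlier candidates.
sizes = labels.value_counts()
singleton_clusters = sizes[sizes == 1].index.tolist()
singleton_beaches = labels[labels.isin(singleton_clusters)].index.tolist()
print(sizes.to_dict(), singleton_beaches)
```

A cluster with a single member is worth inspecting by hand, since one unusual venue profile can be enough to isolate a beach.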

As cluster 2 has only one beach, it was further investigated to evaluate whether this result is an outlier:<br>
<img src="cluster2.png">

As can be seen in the data, cluster 2 is an outlier because it has only one venue, of a very specific kind (Clothing Store). Because of that, this beach should be considered part of the "Only Beaches" cluster (cluster 3).

## Results
The resulting clusters were joined in a single table with the original data:
<img src="clusterscomp.png">
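
The join can be sketched as a left merge of the original table with the cluster labels. The sketch assumes that beaches without venues were left out of K-means and were assigned cluster 3 afterwards; the report does not spell this step out, and the column names and rows below are hypothetical.

```python
import pandas as pd

ONLY_BEACHES_CLUSTER = 3  # label assumed for beaches without venues

# Illustrative data only.
beaches = pd.DataFrame({
    "Beach": ["Beach A", "Beach B", "Beach C"],
    "Region": ["World", "Europe", "Asia"],
})
labels = pd.DataFrame({"Beach": ["Beach A", "Beach B"], "Cluster": [0, 1]})

# Keep every beach from the original data; unlabelled beaches get cluster 3.
merged = beaches.merge(labels, on="Beach", how="left")
merged["Cluster"] = merged["Cluster"].fillna(ONLY_BEACHES_CLUSTER).astype(int)

counts = merged["Cluster"].value_counts().sort_index()
print(merged)
print(counts.to_dict())
```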

Analyzing the cluster data, it is possible to identify 3 kinds of beaches and 1 outlier:<br>
* Cluster 0 - <b>Smaller beaches</b>: there are 15 beaches in this classification. They have few venues, mostly food venues (restaurants).
* Cluster 1 - <b>Bigger beaches</b>: there are 103 beaches in this classification. They have many venues and, like Cluster 0, are focused on food venues, but with more specialized restaurants (ethnic cuisine).
* Cluster 2 - <b>Outlier</b>: there is only one beach in this classification. The only venue found at this beach is a Clothing Store, which caused it to be clustered alone.
* Cluster 3 - <b>Nature beaches</b>: there are 22 beaches in this classification. These beaches have no venues nearby and are usually preserved places.

The resulting map shows little difference between clusters based on location, which helps support the idea that the clusters were formed because of different levels of beach development/size. The beaches in cluster 3 are located in well-known nature paradises (such as Australia and the Caribbean).
<img src="mapa2.png">

<b>Legend:</b> Cluster 0 (red), Cluster 1 (purple), Cluster 2 (cyan), Cluster 3 (yellow)

## Discussion

As our aim was to evaluate the top beaches of the world and the kinds of venues they have in common, we could distinguish three kinds of beaches: natural beaches without venues, small beaches with a low number of venues and big beaches with a high number of venues. At the beginning of this work, it was thought that these top beaches would differ more from one another, ranging from party beaches (with bars and drinking venues) to family beaches (with restaurants and friendly amenities). But what was found is that these top beaches have a more homogeneous mix of venues, focusing primarily on restaurants.<br>
The main difference between cluster 0 (smaller beaches) and cluster 1 (bigger beaches) is the kind of restaurants present. In cluster 0, most venues were classified generically as Restaurants, whereas cluster 1 showed more specialized cuisine (like Caribbean food or Seafood restaurants). This shows a clear pattern of development, where smaller beaches do not have enough competition between venues or enough tourists to justify a wide variety of restaurants.

## Conclusion

From our study, it is possible to recommend investing in restaurants as the main venue attraction at these kinds of beaches. The investor should consider how many restaurants already exist and the number of venues present at the beach when deciding between a more specialized or a more generic kind of restaurant.<br>
Future studies could compare these results with venues at lesser-known beaches to see if this pattern also occurs there.

## References
[1] Tripadvisor - Travellers' Choice 2021:
* [World](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g1)
* [Caribbean](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g147237)
* [Mexico](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g150768)
* [United States](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g191)
* [South America](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g13)
* [Europe](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g4)
* [South Pacific](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g8)
* [Africa](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g6)
* [Asia](https://www.tripadvisor.com.br/TravelersChoice-Beaches-cTop-g2)

[2] [GoogleMapsAPI](https://developers.google.com/maps)

[3] [Foursquare API](https://developer.foursquare.com/)