# Choose a business to fund in Trentino-Alto Adige

### Introduction

An investor wants to fund a business in Trentino-Alto Adige, in Northern Italy. 
He is mainly interested in the Alpine region and the hotel business, but soon faces **a problem: the area consists of small towns scattered** across the Alps. Choosing a location and a specific type of business turns out to be a complicated task, so he hires a data science specialist to provide insight into the problem.

### Area of interest

![](first_glance_map.png)

____________________________

# Data

We will be using:
- **Google**
    - coordinates of towns
    - population
    - locating the local government website

- **Foursquare**
    - coordinates of venues 
    - categories of venues

![](Coordinates.png)

- **Official statistics** from [local government website](https://astat.provincia.bz.it/it/banche-dati-comunali.asp):
    - for in-depth insight into tourist flows in the region.
![](Visitors.png)

Foursquare has an additional requirement: you need your own Foursquare account to run the code. 
The official statistics are published as tables on the government website, and everything is **downloadable in XLS.** For this project, all downloaded files are **available in the repository.**

____________

# Methodology

## Data analysis:
- **Visualize Foursquare venues on the map**
- **Visualize official statistics**
    - Convert XLS to dataframes and graph
    - Introduce corrective coefficients for better visual perception:
        - For two towns that share statistical data
        - For all towns based on population 
    - Pair and merge dataframes and graph
- **Take notes on the insights**
- **Create a global dataframe from official statistics**
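
The population-based corrective coefficient could look like the sketch below. It assumes the coefficient is a simple per-capita scaling, which is only one possible reading of the step above; the town names, column names, figures and the `per_capita` helper are hypothetical and are not taken from the official tables.

```python
import pandas as pd

# Hypothetical figures for illustration only (not from the official statistics).
towns = pd.DataFrame(
    {
        "town": ["A", "B", "C"],
        "population": [10_000, 2_500, 5_000],
        "arrivals": [50_000, 25_000, 10_000],
    }
)


def per_capita(df, value_col, population_col="population", per=1_000):
    """Scale a raw count to a rate per `per` residents (assumed coefficient)."""
    return df[value_col] / df[population_col] * per


# Small towns with many visitors now stand out instead of being hidden
# behind the raw totals of larger towns.
towns["arrivals_per_1000"] = per_capita(towns, "arrivals")
print(towns)
```

With a rate like this, towns of very different sizes can share one graph without the largest town dominating the scale.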

![](Hotel_preference.png)

![](Distribution_of_hotels.png)

![](Bars.png)

## Machine learning
### Applied to *global dataframe* (official statistics)
- Feature engineering
    - Using a **Pearson correlation heatmap** of the global dataframe, extract the most relevant features for the scenario of building a 4-5 star hotel in the region. This step can be repeated for any other feature of the global dataframe.
![](Pearson.png)
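
The feature selection step can be sketched as follows, assuming the global dataframe holds one row per town and one numeric column per statistic. The column names, the values and the `top_correlated` helper are hypothetical; the real dataframe in the repository will differ.

```python
import pandas as pd

# Hypothetical stand-in for the global dataframe (one row per town).
global_df = pd.DataFrame(
    {
        "hotels_4_5_star": [1, 2, 3, 4, 5, 6],
        "foreign_arrivals": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
        "campsites": [3, 1, 4, 1, 5, 2],
    }
)


def top_correlated(df, target_col, k=1):
    """Return the k features with the largest absolute Pearson correlation."""
    corr = df.corr(method="pearson")[target_col].drop(target_col)
    return corr.abs().sort_values(ascending=False).head(k)


# The full matrix df.corr() is what a heatmap would display;
# here only the column for the target feature is ranked.
best = top_correlated(global_df, "hotels_4_5_star", k=1)
print(best)
```

Changing `target_col` repeats the same ranking for any other feature of the dataframe.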

- Clustering
    - Using **KMEANS**, cluster the towns *based on the global dataframe*
    - Using **unsupervised agglomerative clustering** (bottom-up), cluster the towns of interest *based on the global dataframe*
    - Compare the 2 clustering algorithms. They should cluster the towns in a similar or, ideally, identical way.
![](Clusters.png)
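
One way to compare the two clusterings numerically is the adjusted Rand index, sketched below with scikit-learn. The feature matrix is hypothetical, and the choice of 2 clusters, Ward linkage and standard scaling are assumptions made for the example rather than settings confirmed by the project.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per town, two features per town.
X = np.array(
    [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9], [7.9, 8.0]]
)
X_scaled = StandardScaler().fit_transform(X)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
agglo_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X_scaled)

# The adjusted Rand index ignores how clusters are numbered:
# it is 1.0 when both algorithms group the towns identically.
ari = adjusted_rand_score(kmeans_labels, agglo_labels)
print(f"Adjusted Rand index: {ari:.2f}")
```

A value close to 1 means the two algorithms agree on the grouping; a value near 0 means the agreement is no better than chance.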

### Applied to Foursquare venues dataframe
- Clustering
    - Using **KMEANS**, cluster venues by venue category and visualize the cluster centers. This produces an economy centroid for each venue category in each town.
    - Using **KMEANS weighted clustering**, cluster venues by category and weight, where each weight is the ratio *n_samples of the given category / n_samples of all categories*. This produces a weighted economy centroid for each town.
    - Visualize the resulting centroids and the geographical centers of the towns. Create a dataframe with the **average** presence of each category in a town and the **actual** percentage of each category present. Show the result in a popup on each centroid.
![](Weighted.png)
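
The weighted centroid for a single town can be sketched as below. This is a minimal illustration, assuming the weights are passed to scikit-learn's `KMeans` through `sample_weight` and that one cluster is fitted per town; the venue coordinates, categories and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venues of one town (not real Foursquare data).
venues = pd.DataFrame(
    {
        "lat": [46.50, 46.52, 46.51, 46.48],
        "lon": [11.30, 11.34, 11.32, 11.36],
        "category": ["Hotel", "Hotel", "Hotel", "Bar"],
    }
)

# Weight of a venue = n_samples of its category / n_samples of all categories.
category_share = venues["category"].value_counts(normalize=True)
venues["weight"] = venues["category"].map(category_share)

coords = venues[["lat", "lon"]].to_numpy()
weights = venues["weight"].to_numpy()

# With a single cluster, the fitted center is the weighted mean of the venues.
model = KMeans(n_clusters=1, n_init=10, random_state=0)
model.fit(coords, sample_weight=weights)
weighted_centroid = model.cluster_centers_[0]

print("Weighted economy centroid:", weighted_centroid)
print("Unweighted center:        ", coords.mean(axis=0))
```

The weighted centroid is pulled toward the dominant category, so comparing it with the unweighted center or the geographical center of the town shows where that category concentrates.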

______

# Results and discussion

   **1) Based on official regional statistics, it was possible to build a solid picture of tourist flows in the region, recent trends, the hotel industry and visitor preferences by nationality. A study of road traffic could be added to this section, because the data is available on the government website.** 
   
   **Assuming the goal is to find a place for a luxury hotel, this analysis selects 2 candidate places out of 6.**
    
   **2) Based on venues registered on Foursquare and machine learning techniques, it was possible to introduce the concepts of an "economy centroid" and a "weighted economy centroid" for a given place. They give insight into the major constituents of the local economy and the imbalances of each centroid, which makes it possible to conclude what type of business is abundant or lacking in a selected area and to make investment decisions.** 
   
   **After this stage we can identify a suitable place not only for a luxury hotel (the initial problem), but for any major category represented on Foursquare. We can assess the profitability of the initially suggested search and provide numerical evidence for better investment options.**
    
   **3) The second part of the machine learning code can be reproduced for any other set of places that has Foursquare data. We now have a ready-made tool for similar requests worldwide.**

![](Final.png)

____

# Conclusion

**A similar study can be carried out wherever Foursquare data is available. The concept of a *"weighted economy centroid"* helps determine where to locate a new hypothetical venue relative to the geographical center of the area, and the most profitable direction within the area. The *economy centroid* also gives insight into the economic imbalance of the region. The more general conclusion is to look for economically unbalanced places and fill them with new venues: entering a well-balanced area is less profitable and more difficult.**