# Diving equipment store in Haute-Savoie

## Table of contents
   1. [Introduction : Business Problem](#introduction)
   2. [Data](#data)
       - [Cities and their population in Haute-Savoie](#cities)
       - [Location of the cities](#loc)
       - [Venues in the neighborhoods](#venues)
   3. [Methodology and data analysis](#analysis)
   4. [Results](#results)
   5. [Discussion](#discussion)
   6. [Conclusion](#conclusion)

## 1. Introduction : Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a diving equipment store. Specifically, this report is aimed at stakeholders interested in opening a **diving & watersport equipment store** in **Haute-Savoie**, France.

Since the store is dedicated to scuba diving and watersport activities, we will first look for **locations with *water access* in the vicinity**. To limit the effect of competition, we will also check that **no such sports store is already established near the selected areas**.
Then, assuming that these two conditions are met, we will prefer **frequented places** with a fairly high population density.

We will use data science techniques to identify the most promising neighborhoods based on these criteria. The advantages of each area will then be stated clearly, so that the stakeholders can easily make a choice.

## 2. Data <a name="data"></a>

Based on our business problem, we will have to gather data such as:
 * Name and location of all the cities in Haute-Savoie
 * Population of each city
 * Number and type of water access points in the neighborhood of each city, if any
 * Number of existing sports stores in the neighborhood of each city

The following data sources will be needed:
 * Names and populations of the cities in Haute-Savoie will be obtained from a table available on **Wikipedia**
 * Location of the cities will be read from a local CSV file
 * Venues such as water access points and sports stores will be extracted using the **Foursquare API**

### a. Cities and their population in Haute-Savoie <a name="cities"></a>

Let's extract the table available on Wikipedia that lists all the cities ('communes' in French) of Haute-Savoie together with their population.

In [5]:
#import libraries
import numpy as np
import pandas as pd 
pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows', None)

In [6]:
# Extract table from wikipedia
from pandas.io.html import read_html
page='https://fr.wikipedia.org/wiki/Liste_des_communes_de_la_Haute-Savoie'
table=read_html(page,attrs={"class":"wikitable"})
print('Table extracted')

Table extracted


In [7]:
# Check the first rows of the obtained dataframe
communes74=table[0]
print(communes74.shape)
communes74.head()

(280, 10)


Unnamed: 0,Nom,CodeInsee,Code postal,Arrondissement,Canton,Intercommunalité,Superficie(km2),Population(dernière pop. légale),Densité(hab./km2),Modifier
0,Annecy(préfecture),74010,7400074370746007494074960,Annecy,Annecy-1Annecy-2Annecy-le-VieuxSeynod,CA du Grand Annecy,6694,126 924 (2017),1 896,
1,Abondance,74001,74360,Thonon-les-Bains,Évian-les-Bains,CC Pays d'Évian Vallée d'Abondance,5884,1 439 (2017),24,
2,Alby-sur-Chéran,74002,74540,Annecy,Rumilly,CA du Grand Annecy,656,2 579 (2017),393,
3,Alex,74003,74290,Annecy,Faverges,CC des vallées de Thônes,1702,1 072 (2017),63,
4,Allèves,74004,74540,Annecy,Rumilly,CA du Grand Annecy,881,411 (2017),47,


We only need the columns with the name and the population, so let's clean the table.

In [8]:
# Remove rows with undefined postal code
communes74.dropna(subset=['Code postal'],axis=0,inplace=True)
# Remove all columns except 'Nom' and 'Population'
communes74.drop(['CodeInsee','Code postal','Arrondissement','Canton','Intercommunalité','Superficie(km2)','Densité(hab./km2)','Modifier'],axis=1,inplace=True)
# Rename the columns as 'Commune' and 'Population'
communes74.rename(columns={'Nom':'Commune','Population(dernière pop. légale)':'Population'},inplace=True)
# Remove all comments in ()
communes74['Commune']=communes74['Commune'].str.split('(').str[0]
communes74['Population']=communes74['Population'].str.split('(').str[0]
print(communes74.shape)
communes74.head()

(279, 2)


Unnamed: 0,Commune,Population
0,Annecy,126 924
1,Abondance,1 439
2,Alby-sur-Chéran,2 579
3,Alex,1 072
4,Allèves,411


Population has to be treated as a number, so let's check the type of the collected data.

In [9]:
communes74.dtypes

Commune       object
Population    object
dtype: object

In [10]:
# Remove the thousands separator (any whitespace, including non-breaking spaces) from the strings.
# A vectorized replace avoids chained assignment and does not rely on the index being 0..n-1.
communes74['Population']=communes74['Population'].str.replace(r'\s+','',regex=True)

In [11]:
communes74[['Population']]=communes74[['Population']].astype('int')
print(communes74.dtypes)
communes74.head()

Commune       object
Population     int64
dtype: object


Unnamed: 0,Commune,Population
0,Annecy,126924
1,Abondance,1439
2,Alby-sur-Chéran,2579
3,Alex,1072
4,Allèves,411


### b. Location of the cities <a name="loc"></a>

Now let's also collect the latitude and longitude of all these cities. For that we have to read a local CSV file.
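
Since the loading code itself is hidden, here is a minimal sketch of what such a step could look like. The column names are taken from the output shown for this cell; the inline CSV text and the name `latlon_demo` are hypothetical stand-ins, not the actual file or variable.

```python
import io
import pandas as pd

# Hypothetical stand-in for the local file: the real file name and contents are not shown.
csv_text = (
    "Commune,Code Postal,Latitude,Longitude\n"
    "Annecy,74000,45.900002,6.11667\n"
    "Abondance,74360,46.283329,6.73333\n"
)

# Read postal codes as strings so that they are not coerced to numbers.
latlon_demo = pd.read_csv(io.StringIO(csv_text), dtype={"Code Postal": str})
print(latlon_demo.shape)
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the path to the CSV.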

In [12]:
# The code was removed by Watson Studio for sharing.

(279, 4)


Unnamed: 0,Commune,Code Postal,Latitude,Longitude
0,Annecy,74000\n74370\n74600\n74940\n74960,45.900002,6.11667
1,Abondance,74360,46.283329,6.73333
2,Alby-sur-Chéran,74540,45.8167,6.0167
3,Alex,74290,45.883331,6.23333
4,Allèves,74540,45.75,6.08333


Let's first remove the postal code from the previous table and then merge the two tables.
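
An inner merge silently drops any commune whose name is spelled differently in the two tables. A possible sanity check, sketched here on toy frames (the frames `left` and `right` and the trailing space in one name are illustrative assumptions), is to merge with `indicator=True` and list the rows that found no match.

```python
import pandas as pd

# Toy frames standing in for the population table and the coordinates table.
left = pd.DataFrame({
    "Commune": ["Annecy", "Alex", "Allèves "],  # note the stray trailing space
    "Population": [126924, 1072, 411],
})
right = pd.DataFrame({
    "Commune": ["Annecy", "Alex", "Allèves"],
    "Latitude": [45.900002, 45.883331, 45.75],
})

# A left merge keeps every commune; the _merge column tells where each row was found.
checked = pd.merge(left, right, on="Commune", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Commune"].tolist()
print(unmatched)
```

An empty list means every commune received coordinates; otherwise the listed names need cleaning (for example with `str.strip()`) before merging.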

In [13]:
communes_latlon.drop({'Code Postal'},axis=1,inplace=True)
comm74 = pd.merge(communes74,communes_latlon,on='Commune')
print(comm74.shape)
comm74.head()

(279, 4)


Unnamed: 0,Commune,Population,Latitude,Longitude
0,Annecy,126924,45.900002,6.11667
1,Abondance,1439,46.283329,6.73333
2,Alby-sur-Chéran,2579,45.8167,6.0167
3,Alex,1072,45.883331,6.23333
4,Allèves,411,45.75,6.08333


To visualize the distribution of the cities on a map, we have to import the following libraries.

In [14]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium

import matplotlib.cm as cm
import matplotlib.colors as colors


In [15]:
# Define Latitude and Longitude of Haute-Savoie
T_lon = 6.3833
T_lat = 46.05

# Create map of Haute-Savoie
map_htesavoie = folium.Map(location = [T_lat, T_lon], zoom_start=10)

# Add markers to map
for lat, lon, comm in zip(comm74['Latitude'], comm74['Longitude'], comm74['Commune']):
    label = '{}'.format(comm)
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html=False).add_to(map_htesavoie)

map_htesavoie

### c. Venues in the neighborhoods <a name="venues"></a>

Now that we have a good overview of the cities in Haute-Savoie, let's find out where the water access points and sports stores around them are. We will use the Foursquare API to gather this information.

Foursquare credentials are defined in the hidden cell below.

In [16]:
# The code was removed by Watson Studio for sharing.

In [17]:
import requests

In [18]:
# Define a function to get nearby venues
# limit of number of venues is set to 100 and radius around each location is set to 2500 m

limit = 100

def getNearbyVenues(names, latitudes, longitudes, radius=2500):
    
    venues_list=[]
    for name, lat, lon in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lon,
            radius, 
            limit)
        
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lon, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['categories'][0]['id']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Commune', 
                  'Commune Latitude', 
                  'Commune Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                  'Venue Id']  # note: this column holds the Foursquare *category* id, not the venue id
    
    return(nearby_venues)
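
The function above indexes `categories[0]` directly, which would fail for a venue returned without any category. As a hedged sketch of a more defensive parsing step, the helper below (`parse_explore_items` is a hypothetical name, and the payload is hand-written to mimic the structure accessed above, not real API output) skips such venues and tolerates an empty response.

```python
def parse_explore_items(payload, name, lat, lon):
    """Flatten one explore response into rows, skipping venues without a category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # nothing to classify this venue with
        rows.append((
            name, lat, lon,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
            categories[0]["id"],
        ))
    return rows

# Hand-written payload mimicking the fields used above (not real API output).
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Plage A", "location": {"lat": 45.9, "lng": 6.12},
               "categories": [{"name": "Beach", "id": "cat-beach"}]}},
    {"venue": {"name": "No category", "location": {"lat": 45.9, "lng": 6.13},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample, "Annecy", 45.900002, 6.11667)
print(rows)
```

Only the first venue is kept; the second is dropped because it has no category to filter on later.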

Let's now go over our neighborhood locations and get the nearby venues.

In [19]:
htesavoie_venues = getNearbyVenues(names = comm74['Commune'], latitudes = comm74['Latitude'], longitudes = comm74['Longitude'])
print(htesavoie_venues.shape)
htesavoie_venues.head()

(2366, 8)


Unnamed: 0,Commune,Commune Latitude,Commune Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Id
0,Annecy,45.900002,6.11667,Brumes,45.899517,6.123535,Coffee Shop,4bf58dd8d48988d1e0931735
1,Annecy,45.900002,6.11667,Beer O'Clock,45.897427,6.123039,Bar,4bf58dd8d48988d116941735
2,Annecy,45.900002,6.11667,Une Autre Histoire,45.899761,6.121999,Tea Room,4bf58dd8d48988d1dc931735
3,Annecy,45.900002,6.11667,Chez Pen,45.904349,6.121446,Bar,4bf58dd8d48988d116941735
4,Annecy,45.900002,6.11667,Le Barista Café,45.900824,6.124986,Coffee Shop,4bf58dd8d48988d1e0931735


*The following commented-out lines only serve to avoid repeating the API calls too many times during the study.*

In [16]:
#htesavoie_venues.to_pickle('./htesavoie_venues.pkl')    

In [17]:
#import pickle
#with open('htesavoie_venues.pkl', 'rb') as f:
#        t = pickle.load(f)

In [None]:
#t.shape

In [None]:
#htesavoie_venues = t
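
The same idea can be wrapped in a small helper so that the expensive call runs only when no cached file exists. This is a sketch under assumptions: `load_or_compute`, `fake_fetch` and the temporary file name are hypothetical, and a plain list stands in for the venues DataFrame (any picklable object works the same way).

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object cached at `path`, computing and storing it on first use."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

calls = []

def fake_fetch():
    # Stand-in for the API-heavy step; records how often it really runs.
    calls.append(1)
    return [("Annecy", "Beach")]

cache_path = os.path.join(tempfile.mkdtemp(), "venues_demo.pkl")
first = load_or_compute(cache_path, fake_fetch)
second = load_or_compute(cache_path, fake_fetch)
print(first == second, len(calls))
```

In the notebook, the `compute` argument could be a lambda wrapping the `getNearbyVenues` call, with `./htesavoie_venues.pkl` as the path.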

In [20]:
print('There are {} unique categories.'.format(len(htesavoie_venues['Venue Category'].unique())))

There are 228 unique categories.


We have to define the list of relevant venue categories. The category identifiers are documented on the Foursquare website *(https://developer.foursquare.com/docs/resources/categories)*.

In [21]:
# categories related to water access
water_access_cat = ['4bf58dd8d48988d193941735','4bf58dd8d48988d105941735','52e81612bcbc57f1066b7a28',
            '56aa371be4b08b9a8d573544','4bf58dd8d48988d1e2941735','52e81612bcbc57f1066b7a12',
            '4bf58dd8d48988d1e0941735','4bf58dd8d48988d160941735','4bf58dd8d48988d161941735',
            '4bf58dd8d48988d15e941735','52e81612bcbc57f1066b7a29','56aa371be4b08b9a8d573541',
            '4eb1d4dd4b900d56c88a45fd','56aa371be4b08b9a8d573560','56aa371be4b08b9a8d5734c3',
            '52e81612bcbc57f1066b7a44','52e81612bcbc57f1066b7a27','58daa1558bbb0b01f18ec1ae',
            '4bf58dd8d48988d1ed941735','56aa371be4b08b9a8d57353e']
# categories related to shops
shops_cat = ['52f2ab2ebcbc57f1066b8b1a','52f2ab2ebcbc57f1066b8b22','4bf58dd8d48988d1f2941735']
# list of all the relevant categories
category = water_access_cat + shops_cat

Let's extract only the relevant venues for our research.

In [22]:
htesavoie_sel_venues = htesavoie_venues[htesavoie_venues['Venue Id'].isin(category)]
print(htesavoie_sel_venues.shape)

(113, 8)


In [23]:
# one hot encoding
htesavoie_onehot = pd.get_dummies(htesavoie_sel_venues[['Venue Category']], prefix="", prefix_sep="")

# add Commune column back to dataframe
htesavoie_onehot['Commune'] = htesavoie_sel_venues['Commune'] 

# move Commune column to the first column
fixed_columns = [htesavoie_onehot.columns[-1]] + list(htesavoie_onehot.columns[:-1])
htesavoie_onehot = htesavoie_onehot[fixed_columns]

print(htesavoie_onehot.shape)
htesavoie_onehot.head()

(113, 16)


Unnamed: 0,Commune,Bay,Beach,Dive Spot,Harbor / Marina,Hot Spring,Lake,Outdoor Supply Store,Pool,Rafting,Reservoir,River,Spa,Sporting Goods Shop,Water Park,Waterfall
46,Annecy,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
50,Annecy,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
77,Annecy,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
134,Ambilly,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
194,Annemasse,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


And here we are! Let's have a look at the different relevant venues around each city.
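
The number of venues per commune and category can also be obtained in a single step with `pd.crosstab`, which is offered here only as an alternative sketch to one-hot encoding followed by a grouped sum. The small `venues` frame is illustrative; its rows mirror the Annecy and Ambilly entries listed earlier.

```python
import pandas as pd

# Illustrative subset of the selected venues: three beaches in Annecy, one shop in Ambilly.
venues = pd.DataFrame({
    "Commune": ["Annecy", "Annecy", "Annecy", "Ambilly"],
    "Venue Category": ["Beach", "Beach", "Beach", "Sporting Goods Shop"],
})

# Rows are communes, columns are categories, cells are venue counts.
counts = pd.crosstab(venues["Commune"], venues["Venue Category"]).reset_index()
counts.columns.name = None  # drop the leftover axis label
print(counts)
```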

In [24]:
communes_table = htesavoie_onehot.groupby('Commune').sum().reset_index()
print(communes_table.shape)
communes_table.head()

(70, 16)


Unnamed: 0,Commune,Bay,Beach,Dive Spot,Harbor / Marina,Hot Spring,Lake,Outdoor Supply Store,Pool,Rafting,Reservoir,River,Spa,Sporting Goods Shop,Water Park,Waterfall
0,Ambilly,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,Annecy,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Annemasse,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,Anthy-sur-Léman,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Armoy,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


Let's merge all the data by city (relevant venues, population, latitude and longitude).

In [25]:
df=pd.merge(communes_table,comm74,on='Commune')
df.head()

Unnamed: 0,Commune,Bay,Beach,Dive Spot,Harbor / Marina,Hot Spring,Lake,Outdoor Supply Store,Pool,Rafting,Reservoir,River,Spa,Sporting Goods Shop,Water Park,Waterfall,Population,Latitude,Longitude
0,Ambilly,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,6385,46.1952,6.2243
1,Annecy,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,126924,45.900002,6.11667
2,Annemasse,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,35712,46.200001,6.25
3,Anthy-sur-Léman,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2171,46.3553,6.4273
4,Armoy,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1295,46.349998,6.51667


Let's now see all the collected data on a map.
Every city with at least one relevant venue is drawn as a dot with a blue fill; the outline is red when a sporting goods shop was found nearby and blue otherwise.

In [26]:
# Define Latitude and Longitude of Haute-Savoie
T_lon = 6.3833
T_lat = 46.05

# Create map of Haute-Savoie
map2_htesavoie = folium.Map(location = [T_lat, T_lon], zoom_start=10)

# Add markers to map
for lat, lon, comm, shop in zip(df['Latitude'], df['Longitude'], df['Commune'], df['Sporting Goods Shop']):
    label = '{}'.format(comm)
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius = 5,
        popup = label,
        color = 'blue' if shop==0 else 'red',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html=False).add_to(map2_htesavoie)

map2_htesavoie

This concludes the data gathering phase. We are now ready to use this data to find an optimal location for a new dive equipment store!

## 3. Methodology and data analysis <a name="analysis"></a>

Now that we have the needed data, let's go back to our criteria to define the optimal locations :
- locations with water access in vicinity
- no sports store already established
- frequented places

### Types of water access

As we saw above, there are different types of water access. To help the stakeholders in their final decision, it is useful to group them as follows:
   - Open water access: beach, lake, bay, dive spot or harbor / marina
   - Confined water access: pool, spa, hot spring, water park or reservoir
   - Rapids access: rafting, waterfall or river

So let's create a new table with these categories.
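
One way to keep this grouping readable is to declare it once as a dictionary and sum the matching columns. The sketch below is illustrative only: the `groups` dictionary and the `demo` frame are assumptions, with two rows and a few columns borrowed from the merged table, and `reindex` is used so that a category absent from the data simply counts as zero.

```python
import pandas as pd

# Mapping from each water access type to the Foursquare categories it gathers.
groups = {
    "Open water": ["Beach", "Lake", "Bay", "Dive Spot", "Harbor / Marina"],
    "Confined water": ["Pool", "Spa", "Hot Spring", "Water Park", "Reservoir"],
    "Rapids": ["Rafting", "Waterfall", "River"],
}

# Two communes with only a few of the category columns, for brevity.
demo = pd.DataFrame({
    "Commune": ["Annecy", "Armoy"],
    "Beach": [3, 0],
    "Lake": [0, 0],
    "Rafting": [0, 1],
})

summary = demo[["Commune"]].copy()
for group, cols in groups.items():
    # Missing category columns are created on the fly and filled with 0.
    summary[group] = demo.reindex(columns=cols, fill_value=0).sum(axis=1)
print(summary)
```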

In [28]:
df_analyse = df.copy()  # work on a copy so that df keeps its original columns
df_analyse['Open water']=df_analyse['Beach']+df_analyse['Lake']+df_analyse['Bay']+df_analyse['Dive Spot']+df_analyse['Harbor / Marina']
df_analyse['Confined water']=df_analyse['Pool']+df_analyse['Spa']+df_analyse['Hot Spring']+df_analyse['Water Park']+df_analyse['Reservoir']
df_analyse['Rapids']=df_analyse['Rafting']+df_analyse['Waterfall']+df_analyse['River']
df_analyse['Sporting Goods Shop']=df_analyse['Sporting Goods Shop']+df_analyse['Outdoor Supply Store']
df_analyse = df_analyse.drop(['Beach','Lake','Bay','Dive Spot','Harbor / Marina','Pool','Spa','Hot Spring','Water Park','Rafting','Waterfall','River','Reservoir','Outdoor Supply Store'],axis=1)
df_analyse

Unnamed: 0,Commune,Sporting Goods Shop,Population,Latitude,Longitude,Open water,Confined water,Rapids
0,Ambilly,1,6385,46.1952,6.2243,0,0,0
1,Annecy,0,126924,45.900002,6.11667,3,0,0
2,Annemasse,1,35712,46.200001,6.25,0,0,0
3,Anthy-sur-Léman,0,2171,46.3553,6.4273,1,0,0
4,Armoy,0,1295,46.349998,6.51667,0,0,1
5,Arthaz-Pont-Notre-Dame,0,1577,46.150002,6.28333,0,1,0
6,Bluffy,0,391,45.866669,6.21667,2,0,0
7,Bonne,0,3218,46.1682,6.3215,0,1,0
8,Brenthonne,0,1037,46.283329,6.4,0,1,0
9,Brizon,0,485,46.049999,6.45,0,1,0


Let's now remove from our data the cities that have a sports store but no water access, if any.

In [29]:
df_shop = df_analyse[df_analyse['Sporting Goods Shop']!=0].copy()  # explicit copy avoids SettingWithCopyWarning

In [30]:
df_shop['Water access']=df_shop['Open water']+df_shop['Confined water']+df_shop['Rapids']
df_shop_no_water_access = df_shop[df_shop['Water access']==0]


In [31]:
df_analyse = df_analyse.drop(df_shop_no_water_access.index).reset_index(drop=True)
df_analyse

Unnamed: 0,Commune,Sporting Goods Shop,Population,Latitude,Longitude,Open water,Confined water,Rapids
0,Annecy,0,126924,45.900002,6.11667,3,0,0
1,Anthy-sur-Léman,0,2171,46.3553,6.4273,1,0,0
2,Armoy,0,1295,46.349998,6.51667,0,0,1
3,Arthaz-Pont-Notre-Dame,0,1577,46.150002,6.28333,0,1,0
4,Bluffy,0,391,45.866669,6.21667,2,0,0
5,Bonne,0,3218,46.1682,6.3215,0,1,0
6,Brenthonne,0,1037,46.283329,6.4,0,1,0
7,Brizon,0,485,46.049999,6.45,0,1,0
8,Chainaz-les-Frasses,0,728,45.76667,6.0,0,1,0
9,Chamonix-Mont-Blanc,1,8611,45.916672,6.86667,0,2,0


Now we have a list of all the locations with water access in the vicinity, together with the three categories of water access defined above.

### Clustering

Let's now cluster those cities to create centers of zones containing good locations.

In [32]:
from sklearn.cluster import KMeans

number_of_clusters = 10

good_loc = df_analyse[['Latitude', 'Longitude']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_loc)

cluster_centers = [(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

df_clusters=df_analyse.copy()  # copy so that re-running the cell does not insert the column twice
df_clusters.insert(0, 'Cluster Labels', kmeans.labels_)
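
One caveat about this step: KMeans measures Euclidean distance on the raw latitude and longitude values, whereas on the ground one degree of longitude is shorter than one degree of latitude at this latitude, so east-west separations are somewhat overstated. The sketch below estimates the size of the effect, assuming a spherical Earth with a mean radius of 6371 km (`km_per_degree` is a hypothetical helper, not part of the analysis).

```python
import math

def km_per_degree(lat_deg, earth_radius_km=6371.0):
    """Approximate ground distance covered by one degree of latitude and of longitude."""
    per_deg_lat = math.pi * earth_radius_km / 180.0
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon

# 46.05 is the latitude used to centre the maps of Haute-Savoie.
lat_km, lon_km = km_per_degree(46.05)
print(round(lat_km, 1), round(lon_km, 1), round(lon_km / lat_km, 2))
```

For a department of this size the distortion is unlikely to change the overall picture, but scaling the longitude column by the printed ratio before clustering would be one way to account for it.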

Let's see on a map where the centers of the 10 clusters are. They are drawn as black circles, and the cities of each cluster are shown in one of 10 different colors.

In [33]:
# create map
map_cl= folium.Map(location=[T_lat, T_lon], zoom_start=10)

# set color scheme for the clusters
x = np.arange(number_of_clusters)
ys = [i + x + (i*x)**2 for i in range(number_of_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon in cluster_centers:
    folium.Circle([lat, lon], radius=1000, color='black', fill=False).add_to(map_cl) 
for lat, lon, cl, comm in zip(df_clusters['Latitude'], df_clusters['Longitude'], df_clusters['Cluster Labels'], df_clusters['Commune']):
    label = folium.Popup(str(comm) + ' Cluster ' + str(cl), parse_html=True)
    folium.CircleMarker([lat, lon], radius=5, popup=label, color=rainbow[cl-1], fill=True, fill_color=rainbow[cl-1], fill_opacity=1).add_to(map_cl)
    
map_cl

Now that we have defined 10 zones that could be good candidates, let's compare them in more detail.

In [34]:
cl_analyse = df_clusters
cl_analyse[['Population']]=cl_analyse[['Population']].astype('int')

Let's have a look at each cluster.
For each zone, we can summarize the population, the number of sports shops and the number of water access points of each type.

In [35]:
cl1_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==0,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl1_analyse = cl1_analyse.drop(['Latitude','Longitude'],axis=1)
cl1_analyse.sum()

Commune                BrizonLa TourVille-en-SallazViuz-en-Sallaz
Sporting Goods Shop                                             0
Population                                                   7039
Open water                                                      3
Confined water                                                  1
Rapids                                                          0
dtype: object

In [36]:
cl2_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==1,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl2_analyse = cl2_analyse.drop(['Latitude','Longitude'],axis=1)
cl2_analyse.sum()

Commune                Chainaz-les-FrassesCusy
Sporting Goods Shop                          0
Population                                2578
Open water                                   0
Confined water                               2
Rapids                                       0
dtype: object

In [37]:
cl3_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==2,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl3_analyse = cl3_analyse.drop(['Latitude','Longitude'],axis=1)
cl3_analyse.sum()

Commune                Anthy-sur-LémanBrenthonneChens-sur-LémanExcene...
Sporting Goods Shop                                                    0
Population                                                         16628
Open water                                                            12
Confined water                                                         1
Rapids                                                                 0
dtype: object

In [38]:
cl4_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==3,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl4_analyse = cl4_analyse.drop(['Latitude','Longitude'],axis=1)
cl4_analyse.sum()

Commune                Chamonix-Mont-BlancComblouxDomancyPassySaint-G...
Sporting Goods Shop                                                    4
Population                                                         46392
Open water                                                             7
Confined water                                                         4
Rapids                                                                 0
dtype: object

In [39]:
cl5_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==4,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl5_analyse = cl5_analyse.drop(['Latitude','Longitude'],axis=1)
cl5_analyse.sum()

Commune                ChavannazChoisyFeigèresNeydensSaint-Julien-en-...
Sporting Goods Shop                                                    3
Population                                                         19695
Open water                                                             1
Confined water                                                         4
Rapids                                                                 0
dtype: object

In [40]:
cl6_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==5,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl6_analyse = cl6_analyse.drop(['Latitude','Longitude'],axis=1)
cl6_analyse.sum()

Commune                ArmoyChampangesLa ForclazLa VernazLyaudMarinNe...
Sporting Goods Shop                                                    2
Population                                                         60958
Open water                                                             4
Confined water                                                         2
Rapids                                                                 5
dtype: object

In [41]:
cl7_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==6,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl7_analyse = cl7_analyse.drop(['Latitude','Longitude'],axis=1)
cl7_analyse.sum()

Commune                Arthaz-Pont-Notre-DameBonneCranves-SalesEtauxF...
Sporting Goods Shop                                                    0
Population                                                         18598
Open water                                                             0
Confined water                                                         6
Rapids                                                                 0
dtype: object

In [42]:
cl8_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==7,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl8_analyse = cl8_analyse.drop(['Latitude','Longitude'],axis=1)
cl8_analyse.sum()

Commune                MorillonSamoënsTaningesVerchaix
Sporting Goods Shop                                  0
Population                                        7315
Open water                                           5
Confined water                                       0
Rapids                                               0
dtype: object

In [43]:
cl9_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==8,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl9_analyse = cl9_analyse.drop(['Latitude','Longitude'],axis=1)
cl9_analyse.sum()

Commune                Chêne-en-SemineMinzier
Sporting Goods Shop                         0
Population                               1519
Open water                                  1
Confined water                              1
Rapids                                      0
dtype: object

In [44]:
cl10_analyse = cl_analyse.loc[cl_analyse['Cluster Labels']==9,cl_analyse.columns[[1]+list(range(2,cl_analyse.shape[1]))]]
cl10_analyse = cl10_analyse.drop(['Latitude','Longitude'],axis=1)
cl10_analyse.sum()

Commune                AnnecyBluffyDingy-Saint-ClairDuingtLathuileMen...
Sporting Goods Shop                                                    0
Population                                                        146771
Open water                                                            30
Confined water                                                         0
Rapids                                                                 4
dtype: object

Let's combine this data into a single table.

In [45]:
clusters_analyse = cl_analyse
clusters_analyse = clusters_analyse.drop(['Commune','Latitude','Longitude'],axis=1)  # keep only the numeric columns to sum
clusters_analyse = clusters_analyse.groupby('Cluster Labels').sum().reset_index()
clusters_analyse

Unnamed: 0,Cluster Labels,Sporting Goods Shop,Population,Open water,Confined water,Rapids
0,0,0,7039,3,1,0
1,1,0,2578,0,2,0
2,2,0,16628,12,1,0
3,3,4,46392,7,4,0
4,4,3,19695,1,4,0
5,5,2,60958,4,2,5
6,6,0,18598,0,6,0
7,7,0,7315,5,0,0
8,8,0,1519,1,1,0
9,9,0,146771,30,0,4


### Finding the optimal locations

Our criteria to define the optimal locations were :
- locations with water access in vicinity
- no sports store already established
- frequented places

So let's compare each cluster, that is to say each potential zone, against these criteria.

In [47]:
clusters_analyse['Total number of water access']=clusters_analyse['Open water']+clusters_analyse['Confined water']+clusters_analyse['Rapids']
results = clusters_analyse.drop(['Open water','Confined water','Rapids'],axis=1)
results

Unnamed: 0,Cluster Labels,Sporting Goods Shop,Population,Total number of water access
0,0,0,7039,4
1,1,0,2578,2
2,2,0,16628,13
3,3,4,46392,11
4,4,3,19695,5
5,5,2,60958,11
6,6,0,18598,6
7,7,0,7315,5
8,8,0,1519,2
9,9,0,146771,34


It clearly appears that **cluster n°9** is the best candidate: it has the largest number of water access points (34), the largest population (146,771) and no sports shop in the area.
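
The comparison can also be written as an explicit filter and ranking. In the sketch below, the rows are copied from the summary table above; the ordering rule (water access first, population as a tie-breaker) is an assumption made for illustration, not a rule stated in the analysis.

```python
# (cluster label, sports shops, population, total water access points)
clusters = [
    (0, 0, 7039, 4), (1, 0, 2578, 2), (2, 0, 16628, 13), (3, 4, 46392, 11),
    (4, 3, 19695, 5), (5, 2, 60958, 11), (6, 0, 18598, 6), (7, 0, 7315, 5),
    (8, 0, 1519, 2), (9, 0, 146771, 34),
]

# Keep the clusters without any sports shop, then rank by water access and population.
candidates = [c for c in clusters if c[1] == 0]
ranked = sorted(candidates, key=lambda c: (c[3], c[2]), reverse=True)

for label, shops, population, water in ranked[:3]:
    print(label, water, population)
```

Under this rule cluster 9 comes first, followed by clusters 2 and 6.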

## 4. Results <a name="results"></a>

Let's see on a map the cities of the optimal zone found.

In [49]:
optimal_cluster = pd.merge(cl10_analyse,communes_latlon,on='Commune')
optimal_cluster

Unnamed: 0,Commune,Sporting Goods Shop,Population,Open water,Confined water,Rapids,Latitude,Longitude
0,Annecy,0,126924,3,0,0,45.900002,6.11667
1,Bluffy,0,391,2,0,0,45.866669,6.21667
2,Dingy-Saint-Clair,0,1433,0,0,1,45.900002,6.21667
3,Duingt,0,972,5,0,1,45.833328,6.2
4,Lathuile,0,1016,1,0,0,45.783329,6.2
5,Menthon-Saint-Bernard,0,1889,6,0,1,45.849998,6.2
6,Saint-Jorioz,0,5738,2,0,0,45.833328,6.16667
7,Sevrier,0,4161,4,0,0,45.866669,6.13333
8,Talloires-Montmin,0,1996,2,0,1,45.849998,6.21667
9,Veyrier-du-Lac,0,2251,5,0,0,45.883331,6.16667


In [51]:
# create map
map_cl9= folium.Map(location=[cluster_centers[9][0],cluster_centers[9][1]], zoom_start=12)

for lat, lon, comm in zip(optimal_cluster['Latitude'], optimal_cluster['Longitude'], optimal_cluster['Commune']):
    label = folium.Popup(str(comm), parse_html=True)
    folium.CircleMarker([lat, lon], radius=5, popup=label, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_cl9)
    
map_cl9

**The best zone in Haute-Savoie to create a diving & watersport equipment store, according to our criteria, seems to be around the famous *'Lake of Annecy'*.**

## 5. Discussion <a name="discussion"></a>

Our analysis shows that although there is a great number of water access points in Haute-Savoie, the best zone to open a new diving & watersport equipment store is **around the lake of Annecy**. This is where the 3 criteria of our research are clearly satisfied: 34 water access points, no existing sports shop in the area and more than 145 000 inhabitants.

Nevertheless, we have also seen that water access can be of 3 different types: open water, confined water or rapids. Each type may call for specific sports equipment. So, depending on the customers targeted by the store's product range, a further analysis could help define the most appropriate location within the selected area.

Moreover, the surroundings of the lake of Annecy are quite a large area, and this study should be considered as **a starting point for a more detailed analysis** which could also consider **new criteria**, such as **real estate availability** and prices in the selected areas, or **proximity to major roads**.

## 6. Conclusion <a name="conclusion"></a>

The purpose of this project was to identify an optimal location in Haute-Savoie to open a new diving & watersport equipment store. The criteria for this analysis were:
- water access in the vicinity
- no existing sports store
- frequented place

**The surroundings of the Lake of Annecy were clearly identified as the best location.**

A list of the cities in this area was established with, for each of them, the number of water access points split into 3 different types, which can be used as starting points for the final exploration by stakeholders.

The final decision on the optimal location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in this recommended zone, taking into consideration additional factors such as the attractiveness of each location, proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.