# Report for the Applied Data Science Capstone project  
by Andreas Johannes

<a id='top'></a>  

# Salvation of a Rosesalesman

[1 Background](#background)  
[2 Data sources and treatment](#data)  
[3 Methodology](#methods)  
 - [3.1 Paris map](#map)  
 - [3.2 Foursquare data](#foursquare)  
 - [3.3 Heat map](#heat)  
 - [3.4 k-means](#kmeans)
 
[4 Results](#results)  
[5 Discussion](#discussion)  
[6 Conclusion](#conclusion)  

<a id='background'></a>  
## 1 Background  
[back to top](#top)  

![Be this guy!](https://thumbs.dreamstime.com/z/money-7661988.jpg)

### Sell your Roses  here (or rather there)!

Parisean Rose seller, this could be you!
Whether you are selling Roses to couples or playing your Fiddle for tips, you want to know where the most restaurants and bars are, because that's were the most money can be made. Read on for an depth analysis of where to go tonight to ply your trade.

PLUS if you know you made money in one area, use our similarity rating to find similar areas for your next nights work!

![make the machiene work for you](https://thumbs.dreamstime.com/z/human-hand-receiving-rose-artificial-hand-senior-87445418.jpg)



### Summary:  
To find the best areas to sell Roses on the street:
 - Grade areas in Paris according to how many restaurants and bars there are in them
  - Show this data on a map of Paris
  - By restaurant category/type
 - Find locations which offer similar night life options
  - generally categorize areas
  - given a starting address, find similar areas  
Probably this analysis may be useful outside of the rose-selling market, but that's a future venture.

<a id='data'></a>  
## 2 Data sources and treatment  
[back to top](#top)  
### The heat-map   
 - We will segment Paris into evenly sized tiles
 - Use **Foursquare** to obtain a count for the restaurants and bars in each tile.
 - Categorize Restaurants and bars in 4-8 categories (eg. Bar, club, Fast Food etc.)
 - Use **folium** to plot heat-map tiles onto a map of Paris for each category
 - Sort by number of found places to suggest best areas.

### Similar areas  
Use above categories to find areas that are similar:  
 - Inspect distribution to see how many area categories are sensible
 - use k-means to group this number of categories
 - Plot to map of Paris
 - given a location, use generalized distance across features (as used in k-means algorithm) to produce a sorted list of areas similar to the current location.


 

<a id='methods'></a>  
## 3 Methodology
[back to top](#top)

In this section we will execute the strategy outlined in the previous section.

<a id='map'></a>  
### 3.1 Paris map  
[back to top](#top)

In [21]:
import numpy as np
import pandas as pd
import folium


Create a regular hexagonal grid around Paris. We wil use cube coordinates centered around the center of paris accordintg to [wiki: paris](https://en.wikipedia.org/wiki/Paris). The tiles will be spaced 200 m appart and we will have 50 tiles in each direction. This covers the center of Paris quite well and should have sufficient resolution.
See [https://www.redblobgames.com/grids/hexagons/] for an introduction to hexagonal coordinates.

In [33]:
# get a 3D grid from with 2*tile_count + 1 number of tiles across
tile_count = tc = 20
p_range, q_range, r_range = range(-tc,tc+1),range(-tc,tc+1),range(-tc,tc+1)
r_i, q_i, p_i = np.meshgrid(p_range, q_range, r_range)
pqr_i = np.stack([p_i.flat, q_i.flat, r_i.flat])
# reduce grid to include only the indexes on our hexagonal plane
hex_mask = pqr_i.sum(axis=0)==0
hex_mask
#xyz_hex = np.empty(shape=(3,hex_mask.sum()),dtype=np.int32)
pqr_hex = pqr_i[:,hex_mask]
pqr_hex.dtype, pqr_hex.T.shape

(dtype('int32'), (1261, 3))

We have an index grid, not to convert it into geospacial coordinates. We want the spacing to be tile_size, and first need to convert that to angular distances. We will only cover a small segment of the sperical earth and use the apropriate simplifiations.
see [wiki: geographic coordinates](https://en.wikipedia.org/wiki/Geographic_coordinate_system)

In [34]:
tile_size = ts = 200. # m
earth_radius =  6367449
center_of_paris = (48.8567, 2.3508)
# in angle per meter
lat_conversion = 360./(np.pi*earth_radius)
lon_conversion = 180/(np.pi*earth_radius)*np.cos(np.pi/180.0*center_of_paris[0])
lat_conversion, lon_conversion

# defining vectors to get form the center of the hex to corner points in angles
h = 1.5*ts*lon_conversion
v = 0.75*ts*lat_conversion
s60 = np.sin(60./180.*np.pi)
c60 = np.cos(60./180.*np.pi)
x_step = (v, 0)
y_step = (-v*c60, h*s60)
z_step = (-v*c60, -h*s60)
step_vector = np.asarray((x_step, y_step, z_step)).T

def get_corners(step_vector, center):
    '''
    returns the list of coordinates for the corners of a hexagon defined by
    the hexagonal step vector and a center point
    ''' 
    coordinates = []
    perms = [[1,0,0],
             [0,0,-1],
             [0,1,0],
             [-1,0,0],
             [0,0,1],
             [0,-1,0]]
             
    for perm in perms:
        coordinates.append(list(center + np.dot(step_vector,perm)))
    return coordinates

We have all we need to create the hexagonal grid mapped over Paris.

In [35]:
# usefull library to create geojson files
# https://github.com/karimbahgat/PyGeoj
import pygeoj
# creating regular tiles around city center
json_tiles = pygeoj.new()
json_tiles_fname = "tiles.geojson"
coords_str_list = []
center_list = []
p_list = []
q_list = []
r_list = []
for coords in pqr_hex.T:
    # create a geojson file
    coords_str=('_').join([str(x) for x in coords])
    coords_str_list.append(coords_str)
    p_list.append(coords[0])
    q_list.append(coords[1])
    r_list.append(coords[2])
    
    center = center_of_paris[::-1] + np.dot(step_vector, coords)
    center_list.append(center)
    coordinates = get_corners(step_vector, center)
    json_tiles.add_feature(
        properties={"coords_str":"coords_str"},
        geometry={"type":"Polygon", "coordinates":[coordinates]})

json_tiles.add_all_bboxes()
json_tiles.update_bbox()
json_tiles.add_unique_id()
json_tiles.save(json_tiles_fname)
coordinates

[[2.272515386842703, 48.88746297488709],
 [2.2711656521330945, 48.889001123631445],
 [2.2684661827138775, 48.889001123631445],
 [2.267116448004269, 48.88746297488709],
 [2.2684661827138775, 48.88592482614274],
 [2.2711656521330945, 48.88592482614274]]

In [53]:
# create a corresponding dataframe:
center_array = np.asarray(center_list)
df_tiles = pd.DataFrame({'coords_str':coords_str_list, 
                         'lat':center_array[:,1],
                         'lon':center_array[:,0]})
latdist_array = (np.asarray(df_tiles.lat)-center_of_paris[0])/lat_conversion
londist_array = (np.asarray(df_tiles.lon)-center_of_paris[1])/lon_conversion
df_tiles['distance_to_center'] = np.asarray(np.sqrt(latdist_array**2 + londist_array**2),
                                            dtype=np.int32)
df_tiles['p'] = p_list
df_tiles['q'] = q_list
df_tiles['r'] = r_list

In [54]:
map_paris = folium.Map(location=center_of_paris, zoom_start=12)
test_df = pd.DataFrame({'Paris':1}, columns=['City','Value'])
# Add the color for the chloropleth:
folium.Choropleth(
    geo_data=json_tiles_fname,
    name='choropleth',
    data=df_tiles,
    fill_color='Blues',
    columns=['coords_str', 'distance_to_center'],
    key_on='feature.properties.coords_str',
    fill_opacity=0.5, 
    line_opacity=0.1,
    legend_name='Distance to Center',   
).add_to(map_paris)


map_paris

In [56]:
df_tiles.distance_to_center.min()

0

<a id='foursquare'></a>  
### 3.2 Foursquare data  
[back to top](#top)

Noe that we have the grid on which we want to check for locations, lets use foursquare to find them. We will immedeately collect different restaurant types seperately for later.
[see foursquare:categories](https://developer.foursquare.com/docs/resources/categories) 

In [64]:
# not sharing foursquare credentials
with open('../../foursquare_credentials.dat','r') as f:
    client_id, client_secret = f.readlines()
client_id = client_id[:-1]

In [None]:
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues
nightlife_category = '4d4b7105d754a06376d81259'# 'Root' category for all nightlife venues
# other categories:
categories_dict = {}
categories_dict.update({'other':['503288ae91d4c4b30a586d67',
                                 '4bf58dd8d48988d1c8941735',
                                 '4bf58dd8d48988d14e941735',
                                 '4bf58dd8d48988d169941735',
                                 '52e81612bcbc57f1066b7a01',
                                 '4bf58dd8d48988d1df931735',
                                 '52e81612bcbc57f1066b79f4',
                                 '4bf58dd8d48988d17a941735',
                                 '4bf58dd8d48988d144941735',
                                 '4bf58dd8d48988d108941735',
                                 '4bf58dd8d48988d120951735',
                                 '4bf58dd8d48988d1be941735',
                                 '4bf58dd8d48988d1c1941735',
                                 '56aa371be4b08b9a8d573508',
                                 '4bf58dd8d48988d1c4941735',
                                 '4bf58dd8d48988d1ce941735',
                                 '4bf58dd8d48988d1cc941735',
                                 '4bf58dd8d48988d1dc931735',
                                 '56aa371be4b08b9a8d573538'],                                 
                        'sweet':['4bf58dd8d48988d146941735',
                                 '52e81612bcbc57f1066b79f2',
                                 '4bf58dd8d48988d1d0941735',
                                 '4bf58dd8d48988d148941735'],
                                 
                        'european':['52f2ae52bcbc57f1066b8b81',
                                    '5293a7d53cf9994f4e043a45',
                                    '4bf58dd8d48988d147941735',
                                    '5744ccdfe4b0c0459246b4d0',
                                    '4bf58dd8d48988d109941735',
                                    '52e81612bcbc57f1066b7a05',
                                    '52e81612bcbc57f1066b7a09',
                                    '4bf58dd8d48988d10c941735',
                                    '52e81612bcbc57f1066b79fa',
                                    '4bf58dd8d48988d110941735',
                                    '52e81612bcbc57f1066b79fd',
                                    '4bf58dd8d48988d1c0941735',
                                    '52e81612bcbc57f1066b79f9',
                                    '4bf58dd8d48988d1c2941735',
                                    '52e81612bcbc57f1066b7a04',
                                    '4def73e84765ae376e57713a',
                                    '5293a7563cf9994f4e043a44',
                                    '4bf58dd8d48988d1c6941735',
                                    '5744ccdde4b0c0459246b4a3',
                                    '56aa371be4b08b9a8d57355a',
                                    '4bf58dd8d48988d150941735',
                                    '4bf58dd8d48988d158941735',
                                    '4f04af1f2fb6e1c99f3db0bb',
                                    '52e928d0bcbc57f1066b7e96'],
                        'asian':['4bf58dd8d48988d142941735',
                                 '4bf58dd8d48988d10f941735',
                                 '4bf58dd8d48988d115941735'
                                 '52e81612bcbc57f1066b79f8'
                                 '5413605de4b0ae91d18581a9'],
                        'fast':['4bf58dd8d48988d179941735',
                                '4bf58dd8d48988d16a941735',
                                '52e81612bcbc57f1066b7a02',
                                '52e81612bcbc57f1066b79f1',
                                '4bf58dd8d48988d143941735',
                                '52e81612bcbc57f1066b7a0c',
                                '4bf58dd8d48988d16c941735',
                                '4bf58dd8d48988d128941735',
                                '4bf58dd8d48988d16d941735',
                                '4bf58dd8d48988d1e0931735',
                                '52e81612bcbc57f1066b7a00',
                                '4bf58dd8d48988d10b941735',
                                '4bf58dd8d48988d16e941735',
                                '4edd64a0c7ddd24ca188df1a',
                                '56aa371be4b08b9a8d57350b',
                                '4bf58dd8d48988d1cb941735',
                                '4d4ae6fc7a7b7dea34424761',
                                '5283c7b4e4b094cb91ec88d7',
                                '4bf58dd8d48988d1ca941735',
                                '4bf58dd8d48988d1c5941735',
                                '4bf58dd8d48988d1bd941735',
                                '4bf58dd8d48988d1c7941735',


In [None]:
def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues

<a id='heat'></a>  
### 3.3 Heat map  
[back to top](#top)

<a id='kmeans'></a>  
### 3.4 k-means  
[back to top](#top)

<a id='results'></a>  
## 4 Results  
[back to top](#top)

<a id='discussion'></a>  
## 5 Discussion  
[back to top](#top)

<a id='conclusion'></a>  
## 6 Conclusion  
[back to top](#top)