## **Pickup Place Solution**
Data science for a delivery service  
Takeshi.Yagyu



### Introduction, Data Description
See [here](https://go660088.wordpress.com/pickup-place-solution/).

### Methodology

#### Load library

In [0]:
# library to handle requests
import requests

# library for data analysis
import pandas as pd

# transform JSON into a pandas DataFrame
# (pandas >= 1.0 exposes this as pandas.json_normalize;
#  the older pandas.io.json.json_normalize path is deprecated)
from pandas import json_normalize

# library to handle data in a vectorized manner
import numpy as np

# library for random number generation
import random 

# module to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# module to calculate distance
from geopy.distance import geodesic

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

# !conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

# for pretty printing
import pprint
pp = pprint.PrettyPrinter(indent=4)

import re

#### Select major shopping districts for this project
1.  Select major shopping districts based on personal experience.
1.  Use `geopy.geocoders.Nominatim` to convert each address to latitude and longitude.

In [2]:
# Use Nominatim to convert each address to latitude and longitude
geolocator = Nominatim(user_agent="foursquare_agent")

# districts chosen from personal experience
centers = ['Ginza, Tokyo','Shinjuku, Tokyo','Akihabara, Tokyo', 
           'Shibuya, Tokyo','Odaiba, Tokyo'] 
major_area = []

# get latitude and longitude; store None when geocoding fails
# (geocode() returns None for an unknown address, which raises AttributeError below)
for address in centers:
  try:
    location = geolocator.geocode(address)
    major_area.append({'name': address,
                       'lat': location.latitude,
                       'lng': location.longitude})
  except Exception:
    major_area.append({'name': address,'lat': None,'lng': None})

pp.pprint(major_area)

[   {'lat': 35.66951555, 'lng': 139.7643055988, 'name': 'Ginza, Tokyo'},
    {'lat': 35.6937632, 'lng': 139.7036319, 'name': 'Shinjuku, Tokyo'},
    {'lat': 35.6997364, 'lng': 139.7712503, 'name': 'Akihabara, Tokyo'},
    {'lat': 35.6645956, 'lng': 139.6987107, 'name': 'Shibuya, Tokyo'},
    {'lat': 35.61912805, 'lng': 139.779403349221, 'name': 'Odaiba, Tokyo'}]
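
`geolocator.geocode` returns `None` when Nominatim finds no match, so the loop above only reaches its fallback branch through the resulting exception. A minimal sketch of the same record-building step with an explicit check follows. `to_record` is a hypothetical helper, and `SimpleNamespace` stands in for a geopy `Location` so the example runs without network access.

```python
from types import SimpleNamespace

def to_record(name, location):
    """Build one major_area entry; location is None when geocoding found nothing."""
    if location is None:
        return {'name': name, 'lat': None, 'lng': None}
    return {'name': name, 'lat': location.latitude, 'lng': location.longitude}

# Stand-in for a geopy Location (assumption: only .latitude/.longitude are read)
fake_location = SimpleNamespace(latitude=35.6937632, longitude=139.7036319)

found = to_record('Shinjuku, Tokyo', fake_location)
missing = to_record('Nowhere, Tokyo', None)   # made-up address for the failure case
print(found)
print(missing)
```

Rows with `None` coordinates would still need to be dropped before they are used as search centers.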


In [3]:
shop_google_df = pd.read_csv('/content/drive/My Drive/shop_google_df_5columns.csv')
park_df = pd.read_csv('/content/drive/My Drive/park_df.csv')
shop_df = pd.read_csv('/content/drive/My Drive/shop_df.csv')

shop_google_df.head()
park_df.head()
shop_df[['name','lat','lng']].head()

park_df.head()

Unnamed: 0,Name,Address,Number,lat,lng
0,北の丸公園第三駐車場,東京都千代田区北の丸公園１番,20.0,35.691279,139.749237
1,丸ノ内鍛冶橋駐車場,東京都千代田区丸の内３丁目８番２号,22.0,35.677149,139.766105
2,靖国神社外苑駐車場,東京都千代田区九段北２丁目１番１号,23.0,35.694195,139.745354
3,市場橋駐車場（観光バス）,東京都中央区築地４丁目１５番２号,9.0,35.665995,139.768744
4,タイムズ晴海４丁目バスプール,東京都中央区晴海4-6,8.0,35.653477,139.780225


#### Search duty-free stores with Google Maps
1.  Use the Google Places Nearby Search API.
1.  Set the search radius to 5 km.
1.  Filter results to type `store`.

In [5]:
# new dataframe
shop_google_df = pd.DataFrame()

# search around each selected major district
for area in major_area:

  search = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json'

  # set radius to 5 km and keyword to "duty free";
  # filter on "type: store"
  payload = {'location':  f'{area["lat"]},{area["lng"]}', 
             'radius': 5000,
             'keyword': 'duty free',
             'type': 'store', 
             'key': google_api_key,
             'language': 'en'
            }
  response = requests.get(search, params=payload).json()

  res = response['results']
  shop_google_df_tmp = json_normalize(data=res)

  # get the necessary information only
  columns = ['id', 'name',
             'geometry.location.lat', 
             'geometry.location.lng',
             'user_ratings_total', 
             'rating'] 
  shop_google_df_tmp = shop_google_df_tmp[columns]

  # shorten column name
  columns = ['id', 'name','lat', 'lng', 'user_ratings_total', 'rating']
  shop_google_df_tmp.columns = columns

  # concat searched results
  shop_google_df = pd.concat([shop_google_df, shop_google_df_tmp], sort=True)
  print(area['name'], shop_google_df_tmp.shape)

print('----------------')
print('Total ', shop_google_df.shape)

# merge search results with same id
shop_google_df = shop_google_df.groupby('id').first()

print('Merged ', shop_google_df.shape)

Ginza, Tokyo (18, 6)
Shinjuku, Tokyo (12, 6)
Akihabara, Tokyo (17, 6)
Shibuya, Tokyo (13, 6)
Odaiba, Tokyo (6, 6)
----------------
Total  (66, 6)
Merged  (34, 5)
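
Because the 5 km search circles overlap, the same store can be returned for several districts, which is why 66 rows collapse to 34 after grouping by `id` (the column count drops from 6 to 5 because `id` becomes the index). The sketch below reproduces the flatten-then-deduplicate step in plain Python. The `sample_results` records are made-up stand-ins for the `results` list of a Nearby Search response, not real API output.

```python
def flatten(result):
    """Keep only the fields used later, pulling lat/lng out of the nested geometry."""
    loc = result['geometry']['location']
    return {
        'id': result['id'],
        'name': result['name'],
        'lat': loc['lat'],
        'lng': loc['lng'],
        'user_ratings_total': result.get('user_ratings_total'),
        'rating': result.get('rating'),
    }

# Hypothetical results from two overlapping searches (store "a1" appears twice)
sample_results = [
    {'id': 'a1', 'name': 'Store A',
     'geometry': {'location': {'lat': 35.67, 'lng': 139.76}},
     'user_ratings_total': 120, 'rating': 4.1},
    {'id': 'b2', 'name': 'Store B',
     'geometry': {'location': {'lat': 35.69, 'lng': 139.70}},
     'user_ratings_total': 15, 'rating': 3.8},
    {'id': 'a1', 'name': 'Store A',
     'geometry': {'location': {'lat': 35.67, 'lng': 139.76}},
     'user_ratings_total': 120, 'rating': 4.1},
]

# Keep the first record seen for each id, similar in spirit to groupby('id').first()
unique = {}
for row in map(flatten, sample_results):
    unique.setdefault(row['id'], row)

print(len(sample_results), '->', len(unique))
```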


#### Get the parking area information
1. Get the list of parking lots reserved for motorcoaches from https://www.s-park.jp/bus.
1. Convert each address to latitude and longitude using the Google Maps Geocoding API.

In [6]:
# pre-defined function
def get_coordinates(api_key, address, verbose=False):
    try:

        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']

        # get geographical coordinates
        geographical_data = results[0]['geometry']['location'] 
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]

    except Exception:
        # request failed or no result was returned for this address
        return [None, None]

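# list of parking lots reserved for motorcoaches
park_df = pd.read_html("https://www.s-park.jp/bus")
park_df = park_df[0]
park_df.columns = ['Name', 'Address', 'Telephone', 'Number']

# data wrangling: drop rows whose name is repeated in the address column
# (presumably section headers), then drop the phone number
park_df = park_df[park_df['Name'] != park_df['Address']].copy()
park_df = park_df.drop(columns='Telephone')

# keep only the digits of 'Number' and make it numeric,
# because the map code later uses integer division on it
park_df['Number'] = pd.to_numeric(
    park_df['Number'].astype(str).str.replace(r'\D', '', regex=True),
    errors='coerce')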

get_location = lambda x: get_coordinates(google_api_key, x)
park_df[['lat', 'lng']] = park_df['Address'].apply(get_location).apply(pd.Series)

park_df.head()

Unnamed: 0,Name,Address,Number,lat,lng
0,北の丸公園第三駐車場,東京都千代田区北の丸公園１番,20,35.691279,139.749237
1,丸ノ内鍛冶橋駐車場,東京都千代田区丸の内３丁目８番２号,22,35.677149,139.766106
2,靖国神社外苑駐車場,東京都千代田区九段北２丁目１番１号,23,35.694195,139.745354
4,市場橋駐車場（観光バス）,東京都中央区築地４丁目１５番２号,9,35.665995,139.768744
5,タイムズ晴海４丁目バスプール,東京都中央区晴海4-6,8,35.653477,139.780225
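
The addresses passed to `get_coordinates` contain Japanese characters and full-width digits, so the query string has to be percent-encoded before it is sent. As a rough sketch, the standard library can build the same Geocoding URL with explicit encoding. `build_geocode_url` is a hypothetical helper and `'YOUR_API_KEY'` is a placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return the request URL with key and address percent-encoded as UTF-8."""
    return GEOCODE_ENDPOINT + '?' + urlencode({'key': api_key, 'address': address})

url = build_geocode_url('YOUR_API_KEY', '東京都中央区晴海4-6')
print(url)

# Round trip: decoding the query string recovers the original address
decoded = parse_qs(urlsplit(url).query)
print(decoded['address'][0])
```

Passing the same dictionary as `params=` to `requests.get` has the same effect, which is how the Places search cell builds its request.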


#### Search duty-free stores using FourSquare API

In [7]:
shop_df = pd.DataFrame()

# search FourSquare for venues that look like duty-free stores
for area in major_area:

  search = 'https://api.foursquare.com/v2/venues/search'

  payload = {'client_id': CLIENT_ID, 
            'client_secret': CLIENT_SECRET, 
            'll': f'{area["lat"]},{area["lng"]}',
            'v': '20180604',
            'query': 'tax free',
            'radius': 5000,
            'limit': 1000
            }

  res = requests.get(search, params=payload).json()
  shop_df_tmp = json_normalize(data=res['response']['venues'])
  shop_df_tmp['categories'] = shop_df_tmp['categories'].apply(
                                lambda x: x[0]['name'] if x else 'UNKNOWN')
  
  # keep only venues whose primary category name contains 'Store'
  shop_df_tmp = shop_df_tmp[shop_df_tmp['categories'].str.contains('Store')]

  shop_df = pd.concat([shop_df, shop_df_tmp], sort=False)
  print(area['name'], shop_df_tmp.shape)

print('-----------------------')
print('Total ', shop_df.shape)

shop_df = shop_df.groupby('id').first()
print('Merged ', shop_df.shape)

shop_df.columns = [col.split('.')[-1] for col in shop_df.columns]

columns = ['name', 'lat', 'lng']
shop_df = shop_df[columns]

Ginza, Tokyo (7, 18)
Shinjuku, Tokyo (8, 18)
Akihabara, Tokyo (9, 19)
Shibuya, Tokyo (10, 18)
Odaiba, Tokyo (0, 17)
-----------------------
Total  (34, 19)
Merged  (22, 18)
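
Each FourSquare venue carries a list of category dictionaries, and the cell above keeps only venues whose first category name contains `Store`. The following sketch applies the same rule to a few invented venues (the names and categories are illustrative, not search results).

```python
def primary_category(venue):
    """Name of the first category, or 'UNKNOWN' when the list is empty."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else 'UNKNOWN'

# Invented venues for illustration
venues = [
    {'name': 'Shop A', 'categories': [{'name': 'Electronics Store'}]},
    {'name': 'Cafe B', 'categories': [{'name': 'Café'}]},
    {'name': 'Shop C', 'categories': []},
    {'name': 'Shop D', 'categories': [{'name': 'Department Store'},
                                      {'name': 'Gift Shop'}]},
]

stores = [v['name'] for v in venues if 'Store' in primary_category(v)]
print(stores)
```

The match is a case-sensitive substring test on the first category only, so a venue whose primary category is, say, "Gift Shop" is excluded even if it sells tax-free goods.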


#### Merge search results from Google Maps and FourSquare

In [8]:
print('FourSquare Search Results: ', shop_df.shape)
print('Google Map Search Results: ', shop_google_df.shape)

# merge search results
shop_df = pd.concat([shop_df,shop_google_df], axis=0, sort=True)

shop_df.reset_index(inplace=True, drop=True)

# fill NaN in the numeric columns (rating, user_ratings_total) with the column median;
# numeric_only=True skips the text column 'name'
shop_df = shop_df.fillna(shop_df.median(numeric_only=True))

print('-----------------------------------------')
print('Merged duty-free stores   : ', shop_df.shape)

FourSquare Search Results:  (22, 3)
Google Map Search Results:  (34, 5)
-----------------------------------------
Merged duty-free stores   :  (56, 5)
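
The FourSquare rows have no `rating` or `user_ratings_total`, so after concatenation those cells are NaN and are filled with the column median. A small sketch of that imputation on made-up numbers, using only the standard library:

```python
from statistics import median

# Made-up review counts; None marks rows that came from FourSquare
user_ratings_total = [120, None, 15, 40, None]

observed = [v for v in user_ratings_total if v is not None]
fill_value = median(observed)   # median of the known values only
filled = [fill_value if v is None else v for v in user_ratings_total]
print(fill_value, filled)
```

Every imputed store ends up with the same review count, so marker sizes derived from `user_ratings_total` cannot distinguish among FourSquare-only venues.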


In [0]:
shop_df.sort_values(by='user_ratings_total', ascending=False)

#### Data Visualization and Some Simple Statistical Analysis
1. Show parking areas on the map as green circles.
2. Show duty-free stores on the map as blue circles.

In [11]:
import folium

center = park_df.loc[0][['lat','lng']]
tokyo_map = folium.Map(location=center, zoom_start=12, width='75%', height='80%')

# show parking areas
dataset =  zip(park_df['lat'],  park_df['lng'], 
               park_df['Name'], park_df['Number'])
for lat, lng, name, number in dataset:
    label = '{}, {}'.format(name, number)
    label = folium.Popup(label, parse_html=True)
    # number = 1/ (1+ np.exp(number))
    folium.CircleMarker(
        [lat, lng],
        radius= (number//100 +1)*8,
        popup=label,
        color='green',
        fill=True,
        fill_color='#00ff00',
        fill_opacity=0.7,
        parse_html=False).add_to(tokyo_map)  

# show duty-free stores
dataset = zip(shop_df['lat'],  shop_df['lng'], 
              shop_df['name'], shop_df['rating'], 
              shop_df['user_ratings_total'])
for lat, lng, name, rating, users in dataset:
    label = str(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=(users//100 +1)*5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(tokyo_map)  
# display the map
tokyo_map
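
The marker radius grows in steps: `(n // 100 + 1) * scale` adds one `scale`-sized increment for every full hundred, so small counts still get a visible circle. A short worked example (the function name `stepped_radius` is introduced here for illustration):

```python
def stepped_radius(count, scale):
    """Radius in pixels: one `scale` step per full hundred, never less than `scale`."""
    return (count // 100 + 1) * scale

# Parking lots use scale 8 (bus capacity), stores use scale 5 (review count)
parking_radius = stepped_radius(55, 8)
store_radius = stepped_radius(250, 5)
print(parking_radius, store_radius)

# The radius jumps exactly at multiples of 100
print(stepped_radius(99, 5), stepped_radius(100, 5))
```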

#### Clustering duty-free stores using K-Means

In [12]:
from sklearn.cluster import KMeans

# set n_clusters=5 to fit selected 5 major districts
k_means = KMeans(init='k-means++', n_clusters=5, n_init=20)
k_means.fit(shop_df[['lat', 'lng']])

# show results
kmeans_labels = k_means.labels_
pp.pprint(kmeans_labels)

# show cluster center that is used for calculating midpoint
k_means_cluster_centers = k_means.cluster_centers_
pp.pprint(k_means_cluster_centers)

array([4, 0, 0, 0, 2, 0, 3, 2, 3, 3, 2, 2, 0, 3, 4, 3, 0, 0, 4, 4, 4, 0,
       2, 2, 3, 4, 4, 4, 3, 2, 3, 2, 3, 4, 3, 4, 3, 1, 2, 3, 4, 4, 1, 2,
       0, 1, 4, 2, 3, 2, 2, 2, 3, 3, 3, 1], dtype=int32)
array([[ 35.65270632, 139.70183272],
       [ 35.63138323, 139.76443828],
       [ 35.70258741, 139.77491478],
       [ 35.69822311, 139.70469552],
       [ 35.67029902, 139.76328579]])
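
K-means alternates two steps until the assignments stop changing: assign each store to its nearest center, then move each center to the mean of its stores. The sketch below is a bare-bones version of that loop for intuition only. It is not scikit-learn's implementation, it takes fixed starting centers instead of `k-means++`, and the four points are invented.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain Lloyd iterations on 2-D points with caller-supplied starting centers."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: move each center to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break                        # converged: centers no longer move
        centers = new_centers
    return labels, centers

# Toy (lat, lng) points forming two obvious groups
points = [(35.66, 139.70), (35.67, 139.70), (35.70, 139.77), (35.71, 139.77)]
labels, centers = kmeans(points, centers=[points[0], points[2]])
print(labels)
print(centers)
```

Like the cell above, this treats latitude and longitude as plain planar coordinates. At Tokyo's latitude one degree of longitude covers roughly 0.81 of the distance of one degree of latitude, so east-west separations are somewhat overweighted.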


Show the shopping areas on the map.

In [13]:
# draw one red circle per cluster, sized by the number of stores assigned to it
for i, center in enumerate(k_means_cluster_centers):
  
    r = np.sum(kmeans_labels == i) 
    folium.CircleMarker(
        center,
        radius=int(r),  # plain int instead of a string or NumPy integer
        # popup=label,
        color='red',
        fill=True,
        fill_color='#ff0000',
        fill_opacity=0.7,
        parse_html=False).add_to(tokyo_map)  

tokyo_map

#### Calculate the midpoint between parking and shopping area

In [0]:
pickup = pd.DataFrame()

# calculate the midpoint of two (lat, lng) pairs
midpoint = lambda a,b: ((a[0]+b[0])/2, (a[1]+b[1])/2)

# loop over the shopping-area cluster centers
# for major, a in zip(major_area, k_means_cluster_centers):
for a in k_means_cluster_centers:
  dt, mp = [], []

  # loop over the parking lots
  for b in zip(park_df['lat'], park_df['lng']):
    dt.append(geodesic(a, b).km)
    mp.append(midpoint(a, b))

  park_df['distance'] = dt
  park_df['midpoint'] = mp

  # the closest parking lot
  x = park_df[park_df['distance'] == park_df['distance'].min(axis=0)]

  # create pickup place
  pickup = pd.concat([pickup,x], axis=0, sort=True)
  pickup.reset_index(inplace=True, drop=True)

In [15]:
pickup.head()

Unnamed: 0,Address,Name,Number,distance,lat,lng,midpoint
0,東京都渋谷区恵比寿４丁目２０番,恵比寿ガーデンプレイス駐車場,3.0,1.534341,35.642914,139.713796,"(35.64781015984074, 139.7078142116872)"
1,東京都江東区青海１丁目１番１６号,（バス対応）青海第二臨時駐車場,55.0,1.187093,35.625128,139.77507,"(35.628255612500006, 139.7697541375)"
2,東京都台東区蔵前２丁目１番８号,台東区蔵前臨時観光バス待機場,4.0,1.563145,35.701984,139.792171,"(35.702285705964805, 139.7835428924887)"
3,東京都新宿区歌舞伎町２丁目２０番２号,歌舞伎町観光バス駐車場,9.0,0.157624,35.696827,139.704374,"(35.69752500576264, 139.7045347585062)"
4,東京都中央区築地４丁目１５番２号,市場橋駐車場（観光バス）,9.0,0.68718,35.665995,139.768744,"(35.668147209107644, 139.76601484401695)"
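
The selection rule above is: for each cluster center, measure the distance to every parking lot, keep the closest one, and take the coordinate-wise midpoint as the pickup place. A minimal standard-library sketch of that rule follows. It swaps geopy's ellipsoidal `geodesic` for the spherical haversine formula (an assumption made to keep the example dependency-free), so distances differ slightly from the table. The coordinates are copied from the outputs above.

```python
import math

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance between two (lat, lng) pairs given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def midpoint(a, b):
    """Arithmetic mean of the coordinates, as in the cell above."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# One cluster center (Shinjuku side) and two of the parking lots
center = (35.69822311, 139.70469552)
parking = {
    '歌舞伎町観光バス駐車場': (35.696827, 139.704374),
    '市場橋駐車場（観光バス）': (35.665995, 139.768744),
}

nearest = min(parking, key=lambda name: haversine_km(center, parking[name]))
distance_km = haversine_km(center, parking[nearest])
pickup_point = midpoint(center, parking[nearest])
print(nearest, round(distance_km, 3))
print(pickup_point)
```

Averaging latitudes and longitudes is a reasonable approximation here because the paired points are at most a few kilometres apart; over long distances a geodesic midpoint would be needed.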


In [16]:
import folium

# any location
center = park_df.loc[0][['lat','lng']]
tokyo_map = folium.Map(location=center, zoom_start=12, width='75%', height='75%')

# show parking areas
dataset =  zip(pickup['lat'],  pickup['lng'], 
               pickup['Name'], pickup['Number'])
for lat, lng, name, number in dataset:
    label = '{}, {}'.format(name, number)
    label = folium.Popup(label, parse_html=True)

    folium.CircleMarker(
        [lat, lng],
        radius= (number//100 +1)*8,
        popup=label,
        color='green',
        fill=True,
        fill_color='#00ff00',
        fill_opacity=0.7,
        parse_html=False).add_to(tokyo_map)  
        
# show cluster centers, sized by the number of stores in each cluster
for i, center in enumerate(k_means_cluster_centers):
    r = np.sum(kmeans_labels == i) 
    folium.CircleMarker(
        center,
        radius=int(r),  # plain int instead of a string or NumPy integer
        # popup=label,
        color='red',
        fill=True,
        fill_color='#ff0000',
        fill_opacity=0.7,
        parse_html=False).add_to(tokyo_map)  

# show calculated midpoints
for place in pickup['midpoint']:
  folium.Marker(location=place).add_to(tokyo_map)

tokyo_map

### Results, Discussion, Conclusion

See [here](https://go660088.wordpress.com/pickup-place-solution/).