# Get driving-distance(time) between two points

## INFO
- \[Encoding\] : `utf-8`
- \[Author\] : `yen-nan ho`
- \[Contact\] : `aaron1aaron2@gmail.com`
- \[GitHub\] : `https://github.com/aaron1aaron2`
- \[Create Date\] :  `20210912`

## Content
1. [資料讀取與準備](#資料讀取與準備)
2. [建立點之間的連結](#建立點之間的連結)
3. [爬取資料](#爬取資料)
4. [清理爬回來的資料](#清理爬回來的資料)
5. [選順暢時最快的路徑作為依據](#選順暢時最快的路徑作為依據)
6. [合併回雙向資料](#合併回雙向資料)

## 使用到的套件和 function 

In [66]:
import os 
import numpy as np
import pandas as pd
import sub_project.path_planning_tool as pathplanning
import sub_project.googlemap_2pointdistance_crawler as gcrawler

output_folder = 'data/process/3_driving_distance(time)'
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
    
def build_edge_list(df, output_folder, group_name, id_col, coordinate_col, name_col):
    '''依據 group_name 去針對各個 group 內建立 edge list，以此做回後續爬蟲所需爬取的兩點間關系'''
    pd.options.mode.chained_assignment = None
    if id_col not in df.columns:
        df.reset_index(drop=True, inplace=True)
        df[id_col] = df.index
    if group_name == None:
        tmp = df[[id_col, coordinate_col, name_col]]
        tmp['group'] = 0
        tmp.to_csv(os.path.join(output_folder, 'spot_list.csv'), index=False)
    else:
        df[[id_col, group_name, coordinate_col, name_col]].to_csv(os.path.join(output_folder, 'spot_list.csv'), index=False)
    # 注意: coordinate 的格式要 "緯度,經度" | 25.05051294275547,121.5106463282203
    distance_Table_helf = pathplanning.get_linear_distance(df.copy(), group=group_name, coor_col=coordinate_col, id_col=id_col) 
    distance_Table_helf.to_csv(os.path.join(output_folder, 'distance_table_half.csv'), index=None)

    distance_Table_all = pathplanning.to_whole_Htable(df.copy(), distance_Table_helf, coor_col=coordinate_col, id_col=id_col)
    distance_Table_all.to_csv(os.path.join(output_folder, 'distance_table_all.csv'), index=None)

## 資料讀取與準備 
[回首頁](#content)

In [13]:
spot_coordinate = pd.read_csv("data/process/1_popular_time/main.csv", usecols=['keyword', 'coordinate'])
keyword_spotname_match = pd.read_excel("data/raw/scenic_spot/spot_list(manual).xlsx", sheet_name='spot_googlemap_keyword',
                            engine='openpyxl', usecols=['遊憩據點', 'keyword', '縣市']
                            )
keyword_spotname_match.rename(columns={'遊憩據點': 'name', '縣市': 'county'}, inplace=True)

In [14]:
# 合併
keyword_spotname_match = keyword_spotname_match.merge(spot_coordinate, how='left')
keyword_spotname_match.drop('keyword', axis=1, inplace=True)

In [15]:
keyword_spotname_match.head()

Unnamed: 0,county,name,coordinate
0,花蓮縣,南安遊客中心,"23.3026374,121.2586523"
1,花蓮縣,太魯閣國家公園遊客中心,"24.1580764,121.6222377"
2,花蓮縣,布洛灣遊憩區,"24.17068,121.572612"
3,花蓮縣,臺八線沿線景觀區,"24.1938752,121.4907536"
4,花蓮縣,秀姑巒溪遊客中心,"23.4878928,121.4010781"


## 建立點之間的連結
[回首頁](#content)

需要爬取的路徑距離，會輸出以下資料:
- `distance_table_all.csv`: 每個 group_name 中的所有點雙向連結
- `distance_table_half.csv`: 每個 group_name 中的所有點單向連結
- `spot_list.csv` : 組合的景點列表與對應的編號(no)

In [16]:
build_edge_list(
            df = keyword_spotname_match,
            output_folder = output_folder,
            group_name = None, 
            id_col = 'no', 
            coordinate_col = 'coordinate',
            name_col = 'name'
            )

## 爬取資料
[回首頁](#content)

`distance_table_half.csv`可以省爬曲所有連線的時間，但只包含單向的資訊，如需要雙向資訊請使用 `distance_table_all.csv`

In [2]:
googlecrawler = gcrawler.crawler(
            input_data = os.path.join(output_folder, 'distance_table_half.csv'), 
            tor_path = 'tool/tor-win32-0.4.3.6/Tor/tor.exe', 
            tor_confs_path = os.path.join(output_folder, 'tor_config'), 
            core=1
            )

googlecrawler.run()



Current google-chrome version is 93.0.4577
Get LATEST driver version for 93.0.4577


target->528
check output file
leftover->528


Driver [C:\Users\aaron\.wdm\drivers\chromedriver\win32\93.0.4577.63\chromedriver.exe] found in cache
528it [17:36,  2.00s/it]


## 清理爬回來的資料
[回首頁](#content)

In [4]:
# 參數
group_name = None
crawler_data_type = 'half'

In [57]:
# 讀取與整理爬取資料
assert crawler_data_type in ['all', 'half']

crawler_result = pd.read_csv(os.path.join(output_folder, f'distance_table_{crawler_data_type}_result.csv'), dtype=str, header=None)
crawler_result.columns = ['coordinate_pair', 'route', 'original_str', 'item_type', 'initial_value', 'url']

In [10]:
crawler_result.head()

Unnamed: 0,coordinate_pair,route,original_str,item_type,initial_value,url
0,"23.3026374,121.2586523/24.1580764,121.6222377",途經花東縱谷公路/台9線,2 小時 24 分 預估行車時間：,trip duration,"['2 小時', '24 分 ']","https://www.google.com.tw/maps/dir/23.3026374,..."
1,"23.3026374,121.2586523/24.1580764,121.6222377",途經花東縱谷公路/台9線,預估抵達時間： 建議出發時間： 118 公里,distance,['118 公里'],"https://www.google.com.tw/maps/dir/23.3026374,..."
2,"23.3026374,121.2586523/24.1580764,121.6222377",途經花東縱谷公路/台9線,交通順暢時 2 小時 4 分,trip duration(Smooth),"['2 小時', '4 分 ']","https://www.google.com.tw/maps/dir/23.3026374,..."
3,"23.3026374,121.2586523/24.17068,121.572612",途經花東縱谷公路/台9線,2 小時 39 分 預估行車時間：,trip duration,"['2 小時', '39 分 ']","https://www.google.com.tw/maps/dir/23.3026374,..."
4,"23.3026374,121.2586523/24.17068,121.572612",途經花東縱谷公路/台9線,預估抵達時間： 建議出發時間： 126 公里,distance,['126 公里'],"https://www.google.com.tw/maps/dir/23.3026374,..."


In [58]:

crawler_result = pd.pivot_table(crawler_result, values='initial_value', index=['coordinate_pair', 'route'],
                    columns='item_type', aggfunc=np.min, fill_value=0)

In [26]:
crawler_result

Unnamed: 0_level_0,item_type,distance,trip duration,trip duration(Smooth),distance_google_value(km)
coordinate_pair,route,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
"22.6874498,120.9912777/22.643066,120.9526277",途經南迴公路/台9線,['30.6 公里'],"['1 小時', '2 分 ']",['59 分 '],30.6
"22.6874498,120.9912777/22.643066,120.9526277",途經南迴公路/台9線和佳崙產業道路,['34.4 公里'],"['1 小時', '6 分 ']",['1 小時'],34.4
"22.6874498,120.9912777/22.643066,120.9526277",途經樂山產業道路,['19.6 公里'],"['1 小時', '9 分 ']","['1 小時', '9 分 ']",19.6
"22.6874498,120.9912777/22.760712,121.091361",途經南迴公路/台9線,['19.0 公里'],['26 分 '],['22 分 '],19.0
"22.6874498,120.9912777/22.760712,121.091361",途經大學路/大學路三段/東57鄉道,['23.1 公里'],['34 分 '],['28 分 '],23.1
...,...,...,...,...,...
"24.1938752,121.4907536/23.9301319,121.5067062",途經中部橫貫公路/台8線,['56.0 公里'],"['1 小時', '20 分 ']","['1 小時', '9 分 ']",56.0
"24.1938752,121.4907536/23.9736882,121.5647145",途經中部橫貫公路/台8線和蘇花公路/台9線,['44.6 公里'],"['1 小時', '5 分 ']",['55 分 '],44.6
"24.1938752,121.4907536/23.9895973,121.6283569",途經中部橫貫公路/台8線和193縣道,['42.5 公里'],['58 分 '],['52 分 '],42.5
"24.1938752,121.4907536/23.9918905,121.6018901",途經中部橫貫公路/台8線和蘇花公路/台9線,['41.1 公里'],['59 分 '],['49 分 '],41.1


In [59]:
crawler_result['distance_google_value(km)'] = crawler_result['distance'].str.extract("(\d+.*\d+)").astype(float)
crawler_result['distance_google_value(km)'] = crawler_result.apply(
    lambda x: x['distance_google_value(km)'] if x['distance'].find('公尺')==-1 else x['distance_google_value(km)']/1000, axis=1)

In [60]:
hour = crawler_result['trip duration'].str.extract("(\d+) 小時").fillna(0).astype(int)[0].to_list()
minute = crawler_result['trip duration'].str.extract("(\d+) 分").fillna(0).astype(int)[0].to_list()

crawler_result['time_value(min)'] = list(map(lambda x: x[0]*60 + x[1] , zip(hour, minute)))

In [61]:
hour = crawler_result['trip duration(Smooth)'].str.extract("(\d+) 小時").fillna(0).astype(int)[0].to_list()
minute = crawler_result['trip duration(Smooth)'].str.extract("(\d+) 分").fillna(0).astype(int)[0].to_list()

crawler_result['smooth_time_value(min)'] = list(map(lambda x: x[0]*60 + x[1] , zip(hour, minute)))

In [65]:
crawler_result.to_csv(os.path.join(output_folder, 'crawler_result_multi-route.csv'))

In [63]:
crawler_result.head()

Unnamed: 0_level_0,item_type,distance,trip duration,trip duration(Smooth),distance_google_value(km),time_value(min),smooth_time_value(min)
coordinate_pair,route,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
"22.6874498,120.9912777/22.643066,120.9526277",途經南迴公路/台9線,['30.6 公里'],"['1 小時', '2 分 ']",['59 分 '],30.6,62,59
"22.6874498,120.9912777/22.643066,120.9526277",途經南迴公路/台9線和佳崙產業道路,['34.4 公里'],"['1 小時', '6 分 ']",['1 小時'],34.4,66,60
"22.6874498,120.9912777/22.643066,120.9526277",途經樂山產業道路,['19.6 公里'],"['1 小時', '9 分 ']","['1 小時', '9 分 ']",19.6,69,69
"22.6874498,120.9912777/22.760712,121.091361",途經南迴公路/台9線,['19.0 公里'],['26 分 '],['22 分 '],19.0,26,22
"22.6874498,120.9912777/22.760712,121.091361",途經大學路/大學路三段/東57鄉道,['23.1 公里'],['34 分 '],['28 分 '],23.1,34,28


## 選順暢時最快的路徑作為依據
[回首頁](#content)

In [74]:
crawler_result.reset_index(inplace=True)

In [80]:
crawler_result_select = crawler_result.loc[crawler_result.groupby(['coordinate_pair'])['smooth_time_value(min)'].idxmax().values]
crawler_result_select.to_csv(os.path.join(output_folder, 'crawler_result_route.csv'), index=False)

In [81]:
crawler_result_select.head()

item_type,coordinate_pair,route,distance,trip duration,trip duration(Smooth),distance_google_value(km),time_value(min),smooth_time_value(min)
2,"22.6874498,120.9912777/22.643066,120.9526277",途經樂山產業道路,['19.6 公里'],"['1 小時', '9 分 ']","['1 小時', '9 分 ']",19.6,69,69
5,"22.6874498,120.9912777/22.760712,121.091361",途經花東海岸公路/台11線,['20.4 公里'],['34 分 '],['29 分 '],20.4,34,29
8,"22.6874498,120.9912777/22.791471,121.119824",途經花東海岸公路/台11線和馬亨亨大道,['27.0 公里'],['41 分 '],['34 分 '],27.0,41,34
9,"22.6874498,120.9912777/22.8530542,121.1039256",途經南迴公路/台9線,['29.4 公里'],['40 分 '],['35 分 '],29.4,40,35
10,"22.6874498,120.9912777/22.8658194,121.1081636",途經南迴公路/台9線,['30.5 公里'],['41 分 '],['36 分 '],30.5,41,36


## 合併回雙向資料
[回首頁](#content)

In [82]:
crawler_result_select[['start_coordinate', 'end_coordinate']] = crawler_result_select['coordinate_pair'].str.split("/",expand=True)


In [83]:
distance_Table_all = pd.read_csv(os.path.join(output_folder, 'distance_table_all.csv'), dtype=str)

In [109]:
crawler_result_expend = pathplanning.optimize_distance(
            crawler_result_select,
            distance_Table_all[['start_id', 'end_id', 'start_coordinate', 'end_coordinate', 'linear_distance']],
            start_coor='start_coordinate',
            end_coor='end_coordinate', 
            )

In [110]:
crawler_result_expend.drop('coordinate_pair', axis=1, inplace=True)
crawler_result_expend.fillna(0, inplace=True)

In [111]:
df = pd.read_csv(os.path.join(output_folder, 'spot_list.csv'), usecols=['no', 'name'], dtype=str)
df.columns = ['start_id','start_name']
crawler_result_expend = crawler_result_expend.merge(df, how='left', on=['start_id'])
df.columns = ['end_id','end_name']
crawler_result_expend = crawler_result_expend.merge(df, how='left', on=['end_id'])

In [117]:
order = [
    'start_id', 'end_id', 'start_name', 'end_name', 'start_coordinate',	'end_coordinate', 
    'route', 'linear_distance', 'distance', 'distance_google_value(km)', 
    'trip duration(Smooth)', 'smooth_time_value(min)', 'trip duration', 'time_value(min)', 
]
crawler_result_expend[order].to_csv(os.path.join(output_folder, 'crawler_result_2PointDistance.csv'), index=False)
crawler_result_expend[order].to_csv('data/final/3_2PointDistance.csv', index=False)

In [115]:
crawler_result_expend[order].head()

Unnamed: 0,start_id,end_id,start_name,end_name,start_coordinate,end_coordinate,route,linear_distance,distance,distance_google_value(km),smooth_time_value(min),time_value(min),trip duration,trip duration(Smooth)
0,0,1,南安遊客中心,太魯閣國家公園遊客中心,"23.3026374,121.2586523","24.1580764,121.6222377",途經花東縱谷公路/台9線,101.7381491916042,['118 公里'],118.0,124.0,144.0,"['2 小時', '24 分 ']","['2 小時', '4 分 ']"
1,0,2,南安遊客中心,布洛灣遊憩區,"23.3026374,121.2586523","24.17068,121.572612",途經花東縱谷公路/台9線,101.32851169626572,['126 公里'],126.0,136.0,159.0,"['2 小時', '39 分 ']","['2 小時', '16 分 ']"
2,0,3,南安遊客中心,臺八線沿線景觀區,"23.3026374,121.2586523","24.1938752,121.4907536",途經花東縱谷公路/台9線,101.50517148419816,['136 公里'],136.0,147.0,173.0,"['2 小時', '53 分 ']","['2 小時', '27 分 ']"
3,0,4,南安遊客中心,秀姑巒溪遊客中心,"23.3026374,121.2586523","23.4878928,121.4010781",途經花東縱谷公路/台9線,25.15769117466058,['34.4 公里'],34.4,37.0,43.0,['43 分 '],['37 分 ']
4,0,5,南安遊客中心,石梯坪,"23.3026374,121.2586523","23.4886832,121.5137917",途經花東縱谷公路/台9線和瑞港產業道路/花64鄉道,33.23771673852799,['62.2 公里'],62.2,81.0,87.0,"['1 小時', '27 分 ']","['1 小時', '21 分 ']"
