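## Official Competition Task: Predicting Vessel Arrival Time
Ocean freight is one of the most important pillars of an enterprise's global business. Shipping companies work with data providers to track vessels by GPS and monitor their positions. In transport management, the time at which cargo reaches the destination port is a key piece of data, so a model must be built from historical shipping data to predict it. The predicted time is called the ETA (estimated time of arrival), and the predicted arrival time at the destination port is ARRIVAL_ETA.
The competition provides historical waybill GPS data, historical waybill event data, and port coordinate data. **The goal is to predict the arrival time of each waybill, which corresponds to the EVENT_CONVOLUTION_DATE value in the "historical waybill event" data where EVENT_CODE is ARRIVAL AT PORT.**

I. Competition data
The competition provides anonymized training and test data. The training set comprises historical waybill GPS data, historical waybill event data, and port coordinate data, which teams use to train models and design estimation strategies. The test data consists of different waybills at different positions along their voyages, for which participants predict the ETA.
While a waybill is being shipped, a large volume of GPS records is produced and stored as the "historical waybill GPS data". Key events such as leaving the origin port, arriving at a transit port, and arriving at the destination port are stored as the "historical waybill event data". The "port coordinate data" holds the coordinates of the ports involved in the voyages.
Participants may add reasonable, task-related external data for correction. For example, if the provided port coordinates are inaccurate, participants may supply their own data to correct them.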

In [1]:
import pandas as pd
import numpy as np
from geopy.distance import geodesic


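## Port coordinate data: the locations of the ports involved in each waybill's voyage
- Port name
- Longitude of the port
- Latitude of the port
- Country
- Province | state
- City
- County | district
- Detailed address
- Port code, i.e. the port's letter abbreviation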

In [2]:
# Port coordinate data
# Describes the locations of the ports involved in each waybill's voyage.
port=pd.read_csv('./event_port/port.csv')

In [3]:
port.describe()

Unnamed: 0,LONGITUDE,LATITUDE,PORT_CODE,TRANSPORT_NODE_ID
count,2456.0,2456.0,25.0,2420.0
mean,16.518344,18.550666,34586400.0,54778940.0
std,71.399595,22.590817,73842610.0,79740480.0
min,-175.183605,-53.793502,2462000.0,117000.0
25%,-3.457919,0.611022,2576000.0,2927750.0
50%,7.676975,18.432586,3519000.0,3553500.0
75%,57.510925,37.748396,3684000.0,95201500.0
max,178.557549,73.212735,211827000.0,284217000.0


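### Historical waybill event data
Describes the key port-related events during each waybill's voyage, such as leaving the origin port and arriving at the destination port.
- loadingOrder: waybill number, matching the loadingOrder field in the historical waybill GPS data
- EVENT_CODE: event code. The main events are:
 - TRANSIT PORT ATD: actual departure from a transit port
 - SHIPMENT ONBOARD DATE: actual departure from the origin port
 - TRANSIT PORT ATA: actual arrival at a transit port
 - ARRIVAL AT PORT: actual arrival at the destination port
 - Note: some voyages have no transit port
- EVENT_LOCATION_ID: port name, corresponding to the TRANS_NODE_NAME field in the port coordinate table
- EVENT_CONVOLUTION_DATE: time of the event, in the format yyyy/MM/dd HH:mm:ss (with two spaces between dd and HH). When EVENT_CODE is "SHIPMENT ONBOARD DATE", this field is the time the vessel left the origin port; when EVENT_CODE is "ARRIVAL AT PORT", it is the time the vessel reached the destination port.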
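A minimal sketch of parsing the event timestamps and extracting the prediction target. The sample rows and the column name `event_time` are hypothetical. Because the spacing between the date and time parts may be one or two spaces, the sketch collapses whitespace before applying an explicit format.

```python
import pandas as pd

# Hypothetical sample rows that mimic the event table layout
events = pd.DataFrame({
    "loadingOrder": ["A1", "A1"],
    "EVENT_CODE": ["SHIPMENT ONBOARD DATE", "ARRIVAL AT PORT"],
    "EVENT_CONVOLUTION_DATE": ["2019/09/05  16:33:17", "2019/09/12 08:00:00"],
})

# Collapse repeated whitespace so one explicit format fits both spacings
normalized = events["EVENT_CONVOLUTION_DATE"].str.replace(r"\s+", " ", regex=True)
events["event_time"] = pd.to_datetime(normalized, format="%Y/%m/%d %H:%M:%S")

# The prediction target is the ARRIVAL AT PORT time of each waybill
is_arrival = events["EVENT_CODE"] == "ARRIVAL AT PORT"
arrival = events.loc[is_arrival].set_index("loadingOrder")["event_time"]
```

An explicit `format` makes malformed rows fail loudly instead of being guessed at.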

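### Historical waybill GPS data
#### Describes the GPS position information produced by the vessel carrying each waybill during its voyage.
- loadingOrder: anonymized master waybill, the cargo's waybill number (similar to a courier tracking number)
- carrierName: anonymized carrier name (similar to a courier company name)
- timestamp: time, in the format yyyy-MM-dd'T'HH:mm:ss.SSSZ, e.g. 2019-09-05T16:33:17.000Z
- longitude: longitude of the vessel's current position during transport, e.g. 114.234567
- latitude: latitude of the vessel's current position during transport, e.g. 21.234567
- vesselMMSI: anonymized Maritime Mobile Service Identity (MMSI) of the vessel, a unique identifier for each ship
- speed: instantaneous speed of the vessel in km/h. Where it is missing, participants may compute it themselves.
- direction: heading of the vessel, with due north as 0 degrees. 31480 means 314.80 degrees (northwest), and 900 means 9 degrees east of north.
- vesselNextport: the next port the vessel will reach. Names may be inconsistent, e.g. CNQIN, CN QIN, and CN QINGDAO all mean the port of Qingdao, China.
- vesselNextportETA: the shipping company's estimated time of arrival at the next port, in the format yyyy-MM-dd'T'HH:mm:ss.SSSZ, e.g. 2019-09-12T16:33:17.000Z
- vesselStatus: current navigation status of the vessel, mainly:
  - moored
  - under way using engine
  - not under command
  - at anchor
  - under way sailing
  - constrained by her draught
- vesselDatasource: source of the vessel data (shore-based or satellite): Coastal AIS, Satellite
- TRANSPORT_TRACE: the vessel's route, joined by "-", e.g. CNSHK-MYPKG-MYTPP. It is entered in advance by the carrier. In rare cases the vessel does not follow it (for example, because of port congestion), but it still reaches the destination port.

### Note: each waybill corresponds to one vessel, but one vessel can carry multiple waybills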
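A small sketch of decoding the `direction` field, based on the examples in the field description (31480 is 314.80 degrees, 900 is 9 degrees). The helper names `decode_direction` and `compass_sector` are hypothetical and not part of the competition material.

```python
def decode_direction(raw):
    """Convert the raw direction value (hundredths of a degree) to degrees."""
    return (raw / 100.0) % 360.0


def compass_sector(degrees):
    """Map a heading in degrees to one of eight 45-degree compass sectors."""
    sectors = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    # Shift by half a sector so that, e.g., 350 degrees still counts as north
    return sectors[int((degrees + 22.5) // 45) % 8]


northwest = compass_sector(decode_direction(31480))
near_north = compass_sector(decode_direction(900))
```

Working in degrees makes it easier to compare the heading with the bearing towards the destination port.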

In [4]:
gpsdfx=pd.read_csv('train0711.csv',usecols=[0,1,5])
gpsdfx.columns =['loadingOrder','carrierName', 'vesselMMSI']
print(gpsdfx.shape)
gpsdfx.drop_duplicates(inplace=True)
gpsdfx=gpsdfx.reset_index(drop=True)
print(gpsdfx.shape)

(151948653, 3)
(25132, 3)
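With roughly 150 million rows, reading even three columns at once can strain memory. The sketch below deduplicates chunk by chunk instead. It assumes the CSV has no header row (the notebook assigns column names itself); the function name `unique_rows_chunked` and the in-memory demo file are hypothetical.

```python
import io
import pandas as pd


def unique_rows_chunked(csv_source, usecols, names, chunksize=1_000_000):
    """Collect the distinct rows of selected columns without loading the whole file."""
    parts = []
    for chunk in pd.read_csv(csv_source, usecols=usecols, header=None,
                             chunksize=chunksize):
        chunk.columns = names
        parts.append(chunk.drop_duplicates())  # shrink each chunk early
    # Rows can repeat across chunks, so deduplicate once more at the end
    return pd.concat(parts).drop_duplicates().reset_index(drop=True)


demo = io.StringIO("A1,C1,x,x,x,M1\nA1,C1,y,y,y,M1\nA2,C2,z,z,z,M2\n")
pairs = unique_rows_chunked(demo, [0, 1, 5],
                            ["loadingOrder", "carrierName", "vesselMMSI"],
                            chunksize=2)
```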


In [9]:
gpsdfx['carrierName'].isna().sum(),gpsdfx['vesselMMSI'].isna().sum()
gpsdfx.to_csv('carrierName_vesselMMSI.csv',index=False)

In [5]:
# Clean the data: drop rows with missing values, duplicate rows, and rows whose trace is not a route
# Read all rows
gpsdfx=pd.read_csv('train0711.csv',usecols=[0,2,3,4,6,7,12])
gpsdfx.columns =['loadingOrder','timestamp', 'longitude','latitude','speed', 'direction','TRANSPORT_TRACE']
gpsdfx.shape
# Drop rows that contain any null value
gpsdfx.dropna(inplace=True)
gpsdfx=gpsdfx.reset_index(drop=True)
'''
# (Disabled) drop waybills whose trace contains transit ports
traces=gpsdfx['TRANSPORT_TRACE'].values
skips=[]
for i in range(len(traces)):
    if i%20000000==0:
        print(i)
    if len(traces[i].split('-'))>2:
            skips.append(i)
raw=[i for i in range(len(gpsdfx))]
reserve=list(set(raw)-set(skips))
clean_df=gpsdfx.loc[reserve]
clean_df=clean_df.reset_index(drop=True)
'''
# Drop duplicate rows
clean_df=gpsdfx.drop_duplicates().reset_index(drop=True)
clean_df.to_csv('GPS_clean712v1.csv',index=False)  # write the cleaned data
# The trace column needs further cleaning: keep only traces that contain '-'
has_route=clean_df['TRANSPORT_TRACE'].str.contains('-',regex=False)
clean_df=clean_df[has_route].reset_index(drop=True)
clean_df.to_csv('clean_dataset/GPS_clean712v2.csv',index=False)  # write the cleaned data, about 37 million rows

In [8]:
# Join with the port data
clean_df=pd.read_csv('clean_dataset/GPS_clean712v2.csv')
clean_df.sort_values(['loadingOrder', 'timestamp'], inplace = True)
clean_df=clean_df.reset_index(drop=True)
# Collect waybills whose origin or destination port is missing from the port table
cut=[]
portnames=set(port['TRANS_NODE_NAME'].values)
grouped=clean_df.groupby('loadingOrder')
for name,group in grouped:
    s_e=group['TRANSPORT_TRACE'].iloc[0].split('-')
    if s_e[0] in portnames and s_e[-1] in portnames:
        continue
    cut.append(name)
# .copy() avoids pandas' SettingWithCopyWarning when columns are added below
clean_df=clean_df[~clean_df['loadingOrder'].isin(cut)].copy()

def convert_name_xy(name):  # input: port name
    port_name=port[port['TRANS_NODE_NAME'].isin([name])].reset_index()
    return port_name['LONGITUDE'][0],port_name['LATITUDE'][0]  # returns the port's (longitude, latitude)

start_x=[]  # origin longitude
start_y=[]  # origin latitude
end_x=[]  # destination longitude
end_y=[]  # destination latitude
# Cache looked-up ports to avoid repeated computation
temp_dic={}
values=clean_df['TRANSPORT_TRACE'].values
for value in values:
    s_e=value.split('-')
    start_port=s_e[0]
    end_port=s_e[-1]
    if start_port in temp_dic:
        re=temp_dic[start_port]
    else:
        re=convert_name_xy(start_port)
        temp_dic[start_port]=re
    start_x.append(re[0])
    start_y.append(re[1])
    if end_port in temp_dic:
        re=temp_dic[end_port]
    else:
        re=convert_name_xy(end_port)
        temp_dic[end_port]=re
    end_x.append(re[0])
    end_y.append(re[1])
clean_df['start_x']=start_x
clean_df['start_y']=start_y
clean_df['end_x']=end_x
clean_df['end_y']=end_y
#clean_df.to_csv('clean_dataset/GPS_clean625v1.csv',index=False)

In [19]:
clean_df.to_csv('clean_dataset/dataHasXY.csv',index=False)
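The per-row loop with a cache can also be expressed with `Series.map`, which may be faster on tens of millions of rows. This is a sketch on hypothetical miniature frames (`port_demo`, `gps_demo`), reusing two port coordinates shown elsewhere in this notebook. Ports missing from the table yield `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical miniature port table and GPS frame
port_demo = pd.DataFrame({
    "TRANS_NODE_NAME": ["CNYTN", "MXZLO"],
    "LONGITUDE": [114.275347, -104.305571],
    "LATITUDE": [22.5777, 19.085961],
})
gps_demo = pd.DataFrame({"TRANSPORT_TRACE": ["CNYTN-MXZLO", "CNYTN-XXXXX-MXZLO"]})

# One row per port name, indexed by name, so map() can look values up
coords = port_demo.drop_duplicates("TRANS_NODE_NAME").set_index("TRANS_NODE_NAME")
parts = gps_demo["TRANSPORT_TRACE"].str.split("-")
start, end = parts.str[0], parts.str[-1]  # first and last port of each trace

gps_demo["start_x"] = start.map(coords["LONGITUDE"])
gps_demo["start_y"] = start.map(coords["LATITUDE"])
gps_demo["end_x"] = end.map(coords["LONGITUDE"])
gps_demo["end_y"] = end.map(coords["LATITUDE"])
```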

In [63]:
geodesic((3.013783	,101.355750	),(3.034709,101.361204)).km

2.3920541380094607
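Calling `geodesic` once per row is slow on millions of GPS points. A vectorized haversine distance is a common substitute, sketched below. It is a different technique: it treats the Earth as a sphere, so results differ slightly from the ellipsoidal `geodesic` value above. The function name `haversine_km` and the radius constant are choices made for this sketch.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Same pair of points as the geodesic call above, given as (lat, lon)
d = haversine_km(3.013783, 101.355750, 3.034709, 101.361204)
```

Passing whole columns, for example `haversine_km(df['latitude'], df['longitude'], df['end_y'], df['end_x'])`, would give the remaining distance to the destination for every record in one call.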

In [15]:
clean_df[clean_df['loadingOrder'].isin(['AA191175561416'])]

Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,TRANSPORT_TRACE,start_x,start_y,end_x,end_y
0,AA191175561416,2019-01-28T16:12:59.000Z,114.260392,22.571047,0,12670,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
1,AA191175561416,2019-01-28T16:22:38.000Z,114.260438,22.571125,0,14790,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
2,AA191175561416,2019-01-28T16:30:55.000Z,114.260693,22.571567,0,21510,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
3,AA191175561416,2019-01-28T16:37:35.000Z,114.260392,22.571463,0,19900,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
4,AA191175561416,2019-01-28T16:45:56.000Z,114.260647,22.571510,0,21360,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
...,...,...,...,...,...,...,...,...,...,...,...
5193,AA191175561416,2019-02-23T15:51:55.000Z,-104.778320,19.093467,36,10690,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
5194,AA191175561416,2019-02-23T16:52:48.000Z,-104.444422,19.017923,27,10870,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
5195,AA191175561416,2019-02-23T18:05:25.000Z,-104.316002,19.063652,11,11930,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
5196,AA191175561416,2019-02-23T18:32:15.000Z,-104.299600,19.059592,0,24100,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961


In [12]:
port[port['TRANS_NODE_NAME'].isin(['ZADUR'])]

Unnamed: 0,TRANS_NODE_NAME,LONGITUDE,LATITUDE,COUNTRY,STATE,CITY,REGION,ADDRESS,PORT_CODE,TRANSPORT_NODE_ID
686,ZADUR,31.05008,-29.868304,South Africa,KwaZulu-Natal,Durban,Point,"79 Browns Rd, Point, Durban, 4001, South Africa",,2996000.0


# A total of 12,171 waybills have both origin and destination coordinates

In [14]:
len(clean_df['loadingOrder'].unique())

12171

In [78]:
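def cut_zero_end(indexs,data):
    # Return the index labels of the trailing run of zero-speed records
    for i in range(len(data)-1,0,-1):
        if data[i]!=0:
            break
    return indexs[i+1:len(data)]
def cut_zero_start(indexs,data):
    # Return the index labels of the leading run of zero-speed records
    for i in range(len(data)):
        if data[i]!=0:
            break
    return indexs[:i]
cut_list=[]  # row labels to drop
cut_name=[]  # waybills to drop
thres=200
grouped=clean_df.groupby('loadingOrder')
for name,group in grouped:
    if len(group)<thres:  # drop waybills with fewer than thres records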
        cut_list+=list(group.index)
        continue
    data=list(group['speed'][-thres:])
    indexs=list(group['speed'][-thres:].index)
    if data[-1]!=0 or data[1]!=0:
        cut_name.append(name)
    if data[-1]==0:
        cut_list+=cut_zero_end(indexs,data)
    data=list(group['speed'][:thres])
    indexs=list(group['speed'][:thres].index)
    if data[0]==0:
        cut_list+=cut_zero_start(indexs,data)
    
len(cut_list),len(cut_name)

(195378, 5318)
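The idea of trimming stationary records at both ends of a track can be sketched more compactly with NumPy. This is a simplified variant, not the notebook's exact rule: it looks at the whole track rather than only the first and last `thres` records, and it does not collect waybills for removal. The helper name `trim_stationary` is hypothetical.

```python
import numpy as np
import pandas as pd


def trim_stationary(group, speed_col="speed"):
    """Drop the leading and trailing zero-speed rows of one waybill's track."""
    moving = np.flatnonzero(group[speed_col].to_numpy() != 0)
    if moving.size == 0:
        return group.iloc[0:0]  # the vessel never moved: keep nothing
    # Keep everything between the first and last moving record, inclusive
    return group.iloc[moving[0]: moving[-1] + 1]


track = pd.DataFrame({"loadingOrder": ["A"] * 6, "speed": [0, 0, 3, 0, 5, 0]})
trimmed = trim_stationary(track)
```

Zero-speed records in the middle of the voyage are kept, since they may represent real waiting time at sea or at a transit port.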

In [82]:
gpsdf=clean_df.drop(index=cut_list)
gpsdf=gpsdf.reset_index(drop=True)
gpsdf

Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,TRANSPORT_TRACE,start_x,start_y,end_x,end_y
0,AA191175561416,2019-01-29T14:47:31.000Z,114.260940,22.570957,1,13340,CNYTN-MXZLO,114.275347,22.577700,-104.305571,19.085961
1,AA191175561416,2019-01-29T14:53:23.000Z,114.261352,22.570832,2,8420,CNYTN-MXZLO,114.275347,22.577700,-104.305571,19.085961
2,AA191175561416,2019-01-29T14:57:35.000Z,114.263375,22.570643,4,11890,CNYTN-MXZLO,114.275347,22.577700,-104.305571,19.085961
3,AA191175561416,2019-01-29T15:00:10.000Z,114.265763,22.568548,3,14650,CNYTN-MXZLO,114.275347,22.577700,-104.305571,19.085961
4,AA191175561416,2019-01-29T15:03:50.000Z,114.267432,22.566027,7,13280,CNYTN-MXZLO,114.275347,22.577700,-104.305571,19.085961
...,...,...,...,...,...,...,...,...,...,...,...
35655754,ZZ907366774129,2020-02-24T14:29:01.000Z,101.352083,3.012183,1,4700,CNSHK-MYPKG,113.863058,22.559462,101.361204,3.034709
35655755,ZZ907366774129,2020-02-24T14:30:47.000Z,101.352667,3.012750,1,4500,CNSHK-MYPKG,113.863058,22.559462,101.361204,3.034709
35655756,ZZ907366774129,2020-02-24T14:36:41.000Z,101.354883,3.013717,1,9900,CNSHK-MYPKG,113.863058,22.559462,101.361204,3.034709
35655757,ZZ907366774129,2020-02-24T14:38:30.000Z,101.355333,3.013700,1,8500,CNSHK-MYPKG,113.863058,22.559462,101.361204,3.034709


In [89]:
gpsdf

Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,TRANSPORT_TRACE,start_x,start_y,end_x,end_y
0,AB614679152621,2019-12-16T17:49:51.000Z,114.283133,22.578533,3,7940,CNYTN-ESVAL,114.275347,22.577700,-0.327021,39.460366
1,AB614679152621,2019-12-16T17:50:37.000Z,114.283450,22.578633,2,6620,CNYTN-ESVAL,114.275347,22.577700,-0.327021,39.460366
2,AB614679152621,2019-12-16T17:51:22.000Z,114.283700,22.578767,2,5900,CNYTN-ESVAL,114.275347,22.577700,-0.327021,39.460366
3,AB614679152621,2019-12-16T17:52:22.000Z,114.283917,22.578967,2,3830,CNYTN-ESVAL,114.275347,22.577700,-0.327021,39.460366
4,AB614679152621,2019-12-16T17:53:10.000Z,114.284017,22.579133,1,1680,CNYTN-ESVAL,114.275347,22.577700,-0.327021,39.460366
...,...,...,...,...,...,...,...,...,...,...,...
8165431,ZZ824778274922,2020-03-04T05:10:30.000Z,136.798000,35.024833,7,31590,CNSHK-JPNGO,113.863058,22.559462,136.847423,35.056571
8165432,ZZ824778274922,2020-03-04T05:12:40.000Z,136.795667,35.026333,7,29530,CNSHK-JPNGO,113.863058,22.559462,136.847423,35.056571
8165433,ZZ824778274922,2020-03-04T05:14:21.000Z,136.794000,35.026833,5,28670,CNSHK-JPNGO,113.863058,22.559462,136.847423,35.056571
8165434,ZZ824778274922,2020-03-04T05:16:31.000Z,136.792333,35.027167,2,27440,CNSHK-JPNGO,113.863058,22.559462,136.847423,35.056571


In [84]:
gpsdf=gpsdf[~gpsdf['loadingOrder'].isin(cut_name)].reset_index(drop=True)  # 1,263 samples remain

In [88]:
gpsdf.to_csv('clean_dataset/train1263.csv',index=False)

In [6]:
test_data_path = 'A_testData0531.csv'
test_data=pd.read_csv(test_data_path)
#test_data

In [7]:
test_trace=test_data['TRANSPORT_TRACE'].unique()
train_trace=clean_df['TRANSPORT_TRACE'].unique()
tmps=[]
for i in test_trace:
    if i in train_trace:
        tmps.append(i)
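The overlap between test and training traces can also be computed with sets, which additionally yields the traces that are missing from the training data. A minimal sketch with hypothetical trace lists:

```python
# Hypothetical trace lists standing in for test_trace and train_trace
test_traces = ["CNYTN-MXZLO", "CNSHK-MYTPP", "CNSHK-SGSIN"]
train_traces = ["CNSHK-SGSIN", "CNYTN-MXZLO", "CNSHK-CNNSA-SGSIN"]

train_set = set(train_traces)  # set membership is O(1) per lookup
shared = [t for t in test_traces if t in train_set]   # keeps the test order
missing = sorted(set(test_traces) - train_set)        # test traces never seen in training
```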

In [8]:
#
total=[]
reserve=[]
#grouped=clean_df.groupby('TRANSPORT_TRACE')
value=clean_df['TRANSPORT_TRACE'].values
for i in range(len(value)):
    if value[i] in tmps:
        reserve.append(i)
xx=clean_df.loc[reserve].reset_index(drop=True)

In [14]:
#
#xx.to_csv('clean_dataset/clean17Trace.csv',index=False)
4096599 /831

4929.72202166065

## The GPS data contains 1,038 traces (origin-destination pairs): 365 have no transit port, 598 include transit ports, and the rest are incomplete
### Of the test traces, 17 appear in the GPS data and 5 do not
- CNYTN-MXZLO
- CNSHK-SGSIN
- CNSHK-CLVAP
- CNYTN-ARENA
- CNYTN-MATNG
- CNSHK-PKQCT
- COBUN-HKHKG
- CNYTN-PAONX
- CNSHK-SIKOP
- CNYTN-CAVAN
- CNYTN-MTMLA
- CNSHK-ZADUR
- CNSHK-LBBEY
- CNYTN-RTM
- CNHKG-MXZLO
- CNYTN-NZAKL
- CNSHA-PAMIT

The 5 traces with no record:
['CNSHK-MYTPP', 'CNSHK-GRPIR', 'CNSHK-ESALG', 'CNSHA-SGSIN', 'HKHKG-FRFOS']

## 598 traces in the GPS data include transit ports
When traces with transit ports are counted, 'CNSHK-MYTPP' and 'CNSHK-GRPIR' also appear in the GPS data, as CNSHK-CNNSA-CIABJ-BJCOO-SGSIN-MYTPP and CNSHK-TRTEK-GRPIR.

Training traces with transit ports whose origin and destination match one of the 17 test traces:

- CNYTN-EGPSE-MATNG
- CNYTN-HKHKG-AUBNE-NZAKL
- CNYTN-CNSHA-KRPUS-CAVAN
- CNSHK-SGSIN-EGPSE-LBBEY
- CNSHK-MYTPP-MUPLU-ZADUR
- CNYTN-CNXAM-CNSGH-PAONX
- CNSHK-SGSIN-MYWSP-ZADUR
- CNSHK-SGSIN-EGPSD-LBBEY
- CNSHK-SGSIN-EGSUZ-EGPSE-ILHFA-SIKOP
- CNSHK-SGSIN-MTMLA-SIKOP
- CNSHK-SGSIN-AEJEA-QAHMD-SADMM-OMSOH-SGSIN
- CNYTN-TWKHH-CNSHA-CNNBG-MXZLO
- CNYTN-TWKHH-PAPAN-PAONX
- CNSHK-CNNSA-CIABJ-GHTEM-TGFLW-SGSIN
- CNSHK-CNNSA-BJCOO-MYTPP-SGSIN
- CNYTN-TWKHH-CAVAN
- CNSHK-CNNSA-TGFLW-BFOUA-NAWVB-SGSIN
- CNSHK-CNNSA-TGFLW-CMDLA-SGSIN
 

In [19]:
gpsdfx=pd.read_csv('train0523.csv',usecols=[12])
gpsdfx.columns =['TRANSPORT_TRACE']
test_trace=test_data['TRANSPORT_TRACE'].unique()
train_trace=gpsdfx['TRANSPORT_TRACE'].unique()
tmps=[]
for i in test_trace:
    if i in train_trace:
        tmps.append(i)
tmps

['CNYTN-MXZLO',
 'CNSHK-SGSIN',
 'CNSHK-CLVAP',
 'CNYTN-ARENA',
 'CNYTN-MATNG',
 'CNSHK-PKQCT',
 'COBUN-HKHKG',
 'CNYTN-PAONX',
 'CNSHK-SIKOP',
 'CNYTN-CAVAN',
 'CNYTN-MTMLA',
 'CNSHK-ZADUR',
 'CNSHK-LBBEY',
 'CNYTN-RTM',
 'CNHKG-MXZLO',
 'CNYTN-NZAKL',
 'CNSHA-PAMIT']

In [42]:
# Find training traces with transit ports whose origin-destination pair matches a test trace
cnt=0
i=0
for v in train_trace:
    
    if type(v)==str:
        if '-' in v:
            tmp=v.split('-')
            if len(tmp)>2:
                del_median=tmp[0]+'-'+tmp[-1]
                if del_median in tmps:#['CNSHK-MYTPP', 'CNSHK-GRPIR', 'CNSHK-ESALG', 'CNSHA-SGSIN', 'HKHKG-FRFOS']:
                    print(v)
cnt

CNYTN-EGPSE-MATNG
CNYTN-HKHKG-AUBNE-NZAKL
CNYTN-CNSHA-KRPUS-CAVAN
CNSHK-SGSIN-EGPSE-LBBEY
CNSHK-MYTPP-MUPLU-ZADUR
CNYTN-CNXAM-CNSGH-PAONX
CNSHK-SGSIN-MYWSP-ZADUR
CNSHK-SGSIN-EGPSD-LBBEY
CNSHK-SGSIN-EGSUZ-EGPSE-ILHFA-SIKOP
CNSHK-SGSIN-MTMLA-SIKOP
CNSHK-SGSIN-AEJEA-QAHMD-SADMM-OMSOH-SGSIN
CNYTN-TWKHH-CNSHA-CNNBG-MXZLO
CNYTN-TWKHH-PAPAN-PAONX
CNSHK-CNNSA-CIABJ-GHTEM-TGFLW-SGSIN
CNSHK-CNNSA-BJCOO-MYTPP-SGSIN
CNYTN-TWKHH-CAVAN
CNSHK-CNNSA-TGFLW-BFOUA-NAWVB-SGSIN
CNSHK-CNNSA-TGFLW-CMDLA-SGSIN


0
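The step of reducing a trace to its origin-destination pair can be isolated in a small helper, sketched below. The name `endpoints` is hypothetical. Returning `None` for non-string or single-port values mirrors the type and `'-'` checks in the cell above.

```python
def endpoints(trace):
    """Return 'ORIGIN-DESTINATION' for a trace, or None if it is not a route."""
    if not isinstance(trace, str) or "-" not in trace:
        return None
    ports = trace.split("-")
    return ports[0] + "-" + ports[-1]


pair = endpoints("CNYTN-HKHKG-AUBNE-NZAKL")
```

Applied to a whole column, for example `df['TRANSPORT_TRACE'].map(endpoints)`, this gives a key for grouping direct and transit routes that share the same origin and destination.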

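## Cleaning the event data
- The event data contains 14,538 ARRIVAL AT PORT records out of 158,341 records in total.
- 8,926 of the 15,512 waybills have an ARRIVAL AT PORT record, so some waybills have several arrival times. On inspection, this appears to be caused by duplicated values and data errors.
- Keep only waybills that have both a departure record and an arrival record (exactly one of each, with the arrival later than the departure).
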

In [138]:
# The event data contains 14,538 ARRIVAL AT PORT records out of 158,341 records
# 8,926 of the 15,512 waybills have an ARRIVAL AT PORT record, so some waybills have several arrival times
# AY399952630533 has as many as 17 ARRIVAL AT PORT records, which on inspection appear to be duplicates and errors
event=pd.read_csv('./event_port/loadingOrderEvent.csv')
df_event=event[event['EVENT_CODE'].isin(['SHIPMENT ONBOARD DATE','ARRIVAL AT PORT'])].reset_index()
df_event.sort_values(['loadingOrder', 'EVENT_CONVOLUTION_DATE'],inplace=True)
df_event.drop(['index'],axis=1,inplace=True)
df_event=df_event.reset_index(drop=True)
df_event['EVENT_CONVOLUTION_DATE'] = pd.to_datetime(df_event['EVENT_CONVOLUTION_DATE'], infer_datetime_format=True)
# Keep waybills with exactly one departure and one arrival record, where the arrival is later than the departure
grouped=df_event.groupby('loadingOrder')
reserve_order=[]
for name,group in grouped:
    if len(group['EVENT_CODE'])==2:
        tmp=group['EVENT_CODE'].reset_index(drop=True)
        tmp_time=group['EVENT_CONVOLUTION_DATE'].reset_index(drop=True)
        if tmp[:1][0]=='SHIPMENT ONBOARD DATE' and tmp[-1:][1]=='ARRIVAL AT PORT' and tmp_time[:1][0]<tmp_time[-1:][1]:
            reserve_order.append(name)
print("Number of qualifying waybills:",len(reserve_order))
df_event=df_event[df_event['loadingOrder'].isin(reserve_order)].reset_index(drop=True)
df_event.to_csv('event_clean.csv',index=False)

Number of qualifying waybills: 978


Unnamed: 0,loadingOrder,EVENT_CODE,EVENT_LOCATION_ID,EVENT_CONVOLUTION_DATE
0,AB283635056094,SHIPMENT ONBOARD DATE,CNSHK,2020-04-16 22:08:00
1,AB283635056094,ARRIVAL AT PORT,MYPKG,2020-04-20 21:41:00
2,AC188113754775,SHIPMENT ONBOARD DATE,CNYTN,2019-04-02 12:53:00
3,AC188113754775,ARRIVAL AT PORT,MXZLO,2019-04-08 11:11:00
4,AE172923690170,SHIPMENT ONBOARD DATE,CNYTN,2020-03-18 02:23:00
...,...,...,...,...
1951,[W849031957501,ARRIVAL AT PORT,MACAS,2020-04-12 22:00:00
1952,[Y861332548397,SHIPMENT ONBOARD DATE,SIKOP,2020-02-26 18:02:00
1953,[Y861332548397,ARRIVAL AT PORT,CNYTN,2020-04-06 23:50:00
1954,[Z759280240591,SHIPMENT ONBOARD DATE,CNSHK,2020-03-14 17:19:00
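The filtering rule (exactly one departure, exactly one arrival, arrival later than departure) can also be written with `groupby(...).filter`. The sketch below uses hypothetical sample rows and a hypothetical predicate name `is_valid`. Unlike the loop above, it does not depend on the rows being sorted by time.

```python
import pandas as pd

ev = pd.DataFrame({
    "loadingOrder": ["A", "A", "B", "B", "B", "C", "C"],
    "EVENT_CODE": ["SHIPMENT ONBOARD DATE", "ARRIVAL AT PORT",
                   "SHIPMENT ONBOARD DATE", "ARRIVAL AT PORT", "ARRIVAL AT PORT",
                   "SHIPMENT ONBOARD DATE", "ARRIVAL AT PORT"],
    "EVENT_CONVOLUTION_DATE": pd.to_datetime([
        "2020-01-01", "2020-01-05",                 # A: valid
        "2020-02-01", "2020-02-03", "2020-02-04",   # B: two arrivals
        "2020-03-09", "2020-03-02"]),               # C: arrival before departure
})


def is_valid(g):
    """True when the group has one departure, one arrival, and arrival is later."""
    dep = g.loc[g["EVENT_CODE"] == "SHIPMENT ONBOARD DATE", "EVENT_CONVOLUTION_DATE"]
    arr = g.loc[g["EVENT_CODE"] == "ARRIVAL AT PORT", "EVENT_CONVOLUTION_DATE"]
    return len(dep) == 1 and len(arr) == 1 and dep.iloc[0] < arr.iloc[0]


valid = ev.groupby("loadingOrder").filter(is_valid)
```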


In [156]:
# Reorganize the event data into one row per waybill to make joining with the GPS data easier
df_empty = pd.DataFrame(columns=['loadingOrder', 'start_port', 'end_port', 'start_time','end_time']) 
event_order=[]
event_start_port=[]
event_end_port=[]
event_start_time=[]
event_end_time=[]
for index,order,_,ID,time in df_event.itertuples():
    if index%2==0:
        event_order.append(order)
        event_start_port.append(ID)
        event_end_port.append(df_event['EVENT_LOCATION_ID'][index+1])
        event_start_time.append(time)
        event_end_time.append(df_event['EVENT_CONVOLUTION_DATE'][index+1])

df_empty['loadingOrder']=event_order
df_empty['start_port']=event_start_port
df_empty['end_port']=event_end_port
df_empty['start_time']=event_start_time
df_empty['end_time']=event_end_time
df_empty

Unnamed: 0,loadingOrder,start_port,end_port,start_time,end_time
0,AB283635056094,CNSHK,MYPKG,2020-04-16 22:08:00,2020-04-20 21:41:00
1,AC188113754775,CNYTN,MXZLO,2019-04-02 12:53:00,2019-04-08 11:11:00
2,AE172923690170,CNYTN,RTM,2020-03-18 02:23:00,2020-04-10 14:46:00
3,AF167947002003,CNSHK,THLCH,2019-11-25 23:59:00,2020-03-01 22:59:00
4,AF842018574399,CNYTN,TRIZT,2020-01-14 18:20:00,2020-03-06 17:21:00
...,...,...,...,...,...
973,[V777965078223,CNSHK,PHMNL,2020-03-29 23:50:00,2020-04-06 23:50:00
974,[W782379188175,CNSHK,EGALY,2020-03-07 17:14:00,2020-04-19 17:20:00
975,[W849031957501,CNYTN,MACAS,2020-03-17 10:00:00,2020-04-12 22:00:00
976,[Y861332548397,SIKOP,CNYTN,2020-02-26 18:02:00,2020-04-06 23:50:00
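The same one-row-per-waybill table can be built without relying on row parity, by splitting departures and arrivals and merging them. A sketch under the assumption that each waybill has exactly one record of each kind; the function name `one_row_per_waybill` is hypothetical, and the sample rows are the first waybill from the output above.

```python
import pandas as pd


def one_row_per_waybill(events):
    """Pair each waybill's departure record with its arrival record."""
    cols = ["loadingOrder", "EVENT_LOCATION_ID", "EVENT_CONVOLUTION_DATE"]
    dep = events.loc[events["EVENT_CODE"] == "SHIPMENT ONBOARD DATE", cols]
    arr = events.loc[events["EVENT_CODE"] == "ARRIVAL AT PORT", cols]
    dep = dep.rename(columns={"EVENT_LOCATION_ID": "start_port",
                              "EVENT_CONVOLUTION_DATE": "start_time"})
    arr = arr.rename(columns={"EVENT_LOCATION_ID": "end_port",
                              "EVENT_CONVOLUTION_DATE": "end_time"})
    # validate raises MergeError if a waybill has more than one record per side
    return dep.merge(arr, on="loadingOrder", validate="one_to_one")


clean = pd.DataFrame({
    "loadingOrder": ["AB283635056094", "AB283635056094"],
    "EVENT_CODE": ["SHIPMENT ONBOARD DATE", "ARRIVAL AT PORT"],
    "EVENT_LOCATION_ID": ["CNSHK", "MYPKG"],
    "EVENT_CONVOLUTION_DATE": pd.to_datetime(["2020-04-16 22:08:00",
                                              "2020-04-20 21:41:00"]),
})
summary = one_row_per_waybill(clean)
```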


## Next, match the GPS data with the event data (join on loadingOrder)
- After cleaning, 7,159 waybills remain in the GPS data but only 978 in the event data

In [150]:
# Keep only the waybills present in both the GPS and event data to obtain the final GPS data.
# Only 388 waybills survive this filter, which is probably not enough.
event_order=df_event['loadingOrder'].unique()
gps_event=clean_df[clean_df['loadingOrder'].isin(event_order)].reset_index(drop=True)
gps_event.to_csv('gps_event200w.csv',index=False)


Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,TRANSPORT_TRACE
0,QL937761313845,2019-01-10T16:16:54.000Z,113.890020,22.449807,0,3770,CNSHK-HRRJK
1,QL937761313845,2019-01-10T16:31:53.000Z,113.890040,22.449805,0,3760,CNSHK-HRRJK
2,QL937761313845,2019-01-10T16:46:53.000Z,113.890042,22.449810,0,3700,CNSHK-HRRJK
3,QL937761313845,2019-01-10T17:19:54.000Z,113.890053,22.449830,0,3720,CNSHK-HRRJK
4,QL937761313845,2019-01-10T17:37:53.000Z,113.890037,22.449823,0,3770,CNSHK-HRRJK
...,...,...,...,...,...,...,...
2002095,SY689544812125,2020-04-30T01:36:29.000Z,120.512892,25.716412,28,2710,CNYTN-CLSAN
2002096,SY689544812125,2020-04-30T01:39:37.000Z,120.519463,25.728125,28,2690,CNYTN-CLSAN
2002097,SY689544812125,2020-04-30T01:42:01.000Z,120.524593,25.737330,28,2680,CNYTN-CLSAN
2002098,SY689544812125,2020-04-30T01:47:14.000Z,120.535722,25.757097,28,2870,CNYTN-CLSAN


In [152]:
gps_event['loadingOrder'].describe()

count            2002100
unique               388
top       JN852747112402
freq               47827
Name: loadingOrder, dtype: object

In [158]:
gps_event388=gps_event.merge(df_empty, on='loadingOrder', how='left')
gps_event388.to_csv('gps_event388.csv',index=False)
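When joining GPS rows to the per-waybill event table, the `indicator` and `validate` options of `DataFrame.merge` help confirm how many rows actually matched. A sketch on hypothetical frames (`gps`, `trips`):

```python
import pandas as pd

gps = pd.DataFrame({"loadingOrder": ["A", "A", "B"], "speed": [0, 12, 7]})
trips = pd.DataFrame({"loadingOrder": ["A", "C"], "end_port": ["MYPKG", "MXZLO"]})

# validate checks that each waybill appears at most once in the event table
merged = gps.merge(trips, on="loadingOrder", how="left",
                   validate="many_to_one", indicator=True)
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
unmatched_orders = merged.loc[merged["_merge"] == "left_only", "loadingOrder"].unique()
```

Inspecting `unmatched_orders` shows which GPS waybills have no usable event record, which is one way to see why so few waybills survive the join.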

In [159]:
gps_event388

Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,TRANSPORT_TRACE,start_port,end_port,start_time,end_time
0,QL937761313845,2019-01-10T16:16:54.000Z,113.890020,22.449807,0,3770,CNSHK-HRRJK,SHEKOU,SURABAYA,2019-01-25 09:00:00,2019-01-30 17:00:00
1,QL937761313845,2019-01-10T16:31:53.000Z,113.890040,22.449805,0,3760,CNSHK-HRRJK,SHEKOU,SURABAYA,2019-01-25 09:00:00,2019-01-30 17:00:00
2,QL937761313845,2019-01-10T16:46:53.000Z,113.890042,22.449810,0,3700,CNSHK-HRRJK,SHEKOU,SURABAYA,2019-01-25 09:00:00,2019-01-30 17:00:00
3,QL937761313845,2019-01-10T17:19:54.000Z,113.890053,22.449830,0,3720,CNSHK-HRRJK,SHEKOU,SURABAYA,2019-01-25 09:00:00,2019-01-30 17:00:00
4,QL937761313845,2019-01-10T17:37:53.000Z,113.890037,22.449823,0,3770,CNSHK-HRRJK,SHEKOU,SURABAYA,2019-01-25 09:00:00,2019-01-30 17:00:00
...,...,...,...,...,...,...,...,...,...,...,...
2002095,SY689544812125,2020-04-30T01:36:29.000Z,120.512892,25.716412,28,2710,CNYTN-CLSAN,CNYTN,SIKOP,2020-02-23 09:48:00,2020-02-28 03:00:00
2002096,SY689544812125,2020-04-30T01:39:37.000Z,120.519463,25.728125,28,2690,CNYTN-CLSAN,CNYTN,SIKOP,2020-02-23 09:48:00,2020-02-28 03:00:00
2002097,SY689544812125,2020-04-30T01:42:01.000Z,120.524593,25.737330,28,2680,CNYTN-CLSAN,CNYTN,SIKOP,2020-02-23 09:48:00,2020-02-28 03:00:00
2002098,SY689544812125,2020-04-30T01:47:14.000Z,120.535722,25.757097,28,2870,CNYTN-CLSAN,CNYTN,SIKOP,2020-02-23 09:48:00,2020-02-28 03:00:00


In [162]:
# Join the GPS and event data: origin/destination names, GPS coordinates, departure and arrival times
def convert_name_xy(name):  # input: port name
    # Raises KeyError if the name is not present in the port table
    port_name=port[port['TRANS_NODE_NAME'].isin([name])].reset_index()
    return port_name['LONGITUDE'][0],port_name['LATITUDE'][0]  # returns the port's (longitude, latitude)

start_x=[]  # origin longitude
start_y=[]  # origin latitude
end_x=[]  # destination longitude
end_y=[]  # destination latitude
# Cache looked-up ports to avoid repeated computation
temp_dic={}
for index,start_port,end_port in gps_event388[['start_port','end_port']].itertuples():
    if start_port in temp_dic:
        re=temp_dic[start_port]
    else:
        re=convert_name_xy(start_port)
        temp_dic[start_port]=re
    start_x.append(re[0])
    start_y.append(re[1])
    if end_port in temp_dic:
        re=temp_dic[end_port]
    else:
        re=convert_name_xy(end_port)
        temp_dic[end_port]=re
    end_x.append(re[0])
    end_y.append(re[1])
gps_event388['start_x']=start_x
gps_event388['start_y']=start_y
gps_event388['end_x']=end_x
gps_event388['end_y']=end_y

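### Test waybill data
- loadingOrder: waybill number
- timestamp: time at which the waybill was at its current position (longitude, latitude), in the format yyyy-MM-dd'T'HH:mm:ss.SSSZ, e.g. 2019-09-05T16:33:17.000Z
- longitude: current longitude of the vessel carrying the waybill, e.g. 114.234567
- latitude: current latitude of the vessel carrying the waybill, e.g. 21.234567
- speed: instantaneous speed of the vessel during transport. Where it is missing, participants may compute it themselves.
- direction: heading of the vessel, with due north as 0 degrees. 31480 means 314.80 degrees (northwest), and 900 means 9 degrees east of north.
- carrierName: carrier name (similar to a courier company name)
- vesselMMSI: anonymized Maritime Mobile Service Identity (MMSI) of the vessel, a unique identifier for each ship
- onboardDate: departure time from the origin port, in the format yyyy/MM/dd HH:mm:ss (with two spaces between dd and HH), e.g. 2019/09/05 16:33:17
- TRANSPORT_TRACE: the vessel's route, joined by "-", e.g. CNSHK-MYPKG-MYTPP. It is entered in advance by the carrier. In rare cases the vessel does not follow it (for example, because of port congestion), but it still reaches the destination port.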

In [110]:
test_data_path = 'A_testData0531.csv'
test_data=pd.read_csv(test_data_path)
test_data.head()

Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,carrierName,vesselMMSI,onboardDate,TRANSPORT_TRACE
0,CF946210847851,2019-04-02T02:42:28.000Z,138.471062,40.278787,31,5800,OIEQNT,R5480015614,2019/04/02 02:42:28,CNYTN-MXZLO
1,CF946210847851,2019-04-02T02:59:28.000Z,138.552168,40.327785,30,4600,OIEQNT,R5480015614,2019/04/02 02:42:28,CNYTN-MXZLO
2,CF946210847851,2019-04-02T03:07:28.000Z,138.58825,40.352542,30,4900,OIEQNT,R5480015614,2019/04/02 02:42:28,CNYTN-MXZLO
3,CF946210847851,2019-04-02T03:43:28.000Z,138.751325,40.459447,30,5000,OIEQNT,R5480015614,2019/04/02 02:42:28,CNYTN-MXZLO
4,CF946210847851,2019-04-02T04:29:28.000Z,138.969782,40.581485,30,5000,OIEQNT,R5480015614,2019/04/02 02:42:28,CNYTN-MXZLO


In [111]:
#test_data.TRANSPORT_TRACE

In [112]:
def convert_name_xy(name):  # input: port name
    port_name=port[port['TRANS_NODE_NAME'].isin([name])].reset_index()
    return port_name['LONGITUDE'][0],port_name['LATITUDE'][0]  # returns the port's (longitude, latitude)

start_x=[]  # origin longitude
start_y=[]  # origin latitude
end_x=[]  # destination longitude
end_y=[]  # destination latitude
# Cache looked-up ports to avoid repeated computation
temp_dic={}
for value in test_data['TRANSPORT_TRACE']:
    s_e=value.split('-')
    start_port=s_e[0]
    end_port=s_e[-1]  # the destination is the last port, even when the trace has transit ports
    if start_port in temp_dic:
        re=temp_dic[start_port]
    else:
        re=convert_name_xy(start_port)
        temp_dic[start_port]=re
    start_x.append(re[0])
    start_y.append(re[1])
    if end_port in temp_dic:
        re=temp_dic[end_port]
    else:
        re=convert_name_xy(end_port)
        temp_dic[end_port]=re
    end_x.append(re[0])
    end_y.append(re[1])
test_data['start_x']=start_x
test_data['start_y']=start_y
test_data['end_x']=end_x
test_data['end_y']=end_y

In [113]:
test_data.head(2)

Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,carrierName,vesselMMSI,onboardDate,TRANSPORT_TRACE,start_x,start_y,end_x,end_y
0,CF946210847851,2019-04-02T02:42:28.000Z,138.471062,40.278787,31,5800,OIEQNT,R5480015614,2019/04/02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961
1,CF946210847851,2019-04-02T02:59:28.000Z,138.552168,40.327785,30,4600,OIEQNT,R5480015614,2019/04/02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961


In [117]:
# Convert column types
data=test_data
data['temp_timestamp'] = data['timestamp']  # current time (raw string)
data['onboardDate'] = pd.to_datetime(data['onboardDate'], infer_datetime_format=True)  # departure time from the origin port
data['timestamp'] = pd.to_datetime(data['timestamp'], infer_datetime_format=True)
data['longitude'] = data['longitude'].astype(float)
data['loadingOrder'] = data['loadingOrder'].astype(str)
data['latitude'] = data['latitude'].astype(float)
data['speed'] = data['speed'].astype(float)
data['direction'] = data['direction'].astype(float)

In [136]:
df=data
data.sort_values(['loadingOrder', 'timestamp'], inplace=True)  # sort by waybill and timestamp
# Key step: compute features from the track recorded so far for each waybill
# Features use only latitude/longitude, speed, and direction
df['lat_diff'] = df.groupby('loadingOrder')['latitude'].diff(1)  # change in latitude
df['lon_diff'] = df.groupby('loadingOrder')['longitude'].diff(1)  # change in longitude
df['speed_diff'] = df.groupby('loadingOrder')['speed'].diff(1)  # change in speed
df['diff_minutes'] = df.groupby('loadingOrder')['timestamp'].diff(1).dt.total_seconds() // 60  # interval between records (unit: minutes)
# A record counts as stationary when the latitude/longitude change, the speed change, and the time interval are all below their thresholds
df['anchor'] = df.apply(lambda x: 1 if x['lat_diff'] <= 0.03 and x['lon_diff'] <= 0.03
                        and x['speed_diff'] <= 0.3 and x['diff_minutes'] <= 10 else 0, axis=1)
# For each waybill, count the records (count)
group_df = df.groupby('loadingOrder')['timestamp'].agg([('count','count')]).reset_index()
# How many of the records are in the stationary state
#anchors=df.groupby('loadingOrder')['anchor']
#sum_anchors=[]
#for _,group in anchors:  # running sum from the start to the current point, not the total (avoids data leakage)
    #sum_anchors+=[sum(group[:v]) for v in range(1,len(group)+1)]
#df['anchor_cnt']=sum_anchors
# Using latitude, longitude, speed, and direction as base features, compute the min, max, mean, and median per waybill
agg_function = ['min', 'max', 'mean', 'median']
agg_col = ['latitude', 'longitude', 'speed', 'direction']

group = df.groupby('loadingOrder')[agg_col].agg(agg_function).reset_index()
#anchor_df.columns = ['loadingOrder', 'anchor_cnt']
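The stationary flag and the running count of stationary records can be computed without `apply` or nested loops. The sketch below is a variant, not the cell's exact rule: it compares absolute differences, so large negative changes are not counted as stationary, and it uses `groupby(...).cumsum()` for the running count, which only looks at past records. The sample frame `df_demo` is hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "loadingOrder": ["A"] * 4,
    "timestamp": pd.to_datetime(["2019-04-02 00:00", "2019-04-02 00:05",
                                 "2019-04-02 00:09", "2019-04-02 01:00"]),
    "latitude": [22.57, 22.57, 22.58, 23.50],
    "longitude": [114.26, 114.26, 114.26, 115.00],
    "speed": [0.0, 0.0, 0.0, 30.0],
})

g = df_demo.groupby("loadingOrder")
lat_diff = g["latitude"].diff().abs()
lon_diff = g["longitude"].diff().abs()
speed_diff = g["speed"].diff().abs()
diff_minutes = g["timestamp"].diff().dt.total_seconds() / 60

# The first record of each waybill has NaN differences and is flagged 0
df_demo["anchor"] = ((lat_diff <= 0.03) & (lon_diff <= 0.03)
                     & (speed_diff <= 0.3) & (diff_minutes <= 10)).astype(int)
# Running count up to and including the current record
df_demo["anchor_cnt"] = df_demo.groupby("loadingOrder")["anchor"].cumsum()
```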

In [144]:
port[port['TRANS_NODE_NAME'].isin(['MXZLO'])].reset_index()

Unnamed: 0,index,TRANS_NODE_NAME,LONGITUDE,LATITUDE,COUNTRY,STATE,CITY,REGION,ADDRESS,PORT_CODE,TRANSPORT_NODE_ID
0,535,MXZLO,-104.305571,19.085961,Mexico,Colima,Manzanillo,Vista de Mar II,"Blvd. Miguel de la Madrid 750, Vista de Mar II...",,2844000.0


In [108]:
# Feature construction; `mode` ('train' or 'test') is assumed to be set earlier in the notebook
df.sort_values(['loadingOrder', 'timestamp'], inplace=True)
# Features use only latitude/longitude, speed, and direction
df['lat_diff'] = df.groupby('loadingOrder')['latitude'].diff(1)  # change in latitude
df['lon_diff'] = df.groupby('loadingOrder')['longitude'].diff(1)  # change in longitude
df['speed_diff'] = df.groupby('loadingOrder')['speed'].diff(1)  # change in speed
df['diff_minutes'] = df.groupby('loadingOrder')['timestamp'].diff(1).dt.total_seconds() // 60  # interval between records (unit: minutes)
# A record counts as stationary when the latitude/longitude change, the speed change, and the time interval are all below their thresholds
df['anchor'] = df.apply(lambda x: 1 if x['lat_diff'] <= 0.03 and x['lon_diff'] <= 0.03
                        and x['speed_diff'] <= 0.3 and x['diff_minutes'] <= 10 else 0, axis=1)
# For each waybill, compute the latest time (mmax), the record count (count), and the earliest time (mmin)
# The time span, i.e. the time the waybill needs to reach the "end point", is used as the label
if mode=='train':
    group_df = df.groupby('loadingOrder')['timestamp'].agg([('mmax','max'), ('count','count'), ('mmin','min')]).reset_index()
    # Label = latest timestamp minus earliest timestamp, in seconds
    group_df['label'] = (group_df['mmax'] - group_df['mmin']).dt.total_seconds()
elif mode=='test':
    group_df = df.groupby('loadingOrder')['timestamp'].agg([('count','count')]).reset_index()
# How many of the records are in the stationary state
anchor_df = df.groupby('loadingOrder')['anchor'].agg('sum').reset_index()
anchor_df.columns = ['loadingOrder', 'anchor_cnt']
group_df = group_df.merge(anchor_df, on='loadingOrder', how='left')
# After merging 'anchor_cnt' back, compute the share of stationary records among all records
group_df['anchor_ratio'] = group_df['anchor_cnt'] / group_df['count']
# Using latitude, longitude, speed, and direction as base features, compute the min, max, mean, and median per waybill
agg_function = ['min', 'max', 'mean', 'median']
agg_col = ['latitude', 'longitude', 'speed', 'direction']

group = df.groupby('loadingOrder')[agg_col].agg(agg_function).reset_index()
group.columns = ['loadingOrder'] + ['{}_{}'.format(i, j) for i in agg_col for j in agg_function]
group_df = group_df.merge(group, on='loadingOrder', how='left')
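The flattened column names can also be derived directly from the aggregated MultiIndex, so they cannot drift out of step with the actual column order. A sketch on a hypothetical miniature frame:

```python
import pandas as pd

demo = pd.DataFrame({
    "loadingOrder": ["A", "A", "B"],
    "speed": [10.0, 30.0, 5.0],
    "direction": [4600.0, 5000.0, 900.0],
})
agg_col = ["speed", "direction"]
agg_function = ["min", "max", "mean", "median"]

stats = demo.groupby("loadingOrder")[agg_col].agg(agg_function)
# Join each (column, function) pair, e.g. ('speed', 'mean') -> 'speed_mean'
stats.columns = ["_".join(pair) for pair in stats.columns]
stats = stats.reset_index()
```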

Unnamed: 0,loadingOrder,timestamp,longitude,latitude,speed,direction,carrierName,vesselMMSI,onboardDate,TRANSPORT_TRACE,start_x,start_y,end_x,end_y,temp_timestamp
0,CF946210847851,2019-04-02 02:42:28,138.471062,40.278787,31.0,5800.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T02:42:28.000Z
1,CF946210847851,2019-04-02 02:59:28,138.552168,40.327785,30.0,4600.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T02:59:28.000Z
2,CF946210847851,2019-04-02 03:07:28,138.588250,40.352542,30.0,4900.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T03:07:28.000Z
3,CF946210847851,2019-04-02 03:43:28,138.751325,40.459447,30.0,5000.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T03:43:28.000Z
4,CF946210847851,2019-04-02 04:29:28,138.969782,40.581485,30.0,5000.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T04:29:28.000Z
5,CF946210847851,2019-04-02 04:41:28,139.023647,40.617033,30.0,4900.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T04:41:28.000Z
6,CF946210847851,2019-04-02 04:49:28,139.059750,40.641672,30.0,4700.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T04:49:28.000Z
7,CF946210847851,2019-04-02 04:53:28,139.077772,40.654002,30.0,4800.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T04:53:28.000Z
8,CF946210847851,2019-04-02 05:13:28,139.171020,40.710698,30.0,5500.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T05:13:28.000Z
9,CF946210847851,2019-04-02 05:17:28,139.190528,40.720732,29.0,5500.0,OIEQNT,R5480015614,2019-04-02 02:42:28,CNYTN-MXZLO,114.275347,22.5777,-104.305571,19.085961,2019-04-02T05:17:28.000Z
