## Preprocessing trajectory data

Data preprocessing is the set of steps that prepare raw data for later analysis and data mining.

## Load data from file

The dataset used in this tutorial is GeoLife GPS Trajectories. It is available at https://www.microsoft.com/en-us/download/details.aspx?id=52367

In [39]:
import pandas as pd
import numpy as np
from pymove import MoveDataFrame

In [40]:
df = pd.read_csv('examples/geolife_sample.csv', parse_dates=['datetime'])
df.head()

Unnamed: 0,lat,lon,datetime,id
0,39.984094,116.319236,2008-10-23 05:53:05,1
1,39.984198,116.319322,2008-10-23 05:53:06,1
2,39.984224,116.319402,2008-10-23 05:53:11,1
3,39.984211,116.319389,2008-10-23 05:53:16,1
4,39.984217,116.319422,2008-10-23 05:53:21,1


In [41]:
df_move = MoveDataFrame(df, latitude="lat", longitude="lon", datetime="datetime")

In [4]:
df_move.show_trajectories_info()



Number of Points: 217653

Number of IDs objects: 2

Start Date:2008-10-23 05:53:05     End Date:2009-03-19 05:46:37

Bounding Box:(22.147577, 113.54884299999999, 41.132062, 121.156224)





## Filtering

The filters module provides functions to perform different types of data filtering.

Importing the module:

In [5]:
from pymove import filters

A bounding box (usually shortened to bbox) is an area defined by two latitudes and two longitudes. The by_bbox function keeps the trajectory points that fall inside a specified bounding box. In the example below the tuple is ordered as (lat_min, lon_min, lat_max, lon_max).

In [6]:
bbox = (22.147577, 113.54884299999999, 41.132062, 121.156224)
filt_df = filters.by_bbox(df_move, bbox)
filt_df.head()

Unnamed: 0,lat,lon,datetime,id
0,39.984094,116.319236,2008-10-23 05:53:05,1
1,39.984198,116.319322,2008-10-23 05:53:06,1
2,39.984224,116.319402,2008-10-23 05:53:11,1
3,39.984211,116.319389,2008-10-23 05:53:16,1
4,39.984217,116.319422,2008-10-23 05:53:21,1
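
As a rough illustration of what a bounding-box filter does, the sketch below applies the same idea with plain pandas. It is not PyMove's implementation: the helper name filter_by_bbox is hypothetical, and the tuple order (lat_min, lon_min, lat_max, lon_max) and the inclusive bounds are assumptions.

```python
import pandas as pd


def filter_by_bbox(df, bbox):
    """Keep rows whose lat/lon fall inside bbox (hypothetical helper)."""
    # Assumed order: (lat_min, lon_min, lat_max, lon_max).
    lat_min, lon_min, lat_max, lon_max = bbox
    # Series.between is inclusive on both ends by default.
    inside = df["lat"].between(lat_min, lat_max) & df["lon"].between(lon_min, lon_max)
    return df[inside]


points = pd.DataFrame(
    {"lat": [39.98, 10.0, 40.0], "lon": [116.32, 116.32, 130.0]}
)
kept = filter_by_bbox(points, (22.147577, 113.548843, 41.132062, 121.156224))
```

Only the first point lies inside the box; the second fails on latitude and the third on longitude.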


The by_datetime function filters trajectory points by time, keeping those between the start_datetime and end_datetime parameters.

In [7]:
filters.by_datetime(df_move,start_datetime = "2009-03-19 05:45:37", end_datetime = "2009-03-19 05:46:17")

Unnamed: 0,lat,lon,datetime,id
217644,40.000128,116.327171,2009-03-19 05:45:42,5
217645,40.000069,116.327179,2009-03-19 05:45:47,5
217646,40.000001,116.327219,2009-03-19 05:45:52,5
217647,39.999919,116.327211,2009-03-19 05:45:57,5
217648,39.999896,116.32729,2009-03-19 05:46:02,5
217649,39.999899,116.327352,2009-03-19 05:46:07,5
217650,39.999945,116.327394,2009-03-19 05:46:12,5
217651,40.000015,116.327433,2009-03-19 05:46:17,5
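
The same kind of time-window filter can be sketched with a pandas boolean mask. The helper name filter_by_datetime is hypothetical, and treating both bounds as inclusive is an assumption of this sketch rather than a statement about PyMove.

```python
import pandas as pd


def filter_by_datetime(df, start_datetime, end_datetime):
    """Keep rows with start <= datetime <= end (hypothetical helper)."""
    start = pd.Timestamp(start_datetime)
    end = pd.Timestamp(end_datetime)
    # Both bounds are inclusive here; this is an assumption of the sketch.
    mask = (df["datetime"] >= start) & (df["datetime"] <= end)
    return df[mask]


points = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2009-03-19 05:45:32",
                "2009-03-19 05:45:42",
                "2009-03-19 05:46:17",
                "2009-03-19 05:46:37",
            ]
        )
    }
)
window = filter_by_datetime(points, "2009-03-19 05:45:37", "2009-03-19 05:46:17")
```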


The by_label function filters trajectory points by column value: label_name names the column and value is the value to match.

In [8]:
filters.by_label(df_move, value = 116.327219, label_name = "lon").head()

Unnamed: 0,lat,lon,datetime,id
3066,39.97916,116.327219,2008-10-24 06:34:27,1
13911,39.975424,116.327219,2008-10-26 08:18:06,1
16396,39.980411,116.327219,2008-10-27 00:30:47,1
33935,39.975832,116.327219,2008-11-05 11:04:04,1
41636,39.97699,116.327219,2008-11-07 10:34:41,1


The by_id function filters trajectory points by the specified trajectory id.

In [9]:
filters.by_id(df_move, id_=5).head()

Unnamed: 0,lat,lon,datetime,id
108607,40.004155,116.321337,2008-10-24 04:12:30,5
108608,40.003834,116.321462,2008-10-24 04:12:35,5
108609,40.003783,116.321431,2008-10-24 04:12:40,5
108610,40.00369,116.321429,2008-10-24 04:12:45,5
108611,40.003589,116.321427,2008-10-24 04:12:50,5


A tid is the concatenation of the id with the date and hour of a trajectory; for example, id 1 at 2008-10-23 05h gives 12008102305.
The by_tid function filters trajectory points according to the tid specified by the tid_ parameter.

In [10]:
df_move.generate_tid_based_on_id_datatime()
filters.by_tid(df_move, "12008102305").head()


Creating or updating tid feature...

...Sorting by id and datetime to increase performance


...tid feature was created...



Unnamed: 0,lat,lon,datetime,id,tid
0,39.984094,116.319236,2008-10-23 05:53:05,1,12008102305
1,39.984198,116.319322,2008-10-23 05:53:06,1,12008102305
2,39.984224,116.319402,2008-10-23 05:53:11,1,12008102305
3,39.984211,116.319389,2008-10-23 05:53:16,1,12008102305
4,39.984217,116.319422,2008-10-23 05:53:21,1,12008102305
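
Judging from the tid values in the output above, a tid can be reproduced by joining the id with the timestamp formatted as year, month, day and hour. The sketch below does that with pandas; make_tid is a hypothetical helper and the format string is inferred from the output, not taken from PyMove's source.

```python
import pandas as pd


def make_tid(df):
    """Build a tid string from id + YYYYMMDDHH (hypothetical helper)."""
    # Format inferred from the tutorial output, e.g. id 1 -> "12008102305".
    return df["id"].astype(str) + df["datetime"].dt.strftime("%Y%m%d%H")


points = pd.DataFrame(
    {
        "id": [1, 5],
        "datetime": pd.to_datetime(["2008-10-23 05:53:05", "2008-11-21 04:38:39"]),
    }
)
tids = make_tid(points)
```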


The outliers function returns the trajectory points that are outliers, that is, GPS jumps judged from the distances between each point and its neighbours.

In [11]:
outliers_points = filters.outliers(df_move)
outliers_points.head()


Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance

(217653/217653) 100% in 00:00:00.066 - estimated end in 00:00:00.000
...Reset index

..Total Time: 0.07281041145324707
...Filtring jumps 



Unnamed: 0,id,lat,lon,datetime,tid,dist_to_prev,dist_to_next,dist_prev_to_next
148,1,39.970511,116.341455,2008-10-23 10:32:53,12008102310,1452.319115,1470.641291,71.08846
338,1,39.995042,116.326465,2008-10-23 10:44:24,12008102310,10.80186,10.274331,1.465144
8133,1,39.991075,116.188395,2008-10-25 08:20:19,12008102508,5.090766,6.24786,1.295191
10175,1,40.015169,116.311045,2008-10-25 23:40:12,12008102523,23.454754,24.899678,3.766959
13849,1,39.977157,116.327151,2008-10-26 08:13:53,12008102608,11.212682,10.221164,1.004375
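
One way to read the three distance columns above is that a point looks like a jump when it is far from both neighbours while those neighbours are close to each other. The sketch below encodes that rule for a single trajectory. It is an assumption about the criterion, not PyMove's code: the helper names and the exact comparison are hypothetical, and the default coefficient of 3.0 is borrowed from the jump_coefficient value printed later in this tutorial.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def flag_jumps(traj, jump_coefficient=3.0):
    """Flag points far from both neighbours (assumed rule, one trajectory)."""
    lat, lon = traj["lat"], traj["lon"]
    dist_to_prev = haversine(lat.shift(1), lon.shift(1), lat, lon)
    dist_to_next = haversine(lat, lon, lat.shift(-1), lon.shift(-1))
    dist_prev_to_next = haversine(
        lat.shift(1), lon.shift(1), lat.shift(-1), lon.shift(-1)
    )
    limit = jump_coefficient * dist_prev_to_next
    # Comparisons against NaN are False, so the end points are never flagged.
    return (dist_to_prev > limit) & (dist_to_next > limit)


traj = pd.DataFrame(
    {"lat": [40.0, 40.0001, 40.01, 40.0002, 40.0003], "lon": [116.3] * 5}
)
flags = flag_jumps(traj)
```

In this toy trajectory only the third point is flagged: it sits about a kilometer from both neighbours, which are themselves only a few meters apart.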


The clean_duplicates function removes duplicate rows from the dataframe. Optionally, only certain columns can be considered.

In [12]:
filters.clean_duplicates(df_move)


Remove rows duplicates by subset
...Sorting by id and datetime to increase performance

...There are no GPS points duplicated


The clean_consecutive_duplicates function removes consecutive duplicate rows from the dataframe. Optionally, only certain columns can be considered; these are set by the subset parameter. In this example only the lat column is considered.

In [13]:
filtered_df = filters.clean_consecutive_duplicates(df_move, subset = ["lat"])
len(filtered_df)

196142
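
The difference from ordinary de-duplication is that only runs of identical neighbouring rows are collapsed, so a value may reappear later. A minimal pandas sketch of that idea follows; drop_consecutive_duplicates is a hypothetical helper, and keeping the first row of each run is an assumption.

```python
import pandas as pd


def drop_consecutive_duplicates(df, subset):
    """Drop rows equal to the previous row on `subset` (hypothetical helper)."""
    # A row is kept when at least one subset column differs from the row above.
    # The first row is compared with NaN and is therefore always kept.
    changed = (df[subset] != df[subset].shift()).any(axis=1)
    return df[changed]


points = pd.DataFrame({"lat": [1.0, 1.0, 2.0, 1.0, 1.0], "lon": [5, 6, 7, 8, 9]})
deduped = drop_consecutive_duplicates(points, subset=["lat"])
```

Note that the value 1.0 survives twice, because its second run is not adjacent to the first.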

The clean_nan_values function removes rows containing missing values from the dataframe.

In [14]:
filters.clean_nan_values(df_move)
len(df_move)

217649

The clean_gps_jumps_by_distance function removes from the dataframe the trajectory points that are outliers.

In [15]:
filters.clean_gps_jumps_by_distance(df_move)


Cleaning gps jumps by distance to jump_coefficient 3.0...

...Filtring jumps 

...Dropping 383 rows of gps points

...Rows before: 217649, Rows after:217266, Sum drop:383


Cleaning gps jumps by distance to jump_coefficient 3.0...

...Filtring jumps 

383 GPS points were dropped


The clean_gps_nearby_points_by_distances function removes a point from a trajectory when its distance to the previous point is smaller than the radius_area parameter (in meters).

In [16]:
filters.clean_gps_nearby_points_by_distances(df_move, radius_area = 10)


Cleaning gps points from radius of 10 meters

...There are 137478 gps points to drop

...Dropping 137478 gps points

...Rows before: 217266, Rows after:79788


Cleaning gps points from radius of 10 meters

...There are 0 gps points to drop



The clean_gps_nearby_points_by_speed function removes a point from a trajectory when the travel speed between it and the previous point is smaller than the speed_radius parameter.

In [17]:
filters.clean_gps_nearby_points_by_speed(df_move, speed_radius=40.0)


Creating or updating distance, time and speed features in meters by seconds

...Sorting by id and datetime to increase performance

...Set id as index to a higher peformance

(79788/79788) 100% in 00:00:00.057 - estimated end in 00:00:00.000
...Reset index...

..Total Time: 0.063

Cleaning gps points using 40.0 speed radius

...There are 79586 gps points to drop

...Dropping 79586 gps points

...Rows before: 79788, Rows after:202


Cleaning gps points using 0.0 speed radius

...There are 0 gps points to drop



The clean_gps_speed_max_radius function recursively removes trajectory points whose speed is higher than the value specified by the user. A point p of the trajectory is removed if either of the following holds: the travel speed from the previous point to p is greater than the maximum speed between adjacent points set by the user, or the travel speed from p to the next point is greater than that maximum. When the cleaning is done, the function updates the time and distance features in the dataframe and calls itself again. It stops when no remaining point exceeds the speed limit.

In [18]:
filters.clean_gps_speed_max_radius(df_move)


Clean gps points with speed max > 50.0 meters by seconds
...There 125 gps points with speed_max > 50.0

...Dropping 125 rows of jumps by speed max

...Rows before: 202, Rows after:77


Clean gps points with speed max > 50.0 meters by seconds
...There 0 gps points with speed_max > 50.0



The clean_trajectories_with_few_points function removes from the given dataframe the trajectories with fewer points than specified by the min_points_per_trajectory parameter.

In [19]:
filters.clean_trajectories_with_few_points(df_move)


...There are 44 ids with few points

...Tids before drop: 57

...Tids after drop: 13

...Shape - before drop: (77, 10) - after drop: (33, 10)

Creating or updating distance, time and speed features in meters by seconds

...Sorting by tid and datetime to increase performance

...Set tid as index to a higher peformance

(33/33) 100% in 00:00:00.012 - estimated end in 00:00:00.000
...Reset index...

..Total Time: 0.017
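
The core of this cleaning step is a per-trajectory point count. A pandas sketch of that idea follows; drop_short_trajectories is a hypothetical helper, and grouping by the tid column is an assumption based on the log above.

```python
import pandas as pd


def drop_short_trajectories(df, min_points, label="tid"):
    """Drop trajectories with fewer than min_points rows (hypothetical helper)."""
    # Number of rows in each trajectory, aligned with the original rows.
    counts = df.groupby(label)[label].transform("size")
    return df[counts >= min_points]


points = pd.DataFrame({"tid": ["a", "a", "a", "b"], "lat": [1.0, 2.0, 3.0, 4.0]})
kept = drop_short_trajectories(points, min_points=2)
```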


In [20]:
filters.clean_id_by_time_max(df_move)


Clean gps points with time max by id < 3600 seconds
...Ids total: 2
Ids to drop:1
...Rows before drop: 33
 Rows after drop: 22


Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='int64')

## Segmentation

The segmentation module is used to segment trajectories based on different parameters.

Importing the module:

In [21]:
from pymove import segmentation

The bbox_split function splits the bounding box into grids of the same size. The number of grids is set by the number_grids parameter.

In [22]:
bbox = (22.147577, 113.54884299999999, 41.132062, 121.156224)
segmentation.bbox_split(bbox, number_grids=4)

const_lat: 4.74612125
const_lon: 1.901845250000001


Unnamed: 0,lat_min,lon_min,lat_max,lon_max
0,22.147577,113.548843,41.132062,115.450688
1,22.147577,115.450688,41.132062,117.352533
2,22.147577,117.352533,41.132062,119.254379
3,22.147577,119.254379,41.132062,121.156224
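
In the output above the latitude range stays fixed and the longitude range is cut into equal strips. The sketch below reproduces that layout with numpy.linspace; split_bbox_by_lon is a hypothetical helper, and the bbox order (lat_min, lon_min, lat_max, lon_max) is assumed.

```python
import numpy as np
import pandas as pd


def split_bbox_by_lon(bbox, number_grids):
    """Split a bbox into equal-width longitude strips (hypothetical helper)."""
    lat_min, lon_min, lat_max, lon_max = bbox  # assumed order
    # number_grids strips need number_grids + 1 evenly spaced edges.
    edges = np.linspace(lon_min, lon_max, number_grids + 1)
    return pd.DataFrame(
        {
            "lat_min": lat_min,
            "lon_min": edges[:-1],
            "lat_max": lat_max,
            "lon_max": edges[1:],
        }
    )


grids = split_bbox_by_lon((22.147577, 113.548843, 41.132062, 121.156224), 4)
```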


The by_dist_time_speed function segments the trajectories into clusters based on distance, time and speed. The limits are set by the parameters max_dist_between_adj_points, max_time_between_adj_points and max_speed_between_adj_points, respectively. The tid_part column is added; it indicates the segment to which each point belongs.

In [23]:
segmentation.by_dist_time_speed(df_move, max_dist_between_adj_points=5000, 
                                max_time_between_adj_points=800,max_speed_between_adj_points=60.0)
df_move.head()


Split trajectories
...max_time_between_adj_points: 800
...max_dist_between_adj_points: 5000
...max_speed: 60.0
...setting id as index


  (move_data.at[idx, DIST_TO_PREV] > max_dist_between_adj_points) | \
  (move_data.at[idx, SPEED_TO_PREV] > max_speed_between_adj_points)


(22/22) 100% in 00:00:00.005 - estimated end in 00:00:00.000
... Reseting index

...No trajs with only one point. (22, 11)


Unnamed: 0,id,tid,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,tid_part
0,5,52008112104,40.009455,116.320434,2008-11-21 04:38:39,,244.727171,447.043215,,,1
1,5,52008112104,40.010695,116.322808,2008-11-21 04:38:44,244.727171,9.996359,252.024213,5.0,48.945434,1
2,5,52008112608,40.001656,116.329943,2008-11-26 08:24:40,,133.78113,281.429126,,,1
3,5,52008112608,40.000535,116.327103,2008-11-26 08:25:00,272.134164,27.292639,226.902202,20.0,13.606708,1
4,5,52009011410,39.998759,116.326371,2009-01-14 10:46:27,,172.987008,217.745345,,,1


The by_speed function segments the trajectories into clusters based on speed. The speed limit is set by the max_speed_between_adj_points parameter. The tid_speed column is added; it indicates the segment to which each point belongs.

In [24]:
segmentation.by_speed(df_move, max_speed_between_adj_points=70.0)
df_move.head()


Split trajectories by max_speed_between_adj_points: 70.0
...setting id as index


  speed = (move_data.at[idx, SPEED_TO_PREV] > max_speed_between_adj_points)


(22/22) 100% in 00:00:00.000 - estimated end in 00:00:00.000
... Reseting index

...No trajs with only one point. (22, 12)


Unnamed: 0,id,tid,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,tid_part,tid_speed
0,5,52008112104,40.009455,116.320434,2008-11-21 04:38:39,,244.727171,447.043215,,,1,1
1,5,52008112104,40.010695,116.322808,2008-11-21 04:38:44,244.727171,9.996359,252.024213,5.0,48.945434,1,1
2,5,52008112608,40.001656,116.329943,2008-11-26 08:24:40,,133.78113,281.429126,,,1,1
3,5,52008112608,40.000535,116.327103,2008-11-26 08:25:00,272.134164,27.292639,226.902202,20.0,13.606708,1,1
4,5,52009011410,39.998759,116.326371,2009-01-14 10:46:27,,172.987008,217.745345,,,1,1


The by_time function segments the trajectories into clusters based on time. The time limit is set by the max_time_between_adj_points parameter. The tid_time column is added; it indicates the segment to which each point belongs.

In [25]:
segmentation.by_time(df_move, max_time_between_adj_points = 1000)
df_move.head()


Split trajectories by max_time_between_adj_points: 1000
...setting id as index


  times = (move_data.at[idx, TIME_TO_PREV] > max_time_between_adj_points)


(22/22) 100% in 00:00:00.001 - estimated end in 00:00:00.000
... Reseting index

...No trajs with only one point. (22, 13)


Unnamed: 0,id,tid,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,tid_part,tid_speed,tid_time
0,5,52008112104,40.009455,116.320434,2008-11-21 04:38:39,,244.727171,447.043215,,,1,1,1
1,5,52008112104,40.010695,116.322808,2008-11-21 04:38:44,244.727171,9.996359,252.024213,5.0,48.945434,1,1,1
2,5,52008112608,40.001656,116.329943,2008-11-26 08:24:40,,133.78113,281.429126,,,1,1,1
3,5,52008112608,40.000535,116.327103,2008-11-26 08:25:00,272.134164,27.292639,226.902202,20.0,13.606708,1,1,1
4,5,52009011410,39.998759,116.326371,2009-01-14 10:46:27,,172.987008,217.745345,,,1,1,1
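
Threshold-based segmentation of this kind usually comes down to starting a new segment whenever the gap to the previous point exceeds the limit. The pandas sketch below shows the time-gap case; segment_by_time_gap and the segment column are hypothetical, and the numbering scheme (a global running counter) is an assumption that may differ from the tid_time values PyMove produces.

```python
import pandas as pd


def segment_by_time_gap(df, max_gap_s):
    """Label segments split at time gaps above max_gap_s (hypothetical helper)."""
    out = df.sort_values(["id", "datetime"]).reset_index(drop=True)
    # Seconds since the previous point of the same id; NaN at each id's start.
    gap_s = out.groupby("id")["datetime"].diff().dt.total_seconds()
    # A segment starts at the first point of an id or after a large gap.
    new_segment = gap_s.isna() | (gap_s > max_gap_s)
    out["segment"] = new_segment.cumsum()
    return out


points = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2],
        "datetime": pd.to_datetime(
            [
                "2008-10-23 05:00:00",
                "2008-10-23 05:00:05",
                "2008-10-23 05:40:00",
                "2008-10-23 05:40:05",
                "2008-10-23 06:00:00",
                "2008-10-23 06:00:10",
            ]
        ),
    }
)
segmented = segment_by_time_gap(points, max_gap_s=1000)
```

The same pattern applies to distance or speed limits by swapping the gap column for the corresponding feature.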


The by_max_dist function segments the trajectories into clusters based on distance. The distance limit is set by the max_dist_between_adj_points parameter. The tid_dist column is added; it indicates the segment to which each point belongs.

In [26]:
segmentation.by_max_dist(df_move, max_dist_between_adj_points = 4000)
df_move.head()

Split trajectories by max distance between adjacent points: 4000
...setting id as index


  dist = (move_data.at[idx, DIST_TO_PREV] > max_dist_between_adj_points)


(22/22) 100% in 00:00:00.000 - estimated end in 00:00:00.000
... Reseting index

Total Time: 0.01 seconds
------------------------------------------



Unnamed: 0,id,tid,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,tid_part,tid_speed,tid_time,tid_dist
0,5,52008112104,40.009455,116.320434,2008-11-21 04:38:39,,244.727171,447.043215,,,1,1,1,1
1,5,52008112104,40.010695,116.322808,2008-11-21 04:38:44,244.727171,9.996359,252.024213,5.0,48.945434,1,1,1,1
2,5,52008112608,40.001656,116.329943,2008-11-26 08:24:40,,133.78113,281.429126,,,1,1,1,1
3,5,52008112608,40.000535,116.327103,2008-11-26 08:25:00,272.134164,27.292639,226.902202,20.0,13.606708,1,1,1,1
4,5,52009011410,39.998759,116.326371,2009-01-14 10:46:27,,172.987008,217.745345,,,1,1,1,1


In [27]:
segmentation.by_max_time(df_move)

Split trajectories by max_time_between_adj_points: 900.0
...setting id as index


  times = (move_data.at[idx, TIME_TO_PREV] > max_time_between_adj_points)


(22/22) 100% in 00:00:00.001 - estimated end in 00:00:00.000
... Reseting index

Total Time: 0.01 seconds
------------------------------------------



In [28]:
segmentation.by_max_speed(df_move)

Split trajectories by max_speed_between_adj_points: 50.0
...setting id as index


  speed = (move_data.at[idx, SPEED_TO_PREV] > max_speed_between_adj_points)


(22/22) 100% in 00:00:00.000 - estimated end in 00:00:00.000
... Reseting index

Total Time: 0.01 seconds
------------------------------------------



## Stay point detection 

A stay point is a location where a moving object has stayed for a while within a certain distance threshold. A stay point could represent different places, such as a restaurant, a school or a workplace.

Importing the module:

In [29]:
from pymove import stay_point_detection

The create_update_datetime_in_format_cyclical function converts the time data into a cyclical format. The hour_sin and hour_cos columns are added to the dataframe.

In [45]:
stay_point_detection.create_update_datetime_in_format_cyclical(df_move)

Encoding cyclical continuous features - 24-hour time
...hour_sin and  hour_cos features were created...



In [46]:
df_move.head()

Unnamed: 0,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,situation,hour_sin,hour_cos
0,1,39.984094,116.319236,2008-10-23 05:53:05,,13.690153,,,0.979084,0.203456
1,1,39.984198,116.319322,2008-10-23 05:53:06,13.690153,7.403788,20.223428,move,0.979084,0.203456
2,1,39.984224,116.319402,2008-10-23 05:53:11,7.403788,1.821083,5.888579,move,0.979084,0.203456
3,1,39.984211,116.319389,2008-10-23 05:53:16,1.821083,2.889671,1.873356,stop,0.979084,0.203456
4,1,39.984217,116.319422,2008-10-23 05:53:21,2.889671,66.555997,68.72726,move,0.979084,0.203456
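
A cyclical encoding maps the hour onto a circle so that late-night and early-morning hours end up close together. The values in the output above (0.979084 and 0.203456 for 05:53) are consistent with an angle of 2π·hour/23, so the sketch below uses 23 as its default period. That divisor is inferred from the output, and encode_hour is a hypothetical helper; a period of 24 is the more common choice, since it keeps 23:00 and 00:00 distinct.

```python
import numpy as np
import pandas as pd


def encode_hour(datetimes, period=23):
    """Return sine/cosine features of the hour (hypothetical helper)."""
    # period=23 is inferred from the tutorial output; 24 is the usual choice.
    angle = 2 * np.pi * datetimes.dt.hour / period
    return pd.DataFrame({"hour_sin": np.sin(angle), "hour_cos": np.cos(angle)})


stamps = pd.Series(pd.to_datetime(["2008-10-23 05:53:05", "2008-10-24 00:04:28"]))
encoded = encode_hour(stamps)
```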


The create_or_update_move_stop_by_dist_time function creates or updates the stay points of the trajectories based on distance and time metrics. The segment_stop column is added to the dataframe; it indicates the trajectory segment to which each point belongs. The stop column is also added; it indicates whether the point represents a stop, a place where the object was stationary.

In [47]:
stay_point_detection.create_or_update_move_stop_by_dist_time(df_move, dist_radius=40, time_radius=1000)

Split trajectories by max distance between adjacent points: 40
...setting id as index


  dist = (move_data.at[idx, DIST_TO_PREV] > max_dist_between_adj_points)


(217653/217653) 100% in 00:00:00.066 - estimated end in 00:00:00.000
... Reseting index

Total Time: 0.07 seconds
------------------------------------------


Creating or updating distance, time and speed features in meters by seconds

...Sorting by segment_stop and datetime to increase performance

...Set segment_stop as index to a higher peformance

(5/217653) 0% in 00:00:00.173 - estimated end in 02:05:45.476
(43995/217653) 20% in 00:00:00.222 - estimated end in 00:00:00.877
(88581/217653) 40% in 00:00:00.266 - estimated end in 00:00:00.387
(130800/217653) 60% in 00:00:00.330 - estimated end in 00:00:00.219
(174825/217653) 80% in 00:00:00.479 - estimated end in 00:00:00.117
...Reset index...

..Total Time: 0.585
Create or update stop as True or False
...Creating stop features as True or False using 1000 to time in seconds
True     157738
False     59915
Name: stop, dtype: int64

Total Time: 0.76 seconds
-----------------------------------------------------



In [33]:
df_move.head()

Unnamed: 0,segment_stop,id,tid,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,tid_part,tid_speed,tid_time,tid_dist,hour_sin,hour_cos,stop
0,1,5,52008112104,40.009455,116.320434,2008-11-21 04:38:39,,244.727171,447.043215,,,1,1,1,1,0.887885,0.460065,False
1,2,5,52008112104,40.010695,116.322808,2008-11-21 04:38:44,,9.996359,252.024213,,,1,1,1,1,0.887885,0.460065,True
2,2,5,52008112608,40.001656,116.329943,2008-11-26 08:24:40,1174.527381,133.78113,281.429126,445556.0,0.002636,1,1,1,1,0.81697,-0.57668,True
3,3,5,52008112608,40.000535,116.327103,2008-11-26 08:25:00,,27.292639,226.902202,,,1,1,1,1,0.81697,-0.57668,True
4,3,5,52009011410,39.998759,116.326371,2009-01-14 10:46:27,207.091817,172.987008,217.745345,4242087.0,4.9e-05,1,1,1,1,0.398401,-0.917211,True
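
The log above suggests two steps: split the trajectory wherever adjacent points are more than dist_radius apart, then mark a segment as a stop when the time spent in it exceeds time_radius. The sketch below follows that reading for a single trajectory that already carries dist_to_prev (meters) and time_to_prev (seconds) columns. It is an interpretation, not PyMove's implementation, and label_stops is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def label_stops(df, dist_radius, time_radius):
    """Add segment_stop and stop columns (assumed rule, one trajectory)."""
    out = df.copy()
    # A new segment starts at the first point or after a hop above dist_radius.
    new_segment = out["dist_to_prev"].isna() | (out["dist_to_prev"] > dist_radius)
    out["segment_stop"] = new_segment.cumsum()
    # Time spent inside a segment: ignore the hop that leads into it.
    inside_time = out["time_to_prev"].where(~new_segment, 0.0)
    dwell = inside_time.groupby(out["segment_stop"]).transform("sum")
    out["stop"] = dwell > time_radius
    return out


traj = pd.DataFrame(
    {
        "dist_to_prev": [np.nan, 5.0, 5.0, 500.0, 3.0, 2.0],
        "time_to_prev": [np.nan, 600.0, 600.0, 30.0, 10.0, 10.0],
    }
)
labelled = label_stops(traj, dist_radius=40, time_radius=1000)
```

The first three points stay within 40 m of each other for 1200 s and are marked as a stop; the last three form a segment that lasts only 20 s.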


The create_update_move_and_stop_by_radius function creates or updates the stay points of the trajectories based on distance. The situation column is added; it indicates whether the point represents a stop point or a moving point.

In [43]:
stay_point_detection.create_update_move_and_stop_by_radius(df_move, radius=2)


Creating or updating features MOVE and STOPS...


Creating or updating distance features in meters...

...Sorting by id and datetime to increase performance

...Set id as index to increase attribution performance

(217653/217653) 100% in 00:00:00.067 - estimated end in 00:00:00.000
...Reset index

..Total Time: 0.07639217376708984

....There are 58981 stops to this parameters



In [35]:
df_move.head()

Unnamed: 0,segment_stop,id,tid,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,time_to_prev,speed_to_prev,tid_part,tid_speed,tid_time,tid_dist,hour_sin,hour_cos,stop,situation
0,1,5,52008112104,40.009455,116.320434,2008-11-21 04:38:39,,244.727171,447.043215,,,1,1,1,1,0.887885,0.460065,False,
1,2,5,52008112104,40.010695,116.322808,2008-11-21 04:38:44,,9.996359,252.024213,,,1,1,1,1,0.887885,0.460065,True,
2,2,5,52008112608,40.001656,116.329943,2008-11-26 08:24:40,1174.527381,133.78113,281.429126,445556.0,0.002636,1,1,1,1,0.81697,-0.57668,True,move
3,3,5,52008112608,40.000535,116.327103,2008-11-26 08:25:00,,27.292639,226.902202,,,1,1,1,1,0.81697,-0.57668,True,
4,3,5,52009011410,39.998759,116.326371,2009-01-14 10:46:27,207.091817,172.987008,217.745345,4242087.0,4.9e-05,1,1,1,1,0.398401,-0.917211,True,move


## Compression

Importing the module:

In [36]:
from pymove import compression

The function below reduces the size of a trajectory by compressing it around its stop points.

In [37]:
compression.compress_segment_stop_to_point(df_move)

...setting mean to lat and lon...
...move segments will be dropped...
...get only segments stop...



(6/12) 50% in 00:00:00.064 - estimated end in 00:00:00.064
(12/12) 100% in 00:00:00.101 - estimated end in 00:00:00.000

...Dropping 10 points...
...Shape_before: 22
...Current shape: 12
...Compression time: 0.113 seconds
-----------------------------------------------------



In [48]:
compression.compress_segment_stop_to_point_optimizer(df_move)

...setting mean to lat and lon...
...move segments will be dropped...
...get only segments stop...



(364/157738) 0% in 00:00:00.059 - estimated end in 00:00:25.871
(15982/157738) 10% in 00:00:00.223 - estimated end in 00:00:01.981
(31984/157738) 20% in 00:00:00.432 - estimated end in 00:00:01.700
(48029/157738) 30% in 00:00:00.617 - estimated end in 00:00:01.411
(79310/157738) 50% in 00:00:00.951 - estimated end in 00:00:00.940
(95470/157738) 60% in 00:00:01.111 - estimated end in 00:00:00.724
(110658/157738) 70% in 00:00:01.372 - estimated end in 00:00:00.584
(126313/157738) 80% in 00:00:01.772 - estimated end in 00:00:00.441
(142093/157738) 90% in 00:00:02.108 - estimated end in 00:00:00.232
(157738/157738) 100% in 00:00:02.411 - estimated end in 00:00:00.000

...Dropping 217145 points...
...Shape_before: 217653
...Current shape: 508
...Compression time: 2.540 seconds
-----------------------------------------------------
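
According to the log messages, the compression keeps only stop segments and assigns each one a mean latitude and longitude. A simplified pandas sketch of that idea is shown below. It collapses every stop segment to a single summary row, which is an assumption of this sketch (the PyMove result shown later in this tutorial keeps more than one row per segment); compress_stops and the output column names other than lat_mean and lon_mean are hypothetical.

```python
import pandas as pd


def compress_stops(df):
    """Summarise each stop segment by its mean position (hypothetical helper)."""
    # Move segments are dropped; only rows flagged as stops are summarised.
    stops = df[df["stop"]]
    return (
        stops.groupby("segment_stop")
        .agg(
            lat_mean=("lat", "mean"),
            lon_mean=("lon", "mean"),
            start=("datetime", "min"),
            end=("datetime", "max"),
            n_points=("lat", "size"),
        )
        .reset_index()
    )


points = pd.DataFrame(
    {
        "segment_stop": [1, 1, 2, 2, 3],
        "stop": [True, True, False, False, True],
        "lat": [40.0, 40.2, 39.0, 39.1, 41.0],
        "lon": [116.0, 116.2, 115.0, 115.1, 117.0],
        "datetime": pd.to_datetime(
            [
                "2008-10-23 10:00:00",
                "2008-10-23 10:30:00",
                "2008-10-23 11:00:00",
                "2008-10-23 11:00:05",
                "2008-10-23 12:00:00",
            ]
        ),
    }
)
compressed = compress_stops(points)
```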



In [49]:
from pymove import datetime

In [50]:
df_move.generate_weekend_features()


Creating or updating day of the week feature...

...the day of the week feature was created...

Creating or updating a feature for weekend

...Weekend was set as 1 or 0...

...dropping colum day
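
The weekend flag printed in the log can be sketched from the day of the week. In pandas, dayofweek is 0 for Monday and 6 for Sunday, so Saturday and Sunday are the values 5 and 6. The helper add_weekend is hypothetical, and treating exactly Saturday and Sunday as the weekend is an assumption.

```python
import pandas as pd


def add_weekend(df):
    """Add a 0/1 weekend column from the datetime column (hypothetical helper)."""
    out = df.copy()
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are Saturday and Sunday.
    out["weekend"] = (out["datetime"].dt.dayofweek >= 5).astype(int)
    return out


points = pd.DataFrame(
    {"datetime": pd.to_datetime(["2008-10-23 10:36:01", "2008-10-25 08:20:19"])}
)
flagged = add_weekend(points)
```

2008-10-23 was a Thursday and 2008-10-25 a Saturday, so the flags are 0 and 1 respectively.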



In [51]:
df_move

Unnamed: 0,segment_stop,id,lat,lon,datetime,dist_to_prev,dist_to_next,dist_prev_to_next,situation,hour_sin,hour_cos,time_to_prev,speed_to_prev,stop,lat_mean,lon_mean,weekend
195,6,1,39.981364,116.326798,2008-10-23 10:36:01,,12.442111,57.678462,move,3.984011e-01,-0.917211,,,True,39.991524,116.326408,0
558,6,1,40.010720,116.314060,2008-10-23 10:56:50,16.445867,76.470304,77.194096,move,3.984011e-01,-0.917211,5.0,3.289173,True,39.991524,116.326408,0
561,9,1,40.009262,116.312948,2008-10-23 10:56:55,,35.598360,100.606304,move,3.984011e-01,-0.917211,,,True,40.013824,116.306534,0
1368,9,1,39.990973,116.326094,2008-10-24 00:04:28,16.068387,40.942759,56.717519,move,0.000000e+00,1.000000,2.0,8.034193,True,40.013824,116.306534,0
1575,13,1,39.978484,116.326845,2008-10-24 01:45:41,,14.366909,50.322414,move,2.697968e-01,0.962917,,,True,39.980125,116.310745,0
1847,13,1,39.980909,116.308171,2008-10-24 02:28:19,21.870376,51.683385,53.596809,move,5.195840e-01,0.854419,1508.0,0.014503,True,39.980125,116.310745,0
1914,15,1,39.982334,116.308818,2008-10-24 03:16:35,,14.205236,145.788854,move,7.308360e-01,0.682553,,,True,39.979740,116.313614,0
2646,15,1,39.982684,116.311197,2008-10-24 05:40:23,6.437367,73.921445,72.942149,move,9.790841e-01,0.203456,5.0,1.287473,True,39.979740,116.313614,0
2667,19,1,39.981691,116.310004,2008-10-24 06:09:34,,19.612423,51.388287,move,9.976688e-01,-0.068242,,,True,39.981541,116.310104,0
3039,19,1,39.979459,116.325806,2008-10-24 06:33:05,5.697464,78.265156,83.404371,move,9.976688e-01,-0.068242,5.0,1.139493,True,39.981541,116.310104,0
