<center><img src="logo_skmob.png" width=450 align="left" /></center>

# Preprocessing mobility data

- Repo: [http://bit.ly/skmob_repo](http://bit.ly/skmob_repo)
- Docs: [http://bit.ly/skmob_doc](http://bit.ly/skmob_doc)
- Paper: [http://bit.ly/skmob_paper](http://bit.ly/skmob_paper)



### GPS: the [GeoLife dataset](https://www.microsoft.com/en-us/download/details.aspx?id=52367)

Collected in the **GeoLife** project (Microsoft Research Asia) by 182 users in the period Apr 2007 - Aug 2012:

- 17,621 trajectories
- a total distance of about 1.2 million kilometers
- a total duration of more than 48,000 hours

In [1]:
# import the skmob and pandas libraries
import skmob
import pandas as pd

In [3]:
tdf = skmob.TrajDataFrame.from_file('data/geolife_sample.txt.gz').sort_values(by='datetime')
print(type(tdf))
print(tdf.crs, tdf.parameters)
tdf.head()

<class 'skmob.core.trajectorydataframe.TrajDataFrame'>
{'init': 'epsg:4326'} {'from_file': 'data/geolife_sample.txt.gz'}


| | lat | lng | datetime | uid |
|---|---|---|---|---|
| 0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 |
| 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 |
| 2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 |
| 3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 |
| 4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 |


In [4]:
tdf.plot_trajectory(zoom=12, weight=3, opacity=0.9, tiles='Stamen Toner',
                    start_end_markers=False)

- How many users in the data set?
- How many points?
- What's the time window?

In [5]:
print('# users: %s' %len(tdf.uid.unique()))
print('# points: %s' %len(tdf))
print('time window: %s' 
      %(tdf.iloc[-1].datetime - tdf.iloc[0].datetime))

# users: 2
# points: 217653
time window: 146 days 23:53:32


## Let's focus on a single user
Select the points of user `1` with a boolean mask, exactly as we would in **pandas**.

In [6]:
user1_tdf = tdf[tdf.uid == 1]
user1_tdf.head()

| | lat | lng | datetime | uid |
|---|---|---|---|---|
| 0 | 39.984094 | 116.319236 | 2008-10-23 05:53:05 | 1 |
| 1 | 39.984198 | 116.319322 | 2008-10-23 05:53:06 | 1 |
| 2 | 39.984224 | 116.319402 | 2008-10-23 05:53:11 | 1 |
| 3 | 39.984211 | 116.319389 | 2008-10-23 05:53:16 | 1 |
| 4 | 39.984217 | 116.319422 | 2008-10-23 05:53:21 | 1 |


In [7]:
user1_map = user1_tdf.plot_trajectory(zoom=11, weight=3, hex_color='black',
                                      tiles='Open Street Map')
user1_map

## Mobility data preprocessing

There are four common steps we can apply to clean our data:

1. Filtering (`filtering.filter`)
2. Compression (`compression.compress`)
3. Stop detection (`detection.stops`)
4. Stops clustering (`clustering.cluster`)


## Filtering trajectories

Filter out the points whose speed from the previous point is higher than `max_speed_kmh` km/h.
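
A minimal pure-Python sketch of this kind of speed filter. It is not scikit-mobility's implementation, which also has loop-related options such as `include_loops`, `max_loop` and `ratio_max`. It assumes time-sorted `(lat, lng, datetime)` tuples, the haversine distance with a mean Earth radius of 6,371 km, and that each point is compared with the last point that was *kept*; `haversine_km` and `speed_filter` are hypothetical helper names.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    dphi, dlmb = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def speed_filter(points, max_speed_kmh=500.0):
    """Keep a point only if its speed from the last kept point is <= max_speed_kmh.

    points: list of (lat, lng, datetime) tuples sorted by time (hypothetical format).
    """
    if not points:
        return []
    kept = [points[0]]
    for lat, lng, dt in points[1:]:
        lat0, lng0, dt0 = kept[-1]
        hours = (dt - dt0).total_seconds() / 3600
        if hours <= 0:
            continue  # duplicate or out-of-order timestamp: dropped in this sketch
        speed = haversine_km(lat0, lng0, lat, lng) / hours
        if speed <= max_speed_kmh:
            kept.append((lat, lng, dt))
    return kept


track = [
    (40.064046, 116.301866, datetime(2008, 11, 1, 4, 17, 40)),
    (40.062216, 116.294486, datetime(2008, 11, 1, 4, 17, 41)),  # ~0.66 km in 1 s
    (40.061521, 116.294584, datetime(2008, 11, 1, 4, 17, 45)),  # ~0.68 km in 5 s
]
kept = speed_filter(track, max_speed_kmh=500.0)
print(len(kept))  # 2: the middle point implies more than 2,000 km/h and is dropped
```

The three points are taken from the filtering example further down: the middle one jumps about 0.66 km in one second, while the last one is far enough in time from the kept point to stay under the threshold.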

In [8]:
from skmob.preprocessing import filtering

In [9]:
# filter points with speed higher than 500km/h
max_speed_kmh = 500.
user1_f_tdf = filtering.filter(user1_tdf, max_speed_kmh=max_speed_kmh)

In [10]:
user1_f_tdf.parameters

{'from_file': 'data/geolife_sample.txt.gz',
 'filter': {'function': 'filter',
  'max_speed_kmh': 500.0,
  'include_loops': False,
  'speed_kmh': 5.0,
  'max_loop': 6,
  'ratio_max': 0.25}}

Very few points have been filtered out.

In [11]:
print('Filtered points:\t%s'%(len(user1_tdf) - len(user1_f_tdf)))

Filtered points:	18


In [12]:
# indicator=True adds a _merge column: 'left_only' marks rows found only in the unfiltered data
merged = user1_tdf.merge(user1_f_tdf, indicator=True, how='outer')
diff_df = merged[merged['_merge'] == 'left_only']
diff_df

| | lat | lng | datetime | uid | `_merge` |
|---|---|---|---|---|---|
| 149 | 39.977648 | 116.326925 | 2008-10-23 10:33:00 | 1 | left_only |
| 17792 | 40.013398 | 116.30649 | 2008-10-27 12:27:55 | 1 | left_only |
| 23212 | 39.975403 | 116.312814 | 2008-10-31 06:15:21 | 1 | left_only |
| 23213 | 39.975342 | 116.312961 | 2008-10-31 06:15:23 | 1 | left_only |
| 24509 | 40.070867 | 116.301276 | 2008-11-01 01:06:36 | 1 | left_only |
| 24510 | 40.070832 | 116.301441 | 2008-11-01 01:06:37 | 1 | left_only |
| 25373 | 40.062216 | 116.294486 | 2008-11-01 04:17:41 | 1 | left_only |
| 25374 | 40.061976 | 116.294452 | 2008-11-01 04:17:42 | 1 | left_only |
| 25375 | 40.061711 | 116.29427 | 2008-11-01 04:17:43 | 1 | left_only |
| 25376 | 40.061615 | 116.294441 | 2008-11-01 04:17:44 | 1 | left_only |


Take a closer look at the last four filtered points (indexes `25373` to `25376`), together with the points immediately before and after them.

In [13]:
min_index, max_index = 25373, 25376
dt_start = user1_tdf.iloc[min_index - 1]['datetime']
dt_end = user1_tdf.iloc[max_index + 1]['datetime']

filtered_tdf = user1_f_tdf[(user1_f_tdf['datetime'] >= dt_start) \
                 & (user1_f_tdf['datetime'] <= dt_end)]

unfiltered_tdf = user1_tdf[(user1_tdf['datetime'] >= dt_start) \
                  & (user1_tdf['datetime'] <= dt_end)]
filtered_tdf

| | lat | lng | datetime | uid |
|---|---|---|---|---|
| 25366 | 40.064046 | 116.301866 | 2008-11-01 04:17:40 | 1 |
| 25367 | 40.061521 | 116.294584 | 2008-11-01 04:17:45 | 1 |


Compute the speed between the last point kept by the filter and each of the following points of the unfiltered trajectory.

In [14]:
lat_lng_dt = unfiltered_tdf[['lat', 'lng', 'datetime']].values

In [15]:
# average speed (km/h) between the last kept point and each following point
from skmob.utils.gislib import getDistance
lat0, lng0, dt0 = lat_lng_dt[0]
rows = []
for lat, lng, dt in lat_lng_dt[1:]:
    hours = (dt - dt0).total_seconds() / 3600  # .seconds would ignore whole days
    speed = getDistance((lat, lng), (lat0, lng0)) / hours
    rows.append([dt0, dt, speed, speed > max_speed_kmh])
pd.DataFrame(rows, columns=['time 0', 'time 1', 'speed (km/h)', 'to_filter'])

| | time 0 | time 1 | speed (km/h) | to_filter |
|---|---|---|---|---|
| 0 | 2008-11-01 04:17:40 | 2008-11-01 04:17:41 | 2376.687211 | True |
| 1 | 2008-11-01 04:17:40 | 2008-11-01 04:17:42 | 1208.91039 | True |
| 2 | 2008-11-01 04:17:40 | 2008-11-01 04:17:43 | 835.951942 | True |
| 3 | 2008-11-01 04:17:40 | 2008-11-01 04:17:44 | 618.545448 | True |
| 4 | 2008-11-01 04:17:40 | 2008-11-01 04:17:45 | 489.850389 | False |

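Since the speed is the distance divided by the elapsed time, the table can also be read backwards. The short check below (values copied from the table; variable names are illustrative) multiplies each speed by its elapsed time to recover how far each point lies from the 04:17:40 point.

```python
# (elapsed seconds since 04:17:40, speed in km/h), copied from the table above
rows = [(1, 2376.687211), (2, 1208.91039), (3, 835.951942),
        (4, 618.545448), (5, 489.850389)]
max_speed_kmh = 500.0

# distance = speed x time, with time converted from seconds to hours
distances_km = [speed * seconds / 3600 for seconds, speed in rows]
flags = [speed > max_speed_kmh for _, speed in rows]

for (seconds, _), d, f in zip(rows, distances_km, flags):
    print(f"{seconds} s: {d:.3f} km from the kept point, to_filter={f}")
```

All five points turn out to lie roughly 0.66-0.70 km from the last kept point, which suggests a single sudden jump of the GPS fix: the first four points are flagged only because too little time had passed for such a displacement to stay under 500 km/h.
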

In [16]:
# Cut a buffer of 10 points around the filtered part
dt_start = user1_tdf.iloc[min_index - 10]['datetime']
dt_end = user1_tdf.iloc[max_index + 10]['datetime']

filtered_tdf = user1_f_tdf[(user1_f_tdf['datetime'] >= dt_start) \
                 & (user1_f_tdf['datetime'] <= dt_end)]

unfiltered_tdf = user1_tdf[(user1_tdf['datetime'] >= dt_start) \
                  & (user1_tdf['datetime'] <= dt_end)]
filtered_tdf.head()

| | lat | lng | datetime | uid |
|---|---|---|---|---|
| 25357 | 40.070903 | 116.299084 | 2008-11-01 04:16:34 | 1 |
| 25358 | 40.070882 | 116.298857 | 2008-11-01 04:16:35 | 1 |
| 25359 | 40.070858 | 116.298628 | 2008-11-01 04:16:36 | 1 |
| 25360 | 40.070833 | 116.298399 | 2008-11-01 04:16:37 | 1 |
| 25361 | 40.070808 | 116.298165 | 2008-11-01 04:16:38 | 1 |


In [17]:
map_f = unfiltered_tdf.plot_trajectory(zoom=14, weight=10, opacity=0.5, hex_color='black')
filtered_tdf.plot_trajectory(map_f=map_f, hex_color='red')

## Compressing trajectories

Reduce the number of points of the trajectory, preserving the structure.

Merge together all points that are closer than `spatial_radius_km=0.5` kilometers to each other.
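
A pure-Python sketch of radius-based compression. It assumes, following our reading of the scikit-mobility documentation, that a run of consecutive points lying within `spatial_radius_km` of the run's first point is replaced by a single point with the median coordinates of the run and the time of its first point; the real `compression.compress` works per user on a `TrajDataFrame` and may differ in details. `compress_points` and `haversine_km` are hypothetical names, and the track is synthetic.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt
from statistics import median


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km (mean Earth radius 6371 km)."""
    dphi, dlmb = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def compress_points(points, spatial_radius_km=0.2):
    """Merge runs of consecutive points within spatial_radius_km of the run's first point.

    points: time-sorted list of (lat, lng, datetime). Each run becomes one point with
    the median lat/lng of the run and the datetime of its first point.
    """
    compressed, run = [], []
    for p in points:
        # a point too far from the run's first point closes the current run
        if run and haversine_km(run[0][0], run[0][1], p[0], p[1]) > spatial_radius_km:
            compressed.append((median(q[0] for q in run), median(q[1] for q in run), run[0][2]))
            run = []
        run.append(p)
    if run:  # close the last run
        compressed.append((median(q[0] for q in run), median(q[1] for q in run), run[0][2]))
    return compressed


t0 = datetime(2008, 10, 23, 6, 0, 0)
# synthetic track: 5 fixes jittering around one spot, then 3 fixes about 1.1 km north
track = [(40.0000 + 0.0001 * i, 116.3000, t0 + timedelta(minutes=i)) for i in range(5)]
track += [(40.0100 + 0.0001 * i, 116.3000, t0 + timedelta(minutes=10 + i)) for i in range(3)]
result = compress_points(track, spatial_radius_km=0.5)
print(result)
```

The eight synthetic fixes collapse to two points, one per run, each stamped with the time at which the run started.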

In [18]:
from skmob.preprocessing import compression

In [19]:
user1_cf_tdf = compression.compress(user1_f_tdf, spatial_radius_km=0.5)
user1_cf_tdf.head()

| | lat | lng | datetime | uid |
|---|---|---|---|---|
| 0 | 39.98382 | 116.321003 | 2008-10-23 05:53:05 | 1 |
| 1 | 39.979671 | 116.323967 | 2008-10-23 05:57:45 | 1 |
| 2 | 39.978253 | 116.327275 | 2008-10-23 06:01:05 | 1 |
| 3 | 39.970511 | 116.341455 | 2008-10-23 10:32:53 | 1 |
| 4 | 39.979096 | 116.32713 | 2008-10-23 10:33:05 | 1 |


In [20]:
user1_cf_tdf.parameters

{'from_file': 'data/geolife_sample.txt.gz',
 'filter': {'function': 'filter',
  'max_speed_kmh': 500.0,
  'include_loops': False,
  'speed_kmh': 5.0,
  'max_loop': 6,
  'ratio_max': 0.25},
 'compress': {'function': 'compress', 'spatial_radius_km': 0.5}}

The compressed trajectory keeps only a small fraction of the points of the filtered trajectory: about 1.2% (1,281 out of 108,589).

In [21]:
print('Points of the filtered trajectory:\t%s'%len(user1_f_tdf))
print('Points of the compressed trajectory:\t%s'%len(user1_cf_tdf))
print('Compressed points:\t\t\t%s'%(len(user1_f_tdf)-len(user1_cf_tdf)))

Points of the filtered trajectory:	108589
Points of the compressed trajectory:	1281
Compressed points:			107308


In [22]:
end_time = user1_f_tdf.iloc[10000]['datetime']
map_f = user1_f_tdf[user1_f_tdf['datetime'] < end_time].plot_trajectory(weight=5, hex_color='black',
                                                                      opacity=0.5, start_end_markers=False)
user1_cf_tdf[user1_cf_tdf['datetime'] < end_time].plot_trajectory(map_f=map_f, \
                                                  start_end_markers=False, hex_color='red')

## Stop detection

Identify locations where the user spent at least `minutes_for_a_stop` minutes within a distance of `spatial_radius_km` $\times$ `stop_radius_factor` from a given point.

A new column `leaving_datetime` is added, indicating the time when the user departs from the stop.
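
A simplified pure-Python sketch of the rule just described, assuming time-sorted `(lat, lng, datetime)` tuples. It grows a run of consecutive points while they stay within `radius_km` (playing the role of `spatial_radius_km` $\times$ `stop_radius_factor`) of the run's first point, takes the time of the first point outside the radius as the leaving time, and keeps the run as a stop if it lasts at least `minutes_for_a_stop`. `detection.stops` has more options (e.g. `no_data_for_minutes`, `min_speed_kmh`) and may differ in details; `detect_stops` is a hypothetical name.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from statistics import median


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km (mean Earth radius 6371 km)."""
    dphi, dlmb = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def detect_stops(points, radius_km=0.1, minutes_for_a_stop=20.0):
    """Return stops as (lat, lng, arrival, leaving) tuples.

    A run still open at the end of the data is ignored, since its leaving time is unknown.
    """
    stops, i, n = [], 0, len(points)
    while i < n:
        lat0, lng0, arrival = points[i]
        j = i + 1
        # extend the run while points stay within radius_km of its first point
        while j < n and haversine_km(lat0, lng0, points[j][0], points[j][1]) <= radius_km:
            j += 1
        if j < n:  # points[j] is the first point outside the radius
            leaving = points[j][2]
            if (leaving - arrival).total_seconds() / 60 >= minutes_for_a_stop:
                run = points[i:j]
                stops.append((median(p[0] for p in run), median(p[1] for p in run),
                              arrival, leaving))
        i = j
    return stops


# first five points of the compressed trajectory of user 1 (table above)
points = [
    (39.98382, 116.321003, datetime(2008, 10, 23, 5, 53, 5)),
    (39.979671, 116.323967, datetime(2008, 10, 23, 5, 57, 45)),
    (39.978253, 116.327275, datetime(2008, 10, 23, 6, 1, 5)),
    (39.970511, 116.341455, datetime(2008, 10, 23, 10, 32, 53)),
    (39.979096, 116.32713, datetime(2008, 10, 23, 10, 33, 5)),
]
stops = detect_stops(points, radius_km=0.2 * 0.5, minutes_for_a_stop=20.0)
print(stops)
```

On these five points the sketch returns a single stop whose coordinates, arrival and leaving times coincide with the first row of the `detection.stops` output below.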

In [23]:
from skmob.preprocessing import detection

In [24]:
user1_scf_tdf = detection.stops(user1_cf_tdf, stop_radius_factor=0.5,
                                minutes_for_a_stop=20.0, spatial_radius_km=0.2,
                                leaving_time=True)
user1_scf_tdf.head()

| | lat | lng | datetime | uid | leaving_datetime |
|---|---|---|---|---|---|
| 0 | 39.978253 | 116.327275 | 2008-10-23 06:01:05 | 1 | 2008-10-23 10:32:53 |
| 1 | 40.013823 | 116.306532 | 2008-10-23 11:05:14 | 1 | 2008-10-23 23:47:13 |
| 2 | 39.979035 | 116.326414 | 2008-10-24 00:10:22 | 1 | 2008-10-24 01:49:43 |
| 3 | 39.979928 | 116.311075 | 2008-10-24 01:51:56 | 1 | 2008-10-24 03:50:45 |
| 4 | 39.978857 | 116.325902 | 2008-10-24 03:53:06 | 1 | 2008-10-24 05:33:44 |


In [25]:
user1_scf_tdf.parameters

{'from_file': 'data/geolife_sample.txt.gz',
 'filter': {'function': 'filter',
  'max_speed_kmh': 500.0,
  'include_loops': False,
  'speed_kmh': 5.0,
  'max_loop': 6,
  'ratio_max': 0.25},
 'compress': {'function': 'compress', 'spatial_radius_km': 0.5},
 'detect': {'function': 'stops',
  'stop_radius_factor': 0.5,
  'minutes_for_a_stop': 20.0,
  'spatial_radius_km': 0.2,
  'leaving_time': True,
  'no_data_for_minutes': 1000000000000.0,
  'min_speed_kmh': None}}

### Visualise the compressed trajectory and the stops

Click on the stop markers to see a pop-up with:
- User ID
- Coordinates of the stop (click to see the location on Google maps)
- Arrival time
- Departure time

In [26]:
map_f = user1_scf_tdf.plot_trajectory(max_points=1000, hex_color=-1, start_end_markers=False)
user1_scf_tdf.plot_stops(map_f=map_f, hex_color=-1)

## Stops define <font color="blue">trips</font>
Let's use the stops to extract the first trip of the individual.
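
Each stop has an arrival time (`datetime`) and a departure time (`leaving_datetime`), so one natural definition is that trip *k* runs from the departure from stop *k* to the arrival at stop *k+1*. A small sketch of that definition, with times copied from the stops table below (`trips_from_stops` is a hypothetical helper). Note that the selection below uses the *departure* from the second stop (`dt2`) as the end of the trip, so it also includes the points recorded during that stop.

```python
from datetime import datetime

# (arrival, leaving) of the first three stops of user 1, copied from the table below
stops = [
    (datetime(2008, 10, 23, 6, 1, 5), datetime(2008, 10, 23, 10, 32, 53)),
    (datetime(2008, 10, 23, 11, 5, 14), datetime(2008, 10, 23, 23, 47, 13)),
    (datetime(2008, 10, 24, 0, 10, 22), datetime(2008, 10, 24, 1, 49, 43)),
]


def trips_from_stops(stops):
    """Trip k starts when stop k is left and ends when stop k+1 is reached."""
    return [(leave, next_arrival)
            for (_, leave), (next_arrival, _) in zip(stops, stops[1:])]


trips = trips_from_stops(stops)
for start, end in trips:
    print(start, "->", end, "duration:", end - start)
```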

In [27]:
user1_scf_tdf.head()

| | lat | lng | datetime | uid | leaving_datetime |
|---|---|---|---|---|---|
| 0 | 39.978253 | 116.327275 | 2008-10-23 06:01:05 | 1 | 2008-10-23 10:32:53 |
| 1 | 40.013823 | 116.306532 | 2008-10-23 11:05:14 | 1 | 2008-10-23 23:47:13 |
| 2 | 39.979035 | 116.326414 | 2008-10-24 00:10:22 | 1 | 2008-10-24 01:49:43 |
| 3 | 39.979928 | 116.311075 | 2008-10-24 01:51:56 | 1 | 2008-10-24 03:50:45 |
| 4 | 39.978857 | 116.325902 | 2008-10-24 03:53:06 | 1 | 2008-10-24 05:33:44 |


In [28]:
dt1 = user1_scf_tdf.iloc[0].leaving_datetime
dt2 = user1_scf_tdf.iloc[1].leaving_datetime
dt1, dt2

(Timestamp('2008-10-23 10:32:53'), Timestamp('2008-10-23 23:47:13'))

In [29]:
# select all points from the departure from the first stop to the departure from the second stop
user1_tid1_tdf = user1_tdf[(user1_tdf.datetime >= dt1) 
                           & (user1_tdf.datetime <= dt2)]
user1_tid1_tdf.head()

| | lat | lng | datetime | uid |
|---|---|---|---|---|
| 148 | 39.970511 | 116.341455 | 2008-10-23 10:32:53 | 1 |
| 149 | 39.977648 | 116.326925 | 2008-10-23 10:33:00 | 1 |
| 150 | 39.977586 | 116.326918 | 2008-10-23 10:33:05 | 1 |
| 151 | 39.977596 | 116.326894 | 2008-10-23 10:33:10 | 1 |
| 152 | 39.977661 | 116.326947 | 2008-10-23 10:33:14 | 1 |


In [30]:
# plot the trip
user1_tid1_map = user1_tid1_tdf.plot_trajectory(zoom=12, weight=5, opacity=0.9, tiles='Stamen Toner')
user1_tid1_map

Compute the distance between the origin and the destination of the trip, and the length of the trip (the sum of the distances between consecutive points).
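
A self-contained sketch of the two quantities, assuming a time-ordered list of `(lat, lng)` points and the haversine formula with a mean Earth radius of 6,371 km (the scikit-mobility functions used below may use slightly different constants). `trip_length_km` follows our understanding of `distance_straight_line` as a sum over consecutive points; all names and the path are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lng) points a and b, in degrees."""
    (lat1, lng1), (lat2, lng2) = a, b
    dphi, dlmb = radians(lat2 - lat1), radians(lng2 - lng1)
    h = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def od_distance_km(points):
    """Straight distance between the first (origin) and last (destination) point."""
    return haversine_km(points[0], points[-1])


def trip_length_km(points):
    """Length of the path: sum of the distances between consecutive points."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


# hypothetical detour: east, then north, then back west
path = [(40.0, 116.30), (40.0, 116.31), (40.01, 116.31), (40.01, 116.30)]
print(f"origin-destination: {od_distance_km(path):.3f} km, "
      f"length: {trip_length_km(path):.3f} km")
```

The path length can never be shorter than the origin-destination distance, which is why the 8.92 km returned by `distance_straight_line` below exceeds the 5.43 km between origin and destination.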

In [31]:
from skmob.utils.gislib import getDistanceByHaversine
from skmob.measures.individual import distance_straight_line
# take origin and destination of the trip
start_loc = user1_tid1_tdf.iloc[0][['lat', 'lng']]
end_loc = user1_tid1_tdf.iloc[-1][['lat', 'lng']]
# compute distance between origin and destination
print("distance:", getDistanceByHaversine(end_loc, start_loc))

distance: 5.425994961770171


In [32]:
distance_straight_line(user1_tid1_tdf)



| | uid | distance_straight_line |
|---|---|---|
| 0 | 1 | 8.92382 |


## Compute some features based on trips

In [33]:
def number_of_trips(tdf, stop_radius_factor=0.5, minutes_for_a_stop=20.0, spatial_radius_km=0.2):
    """
    Compute the number of trips for each object, counted as the number of its detected stops.
    """
    # detect the stops for each individual
    stdf = detection.stops(tdf, stop_radius_factor=stop_radius_factor,
                           minutes_for_a_stop=minutes_for_a_stop,
                           spatial_radius_km=spatial_radius_km, leaving_time=True)
    # one row per stop: count the rows of each user
    return stdf.groupby('uid').size().reset_index(name='n_trips')

In [34]:
number_of_trips(tdf)

| | uid | n_trips |
|---|---|---|
| 0 | 1 | 145 |
| 1 | 5 | 246 |


In [35]:
def number_of_evening_trips(tdf, stop_radius_factor=0.5, minutes_for_a_stop=20.0,
                            spatial_radius_km=0.2):
    """
    Number of trips that start in the evening, i.e. number of stops
    left (leaving_datetime) between 16:00 and 20:00.
    """
    def get_evening_trips(user_stdf, evening_time=('16:00', '20:00')):
        start_evening, end_evening = evening_time
        return len(user_stdf.set_index('leaving_datetime').between_time(start_evening,
                                                                        end_evening))

    stdf = detection.stops(tdf, stop_radius_factor=stop_radius_factor,
                           minutes_for_a_stop=minutes_for_a_stop,
                           spatial_radius_km=spatial_radius_km,
                           leaving_time=True)
    return stdf.groupby('uid').apply(get_evening_trips).reset_index(name='evening_trips')

In [36]:
number_of_evening_trips(tdf)

| | uid | evening_trips |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 5 | 22 |


In [38]:
def average_stops_per_day(tdf, stop_radius_factor=0.5, minutes_for_a_stop=20.0,
                          spatial_radius_km=0.2):
    """
    Average number of stops per day, over the days with at least one stop
    (days are taken from leaving_datetime).
    """
    def get_stops_per_day(user_stdf):
        # number of stops left on each calendar day, then their mean
        return user_stdf.groupby(user_stdf.leaving_datetime.dt.floor('d')).size().mean()

    stdf = detection.stops(tdf, stop_radius_factor=stop_radius_factor,
                           minutes_for_a_stop=minutes_for_a_stop,
                           spatial_radius_km=spatial_radius_km,
                           leaving_time=True)
    return stdf.groupby('uid').apply(get_stops_per_day).reset_index(name='avg_stops_per_day')

In [39]:
average_stops_per_day(tdf)

| | uid | avg_stops_per_day |
|---|---|---|
| 0 | 1 | 3.372093 |
| 1 | 5 | 4.032787 |


## Find clusters of stops
- stops are clustered by spatial proximity using DBSCAN

- a new column `cluster` is added with the cluster ID (`int`)

- cluster 0 is the most visited, 1 the second most visited, etc.
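
With `min_samples=1`, as in the parameters shown below, every stop is a DBSCAN core point, so the clusters reduce to groups of stops connected by chains of neighbours at most `cluster_radius_km` apart. A pure-Python sketch of that special case, followed by the relabelling by number of stops so that 0 is the most visited cluster. `cluster_stops` and the sample stops are hypothetical, and the library's DBSCAN-based implementation may break ties differently.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lng) points a and b, in degrees."""
    (lat1, lng1), (lat2, lng2) = a, b
    dphi, dlmb = radians(lat2 - lat1), radians(lng2 - lng1)
    h = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def cluster_stops(stops, cluster_radius_km=0.1):
    """Label (lat, lng) stops; 0 = cluster with most stops, 1 = second, etc."""
    n = len(stops)
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        frontier = [seed]
        while frontier:  # flood-fill over neighbours within the radius
            i = frontier.pop()
            for j in range(n):
                if labels[j] == -1 and haversine_km(stops[i], stops[j]) <= cluster_radius_km:
                    labels[j] = current
                    frontier.append(j)
        current += 1
    # relabel by cluster size, ties broken by order of first appearance
    sizes = Counter(labels)
    order = sorted(sizes, key=lambda c: (-sizes[c], c))
    rank = {c: r for r, c in enumerate(order)}
    return [rank[c] for c in labels]


# hypothetical stops: two places about 2.2 km apart, the southern one visited more often
stops = [(40.0200, 116.3000), (40.0000, 116.3000), (40.0005, 116.3000),
         (40.0201, 116.3001), (40.0003, 116.3004)]
labels = cluster_stops(stops, cluster_radius_km=0.1)
print(labels)
```

The first stop ends up in cluster 1, not 0, because its place is visited twice while the southern one is visited three times.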

In [40]:
from skmob.preprocessing import clustering
user1_clscf_tdf = clustering.cluster(user1_scf_tdf)
user1_clscf_tdf.head()

| | lat | lng | datetime | uid | leaving_datetime | cluster |
|---|---|---|---|---|---|---|
| 0 | 39.978253 | 116.327275 | 2008-10-23 06:01:05 | 1 | 2008-10-23 10:32:53 | 0 |
| 1 | 40.013823 | 116.306532 | 2008-10-23 11:05:14 | 1 | 2008-10-23 23:47:13 | 1 |
| 2 | 39.979035 | 116.326414 | 2008-10-24 00:10:22 | 1 | 2008-10-24 01:49:43 | 0 |
| 3 | 39.979928 | 116.311075 | 2008-10-24 01:51:56 | 1 | 2008-10-24 03:50:45 | 6 |
| 4 | 39.978857 | 116.325902 | 2008-10-24 03:53:06 | 1 | 2008-10-24 05:33:44 | 0 |


In [41]:
user1_clscf_tdf.parameters

{'from_file': 'data/geolife_sample.txt.gz',
 'filter': {'function': 'filter',
  'max_speed_kmh': 500.0,
  'include_loops': False,
  'speed_kmh': 5.0,
  'max_loop': 6,
  'ratio_max': 0.25},
 'compress': {'function': 'compress', 'spatial_radius_km': 0.5},
 'detect': {'function': 'stops',
  'stop_radius_factor': 0.5,
  'minutes_for_a_stop': 20.0,
  'spatial_radius_km': 0.2,
  'leaving_time': True,
  'no_data_for_minutes': 1000000000000.0,
  'min_speed_kmh': None},
 'cluster': {'function': 'cluster',
  'cluster_radius_km': 0.1,
  'min_samples': 1}}

## Visualise clustered stops
- stops in the same cluster have the same color.

In [42]:
map_f = user1_clscf_tdf.plot_trajectory(start_end_markers=False, hex_color='black')
user1_clscf_tdf.plot_stops(map_f=map_f)

## Social Media: the <font color="blue">Brightkite</font> data set
[Brightkite](https://snap.stanford.edu/data/loc-brightkite.html) was a location-based social networking service provider where users shared their locations by checking in. The data set covers the period Apr 2008 - Oct 2010:
- 58,228 users
- 4,491,143 check-ins

In [43]:
# load the pandas DataFrame
url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
df = pd.read_csv(url, sep='\t', header=0, nrows=10000, names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])

# convert it to a TrajDataFrame
btdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user')

print(btdf.shape, len(btdf['uid'].unique()))
btdf.head()

(10000, 5) 8


| | uid | datetime | lat | lng | location id |
|---|---|---|---|---|---|
| 0 | 0 | 2010-10-16 06:02:04+00:00 | 39.891383 | -105.070814 | 7a0f88982aa015062b95e3b4843f9ca2 |
| 1 | 0 | 2010-10-16 03:48:54+00:00 | 39.891077 | -105.068532 | dd7cd3d264c2d063832db506fba8bf79 |
| 2 | 0 | 2010-10-14 18:25:51+00:00 | 39.750469 | -104.999073 | 9848afcc62e500a01cf6fbf24b797732f8963683 |
| 3 | 0 | 2010-10-14 00:21:47+00:00 | 39.752713 | -104.996337 | 2ef143e12038c870038df53e0478cefc |
| 4 | 0 | 2010-10-13 23:31:51+00:00 | 39.752508 | -104.996637 | 424eb3dd143292f9e013efa00486c907 |


In [44]:
# plot_stops uses a leaving_datetime column: for check-ins, reuse the check-in time
btdf['leaving_datetime'] = btdf.datetime
# take the points of a single user
user0_btdf = btdf[btdf.uid == btdf.uid.unique()[0]]
# take a sample of 200 random points
user0_btdf_sample = user0_btdf.sample(200)
# plot the stops of the user
user0_map = user0_btdf_sample.plot_stops(zoom=3)
# plot the trajectory of the user
user0_btdf_sample.plot_trajectory(map_f=user0_map)

## Filtering
Filter out all points whose speed from the previous point is higher than `max_speed_kmh` km/h.

In [45]:
f_btdf = filtering.filter(btdf.drop('leaving_datetime', axis=1), max_speed_kmh=500.)
f_btdf.head()

| | uid | datetime | lat | lng | location id |
|---|---|---|---|---|---|
| 0 | 0 | 2009-05-25 20:56:10+00:00 | 37.774929 | -122.419415 | ee81ef22a22411ddb5e97f082c799f59 |
| 1 | 0 | 2009-05-25 21:35:28+00:00 | 37.600747 | -122.382376 | 248b82709e6c11ddbf68003048c0801e |
| 2 | 0 | 2009-05-25 21:37:44+00:00 | 37.600747 | -122.382376 | 248b82709e6c11ddbf68003048c0801e |
| 3 | 0 | 2009-05-25 21:42:47+00:00 | 37.600747 | -122.382376 | 248b82709e6c11ddbf68003048c0801e |
| 4 | 0 | 2009-05-25 22:13:23+00:00 | 37.615223 | -122.389979 | be2f1e669cc111dd9a50003048c0801e |


In [46]:
print('Points of the raw trajectory: %s.'%len(btdf))
print('Points of the filtered trajectory: %s.'%len(f_btdf))

Points of the raw trajectory: 10000.
Points of the filtered trajectory: 9727.


## Compression
Reduce the number of points of the trajectory, preserving the structure.

In [47]:
cf_btdf = compression.compress(f_btdf, spatial_radius_km=0.5)
cf_btdf.head()

| | uid | datetime | lat | lng | location id |
|---|---|---|---|---|---|
| 0 | 0 | 2009-05-25 20:56:10+00:00 | 37.774929 | -122.419415 | ee81ef22a22411ddb5e97f082c799f59 |
| 1 | 0 | 2009-05-25 21:35:28+00:00 | 37.600747 | -122.382376 | 248b82709e6c11ddbf68003048c0801e |
| 2 | 0 | 2009-05-25 22:13:23+00:00 | 37.615223 | -122.389979 | be2f1e669cc111dd9a50003048c0801e |
| 3 | 0 | 2009-05-26 02:21:12+00:00 | 39.878664 | -104.682105 | e12721ce84e911dd8019003048c0801e |
| 4 | 0 | 2009-05-26 04:59:44+00:00 | 39.739154 | -104.984703 | ee8b1d0ea22411ddb074dbd65f1665cf |

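Rows 2 and 3 of the compressed trajectory above are roughly 1,550 km apart (near San Francisco, then near Denver) but only a little over four hours apart in time. A rough haversine check (mean Earth radius 6,371 km; `haversine_km` is a hypothetical helper) puts the implied speed below the 500 km/h threshold, consistent with the Denver check-in having survived the filtering step: in check-in data, a long jump is not necessarily noise.

```python
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km (mean Earth radius 6371 km)."""
    dphi, dlmb = radians(lat2 - lat1), radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


# rows 2 and 3 of the compressed trajectory above (times are UTC)
t1 = datetime(2009, 5, 25, 22, 13, 23, tzinfo=timezone.utc)
t2 = datetime(2009, 5, 26, 2, 21, 12, tzinfo=timezone.utc)
km = haversine_km(37.615223, -122.389979, 39.878664, -104.682105)
hours = (t2 - t1).total_seconds() / 3600
speed_kmh = km / hours
print(f"{km:.0f} km in {hours:.2f} h -> {speed_kmh:.0f} km/h (threshold: 500 km/h)")
```
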

In [48]:

print('Points of the filtered trajectory: %s.'%len(f_btdf))
print('Points of the compressed trajectory: %s.'%len(cf_btdf))

Points of the filtered trajectory: 9727.
Points of the compressed trajectory: 5790.


### Visualise the filtered and compressed trajectories
Select all the points of user `1` in the filtered and in the compressed trajectories.

In [49]:
user = 1
# all points of the user in the filtered and in the compressed trajectory
filtered_tdf = f_btdf[f_btdf['uid'] == user]
compressed_tdf = cf_btdf[cf_btdf['uid'] == user]

In [50]:
print(len(filtered_tdf), len(compressed_tdf))
filtered_tdf.head()

1179 567


| | uid | datetime | lat | lng | location id |
|---|---|---|---|---|---|
| 2056 | 1 | 2009-03-30 05:21:35+00:00 | 37.63049 | -122.411084 | ee81e0b8a22411dda6b9334d03ec0fee |
| 2057 | 1 | 2009-03-30 17:25:55+00:00 | 37.584103 | -122.366083 | ee6b8534a22411dd96da6f185082f76e |
| 2058 | 1 | 2009-03-31 02:58:28+00:00 | 37.385773 | -121.898845 | ee87ececa22411dd95d423e7290d3385 |
| 2059 | 1 | 2009-03-31 05:17:35+00:00 | 37.654656 | -122.40775 | ee8403d4a22411dd8fd49f8ab316fd86 |
| 2060 | 1 | 2009-03-31 16:35:49+00:00 | 37.654656 | -122.40775 | ee8403d4a22411dd8fd49f8ab316fd86 |


In [51]:
map_f = filtered_tdf.plot_trajectory(zoom=9, max_points=None, weight=5, hex_color='black', opacity=0.5, start_end_markers=False)
compressed_tdf.plot_trajectory(map_f=map_f, max_points=None, hex_color='red', start_end_markers=False)

In [52]:
from skmob.tessellation import tilers
from skmob.utils import plot
sm_tess = tilers.tiler.get('squared', base_shape='San Mateo, USA', meters=5000)

In [53]:
map_filtered_tdf = filtered_tdf.mapping(sm_tess, remove_na=True)
map_compressed_tdf = compressed_tdf.mapping(sm_tess, remove_na=True)
map_compressed_tdf.head()

| | uid | datetime | lat | lng | location id | tile_ID |
|---|---|---|---|---|---|---|
| 1175 | 1 | 2009-03-30 05:21:35+00:00 | 37.63049 | -122.411084 | ee81e0b8a22411dda6b9334d03ec0fee | 49 |
| 1176 | 1 | 2009-03-30 17:25:55+00:00 | 37.584103 | -122.366083 | ee6b8534a22411dd96da6f185082f76e | 67 |
| 1178 | 1 | 2009-03-31 05:17:35+00:00 | 37.654656 | -122.40775 | ee8403d4a22411dd8fd49f8ab316fd86 | 68 |
| 1179 | 1 | 2009-03-31 17:03:48+00:00 | 37.616602 | -122.404417 | ee789224a22411ddb18c3b583589fb8a | 67 |
| 1180 | 1 | 2009-03-31 17:18:48+00:00 | 37.584103 | -122.366083 | ee6b8534a22411dd96da6f185082f76e | 67 |


In [54]:
map_f = plot.plot_gdf(sm_tess, zoom=9, style_func_args={'color':'gray', 'fillColor':'gray', 'opacity':0.2})
map_f = map_filtered_tdf.plot_trajectory(map_f=map_f, max_points=None, weight=5, hex_color='black', opacity=0.5)
map_compressed_tdf.plot_trajectory(map_f=map_f, max_points=None, hex_color='red')

## Split trajectory by day

In [55]:
from skmob.utils import utils
groups = utils.group_df_by_time(map_compressed_tdf, 
                        offset_value=3, offset_unit='hours', add_starting_location=True)

In [56]:
map_f = groups[0].plot_trajectory(start_end_markers=False, hex_color='red', weight=3)
map_f = groups[1].plot_trajectory(map_f=map_f, start_end_markers=False, hex_color='blue', weight=3)
map_f = groups[5].plot_trajectory(map_f=map_f, start_end_markers=False, hex_color='green', weight=3)
map_f