## Notebook to test the effectiveness of the Stay Point Detection Algorithm (SPDA) <a name="top"></a>

This algorithm was proposed by Microsoft and modified by a group of researchers from China ([link to research paper](https://github.com/tyqiangz/Trajectory-Data-Mining/blob/master/Useful%20Research%20Materials/Stay%20Point%20Analysis%20in%20Automatic%20Identification%20System%20Trajectory%20Data.pdf)) to detect regions (called **stay points**) from a dataset of records with `timestamp`, `latitude`, `longitude` variables. Stay points are regions where moving objects are relatively stationary within a region of size not more than `distThres` metres, have stayed there for at least `timeThres` seconds and have at least `minPoints` number of geolocation records consecutively recorded in that region.

For details of the algorithm, read Section 4.2. of the paper or read the code [below](#SPDA).

<hr></hr>

The researchers have proposed to set the parameters as `distThres=200, timeThres=30*60, minPoints=50`. Depending on the density of the geolocation records, these parameters may not be optimal. In this notebook I list down what kind of travelling patterns will have stay points detected, which doesn't get detected.

**Visualisations of common scenarios where SPDA is accurate:**

- [Scenario #1](#scenario1): The object moved then stayed completely stationary for a while, then moved away. 1 stay point detected.
- [Scenario #2](#scenario2): The object moved then loitered around a few buildings for a while, then moved away. 1 stay point detected.
- [Scenario #3](#scenario3): The object is jumping back and forth over a large distance. No stay point detected.
- [Scenario #4](#scenario4): The object is jumping back and forth along a road. 1 stay point detected.
- [Scenario #5](#scenario5): The object is jumping around in a square shape pattern over a large distance. No stay point detected.
- [Scenario #6](#scenario6): The object is jumping around in a square shape pattern over a small distance. 1 stay point detected.

**Bonus**
- [Bonus scenario](#bonus_scenario): The object is a college student from Tsinghua university. The dataset is contributed by [Microsoft GeoLife](https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide/).

<hr> </hr>

**Plotting style for all plots:**
- Start point: Blue marker
- End point: Blue marker
- Stay point: Blue circle
- All other points: Black circle

In [2]:
import pandas as pd
import numpy as np
from statistics import mean, median
from math import radians, cos, sin, asin, sqrt
from datetime import datetime, timedelta
import folium
from folium import plugins
import os

The following is the **Stay Point Detection Algorithm** and some auxiliary functions that SPDA relies on. <a name="SPDA"></a>

In [3]:
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance (in metres) between two points on the earth (specified in decimal degrees)
    
    :param lon1: longitude of point 1
    :param lat1: longitude of point 1
    :param lon2: longitude of point 2
    :param lat2: longitude of point 2
    :return: the distance between (lon1, lat1) and (lon2, lat2), in metres
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6378.1 # Radius of earth in kilometers. Use 3956 for miles
    return c * r * 1000

class stayPoint:
    def __init__(self, arrivalTime, departTime, startIndex, endIndex, location):
        '''
        :param arrivalTime: The time when the moving object arrived at this stay point.
        :param departTime: The time when the moving object departed this stay point.
        :param startIndex: The index in the object's trajectory dataset corresponding to `arrivalTime`.
        :param endIndex: The index in the object's trajectory dataset corresponding to `departTime`.
        :param location: The [`lon`, `lat`] values corresponding to the location of this stay point.
        '''
        self.arrivalTime = arrivalTime
        self.departTime = departTime
        self.startIndex = startIndex
        self.endIndex = endIndex
        self.location = location
        
    def toString(self):
        '''
        prints all the information about this stay point.
        '''
        print(f"(arrivalTime: {self.arrivalTime}, departTime: {self.departTime}, startIndex: {self.startIndex}, "+
            f"endIndex: {self.endIndex}, location: {self.location})")
        
def SPDA(traj, distThres=200, timeThres=30*60, minPoints=50):
    '''
    :param traj: a dataframe with `lat`, `lon` and `time` variables
    :param distThres: a threshold of the distance (in metres)
    :param timeThres: a threshold of the time (in seconds)
    :param minPoints: the minimum no. of points required in a stay-point region
    :output: a list of stay-points
    '''
    def distance(pointA, pointB):
        '''
        :param pointA: a point with lat and lon variables
        :param pointB: a point with lat and lon variables
        :return: the distance between pointA and pointB calculated by Haversine formula
        '''
        return haversine(pointA.lon, pointA.lat, pointB.lon, pointB.lat)
    
    def getCentroid(points, centroid_type):
        '''
        :param points: a list of points with lat and lon variables
        :param centroid_type: "median" or "mean"
        :return: the centre of the list of points, calculated by centroid_type function
        '''
#         print("centroid points:\n", points)
        if centroid_type == "median":
            return [median(points.loc[:,"lon"]), median(points.loc[:,"lat"])]
        elif centroid_type == "mean":
            return [mean(points.loc[:,"lon"]), mean(points.loc[:,"lat"])]
        
    i = 0
    pointNum = len(traj)
    stayPoints = []
    
    while i < pointNum:
        j = i+1
        token = 0
        while j < pointNum:
            print("Analysing point: " + str(j) + " "*10, "\r", end="")
            dist = distance(traj.iloc[j,:], traj.iloc[i,:])
            if dist > distThres:
                timeDiff = (traj.time[j] - traj.time[i]).total_seconds()
                if (timeDiff > timeThres) and (j-i >= minPoints):
                    centroid = getCentroid(traj.loc[i:(j-1),:], "median")
                    stayPoints.append(
                        stayPoint(
                            arrivalTime = traj.time[i], 
                            departTime = traj.time[j], 
                            startIndex = i,
                            endIndex = j,
                            location = centroid
                        )
                    )
                    
                    i = j
                    token = 1
                break
            j += 1
            
        if token != 1:
            i += 1
            
    return stayPoints

A function to make plotting more convenient. 

I am using the `folium.plugins.TimestampedGeoJson` function to animate the travel history and stay points. `folium.plugins.TimestampedGeoJson` takes in a [GeoJson](https://en.wikipedia.org/wiki/GeoJSON) object with timestamps as input and returns an animation of the travel.

In [48]:
def plot(traj, stayPoints, zoom_start=10, period='PT1M', add_last_point=True):
    '''
    :param traj: a dataframe with variables `time`, `lat`, `lon`
    :param stayPoints: a list of objects of `stayPoint` class
    :param zoom_start: integer between 0 and 18, determines how much you want to zoom into the map, 18 is the maximum you can zoom in.
    :param period: the period of the animation, default is 1 minute.
    :return m: an animation of the travel history and the stay points.
        Legend: 
            stay point: blue circle
            start point: blue marker
            end point: blue marker
            other points: black circle
    '''
    
    m = folium.Map(location=[traj.lat[0], traj.lon[0]], zoom_start=zoom_start)
    
    features = []
    
    startingFeature = {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            'coordinates': [traj.lon[0], traj.lat[0]]
        },
        "properties": {
            "time": str(traj.time[0]),
            "icon": "Marker",
            "style": {"color": "green"}
        }
    }
    
    features.append(startingFeature)
    
    endingFeature = {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            'coordinates': [traj.lon[len(traj)-1], traj.lat[len(traj)-1]]
        },
        "properties": {
            "time": str(traj.time[len(traj)-1]),
            "icon": "Marker",
            "style": {"color": "red"}
        }
    }
    
    for i in range(1, len(traj)-1):
        feature = {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                'coordinates': [traj.lon[i], traj.lat[i]]
            },
            "properties": {
                "time": str(traj.time[i]),
                "icon": "circle",
                "style": {"color": "black"}
            }
        }
        features.append(feature)
    
    features.append(endingFeature)
    
    
    for stayPoint in stayPoints:
        feature = {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [stayPoint.location[0], stayPoint.location[1]]
            },
            "properties": {
                "time": str(stayPoint.arrivalTime),
                "icon": "circle",
                "style": {"color": "blue"}
            }
        }
        features.append(feature)
    
    plugins.TimestampedGeoJson({
        'type': 'FeatureCollection',
        'features': features,
    }, period=period, add_last_point=False, loop=False, auto_play=False).add_to(m)
    
    # adds a fullscreen button at the 'topright' corner of the plot
    plugins.Fullscreen(position='topright').add_to(m)
    
    return m

All travel histories start at 2020 Jan 1st. You can change it to whatever date you want.

In [5]:
STARTDATE = datetime(2020, 1, 1)

## Scenario 1 <a name="scenario1"></a>

The object moved then **stayed stationary** for a while, then moved away, **1 STAY POINT DETECTED**.

[Back to top](#top)

In [6]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.3521, 103.8198]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0, size=2)
    traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7414+randSmallNums[1]]
    
traj.loc[60, :]  = [traj.time[60-1] + timedelta(seconds=60), 1.5021, 104]

In [7]:
traj.head()

Unnamed: 0,time,lat,lon
0,2020-01-01 00:00:00,1.3521,103.82
1,2020-01-01 00:01:00,1.4927,103.741
2,2020-01-01 00:02:00,1.4927,103.741
3,2020-01-01 00:03:00,1.4927,103.741
4,2020-01-01 00:04:00,1.4927,103.741


In [8]:
traj.tail()

Unnamed: 0,time,lat,lon
56,2020-01-01 00:56:00,1.4927,103.741
57,2020-01-01 00:57:00,1.4927,103.741
58,2020-01-01 00:58:00,1.4927,103.741
59,2020-01-01 00:59:00,1.4927,103.741
60,2020-01-01 01:00:00,1.5021,104.0


In [9]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()

Analysing point: 1           Analysing point: 2           Analysing point: 3           Analysing point: 4           Analysing point: 5           Analysing point: 6           Analysing point: 7           Analysing point: 8           Analysing point: 9           Analysing point: 10           Analysing point: 11           Analysing point: 12           Analysing point: 13           Analysing point: 14           Analysing point: 15           Analysing point: 16           Analysing point: 17           Analysing point: 18           Analysing point: 19           Analysing point: 20           Analysing point: 21           Analysing point: 22           Analysing point: 23           Analysing point: 24           Analysing point: 25           Analysing point: 26           Analysing point: 27           Analysing point: 28           Analysing point: 29           Analysing point: 30           Analysing point: 31           Analysing point: 32           Analysing point: 

In [10]:
plot(traj, stayPoints, zoom_start=11)

## Scenario 2 <a name="scenario2"></a>

The object moved then **loitered around a few buildings** for a while, then moved away. **1 STAY POINT DETECTED**

[Back to top](#top)

In [11]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4927, 103.75]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.001, size=2)
    traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7414+randSmallNums[1]]
    
traj.loc[60, :]  = [traj.time[60-1] + timedelta(seconds=60), 1.36, 103.71]

In [12]:
traj.head()

Unnamed: 0,time,lat,lon
0,2020-01-01 00:00:00,1.4927,103.75
1,2020-01-01 00:01:00,1.49366,103.742
2,2020-01-01 00:02:00,1.49327,103.742
3,2020-01-01 00:03:00,1.49287,103.742
4,2020-01-01 00:04:00,1.4935,103.741


In [13]:
traj.tail()

Unnamed: 0,time,lat,lon
56,2020-01-01 00:56:00,1.49335,103.742
57,2020-01-01 00:57:00,1.49342,103.742
58,2020-01-01 00:58:00,1.49292,103.742
59,2020-01-01 00:59:00,1.49348,103.742
60,2020-01-01 01:00:00,1.36,103.71


In [14]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()

Analysing point: 1           Analysing point: 2           Analysing point: 3           Analysing point: 4           Analysing point: 5           Analysing point: 6           Analysing point: 7           Analysing point: 8           Analysing point: 9           Analysing point: 10           Analysing point: 11           Analysing point: 12           Analysing point: 13           Analysing point: 14           Analysing point: 15           Analysing point: 16           Analysing point: 17           Analysing point: 18           Analysing point: 19           Analysing point: 20           Analysing point: 21           Analysing point: 22           Analysing point: 23           Analysing point: 24           Analysing point: 25           Analysing point: 26           Analysing point: 27           Analysing point: 28           Analysing point: 29           Analysing point: 30           Analysing point: 31           Analysing point: 32           Analysing point: 

In [15]:
plot(traj, stayPoints, zoom_start=16)

## Scenario 3 <a name="scenario3"> </a>

The object is jumping **back and forth** over a **large distance**. **NO STAY POINTS DETECTED**

[Back to top](#top)

In [16]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4521, 103.8198]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.05, size=2)
    
    if i % 2 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7414+randSmallNums[1]]
    else:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4521+randSmallNums[0], 103.8198+randSmallNums[1]]

In [17]:
traj.head()

Unnamed: 0,time,lat,lon
0,2020-01-01 00:00:00,1.4521,103.82
1,2020-01-01 00:01:00,1.53609,103.761
2,2020-01-01 00:02:00,1.48384,103.85
3,2020-01-01 00:03:00,1.5265,103.786
4,2020-01-01 00:04:00,1.50125,103.838


In [18]:
traj.tail()

Unnamed: 0,time,lat,lon
55,2020-01-01 00:55:00,1.50402,103.758
56,2020-01-01 00:56:00,1.45299,103.869
57,2020-01-01 00:57:00,1.50423,103.784
58,2020-01-01 00:58:00,1.46915,103.852
59,2020-01-01 00:59:00,1.5237,103.763


In [19]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()

Analysing point: 1           Analysing point: 2           Analysing point: 3           Analysing point: 4           Analysing point: 5           Analysing point: 6           Analysing point: 7           Analysing point: 8           Analysing point: 9           Analysing point: 10           Analysing point: 11           Analysing point: 12           Analysing point: 13           Analysing point: 14           Analysing point: 15           Analysing point: 16           Analysing point: 17           Analysing point: 18           Analysing point: 19           Analysing point: 20           Analysing point: 21           Analysing point: 22           Analysing point: 23           Analysing point: 24           Analysing point: 25           Analysing point: 26           Analysing point: 27           Analysing point: 28           Analysing point: 29           Analysing point: 30           Analysing point: 31           Analysing point: 32           Analysing point: 

In [20]:
plot(traj, stayPoints, zoom_start=12)

## Scenario 4 <a name="scenario4"></a>

The object is jumping **back and forth** along a **road** (less than 200m). **1 STAY POINT DETECTED**

[Back to top](#top)

In [21]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4927, 103.7359]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.001, size=2)
    
    if i % 2 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4917+randSmallNums[0], 103.7354+randSmallNums[1]]
    else:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7359+randSmallNums[1]]
        
traj.loc[60, :]  = [traj.time[60-1] + timedelta(seconds=60), 1.5, 104]

In [22]:
traj.head()

Unnamed: 0,time,lat,lon
0,2020-01-01 00:00:00,1.4927,103.736
1,2020-01-01 00:01:00,1.49194,103.736
2,2020-01-01 00:02:00,1.49361,103.737
3,2020-01-01 00:03:00,1.49234,103.736
4,2020-01-01 00:04:00,1.4934,103.736


In [23]:
traj.tail()

Unnamed: 0,time,lat,lon
56,2020-01-01 00:56:00,1.4931,103.736
57,2020-01-01 00:57:00,1.4917,103.736
58,2020-01-01 00:58:00,1.49367,103.737
59,2020-01-01 00:59:00,1.49185,103.736
60,2020-01-01 01:00:00,1.5,104.0


In [24]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()

Analysing point: 1           Analysing point: 2           Analysing point: 3           Analysing point: 4           Analysing point: 5           Analysing point: 6           Analysing point: 7           Analysing point: 8           Analysing point: 9           Analysing point: 10           Analysing point: 11           Analysing point: 12           Analysing point: 13           Analysing point: 14           Analysing point: 15           Analysing point: 16           Analysing point: 17           Analysing point: 18           Analysing point: 19           Analysing point: 20           Analysing point: 21           Analysing point: 22           Analysing point: 23           Analysing point: 24           Analysing point: 25           Analysing point: 26           Analysing point: 27           Analysing point: 28           Analysing point: 29           Analysing point: 30           Analysing point: 31           Analysing point: 32           Analysing point: 

In [25]:
plot(traj, stayPoints, zoom_start=18)

## Scenario 5 <a name="scenario5"></a>

The object is jumping **around in a square shape pattern** over a **large distance**. **NO STAY POINT DETECTED**

[Back to top](#top)

In [26]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4627, 103.7359]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.005, size=2)
    
    if i % 4 == 0:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7359+randSmallNums[1]]
    elif i % 4 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4697+randSmallNums[0], 103.7359+randSmallNums[1]]
    elif i % 4 == 2:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4697+randSmallNums[0], 103.7459+randSmallNums[1]]
    elif i % 4 == 3:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7459+randSmallNums[1]]

In [27]:
traj.head()

Unnamed: 0,time,lat,lon
0,2020-01-01 00:00:00,1.4627,103.736
1,2020-01-01 00:01:00,1.47468,103.741
2,2020-01-01 00:02:00,1.47227,103.75
3,2020-01-01 00:03:00,1.4628,103.75
4,2020-01-01 00:04:00,1.46383,103.74


In [28]:
traj.tail()

Unnamed: 0,time,lat,lon
55,2020-01-01 00:55:00,1.4651,103.751
56,2020-01-01 00:56:00,1.46609,103.74
57,2020-01-01 00:57:00,1.47399,103.74
58,2020-01-01 00:58:00,1.47203,103.747
59,2020-01-01 00:59:00,1.4651,103.751


In [29]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()

Analysing point: 1           Analysing point: 2           Analysing point: 3           Analysing point: 4           Analysing point: 5           Analysing point: 6           Analysing point: 7           Analysing point: 8           Analysing point: 9           Analysing point: 10           Analysing point: 11           Analysing point: 12           Analysing point: 13           Analysing point: 14           Analysing point: 15           Analysing point: 16           Analysing point: 17           Analysing point: 18           Analysing point: 19           Analysing point: 20           Analysing point: 21           Analysing point: 22           Analysing point: 23           Analysing point: 24           Analysing point: 25           Analysing point: 26           Analysing point: 27           Analysing point: 28           Analysing point: 29           Analysing point: 30           Analysing point: 31           Analysing point: 32           Analysing point: 

In [30]:
plot(traj, stayPoints, zoom_start=14)

## Scenario 6 <a name="scenario6"></a>

The object is jumping **around in a square shape pattern** over a **small distance**. **1 STAY POINT DETECTED**.

[Back to top](#top)

In [31]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4627, 103.7359]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.0005, size=2)
    
    if i % 4 == 0:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7399+randSmallNums[1]]
    elif i % 4 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4637+randSmallNums[0], 103.7399+randSmallNums[1]]
    elif i % 4 == 2:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4637+randSmallNums[0], 103.7409+randSmallNums[1]]
    elif i % 4 == 3:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7409+randSmallNums[1]]
    
traj.loc[60, :] = [traj.time[i-1] + timedelta(seconds=60), 1.46, 103.75]

In [32]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()

Analysing point: 1           Analysing point: 2           Analysing point: 3           Analysing point: 4           Analysing point: 5           Analysing point: 6           Analysing point: 7           Analysing point: 8           Analysing point: 9           Analysing point: 10           Analysing point: 11           Analysing point: 12           Analysing point: 13           Analysing point: 14           Analysing point: 15           Analysing point: 16           Analysing point: 17           Analysing point: 18           Analysing point: 19           Analysing point: 20           Analysing point: 21           Analysing point: 22           Analysing point: 23           Analysing point: 24           Analysing point: 25           Analysing point: 26           Analysing point: 27           Analysing point: 28           Analysing point: 29           Analysing point: 30           Analysing point: 31           Analysing point: 32           Analysing point: 

In [33]:
plot(traj, stayPoints, zoom_start=16)

## Bonus scenario <a name="bonus_scenario"></a>

The object is a college student from Tsinghua university.

[Back to top](#top)

In [34]:
GEOLIFE_DATA_PATH =r"C:\Users\Tay\Documents\GitHub\Trajectory-Data-Mining\Geolife Trajectories 1.3\Data"

In [35]:
users = os.listdir(GEOLIFE_DATA_PATH)
print("Number of users in Geolife dataset: " + str(len(users)))

Number of users in Geolife dataset: 182


In [36]:
def readUserTraj(path_to_user):
    '''
    :param path_to_user: a path to the user's dataset according to the Geolife dataset file system
    :return userTraj: a dataframe containing all of the user's trajectories. 
    Note that a person can have more than 1 disconnected trajectory, they're all joined together in `userTraj`.
    '''
    trajs = os.listdir(path_to_user + r"\Trajectory")
    col_names = ["lat", "lon", "alt", "date", "time"]
    userTraj = pd.DataFrame(columns=["lat", "lon", "alt", "time"])
    
    for i in range(len(trajs)):
        TRAJ_PATH = path_to_user + r"\Trajectory" + r"\\" + trajs[i]
        traj = pd.read_csv(TRAJ_PATH, skiprows=6, header=None, usecols=[0,1,3,5,6], names=col_names)

        # to combine date and time and change it to `datetime` type
        for i in range(len(traj)):
            traj.loc[i, "time"] = pd.to_datetime(traj.loc[i, "date"] + " " + traj.loc[i, "time"])

        traj = traj.drop(columns="date")
        userTraj = userTraj.append(traj)
    
    userTraj = userTraj.reset_index(drop=True)
    
    return userTraj

In [37]:
%%time
trajDataFrame = readUserTraj(GEOLIFE_DATA_PATH + r"\\" + users[1])

Wall time: 1min 14s


In [38]:
trajDataFrame

Unnamed: 0,lat,lon,alt,time
0,39.984094,116.319236,492,2008-10-23 05:53:05
1,39.984198,116.319322,492,2008-10-23 05:53:06
2,39.984224,116.319402,492,2008-10-23 05:53:11
3,39.984211,116.319389,492,2008-10-23 05:53:16
4,39.984217,116.319422,491,2008-10-23 05:53:21
...,...,...,...,...
108602,39.977969,116.326651,311,2008-12-15 00:30:58
108603,39.977946,116.326653,310,2008-12-15 00:31:03
108604,39.977897,116.326624,310,2008-12-15 00:31:08
108605,39.977882,116.326626,310,2008-12-15 00:31:13


The dataset of size 108,607 is too large for visualisation. Note that the data is very dense (geolocation records are only 5 seconds apart) so I just took every tenth record of the dataset to make visualisation easier instead.

In [39]:
%%time
stayPoints = SPDA(trajDataFrame)

Wall time: 29min 39s606                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 

In [40]:
print("Number of staypoints: " + str(len(stayPoints)))

Number of staypoints: 116


In [49]:
plot(trajDataFrame[::10].reset_index(drop=True), stayPoints, zoom_start=12, period='PT1H', add_last_point=False)