This repository shares the stop location algorithm, the aggregated data, and the notebooks needed to reproduce some of the experiments in the manuscript.
The code shared here is based on the paper "Project Lachesis: Parsing and Modeling Location Histories" and is implemented in a distributed fashion with Apache Spark.
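As a rough illustration of the Lachesis-style stay-point idea (not the repository's actual Spark implementation), the sketch below detects a stop as a maximal run of consecutive, time-sorted points that all stay within a roaming distance of the run's first point for at least a minimum duration. The parameter names `roam_m` and `min_stay_s` are hypothetical defaults chosen for the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def stop_locations(points, roam_m=200.0, min_stay_s=300.0):
    """Sketch of stay-point extraction: points is a time-sorted list of
    (timestamp_seconds, lat, lon) tuples; returns (start, end, lat, lon)
    tuples, one per detected stop, with the stop centroid as the mean
    coordinate of its points. Thresholds are illustrative assumptions."""
    stops, i, n = [], 0, len(points)
    while i < n:
        # Extend the run while points stay within roam_m of the anchor point.
        j = i + 1
        while j < n and haversine_m(points[i][1], points[i][2],
                                    points[j][1], points[j][2]) <= roam_m:
            j += 1
        if points[j - 1][0] - points[i][0] >= min_stay_s:
            lat = sum(p[1] for p in points[i:j]) / (j - i)
            lon = sum(p[2] for p in points[i:j]) / (j - i)
            stops.append((points[i][0], points[j - 1][0], lat, lon))
            i = j
        else:
            i += 1
    return stops
```

The repository's Spark version parallelises this per-user work across executors; the sequential sketch only conveys the single-trajectory logic.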
We assume that you are using Python 3.6, together with the following Python package dependencies:
For best performance, the input and output data are Parquet-formatted files. The input file must be placed in [] and have the following format:
user_id: string
timestamp: datetime.datetime
latitude: float
longitude: float
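For concreteness, a minimal sketch of an input file in this schema, built with pandas (the sample rows and the file path are hypothetical):

```python
import datetime
import pandas as pd

# Hypothetical sample rows matching the documented input schema:
# user_id (string), timestamp (datetime), latitude (float), longitude (float).
df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "timestamp": [datetime.datetime(2020, 1, 1, 8, 0),
                  datetime.datetime(2020, 1, 1, 8, 5),
                  datetime.datetime(2020, 1, 1, 9, 0)],
    "latitude": [45.464, 45.465, 41.902],
    "longitude": [9.190, 9.191, 12.496],
})

# Writing to Parquet requires pyarrow or fastparquet to be installed:
# df.to_parquet("stops_input.parquet", index=False)
```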
The output file will be placed in the path specified by output_path, and it will have the following format:
user_id: string
timestamp: datetime.datetime
lat: float
lon: float
from: datetime.datetime
to: datetime.datetime
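A small sketch of working with the output schema in pandas, e.g. to derive stop durations (the sample row is hypothetical); note that `from` is a Python keyword, so the column must be accessed with bracket notation:

```python
import datetime
import pandas as pd

# Hypothetical row in the documented output format.
stops = pd.DataFrame({
    "user_id": ["u1"],
    "timestamp": [datetime.datetime(2020, 1, 1, 8, 0)],
    "lat": [45.464],
    "lon": [9.190],
    "from": [datetime.datetime(2020, 1, 1, 8, 0)],
    "to": [datetime.datetime(2020, 1, 1, 8, 45)],
})

# Use stops["from"], not stops.from, because "from" is a reserved word.
stops["duration_min"] = (stops["to"] - stops["from"]).dt.total_seconds() / 60
```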
To run the code with Spark locally (with 10 processes, 15 GB each), you can run this command:
./bin/spark-submit --master local[10] --conf spark.executor.memory=15G --conf spark.driver.memory=10G pyspark_stop_locations.py
We share the code to produce all the plots in the manuscript and supplementary information in notebooks/main_manuscript_plots.ipynb.
The aggregated data is shared in data/plots/ and can be used in other papers as well, provided our main paper is cited.