Merged
Binary file added docs/_static/nomad-logo-black.png
64 changes: 9 additions & 55 deletions docs/conf.py
@@ -15,16 +15,6 @@
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('../src'))

# Custom function to get clean module names for autosummary
def autosummary_import(module_name):
    """Import module and return it, used by autosummary."""
    return __import__(module_name, fromlist=[''])

# Custom function to format module names in autosummary table
def get_module_short_name(fullname):
    """Return the last component of a module name."""
    return fullname.split('.')[-1]


# -- Project information -----------------------------------------------------

@@ -48,8 +38,6 @@ def get_module_short_name(fullname):
"sphinx.ext.todo",
"sphinx.ext.viewcode",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"sphinx.ext.napoleon", # Support for Google/NumPy style docstrings
'sphinx_rtd_theme',
'sphinx_design'
]
@@ -65,48 +53,6 @@ def get_module_short_name(fullname):
# Don't execute notebooks during build (use pre-executed outputs)
nbsphinx_execute = 'never'

# -- Options for autodoc ------------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#configuration

# Automatically generate summaries for autosummary
autosummary_generate = True

# Order members by source order, not alphabetically
autodoc_member_order = 'bysource'

# Include both class and __init__ docstrings
autoclass_content = 'both'

# Don't prepend module names to object names (cleaner display)
add_module_names = False

# Default options for autodoc directives
autodoc_default_options = {
    'members': True,
    'member-order': 'bysource',
    'special-members': '__init__',
    'undoc-members': False,
    'exclude-members': '__weakref__'
}

# -- Options for Napoleon (Google/NumPy style docstrings) ---------------------
# https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html

napoleon_google_docstring = True
napoleon_numpy_docstring = True
napoleon_include_init_with_doc = False
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = False
napoleon_use_admonition_for_references = False
napoleon_use_ivar = False
napoleon_use_param = True
napoleon_use_rtype = True
napoleon_preprocess_types = False
napoleon_type_aliases = None
napoleon_attr_annotations = True


# -- Options for HTML output -------------------------------------------------

@@ -115,7 +61,15 @@ def get_module_short_name(fullname):
#
html_theme = 'sphinx_rtd_theme'

# Theme options to customize the appearance
html_theme_options = {
    'logo_only': True,  # Only show the logo, not the project name
    'display_version': False,  # Don't display version info
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

html_logo = "_static/nomad-logo-black.png"
100 changes: 77 additions & 23 deletions docs/stop_detection.rst
@@ -24,61 +24,115 @@ A comparison of the stop-detection algorithms in NOMAD is shown below.

.. list-table::
:header-rows: 1
:widths: 20 25 25 30
:widths: 20 25 55

* - Method name
- Parameters
- Use case
- Characteristics
- Description

* - :ref:`DBSCAN <dbscan_stop>`
- eps (radius), min_samples
- General-purpose, uniform density, fast computation
- Density-based, finds stops as spatial clusters
* - :ref:`ta_dbscan <dbscan_stop>`
- dist_thresh, min_pts, time_thresh
- Density-based clustering that finds stops as spatial clusters where points are within distance and time thresholds

* - :ref:`HDBSCAN <hdbscan_stop>`
- min_cluster_size, min_samples
- Variable density stops, automatic parameter selection
- Hierarchical density-based, adapts to density variations
* - :ref:`st_hdbscan <hdbscan_stop>`
- time_thresh, min_pts, min_cluster_size
- Hierarchical density-based clustering that adapts to varying density levels and automatically selects clusters

* - :ref:`Grid-based <grid_based_stop>`
- grid_resolution, min_duration
- Privacy-preserving, sparse data, real-time processing
- Tessellation-based, privacy-aware, deterministic
* - :ref:`grid_based <grid_based_stop>`
- time_thresh, min_cluster_size, dur_min
- Tessellation-based approach that groups consecutive pings in the same spatial cell for privacy-aware stop detection

* - :ref:`Lachesis <lachesis_stop>`
- distance_threshold, time_threshold
- Trajectory segmentation, sparse temporal sampling
- Sequential algorithm, time-aware, good for low-frequency GPS
* - :ref:`lachesis <lachesis_stop>`
- dt_max, delta_roam, dur_min
- Sequential algorithm that scans trajectories chronologically, identifying stops based on spatial diameter and temporal gaps


.. _dbscan_stop:

DBSCAN
======

To be implemented.
TA-DBSCAN (Temporal-Augmented DBSCAN) adapts DBSCAN to trajectory data: unlike plain DBSCAN, it also uses the time dimension to decide whether two pings are "neighbors". The implementation relies on three parameters:

* ``time_thresh`` defines the maximum time difference (in minutes) between two consecutive pings for them to be considered neighbors within the same cluster.
* ``dist_thresh`` specifies the maximum spatial distance (in meters) between two pings for them to be considered neighbors.
* ``min_pts`` sets the minimum number of neighbors a ping needs in order to seed a cluster.

Note that this method also works with **geographic coordinates** (``lon``, ``lat``), in which case distances are computed with the Haversine formula.
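
The neighbor rule described above can be sketched in a few lines. This is a minimal illustration, not NOMAD's implementation: the function names ``haversine_m``, ``are_neighbors`` and ``core_pings`` are hypothetical, pings are assumed to be ``(unix_seconds, lon, lat)`` tuples, the time test is applied to any pair of pings, and a ping is assumed not to count as its own neighbor.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def are_neighbors(p, q, dist_thresh, time_thresh):
    """True if pings p and q are close in both space (meters) and time (minutes)."""
    close_in_time = abs(p[0] - q[0]) <= time_thresh * 60
    close_in_space = haversine_m(p[1], p[2], q[1], q[2]) <= dist_thresh
    return close_in_time and close_in_space


def core_pings(pings, dist_thresh, time_thresh, min_pts):
    """Indices of pings that have at least min_pts neighbors (assumption: self excluded)."""
    core = []
    for i, p in enumerate(pings):
        n_neighbors = sum(
            1
            for j, q in enumerate(pings)
            if j != i and are_neighbors(p, q, dist_thresh, time_thresh)
        )
        if n_neighbors >= min_pts:
            core.append(i)
    return core


pings = [
    (0, -75.0, 40.0),
    (60, -75.0001, 40.0),
    (120, -75.0, 40.0001),
    (7200, -75.0, 40.0),   # same place, but two hours later
    (180, -74.9, 40.0),    # same time window, but kilometers away
]
core = core_pings(pings, dist_thresh=50, time_thresh=10, min_pts=2)
```

Here the first three pings are mutual neighbors, while the last two fail the time and distance tests respectively, so only the first three qualify as cluster seeds.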

**Source:** Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. *Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)*, 226-231.

.. figure:: _images/source_tadbscan_demo_3_0.png
   :target: source/tadbscan_demo.html
   :align: center
   :width: 80%

   TA-DBSCAN stops with post-processing. See the :doc:`full example <source/tadbscan_demo>`.


.. _hdbscan_stop:

HDBSCAN
=======

To be implemented.
The HDBSCAN algorithm constructs a hierarchy of non-overlapping clusters from different radius values and selects those that maximize stability.
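
As background, HDBSCAN as described by Campello et al. builds its hierarchy on the mutual reachability distance, which inflates the distance between two points to at least the core distance of each. The sketch below shows only that building block on projected ``(x, y)`` points; the helper names are hypothetical, the core distance is assumed to exclude the point itself, and how NOMAD's ``st_hdbscan`` brings time into the metric is not covered.

```python
import math


def core_distance(points, i, min_pts):
    """Distance from points[i] to its min_pts-th nearest other point."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[min_pts - 1]


def mutual_reachability(points, i, j, min_pts):
    """max(core distance of i, core distance of j, direct distance between i and j)."""
    return max(
        core_distance(points, i, min_pts),
        core_distance(points, j, min_pts),
        math.dist(points[i], points[j]),
    )


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
d01 = mutual_reachability(points, 0, 1, min_pts=2)
d03 = mutual_reachability(points, 0, 3, min_pts=2)
```

Points in sparse regions have large core distances, so they are pushed away from every other point, which is what lets the hierarchy separate dense stops from isolated pings.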

**Source:** Campello, R. J., Moulavi, D., & Sander, J. (2013). Density-based clustering based on hierarchical density estimates. *Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)*, 160-172.

.. figure:: _images/source_hdbscan_demo_3_0.png
   :target: source/hdbscan_demo.html
   :align: center
   :width: 80%

   HDBSCAN stops with post-processing. See the :doc:`full example <source/hdbscan_demo>`.


.. _grid_based_stop:

Grid-based
==========

To be implemented.
The stop detection algorithms implemented in ``nomad`` support different combinations of input formats that are common in commercial datasets, detecting default names when possible:

- Timestamps as ``datetime64[ns, tz]`` or as integer Unix seconds.
- Geographic coordinates (``lon``, ``lat``), which use the Haversine distance, or projected coordinates (``x``, ``y``) in meters, which use the Euclidean distance.
- Alternatively, if locations are given only through a spatial index such as H3 or geohash, the **grid_based** clustering algorithm requires no coordinates.

All algorithms are invoked with the same call, provided the data include a pair of coordinate columns (or a location/spatial-index column) and at least one temporal column.
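
The idea of grouping consecutive pings that share a spatial cell can be sketched without any coordinates. This is an illustration under stated assumptions, not NOMAD's ``grid_based`` implementation: the function ``grid_stops`` is hypothetical, pings are assumed to be time-sorted ``(unix_seconds, cell_id)`` tuples, and only a minimum duration in minutes is applied (the ``time_thresh`` and ``min_cluster_size`` parameters are ignored here).

```python
from itertools import groupby


def grid_stops(pings, dur_min):
    """Group consecutive pings by cell and keep runs lasting at least dur_min minutes."""
    stops = []
    for cell, run in groupby(pings, key=lambda p: p[1]):
        run = list(run)
        start, end = run[0][0], run[-1][0]
        if end - start >= dur_min * 60:
            stops.append({"cell": cell, "start": start, "end": end, "n_pings": len(run)})
    return stops


pings = [
    (0, "a"), (300, "a"), (600, "a"),   # ten minutes in cell "a"
    (660, "b"),                         # a single ping in cell "b"
    (720, "a"), (1500, "a"),            # thirteen minutes back in cell "a"
]
stops = grid_stops(pings, dur_min=5)
```

Because ``groupby`` only merges adjacent items, a return to the same cell after visiting another one produces a separate stop rather than extending the first.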

**Source:** NOMAD implementation using spatial tessellation (H3, S2) for trajectory segmentation.

.. figure:: _images/source_grid_based_demo_3_0.png
   :target: source/grid_based_demo.html
   :align: center
   :width: 80%

   Grid-based stops. See the :doc:`full example <source/grid_based_demo>`.


.. _lachesis_stop:

Lachesis
========

To be implemented.
The Lachesis algorithm is a sequential algorithm inspired by *Project Lachesis: Parsing and Modeling Location Histories* (Hariharan & Toyama). Stop extraction depends on two quantities: the roaming distance and the stop duration.

* Roaming distance represents the maximum distance an object can move away from a point location and still be considered to be staying at that location.
* Stop duration is the minimum amount of time an object must spend within the roaming distance of a location to qualify as a stop.

The algorithm identifies stops as contiguous sequences of pings that stay within the roaming distance for at least the stop duration.

This algorithm has the following parameters, which determine the size of the resulting stops:

* ``dur_min``: Minimum duration of a stay, in minutes.
* ``dt_max``: Maximum time gap, in minutes, permitted between consecutive pings in a stay (``dt_max`` should be greater than ``dur_min``).
* ``delta_roam``: Maximum roaming distance for a stay, in meters.
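
A sequential scan using these three parameters might look like the sketch below. It is a simplified reading of the description above, not the code shipped in NOMAD or the exact procedure of the original paper: the functions ``diameter`` and ``lachesis_sketch`` are hypothetical, pings are assumed to be time-sorted ``(unix_seconds, x, y)`` tuples in projected meters, and the roaming distance is interpreted as the maximum pairwise distance (diameter) of the window.

```python
import math


def diameter(points):
    """Largest pairwise Euclidean distance among (x, y) points."""
    return max((math.dist(p, q) for p in points for q in points), default=0.0)


def lachesis_sketch(pings, dur_min, dt_max, delta_roam):
    """Return (start, end) Unix-second pairs for detected stops.

    dur_min and dt_max are in minutes; delta_roam is in meters.
    """
    stops = []
    i, n = 0, len(pings)
    while i < n:
        j = i + 1
        # Grow the window while the time gap and spatial diameter stay within bounds.
        while j < n:
            gap_ok = pings[j][0] - pings[j - 1][0] <= dt_max * 60
            window = [(x, y) for _, x, y in pings[i : j + 1]]
            if not gap_ok or diameter(window) > delta_roam:
                break
            j += 1
        # pings[i:j] is the longest admissible window starting at ping i.
        if pings[j - 1][0] - pings[i][0] >= dur_min * 60:
            stops.append((pings[i][0], pings[j - 1][0]))
            i = j  # continue after the stop
        else:
            i += 1  # no stop starts here; slide forward by one ping
    return stops


pings = [
    (0, 0, 0), (120, 5, 0), (240, 0, 5), (360, 3, 3),   # lingering near the origin
    (420, 500, 500),                                    # in transit
    (480, 1000, 1000), (540, 1000, 1005), (900, 1002, 1000),
]
stops = lachesis_sketch(pings, dur_min=5, dt_max=10, delta_roam=20)
```

Recomputing the diameter for every window is quadratic per step; it keeps the sketch short, but a real implementation would likely update the extent incrementally.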

**Source:** Hariharan, R., & Toyama, K. (2004). Project Lachesis: Parsing and modeling location histories. *International Conference on Geographic Information Science*, 106-124.

.. figure:: _images/source_lachesis_demo_3_0.png
   :target: source/lachesis_demo.html
   :align: center
   :width: 80%

   Lachesis stops. See the :doc:`full example <source/lachesis_demo>`.