Handling Large Datasets

tobac is often used to identify and track features in large datasets ("big data"). This page suggests several methods for doing so efficiently. Current versions of tobac do not support out-of-core (larger-than-memory) computation, so these strategies may be needed to keep both run time and memory use manageable.

Split Feature Detection

Current versions of threshold feature detection (see feature_detection_overview) are time independent: each time step is processed without reference to any other, so feature detection can be parallelized across time steps (although not across space). tobac provides the tobac.utils.combine_tobac_feats function to combine a list of dataframes produced by a parallelization method (such as jug or multiprocessing.pool) into a single dataframe suitable for tracking.
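The split-then-combine pattern described above can be sketched with the standard library alone. This is a minimal illustration, not tobac's implementation: `FIELDS`, `THRESHOLD`, `detect_features_one_time`, `combine_features`, and `run_split_detection` are hypothetical names, the per-time detector is a toy stand-in for a real feature detection call, and the renumbering in the combine step is an assumption about what such a step needs to do. In real use, each worker would return a tobac feature dataframe and the resulting list would be passed to tobac.utils.combine_tobac_feats.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical toy data: one small 2D field (a list of rows) per time step.
FIELDS = {
    0: [[0, 5, 0], [0, 0, 0]],
    1: [[7, 0, 0], [0, 0, 9]],
    2: [[0, 0, 0], [0, 0, 0]],
}
THRESHOLD = 4


def detect_features_one_time(time_index):
    """Toy stand-in for a per-time-step feature detection call.

    Every grid point above THRESHOLD becomes one feature. Numbering
    restarts at 1 for each time step, because each time step is
    processed independently of the others.
    """
    features = []
    for y, row in enumerate(FIELDS[time_index]):
        for x, value in enumerate(row):
            if value > THRESHOLD:
                features.append(
                    {"frame": time_index, "feature": len(features) + 1, "y": y, "x": x}
                )
    return features


def combine_features(per_time_features):
    """Toy stand-in for the combine step.

    Concatenates the per-time results in order and renumbers the
    features so that IDs are unique across the whole dataset.
    """
    combined = []
    for features in per_time_features:
        for record in features:
            combined.append(dict(record, feature=len(combined) + 1))
    return combined


def run_split_detection(time_indices, workers=2):
    # map() returns results in input order, so they stay sorted by time.
    with ThreadPool(workers) as pool:
        per_time = pool.map(detect_features_one_time, time_indices)
    return combine_features(per_time)


all_features = run_split_detection(sorted(FIELDS))
```

`ThreadPool` is used here only because it runs anywhere without pickling concerns. `multiprocessing.Pool` exposes the same `map` interface and gives process-based parallelism, provided the worker function is defined at module level and the script is guarded by `if __name__ == "__main__":` where the platform requires it.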
