Store pipeline state + switch to argparse
The raw data and the analysis results do not constitute the entire state of a pipeline. In particular, if we store only the raw + analysis results, and then we try to run the pipeline again, we will end up with two copies of the analysis results.

Instead, when we transfer data, it should include the raw data, the pipeline state, and the analysis results.

Change this code to store the pipeline state as well. And since I am in there changing things anyway, switch to argparse to handle the arguments as well.

```
$ ./e-mission-py.bash bin/debug/extract_timeline_for_day_range_and_user.py -e test_output_gen_curr_ts -- 2010-01-01 2020-01-01 /tmp/test_dump
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
INFO:root:==================================================
INFO:root:Extracting timeline for user d4dfcc42-b6fc-4b6b-a246-d1abec1d039f day 2010-01-01 -> 2020-01-01 and saving to file /tmp/test_dump
DEBUG:root:start_day_ts = 1262304000 (2010-01-01T00:00:00+00:00), end_day_ts = 1577836800 (2020-01-01T00:00:00+00:00)
DEBUG:root:curr_query = {'user_id': UUID('d4dfcc42-b6fc-4b6b-a246-d1abec1d039f'), 'data.ts': {'$lte': 1577836800, '$gte': 1262304000}}, sort_key = data.ts
DEBUG:root:orig_ts_db_keys = None, analysis_ts_db_keys = None
DEBUG:root:finished querying values for None
DEBUG:root:finished querying values for None
DEBUG:root:curr_query = {'user_id': UUID('d4dfcc42-b6fc-4b6b-a246-d1abec1d039f'), 'data.start_ts': {'$lte': 1577836800, '$gte': 1262304000}}, sort_key = data.start_ts
DEBUG:root:orig_ts_db_keys = None, analysis_ts_db_keys = None
DEBUG:root:finished querying values for None
DEBUG:root:finished querying values for None
DEBUG:root:curr_query = {'user_id': UUID('d4dfcc42-b6fc-4b6b-a246-d1abec1d039f'), 'data.enter_ts': {'$lte': 1577836800, '$gte': 1262304000}}, sort_key = data.enter_ts
DEBUG:root:orig_ts_db_keys = None, analysis_ts_db_keys = None
DEBUG:root:finished querying values for None
DEBUG:root:finished querying values for None
INFO:root:Found 1449 loc entries, 27 trip-like entries, 19 place-like entries = 1495 total entries
INFO:root:timeline has unique keys = {'stats/server_api_error', 'statemachine/transition', 'analysis/cleaned_stop', 'background/filtered_location', 'segmentation/raw_trip', 'background/location', 'segmentation/raw_stop', 'segmentation/raw_section', 'stats/client_time', 'background/motion_activity', 'analysis/recreated_location', 'segmentation/raw_place', 'analysis/cleaned_trip', 'background/battery', 'analysis/cleaned_section', 'stats/server_api_time', 'analysis/cleaned_place', 'stats/pipeline_time', 'stats/client_nav_event'}
INFO:root:Found 6 pipeline states [6, 1, 2, 3, 11, 9]
$ ls -1 /tmp/test_dump_*
/tmp/test_dump_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz
/tmp/test_dump_pipelinestate_d4dfcc42-b6fc-4b6b-a246-d1abec1d039f.gz
```
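
For readers who only have the transcript, here is a minimal sketch of what the argparse interface and the two output file names could look like. It is inferred from the sample invocation and the `ls` listing only, not copied from the script. The function names (`build_parser`, `day_to_ts`, `output_paths`), the long option `--user_email`, and the positional names are assumptions; only `-e`, the three positionals, the UTC timestamps, and the `_pipelinestate_` file naming are visible in the output above.

```python
import argparse
import datetime


def build_parser():
    """Hypothetical parser mirroring the sample invocation:
    -e <email> -- <start_day> <end_day> <file_prefix>
    """
    parser = argparse.ArgumentParser(
        prog="extract_timeline_for_day_range_and_user")
    # Only the short flag -e appears in the transcript; the long name
    # --user_email is an assumption made for this sketch.
    parser.add_argument("-e", "--user_email", required=True,
                        help="email of the user whose timeline is extracted")
    # Positional names are assumptions; the transcript shows only values.
    parser.add_argument("start_day", help="start day as YYYY-MM-DD")
    parser.add_argument("end_day", help="end day as YYYY-MM-DD")
    parser.add_argument("file_prefix",
                        help="prefix of the output files, e.g. /tmp/test_dump")
    return parser


def day_to_ts(day):
    """Convert YYYY-MM-DD to a UTC epoch timestamp in seconds.

    UTC is assumed because the log prints the boundaries with +00:00.
    """
    dt = datetime.datetime.strptime(day, "%Y-%m-%d")
    return int(dt.replace(tzinfo=datetime.timezone.utc).timestamp())


def output_paths(file_prefix, user_id):
    """Return (timeline_path, pipeline_state_path) for one user.

    The naming follows the two files shown by `ls -1 /tmp/test_dump_*`.
    """
    timeline_path = "%s_%s.gz" % (file_prefix, user_id)
    pipeline_state_path = "%s_pipelinestate_%s.gz" % (file_prefix, user_id)
    return timeline_path, pipeline_state_path
```

Calling `build_parser().parse_args()` with the arguments from the transcript and passing `start_day` and `end_day` through `day_to_ts` gives the same 1262304000 and 1577836800 boundaries that the DEBUG line reports. Keeping the pipeline state in its own file next to the timeline dump lets a loader restore both, which is what prevents a re-run of the pipeline from duplicating the analysis results.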