## Design of Experiments (Method-focused)

Purpose: validate the computation of maximal continuous reasons and anti-reasons with our distributed method on a representative subset of the UCR univariate time-series datasets.

- Selection source: the dataset notebook builds `results/doe_selection.csv`, stratified by EU complexity (simple/middle/hard). This notebook reuses that file and does not re-derive or tabulate dataset properties.
- Inclusion criteria: EU metrics present in `forest_report.json`; fixed train/test splits; reasonable class balance; preference for datasets with existing result archives in `results/` to accelerate verification.
- Cohorts: simple (lowest eu_complexity), middle (around the median), hard (highest eu_complexity); cohort sizes are balanced, subject to availability.
- Model setup: a Random Forest with 50 trees is trained per dataset (or reused if one already exists); the endpoints universe (EU) is extracted from the split thresholds of its internal nodes.
- Method: compute reasons with distributed workers coordinated through a key–value store; prune candidates via ICF and profile domination; confirm maximality by attempting left and right extensions on each feature.
- Validation outcomes: for each dataset, persist reason/anti-reason artifacts, profiles (GP/BP/AP), and logs for reproducibility; aggregate metrics are summarized only in the dataset notebook.
- Compute considerations: prefer workers co-located with the key–value store to minimize latency; 4–8 workers on the host are recommended, and remote workers are optional when scaling horizontally.
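
The cohort rule in the list above (lowest, median-adjacent, and highest `eu_complexity`, with balanced sizes) can be sketched as follows. This is a minimal illustration, not the dataset notebook's code: `select_cohorts` is a hypothetical helper that assumes one numeric `eu_complexity` score per dataset and breaks ties by dataset name.

```python
from statistics import median


def select_cohorts(eu_complexity: dict[str, float], per_cohort: int) -> dict[str, list[str]]:
    """Split datasets into simple/middle/hard cohorts by eu_complexity.

    Hypothetical helper; the real selection is written by the dataset
    notebook to results/doe_selection.csv.
    """
    if not eu_complexity:
        return {"simple": [], "middle": [], "hard": []}
    # Rank datasets from least to most complex (name breaks ties).
    ranked = sorted(eu_complexity, key=lambda name: (eu_complexity[name], name))
    n = len(ranked)
    k = min(per_cohort, n // 3)  # equal cohort sizes, never overlapping
    simple = ranked[:k]
    hard = ranked[n - k:]  # empty when k == 0
    # Middle cohort: the k remaining datasets closest to the median score.
    med = median(eu_complexity.values())
    taken = set(simple) | set(hard)
    remaining = [name for name in ranked if name not in taken]
    middle = sorted(remaining, key=lambda name: (abs(eu_complexity[name] - med), name))[:k]
    middle.sort(key=ranked.index)  # report in complexity order
    return {"simple": simple, "middle": middle, "hard": hard}
```

Drawing the middle cohort from what remains after removing the extremes keeps the three cohorts disjoint even when the candidate pool is small.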
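
One plausible reading of "endpoints universe extracted from internal thresholds" is: for each feature, the sorted set of distinct split thresholds used by any tree in the forest. The sketch below assumes that reading. `build_endpoints_universe` is hypothetical; it consumes per-tree `(feature, threshold)` arrays in the layout scikit-learn exposes as `tree_.feature` and `tree_.threshold`, where leaf nodes carry a negative feature index.

```python
from collections import defaultdict


def build_endpoints_universe(trees):
    """Collect sorted, distinct split thresholds per feature across a forest.

    `trees` is a list of (features, thresholds) sequences, one pair per tree,
    shaped like scikit-learn's tree_.feature / tree_.threshold arrays.
    Leaf nodes have a negative feature index and are skipped.
    """
    endpoints = defaultdict(set)
    for features, thresholds in trees:
        for feature, threshold in zip(features, thresholds):
            if feature >= 0:  # internal (split) node
                endpoints[int(feature)].add(float(threshold))
    # Deterministic output: features in index order, thresholds ascending.
    return {feature: sorted(values) for feature, values in sorted(endpoints.items())}
```

With a fitted scikit-learn forest, the input could be built as `[(est.tree_.feature, est.tree_.threshold) for est in forest.estimators_]`; whether the pipeline actually uses scikit-learn is an assumption here.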
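
The worker pattern in the method and compute bullets can be illustrated with a shared task queue. In this sketch a thread-safe `queue.Queue` stands in for the key–value store (in a Redis-style deployment, workers would typically pop task IDs from a list and write results back under per-task keys), and threads stand in for worker processes on the host. `run_workers` and `handle` are hypothetical names.

```python
import queue
import threading


def run_workers(tasks, handle, n_workers=4):
    """Fan tasks out to worker threads through a shared queue.

    queue.Queue plays the role of the key-value store; `handle` is a
    hypothetical per-task function (e.g. computing reasons for one instance).
    """
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = todo.get_nowait()  # atomic pop, like popping from a shared list
            except queue.Empty:
                return  # queue drained: this worker exits
            value = handle(task)
            with lock:
                results[task] = value  # write the result back under the task key

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The default `n_workers=4` mirrors the low end of the 4–8 host workers recommended above; because each worker pops tasks atomically, no task is processed twice.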
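
The maximality check can be sketched under one assumption about representation: a continuous reason assigns each feature an interval given as `(lo, hi)` indices into that feature's sorted EU endpoints, and a left/right extension moves `lo` down or `hi` up by one endpoint. `still_reason` is a hypothetical oracle standing in for the forest query that decides whether a widened candidate still fixes the prediction. Under this reading, a reason counts as maximal only when no single-step extension survives.

```python
from typing import Callable, Dict, Tuple

Interval = Tuple[int, int]  # (lo, hi) indices into one feature's EU endpoint list
Reason = Dict[int, Interval]  # feature index -> interval


def is_maximal(reason: Reason, eu_sizes: Dict[int, int],
               still_reason: Callable[[Reason], bool]) -> bool:
    """Return True if no single left/right extension keeps `reason` valid.

    `eu_sizes[f]` is the number of EU endpoints for feature f;
    `still_reason` is a hypothetical validity oracle.
    """
    for feature, (lo, hi) in reason.items():
        candidates = []
        if lo > 0:
            candidates.append((lo - 1, hi))  # extend one endpoint to the left
        if hi < eu_sizes[feature] - 1:
            candidates.append((lo, hi + 1))  # extend one endpoint to the right
        for widened in candidates:
            if still_reason({**reason, feature: widened}):
                return False  # a wider reason exists, so this one is not maximal
    return True
```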


# Results archive inspection

This notebook automatically inspects every `.zip` archive in the `results` directory.
It parses each archive's filename to extract metadata, uses the bundled `manifest.json`
to map each Redis database dump to its logical meaning, and previews all extracted files
below. Large files are truncated to their leading bytes so the notebook stays
responsive.
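
The preview behavior described above can be sketched with the standard-library `zipfile` module. This is an illustration, not the `etl.loader` implementation: `inspect_archive` and its `max_bytes` default are hypothetical, `manifest.json` is parsed as plain JSON without assuming its schema, and filename metadata parsing is omitted because the naming scheme is not documented here.

```python
import io
import json
import zipfile
from pathlib import Path


def inspect_archive(source, max_bytes=512):
    """Return (manifest, previews) for one results archive.

    `source` may be a path, a binary file object, or the archive's raw bytes.
    manifest.json (at any depth) is parsed as JSON; every other member is
    previewed as its first `max_bytes` bytes, decoded leniently.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    manifest, previews = None, {}
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as member:
                if Path(info.filename).name == "manifest.json":
                    manifest = json.load(member)
                    continue
                head = member.read(max_bytes)  # read at most max_bytes bytes
            text = head.decode("utf-8", errors="replace")
            if info.file_size > max_bytes:
                text += f" ... [truncated, {info.file_size} bytes total]"
            previews[info.filename] = text
    return manifest, previews
```

Calling `inspect_archive(path)` for each entry of `zip_paths` from the first cell would yield one manifest and one set of bounded previews per archive.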

In [1]:
from pathlib import Path
from etl.loader import (
    etl, build_db10_worker_report, render_bitmap_heatmaps, render_db0_eu_analysis,
    render_db0_sample_timeseries, render_worker_report,
)

RESULTS_DIR = Path("results")
# Collect every result archive; etl() loads the selected one and returns its parts.
zip_paths = sorted(RESULTS_DIR.glob("*.zip"))
(selected_zip_name, selected_manifest, selected_backups,
 selected_archive_data, selected_manifest_prefix) = etl(zip_paths, RESULTS_DIR)

No ZIP archives found in results directory.
No archive selected.
No DB 0 data available for the current selection.


In [2]:
DB10_WORKER_REPORT = build_db10_worker_report(selected_zip_name, selected_manifest, selected_backups, selected_archive_data, selected_manifest_prefix, max_events=None)

DB 10 statistics are not available.


In [3]:
render_worker_report(selected_zip_name, selected_manifest, selected_backups, selected_archive_data, selected_manifest_prefix)

DB 10 statistics are not available.
No data available for the worker report.


In [4]:
render_db0_eu_analysis()

DB 0 EU entry not available for the current selection.


In [5]:
render_db0_sample_timeseries()

No DB 0 sample time series available.


In [6]:
render_bitmap_heatmaps(selected_manifest, selected_backups)

Bitmap heatmaps unavailable: no backups loaded.
