Add symmetry tests for downloaded ingested datasets #43

Merged
stevevanhooser merged 25 commits into main from claude/add-symmetry-tests-7IWoH on Mar 21, 2026
@stevevanhooser
Contributor

Summary

This PR adds a new symmetry test suite for validating downloaded ingested datasets in the NDI Python stack. The tests mirror the MATLAB equivalents and verify that the Python implementation can correctly open, read, and validate dataset artifacts.

Key Changes

  • New test suite: test_download_ingested.py (make_artifacts)

    • Downloads a pre-built dataset archive (.tgz) from GitHub
    • Extracts and opens the dataset using the NDI Dataset API
    • Generates artifacts including session summaries and document counts
    • Exports results to datasetSummary.json for cross-validation
  • New test suite: test_download_ingested.py (read_artifacts)

    • Parameterized tests that validate artifacts from both MATLAB and Python sources
    • Verifies session lists match expected values
    • Compares per-session summaries using compareSessionSummary()
    • Validates document counts per session
    • Gracefully skips tests when artifacts are unavailable
  • New GitHub Actions workflow: test-symmetry-download.yml

    • Downloads the test dataset archive before running tests
    • Runs Python makeArtifacts suite to generate reference artifacts
    • Runs Python readArtifacts suite to validate the generated artifacts
    • Provides CI/CD integration for symmetry testing

Implementation Details

  • Tests are parameterized over SOURCE_TYPES (matlabArtifacts/pythonArtifacts) to enable cross-platform validation
  • Artifact directory structure mirrors MATLAB test organization for consistency
  • Uses existing NDI utilities (sessionSummary, compareSessionSummary, Query) for validation
  • Graceful error handling with informative skip/fail messages for missing artifacts
  • Dataset ID: 69a8705aa9ab25373cdc6563 (fixed reference dataset)

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
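
The skip-when-missing behaviour described above can be sketched without any NDI imports. The helper name `find_artifact_dir` is hypothetical, and the layout (one directory per source type directly under a common root) is an assumption. Where this sketch returns `None`, a real test would typically call `pytest.skip` with an informative message.

```python
from pathlib import Path
from typing import Optional

# Source types named in the PR description.
SOURCE_TYPES = ("matlabArtifacts", "pythonArtifacts")


def find_artifact_dir(root: Path, source_type: str) -> Optional[Path]:
    """Return the artifact directory for one source type, or None if absent.

    Hypothetical helper: assumes each source type is a directory under root.
    """
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    candidate = root / source_type
    return candidate if candidate.is_dir() else None
```

Returning `None` rather than raising keeps the "artifacts not generated yet" case distinct from a genuine validation failure.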

claude added 25 commits March 19, 2026 22:48
- Add makeArtifacts test that downloads a dataset .tgz from
  Waltham-Data-Science/file-passing, extracts it, opens the dataset,
  and exports session summaries and document counts as artifacts
- Add readArtifacts test that verifies session counts, references,
  session summaries, and document counts from both matlabArtifacts
  and pythonArtifacts sources
- Add GitHub workflow (test-symmetry-download.yml) that downloads
  the dataset with curl before running the symmetry tests

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
The downloadIngested symmetry tests (both MATLAB and Python) need the
69a8705aa9ab25373cdc6563.tgz dataset pre-downloaded to /tmp. MATLAB's
websave doesn't work in GitHub Actions, so this must be done via curl
before either test suite runs.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
…mparison

The database may return probes in different order on re-open or across
languages, causing spurious symmetry test failures. This mirrors the
existing sort for daqSystemNames/daqSystemDetails.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
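
The idea of sorting before comparing can be illustrated as follows. This is a minimal sketch, not the project's code: the probe fields used as the sort key (`name`, `reference`, `type`) and the sample values are assumptions for illustration.

```python
def sort_probes(probes):
    """Return probes in a deterministic order, independent of database order."""
    # Assumed key fields; the real summary structs may use different ones.
    return sorted(probes, key=lambda p: (p["name"], p["reference"], p["type"]))


matlab_probes = [
    {"name": "ctx", "reference": 2, "type": "n-trode"},
    {"name": "ctx", "reference": 1, "type": "n-trode"},
]
python_probes = list(reversed(matlab_probes))  # same probes, different order

# An order-sensitive comparison reports a spurious mismatch.
assert matlab_probes != python_probes
# Sorting both sides first makes the comparison order-independent.
assert sort_probes(matlab_probes) == sort_probes(python_probes)
```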
The existing test-symmetry.yml already runs all symmetry tests
(including the new downloadIngested tests) and now has the curl
download step. A separate workflow is unnecessary.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Three related bugs prevented Python from correctly reading ingested
epochs from the database:

1. document_properties returns a plain dict, but code used attribute
   access (props.base.id, props.epochfiles_ingested.epoch_id, etc.)
   which silently failed. Fixed in ndi_daq_reader, ndi_daq_metadatareader,
   ndi_file_navigator, and ingested2epochs_t0t1_epochclock.

2. session.id is a method but was referenced without calling it
   (self._session.id instead of self._session.id()) in query
   construction and document creation, causing session_id queries
   to always return empty results.

3. Ingested epochprobemap entries stored as TSV strings were returned raw
   instead of being parsed into ndi_epoch_epochprobemap objects.
   Added _parse_epochprobemap_tsv() and fixed _serialize_epochnode
   to handle lists of probemap objects.

Also sorts ingested epochs by epoch_id for deterministic ordering
matching MATLAB's behavior.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
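
The first two bugs in the commit above are easy to reproduce in isolation. The sketch below uses a stand-in class (`FakeSession`, hypothetical) and a hand-written dict rather than real NDI objects. It only shows why attribute access on a dict fails and why comparing against an uncalled method never matches.

```python
class FakeSession:
    """Stand-in for a session object whose id is a method (hypothetical)."""

    def id(self):
        return "session-123"


session = FakeSession()
props = {"base": {"id": "doc-1", "session_id": "session-123"}}  # plain dict

# Bug 1: a plain dict does not support attribute access.
try:
    props.base
    attribute_access_worked = True
except AttributeError:
    attribute_access_worked = False

doc_id = props["base"]["id"]  # key access is what a dict supports

# Bug 2: session.id is a bound method; only session.id() is the string.
matches_without_call = props["base"]["session_id"] == session.id
matches_with_call = props["base"]["session_id"] == session.id()
```

Because a string never equals a bound method, a query built from `session.id` would compare against the wrong value and match nothing, which is consistent with the empty results described in item 2.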
document_properties returns a plain dict, not an attribute-accessible
object. Fix 2 occurrences in mfdaq.py and update 4 test mocks in
test_daq.py to use dict format instead of MagicMock attribute chains.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Include exp1_eg, exp1_eg_saved, exp_sg, immature, and spikesortdemo
example data directories. These are the same example sessions available
in NDI-matlab and are needed by ingestion symmetry tests and useful for
user exploration.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
- Fix session.id (missing parens) in system.py ingest(), which caused
  session_id mismatch errors during ingestion
- Add _serialize_epochprobemap() to serialize epochprobemap objects to
  TSV strings before storing in ingested documents, preventing JSON
  serialization errors

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
ndr.fun.ndrpath is a module, not a callable. Use
`from ndr.fun.ndrpath import ndrpath` to get the actual function.
This fixes the axon ingestion tests which were skipping because
the ABF example data couldn't be found despite being installed.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
MATLAB uses 'testIngestionAxonNDRArtifacts' (American spelling).
Python was using 'testIngestionAxonNDRArtefacts' (British spelling),
causing MATLAB artifact read tests to skip. Fixed both make and read
test files.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Python's json.dumps serializes NaN as literal NaN (non-standard JSON)
which MATLAB's jsondecode cannot parse. Convert NaN to None (null in
JSON) in daqreader_epochdata_ingested t0_t1 values before storing.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
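
The NaN handling described in the commit above can be sketched with the standard library alone. The helper name `nan_to_none` and the sample `t0_t1` value are illustrative assumptions, not the code added in this PR.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None so it serializes as JSON null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, (list, tuple)):
        return [nan_to_none(v) for v in value]
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    return value


t0_t1 = [[0.0, float("nan")]]           # illustrative value only
raw = json.dumps(t0_t1)                 # contains the bare token NaN
clean = json.dumps(nan_to_none(t0_t1))  # contains null instead
```

Passing `allow_nan=False` to `json.dumps` makes it raise `ValueError` on NaN, which can serve as a guard against non-standard output reaching a stricter parser.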
- Remove @lru_cache from find_file_groups, which returned stale file
  lists after raw data deletion, causing ghost epoch detection
- Generate session summary from artifact directory (not original temp
  session) so files/epochNodes match what MATLAB sees when reading
- Write placeholder sessionSummary.json before computing summary so
  the file list includes it
- Re-open session from artifact dir to ensure session doc exists in DB

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
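
The stale-listing problem from the first item above can be reproduced with a toy directory listing. The functions here are hypothetical stand-ins, not `find_file_groups` itself. They only show that `functools.lru_cache` keeps returning the first result for a given argument even after the filesystem changes.

```python
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def list_files_cached(directory: str):
    """Cached listing: the result is frozen at the first call per argument."""
    return tuple(sorted(p.name for p in Path(directory).iterdir()))


def list_files(directory: str):
    """Uncached listing: always reflects the current directory contents."""
    return tuple(sorted(p.name for p in Path(directory).iterdir()))


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "epoch1.rhd"   # illustrative file name
    raw.write_bytes(b"")
    before = list_files_cached(tmp)
    raw.unlink()                     # simulate deleting the raw data
    stale = list_files_cached(tmp)   # cache hit: still lists the deleted file
    fresh = list_files(tmp)          # reflects the deletion
```

An alternative to removing the decorator would be calling `list_files_cached.cache_clear()` after any deletion, at the cost of having to remember to do so at every such call site.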
…string

MATLAB stores the subject document ID (e.g. "00064d75bf4c0e79_06c53364fd25a...")
in probe structs, not the local_identifier string (e.g. "anteater27@nosuchlab.org").
Look up the subject document via does_subjectstring_match_session_document() to
match MATLAB's behavior and fix symmetry test failures.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
…ion_dir

If a session document cannot be created, this should raise an error rather
than silently continuing with a missing session document, which can cause
document count mismatches when other implementations (e.g. MATLAB) open
the same session and create their own session document.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Tarball of all artifacts from /tmp/NDI/symmetryTest/pythonArtifacts/
including JSON documents, SQLite databases, and session summaries
for cross-language comparison debugging.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
- Add getprobes() call after ingestion in intanNDR and axonNDR
  make_artifacts tests. getprobes() creates an element document for
  each probe, matching MATLAB's behavior (10 docs instead of 9).
- Remove macOS resource fork files (._*) after extracting the
  downloaded dataset archive in the downloadIngested make_artifacts test.
- Filter ._* files from session summary comparisons in the
  downloadIngested read_artifacts test.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Instead of hardcoding DATASET_ID as the expected subdirectory name,
find the single directory inside the extracted archive. Raise an error if
there is not exactly one directory.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
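
A minimal sketch of the "exactly one directory" lookup from the commit above, assuming the archive has already been extracted into `extract_root`. The function name and the choice of `RuntimeError` are illustrative, not taken from the test file.

```python
from pathlib import Path


def find_single_directory(extract_root: Path) -> Path:
    """Return the only subdirectory of extract_root.

    Raises RuntimeError if there is not exactly one directory.
    Plain files next to the directory are ignored.
    """
    dirs = [p for p in extract_root.iterdir() if p.is_dir()]
    if len(dirs) != 1:
        raise RuntimeError(
            f"expected exactly one directory in {extract_root}, found {len(dirs)}"
        )
    return dirs[0]
```

Failing loudly on zero or several directories avoids silently opening the wrong dataset if the archive layout changes.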
The ._ files were only being filtered from the actual (Python-generated)
summary but not from the expected (MATLAB-generated) summary in
datasetSummary.json. This caused mismatches when the MATLAB artifacts
archive contained macOS resource fork files.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
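
The fix above amounts to applying the same filter to both sides of the comparison. A sketch under stated assumptions: the helper name and the sample file lists are hypothetical, and file paths are assumed to use forward slashes.

```python
from pathlib import PurePosixPath


def strip_resource_forks(files):
    """Drop macOS resource fork entries (basename starting with '._')."""
    return [f for f in files if not PurePosixPath(f).name.startswith("._")]


# Illustrative file lists: the expected side still carries '._' entries.
expected = ["._session.json", "session.json", "data/._a.bin", "data/a.bin"]
actual = ["session.json", "data/a.bin"]

# Filtering only one side leaves a mismatch; filter both before comparing.
assert expected != strip_resource_forks(actual)
assert strip_resource_forks(expected) == strip_resource_forks(actual)
```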
database_clear("yes") was destroying the session document created by
the ndi_session_dir constructor. This caused a document count mismatch
when MATLAB opened the Python-generated artifacts: MATLAB's constructor
would create a new session document (since none existed), resulting in
10 documents in the DB vs 9 exported JSON files.

Keeping only session.cache.clear() preserves the session document and
matches MATLAB's test behavior.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Regenerated after removing database_clear() from makeArtifacts tests.
Session documents are now preserved (not cleared). All 3 ingestion
artifact directories contain 9 JSON documents each.

MATLAB still reports 10 documents when opening these sessions — the
10th document is likely created by MATLAB's ndi_session_dir constructor.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Python's getprobes() was creating ndi_probe objects in memory but never
calling database_add() for the element document. MATLAB's getprobes
persists the probe element document to the database, producing 10
documents in ingestion sessions vs Python's 9.

Added database_add(probe.newdocument()) after creating new probes,
matching MATLAB behavior. Updated pythonArtifacts.tar.gz with the
corrected artifacts (now 10 documents per ingestion session).

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
stevevanhooser merged commit 44cd884 into main on Mar 21, 2026
5 checks passed
stevevanhooser deleted the claude/add-symmetry-tests-7IWoH branch on March 21, 2026 16:44
2 participants