Add symmetry tests for downloaded ingested datasets #43
Merged
- Add makeArtifacts test that downloads a dataset .tgz from Waltham-Data-Science/file-passing, extracts it, opens the dataset, and exports session summaries and document counts as artifacts
- Add readArtifacts test that verifies session counts, references, session summaries, and document counts from both matlabArtifacts and pythonArtifacts sources
- Add GitHub workflow (test-symmetry-download.yml) that downloads the dataset with curl before running the symmetry tests

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
The downloadIngested symmetry tests (both MATLAB and Python) need the 69a8705aa9ab25373cdc6563.tgz dataset pre-downloaded to /tmp. MATLAB's websave doesn't work in GitHub Actions, so this must be done via curl before either test suite runs. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
…mparison The database may return probes in different order on re-open or across languages, causing spurious symmetry test failures. This mirrors the existing sort for daqSystemNames/daqSystemDetails. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
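A minimal sketch of the order-independent comparison described above. The probe fields (`name`, `reference`, `type`) and the helper `sort_probes` are assumptions for illustration, not the actual structures used in the test suite.

```python
def sort_probes(probes):
    """Return probes in a deterministic order so list comparison ignores DB ordering.

    The sort key (name, reference, type) is a hypothetical choice for this sketch.
    """
    return sorted(probes, key=lambda p: (p["name"], p["reference"], p["type"]))


# Two listings of the same probes, returned in different order.
matlab_probes = [
    {"name": "ctx", "reference": 2, "type": "n-trode"},
    {"name": "ctx", "reference": 1, "type": "n-trode"},
]
python_probes = list(reversed(matlab_probes))

# A naive comparison fails even though the content is identical.
naive_equal = matlab_probes == python_probes

# Sorting both sides first makes the comparison order-independent.
sorted_equal = sort_probes(matlab_probes) == sort_probes(python_probes)
```

Sorting both sides with the same key keeps the test strict about content while tolerating whatever order the database happens to return.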
The existing test-symmetry.yml already runs all symmetry tests (including the new downloadIngested tests) and now has the curl download step. A separate workflow is unnecessary. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Three related bugs prevented Python from correctly reading ingested epochs from the database:

1. document_properties returns a plain dict, but code used attribute access (props.base.id, props.epochfiles_ingested.epoch_id, etc.) which silently failed. Fixed in ndi_daq_reader, ndi_daq_metadatareader, ndi_file_navigator, and ingested2epochs_t0t1_epochclock.
2. session.id is a method but was referenced without calling it (self._session.id instead of self._session.id()) in query construction and document creation, causing session_id queries to always return empty results.
3. Ingested epochprobemap stored as TSV strings were returned raw instead of being parsed into ndi_epoch_epochprobemap objects. Added _parse_epochprobemap_tsv() and fixed _serialize_epochnode to handle lists of probemap objects.

Also sorts ingested epochs by epoch_id for deterministic ordering matching MATLAB's behavior.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
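The first two bugs can be illustrated with a small sketch. `FakeSession` and the contents of `props` are hypothetical stand-ins, not the real NDI classes. Attribute access on a plain dict raises `AttributeError`, which presumably goes unnoticed only when surrounding code catches it.

```python
class FakeSession:
    """Hypothetical stand-in for a session object whose id is a method."""

    def id(self):
        return "session-123"


# Hypothetical document properties as a plain dict.
props = {"base": {"id": "doc-1"}, "epochfiles_ingested": {"epoch_id": "epoch-7"}}

# Bug 1: attribute access on a dict raises AttributeError.
try:
    props.base
    attribute_access_worked = True
except AttributeError:
    attribute_access_worked = False

# Fix: use key access.
epoch_id = props["epochfiles_ingested"]["epoch_id"]

# Bug 2: referencing the method without calling it yields a bound method,
# which never equals the stored session id string.
session = FakeSession()
wrong = session.id
right = session.id()
```

Comparing `wrong` against a stored id is always false, which is consistent with queries on session_id returning no results.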
document_properties returns a plain dict, not an attribute-accessible object. Fix 2 occurrences in mfdaq.py and update 4 test mocks in test_daq.py to use dict format instead of MagicMock attribute chains. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Include exp1_eg, exp1_eg_saved, exp_sg, immature, and spikesortdemo example data directories. These are the same example sessions available in NDI-matlab and are needed by ingestion symmetry tests and useful for user exploration. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
- Fix session.id (missing parens) in system.py ingest(), which caused session_id mismatch errors during ingestion
- Add _serialize_epochprobemap() to serialize epochprobemap objects to TSV strings before storing in ingested documents, preventing JSON serialization errors

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
ndr.fun.ndrpath is a module, not a callable. Use `from ndr.fun.ndrpath import ndrpath` to get the actual function. This fixes the axon ingestion tests which were skipping because the ABF example data couldn't be found despite being installed. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
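The same module-versus-function pitfall exists in the standard library, where the `glob` module contains a `glob` function. The sketch below uses that as an analogy only; it does not touch `ndr`, and the `"*.abf"` pattern is just an illustrative argument.

```python
import glob as glob_module            # the module
from glob import glob as glob_function  # the function inside the module

module_is_callable = callable(glob_module)
function_is_callable = callable(glob_function)

# Calling the module itself raises TypeError ('module' object is not callable).
try:
    glob_module("*.abf")
    call_failed = False
except TypeError:
    call_failed = True
```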
MATLAB uses 'testIngestionAxonNDRArtifacts' (American spelling). Python was using 'testIngestionAxonNDRArtefacts' (British spelling), causing MATLAB artifact read tests to skip. Fixed both make and read test files. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Python's json.dumps serializes NaN as literal NaN (non-standard JSON) which MATLAB's jsondecode cannot parse. Convert NaN to None (null in JSON) in daqreader_epochdata_ingested t0_t1 values before storing. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
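A minimal sketch of the NaN-to-null conversion, using only the standard library. The helper name `nan_to_none` and the sample `t0_t1` value are assumptions for illustration; the actual change in daqreader_epochdata_ingested may be structured differently.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None so json.dumps emits null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, (list, tuple)):
        return [nan_to_none(v) for v in value]
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    return value


t0_t1 = [[0.0, float("nan")]]  # hypothetical epoch bounds with an unknown end time

raw = json.dumps(t0_t1)                 # contains the non-standard token NaN
clean = json.dumps(nan_to_none(t0_t1))  # contains null instead
```

Passing `allow_nan=False` to `json.dumps` makes it raise `ValueError` on NaN, which can help catch values that were missed.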
- Remove @lru_cache from find_file_groups which returned stale file lists after raw data deletion, causing ghost epoch detection
- Generate session summary from artifact directory (not original temp session) so files/epochNodes match what MATLAB sees when reading
- Write placeholder sessionSummary.json before computing summary so the file list includes it
- Re-open session from artifact dir to ensure session doc exists in DB

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
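The stale-cache problem from the first item can be reproduced with a small sketch. `list_files_cached` and `list_files` are hypothetical stand-ins for find_file_groups with and without the cache.

```python
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def list_files_cached(directory):
    """Cached listing: the result is frozen at the first call for each directory."""
    return tuple(sorted(os.listdir(directory)))


def list_files(directory):
    """Uncached listing: always reflects the current directory contents."""
    return tuple(sorted(os.listdir(directory)))


with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.dat")
    open(raw_path, "w").close()

    before = list_files_cached(tmp)  # primes the cache with ("raw.dat",)
    os.remove(raw_path)              # simulate deleting raw data after ingestion

    stale = list_files_cached(tmp)   # still reports the deleted file
    fresh = list_files(tmp)          # reports the real, empty directory
```

Because the cache key is only the directory path, changes on disk never invalidate the entry, so a deleted file keeps appearing as if it still defined an epoch.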
…string MATLAB stores the subject document ID (e.g. "00064d75bf4c0e79_06c53364fd25a...") in probe structs, not the local_identifier string (e.g. "anteater27@nosuchlab.org"). Look up the subject document via does_subjectstring_match_session_document() to match MATLAB's behavior and fix symmetry test failures. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
…ion_dir If a session document cannot be created, this should raise an error rather than silently continuing with a missing session document, which can cause document count mismatches when other implementations (e.g. MATLAB) open the same session and create their own session document. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Tarball of all artifacts from /tmp/NDI/symmetryTest/pythonArtifacts/ including JSON documents, SQLite databases, and session summaries for cross-language comparison debugging. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
- Add getprobes() call after ingestion in intanNDR and axonNDR make_artifacts tests. getprobes() creates an element document for each probe, matching MATLAB's behavior (10 docs instead of 9).
- Remove macOS resource fork files (._*) after extracting the downloaded dataset archive in the downloadIngested make_artifacts test.
- Filter ._* files from session summary comparisons in the downloadIngested read_artifacts test.

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Instead of hardcoding DATASET_ID as the expected subdirectory name, find the single directory inside the extracted archive. Errors if there is not exactly one directory. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
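One way to locate the single extracted directory, sketched with `pathlib`. The function name `find_single_subdirectory` and the choice of `RuntimeError` are assumptions; the actual test may raise a different error type.

```python
import tempfile
from pathlib import Path


def find_single_subdirectory(extract_root):
    """Return the only directory directly inside extract_root.

    Raises RuntimeError if there is not exactly one directory.
    Plain files at the top level are ignored.
    """
    dirs = [p for p in Path(extract_root).iterdir() if p.is_dir()]
    if len(dirs) != 1:
        raise RuntimeError(
            f"Expected exactly one directory in {extract_root}, found {len(dirs)}"
        )
    return dirs[0]
```

This avoids coupling the test to the archive's internal directory name while still failing loudly if the archive layout changes.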
The ._ files were only being filtered from the actual (Python-generated) summary but not from the expected (MATLAB-generated) summary in datasetSummary.json. This caused mismatches when the MATLAB artifacts archive contained macOS resource fork files. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
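A sketch of filtering resource fork files from both sides before comparing. The summary layout (a `files` list of relative paths) and the helper name `strip_resource_forks` are assumptions for illustration.

```python
import os


def strip_resource_forks(files):
    """Drop macOS resource fork entries (basename starting with '._') and sort."""
    return sorted(f for f in files if not os.path.basename(f).startswith("._"))


# Hypothetical summaries: the expected side came from an archive with ._ files.
expected = {"files": ["._session.json", "session.json", "data/._a.bin", "data/a.bin"]}
actual = {"files": ["data/a.bin", "session.json"]}

# Apply the same filter to BOTH sides, not only the actual summary.
expected_files = strip_resource_forks(expected["files"])
actual_files = strip_resource_forks(actual["files"])
summaries_match = expected_files == actual_files
```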
database_clear("yes") was destroying the session document created by
the ndi_session_dir constructor. This caused a document count mismatch
when MATLAB opened the Python-generated artifacts: MATLAB's constructor
would create a new session document (since none existed), resulting in
10 documents in the DB vs 9 exported JSON files.
Keeping only session.cache.clear() preserves the session document and
matches MATLAB's test behavior.
https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Regenerated after removing database_clear() from makeArtifacts tests. Session documents are now preserved (not cleared). All 3 ingestion artifact directories contain 9 JSON documents each. MATLAB still reports 10 documents when opening these sessions — the 10th document is likely created by MATLAB's ndi_session_dir constructor. https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Python's getprobes() was creating ndi_probe objects in memory but never calling database_add() for the element document. MATLAB's getprobes persists the probe element document to the database, producing 10 documents in ingestion sessions vs Python's 9. Added database_add(probe.newdocument()) after creating new probes, matching MATLAB behavior. Updated pythonArtifacts.tar.gz with the corrected artifacts (now 10 documents per ingestion session). https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt
Summary
This PR adds a new symmetry test suite for validating downloaded ingested datasets in the NDI Python stack. The tests mirror the MATLAB equivalents and verify that the Python implementation can correctly open, read, and validate dataset artifacts.
Key Changes
- New test suite: `test_download_ingested.py` (make_artifacts)
  - … `Dataset` API …
  - … `datasetSummary.json` for cross-validation
- New test suite: `test_download_ingested.py` (read_artifacts)
  - … `compareSessionSummary()` …
- New GitHub Actions workflow: `test-symmetry-download.yml`

Implementation Details
- … `SOURCE_TYPES` (matlabArtifacts/pythonArtifacts) to enable cross-platform validation
- … (`sessionSummary`, `compareSessionSummary`, `Query`) for validation
- … `69a8705aa9ab25373cdc6563` (fixed reference dataset)

https://claude.ai/code/session_01X3Dg23mnjFYU1fjJBrahjt