Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(schema): Load subject data #1977

Merged
merged 5 commits into from
May 24, 2024
Merged

Conversation

effigies
Copy link
Collaborator

@effigies effigies commented May 24, 2024

This fixes the implementation of:

  • rules.checks.dataset.SubjectFolders
  • rules.checks.dataset.ParticipantIDMismatch
  • rules.checks.dataset.PhenotypeSubjectsMissing

These started failing with bids-standard/bids-specification#1833, which activated previously disabled checks.

@rwblair

Closes #1978.

@effigies
Copy link
Collaborator Author

The implementation here is really suboptimal. I don't know if we have a way of creating a baseline context.dataset object that just gets passed around, but if so, this should go there. If not, we should make it so we're not potentially parsing many TSV files every time we look at another file.

Copy link

codecov bot commented May 24, 2024

Codecov Report

Attention: Patch coverage is 98.33333% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 87.43%. Comparing base (6e14422) to head (75cb145).

Current head 75cb145 differs from pull request most recent head 0aa1665

Please upload reports for the commit 0aa1665 to get more accurate results.

Files Patch % Lines
bids-validator/src/schema/context.ts 98.18% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1977      +/-   ##
==========================================
+ Coverage   85.68%   87.43%   +1.75%     
==========================================
  Files          91      130      +39     
  Lines        3792     6232    +2440     
  Branches     1220     1510     +290     
==========================================
+ Hits         3249     5449    +2200     
- Misses        457      692     +235     
- Partials       86       91       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@effigies
Copy link
Collaborator Author

Apparently Javascript doesn't allow you to do array equality with == or anything else, so checks like sorted(columns.participant_id) == sorted(dataset.subjects.sub_dirs) will always fail. I think the only way to handle this will be to create an isEqual(x,y) function in the expression language.

@rwblair
Copy link
Member

rwblair commented May 24, 2024

The implementation here is really suboptimal. I don't know if we have a way of creating a baseline context.dataset object that just gets passed around, but if so, this should go there. If not, we should make it so we're not potentially parsing many TSV files every time we look at another file.

I believe that a single reference to the dsContext is used in normal validation. Its instantiated here and then passed to the context generator:
https://github.com/bids-standard/bids-validator/blob/master/bids-validator/src/validators/bids.ts#L48

'll look at if it has the information needed to populate the subjects at creation time. If not we can add a check to the file context to only run load subjects if its undefined. I also need to remember why and when the default dsContext is used.

@effigies
Copy link
Collaborator Author

Added a commit to leave .subjects undefined until the first call, and then only create and populate it if undefined. That way we only duplicate effort if the context is regenerated.

@rwblair
Copy link
Member

rwblair commented May 24, 2024

Looking good.

A heads up the subjects listed in summary aren't dependent on the dsContext. Every context's filename gets checked for a sub entity and that is added as a subject to the summary:
https://github.com/bids-standard/bids-validator/blob/master/bids-validator/src/summary/summary.ts#L106

I think its trying to capture a case in which a malformed dataset has more subjects files in it than it has proper sub- directories. Another unlikely case is that subjects with completely empty subject dirs won't be counted in the summary.

I can see this potentially as a point of confusion, but also don't think it will matter 99.9% of the time. We can change it in a future PR if need be.

@effigies
Copy link
Collaborator Author

I think if there are no files in a subject dir, it won't make it into the fileTree, but I'm okay with a deviation between these if it does.

@effigies effigies merged commit ba937d6 into bids-standard:master May 24, 2024
26 of 31 checks passed
@effigies effigies deleted the feat/subjects branch May 24, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement allequal in expression language.
2 participants