Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADNI_to_BIDS utils : load_clinical_csv, hidden csv file matching, issue #1163 #1195

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

HuguesRoy
Copy link
Contributor

Fix issue #1163

Change re.search by re.match.

It will avoid hidden file matching by constraining to an exact match with the pattern

if we have two files:

  • correct file: ADNIMERGE_25Apr2024.csv
  • hidden file: .ADNIMERGE_25Apr2024.csv

the pattern is "ADNIMERGE"+ r"(_\d{1,2}[A-Za-z]{3}\d{4})?.csv", only the correct file will be returned.

@HuguesRoy HuguesRoy added the fix PR fixing a bug label May 29, 2024
@HuguesRoy HuguesRoy self-assigned this May 29, 2024
@AliceJoubert
Copy link
Contributor

I would prefer changing the rglob regex to not consider hidden files at all (either check on beginning of string or different rglob match but I would need to check how to). WDYT @NicolasGensollen ?

Copy link
Member

@NicolasGensollen NicolasGensollen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @HuguesRoy !

@@ -1644,7 +1644,7 @@ def load_clinical_csv(clinical_dir: str, filename: str) -> pd.DataFrame:

pattern = filename + r"(_\d{1,2}[A-Za-z]{3}\d{4})?.csv"
files_matching_pattern = [
f for f in Path(clinical_dir).rglob("*.csv") if re.search(pattern, (f.name))
f for f in Path(clinical_dir).rglob("*.csv") if re.match(pattern, (f.name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @AliceJoubert. Not listing hidden files and folders will be more explicit than changing search for match.
I believe changing the rglob pattern for .rglob("[!.]*.csv") should do the trick.
Would be nice to add some unit tests to make sure this is working as expected, especially if there are csv files in subfolders where both the subfolders and the files could be hidden.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix PR fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants