Skip to content

[MRI] Check if CenterID and MRI_Alias are defined before going further #1228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ridz1208
Copy link

@ridz1208 ridz1208 commented Feb 7, 2025

No description provided.

@maximemulder
Copy link
Contributor

The $subjectIDsref variable is provided by the configuration so it might be a good thing to check that the site exists in the database before returning from this function in case the configuration is wrong.

Although that is Perl code so anything is fine with me.

@ridz1208
Copy link
Author

ridz1208 commented Feb 9, 2025

@maximemulder you make a good point actually for the database check. And is there a python equivalent to this code that I should be editing too ?

@maximemulder
Copy link
Contributor

maximemulder commented Feb 9, 2025

@ridz1208 Now that I think about it. If you add a database check, there is no reason for the configuration to specify both the site ID and the site MRI alias. Just specify the site ID and get the MRI alias from the database.

Technically there is some equivalent code in the Python DICOM to BIDS scripts here. Note that the logic is not exactly the same, and I don't think the whole script can work if the session is None.

I assume this site configuration feature is new (at least in main) ? It is currently not present in this file that defines the structure of the Python subject information configuration ($subjectIDsref). It would need to be added here. These classes are something I added this year to be able to type check code that uses the configuration, and while it works nicely in the code, if we are to add new optional features to the subject information configuration, it may be more ergonomic to go back to a dict inside the configuration file itself, and only when the file is loaded put that in a pretty SubjectInfo object.

To give my more general and long term impression on that code. I don't like the way the subject information configuration is handled in Python. The get_subject_info function from the configuration is executed in one place, and the resulting SubjectInfo object is used in several places in the code. IMO it would be better to execute the function, and directly use the SubjectInfo object to get the candidate, session, and site of the scan from the database. And not have the subject_info object linger around in the code. Some changes I made this year go in this direction but there is still work to do unfortunately.

@ridz1208
Copy link
Author

ridz1208 commented Feb 9, 2025

@ridz1208 Now that I think about it. If you add a database check, there is no reason for the configuration to specify both the site ID and the site MRI alias. Just specify the site ID and get the MRI alias from the database.

I couldnt agree more, see my first PR on LORIS MRI #1227

I assume this site configuration feature is new (at least in main) ? It is currently not present in this file that defines the structure of the Python subject information configuration ($subjectIDsref)

It's not new per-say, the prod file used by perl allows me to define whatever I want in the subjectIDref and I decided to mod my prod file to do the database check for the site there because I don't consider it done properly in the rest of the code (i.e. too broad preg matches with high risk of false positives) its a mix of having to do it as an override and not wanting to mess with existent behaviour at the same time. As for modifying the python side. Initially I had only found the .loris_mri/database_config.py file where I added this logic as seen below (pardon my python its been a while)

def get_subject_ids(db, dicom_value=None, scanner_id=None):

    subject_id_dict = {}

    imaging = Imaging(db, False)

    phantom_match   = re.search('^phantom_([A-Z]{3,4})_([A-Z]{3,4})_(.+)$', dicom_value, re.IGNORECASE)
    candidate_match = re.search('([^_]+)_(\d+)_([^_]+)', dicom_value, re.IGNORECASE)

    if phantom_match:
        subject_id_dict['isPhantom']  = True
        subject_id_dict['CandID']     = imaging.get_scanner_candid(scanner_id)
        project_alias                 = candidate_match.group(1)
        center_alias                  = candidate_match.group(2)
        projectq = db.pselect(
            "SELECT ProjectID FROM Project WHERE Alias = %s",
            [project_alias, ]
        )
        if len(projectq) > 0:
            subject_id_dict['ProjectID'] = projectq[0]['ProjectID']
        centerq = db.pselect(
            "SELECT CenterID FROM psc WHERE Alias = %s",
            [center_alias, ]
        )
        if len(centerq) > 0:
            subject_id_dict['CenterID'] = centerq[0]['CenterID']
        subject_id_dict['visitLabel'] = candidate_match.group(3)
        subject_id_dict['createVisitLabel'] = 1
        cohort = "Control"
        cohortq = db.pselect(
            "SELECT CohortID FROM cohort WHERE title = %s",
            [cohort, ]
        )
        if len(cohortq) > 0:
            subject_id_dict['CohortID'] = cohortq[0]['CohortID']

....

As for the long term direction... I can't really speak to it. I think there is a very wide range of naming conventions out there and it would be very hard to make a single system that handles all of them whether its in the Perl or python code. On CBIG I try to be as explicit as possible so generally for patient scans I rely as much as possible on the identifiers, for phantoms I ask that the name contain the site, project and visit to avoid any implicitely deduced parameters anywhere... I'm fairly new to these pipeline so i'm sure there is a cleaner way to do it but the question will be how disruptive they would be to existent projects

@maximemulder
Copy link
Contributor

I reviewed #1227. I agree on the broad regex matches and potential false positives in the code.

Thanks for sharing your config. I am pretty sure that as of LORIS-MRI 26 / main, the field "CenterID" in the Python configuration is not supported (and thus ignored). As per the function I posted earlier, the scan site is first gotten from the session, or if no session is found (but I don't see how the code can work then, I need to investigate deeper to be sure), from the site alias or MRI alias. Also, this configuration is only used by the LORIS DICOM to BIDS pipeline for now.

If this meets your needs, do not change the Python configuration for now IMO. I may do it myself in the coming months.

If you're interested in my current LORIS-MRI to-do list, it is the following:

  • Update the new DICOM archive script with the "specify session ID" option (code is done locally, I will test it this week).
  • Finish the incremental BIDS importer (it currently works~ish on C-BIG dev but still incomplete).
  • I guess continue to clean up the Python configuration and implement the site (new item as per this discussion !).

Though keep in mind that I have now a new full-time job (or rather PhD). So this last task may depend on my availability.

@ridz1208
Copy link
Author

ridz1208 commented Feb 10, 2025

I am pretty sure that as of LORIS-MRI 26 / main, the field "CenterID" in the Python configuration is not supported (and thus ignored)

Good to know , I'll change it back. I made the assumption that it works like the perl code

As per the function I posted earlier, the scan site is first gotten from the session, or if no session is found (but I don't see how the code can work then, I need to investigate deeper to be sure), from the site alias or MRI alias.

I'm guessing that was designed to mimic the perl code but I have the same problem as you. I'm not getting the 100% certainty that if a user makes a mistake it wont go rogue and try to pattern match the MRI alias from the patient name (That's really not a functionality I want) and the way it makes it past the session check in the perl code at least is if we are dealing with phantom scans and not patient scans. This is basically the reason for my PR, instead of defaulting to a case which may or may not result in a false positive (here)[https://github.com/aces/Loris-MRI/blob/main/uploadNeuroDB/NeuroDB/MRI.pm#L1122] I'm getting the CenterID myself in the prod file and adding this code to bypass the function entirely (really not ideal but safest change to not breake other projects I assumed)

Like I said I think the issue stems from the lack of consistency and standardization accross projects... you cant make a function to pattern match for a site alias when there is no specific convention where to look. Here is the code I out in my prod file which reduces false positives by pattern matching specific locations to specific entities

example prod file section
     # If patientName is phantom scan or test scan
     # CandID is scanner DCCID (based on site alias)
     # visitLabel is scan patient name
     # Set createVisitLable to 
     #      a. 1 if imaging pipeline should create the visit label (when visit label has not been created yet in the database. 
     #      b. 0 if imaging pipeline should not create the visit label (when visit label has not been created yet in the database. 
     # EXAMPLE PHANTOM PATIENTNAME: phantom_CAPT_CHUM_PHANTOM01
    if ($patientName =~ /^phantom_([A-Z]{3,4})_([A-Z]{3,4})_(.+)$/) {
        $subjectID{'CandID'} = NeuroDB::MRI::my_trim(NeuroDB::MRI::getScannerCandID($scannerID, $db));
        # MATCH 1: Project
        my $projectAlias = $1;
                my $query = "SELECT ProjectID FROM Project WHERE Alias = ?";
        my $sth = $${dbhr}->prepare($query);
        $sth->execute($projectAlias);

        if ($sth->rows > 0) {
            my $row = $sth->fetchrow_hashref();
            $subjectID{'ProjectID'} = $row->{'ProjectID'};
        }

        # MATCH 2: Center
        my $centerAlias = $2;
        my $query = "SELECT CenterID FROM psc WHERE Alias = ?";
        my $sth = $${dbhr}->prepare($query);
        $sth->execute($centerAlias);

        if ($sth->rows > 0) {
            my $row = $sth->fetchrow_hashref();
            $subjectID{'CenterID'} = $row->{'CenterID'};
          $subjectID{'CenterAlias'} = $centerAlias;
        }

        # MATCH 3: Visit
        $subjectID{'visitLabel'} = $3;

        # PRESET: Metadata
        $subjectID{'isPhantom'}  = 1;
        $subjectID{'createVisitLabel'} = 1;

        # PRESET: Cohort
        my $subprojectTitle = "Control";
        my $query = "SELECT CohortID FROM cohort WHERE title = ?";
        my $sth = $${dbhr}->prepare($query);
        $sth->execute($subprojectTitle);

        if ($sth->rows > 0) {
            my $row = $sth->fetchrow_hashref();

@maximemulder
Copy link
Contributor

maximemulder commented Feb 10, 2025

Okay, thanks for the additional context !

That comforts me in my opinion:

  1. Test and merge [Everything] DIE MRI ALIAS DIE #1227 first.
  2. Update this PR to get the MRI alias from the database or raise an error if the site ID is incorrect.
  3. Do not update the Python configuration for now. According to the configuration in the database, C-BIG uses the Perl DICOM to MINC pipeline by default. If you do not run the run_dicom_archive_loader.py (LORIS DICOM to BIDS) script manually it means C-BIG is not using the Python configuration.

FYI, I discussed with Cécile a few months ago and the impression is that the Python configuration was written by mimicking the Perl configuration but that phantoms are generally untested in Python, and might thus be broken.

Copy link

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 20 days.

@github-actions github-actions bot added the Stale label Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants