Arnaud Dhaene edited this page Nov 2, 2021 · 3 revisions

Welcome to the graphmel wiki!

Data

The data is provided by colleagues from CHUV and HES-SO. Contact points are Daniel Abler and Michel Cuendet.

Clinical patient information

The data provided contains patient-level demographic information as well as clinical data, including the results of specific blood tests. The demographic and treatment-logistics information is stored in melanoma_patient-level_summary_anonymized.csv and contains the following features:

TODO: add patient features description

The blood-test-related information is contained in melanoma_info_patient-treatment-blood-mutation_${DATE}_anonymized.csv with ${DATE} being the latest update date in the YYYY-MM-DD format. The features included in this file are the following:
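Since the file name embeds its update date, choosing the most recent export can be automated. The following is a minimal sketch, assuming several dated exports sit side by side; the helper name latest_dated_file and the example dates are hypothetical and not part of the project code.

```python
import re
from datetime import date
from pathlib import Path

# Prefix and suffix mirror the file name quoted above.
PREFIX = "melanoma_info_patient-treatment-blood-mutation_"
SUFFIX = "_anonymized.csv"
DATE_RE = re.compile(
    re.escape(PREFIX) + r"(\d{4}-\d{2}-\d{2})" + re.escape(SUFFIX) + r"$"
)


def latest_dated_file(names):
    """Return the name carrying the most recent YYYY-MM-DD date, or None."""
    dated = []
    for name in names:
        match = DATE_RE.match(Path(name).name)
        if not match:
            continue  # not a blood-test export
        try:
            dated.append((date.fromisoformat(match.group(1)), name))
        except ValueError:
            continue  # looks like a date but is not a valid calendar day
    return max(dated)[1] if dated else None


# Illustrative names only; the dates are made up for this example.
names = [
    PREFIX + "2021-09-14" + SUFFIX,
    PREFIX + "2021-10-28" + SUFFIX,
    "notes.txt",
]
latest = latest_dated_file(names)
```

The same approach would apply to the other ${DATE}-stamped files by swapping the prefix and suffix.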

TODO: add blood features description

Imaging data (lesion- and study-level)

The study-level data are of little use for modeling; they serve mainly as a dataset for verifying treatment logistics. The information can be found in melanoma_study_level_summary_anonymized.csv. The features are as follows:

  • Each patient is attributed to a unique gpcr_id [int]
  • Each study is identified relative to the start of treatment with study_name [str] and the processed study_phase [int]
  • is_before_treatment [bool], is_during_treatment [bool], and is_after_treatment_end [bool] are boolean values that indicate when the study occurred relative to treatment start and end
  • nth_before_treatment [float], nth_after_treatment_start [float], nth_during_treatment [float], and nth_after_treatment_end [float] are the scan numbers relative to treatment start and end
  • n_days_to_treatment_start [int] and n_days_to_treatment_end [int] are the number of days to treatment start and end
  • is_malignant [int] is the aggregate number of malignant lesions in the exam, which is renamed to malignant_lesions [int] during preprocessing
  • Boolean values indicating whether a segmentation exists are contained in brain_seg_exists [bool], bones_seg_exists [bool], spleen_seg_exists [bool], aorta_seg_exists [bool], heart_seg_exists [bool], kidney_right_seg_exists [bool], kidney_left_seg_exists [bool], lung_right_seg_exists [bool], lung_left_seg_exists [bool], and liver_seg_exists [bool]
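The renaming of is_malignant to malignant_lesions could look roughly like the sketch below. It is not the project's actual preprocessing code: the helper names, the sample row, and the assumption that booleans are serialised as "True"/"False" are all illustrative.

```python
import csv
import io

# Study-timing flags listed above; segmentation flags are matched by suffix.
BOOL_COLUMNS = (
    "is_before_treatment",
    "is_during_treatment",
    "is_after_treatment_end",
)


def parse_bool(text):
    # Assumption: booleans are written as "True"/"False" in the CSV.
    return text.strip().lower() == "true"


def preprocess_study_row(row):
    """Cast a raw CSV row (all strings) and rename is_malignant."""
    out = dict(row)
    out["gpcr_id"] = int(out["gpcr_id"])
    # At study level is_malignant is a count, hence the clearer name.
    out["malignant_lesions"] = int(out.pop("is_malignant"))
    for key in list(out):
        if key in BOOL_COLUMNS or key.endswith("_seg_exists"):
            out[key] = parse_bool(out[key])
    return out


# Made-up sample with a subset of the columns described above.
SAMPLE = """gpcr_id,study_name,is_before_treatment,is_malignant,liver_seg_exists
1,study_a,True,3,False
"""
rows = [preprocess_study_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Renaming the column avoids a clash with the lesion-level is_malignant, which is a boolean rather than a count.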

The lesion-level information contains the actual imaging information, which is extracted using the PARS software within the HES-SO lab. It is stored in melanoma_lesion-info_organ-overlap_${DATE}_anonymized_cleaned_all.csv, with ${DATE} being the latest update date in the YYYY-MM-DD format. The lesions dataset contains the following features:

  • Each patient is attributed to a unique gpcr_id [int]
  • Each study is identified relative to the start of treatment with study_name [str] and the processed study_phase [int]
  • roi_id [int] is an identifier for the lesion's ROI (Region Of Interest)
  • roi_name [str] is a textual identifier for the lesion
  • lesion_label_id [int] is an identifier for the lesion's label
  • pars_bodypart_petct [str], pars_region_petct [str], pars_subregion_petct [str], pars_laterality_petct [str] are categorical values output by PARS that help identify the location of the lesion
  • pars_classification_petct [str] is a categorical variable (either benign or suspicious)
  • vol_ccm [float] is the lesion volume in cubic centimeters
  • max_suv_val [float], mean_suv_val [float], min_suv_val [float], and sd_suv_val [float] are the maximum, mean, minimum, and standard deviation of the lesion's SUV (Standardized Uptake Value)
  • is_malignant [bool] is the boolean value of pars_classification_petct == 'suspicious'
  • assigned_organ [str] is the lesion's assigned organ (which is PARS output)
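The relationship between the lesion-level is_malignant flag and the study-level count of malignant lesions can be sketched as follows. This is one plausible reading of the feature descriptions, not a confirmed account of how the provided files were generated; the function names and sample records are hypothetical.

```python
from collections import Counter


def is_malignant(lesion):
    # Lesion-level flag as described above: True when PARS says "suspicious".
    return lesion["pars_classification_petct"] == "suspicious"


def malignant_lesions_per_study(lesions):
    """Count malignant lesions for each (gpcr_id, study_name) pair."""
    counts = Counter()
    for lesion in lesions:
        key = (lesion["gpcr_id"], lesion["study_name"])
        # Adding 0 still registers the study, so all-benign studies show up.
        counts[key] += int(is_malignant(lesion))
    return dict(counts)


# Made-up records carrying only the columns needed here.
lesions = [
    {"gpcr_id": 1, "study_name": "study_a", "pars_classification_petct": "suspicious"},
    {"gpcr_id": 1, "study_name": "study_a", "pars_classification_petct": "benign"},
    {"gpcr_id": 1, "study_name": "study_a", "pars_classification_petct": "suspicious"},
    {"gpcr_id": 2, "study_name": "study_a", "pars_classification_petct": "benign"},
]
per_study = malignant_lesions_per_study(lesions)
```

Grouping on both gpcr_id and study_name matters because study names are defined relative to each patient's treatment start, so the same name can recur across patients.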

Progression labels

The progression labels can be found in melanoma_petct-exams_progression-status_${DATE}_anonymized.csv with ${DATE} being the latest update date in the YYYY-MM-DD format. It contains the following features:

TODO: add progression label features description

Labelling methodology

It is important to note that pseudorecist is a label attributed to a scan by a trained doctor who has access to the reports written by the patient's treating doctor. This means that the label is somewhat indirect: it is assigned solely on the basis of written reports, by a doctor other than the one actually managing the patient's treatment.

In addition, prediction_score contains the results of an NLP model trained on said reports with progression status manually labeled. It is unclear at this stage whether these labels are contained in the training or testing set of the modeling stage. No literature has been published regarding the exact methodology employed by the CHUV/HES-SO team. For this reason, these scores are not used for training within the scope of this project.

Methods

Preprocessing

Fetching and processing

Generating graph representations

Dashboard

EDA

Graph connectivity generator

Modeling

Baseline

Graph Neural Networks

ML Tracking

Validation and testing