Home
Welcome to the graphmel wiki!
The data is provided by colleagues from CHUV and HES-SO. Contact points are Daniel Abler and Michel Cuendet.
The data contains patient-level demographic information as well as clinical data, including the results of specific blood tests. The demographic and treatment-logistics information is stored in `melanoma_patient-level_summary_anonymized.csv`, which contains the following features:
TODO: add patient features description
The blood-test-related information is contained in `melanoma_info_patient-treatment-blood-mutation_${DATE}_anonymized.csv`, where `${DATE}` is the date of the latest update in `YYYY-MM-DD` format. The features included in this file are the following:
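Several files on this page share this `${DATE}` naming pattern, so selecting the most recent export can be automated. The sketch below is an illustration rather than project code: `pick_latest` is a hypothetical helper that takes a list of file names (for instance the result of `os.listdir` on the data folder) and a template containing the literal `${DATE}` placeholder, and returns the matching file with the latest date. The dates in the usage lines are made up.

```python
import re
from datetime import date


def pick_latest(filenames, template):
    """Return the filename matching `template` with the most recent date.

    `template` is a file name from this page in which the literal
    placeholder "${DATE}" stands for a YYYY-MM-DD date.
    Returns None if no filename matches.
    """
    prefix, suffix = template.split("${DATE}")
    pattern = re.compile(
        re.escape(prefix) + r"(\d{4}-\d{2}-\d{2})" + re.escape(suffix)
    )
    dated = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if not match:
            continue
        try:
            # fromisoformat also rejects impossible dates such as 2023-13-01
            dated.append((date.fromisoformat(match.group(1)), name))
        except ValueError:
            continue
    if not dated:
        return None
    # Tuples compare on the date first, so max() picks the latest export
    return max(dated)[1]


# Hypothetical usage with made-up dates
BLOOD = "melanoma_info_patient-treatment-blood-mutation_${DATE}_anonymized.csv"
files = [
    "melanoma_info_patient-treatment-blood-mutation_2023-01-15_anonymized.csv",
    "melanoma_info_patient-treatment-blood-mutation_2023-03-02_anonymized.csv",
    "melanoma_patient-level_summary_anonymized.csv",
]
print(pick_latest(files, BLOOD))
# → melanoma_info_patient-treatment-blood-mutation_2023-03-02_anonymized.csv
```

Matching the full name (prefix, date, suffix) keeps files with a different template, such as the undated patient-level summary, from being picked up by mistake.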
TODO: add blood features description
The study-level data are of limited use for modeling; they serve mainly as a dataset for verifying treatment logistics. This information can be found in `melanoma_study_level_summary_anonymized.csv`, which contains the following features:
- Each patient is assigned a unique `gpcr_id` [int].
- Each study is identified relative to the start of treatment with `study_name` [str] and the processed `study_phase` [int].
- `is_before_treatment` [bool], `is_during_treatment` [bool], and `is_after_treatment_end` [bool] are boolean values that indicate when the study occurred relative to treatment start and end.
- `nth_before_treatment` [float], `nth_after_treatment_start` [float], `nth_during_treatment` [float], and `nth_after_treatment_end` [float] give the scan number relative to treatment start and end.
- `n_days_to_treatment_start` [int] and `n_days_to_treatment_end` [int] are the number of days to treatment start and end.
- `is_malignant` [int] is the aggregate number of malignant lesions in the exam; it is renamed to `malignant_lesions` [int] during preprocessing.
- Boolean values indicating whether a segmentation exists are contained in `brain_seg_exists` [bool], `bones_seg_exists` [bool], `spleen_seg_exists` [bool], `aorta_seg_exists` [bool], `heart_seg_exists` [bool], `kidney_right_seg_exists` [bool], `kidney_left_seg_exists` [bool], `lung_right_seg_exists` [bool], `lung_left_seg_exists` [bool], and `liver_seg_exists` [bool].
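A CSV file stores every value as text, so a loader has to restore the types given in brackets for the study-level features and apply the `is_malignant` to `malignant_lesions` rename. The following is a minimal sketch using only the standard `csv` module; the helper names (`load_studies`, `parse_bool`, `parse_optional`) are hypothetical, and it assumes that booleans are written as `True`/`False` or `1`/`0`, that missing values are empty cells, and that integer columns may have been written as floats (e.g. `3.0`).

```python
import csv
import io

# Column groups taken from the feature list above
BOOL_COLUMNS = [
    "is_before_treatment", "is_during_treatment", "is_after_treatment_end",
] + [f"{organ}_seg_exists" for organ in (
    "brain", "bones", "spleen", "aorta", "heart",
    "kidney_right", "kidney_left", "lung_right", "lung_left", "liver",
)]
FLOAT_COLUMNS = [
    "nth_before_treatment", "nth_after_treatment_start",
    "nth_during_treatment", "nth_after_treatment_end",
]
INT_COLUMNS = [
    "gpcr_id", "study_phase", "n_days_to_treatment_start",
    "n_days_to_treatment_end", "is_malignant",
]


def parse_bool(value):
    # Assumption: booleans are serialized as True/False or 1/0
    text = value.strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_optional(value, cast):
    # Assumption: missing values are empty cells; they become None
    return cast(value) if value.strip() else None


def to_int(value):
    # Accepts "3" as well as "3.0" (floats written by some exporters)
    return int(float(value))


def load_studies(handle):
    """Yield typed study-level rows, renaming is_malignant to malignant_lesions."""
    for row in csv.DictReader(handle):
        for col in BOOL_COLUMNS:
            row[col] = parse_bool(row[col])
        for col in FLOAT_COLUMNS:
            row[col] = parse_optional(row[col], float)
        for col in INT_COLUMNS:
            row[col] = parse_optional(row[col], to_int)
        # Preprocessing rename described in the feature list
        row["malignant_lesions"] = row.pop("is_malignant")
        yield row
```

It can be used as `list(load_studies(open(path, newline="")))`. Failing loudly on an unexpected boolean value, rather than guessing, makes format surprises in a new export visible early; `pandas.read_csv` with explicit `dtype`, `true_values`, and `false_values` would be an equivalent alternative.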
The lesion-level information contains the actual imaging data, extracted with the PARS software in the HES-SO lab. It is stored in `melanoma_lesion-info_organ-overlap_${DATE}_anonymized_cleaned_all.csv`, where `${DATE}` is the date of the latest update in `YYYY-MM-DD` format. The lesions dataset contains the following features:
- Each patient is assigned a unique `gpcr_id` [int].
- Each study is identified relative to the start of treatment with `study_name` [str] and the processed `study_phase` [int].
- `roi_id` [int] is an identifier for the lesion's ROI (Region Of Interest).
- `roi_name` [str] is a textual identifier for the lesion.
- `lesion_label_id` [int] is an identifier for the lesion's label.
- `pars_bodypart_petct` [str], `pars_region_petct` [str], `pars_subregion_petct` [str], and `pars_laterality_petct` [str] are categorical values output by PARS that help locate the lesion.
- `pars_classification_petct` [str] is a categorical variable (either `benign` or `suspicious`).
- `vol_ccm` [float] is the lesion volume in cubic centimeters.
- `max_suv_val` [float], `mean_suv_val` [float], `min_suv_val` [float], and `sd_suv_val` [float] are the maximum, mean, minimum, and standard deviation of the lesion's SUV (Standardized Uptake Value).
- `is_malignant` [bool] is the boolean value of `pars_classification_petct == 'suspicious'`.
- `assigned_organ` [str] is the lesion's assigned organ (an output of PARS).
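The lesion-level `is_malignant` flag is derived directly from the PARS classification, and summing it within an exam gives a per-exam count of malignant lesions, which is what the study-level `is_malignant` column is described as holding. The sketch below illustrates this with hypothetical helpers (`lesion_is_malignant`, `count_malignant_per_study`). It assumes rows are dicts keyed by the column names above and that `(gpcr_id, study_name)` identifies one exam; whether these counts match the study-level column exactly is an assumption, not something stated by the data providers.

```python
from collections import Counter


def lesion_is_malignant(pars_classification_petct):
    """Lesion-level flag as defined above: True iff PARS says 'suspicious'."""
    return pars_classification_petct == "suspicious"


def count_malignant_per_study(lesions):
    """Count malignant lesions per (gpcr_id, study_name).

    `lesions` is an iterable of dict rows from the lesion-level file.
    Assumption: (gpcr_id, study_name) uniquely identifies an exam.
    """
    counts = Counter()
    for lesion in lesions:
        key = (lesion["gpcr_id"], lesion["study_name"])
        # Adding 0 still registers exams that only contain benign lesions
        counts[key] += int(lesion_is_malignant(lesion["pars_classification_petct"]))
    return dict(counts)
```

Keeping exams with zero malignant lesions in the result matters when comparing against the study-level file, since those exams would otherwise look like missing data.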
The progression labels can be found in `melanoma_petct-exams_progression-status_${DATE}_anonymized.csv`, where `${DATE}` is the date of the latest update in `YYYY-MM-DD` format. It contains the following features:
TODO: add progression features description
It is important to note that `pseudorecist` is a label attributed to a scan by a trained doctor who has access to the reports written by the patient's treating physician. The label is therefore indirect: it is attributed by a doctor other than the one actually managing the patient's treatment, based solely on written reports.
In addition, `prediction_score` contains the output of an NLP model trained on these reports, with progression status labeled manually. It is currently unclear whether these labels come from the training set or the test set of that model, and no literature has been published on the exact methodology used by the CHUV/HES-SO team. For this reason, these scores are not used for training within the scope of this project.