New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ehr transformer ICU example #245
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ella.
I've reviewed the files you mentioned (didn't get into all the details though).
Looks good, please see few comments inline
from fuse.utils.ndict import NDict | ||
|
||
|
||
class OpReadDataframeCinC(OpBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use OpReadDataFrame from fuse as is? What do you think?
If we should process the dataframe, we can do it before we pass is to OpReadDataframe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following your suggestion, I moved the creation of the patients dataframe to the "loading stage" and using OpReadDataFrame now.
from ops_read_cinc import OpReadDataframeCinC | ||
from fuse.data.utils.export import ExportDataset | ||
|
||
SOURCE = r"C:/D_Drive/Projects/EHR_Transformer/PhysioNet/predicting-mortality-of-icu-patients-the-physionetcomputing-in-cardiology-challenge-2012-1.0.0/predicting-mortality-of-icu-patients-the-physionet-computing-in-cardiology-challenge-2012-1.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to read the path from env variable and provide instructions on how to download the data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated configuration file to read from the environment variable and the main in dataset.py
|
||
|
||
class OpAddBMI(OpBase): | ||
def __call__(self, sample_dict) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant to all ops: if you think that the op might be useful for other datasets as well (and also for readability).
It's better to also get the input keys and output keys. in this case key_in_height, key_in_weight, key_out_bmi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added parameters of input and output keys
d_static = sample_dict["StaticDetails"] | ||
|
||
if ("Height" in d_static.keys()) & ("Weight" in d_static.keys()): | ||
height = d_static["Height"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect specific units, maybe we should specify it in call comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added information in the comment , please check if it in the right place
|
||
d_visits_sentences = sample_dict["VisitSentences"] | ||
n_visits = len(d_visits_sentences) | ||
# TODO add filtering patients with small number of visits - add to configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put this op in static_pipeline, you can return None in order to filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the filtering patients on the number of visits to the raw data loading method together with filtering based on time (less than X hours in the hospital)
|
||
d_static = sample_dict["StaticDetails"] | ||
|
||
if ("Height" in d_static.keys()) & ("Weight" in d_static.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not? is it a sample you want to keep or not?
If yes, maybe you should add -1 as a dummy value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to add BMI with -1 to the dict of static values in case it can't be calculated from missing Height or Weight.
It will be further converted to nan during percentile calculations and will be skipped in the next steps of trajectories build.
d_static = sample_dict["StaticDetails"] | ||
|
||
if ("Height" in d_static.keys()) & ("Weight" in d_static.keys()): | ||
height = d_static["Height"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's ndict, you can also access it as sample_dict["StaticDetails.Height"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
d_static = sample_dict["StaticDetails"] | ||
|
||
if ("Height" in d_static.keys()) & ("Weight" in d_static.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Height" in d_static
will also work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
# convert continuous measurements to categorical ones based on defined percentiles | ||
|
||
# mapping static clinical characteristics (Age, Gender, ICU type, Height, etc) | ||
for k in sample_dict["StaticDetails"].keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for k in sample_dict["StaticDetails"]:
Will also work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
|
||
class OpMapToCategorical(OpBase): | ||
def __call__(self, sample_dict, percentiles: dict) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- maybe rename percentiles to bins here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean in all places in the code?
…iAI/fuse-med-ml into ehr_transformer_example
…; other misc changes
…into ehr_transformer_example
… comments to files
This example is based on Vadim (@floccinauc) research. |
@ellabarkan , we still need to create a unittest. Can you please work on it in a new PR next week? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With some delay I did a quick review and added few small comments inline 😄
The code looks nice and clean! I think a small unittest can be useful here to make sure the example won't break in the future.
aux_next_vis_classification: ${aux_next_vis_classification} | ||
next_vis_loss_weight: 0.1 | ||
|
||
# uncomment to track in clearml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a better practice to use flags. for example:
config:
clearml:
track: 1 # 0 or 1
project_name: "ehr_transformer"
task_name: ${name}
tags: "fuse_example"
reuse_last_task_id: True
continue_last_task: False
and in the code:
if cfg.clearml.track:
# blablabla
# continue_last_task: False | ||
|
||
|
||
# uncomment for SGD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like the above :)
STATIC_FIELDS = ["Age", "Gender", "Height", "ICUType", "Weight"] | ||
|
||
|
||
class OpAddBMI(OpBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool :)
Suggestion:
if you think its useful - we can start an ops_clincal.py
file and store it over there for future reuse (might apply also for some UKBB ops)
:param track_clearml: optional - to track with clearml provide arguments to start_clearml_logger() | ||
""" | ||
|
||
if track_clearml is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is where we can fit the flag (as a continuation for the previous comment)
@@ -521,3 +521,16 @@ def __call__(self, sample_dict: NDict, key: str, value: Any) -> Union[None, dict | |||
""" | |||
sample_dict[key] = value | |||
return sample_dict | |||
|
|||
|
|||
class OpSetIfNotExist(OpBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add it to the ops list in the data/REAME in a future PR
.