New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NDict
optimization
#271
NDict
optimization
#271
Conversation
…nto sagi/ndict_opt
…ncollate() func + mypy fixe
|
||
# convert continuous measurements to categorical ones based | ||
# on defined bins mapping static clinical characteristics | ||
# (Age, Gender, ICU type, Height, etc) | ||
for k in sample_dict["StaticDetails"]: | ||
sample_dict["StaticDetails"][k] = k + "_" + str(np.digitize(sample_dict["StaticDetails"][k], bins[k])) | ||
sample_dict[f"StaticDetails.{k}"] = k + "_" + str(np.digitize(sample_dict["StaticDetails"][k], bins[k])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small fix to match the new NDict
impl.
Since we now don't return a "real" nested dict, we can't change a returned sub-dict and expect the changes to be reflected in the original dictionary.
This kind of fix might be needed to be applied on other projects that are not covered by the CI tests.
train_metrics["gender_auc"] = Filter( | ||
MetricAUCROC(pred="model.output.gender", target="Gender"), | ||
"filter", | ||
pre_collect_process_func=filter_gender_label_unknown_for_metric, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-reminder:
open an issue on the pre_collect_process_func
that cause to disproportionate amount of NDict's __init__
calls.
for key in keys: | ||
if isinstance(batch[key], torch.Tensor): | ||
for key in batch.keys(): | ||
if isinstance(batch[key], (torch.Tensor, np.ndarray, list)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why we first check for Tensors and ndarrays separately so I changed it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was safer, but let's change and see if someone encounter an issue
fuse/utils/ndict.py
Outdated
return self._stored[key] | ||
|
||
# the key is a prefix for other value(s) | ||
elif self.is_prefix(key): # TODO can be more optimized. we pass here once and in the "get_sub_dict" once again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see TODO
comment.
In my opinion is it enough to leave it like that for now (readability & code reuse VS optimization)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove is_prefix.
Instead return None is get_subdict
fuse/utils/ndict.py
Outdated
def __delitem__(self, key: str) -> None: | ||
""" | ||
:param key: | ||
TODO should we delete both value and prefix ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think?
Currently we delete the value (and not the subdict!)
if the value doesn't exists but the subdict does, we delete the subdict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ans (talked offline):
delete both
Added a self-CR. |
for key in keys: | ||
if isinstance(batch[key], torch.Tensor): | ||
for key in batch.keys(): | ||
if isinstance(batch[key], (torch.Tensor, np.ndarray, list)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was safer, but let's change and see if someone encounter an issue
fuse/utils/ndict.py
Outdated
in deep copy, all values are copied recursively | ||
:param deepcopy: if true, does deep copy, otherwise does shalow copy | ||
|
||
:param deepcopy: if true, does deep copy, otherwise does a shallow copy | ||
""" | ||
if not deepcopy: | ||
return NDict(copy.copy(self._stored)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already_flat=True
fuse/utils/ndict.py
Outdated
NDict._flatten_static(value, cur_prefix, flat_dict) | ||
else: | ||
flat_dict[prefix] = item | ||
return self._stored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return self instead
@@ -163,83 +123,139 @@ def merge(self, other: dict) -> NDict: | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to get NDict input
fuse/utils/ndict.py
Outdated
self[k] = v | ||
|
||
return | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove - and change signature
fuse/utils/ndict.py
Outdated
return self._stored[key] | ||
|
||
# the key is a prefix for other value(s) | ||
elif self.is_prefix(key): # TODO can be more optimized. we pass here once and in the "get_sub_dict" once again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove is_prefix.
Instead return None is get_subdict
fuse/utils/ndict.py
Outdated
suffix_key = None | ||
for kk in self.keypaths(): | ||
if kk.startswith(prefix_key): | ||
suffix_key = kk.replace(prefix_key, "", 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk[len(prefix_key):]
fuse/utils/ndict.py
Outdated
res[suffix_key] = self[kk] | ||
|
||
if suffix_key is None and key not in self: | ||
raise NestedKeyError(key, self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for key not in self
fuse/utils/ndict.py
Outdated
# set the value | ||
element[nested_key[-1]] = value | ||
# delete entire branch | ||
elif self.is_prefix(key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant
fuse/utils/ndict.py
Outdated
if key in self._stored: | ||
return key | ||
|
||
key_parts = key.split(".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use similarity between strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! 🚀
|
||
def keypaths(self) -> List[str]: | ||
def keypaths(self) -> dict_keys: | ||
""" | ||
:return: a list of keypaths (i.e. "a.b.c.d") to all values in the nested dict | ||
""" | ||
return list(self._stored.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the list
fuse/utils/ndict.py
Outdated
""" | ||
return self.keypaths() | ||
|
||
def top_level_keys(self) -> dict_keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> List[str]
✅ Ready for review
Profiling
EHR Transformer (5 epochs)
Not Optimized
Optimized
ISIC (2 epochs)
Not Optimized
Optimized