`NDict` optimization #271

SagiPolaczek · 2023-02-07T19:00:43Z

✅ Ready for review

Profiling

EHR Transformer (5 epochs)

Not Optimized

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

2565    0.003    0.000    0.004    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:150(keys)
2565    0.004    0.000    0.005    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:159(items)
15390    0.043    0.000    0.117    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:223(get_closest_key)
15390    0.021    0.000    0.138    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:294(__contains__)
61560/2565    0.066    0.000    0.185    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:118(_flatten_static)
2565    0.008    0.000    0.193    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:100(flatten)
53905/2575    0.088    0.000    0.203    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:136(_keypaths_static)
2575    0.018    0.000    0.221    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:130(keypaths)
412420    0.497    0.000    1.916    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:172(__getitem__)
2565    4.561    0.002    4.561    0.002 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:264(<listcomp>)
2565    2.039    0.001    7.170    0.003 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:249(indices)
6275340/6272775    4.483    0.000   16.573    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:191(__setitem__)
330290    2.766    0.000   20.023    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:63(__init__)

Optimized

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

       10    0.000    0.000    0.000    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:106(keys)
     2565    0.004    0.000    0.005    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:115(items)
    12835    0.024    0.000    0.027    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:100(keypaths)
     5130    0.026    0.000    0.049    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:149(is_prefix)
     5130    0.019    0.000    0.051    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:224(get_closest_key)
     5130    0.005    0.000    0.056    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:303(__contains__)
     5140    0.022    0.000    0.075    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:47(__init__)
     2565    0.026    0.000    0.076    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:172(get_sub_dict)
153960/143700    0.080    0.000    0.174    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:129(__getitem__)
    56450    0.047    0.000    0.275    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:193(__setitem__)
     5130    8.242    0.002    8.242    0.002 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:267(<listcomp>)
     5130    3.152    0.001   11.579    0.002 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:251(indices)

ISIC (2 epochs)

Not Optimized

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

75/5    0.000    0.000    0.000    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:136(_keypaths_static)
5    0.000    0.000    0.000    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:130(keypaths)
4184    0.006    0.000    0.007    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:159(items)
25104    0.053    0.000    0.173    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:223(get_closest_key)
25104    0.024    0.000    0.197    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:294(__contains__)
65560    0.148    0.000    0.617    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:172(__getitem__)
4189    0.035    0.000    0.709    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:63(__init__)
179337/29303    0.350    0.000    0.960    0.000 /dccstor/mm_hcls/usr/sagi/fuse_2/fuse/utils/ndict.py:191(__setitem__)

Optimized

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

5    0.000    0.000    0.000    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:106(keys)
4184    0.006    0.000    0.007    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:115(items)
20925    0.031    0.000    0.035    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:100(keypaths)
4184    0.042    0.000    0.110    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:172(get_sub_dict)
90664/82296    0.048    0.000    0.191    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:129(__getitem__)
16736    0.105    0.000    0.198    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:149(is_prefix)
25104    0.056    0.000    0.228    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:224(get_closest_key)
25104    0.017    0.000    0.245    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:303(__contains__)
8373    0.070    0.000    0.368    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:47(__init__)
133313/133263    0.125    0.000    0.500    0.000 /dccstor/mm_hcls/usr/sagi/fuse_3/fuse/utils/ndict.py:193(__setitem__)
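For anyone who wants to reproduce tables like the ones above, here is a minimal sketch using the standard `cProfile` and `pstats` modules. The `workload` function is a hypothetical stand-in for a training run, and the filter string is an assumption; for the real measurements it would be something like `ndict.py`.

```python
import cProfile
import io
import pstats


def profile_filtered(func, name_filter: str) -> str:
    """Run func under cProfile and return only the rows matching name_filter."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # sort by cumulative time; the string restriction is a regex matched
    # against the "filename:lineno(function)" column
    stats.sort_stats("cumulative").print_stats(name_filter)
    return buffer.getvalue()


def workload() -> None:
    # hypothetical stand-in for the code being profiled
    d = {}
    for i in range(1000):
        d[f"a.b.{i}"] = i


report = profile_filtered(workload, "workload")
print(report)
```

The report has the same `ncalls / tottime / percall / cumtime` columns as the tables in this description.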


SagiPolaczek · 2023-02-14T08:25:21Z

examples/fuse_examples/multimodality/ehr_transformer/dataset.py


        # convert continuous measurements to categorical ones based
        # on defined bins mapping static clinical characteristics
        # (Age, Gender, ICU type, Height, etc)
        for k in sample_dict["StaticDetails"]:
-            sample_dict["StaticDetails"][k] = k + "_" + str(np.digitize(sample_dict["StaticDetails"][k], bins[k]))
+            sample_dict[f"StaticDetails.{k}"] = k + "_" + str(np.digitize(sample_dict["StaticDetails"][k], bins[k]))


A small fix to match the new NDict implementation.

Since we no longer return a "real" nested dict, modifying a returned sub-dict is not reflected in the original dictionary.

A similar fix may be needed in other projects that are not covered by the CI tests.
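To illustrate the behavior described here, a minimal sketch with a toy class. `FlatNDict` is hypothetical and is not the real `NDict`; it only assumes that values are stored flat under dotted keys and that a prefix lookup builds a new dict.

```python
class FlatNDict:
    """Toy stand-in (not the real NDict): values stored flat under dotted keys."""

    def __init__(self) -> None:
        self._stored = {}

    def __setitem__(self, key: str, value) -> None:
        self._stored[key] = value

    def __getitem__(self, key: str):
        if key in self._stored:
            return self._stored[key]
        # key is a prefix: build a NEW dict from the matching entries
        prefix = key + "."
        sub = {k[len(prefix):]: v for k, v in self._stored.items() if k.startswith(prefix)}
        if not sub:
            raise KeyError(key)
        return sub


sample = FlatNDict()
sample["StaticDetails.Age"] = 67

# writing into the returned sub-dict only changes the temporary copy
sample["StaticDetails"]["Age"] = "Age_3"
unchanged = sample["StaticDetails.Age"]

# writing through the full key path updates the stored value
sample["StaticDetails.Age"] = "Age_3"
updated = sample["StaticDetails.Age"]
```

This is why the diff above switches from `sample_dict["StaticDetails"][k] = ...` to `sample_dict[f"StaticDetails.{k}"] = ...`.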

SagiPolaczek · 2023-02-14T08:28:01Z

examples/fuse_examples/multimodality/ehr_transformer/main_train.py

-        train_metrics["gender_auc"] = Filter(
-            MetricAUCROC(pred="model.output.gender", target="Gender"),
-            "filter",
-            pre_collect_process_func=filter_gender_label_unknown_for_metric,


Self-reminder:

Open an issue about pre_collect_process_func, which causes a disproportionate number of NDict's __init__ calls.

SagiPolaczek · 2023-02-14T08:33:31Z

fuse/utils/data/collate.py

-        for key in keys:
-            if isinstance(batch[key], torch.Tensor):
+        for key in batch.keys():
+            if isinstance(batch[key], (torch.Tensor, np.ndarray, list)):


@mosheraboh

I didn't understand why we first checked for Tensors and ndarrays separately, so I changed it.

It was safer, but let's change it and see if anyone encounters an issue.

SagiPolaczek · 2023-02-14T08:40:49Z

fuse/utils/ndict.py

+            return self._stored[key]
+
+        # the key is a prefix for other value(s)
+        elif self.is_prefix(key):  # TODO can be more optimized. we pass here once and in the "get_sub_dict" once again


Please see the TODO comment.

In my opinion it is enough to leave it as is for now (readability and code reuse vs. optimization).

Remove is_prefix.
Instead, return None in get_sub_dict.

SagiPolaczek · 2023-02-14T08:46:06Z

fuse/utils/ndict.py

+    def __delitem__(self, key: str) -> None:
+        """
+        :param key:
+        TODO should we delete both value and prefix ?


@mosheraboh

What do you think?

Currently we delete the value (and not the sub-dict!).
If the value doesn't exist but the sub-dict does, we delete the sub-dict.

Answer (discussed offline):
delete both
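A minimal sketch of the "delete both" behavior, assuming flat storage keyed by dotted paths. `delete_key` and the plain dict are illustrative only and are not the code in this PR.

```python
def delete_key(stored: dict, key: str) -> None:
    """Delete the value at `key` and every entry nested under `key.`."""
    prefix = key + "."
    # collect first, so the dict is not mutated while iterating over it
    to_delete = [k for k in stored if k == key or k.startswith(prefix)]
    if not to_delete:
        raise KeyError(key)
    for k in to_delete:
        del stored[k]


stored = {"a": 1, "a.b": 2, "a.c.d": 3, "ab": 4}
delete_key(stored, "a")
print(stored)
```

Matching on `key + "."` rather than on `key` alone keeps unrelated keys such as `"ab"` untouched.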

SagiPolaczek · 2023-02-14T08:49:44Z

Added a self-CR.

mosheraboh · 2023-02-14T08:58:03Z

fuse/utils/data/collate.py

-        for key in keys:
-            if isinstance(batch[key], torch.Tensor):
+        for key in batch.keys():
+            if isinstance(batch[key], (torch.Tensor, np.ndarray, list)):


It was safer, but let's change it and see if anyone encounters an issue.

mosheraboh · 2023-02-14T09:00:32Z

fuse/utils/ndict.py

        in deep copy, all values are copied recursively
-        :param deepcopy: if true, does deep copy, otherwise does shalow copy
+
+        :param deepcopy: if true, does deep copy, otherwise does a shallow copy
        """
        if not deepcopy:
            return NDict(copy.copy(self._stored))


already_flat=True

mosheraboh · 2023-02-14T09:01:06Z

fuse/utils/ndict.py

-                NDict._flatten_static(value, cur_prefix, flat_dict)
-        else:
-            flat_dict[prefix] = item
+        return self._stored


return self instead

mosheraboh · 2023-02-14T09:10:36Z

fuse/utils/ndict.py

@@ -163,83 +123,139 @@ def merge(self, other: dict) -> NDict:
        """


Change to accept an NDict as input

mosheraboh · 2023-02-14T09:11:09Z

fuse/utils/ndict.py

            self[k] = v

-        return
+        return self


remove - and change signature

mosheraboh · 2023-02-14T09:14:51Z

fuse/utils/ndict.py

+            return self._stored[key]
+
+        # the key is a prefix for other value(s)
+        elif self.is_prefix(key):  # TODO can be more optimized. we pass here once and in the "get_sub_dict" once again


Remove is_prefix.
Instead, return None in get_sub_dict.

mosheraboh · 2023-02-14T09:16:22Z

fuse/utils/ndict.py

+        suffix_key = None
+        for kk in self.keypaths():
+            if kk.startswith(prefix_key):
+                suffix_key = kk.replace(prefix_key, "", 1)


kk[len(prefix_key):]
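A short comparison of the two forms, as a sketch. The key paths and the trailing dot in `prefix_key` are assumptions made for the example.

```python
prefix_key = "model.output."
keypaths = ["model.output.gender", "model.output.age", "data.label"]

# form used in the diff: replace the first occurrence of the prefix
by_replace = [kk.replace(prefix_key, "", 1) for kk in keypaths if kk.startswith(prefix_key)]

# suggested form: the prefix is already known to be at position 0, so slice it off
by_slice = [kk[len(prefix_key):] for kk in keypaths if kk.startswith(prefix_key)]

print(by_replace == by_slice)
```

After the `startswith` check both forms give the same suffixes. Slicing skips the substring search and makes it explicit that only the leading prefix is removed.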

mosheraboh · 2023-02-14T09:16:50Z

fuse/utils/ndict.py

+                res[suffix_key] = self[kk]
+
+        if suffix_key is None and key not in self:
+            raise NestedKeyError(key, self)


return None

no need for key not in self
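One way the suggestion could look, as a sketch only. The functions below work on a plain flat dict, and `get_item` is a hypothetical name; the real methods live on `NDict` and raise `NestedKeyError` rather than `KeyError`.

```python
from typing import Any, Dict, Optional


def get_sub_dict(stored: Dict[str, Any], key: str) -> Optional[Dict[str, Any]]:
    """Collect all entries under `key.` in a single pass; None if there are none."""
    prefix_key = key + "."
    res = {kk[len(prefix_key):]: v for kk, v in stored.items() if kk.startswith(prefix_key)}
    return res if res else None


def get_item(stored: Dict[str, Any], key: str) -> Any:
    """Return the value at `key`, or the sub-dict below it; raise if neither exists."""
    if key in stored:
        return stored[key]
    sub = get_sub_dict(stored, key)  # one scan, no separate is_prefix() call
    if sub is None:
        raise KeyError(key)
    return sub


stored = {"a.b": 1, "a.c": 2, "d": 3}
sub = get_item(stored, "a")
missing = get_sub_dict(stored, "x")
```

The caller decides what a `None` result means, so the key paths are scanned once instead of once in `is_prefix` and again in `get_sub_dict`.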

mosheraboh · 2023-02-14T09:20:02Z

fuse/utils/ndict.py

-        # set the value
-        element[nested_key[-1]] = value
+        # delete entire branch
+        elif self.is_prefix(key):


mosheraboh · 2023-02-14T09:23:46Z

fuse/utils/ndict.py

+        if key in self._stored:
+            return key
+
+        key_parts = key.split(".")


use similarity between strings
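One possible reading of this suggestion, as a sketch: the standard library's `difflib.get_close_matches` ranks candidates by string similarity. `closest_key` and the sample keys are hypothetical; this is not necessarily what the PR ended up doing.

```python
import difflib
from typing import Iterable, Optional


def closest_key(key: str, known_keys: Iterable[str]) -> Optional[str]:
    """Return the known key most similar to `key`, or None if there are no keys."""
    # cutoff=0.0 means: always return the best candidate, however weak the match
    matches = difflib.get_close_matches(key, list(known_keys), n=1, cutoff=0.0)
    return matches[0] if matches else None


known = ["model.output.gender", "model.output.age", "data.label"]
suggestion = closest_key("model.output.gendr", known)
print(suggestion)
```

Such a lookup is only needed on the error path, so its cost should not affect the hot `__getitem__` path.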

mosheraboh

Looks great! 🚀

mosheraboh · 2023-02-14T13:41:09Z

fuse/utils/ndict.py


-    def keypaths(self) -> List[str]:
+    def keypaths(self) -> dict_keys:
        """
        :return: a list of keypaths (i.e. "a.b.c.d") to all values in the nested dict
        """
        return list(self._stored.keys())


remove the list
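The difference between returning the keys view and a list copy, as a small sketch on a plain dict (the keys are made up):

```python
stored = {"a.b": 1, "a.c": 2}

view = stored.keys()            # dict_keys view: no copy, reflects later changes
snapshot = list(stored.keys())  # list: copies every key at call time

stored["d"] = 3

print(len(view), len(snapshot))
print("a.b" in view)  # membership on the view is a hash lookup
```

Returning the view avoids building a new list on every `keypaths()` call, which matters when the method is called thousands of times per epoch.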

mosheraboh · 2023-02-14T13:41:40Z

fuse/utils/ndict.py

+        """
+        return self.keypaths()
+
+    def top_level_keys(self) -> dict_keys:


-> List[str]

Sagi Polaczek added 3 commits February 7, 2023 16:17

first commit

3f68785

optimizing and mypy changes

6ac5dc1

progress in opt ndict. still draft

13142f4

SagiPolaczek added the enhancement New feature or request label Feb 7, 2023

SagiPolaczek marked this pull request as draft February 7, 2023 19:01

Sagi Polaczek added 16 commits February 7, 2023 22:39

fix ci failures

1c183e4

Merge branch 'master' into sagi/ndict_opt

3a2bf8c

some fixes to opt + more unittesting

c70159f

more fixes & unit tests

3b9a852

minor docu

e92b635

Merge branch 'master' into sagi/ndict_opt

1ae9e43

adopt EHR dataset to the opt ndict

d77a22c

minors

06eb7ba

Merge branch 'master' into sagi/ndict_opt

ae52af4

Merge branch 'master' into sagi/ndict_opt

cdf2160

Merge branch 'sagi/ndict_opt' of github.com:BiomedSciAI/fuse-med-ml i…

1c2f22f

…nto sagi/ndict_opt

after some cleanup

b1842f5

minors before profiling

3bf8abe

fixed something in EHR's main + fixed ndict keys() func + minors in u…

6c04780

…ncollate() func + mypy fixe

fixes in (mainly) print_tree() func

45ea62c

minors before CR

015559d

SagiPolaczek commented Feb 14, 2023


Merge branch 'master' into sagi/ndict_opt

fca45b8

SagiPolaczek marked this pull request as ready for review February 14, 2023 08:51

SagiPolaczek requested a review from mosheraboh February 14, 2023 08:51

mosheraboh reviewed Feb 14, 2023


address CR + changed 'keys()' - 'top_level_keys()'

c1f594d

SagiPolaczek requested a review from mosheraboh February 14, 2023 13:42

mosheraboh previously approved these changes Feb 14, 2023


returns dict_keys object

cdad20b

SagiPolaczek dismissed mosheraboh’s stale review via cdad20b February 14, 2023 13:54

SagiPolaczek requested a review from mosheraboh February 14, 2023 14:12

mosheraboh approved these changes Feb 14, 2023


SagiPolaczek merged commit b01394f into master Feb 14, 2023

SagiPolaczek deleted the sagi/ndict_opt branch March 8, 2023 17:03
