
Add event broadcasting capability #672

Open · wants to merge 1 commit into base: mli-feature
Conversation

@ankona (Contributor) commented Aug 21, 2024

No description provided.

@ankona changed the title from "Add event broadcasting via XxxCommChannel" to "Add event broadcasting capability" on Aug 21, 2024
@ankona requested a review from AlyssaCote on August 21, 2024 20:20

codecov bot commented Aug 21, 2024

Codecov Report

Attention: Patch coverage is 0% with 183 lines in your changes missing coverage. Please review.

Please upload report for BASE (mli-feature@5d85995). Learn more about missing BASE report.

Files                                                    Patch %   Lines
...mli/infrastructure/storage/backbonefeaturestore.py     0.00%    162 Missing ⚠️
...m/_core/mli/infrastructure/storage/featurestore.py     0.00%     18 Missing ⚠️
...e/mli/infrastructure/storage/dragonfeaturestore.py     0.00%      3 Missing ⚠️
Additional details and impacted files


@@              Coverage Diff               @@
##             mli-feature     #672   +/-   ##
==============================================
  Coverage               ?   69.84%           
==============================================
  Files                  ?      103           
  Lines                  ?     8708           
  Branches               ?        0           
==============================================
  Hits                   ?     6082           
  Misses                 ?     2626           
  Partials               ?        0           
Files                                                    Coverage Δ
...e/mli/infrastructure/storage/dragonfeaturestore.py     0.00% <0.00%> (ø)
...m/_core/mli/infrastructure/storage/featurestore.py     0.00% <0.00%> (ø)
...mli/infrastructure/storage/backbonefeaturestore.py     0.00% <0.00%> (ø)

@ankona marked this pull request as ready for review August 21, 2024 20:43
Comment on lines +77 to +73
# todo: consider that this could (under load) never exit. do we need
# to configure a maximum number to pull at once?
Contributor

Hmmmm didn't think about this. We'd want all of the messages within the channel though, right?

@ankona (Contributor, author) Aug 21, 2024

I think it's possible, with a while loop, to never run out of new messages to receive. It doesn't even need to be a ton of buffered messages, just a constant stream.

I think a for i in range(max_message_retrievals) may ensure that it doesn't get stuck, and any loop triggering retrieval would end up getting called again anyway...
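A minimal sketch of that bounded-retrieval idea, assuming a hypothetical channel whose recv returns None when empty and an arbitrary max_message_retrievals cap (neither is the actual DragonCommChannel API):

```python
def drain_channel(channel, max_message_retrievals: int = 100) -> list:
    """Pull at most `max_message_retrievals` messages so a steady stream
    of incoming messages cannot keep the loop alive forever."""
    messages = []
    for _ in range(max_message_retrievals):
        message = channel.recv()  # assumed to return None when the channel is empty
        if message is None:
            break
        messages.append(message)
    return messages
```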

Contributor

I wonder if this should be the responsibility of the caller and not DragonCommChannel. It seems like recv might be less prone to edge-case problems like the one described if it simply returned the next message on the channel, or None if the channel is empty. It is then up to the caller to take as many messages as it should based on the context of what it is doing. Is this a good idea / bad idea / too big of a change / nonsense? Another approach might be that the timeout is not reset but is decremented each iteration, so it is applied across all messages. Something strikes me as too many responsibilities or unintuitive about the way this is designed now.
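For illustration, a rough sketch of the "decrement the timeout across iterations" alternative; the channel API and names here are assumptions, not the PR's implementation:

```python
import time

def drain_with_budget(channel, recv_timeout: float) -> list:
    """Spend a single timeout budget across the whole drain instead of
    resetting it for every message."""
    messages = []
    deadline = time.monotonic() + recv_timeout
    while (remaining := deadline - time.monotonic()) > 0:
        message = channel.recv(timeout=remaining)  # assumed per-call timeout
        if message is None:  # assumed to return None when nothing arrives in time
            break
        messages.append(message)
    return messages
```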

Comment on lines +53 to +54
# The tests in this file belong to the group_a group
pytestmark = pytest.mark.group_a
Contributor

I think these tests are getting skipped in group a

Contributor Author

It's OK for them to get skipped if dragon isn't installed; that will be done on purpose with dragon = pytest.importorskip("dragon").

However, if the dragon library is there, we can run all of these tests without needing a "real dragon" environment (we only need to be able to import the dragon components/modules successfully). For example, these will run on hotlum with a normal pytest invocation like pytest ./tests/test_featurestore.py.
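For reference, a minimal sketch of the skip-on-missing-import pattern described above; only the importorskip and pytestmark lines mirror the discussion, the test body is a placeholder:

```python
import pytest

# Skip this whole module at collection time when the dragon package cannot
# be imported; no "real dragon" environment is required otherwise.
dragon = pytest.importorskip("dragon")

# The tests in this file belong to the group_a group
pytestmark = pytest.mark.group_a


def test_dragon_is_importable() -> None:
    # Illustrative placeholder; the real tests exercise the feature store.
    assert dragon is not None
```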

Contributor

Okay so we're cool with these tests being skipped in github actions then. Just wanted to double check.

Contributor Author

I think you've still got a valid point... I didn't achieve what I thought I did, after reviewing the gh action. Investigating...

tests/dragon/utils/channel.py (outdated, resolved)
tests/mli/channel.py (outdated, resolved)
tests/test_featurestore.py (outdated, resolved)
@mellis13 (Contributor) left a comment

Just some comments before seeing the integration additions.

:param channel: a channel to use for communications
:param recv_timeout: a default timeout to apply to receive calls"""
serialized_ch = channel.serialize()
safe_descriptor = base64.b64encode(serialized_ch).decode("utf-8")
Contributor

What do you mean by a "safe" descriptor? Have we run into a bug/issue?
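For context on the encoding itself, the base64 step only makes the raw serialized bytes printable and safe to pass around as a plain string; a minimal stdlib round-trip (the descriptor bytes below are made up for illustration):

```python
import base64

# Stand-in for channel.serialize() from the snippet above.
serialized_ch = b"\x00\x01raw channel descriptor bytes\xff"

descriptor = base64.b64encode(serialized_ch).decode("utf-8")  # printable ASCII string
assert base64.b64decode(descriptor) == serialized_ch          # lossless round-trip
```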

return f"{self.uid}|{self.category}"


class OnCreateConsumer(EventBase):
Contributor

Consumer is used throughout this class (and the EventCategory). Would Subscriber be more accurate?


CONSUMER_CREATED: str = "consumer-created"
FEATURE_STORE_WRITTEN: str = "feature-store-written"
UNKNOWN: str = "unknown"
Contributor

Is UNKNOWN used by other packages? I was wondering if DEFAULT would be better.

:raises SmartSimError: if any unexpected error occurs during send"""
try:
self._save_to_buffer(event)

Contributor

Extra space?

try:
comm_channel.send(next_event)
num_sent += 1
except Exception as ex:
Contributor

If one channel broadcast fails, then an error is raised. This means that other channels that may be OK will never get the message, because it was already popped and not placed back on the deque. I wonder if we should store failed sends and still go through the other channels? We would still need to think about whether the failed broadcasts should be retried or just thrown away...
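A rough sketch of the "store failed sends and keep going" idea; the function shape and the deferred retry policy are assumptions, not what the PR currently does:

```python
from collections import deque
from typing import Tuple

def broadcast(event, comm_channels) -> Tuple[int, deque]:
    """Try every channel, collecting failed sends so one bad channel
    cannot prevent the healthy ones from receiving the event."""
    num_sent = 0
    failed: deque = deque()
    for comm_channel in comm_channels:
        try:
            comm_channel.send(event)
            num_sent += 1
        except Exception:
            # Defer the policy decision: the caller can retry these
            # failed sends later or drop them, per the discussion above.
            failed.append(comm_channel)
    return num_sent, failed
```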

:raises SmartSimError: if any unexpected error occurs during send"""
try:
self._save_to_buffer(event)

Contributor

extra space?
