Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[@card] Realtime cards (1/N) #1550

Merged
merged 18 commits into from Dec 5, 2023
Merged

[@card] Realtime cards (1/N) #1550

merged 18 commits into from Dec 5, 2023

Conversation

valayDave
Copy link
Collaborator

@valayDave valayDave commented Sep 26, 2023

Realtime Cards

PR Stack

  1. Interface changes to current.card and machinery needed to render the cards / data-updates from the user code ([@card] Realtime cards (1/N) #1550) [Review-able]
  2. Metaflow Card UI Code changes to support real-time updates of components ([@card] Realtime cards (2/N) #1551) [Review-able]
  3. Make internal MetaflowCard and some MetaflowCardComponents real-time updatable ([@card] Realtime cards (3/N)  #1552) [Review-able]
  4. Introduce a card server command to launch a local card server that will allow viewing cards in "realtime" ([@card] Realtime cards (4/N) #1553) [WIP]

Metaflow UI Changes PRs

  1. Expose API in the Metaflow UI Service [[realtime-cards] backend API Methods metaflow-service#402]
  2. Make Client side changes in the Netflix/metaflow-ui

Implementation Memo

Key Changes in the PR

  1. Introduce the current.card.refresh interface for rendering / pushing data updates
  2. Introduce the current.card.components interface to update components
  3. Introduce the MetaflowCardComponent.update interface
  4. Introduce the MetaflowCard.render_runtime and MetaflowCard.refresh interfaces to render cards during runtime and handle the logic for data updates
  5. Lift card creation logic outside the card decorator so it can be called independently even if multiple decorators are given to the same card. We needed to change this because the current.card object is a singleton that is only instantiated once per task and is responsible for catching runtime rendering calls. Since all the card rendering logic was inside the instance of a decorator, the individual instance of a decorator was bound to only a single card. This PR lifts that logic into a separate abstraction called CardCreator

@valayDave valayDave marked this pull request as ready for review September 26, 2023 22:47
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstraction introduced to ensure that card creation logic can be safely handled by the CardComponentCollector class

@@ -88,15 +89,15 @@ def __init__(self, flow_datastore, pathspec=None):
self._temp_card_save_path = self._get_write_path(base_pth=TEMP_DIR_NAME)

@classmethod
def get_card_location(cls, base_path, card_name, card_html, card_id=None):
chash = sha1(bytes(card_html, "utf-8")).hexdigest()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing from using hash to uuids since hashes will change for cards rendered during runtime.

Comment on lines +188 to +184
self.attributes,
self.card_options,
editable=self._is_editable,
customize=customize,
runtime_card=self._is_runtime_card,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional arguments are provided to _add_card so that they can be piped down when rendering a new card using the CardCreator class.

Comment on lines -220 to -280
if len(component_strings) > 0:
temp_file = tempfile.NamedTemporaryFile("w", suffix=".json")
json.dump(component_strings, temp_file)
temp_file.seek(0)
executable = sys.executable
cmd = [
executable,
sys.argv[0],
]
cmd += self._create_top_level_args() + [
"card",
"create",
runspec,
"--type",
self.attributes["type"],
# Add the options relating to card arguments.
# todo : add scope as a CLI arg for the create method.
]
if self.card_options is not None and len(self.card_options) > 0:
cmd += ["--options", json.dumps(self.card_options)]
# set the id argument.

if self.attributes["timeout"] is not None:
cmd += ["--timeout", str(self.attributes["timeout"])]

if self._user_set_card_id is not None:
cmd += ["--id", str(self._user_set_card_id)]

if self.attributes["save_errors"]:
cmd += ["--render-error-card"]

if temp_file is not None:
cmd += ["--component-file", temp_file.name]

response, fail = self._run_command(
cmd, os.environ, timeout=self.attributes["timeout"]
)
if fail:
resp = "" if response is None else response.decode("utf-8")
self._logger(
"Card render failed with error : \n\n %s" % resp,
timestamp=False,
bad=True,
)

def _run_command(self, cmd, env, timeout=None):
fail = False
timeout_args = {}
if timeout is not None:
timeout_args = dict(timeout=int(timeout) + 10)
try:
rep = subprocess.check_output(
cmd, env=env, stderr=subprocess.STDOUT, **timeout_args
)
except subprocess.CalledProcessError as e:
rep = e.output
fail = True
except subprocess.TimeoutExpired as e:
rep = e.output
fail = True
return rep, fail
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this logic got lifted into card_creator.py

Comment on lines +35 to +56
# RELOAD_POLICY determines whether UIs should
# reload intermediate cards produced by render_runtime
# or whether they can just rely on data updates

# the UI may keep using the same card
# until the final card is produced
RELOAD_POLICY_NEVER = "never"

# the UI should reload card every time
# render_runtime() has produced a new card
RELOAD_POLICY_ALWAYS = "always"

# derive reload token from data and component
# content - force reload only when the content
# changes. The actual policy is card-specific,
# defined by the method reload_content_token()
RELOAD_POLICY_ONCHANGE = "onchange"

# this token will get replaced in the html with a unique
# string that is used to ensure that data updates and the
# card content matches
RELOAD_POLICY_TOKEN = "[METAFLOW_RELOAD_TOKEN]"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important note on how card reload is expected to behave.

class WarningComponent(ErrorComponent):
def __init__(self, warning_message):
super().__init__("@card WARNING", warning_message)


class ComponentStore:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this abstraction to handle under-the-hood manipulation of components when users call current.card.components

Comment on lines 178 to 209
## Usage Patterns :

```python
current.card["mycardid"].append(component, id="comp123")
current.card["mycardid"].extend([component])
current.card["mycardid"].refresh(data) # refreshes the card with new data
current.card["mycardid"].components["comp123"] # returns the component with id "comp123"
current.card["mycardid"].components["comp123"].update()
current.card["mycardid"].components.clear() # Wipe all the components
del current.card["mycardid"].components["mycomponentid"] # Delete a component
current.card["mycardid"].components["mynewcomponent"] = Markdown("## New Component") # Set a new component
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usage patterns for current.card["abc"] or current.card

return len(self._components)


class CardComponentManager:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • CardComponentCollector: manages all cards decorators present on a @step
  • CardComponentManager : manages the additon/remove/updates of individual components for a specific cards. This is returned when users call current.card["abc"]
  • ComponentStore : Abstraction to handle the actual operations (append/extend/updates) over the components when users access components through the CardComponentManager.components

return

# FIXME document
def refresh(self, task, data):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structure of data is given by the return value of CardComponentManager._get_latest_data

Copy link
Contributor

@romain-intel romain-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully done reviewing. First pass. I'll circle back after I looked at all the PRs to get a better understanding instead of asking stupid questions first. SOme comments though.

@@ -32,9 +32,34 @@ class MetaflowCard(object):
JSON-encodable dictionary containing user-definable options for the class.
"""

# RELOAD_POLICY determines whether UIs should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase this a bit. If I understood correctly, reload really refers to the "html" or basically the "non-data" part and refers to things like change of layout and what not. I would change the variable to something like FULL_RELOAD_POLICY or something that shows that it is not just about "reloading" data. It can be a bit confusing ow since there are actually two "loads:

  • the card itself (layout, etc)
  • the data.

The data is reloaded much more frequently.

metaflow/plugins/cards/card_modules/card.py Show resolved Hide resolved
metaflow/plugins/cards/card_modules/card.py Show resolved Hide resolved
metaflow/plugins/cards/card_modules/card.py Outdated Show resolved Hide resolved
metaflow/plugins/cards/component_serializer.py Outdated Show resolved Hide resolved
return _add_token_html(mf_card.render(task, data))
else:
return _add_token_html(mf_card.render(task))
elif mode == "render_runtime":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We should check for both that data is not None and has the appropriate field. It uses it in the reload_token.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Fix!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This function now robustly handles many cases. I have also added a comment that explains what it is doing. I also added more comments in the card create function that helps understand the behavior for different modes and failure scenarios.

except:
if render_error_card:
error_stack_trace = str(UnrenderableCardException(type, options))
else:
raise UnrenderableCardException(type, options)
#

if error_stack_trace is not None:
if error_stack_trace is not None and mode != "refresh":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused on the mode "refresh" vs "render_runtime"? Is this a typo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, It's not a typo. Currently, the card_cli command takes three modes:

  1. refresh: Involves passing JSON data used for data updates (JSON shipped to S3)
  2. render_runtime: Involves rendering the card during runtime (HTML shipped to S3)
  3. render: Involves rendering the card after runtime has completed execution. (HTML shipped to S3)

If the mode is refresh, we need to ensure that we don't create error cards, which happens in the line below this line.

data_file = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
json.dump(data, data_file)
data_file.seek(0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would close both files here if we are done with them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File deletion is taken care of by the cli using the --delete-input-files argument in card cli. So overall if the file wasn't closed, it shouldn't be an issue, since the tempfile will be out of scope once the function completes execution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably prefer a pattern like

with tempfile....:
  # do stuff

as that would ensure cleanup even if the subprocess has an issue and never gets to cleanup. Here it feels like there are more cases where files can leak and stick around. We could use a similar mechanism to the one in S3 if you wanted to keep them around for debugging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More context here.

many of these commands will run in async mode, so if we use a context manager, then we may delete the file before:

  1. the async process is created
    or
  2. the async process has read the file.

The card create cli command deletes the file right after reading it.

metaflow/plugins/cards/component_serializer.py Outdated Show resolved Hide resolved
It exposes the `append` /`extend` methods like an array to add components.
It also exposes the `__getitem__`/`__setitem__` methods like a dictionary to access the components by thier Ids.

The reason this has dual behavior is because components cannot be stored entirely as a map/dictionary because
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that dicts are ordered per the language starting in py 3.7 (insertion order) so we may be able to simplify this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great! It will certainly remove all the complexity of managing/updating two objects (a dict and list). We can keep the class as is but make its internal implementation much much simpler!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't changed yet right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't changed; I think changing this will result in breaking functionality for anything < py37; Currently cards can work < py37

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are fine with new functionality not working for such old versions of python. This will actually work in python 3.6+ so it's really just 3.5. Up to you but it would simplify things a bit :) .

@valayDave valayDave force-pushed the valay/rtc-stack-0 branch 4 times, most recently from 21d21fb to fd1502d Compare October 18, 2023 00:44
Copy link
Collaborator Author

@valayDave valayDave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed @romain-intel 's comments and added a little more code. Here is the summary of all the new changes added to the code :

  1. card_cli.py
    • Refactored the update_card method to handle many different cases gracefully.
    • The logic of the function is the same, it just gracefully handles the cases where render_runtime/refresh are not implemented
    • The main logic responsible rendering error cards now robustly handles cases of timeout and missing implementation of the render_runtime method.
    • Added more comments that help with the readability of the code.
  2. card.py
    • rename MetaflowCard.IS_RUNTIME_CARD to MetaflowCard.RUNTIME_UPDATABLE
    • rename MetaflowCardComponent.id to MetaflowCardComponent.component_id
  3. card_creator.py:
    • fix bug for the edge case were async sidecars handling card creation may not have completed execution.
  4. card_decorator.py:
    • Refactor based on the rename of MetaflowCard.IS_RUNTIME_CARD to MetaflowCard.RUNTIME_UPDATABLE
  5. component_serializer.py:
    • Refactor based on the rename of MetaflowCardComponent.id to MetaflowCardComponent.component_id
    • Remove the FIXME in the __setitem__ function to not allow overwriting to card components via current.card.components[ID] = NewComponent
    • Add functionality to the current.card.refresh method that allows re-rendering cards when the layout of the card has changed. This mean that when new components are added to a card or when components are removed from a card then a re-render should be triggered. Modified the ComponentStore to expose a property that tells when the component array was last modified. The CardComponentManager uses this property in the ComponentStore to determine if a re-render should be triggered.
    • Add checks to ensure that objects passed to current.card.refresh(data) are JSON serializable. If these are not JSON serializable then warn the user that they are not JSON serializable.
  6. exception.py
    • Introduce a new exception for when users try to over-write components.

Comment on lines 710 to 731
# In the entire card rendering process, there are a few cases we want to handle:
# - [mode == "render"]
# 1. Card is rendered successfull (We store it in the datastore as a HTML file)
# 2. Card is not rendered successfully and we have --save-error-card flag set to True
# (We store it in the datastore as a HTML file with stack trace)
# 3. Card render timeout and we have --save-error-card flag set to True
# (We store it in the datastore as a HTML file with stack trace)
# 4. `render` returns nothing and we have --save-error-card flag set to True.
# (We store it in the datastore as a HTML file with some message saying you returned nothing)
# - [mode == "render_runtime"]
# 1. Card is rendered successfully (We store it in the datastore as a HTML file)
# 2. `render_runtime` is not implemented but gets called and we have --save-error-card flag set to True.
# (We store it in the datastore as a HTML file with some message saying the card should not be a runtime card if this method is not Implemented)
# 3. `render_runtime` is implemented and raises an exception and we have --save-error-card flag set to True.
# (We store it in the datastore as a HTML file with stack trace)
# 4. `render_runtime` is implemented but returns nothing and we have --save-error-card flag set to True.
# (We store it in the datastore as a HTML file with some message saying you returned nothing)
# 5. `render_runtime` is implemented but times out and we have --save-error-card flag set to True.
# (We store it in the datastore as a HTML file with stack trace)
# - [mode == "refresh"]
# 1. Data update is created successfully (We store it in the datastore as a JSON file)
# 2. `refresh` is not implemented. (We do nothing. Don't store anything.)
# 3. `refresh` is implemented but it raises an exception. (We do nothing. Don't store anything.)
# 4. `refresh` is implemented but it times out. (We do nothing. Don't store anything.)

Copy link
Collaborator Author

@valayDave valayDave Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romain-intel : since the code block below has a fair amounts of complexity in it; this comment is meant to simplify the understanding of its behavior. I did this because we now have more modes to the command than earlier.

Comment on lines 738 to 768
if (
rendered_info.is_implemented
and rendered_info.timed_out
and mode != "refresh"
and render_error_card
):
timeout_stack_trace = (
"\nCard rendering timed out after %s seconds. "
"To increase the timeout duration for card rendering, please set the `timeout` parameter in the @card decorator. "
"\nStack Trace : \n%s"
) % (timeout, rendered_info.timeout_stack_trace)
rendered_content = error_card().render(
task,
stack_trace=timeout_stack_trace,
)
elif (
rendered_info.is_implemented
and rendered_info.data is None
and render_error_card
and mode != "refresh"
):
rendered_content = error_card().render(
task, stack_trace="No information rendered From card of type %s" % type
)
elif (
not rendered_info.is_implemented
and render_error_card
and mode == "render_runtime"
):
message = (
"Card of type %s is a runtime time card with no `render_runtime` implemented. "
"Please implement `render_runtime` method to allow rendering this card at runtime."
) % type
rendered_content = error_card().render(task, stack_trace=message)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tiny drive-by chore. This code block achieves the following:

  1. If a card has timed out then provide a nicer visible message of what timed out and the stack trace of the timeout
  2. If a card returned no information then inform the user that the render method returned nothing so nothing got rendered.
  3. If a card that gets set to RUNTIME_UPDATABLE doesn't have a render_runtime method implemented then inform that to the user.

Comment on lines 60 to 68
def _get_data(self) -> Optional[dict]:
# currently an internal method to retrieve a card's data.
data_paths = self._card_ds.extract_data_paths(
card_type=self.type, card_hash=self.hash, card_id=self._card_id
)
if len(data_paths) == 0:
return None
return self._card_ds.get_card_data(data_paths[0])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The card client now has an internal method to get data updates. This will be useful when building the card server and modifying the metaflow UI codebase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is going to be used by a card server/UI, I would make it public

@@ -172,7 +181,7 @@ def _get_card(self, index):
if index >= self._high:
raise IndexError
path = self._card_paths[index]
card_info = self._card_ds.card_info_from_path(path)
card_info = self._card_ds.info_from_path(path, suffix=CardNameSuffix.CARD)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by function rename since it's now got a multi-faceted utility.

The info_from_path can be used to infer information from a path for data updates / cards. info_from_path provides a named tuple containing hash / id / type etc.

Comment on lines +191 to +209
if _async_proc and _async_proc.poll() is None:
if time.time() - _async_started > async_timeout:
CardProcessManager._remove_card_process(card_uuid)
# Since we have removed the card process, we are free to run a new one
# This will also ensure that when a old process is removed a new one is replaced.
return self._run_command(
cmd, card_uuid, env, wait=wait, timeout=timeout
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a process took longer then timeout then kill the process and re-run the command that re-creates the process.

Comment on lines 118 to 131
@property
def component_id(self):
return self._component_id

@component_id.setter
def component_id(self, value):
if not isinstance(value, str):
raise TypeError("id must be a string")
self._component_id = value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romain-intel : Replaced id in favor of component_id

Comment on lines +107 to +125
if self._component_map.get(key) is not None:
# This is the equivalent of calling `current.card.components["mycomponent"] = Markdown("## New Component")`
# We don't support the replacement of individual components in the card.
# Instead we support rewriting the entire component array instead.
# So users can run `current.card[ID] = [FirstComponent, SecondComponent]` which will instantiate an entirely
# new ComponentStore.
# So we should throw an error over here, since it is clearly an operation which is not supported.
raise ComponentOverwriteNotSupportedException(
key, self._user_set_id, self._card_type
)
else:
self._store_component(value, component_id=key)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tuulos : FYI

What's allowed :

current.card["mycard_id"] = [FirstComponent, SecondComponent]

Whats NOT allowed:

current.card["mycard_id"].components["mycomponent"] = NewComponent

If users try to over-write individual components in a card then we raise an Exception. The key reason for doing this and not throwing warnings is to make it explicit and avoid any behaviors that confuse users. Here is an example to explain its reasoning. Consider the following code block :

current.card.append(Markdown("hi"), id="A")
current.card.components["A"].update("new Hi")

current.card.components["A"] = Image(...)
current.card.components["A"].update(new_image_bytes)

If we throw warnings instead of an exception, the above case will throw a TypeError or a ValueError since component A was still a markdown but the user assumed it was an image after setting current.card.components["A"] = Image(...). This may end up confusing the user. Instead, it's better to be completely unambiguous about supported behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

Comment on lines 290 to 300
# This block of code will render the card in `render_runtime` mode when:
# 1. refresh is called with `force=True`
# 2. Layout of the components in the card has changed. i.e. The actual elements in the component array have changed.
# 3. The last time the card was rendered was more the minimum interval after which they should be rendered.
last_rendered_before_minimum_interval = (
nu - self._last_refresh
) > RUNTIME_CARD_RENDER_INTERVAL
layout_has_changed = (
self._last_layout_change != self.components.layout_last_changed_on
)

if force or last_rendered_before_minimum_interval or layout_has_changed:
self._render_seq += 1
self._last_render = nu
self._card_proc("render_runtime")
self._last_layout_change = self.components.layout_last_changed_on
else:
self._card_proc("refresh")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code logic that determines whether to call the card CLI in render_runtime mode or refresh mode.

Comment on lines +722 to +783
def components(self):
# FIXME: document
if self._default_editable_card is None:
if len(self._card_component_store) == 1:
card_type = list(self._cards_meta.values())[0]["type"]
_warning_msg = [
"Card of type `%s` is not an editable card." % card_type,
"Components list will not be updated and `current.card.components` will not work for any call during this runtime execution.",
"Please use an editable card", # todo : link to documentation
]
else:
_warning_msg = [
"`current.card.components` cannot disambiguate between multiple @card decorators.",
"Components list will not be accessible and `current.card.components` will not work for any call during this runtime execution.",
"To fix this set the `id` argument in all @card when using multiple @card decorators over a single @step and reference `current.card[ID].components`", # todo : Add Link to documentation
"to update/access the appropriate card component.",
]
if not self._warned_once["components"]:
self._warning(" ".join(_warning_msg))
self._warned_once["components"] = True
return

return self._card_component_store[self._default_editable_card].components
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tuulos : Something interesting is happening in this case where we may need to change the behavior. The current behavior of cards is that current.card.append will result in warnings when you try to add components to a card that doesn't exist or when current.card cannot figure out the default editable card.

Now that we are supporting runtime updates, we need to have a clear story of how we deal with this scenario:

# `current.card` cannot disambiguate the default editable card
current.card.append(componentA, id="a") # this will throw a warning. 

current.card.components["a"].update(...) # This will crash since currently we are not returning anything
componentA.update(...) # this will work, but nothing will change on the UI

The main point is that if append and extend only throw warnings, it will also affect how the code behaves when update is called. Should we now throw exceptions when users try to append to cards that cannot be disambiguated?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main case we want to handle is card updates hidden deep inside a library, referring to a card that you forgot to add --with card. To continue the policy that cards are best-effort and will never crash the user code, we could handle even

current.card.components["a"].update(...)

leniently and make a non-existing ID return a stub object that just shows a warning. I don't think this is a common case but being lenient sometimes and not other times is confusing too. We could just be always lenient - just show warnings, never crash

metaflow/plugins/cards/component_serializer.py Outdated Show resolved Hide resolved
@valayDave valayDave force-pushed the valay/rtc-stack-0 branch 3 times, most recently from 6890054 to ec58dd0 Compare October 19, 2023 21:48
Comment on lines +706 to +701
rendered_info = CardRenderInfo(
mode=mode,
is_implemented=True,
data=None,
timed_out=False,
timeout_stack_trace=None,
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug discovered because of Tests.

@valayDave
Copy link
Collaborator Author

valayDave commented Oct 19, 2023

Tests are failing rn because they are timing out get_card method called in the testing code; We are calling get_card to verify that cards are getting created during runtime. This happens when we run all tests at the same time and not when we individually run only the card tests in the same gh action runner. Will keep posted once I find a fix.

Comment on lines +22 to +30
# We run realtime cards tests separately because there these tests validate the asynchronous updates to the
# information stored in the datastore. So if there are other processes starving resources then these tests will
# surely fail since a lot of checks have timeouts.
run_runtime_card_tests() {
CARD_GRAPHS="small-foreach,small-parallel,nested-branches,single-linear-step,simple-foreach"
cd test/core && PYTHONPATH=`pwd`/../../ python3 run_tests.py --num-parallel 8 --contexts python3-all-local-cards-realtime --graphs $CARD_GRAPHS && cd ../../
}

install_deps && install_extensions && run_tests && run_runtime_card_tests
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the changes from the latest commit I pushed. This will ensure that the realtime cards tests run after the normal metaflow test suite and are not included as a part of the test suite. The card tests will fail if there are too many processes; When too many processes are running it starves the metaflow's card-update processes of resources. Due to starvation of resources metaflow's card-update processes are unable to ship card/data updates and the test time's out. When ran in the happy case(i.e. no resource starvation) tests test succeed.

Nothing else has been changed in the latest rebase.

Copy link
Contributor

@romain-intel romain-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have a few high level concerns and a bunch of small comments. At a high level:

  • I think we can clean up the render/render_runtime methods a bit and make it clearer around what refreshable cards need.
  • I am not sure if we wouldn't need "section" in the data file given the breaking up of a card into components
  • I have some concerns around the launching of subprocesses and the possibility of having run-away ones.

@@ -68,8 +94,43 @@ def render(self, task) -> str:
"""
return NotImplementedError()

# FIXME document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please document. I've been trying to add docstrings and all that so please document according to that. In particular, metaflow.Task (and not Task) and Data can be Optional[Any] I suppose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @tuulos ;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense - render takes a metaflow.Task always

@component_id.setter
def component_id(self, value):
if not isinstance(value, str):
raise TypeError("id must be a string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Component ID must be a string.

rendered_info = None
def update_card(mf_card, mode, task, data, timeout_value=None):
"""
This method will be resposible for returning creating a card/data-update based on the `mode` passed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: responsible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also "returning creating"?

metaflow/plugins/cards/card_cli.py Show resolved Hide resolved
metaflow/plugins/cards/card_cli.py Outdated Show resolved Hide resolved
It exposes the `append` /`extend` methods like an array to add components.
It also exposes the `__getitem__`/`__setitem__` methods like a dictionary to access the components by thier Ids.

The reason this has dual behavior is because components cannot be stored entirely as a map/dictionary because
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't changed yet right?

metaflow/plugins/cards/component_serializer.py Outdated Show resolved Hide resolved
or self._last_layout_change is None
)

if force or last_rendered_before_minimum_interval or layout_has_changed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can proably bypass stuff if force is true :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand on that? I didn't quite get what meant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I menat that the part before this can probably be skipped since we know we are going to force it.

metaflow/plugins/cards/card_creator.py Show resolved Hide resolved
metaflow/plugins/cards/card_creator.py Show resolved Hide resolved
metaflow/plugins/cards/card_creator.py Show resolved Hide resolved
metaflow/plugins/cards/card_datastore.py Show resolved Hide resolved
metaflow/plugins/cards/card_decorator.py Show resolved Hide resolved
metaflow/plugins/cards/card_decorator.py Show resolved Hide resolved
@@ -68,8 +94,43 @@ def render(self, task) -> str:
"""
return NotImplementedError()

# FIXME document
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @tuulos ;

It exposes the `append` /`extend` methods like an array to add components.
It also exposes the `__getitem__`/`__setitem__` methods like a dictionary to access the components by thier Ids.

The reason this has dual behavior is because components cannot be stored entirely as a map/dictionary because
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't changed; I think changing this will result in breaking functionality for anything < py37; Currently cards can work < py37

metaflow/plugins/cards/component_serializer.py Outdated Show resolved Hide resolved
or self._last_layout_change is None
)

if force or last_rendered_before_minimum_interval or layout_has_changed:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand on that? I didn't quite get what meant

tuulos and others added 7 commits November 20, 2023 23:03
- added interface to MetaflowCardComponent for making it REALTIME_UPDATABLE
- add realtime component rendering capabiity.
- Changed ville's abstraction of CardComponents to CardComponentManager
- introduced the `current.card.components` / `current.card['abc'].components` interface
- `current.card.components` interface helps access/remove components
- abstraction to handle card creation for decorator (and in future outside code)
- Create CardProcessManager class that helps manage the card processes running
- Remove `_card_proc` logic from code.
- fix card-refresh-bug caused by cardproc instance-method refactor
- the `_card_proc` method is an instance method of a card decorator which is passed to componentCollector. This is done so that ComponentCollector can call the method when a refresh is called for an individual card.
- Since there is only one copy of the ComponentCollector it created an issue when other cards were trying to call refresh (since ComponentCollector was instantiated with a **single card decorator's** `_card_proc`)
- This commit refactored the code to handle multiple cards calling refresh
1. `card_cli.py`
    - Refactored the `update_card` method to handle many different cases gracefully.
    - The logic of the function is the same, it just gracefully handles the cases where `render_runtime`/`refresh` are not implemented
    - The main logic responsible rendering error cards now robustly handles cases of timeout and missing implementation of the `render_runtime` method.
    - Added more comments that help with the readability of the code.
2. `card.py`
    - rename `MetaflowCard.IS_RUNTIME_CARD` to `MetaflowCard.RUNTIME_UPDATABLE`
    - rename `MetaflowCardComponent.id` to `MetaflowCardComponent.component_id`
3. `card_creator.py`:
    - fix bug for the edge case were async sidecars handling card creation may not have completed execution.
4. `card_decorator.py`:
    - Refactor based on the rename of `MetaflowCard.IS_RUNTIME_CARD` to `MetaflowCard.RUNTIME_UPDATABLE`
5. `component_serializer.py`:
    - Refactor based on the rename of `MetaflowCardComponent.id` to `MetaflowCardComponent.component_id`
    - Remove the `FIXME` in the `__setitem__` function to not allow overwriting to card components via `current.card.components[ID] = NewComponent`
    - Add functionality to the `current.card.refresh` method that allows re-rendering cards when the layout of the card has changed. This mean that when new components are added to a card or when components are removed from a card then a re-render should be triggered. Modified the `ComponentStore` to expose a property that tells when the component array was last modified. The `CardComponentManager` uses this property in the `ComponentStore` to determine if a re-render should be triggered.
    - Add checks to ensure that objects passed to `current.card.refresh(data)` are JSON serializable. If these are not JSON serializable then warn the user that they are not JSON serializable.
    - introduced a `component_update_ts` filed in the return value of `current.card._get_latest_data` method.
    - Added comments to `current.card._get_lastest_data` for readability.
6. `exception.py`
    - Introduce a new exception for when users try to over-write components.
- Added methods to card datastore to read data updates
- Add a private method in card client that can help get the latest data.
- Refactored methods in the code to explictly provide all arguments to disambiguate between data reads and HTML reads.
- component-id setter
- safe-access components that don't exist
- warnings for `update` methods of un-implemented UserComponents
- warnings when components don't exist
… processes

- There is a case where we could have a async process and another non async process run in parallel. This is something we are not okay with because the async one(render_runtime) can override the values of the sync process.
- This commit fixes this race-condition.
- When we call the final render+refresh, we will wait for any async process to finish complete execution and ensure that both methods/processes complete execution.
- Introduced a `runtime_data` property that has access to the final data object when `render` is called.
- Remove the usage of `inspect`
Copy link
Contributor

@romain-intel romain-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I looked at the diff you sent me and quickly at this. I think it looks ok so we should be good for an initial merge. I haven't extensively played with it too much but I trust you ahve :).

@romain-intel romain-intel merged commit f8e4e6b into master Dec 5, 2023
22 checks passed
@romain-intel romain-intel deleted the valay/rtc-stack-0 branch December 5, 2023 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants