Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Telemetry [:commanded, :process_manager, :handle] #418

Merged
merged 4 commits into from
Jan 28, 2021

Conversation

davydog187
Copy link
Contributor

@davydog187 davydog187 commented Dec 7, 2020

Adds telemetry events for Process Managers.

Contributes to #387

Adds telemetry events for Process Managers.
process_manager_name: String.t() | Inspect.t(),
process_manager_module: module(),
process_state: term(),
process_uuid: String.t()}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also passes the commands dispatched

@davydog187
Copy link
Contributor Author

Some feedback from @slashdotdash a few weeks ago

Ben Smith @slashdotdash Dec 09 15:23
Did you look at what happens when events are retried on error as the handle_event_error function calls back into the process_unseen_event function when the error callback returns :retry?

Looking at how the telemetry is implemented I think it would do:

[:commanded, :process_manager, :handle, :start] (1st attempt)
[:commanded, :process_manager, :handle, :start] (2nd attempt)
[:commanded, :process_manager, :handle, :start] (3rd attempt)
[:commanded, :process_manager, :handle, :start] (Nth attempt)

[:commanded, :process_manager, :handle, :stop | :exception] (Nth attempt)

[:commanded, :process_manager, :handle, :stop | :exception] (3rd attempt)
[:commanded, :process_manager, :handle, :stop | :exception] (2nd attempt)
[:commanded, :process_manager, :handle, :stop | :exception] (1st attempt)

Dave Lucia @davydog187 Dec 09 15:23
i didn't, its a good point
i'll add some tests to validate, what would you expect to happen if that occurs?

Ben Smith @slashdotdash Dec 09 15:27
The event handler telemetry does the following:
[:commanded, :process_manager, :handle, :start] (1st attempt)
[:commanded, :process_manager, :handle, :stop | :exception] (1st attempt)
[:commanded, :process_manager, :handle, :start] (2nd attempt)
[:commanded, :process_manager, :handle, :stop | :exception] (2nd attempt)
[:commanded, :process_manager, :handle, :start] (Nth attempt)
[:commanded, :process_manager, :handle, :stop | :exception] (Nth attempt)

Dave Lucia @davydog187 Dec 09 15:27


Ben Smith @slashdotdash Dec 09 15:27
The telemetry span covers each attempt and stop (error) or exception
It might be worth including command dispatch in the Process manager telemetry too

Dave Lucia @davydog187 Dec 09 15:28
it does on the stop event
i think i missed that in the docs

Ben Smith @slashdotdash Dec 09 15:28
Both event handling and invidual command dispatches can be failed and retried

Dave Lucia @davydog187 Dec 09 15:28
i'll add a comment
oh i see what you mean
having events just for the dispatch
i was thinking that the dispatcher should handle that, no?

Ben Smith @slashdotdash Dec 09 15:30
It could be:

[:commanded, :process_manager, :handle, :start]
for each command:

[:commanded, :process_manager, :dispatch, :start]
[:commanded, :process_manager, :dispatch, :stop]
after dispatching

[:commanded, :process_manager, :handle, :stop]
Since handling an event for a process manager instance covers the event handling and all command dispatching


{:stop, _error, _state} = reply ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this was here previously, it's not part of the handle/2 API

Copy link
Member

@slashdotdash slashdotdash Jan 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{:stop, reason :: term()} is a supported return value for a process manager's error/3 callback function.

It isn't required, so safe to remove.

Comment on lines +216 to +221
commands
|> List.wrap()
|> dispatch_commands(opts, state, event)
|> case do
:ok ->
telemetry_stop(start_time, telemetry_metadata, commands)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I edited this slightly as the previous formatting was tripping my brain for some reason. I can change back if you prefer the with

_metadata}
end

test "events are emitted with discrete start/stop on retries" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test doesn't cover all retry cases, covering every one would be a bit of a burden, but i'm happy to do it if we think its worthwhile.

[:commanded, :process_manager, :handle, :exception]
],
fn event_name, measurements, metadata, reply_to ->
num = Agent.get_and_update(agent, fn num -> {num, num + 1} end)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decorated with an incrementing number here so i could make sure the assert_receive's were being received in the right order. Without this you don't get that guarantee, so the tests were misleading

@slashdotdash slashdotdash merged commit 8b35122 into commanded:master Jan 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants