Success markers telemetry #10065

alwx · 2021-11-02T17:36:03Z

Proposed changes:

fix Success markers: telemetry #9830

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)

alwx · 2021-11-02T17:38:29Z

This PR adds events specified in this comment: #9830 (comment)

usc-m

Not 100% sure if we should be referring to this as "Markers Evaluation" - maybe "Evaluation - Success Markers" (@aeshky @ka-bu any opinions?). Otherwise looks good though!

usc-m · 2021-11-02T17:41:36Z

docs/docs/telemetry/events.json

+      ]
+    },
+    "Markers Stats Computed": {
+      "description": "Triggered when marker stats has been computed.",


Suggested change

"description": "Triggered when marker stats has been computed.",

"description": "Triggered when marker stats have been computed.",

aeshky

Thanks for the PR @alwx.
Are we not tracking the number of markers defined? (I can see that it's not in the final list here). If not, why did we decide to drop it? It's informative because it tells us whether users have 1-2 KPIs or are abusing the feature by defining dozens of things to track.

aeshky · 2021-11-02T18:51:49Z

docs/docs/telemetry/events.json

+      "required": [
+        "count"
+      ]


I think this is saying that count is required. However, it's only required when strategy is first_n or sample. Or am I misunderstanding how this is used?

Yep, that's a mistake, fixed it in the latest commit.
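
The point about `count` can be made concrete with a small check. This is only a sketch: `missing_required` is a hypothetical helper, not part of Rasa or of any schema tooling, and the strategy name `all` used in the usage note is assumed for illustration (the thread only names `first_n` and `sample`).

```python
from typing import Any, Dict, List

# Strategies that need an explicit count, per the review comment above.
STRATEGIES_NEEDING_COUNT = {"first_n", "sample"}


def missing_required(properties: Dict[str, Any]) -> List[str]:
    """Return the names of properties that are required but absent.

    Hypothetical helper: `strategy` is always required, `count` only
    when the strategy selects a fixed number of trackers.
    """
    missing = []
    if "strategy" not in properties:
        missing.append("strategy")
    elif (
        properties["strategy"] in STRATEGIES_NEEDING_COUNT
        and properties.get("count") is None
    ):
        missing.append("count")
    return missing
```

With this rule, `{"strategy": "all"}` passes without a `count`, while `{"strategy": "first_n"}` is flagged as missing one.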

aeshky · 2021-11-02T18:57:19Z

rasa/cli/evaluate.py

@@ -123,6 +124,10 @@ def _run_markers(
        stats_file: (Optional) Path to write out statistics about the extracted
                    markers.
    """
+    telemetry.track_markers_evaluation_initiated(


different name suggestion: track_evaluate_markers_initiated?

the event is now called "Markers Extraction Initiated" (based on one of your suggestions) so I renamed this function to have a similar name

aeshky · 2021-11-02T19:33:09Z

@usc-m sorry I didn't read your comment until after I finished my review:

> Not 100% sure if we should be referring to this as "Markers Evaluation" - maybe "Evaluation - Success Markers"

I made a similar observation but suggested a different name. I like your suggestion: "Evaluation - Success Markers"

alwx · 2021-11-03T09:28:19Z

Re-requesting your review because I changed all the names and fixed the issue with required in events.json.

> Are we not tracking the number of markers defined?

I think we can do this if it makes sense. When does this event need to be tracked?

aeshky · 2021-11-03T09:53:34Z

> > Are we not tracking the number of markers defined?
>
> I think we can do this if it makes sense. When does this event need to be tracked?

What I mean is the number of markers in the config file. So as soon as the config file is processed we would know the number of markers. I can dig into the code to see where this happens.

@usc-m anything else you want to track? You suggested a measure of marker complexity. It's a good idea, but maybe not as trivial to implement so maybe we can add it later. What do you think?

usc-m · 2021-11-03T09:58:03Z

> What I mean is the number of markers in the config file. So as soon as the config file is processed we would know the number of markers. I can dig into the code to see where this happens.

Number of markers in the config file should be a matter of counting the number of sub-markers of the marker returned from Marker.from_config in the CLI code, I believe (so len(markers.sub_markers) in _run_markers in rasa/cli/evaluate.py).

> @usc-m anything else you want to track? You suggested a measure of marker complexity. It's a good idea, but maybe not as trivial to implement so maybe we can add it later. What do you think?

Not strictly necessary but something like maximum depth of nested markers, or how wide they get (maximum number of sub-markers under a marker that isn't top-level). Don't think these would be hard to add but also aren't as necessary - I think we'd get some useful info out about how the feature is used but it's something we could add later perhaps?
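
To illustrate the two measures discussed here (number of markers and maximum nesting depth), a minimal sketch on a toy tree follows. `Node` and `depth` are hypothetical stand-ins written for this example; they are not Rasa's `Marker` class, although the attribute name `sub_markers` is borrowed from the comment above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy stand-in for a marker: an operator with sub-markers, or a leaf condition."""

    name: str
    sub_markers: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of levels in the tree rooted at `node`; a leaf has depth 1."""
    return 1 + max((depth(child) for child in node.sub_markers), default=0)


# Artificial root that ORs together two user-defined markers.
root = Node(
    "or",
    [
        Node("and", [Node("slot_was_set"), Node("not", [Node("intent")])]),
        Node("action"),
    ],
)

num_markers = len(root.sub_markers)  # markers defined in the config
max_depth = depth(root) - 1  # subtract the artificial root level
```

Here `num_markers` is 2 and `max_depth` is 3, since the deepest user-defined marker nests `and`, then `not`, then a condition.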

aeshky · 2021-11-03T11:03:17Z

> Not strictly necessary but something like maximum depth of nested markers, or how wide they get (maximum number of sub-markers under a marker that isn't top-level). Don't think these would be hard to add but also aren't as necessary

It tells us how complex the conditions get, and thus how people are using (or potentially abusing?) Markers.

> I think we'd get some useful info out about how the feature is used but it's something we could add later perhaps?

Definitely useful but not urgent. Let's just count how many Markers are defined for now 👍

aeshky · 2021-11-03T11:04:26Z

docs/docs/telemetry/events.json

+      "required": [
+        "strategy",
+        "only_extract",
+        "seed",
+        "count"


@usc-m are all of these required? 🤔 or just the top two?

From the perspective of telemetry I think it makes sense - we'd want to know if it's being used or not. The types of those are string or null so if it's not present we'd see an explicit null. Would be good to check with someone who understands the telemetry schema here better (though I think this is only used in the docs to explain to users what data we send back, and isn't used to actually validate anything internally)
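
A minimal sketch of what "an explicit null" looks like in practice, assuming the properties are serialized as JSON. `build_properties` is a hypothetical helper written for this illustration, not the function in rasa/telemetry.py, and the strategy value `all` is assumed.

```python
import json
from typing import Any, Dict, Optional


def build_properties(
    strategy: str,
    only_extract: bool,
    seed: Optional[int] = None,
    count: Optional[int] = None,
) -> Dict[str, Any]:
    """Always emit all four keys; options the user did not set stay as None."""
    return {
        "strategy": strategy,
        "only_extract": only_extract,
        "seed": seed,
        "count": count,
    }


# None is serialized as JSON null, so every "required" key is still present.
payload = json.dumps(build_properties("all", only_extract=False))
```

Because every key is always present, marking all four as required stays consistent with what is sent, and an unused option shows up as `null` instead of being absent.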

aeshky · 2021-11-03T11:05:02Z

docs/docs/telemetry/events.json

+      ]
+    },
+    "Markers Stats Computed": {
+      "description": "Triggered when marker stats has been computed.",


Suggested change

"description": "Triggered when marker stats has been computed.",

"description": "Triggered when marker statistics have been computed.",

Let's use the full word in text, and reserve "stats" for code :)

usc-m · 2021-11-03T14:51:09Z

OK, I think I've added the extra telemetry. Are there any other changes we need to make? Anything more to collect, names all fine etc.?

usc-m · 2021-11-03T14:52:17Z

rasa/telemetry.py

+        {
+            "strategy": strategy,
+            "only_extract": only_extract,
+            "seed": seed,


This will report the actual seed used by the user - do we want the actual seed or do we want to just know if they used a seed? Is this something that's worth changing or are we considering the actual seed value not important enough to be careful about not collecting?

We only want to know if the user set a seed. I don't think it counts as private info so maybe it doesn't matter if we collect it. But if we can do true/false that might be better.
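
If only a true/false flag is sent, the conversion needs some care: a seed of `0` is a valid seed but is falsy in Python, so `bool(seed)` would report it as "no seed". A minimal sketch, with `seed_was_set` as a hypothetical helper name:

```python
from typing import Optional


def seed_was_set(seed: Optional[int]) -> bool:
    """Report whether the user supplied a seed, without revealing its value."""
    # Compare against None rather than using bool(seed): 0 is a valid seed.
    return seed is not None
```

This assumes an unset seed is represented as `None`; if the CLI used a different sentinel, the comparison would need to match it.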

ka-bu · 2021-11-03T15:15:02Z

rasa/cli/evaluate.py

@@ -139,6 +147,10 @@ def _run_markers(
            "Please see errors listed above and fix before running again."
        )

+    # Subtract one to remove the virtual OR over all markers
+    num_markers = len(markers) - 1


this is the total number of all conditions and operators used in all marker configurations that were evaluated -- to get the number of user-defined markers we should

1. get rid of the special case here and always add an "ANY_MARKER"
2. ~~len(markers.sub_markers) instead of len(markers)~~ ... My bad, iter gives us all conditions and operators but len is just the sub-markers, all good 👍 - but because of 1. there might not be sub-markers if there is a single user-defined marker

@usc-m , I could help and add that while you work on other comments if you like?

we could also track something like max( len(sub_marker) for sub_marker in markers), i.e. the maximum number of conditions and operators used in a single user defined marker, to get a glimpse of how complex the queries are

Yeah, I suggested that too, also the branching factor (maximum number of children under any marker)

Oh yeah, that's also a nice idea -- will definitely tell us if people miss being able to re-use a marker definition

ka-bu · 2021-11-03T15:15:37Z

rasa/core/evaluation/marker_base.py

@@ -608,6 +609,9 @@ def evaluate_trackers(
            if tracker:
                tracker_result = self.evaluate_events(tracker.events)
                processed_trackers[tracker.sender_id] = tracker_result
+
+        processed_trackers_count = len(processed_trackers)


do we really want the number of processed trackers or processed sessions (or both)?

I think in this case processed trackers is actually useful because we get it before, as a command line argument (the user's intent) and after (what they actually got). Might help to highlight issues in our models of how tracker stores work. Right now we don't collect session info anywhere so I'm not sure what info we could get from it, but we could also just collect it as well if you think it would be useful

I have no clue how people use sender_ids and trackers usually, so no idea if that is better - but I guess it won't hurt if we don't add this now (and maybe later)

ka-bu · 2021-11-03T16:47:22Z

rasa/cli/evaluate.py

-    telemetry.track_markers_parsed_count(num_markers)
+    max_depth = markers.depth() - 1
+    # Find maximum branching of marker
+    branching_factor = max(len(marker) - 1 for marker in markers)


branching_factor = max(len(sub_marker) - 1 for marker in markers.sub_markers for sub_marker in marker) to exclude the artificial Or marker?

Oh yep, otherwise we can end up with cases where the number of markers (which we already get) is returned twice - good spot
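
The double-reporting problem is easy to see on a toy tree. The sketch below uses a hypothetical `Node` class and `branching_factor` helper written for this example; it does not reproduce how Rasa's markers implement iteration or `len`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Toy stand-in for a marker with an optional list of children."""

    name: str
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node first, then every node beneath it."""
        yield self
        for child in self.children:
            yield from child.walk()


def branching_factor(root: Node, skip_root: bool) -> int:
    """Largest number of direct children under any visited node."""
    nodes = root.walk()
    if skip_root:
        next(nodes)  # drop the artificial root, which walk() yields first
    return max((len(node.children) for node in nodes), default=0)


# Three user-defined markers under an artificial root; the widest has 2 children.
root = Node(
    "any_marker",
    [Node("and", [Node("a"), Node("b")]), Node("c"), Node("d")],
)
```

With the root included, the result is 3, which merely repeats the number of user-defined markers. Skipping the root gives 2, the width of the widest marker the user actually wrote.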

ka-bu

LGTM

aeshky

Looks good. There are still a couple of unresolved typos. Did you miss them? (one spotted by you even 😄)

* Markers telemetry
* Everything without tests
* Specified events.json
* Test added
* Changelog entry
* Naming fixes
* Fix lint, fix CLI bug
  No way to actually change config path, now fixed with nargs
* Add markers parsed telemetry
* Document telemetry functions
* always add ANY_MARKER; add test
* Add complexity telemetry
* Skip root marker to avoid double reporting total marker count

Co-authored-by: Matthew Summers <m.summers@rasa.com>
Co-authored-by: ka-bu <kathrin.bujna@gmail.com>

alwx requested review from usc-m and aeshky November 2, 2021 17:36

alwx requested a review from a team as a code owner November 2, 2021 17:36

usc-m reviewed Nov 2, 2021

aeshky suggested changes Nov 2, 2021

alwx requested review from usc-m and aeshky November 3, 2021 09:26

aeshky reviewed Nov 3, 2021

alwx and others added 9 commits November 3, 2021 14:23

Markers telemetry

2d32b83

Everything without tests

56fb74f

Specified events.json

2d8b79e

Test added

4f7a6ff

Changelog entry

15c1c6d

Naming fixes

fd1d7e0

Fix lint, fix CLI bug

c95dcde

No way to actually change config path, now fixed with nargs

Add markers parsed telemetry

7359438

Document telemetry functions

1b4535a

usc-m force-pushed the markers-telemetry branch from 02c60c0 to 1b4535a Compare November 3, 2021 14:45

usc-m reviewed Nov 3, 2021

ka-bu reviewed Nov 3, 2021

always add ANY_MARKER; add test

015cf4d

Add complexity telemetry

bef34e8

usc-m requested review from ka-bu and aeshky November 3, 2021 16:37

ka-bu reviewed Nov 3, 2021

Skip root marker to avoid double reporting total marker count

58174e6

ka-bu approved these changes Nov 3, 2021

usc-m enabled auto-merge (squash) November 3, 2021 16:53

aeshky approved these changes Nov 3, 2021

usc-m and others added 8 commits November 3, 2021 17:11

Unfix erroneous CLI fix, fix typos

4c98d1a

Fix event schema test

1df9776

Fix mock count in telemetry test

059be98

trigger test re-run

c617ebd

Make sure seed is converted to bool correctly

b25b422

Fix telemetry test

979ecd1

Merge branch 'main' into markers-telemetry

442f2bf

Merge branch 'main' into markers-telemetry

c030770

usc-m merged commit 8e922dd into main Nov 5, 2021

usc-m deleted the markers-telemetry branch November 5, 2021 12:50
