New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a way to sample requests that are sent out from amp-analytics. #2368
Conversation
Can you add a documentation label if one isn't there?
|
console./*OK*/error(this.getName_(), 'Invalid sampling spec.'); | ||
return resolve; | ||
} | ||
const key = this.expandTemplate_(spec['sampleOn'], trigger); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain the sampling algorithm a bit?
Could we instead just once create a random number and then always use that same number to determine whether you are in the sample?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: with crypto.getRandomValues
being in all supported browsers, is that a better way to determine the sample?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: I had called the 'sampleOn' attribute the trait in my initial design.
GA wants to consistently sample hits based on clientIds[1]. Others might want to sample based on random numbers, page path or something else. That is the reason I added the attribute 'sampleOn'.
If we were to use getRandomValues, we lose the ability to specify what to sample based on. You suggested that we always use the same number to sample. Any ideas on how that would work? Will the random number be a trigger level thing? or a tag level thing or something else?
[1] Sampling based on clientIds results in user sessions to be consistently in the reports or out of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we should definitely not hash anything on each actual event. That is what I meant. The user's sample state should only be calculated once. If this was e.g. used on an interval or active-use trigger it would get really expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What type of return value would this.expandTemplate_(spec['sampleOn'], trigger)
typically give. You are using requestCount
in the test. Is that a typical usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Mon, Apr 11, 2016 at 1:51 PM, Malte Ubl notifications@github.com wrote:
In extensions/amp-analytics/0.1/amp-analytics.js
#2368 (comment):
- * not based on sampleSpec.
- * @Private
- */
- isSampledIn_(trigger) {
- const spec = trigger['sampleSpec'];
- const resolve = Promise.resolve(true);
- if (!spec) {
return resolve;
- }
- const threshold = spec['threshold'];
- if (!spec['sampleOn'] ||
Number.isNaN(parseFloat(threshold)) || !Number.isFinite(threshold)) {
console./_OK_/error(this.getName_(), 'Invalid sampling spec.');
return resolve;
- }
- const key = this.expandTemplate_(spec['sampleOn'], trigger);
What type of return value would this.expandTemplate_(spec['sampleOn'],
trigger) typically give. You are using requestCount in the test. Is that
a typical usage?
GA uses client id to sample hits. Some people might want random too.
—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/ampproject/amphtml/pull/2368/files/ddb55ee8e687cbb11c8fe886147cf218cd3d4e5e#r59278956
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't quite follow: Would it return CLIENT_ID(…)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spec['sampleOn']
can be any string with one or more macros. I used requestCount
in test but it could be RANDOM
or CLIENT_ID(...)
in real world.
this.expandTemplate_(spec['sampleOn'], trigger)
uses var definitions from vendors.js
and trigger configs and converts them into something that can be used by url-replacements.js
class. The call to urlReplacementsFor(this.getWin()).expand()
below will then convert that template string into another string with macros substituted for real values.
In the tests, the value of requestCount will always be 0 since no requests is generated at that point. But that actually doesn't matter because I stub the values returned by sha384. I can change the tests to use RANDOM
instead if that makes it less confusing.
92189b3
to
f537420
Compare
@ampsauce retry |
ptal. Now we do sampling calculations once per trigger. |
@avimehta , sorry just noticed this. Is this ready to be reviewed? Could you please rebase to master? There are some conflicts... |
@dvoytenko this is ready for review. Forgot to update the PR. |
/to @cramforce to follow up on the initial review. |
@avimehta Sorry, this got neglected. I think the PR is ready from my POV. Could you rebase? Thanks. |
rebased. Marging soon. |
Still pending in this PR: Documentation for the new feature. Rest of it is ready for review.
Fixes #1550