reintroduce window in days, log warning when sampling occurs #9480
airbyte-integrations/connectors/source-google-analytics-v4/source_google_analytics_v4/source.py
Please don't forget to run the tests and checks before every commit, and especially before pushing.
LGTM, only the small requested changes are needed.
…rce_google_analytics_v4/source.py Co-authored-by: Sergei Solonitcyn <11441558+sergei-solonitcyn@users.noreply.github.com>
Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
37b1d68 to 635d0fc
Made some updates, but the tests are still broken, and there are suspicions that the code is too.
/test connector=connectors/source-google-analytics-v4
Codecov Report
@@            Coverage Diff            @@
##             master    #9480   +/-   ##
=========================================
  Coverage          ?   87.63%
=========================================
  Files             ?        2
  Lines             ?      275
  Branches          ?        0
=========================================
  Hits              ?      241
  Misses            ?       34
  Partials          ?        0
/publish connector=connectors/source-google-analytics-v4
Improve sampling awareness and reintroduce parameter for mitigation in source google analytics
Fixes #8570
Reason
Google Analytics may return sampled data when a period contains too many sessions (about 500k).
It may also return provisional data (isDataGolden=false), meaning that the report can change if it is requested again later.
The sampling issue can be partially mitigated (but not completely eliminated) by using smaller values for the window_in_days parameter.
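To illustrate why a smaller window_in_days helps, here is a minimal sketch of splitting a date range into request windows, so each request covers fewer sessions. The helper name `date_windows` and the inclusive-boundary behaviour are assumptions made for illustration; the connector's actual slicing logic may differ.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, window_in_days: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (window_start, window_end) pairs covering [start, end].

    Hypothetical helper: boundaries are assumed to be inclusive on both sides.
    """
    if window_in_days < 1:
        raise ValueError("window_in_days must be at least 1")
    cursor = start
    while cursor <= end:
        # Each window spans at most window_in_days days and never passes `end`.
        window_end = min(cursor + timedelta(days=window_in_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


if __name__ == "__main__":
    for window in date_windows(date(2022, 1, 1), date(2022, 1, 10), window_in_days=4):
        print(window)
```

With window_in_days=1 every window is a single day, which is the smallest slice this approach can produce.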
When even window_in_days equal to one is not enough to avoid sampling (more than 500k sessions or other dimension values in a single day's report), the user should be warned in the logs.
Likewise, when isDataGolden=false, the user should be warned in the logs.
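The two warnings described above could be sketched roughly as follows. This is not the code from the PR: the function name `collect_report_warnings` and the message wording are hypothetical. It assumes a parsed Reporting API v4 response in which sampled reports carry `samplesReadCounts` under `data`, and it only warns about provisional data when `isDataGolden` is explicitly false; how a missing field should be treated is left open.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("sampling_check_sketch")  # hypothetical logger name


def collect_report_warnings(response: Dict[str, Any]) -> List[str]:
    """Return one warning message per sampled or non-golden report."""
    messages: List[str] = []
    for index, report in enumerate(response.get("reports", [])):
        data = report.get("data", {})
        # Assumption: samplesReadCounts is only present when the report is sampled.
        if data.get("samplesReadCounts"):
            messages.append(
                f"Report {index} contains sampled data; consider a smaller window_in_days."
            )
        # Assumption: warn only when the flag is explicitly false.
        if data.get("isDataGolden") is False:
            messages.append(
                f"Report {index} is provisional (isDataGolden=false) and may change later."
            )
    return messages


def log_report_warnings(response: Dict[str, Any]) -> None:
    """Emit each collected message at WARNING level."""
    for message in collect_report_warnings(response):
        logger.warning(message)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    sample = {"reports": [{"data": {"samplesReadCounts": ["499630"], "isDataGolden": False}}]}
    log_report_warnings(sample)
```

Keeping the message collection separate from the logging call makes the check easy to unit test without capturing log output.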
Confirmation
How does the code change in the PR fix the issue?
Recommended reading order