Duplicate data #19
Comments
Thanks for this. It must be connected with the v4 batching: the number of fetches is worked out from the max parameter, so perhaps it will work if max is set to 27000? Of course you can't always know this in advance, so I will look at the batching logic to see what's happening.
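As a rough illustration of batching driven by a max parameter, the number of fetches and their start offsets could be derived as below. This is an assumption about the general approach, not googleAnalyticsR's actual code; `batch_offsets` and the 10,000-row `PAGE_SIZE` are hypothetical.

```python
import math

PAGE_SIZE = 10_000  # hypothetical per-request row limit


def batch_offsets(max_rows: int, page_size: int = PAGE_SIZE) -> list[int]:
    """Return the 0-based start offset of each batch needed to cover max_rows."""
    n_batches = math.ceil(max_rows / page_size)
    return [i * page_size for i in range(n_batches)]


print(batch_offsets(27_000))        # [0, 10000, 20000]
print(len(batch_offsets(100_000)))  # 10 (offsets 0 through 90000)
```

If max is much larger than the real row count, most of these offsets point past the end of the data, so a correct result depends on every one of those requests coming back empty. Following the `nextPageToken` that the v4 Reporting API returns, until it is absent, avoids having to guess max at all.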
Hi Mark, as you suggested, it does the right thing if I set the maximum to 27k. It also does if I set it to 30k or 50k. When I set it to 100k, the query returns 36,399 rows. Here's the logic behind what I'm doing: I want to retrieve monthly reporting data from a large site that's been using GA since mid-2006. I find that if I run a single large query, there's a good chance a server hiccup or some other issue will botch the entire pull. (It's not unusual to get a 500 error once every 100,000 rows or so.) So I query one month at a time. The problem is that I don't know how many rows each query will return. Probably fewer than 100k, but I don't want to discover (or worse, fail to discover) that I'm missing some data. So I just set the maximum really high to effectively turn it off. Best,
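The month-at-a-time approach amounts to splitting one long date range into calendar-month chunks and running one query per chunk. A minimal Python sketch of the splitting step (the function name `month_ranges` is hypothetical; each chunk would then be passed as the start and end dates of a separate query):

```python
from datetime import date, timedelta


def month_ranges(start: date, end: date) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into calendar-month chunks."""
    ranges = []
    current = start
    while current <= end:
        # First day of the following month
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Chunk ends on the last day of this month, or on `end` if sooner
        chunk_end = min(next_month - timedelta(days=1), end)
        ranges.append((current, chunk_end))
        current = next_month
    return ranges


for s, e in month_ranges(date(2006, 6, 15), date(2006, 9, 10)):
    print(s, e)
# 2006-06-15 2006-06-30
# 2006-07-01 2006-07-31
# 2006-08-01 2006-08-31
# 2006-09-01 2006-09-10
```

Smaller chunks limit how much work a single 500 error can spoil, at the cost of more requests.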
Hi David, have you tried the same query using the v3 version? I think it has more robust batching at the moment, and it should deal with the occasional 500 errors by backing off and trying again. With v3 you can set a high max value; it will ignore it anyway if it's over 10,000.
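Backing off and retrying on server errors could look like the following Python sketch. It is a generic exponential-backoff pattern, not the package's implementation; `ServerError` and `flaky_fetch` are hypothetical stand-ins for a 500 response and a real API call, and `sleep` is injectable so the example runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx response from the API."""


def with_backoff(fetch: Callable[[], T], max_tries: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on ServerError with exponential backoff plus jitter."""
    attempt = 0
    while True:
        try:
            return fetch()
        except ServerError:
            attempt += 1
            if attempt >= max_tries:
                raise  # out of retries: surface the error
            # Wait base_delay * 1, 2, 4, ... plus up to 1s of random jitter
            sleep(base_delay * 2 ** (attempt - 1) + random.random())


calls = {"n": 0}


def flaky_fetch() -> list[str]:
    """Fails twice with a simulated 500, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ServerError("500")
    return ["row1", "row2"]


print(with_backoff(flaky_fetch, sleep=lambda s: None))  # ['row1', 'row2']
```

Retrying only the failed batch, rather than the whole pull, is what makes a large query survivable when a 500 arrives every hundred thousand rows or so.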
Dear @biologicaldynamics, thanks for this report. It turned out to be something a bit ridiculous: R turned 100000 into "1e+05", which the API in turn read as 1. So for every batch over 10,000 it added rows 1-10,000 again(!). Ouch. But it's now fixed, I hope, thanks to you 👍
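The failure mode can be reproduced in miniature: a row offset rendered in scientific notation, read by a parser that only looks at the leading digits, collapses to 1. The sketch below is in Python; `lenient_parse` is a hypothetical model of the server behaviour described above, and `page_token` is a hypothetical helper that always emits plain digits.

```python
import re


def lenient_parse(token: str) -> int:
    """Hypothetical parser that keeps only the leading digits of a page token."""
    match = re.match(r"\d+", token)
    return int(match.group()) if match else 0


def page_token(offset: int) -> str:
    """Format a row offset as plain digits, never scientific notation."""
    return str(int(offset))


offset = 100_000
bad = f"{offset:.0e}"      # '1e+05', how R prints 100000 by default
good = page_token(offset)  # '100000'

print(bad, lenient_parse(bad))    # 1e+05 1
print(good, lenient_parse(good))  # 100000 100000
```

In R itself, formatting the offset with something like `format(x, scientific = FALSE)` before building the request avoids the scientific form.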
This isn't fixed...
Ok, think I got it now.
Hi Mark,
Another issue using the v4 API. The following query returns 126,399 rows. It is expected to return 26,399 rows. As far as I can tell, it's just repeating some of the result rows. The rows that I checked looked to have the correct data.
When I ran the equivalent queries on RGA and RGoogleAnalytics, I got the expected behavior. These, I believe, use the v3 API.
Thanks, and let me know if you need any more sleuthing. Happy to help; it's the least I can do.
Best,
David