Mismatch between pageviews on GA and the returned results of GetReportData #27

Open
ghost opened this issue May 31, 2016 · 4 comments

@ghost

ghost commented May 31, 2016

Hi,

I followed the instructions for linking the core API, and I seem to get results, but when I run:

query.list <- Init(start.date = "2016-05-01",
                   end.date = "2016-05-03",
                   dimensions = "ga:date,ga:pagePath",
                   metrics = "ga:pageviews",
                   max.results = 10000,
                   sort = "-ga:date",
                   table.id = "ga:xxxxx")
ga.query <- QueryBuilder(query.list)
mydata <- GetReportData(ga.query, token, split_daywise = TRUE)

I get a mismatch between what I see on Google Analytics and the result from GetReportData. Any pointers?

Thanks and best wishes, I love your work!

Dan

@BobbyBarbeau

Apologies if the following is too basic, but a few things off the bat that could result in discrepancies:

  • Does the web interface indicate any sampling going on? For a three-day span with so few dimensions and metrics, I don't imagine GA would be sampling, but I wanted to rule that out as a possible cause.
  • Have you confirmed that the table.id you've queried is the same for the View you're looking at in the web interface? If the web view is pulling from one View (e.g. one that excludes a range of internal IPs) and the R query is pulling from a different view (e.g. a raw data View with no filters applied), that could explain some discrepancy. Likewise, if you're looking at a segment in the web view and haven't applied that segment to the API query, that could cause mismatches.
  • Is the date dimension causing the discrepancy? If you're just looking at the Behavior > Site Content > All Pages report in the GA interface, you can't pull date in as a dimension, so you'd need to perform some additional calculations on the data returned through the API to match that report. (e.g. using dplyr, mydata %>% group_by(pagePath) %>% summarise(pageviews = sum(pageviews)))

When I've experienced data mismatches, I've often found the source was related to sampling, table.id mismatch, filters/segments, or the like.
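
The aggregation mentioned in the third bullet could look roughly like this. It is only a sketch: the data frame below is a made-up stand-in for what GetReportData returns, and it assumes the columns are named date, pagePath and pageviews. It uses base R's aggregate() so it runs without extra packages; the dplyr pipeline in the bullet should give the same totals.

```r
# Toy stand-in for the data frame returned by GetReportData();
# the column names and values here are assumptions for illustration.
mydata <- data.frame(
  date      = c("20160501", "20160501", "20160502", "20160503"),
  pagePath  = c("/", "/about", "/", "/about"),
  pageviews = c("10", "4", "7", "3"),
  stringsAsFactors = FALSE
)

# Make sure the metric is numeric before summing (a no-op if it already is).
mydata$pageviews <- as.numeric(mydata$pageviews)

# Collapse the date dimension: one row per page, pageviews summed over all days.
by_page <- aggregate(pageviews ~ pagePath, data = mydata, FUN = sum)

# Sort by pageviews, highest first, to mirror the All Pages report ordering.
by_page <- by_page[order(-by_page$pageviews), ]
print(by_page)
```

The per-page totals in by_page are the numbers to compare against the All Pages report for the same date range.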

@ghost
Author

ghost commented May 31, 2016

Hi Bobby,
I imagine it has something to do with sampling, but I didn't manage to figure out how to turn this on/off. The image you attached is, unfortunately, not too instructive for me.
I did check the table.id, and I made sure I was looking at the correct view in GA. In fact, I only have 1 view and it's not segmented, so it shouldn't be a problem.
I'm not sure about your final possibility. Could you give me a pointer on the sampling issue? If that doesn't resolve it, I will look further into the dimension question.
Best Wishes,
Dan

@BobbyBarbeau

BobbyBarbeau commented May 31, 2016

Dan,

For more info on sampling, see https://support.google.com/analytics/answer/2637192?hl=en

Another way to see if sampling is an issue is to simply rerun your query, but drop the split_daywise argument. (split_daywise eliminates sampling for the most part by breaking a large data set into smaller daily data sets.)

Try rerunning your query like this:

mydata <- GetReportData(ga.query, token)

Are there any messages in R about the query being sampled? If there are, then the mismatch is caused by sampling.

If not, then neither the GA interface nor the API should be using sampled data.
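
If you want to compare the two runs directly, something like the following might help. It is a sketch under assumptions: the two data frames are invented placeholders for the results of running the same query with and without split_daywise (already summed per page), and the column names are assumed rather than taken from the package.

```r
# Hypothetical per-page totals from the same query run two ways;
# in practice these would come from two GetReportData() calls.
daywise <- data.frame(pagePath  = c("/", "/about", "/contact"),
                      pageviews = c(17, 7, 2),
                      stringsAsFactors = FALSE)
unsplit <- data.frame(pagePath  = c("/", "/about", "/contact"),
                      pageviews = c(17, 6, 2),
                      stringsAsFactors = FALSE)

# Join the two results on pagePath; the suffixes label which run each count came from.
both <- merge(daywise, unsplit, by = "pagePath",
              suffixes = c(".daywise", ".unsplit"))

# Keep only the pages where the two runs disagree.
both$diff  <- both$pageviews.daywise - both$pageviews.unsplit
mismatches <- both[both$diff != 0, ]
print(mismatches)
```

An empty mismatches data frame would suggest that sampling is not what is causing the discrepancy.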

In terms of my last point about dimensions, rather than wrangling with the data via dplyr, it would be much easier simply to run the query without the date dimension:

query.list <- Init(start.date = "2016-05-01",
                   end.date = "2016-05-03",
                   dimensions = "ga:pagePath",
                   metrics = "ga:pageviews",
                   max.results = 10000,
                   #sort = "-ga:date",
                   table.id = "ga:xxxxx")
ga.query <- QueryBuilder(query.list)
mydata <- GetReportData(ga.query, token)

Again, if you see messages in R about sampling, then you'd need to rerun with split_daywise included.

But the above should match the pageview data reported in Behavior > Site Content > All Pages report in the GA interface if the data isn't being sampled.

HTH

EDIT: commented out the sort as that would throw an error since there would be no date dimension to sort on. You could just delete it, but I'm commenting it out just to explain my edit.

@JerryWho
Contributor

JerryWho commented Jun 2, 2016

When I get strange results using the API I often use the Query Explorer (https://ga-dev-tools.appspot.com/query-explorer/) to double-check the results.
