Identical queries return different data #140

Closed
arosenbe opened this issue Jul 10, 2017 · 7 comments

@arosenbe

Hi there and thanks for the awesome package.

I've been using pytrends to download time series from Google Trends for each state/metro-area in the US. I noticed that the results of some of my queries were changing considerably across successive runs. The data on the Google Trends website doesn't exhibit this sort of behavior. I'm wondering if this is the result of a bug or just user error (in which case I'd be grateful for some advice).

Here's a (usually) reproducible example:

import time

import numpy
from pytrends.request import TrendReq

google_username = '********'  # redacted
google_password = '********'  # redacted

def get_df(google_username, google_password):
    # Log in, request the US 'bagel' series for 2004-2010, and return it as a DataFrame.
    pytrend = TrendReq(google_username, google_password)
    pytrend.build_payload(kw_list=['bagel'],
                          geo='US',
                          timeframe='2004-01-01 2010-01-01')
    df = pytrend.interest_over_time()
    return df

df1 = get_df(google_username, google_password)
time.sleep(600)  # wait 10 minutes between the two identical queries
df2 = get_df(google_username, google_password)

print(df1.equals(df2))  # False
print(numpy.corrcoef(df1['bagel'], df2['bagel']))  # Not all 1, can be quite low

I've run this a few times, so I don't think it's related to Google changing over to a new random sample. My understanding is that Google only makes this change once per day (line 197 here). However anecdotally, I seem to experience the largest discrepancies in results on less-searched terms and smaller geographic areas: exactly where I would expect sampling error/random noise to wreak the most havoc.

Let me know if I can provide more information, and thanks in advance!


P.S. I don't think I can provide an actually reproducible example because the results seem to be stochastic, and there's some positive probability that two samples from the same (discrete) distribution will yield the same results.
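
A minimal sketch of one way to quantify how far two pulls diverge, assuming each pull has already been converted to a plain list of values (for example with `df1['bagel'].tolist()`). The helper name `compare_runs` is hypothetical and is not part of pytrends; it only summarizes the two checks used above (correlation and exact equality) in a form that is easy to log across many runs.

```python
import math


def compare_runs(a, b):
    """Return (pearson_r, max_abs_diff) for two equal-length numeric sequences.

    Hypothetical helper for illustration; not part of pytrends.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Correlation is undefined when either series is constant.
        pearson_r = float("nan")
    else:
        pearson_r = cov / math.sqrt(var_a * var_b)
    # Largest pointwise gap between the two runs.
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    return pearson_r, max_abs_diff
```

Two identical pulls would give a correlation of 1 and a maximum difference of 0; anything else indicates that the two responses differ.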

@dreyco676
Contributor

Hmm, I wonder if they are using a cookie to keep serving back the same results. I'm going to try accessing it from two different computers/Google accounts to see if the results differ.

@arosenbe
Author

Hey @dreyco676, any results from the test above?

@dreyco676
Contributor

So I'm seeing the same thing. I don't think it's anything I can control for, as it's on Google's end. If you figure out a way to ensure consistent results, let me know.

@arosenbe
Author

Thanks for the confirmation! I was worried that there wasn't going to be an easy fix on your end.

My understanding is thus that the pytrends payload doesn't contain the same data as what's on the Google Trends site at runtime. If this is the case, do you have a sense of what data the payload does contain (e.g., old samples of Google Trends data or random noise)?

@LRonHubs

Huh? So there's a chance this library just returns random noise?

@dreyco676
Contributor

@LRonHubs it seems like the endpoint I'm hitting might not be 100% consistent for low-volume searches. It looks more like rounding up or down than Google intentionally adding noise to the data. I don't have any contact with the Google Trends team, so I don't know how or why it does this.
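
To illustrate why sampling plus rounding could hit low-volume terms hardest, here is a toy simulation. It is purely an assumption-laden sketch and not a description of how Google computes its index: the function name `simulated_index`, the per-period search rates, and the sample size are all hypothetical. Each period's count is drawn from a random sample, then the series is scaled so its peak is 100 and rounded to integers.

```python
import random


def simulated_index(true_rates, sample_size, rng):
    """Toy model: sample counts per period, scale the peak to 100, round.

    Hypothetical illustration only; not Google's actual method.
    """
    # Binomial draw per period: how many of sample_size searches match the term.
    counts = [
        sum(1 for _ in range(sample_size) if rng.random() < p)
        for p in true_rates
    ]
    peak = max(counts)
    if peak == 0:
        # No matching searches were sampled in any period.
        return [0] * len(counts)
    return [round(100 * c / peak) for c in counts]


rng = random.Random(0)
low_volume = [0.001, 0.002, 0.0015, 0.001]  # hypothetical rare term
high_volume = [0.1, 0.2, 0.15, 0.1]         # hypothetical common term

# Two independent "pulls" of each series from the same underlying rates.
print(simulated_index(low_volume, 2000, rng), simulated_index(low_volume, 2000, rng))
print(simulated_index(high_volume, 2000, rng), simulated_index(high_volume, 2000, rng))
```

In this toy model the rare term rests on only a handful of sampled searches per period, so a change of one or two counts moves the rescaled value by many points, while the common term is far more stable between pulls.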

@susierao

@dreyco676 Thank you for mentioning the low-volume searches. Do you know how to enable the "Include low search volume regions" option shown in the screenshot? As the previous posts have mentioned, pytrends tends to return different sets of cities across runs. Thanks a lot.

image
