
Commit

Support passing parameters to requests
Raise exception if there are no more proxies left
Updated and fixed the README
Doron.Grinzaig authored and Doron.Grinzaig committed May 6, 2020
1 parent 16492e1 commit fde7be5
Showing 3 changed files with 26 additions and 15 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -4,7 +4,8 @@

Unofficial API for Google Trends

Allows simple interface for automating downloading of reports from Google Trends. Main feature is to allow the script to login to Google on your behalf to enable a higher rate limit. Only good until Google changes their backend again :-P. When that happens feel free to contribute!
Allows simple interface for automating downloading of reports from Google Trends.
Only good until Google changes their backend again :-P. When that happens feel free to contribute!

**Looking for maintainers!**

@@ -56,12 +57,13 @@ or if you want to use proxies as you are blocked due to Google rate limit:

from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=360, timeout=(10,25), proxies=['https://34.203.233.13:80',], retries=2, backoff_factor=0.1)
pytrends = TrendReq(hl='en-US', tz=360, timeout=(10,25), proxies=['https://34.203.233.13:80',], retries=2, backoff_factor=0.1, requests_args={'verify':False})

* `timeout(connect, read)`

- See explanation of this in the [requests docs](https://requests.readthedocs.io/en/master/user/advanced/#timeouts)
* tz
- Timezone Offset
- For example US CST is ```'360'```
- For example US CST is ```'360'``` (note: **NOT** -360; Google encodes the offset this way)

* `proxies`

@@ -76,6 +78,9 @@

- A backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay). urllib3 will sleep for: ```{backoff factor} * (2 ^ ({number of total retries} - 1))``` seconds. If the backoff_factor is 0.1, then sleep() will sleep for [0.0s, 0.2s, 0.4s, …] between retries. It will never be longer than Retry.BACKOFF_MAX. By default, backoff is disabled (set to 0).

* `requests_args`
- A dict with additional parameters to pass along to the underlying requests library, for example `verify=False` to ignore SSL errors

Note: the parameter `hl` specifies host language for accessing Google Trends.
Note: only https proxies will work, and you need to add the port number after the proxy IP address
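The delay schedule described for `backoff_factor` above can be sketched as follows (a minimal illustration of urllib3's documented formula; `backoff_delay` is a hypothetical helper, not pytrends code):

```python
def backoff_delay(backoff_factor, retry_number):
    # urllib3 sleeps for backoff_factor * (2 ** (retry_number - 1)) seconds,
    # except that no delay is applied before the first retry.
    if retry_number <= 1:
        return 0.0
    return backoff_factor * (2 ** (retry_number - 1))

# With backoff_factor=0.1 the first three retries wait:
delays = [backoff_delay(0.1, n) for n in (1, 2, 3)]
print(delays)  # [0.0, 0.2, 0.4]
```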

@@ -316,7 +321,6 @@ Returns dictionary

* This is not an official or supported API
* Google may change aggregation level for items with very large or very small search volume
* Google will send you an email saying that you had a new login after running this.
* Rate Limit is not publicly known, let me know if you have a consistent estimate
* One user reports that 1,400 sequential requests of a 4 hours timeframe got them to the limit. (Replicated on 2 networks)
* It has been tested, and 60 seconds of sleep between requests (successful or not) is the correct amount once you reach the limit.
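The 60-second cooldown mentioned above can be wrapped in a small helper (a hypothetical sketch, not part of pytrends; rate-limited calls are assumed to raise):

```python
import time

def fetch_with_cooldown(fetch, attempts=3, cooldown=60):
    # Retry a pytrends call, sleeping the suggested 60 seconds after each
    # failed attempt; re-raise once all attempts are exhausted.
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(cooldown)

# Usage (with a real pytrends object):
# df = fetch_with_cooldown(lambda: pytrends.interest_over_time())
```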
2 changes: 1 addition & 1 deletion examples/example.py
@@ -1,6 +1,6 @@
from pytrends.request import TrendReq

# Login to Google. Only need to run this once, the rest of requests will use the same session.
# Only need to run this once; subsequent requests will use the same session.
pytrend = TrendReq()

# Create payload and capture API tokens. Only needed for interest_over_time(), interest_by_region() & related_queries()
25 changes: 16 additions & 9 deletions pytrends/request.py
@@ -9,7 +9,6 @@
import requests



from pandas.io.json._normalize import nested_to_record
from requests.packages.urllib3.util.retry import Retry

@@ -38,7 +37,7 @@ class TrendReq(object):
TODAY_SEARCHES_URL = 'https://trends.google.com/trends/api/dailytrends'

def __init__(self, hl='en-US', tz=360, geo='', timeout=(2, 5), proxies='',
retries=0, backoff_factor=0):
retries=0, backoff_factor=0, requests_args=None):
"""
Initialize default values for params
"""
@@ -55,6 +54,7 @@ def __init__(self, hl='en-US', tz=360, geo='', timeout=(2, 5), proxies='',
self.retries = retries
self.backoff_factor = backoff_factor
self.proxy_index = 0
self.requests_args = requests_args or {}
self.cookies = self.GetGoogleCookie()
# initialize widget payloads
self.token_payload = dict()
@@ -78,14 +78,16 @@ def GetGoogleCookie(self):
'https://trends.google.com/?geo={geo}'.format(
geo=self.hl[-2:]),
timeout=self.timeout,
proxies=proxy
proxies=proxy,
**self.requests_args
).cookies.items()))
except requests.exceptions.ProxyError:
print('Proxy error. Changing IP')
if len(self.proxies) > 0:
if len(self.proxies) > 1:
self.proxies.remove(self.proxies[self.proxy_index])
else:
print('Proxy list is empty. Bye!')
print('No more proxies available. Bye!')
raise
continue
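The proxy-exhaustion logic above can be distilled into a standalone sketch (`ProxyPool` is a hypothetical class illustrating the behavior this commit introduces, not pytrends code):

```python
class ProxyPool:
    """Hypothetical sketch of the proxy rotation in GetGoogleCookie."""

    def __init__(self, proxies):
        self.proxies = list(proxies)
        self.index = 0

    def current(self):
        return self.proxies[self.index]

    def drop_current(self):
        # On a ProxyError, drop the failing proxy; once the last one has
        # failed, raise instead of looping forever (the fix in this commit).
        if len(self.proxies) > 1:
            self.proxies.pop(self.index)
            self.index %= len(self.proxies)
        else:
            raise RuntimeError('No more proxies available. Bye!')
```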

def GetNewProxy(self):
@@ -119,10 +121,10 @@ def _get_data(self, url, method=GET_METHOD, trim_chars=0, **kwargs):
s.proxies.update({'https': self.proxies[self.proxy_index]})
if method == TrendReq.POST_METHOD:
response = s.post(url, timeout=self.timeout,
cookies=self.cookies, **kwargs) # DO NOT USE retries or backoff_factor here
cookies=self.cookies, **kwargs, **self.requests_args) # DO NOT USE retries or backoff_factor here
else:
response = s.get(url, timeout=self.timeout, cookies=self.cookies,
**kwargs) # DO NOT USE retries or backoff_factor here
**kwargs, **self.requests_args) # DO NOT USE retries or backoff_factor here
# check if the response contains json and throw an exception otherwise
# Google mostly sends 'application/json' in the Content-Type header,
# but occasionally it sends 'application/javascript'
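The double unpacking added in `_get_data` merges per-call kwargs with the instance-wide `requests_args`; a minimal sketch of the pattern (the `fetch` stand-in is hypothetical, not the requests API):

```python
def fetch(url, **kwargs):
    # Stand-in for requests.Session.get / .post: echoes the merged kwargs.
    return kwargs

requests_args = {'verify': False}

# Per-call kwargs and the instance-wide dict are merged by double unpacking,
# as in _get_data above:
merged = fetch('https://example.com', timeout=(2, 5), **requests_args)
print(merged)  # {'timeout': (2, 5), 'verify': False}

# Caveat: if the same key appears in both, Python raises TypeError at call time.
try:
    fetch('https://example.com', verify=True, **requests_args)
except TypeError as exc:
    print('duplicate keyword:', exc)
```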
@@ -400,7 +402,8 @@ def trending_searches(self, pn='united_states'):
# forms = {'ajax': 1, 'pn': pn, 'htd': '', 'htv': 'l'}
req_json = self._get_data(
url=TrendReq.TRENDING_SEARCHES_URL,
method=TrendReq.GET_METHOD
method=TrendReq.GET_METHOD,
**self.requests_args
)[pn]
result_df = pd.DataFrame(req_json)
return result_df
@@ -412,7 +415,8 @@ def today_searches(self, pn='US'):
url=TrendReq.TODAY_SEARCHES_URL,
method=TrendReq.GET_METHOD,
trim_chars=5,
params=forms
params=forms,
**self.requests_args
)['default']['trendingSearchesDays'][0]['trendingSearches']
result_df = pd.DataFrame()
# parse the returned json
@@ -434,6 +438,7 @@ def top_charts(self, date, hl='en-US', tz=300, geo='GLOBAL'):
method=TrendReq.GET_METHOD,
trim_chars=5,
params=chart_payload,
**self.requests_args
)['topCharts'][0]['listItems']
df = pd.DataFrame(req_json)
return df
@@ -450,6 +455,7 @@ def suggestions(self, keyword):
params=parameters,
method=TrendReq.GET_METHOD,
trim_chars=5,
**self.requests_args
)['default']['topics']
return req_json

@@ -463,6 +469,7 @@ def categories(self):
params=params,
method=TrendReq.GET_METHOD,
trim_chars=5,
**self.requests_args
)
return req_json

