merge devel into master
ckoepp committed Jun 5, 2015
2 parents 9c307a8 + fbb0997 commit 1c6158f
Showing 13 changed files with 448 additions and 36 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.rst
@@ -1,6 +1,14 @@
Change history
**************

1.0.1
#####

* added support for user-defined callback-method while performing API queries (issue #25)
* added support for advanced query operators (issue #24)
* adjusted search term parsing in method TwitterSearchOrder.set_search_url()
* auto-handling of keywords with spaces

1.0.0
#####

6 changes: 5 additions & 1 deletion README.rst
@@ -22,7 +22,11 @@ TwitterSearch
:target: https://raw.githubusercontent.com/ckoepp/TwitterSearch/master/LICENSE
:alt: MIT License

This library allows you to easily create a search through the Twitter API without having to know too much about the API details. Based on such a search you can even iterate through all tweets reachable via the Twitter Search API. The next pages are reloaded automatically during iteration.
.. image:: https://readthedocs.org/projects/twittersearch/badge/?version=latest
:target: https://twittersearch.readthedocs.org/en/latest/
:alt: Documentation

This library allows you to easily create a search through the Twitter API without having to know too much about the API details. Based on such a search you can even iterate through all tweets reachable via the Twitter Search API. The next pages are reloaded automatically during iteration. TwitterSearch was developed as part of an interdisciplinary project at the `Technische Universität München <http://www.tum.de/en/>`_.

Reasons to use TwitterSearch
############################
16 changes: 15 additions & 1 deletion TwitterSearch/TwitterSearch.py
Expand Up @@ -108,6 +108,9 @@ def __init__(self, consumer_key, consumer_secret,
# statistics
self.__statistics = [0,0]

# callback
self.__callback = None

# verify
if "verify" in attr:
self.authenticate(attr["verify"])
@@ -181,16 +184,23 @@ def check_http_status(self, http_status):
raise TwitterSearchException(http_status,
self.exceptions[http_status])

def search_tweets_iterable(self, order):
def search_tweets_iterable(self, order, callback=None):
""" Returns itself and queries the Twitter API. Is called when using \
an instance of this class as iterable. \
See `Basic usage <basic_usage.html>`_ for examples
:param order: An instance of TwitterOrder class \
(e.g. TwitterSearchOrder or TwitterUserOrder)
:param callback: Function to be called after a new page \
is queried from the Twitter API
:returns: Itself using ``self`` keyword
"""

if callback:
if not callable(callback):
raise TwitterSearchException(1018)
self.__callback = callback

self.search_tweets(order)
return self

@@ -242,6 +252,10 @@ def send_search(self, url):
self.__statistics[0] += 1
self.__statistics[1] += seen_tweets

# call callback if available
if self.__callback:
self.__callback(self)

# if we've seen the correct amount of tweets there may be some more
# using IDs to request more results
# (former versions used page parameter)
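The two `TwitterSearch.py` hunks above follow a simple pattern: `search_tweets_iterable()` validates the optional `callback` once (raising exception code 1018 if it is not callable), and `send_search()` calls it with the `TwitterSearch` instance after each page has been fetched and the statistics updated. A minimal, self-contained sketch of that pattern, assuming a hypothetical `PagedFetcher` class in place of `TwitterSearch` and a plain `ValueError` in place of `TwitterSearchException(1018)`:

```python
from typing import Callable, List, Optional, Tuple


class PagedFetcher:
    """Hypothetical stand-in for TwitterSearch that 'fetches' pre-baked pages."""

    def __init__(self, pages: List[List[str]]) -> None:
        self._pages = pages
        self._statistics = [0, 0]  # [queries sent, tweets seen], as in TwitterSearch
        self._callback: Optional[Callable[["PagedFetcher"], None]] = None

    def get_statistics(self) -> Tuple[int, int]:
        return self._statistics[0], self._statistics[1]

    def iterate(self, callback=None):
        # Validate eagerly, before any page is fetched; TwitterSearch raises
        # TwitterSearchException(1018) at this point instead of ValueError
        if callback is not None:
            if not callable(callback):
                raise ValueError("Not a callable function")
            self._callback = callback
        return self._walk()

    def _walk(self):
        for page in self._pages:
            # Update statistics for this "query" first, then notify the callback
            self._statistics[0] += 1
            self._statistics[1] += len(page)
            if self._callback:
                self._callback(self)
            yield from page
```

A callback that records `get_statistics()` on every call would see `(1, 2)` and then `(2, 3)` for the pages `["a", "b"]` and `["c"]`. Validation sits outside the generator on purpose: a generator body only runs on the first `next()`, so a bad callback would otherwise go unnoticed until iteration starts.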
1 change: 1 addition & 0 deletions TwitterSearch/TwitterSearchException.py
@@ -30,6 +30,7 @@ class TwitterSearchException(Exception):
1015: 'No keywords given',
1016: 'Invalid dict',
1017: 'Invalid argument: need either a user ID or a screen-name',
1018: 'Not a callable function',
}

def __init__(self, code, msg=None):
159 changes: 145 additions & 14 deletions TwitterSearch/TwitterSearchOrder.py
@@ -20,12 +20,25 @@ class TwitterSearchOrder(TwitterOrder):
syntax of the Twitter Search API.
"""

# Attitude filter search strings (<positive>,<negative>) as taken from
# https://dev.twitter.com/rest/public/search
_attitudes = (":)", ":(")

# Question filter search string
_question = "?"

# Link filter search string
_link = "filter:links"

# Source filter prefix string
_source = "source:"

# default value for count should be the maximum value to minimize traffic
# see https://dev.twitter.com/docs/api/1.1/get/search/tweets
_max_count = 100

# taken from http://www.loc.gov/standards/iso639-2/php/English_list.php
iso_6391 = ['aa', 'ab', 'ae', 'af', 'ak', 'am', 'an', 'ar', 'as',
iso_6391 = ('aa', 'ab', 'ae', 'af', 'ak', 'am', 'an', 'ar', 'as',
'av', 'ay', 'az', 'ba', 'be', 'bg', 'bh', 'bi', 'bm',
'bn', 'bo', 'br', 'bs', 'ca', 'ce', 'ch', 'co', 'cr',
'cs', 'cu', 'cv', 'cy', 'da', 'de', 'dv', 'dz', 'ee',
@@ -45,61 +58,167 @@ class TwitterSearchOrder(TwitterOrder):
'sw', 'ta', 'te', 'tg', 'th', 'ti', 'tk', 'tl', 'tn',
'to', 'tr', 'ts', 'tt', 'tw', 'ty', 'ug', 'uk', 'ur',
'uz', 've', 'vi', 'vo', 'wa', 'wo', 'xh', 'yi', 'yo',
'za', 'zh', 'zu']
'za', 'zh', 'zu')

def __init__(self):
""" Constructor """

self.arguments = {'count': '%s' % self._max_count}
self.searchterms = []
self.url = ''
self.remove_all_filters()

def remove_all_filters(self):
""" Removes all filters """

# attitude: None = no attitude, True = positive, False = negative
self.attitude_filter = self.source_filter = None
self.question_filter = self.link_filter = False

def set_source_filter(self, source):
""" Only search for tweets entered via given source
:param source: String. Name of the source to search for. An example \
would be ``source=twitterfeed`` for tweets submitted via TwitterFeed
:raises: TwitterSearchException
"""

if isinstance(source, str if py3k else basestring) and len(source) >= 2:
self.source_filter = source
else:
raise TwitterSearchException(1009)

def remove_source_filter(self):
""" Remove the current source filter """

self.source_filter = None


def set_link_filter(self):
""" Only search for tweets including links """

self.link_filter = True

def remove_link_filter(self):
""" Remove the current link filter """

self.link_filter = False

def add_keyword(self, word):
def set_question_filter(self):
""" Only search for tweets asking a question """

self.question_filter = True

def remove_question_filter(self):
""" Remove the current question filter """

self.question_filter = False

def set_positive_attitude_filter(self):
""" Only search for tweets with positive attitude """

self.attitude_filter = True

def set_negative_attitude_filter(self):
""" Only search for tweets with negative attitude """

self.attitude_filter = False

def remove_attitude_filter(self):
""" Remove attitude filter """

self.attitude_filter = None

def add_keyword(self, word, or_operator=False):
""" Adds a given string or list to the current keyword list
:param word: String or list of at least 2 character long keyword(s)
:param or_operator: Boolean. Concatenates all elements of parameter \
word with ``OR``. Is ignored if word is not a list or tuple. Thus it is \
possible to search for ``foo OR bar``. Default value is False \
which corresponds to a search of ``foo AND bar``.
:raises: TwitterSearchException
"""

if isinstance(word, str if py3k else basestring) and len(word) >= 2:
self.searchterms.append(word)
elif isinstance(word, list):
self.searchterms += word
self.searchterms.append(word if " " not in word else '"%s"' % word)
elif isinstance(word, (tuple,list)):
word = [ (i if " " not in i else '"%s"' % i) for i in word ]
self.searchterms += [" OR ".join(word)] if or_operator else word
else:
raise TwitterSearchException(1000)

def set_keywords(self, words):
def set_keywords(self, words, or_operator=False):
""" Sets a given list as the new keyword list
:param words: A list of at least 2 character long new keywords
:param or_operator: Boolean. Concatenates all elements of parameter \
words with ``OR``. Enables searches for ``foo OR bar``. Default value \
is False which corresponds to a search of ``foo AND bar``.
:raises: TwitterSearchException
"""

if not isinstance(words, list):
if not isinstance(words, (tuple,list)):
raise TwitterSearchException(1001)
self.searchterms = words
words = [ (i if " " not in i else '"%s"' % i) for i in words ]
self.searchterms = [" OR ".join(words)] if or_operator else words
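With this change, `add_keyword()` and `set_keywords()` wrap any keyword that contains a space in double quotes, so Twitter searches it as a phrase. When `or_operator=True`, they also collapse the list into a single `" OR "`-joined term. The same transformation as a standalone function; the name `normalize_keywords` is hypothetical and not part of TwitterSearch:

```python
from typing import List, Sequence


def normalize_keywords(words: Sequence[str], or_operator: bool = False) -> List[str]:
    """Sketch of the keyword handling that set_keywords() gains in this commit."""
    if not isinstance(words, (tuple, list)):
        # TwitterSearch raises TwitterSearchException(1001) at this point
        raise TypeError("expected a list or tuple of keywords")
    # Keywords containing a space are wrapped in double quotes (phrase search)
    quoted = [w if " " not in w else '"%s"' % w for w in words]
    # OR mode collapses everything into one term; the default keeps separate
    # terms, which Twitter combines with AND
    return [" OR ".join(quoted)] if or_operator else quoted
```

For example, `normalize_keywords(["foo bar", "baz"], or_operator=True)` returns `['"foo bar" OR baz']`. Without the flag, it returns `['"foo bar"', 'baz']`.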

def set_search_url(self, url):
""" Reads given query string and stores key-value tuples
:param url: A string containing a valid URL to parse arguments from
"""

self.__init__()

if url[0] == '?':
url = url[1:]

args = parse_qs(url)
self.searchterms = args['q']
del args['q']

# urldecode keywords
for item in self.searchterms:
item = unquote(item)
for arg in args['q']:
self.searchterms += [ unquote(i) for i in arg.split(" ") ]
del args['q']

self.arguments = {}
for key, value in args.items():
self.arguments.update({key: unquote(value[0])})

# look for advanced operators: attitudes
for attitude in self._attitudes:
try:
i = self.searchterms.index(attitude)
del self.searchterms[i]
self.attitude_filter = (i == 1)
except ValueError:
pass

# look for advanced operators: question
try:
del self.searchterms[ self.searchterms.index(self._question) ]
self.question_filter = True
except ValueError:
pass

# look for advanced operators: link-filter
try:
del self.searchterms[ self.searchterms.index(self._link) ]
self.link_filter = True
except ValueError:
pass



# look for advanced operators: source-filter
i = None
for element in self.searchterms:
if element.startswith(self._source):
i = element
break
if i:
del self.searchterms[ self.searchterms.index(i) ]
self.source_filter = i[ len(self._source): ]

def create_search_url(self):
""" Generates (urlencoded) query string from stored key-values tuples
@@ -112,6 +231,18 @@ def create_search_url(self):
url = '?q='
url += '+'.join([quote_plus(i) for i in self.searchterms])

if self.attitude_filter is not None:
url += '+%s' % quote_plus(self._attitudes[0 if self.attitude_filter else 1])

if self.source_filter:
url += '+%s' % quote_plus(self._source + self.source_filter)

if self.link_filter:
url += '+%s' % quote_plus(self._link)

if self.question_filter:
url += '+%s' % quote_plus(self._question)

for key, value in self.arguments.items():
url += '&%s=%s' % (quote_plus(key), (quote_plus(value)
if key != 'geocode'
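`create_search_url()` appends each active filter to the `q` parameter as one more `+`-separated, `quote_plus()`-encoded term. It then appends the remaining arguments. Below is a self-contained sketch of that assembly; the function `build_search_query()` is hypothetical. The committed code also special-cases the `geocode` argument, but that branch is cut off in this view, so the sketch encodes every argument the same way.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import quote_plus

ATTITUDES = (":)", ":(")  # index 0 = positive, index 1 = negative


def build_search_query(terms: Sequence[str],
                       attitude: Optional[bool] = None,
                       source: Optional[str] = None,
                       link: bool = False,
                       question: bool = False,
                       arguments: Optional[Dict[str, str]] = None) -> str:
    """Assemble a '?q=...&key=value' query string in create_search_url() order."""
    url = "?q=" + "+".join(quote_plus(t) for t in terms)
    # Each filter becomes one extra, individually encoded search term
    if attitude is not None:
        url += "+" + quote_plus(ATTITUDES[0 if attitude else 1])
    if source:
        url += "+" + quote_plus("source:" + source)
    if link:
        url += "+" + quote_plus("filter:links")
    if question:
        url += "+" + quote_plus("?")
    # Remaining API arguments (count, lang, ...) follow as regular parameters
    for key, value in (arguments or {}).items():
        url += "&%s=%s" % (quote_plus(key), quote_plus(value))
    return url
```

For instance, `build_search_query(["python"], attitude=True, link=True, arguments={"lang": "en"})` yields `?q=python+%3A%29+filter%3Alinks&lang=en`.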
2 changes: 1 addition & 1 deletion TwitterSearch/__init__.py
@@ -1,5 +1,5 @@
__title__ = 'TwitterSearch'
__version__ = '1.0.0'
__version__ = '1.0.1'
__author__ = 'Christian Koepp'
__license__ = 'MIT'
__copyright__ = 'Copyright 2013 Christian Koepp'
29 changes: 15 additions & 14 deletions docs/advanced_usage_ts.rst
@@ -37,10 +37,10 @@ Proxy usage

To use an HTTPS proxy when initializing the :class:`TwitterSearch` class, pass an additional argument such as ``proxy='some.proxy:888'``. Otherwise, authentication will fail if the client has no direct access to the Twitter API.

Delaying requests to avoid rate-limitation
------------------------------------------
Avoid rate-limitation using a callback method
----------------------------------------------

Sometimes you need to build in certain delays to avoid being `rate-limited <https://dev.twitter.com/rest/public/rate-limiting>`_ by Twitter. This is not built into the library, as it is easy to do manually with Python's built-in ``time`` module.
Sometimes you need to build in certain delays to avoid being `rate-limited <https://dev.twitter.com/rest/public/rate-limiting>`_ by Twitter. One way to add an artificial delay to your queries is to combine Python's built-in ``time`` module with a callback method. The following example demonstrates how to use the ``callback`` argument of the ``TwitterSearch.search_tweets_iterable()`` method. In this particular case, every 5th call to the Twitter API triggers a delay of 60 seconds.

.. code-block:: python
@@ -58,23 +58,24 @@ Sometimes there is the need to build in certain delays in order to avoid being `
access_token_secret = '333444'
)
def my_callback_closure(current_ts_instance): # accepts ONE argument: an instance of TwitterSearch
queries, tweets_seen = current_ts_instance.get_statistics()
if queries > 0 and (queries % 5) == 0: # trigger delay every 5th query
time.sleep(60) # sleep for 60 seconds
counter = 0 # rate-limit counter
sleep_at = 123 # enforce delay after receiving 123 tweets
sleep_for = 60.5 # sleep for 60.5 seconds (just to show that floats also work here)
for tweet in ts.search_tweets_iterable(tso):
for tweet in ts.search_tweets_iterable(tso, callback=my_callback_closure):
print( '@%s tweeted: %s' % ( tweet['user']['screen_name'], tweet['text'] ) )
counter += 1 # increase counter
if counter >= sleep_at: # it's time to apply the delay
counter = 0
time.sleep(sleep_for) # sleep for n secs
except TwitterSearchException as e:
print(e)
As you might know, a certain amount of `meta-data <#access-meta-data>`_ is available when using *TwitterSearch*. Advanced users might want to rely on the ``get_statistics()`` method of the :class:`TwitterSearch` class directly to avoid maintaining their own counter. This method returns a tuple of two integers: the first is the number of queries sent to Twitter so far, and the second is an automatically increasing count of the tweets received during those queries. An example that takes both values into account could look like this:
Remember that the callback is called every time a query to the Twitter API is performed. It is your responsibility to make sure that your code has no unwanted side effects and does not throw unintended exceptions. Every callable submitted via the ``callback`` argument is called with the current instance of :class:`TwitterSearch` as its only argument. Performing a delay is just one way to use this callback pattern.
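The callback in the example above throttles on the first value of the ``get_statistics()`` tuple, the number of queries sent so far. Below is a sketch of the same logic as a reusable factory. ``make_throttle_callback`` is a hypothetical helper, not part of TwitterSearch. Its ``sleep`` parameter is an added assumption so that the delay can be swapped out in tests.

```python
import time
from typing import Any, Callable


def make_throttle_callback(every: int = 5, seconds: float = 60.0,
                           sleep: Callable[[float], Any] = time.sleep) -> Callable[[Any], None]:
    """Build a callback that pauses after every `every`-th query to the API."""
    def callback(ts_instance: Any) -> None:
        # get_statistics() returns (queries sent so far, tweets received so far)
        queries, _tweets_seen = ts_instance.get_statistics()
        if queries > 0 and queries % every == 0:
            sleep(seconds)
    return callback
```

With the objects from the example above, it would be passed as ``ts.search_tweets_iterable(tso, callback=make_throttle_callback(5, 60))``.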


Avoid rate-limitation manually
------------------------------

As you might know, a certain amount of `meta-data <#access-meta-data>`_ is available when using *TwitterSearch*. Some users might prefer to rely only on the ``get_statistics()`` method of the :class:`TwitterSearch` class to trigger, for example, an artificial delay. This method returns a tuple of two integers: the first is the number of queries sent to Twitter so far, and the second is an automatically increasing count of the tweets received during those queries. An example that takes both values into account could look like this:

.. code-block:: python
4 changes: 4 additions & 0 deletions docs/advanced_usage_tse.rst
@@ -55,6 +55,10 @@ All exceptions based on issues within TwitterSearch do have ``TwitterSearchExcep
1015 No keywords given
------ --------------------------------------
1016 Invalid dict
------ --------------------------------------
1017 Invalid user id or screen-name
------ --------------------------------------
1018 Not a callable function
====== ======================================

HTTP based exceptions