diff --git a/README.md b/README.md
index 98a9497..97936aa 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ Linux and macOS:
 ```bash
 git clone https://github.com/bisguzar/twitter-scraper.git
 cd twitter-scraper
-sudo python3 setup.py install
+sudo python3 setup.py install
 ```
 
 Also, you can install with PyPI.
@@ -37,23 +37,51 @@ pip3 install twitter_scraper
 
 Just import **twitter_scraper** and call functions!
 
-### → function **get_tweets(query: str [, pages: int])** -> dictionary
-You can get tweets of profile or parse tweets from hashtag, **get_tweets** takes username or hashtag on first parameter as string and how much pages you want to scan on second parameter as integer.
+### → function **get_tweets(query: str, search: str [, pages: int])** -> dictionary
+You can get tweets of profile or parse tweets from hashtag, **get_tweets** takes username or hashtag on first parameter as string and how many pages you want to scan on second parameter as integer.
 
-#### Keep in mind:
-* First parameter need to start with #, number sign, if you want to get tweets from hashtag.
-* **pages** parameter is optional.
+*get_tweets* function now supports a 'search' parameter for new search functionality.
+
+To enable backwards compatibility with existing twitter_scraper API users, `query` can be directly addressed by using `query=` or by providing a positional string. You can get tweets of a given twitter user or parse tweets from a provided hashtag.
+
+Example:
 
 ```python
-Python 3.7.3 (default, Mar 26 2019, 21:43:19)
+Python 3.7.3 (default, Mar 26 2019, 21:43:19)
 [GCC 8.2.1 20181127] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from twitter_scraper import get_tweets
->>>
+>>>
 >>> for tweet in get_tweets('twitter', pages=1):
 ...     print(tweet['text'])
-...
-spooky vibe check
+...
+Which will function identically to:
+>>> from twitter_scraper import get_tweets
+>>>
+>>> for tweet in get_tweets(query='twitter', pages=1):
+...
    print(tweet['text'])
+...
+…
+```
+
+If `search` is specified, **get_tweets** will yield a dictionary for each tweet which contains the given term. The term can be any string, supporting search keywords of twitter.
+
+
+#### Keep in mind:
+* You must specify either `query`, or `search`. If you supply one string, `query` will be used by default.
+* You can not use more than one string, and you cannot specify more than one of the two search arguments (`query`,`search`)
+* **pages** parameter is optional, default is 25.
+
+```python
+Python 3.7.3 (default, Mar 26 2019, 21:43:19)
+[GCC 8.2.1 20181127] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> from twitter_scraper import get_tweets
+>>>
+>>> for tweet in get_tweets(search='to:bugraisguzar', pages=1):
+...     print(tweet['text'])
+...
+pic.twitter.com/h24Q6kWyX8
 …
 ```
 
@@ -78,7 +106,7 @@ It returns a dictionary for each tweet. Keys of the dictionary;
 You can get the Trends of your area simply by calling `get_trends()`. It will return a list of strings.
 
 ```python
-Python 3.7.3 (default, Mar 26 2019, 21:43:19)
+Python 3.7.3 (default, Mar 26 2019, 21:43:19)
 [GCC 8.2.1 20181127] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from twitter_scraper import get_trends
@@ -91,7 +119,7 @@ You can get personal information of a profile, like birthday and biography if ex
 
 
 ```python
-Python 3.7.3 (default, Mar 26 2019, 21:43:19)
+Python 3.7.3 (default, Mar 26 2019, 21:43:19)
 [GCC 8.2.1 20181127] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from twitter_scraper import Profile
@@ -109,7 +137,7 @@ Type "help", "copyright", "credits" or "license" for more information.
 **to_dict** is a method of *Profile* class. Returns profile datas as Python dictionary.
 
 ```python
-Python 3.7.3 (default, Mar 26 2019, 21:43:19)
+Python 3.7.3 (default, Mar 26 2019, 21:43:19)
 [GCC 8.2.1 20181127] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from twitter_scraper import Profile
@@ -118,8 +146,6 @@ Type "help", "copyright", "credits" or "license" for more information.
 {'name': 'Buğra İşgüzar', 'username': 'bugraisguzar', 'birthday': None, 'biography': 'geliştirici@peptr', 'website': 'bisguzar.com', 'profile_photo': 'https://pbs.twimg.com/profile_images/1199305322474745861/nByxOcDZ_400x400.jpg', 'banner_photo': 'https://pbs.twimg.com/profile_banners/1019138658/1555346657/1500x500', 'likes_count': 2512, 'tweets_count': 756, 'followers_count': 483, 'following_count': 255, 'is_verified': False, 'is_private': False, user_id: "1019138658"}
 ```
 
-
-
 ## Contributing to twitter-scraper
 
 To contribute to twitter-scraper, follow these steps:
@@ -139,6 +165,7 @@ Thanks to the following people who have contributed to this project:
 * @bisguzar (maintainer)
 * @lionking6792
 * @ozanbayram
+* @sean-bailey
 * @xeliot
 
 
diff --git a/test.py b/test.py
index d5d55d4..23c7336 100644
--- a/test.py
+++ b/test.py
@@ -40,6 +40,17 @@ def test_languages(self):
         self.assertIsInstance(tweets[0]["replies"], int)
         self.assertGreaterEqual(tweets[1]["retweets"], 0)
 
+class TestSearch(unittest.TestCase):
+    def test_search_25pages(self):
+        tweets = list(get_tweets(search="hello, world!", pages=2))
+        self.assertGreater(len(tweets), 1)
+    def test_search_user(self):
+        user = "gvanrossum"
+        tweets = list(get_tweets(user, pages=2))
+        self.assertGreater(len(tweets), 1)
+
+
+
 class TestTrends(unittest.TestCase):
 
     def test_returned(self):
diff --git a/twitter_scraper/modules/tweets.py b/twitter_scraper/modules/tweets.py
index 24f5075..34fd274 100644
--- a/twitter_scraper/modules/tweets.py
+++ b/twitter_scraper/modules/tweets.py
@@ -6,13 +6,22 @@
 session = HTMLSession()
 
 
-def get_tweets(query, pages=25):
+def get_tweets(query=None, search=None, pages=25):
     """Gets
tweets for a given user, via the Twitter frontend API."""
+    if not query and not search:
+        raise RuntimeError("Please specify a 'query' or a 'search' to check the tweets on.")
+    elif query and search:
+        raise RuntimeError("Please specify only one of either a 'search' or 'query'.")
+
     after_part = (
         f"include_available_features=1&include_entities=1&include_new_items_bar=true"
     )
 
-    if query.startswith("#"):
+    if not query:  # if query not exists, it's a search method
+        search_term = quote(search)
+        url = f"https://twitter.com/i/search/timeline?f=tweets&vertical=default&q={search_term}&src=tyah&reset_error_state=false&"
+
+    elif query.startswith("#"):
         query = quote(query)
         url = f"https://twitter.com/i/search/timeline?f=tweets&vertical=default&q={query}&src=tyah&reset_error_state=false&"
     else:
@@ -59,13 +68,9 @@ def gen_tweets(pages):
                 tweet_id = tweet.attrs["data-item-id"]
-                tweet_url = profile.attrs["data-permalink-path"]
-                username = profile.attrs["data-screen-name"]
-                user_id = profile.attrs["data-user-id"]
-                is_pinned = bool(tweet.find("div.pinned"))
                 time = datetime.fromtimestamp(