A Python script that uses Tweepy and the Twitter API to collect tweets.
This project was done as a part of the CSE 535 course on Information Retrieval at SUNY Buffalo.
The script's current configuration is as follows:
1. A single request to the Twitter API returns at most 100 tweets. The script is currently configured to query the API 10 times, collecting around 1,000 tweets in total. By changing the 'max_iter' (number of API calls) and 'count' (tweets per call, at most 100) parameters in the script, it can be configured to gather as many tweets as required, subject to the API's rate limits.
2. It collects tweets related to the following topics: USA, NY, Buffalo, and Niagara. Changing these query terms lets it collect tweets on any other topic.
3. It collects English-language tweets. Changing the 'lang' parameter lets it collect tweets in other languages.
4. The script's output is in JSON format and is stored in a .txt file. For each tweet, the following fields are extracted:
   i. Language of the tweet
   ii. Text content of the tweet
   iii. Date and time the tweet was posted
   iv. URL of the tweet
   v. ID of the tweet
   vi. Hashtags in the tweet (if any)

After making any desired modifications, run the script from the command line as shown below:
python tweet_collector.py
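Items 1 to 3 describe repeated API requests controlled by 'max_iter', 'count', a query string and 'lang'. The sketch below shows one common way such a loop can page backwards through search results with the v1.1 `max_id` parameter, so that each request returns only tweets older than those already collected. It is an assumption that the script pages this way. `collect_tweets`, `search_page` and `fake_search` are hypothetical names: `search_page` stands in for a single API call (in Tweepy this would be `API.search` in 3.x or `API.search_tweets` in 4.x), and the fake implementation lets the loop run without credentials. The `OR` query string is likewise only illustrative.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical type for one API request: (query, lang, count, max_id) -> list of tweet dicts.
SearchFn = Callable[[str, str, int, Optional[int]], List[Dict[str, Any]]]


def collect_tweets(search_page: SearchFn, query: str, lang: str = "en",
                   count: int = 100, max_iter: int = 10) -> List[Dict[str, Any]]:
    """Call search_page up to max_iter times, paging backwards with max_id (assumed strategy)."""
    collected: List[Dict[str, Any]] = []
    max_id: Optional[int] = None  # None means "start from the newest tweets"
    for _ in range(max_iter):
        page = search_page(query, lang, count, max_id)
        if not page:
            break  # no older tweets match the query
        collected.extend(page)
        # Ask the next request only for tweets strictly older than this page.
        max_id = min(int(t["id"]) for t in page) - 1
    return collected


def fake_search(query: str, lang: str, count: int,
                max_id: Optional[int]) -> List[Dict[str, Any]]:
    """Stand-in for the real API: 250 matching tweets with IDs 1..250, newest first."""
    top = 250 if max_id is None else min(250, max_id)
    return [{"id": i, "text": f"{query} tweet {i}", "lang": lang}
            for i in range(top, max(top - count, 0), -1)]


tweets = collect_tweets(fake_search, "USA OR NY OR Buffalo OR Niagara",
                        count=100, max_iter=10)
print(len(tweets))  # 250: three calls return 100, 100 and 50 tweets, the fourth returns none
```

Subtracting 1 from the smallest ID keeps the oldest tweet of one page from being fetched again on the next, and stopping on an empty page avoids spending the remaining calls once the results run out.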
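Item 4 lists the six fields kept for each tweet. The sketch below assumes the standard v1.1 tweet object layout (`lang`, `text` or `full_text`, `created_at`, `id_str`, `user.screen_name`, `entities.hashtags`) and writes one JSON object per line. The output key names, the one-object-per-line layout, and building the tweet URL from the author's handle and ID are assumptions, not necessarily what the script does; `extract_fields` and `write_tweets` are hypothetical helper names.

```python
import json
from typing import Any, Dict, Iterable


def extract_fields(status: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the six fields from item 4 of a v1.1 tweet object (assumed layout)."""
    hashtags = status.get("entities", {}).get("hashtags", [])
    return {
        "lang": status.get("lang"),
        # Extended mode returns "full_text"; compatibility mode returns "text".
        "text": status.get("full_text", status.get("text")),
        "created_at": status.get("created_at"),
        # Assumed URL pattern, built from the author's handle and the tweet ID.
        "url": f"https://twitter.com/{status['user']['screen_name']}/status/{status['id_str']}",
        # String form of the ID avoids precision loss in JavaScript-based consumers.
        "id": status.get("id_str"),
        "hashtags": [h["text"] for h in hashtags],  # empty list if the tweet has none
    }


def write_tweets(statuses: Iterable[Dict[str, Any]], path: str) -> None:
    """Write one JSON object per line to a .txt file (hypothetical output layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for status in statuses:
            f.write(json.dumps(extract_fields(status), ensure_ascii=False) + "\n")


# Made-up sample tweet for illustration only.
sample = {
    "id": 123, "id_str": "123", "lang": "en",
    "text": "Snow again in #Buffalo",
    "created_at": "Mon Jan 01 12:00:00 +0000 2018",
    "user": {"screen_name": "example_user"},
    "entities": {"hashtags": [{"text": "Buffalo"}]},
}
print(json.dumps(extract_fields(sample)))
```

Writing one object per line keeps the file easy to append to and lets a reader parse it line by line with `json.loads` instead of loading the whole file at once.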