# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

# Fetching tweets

In this notebook, you will learn to
1. [Get tweets that contain a specific word](#tweet)
2. [Increase the number of tweets fetched per request](#increase)
3. [Get tweet text and full tweet text](#full)
4. [Remove retweets](#remove)
5. [Get other information in a tweet such as the retweet count, user screen name and creation time](#info)
6. [Keep only tweets that are written in English](#english)
7. [Remove generic tweets](#generic)

Note: The output you see when you run the code will differ from the output shown here, because the search results change over time.

## Authenticate and create an API object - you already know this!

In [1]:
# Import libraries
import tweepy
import os
import sys
sys.path.append("..")

# Import the get_twitter_tokens from the FMDA_quantra module
# The code of this module can be found in the downloads (last section) of this course
# You need to edit the FMDA_quantra.py file and add your Twitter tokens manually before you continue
from data_modules.FMDA_quantra import get_twitter_tokens

# Call get_twitter_tokens from the FMDA_quantra module to get the dictionary of consumer key and consumer secret
twitter_tokens = get_twitter_tokens()

# Set the consumer key and secret from the twitter_tokens dictionary
consumer_key = twitter_tokens['consumer_key']
consumer_secret = twitter_tokens['consumer_secret']


auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
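
The contents of `FMDA_quantra.py` are not shown in this notebook. As a rough sketch of an alternative, the function below returns a dictionary with the same two keys but reads the values from environment variables, so the tokens stay out of source files. The function name `get_twitter_tokens_from_env` and the variable names `TWITTER_CONSUMER_KEY` and `TWITTER_CONSUMER_SECRET` are assumptions made for illustration; they are not part of the course module.

```python
import os


def get_twitter_tokens_from_env(environ=os.environ):
    """Return the consumer key and secret as a dictionary.

    Hypothetical helper: the environment variable names are assumptions.
    """
    names = {
        'consumer_key': 'TWITTER_CONSUMER_KEY',
        'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    }
    tokens = {}
    for key, env_name in names.items():
        value = environ.get(env_name)
        if not value:
            # Fail early with a clear message instead of a later authentication error
            raise KeyError(f"Environment variable {env_name} is not set")
        tokens[key] = value
    return tokens
```

The returned dictionary can be used in place of `twitter_tokens` above, since it exposes the same `consumer_key` and `consumer_secret` keys.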

<a id='tweet'></a> 
## Tweets that contain a specific word

The `search` method of the API object returns tweets that contain a specific word. For example, to search for tweets that contain 'AMZN' prefixed with a dollar sign, pass `'$AMZN'` as the `q` parameter of the search method, as shown below.

In [2]:
tweets_search_result = api.search(q = '$AMZN')

The `search` method returns a `SearchResults` object, which holds the tweets that match the search criteria. By default, at most 15 recent tweets that match the search criteria are returned. The output is stored in `tweets_search_result`.

In [3]:
type(tweets_search_result)

tweepy.models.SearchResults

In [4]:
tweets_search_result

[Status(_api=<tweepy.api.API object at 0x000001EDFD40BCD0>, _json={'created_at': 'Wed May 26 08:43:16 +0000 2021', 'id': 1397473397471330307, 'id_str': '1397473397471330307', 'text': 'Market Briefing For Wednesday, May 26 $DAL $AMZN $SPY $BTC.X https://t.co/kD2dp0guAj', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [{'text': 'DAL', 'indices': [38, 42]}, {'text': 'AMZN', 'indices': [43, 48]}, {'text': 'SPY', 'indices': [49, 53]}, {'text': 'BTC.X', 'indices': [54, 60]}], 'user_mentions': [], 'urls': [{'url': 'https://t.co/kD2dp0guAj', 'expanded_url': 'https://talkmarkets.com/content/us-markets/market-briefing-for-wednesday-may-26?post=313480', 'display_url': 'talkmarkets.com/content/us-mar…', 'indices': [61, 84]}]}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="https://talkmarkets.com" rel="nofollow">TalkMarketsApp</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id

In [5]:
len(tweets_search_result)

11

<a id='increase'></a> 
## Increase the number of tweets fetched

To increase the number of tweets fetched per request, use the `count` parameter of the `api.search` method. A single request returns at most 100 tweets. For example, in the code below, `count` is set to 150 but only 100 tweets are returned.

In [6]:
tweets_search_result = api.search(q = '$AMZN', 
                                  count=150)
len(tweets_search_result)

100
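
Because one request is capped at 100 tweets, collecting more requires several requests. The sketch below is one possible approach, not part of the course code: it assumes an `api` object whose `search` method accepts `q`, `count` and `max_id` (as the tweepy `api.search` used in this notebook does) and walks backwards through older tweets by lowering `max_id` after each page. The function name `fetch_tweets` is hypothetical.

```python
def fetch_tweets(api, query, total, page_size=100):
    """Collect up to `total` tweets by issuing several search requests."""
    collected = []
    max_id = None  # Upper bound on tweet IDs for the next request

    while len(collected) < total:
        # Never ask for more than is still needed
        kwargs = {'q': query, 'count': min(page_size, total - len(collected))}
        if max_id is not None:
            kwargs['max_id'] = max_id

        page = api.search(**kwargs)
        if not page:
            break  # No older tweets are available

        collected.extend(page)
        # Request only tweets older than the oldest one seen so far
        max_id = min(tweet.id for tweet in page) - 1

    return collected
```

For example, `fetch_tweets(api, '$AMZN', 250)` would issue up to three requests. Each request still counts towards the rate limit, and fewer tweets than requested are returned when the search runs out of matches.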

<a id='full'></a> 
## Get the tweet text

Each tweet contains information such as the tweet text, the user screen name, and the creation date. To fetch the text of the first tweet in `tweets_search_result`, use the code below.

In [7]:
# Store the first tweet information in the tweet variable
tweet = tweets_search_result[0]

# Access the text through the text attribute of the tweet
tweet_text = tweet.text

# Print the text of the tweet
print(tweet_text)

Market Briefing For Wednesday, May 26 $DAL $AMZN $SPY $BTC.X https://t.co/kD2dp0guAj


## Get full tweet text

By default, the text of a tweet is truncated to 140 characters. If the actual tweet text is longer than 140 characters, you will not see the complete tweet. To fetch the full tweet text, set `tweet_mode` to `extended` as shown below and read the `full_text` attribute instead of `text`.

In [8]:
tweets_search_result = api.search(q = '$AMZN', 
                                  count=100, 
                                  tweet_mode='extended')

tweet = tweets_search_result[0]
print(tweet.full_text)

Market Briefing For Wednesday, May 26 $DAL $AMZN $SPY $BTC.X https://t.co/kD2dp0guAj


<a id='remove'></a> 
## Remove retweets

Setting `tweet_mode` solves the problem only partially, because the text of a retweet is still truncated to 140 characters. However, the search results already contain the original tweet along with the number of times it was retweeted, so the retweets add no new information. Therefore, we can remove all the retweets by adding `-filter:retweets` to the search string as shown below.

In [9]:
tweets_search_result = api.search(q = '$AMZN -filter:retweets', 
                                  count=100, 
                                  tweet_mode='extended')

tweet = tweets_search_result[0]
print(tweet.full_text)

Market Briefing For Wednesday, May 26 $DAL $AMZN $SPY $BTC.X https://t.co/kD2dp0guAj
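
If retweets are kept instead of filtered out, their complete text can still be recovered on the client side. The sketch below assumes that a retweet object exposes the original tweet through a `retweeted_status` attribute, as tweepy `Status` objects do, and that `full_text` is present only when `tweet_mode='extended'` was used. The helper names `is_retweet` and `get_full_text` are hypothetical, and the sample objects are stand-ins, not real API output.

```python
from types import SimpleNamespace


def is_retweet(tweet):
    """Return True if the tweet object carries the original tweet it retweets."""
    return hasattr(tweet, 'retweeted_status')


def get_full_text(tweet):
    """Return the most complete text available for a tweet."""
    if is_retweet(tweet):
        # For a retweet, the untruncated text lives on the original tweet
        tweet = tweet.retweeted_status
    # Fall back to text when full_text is absent (default tweet mode)
    return getattr(tweet, 'full_text', None) or tweet.text


# Stand-in objects that mimic only the attributes used above
original = SimpleNamespace(full_text='$AMZN draws huge volume and is in bull run.')
retweet = SimpleNamespace(full_text='RT @user: $AMZN draws huge...',
                          retweeted_status=original)

print(is_retweet(original))    # False
print(get_full_text(retweet))  # $AMZN draws huge volume and is in bull run.
```

Filtering in the query, as done above, is usually simpler because it also keeps retweets from using up the 100 tweets allowed per request.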


<a id='info'></a> 
## Print tweet information

We have defined a function, `print_tweet_info`, that prints further information about a tweet in a tabular format: the ID (a unique number that identifies the tweet), the date and time (UTC) when the tweet was created, the user screen name, the tweet text, the retweet count, the favourite count and the language of the tweet.

In [10]:
# tabulate helps to print the data in a tabular format
from tabulate import tabulate

# This function takes a tweet (tweepy.models.Status object) as input and prints the information in that tweet
def print_tweet_info(tweet):
    tweet_info = [
        ['Tweet ID:', tweet.id_str],
        ['Created At (UTC):', tweet.created_at],
        ['User Screen Name:', tweet.user.screen_name],
        ['Tweet Text:', tweet.full_text],
        ['Retweet Count:', tweet.retweet_count],
        ['Favourite Count:', tweet.favorite_count],
        ['Language:', tweet.lang],
    ]
    print(tabulate(tweet_info))

In [11]:
print_tweet_info(tweet)

-----------------  ------------------------------------------------------------------------------------
Tweet ID:          1397473397471330307
Created At (UTC):  2021-05-26 08:43:16
User Screen Name:  TalkMarkets
Tweet Text:        Market Briefing For Wednesday, May 26 $DAL $AMZN $SPY $BTC.X https://t.co/kD2dp0guAj
Retweet Count:     0
Favourite Count:   0
Language:          en
-----------------  ------------------------------------------------------------------------------------
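
Printing is convenient for inspecting one tweet, but later analysis usually needs the same fields as data. The sketch below collects the attributes used by `print_tweet_info` into a dictionary. The function name `tweet_to_row` and the dictionary keys are hypothetical, and the sample object is a stand-in built from the values printed above, with the creation time written as a plain string.

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Collect the fields printed by print_tweet_info into a dictionary."""
    return {
        'id': tweet.id_str,
        'created_at': tweet.created_at,
        'screen_name': tweet.user.screen_name,
        'text': tweet.full_text,
        'retweet_count': tweet.retweet_count,
        'favorite_count': tweet.favorite_count,
        'lang': tweet.lang,
    }


# Stand-in for a tweet fetched with tweet_mode='extended'
sample = SimpleNamespace(
    id_str='1397473397471330307',
    created_at='2021-05-26 08:43:16',
    user=SimpleNamespace(screen_name='TalkMarkets'),
    full_text='Market Briefing For Wednesday, May 26 $DAL $AMZN $SPY $BTC.X https://t.co/kD2dp0guAj',
    retweet_count=0,
    favorite_count=0,
    lang='en',
)

row = tweet_to_row(sample)
print(row['screen_name'])  # TalkMarkets
```

A list of such dictionaries, one per tweet, is a convenient input for tabular tools.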


<a id='english'></a> 
## Keep only tweets in the English language

To keep only the tweets in the English language, add the parameter `lang='en'` to the search method. The value is an ISO 639-1 language code.

In [12]:
tweets_search_result = api.search(q = '$AMZN -filter:retweets', 
                                  count=100, 
                                  tweet_mode='extended', 
                                  lang='en')
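
The language of each fetched tweet is also available through its `lang` attribute, as printed by `print_tweet_info`, so the result can be double-checked on the client side. The sketch below is a minimal illustration; the function name `keep_language` is hypothetical and the sample objects are stand-ins that carry only the `lang` attribute.

```python
from types import SimpleNamespace


def keep_language(tweets, lang='en'):
    """Return only the tweets whose lang attribute equals the given code."""
    return [tweet for tweet in tweets if getattr(tweet, 'lang', None) == lang]


# Stand-in objects, not real API output
tweets = [SimpleNamespace(lang='en'), SimpleNamespace(lang='es'), SimpleNamespace(lang='en')]
english_tweets = keep_language(tweets)
print(len(english_tweets))  # 2
```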

<a id='generic'></a> 
## Remove generic tweets 

Many tweets are generic: they are not specific to a particular stock and simply list a large number of stock tickers. We believe that such tweets help little in determining the sentiment around any one stock and should be removed.

You can count the occurrences of the dollar sign in the tweet text using the string `count` method. If the dollar sign occurs more than 8 times, the tweet is treated as generic and should be removed. Otherwise, the tweet can be kept.

Below are two sample tweets: one with many tickers in it and the other with only one stock ticker in it.

In [13]:
tweet_text = "$BBDA draws huge volume and runs up 300%. $WDBG $LFAP $GNCP $SHLDQ $MXMG $MSPC $HPIL $CSOC $SHMP $FTEG $ACB $SGCP $VAPE $WWIO $DRUS $BLSP $WNBD $AAPL $TSLA $FB $DSCR $MSFT $BGFT $IFXY $ADTM $AMZN $SNAP $MLHC $MGTI $BRKK $NNRX $HRI $DMHI $VYST $PMPG https://t.co/n3jsNtItMy"
if tweet_text.count('$') > 8:
    print("The tweet contains more than 8 tickers")
else:
    print("The tweet doesn't contain more than 8 tickers")


The tweet contains more than 8 tickers


In [14]:
tweet_text = "$AMZN draws huge volume and is in bull run."
if tweet_text.count('$') > 8:
    print("The tweet contains more than 8 tickers")
else:
    print("The tweet doesn't contain more than 8 tickers")

The tweet doesn't contain more than 8 tickers
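
The same check can be wrapped in a small function and applied to a whole list of tweet texts. The sketch below follows the "more than 8 dollar signs" rule stated above. The function name `is_generic_tweet` and the parameter `max_dollar_signs` are hypothetical, and the second sample text is a shortened, made-up variant of the first sample tweet.

```python
def is_generic_tweet(text, max_dollar_signs=8):
    """Return True if the text has more than max_dollar_signs dollar signs."""
    return text.count('$') > max_dollar_signs


texts = [
    "$AMZN draws huge volume and is in bull run.",
    "$BBDA $WDBG $LFAP $GNCP $SHLDQ $MXMG $MSPC $HPIL $CSOC run up together.",
]

# Keep only the texts that are not generic
specific_texts = [text for text in texts if not is_generic_tweet(text)]
print(specific_texts)  # ['$AMZN draws huge volume and is in bull run.']
```

Note that the check counts dollar signs, not tickers, so a price such as `$5` in the text also counts towards the threshold.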


In the next notebook unit, you will learn how to fetch the tweets posted by a specific user and how to fetch tweets between specific dates.