# Twitter Crawler

The first thing you need to do is create an application:

[Twitter Apps](https://apps.twitter.com/): select the **Create New App** button and follow the instructions to the end.

Then obtain the following keys/tokens for authentication:

* consumer_key
* consumer_secret
* access_token
* access_token_secret

**Note:** Generating Twitter API keys can take anywhere from minutes to weeks.
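Rather than pasting the keys directly into the notebook, one option is to read them from environment variables. The sketch below assumes hypothetical variable names such as `TWITTER_CONSUMER_KEY`; any names work as long as they match what you export in your shell.

```python
import os

# Hypothetical environment variable names (not defined by Twitter or Tweepy);
# use whatever names you export in your shell.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four keys/tokens as a dict, failing loudly if any is missing."""
    creds = {name: environ.get(var, "") for name, var in ENV_NAMES.items()}
    missing = [ENV_NAMES[name] for name, value in creds.items() if not value]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return creds
```

With the variables exported, the credentials cell could use `creds = load_credentials()` followed by `consumer_key = creds["consumer_key"]`, and so on, which keeps secrets out of the saved notebook.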

# **Tweepy**

> Tweepy is a popular Python package for working with the Twitter API. [More](https://www.tweepy.org/)

In [None]:
import tweepy
from columnar import columnar

# First, update the variables below with your own credentials
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""


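# Setting up Tweepy authorization (this notebook uses the Tweepy 3.x API;
# several of the methods used below were renamed in Tweepy 4.0)
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit=True makes Tweepy sleep until the rate-limit window resets
api = tweepy.API(auth, wait_on_rate_limit=True)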

**Getting a user's tweets**
>Main parameters:
> * screen_name – Specifies the screen name (handle) of the user; user_id can be used instead.
> * count – Maximum number of the user's most recent tweets to retrieve. <br>
> * [More Details](https://tweepy.readthedocs.io/en/latest/api.html#API.user_timeline)

In [None]:
import json

username = 'boredbengio'
count = 5

# Only iterate through the first `count` statuses
tweets = tweepy.Cursor(api.user_timeline,
                       screen_name=username).items(count)


# Pulling information from tweets iterable object
tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]


# Print tweets
headers = ['id', 'created_at', 'text']
table = columnar(tweets_list, headers, no_borders=True)
print(table)


# Which attributes does a tweet have? Print the raw JSON and paste it into
# a viewer such as https://jsoneditoronline.org/ to explore it.
tweet = api.get_status('1420646753863225349')
print(json.dumps(tweet._json))
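The printed JSON is the v1.1 Tweet object. Its `created_at` field is a string like `Wed Oct 10 20:19:24 +0000 2018`, which `datetime.strptime` can parse. The sketch below flattens a few fields out of a `_json` dict; `flatten_tweet` is a hypothetical helper (not part of Tweepy), and the sample dict is hand-written with made-up values rather than a real tweet.

```python
from datetime import datetime

# Format of the v1.1 "created_at" string, e.g. "Wed Oct 10 20:19:24 +0000 2018"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def flatten_tweet(tweet_json):
    """Pick a few commonly used fields out of a tweet's _json dict."""
    return {
        # id_str avoids precision loss when the JSON is read outside Python
        "id": tweet_json["id_str"],
        "created_at": datetime.strptime(tweet_json["created_at"], CREATED_AT_FORMAT),
        "screen_name": tweet_json["user"]["screen_name"],
        # full_text replaces text when the tweet is requested with tweet_mode="extended"
        "text": tweet_json.get("full_text", tweet_json.get("text", "")),
        "favorite_count": tweet_json.get("favorite_count", 0),
        "retweet_count": tweet_json.get("retweet_count", 0),
    }


# Hypothetical, hand-written sample with made-up values
sample = {
    "id_str": "123",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "user": {"screen_name": "example_user"},
    "text": "hello world",
    "favorite_count": 3,
    "retweet_count": 1,
}
record = flatten_tweet(sample)
print(record["created_at"].isoformat())  # -> 2018-10-10T20:19:24+00:00
```

In the notebook, `flatten_tweet(tweet._json)` would give one row per tweet, which is convenient for building a table or a DataFrame.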




**Pagination**
>Main parameters:
> * count – Maximum number of pages to iterate over (passed to `Cursor.pages()`). <br>
> * [More Details](https://docs.tweepy.org/en/stable/pagination.html)

In [None]:

# Pagination: iterate page by page instead of tweet by tweet
count = 1
for page in tweepy.Cursor(api.user_timeline, screen_name=username).pages(count):
    # Each page is a list of Status objects
    for status in page:
        # Print the tweet ID and the first 30 characters of its text
        print(status.id, status.text[:30])
    # To inspect the raw JSON of the first tweet on the page instead:
    # print(json.dumps(page[0]._json))
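`tweepy.Cursor` hides the paging details. For timelines, the v1.1 API pages backwards with `max_id`: each request asks for tweets with IDs at or below `max_id`, and the next `max_id` is the smallest ID seen so far minus one. The sketch below shows that loop against a hypothetical in-memory timeline; `fake_fetch` stands in for a call such as `api.user_timeline(screen_name=..., max_id=..., count=...)`.

```python
def paginate_by_max_id(fetch_page, page_size, max_pages):
    """Yield pages of items, newest first, walking backwards with max_id.

    fetch_page(max_id, count) stands in for one timeline request: it returns
    up to `count` items (dicts with an "id") whose id <= max_id, newest first.
    max_id=None means "start from the newest item".
    """
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id, page_size)
        if not page:
            break
        yield page
        # Next request: everything strictly older than the oldest item seen
        max_id = min(item["id"] for item in page) - 1


# Hypothetical in-memory timeline with IDs 10 (newest) down to 1 (oldest)
TIMELINE = [{"id": i} for i in range(10, 0, -1)]


def fake_fetch(max_id, count):
    eligible = [t for t in TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


pages = list(paginate_by_max_id(fake_fetch, page_size=4, max_pages=5))
print([[t["id"] for t in page] for page in pages])
# -> [[10, 9, 8, 7], [6, 5, 4, 3], [2, 1]]
```

Subtracting one from the smallest ID avoids fetching the oldest tweet of the previous page twice, since `max_id` is inclusive.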
    


**Getting a user's followers**
>Main parameters:
> * user_id – Specifies the ID of the user.
> * [More Details](http://docs.tweepy.org/en/v3.5.0/api.html#API.followers_ids)


In [None]:

user_id='14861663'
count = 5

# followers_ids yields bare numeric IDs (renamed get_follower_ids in Tweepy 4.x)
followers = tweepy.Cursor(api.followers_ids,
                          user_id=user_id).items(count)

user_list = [[follower_id] for follower_id in followers]

headers = ['user_id']
table = columnar(user_list, headers, no_borders=True)
print(table)
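Unlike timelines, `followers/ids` is paged with an opaque `cursor`: the first request sends `cursor=-1`, each response carries a `next_cursor`, and a `next_cursor` of `0` means there are no more pages. A sketch of that loop, where the canned responses are hypothetical and `fetch_page` stands in for one API request:

```python
def paginate_by_cursor(fetch_page, max_pages):
    """Yield lists of IDs from a cursored endpoint such as followers/ids.

    fetch_page(cursor) stands in for one request; it returns a dict with
    "ids" and "next_cursor". The first request uses cursor -1, and a
    next_cursor of 0 means there are no more pages.
    """
    cursor = -1
    for _ in range(max_pages):
        response = fetch_page(cursor)
        yield response["ids"]
        cursor = response["next_cursor"]
        if cursor == 0:
            break


# Hypothetical canned responses keyed by the cursor value requested
FAKE_RESPONSES = {
    -1: {"ids": [101, 102, 103], "next_cursor": 555},
    555: {"ids": [104, 105], "next_cursor": 0},
}

all_ids = [
    follower_id
    for page in paginate_by_cursor(lambda c: FAKE_RESPONSES[c], max_pages=10)
    for follower_id in page
]
print(all_ids)  # -> [101, 102, 103, 104, 105]
```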

**Getting the accounts a user follows (friends)**
>Main parameters:
> * user_id – Specifies the ID of the user.
> * [More Details](http://docs.tweepy.org/en/v3.5.0/api.html#API.friends)

In [None]:
user_id = '14861663'
count = 5

# friends yields full User objects (renamed get_friends in Tweepy 4.x)
friends = tweepy.Cursor(api.friends,
                        user_id=user_id).items(count)

# Pulling information from the users iterable object
user_list = [[user.id, user.screen_name, user.created_at] for user in friends]

# Print users
headers = ['user_id', 'screen_name', 'created_at']
table = columnar(user_list, headers, no_borders=True)
print(table)
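Because the API object above is created with `wait_on_rate_limit=True`, a large crawl does not fail when it hits the rate limit; it sleeps until the window resets. A rough estimate of that waiting time can help when choosing `count`. The limits in the example call (200 users per page and 15 requests per 15-minute window for `friends/list`) are assumptions; check the current rate-limit documentation before relying on them.

```python
import math


def estimate_crawl(n_items, page_size, requests_per_window, window_minutes=15):
    """Rough estimate of requests needed and minutes spent waiting on rate limits.

    Assumes every page is full and the window resets exactly every
    `window_minutes`; real crawls are slower (network time, retries).
    """
    requests = math.ceil(n_items / page_size)
    windows = math.ceil(requests / requests_per_window)
    # The last window does not need to be waited out, hence windows - 1
    minutes_waiting = max(windows - 1, 0) * window_minutes
    return requests, minutes_waiting


# Assumed limits: 200 users per page, 15 requests per 15-minute window
print(estimate_crawl(10_000, page_size=200, requests_per_window=15))  # -> (50, 45)
```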



**Getting a tweet by ID**
> Helpful when you only have tweet IDs and would like to get the corresponding attributes, such as the text.


In [None]:
import json 

tweet_id='1255894886051713030'

tweet = api.get_status(tweet_id)

# Text plus engagement counts
tweet_list = [tweet.text, tweet.favorite_count, tweet.retweet_count]
print(tweet_list)

# Full raw JSON of the tweet
json_tweet = json.dumps(tweet._json)
print(json_tweet)
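`get_status` costs one request per tweet. When you have many IDs, the v1.1 `statuses/lookup` endpoint accepts up to 100 IDs per request (`api.statuses_lookup` in Tweepy 3.x, `api.lookup_statuses` in 4.x). The sketch below only does the batching; the API call is left commented out because it needs credentials, and the IDs are placeholders.

```python
from itertools import islice


def batched_ids(ids, batch_size=100):
    """Split an iterable of tweet IDs into lists of at most batch_size items."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


tweet_ids = [str(n) for n in range(250)]  # placeholder IDs for illustration
print([len(batch) for batch in batched_ids(tweet_ids)])  # -> [100, 100, 50]

# With Tweepy 3.x each batch could then be fetched in a single request, e.g.:
# for batch in batched_ids(tweet_ids):
#     for tweet in api.statuses_lookup(batch):
#         print(tweet.id, tweet.text)
```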

**Twitter Search**
> To search Twitter for recent tweets, define the search terms and pass them to `api.search`. [More Details](http://docs.tweepy.org/en/latest/api.html#API.search)<br>
> - For building complex queries, see [Building standard queries](https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/overview/standard-operators).
> - The standard search API only covers roughly the past seven days of tweets, so you cannot dig far into the history.
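Queries can also be assembled programmatically. The sketch below joins terms with `OR`, appends `-filter:retweets`, and rejects queries longer than 500 characters, which is the limit the v1.1 standard search documentation gives for `q` (treat it as an assumption). `build_query` is a hypothetical helper, not part of Tweepy.

```python
MAX_QUERY_LENGTH = 500  # assumed limit for the v1.1 standard search "q" parameter


def build_query(terms, exclude_retweets=True):
    """Join search terms with OR and optionally exclude retweets."""
    if not terms:
        raise ValueError("need at least one search term")
    query = " OR ".join(terms)
    if exclude_retweets:
        query += " -filter:retweets"
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters; limit is {MAX_QUERY_LENGTH}")
    return query


print(build_query(["#disneyland"]))  # -> #disneyland -filter:retweets
print(build_query(["#disneyland", "#disneyworld"]))  # -> #disneyland OR #disneyworld -filter:retweets
```

The returned string is what the cell below passes as `q=search_words`.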


In [None]:
# Define the search term; -filter:retweets excludes retweets from the results

search_words = "#disneyland -filter:retweets"

# Collect tweets (api.search was renamed api.search_tweets in Tweepy 4.x)
tweets = tweepy.Cursor(api.search,
                       q=search_words,
                       lang="en").items(5)

# Pulling information from tweets iterable object
tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]

# Print tweets
headers = ['id', 'created_at', 'text']
table = columnar(tweets_list, headers, no_borders=True)
print(table)


**Twitter Streaming API**
> The Twitter streaming API is used to download Twitter messages in real time. In Tweepy, an instance of `tweepy.Stream` establishes a streaming session and routes messages to a `StreamListener` instance. The listener's `on_data` method receives all messages and calls handler methods (such as `on_status`) according to the message type.<br>
> Using the streaming API takes three steps:
> - Create a class inheriting from `StreamListener`.
> - Use that class to create a `Stream` object.
> - Connect to the Twitter API using the `Stream`.
[More Details](https://docs.tweepy.org/en/v3.5.0/streaming_how_to.html)

*What kinds of filters can be used?* [See here](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/api-reference/post-statuses-filter)

*What are the error codes, and how should they be handled?* [See here](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/streaming-message-types)
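If you write your own reconnect loop around a stream, Twitter's v1.1 streaming guidelines recommend backing off instead of reconnecting immediately: linearly for network errors, exponentially for HTTP errors, and starting from a longer wait for HTTP 420 (rate limited). The schedule below encodes those rules with assumed delays (0.25 s steps up to 16 s; 5 s doubling up to 320 s; 60 s doubling); check the current docs before relying on the exact numbers. `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(kind, attempts):
    """Return the wait in seconds before each of `attempts` reconnects.

    Assumed schedule, modelled on Twitter's v1.1 streaming reconnection guidance:
      - "network": linear, +0.25 s per attempt, capped at 16 s
      - "http":    exponential, starting at 5 s and doubling, capped at 320 s
      - "420":     exponential, starting at 60 s and doubling (no cap here)
    """
    delays = []
    for attempt in range(attempts):
        if kind == "network":
            delays.append(min(0.25 * (attempt + 1), 16.0))
        elif kind == "http":
            delays.append(min(5.0 * 2 ** attempt, 320.0))
        elif kind == "420":
            delays.append(60.0 * 2 ** attempt)
        else:
            raise ValueError(f"unknown error kind: {kind!r}")
    return delays


print(backoff_delays("http", 8))
# -> [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 320.0]
```

A reconnect loop would `time.sleep()` for the next delay after each failed attempt and reset the attempt counter once a connection succeeds.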

In [None]:

# Creating a StreamListener (Tweepy 3.x; in Tweepy 4.x you subclass tweepy.Stream instead)
class MyStreamListener(tweepy.StreamListener):

    # Override on_status to add logic for each incoming tweet
    def on_status(self, status):
        print('{}\t{}\n'.format(status.created_at, status.text))

    # Returning False on HTTP 420 (rate limited) disconnects the stream
    def on_error(self, status_code):
        if status_code == 420:
            return False

# Creating a Stream
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

# Starting a Stream

# Stream tweets that contain the hashtag '#covid19' or the keyword 'apple'
myStream.filter(track=['#covid19', 'apple'])



