# Twitter Crawler

The first thing you need to do is log into Twitter and create an application.

[Twitter Apps](https://apps.twitter.com/)

Select the **Create New App** button and follow the instructions to the end.

You will ultimately need the following pieces of information:

* consumer_key
* consumer_secret
* access_token
* access_token_secret
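Rather than pasting these four values directly into the notebook, you can keep them out of the code. Here is a minimal sketch that assumes the credentials are stored in environment variables. The variable names and the `load_credentials` helper are hypothetical, so choose whatever fits your setup:

```python
import os

# Hypothetical environment-variable names; rename to match your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, failing loudly if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

You could then call `creds = load_credentials()` and pass `creds["consumer_key"]` and the other values wherever the notebook expects them.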

**Note:** Getting your Twitter API keys approved can take anywhere from minutes to weeks.

# **Tweepy**

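> Tweepy is a popular Python package for working with the Twitter API. [More](https://www.tweepy.org/)
> The code in this notebook uses the Tweepy 3.x interface (for example `api.search` and `tweepy.StreamListener`). Tweepy 4.0 renamed or removed several of these names.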

```python
import tweepy
from columnar import columnar

# First, replace the placeholder values below with your own credentials
consumer_key = "**"
consumer_secret = "**"
access_token = "**"
access_token_secret = "**"

# Set up Tweepy authorization (OAuth 1.0a user context)
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Sleep instead of failing when a rate limit is reached
api = tweepy.API(auth, wait_on_rate_limit=True)
```
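Passing `wait_on_rate_limit=True` tells Tweepy to sleep when it hits a rate limit instead of raising an error. The sketch below illustrates the idea. It is a hypothetical helper, not Tweepy's implementation. It assumes the Twitter API v1.1 `x-rate-limit-remaining` and `x-rate-limit-reset` response headers, where the reset value is a Unix timestamp in seconds, and expects them as a plain dict with lowercase keys:

```python
import time


def seconds_to_wait(headers, now=None, buffer_seconds=5):
    """Return how many seconds to sleep before the next request.

    headers: dict with the (lowercase) 'x-rate-limit-remaining' and
    'x-rate-limit-reset' values from the last response, as strings.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    if remaining > 0:
        return 0.0  # quota left in this window, no need to wait
    reset_at = int(headers["x-rate-limit-reset"])  # Unix epoch seconds
    # Sleep until the window resets, plus a small safety buffer.
    return max(0.0, reset_at - now) + buffer_seconds
```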

1. **Getting a user's tweets**
> Main parameters:
> * id – Specifies the ID or screen name of the user.
> * count – Maximum number of the user's most recent tweets to return.
> * [More Details](https://tweepy.readthedocs.io/en/latest/api.html#API.user_timeline/)

```python
username = 'boredbengio'
count = 5

# Collect the user's most recent tweets; the Cursor handles pagination
tweets = tweepy.Cursor(api.user_timeline, id=username).items(count)

# Pull id, timestamp, and text out of each Status object
tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]

# Print the tweets as a borderless table
headers = ['id', 'created_at', 'text']
table = columnar(tweets_list, headers, no_borders=True)
print(table)
```

```
  ID                    CREATED_AT          TEXT

  1311056662007025666  2020-09-29 21:34:01  RT @mehdi_samsami: Deep learning p
                                            ioneers (who won the 2018 Turing a
                                            ward) in a rap contest! I was kind
                                             of hoping that their tweets were
                                            gen…
  1310800703905058816  2020-09-29 04:36:55  It's yo day to atone, so let me th
                                            row u a bone
                                            Your ConvNet's simplistic, unconsc
                                            ious heuristics
                                            My linguistic twis… https://t.co/7
                                            3qsPCAtxQ
  131045592100
```
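The call `tweepy.Cursor(...).items(count)` hides the paging that timeline requests need. The v1.1 timeline endpoints page backwards using `max_id`: each follow-up request asks only for tweets whose IDs are at most one less than the smallest ID already received. Below is a self-contained sketch of that loop. Here `paginate_by_max_id` and `fetch_page` are hypothetical names, with `fetch_page` standing in for the real API call, and the timeline is fake in-memory data:

```python
def paginate_by_max_id(fetch_page, limit, page_size=200):
    """Yield up to `limit` tweets, newest first, using max_id paging.

    fetch_page(count, max_id) must return a list of dicts with an 'id' key,
    sorted newest (largest id) first; max_id=None means "start from newest".
    """
    max_id = None
    yielded = 0
    while yielded < limit:
        page = fetch_page(count=min(page_size, limit - yielded), max_id=max_id)
        if not page:
            return  # no older tweets left
        for tweet in page:
            yield tweet
            yielded += 1
            if yielded >= limit:
                return
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(tweet["id"] for tweet in page) - 1


# Usage with a fake timeline (ids 10..1, newest first) instead of the API.
timeline = [{"id": i} for i in range(10, 0, -1)]


def fake_fetch(count, max_id):
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return eligible[:count]


print([t["id"] for t in paginate_by_max_id(fake_fetch, limit=7, page_size=3)])
# prints [10, 9, 8, 7, 6, 5, 4]
```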

2. **Getting a user's followers**
> Main parameters:
> * user_id – Specifies the ID of the user.
> * [More Details](http://docs.tweepy.org/en/v3.5.0/api.html#API.followers_ids)


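```python
user_id = '14861663'
count = 5

# Collect the IDs of the user's followers (IDs only, no profile data)
followers = tweepy.Cursor(api.followers_ids, user_id=user_id).items(count)

# One single-column row per follower ID
user_list = [[follower_id] for follower_id in followers]

headers = ['user_id']
table = columnar(user_list, headers, no_borders=True)
print(table)
```
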
```
  USER_ID

  824834746136006656
  4907763859
  1204953273951752192
  995641454843252736
  1242015753810788352
```
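The follower-ID endpoint pages differently from timelines. v1.1 cursored endpoints return a `next_cursor` with each page: the first request uses cursor `-1`, and a `next_cursor` of `0` marks the last page. Here is a sketch of that loop using made-up pages. `paginate_by_cursor` and `fetch_page` are hypothetical names, not Tweepy functions:

```python
def paginate_by_cursor(fetch_page, limit):
    """Yield up to `limit` ids from a cursored v1.1-style endpoint.

    fetch_page(cursor) must return a dict with 'ids' and 'next_cursor';
    the first request uses cursor -1 and next_cursor == 0 marks the last page.
    """
    cursor = -1
    yielded = 0
    while yielded < limit:
        page = fetch_page(cursor=cursor)
        for user_id in page["ids"]:
            yield user_id
            yielded += 1
            if yielded >= limit:
                return
        cursor = page["next_cursor"]
        if cursor == 0:
            return  # that was the last page


# Usage with three fake pages keyed by cursor value.
pages = {
    -1: {"ids": [101, 102], "next_cursor": 7},
    7: {"ids": [103, 104], "next_cursor": 9},
    9: {"ids": [105], "next_cursor": 0},
}
print(list(paginate_by_cursor(lambda cursor: pages[cursor], limit=4)))
# prints [101, 102, 103, 104]
```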



3. **Getting the accounts a user follows (followees)**
> Main parameters:
> * user_id – Specifies the ID of the user.
> * [More Details](http://docs.tweepy.org/en/v3.5.0/api.html#API.friends_ids)

```python
user_id = '14861663'
count = 5

# Collect the accounts the user follows, as full User objects
friends = tweepy.Cursor(api.friends, user_id=user_id).items(count)

# Pull id, screen name, and account creation date from each User object
user_list = [[user.id, user.screen_name, user.created_at] for user in friends]

# Print the users as a borderless table
headers = ['user_id', 'screen_name', 'created_at']
table = columnar(user_list, headers, no_borders=True)
print(table)
```

```
  USER_ID              SCREEN_NAME     CREATED_AT

  37273937            MarieIlse12      2009-05-02 20:20:53
  836374574921039873  BWTalkTech       2017-02-28 00:36:52
  287976854           animationcareer  2011-04-26 01:12:59
  7152572             librarycongress  2007-06-29 14:23:25
  950069113           OJPNIJ           2012-11-15 16:39:25
```
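Once you have the follower IDs from section 2 and the followee IDs (`user.id` in the loop above), plain set operations can answer common questions, such as which accounts follow the user back. This sketch uses toy IDs rather than real accounts, and `follow_relationships` is a hypothetical helper:

```python
def follow_relationships(follower_ids, followee_ids):
    """Compare a user's followers with the accounts the user follows."""
    followers = set(follower_ids)
    followees = set(followee_ids)
    return {
        # Accounts that follow each other
        "mutual": sorted(followers & followees),
        # Accounts the user follows that do not follow back
        "not_following_back": sorted(followees - followers),
        # Accounts that follow the user but are not followed back
        "fans": sorted(followers - followees),
    }


print(follow_relationships(follower_ids=[1, 2, 3, 4], followee_ids=[3, 4, 5]))
# prints {'mutual': [3, 4], 'not_following_back': [5], 'fans': [1, 2]}
```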




4. **Getting a tweet by its ID**
> This is useful when you already have tweet IDs and want to download their text content and metadata.


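```python
import json

tweet_id = '1255894886051713030'

# Fetch a single tweet (Status object) by its ID
tweet = api.get_status(tweet_id)

tweet_list = [tweet.text, tweet.favorite_count, tweet.retweet_count]
print(tweet_list)

# The raw API response is kept in tweet._json; serialize it to a JSON string
json_tweet = json.dumps(tweet._json)

print('\n', json_tweet)
```
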
```
['Al Pacino Fan Site: Al Pacino The Latest Huge Name For Tarantino’s ‘Once Upon A Time In Hollywood’ https://t.co/ldyHRX2kuH', 5, 0]


 {"created_at": "Thu Apr 30 16:20:48 +0000 2020", "id": 1255894886051713030, "id_str": "1255894886051713030", "text": "Al Pacino Fan Site: Al Pacino The Latest Huge Name For Tarantino\u2019s \u2018Once Upon A Time In Hollywood\u2019 https://t.co/ldyHRX2kuH", "truncated": false, "entities": {"hashtags": [], "symbols": [], "user_mentions": [], "urls": [{"url": "https://t.co/ldyHRX2kuH", "expanded_url": "https://alpacino.life/al-pacino-the-latest-huge-name-for-tarantinos-once-upon-a-time-in-hollywood-2.html", "display_url": "alpacino.life/al-pacino-the-\u2026", "indices": [99, 122]}]}, "source": "<a href=\"http://alpacino.info\" rel=\"nofollow\">AlPacino.info</a>", "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 233717042, "id_str
```
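`api.get_status` makes one request per tweet. For a long list of IDs, the v1.1 `statuses/lookup` endpoint accepts up to 100 IDs per request (Tweepy 3.x exposes it as `api.statuses_lookup`), so the IDs are usually split into batches first. Below is a minimal batching helper. `chunked` is a hypothetical name, the IDs are placeholders, and the lookup call itself is omitted:

```python
from itertools import islice


def chunked(ids, size=100):
    """Split an iterable of tweet ids into lists of at most `size` items."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


tweet_ids = [str(n) for n in range(250)]  # placeholder ids
print([len(batch) for batch in chunked(tweet_ids)])
# prints [100, 100, 50]
```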

5. **Searching tweets**
> To search Twitter for recent tweets, define the search term, in this case "disneyland" (with `-filter:retweets` to exclude retweets), and optionally the language of the results. [More Details](http://docs.tweepy.org/en/latest/api.html#API.search)
> - For building complex queries, see [Building standard queries](https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/overview/standard-operators)
> - The standard search API only returns tweets from about the past 7 days, so you cannot dig far into the history.


```python
# Define the search term, excluding retweets
search_words = "disneyland -filter:retweets"

# Collect the first 5 matching English-language tweets
tweets = tweepy.Cursor(api.search,
                       q=search_words,
                       lang="en").items(5)

# Pull id, timestamp, and text out of each Status object
tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]

# Print the tweets as a borderless table
headers = ['id', 'created_at', 'text']
table = columnar(tweets_list, headers, no_borders=True)
print(table)
```

```
  ID                    CREATED_AT          TEXT

  1310802822783365120  2020-09-29 04:45:21  Hol up....is this Disneyland duck
                                            girl?!?!?! 🤔🤔 https://t.co/pOl9H
                                            4Lbg6
  1310802811152670721  2020-09-29 04:45:18  Someone take me to Tokyo Disneylan
                                            d
  1310802529978998784  2020-09-29 04:44:11  Someone couldn’t hook Tyra up with
                                             Disney ears... any Etsy shop, any
                                             company that is owned by Disney..
                                            . anyone wann… https://t.co/egqVQG
                                            vLad
  1310802311523045376  2020-09-29 04:43:19  Now Playing "Prince Ali (Reprise)"
```
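The query string can also include a start date. Assuming the `since:YYYY-MM-DD` operator from Twitter's standard search operators is available, the hypothetical helper below builds the query and clamps the date to the last 7 days, because the standard search API does not return older tweets anyway:

```python
from datetime import date, timedelta


def build_query(term, exclude_retweets=True, since=None, today=None, window_days=7):
    """Compose a standard-search query string (hypothetical helper).

    `since` is clamped to the last `window_days` days, because the
    standard search API does not return older tweets.
    """
    parts = [term]
    if exclude_retweets:
        parts.append("-filter:retweets")
    if since is not None:
        today = today or date.today()
        earliest = today - timedelta(days=window_days)
        parts.append(f"since:{max(since, earliest).isoformat()}")
    return " ".join(parts)


print(build_query("disneyland", since=date(2020, 9, 1), today=date(2020, 9, 29)))
# prints disneyland -filter:retweets since:2020-09-22
```

The resulting string would be passed as `q=` to the `tweepy.Cursor(api.search, ...)` call.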
                

6. **Getting tweets from a stream**
> The Twitter streaming API delivers tweets in real time. In Tweepy, an instance of `tweepy.Stream` establishes a streaming session and routes messages to a `StreamListener` instance. The listener's `on_data` method receives every message and calls a handler method according to the message type (for example, `on_status` for tweets).
> Using the streaming API therefore takes three steps: [More Details](http://docs.tweepy.org/en/latest/streaming_how_to.html)
> - Create a class inheriting from `StreamListener`.
> - Create a `Stream` object using an instance of that class.
> - Connect to the Twitter API by starting the stream (here with `filter`).
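The message routing described above can be modeled in plain Python. The sketch below is a simplified, hypothetical model, not Tweepy's source: `on_data` parses each raw JSON message and calls a type-specific handler, and a handler that returns `False` tells the caller to stop. In Tweepy 3.x, returning `False` from `on_status` likewise disconnects the stream, which gives you a cleaner way to stop collecting than a `KeyboardInterrupt`.

```python
import json


class MiniListener:
    """Hypothetical, simplified model of a stream listener's routing.

    One entry point (on_data) inspects each raw message and calls a
    type-specific handler. Returning False means "stop the stream".
    """

    def on_data(self, raw_data):
        data = json.loads(raw_data)
        if "in_reply_to_status_id" in data:  # ordinary tweets carry this key
            return self.on_status(data)
        if "delete" in data:  # tweet deletion notice
            return self.on_delete(data["delete"])
        if "limit" in data:  # notice that some matching tweets were not delivered
            return self.on_limit(data["limit"])
        return True  # ignore other message types

    def on_status(self, status):
        return True

    def on_delete(self, notice):
        return True

    def on_limit(self, notice):
        return True


class CollectN(MiniListener):
    """Collect tweet texts and ask to stop after `n` of them."""

    def __init__(self, n):
        self.n = n
        self.texts = []

    def on_status(self, status):
        self.texts.append(status["text"])
        return len(self.texts) < self.n  # False once n tweets are collected


def run(listener, messages):
    """Feed raw JSON messages to the listener until a handler returns False."""
    for raw in messages:
        if listener.on_data(raw) is False:
            break


messages = [
    json.dumps({"in_reply_to_status_id": None, "text": "first"}),
    json.dumps({"delete": {"status": {"id": 1}}}),
    json.dumps({"in_reply_to_status_id": None, "text": "second"}),
    json.dumps({"in_reply_to_status_id": None, "text": "third"}),
]
listener = CollectN(2)
run(listener, messages)
print(listener.texts)  # prints ['first', 'second']
```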

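```python
# Create a StreamListener that prints each incoming tweet
class MyStreamListener(tweepy.StreamListener):

    # Override on_status, which Tweepy calls for every tweet in the stream
    def on_status(self, status):
        print('{}\t{}\n'.format(status.created_at, status.text))

    # Disconnect on HTTP 420 (rate limited) instead of reconnecting repeatedly
    def on_error(self, status_code):
        if status_code == 420:
            return False

# Create a Stream bound to our credentials and listener
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

# Start the stream: filter(track=...) delivers tweets matching the keyword "covid19".
# This call blocks until the stream is stopped (e.g., by interrupting the kernel).
myStream.filter(track=['covid19'])
```
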
```
2020-09-17 14:01:22	RT @AnkitSh87088319: #17Baje17Minute Unemployment is more dangerous than COVID19 pandemic.
Employment is the right for youngsters. They wil…

2020-09-17 14:01:22	RT @RepGraceMeng: Today, the House will vote on my resolution to denounce the anti-#Asian sentiment that has occurred since the outbreak of…

2020-09-17 14:01:22	RT @shirlh307: **
How come it's not on the 
about a ver…

2020-09-17 14:01:22	RT @shivamsandilya2: #17Baje17Minute Unemployment is more dangerous than COVID19 pandemic.
Employment is the right for youngsters. They wil…

2020-09-17 14:01:23	RT @ellymelly: No, I do not wish death on other people because I do not demand subservience to my opinion - unlike you.

And I have never s…

2020-09-17 14:01:23	RT @D_Boumaaz: #COVID19 du jour‼

Voici pourquoi le monde d'aujourd'hui est sens dessus dessous.

Cette pandémie ou plutôt plandemie a été…

2020-09-17 14:01:23	RT @KosUnbound: 6700 people told they were positive for covid19 when they were not. Geez!

ht

KeyboardInterrupt: 
```
