# Twitter Crawler

First, create an application on [Twitter Apps](https://apps.twitter.com/): select the **Create New App** button and follow the instructions to the end.

Then obtain the following keys/tokens for authentication:

* consumer_key
* consumer_secret
* access_token
* access_token_secret

**Note:** Generating Twitter API keys can take anywhere from minutes to weeks.

# **Tweepy**

> Tweepy is one of the best packages for working with the Twitter APIs. [More](https://www.tweepy.org/)

In [2]:
## Import Required Modules

import os
import json
import tweepy


## Environment Setup and Authentication

- Set your Twitter `consumer_key`, `consumer_secret`, `access_token`, and `access_token_secret` as environment variables.
- To find out where these values are located, see [TwitterEnvironment](https://developer.twitter.com/en/docs/apps/overview).
- A secure way to use your credentials is to create environment variables in your terminal:
```console
export 'consumer_key'='xxxx' 
export 'consumer_secret'='xxxx' 
export 'access_token'='xxxx' 
export 'access_token_secret'='xxxx'
```
- After authenticating with your Twitter credentials, you will be able to access the Twitter API interface.

In [3]:
consumer_key = os.environ.get('consumer_key')
consumer_secret = os.environ.get('consumer_secret')
access_token = os.environ.get('access_token')
access_token_secret = os.environ.get('access_token_secret')

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
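
The `os.environ.get` calls above return `None` when a variable is unset, which tends to surface later as a confusing authentication failure. A minimal sketch of failing early instead is shown below; the helper name `load_credentials` and the `REQUIRED_KEYS` tuple are illustrative assumptions, not part of Tweepy.

```python
import os

# The four variable names used in this notebook.
REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")


def load_credentials(env=os.environ):
    """Return the four credentials from `env`, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

The returned dictionary can then supply the arguments to `tweepy.OAuthHandler` and `auth.set_access_token` in the same way as the cell above.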

**Getting user’s Tweets**
>Main parameters:
> * screen_name / user_id – Specifies the screen name or the ID of the user (Tweepy v4 does not recognize the older `id` parameter).
> * count – Maximum number of the user's most recent tweets to return.
> * [More Details](https://tweepy.readthedocs.io/en/latest/api.html#API.user_timeline/)

In [4]:
!pip install columnar

Collecting columnar
  Downloading https://files.pythonhosted.org/packages/06/00/a17a5657bf090b9dffdb310ac273c553a38f9252f60224da9fe62d9b60e9/Columnar-1.4.1-py3-none-any.whl
Installing collected packages: columnar
Successfully installed columnar-1.4.1
You should consider upgrading via the 'pip install --upgrade pip' command.


In [4]:
import json
import tweepy
from columnar import columnar

username = 'boredbengio'
count = 5

# Only iterate through the first `count` statuses.
# Tweepy v4 expects `screen_name` (or `user_id`) here; the older `id` keyword is not recognized.
tweets = tweepy.Cursor(api.user_timeline,
                       screen_name=username).items(count)

# Pull the fields of interest out of the iterable of tweets
tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]

# Print the tweets as a table
headers = ['id', 'created_at', 'text']
table = columnar(tweets_list, headers, no_borders=True)
print(table)

# Which attributes does a tweet carry? Dump the raw JSON of one tweet
# (paste it into https://jsoneditoronline.org/ to browse it).
tweet = api.get_status('1420646753863225349')
print(json.dumps(tweet._json))


        
  ID                    CREATED_AT                TEXT                          
    
  1520417612043325448  2022-04-30 14:59:41+00:00  RT @boredyannlecun: If (i) a  
                                                  ll the world's a ConvNet; &a  
                                                  mp; (ii) all worlds spawn a   
                                                  @ylecun, who invents ConvNet  
                                                  s → (iii) the probability th  
                                                  at…                           
  1517141321927954432  2022-04-21 14:00:52+00:00  My research program so big,   
                                                  God had to invent Bengio par  
                                                  allelism, forking Gogeta Ben  
                                                  gio into Yoshua &amp; Samy t  
                                                  o accel… https://t.co/Tinjmx  
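
The table above shows HTML entities such as `&amp;` inside the tweet text, because the API returns text with `&`, `<`, and `>` escaped. A small sketch of tidying the text before printing it, using only the standard library (the helper name `clean_text` is an assumption made for illustration):

```python
import html


def clean_text(text):
    """Decode HTML entities and collapse runs of whitespace (including newlines)."""
    return " ".join(html.unescape(text).split())
```

For example, `clean_text(tweet.text)` could replace `tweet.text` in the list comprehension that builds `tweets_list`.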
              

**Pagination**
>Main parameters:
> * count – Maximum number of pages to fetch (the argument passed to `.pages()`).
> * [More Details](https://docs.tweepy.org/en/stable/pagination.html)

In [6]:
# Pagination: iterate through the timeline page by page
count = 1  # maximum number of pages to fetch
for page in tweepy.Cursor(api.user_timeline, screen_name=username).pages(count):
    for status in page:
        print(status.id, status.text[:30])
    # To work with the raw JSON instead:
    # json_strings = [json.dumps(status._json) for status in page]
    # print(json_strings[0])


1520417612043325448 RT @boredyannlecun: If (i) all
1517141321927954432 My research program so big, Go
1516969990901092353 My brain so big, I wrote "The 
1516968578016329729 RT @boredyannlecun: My influen
1516967191601725440 My lab so big, you need Hadoop
1516965673196572672 RT @boredyannlecun: My deep ne
1425265524238389254 RT @boredyannlecun: WTF, I was
1420646753863225349 There is a lot of talk about n
1338760483097235456 RT @boredyannlecun: Damn, @pmd
1338267929587163139 RT @boredyannlecun: What did y
1338266157573419008 RT @boredyannlecun: In light o
1334782203285430273 Seems @GoogleAI had the vanish
1324772330913046530 RT @boredyannlecun: Trump has 
1324481304797290497 ICLoseR (pronounced "I see los
1322996993120231426 What if instead of minimizing 
1313922130422136832 RT @boredyannlecun: Old man Ge
1313280370649989121 RT @BasicScienceSav: Thank you
1312595632851496960 RT @Graham__Duncan: Best "epic
1312595482359857153 RT @Jarmosan: Mr. Bengio, you’
1312495310837481472 Look at thi
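
Note the difference between the two iteration styles used so far: `.items(n)` stops after `n` tweets, while `.pages(n)` stops after `n` requests, each of which returns a batch of tweets. As a rough mental model of timeline paging, the sketch below walks an in-memory list newest-first and asks each time for tweets older than the oldest one already seen. It is a simplified illustration only: `fetch_page`, `iter_pages`, and the fake `timeline` are hypothetical stand-ins, and Tweepy's real iterator differs in detail.

```python
def fetch_page(timeline, page_size, max_id=None):
    """Hypothetical stand-in for one API request.

    Returns up to `page_size` tweets (newest first) whose id is <= max_id.
    """
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return eligible[:page_size]


def iter_pages(timeline, page_size, max_pages):
    """Yield at most `max_pages` pages, moving backwards through the timeline."""
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(timeline, page_size, max_id)
        if not page:
            break  # nothing older is left
        yield page
        # Continue just below the oldest tweet seen so far.
        max_id = page[-1]["id"] - 1


# Fake timeline with ids 10..1, newest first.
timeline = [{"id": i} for i in range(10, 0, -1)]
pages = list(iter_pages(timeline, page_size=4, max_pages=5))
page_sizes = [len(page) for page in pages]
```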

In [7]:
# Get the IDs of a user's followers.
# Tweepy v4 renamed API.followers_ids to API.get_follower_ids.
user_id = '14861663'
count = 5

followers = tweepy.Cursor(api.get_follower_ids,
                          user_id=user_id).items(count)

user_list = [[follower_id] for follower_id in followers]

headers = ['user_id']
table = columnar(user_list, headers, no_borders=True)
print(table)
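
The follower-IDs endpoint returns bare numeric IDs. Turning them into user profiles is usually done in batches, since the v1.1 user lookup endpoint accepts up to 100 IDs per request. The sketch below shows only the batching step; the helper name `chunked` is an assumption, and the lookup call mentioned in the comment is an assumed usage rather than something shown in this notebook.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of `ids`, each with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Example with made-up IDs: 250 ids become batches of 100, 100, and 50.
batches = list(chunked(list(range(250))))
# Each batch could then be passed to a lookup call,
# e.g. api.lookup_users(user_id=batch) in Tweepy v4 (assumed usage).
```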

**Getting user's followees**
>Main parameters:
> * user_id – Specifies the ID of the user.
> * [More Details](http://docs.tweepy.org/en/v3.5.0/api.html#API.friends_ids)

In [8]:
# Tweepy v4 renamed API.friends to API.get_friends
user_id = '14861663'
count = 5

friends = tweepy.Cursor(api.get_friends,
                        user_id=user_id).items(count)

# Pull the fields of interest out of the iterable of users
user_list = [[user.id, user.screen_name, user.created_at] for user in friends]

# Print the users as a table
headers = ['user_id', 'screen_name', 'created_at']
table = columnar(user_list, headers, no_borders=True)
print(table)


**Getting tweet with specific id**
> Helpful when you only have tweet IDs and want to retrieve the corresponding attributes, such as the text.


In [9]:
import json

tweet_id = '1255894886051713030'

# Fetch a single tweet by its ID
tweet = api.get_status(tweet_id)

tweet_list = [tweet.text, tweet.favorite_count, tweet.retweet_count]
print(tweet_list)

# Dump the full tweet as a JSON string
json_tweet = json.dumps(tweet._json)
print(json_tweet)

['Al Pacino Fan Site: Al Pacino The Latest Huge Name For Tarantino’s ‘Once Upon A Time In Hollywood’ https://t.co/ldyHRX2kuH', 7, 1]
{"created_at": "Thu Apr 30 16:20:48 +0000 2020", "id": 1255894886051713030, "id_str": "1255894886051713030", "text": "Al Pacino Fan Site: Al Pacino The Latest Huge Name For Tarantino\u2019s \u2018Once Upon A Time In Hollywood\u2019 https://t.co/ldyHRX2kuH", "truncated": false, "entities": {"hashtags": [], "symbols": [], "user_mentions": [], "urls": [{"url": "https://t.co/ldyHRX2kuH", "expanded_url": "https://alpacino.life/al-pacino-the-latest-huge-name-for-tarantinos-once-upon-a-time-in-hollywood-2.html", "display_url": "alpacino.life/al-pacino-the-\u2026", "indices": [99, 122]}]}, "source": "<a href=\"http://alpacino.info\" rel=\"nofollow\">AlPacino.info</a>", "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": {"id": 233717042, "id_str": 
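
The JSON printed above nests hashtags, mentions, and links under the `entities` key. A minimal sketch of pulling those out of a tweet's raw JSON follows; the `sample` dictionary reproduces only a few of the fields visible in the output above, and the helper name `summarize_entities` is an assumption made for illustration.

```python
# A trimmed-down copy of the tweet JSON shown above (only a few fields kept).
sample = {
    "id": 1255894886051713030,
    "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [],
        "urls": [{
            "url": "https://t.co/ldyHRX2kuH",
            "expanded_url": "https://alpacino.life/al-pacino-the-latest-huge-name-for-tarantinos-once-upon-a-time-in-hollywood-2.html",
        }],
    },
}


def summarize_entities(tweet_json):
    """Collect hashtag texts, mentioned screen names, and expanded URLs."""
    entities = tweet_json.get("entities", {})
    return {
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
    }


summary = summarize_entities(sample)
```

With a live tweet, the same function could be called as `summarize_entities(tweet._json)`.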


**Twitter Search**
 > To search Twitter for recent tweets, define the search terms and pass them as the query `q`. [More Details](http://docs.tweepy.org/en/latest/api.html#API.search)<br>
 > - To build complex queries, see [Building standard queries](https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/overview/standard-operators)
 > - The standard search API only returns tweets from about the past week, so you cannot dig far into the history.

In [10]:
# Define the search query; "-filter:retweets" excludes retweets
search_words = "#disneyland -filter:retweets"

# Collect tweets (Tweepy v4 renamed API.search to API.search_tweets)
tweets = tweepy.Cursor(api.search_tweets,
                       q=search_words,
                       lang="en").items(5)

# Pull the fields of interest out of the iterable of tweets
tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]

# Print the tweets as a table
headers = ['id', 'created_at', 'text']
table = columnar(tweets_list, headers, no_borders=True)
print(table)
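
Query strings such as `#disneyland -filter:retweets` are plain text built from the standard search operators. A small sketch of assembling one programmatically is given below; the helper `build_query` and its parameters are hypothetical, and only the `from:` and `-filter:retweets` operators are covered.

```python
def build_query(terms, exclude_retweets=False, from_user=None):
    """Join search terms and optional standard operators into one query string."""
    parts = list(terms)
    if from_user:
        parts.append(f"from:{from_user}")    # only tweets sent by this account
    if exclude_retweets:
        parts.append("-filter:retweets")     # drop retweets from the results
    return " ".join(parts)


search_words = build_query(["#disneyland"], exclude_retweets=True)
```

The resulting string can be passed as the `q` argument exactly like the hand-written query in the cell above.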

**Twitter Streaming API**
> The Twitter streaming API is used to download Twitter messages in real time. In Tweepy v3.x, an instance of `tweepy.Stream` establishes a streaming session and routes messages to a `StreamListener` instance. The `on_data` method of a stream listener receives all messages and calls functions according to the message type. (Tweepy v4 merged `StreamListener` into `Stream`.)<br>
> Using the streaming API has three steps:
> - Create a class inheriting from `StreamListener`.
> - Using that class, create a `Stream` object.
> - Connect to the Twitter API using the `Stream`.
>
> [More Details](https://docs.tweepy.org/en/v3.5.0/streaming_how_to.html)

*What kinds of filters can be used?*: [see here](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/api-reference/post-statuses-filter)

*What are the error codes and how to handle them?*: [see here](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/guides/streaming-message-types)
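
The routing idea described above (one `on_data` entry point that inspects each raw message and calls a handler for its type) can be illustrated without a network connection. The sketch below is not Tweepy's code: `SketchListener` and `Collector` are hypothetical classes that only mimic the pattern, handling three of the v1.1 streaming message shapes (a status, a delete notice, and a limit notice).

```python
import json


class SketchListener:
    """Illustrative only: routes raw JSON stream messages by their type."""

    def on_data(self, raw):
        message = json.loads(raw)
        if "delete" in message:
            self.on_delete(message["delete"]["status"]["id"])
        elif "limit" in message:
            self.on_limit(message["limit"]["track"])
        elif "text" in message:
            self.on_status(message)
        else:
            self.on_unknown(message)

    # Default handlers do nothing; subclasses override the ones they need.
    def on_status(self, status):
        pass

    def on_delete(self, status_id):
        pass

    def on_limit(self, track):
        pass

    def on_unknown(self, message):
        pass


class Collector(SketchListener):
    """Records every routed message so the dispatch can be inspected."""

    def __init__(self):
        self.events = []

    def on_status(self, status):
        self.events.append(("status", status["text"]))

    def on_delete(self, status_id):
        self.events.append(("delete", status_id))

    def on_limit(self, track):
        self.events.append(("limit", track))


collector = Collector()
for raw in ('{"id": 1, "text": "hello"}',
            '{"delete": {"status": {"id": 1234, "user_id": 3}}}',
            '{"limit": {"track": 5}}'):
    collector.on_data(raw)
```

After the loop, `collector.events` holds one entry per message in arrival order, which mirrors how a real listener subclass would react to statuses, deletions, and rate-limit notices.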