## Collecting tweets using the Twitter API


In this section we are going to see how to connect to the Twitter API to collect tweets and save them.

"In computer programming, an **Application Programming Interface (API)** is a set of subroutine definitions, protocols, and tools for building application software." [wikipedia](https://en.wikipedia.org/wiki/Application_programming_interface)

The Twitter API is the tool we use to collect tweets from Twitter.

Twitter offers two different APIs:
- The Streaming API (https://dev.twitter.com/streaming/public), which gives access to a sample (~1%) of the public data flowing through Twitter.

- The REST API (https://dev.twitter.com/rest/public), which provides programmatic access to read and write Twitter data.

To use the Twitter API from Python, we will use the library [tweepy](http://www.tweepy.org/), which facilitates access to the API.

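To install it, run the following command in your terminal or execute the cell below. The code in this section uses `tweepy.StreamListener`, which was removed in tweepy 4.0, so we install a 3.x release:
```
pip install "tweepy<4"
```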



In [1]:
# this will install a tweepy 3.x release on your machine
!pip install "tweepy<4"



### Create a Twitter app and get your credentials

1. Go to https://apps.twitter.com/
2. Click `Create New App`
3. Fill in the details
4. Click on `manage keys and access tokens`
5. Copy and paste your *Consumer Key (API Key)* and *Consumer Secret (API Secret)* into the cell below
6. Click `create my access token`, then copy and paste your *Access Token* and *Access Token Secret* into the same cell

In [None]:
consumer_key = 'xxx'
consumer_secret = 'xxx'
access_token = 'xxx'
access_token_secret = 'xxx'
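Hard-coding keys in a notebook makes it easy to leak them when the notebook is shared. A minimal sketch of an alternative, assuming you export the four values as environment variables beforehand; the variable names (`TWITTER_CONSUMER_KEY`, etc.) and the helper `load_twitter_credentials` are our own hypothetical choices, not something Twitter or tweepy defines:

```python
import os

# Hypothetical environment-variable names; any names work as long as they match.
ENV_NAMES = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
}


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict read from environment variables.

    `env` defaults to os.environ; passing a plain dict is handy for testing.
    """
    if env is None:
        env = os.environ
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return {key: env[var] for key, var in ENV_NAMES.items()}


# usage in the notebook:
# creds = load_twitter_credentials()
# consumer_key = creds['consumer_key']  # ...and likewise for the other three
```

Failing loudly on a missing variable avoids a confusing authentication error later on.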




### Authenticate with the Twitter API


In [3]:
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# create the api object that we will use to interact with Twitter
api = tweepy.API(auth)

In [4]:
# example: post a tweet from the authenticated account
tweet = api.update_status('Hello Twitter')

In [5]:
# see all the information contained in a tweet:
print(tweet)

Status(_api=<tweepy.api.API object at 0x7f563492feb8>, _json={'created_at': 'Thu Apr 27 17:29:52 +0000 2017', 'id': 857648001610895360, 'id_str': '857648001610895360', 'text': 'Hello Twitter', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="http://www.cuny.edu" rel="nofollow">network_analysis_class</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 4844847993, 'id_str': '4844847993', 'name': 'Alexandre Bovet', 'screen_name': 'BovetAlexandre', 'location': 'Manhattan, NY', 'description': 'Physicists, Postdoctoral Researcher in Complex Networks #ComplexSystems, #DataScience, #Dataviz', 'url': 'https://t.co/VeoLL3mM6W', 'entities': {'url': {'urls': [{'url': 'https://t.co/VeoLL3mM6W', 'expanded_url': 'http://alexbovet.github.io/', 'display_url': 'alexbovet.github.io', 'indices': [0, 23]}]}, 'd
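The printed `Status` object wraps the raw JSON returned by Twitter in its `_json` attribute, a plain dict (visible at the start of the output above); with the live object you would pass `tweet._json`. A minimal sketch of pulling a few fields out of such a dict: the helper `summarize_tweet` is hypothetical, the sample dict copies only a few values from the output above, and the `created_at` format string assumes the timestamp layout shown there (its `%a`/`%b` fields assume an English/C locale).

```python
from datetime import datetime

# Layout of 'created_at', e.g. 'Thu Apr 27 17:29:52 +0000 2017'
CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def summarize_tweet(tweet_json):
    """Extract a few commonly used fields from a tweet's JSON dict."""
    return {
        'id': tweet_json['id_str'],  # the string id is safe to pass to other tools
        'created_at': datetime.strptime(tweet_json['created_at'], CREATED_AT_FORMAT),
        'user': tweet_json['user']['screen_name'],
        'text': tweet_json['text'],
    }


sample = {
    'created_at': 'Thu Apr 27 17:29:52 +0000 2017',
    'id_str': '857648001610895360',
    'text': 'Hello Twitter',
    'user': {'screen_name': 'BovetAlexandre'},
}
summary = summarize_tweet(sample)
print(summary['user'], summary['created_at'].isoformat())
# BovetAlexandre 2017-04-27T17:29:52+00:00
```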

## Collecting tweets from the Streaming API
Source: http://tweepy.readthedocs.io/en/v3.5.0/streaming_how_to.html

### Step 1: Creating a StreamListener

This simple stream listener prints the text of each status. The `on_data` method of Tweepy's `StreamListener` conveniently passes data from statuses to the `on_status` method.
Create a class `MyStreamListener` that inherits from `StreamListener` and overrides `on_status`:

In [6]:
#override tweepy.StreamListener to make it print tweet content when new data arrives
class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print(status.text)
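To make the hand-off from `on_data` to `on_status` concrete, here is a toy, offline version of the idea. It is a simplified sketch, not tweepy's actual code: `ToyStreamListener` and `CollectingListener` are hypothetical, they work on plain dicts rather than tweepy `Status` objects, and the message-type checks (a tweet carries `in_reply_to_status_id`, a deletion notice carries `delete`) are assumptions modeled on the raw stream messages.

```python
import json


class ToyStreamListener:
    """Simplified stand-in for a stream listener's dispatch logic (hypothetical)."""

    def on_data(self, raw_data):
        # the stream delivers each message as a raw JSON string
        message = json.loads(raw_data)
        if 'in_reply_to_status_id' in message:  # looks like a tweet
            return self.on_status(message)
        if 'delete' in message:  # a deletion notice
            return self.on_delete(message['delete'])
        return None  # other message types are ignored in this sketch

    def on_status(self, status):
        return None

    def on_delete(self, notice):
        return None


class CollectingListener(ToyStreamListener):
    """Like MyStreamListener, but stores the text instead of printing it."""

    def __init__(self):
        self.texts = []

    def on_status(self, status):
        self.texts.append(status['text'])


listener = CollectingListener()
listener.on_data('{"text": "Hello Twitter", "in_reply_to_status_id": null}')
listener.on_data('{"delete": {"status": {"id_str": "1"}}}')
print(listener.texts)  # ['Hello Twitter']
```

Subclasses only override the hooks they care about, which is why `MyStreamListener` needs nothing but `on_status`.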

### Step 2: Creating a Stream

Using the authentication handler of the `api` object we created and our `StreamListener`, we can create a `Stream` object:

In [7]:
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

### Step 3: Starting a Stream

A number of Twitter streams are available through Tweepy. Most cases will use `filter`, the `user_stream`, or the `sitestream`. For more information on the capabilities and limitations of the different streams, see the [Twitter Streaming API Documentation](https://dev.twitter.com/streaming/overview/request-parameters).

In this example we will use `filter` to stream all tweets containing the word "python". The `track` parameter is a list of search terms to stream.

In [8]:
myStream.filter(track=['python'])

Python: most idiomatic way to convert None to empty string? #string #python #idioms https://t.co/KGx34j8auV
RT @rjallain: Why do power cables sag so low?  Here is a python model of a hanging cable https://t.co/HMGs5uQCDU #physics
RT @coding_jobfeeds: Python Developer (Django, SQL, RabbitMQ) - Revolutionising Inte Uxbridge https://t.co/F5dCQLcvlb #AngularJS #jobs
RT @0mgould: Biggest crowdpleaser at #EGU2017 : python scripting and arcgis-R bridge https://t.co/eC7xK3vxDs
Python: 2 Books In 1: Beginner's Guide + Best Practices To Programming Code With Python (py here  https://t.co/FPDlRTCKkw #java @androidbot_


KeyboardInterrupt: 

In [9]:
myStream.disconnect()
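Above, the stream was stopped by interrupting the kernel (hence the `KeyboardInterrupt`). In tweepy 3.x, returning `False` from `on_status` disconnects the stream, so a listener can stop itself after a fixed number of tweets. A minimal sketch of that logic: `LimitedListener`, `max_tweets` and `run_fake_stream` are hypothetical names, and to keep it runnable offline the class does not inherit from `tweepy.StreamListener` and is driven by a hand-written loop that mimics the "stop when `on_status` returns `False`" rule. In the notebook you would inherit from `tweepy.StreamListener` and pass the listener to `tweepy.Stream` as before.

```python
class LimitedListener:
    """Stops after max_tweets statuses by returning False (hypothetical helper)."""

    def __init__(self, max_tweets):
        self.max_tweets = max_tweets
        self.texts = []

    def on_status(self, status):
        self.texts.append(status['text'])
        # returning False signals the stream to disconnect
        if len(self.texts) >= self.max_tweets:
            return False
        return True


def run_fake_stream(listener, statuses):
    """Mimic a stream loop: stop as soon as on_status returns False."""
    for status in statuses:
        if listener.on_status(status) is False:
            break


listener = LimitedListener(max_tweets=2)
run_fake_stream(listener, [{'text': 'a'}, {'text': 'b'}, {'text': 'c'}])
print(listener.texts)  # ['a', 'b']
```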

In [10]:
# stream English-language tweets mentioning either term (terms are combined with OR)
myStream.filter(track=['realdonaldtrump', 'trump'], languages=['en'])

RT @jonfavs: Not a small lie. Not a depends-on-your-perspective lie. A huge, Trump-like, don't-give-a-shit-what-you-think lie. https://t.co…
'The Simpsons' Spoofs Trump's First 100 Days (VIDEO + PHOTOS) https://t.co/F2D3tq36aI via @TMZ
RT @Fishgot2swim: With his astonishing array of lies,Trump has forfeited all credibility.

His staff &amp; followers excuse &amp; deflect but that'…
RT @DonaldJTrumpJr: In 8 years the prior admin almost doubles the debt aggregated by the US over its entire 240 years &amp; all of a sudden… 
'The Simpsons' commemorates Trump's first 100 days with scathing clip https://t.co/kDZeFoXLHN
Americans want accountability transparency &amp; ethics in the White House. Trump doesn't care https://t.co/LvD0MD0yPA @JohnCornyn @RepJoeBarton
RT @grace_meng: Call your Senators 202.225.3121 &amp; say NO to funding Trump’s dangerous immigration agenda. #NoRaids… 
RT @MaxineWaters: I just introduced the "No Russia Exemption for Oil Production Act" - the No REX Act. The bill b

KeyboardInterrupt: 

In [11]:
myStream.disconnect()
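According to the Twitter streaming documentation, the terms in `track` combine as follows: a comma separates alternatives (logical OR), while spaces inside one phrase require all of its words (logical AND). tweepy 3.x joins the list with commas before sending it, so `['realdonaldtrump,trump']` and `['realdonaldtrump', 'trump']` send the same request. The sketch below is a rough local approximation of that matching, e.g. for re-filtering saved tweets; the helper `matches_track` is hypothetical, and its tokenization (lowercase, split on whitespace, strip a leading `#` or `@`) is our simplification, not Twitter's exact rules.

```python
def matches_track(text, track):
    """Approximate the track filter: OR across phrases, AND within a phrase."""
    # tokenize: lowercase, split on whitespace, drop leading '#' or '@'
    tokens = {word.lstrip('#@') for word in text.lower().split()}
    # join the list with commas (as tweepy does), then split back into phrases
    phrases = [p.strip() for p in ','.join(track).lower().split(',') if p.strip()]
    return any(all(term in tokens for term in phrase.split()) for phrase in phrases)


print(matches_track('Why do power cables sag? A #python model', ['python']))  # True
print(matches_track('A python model', ['realdonaldtrump,trump']))  # False
print(matches_track('hanging cable model', ['cable hanging']))  # True
```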

In [12]:
# stream tweets from a given location
# `locations` is a flat list of longitude,latitude pairs defining one or more bounding boxes:
# each box is given by its south-west corner followed by its north-east corner
# for example, a box around New York
myStream.filter(locations=[-74, 40, -73, 41])

Live in #Westchester NY TONIGHT! Awesome comedy show and curry with @funnyindian @kromps. Call now for reservations… https://t.co/aLgwNpbeDb
@Ty_lindquist14 They'd rather look like they rolled around in a bag of nacho cheese Doritos
What an apt package to find sitting on my desk. https://t.co/zKN3ebUcJ6
Clearly something is wrong with Harvey like everyone else on this over throwing meat head rotation that claimed themselves the best in MLB
@Azizalqenaei
ما نعتب عليك وعلى طرحك لأننا مانعرفك
ولكن العتب على قناة الجزيره لنشرها ثقافة الشوارع
@BoyGeorge just wondering, do you ever eat fish?  xo
Supervisor: do this
Me: idk how to do this. Explain.
S: you've done it before.
Me: why would I ask you if I knew? 🤔
He's gotta go https://t.co/ZzujRX4YTt
@AMammalsTweeter @joeprince___ maybe but they're an extremely hard clinton supporter/bernie hater
"Real Estate Info"
"Daily News And Advice" 
https://t.co/MRzjLGk3mq 
Thanks to https://t.co/1GcymyIlve
https://t.co/nTJFlgSy5F
Weather startin to be ni

KeyboardInterrupt: 

In [13]:
myStream.disconnect()
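The `locations` list is read four numbers at a time: each group is one bounding box, given as the longitude and latitude of its south-west corner followed by those of its north-east corner. A minimal sketch that splits the flat list into boxes and tests whether a point lies inside one; `to_bounding_boxes` and `in_any_box` are hypothetical helpers, and Twitter's own matching also uses a tweet's `place` when it has no exact coordinates, which this sketch ignores.

```python
def to_bounding_boxes(locations):
    """Split a flat [sw_lon, sw_lat, ne_lon, ne_lat, ...] list into 4-tuples."""
    if len(locations) % 4 != 0:
        raise ValueError('locations must contain a multiple of 4 numbers')
    return [tuple(locations[i:i + 4]) for i in range(0, len(locations), 4)]


def in_any_box(lon, lat, boxes):
    """True if the point (lon, lat) lies inside at least one bounding box."""
    return any(
        sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat
        for sw_lon, sw_lat, ne_lon, ne_lat in boxes
    )


nyc = to_bounding_boxes([-74, 40, -73, 41])
print(in_any_box(-73.9855, 40.7580, nyc))  # True  (midtown Manhattan)
print(in_any_box(-71.0589, 42.3601, nyc))  # False (Boston)
```

Note the order: longitude comes first, the opposite of the usual "latitude, longitude" convention.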

### Saving the stream to a file
Let's define a new `StreamListener` that will save the collected data to a file:

In [14]:
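# override tweepy.StreamListener to make it save the raw data to a file
class StreamSaver(tweepy.StreamListener):
    def __init__(self, filename, api=None):
        self.filename = filename
        # counts every message received (tweets, but also e.g. deletion or limit notices)
        self.num_tweets = 0
        super().__init__(api=api)

    def on_data(self, data):
        # append the raw JSON message directly to the file
        with open(self.filename, 'a', encoding='utf-8') as tf:
            tf.write(data)

        self.num_tweets += 1
        print(self.num_tweets)

    def on_error(self, status):
        print(status)
        # 420 means we are being rate limited: returning False disconnects
        # instead of reconnecting and risking longer back-off periods
        if status == 420:
            return False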

In [17]:
# create the new StreamListener and stream object that will save collected tweets to a file
saveStream = StreamSaver(filename='trumpTweets3.txt')
mySaveStream = tweepy.Stream(auth=api.auth, listener=saveStream)


In [18]:
# save English-language tweets mentioning either term (terms are combined with OR)
mySaveStream.filter(track=['realdonaldtrump', 'trump'], languages=['en'])


1
2
3
...
276
277


KeyboardInterrupt: 

In [19]:
mySaveStream.disconnect()
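`StreamSaver` appends each raw message as it arrives, so, assuming the stream terminates every message with a line break (Twitter's streaming messages are newline-delimited), the file holds one JSON message per line. Not every message is a tweet: the counter above also counts e.g. deletion or limit notices. A minimal sketch for loading the tweets back under that one-message-per-line assumption; `load_saved_tweets` is a hypothetical helper, and treating any message with both a `text` and a `user` field as a tweet is our heuristic.

```python
import json


def load_saved_tweets(filename):
    """Read a file of newline-delimited stream messages and return the tweets."""
    tweets = []
    with open(filename, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            message = json.loads(line)
            # heuristic: tweets have both 'text' and 'user'; notices such as
            # {'delete': ...} or {'limit': ...} do not
            if 'text' in message and 'user' in message:
                tweets.append(message)
    return tweets


# usage with the file written above:
# tweets = load_saved_tweets('trumpTweets3.txt')
# print(len(tweets), tweets[0]['text'])
```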