# Scraping Twitter with Python

### Download Sublime Text

For this workshop, you will need a decent, cross-platform text editor. I recommend installing [Sublime Text 3](https://www.sublimetext.com/3). Go to the download page and install the editor.

## Scraping Twitter


Ever since the launch of Twitter in 2006, researchers from various disciplines (e.g. Sociology, Linguistics, Computer Science, and Folklore) have been interested in the social networking service. The service gained worldwide popularity, and nowadays Twitter has more than 500 million registered users, of which about 332 million are active. Every second, about 6000 tweets are produced on Twitter, which corresponds to 360,000 tweets per minute, about 518 million tweets per day and roughly 190 billion tweets per year. As such, Twitter hosts a massive number of accessible messages, which in turn harbor vast amounts of information about numerous subjects and topics.
<br/><br/>
These massive amounts of data require new tools, such as robust data-retrieval and data-mining methods, that can be employed to efficiently analyze the Twitter stream. In this workshop, we will create a small application to automatically 'scrape' large numbers of tweets from Twitter. In order to analyze these tweets efficiently, we need tools to automatically convert them into a more accessible and user-friendly format. Therefore, the second part of this workshop will be devoted to creating another application, which allows us to preprocess the data for textual analysis. Finally, in the third and last part of this workshop, we will start digging into some basic techniques for visualizing our data.
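
The tweet volumes quoted above are simple multiples of the per-second estimate. As a quick sanity check, here is a minimal sketch of that arithmetic in Python. The 6000 tweets per second is the estimate from the text; the 365-day year is an assumption made for the calculation.

```python
# Rough tweet-volume arithmetic, starting from the ~6000 tweets/second estimate.
TWEETS_PER_SECOND = 6000

per_minute = TWEETS_PER_SECOND * 60
per_day = per_minute * 60 * 24
per_year = per_day * 365  # assumption: ignore leap years

print(f"{per_minute:,} tweets per minute")
print(f"{per_day / 1e6:.0f} million tweets per day")
print(f"{per_year / 1e9:.0f} billion tweets per year")
```

Because every figure is derived from `TWEETS_PER_SECOND`, changing that one number updates all the others.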

## Setting up Twitter

Now that we have seen some very basic computer code, it is time to start working on our Twitter scraping application. Twitter allows developers to access a small part of the complete Twitter stream by means of its Application Programming Interface (API). However, this stream is only accessible through authorized requests to the Twitter API. This means that we need to register our application with Twitter, which will provide us with the necessary access codes and passwords. Setting up the Twitter application involves the following steps:

### 1. Register as a Twitter user

Only registered users of Twitter can create applications. Our first step, therefore, is to create a Twitter account. Please visit the website of [Twitter](https://twitter.com/) and, if you do not have an account yet, create one.

### 2. Register your application

In order to access Twitter data programmatically, we need to create an application which interacts with the Twitter API. To create this application, first visit the website [https://apps.twitter.com/](https://apps.twitter.com/), log in to Twitter (if you're not already logged in), and click the button labeled "Create New App":

![](images/twitter-app-create.png)

In the next step, you need to fill in the various required fields. First, give your application a new name, for example `lastname-scraper` (where you replace `lastname` with your own last name). Second, provide a short description of your application (you can write whatever you like). Third, make up a web address where your application is supposedly hosted (e.g. `http://www.lastname-scraper.nl`). Finally, agree to the "Developer Agreement" and press "Create your Twitter Application":

![](images/twitter-app-info.png)

After you have created your application, you will receive a "consumer key" and a "consumer secret". These "passwords" are application settings that should be kept private at all costs! In the next step, go to the tab called "Keys and Access Tokens" of your application:

![](images/twitter-key-access.png)

Scroll down and click on the button which says "Generate Access Token and Token Secret".

## Scraping Tweets

The programming language Python ships with a large number of packages and modules that allow you to efficiently manipulate all kinds of data. However, the standard library of Python does not contain a package to work with Twitter data. Fortunately, the third-party package [Tweepy](http://docs.tweepy.org/en/v3.5.0/) provides all the tools we need. In order to use that package, we first need to install it. This can be done by executing the following cell:

In [2]:
!pip install tweepy

Collecting tweepy
  Downloading https://files.pythonhosted.org/packages/05/f1/2e8c7b202dd04117a378ac0c55cc7dafa80280ebd7f692f1fa8f27fd6288/tweepy-3.6.0-py2.py3-none-any.whl
Collecting requests-oauthlib>=0.7.0 (from tweepy)
  Downloading https://files.pythonhosted.org/packages/94/e7/c250d122992e1561690d9c0f7856dadb79d61fd4bdd0e598087dce607f6c/requests_oauthlib-1.0.0-py2.py3-none-any.whl
Collecting oauthlib>=0.6.2 (from requests-oauthlib>=0.7.0->tweepy)
  Downloading https://files.pythonhosted.org/packages/e6/d1/ddd9cfea3e736399b97ded5c2dd62d1322adef4a72d816f1ed1049d6a179/oauthlib-2.1.0-py2.py3-none-any.whl (121kB)
    100% |████████████████████████████████| 122kB 5.1MB/s eta 0:00:01
Installing collected packages: oauthlib, requests-oauthlib, tweepy
Successfully installed oauthlib-2.1.0 requests-oauthlib-1.0.0 tweepy-3.6.0
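
If you want to confirm that the installation succeeded, one option is to ask Python whether it can locate the package. The sketch below is only a convenience check: the helper name `is_installed` is our own (hypothetical) choice, while `importlib.util.find_spec` is part of Python's standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


print("tweepy installed:", is_installed("tweepy"))
```

If this prints `False`, rerun the installation cell above before continuing.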


In order to authorize our application to access Twitter through our account, we first need to make Python aware of our authentication codes. The folder of this workshop (which you saved on your Desktop, or somewhere else) contains a file named `config.py`. Open that file using the Sublime Text editor. You should see something similar to:

![](images/sublime-config-file.png)
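
In text form, the contents of such a file might look like the sketch below. The four variable names are the ones used by the authentication code in this workshop; the values are placeholders that we made up and that must be replaced with your own keys and tokens. The real `config.py` in the workshop folder may contain additional comments or settings.

```python
# config.py: sketch with placeholder values (replace each string with your own)
consumer_key = "YOUR-CONSUMER-KEY"
consumer_secret = "YOUR-CONSUMER-SECRET"
access_token = "YOUR-ACCESS-TOKEN"
access_secret = "YOUR-ACCESS-TOKEN-SECRET"
```

Keeping these values in a separate file makes it easier to share the rest of your code without accidentally sharing your credentials.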

Update all variables with the appropriate values from your own scraping application (visit [https://apps.twitter.com/](https://apps.twitter.com/), copy your tokens and keys and paste them in the `config.py` file). After you have updated all variables, do not forget to **save** the file! To make Python aware of our configuration, execute the following code block:

In [1]:
import config

With the following code block, we set up all necessary components to access Twitter's API. We will first execute the cell and then work through it line by line.

In [2]:
import tweepy

authentication = tweepy.OAuthHandler(config.consumer_key, config.consumer_secret)
authentication.set_access_token(config.access_token, config.access_secret)

api = tweepy.API(authentication)

The first line of this code block imports the `tweepy` package. This allows us to employ all functionality available in tweepy. The second line initializes a variable called `authentication`. We assign to that variable the value `tweepy.OAuthHandler(config.consumer_key, config.consumer_secret)`, an object provided by tweepy that handles the authentication for us. Note that this object takes two arguments (`config.consumer_key` and `config.consumer_secret`), which refer to the consumer key and consumer secret we set in our `config.py` file. The third line sets the access token of the authentication object. Again, as you can see, we make use of the information stored in our `config.py` file. Finally, on the last line, we initialize a variable called `api` and assign to it the object `tweepy.API(authentication)`, which takes a single argument (i.e. `authentication`).

This code block may be a little overwhelming, but you don't need to understand all parts. It is, however, important that you at least try to read the lines and get a general impression of what is happening.

The good news is that this is basically all we need to do in order to access Twitter's API. For example, we can now use the variable `api` to access our own timeline on Twitter. Execute the following cell:

In [3]:
for tweet in tweepy.Cursor(api.home_timeline).items(10):
    print(tweet.text)

In the code block above, we make use of a so-called `for`-loop. A for-loop is one of the most important concepts in programming, as it allows us to repeat an action for each item in a sequence. In this example, we loop over the first 10 items in our `api.home_timeline` and print the text of each tweet using `print(tweet.text)`. Try replacing the number 10 with a different number and rerun the cell.
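
If the for-loop is new to you, it may help to see the same pattern without Twitter involved. The following minimal sketch loops over a made-up list of strings; `islice` from the standard library stands in for the "first 10 items" limit. The list `timeline` and its contents are invented for illustration and are not real tweets.

```python
from itertools import islice

# 25 made-up items standing in for a timeline
timeline = [f"tweet number {n}" for n in range(1, 26)]

# Visit only the first 10 items, one at a time
for text in islice(timeline, 10):
    print(text)
```

As with the timeline example, changing the number 10 changes how many items the loop visits.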

Let's have a look at another example of how to access your Twitter account through the API. Using the following lines of code, we can print a list of all our followers:

In [4]:
# api.followers yields the accounts that follow us
# (api.friends would yield the accounts we follow instead)
for follower in tweepy.Cursor(api.followers).items():
    print(follower.name)

Finally, let's print a list of our 10 latest tweets:

In [5]:
for tweet in tweepy.Cursor(api.user_timeline).items(10):
    print(tweet.text)

Programmatically extracting our list of followers or our list of tweets is fun, but not particularly interesting. Let us now turn to our main application, with which we can "keep the connection open" and gather all incoming tweets about a particular topic, event or person. To this end, we need to make use of yet another piece of tweepy functionality, the `StreamListener`. The following code block gathers all new tweets that contain the word *python*. Execute this cell and wait for a couple of minutes. After that, press the "stop" button in your notebook (the black square).

In [None]:
from tweepy import Stream
from tweepy.streaming import StreamListener

class TwitterListener(StreamListener):

    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        if status_code == 420:
            # 420 means we are being rate limited; returning False
            # from on_error disconnects the stream
            return False

twitter_stream = Stream(authentication, TwitterListener())
twitter_stream.filter(track=['python'])

How to pack serverless Python actions - IBM Cloud Blog #IBM https://t.co/Mkg6adKWM9 https://t.co/Eeh7Mct62E
Gênios 😂
Saudade de ver @montypython https://t.co/oFcjYrgD88
Python: Web Development and Penetration Testing

☞ https://t.co/YsVTWxNUkg

#python #WebDevelopment https://t.co/RJoOPXYorH
Introduction to programing Using Python #homework https://t.co/HYr5DPfAJ4
#Monitoring #Tools | Python Logging https://t.co/Gbm4rj4HgL
RT @opensourceway: How to create a 2D game with #Python and the Arcade library: https://t.co/tq3Oeh0u9b #PyCon https://t.co/gfTV6XXv24
RT @hasdid: #Monitoring #Tools | Python Logging https://t.co/Gbm4rj4HgL

Can you adapt this code block to track a different search term or hashtag?
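
To see how the listener's two methods work together without connecting to Twitter, consider the toy stand-in below. It is not tweepy code: the class `ToyListener`, the function `run_toy_stream` and the list of events are all hypothetical and were written only to illustrate the idea that `on_status` is called for each incoming tweet and that returning `False` from `on_error` ends the stream.

```python
class ToyListener:
    """Stand-in for a stream listener; NOT part of tweepy."""

    def __init__(self):
        self.texts = []

    def on_status(self, text):
        # Called once for every incoming "tweet"
        self.texts.append(text)

    def on_error(self, status_code):
        if status_code == 420:
            return False  # ask the loop below to disconnect
        return True       # keep listening after other errors


def run_toy_stream(events, listener):
    """Feed (kind, value) events to the listener until it asks to stop."""
    for kind, value in events:
        if kind == "status":
            listener.on_status(value)
        elif kind == "error" and listener.on_error(value) is False:
            break


listener = ToyListener()
events = [("status", "first tweet"), ("error", 500), ("status", "second tweet"),
          ("error", 420), ("status", "never delivered")]
run_toy_stream(events, listener)
print(listener.texts)
```

Only the two statuses that arrive before the 420 error are collected; the last one is never delivered because the loop has already stopped.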

In this workshop, we have executed all our Python code in a so-called IPython notebook (for more information, see [http://jupyter.org/](http://jupyter.org/)). However, many developers prefer to use the command line for executing small scripts. In what follows, I will show you how we can scrape tweets from Twitter using the command line.

You have already used the command line to launch the IPython notebook of this workshop and we will use the same application to run a Python script. First, launch a new terminal (on Linux or Mac OS X) or command prompt (on Windows). Second, move to the folder in which you saved the current workshop. For Mac OS X and Linux users:

    cd ~/Desktop/twitter-workshop-master

For Windows users:

    cd c:\Users\your-user-name\Desktop\twitter-workshop-master

Replace `your-user-name` in the path above with your actual user name. The folder `twitter-workshop-master` contains a file named `tweetscraper.py`, which is a Python script to scrape tweets from Twitter. Python scripts can be executed from the command line or command prompt by issuing:

    python name-of-script.py [arguments]

In our case, this means that we can execute the tweet-scraper by issuing:

    python tweetscraper.py

The tweet-scraper script takes two arguments. The first argument is the path to the directory in which you want to save the streamed tweets. The second argument is the search query, with which you specify one term or multiple terms that must be present in a tweet. To execute the `tweetscraper.py` script, issue the following command:

    python tweetscraper.py -d data -q lunch

Your terminal or command prompt should look similar to:

![](images/commandline.png)

Open your file browser and watch the file named `stream_lunch.json` grow. To stop the script from streaming, hold `ctrl` and press `c`. To finish this section, try searching for another term or terms.
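
If you are curious how a script can accept options such as `-d` and `-q`, the following minimal sketch shows one common approach using Python's standard `argparse` module. This is not the code of the workshop's `tweetscraper.py`, which may be organized differently. The function names and the long option names `--directory` and `--query` are our own assumptions; only the short options and the `stream_[QUERY].json` naming pattern come from this section.

```python
import argparse
import os


def parse_arguments(argv=None):
    """Parse the -d (directory) and -q (query) options used in this section."""
    parser = argparse.ArgumentParser(description="Stream tweets matching a query.")
    parser.add_argument("-d", "--directory", required=True,
                        help="directory in which to save the streamed tweets")
    parser.add_argument("-q", "--query", required=True,
                        help="term(s) that must be present in a tweet")
    return parser.parse_args(argv)


def output_path(directory, query):
    """Build the output file name, following the stream_[QUERY].json pattern."""
    return os.path.join(directory, f"stream_{query}.json")


# Simulate the command line: python tweetscraper.py -d data -q lunch
args = parse_arguments(["-d", "data", "-q", "lunch"])
print(output_path(args.directory, args.query))
```

Passing a list to `parse_arguments` makes the sketch easy to try in a notebook; when called without arguments, `argparse` reads the options from the actual command line instead.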

## Data Preprocessing

Now that we have created a small application to scrape tweets from Twitter, let us move to the second part of this workshop, in which we will perform some text preprocessing steps we need for textual analysis.

Let us first have a look at the structure of a tweet. We saved our tweets in the `data/` folder under the name `stream_[QUERY].json`, where `[QUERY]` is the query you searched for. Each line of that file holds one tweet, formatted in JSON, which is a common format for storing and exchanging data on the Internet. Let's have a look at the structure of a single tweet:

In [8]:
import json

with open("data/stream_lunch.json") as infile:
    line = infile.readline()  # read only the first tweet
    tweet = json.loads(line)
    print(json.dumps(tweet, indent=4))

As you can see, a tweet contains quite a lot of information about, for example, the geolocation, the date, the tweeter, etc. The most important fields for our purposes are:

1. text: the text of the actual tweet;
2. created_at: the date of creation;
3. favorite_count: the number of favorites of this tweet;
4. retweet_count: the number of retweets;
5. lang: the language of the tweet;
6. id: the tweet identifier;
7. place: geo-location information;
8. user: the profile of the author of the tweet.
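Once a tweet has been loaded with `json.loads`, it is an ordinary Python dictionary, so the fields listed above can be read by key. The sketch below uses a hand-made, heavily shortened example tweet (not real data), and the helper name `summarize` is hypothetical. It illustrates that `user` and `place` are nested dictionaries and that `place` can be empty.

```python
# A hand-made, heavily shortened example tweet (not real data).
tweet = {
    "id": 1,
    "created_at": "Mon Jan 01 12:00:00 +0000 2018",
    "text": "Having a sandwich for lunch",
    "lang": "en",
    "favorite_count": 0,
    "retweet_count": 0,
    "place": None,
    "user": {"name": "Example User"},
}


def summarize(tweet):
    """Collect a few fields of a tweet in a flat dictionary."""
    place = tweet.get("place")  # None when no location is attached
    return {
        "text": tweet.get("text", ""),
        "author": tweet.get("user", {}).get("name", ""),
        "place": place["name"] if place is not None else "",
    }


summary = summarize(tweet)
print(summary["author"], "wrote:", summary["text"])
```

Using `dict.get` instead of square brackets avoids a `KeyError` when a field is missing from a particular tweet.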

The JSON format is quite convenient for doing computational analyses, but often we would like to manually inspect the data. Unfortunately, JSON is not the most readable format. Therefore, in what follows we will convert our scraped tweets into a CSV (comma-separated values) file, which you can open and modify using familiar software such as Excel. The following code block shows you how to do that in only a few lines of code:

In [None]:
import csv
import json

# the tweet fields we want to keep, in column order
fields = ['id', 'created_at', 'user', 'text', 'favorite_count', 'retweet_count', 'lang', 'place']

tweets = []
with open("data/stream_lunch.json") as infile:
    for line in infile:
        tweet = json.loads(line)
        information = []
        for field in fields:
            if field not in tweet:
                information.append('')  # field is missing from this tweet
            elif field == 'user':
                information.append(tweet['user']['name'])  # keep only the author's name
            elif field == 'place' and tweet['place'] is not None:
                information.append(tweet['place']['name'])  # keep only the place name
            else:
                information.append(tweet[field])
        tweets.append(information)

# newline="" prevents the csv module from writing blank rows on Windows
with open("data/stream_lunch.csv", "w", newline="") as outfile:
    csvwriter = csv.writer(outfile)
    csvwriter.writerow(fields) # write the field names as header
    csvwriter.writerows(tweets)

In this code block, the variable `fields` is a list, which indicates the fields of the tweets we want to store in our CSV file. Can you update this list with some other fields?
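Some interesting fields are nested inside `user` or `place`, which the conversion code handles with special cases. One possible way to generalize this is a small lookup helper that accepts dotted names such as `user.followers_count`. This is only a sketch: the helper `get_field` and the dotted notation are assumptions introduced here, and the example tweet is hand-made rather than real data.

```python
def get_field(tweet, path, default=""):
    """Look up a possibly nested field such as 'user.name'."""
    value = tweet
    for key in path.split("."):
        # Stop as soon as a level is missing or is not a dictionary.
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return default if value is None else value


# Hand-made example tweet (not real data).
example = {"id": 1, "place": None,
           "user": {"name": "Example User", "followers_count": 10}}

fields = ["id", "user.name", "user.followers_count", "place.name"]
row = [get_field(example, field) for field in fields]
print(row)  # [1, 'Example User', 10, '']
```

With such a helper, adding a column to the CSV file only requires adding a name to `fields`.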

The folder of this workshop contains another Python script named `json2csv.py`, which can be used to convert a file of tweets in JSON format to a CSV file. Again, this script needs to be executed from the command line (or command prompt). It takes two arguments: the input file (i.e. the JSON file containing our tweets) and the output file (i.e. the filename of our new CSV file):

    python json2csv.py --infile [PATH TO YOUR JSON FILE] --outfile [PATH TO YOUR CSV FILE]

Can you update this line with the correct paths to both the JSON file and the new CSV file, and execute it in your terminal or command prompt? After that, open the CSV file with Excel or some other spreadsheet program.

## Data Visualization

Now that we have collected and preprocessed our data, let us move on to the final part of this workshop, in which we will start digging into some aspects of data visualization.

There are many libraries available in Python for doing data visualization. The backbone of most of these libraries is the acclaimed plotting library [Matplotlib](http://matplotlib.org/). Matplotlib provides almost all the functionality you need to make the fancy graphs you always dreamed of. The default color schemes of this library, however, are not particularly pretty ([although people are working on this](http://bids.github.io/colormap/)). I therefore generally resort to [Seaborn](https://stanford.edu/~mwaskom/software/seaborn/index.html) or to the interactive plotting library [Bokeh](http://bokeh.pydata.org/en/latest/) when doing data visualization. In this workshop, we will have a brief look at some of the plotting functionality in Seaborn.

We start with importing the library into our notebook:

In [None]:
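# Recent Seaborn releases no longer expose Matplotlib as `sns.plt`,
# so we import matplotlib.pyplot ourselves.
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

sns.set()  # apply Seaborn's default plot style

After that we can create simple plots such as:

In [None]:
plt.plot(range(10))

Or:

In [None]:
import numpy as np

t = np.arange(0.0, 2.0, 0.01)  # time points from 0 to 2 seconds
s = np.sin(2 * np.pi * t)      # a sine wave with a period of 1 second
plt.plot(t, s)

plt.xlabel('time (s)')
plt.ylabel('voltage (mV)')
plt.title('About as simple as it gets, folks')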

Let us now try something more interesting: can we visualize our scraped Twitter stream as a timeline, in which we can observe the rise and fall of a particular topic on Twitter? Creating such a timeline involves two steps. First, we need to load our CSV file into Python. The third-party library [Pandas](http://pandas.pydata.org/) provides excellent functionality to work with CSV files. By executing the following cell, we load all our data into Python, using the creation date of each tweet as the index:

In [18]:
import pandas as pd
data = pd.read_csv("data/stream_lunch.csv", parse_dates=['created_at'], index_col='created_at').sort_index()
data.head()

Second, we plot the timeline using:

In [19]:
data.text.notnull().resample("1T").sum().plot()

Just as we expected... In the code block above, I resample the data into 1-minute bins and count how many tweets fall into each bin. (`T` is the Pandas alias for minutes; recent Pandas versions prefer the spelling `min`, as in `1min`.) Try changing `1T` into `3T` and see what happens when you execute the code block again.
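To see what resampling does without needing the scraped file, here is a small self-contained sketch. The four timestamps are hand-made (not real tweets), and the names `toy`, `per_minute` and `per_three` are chosen only for this illustration. It uses the `min` spelling of the minute alias, which should be accepted by both older and newer Pandas versions.

```python
import pandas as pd

# Hand-made timestamps (not real tweets): three in the first minute,
# none in the second and one in the third.
timestamps = pd.to_datetime([
    "2018-01-01 12:00:05",
    "2018-01-01 12:00:20",
    "2018-01-01 12:00:55",
    "2018-01-01 12:02:10",
])
toy = pd.DataFrame({"text": ["a", "b", "c", "d"]}, index=timestamps)

# notnull() gives True for every tweet with text; summing the True
# values per bin counts the tweets in that bin.
per_minute = toy.text.notnull().resample("1min").sum()
per_three = toy.text.notnull().resample("3min").sum()

print(per_minute.tolist())  # [3, 0, 1]
print(per_three.tolist())   # [4]
```

Wider bins smooth the timeline: the empty second minute is visible with 1-minute bins but disappears when all four tweets fall into a single 3-minute bin.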

Let's make some other time series plots, which give a more detailed view of what people are discussing. For illustration purposes, let us find out if and when people are tweeting about a sandwich:

In [20]:
data.text.str.contains("sandwich").resample("1T").sum().plot()

Or simply a snack:

In [21]:
data.text.str.contains("snack").resample("1T").sum().plot()

These plots can be combined into a single plot by executing the lines within the same cell:

In [22]:
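import matplotlib.pyplot as plt  # needed for plt.legend

data.text.str.contains("snack").resample("1T").sum().plot()
data.text.str.contains("sandwich").resample("1T").sum().plot()
data.text.str.contains("food").resample("1T").sum().plot()

# label the three lines in the order in which they were plotted
plt.legend(["snack", "sandwich", "food"])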
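Note that `str.contains` is case-sensitive by default, so a search for "sandwich" misses "SANDWICH". One possible refinement, sketched below on hand-made example data (not real tweets), is to count all terms in a loop and collect the results in a single table. The helper name `term_counts` and its `rule` parameter are assumptions made for this illustration.

```python
import pandas as pd

# Hand-made example data (not real tweets); one tweet has no text.
toy = pd.DataFrame(
    {"text": ["Snack time", "a sandwich and a snack", None, "SANDWICH!"]},
    index=pd.to_datetime([
        "2018-01-01 12:00:05", "2018-01-01 12:00:40",
        "2018-01-01 12:01:10", "2018-01-01 12:01:30",
    ]),
)


def term_counts(frame, terms, rule="1min"):
    """Count, per time bin, the tweets whose text contains each term."""
    counts = {}
    for term in terms:
        # case=False ignores capitalization, na=False treats missing text
        # as "no match", regex=False matches the term literally.
        matches = frame.text.str.contains(term, case=False, na=False, regex=False)
        counts[term] = matches.resample(rule).sum()
    return pd.DataFrame(counts)


counts = term_counts(toy, ["snack", "sandwich"])
print(counts)
```

Calling `counts.plot()` on the resulting table draws one line per term and adds the legend automatically, so the terms do not have to be listed a second time.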

The folder of this workshop contains another Python script named `visualizetweets.py`, with which you can create these time series plots from the command line. The script takes three arguments: (i) the CSV file containing the tweets, (ii) the way you want to resample and aggregate your data (e.g. into bins of 1 minute) and (iii) the terms you want to plot. If you don't specify any terms, the script will simply produce a time series plot of all tweets. The script can be executed as follows:

    python visualizetweets.py --infile [PATH TO YOUR CSV FILE] --resample [RESAMPLE METHOD] --query [TERM OR TERMS]

Update the various arguments and try executing this command. After that, open the data folder of this workshop and open the file named "tweetviz.pdf".