Tools for processing Twitter transcripts. Warning: these scripts are awful and ugly, so if you can possibly use anything else, please do so.
Load up the conversation in the Hootsuite Dashboard, select all the tweets you want to archive, and then view the source of this selection. Save the HTML source to a file. These scripts assume the specific HTML format used by Hootsuite, so they won't work if you copy HTML from any other Twitter viewer, such as twitter.com.
Then, pipe this saved html file to the process_hootsuite.py script like so:
cat tweets_2013-01-01.html | ./process_hootsuite.py > clean_tweets_reversed_2013-01-01.html
The resulting cleaned and reversed HTML is suitable for posting on the web, or pasting into an existing HTML document.
However, you will still want to visually inspect this file. It's possible that not all of the tweets were parsed correctly. In particular, look for any links that are missing their closing </a> tag.
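If you'd rather not eyeball the whole file, something like the sketch below can flag suspicious lines. It is not one of the scripts in this repo: `unbalanced_anchor_lines` is a hypothetical helper, and it assumes each link opens and closes on the same line of the cleaned file, which may not hold for your output. Treat it as a first pass, not a replacement for looking at the file.

```python
import re
import sys

# Hypothetical helper, not part of the original scripts.
# Assumption: every <a ...> and its closing </a> sit on the same line.
OPEN_A = re.compile(r"<a\b", re.IGNORECASE)
CLOSE_A = re.compile(r"</a\s*>", re.IGNORECASE)


def unbalanced_anchor_lines(lines):
    """Return (line_number, text) pairs where the <a> and </a> counts differ."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        if len(OPEN_A.findall(line)) != len(CLOSE_A.findall(line)):
            flagged.append((number, line.rstrip("\n")))
    return flagged


if __name__ == "__main__":
    # Check every file named on the command line; does nothing without arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for number, text in unbalanced_anchor_lines(handle):
                print(f"{path}:{number}: {text}")
```

Saved under a name of your choosing (say `check_links.py`), it would be run with the cleaned file as its argument, and any line it prints is worth a closer look.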
If that's all you want to do, you're done. But if you want to use the transcript analysis tools, continue on:
./tweet_stats.py clean_tweets_reversed_*.html | ./user_stats.py > output.tsv
Detailed explanation of steps:
- Copy html source of tweets from HootSuite Dashboard.
- Feed this html into process_hootsuite.py, which cleans up the HTML and reverses the chronological order.
- Feed all of the cleaned-up html files into tweet_stats.py (which extracts username, date, and timestamp for each tweet)
- Pipe the output of tweet_stats.py into user_stats.py (which counts number of tweets for each date and user and outputs tsv)
- Generate charts using transcript_charts.R and the tsv output of user_stats.py
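To make the counting step concrete, here is a rough sketch of the kind of aggregation user_stats.py is described as doing. It is not the actual script: the function names, the (username, date, timestamp) field order, and the `date`/`username`/`tweets` column headers are all assumptions made for illustration, so check the real output before feeding anything to transcript_charts.R.

```python
import csv
import io
from collections import Counter


def count_tweets(rows):
    """Count tweets per (date, username).

    Assumption: each row is (username, date, timestamp), mirroring the fields
    tweet_stats.py is said to extract. The real field order may differ.
    """
    counts = Counter()
    for username, date, _timestamp in rows:
        counts[(date, username)] += 1
    return counts


def write_tsv(counts, out):
    """Write one tab-separated line per (date, username) pair, sorted by date then user."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "username", "tweets"])  # hypothetical header names
    for (date, username), total in sorted(counts.items()):
        writer.writerow([date, username, total])


# Made-up sample rows, only to show the shape of the data.
sample = [
    ("alice", "2013-01-01", "10:00"),
    ("bob", "2013-01-01", "10:02"),
    ("alice", "2013-01-01", "10:05"),
]
buffer = io.StringIO()
write_tsv(count_tweets(sample), buffer)
print(buffer.getvalue(), end="")
```

With the made-up sample rows, this prints a header line followed by one line for alice (2 tweets) and one for bob (1 tweet), all on 2013-01-01.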
Example R output:



