psychoanalyze-tweethead collection
November 6, 2018
Charles Seife, NYU, cs129@nyu
*** README ***
This code was only meant for my own use, so apologies for the ugliness. It's also far from my main activity, so please don't expect quick (or any!) responses to bug reports or fix requests.
In addition to the included files, you need to create an apikeys.txt file containing your Twitter API keys, pipe-delimited in this order: consumer_key|consumer_secret|access_token|accesstoken_secret.
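Here is a minimal sketch of reading that file. It assumes all four keys sit on the first non-empty line. `load_api_keys` and the dictionary key names are hypothetical and are not necessarily what the scripts use internally.

```python
from pathlib import Path
from typing import Dict

# Field order as described above; the names are only labels for this sketch.
KEY_FIELDS = ("consumer_key", "consumer_secret", "access_token", "accesstoken_secret")


def load_api_keys(path: str = "apikeys.txt") -> Dict[str, str]:
    """Hypothetical helper: split the first non-empty line of apikeys.txt on '|'."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"{path} is empty")
    parts = lines[0].split("|")
    if len(parts) != len(KEY_FIELDS):
        raise ValueError(f"expected {len(KEY_FIELDS)} '|'-separated fields, got {len(parts)}")
    # Strip stray whitespace around each key before pairing it with its label.
    return dict(zip(KEY_FIELDS, (p.strip() for p in parts)))
```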
Some programs in the first three groups operate on a file whose name ends in ".namesfile". Each line of a namesfile has two or more tab-delimited entries. Entry 1 (mandatory) is the user's Twitter screen name. Entry 2 (mandatory) is a network label used for plotting. I've been using entry 3 for a measure of my confidence that the user is a bot, and entry 4 for my justification for that confidence, but YMMV. A sample namesfile is included.
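The layout above can be parsed with a short sketch like the following. `NamesEntry` and `read_namesfile` are hypothetical names. The optional entries are kept as raw strings because the meaning of entries 3 and 4 is up to you.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NamesEntry:
    screen_name: str                     # entry 1 (mandatory)
    label: str                           # entry 2 (mandatory): network label for plotting
    confidence: Optional[str] = None     # entry 3 (optional): e.g. bot-confidence measure
    justification: Optional[str] = None  # entry 4 (optional): reason for that confidence


def read_namesfile(path: str) -> List[NamesEntry]:
    """Hypothetical reader: one NamesEntry per non-blank, tab-delimited line."""
    entries: List[NamesEntry] = []
    with open(path, encoding="utf-8") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            fields: list = line.split("\t")
            if len(fields) < 2:
                raise ValueError(f"{path}:{lineno}: need at least 2 tab-delimited entries")
            # Pad missing optional entries with None; entries beyond 4 are ignored.
            fields += [None] * (4 - len(fields))
            entries.append(NamesEntry(*fields[:4]))
    return entries
```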
The bottleneck is the collector, because of the rate limits (throttling) on the Twitter API. Everything after collection is relatively quick, and I haven't needed to multiprocess. Each user gets two or more flat files, so the working directory fills up very quickly. Once the analyses have been amalgamated, you can move or discard the per-user files.
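The throttle is what makes collection slow, so a client has to pace its requests. Below is a hedged sketch of a rolling-window limiter. It is not the collector's actual code, and `WindowThrottle` is a hypothetical name. Twitter's v1.1 limits were counted per 15-minute window, but the per-endpoint call counts should come from Twitter's rate-limit documentation rather than from this example.

```python
import time
from collections import deque
from typing import Callable, Deque


class WindowThrottle:
    """Allow at most max_calls within any rolling window of `window` seconds."""

    def __init__(self, max_calls: int, window: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.window = window
        self.clock = clock   # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls: Deque[float] = deque()

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have left the rolling window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call expires, then drop it.
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Usage: create one limiter per endpoint, e.g. `WindowThrottle(max_calls=N, window=15 * 60)`, and call `wait()` before each request.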
Error handling is poor, and a few data-integrity issues remain (such as rare hiccups on bad characters).
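One hedged way to reduce bad-character hiccups is to sanitize each field before writing it. This sketch assumes the flat files are tab-delimited records written as UTF-8. `clean_field` and `write_row` are hypothetical helpers and are not part of the collection.

```python
import unicodedata
from typing import Iterable, TextIO


def clean_field(text: str) -> str:
    """Make tweet text safe to store as one field of a tab-delimited line."""
    # Tabs and line/paragraph separators would split the field or the record.
    for ch in ("\t", "\r", "\n", "\u2028", "\u2029"):
        text = text.replace(ch, " ")
    # Drop other control characters (Cc) and lone surrogates (Cs), which
    # cannot be encoded as UTF-8. Emoji and other symbols are kept.
    return "".join(c for c in text if unicodedata.category(c) not in ("Cc", "Cs"))


def write_row(fh: TextIO, fields: Iterable[object]) -> None:
    """Write one cleaned, tab-delimited record (open fh with encoding='utf-8')."""
    fh.write("\t".join(clean_field(str(f)) for f in fields) + "\n")
```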
As for streaming, my tweetstreamer is unstable, especially with high-volume keyword lists. There's definitely room for improvement in everything in group 4.
*** tweetcollector folder ***
Group 1: Tweet analysis & plotting
tweetcollectorbatch.py: For each user in a namesfile, gather account info and the last ~3,200 tweets. IN: namesfile; OUT: tweetlogfile
tweetanalyzerbatch.py: For each user in a namesfile, analyze the tweets in that user's tweetlogfile. IN: namesfile, tweetlogfile; OUT: tweetanalysis
analysiscollectorbatch.py: Collect all tweetanalysis files related to a namesfile into one large file. IN: namesfile, tweetanalysis; OUT: collectedanalysis
analysisplotter.py: Plot all tweetanalyses in one collectedanalysis file. IN: collectedanalysis
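Twitter's v1.1 user-timeline endpoint only reaches back about 3,200 tweets and is paged backwards with `max_id`. A collector like tweetcollectorbatch.py therefore has to walk the timeline one page at a time. The README does not say exactly how it does this, so the following is a library-independent sketch. `fetch_page` is a hypothetical stand-in for the real API call.

```python
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, object]
# Hypothetical: fetch_page(max_id) returns tweets newest-first, each with an
# integer "id", and only tweets with id <= max_id when max_id is given.
FetchPage = Callable[[Optional[int]], List[Tweet]]


def collect_timeline(fetch_page: FetchPage, limit: int = 3200) -> List[Tweet]:
    """Walk backwards through a timeline using max_id pagination."""
    tweets: List[Tweet] = []
    max_id: Optional[int] = None
    while len(tweets) < limit:
        page = fetch_page(max_id)
        if not page:
            break  # no older tweets available (or the history cap was reached)
        tweets.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = min(int(t["id"]) for t in page) - 1
    return tweets[:limit]
```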
Group 2: Comms network generation from collected tweets
retweetedgeanalyzerbatch.py: For each user in a namesfile, extract nodes and edges (RTs and @-mentions) from the user's tweetlogfile. IN: namesfile, tweetlogfile; OUT: retweetedges
collectrtedgesbatch.py: Collect all retweetedges files related to a namesfile and place them into two files, one of nodes and one of edges. IN: namesfile, retweetedges; OUT: collectedrtedges, collectedrtnodes
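Here is a hedged sketch of turning RTs and @-mentions into weighted, directed edges. It assumes only the tweet text is available and relies on the classic "RT @name:" prefix. `extract_edges` and `edge_rows` are hypothetical, and the output columns are simply headers that Gephi's spreadsheet importer recognizes. The README does not specify the real retweetedges format.

```python
import re
from collections import Counter
from typing import Iterable, List

# Screen names are 1-15 characters of letters, digits, and underscores.
RT_RE = re.compile(r"^RT @([A-Za-z0-9_]{1,15}):")
# The lookbehind skips e-mail addresses such as x@example.com.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def extract_edges(user: str, texts: Iterable[str]) -> Counter:
    """Count directed (source, target, kind) edges from one user's tweet texts."""
    src = user.lower()
    edges: Counter = Counter()
    for text in texts:
        rt = RT_RE.match(text)
        if rt:
            edges[(src, rt.group(1).lower(), "RT")] += 1
            # Mentions inside retweeted text belong to the original author,
            # so this sketch does not count them for `user`.
            continue
        # Count each mentioned account once per tweet.
        for name in {m.lower() for m in MENTION_RE.findall(text)}:
            edges[(src, name, "@")] += 1
    return edges


def edge_rows(edges: Counter) -> List[list]:
    """Header plus one row per edge, with the count used as the weight."""
    rows: List[list] = [["Source", "Target", "Label", "Weight"]]
    for (src, dst, kind), weight in sorted(edges.items()):
        rows.append([src, dst, kind, weight])
    return rows
```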
Group 3: Followers/friends network generation
networkfollowercollectorbatch.py: For each user in a namesfile, collect followers and friends. IN: namesfile; OUT: followers
collectrtedgesbatch.py: Collect all followers files related to a namesfile and place them into two files, one of nodes and one of edges. IN: namesfile, followers; OUT: collectedfollowedges, collectedfollownodes
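The follower and friend lists can be combined into one directed graph. The sketch below assumes the convention "A follows B" is stored as the edge A -> B, and `follow_edges` is a hypothetical name. Using a set deduplicates the same relationship when it is seen from both sides (in A's friends list and in B's followers list).

```python
from typing import Dict, Iterable, Set, Tuple


def follow_edges(followers: Dict[str, Iterable[str]],
                 friends: Dict[str, Iterable[str]]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Build nodes and directed 'A follows B' edges (A -> B)."""
    edges: Set[Tuple[str, str]] = set()
    for user, fols in followers.items():
        for f in fols:
            edges.add((f, user))   # f follows user
    for user, frs in friends.items():
        for g in frs:
            edges.add((user, g))   # user follows g
    # Include the collected users even if they have no edges.
    nodes = {n for e in edges for n in e} | set(followers) | set(friends)
    return nodes, edges
```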
Group 4: Stream monitoring via keyword/user
tweetstreamer.py: Using keywords or user ID numbers, record all relevant tweets that come through the stream. OUT: streamlogfile
threadifyer.py: Using the stream log file, attempt to identify threads. IN: streamlogfile; OUT: threadedlogfile
threadparser.py: Using the threaded log file, attempt to find parent->child relationships. IN: threadedlogfile; OUT: parsedlogfile, which can be modified for use in Gephi.
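Tweets from the v1.1 API carry an `in_reply_to_status_id` field, which is one natural basis for parent->child links. The README does not say whether threadifyer.py or threadparser.py use it. This sketch assumes each logged tweet is a dict with `id` and, optionally, `in_reply_to_status_id`. All function names are hypothetical. Replies whose parent was never captured by the stream are treated as thread roots.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def parent_child_edges(tweets: List[dict]) -> List[Tuple[int, int]]:
    """(parent_id, child_id) pairs where both tweets were captured."""
    seen = {t["id"] for t in tweets}
    return [(t["in_reply_to_status_id"], t["id"]) for t in tweets
            if t.get("in_reply_to_status_id") in seen]


def thread_roots(tweets: List[dict]) -> Dict[int, int]:
    """Map each tweet id to the earliest captured ancestor in its reply chain."""
    parent: Dict[int, Optional[int]] = {t["id"]: t.get("in_reply_to_status_id") for t in tweets}
    roots: Dict[int, int] = {}
    for tid in parent:
        cur, hops = tid, 0
        # Follow reply links while the parent was captured; the hop cap guards
        # against malformed data that loops.
        while parent.get(cur) in parent and hops < len(parent):
            cur = parent[cur]
            hops += 1
        roots[tid] = cur
    return roots


def group_threads(tweets: List[dict]) -> Dict[int, List[int]]:
    """Group tweet ids by the root of their reply chain."""
    threads: Dict[int, List[int]] = defaultdict(list)
    for tid, root in thread_roots(tweets).items():
        threads[root].append(tid)
    return dict(threads)
```

The parent->child pairs can be written out as Source/Target rows for Gephi.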