Twitter Subgraph Manipulator by Deen Freelon
In short, TSM is a Python module that contains a few functions for analyzing Twitter and Twitter-like (i.e., directed and very sparse with community structure) network data. I wrote it for my own research purposes but thought someone out there might find it useful.
Here are some of the things TSM can do:
- Support very large communities (millions of nodes/edges)--the only limit is your computer's memory
- Extract retweets and @-mentions into edgelist format for network analysis and visualization
- Partition networks into communities, isolate the N largest communities, and identify the most-connected users in each community
- Measure the insularity of network communities (using EI indices) to determine the extent to which each looks like an echo chamber
- Measure the overlap between network communities to determine which ones interact more and less often
- Get the top retweets in a Twitter dataset and rank them by N of retweets and by community
- Track Twitter (or other) communities over time: compute similarity scores (weighted or unweighted Jaccard coefficients) for partitioned network communities drawn from the same dataset at two different time slices
- Discover which nodes intermediate between which communities
- Find the most-used hashtags in each community (or dataset)
- Find the most-used hyperlinks or web domains in each community (or dataset)
Here's what you need to use TSM:
tsm.py, the TSM Python module provided here
- python-louvain, Thomas Aynaud's Python implementation of the Louvain method of network community detection.
- NetworkX, a widely-used Python module for general network analysis.
- Python 3.x (needed for Unicode support)
tsm.py for a full description of TSM's functions and how to use them. The module should work as long as NetworkX and
python-louvain are installed.
TSM demo files.zip file contains two IPython notebooks and a Twitter ID file that can be used to demo many of TSM's functions. Code and instructions are provided to hydrate the Twitter ID file. For testing purposes, here is a very brief (fabricated) sample demonstrating how input data for the
t2e function should be formatted in a plain text file. This sample was created by Devin Gaffney (@DGaffney):
dgaff,"Twitter is pretty fun, isn't it, @dfreelon?" dfreelon,Yes indeed @dgaff - @some_other_user weigh in? some_other_user,Of course twitter is grand. Mostly because of @dril. dril,Some weird tweet no one understands but everyone favorites @some_other_user cnnbrk,Looks like @dril just tweeted
I gratefully acknowledge funding support from the US Institute of Peace in creating this module.