Finds Communities within the Twitter domain using Spectral Clustering and Influence.
Steps:
- Harvest Tweets using streaming API (stream.py), store tweets in MongoDB
- Remove any tweets that do not have a retweet/mention
- Construct a graph from the remaining tweets
- Use Spectral Clustering to find the best clusters
- Evaluation Clustering using the Silhouette score
Requires the following modules:
- tweepy
- pymongo
- networkx
- matplotlib
- numpy
- sklearn
Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Computational and Applied Mathematics 20: 53–65. doi:10.1016/0377-0427(87)90125-7.