This is a project to understand and analyze the following:
- How a message flows in a social network
- Isolating user profiles for a specific topic.
- Summarizing statistical data over a geographical area.
-- a. Isolating messages from a specific geographic area.
-- b. Using message summarization.
-- c. Performing clustering on these summaries.
The framework is built with Python and MongoDB. The required Python libraries are:
- Tweepy - Python library to use the Twitter API http://tweepy.readthedocs.org/en/v3.2.0/#
- Pymongo - Python library to access MongoDB.
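As a rough illustration of how the two libraries could fit together, the sketch below streams tweets with Tweepy and stores them through Pymongo. It assumes the Tweepy 3.x streaming interface (`StreamListener`, which later major versions removed) and Pymongo 3.x (`insert_one`). The helper names `parse_status` and `stream_to_mongo` are hypothetical and not part of this project.

```python
import json


def parse_status(raw_data):
    """Decode one raw streaming message; return None if it is not a tweet."""
    message = json.loads(raw_data)
    # The stream also carries non-tweet messages (limit or delete notices),
    # which have no "text" field; those are skipped.
    if "text" not in message:
        return None
    return message


def stream_to_mongo(auth, collection, keywords):
    """Store every streamed tweet matching `keywords` in `collection`.

    `auth` is a tweepy.OAuthHandler and `collection` a Pymongo collection.
    """
    import tweepy  # imported lazily; Tweepy 3.x is assumed

    class Listener(tweepy.StreamListener):
        def on_data(self, raw_data):
            document = parse_status(raw_data)
            if document is not None:
                collection.insert_one(document)
            return True  # keep the stream open

        def on_error(self, status_code):
            # Returning False disconnects; do so when rate limited (HTTP 420).
            return status_code != 420

    tweepy.Stream(auth, Listener()).filter(track=keywords)
```

The whole status is stored unchanged, so any field of the tweet stays available for later analysis.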
MongoDB
-
Update the MongoDB configuration.
MongoDB config file: /etc/mongodb.conf
Replace: bind_ip = 127.0.0.1
with: bind_ip = 127.0.0.1,<ip_address of mongodb>
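The same edit can be scripted. The following is a minimal sketch, assuming the config file uses plain `key = value` lines as shown above; the function name `add_bind_ip` and the address `10.0.0.5` in the usage note are placeholders, not values from this project.

```python
def add_bind_ip(conf_text, ip_address):
    """Return conf_text with ip_address appended to the bind_ip line."""
    updated = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        # Only touch an active "bind_ip = ..." line; commented lines are kept.
        if sep and key.strip() == "bind_ip":
            ips = [ip.strip() for ip in value.split(",") if ip.strip()]
            if ip_address not in ips:
                ips.append(ip_address)
            line = "bind_ip = " + ",".join(ips)
        updated.append(line)
    return "\n".join(updated) + "\n"
```

For example, `add_bind_ip(open("/etc/mongodb.conf").read(), "10.0.0.5")` returns the new file contents; writing them back and restarting MongoDB is left to the operator.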
-
Create the following in MongoDB
a. Create a db 'twitter'. All the collections are created in this database.
b. Create a collection 'twitter_data' which contains all the twitter data. Index twitter_data on the following:
-- Location
-- Tweet (text)
-- User ID (Twitter Handle)
-- Geo
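The setup above could be done from Python with Pymongo, roughly as sketched below. MongoDB creates the database and collection lazily on first use, so only the indexes need explicit calls. The field paths in `INDEXED_FIELDS` are assumptions based on the usual Twitter status JSON layout, and plain ascending indexes are assumed; adjust both to match how the tweets are actually stored.

```python
# Assumed field paths for Location, Tweet (text), User ID and Geo.
INDEXED_FIELDS = ["user.location", "text", "user.screen_name", "geo"]


def get_collection(host="127.0.0.1", port=27017):
    """Return the 'twitter_data' collection of the 'twitter' database."""
    from pymongo import MongoClient  # imported lazily

    client = MongoClient(host, port)
    return client["twitter"]["twitter_data"]


def ensure_indexes(collection, fields=INDEXED_FIELDS):
    """Create one ascending index per field and return the index names."""
    # Direction 1 means ascending (pymongo.ASCENDING).
    return [collection.create_index([(field, 1)]) for field in fields]


# Usage, with a MongoDB server running:
#     ensure_indexes(get_collection())
```

Creating an index that already exists is a no-op in MongoDB, so this can be run repeatedly.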