Skip to content

Classes that utilize python and mongo to scrape the Twitter Restful API and insert data into a database.

Notifications You must be signed in to change notification settings

bwu/pymongo_for_twitter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pymongo Twitter Class
=====================

Using sixohsix's twitter package for python, and the pymongo package for
python and MongoDB, these classes work together to scrape the Twitter 
REST API and store the data in a Mongo database. The class was designed for
to be low maintenance, easy to monitor and regularly run so that one can
accurately monitor the performance of his or her Twitter handle over time.
General guidelines to use are shown below. I may be missing points. If you need
help using this, please reach out to me at @brain_deer.

Data is collected and stored in five different collections.
	- twitter_channel
	- twitter_content
	- twitter_replies
	- twitter_mentions
	- twitter_hashtags





General guidelines to use:
==========================

Requisites: 
	- Installed twitter, pymongo, bitly_api and itertools packages.
	- A set up mongo database

1. Create a twitter application and authorize given handles to acquire
oauth token and oauth token secret.

2. Create a "handles" collection with documents of the following format

	{
	   "_id": ObjectId("ABCD"),
	   "name": "justinbeiber",
	   "twitter": "JUSTINS_TWITTERID",
	   "twitter_hashtags": {
	     "0": "#followfriday",
	     "1": "#traveltuesday" 
	  },
	   "twitter_oauth_token": "JUSTINS_OAUTH_TOKEN",
	   "twitter_oauth_token_secret": "JUSTINS_TWITTER_OAUTH_TOKEN_SECRET" 
	}

3. Insert CONSUMER_KEY, CONSUMER_SECRET, and DATABASE_NAME in twitter_scrape.py

4. Insert BITLY_USERNAME, BITLY_APIKEY in insights.py

5. Run twitter_scrape.py from your python interpreter
	




Collected Data Points
=====================

twitter_channel - stores day by day channel data

Example:

	{
	   "_id": ObjectId("ABCD"),
	   "total_tweets": 5104,
	   "friends_count": 113,
	   "followers_count": 3644,
	   "handle_name": "justinbeiber",
	   "date": "2012-11-28",
	   "total_lists": 89 
	}

-----------------------------------------------

twitter_content - stores tweets made by handle shared to all followers,
freezing data after 2 and 7 days.

Example:

	{
	   "_id": ObjectId("ABCD"),
	   "handle_name": "justinbeiber",
	   "date": "2012-11-20 11: 00: 00",
	   "lifetime": {
	     "reply_count": 1,
	     "retweet_count": 11,
	     "bitly_clicks": 119 
	  },
	   "status_id": "1234",
	   "text": "Check out this pic from my latest concert: http: \/\/t.co\/dcjkeow",
	   "tweet_link": "http: \/\/twitter.com\/#!\/12345\/status\/1234",
	   "url": "bit.ly\/JKDixkl",
	   "2_day": {
	     "reply_count": 1,
	     "retweet_count": 9,
	     "bitly_clicks": 111 
	  },
	   "7_day": {
	     "reply_count": 1,
	     "retweet_count": 10,
	     "bitly_clicks": 117 
	  } 
	}
	
-----------------------------------------------

twitter_replies - stores tweets made by handle beginning with an @mention,
freezing data after 2 and 7 days.

Example:

	{
	   "_id": ObjectId("ABCD"),
	   "handle_name": "justinbeiber",
	   "date": "2012-11-20 11: 00: 00",
	   "lifetime": {
	     "reply_count": 1,
	     "retweet_count": 11,
	     "bitly_clicks": 119 
	  },
	   "status_id": "1234",
	   "text": "@JustinsFans Check out this pic from my latest concert: http: \/\/t.co\/dcjkeow",
	   "tweet_link": "http: \/\/twitter.com\/#!\/12345\/status\/1234",
	   "url": "bit.ly\/JKDixkl",
	   "2_day": {
	     "reply_count": 1,
	     "retweet_count": 9,
	     "bitly_clicks": 111 
	  },
	   "7_day": {
	     "reply_count": 1,
	     "retweet_count": 10,
	     "bitly_clicks": 117 
	  } 
	}
	
-----------------------------------------------

twitter_mentions - stores tweets mentioning a specific handle, storing specific
information about the user that tweeted.

Example:

	{
	   "_id": ObjectId("ABCD"),
	   "tweet_link": "http: \/\/twitter.com\/#!\/BrianSmith\/status\/1234",
	   "url": null,
	   "handle_name": "justinbeiber",
	   "user": {
	     "statuses_count": 104,
	     "name": "Brian Smith",
	     "friends_count": 26,
	     "followers_count": 839,
	     "location": "",
	     "screen_name": "BrianSmith" 
	  },
	   "status_id": "1234",
	   "date": "2012-11-28 18: 53: 02",
	   "text": "I just had a great time watching @justinbeiber!",
	   "retweet_count": 6,
	   "in_reply_to_status": null 
	}
	
-----------------------------------------------

twitter_hashtags - stores tweets containing a specific search term, storing specific
information about the user that tweeted.

Example:

	{
	   "_id": ObjectId("ABCD"),
	   "tweet_link": "http: \/\/twitter.com\/#!\/JoeShmo\/status\/1234",
	   "in_reply_to_screen_name": null,
	   "handle_name": "justinbeiber",
	   "query": "%23baconbits",
	   "user": {
	     "statuses_count": 9065,
	     "name": "Joseph Shmo",
	     "friends_count": 296,
	     "followers_count": 301,
	     "location": "'Jersey",
	     "screen_name": "JoeShmo" 
	  },
	   "status_id": "1234",
	   "date": "2012-11-28 18: 33: 58",
	   "text": "I loveeeee my #baconbits! bit.ly\/Xv4fd",
	   "url": "bit.ly\/Xv4fd",
	   "retweet_count": 36 
	}

-----------------------------------------------	

About

Classes that utilize python and mongo to scrape the Twitter Restful API and insert data into a database.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages