
Releasing health topic follower dataset

Commit ea5b80cd4029edc8e500cfb881b7d78dcc79c575 (1 parent: 8bb2ff8), committed by @bianjiang on Nov 14, 2013
1 changed file: README.md (60 additions, 0 deletions)
@@ -7,6 +7,66 @@ To workaround Twitter's rate limits, ``tweetf0rm`` can spawn multiple crawlers e
It's quite stable for the things I want to do. I have collected billions of tweets from **2.8 million** Twitter users in about two weeks on a single machine.
+Dataset
+------------
+Twitter's license (or at least the company's position on it) does not allow me to redistribute the crawled data (someone asked about this a while back: https://dev.twitter.com/discussions/8232). Here is what I have:
+
+* Health topics: I crawled the tweets of 2,686,823 users (as of 11/12/2013; at most 3,200 tweets per user, a limit imposed by the Twitter API) in about two weeks. All of these Twitter users follow one of the following accounts, which I call health-related information centers (i.e., people or organizations that share health-related information, such as [CNNHealth](https://twitter.com/cnnhealth)). Note that some of these users either haven't posted anything or have made their accounts private, so they show zero tweets. So far I haven't done anything with this dataset besides some pre-processing (indexing and calculating common statistics), although I have some research ideas that I am planning to try. If you want to get your hands on this dataset (either to collaborate with me or just to get the data), contact me at [Jiang Bian](mailto:jbian@uams.edu) :). Detailed statistics, such as the total number of tweets, will be posted as soon as my code finishes calculating them.
+
+ * https://twitter.com/RWJF
+ * https://twitter.com/samhsagov
+ * https://twitter.com/PublicHealth
+ * https://twitter.com/WebMD
+ * https://twitter.com/NIMHgov
+ * https://twitter.com/HHSGov
+ * https://twitter.com/drsanjaygupta
+ * https://twitter.com/womenshealth
+ * https://twitter.com/HealthHabits
+ * https://twitter.com/medlineplus
+ * https://twitter.com/KHNews
+ * https://twitter.com/NIH
+ * https://twitter.com/cnnhealth
+ * https://twitter.com/DrOz
+ * https://twitter.com/projecthopeorg
+ * https://twitter.com/NBCNewsHealth
+ * https://twitter.com/LIVESTRONG
+ * https://twitter.com/JohnsHopkinsSPH
+ * https://twitter.com/CDC_eHealth
+ * https://twitter.com/healthfinder
+ * https://twitter.com/FamHealthGuide
+ * https://twitter.com/AmericanCancer
+ * https://twitter.com/HealthCareGov
+ * https://twitter.com/goodhealth
+ * https://twitter.com/CDCemergency
+ * https://twitter.com/Disc_Health
+ * https://twitter.com/HarvardHealth
+ * https://twitter.com/Health_Affairs
+ * https://twitter.com/WomensHealthMag
+ * https://twitter.com/latimeshealth
+ * https://twitter.com/FDA_Drug_Info
+ * https://twitter.com/nytimeshealth
+ * https://twitter.com/MayoClinic
+ * https://twitter.com/AIDSgov
+ * https://twitter.com/NPRHealth
+ * https://twitter.com/USDAFoodSafety
+ * https://twitter.com/DailyHealthTips
+ * https://twitter.com/MinorityHealth
+ * https://twitter.com/RedCross
+ * https://twitter.com/FDAWomen
+ * https://twitter.com/WSJhealth
+ * https://twitter.com/runnersworld
+ * https://twitter.com/bbchealth
+ * https://twitter.com/CMSGov
+ * https://twitter.com/AmerMedicalAssn
+ * https://twitter.com/KatherineHobson
+ * https://twitter.com/MensHealthMag
+ * https://twitter.com/FDArecalls
+ * https://twitter.com/WSJhealthblog
+ * https://twitter.com/CDCgov
+ * https://twitter.com/WHO
+ * https://twitter.com/GoHealthyPeople
+ * https://twitter.com/CDCFlu
+ * https://twitter.com/girlshealth
Installation
------------
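The Dataset section added in this commit notes that at most 3,200 tweets per user can be retrieved. As I understand the Twitter REST API v1.1 `statuses/user_timeline` endpoint, the usual way to reach that cap is to page backwards with the inclusive `max_id` parameter, up to 200 tweets per request. The following is a minimal sketch of that paging pattern under stated assumptions: `fetch_page` is a hypothetical stand-in for the API call (it is not part of ``tweetf0rm``), and `make_fake_fetch` simulates a timeline so the example runs offline. The cap is enforced by Twitter's servers; here the loop enforces it client-side so the simulation behaves similarly.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: fetch_page(max_id, count) returns up to `count`
# tweets (dicts with an integer "id"), newest first, all with id <= max_id
# when max_id is not None. This stands in for a real API call.
FetchPage = Callable[[Optional[int], int], List[Dict]]

TIMELINE_CAP = 3200  # per-user limit mentioned in the Dataset section
PAGE_SIZE = 200      # assumed per-request maximum for user_timeline


def crawl_timeline(fetch_page: FetchPage,
                   cap: int = TIMELINE_CAP,
                   page_size: int = PAGE_SIZE) -> List[Dict]:
    """Page backwards through one user's timeline using max_id."""
    tweets: List[Dict] = []
    max_id: Optional[int] = None  # None means "start from the newest tweet"
    while len(tweets) < cap:
        page = fetch_page(max_id, min(page_size, cap - len(tweets)))
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # max_id is inclusive, so continue from one below the oldest id seen.
        max_id = min(t["id"] for t in page) - 1
    return tweets[:cap]  # guard against a fetcher returning more than asked


def make_fake_fetch(total: int) -> FetchPage:
    """Simulate a timeline of `total` tweets with ids total..1 (newest first)."""
    ids = list(range(total, 0, -1))

    def fetch(max_id: Optional[int], count: int) -> List[Dict]:
        eligible = [i for i in ids if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:count]]

    return fetch


if __name__ == "__main__":
    print(len(crawl_timeline(make_fake_fetch(5000))))  # 3200: stopped at the cap
    print(len(crawl_timeline(make_fake_fetch(150))))   # 150: timeline exhausted
    print(len(crawl_timeline(make_fake_fetch(0))))     # 0: user never posted
```

Stepping `max_id` to one below the oldest id already seen avoids fetching the boundary tweet twice, since `max_id` includes the tweet with that id.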
