diff --git a/README.md b/README.md
index c2f8ed3..c17c320 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,66 @@ To workaround Twitter's rate limits, ``tweetf0rm`` can spawn multiple crawlers e

 It's quite stable for the things that I want to do. I have collected billions of tweets from **2.8 millions** twitter users in about 2 weeks with a single machine.

+Dataset
+------------
+Twitter's license (or at least the company's position on it) does not allow me to redistribute the crawled data (e.g., someone asked about this a while back: https://dev.twitter.com/discussions/8232). But here is what I have:
+
+* Health topics: I crawled the tweets of 2,686,823 users (as of 11/12/2013; a maximum of 3,200 tweets per user, limited by the Twitter APIs) in about two weeks. All of these Twitter users follow at least one of the accounts listed below, which I call health-related information centers (i.e., a person or organization that shares health-related information, such as [CNNHealth](https://twitter.com/cnnhealth)). Note that some of these users either haven't posted anything or have set their accounts to private, so the dataset shows zero tweets for them. I haven't done anything with this dataset yet besides some pre-processing (indexing, calculating common statistics), although I have some research ideas that I am planning to try. If you want to get your hands on this dataset (either to collaborate with me or just to get the data), contact me at [Jiang Bian](mailto:jbian@uams.edu) :). Detailed stats, such as how many tweets there are, will be posted as soon as my code finishes calculating them.
+
+  * https://twitter.com/RWJF
+  * https://twitter.com/samhsagov
+  * https://twitter.com/PublicHealth
+  * https://twitter.com/WebMD
+  * https://twitter.com/NIMHgov
+  * https://twitter.com/HHSGov
+  * https://twitter.com/drsanjaygupta
+  * https://twitter.com/womenshealth
+  * https://twitter.com/HealthHabits
+  * https://twitter.com/medlineplus
+  * https://twitter.com/KHNews
+  * https://twitter.com/NIH
+  * https://twitter.com/cnnhealth
+  * https://twitter.com/DrOz
+  * https://twitter.com/projecthopeorg
+  * https://twitter.com/NBCNewsHealth
+  * https://twitter.com/LIVESTRONG
+  * https://twitter.com/JohnsHopkinsSPH
+  * https://twitter.com/CDC_eHealth
+  * https://twitter.com/healthfinder
+  * https://twitter.com/FamHealthGuide
+  * https://twitter.com/AmericanCancer
+  * https://twitter.com/HealthCareGov
+  * https://twitter.com/goodhealth
+  * https://twitter.com/CDCemergency
+  * https://twitter.com/Disc_Health
+  * https://twitter.com/HarvardHealth
+  * https://twitter.com/Health_Affairs
+  * https://twitter.com/WomensHealthMag
+  * https://twitter.com/latimeshealth
+  * https://twitter.com/FDA_Drug_Info
+  * https://twitter.com/nytimeshealth
+  * https://twitter.com/MayoClinic
+  * https://twitter.com/AIDSgov
+  * https://twitter.com/NPRHealth
+  * https://twitter.com/USDAFoodSafety
+  * https://twitter.com/DailyHealthTips
+  * https://twitter.com/MinorityHealth
+  * https://twitter.com/RedCross
+  * https://twitter.com/FDAWomen
+  * https://twitter.com/WSJhealth
+  * https://twitter.com/runnersworld
+  * https://twitter.com/bbchealth
+  * https://twitter.com/CMSGov
+  * https://twitter.com/AmerMedicalAssn
+  * https://twitter.com/KatherineHobson
+  * https://twitter.com/MensHealthMag
+  * https://twitter.com/FDArecalls
+  * https://twitter.com/WSJhealthblog
+  * https://twitter.com/CDCgov
+  * https://twitter.com/WHO
+  * https://twitter.com/GoHealthyPeople
+  * https://twitter.com/CDCFlu
+  * https://twitter.com/girlshealth

 Installation
 ------------