https://www.kaggle.com/rounakbanik/ted-talks
These datasets contain information about all audio-video recordings of TED Talks uploaded to the official TED.com website until September 21st, 2017. The TED main dataset contains information about all talks including number of views, number of comments, descriptions, speakers and titles. The TED transcripts dataset contains the transcripts for all talks available on TED.com.
There are two CSV files.
ted_main.csv - Contains data on actual TED Talk metadata and TED Talk speakers. transcripts.csv - Contains transcript and URL information for TED Talks
The data has been scraped from the official TED Website and is available under the Creative Commons License.
I've always been fascinated by TED Talks and the immense diversity of content that it provides for free. I was also thoroughly inspired by a TED Talk that visually explored TED Talks stats and I was motivated to do the same thing, albeit on a much less grander scale.
Some of the questions that can be answered with this dataset: 1. How is each TED Talk related to every other TED Talk? 2. Which are the most viewed and most favorited Talks of all time? Are they mostly the same? What does this tell us? 3. What kind of topics attract the maximum discussion and debate (in the form of comments)? 4. Which months are most popular among TED and TEDx chapters? 5. Which themes are most popular amongst TEDsters?