# **Ted Talk Exploratory Data Analysis** 

The dataset contains six features for each talk available on TED's website.



In [66]:
#importing libraries

import numpy as np
import pandas as pd 
from matplotlib import pyplot as plt
import plotly.express as px
from wordcloud import WordCloud, STOPWORDS


## Data discovery & preparation



In [4]:
#reading the ted talk dataset (csv file)
df = pd.read_csv("tedtalk_data.csv")
df.head()

Unnamed: 0,title,author,date,views,likes,link
0,Climate action needs new frontline leadership,Ozawa Bineshi Albert,December 2021,404000,12000,https://ted.com/talks/ozawa_bineshi_albert_cli...
1,The dark history of the overthrow of Hawaii,Sydney Iaukea,February 2022,214000,6400,https://ted.com/talks/sydney_iaukea_the_dark_h...
2,How play can spark new ideas for your business,Martin Reeves,September 2021,412000,12000,https://ted.com/talks/martin_reeves_how_play_c...
3,Why is China appointing judges to combat clima...,James K. Thornton,October 2021,427000,12000,https://ted.com/talks/james_k_thornton_why_is_...
4,Cement's carbon problem — and 2 ways to fix it,Mahendra Singhi,October 2021,2400,72,https://ted.com/talks/mahendra_singhi_cement_s...


Questions to explore:
- top 10 viewed/liked videos ✅

- top 10 popular videos (views, likes, and watch time) ✅

- The authors with the most talks ✅

- graph each author with the titles of their TED Talks

- Does the author with the most talks have a theme for their talks?

- in which month most of the TED Talks were released

- The newest & oldest TED Talk in the dataset

- Visualization of the number of TED Talks every year



In [14]:
#(rows, columns)
df.shape

(5440, 6)

In [6]:
# name of the columns in the dataset
df.columns

Index(['title', 'author', 'date', 'views', 'likes', 'link'], dtype='object')

In [8]:
#each column type
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5440 entries, 0 to 5439
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   5440 non-null   object
 1   author  5439 non-null   object
 2   date    5440 non-null   object
 3   views   5440 non-null   int64 
 4   likes   5440 non-null   int64 
 5   link    5440 non-null   object
dtypes: int64(2), object(4)
memory usage: 255.1+ KB


In [12]:
#checking for NAs
df.isnull().sum()

title     0
author    1
date      0
views     0
likes     0
link      0
dtype: int64

Dealing with missing values:

1- Delete the row with the missing value

2- Impute the missing value

    - Replace with an arbitrary value

    - Replace with the mean (not appropriate if there are outliers)

    - Replace with the mode (categorical features)

    - Replace with the median (robust when there are outliers)

    - Replace with the previous value – forward fill (time series)

    - Replace with the next value – backward fill (time series)
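
The options listed above can be sketched on a small toy frame. The values below are made up for illustration and are not taken from the TED dataset; the column names `views` and `topic` are only placeholders.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values (not from the TED dataset).
toy = pd.DataFrame({
    "views": [100.0, np.nan, 300.0, 10_000.0],
    "topic": ["science", "art", None, "science"],
})

# 1- Delete every row that has a missing value in any column.
dropped = toy.dropna()

# 2- Impute the missing value.
arbitrary = toy["views"].fillna(0)                           # arbitrary value
mean_filled = toy["views"].fillna(toy["views"].mean())       # mean, pulled up by the 10,000 outlier
median_filled = toy["views"].fillna(toy["views"].median())   # median, robust to the outlier
mode_filled = toy["topic"].fillna(toy["topic"].mode()[0])    # mode, for a categorical column
ffilled = toy["views"].ffill()                               # previous value (forward fill)
bfilled = toy["views"].bfill()                               # next value (backward fill)

print(mean_filled[1], median_filled[1])
```

In this toy example the mean fill is more than ten times larger than the median fill, which is why the mean is a poor choice when there are outliers.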

In [16]:
# viewing the row with null value
df[df['author'].isnull()]

Unnamed: 0,title,author,date,views,likes,link
3039,Year In Ideas 2015,,December 2015,532,15,https://ted.com/talks/year_in_ideas_2015


I followed the link of the talk with the missing author to try to find who gave it. It turned out to be a compilation of TED Talks from 2015, so it has no single author, which is why I decided to deal with the NA by **deleting the row**.

In [17]:
#removing NA from our data
df.dropna(inplace=True)
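
Since `author` is the only column with a missing value here, `dropna()` with no arguments removes exactly that one row. As a hedged sketch of a more targeted alternative, the `subset` parameter limits the check to chosen columns. The toy rows below are hypothetical; only the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical rows; only the column names follow the notebook.
toy = pd.DataFrame({
    "title": ["Talk A", "Compilation", "Talk C"],
    "author": ["Speaker A", None, "Speaker C"],
    "views": [1000, 532, np.nan],
})

# No arguments: drop every row that has a missing value in any column.
dropped_any = toy.dropna()

# subset=["author"]: drop only the rows where the author is missing.
dropped_author = toy.dropna(subset=["author"])

print(len(dropped_any), len(dropped_author))
```

With `subset`, a row that is missing something else (here `views`) is kept, so a later change in the data cannot silently remove more rows than intended.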

In [19]:
#checking for duplicates 
df.duplicated()

0       False
1       False
2       False
3       False
4       False
        ...  
5435    False
5436    False
5437    False
5438    False
5439    False
Length: 5439, dtype: bool
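
The boolean Series above is truncated in the display, so it cannot show whether any row is actually `True`. A minimal sketch of how the check could be summarised into a single number, using a toy frame with made-up rows:

```python
import pandas as pd

# Toy frame with hypothetical rows; only the column names follow the notebook.
toy = pd.DataFrame({
    "title": ["Talk A", "Talk B", "Talk A", "Talk B"],
    "author": ["Speaker A", "Speaker B", "Speaker A", "Speaker B"],
    "views": [100, 200, 100, 250],
})

# Rows that repeat an earlier row in every column.
n_full_duplicates = int(toy.duplicated().sum())

# Rows that repeat an earlier row in title and author only.
n_title_author_duplicates = int(toy.duplicated(subset=["title", "author"]).sum())

# Keep the first occurrence of each fully duplicated row.
deduplicated = toy.drop_duplicates()

print(n_full_duplicates, n_title_author_duplicates, len(deduplicated))
```

On the real data, `df.duplicated().sum()` would give the count directly; a result of 0 means there is nothing to remove.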

In [71]:
#Splitting the date column into months values and years values
df['year'] = pd.DatetimeIndex(df['date']).year
df['month'] = pd.DatetimeIndex(df['date']).month_name()
df.head(3)

Unnamed: 0,title,author,date,views,likes,link,year,month
0,Climate action needs new frontline leadership,Ozawa Bineshi Albert,December 2021,404000,12000,https://ted.com/talks/ozawa_bineshi_albert_cli...,2021,December
1,The dark history of the overthrow of Hawaii,Sydney Iaukea,February 2022,214000,6400,https://ted.com/talks/sydney_iaukea_the_dark_h...,2022,February
2,How play can spark new ideas for your business,Martin Reeves,September 2021,412000,12000,https://ted.com/talks/martin_reeves_how_play_c...,2021,September
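
An alternative sketch for the same split parses the column once with `pd.to_datetime` and an explicit format. The format string `"%B %Y"` is an assumption that every value looks like the rows shown above ("December 2021"); a value in another shape would raise an error instead of being guessed.

```python
import pandas as pd

# Toy frame using date strings in the same shape as the notebook's `date` column.
toy = pd.DataFrame({"date": ["December 2021", "February 2022", "September 2021"]})

# Assumption: every value follows "<full month name> <4-digit year>".
parsed = pd.to_datetime(toy["date"], format="%B %Y")

toy["year"] = parsed.dt.year
toy["month"] = parsed.dt.month_name()

print(toy)
```

Keeping `parsed` around is also useful later, because a real datetime column sorts chronologically while the original strings do not.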


In [25]:
df['views'].describe().astype(int)

count        5439
mean      2061954
std       3567316
min          1200
25%        671000
50%       1300000
75%       2100000
max      72000000
Name: views, dtype: int64

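In our TED Talk dataset, views range from a **minimum of 1,200** to a **maximum of 72,000,000**, and the **average** number of views is **2,061,954**. The median (1,300,000) is well below the average, so a small number of very popular talks pull the average up.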

---



In [26]:
df['likes'].describe().astype(int)

count       5439
mean       62619
std       107653
min           37
25%        20000
50%        41000
75%        65000
max      2100000
Name: likes, dtype: int64

In our TED Talk dataset, likes range from a **minimum of 37** to a **maximum of 2,100,000**, and the **average** number of likes is **62,619**. As with views, the median (41,000) is below the average.

In [44]:
top10views = df.sort_values(by = 'views', ascending = False).head(10)
top10views

Unnamed: 0,title,author,date,views,likes,link,year,month
5436,Do schools kill creativity?,Sir Ken Robinson,February 2006,72000000,2100000,https://ted.com/talks/sir_ken_robinson_do_scho...,2006,2
4084,Your body language may shape who you are,Amy Cuddy,June 2012,64000000,1900000,https://ted.com/talks/amy_cuddy_your_body_lang...,2012,6
2958,Inside the mind of a master procrastinator,Tim Urban,February 2016,60000000,1800000,https://ted.com/talks/tim_urban_inside_the_min...,2016,2
4765,How great leaders inspire action,Simon Sinek,September 2009,57000000,1700000,https://ted.com/talks/simon_sinek_how_great_le...,2009,9
4605,The power of vulnerability,Brené Brown,June 2010,56000000,1700000,https://ted.com/talks/brene_brown_the_power_of...,2010,6
3504,How to speak so that people want to listen,Julian Treasure,June 2013,49000000,1400000,https://ted.com/talks/julian_treasure_how_to_s...,2013,6
2168,My philosophy for a happy life,Sam Berns,October 2013,43000000,1300000,https://ted.com/talks/sam_berns_my_philosophy_...,2013,10
3251,The next outbreak? We're not ready,Bill Gates,March 2015,43000000,1300000,https://ted.com/talks/bill_gates_the_next_outb...,2015,3
3017,What makes a good life? Lessons from the longe...,Robert Waldinger,November 2015,41000000,1200000,https://ted.com/talks/robert_waldinger_what_ma...,2015,11
3994,"Looks aren't everything. Believe me, I'm a model.",Cameron Russell,October 2012,38000000,1100000,https://ted.com/talks/cameron_russell_looks_ar...,2012,10


In [52]:
top10liked = df.sort_values(by = 'likes', ascending = False).head(10)
top10liked

Unnamed: 0,title,author,date,views,likes,link,year,month
5436,Do schools kill creativity?,Sir Ken Robinson,February 2006,72000000,2100000,https://ted.com/talks/sir_ken_robinson_do_scho...,2006,2
4084,Your body language may shape who you are,Amy Cuddy,June 2012,64000000,1900000,https://ted.com/talks/amy_cuddy_your_body_lang...,2012,6
2958,Inside the mind of a master procrastinator,Tim Urban,February 2016,60000000,1800000,https://ted.com/talks/tim_urban_inside_the_min...,2016,2
4765,How great leaders inspire action,Simon Sinek,September 2009,57000000,1700000,https://ted.com/talks/simon_sinek_how_great_le...,2009,9
4605,The power of vulnerability,Brené Brown,June 2010,56000000,1700000,https://ted.com/talks/brene_brown_the_power_of...,2010,6
3504,How to speak so that people want to listen,Julian Treasure,June 2013,49000000,1400000,https://ted.com/talks/julian_treasure_how_to_s...,2013,6
2168,My philosophy for a happy life,Sam Berns,October 2013,43000000,1300000,https://ted.com/talks/sam_berns_my_philosophy_...,2013,10
3251,The next outbreak? We're not ready,Bill Gates,March 2015,43000000,1300000,https://ted.com/talks/bill_gates_the_next_outb...,2015,3
3017,What makes a good life? Lessons from the longe...,Robert Waldinger,November 2015,41000000,1200000,https://ted.com/talks/robert_waldinger_what_ma...,2015,11
3994,"Looks aren't everything. Believe me, I'm a model.",Cameron Russell,October 2012,38000000,1100000,https://ted.com/talks/cameron_russell_looks_ar...,2012,10
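
The two tables above list the same ten talks in the same order, and in the rows shown the likes are close to 3% of the views. A small sketch of how that overlap could be checked in code instead of by eye, on made-up numbers that follow the same pattern:

```python
import pandas as pd

# Toy frame with hypothetical values; likes are roughly 3% of views.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "views": [72_000, 64_000, 60_000, 2_400, 500],
    "likes": [2_100, 1_900, 1_800, 72, 15],
})

top_n = 3
top_by_views = toy.nlargest(top_n, "views")
top_by_likes = toy.nlargest(top_n, "likes")

# True when both rankings contain exactly the same talks.
same_titles = set(top_by_views["title"]) == set(top_by_likes["title"])

# Pearson correlation between the two columns.
correlation = toy["views"].corr(toy["likes"])

print(same_titles, round(correlation, 3))
```

If the correlation on the real data is also close to 1, ranking by likes adds little information beyond ranking by views.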


In [129]:
fig = px.scatter(df, 
                 x = "likes", 
                 y = "views", 
                 size = "likes", 
                 color = "views",
                 hover_name="title", 
                 log_x=True, 
                 size_max=60)
fig.show()

In [56]:
fig = px.bar(top10views, x="title", y=["views","likes"], title="Top 10 Popular videos based on Views & likes")
fig.show()

"Do schools kill creativity?" released in 2006 is still the most popular Ted Talk Video based on the number of views and likes.

In [85]:
#Top 10 authors with the most number of talks
top10Authors=df['author'].value_counts().reset_index()
top10Authors.columns=['author','counts']
top10Authors = top10Authors.head(10)
top10Authors

Unnamed: 0,author,counts
0,Alex Gendler,45
1,Iseult Gillespie,33
2,Matt Walker,18
3,Alex Rosenthal,15
4,Elizabeth Cox,13
5,Emma Bryce,12
6,Juan Enriquez,11
7,Daniel Finkel,11
8,Jen Gunter,9
9,Greg Gage,9


In [65]:
fig = px.bar(top10Authors, x="author", y="counts", title="Top 10 authors with the most number of talks")
fig.show()

The most prolific TED Talk author is Alex Gendler, with 45 talks.
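
Counting talks measures how prolific an author is, which is not necessarily the same as how popular they are. As a sketch under that assumption, the count can be put next to total and median views per author; the rows below are hypothetical.

```python
import pandas as pd

# Toy frame with hypothetical authors and view counts.
toy = pd.DataFrame({
    "author": ["Speaker A", "Speaker A", "Speaker A", "Speaker B", "Speaker C"],
    "views": [1_000, 2_000, 1_500, 50_000, 700],
})

# One row per author: number of talks, total views, and median views per talk.
per_author = (
    toy.groupby("author")["views"]
    .agg(talks="count", total_views="sum", median_views="median")
    .sort_values("talks", ascending=False)
)

most_prolific = per_author["talks"].idxmax()
most_viewed = per_author["total_views"].idxmax()

print(per_author)
print(most_prolific, most_viewed)
```

In this toy example the author with the most talks is not the author with the most views, which is the distinction the table makes visible.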

In [86]:
alex_df = df[df['author'] == 'Alex Gendler']
alex_df

Unnamed: 0,title,author,date,views,likes,link,year,month
66,"Blood, concrete, and dynamite: Building the Ho...",Alex Gendler,December 2021,724000,21000,https://ted.com/talks/alex_gendler_blood_concr...,2021,December
329,The woman who stared at the sun,Alex Gendler,May 2021,1900000,57000,https://ted.com/talks/alex_gendler_the_woman_w...,2021,May
348,How one design flaw almost toppled a skyscraper,Alex Gendler,May 2021,712000,21000,https://ted.com/talks/alex_gendler_how_one_des...,2021,May
358,"Demolition, disease, and death: Building the P...",Alex Gendler,April 2021,724000,21000,https://ted.com/talks/alex_gendler_demolition_...,2021,April
367,How the world's tallest skyscraper was built,Alex Gendler,April 2021,770000,23000,https://ted.com/talks/alex_gendler_how_the_wor...,2021,April
372,Why are airplanes slower than they used to be?,Alex Gendler,April 2021,2200000,66000,https://ted.com/talks/alex_gendler_why_are_air...,2021,April
522,Building the world's largest (and most controv...,Alex Gendler,December 2020,810000,24000,https://ted.com/talks/alex_gendler_building_th...,2020,December
525,Can you solve the monster duel riddle?,Alex Gendler,December 2020,1800000,55000,https://ted.com/talks/alex_gendler_can_you_sol...,2020,December
548,Can you solve the Alice in Wonderland riddle?,Alex Gendler,November 2020,1600000,48000,https://ted.com/talks/alex_gendler_can_you_sol...,2020,November
757,The Egyptian myth of the death of Osiris,Alex Gendler,July 2020,2500000,77000,https://ted.com/talks/alex_gendler_the_egyptia...,2020,July


In [122]:
alex_df_sorted = alex_df.sort_values(by = 'year', ascending = True)
alex_df_sorted

Unnamed: 0,title,author,date,views,likes,link,year,month
3812,Myths and misconceptions about evolution,Alex Gendler,July 2013,2600000,79000,https://ted.com/talks/alex_gendler_myths_and_m...,2013,July
3325,What is a gift economy?,Alex Gendler,December 2014,422000,12000,https://ted.com/talks/alex_gendler_what_is_a_g...,2014,December
3387,Why elephants never forget,Alex Gendler,November 2014,8000000,242000,https://ted.com/talks/alex_gendler_why_elephan...,2014,November
3610,Why do we cry? The three types of tears,Alex Gendler,February 2014,6500000,196000,https://ted.com/talks/alex_gendler_why_do_we_c...,2014,February
3559,How tsunamis work,Alex Gendler,April 2014,7600000,228000,https://ted.com/talks/alex_gendler_how_tsunami...,2014,April
3416,History vs. Christopher Columbus,Alex Gendler,October 2014,4200000,126000,https://ted.com/talks/alex_gendler_history_vs_...,2014,October
3220,The wars that inspired Game of Thrones,Alex Gendler,May 2015,5800000,174000,https://ted.com/talks/alex_gendler_the_wars_th...,2015,May
3193,Can you solve the famously difficult green-eye...,Alex Gendler,June 2015,13000000,406000,https://ted.com/talks/alex_gendler_can_you_sol...,2015,June
3146,Can you solve the bridge riddle?,Alex Gendler,September 2015,19000000,590000,https://ted.com/talks/alex_gendler_can_you_sol...,2015,September
3107,Can you solve the prisoner hat riddle?,Alex Gendler,October 2015,24000000,723000,https://ted.com/talks/alex_gendler_can_you_sol...,2015,October
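
Sorting by `year` alone leaves the months inside each year in their original order, which is why December 2014 appears before February 2014 above. A hedged sketch of a fully chronological sort, assuming the `date` strings always follow the "Month YYYY" shape; the titles and rows below are placeholders.

```python
import pandas as pd

# Toy frame with placeholder titles; dates use the notebook's "Month YYYY" shape.
toy = pd.DataFrame({
    "title": ["T1", "T2", "T3", "T4"],
    "date": ["December 2014", "February 2014", "July 2013", "April 2014"],
    "views": [422_000, 6_500_000, 2_600_000, 7_600_000],
})

# Assumption: every value follows "<full month name> <4-digit year>".
toy["parsed_date"] = pd.to_datetime(toy["date"], format="%B %Y")

# Sort on the real datetime column, so months are ordered within each year.
chronological = toy.sort_values("parsed_date").reset_index(drop=True)

print(chronological[["title", "date"]])
```

Using `parsed_date` as the x column of a line chart also gives a proper time axis, whereas the `date` strings are treated as categories.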


In [123]:

# number of Alex Gendler talks per year
alex_df_year = alex_df.value_counts('year').reset_index()
alex_df_year.columns = ['year', 'counts']
alex_df_year

Unnamed: 0,year,counts
0,2020,12
1,2019,9
2,2021,6
3,2014,5
4,2015,4
5,2018,4
6,2017,3
7,2013,1
8,2016,1


Alex Gendler's first talk was in 2013, and his most active year was 2020, with 12 talks.

In [124]:
fig = px.bar(alex_df, x=["views","likes"],y="title", title="Alex Gendler speeches Likes & Views")
fig.show()

In [134]:
# fig = px.line(alex_df, x="date", y="title", title='Alex Gendler ')
# fig.show()

fig = px.line(alex_df_sorted, x="date", y="views",hover_data=['title'])
fig.show()

In [76]:
#top Months 
topMonth=df.value_counts('month').reset_index()
topMonth.columns=['month','counts']
topMonth

Unnamed: 0,month,counts
0,February,725
1,November,682
2,October,585
3,March,580
4,April,576
5,June,493
6,July,446
7,September,349
8,December,334
9,May,322


In [73]:
fig = px.bar(topMonth, x="month", y="counts", title="Number of Ted Talk releases each month")
fig.show()

February and November are the months with the most TED Talk releases, while August and January have the fewest.
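
`value_counts` orders the months by frequency, so the bar chart does not follow the calendar. A minimal sketch of putting the counts in calendar order, with 0 for any month that has no talks, on a made-up `month` column:

```python
import pandas as pd

# Toy frame with hypothetical month names.
toy = pd.DataFrame({"month": ["February", "November", "February", "May", "October"]})

month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Count talks per month, then reorder to the calendar; missing months become 0.
month_counts = (
    toy["month"].value_counts()
    .reindex(month_order, fill_value=0)
    .rename_axis("month")
    .reset_index(name="counts")
)

print(month_counts)
```

The resulting frame has the same `month` and `counts` columns as `topMonth`, so it could be passed to the same bar chart call.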

In [99]:
#top Years
topYear=df.value_counts('year').reset_index()
topYear.columns=['year','counts']
# topYear=topYear.head(22)
topYear

Unnamed: 0,year,counts
0,2019,544
1,2020,501
2,2017,495
3,2018,473
4,2016,399
5,2021,390
6,2013,388
7,2015,376
8,2014,357
9,2012,302


In [103]:
topYear.min()

year      1970
counts       1
dtype: int64

In [107]:
print(df[df['year']== 1970])

                                 title          author          date   views  \
736               Innovations in sleep      Beautyrest  January 1970   60000   
738  Love letters to what we hold dear  Debbie Millman  January 1970  192000   

     likes                                               link  year    month  
736   1800  https://ted.com/talks/beautyrest_innovations_i...  1970  January  
738   5700  https://ted.com/talks/debbie_millman_love_lett...  1970  January  


In [135]:
fig = px.bar(topYear, x="year", y="counts", title="Number of Ted Talk releases each Year")
fig.show()

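The earliest date in the dataset is January 1970. That value coincides with the Unix epoch, so it may be a placeholder for a missing date rather than a real release date. The number of TED Talks per year starts increasing in the 2000s, and the peak is 2019 with 544 talks.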
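
For the "newest & oldest" question, one possible sketch is to parse the dates and set the January 1970 rows aside before taking the minimum and maximum. Treating January 1970 as a placeholder is an assumption, not something the dataset confirms, and the rows below are hypothetical.

```python
import pandas as pd

# Toy frame with placeholder titles; dates use the notebook's "Month YYYY" shape.
toy = pd.DataFrame({
    "title": ["T1", "T2", "T3", "T4"],
    "date": ["January 1970", "February 2006", "February 2022", "June 2012"],
})
toy["parsed_date"] = pd.to_datetime(toy["date"], format="%B %Y")

# Assumption: January 1970 marks an unknown date rather than a real release.
suspect = toy["parsed_date"] == pd.Timestamp("1970-01-01")
dated = toy[~suspect]

# Rows with the earliest and latest remaining dates.
oldest = dated.loc[dated["parsed_date"].idxmin()]
newest = dated.loc[dated["parsed_date"].idxmax()]

print(oldest["title"], newest["title"])
```

The `suspect` rows are kept in a separate mask rather than deleted, so they can still be inspected by following their links.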