Danish Deepak

# TED Talk Analysis

This project analyzes a dataset of TED Talks that records each talk's title, author, date, views, likes, and link.

In [1]:
# Importing the necessary libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
# Importing the dataset as a DataFrame.

df = pd.read_csv('ted_data.csv')

# Looking at the raw Dataframe.

df.head()

Unnamed: 0,title,author,date,views,likes,link
0,Climate action needs new frontline leadership,Ozawa Bineshi Albert,December 2021,404000,12000,https://ted.com/talks/ozawa_bineshi_albert_cli...
1,The dark history of the overthrow of Hawaii,Sydney Iaukea,February 2022,214000,6400,https://ted.com/talks/sydney_iaukea_the_dark_h...
2,How play can spark new ideas for your business,Martin Reeves,September 2021,412000,12000,https://ted.com/talks/martin_reeves_how_play_c...
3,Why is China appointing judges to combat clima...,James K. Thornton,October 2021,427000,12000,https://ted.com/talks/james_k_thornton_why_is_...
4,Cement's carbon problem — and 2 ways to fix it,Mahendra Singhi,October 2021,2400,72,https://ted.com/talks/mahendra_singhi_cement_s...


### Data Exploration.

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5440 entries, 0 to 5439
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   5440 non-null   object
 1   author  5439 non-null   object
 2   date    5440 non-null   object
 3   views   5440 non-null   int64 
 4   likes   5440 non-null   int64 
 5   link    5440 non-null   object
dtypes: int64(2), object(4)
memory usage: 255.1+ KB


There is only 1 null value in the DataFrame and that's in the author column.

In [8]:
df['author'].isna().sum()

1
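To see where missing values sit without checking one column at a time, the same check can be run over every column. The sketch below uses a small stand-in frame (`sample`, with one hypothetical row added to mimic the missing author) rather than the real `ted_data.csv`:

```python
import pandas as pd

# Stand-in data: two rows copied from the preview above plus one
# hypothetical row with a missing author.
sample = pd.DataFrame({
    "title": [
        "Climate action needs new frontline leadership",
        "The dark history of the overthrow of Hawaii",
        "Hypothetical talk with no author",
    ],
    "author": ["Ozawa Bineshi Albert", "Sydney Iaukea", None],
    "views": [404000, 214000, 1000],
})

# Count missing values in every column at once.
null_counts = sample.isna().sum()
print(null_counts)

# Pull out the rows that contain at least one missing value.
rows_with_nulls = sample[sample.isna().any(axis=1)]
print(rows_with_nulls)
```

Inspecting `rows_with_nulls` before dropping anything shows exactly which talk would be lost.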

### Data Preprocessing.

In [16]:
# Dropping the row with the null value

df = df.dropna()

# Checking for null values 

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5439 entries, 0 to 5439
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   5439 non-null   object
 1   author  5439 non-null   object
 2   date    5439 non-null   object
 3   views   5439 non-null   int64 
 4   likes   5439 non-null   int64 
 5   link    5439 non-null   object
dtypes: int64(2), object(4)
memory usage: 297.4+ KB


No null values remain.
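The `Int64Index: 5439 entries, 0 to 5439` line in the output shows that `dropna()` keeps the original row labels, so the index now has a gap. A minimal sketch, assuming you only want to drop rows with a missing author and renumber the rest (`sample` and `cleaned` are illustrative names, and the second row is hypothetical):

```python
import pandas as pd

sample = pd.DataFrame({
    "title": [
        "Climate action needs new frontline leadership",
        "Hypothetical talk with no author",
        "The dark history of the overthrow of Hawaii",
    ],
    "author": ["Ozawa Bineshi Albert", None, "Sydney Iaukea"],
})

# Drop only the rows whose author is missing, then renumber from 0
# so positional and label-based lookups agree again.
cleaned = sample.dropna(subset=["author"]).reset_index(drop=True)
print(cleaned)
```

Restricting `subset` to the column that matters avoids discarding rows because of a gap in an unrelated column.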

### Data Analysis

#### Finding the most popular TED Talk Speaker in terms of number of talks.

In [22]:
# Collecting every distinct author

author = {}

for i in df['author']:
    if i not in author:
        author[i] = 0

In [24]:
# Finding the number of talks of each author

for i in author.keys():
    c = 0
    for j in df.values:
        if i == j[1]:
            c += 1
    author[i] = c

In [26]:
# Finding the author with the most number of TED Talks

lst_nt = author.items()
df_nt  = pd.DataFrame(lst_nt, columns = ['Author', 'No. of TED Talks'])
df_nt.sort_values(by = 'No. of TED Talks', ascending = False).head(5)

Unnamed: 0,Author,No. of TED Talks
63,Alex Gendler,45
6,Iseult Gillespie,33
64,Matt Walker,18
108,Alex Rosenthal,15
26,Elizabeth Cox,13


Alex Gendler is the most popular TED Talk author in terms of number of talks, with 45.
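The nested loops above scan the whole DataFrame once per author. As an alternative sketch (not the notebook's own method), `Series.value_counts` yields the same per-author tally in one call. The rows below are copied from outputs elsewhere in this notebook:

```python
import pandas as pd

sample = pd.DataFrame({
    "title": [
        "The innovations we need to avoid a climate disaster",
        "How the pandemic will shape the near future",
        "How we must respond to the coronavirus pandemic",
        "The ocean's ingenious climate solutions",
        "How play can spark new ideas for your business",
    ],
    "author": ["Bill Gates", "Bill Gates", "Bill Gates",
               "Susan Ruffo", "Martin Reeves"],
})

# value_counts() counts rows per author and sorts in descending order.
talks_per_author = sample["author"].value_counts()
print(talks_per_author.head(5))
```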

#### Finding the most popular TED Talk speaker in terms of number of views.

In [27]:
# Finding the no. of views of each author

author_v = {}

for i in author.keys():
    c = 0
    for j in df.values:
        if i == j[1]:
            c += j[3]
    author_v[i] = c

In [28]:
# Finding the author with the most views

lst_nv = author_v.items()
df_nv  = pd.DataFrame(lst_nv, columns = ['Author', 'No. of Views'])
df_nv.sort_values(by = 'No. of Views', ascending = False).head(5)

Unnamed: 0,Author,No. of Views
63,Alex Gendler,187196000
372,Sir Ken Robinson,95654000
355,Bill Gates,77800000
305,Simon Sinek,74800000
3622,Brené Brown,72000000


Alex Gendler has the most total views across his talks, followed by Sir Ken Robinson and then Bill Gates.
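Summing views per author is a group-and-aggregate operation, so it can also be sketched with `groupby`. This is an alternative to the loop above, shown on a few rows taken from outputs in this notebook:

```python
import pandas as pd

sample = pd.DataFrame({
    "author": ["Bill Gates", "Bill Gates", "Bill Gates",
               "Susan Ruffo", "Martin Reeves"],
    "views": [1700000, 4600000, 8600000, 522000, 412000],
})

# Total views per author, then keep the two largest totals.
views_per_author = sample.groupby("author")["views"].sum()
top_authors = views_per_author.nlargest(2)
print(top_authors)
```

`nlargest` returns the rows already sorted, so no separate `sort_values` call is needed.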

#### Finding the most popular TED Talk Speaker in terms of no. of likes.

In [29]:
# Finding the total no. of likes of each author

author_l = {}

for i in author.keys():
    c = 0
    for j in df.values:
        if i == j[1]:
            c += j[4]
    author_l[i] = c

In [30]:
# Finding the author with the most likes

lst_l = author_l.items()
df_l  = pd.DataFrame(lst_l, columns = ['Author', 'No. of likes'])
df_l.sort_values(by = 'No. of likes', ascending = False).head(5)

Unnamed: 0,Author,No. of likes
63,Alex Gendler,5691000
372,Sir Ken Robinson,2833600
355,Bill Gates,2349000
305,Simon Sinek,2246000
3622,Brené Brown,2204000


Alex Gendler also has the most total likes, followed again by Sir Ken Robinson and then Bill Gates.
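Since talks, views, and likes are all tallied per author, the three passes could be collapsed into one. The following is a sketch using pandas named aggregation; the output column names `talks`, `views`, and `likes` are chosen here for illustration:

```python
import pandas as pd

sample = pd.DataFrame({
    "title": [
        "The innovations we need to avoid a climate disaster",
        "How the pandemic will shape the near future",
        "How we must respond to the coronavirus pandemic",
        "The ocean's ingenious climate solutions",
    ],
    "author": ["Bill Gates", "Bill Gates", "Bill Gates", "Susan Ruffo"],
    "views": [1700000, 4600000, 8600000, 522000],
    "likes": [53000, 138000, 259000, 15000],
})

# One pass over the data: count talks and sum views and likes per author.
summary = (
    sample.groupby("author")
    .agg(talks=("title", "count"), views=("views", "sum"), likes=("likes", "sum"))
    .sort_values("likes", ascending=False)
)
print(summary)
```

Having all three metrics in one table also makes it easy to check whether the rankings by views and by likes agree.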

#### Finding the TED Talk with the best view to like ratio.

In [41]:
# Finding view to like ratio of every TED Talk.

ted_talk = {}

for i in df.values:
    # i[0] is the title, i[3] the views and i[4] the likes
    ted_talk[i[0]] = round(i[3]/i[4], 2)

In [42]:
# Finding the TED talk with best view to like ratio

lst_vtl = ted_talk.items()
df_vtl  = pd.DataFrame(lst_vtl, columns = ['Title', 'View to Likes ratio'])
df_vtl.sort_values(by = 'View to Likes ratio', ascending = False).head(5)

Unnamed: 0,Title,View to Likes ratio
955,A camera that can see around corners,36.4
905,What's the point(e) of ballet?,36.4
837,How to see more and care less: The art of Geor...,36.4
26,Can you outsmart the fallacy that divided a na...,36.3
1016,The function and fashion of eyeglasses,36.3


The above DataFrame shows the five TED Talks with the highest view-to-like ratio, that is, the most views per like.
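Two details of the loop are worth noting: a dictionary keyed by title keeps only one entry when two talks share a title, and a talk with zero likes would divide by zero. A hedged sketch that computes the ratio as a column instead is shown below. The `views_per_like` column name and the zero-like row are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "title": [
        "Climate action needs new frontline leadership",
        "The dark history of the overthrow of Hawaii",
        "How play can spark new ideas for your business",
        "Hypothetical talk with no likes",
    ],
    "views": [404000, 214000, 412000, 500],
    "likes": [12000, 6400, 12000, 0],
})

# Treat zero likes as missing so the division yields NaN instead of inf.
safe_likes = sample["likes"].replace(0, np.nan)
sample["views_per_like"] = (sample["views"] / safe_likes).round(2)

# sort_values places NaN rows last in either direction by default.
highest = sample.sort_values("views_per_like", ascending=False)
lowest = sample.sort_values("views_per_like")
print(highest[["title", "views_per_like"]])
```

Sorting in ascending order (`lowest`) surfaces the talks that earn the most likes per view, which may be the more natural reading of "best".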

#### Month Wise Analysis of TED Talks.

In [49]:
# Finding the months

month = {}

for i in df.values:
    if i[2].split(' ')[0] not in month:
        month[i[2].split(' ')[0]] = 0

In [50]:
# Finding the number of TED Talks in every month 

for i in month:
    c = 0
    for j in df.values:
        if i == j[2].split(' ')[0]:
            c += 1
    month[i] = c

In [54]:
# Finding the month with the most number of TED Talks

lst_mt = month.items()
df_mt  = pd.DataFrame(lst_mt, columns = ['Month', 'No. of TED Talks'])
df_mt.sort_values(by = 'No. of TED Talks', ascending = False)

Unnamed: 0,Month,No. of TED Talks
1,February,725
7,November,682
3,October,585
5,March,580
10,April,576
9,June,493
8,July,446
2,September,349
0,December,334
11,May,322


February has the most TED Talks, with 725.
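The month counts can also be derived by parsing the `date` strings, which makes it possible to list the months in calendar order rather than by count. This is a sketch under the assumption that every date follows the "Month Year" pattern seen above; the dates are taken from outputs in this notebook:

```python
import pandas as pd

sample = pd.DataFrame({
    "date": ["December 2021", "February 2022", "September 2021",
             "October 2021", "October 2021", "March 2021",
             "June 2020", "March 2020"],
})

# Parse "Month Year" strings into timestamps (day defaults to the 1st).
dates = pd.to_datetime(sample["date"], format="%B %Y")

month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

# Count talks per month name, then lay them out in calendar order,
# filling months that never occur with 0.
talks_per_month = (
    dates.dt.month_name().value_counts().reindex(month_order, fill_value=0)
)
print(talks_per_month)
```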

#### Year wise analysis of TED Talks.

In [52]:
# Finding the years

year = {}

for i in df.values:
    if i[2].split(' ')[1] not in year:
        year[i[2].split(' ')[1]] = 0

In [55]:
# Finding the number of TED Talks in every year. 

for i in year:
    c = 0
    for j in df.values:
        if i == j[2].split(' ')[1]:
            c += 1
    year[i] = c

In [56]:
# Finding the year with the most number of TED Talks

lst_yt = year.items()
df_yt  = pd.DataFrame(lst_yt, columns = ['Year', 'No. of TED Talks'])
df_yt.sort_values(by = 'No. of TED Talks', ascending = False)

Unnamed: 0,Year,No. of TED Talks
2,2019,544
5,2020,501
4,2017,495
8,2018,473
3,2016,399
0,2021,390
11,2013,388
6,2015,376
7,2014,357
10,2012,302


The year 2019 has the most TED Talks, with 544.
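For a view of the trend over time, the years can be pulled out with the string accessor and sorted chronologically. This is a minimal sketch, again assuming the "Month Year" format and using dates taken from outputs in this notebook:

```python
import pandas as pd

sample = pd.DataFrame({
    "date": ["December 2021", "February 2022", "September 2021",
             "October 2021", "October 2021", "March 2021",
             "June 2020", "March 2020"],
})

# Take the second whitespace-separated token and convert it to an integer.
years = sample["date"].str.split(" ").str[1].astype(int)

# Count talks per year and order the result chronologically.
talks_per_year = years.value_counts().sort_index()
busiest_year = talks_per_year.idxmax()
print(talks_per_year)
print(busiest_year)
```

Storing the year as an integer rather than a string keeps later sorting and plotting in numeric order.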

#### Finding the TED Talks of your favourite author.

In [62]:
a = input('Enter your Favourite author: ').lower()
print('')
print('The following are the TED Talks of your favourite author: ')
print('')
print('-'*60)
for i in df.values:
    if a == i[1].lower():
        print('Title : ', i[0])
        print('Author: ', i[1])
        print('Date  : ', i[2])
        print('Views : ', i[3])
        print('Likes : ', i[4])
        print('link  : ', i[5])
        print('-'*60)

Enter your Favourite author: bill gates

The following are the TED Talks of your favourite author: 

------------------------------------------------------------
Title :  The innovations we need to avoid a climate disaster
Author:  Bill Gates
Date  :  March 2021
Views :  1700000
Likes :  53000
link  :  https://ted.com/talks/bill_gates_the_innovations_we_need_to_avoid_a_climate_disaster
------------------------------------------------------------
Title :  How the pandemic will shape the near future
Author:  Bill Gates
Date  :  June 2020
Views :  4600000
Likes :  138000
link  :  https://ted.com/talks/bill_gates_how_the_pandemic_will_shape_the_near_future
------------------------------------------------------------
Title :  How we must respond to the coronavirus pandemic
Author:  Bill Gates
Date  :  March 2020
Views :  8600000
Likes :  259000
link  :  https://ted.com/talks/bill_gates_how_we_must_respond_to_the_coronavirus_pandemic
------------------------------------------------------------
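The same lookup can be written as a reusable function that returns the matching rows instead of printing them. This is a sketch only: `talks_by_author` is a hypothetical helper name, and the rows are the ones printed above plus one other talk from this notebook:

```python
import pandas as pd

def talks_by_author(frame: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return the rows whose author matches `name`, ignoring case."""
    wanted = name.strip().lower()
    return frame[frame["author"].str.lower() == wanted]

sample = pd.DataFrame({
    "title": [
        "The innovations we need to avoid a climate disaster",
        "How the pandemic will shape the near future",
        "How we must respond to the coronavirus pandemic",
        "The ocean's ingenious climate solutions",
    ],
    "author": ["Bill Gates", "Bill Gates", "Bill Gates", "Susan Ruffo"],
    "views": [1700000, 4600000, 8600000, 522000],
})

gates_talks = talks_by_author(sample, "bill gates")
print(gates_talks)
```

Returning a DataFrame lets the caller sort, count, or export the matches, and an unknown name simply gives an empty result.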

#### Finding TED Talks based on a tag (a keyword in the title).

In [65]:
a = input('Enter a Tag: ').lower()
print('')
print('The following are the TED Talks based on your tag: ')
print('')
print('-'*60)
for i in df.values:
    # The dataset has no tag column, so the tag is matched against the title
    if a in i[0].lower():
        print('Title : ', i[0])
        print('Author: ', i[1])
        print('Date  : ', i[2])
        print('Views : ', i[3])
        print('Likes : ', i[4])
        print('link  : ', i[5])
        print('-'*60)

Enter a Tag: climate

The following are the TED Talks based on your tag: 

------------------------------------------------------------
Title :  Climate action needs new frontline leadership
Author:  Ozawa Bineshi Albert
Date  :  December 2021
Views :  404000
Likes :  12000
link  :  https://ted.com/talks/ozawa_bineshi_albert_climate_action_needs_new_frontline_leadership
------------------------------------------------------------
Title :  Why is China appointing judges to combat climate change?
Author:  James K. Thornton
Date  :  October 2021
Views :  427000
Likes :  12000
link  :  https://ted.com/talks/james_k_thornton_why_is_china_appointing_judges_to_combat_climate_change
------------------------------------------------------------
Title :  The ocean's ingenious climate solutions
Author:  Susan Ruffo
Date  :  October 2021
Views :  522000
Likes :  15000
link  :  https://ted.com/talks/susan_ruffo_the_ocean_s_ingenious_climate_solutions
------------------------------------------------------------
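The substring test against each title can also be expressed with the pandas string accessor. Below is a sketch with a hypothetical helper, `talks_with_keyword`, run on titles that appear in this notebook. `regex=False` treats the keyword as literal text, so characters such as `?` or `(` in a search term are not interpreted as a pattern:

```python
import pandas as pd

def talks_with_keyword(frame: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return the rows whose title contains `keyword`, ignoring case."""
    mask = frame["title"].str.contains(keyword, case=False, regex=False, na=False)
    return frame[mask]

sample = pd.DataFrame({
    "title": [
        "Climate action needs new frontline leadership",
        "The dark history of the overthrow of Hawaii",
        "The ocean's ingenious climate solutions",
        "The innovations we need to avoid a climate disaster",
    ],
    "author": ["Ozawa Bineshi Albert", "Sydney Iaukea",
               "Susan Ruffo", "Bill Gates"],
})

climate_talks = talks_with_keyword(sample, "CLIMATE")
print(climate_talks)
```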

This concludes this analysis project.