## Exploring TED Talks
The following dataset covers TED Talks dated from 1970 to 2022.

In [1]:
! powershell Get-Content TedTalks.csv -Head 10 | venv\Scripts\csvlook

| title                                                               | author                     |       date |   views |  likes | link                                                                                                             |
| ------------------------------------------------------------------- | -------------------------- | ---------- | ------- | ------ | ---------------------------------------------------------------------------------------------------------------- |
| Climate action needs new frontline leadership                       | Ozawa Bineshi Albert       | 2021-12-01 | 404,000 | 12,000 | https://ted.com/talks/ozawa_bineshi_albert_climate_action_needs_new_frontline_leadership                         |
| The dark history of the overthrow of Hawaii                         | Sydney Iaukea              | 2022-02-01 | 214,000 |  6,400 | https://ted.com/talks/sydney_iaukea_the_dark_history_of_the_overthrow_of_hawaii                                  |
| How pl

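Taking an initial look at the CSV file. Note that csvlook infers column types, so it displays the dates and counts in a different format than the raw strings stored in the file.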

In [2]:
import csv
data = []
with open("TedTalks.csv", encoding="UTF-8") as file:
    read_file = csv.reader(file)
    header = next(read_file)
    for row in read_file:
        data.append(row)
print(header)
print(data[:5])

['title', 'author', 'date', 'views', 'likes', 'link']
[['Climate action needs new frontline leadership', 'Ozawa Bineshi Albert', 'December 2021', '404000', '12000', 'https://ted.com/talks/ozawa_bineshi_albert_climate_action_needs_new_frontline_leadership'], ['The dark history of the overthrow of Hawaii', 'Sydney Iaukea', 'February 2022', '214000', '6400', 'https://ted.com/talks/sydney_iaukea_the_dark_history_of_the_overthrow_of_hawaii'], ['How play can spark new ideas for your business', 'Martin Reeves', 'September 2021', '412000', '12000', 'https://ted.com/talks/martin_reeves_how_play_can_spark_new_ideas_for_your_business'], ['Why is China appointing judges to combat climate change?', 'James K. Thornton', 'October 2021', '427000', '12000', 'https://ted.com/talks/james_k_thornton_why_is_china_appointing_judges_to_combat_climate_change'], ["Cement's carbon problem — and 2 ways to fix it", 'Mahendra Singhi', 'October 2021', '2400', '72', 'https://ted.com/talks/mahendra_singhi_cement_s_ca

The CSV data is loaded into a list of rows, with the header stored separately. Next, the data is checked for empty cells; any row that has one will be removed.

In [3]:
# Report the position of every empty cell.
for row_index, row in enumerate(data):
    for col_index, cell in enumerate(row):
        if cell == "":
            print("Empty cell at row {}, column {}".format(row_index, col_index))

Empty cell at row 3039, column 1


In [4]:
print(data[3039])

['Year In Ideas 2015', '', 'December 2015', '532', '15', 'https://ted.com/talks/year_in_ideas_2015']


Row 3039 has no author, so it will be removed from the dataset.

In [5]:
del data[3039]
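
Deleting by index works for a single known row. As a more general sketch, rows containing any empty cell could be filtered out in one pass. The helper name `drop_incomplete_rows` and the sample rows are illustrative assumptions, not part of the original notebook; unlike the check above, this version also treats whitespace-only cells as empty.

```python
def drop_incomplete_rows(rows):
    """Return only the rows in which every cell is a non-empty string."""
    return [row for row in rows if all(cell.strip() != "" for cell in row)]


# Small illustrative sample in the same column order as TedTalks.csv.
sample = [
    ["Talk A", "Author A", "December 2021", "404000", "12000", "https://example.com/a"],
    ["Year In Ideas 2015", "", "December 2015", "532", "15",
     "https://ted.com/talks/year_in_ideas_2015"],
]

cleaned = drop_incomplete_rows(sample)
print(len(cleaned))
```

This has to run before the counts are converted to integers, since it assumes every cell is still a string.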

Let's make separate datasets sorted by most views and most likes respectively. The counts for views and likes first need to be converted to integers to be sorted properly.

In [6]:
for row in data:
    row[3] = int(row[3])
    row[4] = int(row[4])
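
The conversion above assumes every count is a clean digit string; `int()` raises `ValueError` otherwise. A defensive sketch, assuming counts might occasionally contain thousands separators or placeholder text (the helper `to_int` is hypothetical and not part of the notebook):

```python
def to_int(value, default=None):
    """Convert a count such as '404000' or '404,000' to int.

    Returns `default` when the value cannot be parsed as an integer.
    """
    try:
        return int(str(value).replace(",", "").strip())
    except ValueError:
        return default


print(to_int("404000"))
print(to_int("12,000"))
print(to_int("n/a"))
```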

Now they can be sorted.

In [9]:
views_data = sorted(data, key=lambda x: x[3], reverse=True)
likes_data = sorted(data, key=lambda x: x[4], reverse=True)
print("Most likes in descending order:")
for row in likes_data[:5]:
    print(row)
print("\nMost views in descending order:")
for row in views_data[:5]:
    print(row)

Most likes in descending order:
['Do schools kill creativity?', 'Sir Ken Robinson', 'February 2006', 72000000, 2100000, 'https://ted.com/talks/sir_ken_robinson_do_schools_kill_creativity']
['Your body language may shape who you are', 'Amy Cuddy', 'June 2012', 64000000, 1900000, 'https://ted.com/talks/amy_cuddy_your_body_language_may_shape_who_you_are']
['Inside the mind of a master procrastinator', 'Tim Urban', 'February 2016', 60000000, 1800000, 'https://ted.com/talks/tim_urban_inside_the_mind_of_a_master_procrastinator']
['The power of vulnerability', 'Brené Brown', 'June 2010', 56000000, 1700000, 'https://ted.com/talks/brene_brown_the_power_of_vulnerability']
['How great leaders inspire action', 'Simon Sinek', 'September 2009', 57000000, 1700000, 'https://ted.com/talks/simon_sinek_how_great_leaders_inspire_action']

Most views in descending order:
['Do schools kill creativity?', 'Sir Ken Robinson', 'February 2006', 72000000, 2100000, 'https://ted.com/talks/sir_ken_robinson_do_school
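
Rather than comparing the two lists by eye, the overlap can be checked with sets of titles. This is a minimal sketch: the function name `same_top_n` is an assumption, and the sample rows are copied from the outputs shown in this notebook.

```python
def same_top_n(rows, n=5, views_col=3, likes_col=4):
    """Return True if the top-n rows by views and by likes hold the same titles."""
    top_views = sorted(rows, key=lambda r: r[views_col], reverse=True)[:n]
    top_likes = sorted(rows, key=lambda r: r[likes_col], reverse=True)[:n]
    # Compare titles as sets so that the ordering within the top n is ignored.
    return {r[0] for r in top_views} == {r[0] for r in top_likes}


sample = [
    ["Do schools kill creativity?", "Sir Ken Robinson", "February 2006", 72000000, 2100000],
    ["Your body language may shape who you are", "Amy Cuddy", "June 2012", 64000000, 1900000],
    ["Inside the mind of a master procrastinator", "Tim Urban", "February 2016", 60000000, 1800000],
    ["The power of vulnerability", "Brené Brown", "June 2010", 56000000, 1700000],
    ["How great leaders inspire action", "Simon Sinek", "September 2009", 57000000, 1700000],
    ["How play can spark new ideas for your business", "Martin Reeves", "September 2021", 412000, 12000],
]

print(same_top_n(sample, n=5))
```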

The top 5 most liked TED Talks are also the top 5 most viewed, though the order differs slightly. Next, let's look at views and likes by year.

In [10]:
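from datetime import datetime
date_format = "%B %Y"  # e.g. "December 2021"
def averages(dataset, col):
    # Sum the chosen column (3 = views, 4 = likes) for each year.
    avg_per_year = {}
    for row in dataset:
        date = row[2]
        dt_object = datetime.strptime(date, date_format)
        year = dt_object.year
        if year not in avg_per_year:
            avg_per_year[year] = row[col]
        else:
            avg_per_year[year] += row[col]
    # NOTE: each yearly total is divided by the number of talks in the whole
    # dataset, not by the number of talks in that year, so the result is the
    # year's contribution to the overall mean rather than a per-year mean.
    for year in avg_per_year:
        avg = avg_per_year[year] / len(dataset)
        avg_per_year[year] = round(avg, 1)
    # Print the years from newest to oldest.
    for key in sorted(avg_per_year.keys(), reverse=True):
        print("{}: {}".format(key, avg_per_year[key]))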

print("Averages of views per year")
averages(data, 3)

print("\nAverages of likes per year")
averages(data, 4)

Averages of views per year
2022: 2108.8
2021: 79008.8
2020: 116923.0
2019: 174318.7
2018: 149266.0
2017: 183653.5
2016: 170784.1
2015: 212307.3
2014: 151618.4
2013: 195355.3
2012: 147671.6
2011: 105842.6
2010: 97103.5
2009: 102525.3
2008: 38243.8
2007: 32478.6
2006: 33298.4
2005: 28953.5
2004: 20661.5
2003: 10000.7
2002: 5597.0
2001: 1936.0
1998: 1362.4
1994: 128.5
1991: 60.1
1990: 132.9
1984: 202.2
1983: 126.9
1972: 239.0
1970: 46.3

Averages of likes per year
2022: 62.7
2021: 2400.9
2020: 3547.9
2019: 5288.6
2018: 4536.0
2017: 5610.7
2016: 5204.4
2015: 6462.9
2014: 4610.5
2013: 5929.4
2012: 4473.5
2011: 3213.6
2010: 2934.9
2009: 3096.4
2008: 1160.6
2007: 980.3
2006: 993.1
2005: 883.4
2004: 631.8
2003: 301.6
2002: 168.4
2001: 57.9
1998: 41.6
1994: 3.7
1991: 1.8
1990: 3.9
1984: 6.1
1983: 3.7
1972: 7.5
1970: 1.4


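The figures for 2022 are low because the year had only just started when the data was collected. Views and likes dip in 2020 and fall further in 2021. The pandemic that began in early 2020 may be a factor, but the data here cannot confirm it: more recent talks have also had less time to accumulate views and likes, and because each yearly total is divided by the size of the whole dataset, the values also depend on how many talks the dataset contains for each year.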
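
The `averages` function divides every yearly total by `len(dataset)`. A true per-year mean needs a separate count of talks for each year. The following is a minimal sketch of that variant, not the notebook's original method; the name `mean_per_year` and the three sample rows (adapted from rows shown earlier) are assumptions, and running it on the full dataset would give different numbers from the output above.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%B %Y"  # e.g. "October 2021"


def mean_per_year(rows, col):
    """Return {year: mean of column `col` over the talks dated in that year}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        year = datetime.strptime(row[2], DATE_FORMAT).year
        totals[year] += row[col]
        counts[year] += 1
    # Divide each yearly total by the number of talks in that same year.
    return {year: round(totals[year] / counts[year], 1) for year in totals}


sample = [
    ["Talk A", "Author A", "October 2021", 427000, 12000],
    ["Talk B", "Author B", "October 2021", 2400, 72],
    ["Talk C", "Author C", "February 2022", 214000, 6400],
]

for year, mean_views in sorted(mean_per_year(sample, 3).items(), reverse=True):
    print("{}: {}".format(year, mean_views))
```

Returning a dictionary instead of printing inside the function keeps the calculation reusable, for example to compare views and likes side by side.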