# TED Talks Recommendation System with Machine Learning

A TED Talks recommendation system should be based purely on the content of the talks rather than on user data. People generally watch videos on YouTube and similar applications to be entertained, but they watch TED Talks for inspiration, so a user's viewing data is of little use here.

To recommend TED Talks to a user, we need a content-based recommendation system that suggests talks whose content is similar to a talk the user watched earlier. To build such a system we can use cosine similarity, a measure of how closely two vectors point in the same direction.
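
As a rough illustration of the idea, cosine similarity compares the direction of two vectors and ignores their length. The sketch below uses NumPy and made-up word-count vectors (the `cosine` helper and the `talk_*` values are illustrative assumptions, not part of the dataset):

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical word counts for three short texts over the same 4-word vocabulary.
talk_a = [2, 1, 0, 0]
talk_b = [4, 2, 0, 0]   # same proportions as talk_a, just a longer text
talk_c = [0, 0, 3, 1]   # shares no words with talk_a

print(cosine(talk_a, talk_b))  # close to 1: same direction
print(cosine(talk_a, talk_c))  # 0: no words in common
```

Because only the direction matters, a long transcript and a short one on the same topic can still score as highly similar.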

The dataset for this recommendation system contains the transcripts of all the audio and video TED Talks uploaded to TED.com. Let’s start the task of creating this recommendation system by importing the necessary Python libraries and the dataset:

Dataset: https://drive.google.com/file/d/19DqXLP96CoOF4Gjz8_UZ2la9_3uylgMf/view

In [1]:
import numpy as np
import pandas as pd

In [3]:
data = pd.read_csv("ted_talks.csv")

In [4]:
data.head()

| Unnamed: 0 | transcript | url |
| --- | --- | --- |
| 0 | Good morning. How are you?(Laughter)It's been ... | https://www.ted.com/talks/ken_robinson_says_sc... |
| 1 | Thank you so much, Chris. And it's truly a gre... | https://www.ted.com/talks/al_gore_on_averting_... |
| 2 | (Music: "The Sound of Silence," Simon & Garfun... | https://www.ted.com/talks/david_pogue_says_sim... |
| 3 | If you're here today — and I'm very happy that... | https://www.ted.com/talks/majora_carter_s_tale... |
| 4 | About 10 years ago, I took on the task to teac... | https://www.ted.com/talks/hans_rosling_shows_t... |


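The dataset contains the transcript of each TED Talk and the URL of that talk. To continue with this dataset, I will create a new column named title by taking the last part of each URL. The URLs in this file end with a newline character, so each title keeps a trailing \n, as the output below shows: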

In [6]:
data["title"] = data["url"].map(lambda x:x.split("/")[-1])

In [7]:
data['title']

0             ken_robinson_says_schools_kill_creativity\n
1                    al_gore_on_averting_climate_crisis\n
2                     david_pogue_says_simplicity_sells\n
3                 majora_carter_s_tale_of_urban_renewal\n
4       hans_rosling_shows_the_best_stats_you_ve_ever_...
                              ...                        
2462    duarte_geraldino_what_we_re_missing_in_the_deb...
2463    armando_azua_bustos_the_most_martian_place_on_...
2464    radhika_nagpal_what_intelligent_machines_can_l...
2465    theo_e_j_wilson_a_black_man_goes_undercover_in...
2466    karoliina_korppoo_how_a_video_game_might_help_...
Name: title, Length: 2467, dtype: object
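
If cleaner titles are wanted for display, one option is to strip the trailing newline and replace the underscores while extracting the title. This is only a sketch: `title_from_url` is a hypothetical helper, and the example URL is pieced together from the outputs above. The later cells in this notebook split on that trailing newline, so applying this to the `title` column would require changing them as well.

```python
def title_from_url(url):
    """Return a readable title from a talk URL (hypothetical helper)."""
    # Drop surrounding whitespace (including "\n") and any trailing slash,
    # then keep the last path segment.
    slug = url.strip().rstrip("/").split("/")[-1]
    return slug.replace("_", " ")

example = "https://www.ted.com/talks/al_gore_on_averting_climate_crisis\n"
print(repr(title_from_url(example)))  # 'al gore on averting climate crisis'
```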

As stated at the beginning, this recommender system should be based purely on content rather than on user data. So I will first turn the transcripts into TF-IDF vectors, once using unigrams and bigrams and once using unigrams only, and then use cosine similarity to measure how similar the TED Talks are to each other:

In [8]:
from sklearn.feature_extraction import text

In [13]:
ted_talks = data["transcript"].tolist()

In [14]:
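# TF-IDF over unigrams and bigrams, with English stop words removed.
# The transcripts go to fit_transform; the `input` parameter only accepts
# "content" (the default), "file", or "filename", so it is not set here.
bi_tfidf = text.TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
bi_matrix = bi_tfidf.fit_transform(ted_talks)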

In [15]:
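# TF-IDF over unigrams only (the default ngram_range), English stop words removed.
# The transcripts go to fit_transform, so the `input` parameter is not set here.
uni_tfidf = text.TfidfVectorizer(stop_words="english")
uni_matrix = uni_tfidf.fit_transform(ted_talks)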
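
The two vectorizers differ only in `ngram_range`: with `(1, 2)` the vocabulary holds pairs of adjacent words in addition to single words. The pure-Python sketch below shows the idea on a made-up sentence. It splits on whitespace and removes no stop words, so it is a simplification and not scikit-learn's actual tokenizer; `ngrams` is a hypothetical helper.

```python
def ngrams(tokens, n_min, n_max):
    """List all n-grams of `tokens` for n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

tokens = "climate crisis needs action".split()
print(ngrams(tokens, 1, 1))  # single words only
print(ngrams(tokens, 1, 2))  # single words plus adjacent pairs
```

Bigrams such as "climate crisis" let two talks match on a phrase, not just on the individual words, at the cost of a much larger vocabulary.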

In [16]:
from sklearn.metrics.pairwise import cosine_similarity

In [17]:
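# With a single argument, cosine_similarity compares every row with every other row,
# so each result is a square talks-by-talks matrix.
bi_sim = cosine_similarity(bi_matrix)
uni_sim = cosine_similarity(uni_matrix)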

Now the last step is to create a Python function that recommends TED Talks based on their content.

In [18]:
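def recommend_ted_talks(x):
    # x is one row of a similarity matrix. argsort orders the indices from least to
    # most similar; the last index is normally the talk itself (similarity 1), so
    # [-5:-1] keeps the four most similar other talks, in ascending order.
    return ".".join(data["title"].loc[x.argsort()[-5:-1]])

data["ted_talks_uni"] = [recommend_ted_talks(x) for x in uni_sim]
data["ted_talks_bi"] = [recommend_ted_talks(x) for x in bi_sim]
# Each title still ends with "\n", so splitting on "\n" separates the joined titles;
# the "." separators stay at the start of the later entries.
print(data['ted_talks_uni'].str.replace("_", " ").str.upper().str.strip().str.split("\n")[1])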

['RORY BREMNER S ONE MAN WORLD SUMMIT', '.ALICE BOWS LARKIN WE RE TOO LATE TO PREVENT CLIMATE CHANGE HERE S HOW WE ADAPT', '.TED HALSTEAD A CLIMATE SOLUTION WHERE ALL SIDES CAN WIN', '.AL GORE S NEW THINKING ON THE CLIMATE CRISIS']
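
The slice `[-5:-1]` in the function above assumes that the talk itself always sorts last, and it lists the recommendations from least to most similar. One possible variant, sketched here on a made-up similarity matrix, excludes the talk by index and returns the most similar talk first. `top_k_similar` and the toy values are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def top_k_similar(sim_row, self_index, k=4):
    """Indices of the k most similar items, most similar first, excluding the item itself."""
    order = np.argsort(sim_row)[::-1]      # indices by descending similarity
    order = order[order != self_index]     # drop the item itself explicitly
    return order[:k].tolist()

# Toy 4x4 similarity matrix (made-up values, symmetric, ones on the diagonal).
sim = np.array([
    [1.0, 0.2, 0.7, 0.1],
    [0.2, 1.0, 0.3, 0.9],
    [0.7, 0.3, 1.0, 0.4],
    [0.1, 0.9, 0.4, 1.0],
])

print(top_k_similar(sim[0], 0, k=2))  # [2, 1]
```

With the real data, the returned indices could be used to look up rows of `data["title"]`, for example with `iloc`.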
