# TED Talks - Recommender System

### Meet the data
The data comes as a tabular flat file, `ted_main.csv`, with one row per talk and the talk's metadata (description, event, speaker, tags, title, view count and so on) in separate columns. Here is what the first rows look like:

In [14]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

df1 = pd.read_csv('ted_main.csv')

In [4]:
df1.head()

| | comments | description | duration | event | film_date | languages | main_speaker | name | num_speaker | published_date | ratings | related_talks | speaker_occupation | tags | title | url | views |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4553 | Sir Ken Robinson makes an entertaining and pro... | 1164 | TED2006 | 1140825600 | 60 | Ken Robinson | Ken Robinson: Do schools kill creativity? | 1 | 1151367060 | [{'id': 7, 'name': 'Funny', 'count': 19645}, {... | [{'id': 865, 'hero': 'https://pe.tedcdn.com/im... | Author/educator | ['children', 'creativity', 'culture', 'dance',... | Do schools kill creativity? | https://www.ted.com/talks/ken_robinson_says_sc... | 47227110 |
| 1 | 265 | With the same humor and humanity he exuded in ... | 977 | TED2006 | 1140825600 | 43 | Al Gore | Al Gore: Averting the climate crisis | 1 | 1151367060 | [{'id': 7, 'name': 'Funny', 'count': 544}, {'i... | [{'id': 243, 'hero': 'https://pe.tedcdn.com/im... | Climate advocate | ['alternative energy', 'cars', 'climate change... | Averting the climate crisis | https://www.ted.com/talks/al_gore_on_averting_... | 3200520 |
| 2 | 124 | New York Times columnist David Pogue takes aim... | 1286 | TED2006 | 1140739200 | 26 | David Pogue | David Pogue: Simplicity sells | 1 | 1151367060 | [{'id': 7, 'name': 'Funny', 'count': 964}, {'i... | [{'id': 1725, 'hero': 'https://pe.tedcdn.com/i... | Technology columnist | ['computers', 'entertainment', 'interface desi... | Simplicity sells | https://www.ted.com/talks/david_pogue_says_sim... | 1636292 |
| 3 | 200 | In an emotionally charged talk, MacArthur-winn... | 1116 | TED2006 | 1140912000 | 35 | Majora Carter | Majora Carter: Greening the ghetto | 1 | 1151367060 | [{'id': 3, 'name': 'Courageous', 'count': 760}... | [{'id': 1041, 'hero': 'https://pe.tedcdn.com/i... | Activist for environmental justice | ['MacArthur grant', 'activism', 'business', 'c... | Greening the ghetto | https://www.ted.com/talks/majora_carter_s_tale... | 1697550 |
| 4 | 593 | You've never seen data presented like this. Wi... | 1190 | TED2006 | 1140566400 | 48 | Hans Rosling | Hans Rosling: The best stats you've ever seen | 1 | 1151440680 | [{'id': 9, 'name': 'Ingenious', 'count': 3202}... | [{'id': 2056, 'hero': 'https://pe.tedcdn.com/i... | Global health expert; data visionary | ['Africa', 'Asia', 'Google', 'demo', 'economic... | The best stats you've ever seen | https://www.ted.com/talks/hans_rosling_shows_t... | 12005869 |


In [12]:
df1_new = df1[['comments', 'event', 'main_speaker', 'title', 'speaker_occupation', 'views', 'published_date']].copy()

### Recommendation system based on Tags

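First, clean the **tags**. Each value is the string form of a Python list, such as `['children', 'creativity', ...]`, so we replace everything that is not a letter with a space, collapse the whitespace, and lowercase the result.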

In [8]:
import re

In [18]:
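def clean_tags(x):
    # Replace every non-letter (brackets, quotes, commas, digits) with a space
    letter_only = re.sub("[^a-zA-Z]", " ", x)
    # Collapse runs of whitespace and lowercase the result
    return ' '.join(letter_only.split()).lower()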
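To see what `clean_tags` does to a single value, here is a small sketch on a made-up string shaped like the `tags` column. Note that a multi-word tag such as *MacArthur grant* ends up as two independent words, and any digits in a tag are dropped.

```python
import re

def clean_tags(x):
    # Same cleaning as above: non-letters become spaces, then collapse and lowercase
    letter_only = re.sub("[^a-zA-Z]", " ", x)
    return " ".join(letter_only.split()).lower()

# A hypothetical raw value shaped like the tags column in ted_main.csv
raw = "['MacArthur grant', 'activism', 'business']"
print(clean_tags(raw))  # macarthur grant activism business
```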

In [15]:
df1_new['tags'] = df1['tags'].astype(str)

In [19]:
df1_new['tags'] = df1_new['tags'].apply(clean_tags)

In [21]:
df1_new['tags'].head()

0    children creativity culture dance education pa...
1    alternative energy cars climate change culture...
2    computers entertainment interface design media...
3    macarthur grant activism business cities envir...
4    africa asia google demo economics global devel...
Name: tags, dtype: object

### Convert the tags to a sparse matrix using CountVectorizer

In [23]:
from sklearn.feature_extraction.text import CountVectorizer

In [27]:
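# Turn each cleaned tag string into a row of word counts (sparse matrix)
cv = CountVectorizer()
cv_tags = cv.fit_transform(df1_new['tags'])
# Dense view for inspection: one row per talk, one column per tag word
# (get_feature_names_out replaces get_feature_names, removed in scikit-learn 1.2)
df_genres = pd.DataFrame(cv_tags.toarray(), columns=cv.get_feature_names_out(), index=df1_new['title'])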

### Get Similarity Scores using cosine similarity

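Cosine similarity tells us how similar one TED talk is to another: it is the cosine of the angle between the two talks' tag vectors, so for these non-negative counts it ranges from 0 (no tag words in common) to 1 (the same tag words in the same proportions).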
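As a quick check of the definition cos(a, b) = (a · b) / (‖a‖ ‖b‖), here is a sketch that computes it by hand with NumPy for two made-up count vectors and compares the result with scikit-learn's `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical tag-count vectors over the same vocabulary
a = np.array([1, 0, 1, 1, 0, 2, 1])
b = np.array([0, 0, 0, 0, 1, 1, 1])

# cos(a, b) = (a . b) / (||a|| * ||b||) = 3 / (sqrt(8) * sqrt(3))
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
# cosine_similarity expects 2-D inputs (rows are samples)
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]
print(round(manual, 4), round(library, 4))  # 0.6124 0.6124
```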

In [31]:
from sklearn.metrics.pairwise import cosine_similarity

In [32]:
cos = cosine_similarity(cv_tags)

Let's check how we fared by examining the recommendations. We pick a TED talk title from the list, say *Do schools kill creativity?*, and ask for the talks whose tags are most similar to it.

#### Recommendation using Title

In [46]:
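def get_recommendation_based_title(x):
    # Index label of the talk with this title (equal to its row position,
    # since df1_new keeps the default RangeIndex)
    index_to_search = df1_new[df1_new['title'] == x].index[0]
    # Similarity of that talk to every talk, including itself
    series_similar = pd.Series(cos[index_to_search])
    # Top 10 scores; the query talk itself scores 1 and usually ranks first
    index_similar = series_similar.sort_values(ascending=False).head(10).index
    return df1_new.loc[index_similar]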

In [47]:
get_recommendation_based_title('Do schools kill creativity?')

| | comments | event | main_speaker | title | speaker_occupation | views | published_date | tags |
|---|---|---|---|---|---|---|---|---|
| 0 | 4553 | TED2006 | Ken Robinson | Do schools kill creativity? | Author/educator | 47227110 | 1151367060 | children creativity culture dance education pa... |
| 1420 | 818 | TEDxMidAtlantic | Colin Powell | Kids need structure | Former U.S. Secretary of State | 1485801 | 1358957905 | tedx children culture education parenting stud... |
| 1498 | 120 | TED Talks Education | Pearl Arredondo | My story, from gangland daughter to star teacher | Teacher | 1059278 | 1368025500 | children education teaching |
| 280 | 174 | LIFT 2007 | Sugata Mitra | Kids can teach themselves | Education researcher | 1486853 | 1219798800 | children cities culture education global issue... |
| 830 | 301 | TEDxMidAtlantic | Diana Laufenberg | How to learn? From mistakes | Educator | 1882742 | 1292431320 | tedx children culture education |
| 654 | 795 | TED2010 | Adora Svitak | What adults can learn from kids | Child prodigy | 4782854 | 1270142220 | children creativity education intelligence |
| 1493 | 632 | TED Talks Education | Rita Pierson | Every kid needs a champion | Educator | 7469445 | 1367589737 | children education motivation teaching |
| 1588 | 313 | TED2013 | James Flynn | Why our IQ levels are higher than our grandpar... | Moral philosopher | 2991225 | 1380208246 | culture education |
| 1033 | 252 | TEDGlobal 2011 | Alison Gopnik | What do babies think? | Child development psychologist | 2901853 | 1318259576 | brain children culture education psychology |
| 1023 | 155 | TEDGlobal 2011 | Geoff Mulgan | A short intro to the Studio School | Social commentator | 667985 | 1317136687 | creativity culture design education work |
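In the result for *Do schools kill creativity?*, the first recommendation is that talk itself, because a talk's tag vector always has cosine similarity 1 with itself. A minimal sketch of a variant that drops the query talk, reports the score, and fails with a clear error for an unknown title; `recommend_by_title`, `n`, and the toy data are hypothetical, not part of the original notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for df1_new (not from ted_main.csv)
talks = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "tags": ["children creativity education", "children education",
             "economics africa", "creativity design"],
})
cos_toy = cosine_similarity(CountVectorizer().fit_transform(talks["tags"]))

def recommend_by_title(df, sim, title, n=10):
    """Return the n talks most similar to `title`, excluding the talk itself."""
    matches = df.index[df["title"] == title]
    if len(matches) == 0:
        raise KeyError(f"title not found: {title!r}")
    query = matches[0]
    pos = df.index.get_loc(query)  # row of the query talk in the similarity matrix
    scores = pd.Series(sim[pos], index=df.index)
    scores = scores.drop(query)    # a talk with tags matches itself with score 1
    top = scores.sort_values(ascending=False).head(n)
    return df.loc[top.index].assign(similarity=top)

result = recommend_by_title(talks, cos_toy, "A", n=2)
print(result["title"].tolist())  # ['B', 'D']
```

Using `get_loc` instead of the raw index label keeps the lookup correct even if the DataFrame no longer has a default RangeIndex.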


#### Recommendation using Speakers

In [44]:
def get_recommendation_based_speakers(x):
    # Only the speaker's first talk in df1_new is used as the query
    index_to_search = df1_new[df1_new['main_speaker'] == x].index[0]
    series_similar = pd.Series(cos[index_to_search])
    index_similar = series_similar.sort_values(ascending=False).head(10).index
    return df1_new.loc[index_similar]

In [45]:
get_recommendation_based_speakers('Hans Rosling')

| | comments | event | main_speaker | title | speaker_occupation | views | published_date | tags |
|---|---|---|---|---|---|---|---|---|
| 4 | 593 | TED2006 | Hans Rosling | The best stats you've ever seen | Global health expert; data visionary | 12005869 | 1151440680 | africa asia google demo economics global devel... |
| 117 | 261 | TED2007 | Hans Rosling | New insights on poverty | Global health expert; data visionary | 3243784 | 1182762720 | africa asia google economics global developmen... |
| 502 | 122 | TED@State | Hans Rosling | Let my dataset change your mindset | Global health expert; data visionary | 1471039 | 1251334800 | africa asia data global development global iss... |
| 1818 | 177 | TEDSalon Berlin 2014 | Hans and Ola Rosling | How not to be ignorant about the world | Global health expert; data visionary | 3673455 | 1410449403 | global issues health statistics |
| 122 | 136 | TED2007 | Emily Oster | Flip your thinking on AIDS in Africa | Assumption-busting economist | 854967 | 1184221140 | aids africa economics global issues health sci... |
| 127 | 97 | TEDGlobal 2007 | George Ayittey | Africa's cheetahs versus hippos | Economist | 648234 | 1185791520 | africa business corruption economics entrepren... |
| 1861 | 100 | TEDGlobal 2014 | Michael Green | What the Social Progress Index can reveal abou... | Social progress expert | 1132771 | 1415721331 | economics global issues policy statistics |
| 1094 | 119 | TEDxCanberra | Thomas Pogge | Medicine for the 99 percent | Philosopher | 242251 | 1324220484 | tedx economics global issues health |
| 1607 | 207 | TEDGlobal 2013 | Charles Robertson | Africa's next boom | Emerging-markets economist | 1204089 | 1382454693 | africa business economics global issues |
| 234 | 139 | TED2008 | Paul Collier | The "bottom billion" | Economist | 990220 | 1211986800 | africa activism business economics global deve... |
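`get_recommendation_based_speakers` uses only the first talk it finds for the speaker (`.index[0]`), so for a speaker with several talks, such as Hans Rosling, the ranking depends on that one talk. One alternative, not part of the original notebook, is to average the similarity rows of all of the speaker's talks and rank only talks by other speakers. A minimal sketch; `recommend_by_speaker` and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for df1_new
talks = pd.DataFrame({
    "main_speaker": ["X", "X", "Y", "Z"],
    "title": ["x1", "x2", "y1", "z1"],
    "tags": ["data health", "data statistics", "health statistics", "art design"],
})
sim = cosine_similarity(CountVectorizer().fit_transform(talks["tags"]))

def recommend_by_speaker(df, sim, speaker, n=10):
    """Rank other speakers' talks by mean similarity to all of `speaker`'s talks."""
    mask = (df["main_speaker"] == speaker).to_numpy()
    if not mask.any():
        raise KeyError(f"speaker not found: {speaker!r}")
    # Average the similarity rows of every talk given by this speaker
    scores = pd.Series(sim[mask].mean(axis=0), index=df.index)
    scores = scores[~mask]  # keep only talks by other speakers
    top = scores.sort_values(ascending=False).head(n)
    return df.loc[top.index].assign(similarity=top)

result = recommend_by_speaker(talks, sim, "X")
print(result["title"].tolist())  # ['y1', 'z1']
```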


#### Recommendation using Speaker Occupation

In [70]:
def get_recommendation_based_speaker_occupation(x):
    # Only the first talk whose speaker_occupation exactly equals x is used as the query
    index_to_search = df1_new[df1_new['speaker_occupation'] == x].index[0]
    series_similar = pd.Series(cos[index_to_search])
    index_similar = series_similar.sort_values(ascending=False).head(10).index
    return df1_new.loc[index_similar]

In [71]:
get_recommendation_based_speaker_occupation('Artist')

| | comments | event | main_speaker | title | speaker_occupation | views | published_date | tags |
|---|---|---|---|---|---|---|---|---|
| 78 | 84 | TED2003 | Vik Muniz | Art with wire, sugar, chocolate and string | Artist | 1149090 | 1175731860 | brazil animation art creativity design illusion |
| 959 | 281 | TED2011 | Shea Hembrey | How I became 100 artists | Artist and curator | 1486880 | 1308061740 | art creativity design storytelling |
| 495 | 103 | TEDGlobal 2009 | Willard Wigan | Hold your breath for micro-sculpture | Micro-sculptor | 640722 | 1249261200 | art creativity |
| 1152 | 181 | INK Conference | Shilo Shiv Suleman | Using tech to enable dreaming | Artist | 576918 | 1329929988 | art creativity |
| 1312 | 223 | TEDGlobal 2012 | Kirby Ferguson | Embrace the remix | Filmmaker and Remixer | 1356163 | 1344610939 | art creativity |
| 382 | 21 | TED1998 | Milton Glaser | Using design to make ideas new | Graphic designer | 382985 | 1234314000 | art communication creativity culture design |
| 801 | 144 | TEDGlobal 2010 | Miwa Matreyek | Glorious visions in animation and performance | Multimedia artist | 829399 | 1288349340 | animation art design performance technology |
| 351 | 58 | TED2001 | Eva Zeisel | The playful search for beauty | Designer | 438972 | 1228803420 | art creativity design exploration play |
| 1574 | 257 | TEDGlobal 2013 | Alexa Meade | Your body is my canvas | Visual artist | 2682427 | 1378480353 | art creativity design painting photography |
| 1190 | 98 | TED2012 | Marco Tempest | A magical tale (with augmented reality) | Techno-illusionist | 1394503 | 1333119630 | art design entertainment illusion magic techno... |
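The three recommendation functions differ only in the column they filter on. A minimal sketch of a single parameterized version; the name `get_recommendation_based_on`, the explicit `df`/`sim` arguments (instead of the globals `df1_new` and `cos`), and the toy data are hypothetical. Like the originals, it uses the first matching row and assumes a default RangeIndex, so index labels line up with rows of the similarity matrix.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def get_recommendation_based_on(df, sim, column, value, n=10):
    """Top-n talks most similar to the first talk whose `column` equals `value`."""
    # Assumes a default RangeIndex: the label doubles as the row of `sim`
    index_to_search = df[df[column] == value].index[0]
    series_similar = pd.Series(sim[index_to_search])
    index_similar = series_similar.sort_values(ascending=False).head(n).index
    return df.loc[index_similar]

# Hypothetical toy data standing in for df1_new
talks = pd.DataFrame({
    "title": ["t0", "t1", "t2"],
    "main_speaker": ["P", "Q", "R"],
    "speaker_occupation": ["Artist", "Designer", "Economist"],
    "tags": ["art creativity", "art design", "economics"],
})
sim = cosine_similarity(CountVectorizer().fit_transform(talks["tags"]))

by_occupation = get_recommendation_based_on(talks, sim, "speaker_occupation", "Artist", n=2)
by_speaker = get_recommendation_based_on(talks, sim, "main_speaker", "Q", n=2)
print(by_occupation["title"].tolist())  # ['t0', 't1']
print(by_speaker["title"].tolist())     # ['t1', 't0']
```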


To sum up: CountVectorizer turns each talk's cleaned tags into a vector of word counts (counts, not 0/1 flags, unless `binary=True` is passed), cosine similarity scores every pair of these vectors, and the recommendations for a talk are the talks with the highest scores.
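Because multi-word tags are split into words, a word such as *global* can be counted more than once per talk, and that changes the similarity scores. A small sketch on two made-up tag strings comparing the default counts with `CountVectorizer(binary=True)`, which records only presence or absence; which one works better for recommendations is a modeling choice not tested in this notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical cleaned tag strings; "global" appears twice in the first
docs = ["global development global issues", "global issues"]

scores = {}
for binary in (False, True):
    X = CountVectorizer(binary=binary).fit_transform(docs)
    scores[binary] = cosine_similarity(X)[0, 1]  # similarity of doc 0 to doc 1
    print(f"binary={binary}: {scores[binary]:.4f}")
# binary=False: 0.8660
# binary=True: 0.8165
```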