
# <font color=red>**TED Talks**</font>

[Kaggle Dataset](https://www.kaggle.com/rounakbanik/ted-talks)

[GitHub - Carlos Scovino](https://github.com/cscovino/TED-Talks-Analysis)


###  <font color=blue> **File**:</font>  ted_main.csv


| Type | Column | Description |
| :--- | :--- | :--- |
| Numeric | comments | The number of first level comments made on the talk |
| Text | description | A blurb of what the talk is about |
| Numeric | duration | The duration of the talk in seconds |
| Text | event | The TED/TEDx event where the talk took place |
| Timestamp | film_date | The Unix timestamp of the filming |
| Numeric | languages | The number of languages in which the talk is available |
| Text | main_speaker | The first named speaker of the talk |
| Text | name | The official name of the TED Talk. Includes the title and the speaker. |
| Numeric | num_speaker | The number of speakers in the talk |
| Timestamp | published_date | The Unix timestamp for the publication of the talk on TED.com |
| Text | ratings | A stringified dictionary of the various ratings given to the talk (inspiring, fascinating, jaw dropping, etc.) |
| Text | related_talks | A list of dictionaries of recommended talks to watch next |
| Text | speaker_occupation | The occupation of the main speaker |
| Text | tags | The themes associated with the talk |
| Text | title | The title of the talk |
| Text | url | The URL of the talk |
| Numeric | views | The number of views on the talk |

###  <font color=blue> **File**:</font>  transcripts.csv

| Type | Column | Description |
| :--- | :--- | :--- |
| Text | transcript | The official English transcript of the talk. |
| Text | url | The URL of the talk |

<div class="alert alert-block alert-info">

# Imports

</div>

In [139]:
import pandas as pd

import ast
import datetime

<div class="alert alert-block alert-info">

# Dataframes

</div>

In [140]:
df_main = pd.read_csv('./dataset/ted_main.csv',sep=",",quotechar='"',quoting=1)
df_transcript = pd.read_csv('./dataset/transcripts.csv',sep=",",quotechar='"',quoting=1)

<div class="alert alert-block alert-info">

# Describes

</div>

### Data: Main

In [141]:
df_main.columns

Index(['comments', 'description', 'duration', 'event', 'film_date',
       'languages', 'main_speaker', 'name', 'num_speaker', 'published_date',
       'ratings', 'related_talks', 'speaker_occupation', 'tags', 'title',
       'url', 'views'],
      dtype='object')

In [142]:
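# Non-null count of the first column; .iloc avoids deprecated positional [] on a labeled Series
df_main.count().iloc[0]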

2550

In [143]:
df_main.describe()

| | comments | duration | film_date | languages | num_speaker | published_date | views |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| count | 2550.0 | 2550.0 | 2550.0 | 2550.0 | 2550.0 | 2550.0 | 2550.0 |
| mean | 191.562353 | 826.510196 | 1321928000.0 | 27.326275 | 1.028235 | 1343525000.0 | 1698297.0 |
| std | 282.315223 | 374.009138 | 119739100.0 | 9.563452 | 0.207705 | 94640090.0 | 2498479.0 |
| min | 2.0 | 135.0 | 74649600.0 | 0.0 | 1.0 | 1151367000.0 | 50443.0 |
| 25% | 63.0 | 577.0 | 1257466000.0 | 23.0 | 1.0 | 1268463000.0 | 755792.8 |
| 50% | 118.0 | 848.0 | 1333238000.0 | 28.0 | 1.0 | 1340935000.0 | 1124524.0 |
| 75% | 221.75 | 1046.75 | 1412964000.0 | 33.0 | 1.0 | 1423432000.0 | 1700760.0 |
| max | 6404.0 | 5256.0 | 1503792000.0 | 72.0 | 5.0 | 1506092000.0 | 47227110.0 |


In [144]:
df_main.head(2)

Unnamed: 0,comments,description,duration,event,film_date,languages,main_speaker,name,num_speaker,published_date,ratings,related_talks,speaker_occupation,tags,title,url,views
0,4553,Sir Ken Robinson makes an entertaining and pro...,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,"[{'id': 7, 'name': 'Funny', 'count': 19645}, {...","[{'id': 865, 'hero': 'https://pe.tedcdn.com/im...",Author/educator,"['children', 'creativity', 'culture', 'dance',...",Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_sc...,47227110
1,265,With the same humor and humanity he exuded in ...,977,TED2006,1140825600,43,Al Gore,Al Gore: Averting the climate crisis,1,1151367060,"[{'id': 7, 'name': 'Funny', 'count': 544}, {'i...","[{'id': 243, 'hero': 'https://pe.tedcdn.com/im...",Climate advocate,"['alternative energy', 'cars', 'climate change...",Averting the climate crisis,https://www.ted.com/talks/al_gore_on_averting_...,3200520


<div class="alert alert-block alert-success">

## Ratings

</div>

In [145]:
lst_ratings = ["Funny", "Beautiful", "Ingenious",
               "Courageous", "Longwinded", "Confusing",
               "Informative", "Fascinating", "Unconvincing",
               "Persuasive", "Jaw-dropping", "OK",
               "Obnoxious", "Inspiring"]

In [146]:
df_main['ratings'][2]

"[{'id': 7, 'name': 'Funny', 'count': 964}, {'id': 3, 'name': 'Courageous', 'count': 45}, {'id': 9, 'name': 'Ingenious', 'count': 183}, {'id': 1, 'name': 'Beautiful', 'count': 60}, {'id': 21, 'name': 'Unconvincing', 'count': 104}, {'id': 11, 'name': 'Longwinded', 'count': 78}, {'id': 8, 'name': 'Informative', 'count': 395}, {'id': 10, 'name': 'Inspiring', 'count': 230}, {'id': 22, 'name': 'Fascinating', 'count': 166}, {'id': 2, 'name': 'Confusing', 'count': 27}, {'id': 25, 'name': 'OK', 'count': 146}, {'id': 24, 'name': 'Persuasive', 'count': 230}, {'id': 23, 'name': 'Jaw-dropping', 'count': 54}, {'id': 26, 'name': 'Obnoxious', 'count': 142}]"

###  <font color=blue> **DataFrame**:</font>  df_ratings

| ix | rating_funny | rating_beautiful | rating_ingenious | ... | rating_<14> | 
| :--- | :--- | :--- | :--- | --- | --- |
| 0 | 19645 | 4573 | 6073 | ... | ... |
| 1 | 544 | 58 | 56 | ... | ... |
| 2 | 964 | 60 | 183 | ... | ... | 
| ... | ...| ... | ... | ... | ... |
| 2549 | ...| ... | ... | ... | ... |

In [147]:
all_ratings = []

df_ratings = df_main['ratings']

for ix, ratings in df_ratings.items():
    
    rec_rating = {}
    rec_rating['ix'] = ix
    
    # String to Python Data Type
    ratings_lst = ast.literal_eval(ratings)

    for rating in ratings_lst:   
        rec_rating["rating_"+rating['name'].lower()]=rating['count']

    all_ratings.append(rec_rating)
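
The parsing step used in the loop above can be tried on a single value. The following is a minimal sketch, assuming a shortened copy of the ratings string printed earlier (first three entries only); the variable names `raw`, `parsed` and `counts` are illustrative and not part of the notebook.

```python
import ast

# Shortened copy of the ratings string shown above (first three entries only).
raw = ("[{'id': 7, 'name': 'Funny', 'count': 964}, "
       "{'id': 3, 'name': 'Courageous', 'count': 45}, "
       "{'id': 9, 'name': 'Ingenious', 'count': 183}]")

# literal_eval parses Python literals only, so no code in the string is executed.
parsed = ast.literal_eval(raw)

# Build one flat record per talk, using the same "rating_<name>" key style as the loop.
counts = {"rating_" + r["name"].lower(): r["count"] for r in parsed}
print(counts)
```

`ast.literal_eval` is preferable to `eval` here because the column holds plain list and dict literals, and nothing else needs to be evaluated.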

In [148]:
df_ratings =  pd.DataFrame(all_ratings,
                           index=[ dic['ix'] for dic in all_ratings],
                           columns=["rating_funny","rating_beautiful","rating_ingenious",
                                    "rating_courageous","rating_longwinded","rating_confusing",
                                    "rating_informative","rating_fascinating","rating_unconvincing",
                                    "rating_persuasive","rating_jaw-dropping","rating_ok",
                                    "rating_obnoxious","rating_inspiring"] )

In [149]:
len(df_ratings)

2550

In [150]:
df_ratings.shape

(2550, 14)

In [151]:
df_ratings.head()

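| | rating_funny | rating_beautiful | rating_ingenious | rating_courageous | rating_longwinded | rating_confusing | rating_informative | rating_fascinating | rating_unconvincing | rating_persuasive | rating_jaw-dropping | rating_ok | rating_obnoxious | rating_inspiring |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | 19645 | 4573 | 6073 | 3253 | 387 | 242 | 7346 | 10581 | 300 | 10704 | 4439 | 1174 | 209 | 24924 |
| 1 | 544 | 58 | 56 | 139 | 113 | 62 | 443 | 132 | 258 | 268 | 116 | 203 | 131 | 413 |
| 2 | 964 | 60 | 183 | 45 | 78 | 27 | 395 | 166 | 104 | 230 | 54 | 146 | 142 | 230 |
| 3 | 59 | 291 | 105 | 760 | 53 | 32 | 380 | 132 | 36 | 460 | 230 | 85 | 35 | 1070 |
| 4 | 1390 | 942 | 3202 | 318 | 110 | 72 | 5433 | 4606 | 67 | 2542 | 3736 | 248 | 61 | 2893 |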


<div class="alert alert-block alert-success">

## Tags

</div>

###  <font color=blue> **DataFrame**:</font>  df_tags

| ix | tag_children | tag_creativity | tag_culture | ... | tag_other |
| :--- | :--- | :--- | :--- | --- | --- |
| 0 | 1 | 1 | 1 | ... | 1 |
| 1 | 0 | 0 | 0 | ... | 3 |
| 2 | 0 | 0 | 0 | ... | 2 |
| ... | ...| ... | ... | ... | ... |
| 2549 | ...| ... | ... | ... | ... |

In [152]:
df_main['tags'].head()

0    ['children', 'creativity', 'culture', 'dance',...
1    ['alternative energy', 'cars', 'climate change...
2    ['computers', 'entertainment', 'interface desi...
3    ['MacArthur grant', 'activism', 'business', 'c...
4    ['Africa', 'Asia', 'Google', 'demo', 'economic...
Name: tags, dtype: object

In [185]:
lst_tags = ["technology", "science", "design", "business", "collaboration", "innovation", "social_change",
            "health", "nature", "environment", "future", "communication", "activism", "children",
            "personal_growth", "humanity", "society", "identity", "community", "culture", "global_issues",
            "entertainment", "art", "politics", "economics", "religion"]

In [186]:
all_tags = []
df_tags = df_main['tags']

for ix, tags in df_tags.items():

    # --- Empty Dict ---
    rec_tag = {}
    rec_tag['ix'] = ix
    for tag in lst_tags:
        rec_tag["tag_"+tag.lower().replace(" ","_")]=0
    rec_tag['tag_other']=0
    
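    # --- Update Dict ---
    # NOTE: the membership test below does not replace spaces, while the keys
    # built above use underscores. Multi-word tags such as 'global issues'
    # therefore never match 'tag_global_issues' and are counted in tag_other.
    tags = ast.literal_eval(tags)
    for tag in tags:
        
        if "tag_"+tag.lower() in rec_tag.keys():
            rec_tag["tag_"+tag.lower().replace(" ","_")]+=1
        else:
            rec_tag["tag_other"]+=1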
            
    all_tags.append(rec_tag)
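
In the loop above, the lookup key is built without replacing spaces, so a multi-word tag cannot match an underscore-style entry of `lst_tags`. A minimal sketch of a variant that normalizes the key once and reuses it is shown below. The helper name `count_tags`, the short `known` list and the example tag string are assumptions made for illustration, not part of the notebook, and using this variant would change the `tag_other` counts shown in the outputs.

```python
import ast

def count_tags(raw_tags, known_tags):
    """Hypothetical helper: count known tags after normalizing spaces to underscores."""
    rec = {"tag_" + t: 0 for t in known_tags}
    rec["tag_other"] = 0
    for tag in ast.literal_eval(raw_tags):
        # Normalize once, then use the same key for both the test and the update.
        key = "tag_" + tag.lower().replace(" ", "_")
        if key in rec:
            rec[key] += 1
        else:
            rec["tag_other"] += 1
    return rec

known = ["culture", "global_issues", "science"]   # short stand-in for lst_tags
example = "['Global issues', 'culture', 'cars']"  # made-up tag string
result = count_tags(example, known)
print(result)
```

Computing the key in one place keeps the membership test and the increment consistent, which is the property the original loop lacks.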

In [190]:
columns=[ "tag_"+tag.lower() for tag in lst_tags]
columns.append("tag_other")
print(columns)

['tag_technology', 'tag_science', 'tag_design', 'tag_business', 'tag_collaboration', 'tag_innovation', 'tag_social_change', 'tag_health', 'tag_nature', 'tag_environment', 'tag_future', 'tag_communication', 'tag_activism', 'tag_children', 'tag_personal_growth', 'tag_humanity', 'tag_society', 'tag_identity', 'tag_community', 'tag_culture', 'tag_global_issues', 'tag_entertainment', 'tag_art', 'tag_politics', 'tag_economics', 'tag_religion', 'tag_other']


In [193]:
df_tags =  pd.DataFrame(all_tags,
                        index=[ row['ix'] for row in all_tags ],
                        columns=columns)

In [194]:
df_tags.head()

Unnamed: 0,tag_technology,tag_science,tag_design,tag_business,tag_collaboration,tag_innovation,tag_social_change,tag_health,tag_nature,tag_environment,...,tag_identity,tag_community,tag_culture,tag_global_issues,tag_entertainment,tag_art,tag_politics,tag_economics,tag_religion,tag_other
0,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,5
1,1,1,0,0,0,0,0,0,0,1,...,0,0,1,0,0,0,0,0,0,5
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,7
3,0,0,0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,1,0,0,5
4,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,1,0,9


In [195]:
len(df_tags)

2550

In [196]:
df_tags.shape

(2550, 27)

<div class="alert alert-block alert-success">

## Film & Published Dates

</div>

In [24]:
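# Convert a Unix timestamp to a dd-mm-YYYY string.
# Note: fromtimestamp() uses the machine's local timezone, so the resulting date can differ by a day between machines.
f = lambda x: datetime.datetime.fromtimestamp(int(x)).strftime('%d-%m-%Y')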

df_main['film_date'] = df_main['film_date'].apply(f)
df_main['published_date'] = df_main['published_date'].apply(f)
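
The conversion above depends on the local timezone of the machine running the notebook. The sketch below shows a timezone-independent variant that formats the same timestamps in UTC; the helper name `to_utc_date` is an assumption for illustration. The two values are the `film_date` and `published_date` of row 0 from the earlier `df_main.head(2)` output.

```python
import datetime

def to_utc_date(ts):
    """Hypothetical helper: format a Unix timestamp as dd-mm-YYYY in UTC."""
    dt = datetime.datetime.fromtimestamp(int(ts), tz=datetime.timezone.utc)
    return dt.strftime("%d-%m-%Y")

film = to_utc_date(1140825600)       # film_date of row 0
published = to_utc_date(1151367060)  # published_date of row 0
print(film, published)  # 25-02-2006 27-06-2006
```

Both UTC dates fall one day later than the local-time strings in the notebook output, which is what a local timezone behind UTC would produce. Whether UTC or local time is the right choice depends on how the dates are used later.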

In [25]:
df_main.head()

Unnamed: 0,comments,description,duration,event,film_date,languages,main_speaker,name,num_speaker,published_date,ratings,related_talks,speaker_occupation,tags,title,url,views
0,4553,Sir Ken Robinson makes an entertaining and pro...,1164,TED2006,24-02-2006,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,26-06-2006,"[{'id': 7, 'name': 'Funny', 'count': 19645}, {...","[{'id': 865, 'hero': 'https://pe.tedcdn.com/im...",Author/educator,"['children', 'creativity', 'culture', 'dance',...",Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_sc...,47227110
1,265,With the same humor and humanity he exuded in ...,977,TED2006,24-02-2006,43,Al Gore,Al Gore: Averting the climate crisis,1,26-06-2006,"[{'id': 7, 'name': 'Funny', 'count': 544}, {'i...","[{'id': 243, 'hero': 'https://pe.tedcdn.com/im...",Climate advocate,"['alternative energy', 'cars', 'climate change...",Averting the climate crisis,https://www.ted.com/talks/al_gore_on_averting_...,3200520
2,124,New York Times columnist David Pogue takes aim...,1286,TED2006,23-02-2006,26,David Pogue,David Pogue: Simplicity sells,1,26-06-2006,"[{'id': 7, 'name': 'Funny', 'count': 964}, {'i...","[{'id': 1725, 'hero': 'https://pe.tedcdn.com/i...",Technology columnist,"['computers', 'entertainment', 'interface desi...",Simplicity sells,https://www.ted.com/talks/david_pogue_says_sim...,1636292
3,200,"In an emotionally charged talk, MacArthur-winn...",1116,TED2006,25-02-2006,35,Majora Carter,Majora Carter: Greening the ghetto,1,26-06-2006,"[{'id': 3, 'name': 'Courageous', 'count': 760}...","[{'id': 1041, 'hero': 'https://pe.tedcdn.com/i...",Activist for environmental justice,"['MacArthur grant', 'activism', 'business', 'c...",Greening the ghetto,https://www.ted.com/talks/majora_carter_s_tale...,1697550
4,593,You've never seen data presented like this. Wi...,1190,TED2006,21-02-2006,48,Hans Rosling,Hans Rosling: The best stats you've ever seen,1,27-06-2006,"[{'id': 9, 'name': 'Ingenious', 'count': 3202}...","[{'id': 2056, 'hero': 'https://pe.tedcdn.com/i...",Global health expert; data visionary,"['Africa', 'Asia', 'Google', 'demo', 'economic...",The best stats you've ever seen,https://www.ted.com/talks/hans_rosling_shows_t...,12005869


### Data: Transcripts

In [19]:
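# Non-null count of the first column; .iloc avoids deprecated positional [] on a labeled Series
df_transcript.count().iloc[0]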

2467

In [6]:
df_transcript.columns

Index(['transcript', 'url'], dtype='object')

In [17]:
df_transcript.head()

| | transcript | url |
| :--- | :--- | :--- |
| 0 | Good morning. How are you?(Laughter)It's been ... | https://www.ted.com/talks/ken_robinson_says_sc... |
| 1 | Thank you so much, Chris. And it's truly a gre... | https://www.ted.com/talks/al_gore_on_averting_... |
| 2 | (Music: "The Sound of Silence," Simon & Garfun... | https://www.ted.com/talks/david_pogue_says_sim... |
| 3 | If you're here today — and I'm very happy that... | https://www.ted.com/talks/majora_carter_s_tale... |
| 4 | About 10 years ago, I took on the task to teac... | https://www.ted.com/talks/hans_rosling_shows_t... |
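
The transcripts file has fewer rows (2467) than the main file (2550), and both share the `url` column. One possible way to line them up is a left join on `url`, sketched below on toy data. The frames `main` and `transcripts` and their URLs are made up for illustration; in the notebook the inputs would be `df_main` and `df_transcript`.

```python
import pandas as pd

# Toy frames with made-up URLs standing in for df_main and df_transcript.
main = pd.DataFrame({"title": ["A", "B", "C"], "url": ["u/a", "u/b", "u/c"]})
transcripts = pd.DataFrame({"transcript": ["text a", "text c"], "url": ["u/a", "u/c"]})

# A left join keeps every talk; indicator=True adds a "_merge" column showing
# whether a matching transcript row was found.
merged = main.merge(transcripts, on="url", how="left", indicator=True)

# Talks with no transcript are the rows marked "left_only".
missing = merged.loc[merged["_merge"] == "left_only", "title"].tolist()
print(missing)
```

If `url` values are duplicated in either file, a join like this changes the row count, so comparing `len(merged)` with `len(main)` afterwards is a reasonable sanity check.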
