# Instructions

In this assignment, you will write a Python script to download user interaction data from YouTube for each 
“youtubeId” provided in the “vdoLinks.csv” file. For each YouTube video ID, you will extract the 
following information:
1. 100 comments
2. Description of the video
3. View Count
4. Like Count
5. Dislike Count
6. Comment Count
7. Duration of the video
8. Favorite Count


While extracting the data, if a “youtubeId” does not work, ignore that ID and proceed with the next “youtubeId”.

After the data extraction is done, determine the following:
1. A list of the top-10 videos based on the total views
2. A list of the bottom-10 videos based on the total views
3. The most liked video
4. The least liked video
5. The video with the highest duration
6. Apply sentiment analysis to the downloaded comments for each of the videos. For sentiment analysis, you can use VADER or any other tool of your choice. Your program should list the sentiment scores for each of the movies

Data from YouTube can be extracted in many ways, and you can use any of the available approaches. However, the method we practiced in our Lab session might be handy for this task.

# Submissions

To submit, please do the following:
1. Write a short report on the assignment. (70 Marks)
- The report should have the following:
    - Data collection procedure
    - Data cleansing steps
    - Analysis Steps
    - Using a bar diagram, show the top-10 videos based on total views
    - Using a bar diagram, show the bottom-10 videos based on total views (NB. In the bottom 10 list, do not include the videos that have zero
    - The title of the video that has the most likes
    - The title of the video that has the least likes
    - The title of the video that has the highest duration
    - Using a bar diagram, show the top-10 videos that have the highest positive sentiment scores, which you calculated from the comments
    - Using a bar diagram, show the bottom-10 videos that have the highest negative sentiment scores, which you calculated from the comments

_**Note:** Use appropriate additional visualization techniques to present your findings if required._
<br><br><br>
**Please upload the report and the Python file**

**Presentation (30 Marks)**

Presentation time: 15 minutes

Presentation is a two-step procedure:
- Step-1: Present your slides describing your work
- Step-2: Demonstrate your Python solution

# Code

## Import the libraries

In [52]:
import pandas as pd
from googleapiclient.discovery import build
from dotenv import load_dotenv

## Data extraction

In [58]:
df = pd.read_csv('./vdoLinks.csv', header=0)
df.head(3)

Unnamed: 0,youtubeId,movieId,title
0,K26_sDKnvMU,1,Toy Story (1995)
1,3LPANjHlPxo,2,Jumanji (1995)
2,rEnOoWs3FuA,3,Grumpier Old Men (1995)


In [3]:
df.shape

(25623, 3)

In [4]:
def get_video_data(video_id, n_comments):
    """Fetch up to n_comments comments plus metadata for one video; return None on any failure."""
    try:
        # API_KEY must be defined beforehand (it is not set in the cells shown here).
        youtube = build('youtube', 'v3', developerKey=API_KEY)
        # Newest top-level comment threads first; the API accepts maxResults up to 100.
        video_comments = youtube.commentThreads().list(part='snippet', videoId=video_id, maxResults=n_comments, order='time').execute()

        video_stats = youtube.videos().list(part='snippet,statistics,contentDetails', id=video_id).execute()

        # Dislike count is not read here: the API has not exposed it publicly since December 2021.
        description = video_stats['items'][0]['snippet']['description']
        n_view = video_stats['items'][0]['statistics']['viewCount']
        n_like = video_stats['items'][0]['statistics']['likeCount']
        n_comment = video_stats['items'][0]['statistics']['commentCount']
        duration = video_stats['items'][0]['contentDetails']['duration']
        n_favorite = video_stats['items'][0]['statistics']['favoriteCount']

        # Keep only the text of each top-level comment.
        comments = []
        for item in video_comments['items']:
            comments.append(item['snippet']['topLevelComment']['snippet']['textDisplay'])

        return {'id': video_id,
                'comments': comments,
                'description': description,
                'n_view': n_view,
                'n_like': n_like,
                'n_comment': n_comment,
                'duration': duration,
                'n_favorite': n_favorite}

    except Exception as e:
        # Any failure (missing field, deleted video, disabled comments) skips this ID.
        print(f'Error while getting the data from video: {video_id}')
        print(f'Error: {e} \n')

        return None

In [5]:
len(df['youtubeId'])

25623

The data had to be split into sets because of API restrictions (each set required creating a new project for the YouTube API).

In [6]:
df.reset_index(drop=True, inplace=True)

video_ids_1 = df['youtubeId'][:1000]
video_ids_2 = df['youtubeId'][1000:2000]
video_ids_3 = df['youtubeId'][2000:3000]
video_ids_4 = df['youtubeId'][3000:4000]
video_ids_5 = df['youtubeId'][4000:5000]
video_ids_6 = df['youtubeId'][5000:6000]
video_ids_7 = df['youtubeId'][6000:7000]
video_ids_8 = df['youtubeId'][7000:8000]
video_ids_9 = df['youtubeId'][8000:9000]
video_ids_10 = df['youtubeId'][9000:10000]
video_ids_11 = df['youtubeId'][10000:11000]
video_ids_12 = df['youtubeId'][11000:12000]
video_ids_13 = df['youtubeId'][12000:13000]
video_ids_14 = df['youtubeId'][13000:14000]
video_ids_15 = df['youtubeId'][14000:15000]
video_ids_16 = df['youtubeId'][15000:16000]
video_ids_17 = df['youtubeId'][16000:17000]
video_ids_18 = df['youtubeId'][17000:18000]
video_ids_19 = df['youtubeId'][18000:19000]
video_ids_20 = df['youtubeId'][19000:20000]
video_ids_21 = df['youtubeId'][20000:21000]
video_ids_22 = df['youtubeId'][21000:22000]
video_ids_23 = df['youtubeId'][22000:23000]
video_ids_24 = df['youtubeId'][23000:24000]
video_ids_25 = df['youtubeId'][24000:25000]
video_ids_26 = df['youtubeId'][25000:]

In [7]:
lists = [video_ids_1, video_ids_2, video_ids_3, video_ids_4, video_ids_5,
        video_ids_6, video_ids_7, video_ids_8, video_ids_9, video_ids_10,
        video_ids_11, video_ids_12, video_ids_13, video_ids_14, video_ids_15,
        video_ids_16, video_ids_17, video_ids_18, video_ids_19, video_ids_20,
        video_ids_21, video_ids_22, video_ids_23, video_ids_24, video_ids_25, 
        video_ids_26]

In [9]:
len_sum = 0 
for l in lists:
    len_sum += len(l)

print(f'Total length: {len_sum}')

Total length: 25623
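
The 26 hand-written slices above could also be produced in a loop. The following is a minimal sketch, not the code used in this notebook: `chunk` is a hypothetical helper, and a plain list of placeholder IDs stands in for `df['youtubeId']`.

```python
def chunk(items, size):
    """Split a sequence into consecutive slices of at most `size` items."""
    return [items[start:start + size] for start in range(0, len(items), size)]


# Placeholder IDs standing in for df['youtubeId'] (25,623 rows in this notebook).
video_ids = [f'id_{i}' for i in range(25623)]

batches = chunk(video_ids, 1000)

print(len(batches))                  # number of batches
print(len(batches[-1]))              # size of the final, shorter batch
print(sum(len(b) for b in batches))  # should equal len(video_ids)
```

The same helper works on a pandas Series, because slicing a Series by position behaves like slicing a list.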


In [115]:
data = [get_video_data(id, 100) for id in video_ids_26]

Error while getting the data from video: TKwhBk4d7M0
Error: 'likeCount' 

Error while getting the data from video: a8Ji8YfftNE
Error: <HttpError 404 when requesting https://youtube.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=a8Ji8YfftNE&maxResults=100&order=time&key=REDACTED&alt=json returned "The video identified by the <code><a href="/youtube/v3/docs/commentThreads/list#videoId">videoId</a></code> parameter could not be found.". Details: "[{'message': 'The video identified by the <code><a href="/youtube/v3/docs/commentThreads/list#videoId">videoId</a></code> parameter could not be found.', 'domain': 'youtube.commentThread', 'reason': 'videoNotFound', 'location': 'videoId', 'locationType': 'parameter'}]"> 

Error while getting the data from video: oDE6J3iut28
Error: <HttpError 404 when requesting https://youtube.googleapis.com/youtube/v3/commentThreads?part=snippet&videoId=oDE6J3iut28&maxResults=100&order=time&key=REDACTED

In [117]:
cnt = 0
for dt in data:
    if dt is None:
        cnt += 1
print(f'Number of "None" values: {cnt}')

Number of "None" values: 231
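
The `'likeCount'` error above suggests that a video is discarded whenever a single statistic is absent from the response. If keeping such videos is preferable, the fields could be read with `dict.get` instead. This is only a sketch under that assumption: `extract_stats` is a hypothetical helper, and the sample item is made up to mirror the response shape used in `get_video_data`.

```python
def extract_stats(video_item):
    """Read fields from one `items[0]` entry, using None for anything missing."""
    stats = video_item.get('statistics', {})
    return {
        'description': video_item.get('snippet', {}).get('description'),
        'n_view': stats.get('viewCount'),
        'n_like': stats.get('likeCount'),  # absent for some videos
        'n_comment': stats.get('commentCount'),
        'n_favorite': stats.get('favoriteCount'),
        'duration': video_item.get('contentDetails', {}).get('duration'),
    }


# Made-up item without a 'likeCount' entry.
sample_item = {
    'snippet': {'description': 'example'},
    'statistics': {'viewCount': '10', 'commentCount': '0', 'favoriteCount': '0'},
    'contentDetails': {'duration': 'PT1M'},
}

row = extract_stats(sample_item)
print(row['n_like'])  # None instead of a KeyError
```

Rows kept this way contain missing values, so they would need handling before any later integer conversion.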


In [119]:
import pickle

# with open('raw_data/data_26.pkl', 'wb') as f:
    # pickle.dump(data, f)

# with open('./raw_data/data_1.pkl', 'rb') as f:
    # data = pickle.load(f)

### Join all data

In [7]:
datasets = [f'data_{i}.pkl' for i in range(1, 27)]
datasets[22:]

['data_23.pkl', 'data_24.pkl', 'data_25.pkl', 'data_26.pkl']

In [45]:
import pickle

data = []
for dataset in datasets:
    with open(f'./raw_data/{dataset}', 'rb') as f:
        dt = pickle.load(f)
    for d in dt:
        data.append(d)

print(f'len(data): {len(data)}')

len(data): 25623


In [48]:
data

[None,
 None,
 {'id': 'rEnOoWs3FuA',
  'comments': ['Buena película de comedia romántica',
   '<a href="https://www.youtube.com/watch?v=rEnOoWs3FuA&amp;t=1m36s">1:36</a> <b>GRUMPY⬅️ER</b> <br>         <b>GRUMPIER</b>',
   'I&#39;m watching this now, it never gets old🤣',
   'Canaloni !!!!',
   'I was looking for halloween themed movies and stumbled over this... is there anything halloween related in this film?<br><br><br>Seen it years ago but can&#39;t remember squat',
   'welp, been 3 years since anybody have commented',
   'Well, there was supposed to be another sequel. From what I heard, the two guys go to Italy (to meet their new relatives, I suppose) &amp; wind up meeting Italian versions of themselves.',
   'i loved the first one is it like the first one or not??'],
  'description': 'The more things change, the more they stay the same in Wabasha, Minnesota. The uncatchable fish named Catfish Hunter grows fatter. The wisecracks, zingers and put downs pile up like freshly raked leav

### Removing the None values

In [49]:
data_without_none = [dt for dt in data if dt]
print(f'len(data_without_none): {len(data_without_none)}')

len(data_without_none): 16561


In [50]:
data_without_none[0]

{'id': 'rEnOoWs3FuA',
 'comments': ['Buena película de comedia romántica',
  '<a href="https://www.youtube.com/watch?v=rEnOoWs3FuA&amp;t=1m36s">1:36</a> <b>GRUMPY⬅️ER</b> <br>         <b>GRUMPIER</b>',
  'I&#39;m watching this now, it never gets old🤣',
  'Canaloni !!!!',
  'I was looking for halloween themed movies and stumbled over this... is there anything halloween related in this film?<br><br><br>Seen it years ago but can&#39;t remember squat',
  'welp, been 3 years since anybody have commented',
  'Well, there was supposed to be another sequel. From what I heard, the two guys go to Italy (to meet their new relatives, I suppose) &amp; wind up meeting Italian versions of themselves.',
  'i loved the first one is it like the first one or not??'],
 'description': 'The more things change, the more they stay the same in Wabasha, Minnesota. The uncatchable fish named Catfish Hunter grows fatter. The wisecracks, zingers and put downs pile up like freshly raked leaves. And GRUMPY OLD MEN b
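
The comments above still contain HTML entities (`&#39;`, `&amp;`) and tags (`<br>`, `<a>`, `<b>`), which may matter for the cleansing step and for sentiment scoring. A minimal cleaning sketch using only the standard library follows; `clean_comment` is a hypothetical helper and is not part of the original notebook.

```python
import html
import re

TAG_RE = re.compile(r'<[^>]+>')


def clean_comment(text):
    """Replace HTML tags with spaces, decode entities, and collapse whitespace."""
    without_tags = TAG_RE.sub(' ', text)   # tags first, so decoded '<' is not stripped
    decoded = html.unescape(without_tags)  # '&#39;' -> "'", '&amp;' -> '&'
    return ' '.join(decoded.split())


# Two comments taken from the output above.
print(clean_comment('I&#39;m watching this now, it never gets old'))
print(clean_comment('is there anything halloween related in this film?<br><br><br>'
                    'Seen it years ago but can&#39;t remember squat'))
```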

### Define the dataframe

In [56]:
df_data_without_none = pd.DataFrame(data_without_none)
df_data_without_none.head(3)

Unnamed: 0,id,comments,description,n_view,n_like,n_comment,duration,n_favorite
0,rEnOoWs3FuA,"[Buena película de comedia romántica, <a href=...","The more things change, the more they stay the...",173631,218,13,PT1M51S,0
1,2GfZl4kuVNI,[I loved this movie when I was younger because...,"Director: Michael Mann.\nCast: Al Pacino, Robe...",1159733,5955,547,PT2M28S,0
2,-C-xXZyX2zU,"[R.I.P. Brad Renfro, Blake Heron, Charles Rock...",A preview for this 90s disney movie. From the ...,174486,202,88,PT1M,0


In [57]:
df_data_without_none.isna().sum().sum()

0

In [62]:
df_data_without_none.columns = ['Video ID', 'Comments', 'Description', 
                                'Number of views', 'Number of likes', 'Number of comments',
                                'Duration', 'Number of favorites']
df_data_without_none.head(2)

Unnamed: 0,Video ID,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
0,rEnOoWs3FuA,"[Buena película de comedia romántica, <a href=...","The more things change, the more they stay the...",173631,218,13,PT1M51S,0
1,2GfZl4kuVNI,[I loved this movie when I was younger because...,"Director: Michael Mann.\nCast: Al Pacino, Robe...",1159733,5955,547,PT2M28S,0


In [66]:
df.head(2)

Unnamed: 0,Video ID,Movie ID,Movie title
0,K26_sDKnvMU,1,Toy Story (1995)
1,3LPANjHlPxo,2,Jumanji (1995)


In [63]:
df.columns = ['Video ID', 'Movie ID', 'Movie title']

In [64]:
final_df = pd.merge(df, df_data_without_none, on='Video ID')
final_df.head(3)

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
0,rEnOoWs3FuA,3,Grumpier Old Men (1995),"[Buena película de comedia romántica, <a href=...","The more things change, the more they stay the...",173631,218,13,PT1M51S,0
1,2GfZl4kuVNI,6,Heat (1995),[I loved this movie when I was younger because...,"Director: Michael Mann.\nCast: Al Pacino, Robe...",1159733,5955,547,PT2M28S,0
2,-C-xXZyX2zU,8,Tom and Huck (1995),"[R.I.P. Brad Renfro, Blake Heron, Charles Rock...",A preview for this 90s disney movie. From the ...,174486,202,88,PT1M,0


### Saving the dataset

In [1]:
import pickle
import pandas as pd

# with open('df.pkl', 'wb') as f:
#     pickle.dump(final_df, f)

with open('df.pkl', 'rb') as f:
    df = pickle.load(f)

## Tasks

### List of top 10 videos based on the total views

In [6]:
df.sort_values("Number of views", ascending=False).head(3)

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
11322,UFWtbX7rjs8,91339,"12 Dogs of Christmas, The (2005)",[],"It's Christmas time in 1930's Pittsburgh, but ...",99997,177,0,PT2M8S,0
8370,BrSj0uHnrXg,63280,"Winning of Barbara Worth, The (1926)",[I saw some shots of the town life in the Gold...,http://ilakid.altervista.org/\n\nAmazon: https...,9999,48,2,PT7M19S,0
14508,7Eiko_J6mfE,112655,No One Lives (2012),[Ruthless gang who kill and rob innocent peopl...,The red band trailer for No One Lives. Starrin...,999608,2058,368,PT2M1S,0


In [10]:
df.dtypes

Video ID               object
Movie ID                int64
Movie title            object
Comments               object
Description            object
Number of views        object
Number of likes        object
Number of comments     object
Duration               object
Number of favorites    object
dtype: object

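The sort_values method did not work as expected because 'Number of views' is stored as strings (object dtype), so the values were compared character by character rather than numerically.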
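
To illustrate the effect, the three view counts from the rows above can be sorted once as strings and once as numbers. This is a plain-Python sketch of the comparison behavior, not a pandas call.

```python
# View counts from the three rows above, as strings.
views = ['999608', '99997', '9999']

as_text = sorted(views, reverse=True)              # character-by-character comparison
as_numbers = sorted(views, key=int, reverse=True)  # numeric comparison

print(as_text)     # ['99997', '9999', '999608']
print(as_numbers)  # ['999608', '99997', '9999']
```

The string ordering reproduces the misleading result shown earlier, which is why the columns are converted to integers before sorting.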

In [13]:
df[['Number of views', 'Number of likes', 'Number of comments']] = df[['Number of views', 'Number of likes', 'Number of comments']].astype(int)
df.dtypes

Video ID               object
Movie ID                int64
Movie title            object
Comments               object
Description            object
Number of views         int32
Number of likes         int32
Number of comments      int32
Duration               object
Number of favorites    object
dtype: object

In [16]:
df.sort_values("Number of views", ascending=False).head(10)

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
14279,450p7goxZqg,111226,All of Me (2013),"[Sweet, Masterpiece song <br>2022🔥🔥🔥🔥, F***in ...",Official music video for “All of Me” by John L...,2076217622,11442023,237883,PT5M8S,0
15626,dNJdJIwCF_Y,120853,Fresh Guacamole (2012),"[mm yes, my favorite food<br><br><br><br><br><...",The 2013 Academy Award Nominated film by PES. ...,442365631,2574246,68802,PT1M41S,0
3493,j-V12tL78Mc,5364,Unfaithful (2002),[So sad how this happens in real life...I&#39;...,Unfaithful movie clips: http://j.mp/1ixkUnl\nB...,123294382,67752,4967,PT2M43S,0
13144,NVcSNnqRD0c,104076,"Smurfs 2, The (2013)",[This song reminds me a lot to I’m fed up from...,Britney Spears' official music video for 'Ooh ...,119915960,765593,58976,PT4M21S,0
16025,z5rRZdiu1UE,126106,Beastie Boys: Sabotage (1994),[Les beasty boys sont annoncés près de vous<br...,REMASTERED IN HD!\nRead the story behind Ill C...,99739887,608487,25665,PT3M4S,0
9512,SvGcGjIc16I,76189,Growth (2009),"[is it like Slither?, eww gross...but interes...","Here is the trailer for Growth, the latest fil...",67091038,30546,30,PT2M6S,0
10809,3H8bnKdf654,87520,Transformers: Dark of the Moon (2011),[Fun fact: Sentinel&#39;s facial features were...,Subscribe! http://YouTube.com/ClevverTV\n\nWat...,53112347,108727,22552,PT2M28S,0
14393,9ItBvH5J6ss,111921,The Fault in Our Stars (2014),"[DIL BECHARA♥️😙, Can someone polease tell me w...",The Fault In Our Stars | Official Trailer: Haz...,46719617,591032,40110,PT2M30S,0
15981,sdUUx5FdySs,125926,Kiwi! (2006),[Got reminded of this video cause I wanted to ...,"My Master's Thesis Animation, which I complete...",45192147,485939,68780,PT3M10S,0
12963,pdbI0Fn4COQ,103203,Eden (2012),[WTF is this??? Isn&#39;t there enough of th...,"""Arrestingly Supenseful"" (Variety) ""Nothing sh...",43445192,55446,3515,PT2M6S,0


In [18]:
top_10_views = df.sort_values("Number of views", ascending=False).head(10)['Movie title']
top_10_views

14279                         All of Me (2013)
15626                   Fresh Guacamole (2012)
3493                         Unfaithful (2002)
13144                     Smurfs 2, The (2013)
16025            Beastie Boys: Sabotage (1994)
9512                             Growth (2009)
10809    Transformers: Dark of the Moon (2011)
14393            The Fault in Our Stars (2014)
15981                             Kiwi! (2006)
12963                              Eden (2012)
Name: Movie title, dtype: object

### A list of the bottom 10 videos based on the total views

In [15]:
df.sort_values("Number of views", ascending=True).head(10)

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
15025,OUf6CIW7C8Q,116283,And So It Is (1966),[],,8,0,0,PT1M9S,0
16341,EwM3gtL22E4,128856,Crockdale (2011),[],,14,0,0,PT57M54S,0
12899,HX7dx_w_Ol8,102860,Hilton! (2013),[],,17,0,0,PT18S,0
16453,sYpW4fvlf7s,129777,Chronic Town (2010),[],Reclusive indie pioneers The Long Afternoon pe...,34,0,0,PT3M34S,0
15133,hHJow-uF3A0,116945,Freedom (2009),[],Cabrainnnnnnnn,36,0,0,PT1M1S,0
13735,CA2QbzFUoQQ,107621,"Wooden Bridge, The (2012)",[],Cornerstone Wooden Bridge,43,0,0,PT58S,0
14165,9B-65BWKwQM,110314,"Me Two (Personne aux deux personnes, La) (2008)",[],warheads they are sour,50,0,0,PT56S,0
15550,tQplNZJL8XI,120208,Flesh and Blood (1922),[],Directed by Irving Cummings\nStarring:\nLon Ch...,86,0,0,PT1H13M18S,0
13852,zrhl2wZrvgI,108316,American Scary (2006),[],American Scary,86,0,0,PT2M40S,0
12252,D4MYlz7vaRg,98337,97 Percent True (2008),[],2008 Guy Maddin,97,1,0,PT6M30S,0


In [21]:
bottom_10_views = df.sort_values("Number of views", ascending=True).head(10)['Movie title']
bottom_10_views

15025                                And So It Is (1966)
16341                                   Crockdale (2011)
12899                                     Hilton! (2013)
16453                                Chronic Town (2010)
15133                                     Freedom (2009)
13735                          Wooden Bridge, The (2012)
14165    Me Two (Personne aux deux personnes, La) (2008)
15550                             Flesh and Blood (1922)
13852                              American Scary (2006)
12252                             97 Percent True (2008)
Name: Movie title, dtype: object
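
The assignment asks that videos with zero views be left out of the bottom-10 list. In this dataset the smallest count shown is 8, so the result does not change, but the filter could be made explicit. A minimal sketch on a made-up frame (the titles and counts are placeholders, and `n=2` stands in for 10):

```python
import pandas as pd

# Made-up data; only the column names match the notebook.
toy = pd.DataFrame({
    'Movie title': ['A', 'B', 'C', 'D'],
    'Number of views': [0, 8, 14, 5000],
})

nonzero = toy[toy['Number of views'] > 0]         # drop zero-view videos
bottom = nonzero.nsmallest(2, 'Number of views')  # lowest counts first

print(bottom['Movie title'].tolist())
```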

### The most liked video

In [22]:
df.sort_values("Number of likes", ascending=False).head(3)

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
14279,450p7goxZqg,111226,All of Me (2013),"[Sweet, Masterpiece song <br>2022🔥🔥🔥🔥, F***in ...",Official music video for “All of Me” by John L...,2076217622,11442023,237883,PT5M8S,0
15626,dNJdJIwCF_Y,120853,Fresh Guacamole (2012),"[mm yes, my favorite food<br><br><br><br><br><...",The 2013 Academy Award Nominated film by PES. ...,442365631,2574246,68802,PT1M41S,0
13144,NVcSNnqRD0c,104076,"Smurfs 2, The (2013)",[This song reminds me a lot to I’m fed up from...,Britney Spears' official music video for 'Ooh ...,119915960,765593,58976,PT4M21S,0


In [23]:
most_liked = df.sort_values("Number of likes", ascending=False).head(1)['Movie title']
most_liked

14279    All of Me (2013)
Name: Movie title, dtype: object

### The least liked video

In [24]:
df.sort_values("Number of likes", ascending=True).head(3)

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
12899,HX7dx_w_Ol8,102860,Hilton! (2013),[],,17,0,0,PT18S,0
16440,DxvOSYWicGg,129719,That's Life (1998),[Gerry Wilson was one of my best friends- miss...,"find out what ever happen to the 1998 TV show,...",559,0,1,PT5M18S,0
14873,a9uUiCoeqAA,115174,Love Is Strange (2014),[],genConnect joined John Lithgow and Alfred Moli...,2354,0,0,PT1M48S,0


In [26]:
least_liked = df.sort_values("Number of likes", ascending=True).head(1)['Movie title']
least_liked

12899    Hilton! (2013)
Name: Movie title, dtype: object

Several videos have 0 likes, so the least-liked position is a tie rather than a single video

In [30]:
df[df['Number of likes'] == 0].count()

Video ID               40
Movie ID               40
Movie title            40
Comments               40
Description            40
Number of views        40
Number of likes        40
Number of comments     40
Duration               40
Number of favorites    40
dtype: int64

In [32]:
least_liked_videos = df[df['Number of likes'] == 0]['Movie title']
len(least_liked_videos)

40

In [33]:
least_liked_videos

403         Bread and Chocolate (Pane e cioccolata) (1973)
707                                Leopard Son, The (1996)
1167                                 Love Walked In (1998)
5532                                     All at Sea (1957)
5967                  Adversary, The (L'adversaire) (2002)
6189                   Monday Morning (Lundi matin) (2002)
6906             Dark at the Top of the Stairs, The (1960)
7054        Orchestra Rehearsal (Prova d'orchestra) (1978)
8440                                  Waiter (Ober) (2006)
10500     In the Midst of Life (Au coeur de la vie) (1963)
10744    No Rest for the Brave (Pas de repos pour les b...
10828                 Three Brothers (Tre fratelli) (1981)
11264                                  Election Day (2007)
11399                     Hans (Kukkulan kuningas) (2009) 
12376                          Electile Dysfunction (2008)
12899                                       Hilton! (2013)
13122                                Only the Young (201

### The video with the highest duration

In [37]:
df.head(1)

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites
0,rEnOoWs3FuA,3,Grumpier Old Men (1995),"[Buena película de comedia romántica, <a href=...","The more things change, the more they stay the...",173631,218,13,PT1M51S,0


In [41]:
duration_formated = pd.to_timedelta(df['Duration'])
df['Duration formated'] = duration_formated
df[['Duration', 'Duration formated']].head(3)

Unnamed: 0,Duration,Duration formated
0,PT1M51S,0 days 00:01:51
1,PT2M28S,0 days 00:02:28
2,PT1M,0 days 00:01:00
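
The `Duration` strings follow the ISO 8601 duration format (for example `PT1M51S`), which `pd.to_timedelta` parses directly, as the output above shows. For illustration only, the same conversion can be sketched with the standard library; `parse_duration` is a hypothetical helper that handles just the hour, minute, and second fields that appear in the rows shown in this notebook.

```python
import re
from datetime import timedelta

DURATION_RE = re.compile(r'^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$')


def parse_duration(value):
    """Convert a 'PT#H#M#S' string into a timedelta; missing fields count as 0."""
    match = DURATION_RE.match(value)
    if match is None:
        raise ValueError(f'Unsupported duration: {value!r}')
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


for raw in ['PT1M51S', 'PT1M', 'PT12H49M24S']:
    print(raw, parse_duration(raw))
```

Because the result is a `timedelta`, values compare and sort by actual length, which is what the ranking by duration relies on.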


In [43]:
df.sort_values("Duration formated", ascending=False).head()

Unnamed: 0,Video ID,Movie ID,Movie title,Comments,Description,Number of views,Number of likes,Number of comments,Duration,Number of favorites,Duration formated
11334,RBB_6gpUE-Q,91444,Getting to Know You (1999),[],"Trailer for the independent film ""Getting to K...",3779,2,0,PT12H49M24S,0,0 days 12:49:24
13348,eJ3RzGoQC4s,105250,"Century of the Self, The (2002)",[Tony Blair ushered in the &#39;end of elitist...,Adam Curtis Documentary. \n\nhttps://en.m.wiki...,2865142,34979,798,PT3H54M44S,0,0 days 03:54:44
4850,eIozQwKTxp0,7767,"Best of Youth, The (La meglio gioventù) (2003)","[gee this trailer sure spoils the whole story,...",The best of youth (trailer)\r\nItalian movie 2...,330972,407,51,PT3H49M44S,0,0 days 03:49:44
15721,bxKkeqN4LCI,121403,Elvis and Me (1988),"[I get no volume, ugh where is the sound???, N...","""Copyright Disclaimer Under Section 107 of the...",370930,1742,373,PT3H4M7S,0,0 days 03:04:07
16236,NLV2Pojnvwg,127644,The Trial of Lee Harvey Oswald (1977),[fact:<br>1) there were more than three shots ...,"This is the complete 1977 TV movie ""The Trial ...",106815,599,459,PT3H3M23S,0,0 days 03:03:23


In [44]:
highest_duration = df.sort_values("Duration formated", ascending=False).head(1)['Movie title']
highest_duration

11334    Getting to Know You (1999)
Name: Movie title, dtype: object

### Sentiment analysis