# Analyse subtitles of Harry Potter 4

In this notebook we go through the Harry Potter 4 subtitles to see whether we can detect entities in the text and map them to the film's characters.

In [1]:
%load_ext autoreload
%autoreload 2

## Import packages

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import json

from bechdelai.nlp.process_srt import load_srt
from bechdelai.data.tmdb import get_movie_cast_from_id
from bechdelai.nlp.analyse_srt import extract_person_references_in_srt
from bechdelai.nlp.entities import match_entities_with_cast

2023-01-22 16:34:32,159 loading file C:\Users\natha\.flair\models\ner-english\4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4
2023-01-22 16:34:34,405 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>


In [3]:
pd.set_option('display.max_rows', 100)

## Load data

SRT file

In [4]:
fpath = "../../../data/srt/harry_potter_4.srt"
srt_list = load_srt(fpath)

In [5]:
for srt in srt_list[:2]:
    print(srt)

1
00:01:46,125 --> 00:01:48,543
Bloody kids.

2
00:02:35,007 --> 00:02:38,676
How fastidious you've become,
Wormtail.
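
Each SRT block printed above has an index, a timing line and the subtitle text. As a minimal sketch of how a timing line can be turned into seconds (the unit of the `start_sec` and `end_sec` columns further down), the helpers below parse the `HH:MM:SS,mmm` format. The names `srt_timestamp_to_seconds` and `parse_timing_line` are hypothetical and not part of `bechdelai`. The library may round or truncate differently, since the tables below show whole seconds.

```python
import re

# Hypothetical helpers for illustration; not part of bechdelai.
TIMESTAMP_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_timestamp_to_seconds(timestamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:46,125' to seconds."""
    match = TIMESTAMP_RE.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError(f"Not an SRT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_timing_line(line: str) -> tuple[float, float]:
    """Split a 'start --> end' timing line into (start, end) in seconds."""
    start, end = line.split("-->")
    return srt_timestamp_to_seconds(start), srt_timestamp_to_seconds(end)


# Timing line of the first subtitle shown above.
start_sec, end_sec = parse_timing_line("00:01:46,125 --> 00:01:48,543")
print(start_sec, end_sec)
```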



TMDB metadata

In [6]:
tmdb_id = "674"
cast = get_movie_cast_from_id(tmdb_id)["cast"]

In [7]:
cast_df = pd.DataFrame(cast)
cast_df.head()

Unnamed: 0,adult,gender,id,known_for_department,name,original_name,popularity,profile_path,cast_id,character,credit_id,order
0,False,2,10980,Acting,Daniel Radcliffe,Daniel Radcliffe,48.452,/omfCyOCwgG26ajPp8NiaATgCeze.jpg,1,Harry Potter,52fe4268c3a36847f801c21d,0
1,False,2,10989,Acting,Rupert Grint,Rupert Grint,28.655,/q2KZZ0ltTEl7Sf8volNFV1JDEP4.jpg,2,Ron Weasley,52fe4268c3a36847f801c221,1
2,False,1,10990,Acting,Emma Watson,Emma Watson,54.312,/tvPPRGzAzdQFhlKzLbMO1EpuTJI.jpg,3,Hermione Granger,52fe4268c3a36847f801c225,2
3,False,2,1923,Acting,Robbie Coltrane,Robbie Coltrane,14.129,/jOHs3xvlwRiiG2CLtso5zzmGCXg.jpg,7,Rubeus Hagrid,52fe4268c3a36847f801c235,3
4,False,2,5469,Acting,Ralph Fiennes,Ralph Fiennes,62.183,/tJr9GcmGNHhLVVEH3i7QYbj6hBi.jpg,4,Lord Voldemort,52fe4268c3a36847f801c229,4
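
The `gender` column in the cast table holds TMDB's integer codes (0 = not specified, 1 = female, 2 = male, 3 = non-binary). A small sketch of how these codes could be turned into readable labels is shown below. The label strings are an assumption, chosen to resemble the `gender` values in the results tables of this notebook, and the three sample rows are copied from the output above rather than fetched from TMDB.

```python
import pandas as pd

# TMDB gender codes. The label strings are an assumption made for this
# sketch, picked to resemble the "gender" values in the results tables.
TMDB_GENDER_LABELS = {0: "unknown", 1: "woman", 2: "man", 3: "non-binary"}

# Three rows copied from the cast table above.
sample_cast = pd.DataFrame(
    {
        "name": ["Daniel Radcliffe", "Rupert Grint", "Emma Watson"],
        "character": ["Harry Potter", "Ron Weasley", "Hermione Granger"],
        "gender": [2, 2, 1],
    }
)

# Map codes to labels; any unexpected code falls back to "unknown".
sample_cast["gender_label"] = (
    sample_cast["gender"].map(TMDB_GENDER_LABELS).fillna("unknown")
)
print(sample_cast[["character", "gender_label"]])
```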


## Get entities and nouns

In [8]:
%%time
results = extract_person_references_in_srt(srt_list)

Wall time: 8min 54s


In [9]:
results.sample(20)

Unnamed: 0,srt_id,text,start_sec,end_sec,ent,start_idx,end_idx,ent_type,gender
224,163,How can the Ministry\nnot know who conjured it?,875,877,it,8,9,PRON,unknown
1117,737,"...is none other than the\nBulgarian bonbon, V...",3971,3975,Bulgarian,6,7,MISC,unknown
600,417,How did you do it?,2315,2317,it,4,5,PRON,unknown
2455,1578,...when Voldemort's wand and mine\nsort of con...,8896,8900,mine,6,7,PRON,unknown
522,372,"Harry, for goodness sake.",2080,2082,Harry,0,1,PER,unknown
2067,1321,Sonorus!,7143,7145,Sonorus,0,1,PROPN,unknown
1353,881,Because we'd take the mickey out of her\nif sh...,4637,4639,her,8,9,PRON,woman
1231,809,"Yeah, but, then again,\nhe can take himself.",4323,4325,himself,10,11,PRON,man
1621,1044,"Not being a bad boy again,\nare you, Harry?",5592,5594,boy,4,5,NOUN,unknown
1253,825,And I said yes!,4391,4393,I,1,2,PRON,unknown


In [14]:
len(results)

2487
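
With a few thousand rows, a quick breakdown by `ent_type` and `gender` helps show what kinds of references were extracted. The sketch below uses only the ten sampled rows displayed above (not the full `results` frame), so the counts are illustrative and say nothing about the real distribution.

```python
import pandas as pd

# The ten rows of the sample shown above, reduced to three columns.
sample_results = pd.DataFrame(
    {
        "ent": ["it", "Bulgarian", "it", "mine", "Harry",
                "Sonorus", "her", "himself", "boy", "I"],
        "ent_type": ["PRON", "MISC", "PRON", "PRON", "PER",
                     "PROPN", "PRON", "PRON", "NOUN", "PRON"],
        "gender": ["unknown", "unknown", "unknown", "unknown", "unknown",
                   "unknown", "woman", "man", "unknown", "unknown"],
    }
)

# Number of references per entity type.
type_counts = sample_results["ent_type"].value_counts()
print(type_counts)

# Cross-tabulation of entity type against inferred gender.
gender_by_type = pd.crosstab(sample_results["ent_type"], sample_results["gender"])
print(gender_by_type)
```

On the full data, the same two calls could be applied directly to `results`.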

In [12]:
results_with_cast = match_entities_with_cast(results, cast_df)

In [13]:
results_with_cast.sample(20)

Unnamed: 0,srt_id,text,start_sec,end_sec,ent,start_idx,end_idx,ent_type,gender,character_found
292,209,<i>Whether we be old and bald</i>,1169,1171,we,4,5,PRON,unknown,
1222,803,"- Hi, Harry.\n- Hello.",4277,4278,Harry,3,4,PER,man,Harry Potter
785,533,Dean was told by Parvati that...,2828,2831,Parvati,4,5,PER,woman,Parvati Patil
30,15,"Step aside, Wormtail,\nso I can give our guest...",213,219,guest,10,11,NOUN,unknown,
169,125,Harry!,740,742,Harry !,0,2,PER,man,Harry Potter
189,137,- Crime?\n- Barty! They're just kids.,775,777,Barty,4,5,PER,man,Barty Crouch
1023,676,Mr. Krum.,3414,3416,Mr.,0,1,PROPN,unknown,
1963,1262,"Curiosity is not a sin, Harry.\nBut you should...",6855,6860,Harry,6,7,PER,man,Harry Potter
799,538,"Yeah, I brought the cloak.\nHagrid, where are ...",2853,2856,I,2,3,PRON,unknown,
1192,789,"Blimey, Harry. You've slayed dragons.\nIf you ...",4200,4204,date,15,16,NOUN,unknown,
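
To give an intuition for the `character_found` column, here is a minimal sketch of one possible matching rule: an entity is linked to the first character whose name shares a whole word with it. This is an assumption made for illustration, not the actual logic of `match_entities_with_cast`, and `find_character` is a hypothetical helper.

```python
import re
from typing import List, Optional

import pandas as pd


def find_character(entity: str, characters: List[str]) -> Optional[str]:
    """Return the first character sharing a whole word with the entity, else None."""
    entity_tokens = set(re.findall(r"[a-z]+", entity.lower()))
    for character in characters:
        character_tokens = set(re.findall(r"[a-z]+", character.lower()))
        if entity_tokens & character_tokens:
            return character
    return None


# Character names and entities taken from the tables above.
characters = ["Harry Potter", "Parvati Patil", "Barty Crouch"]
sample = pd.DataFrame({"ent": ["Harry", "Parvati", "Harry !", "Mr.", "guest"]})
sample["character_found"] = sample["ent"].apply(
    lambda ent: find_character(ent, characters)
)
print(sample)
```

A rule this simple cannot resolve pronouns or generic nouns, and it picks arbitrarily when several characters share a first name or surname, so it should be read as a baseline only.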
