# character_identification
We can learn the names of characters by getting creative with the subtitle track. Character names are important for NLP-based plot comprehension. We'll also want to identify names and tie them to face and vocal encodings, to persistently track characters throughout the film.

The audience learns the names of characters by listening to the dialogue (except where character names are displayed onscreen, most often in documentaries or docu-dramatizations), so screenwriters know they have to put character names in dialogue. These might take the form of self-introductions, such as "I am Detective Lieutenant Elliot", or subtler hints, such as a line that addresses a character in the second person: "I'm sorry, Marta."

Screenwriters need to drop these hints when the character is introduced. But since we can analyze movies non-chronologically (all at once, in an instant), we can look for these types of clues everywhere.

In previous notebooks, we've demonstrated how to parse and clean subtitles. To keep this notebook focused, we'll just type in the lines of dialogue manually.

In [1]:
import pysrt
import spacy

In [2]:
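# Load the English pipeline. The 'en' shortcut works in spaCy 2.x only;
# spaCy 3+ needs the full package name, e.g. 'en_core_web_sm'.
nlp = spacy.load('en')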

# Introductions
## Self-Introduction
The most basic form of character introduction is the first-person introduction, which may take the form of "I'm Alice", or "My name is Marlowe."

This is a good time to clarify that many of the sentence structures we're looking for will be somewhat hard-coded. The two examples above are very common — there are only so many ways screenwriters can have a character introduce herself.
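To make "hard-coded" concrete before bringing in a parser, here is a minimal sketch that matches the two forms with plain regular expressions. This is a surface-level stand-in, not the spaCy approach used in the rest of this notebook. The names `NAME`, `SELF_INTRO_PATTERNS`, and `regex_self_introduction` are hypothetical, and the sketch assumes a name is one or more capitalized words.

```python
import re

# Hypothetical surface patterns for the two self-introduction forms.
# Assumption: a name is one or more capitalized words separated by spaces.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
SELF_INTRO_PATTERNS = [
    re.compile(r"\bI(?:'m| am) " + NAME),    # "I'm Alice", "I am Alice"
    re.compile(r"\b[Mm]y name is " + NAME),  # "My name is Marlowe"
]


def regex_self_introduction(line):
    """Return the captured name if the line matches a pattern, else None."""
    for pattern in SELF_INTRO_PATTERNS:
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None


print(regex_self_introduction("Hey, I'm Vlad."))       # Vlad
print(regex_self_introduction("My name is Marlowe."))  # Marlowe
print(regex_self_introduction("I'm sorry, Marta."))    # None
```

A pattern like this cannot tell a name from any other capitalized word: "I am Detective Lieutenant Elliot" yields the title along with the name, and "I'm American." would match too. Part-of-speech and dependency labels can help filter out cases like these.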

In [3]:
sent = "Hey, I'm Vlad."
sent_doc = nlp(sent)

In [4]:
for token in sent_doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_)

Hey hey INTJ UH intj Xxx
, , PUNCT , punct ,
I -PRON- PRON PRP nsubj X
'm be AUX VBP ROOT 'x
Vlad Vlad PROPN NNP attr Xxxx
. . PUNCT . punct .


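spaCy splits this simple three-word sentence into six tokens, because the punctuation marks and the contraction 'm each count as a token. Vlad is correctly labeled PROPN, a proper noun. Its dependency label is attr, the attribute of the root verb 'm (lemma be), whose subject (nsubj) is I.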
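The labels in the printed parse suggest a simple rule for this form of introduction: a first-person subject, a root verb with lemma "be", and a proper noun labeled attr. The sketch below applies that rule to the token attributes copied from the output above, so it runs without a spaCy model. The `Tok` tuple and `find_self_introduction` are hypothetical helpers, and the rule is a simplification that only checks that the three labels occur in the same sentence.

```python
from typing import NamedTuple, Optional, Sequence


class Tok(NamedTuple):
    """Hypothetical stand-in for the spaCy token attributes printed above."""
    text: str
    lemma: str
    pos: str
    dep: str


def find_self_introduction(tokens: Sequence[Tok]) -> Optional[str]:
    """Return the introduced name if the tokens look like "I'm <Name>"."""
    has_first_person_subject = any(
        t.dep == "nsubj" and t.text.lower() == "i" for t in tokens
    )
    has_be_root = any(t.dep == "ROOT" and t.lemma == "be" for t in tokens)
    if not (has_first_person_subject and has_be_root):
        return None
    # Simplification: accept the first proper noun labeled as the attribute.
    for t in tokens:
        if t.pos == "PROPN" and t.dep == "attr":
            return t.text
    return None


# Token attributes copied from the printed parse of "Hey, I'm Vlad."
vlad_tokens = [
    Tok("Hey", "hey", "INTJ", "intj"),
    Tok(",", ",", "PUNCT", "punct"),
    Tok("I", "-PRON-", "PRON", "nsubj"),
    Tok("'m", "be", "AUX", "ROOT"),
    Tok("Vlad", "Vlad", "PROPN", "attr"),
    Tok(".", ".", "PUNCT", "punct"),
]

print(find_self_introduction(vlad_tokens))  # Vlad
```

With a live parse, the same list could be built as `[Tok(t.text, t.lemma_, t.pos_, t.dep_) for t in sent_doc]`. A fuller version would also compare `token.head` values, to confirm that the subject and the attribute hang off the same verb.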