# Building the Dreamworks Data Frame
This code is very similar to the code in my Disney dataframe notebooke, found [here](https://github.com/Data-Science-for-Linguists-2019/Animated-Movie-Gendered-Dialogue/blob/master/code/Disney_code/Disney_Data_Complete_Edits_Good.ipynb).

I will be labeling gender and roles in each Dreamworks movie. Because the Dreamworks films widely don't feature singing. The song function will be used rarely, if at all. The nine movies dataframes to annotate are: Antz, Shrek, Shrek the Third, How to Train Your Dragon, How to Train Your Dragon 2, The Croods, Rise of the Guardians, Kung Fu Panda, and Megamind. I'll need to add...
* Gender
* Role
* Speaker Status (princess, prince, or neither)
* Dialogue Labels

After adding all this, I'll add
* Movie name
* Year of release
* Utterance Numbers

Finally, I concatenate all these together, and add a Disney_Period column. However, instead of adding a period, I'll simply label "Dreamworks", so I can easily split Disney and Dreamworks data later.

## Necessary Functions

In [1]:
#Gender Label
def whichgen(name, female, male):
    if name in female: return 'f'
    elif name in male: return 'm'
    else: return 'n'

#Role Label
def whichrole(name, pro, ant, helper):
    if name in pro: return 'PRO'
    if name in ant: return 'ANT'
    if name in helper: return 'HELPER'
    else: return "N" #for neutral

#Dialogue Label
def is_dialogue(line, song):
    if line in song: return 'S'
    else: return 'D'
    
#Status Label
def status(name, princess, prince):
    if name in princess: return 'PRINCESS'
    if name in prince: return 'PRINCE'
    else: return 'NON-P'

In [2]:
import functools
from functools import partial

In [3]:
import pandas as pd

In [4]:
%pprint

Pretty printing has been turned OFF


## Antz

In [5]:
antz_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/antz_lines.pkl")

In [6]:
antz_df.head()

Unnamed: 0,Speaker,Text
0,Z,"All my life, I've lived and worked in the big..."
1,Z,"...which is kind of a problem, since I've alwa..."
2,Z,I feel...isolated. Different. I've got aband...
3,MOTIVATIONAL COUNSELLOR,Terrific! You should feel insignificant!
4,Z,...I should?


In [7]:
# Need to lower all entries
antz_df.Speaker = antz_df.Speaker.str.lower().str.strip()

In [8]:
antz_df.Text = antz_df.Text.str.lower().str.strip()

In [9]:
antz_df.head()

Unnamed: 0,Speaker,Text
0,z,"all my life, i've lived and worked in the big ..."
1,z,"...which is kind of a problem, since i've alwa..."
2,z,i feel...isolated. different. i've got aband...
3,motivational counsellor,terrific! you should feel insignificant!
4,z,...i should?


In [10]:
sorted(antz_df.Speaker.unique())

['all', 'ant officer', 'ant soldiers', 'aphids', 'azteca', 'bala', 'barbatus', 'barker', 'bartender', 'beetle', 'butterfly', 'carpenter', 'colonel', 'commando #1', 'commando ant', 'commando ant #1', 'commando ant #3', 'cricket', 'drunk scout', 'excited ants', 'female wasp', 'fly', 'foreman', 'formica', 'general formica', 'guard ant', 'handmaiden #1', 'handmaiden #2', 'ladybug', 'loud voice', 'major mandible', 'male wasp', 'mandible', 'motivational counsellor', 'officer', 'princess', 'queen', 'soldier', 'soldier #2', 'soldier ant', 'soldier ants', 'soldiers', 'the wasps', 'tough voice', 'tracker ant', 'wasp', 'weaver', 'worker', 'worker #1', 'worker #2', 'worker #3', 'worker #4', 'worker ant #1', 'worker ants', 'workers', 'z']

### Gender

In [11]:
antz_female = ['azteca', 'bala', 'female wasp', 'handmaiden #1', 'handmaiden #2', 'princess', 'queen']
antz_male = ['z', 'motivational counsellor', 'ant soldiers', 'barbatus', 'major mandible', 'male wasp', 'mandible', 'weaver',
         'drunk scout', 'foreman', 'formica', 'general formica', 'bartender']
#the genders of several other characters are ambiguous

In [12]:
gender_func = partial(whichgen, female = antz_female, male= antz_male)
antz_df["Gender"] = antz_df.Speaker.map(gender_func)

### Role

In [13]:
antz_pro = ['z', 'bala', 'princess']
antz_ant = ['mandible', 'major mandible']
antz_helper = ['weaver', 'azteca', 'barbatus', 'formica', 'general formica']

In [14]:
role_func = partial(whichrole, pro = antz_pro, ant=antz_ant, helper = antz_helper)
antz_df["Role"] = antz_df.Speaker.map(role_func)

### Status

In [15]:
antz_princess = ['bala, princess']
antz_prince = []

In [16]:
status_func = partial(status, princess = antz_princess, prince = antz_prince)
antz_df['Speaker_Status'] = antz_df.Speaker.map(status_func)

### Song
Z sings Almost Like Being in Love at one point in the movie, but it doesn't seem to be in the script.

In [17]:
antz_song_words = ['in love', 'almost like']
for line in antz_song_words:
    print(line)
    print(antz_df[["Speaker", "Text"]][antz_df.Text.str.contains(line, regex = False)])
    print('\n')

in love
     Speaker                                               Text
260        z                          so...you two are in love?
261  formica  in love?  i'm just a plain old soldier at hear...


almost like
Empty DataFrame
Columns: [Speaker, Text]
Index: []




### Song/Movie/Year/Utterance Number

In [18]:
antz_df['Song'] = 'D'
antz_df['Movie'] = 'Antz'
antz_df['Year'] = 1998
antz_df['UTTERANCE_NUMBER'] = antz_df.Text.index + 1

In [19]:
antz_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 644 entries, 0 to 643
Data columns (total 9 columns):
Speaker             644 non-null object
Text                644 non-null object
Gender              644 non-null object
Role                644 non-null object
Speaker_Status      644 non-null object
Song                644 non-null object
Movie               644 non-null object
Year                644 non-null int64
UTTERANCE_NUMBER    644 non-null int64
dtypes: int64(2), object(7)
memory usage: 27.7+ KB


In [20]:
antz_df.sample(5)

Unnamed: 0,Speaker,Text,Gender,Role,Speaker_Status,Song,Movie,Year,UTTERANCE_NUMBER
101,bala,"i'm just a common worker, cooling off after a ...",f,PRO,NON-P,D,Antz,1998,102
553,bala,z! you came back!,f,PRO,NON-P,D,Antz,1998,554
393,formica,no one should have to. have him brought to me.,m,HELPER,NON-P,D,Antz,1998,394
239,azteca,are you asking me out to dinner?,f,HELPER,NON-P,D,Antz,1998,240
45,queen,"bala has always been a hopeless romantic, gene...",f,N,NON-P,D,Antz,1998,46


## Shrek

In [21]:
shrek_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/shrek_lines.pkl")

In [22]:
shrek_df.head()

Unnamed: 0,Speaker,Text,Speaker_Status,Movie,Year,UTTERANCE_NUMBER
0,shrek,once upon a time there was a lovely princess. ...,NON-P,Shrek,2001,1
1,man1,think it's in there?,NON-P,Shrek,2001,2
2,man2,all right. let's get it!,NON-P,Shrek,2001,3
3,man1,whoa. hold on. do you know what that thing can...,NON-P,Shrek,2001,4
4,man3,"yeah, it'll grind your bones for it's bread.",NON-P,Shrek,2001,5


In [23]:
sorted(shrek_df.Speaker.unique())

['big bad wolf', 'blind mouse1', 'blind mouse2', 'congregation', 'crowd', 'donkey', 'dwarf', 'farquaad', 'fiona', 'gingerbread man', 'gipetto', 'gordo', 'guard', 'guards', 'head guard', 'little bear', 'little pig', 'man', 'man1', 'man2', 'man3', 'men', 'merry man', 'merry men', 'mirror', 'old woman', 'peter pan', 'pinocchio', 'priest', 'robin hood', 'shrek', 'shrek & fiona', 'thelonius', 'whispers', 'woman', 'wooden people']

### Gender

In [24]:
shrek_male = ['big bad wolf', 'blind mouse1', 'blind mouse2', 'donkey', 'dwarf', 'farquaad', 'gingerbread man',
             'gipetto', 'gordo', 'guard', 'guards', 'head guard', 'little bear', 'little pig', 'man', 'man1', 'man2',
             'man3', 'men', 'merry man', 'merry men', 'mirror', 'peter pan', 'pinocchio', 'priest', 'robin hood', 'shrek',
             'thelonius']
shrek_female = ['fiona', 'old woman', 'woman']


In [25]:
gender_func = partial(whichgen, female = shrek_female, male= shrek_male)
shrek_df["Gender"] = shrek_df.Speaker.map(gender_func)

## Role

In [26]:
shrek_pro = ['shrek', 'fiona', 'shrek & fiona']
shrek_ant = ['farquaad', 'thelonius', 'head guard', 'guard', 'guards']
shrek_helper = ['donkey']

In [27]:
role_func = partial(whichrole, pro = shrek_pro, ant=shrek_ant, helper = shrek_helper)
shrek_df["Role"] = shrek_df.Speaker.map(role_func)

### Song
There's a part where Fiona sings (and a bird explodes lol) but the script doesn't include it. What about the 'wooden people'?

In [28]:
shrek_df[shrek_df.Speaker == 'wooden people']

Unnamed: 0,Speaker,Text,Speaker_Status,Movie,Year,UTTERANCE_NUMBER,Gender,Role
169,wooden people,welcome to duloc such a perfect town,NON-P,Shrek,2001,170,n,N


In [29]:
shrek_df.iloc[168:175] # it's just the beginning of their song, so no need to mark.

Unnamed: 0,Speaker,Text,Speaker_Status,Movie,Year,UTTERANCE_NUMBER,Gender,Role
168,donkey,"hey, look at this!",NON-P,Shrek,2001,169,m,HELPER
169,wooden people,welcome to duloc such a perfect town,NON-P,Shrek,2001,170,n,N
170,donkey,wow! let's do that again!,NON-P,Shrek,2001,171,m,HELPER
171,shrek,"no. no. no, no, no! no.",NON-P,Shrek,2001,172,m,PRO
172,farquaad,brave knights. you are the best and brightest ...,PRINCE,Shrek,2001,173,m,ANT
173,shrek,all right. you're going the right way for a sm...,NON-P,Shrek,2001,174,m,PRO
174,donkey,sorry about that.,NON-P,Shrek,2001,175,m,HELPER


In [30]:
shrek_df['Song'] = 'D'

In [31]:
shrek_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 722 entries, 0 to 721
Data columns (total 9 columns):
Speaker             722 non-null object
Text                722 non-null object
Speaker_Status      722 non-null object
Movie               722 non-null object
Year                722 non-null int64
UTTERANCE_NUMBER    722 non-null int64
Gender              722 non-null object
Role                722 non-null object
Song                722 non-null object
dtypes: int64(2), object(7)
memory usage: 31.1+ KB


## Shrek The Third

In [32]:
shrek3_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/shrek3_lines.pkl")

In [33]:
shrek3_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 870 entries, 0 to 869
Data columns (total 2 columns):
Speaker    870 non-null object
Text       870 non-null object
dtypes: object(2)
memory usage: 6.8+ KB


In [34]:
shrek3_df.head()

Unnamed: 0,Speaker,Text
0,prince charming,"onward chauncey, to the highest room of the ta..."
1,gingerbread man,this is worse than love letters! i hate dinner...
2,pinocchio,me too.
3,prince charming,"whoa there, chauncey!"
4,actress,hark! the brave prince charming approach-ith.


### Gender

In [35]:
sorted(shrek3_df.Speaker.unique())

['actress', 'all', 'announcer', 'artie', 'audience', 'bohort', 'captain hook', 'cheerleaders', 'cinderella', 'cyclops', 'donkey', 'doris', 'dragon', 'drivers ed instructor', 'evil dwarfs', 'evil queen', 'evil trees', 'evil witches', 'fairytale creatures', 'fiddlesworth', 'fiona', 'gary', 'gingerbread man', 'guineverre', 'hall monitor', 'headless horseman', 'heckler', 'hook', 'jocks', 'king harold', 'knight', 'lancelot', 'mabel', 'master of ceremonies', 'merlin', 'mother', 'muffin man', 'nanny dwarf', 'ogre', 'ogre babies', 'ogre baby', 'old lady', 'other pirates', 'pigs', 'pinocchio', 'pirates', 'prince charming', 'princesses', 'principal pynchley', 'puppet master', 'puss', 'queen', 'rapunzel', 'raul', 'rumplestiltskin', 'ship captain', 'shrek', 'sleeping beauty', 'snow white', 'students', 'teacher', 'teenager', 'tiffany', 'van student', 'villains', 'waiter', 'wizard head', 'wolf', 'woman', 'xavier']

In [36]:
shrek3_female = ['actress', 'cheerleaders', 'cinderella', 'doris', 'dragon', 'evil queen', 'evil witches', 'fiona',
                'guineverre', 'mabel', 'mother', 'old lady', 'princesses', 'queen', 'rapunzel', 'sleeping beauty', 
                'snow white', 'tiffany', 'woman']
shrek3_male = ['artie', 'bohort', 'captain hook', 'cyclops', 'donkey', 'evil dwarfs', 'evil trees', 'fiddlesworth',
              'gary', 'gingerbread man', 'hall monitor', 'headless horseman', 'hook', 'jocks', 'king harold', 'knight',
              'lancelot', 'master of ceremonies', 'merlin', 'muffin man', 'nanny dwarf', 'pigs', 'pinocchio', 
              'prince charming', 'principal pynchley', 'puppet master', 'puss', 'raul', 'rumplestiltskin', 'ship captain',
              'shrek', 'wizard head', 'wolf', 'xavier']

In [37]:
gender_func = partial(whichgen, female = shrek3_female, male= shrek3_male)
shrek3_df["Gender"] = shrek3_df.Speaker.map(gender_func)

## Role
Why do I list so many more helpers? These characters play bigger roles in this film (the first movie doesn't feature the pigs or pinocchio, etc.., much besides in the beginning

In [38]:
shrek3_pro = ['shrek', 'fiona', 'artie']
shrek3_ant = ['prince charming', 'rapunzel']
shrek3_helper = ['donkey', 'cinderella', 'doris', 'dragon', 'gingerbread man', 'merlin', 'pigs', 'pinocchio', 'puss',
                'sleeping beauty', 'snow white', 'wolf', 'princesses']

In [39]:
role_func = partial(whichrole, pro = shrek3_pro, ant=shrek3_ant, helper = shrek3_helper)
shrek3_df["Role"] = shrek3_df.Speaker.map(role_func)

### Speaker Status

In [40]:
shrek3_princess  = ['cinderella', 'fiona','rapunzel','snow white', 'sleeping beauty', 'princesses']
shrek3_prince = ['prince charming', 'artie']

In [41]:
status_func = partial(status, princess = shrek3_princess, prince = shrek3_prince)
shrek3_df['Speaker_Status'] = shrek3_df.Speaker.map(status_func)

### Song/Movie/Year/Utterance Number

In [42]:
shrek3_df['Song'] = 'D'

In [43]:
shrek3_df['Movie'] = 'Shrek 3'

In [44]:
shrek3_df['Year'] = 2007

In [45]:
shrek3_df['UTTERANCE_NUMBER'] = shrek3_df.Text.index + 1

In [46]:
shrek3_df.head()

Unnamed: 0,Speaker,Text,Gender,Role,Speaker_Status,Song,Movie,Year,UTTERANCE_NUMBER
0,prince charming,"onward chauncey, to the highest room of the ta...",m,ANT,PRINCE,D,Shrek 3,2007,1
1,gingerbread man,this is worse than love letters! i hate dinner...,m,HELPER,NON-P,D,Shrek 3,2007,2
2,pinocchio,me too.,m,HELPER,NON-P,D,Shrek 3,2007,3
3,prince charming,"whoa there, chauncey!",m,ANT,PRINCE,D,Shrek 3,2007,4
4,actress,hark! the brave prince charming approach-ith.,f,N,NON-P,D,Shrek 3,2007,5


## How to Train Your Dragon

In [47]:
httyd_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/httyd_lines.pkl")

In [48]:
httyd_df.head()

Unnamed: 0.1,Unnamed: 0,Speaker,Text
0,0,hiccup,"this, is berk. it's twelve days north of hopel..."
1,1,hiccup,"my village. in a word, sturdy. and it's been h..."
2,2,hiccup,"we have fishing, hunting, and a charming view ..."
3,3,hiccup,...dragons.
4,4,hiccup,most people would leave. not us. we're vikings...


In [49]:
httyd_df = httyd_df.drop(columns="Unnamed: 0")

In [50]:
httyd_df.head()

Unnamed: 0,Speaker,Text
0,hiccup,"this, is berk. it's twelve days north of hopel..."
1,hiccup,"my village. in a word, sturdy. and it's been h..."
2,hiccup,"we have fishing, hunting, and a charming view ..."
3,hiccup,...dragons.
4,hiccup,most people would leave. not us. we're vikings...


### Gender

In [51]:
sorted(httyd_df.Speaker.unique())

['astrid', 'burnthair', 'catapult operator', 'fishlegs', 'gobber', 'hiccup', 'hiccup stoick', 'hoark', 'phlegma the fierce', 'ruffnut', 'snotlout', 'spitelout', 'stoick', 'stoick hiccup', 'teens', 'tuffnut', 'tuffnut ruffnut', 'viking', 'viking #1', 'viking #2', 'viking #3', 'viking #4', 'viking #5', 'viking #6', 'viking #7', 'viking in crowd', 'vikings']

In [52]:
httyd_female = ['astrid', 'phlegma the fierce', 'ruffnut']
httyd_male = ['burnthair', 'fishlegs', 'gobber', 'hiccup', 'hiccup stoick', 'hoark', 'snotlout', 'spitelout', 'stoick',
             'stoick hiccup', 'tuffnut']

In [53]:
gender_func = partial(whichgen, female = httyd_female, male= httyd_male)
httyd_df["Gender"] = httyd_df.Speaker.map(gender_func)

### Role

In [54]:
httyd_pro = ['hiccup', 'astrid']
httyd_ant = ['stoick']
httyd_helper = ['fishlegs', 'snotlout', 'tuffnut', 'ruffnut', 'tuffnut ruffnut', 'teens']

In [55]:
role_func = partial(whichrole, pro = httyd_pro, ant=httyd_ant, helper = httyd_helper)
httyd_df["Role"] = httyd_df.Speaker.map(role_func)

### Status
No royalty here!

In [56]:
httyd_df['Speaker_Status'] = 'NON-P'

### Song/Movie/Year/Utterance Number
No songs!

In [57]:
httyd_df['Song'] = 'D'

In [58]:
httyd_df['Movie'] = 'How to Train Your Dragon'

In [59]:
httyd_df['Year'] = 2010

In [60]:
httyd_df.reset_index(drop = True, inplace=True)

In [61]:
httyd_df['UTTERANCE_NUMBER'] = httyd_df.Text.index + 1

In [62]:
httyd_df.head()

Unnamed: 0,Speaker,Text,Gender,Role,Speaker_Status,Song,Movie,Year,UTTERANCE_NUMBER
0,hiccup,"this, is berk. it's twelve days north of hopel...",m,PRO,NON-P,D,How to Train Your Dragon,2010,1
1,hiccup,"my village. in a word, sturdy. and it's been h...",m,PRO,NON-P,D,How to Train Your Dragon,2010,2
2,hiccup,"we have fishing, hunting, and a charming view ...",m,PRO,NON-P,D,How to Train Your Dragon,2010,3
3,hiccup,...dragons.,m,PRO,NON-P,D,How to Train Your Dragon,2010,4
4,hiccup,most people would leave. not us. we're vikings...,m,PRO,NON-P,D,How to Train Your Dragon,2010,5


In [63]:
httyd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 726 entries, 0 to 725
Data columns (total 9 columns):
Speaker             726 non-null object
Text                726 non-null object
Gender              726 non-null object
Role                726 non-null object
Speaker_Status      726 non-null object
Song                726 non-null object
Movie               726 non-null object
Year                726 non-null int64
UTTERANCE_NUMBER    726 non-null int64
dtypes: int64(2), object(7)
memory usage: 31.2+ KB


## How to Train Your Dragon 2

In [64]:
httyd2_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/httyd2_lines.pkl")

In [65]:
httyd2_df.head()

Unnamed: 0,Speaker,Text
0,HICCUP,This... is Berk. The best kept secret this sid...
1,HICCUP,"Life here is amazing, just not for the faint o..."
2,HICCUP,DRAGON RACING!
3,SNOTLOUT,"Oh, I'm sorry, Fishlegs! Did you want that?"
4,FISHLEGS,Snotlout! That's mine!


In [66]:
httyd2_df.Speaker = httyd2_df.Speaker.str.lower().str.strip()

In [67]:
httyd2_df.Text = httyd2_df.Text.str.lower().str.strip()

In [68]:
httyd2_df.head()

Unnamed: 0,Speaker,Text
0,hiccup,this... is berk. the best kept secret this sid...
1,hiccup,"life here is amazing, just not for the faint o..."
2,hiccup,dragon racing!
3,snotlout,"oh, i'm sorry, fishlegs! did you want that?"
4,fishlegs,snotlout! that's mine!


### Gender

In [69]:
sorted(httyd2_df.Speaker.unique())

['archer', 'astrid', 'drago', 'eret', 'fishlegs', 'gobber', 'hiccup', 'hoark', 'ruffnut', 'snotlout', 'soldier', 'starkard', 'stoick', 'teeny', 'tuffnut', 'tuffnut ruffnut', 'ug', 'valka', 'warrior']

In [70]:
httyd2_female = ['astrid', 'valka', 'ruffnut']
httyd2_male = ['drago', 'eret', 'fishlegs', 'gobber', 'hiccup', 'hoark', 'snotlout', 'starkard', 'stoick', 
               'teeny', 'tuffnut','ug']

In [71]:
gender_func = partial(whichgen, female = httyd2_female, male= httyd2_male)
httyd2_df["Gender"] = httyd2_df.Speaker.map(gender_func)

### Role
In this film, stoick supports hiccup, and the villagers who were apathetic towards him are supportive

In [72]:
httyd2_pro = ['hiccup', 'astrid']
httyd2_ant = ['drago', 'eret', 'ug']
httyd2_helper = ['fishlegs', 'gobber', 'hoark', 'ruffnut', 'snotlout', 'stoick', 'tuffnut', 'tuffnut ruffnut', 'valka']

In [73]:
role_func = partial(whichrole, pro = httyd2_pro, ant=httyd2_ant, helper = httyd2_helper)
httyd2_df["Role"] = httyd2_df.Speaker.map(role_func)

### Status
again, no royalty!

In [74]:
httyd2_df['Speaker_Status'] = 'NON-P'

### Song
There is a song in this script, called "For the Dancing and Dreaming". Valka and Stoick sing it

In [75]:
httyd2_song_words = ['scorching sun', 'stop me on', 'promise me your heart', 'love me for eternity', 'my darling dear',
                    'rings of gold', 'from all harm', 'dancing and the dreaming', 'swim and sail']
for line in httyd2_song_words:
    print(line)
    print(httyd2_df[["Speaker", "Text"]][httyd2_df.Text.str.contains(line, regex = False)])
    print('\n')

scorching sun
    Speaker                                 Text
485  stoick  no scorching sun, nor freezing cold


stop me on
    Speaker                                   Text
486  gobber  -- will stop me on my journey! sorry.


promise me your heart
    Speaker                                  Text
487  stoick  if you will promise me your heart...


love me for eternity
    Speaker                       Text
489   valka  and love me for eternity.


my darling dear
    Speaker                                               Text
490   valka  my dearest one, my darling dear, you mighty wo...


rings of gold
    Speaker                                               Text
491  stoick  but i would bring you rings of gold. i'd even ...
493   valka  i have no use for rings of gold. i care not fo...


from all harm
    Speaker                                               Text
492  stoick  and i would keep you from all harm, if you'd s...


dancing and the dreaming
    Speaker                   

In [76]:
httyd2_df.Text.iloc[482:496]

482                                 oh, i love this one!
483                              remember our song, val?
484    i'll swim and sail on savage seas, with ne'er ...
485                  no scorching sun, nor freezing cold
486                -- will stop me on my journey! sorry.
487                 if you will promise me your heart...
488                                          and love...
489                            and love me for eternity.
490    my dearest one, my darling dear, you mighty wo...
491    but i would bring you rings of gold. i'd even ...
492    and i would keep you from all harm, if you'd s...
493    i have no use for rings of gold. i care not fo...
494    c'mon, hiccup! valka, stoick, & gobber to love...
495                                   i'm still going...
Name: Text, dtype: object

In [77]:
httyd2_song_list = list(range(484,495))

In [78]:
song_func = partial(is_dialogue, song = httyd2_song_list)
httyd2_df["Song"] = httyd2_df.index.map(song_func)

### Movie/Year/Utterance Number

In [79]:
httyd2_df['Movie'] = 'How to Train Your Dragon 2'
httyd2_df['Year'] = 2014
httyd2_df['UTTERANCE_NUMBER'] = httyd2_df.Text.index + 1

In [80]:
httyd2_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 708 entries, 0 to 707
Data columns (total 9 columns):
Speaker             708 non-null object
Text                708 non-null object
Gender              708 non-null object
Role                708 non-null object
Speaker_Status      708 non-null object
Song                708 non-null object
Movie               708 non-null object
Year                708 non-null int64
UTTERANCE_NUMBER    708 non-null int64
dtypes: int64(2), object(7)
memory usage: 30.5+ KB


In [81]:
httyd2_df.head()

Unnamed: 0,Speaker,Text,Gender,Role,Speaker_Status,Song,Movie,Year,UTTERANCE_NUMBER
0,hiccup,this... is berk. the best kept secret this sid...,m,PRO,NON-P,D,How to Train Your Dragon 2,2014,1
1,hiccup,"life here is amazing, just not for the faint o...",m,PRO,NON-P,D,How to Train Your Dragon 2,2014,2
2,hiccup,dragon racing!,m,PRO,NON-P,D,How to Train Your Dragon 2,2014,3
3,snotlout,"oh, i'm sorry, fishlegs! did you want that?",m,HELPER,NON-P,D,How to Train Your Dragon 2,2014,4
4,fishlegs,snotlout! that's mine!,m,HELPER,NON-P,D,How to Train Your Dragon 2,2014,5


## The Croods

In [82]:
croods_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/croods_lines.pkl")

In [83]:
croods_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 772 entries, 0 to 771
Data columns (total 2 columns):
Speaker    772 non-null object
Text       771 non-null object
dtypes: object(2)
memory usage: 6.1+ KB


In [84]:
croods_df[croods_df.Text.isnull()]

Unnamed: 0,Speaker,Text
257,EEP_,


In [85]:
croods_df.dropna(axis = 0, inplace = True)

In [86]:
croods_df.reset_index(drop=True, inplace=True)

In [87]:
croods_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 771 entries, 0 to 770
Data columns (total 2 columns):
Speaker    771 non-null object
Text       771 non-null object
dtypes: object(2)
memory usage: 6.1+ KB


### Gender

In [88]:
sorted(croods_df.Speaker.unique())

['BELT', 'CHUNKY', 'CREATURE', 'CROODS', 'EEP', 'ERF ERF', 'GRAN', 'GRUG', 'GUY', 'MACAWNIVORE', 'SANDY', 'THUNK', 'UGA', 'UGGA']

In [89]:
croods_df.Speaker = croods_df.Speaker.str.lower().str.strip()
croods_df.Text = croods_df.Text.str.lower().str.strip()

In [90]:
croods_df.head()

Unnamed: 0,Speaker,Text
0,eep,with every sun comes a new day. a new beginnin...
1,eep,"but not for me. my name's eep. this, is my fam..."
2,eep,we were the last ones around. there used to be...
3,eep,because of my dad. he was strong... and he fol...
4,eep,... the ones painted on the cave walls. anythi...


In [91]:
sorted(croods_df.Speaker.unique())

['belt', 'chunky', 'creature', 'croods', 'eep', 'erf erf', 'gran', 'grug', 'guy', 'macawnivore', 'sandy', 'thunk', 'uga', 'ugga']

In [92]:
## uga is a typo of ugga
croods_df.Speaker.replace("uga", "ugga", inplace = True)

In [93]:
sorted(croods_df.Speaker.unique())

['belt', 'chunky', 'creature', 'croods', 'eep', 'erf erf', 'gran', 'grug', 'guy', 'macawnivore', 'sandy', 'thunk', 'ugga']

In [94]:
croods_df[croods_df.Speaker == 'erf erf']

Unnamed: 0,Speaker,Text
259,erf erf,glaaabbbllllellelller!


In [95]:
croods_df.iloc[259]

Speaker                   erf erf
Text       glaaabbbllllellelller!
Name: 259, dtype: object

In [96]:
croods_df.drop(index = 259, inplace = True)

In [97]:
croods_df.reset_index(drop=True, inplace = True)

In [98]:
croods_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 770 entries, 0 to 769
Data columns (total 2 columns):
Speaker    770 non-null object
Text       770 non-null object
dtypes: object(2)
memory usage: 6.1+ KB


In [99]:
sorted(croods_df.Speaker.unique())

['belt', 'chunky', 'creature', 'croods', 'eep', 'gran', 'grug', 'guy', 'macawnivore', 'sandy', 'thunk', 'ugga']

In [100]:
croods_female = ['eep', 'gran', 'ugga', 'sandy']
croods_male = ['belt', 'grug', 'guy', 'thunk', 'chunky']

In [101]:
gender_func = partial(whichgen, female = croods_female, male= croods_male)
croods_df["Gender"] = croods_df.Speaker.map(gender_func)

### Role
another movie in which a daughter rebels against her dad...making the antagonist/protagnist line very blurry.

In [102]:
croods_pro = ['eep', 'guy']
croods_ant = ['grug', 'chunky', 'macawnivore']
croods_helper = ['belt', 'thunk', 'gran', 'ugga', 'sandy']

In [103]:
role_func = partial(whichrole, pro = croods_pro, ant=croods_ant, helper = croods_helper)
croods_df["Role"] = croods_df.Speaker.map(role_func)

### Status/Song/Movie/Year/Utterance number

In [104]:
croods_df['Speaker_Status'] = 'NON-P'
croods_df['Song'] = 'D'
croods_df['Movie'] = 'The Croods'
croods_df['Year'] = 2013
croods_df['UTTERANCE_NUMBER'] = croods_df.Text.index + 1

In [105]:
croods_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 770 entries, 0 to 769
Data columns (total 9 columns):
Speaker             770 non-null object
Text                770 non-null object
Gender              770 non-null object
Role                770 non-null object
Speaker_Status      770 non-null object
Song                770 non-null object
Movie               770 non-null object
Year                770 non-null int64
UTTERANCE_NUMBER    770 non-null int64
dtypes: int64(2), object(7)
memory usage: 33.1+ KB


In [106]:
croods_df.sample(5)

Unnamed: 0,Speaker,Text,Gender,Role,Speaker_Status,Song,Movie,Year,UTTERANCE_NUMBER
356,gran,i don't blame it.,f,HELPER,NON-P,D,The Croods,2013,357
355,thunk,ow! something bit me!,m,HELPER,NON-P,D,The Croods,2013,356
140,guy,it's dying. i can fix it!,m,PRO,NON-P,D,The Croods,2013,141
299,sandy,"weeeee!!!! gran, who is still trying to put ou...",f,HELPER,NON-P,D,The Croods,2013,300
277,guy,"you know, you're a lot like your daughter.",m,PRO,NON-P,D,The Croods,2013,278


## Rise of the Guardians

In [107]:
rotg_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/rotg_lines.pkl")

In [108]:
rotg_df.head()

Unnamed: 0,Speaker,Text
0,JACK,Darkness. That's the first thing I remember. I...
1,JACK,But then...then I saw the moon. It was so big ...
2,JACK,"Why I was there, and what I was meant to do - ..."
3,JACK,"Hello. Hello. Good evening, ma'am. Ma'am?"
4,JACK,"Oh, ah, excuse me, can you tell me where I am?"


In [109]:
rotg_df.Speaker = rotg_df.Speaker.str.lower().str.strip()
rotg_df.Text = rotg_df.Text.str.lower().str.strip()

In [110]:
rotg_df.head()

Unnamed: 0,Speaker,Text
0,jack,darkness. that's the first thing i remember. i...
1,jack,but then...then i saw the moon. it was so big ...
2,jack,"why i was there, and what i was meant to do - ..."
3,jack,"hello. hello. good evening, ma'am. ma'am?"
4,jack,"oh, ah, excuse me, can you tell me where i am?"


In [111]:
rotg_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 930 entries, 0 to 929
Data columns (total 2 columns):
Speaker    930 non-null object
Text       928 non-null object
dtypes: object(2)
memory usage: 7.3+ KB


In [112]:
rotg_df[rotg_df.Text.isnull()]

Unnamed: 0,Speaker,Text
686,baby tooth_,
929,white_,


In [113]:
rotg_df.dropna(axis=0, inplace=True)

In [114]:
rotg_df.reset_index(drop=True, inplace=True)

In [115]:
rotg_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 928 entries, 0 to 927
Data columns (total 2 columns):
Speaker    928 non-null object
Text       928 non-null object
dtypes: object(2)
memory usage: 7.3+ KB


### Gender

In [116]:
sorted(rotg_df.Speaker.unique())

['boy', 'british boy', 'british girl', 'british kids', 'bunny', 'bunnymund', 'caleb', 'claude', 'cupcake', 'jack', "jack's mother", "jack's sister", 'jamie', "jamie's mom", 'kids', 'mom', 'monty', 'north', 'pippa', 'pitch', 'residents', 'russian boy', 'sister', 'sophie', 'tooth', 'walker', 'yeti', 'yetis']

In [117]:
rotg_female = ['british girl', 'cupcake', "jack's mother", "jack's sister", "jamie's mom", 'mom', 'pippa', 'sister',
              'sophie', 'tooth']
rotg_male = ['boy', 'british boy', 'bunny', 'bunnymund', 'caleb', 'claude', 'jack', 'jamie', 'monty', 'north', 'pitch',
            'russian boy', 'walker', 'yeti']

In [118]:
gender_func = partial(whichgen, female = rotg_female, male= rotg_male)
rotg_df["Gender"] = rotg_df.Speaker.map(gender_func)

### Role

In [119]:
rotg_pro = ['jack']
rotg_ant = ['pitch']
rotg_helper = ['bunny', 'bunnymund', 'north', 'tooth']

In [120]:
role_func = partial(whichrole, pro = rotg_pro, ant=rotg_ant, helper = rotg_helper)
rotg_df["Role"] = rotg_df.Speaker.map(role_func)

### Status/Song/Movie/Year/Utterance Number
No royalty, and no songs!

In [121]:
rotg_df['Speaker_Status'] = 'NON-P'
rotg_df['Song'] = 'D'
rotg_df['Movie'] = 'Rise of the Guardians'
rotg_df['Year'] = 2012
rotg_df['UTTERANCE_NUMBER'] = rotg_df.Text.index + 1

In [122]:
rotg_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 928 entries, 0 to 927
Data columns (total 9 columns):
Speaker             928 non-null object
Text                928 non-null object
Gender              928 non-null object
Role                928 non-null object
Speaker_Status      928 non-null object
Song                928 non-null object
Movie               928 non-null object
Year                928 non-null int64
UTTERANCE_NUMBER    928 non-null int64
dtypes: int64(2), object(7)
memory usage: 39.9+ KB


In [123]:
rotg_df.sample(5)

Unnamed: 0,Speaker,Text,Gender,Role,Speaker_Status,Song,Movie,Year,UTTERANCE_NUMBER
305,north,out of the way!,m,HELPER,NON-P,D,Rise of the Guardians,2012,306
125,monty,no.,m,N,NON-P,D,Rise of the Guardians,2012,126
660,caleb,it was a dream. you should be happy you still ...,m,N,NON-P,D,Rise of the Guardians,2012,661
343,north,why are you doing this?,m,HELPER,NON-P,D,Rise of the Guardians,2012,344
566,jack,"uh, how much time do we have?",m,PRO,NON-P,D,Rise of the Guardians,2012,567


## Kung Fu Panda

In [124]:
kfp_df = pd.read_pickle(r"C:\Users\cassi\Desktop\Data_Science\Animated-Movie-Gendered-Dialogue\privatę̨\kfp_lines.pkl")

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\cassi\\Desktop\\Data_Science\\Animated-Movie-Gendered-Dialogue\\privatę̨\\kfp_lines.pkl'

In [125]:
#Hmmm, my pickle file isn't being read in....
kfp_df = pd.read_csv(r"C:\Users\cassi\Desktop\Data_Science\Animated-Movie-Gendered-Dialogue\private\kfp_lines.csv")

In [126]:
kfp_df.head()

Unnamed: 0.1,Unnamed: 0,Speaker,Text
0,0,narrator,legend tells of a legendary warrior whose kung...
1,1,narrator,he traveled the land in search of worthy foes.
2,2,gang boss,i see you like to chew! maybe you should chew ...
3,3,narrator,the warrior said nothing for his mouth was ful...
4,4,narrator,"and then, he spoke."


In [127]:
kfp_df.drop(columns="Unnamed: 0", inplace=True)

In [128]:
kfp_df.head()

Unnamed: 0,Speaker,Text
0,narrator,legend tells of a legendary warrior whose kung...
1,narrator,he traveled the land in search of worthy foes.
2,gang boss,i see you like to chew! maybe you should chew ...
3,narrator,the warrior said nothing for his mouth was ful...
4,narrator,"and then, he spoke."


In [129]:
kfp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 689 entries, 0 to 688
Data columns (total 2 columns):
Speaker    689 non-null object
Text       689 non-null object
dtypes: object(2)
memory usage: 5.4+ KB


### Gender

In [130]:
sorted(kfp_df.Speaker.unique())

['angry patron', 'announcer', 'bunny fan #1', 'bunny fan #2', 'commander', 'crane', 'crowd', 'disgusted patron', 'furious five', 'gang boss', 'gator', 'grateful bunny', 'jr shaw', 'kg shaw', 'mantis', 'monkey', 'narrator', 'ninja cat', 'oogway', 'pig fan', 'po', "po's dad", 'rhino guard #1', 'shifu', 'smitten bunny', 'tai lung', 'tigress', 'viper', 'warrior', 'zeng']

In [131]:
kfp_female = ['bunny fan #1', 'bunny fan #2', 'smitten bunny', 'tigress', 'viper', 'ninja cat']
kfp_male = ['commander', 'crane', 'gang boss', 'gator', 'grateful bunny', 'jr shaw', 'kg shaw', 'mantis', 'monkey', 'oogway', 'po', "po's dad",
            'rhino guard #1', 'shifu', 'tai lung','zeng']

In [132]:
gender_func = partial(whichgen, female = kfp_female, male= kfp_male)
kfp_df["Gender"] = kfp_df.Speaker.map(gender_func)

### Role

In [133]:
kfp_pro = ['po']
kfp_ant = ['tai lung']
kfp_helper = ['oogway', 'shifu', 'crane', 'furious five', 'mantis', 'monkey', 'tigress', 'viper']

In [134]:
role_func = partial(whichrole, pro = kfp_pro, ant=kfp_ant, helper = kfp_helper)
kfp_df["Role"] = kfp_df.Speaker.map(role_func)

### Status/Song/Movie/Year/Utterance number

In [135]:
kfp_df['Speaker_Status'] = 'NON-P'
kfp_df['Song'] = 'D'
kfp_df['Movie'] = 'Kung Fu Panda'
kfp_df['Year'] = 2008
kfp_df['UTTERANCE_NUMBER'] = kfp_df.Text.index + 1

In [136]:
kfp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 689 entries, 0 to 688
Data columns (total 9 columns):
Speaker             689 non-null object
Text                689 non-null object
Gender              689 non-null object
Role                689 non-null object
Speaker_Status      689 non-null object
Song                689 non-null object
Movie               689 non-null object
Year                689 non-null int64
UTTERANCE_NUMBER    689 non-null int64
dtypes: int64(2), object(7)
memory usage: 29.6+ KB


In [137]:
kfp_df.sample(5)

Unnamed: 0,Speaker,Text,Gender,Role,Speaker_Status,Song,Movie,Year,UTTERANCE_NUMBER
269,crane,he is so mighty! the dragon warrior fell out o...,m,HELPER,NON-P,D,Kung Fu Panda,2008,270
160,shifu,"master oogway, wait! that flabby panda can't p...",m,HELPER,NON-P,D,Kung Fu Panda,2008,161
177,zeng,oh my...,m,N,NON-P,D,Kung Fu Panda,2008,178
294,po,okay. alright. goodnight. sleep well.,m,PRO,NON-P,D,Kung Fu Panda,2008,295
667,po,"oh, you know this hold?",m,PRO,NON-P,D,Kung Fu Panda,2008,668


## Megamind

In [138]:
mega_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/megamind_lines.pkl")

In [139]:
mega_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 617 entries, 0 to 616
Data columns (total 2 columns):
Speaker    617 non-null object
Text       617 non-null object
dtypes: object(2)
memory usage: 4.9+ KB


In [140]:
mega_df.head()

Unnamed: 0,Speaker,Text
0,megamind,"here’s my day so far; went to jail, lost the g..."
1,megamind’s mother,"here is your minion, he will take care of you."
2,megamind’s father,and here is you binky.
3,megamind’s father,you are destined….
4,megamind,"i didn’t quite here that last part, but it sou..."


### Gender

In [141]:
sorted(mega_df.Speaker.unique())

['bernard', 'hal', 'lady scott', 'lord scott', 'mayor', 'megamind', 'megamind’s father', 'megamind’s mother', 'metro man', 'minion', 'prisoner', 'roxanne ritchie', 'voice from crowd', 'warden']

In [142]:
mega_female = ['lady scott', "megamind's mother", 'roxanne ritchie']
mega_male = ['bernard', 'hal', 'lord scott', 'mayor', 'megamind', "megamind's father", 'metro man', 'minion',
            'prisoner', 'warden']

In [143]:
gender_func = partial(whichgen, female = mega_female, male= mega_male)
mega_df["Gender"] = mega_df.Speaker.map(gender_func)

### Role
So...this one is interesting. The protagonist of the story is a villain. The hero metro man isn't exactly an antagonist, just a washed up super hero.

In [144]:
mega_pro = ['megamind']
mega_ant = ['hal']
mega_helper = ['roxanee ritchie', 'minion']

In [145]:
role_func = partial(whichrole, pro = mega_pro, ant=mega_ant, helper = mega_helper)
mega_df["Role"] = mega_df.Speaker.map(role_func)

### Status/Song/Movie/Year/Utterance Number

In [146]:
mega_df['Speaker_Status'] = 'NON-P'
mega_df['Song'] = 'D'
mega_df['Movie'] = 'Megamind'
mega_df['Year'] = 2010
mega_df['UTTERANCE_NUMBER'] = mega_df.Text.index + 1

In [147]:
mega_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 617 entries, 0 to 616
Data columns (total 9 columns):
Speaker             617 non-null object
Text                617 non-null object
Gender              617 non-null object
Role                617 non-null object
Speaker_Status      617 non-null object
Song                617 non-null object
Movie               617 non-null object
Year                617 non-null int64
UTTERANCE_NUMBER    617 non-null int64
dtypes: int64(2), object(7)
memory usage: 26.6+ KB


# Concatenating all the Data Frames
Let's concatenate these in chronological order


In [148]:
dreamworks_df = pd.concat([antz_df, shrek_df, shrek3_df, kfp_df, httyd_df, mega_df, rotg_df, croods_df, httyd2_df],
                          axis=0, sort=True, ignore_index = True)

In [150]:
dreamworks_df.shape

(6674, 9)

In [151]:
dreamworks_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6674 entries, 0 to 6673
Data columns (total 9 columns):
Gender              6674 non-null object
Movie               6674 non-null object
Role                6674 non-null object
Song                6674 non-null object
Speaker             6674 non-null object
Speaker_Status      6674 non-null object
Text                6674 non-null object
UTTERANCE_NUMBER    6674 non-null int64
Year                6674 non-null int64
dtypes: int64(2), object(7)
memory usage: 286.8+ KB


In [153]:
#Adding the period column, which will act as a Dreamworks Marker
dreamworks_df['Disney_Period'] = 'DREAMWORKS'

In [154]:
disney_df = pd.read_pickle(r"C:/Users/cassi/Desktop/Data_Science/Animated-Movie-Gendered-Dialogue/private/all_disney_annotated.pkl")

In [155]:
disney_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7422 entries, 0 to 7421
Data columns (total 10 columns):
Disney_Period       7422 non-null object
Gender              7422 non-null object
Movie               7422 non-null object
Role                7422 non-null object
Song                7422 non-null object
Speaker             7422 non-null object
Speaker_Status      7422 non-null object
Text                7422 non-null object
UTTERANCE_NUMBER    7422 non-null int64
Year                7422 non-null int64
dtypes: int64(2), object(8)
memory usage: 347.9+ KB


In [156]:
animated_movies_df = pd.concat([disney_df, dreamworks_df], axis=0, sort=True, ignore_index = True)

In [158]:
animated_movies_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14096 entries, 0 to 14095
Data columns (total 10 columns):
Disney_Period       14096 non-null object
Gender              14096 non-null object
Movie               14096 non-null object
Role                14096 non-null object
Song                14096 non-null object
Speaker             14096 non-null object
Speaker_Status      14096 non-null object
Text                14096 non-null object
UTTERANCE_NUMBER    14096 non-null int64
Year                14096 non-null int64
dtypes: int64(2), object(8)
memory usage: 660.8+ KB


In [160]:
animated_movies_df.tail()

Unnamed: 0,Disney_Period,Gender,Movie,Role,Song,Speaker,Speaker_Status,Text,UTTERANCE_NUMBER,Year
14091,DREAMWORKS,m,How to Train Your Dragon 2,PRO,D,hiccup,NON-P,"we may be small in numbers, but we stand for s...",704,2014
14092,DREAMWORKS,m,How to Train Your Dragon 2,PRO,D,hiccup,NON-P,"we are the voice of peace. and bit by bit, we ...",705,2014
14093,DREAMWORKS,m,How to Train Your Dragon 2,PRO,D,hiccup,NON-P,"you see, we have something they don't. oh sure...",706,2014
14094,DREAMWORKS,m,How to Train Your Dragon 2,PRO,D,hiccup,NON-P,but we... we have...,707,2014
14095,DREAMWORKS,m,How to Train Your Dragon 2,PRO,D,hiccup,NON-P,our dragons!,708,2014


In [161]:
data_sample = animated_movies_df.sample(200)

In [162]:
data_sample.to_csv(r"C:\Users\cassi\Desktop\Data_Science\Animated-Movie-Gendered-Dialogue\data_sample\all_movies_200.csv")

In [163]:
animated_movies_df.to_pickle(r"C:\Users\cassi\Desktop\Data_Science\Animated-Movie-Gendered-Dialogue\private\all_movies.pkl")