# Refining My Disney Data
This code will accomplish several things, including:
* Clearly outlining what the Disney script corpus includes
    * What columns?
    * How are songs treated compared to normal dialogue?
* Separating out movies I do not wish to analyze
    * Removing the Lion King
* Adding Moana, complete with gender, role, and song vs dialogue annotations


In [1]:
#import necessary modules
import numpy as np
import pandas as pd

In [2]:
#import csv as a dataframe
disney = pd.read_csv(r'C:\Users\cassi\Desktop\Disney_Corpus.csv')

In [3]:
disney.head()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER
0,EARLY,slave in the magic mirror come from the farthe...,NON-P,Snow White,queen,1937,1
1,EARLY,"what wouldst thou know, my queen ?",NON-P,Snow White,mirror,1937,2
2,EARLY,"magic mirror on the wall, who is the fairest o...",NON-P,Snow White,queen,1937,3
3,EARLY,"famed is thy beauty, majesty. but hold, a love...",NON-P,Snow White,mirror,1937,4
4,EARLY,alas for her ! reveal her name.,NON-P,Snow White,queen,1937,5


In [4]:
disney.tail()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER
7743,LATE,we are never closing them again.,PRINCESS,Frozen,elsa,2013,984
7744,LATE,form on anna's boots.,PRINCESS,Frozen,elsa,2013,985
7745,LATE,"what? oh, elsa, they're beautiful, but you kno...",PRINCESS,Frozen,anna,2013,986
7746,LATE,look out. reindeer coming through!,NON-P,Frozen,kristoff,2013,987
7747,LATE,that's it. glide and pivot and glide and pivot.,NON-P,Frozen,olaf,2013,988


In [5]:
disney.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7748 entries, 0 to 7747
Data columns (total 7 columns):
Disney_Period       7748 non-null object
Text                7748 non-null object
Speaker_Status      7748 non-null object
Movie               7748 non-null object
Speaker             7748 non-null object
Year                7748 non-null int64
UTTERANCE_NUMBER    7748 non-null int64
dtypes: int64(2), object(5)
memory usage: 272.4+ KB


There are no null objects here! The corpus includes
* Which time period the movie was made in
* Lines of Dialogue (in Text)
* If the speaker is a princess, prince, or neither (in Speaker_Status)
* Which movie the line of dialogue is from (in Movie)
* Who uttered the line of dialogue (in Speaker)
* The year the movie was made (in Year)
* When the line was said in the movie (in UTTERANCE_NUMBER) 

This is good, but I need to double check that there aren't any typos in the categories.

1) What Periods are included?

In [6]:
disney.Disney_Period.value_counts()

MID      4155
LATE     2268
EARLY    1325
Name: Disney_Period, dtype: int64

Okay, three categories: MID, LATE, EARLY. No typos.

2) What status can a speaker have?

In [7]:
disney.Speaker_Status.value_counts()

NON-P       4925
PRINCESS    1793
PRINCE      1030
Name: Speaker_Status, dtype: int64

Looks like characters can be a princess, a prince, or neither (NON-P). This does not account for gender or for role in the story, so Gender and Role columns will need to be added down the line.

3) What speakers are there?

In [8]:
disney.Speaker.value_counts()

anna                                    335
simba                                   241
aladdin                                 238
merida                                  181
tiana                                   176
kristoff                                159
belle                                   150
pocahontas                              142
john smith                              125
prince naveen                           125
jasmine                                 117
mulan                                   116
mushu                                   114
scar                                    114
genie                                   113
timon                                   112
cinderella                              112
elsa                                    111
jafar                                   111
flora                                   106
queen elinor                            103
olaf                                     93
snow white                      

In [9]:
disney.Speaker.describe()

count     7748
unique     424
top       anna
freq       335
Name: Speaker, dtype: object

424 unique speakers. Yikes. These will be easier to go through movie by movie, where character names are easier to parse. Also, look! A mislabeled entry! "Sweep again and by then it's like 7" is a lyric Rapunzel sings, not a speaker! It looks like this data is not as neat as I'd originally hoped!
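Stray lyrics like the one above could probably be surfaced automatically rather than by eye. Here is a minimal sketch, assuming that genuine speaker labels are short; the `MAX_WORDS` cutoff and the toy `speakers` Series are my own stand-ins, not part of the corpus.

```python
import pandas as pd

# Toy stand-in for disney.Speaker (the real column has 424 unique values).
speakers = pd.Series([
    "anna",
    "rapunzel",
    "queen elinor",
    "sweep again and by then it's like 7",
])

# Assumption: a real speaker label rarely runs past four words.
MAX_WORDS = 4

# Count whitespace-separated words in each label.
word_counts = speakers.str.split().str.len()

# Keep only the labels that look too long to be a character name.
suspicious = speakers[word_counts > MAX_WORDS]
print(suspicious.tolist())
```

Anything this flags would still need a manual look, since a long label is only a hint that a lyric slipped into the Speaker column.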

4) What Years are included?

In [10]:
disney.Year.value_counts()

2013    988
1994    952
1992    842
1991    772
2009    676
1995    638
1998    554
1950    497
1959    462
2012    411
1989    397
1937    366
2010    193
Name: Year, dtype: int64

These are all accurate and without typos.

In [11]:
disney.Movie.value_counts()

Frozen                        988
The Lion King                 952
Aladdin                       842
Beauty and the Beast          772
The Princess and the Frog     676
Pocahontas                    638
Mulan                         554
Cinderella                    497
Sleeping Beauty               462
Brave                         411
The Little Mermaid            397
Snow White                    366
Tangled                       193
Name: Movie, dtype: int64

These columns look good, and all their counts line up! Sweet! (These counts should line up with the utterance numbers for each film). Let's take a look at Tangled, since it's a little hairy.
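As an aside on the parenthetical above: the claim that each film's row count matches its utterance numbers can be checked directly instead of by eye. This is a rough sketch on a toy frame; `toy`, `per_movie`, and `mismatched` are names I made up, and it assumes utterance numbers start at 1 and have no gaps.

```python
import pandas as pd

# Toy stand-in for the Movie and UTTERANCE_NUMBER columns of the corpus.
toy = pd.DataFrame({
    "Movie": ["A", "A", "A", "B", "B"],
    "UTTERANCE_NUMBER": [1, 2, 3, 1, 2],
})

# For each movie, compare how many rows it has with its highest utterance number.
per_movie = toy.groupby("Movie")["UTTERANCE_NUMBER"].agg(["count", "max"])

# Any movie listed here has skipped or duplicated utterance numbers.
mismatched = per_movie[per_movie["count"] != per_movie["max"]]
print(per_movie)
print(mismatched.empty)
```

An empty `mismatched` frame would mean the counts line up for every film.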

In [12]:
disney_tangled = disney.loc[disney.Movie == 'Tangled']

In [13]:
disney_tangled.head()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER


Uh-oh. We have a problem. Apparently Tangled isn't a movie name? Something's up with that column.

In [14]:
for m in disney.Movie[:10]:
    print('hi'+m+'hi')

hiSnow White hi
hiSnow White hi
hiSnow White hi
hiSnow White hi
hiSnow White hi
hiSnow White hi
hiSnow White hi
hiSnow White hi
hiSnow White hi
hiSnow White hi


HAHA! So there's a ghost (trailing) space that's messing things up. (This was actually in all entries; I just didn't want to print all of them.) Okay, let's fix that.

In [15]:
disney.Movie = disney.Movie.str.strip() #strip leading/trailing whitespace from every movie title

In [16]:
for m in disney.Movie[:10]:
    print('hi'+m+'hi')

hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi
hiSnow Whitehi


Yay! Fixed! (I hope)
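To turn that hope into a check, one option is to compare each value against its stripped version, and to use `repr()` so stray spaces show up inside the quotes. A minimal sketch on made-up data (the `movies` Series is a toy stand-in for `disney.Movie`):

```python
import pandas as pd

# Toy stand-in for the Movie column before cleaning.
movies = pd.Series(["Snow White ", "Tangled ", "Frozen "])

# repr() wraps each string in quotes, so a trailing space is visible.
for m in movies.unique():
    print(repr(m))

# True wherever a value differs from its stripped version.
has_padding = movies != movies.str.strip()

# After stripping, no value should differ from its stripped version.
cleaned = movies.str.strip()
still_padded = cleaned != cleaned.str.strip()
print(has_padding.sum(), still_padded.sum())
```

If `still_padded.sum()` is 0 on the real column, the ghost space is gone everywhere, not just in the first ten rows.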

In [17]:
disney_frozen = disney[disney.Movie == 'Frozen']

In [18]:
disney_frozen.head()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER
6760,LATE,born of cold and winter air and mountain rain ...,NON-P,Frozen,ice harvesters,2013,1
6761,LATE,", and his reindeer calf, sven, share a carrot ...",NON-P,Frozen,ice harvesters,2013,2
6762,LATE,ice harvesters hup! ho! watch your step! let i...,NON-P,Frozen,ice harvesters,2013,3
6763,LATE,"ice harvesters stronger than one, stronger tha...",NON-P,Frozen,ice harvesters,2013,4
6764,LATE,ice harvesters born of cold and winter air and...,NON-P,Frozen,ice harvesters,2013,5


In [19]:
disney_frozen.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 988 entries, 6760 to 7747
Data columns (total 7 columns):
Disney_Period       988 non-null object
Text                988 non-null object
Speaker_Status      988 non-null object
Movie               988 non-null object
Speaker             988 non-null object
Year                988 non-null int64
UTTERANCE_NUMBER    988 non-null int64
dtypes: int64(2), object(5)
memory usage: 42.5+ KB


No null objects, which is good, but look at utterance number 2: that looks like a scene description, NOT a line of dialogue!

## Song Lyrics
How are song lyrics treated in this data set? Is there an easy way to separate them out? The opening of Frozen is a song, so let's see what we have there.

In [20]:
disney_frozen.Text.iloc[:10]

6760    born of cold and winter air and mountain rain ...
6761    , and his reindeer calf, sven, share a carrot ...
6762    ice harvesters hup! ho! watch your step! let i...
6763    ice harvesters stronger than one, stronger tha...
6764    ice harvesters born of cold and winter air and...
6765    ice harvesters this icy force both foul and fa...
6766               ext. the kingdom of arendelle — night 
6767    sleeps in her bed. her little sister anna: (5)...
6768    elsa. psst. elsa! psst. elsa doesn't stir. ann...
6769                          wake up. wake up. wake up. 
Name: Text, dtype: object

In [21]:
disney_frozen.Text.iloc[0]

'born of cold and winter air and mountain rain combining, this icy force both foul and fair has a frozen heart worth mining. the men drag giant ice blocks through channels of water. ice harvesters cut through the heart, cold and clear. strike for love and strike for fear. see the beauty sharp and sheer. split the ice apart! and break the frozen heart. hup! ho! watch your step! let it go! '

In [22]:
disney_frozen.Text.iloc[1] #Here's that pesky scene description!

', and his reindeer calf, sven, share a carrot as they try to keep up with the men.'

In [23]:
for line in disney_frozen.Text[:10]:
    print(line)

born of cold and winter air and mountain rain combining, this icy force both foul and fair has a frozen heart worth mining. the men drag giant ice blocks through channels of water. ice harvesters cut through the heart, cold and clear. strike for love and strike for fear. see the beauty sharp and sheer. split the ice apart! and break the frozen heart. hup! ho! watch your step! let it go! 
, and his reindeer calf, sven, share a carrot as they try to keep up with the men.
ice harvesters hup! ho! watch your step! let it go! kristoff struggles to get a block of ice out of the water. he fails, ends up soaked. sven licks his wet cheek. ice harvesters beautiful! powerful! dangerous! cold! ice has a magic can't be controlled. 
ice harvesters stronger than one, stronger than ten stronger than a hundred men! 
ice harvesters born of cold and winter air and mountain rain combining 
ice harvesters this icy force both foul and fair has a frozen heart worth mining. cut through the heart, cold and clea

WOW! A closer look at this data very quickly reveals that the utterances aren't really utterances: they're chunks of a script, including scene headers! This data is all lowercase text, too, unlike the Shrek script (for that code, refer to my progress report). Here we'll just run a quick example of what might not be considered dialogue.

In [24]:
not_line = [i for i in disney_frozen.Text if 'ext.' in i]

In [25]:
len(not_line)

4

In [26]:
print(not_line)

['ext. the kingdom of arendelle — night ', 'faster, sven! ext. the valley of the living rock — night ', 'ext. mountain forest clearing — day', "olaf oh, look at that. i've been impaled. he laughs it off. ext. steep mountain face — day"]
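The `'ext.'` search above is a plain substring test, so it would also match a word like "next." if one ended a sentence. A slightly stricter version could use a regular expression with a word boundary. This is only a sketch: the `lines` Series is invented, and looking for `int.` as well is my assumption about screenplay conventions, not something seen in the output above.

```python
import pandas as pd

# Invented lines standing in for disney_frozen.Text.
lines = pd.Series([
    "ext. the kingdom of arendelle - night",
    "faster, sven! ext. the valley of the living rock - night",
    "wake up. wake up. wake up.",
    "int. castle - day",
    "what's next. i don't know.",
])

# Plain substring test, as in the cell above: also fires on "next."
naive = lines.str.contains("ext.", regex=False)

# \b requires "ext"/"int" to start a word, so "next." no longer matches.
heading_pattern = r"\b(?:ext|int)\."
is_heading = lines.str.contains(heading_pattern)

print(naive.tolist())
print(is_heading.tolist())
```

The boolean mask can then be used to inspect or set aside rows that contain a scene heading, though rows that mix a heading with real dialogue (like the second one) would need splitting by hand.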


Is this true for all the movies?

In [27]:
disney_sb = disney[disney.Movie == 'Sleeping Beauty']
disney_sb.head()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER
863,EARLY,"in a far away land, long ago, lived a king and...",NON-P,Sleeping Beauty,narrator,1959,1
864,EARLY,"joyfully now to our princess we come, bringing...",NON-P,Sleeping Beauty,choir,1959,2
865,EARLY,thus on this great and joyous day did all the ...,NON-P,Sleeping Beauty,narrator,1959,3
866,EARLY,"their royal highnesses, king hubert and prince...",NON-P,Sleeping Beauty,announcer,1959,4
867,EARLY,fondly had these monarchs dreamed one day thei...,NON-P,Sleeping Beauty,narrator,1959,5


In [28]:
disney_sb.tail()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER
1320,EARLY,"i know you, i walked with you once upon a dream",NON-P,Sleeping Beauty,choir,1959,458
1321,EARLY,blue!,NON-P,Sleeping Beauty,merryweather,1959,459
1322,EARLY,"i know you, the gleam in your eyes is so famil...",NON-P,Sleeping Beauty,choir,1959,460
1323,EARLY,and i know it's true that visions are seldom a...,NON-P,Sleeping Beauty,choir,1959,461
1324,EARLY,you'll love me at once the way you did once up...,NON-P,Sleeping Beauty,choir,1959,462


In [29]:
disney_sb.Text.iloc[175:225]

1038                               but i wanted it blue. 
1039           now, dear, we decided pink was her color. 
1040                                        you decided! 
1041             two eggs, fold in gently fold? oh well. 
1042                                    i can't breathe! 
1043                                     it looks awful. 
1044                   that's because it's on you, dear. 
1045                            now yeast, one tsp. tsp? 
1046                                       one teaspoon! 
1047                            one teaspoon, of course. 
1048                oh gracious how the child has grown. 
1049    oh, it seems only yesterday we brought her here. 
1050                                   just a tiny baby. 
1051                                   why merryweather! 
1052                        whatever's the matter, dear? 
1053    after the day she'll be a princess, and we won...
1054                                           oh flora! 
1055          

At first glance, Sleeping Beauty doesn't appear to have screenplay annotations like Frozen does, so the quality of this data varies a lot from film to film. However, it looks like song lyrics are included (see entry 1063)! These will have to be marked up. (They should be quick finds; it's just a matter of marking up the rows that contain lyrics.)
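One way the lyric markup could work is to default every row to dialogue and then flip the hand-picked lyric rows by index label. This is a sketch under assumptions: the `lyric_rows` list is hypothetical, and the `D`/`S` codes simply borrow the convention I use for Moana below.

```python
import pandas as pd

# Toy stand-in for a few Sleeping Beauty rows, keeping corpus-style index labels.
sb = pd.DataFrame(
    {"Text": ["oh flora!",
              "i know you, i walked with you once upon a dream",
              "blue!"]},
    index=[1054, 1320, 1321],
)

# Hypothetical list of index labels found by reading through the film.
lyric_rows = [1320]

# Everything starts as dialogue (D); listed rows become song (S).
sb["Song"] = "D"
sb.loc[lyric_rows, "Song"] = "S"
print(sb)
```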

# Moana
I have to create my own csv file for Moana. I don't need to include Movie, Year, or Disney_Period labels for this one because I can easily add those to a dataframe! But I will initially include the text, the speaker, whether the line is dialogue (D) or a song (S), and whether the line marks the beginning of a scene. Then, I can add role, gender, speaker_status, etc.! I have to say, gathering the data to create the csv file in the first place was probably the most time-consuming part of this process!

In [30]:
moana = pd.read_csv(r'C:\Users\cassi\Desktop\moana.csv') #importing my personal csv file

In [31]:
moana.head() #a glimpse at the data

Unnamed: 0,Text,Speaker,Song,Start_Scene
0,"in the beginning, there was only ocean until t...",tala,D,Y
1,"Whoa, whoa, whoa! mother, that's enough.",tui,D,
2,papa!,young moana,D,
3,No one goes outside the reef. We are safe here...,tui,D,
4,Monsters! Monsters! Monsters!,children,D,


In [32]:
moana.info() #checking for null entries

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 382 entries, 0 to 381
Data columns (total 4 columns):
Text           382 non-null object
Speaker        382 non-null object
Song           382 non-null object
Start_Scene    21 non-null object
dtypes: object(4)
memory usage: 6.0+ KB


Based on how I divvied up the data, there are 21 scenes. I marked the first line in each scene so that when I eventually analyze my data, any kind of dialogue interaction analysis won't include lines that overlap scenes (which would lead to inaccurate results). This is why there are so many null objects in Start_Scene: I didn't bother marking a line of dialogue if it didn't begin a movie scene.
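For the later interaction analysis, the sparse `Start_Scene` flags could be turned into a scene number per line with a cumulative sum, which makes "is this line in the same scene as the previous one?" a simple comparison. A minimal sketch on toy data; the `Scene` and `Same_Scene_As_Prev` column names are my own inventions.

```python
import pandas as pd

# Toy stand-in for the Moana frame: "Y" marks the first line of a scene.
toy = pd.DataFrame({
    "Speaker": ["tala", "tui", "moana", "maui", "moana"],
    "Start_Scene": ["Y", None, None, "Y", None],
})

# Each "Y" bumps the running total, so every line inherits its scene number.
toy["Scene"] = toy["Start_Scene"].eq("Y").cumsum()

# True when a line shares a scene with the line right before it.
toy["Same_Scene_As_Prev"] = toy["Scene"].eq(toy["Scene"].shift())
print(toy)
```

Pairs of adjacent lines where `Same_Scene_As_Prev` is False straddle a scene break and could be left out of any turn-taking counts.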

## Marking Gender, Speaker_Status, and Role
### Gender

In [33]:
moana.Speaker.describe() #20 speakers in Moana

count       382
unique       20
top       moana
freq        167
Name: Speaker, dtype: object

In [34]:
moana.Speaker.value_counts() #all their names--no repeats!

moana                167
maui                 108
tui                   36
tala                  24
tamatoa                9
sina                   8
male villager 3        5
young moana            5
chorus                 4
children               3
female villager 1      2
male villager 2        2
male villager 1        2
old male villager      1
male villager 7        1
male villager 5        1
male villager 8        1
male villager 4        1
male villager 6        1
female villager        1
Name: Speaker, dtype: int64

I'm constructing lists of characters based on gender. The same process will be used for other movies, and then those films can all use the generic function found below.

In [35]:
female = ['moana', 'tala', 'sina', 'young moana', 'female villager 1', 'female villager']
male = ['maui', 'tui', 'tamatoa', 'male villager 1', 'male villager 2', 'male villager 3', 'male villager 4',
       'male villager 5', 'male villager 6', 'male villager 7', 'male villager 8', 'old male villager']
neutral = ['children', 'chorus']
len(set(female+male+neutral)) #a quick check that this is 20 people

20

In [36]:
def whichgen(name):
    if name in female: return 'f'
    elif name in male: return 'm'
    else: return 'n'
moana["Gender"] = moana.Speaker.map(whichgen)

In [37]:
moana.head()

Unnamed: 0,Text,Speaker,Song,Start_Scene,Gender
0,"in the beginning, there was only ocean until t...",tala,D,Y,f
1,"Whoa, whoa, whoa! mother, that's enough.",tui,D,,m
2,papa!,young moana,D,,f
3,No one goes outside the reef. We are safe here...,tui,D,,m
4,Monsters! Monsters! Monsters!,children,D,,n


### Speaker_Status
In this film, the only princess is Moana (including her younger self), and everyone else is a NON-P. Again, I'm wrapping this in a function in case I want to reuse the pattern (although the movies outside Disney I'm looking at don't usually involve princesses).

In [38]:
def whichstat(name):
    if name == 'moana' or name == 'young moana': return 'PRINCESS'
    else: return 'NON-P'

moana['Speaker_Status'] = moana.Speaker.map(whichstat)

In [39]:
moana.head()

Unnamed: 0,Text,Speaker,Song,Start_Scene,Gender,Speaker_Status
0,"in the beginning, there was only ocean until t...",tala,D,Y,f,NON-P
1,"Whoa, whoa, whoa! mother, that's enough.",tui,D,,m,NON-P
2,papa!,young moana,D,,f,PRINCESS
3,No one goes outside the reef. We are safe here...,tui,D,,m,NON-P
4,Monsters! Monsters! Monsters!,children,D,,n,NON-P


### Role
Is the speaker a protagonist, antagonist, helper, or neutral? I'm listing Tui, Moana's father, as an antagonist along with Tamatoa, since he prevents her from going on her journey at the start of the film. Moana and Maui are listed as protagonists, while Tala and Sina are listed as helpers who aid Moana and Maui on their journey. I may refine this later (Maui is arguably a helper, and towards the beginning he acts more like an antagonist). That will be an easy edit, as long as I have editable code down. Again, I will craft lists and a generic function.

In [40]:
pro = ['young moana','moana', 'maui']
ant = ['tui', 'tamatoa']
helper = ['sina', 'tala']


In [41]:
def whichrole(name):
    if name in pro: return 'PRO'
    elif name in ant: return 'ANT'
    elif name in helper: return 'HELPER'
    else: return "N" #for neutral

In [42]:
moana['Role'] = moana.Speaker.map(whichrole)

In [43]:
moana.head()

Unnamed: 0,Text,Speaker,Song,Start_Scene,Gender,Speaker_Status,Role
0,"in the beginning, there was only ocean until t...",tala,D,Y,f,NON-P,HELPER
1,"Whoa, whoa, whoa! mother, that's enough.",tui,D,,m,NON-P,ANT
2,papa!,young moana,D,,f,PRINCESS,PRO
3,No one goes outside the reef. We are safe here...,tui,D,,m,NON-P,ANT
4,Monsters! Monsters! Monsters!,children,D,,n,NON-P,N
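The three labelers above (`whichgen`, `whichstat`, `whichrole`) all follow the same pattern: look a name up in some lists and fall back to a default. If I end up repeating this for every film, one possible refactor is a small factory that builds the lookup from a dictionary. This is just a sketch; `make_labeler` is a hypothetical helper, not something used elsewhere in this notebook.

```python
import pandas as pd

def make_labeler(groups, default):
    """Build a name -> label function from a dict of {label: [names]}."""
    # Invert the dict once so each lookup is a single dictionary access.
    lookup = {name: label for label, names in groups.items() for name in names}

    def label(name):
        # Names not listed in any group get the default label.
        return lookup.get(name, default)

    return label

# Same idea as whichrole, with the lists collected in one place.
which_role = make_labeler(
    {"PRO": ["young moana", "moana", "maui"],
     "ANT": ["tui", "tamatoa"],
     "HELPER": ["sina", "tala"]},
    default="N",
)

speakers = pd.Series(["moana", "tui", "tala", "chorus"])
roles = speakers.map(which_role)
print(roles.tolist())
```

Moving Maui from protagonist to helper would then be a one-line change to the dictionary.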


## Adding pre-existing columns
Cool! Now, let's annotate this Moana data with data from the original corpus!

First, let's make all the text lower case. This will make it easier to parse through, and it makes the text formatting identical to that in the original corpus.

In [44]:
moana.Text = moana.Text.str.lower() #lowercase every line to match the original corpus

In [45]:
moana.head()

Unnamed: 0,Text,Speaker,Song,Start_Scene,Gender,Speaker_Status,Role
0,"in the beginning, there was only ocean until t...",tala,D,Y,f,NON-P,HELPER
1,"whoa, whoa, whoa! mother, that's enough.",tui,D,,m,NON-P,ANT
2,papa!,young moana,D,,f,PRINCESS,PRO
3,no one goes outside the reef. we are safe here...,tui,D,,m,NON-P,ANT
4,monsters! monsters! monsters!,children,D,,n,NON-P,N


In [46]:
moana["Movie"] = 'Moana'
moana['Year'] = 2016
moana['Disney_Period'] = 'LATE'

In [47]:
moana.head()

Unnamed: 0,Text,Speaker,Song,Start_Scene,Gender,Speaker_Status,Role,Movie,Year,Disney_Period
0,"in the beginning, there was only ocean until t...",tala,D,Y,f,NON-P,HELPER,Moana,2016,LATE
1,"whoa, whoa, whoa! mother, that's enough.",tui,D,,m,NON-P,ANT,Moana,2016,LATE
2,papa!,young moana,D,,f,PRINCESS,PRO,Moana,2016,LATE
3,no one goes outside the reef. we are safe here...,tui,D,,m,NON-P,ANT,Moana,2016,LATE
4,monsters! monsters! monsters!,children,D,,n,NON-P,N,Moana,2016,LATE


In [48]:
moana["UTTERANCE_NUMBER"] = moana.Text.index + 1 #just adding one to its spot in the dataframe!


In [49]:
moana.head()

Unnamed: 0,Text,Speaker,Song,Start_Scene,Gender,Speaker_Status,Role,Movie,Year,Disney_Period,UTTERANCE_NUMBER
0,"in the beginning, there was only ocean until t...",tala,D,Y,f,NON-P,HELPER,Moana,2016,LATE,1
1,"whoa, whoa, whoa! mother, that's enough.",tui,D,,m,NON-P,ANT,Moana,2016,LATE,2
2,papa!,young moana,D,,f,PRINCESS,PRO,Moana,2016,LATE,3
3,no one goes outside the reef. we are safe here...,tui,D,,m,NON-P,ANT,Moana,2016,LATE,4
4,monsters! monsters! monsters!,children,D,,n,NON-P,N,Moana,2016,LATE,5


## Taking out Lion King
Right now, I don't want to look at the film "The Lion King". However, I might want to look at it later. I'll take it out of this dataframe and save it for later.

In [50]:
lk = disney[disney.Movie == "The Lion King"]

In [51]:
lk.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 952 entries, 3336 to 4287
Data columns (total 7 columns):
Disney_Period       952 non-null object
Text                952 non-null object
Speaker_Status      952 non-null object
Movie               952 non-null object
Speaker             952 non-null object
Year                952 non-null int64
UTTERANCE_NUMBER    952 non-null int64
dtypes: int64(2), object(5)
memory usage: 40.9+ KB


In [52]:
lk.head()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER
3336,MID,background singer,NON-P,The Lion King,background singers,1994,1
3337,MID,nants ingonyama bagithi baba,NON-P,The Lion King,male singer,1994,2
3338,MID,sithi uhhmm ingonyama,NON-P,The Lion King,background singers,1994,3
3339,MID,nants ingonyama bagithi baba,NON-P,The Lion King,male singer,1994,4
3340,MID,sithi uhm ingonyama ingonyama,NON-P,The Lion King,background singers,1994,5


In [53]:
lk.tail()

Unnamed: 0,Disney_Period,Text,Speaker_Status,Movie,Speaker,Year,UTTERANCE_NUMBER
4283,MID,ubuse ngo thando,NON-P,The Lion King,male singer,1994,948
4284,MID,ubuse ngo xolo,NON-P,The Lion King,male singer,1994,949
4285,MID,ingonyama nengw' enamabala ingonyama nengw' en...,NON-P,The Lion King,background singer,1994,950
4286,MID,(ngw' enamabalawa),NON-P,The Lion King,male singer,1994,951
4287,MID,till we find our place on the path unwinding i...,NON-P,The Lion King,full chorus,1994,952


In [54]:
lk.to_csv('lion_king.csv') #writing to a csv file to be referenced later

In [55]:
disney_princess = disney[disney.Movie != 'The Lion King'] #removing lion king

In [56]:
disney_princess.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6796 entries, 0 to 7747
Data columns (total 7 columns):
Disney_Period       6796 non-null object
Text                6796 non-null object
Speaker_Status      6796 non-null object
Movie               6796 non-null object
Speaker             6796 non-null object
Year                6796 non-null int64
UTTERANCE_NUMBER    6796 non-null int64
dtypes: int64(2), object(5)
memory usage: 292.0+ KB


The data now has the expected number of entries: 7748 - 952 = 6796. Now all that's left to do is add Moana to the existing dataframe and export it to a new csv file! Because Moana currently has scene, gender, song, and role markers that the rest of the corpus lacks, combining the two will produce a lot of null entries. Those nulls will be filled in when the existing data is annotated.
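To see where those nulls come from, here is a toy illustration with two made-up frames (`old` and `new` are not the real data): when the frames are stacked, any column that exists in only one of them is filled with NaN for the other's rows.

```python
import pandas as pd

# "old" lacks the Gender column that "new" has, like the corpus vs. Moana.
old = pd.DataFrame({"Text": ["hello"], "Movie": ["Frozen"]})
new = pd.DataFrame({"Text": ["hi"], "Movie": ["Moana"], "Gender": ["f"]})

# ignore_index renumbers the rows; sort=True puts the columns in alphabetical order.
combined = pd.concat([old, new], ignore_index=True, sort=True)
print(combined)

# Count how many rows are missing each column.
print(combined.isna().sum())
```

On the real data, the same `isna().sum()` call gives a quick picture of how much annotation is still left to do per column.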

In [57]:
disney_princess_new = pd.concat([disney_princess, moana], ignore_index=True, sort=True) #reindexes and lines up columns (DataFrame.append was removed in pandas 2.0)

In [58]:
disney_princess_new.info() #should have 7178 entries: 6796 (Disney data) + 382 (Moana)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7178 entries, 0 to 7177
Data columns (total 11 columns):
Disney_Period       7178 non-null object
Gender              382 non-null object
Movie               7178 non-null object
Role                382 non-null object
Song                382 non-null object
Speaker             7178 non-null object
Speaker_Status      7178 non-null object
Start_Scene         21 non-null object
Text                7178 non-null object
UTTERANCE_NUMBER    7178 non-null int64
Year                7178 non-null int64
dtypes: int64(2), object(9)
memory usage: 364.5+ KB


In [59]:
disney_princess_new.tail() #checking that reindexing worked

Unnamed: 0,Disney_Period,Gender,Movie,Role,Song,Speaker,Speaker_Status,Start_Scene,Text,UTTERANCE_NUMBER,Year
7173,LATE,m,Moana,ANT,D,tui,NON-P,,it suits you.,378,2016
7174,LATE,m,Moana,N,D,male villager 8,NON-P,,she's back!,379,2016
7175,LATE,f,Moana,N,D,female villager,NON-P,,moana!,380,2016
7176,LATE,f,Moana,PRO,D,moana,PRINCESS,,pua!,381,2016
7177,LATE,n,Moana,N,S,chorus,NON-P,,we set a course to find a brand new island eve...,382,2016


In [62]:
disney_princess_new.to_csv(r"C:\Users\cassi\Desktop\Data_Science\Animated-Movie-Gendered-Dialogue\private\Disney_1938_2016.csv") #a new csv file with all disney princess movies!