# Creating a function to split scripts
I want a function that splits a script into a dictionary, with each speaker's name as the key and all of that speaker's lines joined into a single string as the value.

* **Input**: a string containing a full movie script with this format:

```
    SPEAKER
    Hello, my name is speaker

    SPEAKER2
    Hi, my name is speaker2

    SPEAKER
    Hi speaker2
```
* **Output**: a dictionary that looks like this:

```
    {'SPEAKER': 'Hello, my name is speaker Hi speaker2',
     'SPEAKER2': 'Hi, my name is speaker2'}
```

## Importing the scripts
First, I'll make a function to easily import the scripts from the moviescripts folder as a string. We'll use the Megamind script as a sample.

In [27]:
import re
import os

def read_script(filename):
    ''' Takes the filename of a script in the moviescripts folder
    and returns the script's contents as a string '''
    path = os.path.join(os.path.abspath(''), 'moviescripts', filename)
    # The with-block closes the file even if read() raises an error
    with open(path) as script_file:
        return script_file.read()

megamind = read_script('megamind.txt')
megamind[1:200]


'MEGAMIND\n\n\n\nWritten by\n\nAlan Schoolcraft & Brent Simons\n\n\n\n\nCREDITS SEQUENCE\n\nNEWSPAPER HEADLINE MONTAGE:\n\nHEADLINES flash before us, displaying their accompanying\nphotographs.\n\n"UBERMAN - METRO CITY'

## Script-to-dictionary function
Now that importing is covered, we need to convert each script into a dictionary as outlined above.

In [29]:

def script_to_dict(script):
    ''' Takes in a script (string object) and outputs
    a dictionary with each character and their spoken lines '''

    pars = re.split(r'\n\n+', script, maxsplit=0)
    d = {}

    for p in pars:
        # Capture the name (anchored to the start of a line, one or two
        # all-capital words) and the rest of the paragraph - (.*)
        regex = re.search(r'^([A-Z]+ [A-Z]+|[A-Z]+)(.*)', p, re.S | re.M)

        if not regex:  # Avoid calling group() on a None result
            continue

        name, txt = regex.group(1, 2)

        # Each line of the paragraph as a list item
        if name in d:
            d[name] += txt.strip().split('\n')
        else:
            d[name] = txt.strip().split('\n')
    
    for key in d:
        d[key] = ' '.join(d[key])
    
    return d

megamind_dict = script_to_dict(megamind)
megamind_dict['MASTER MIND'][1:200]  # perfect


'he real Einstein once said, "God does not play dice with the world." He was right, because the world is MY dice. Is that understood? Alright, then - clean slate. Do we have the girl? Reporters are a '
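
One caveat with the pattern above: `[A-Z]+` is satisfied by a single capital letter, so a paragraph of ordinary prose that begins with a capitalised word (say, `Agent ...`) ends up stored under the key `'A'`. Below is a minimal sketch of a stricter variant, assuming that a speaker name always sits alone on its own line. The name `script_to_dict_strict` and the set of characters allowed in a name are my assumptions, not part of the function above, and an all-caps scene heading followed by text would still be treated as a speaker.

```python
import re
from collections import defaultdict

# Assumed layout: a line holding only capitals (plus digits, spaces,
# periods, apostrophes or hyphens), followed by that speaker's dialogue.
SPEAKER_BLOCK = re.compile(r"^[ \t]*([A-Z][A-Z0-9 .'\-]*?)[ \t]*\n(.+)", re.S)

def script_to_dict_strict(script):
    """Hypothetical stricter variant of script_to_dict."""
    lines_by_speaker = defaultdict(list)
    # Paragraphs are separated by one or more blank lines
    for par in re.split(r'\n\s*\n', script.strip('\n')):
        match = SPEAKER_BLOCK.match(par)
        if not match:  # no name on a line of its own: skip the paragraph
            continue
        name, text = match.groups()
        lines_by_speaker[name].extend(
            line.strip() for line in text.splitlines() if line.strip()
        )
    return {name: ' '.join(lines) for name, lines in lines_by_speaker.items()}

sample = (
    "    SPEAKER\n"
    "    Hello, my name is speaker\n"
    "\n"
    "    SPEAKER2\n"
    "    Hi, my name is speaker2\n"
    "\n"
    "    SPEAKER\n"
    "    Hi speaker2\n"
)
result = script_to_dict_strict(sample)
# result == {'SPEAKER': 'Hello, my name is speaker Hi speaker2',
#            'SPEAKER2': 'Hi, my name is speaker2'}
```

Because the name must be followed directly by a line break, `script_to_dict_strict("Agent Coulson leads Hill\nthrough the facility")` returns an empty dictionary instead of creating an `'A'` key.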

## Generating the rest of the movies
Now that we've seen it works on Megamind, let's try it on the rest of the scripts!

**Note**: loading each file by hand is repetitive. A folder can be iterated over (for example with `os.listdir`), or the file names can be kept in a list and looped through.
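
A minimal sketch of the folder-iteration idea, using `pathlib` from the standard library. The helper name `read_all_scripts` and its `skip` parameter are hypothetical; it assumes every script is a `.txt` file sitting directly inside the folder.

```python
from pathlib import Path

def read_all_scripts(folder='moviescripts', skip=()):
    """Read every .txt file in `folder` into a dict keyed by file name
    without the extension. File names listed in `skip` are ignored."""
    scripts = {}
    for path in sorted(Path(folder).glob('*.txt')):
        if path.name in skip:
            continue
        scripts[path.stem] = path.read_text()
    return scripts
```

With this in place, the per-movie variables could be replaced by two dictionaries, for example `scripts = read_all_scripts()` followed by `script_dicts = {title: script_to_dict(text) for title, text in scripts.items()}`.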

In [32]:
#addams_family = read_script('addams_family.txt')
american_psycho = read_script('american-psycho.txt')
avengers = read_script('avengers.txt')
dumb_and_dumber = read_script('dumb_and_dumber.txt')
finding_nemo = read_script('finding_nemo.txt')
harold_kumar = read_script('harold_kumar_white_castle.txt')
#harry_potter = read_script('harry_potter_chamber.txt')
indiana_jones = read_script('indiana_jones_raiders.txt')
it = read_script('it.txt')
lord_rings = read_script('lord_of_rings_return.txt')
#titanic = read_script('titanic.txt')
twilight = read_script('twilight.txt')

#addams_family_dict = script_to_dict(addams_family)
american_psycho_dict = script_to_dict(american_psycho)
avengers_dict = script_to_dict(avengers)
dumb_and_dumber_dict = script_to_dict(dumb_and_dumber)
finding_nemo_dict = script_to_dict(finding_nemo)
harold_kumar_dict = script_to_dict(harold_kumar)
#harry_potter_dict = script_to_dict(harry_potter)
indiana_jones_dict = script_to_dict(indiana_jones)
it_dict = script_to_dict(it)
lord_rings_dict = script_to_dict(lord_rings)
#titanic_dict = script_to_dict(titanic)
twilight_dict = script_to_dict(twilight)



Upon further inspection, the Addams Family, Harry Potter, and Titanic scripts are not formatted in the layout the function expects, so it does not work on them and they are commented out above. They need to be replaced with other scripts or reformatted.
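
A rough way to spot such files before converting them, sketched under the assumption that "formatted correctly" means the layout from the top of this notebook (a speaker name in capitals on its own line, then the dialogue). The function name `speaker_block_share` is hypothetical, and any cut-off for what counts as "too low" would have to be chosen by looking at the scripts that are known to work.

```python
import re

# An all-caps line (digits, spaces, periods, apostrophes, hyphens allowed)
# followed by at least one more line of text.
NAME_THEN_TEXT = re.compile(r"[A-Z][A-Z0-9 .'\-]*[ \t]*\n\s*\S")

def speaker_block_share(script):
    """Fraction of blank-line-separated paragraphs that open with an
    all-caps line and have text after it."""
    pars = [p.strip() for p in re.split(r'\n\s*\n', script) if p.strip()]
    if not pars:
        return 0.0
    hits = sum(1 for p in pars if NAME_THEN_TEXT.match(p))
    return hits / len(pars)

good = "SPEAKER\nHello, my name is speaker\n\nSPEAKER2\nHi, my name is speaker2\n"
bad = "Harry walked in.\n\nHe said hello to Ron.\n"
print(speaker_block_share(good))  # 1.0
print(speaker_block_share(bad))   # 0.0
```

A script whose share is far below that of the working scripts is a candidate for reformatting or replacement.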

{'THE AVENGERS': "  . (Loki looks at him, confused) It's what we call ourselves, sort of like a team. `EARTH'S MIGHTIEST HEROES' type of thing. look up. Way out of their fucking element.",
 'A': 'ssemble!" gent Coulson leads Hill and Fury through the radiation section of the facility. Hundreds of technicians and other staff run around, taking only the essentials. gent Hill SLIPS into a JEEP and follows after Barton\'s truck.Loki\'s trucks SCREECH across the tunnel. Several SHIELD trucks pull up to them. A drive-by shooting ensues. Loki, who stands on top of the bed of the truck, uses his scepter and EMITS energy blasts, flipping over SHIELD trucks. They get in, the cars roar out after them. Agent Hill puts herself at a distance. gent Coulson and several SHIELD agents fall down the steps, dropping SILVER CASES of information. They attempt to grab them, but... gent Hill\'s JEEP ROARS out of a side of Barton\'s truck and pulls up alongside them on the left. She goes way ahead and pulls he