In [201]:
# Space Wars: Star Trek through the Years

## Step 00: Extracting Dates of Release

Let's load the data from MongoDB:

In [202]:
from pymongo import MongoClient
import pandas as pd
import numpy as np
import re

client = MongoClient()
db = client.StarTrek_Database
scr = db.scripts

I previously ran [Scrapy](https://scrapy.org/) on this wonderful [web page]() to download all Star Trek scripts into a JSON file. I then loaded the scripts into my Mongo database using the following lines:
```python
import json
with open('Data/st_scripts.json') as data_file:    
    data = json.load(data_file)
for episode in data:
    scr.insert_one(episode)
```

In [203]:
df = pd.DataFrame(list(scr.find()))

In [204]:
df.head()

Unnamed: 0,_id,end,raw_text,series,start,url
0,598a2572fcd2e313d769f6fc,various,[Screenplay by:\n G...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt
1,598a2572fcd2e313d769f6fd,1994,[STAR TREK: THE NEXT GENERATION \n ...,The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt
2,598a2572fcd2e313d769f6fe,1994,[STAR TREK: THE NEXT GENERATION \n ...,The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt
3,598a2572fcd2e313d769f6ff,1994,[STAR TREK: THE NEXT GENERATION \n ...,The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt
4,598a2572fcd2e313d769f700,1994,[STAR TREK: THE NEXT GENERATION \n ...,The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt


In [205]:
df.loc[300, 'raw_text']



Let's collapse all the tabs, newlines, and repeated spaces into single spaces:

In [206]:
df.raw_text = df.raw_text.apply(lambda x: " ".join(x[0].split()))
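
As a small illustration of what the `split`/`join` idiom in the cell above does, here is a minimal sketch on a made-up string (the sample text and the helper name `collapse_whitespace` are mine, not part of the dataset). In the notebook each `raw_text` entry is a one-element list, which is why the lambda indexes `x[0]` first.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading and trailing whitespace, so re-joining with " " normalizes the text.
    return " ".join(text.split())


# Hypothetical snippet, shaped like the start of a scraped script.
sample = '   STAR TREK: THE NEXT GENERATION \n\t  "The Child" \n   #40272-127  '
cleaned = collapse_whitespace(sample)
print(cleaned)
```

The same idiom is used later in the notebook when two scripts are re-read from disk.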

In [207]:
df.loc[300, 'raw_text']



I would like to extract the original airdate on a per-episode basis to track how topics change over time:

In [208]:
df['airdate'] = np.nan
df.head()

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,
1,598a2572fcd2e313d769f6fd,1994,"STAR TREK: THE NEXT GENERATION ""The Child"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt,
2,598a2572fcd2e313d769f6fe,1994,"STAR TREK: THE NEXT GENERATION ""The Neutral Zo...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt,
3,598a2572fcd2e313d769f6ff,1994,"STAR TREK: THE NEXT GENERATION ""Conspiracy"" #4...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt,
4,598a2572fcd2e313d769f700,1994,"STAR TREK: THE NEXT GENERATION ""We'll Always H...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt,


In [209]:
df[df.series == "The Original Series"].shape

(56, 7)

In [210]:
df[df.series == "The Original Series"].iloc[0,2]



In [211]:
t = df[df.series == "The Original Series"].iloc[40,2]
re.findall(r"Airdate: (\w+ \d+, \d+) \D+", t)[0]

'Nov 17, 1966'
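
To make the pattern easier to read, here is a minimal sketch that runs it on two short strings. The first header is a hypothetical one modeled on the transcript layout; the second is the opening of "The Cage" transcript shown further down, whose ISO-style date the pattern does not match.

```python
import re

# "(\w+ \d+, \d+)" captures "<month> <day>, <year>"; the trailing " \D+"
# requires a space and at least one non-digit character after the year.
pattern = r"Airdate: (\w+ \d+, \d+) \D+"

# Hypothetical header in the same layout as the Original Series transcripts.
typical = "Title: Example Episode Stardate: Unknown Original Airdate: Jan 19, 1967 [Bridge] KIRK: Report."
# Header with a date in YYYY-MM-DD form, as in "The Cage".
iso_style = "Title: The Cage Stardate: Unknown Airdate: 1988-10-04 [Bridge] SPOCK: Check the circuit."

typical_matches = re.findall(pattern, typical)
iso_matches = re.findall(pattern, iso_style)
print(typical_matches)
print(iso_matches)
```

An empty list is what gets turned into `NaN` by the extraction function, so dates in an unexpected layout show up as missing values rather than as errors.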

Now we need to extend this to the entire database for the Original Series:

In [212]:
def orig_series_airdate(x):
    ans = (re.findall(r"Original Airdate: (\w+ \d+, \d+) \D+", x))
    if len(ans)>0:
        return ans[0]
    else:
        return np.nan
df.loc[df.series == "The Original Series", "airdate"] = df[df.series == "The Original Series"].iloc[:,2].map(orig_series_airdate)

Let's check if the date search was successful:

In [213]:
df[df.series == "The Original Series"].iloc[:,6]

16    Mar 29, 1968
17     Mar 1, 1968
18     Mar 8, 1968
19    Feb 16, 1968
20     Feb 9, 1968
21    Feb 23, 1968
22    Jan 12, 1968
23    Jan 19, 1968
24    Dec 15, 1967
25     Jan 5, 1968
26     Feb 2, 1968
27    Nov 17, 1967
28    Mar 15, 1968
29    Dec 29, 1967
30     Nov 3, 1967
31     Dec 8, 1967
32     Oct 6, 1967
33    Oct 13, 1967
34    Sep 29, 1967
35    Dec 22, 1967
36    Oct 20, 1967
37    Sep 15, 1967
38    Sep 22, 1967
39     Dec 1, 1967
40    Nov 10, 1967
41    Oct 27, 1967
42    Apr 13, 1967
43     Apr 6, 1967
44    Mar 23, 1967
45     Mar 9, 1967
46     Mar 2, 1967
47    Feb 16, 1967
48    Feb 23, 1967
49     Feb 9, 1967
50    Jan 26, 1967
51    Mar 30, 1967
52    Jan 19, 1967
53    Jan 12, 1967
54    Dec 29, 1966
55    Nov 24, 1966
56    Nov 17, 1966
57    Feb 02, 1967
58     Jan 5, 1967
59     Dec 8, 1966
60    Oct 27, 1966
61     Nov 3, 1966
62    Oct 20, 1966
63    Dec 15, 1966
64    Sep 15, 1966
65    Sep 29, 1966
66     Sep 8, 1966
67     Oct 6, 1966
68    Oct 13

The last episode seems to have a different date format:

In [214]:
df[df.series == "The Original Series"].iloc[:,2][71]

"Title: The Cage Stardate: Unknown Airdate: 1988-10-04 [Bridge] SPOCK: Check the circuit. TYLER: All operating, sir. SPOCK: It can't be the screen then. Definitely something out there, Captain, headed this way. TYLER: It could be these meteorites. ONE: No, it's something else. There's still something out there. TYLER: It's coming at the speed of light, collision course. The meteorite beam has not deflected it, Captain. ONE: Evasive manoeuvres, sir? PIKE: Steady as we go. GARISON: It's a radio wave, sir. We're passing through an old-style distress signal. PIKE: They were keyed to cause interference and attract attention this way. GARISON: A ship in trouble making a forced landing, sir. That's it. No other message. TYLER: I have a fix. It comes from the Talos star group. ONE: We've no ships or Earth colonies that far out. SPOCK: Their call letters check with a survey expedition. SS Columbia. It disappeared in that region approximately eighteen years ago. TYLER: It would take that long fo

Actually, this pilot was not aired during the show's original run (the 1988 date in the transcript is a much later broadcast), so I will instead use the date mentioned in [its Wikipedia article](https://en.wikipedia.org/wiki/The_Cage_(Star_Trek:_The_Original_Series)):

In [215]:
df.set_value(71, 'airdate', 'Feb 1, 1966')

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,
1,598a2572fcd2e313d769f6fd,1994,"STAR TREK: THE NEXT GENERATION ""The Child"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt,
2,598a2572fcd2e313d769f6fe,1994,"STAR TREK: THE NEXT GENERATION ""The Neutral Zo...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt,
3,598a2572fcd2e313d769f6ff,1994,"STAR TREK: THE NEXT GENERATION ""Conspiracy"" #4...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt,
4,598a2572fcd2e313d769f700,1994,"STAR TREK: THE NEXT GENERATION ""We'll Always H...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt,
5,598a2572fcd2e313d769f701,1994,"STAR TREK: THE NEXT GENERATION Skin of Evil ""f...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/122.txt,
6,598a2572fcd2e313d769f702,1994,"STAR TREK: THE NEXT GENERATION ""Symbiosis"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/123.txt,
7,598a2572fcd2e313d769f703,1994,"STAR TREK: THE NEXT GENERATION ""The Arsenal Of...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/121.txt,
8,598a2572fcd2e313d769f704,1994,"STAR TREK: THE NEXT GENERATION ""Heart Of Glory...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/120.txt,
9,598a2572fcd2e313d769f705,1994,"STAR TREK: THE NEXT GENERATION ""Coming Of Age""...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/119.txt,


In [216]:
df[df.series == "The Original Series"].iloc[:,6][71]

'Feb 1, 1966'
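
A note for anyone re-running this on a recent pandas: `DataFrame.set_value` was deprecated and later removed (it is absent from pandas 1.0 onwards). A minimal sketch of the label-based replacement, `DataFrame.at`, on a toy frame (the frame and its index labels are illustrative only):

```python
import pandas as pd

# Toy frame standing in for the real one; only the column name is borrowed.
toy = pd.DataFrame({"airdate": [None, None]}, index=[70, 71])

# Label-based scalar assignment. Unlike set_value, this is a plain assignment
# statement, so it modifies the frame in place and displays nothing.
toy.at[71, "airdate"] = "Feb 1, 1966"

print(toy.at[71, "airdate"])
```

Because the assignment has no return value, the cell would no longer echo the whole DataFrame the way the `set_value` call above does.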

Now, for the remaining series:

### Series "The Next Generation" airdate extraction:

In [217]:
df[df.series == "The Next Generation"].shape

(176, 7)

In [218]:
df[df.series == "The Next Generation"].iloc[0,2]

'STAR TREK: THE NEXT GENERATION "The Child" #40272-127 Based on a Premise by Jaron Summers and Jon Povill Teleplay by Maurice Hurley Directed by Rob Bowman THE WRITING CREDITS MAY NOT BE FINAL AND SHOULD NOT BE USED FOR PUBLICITY OR ADVERTISING PURPOSES WITHOUT FIRST CHECKING WITH THE TELEVISION LEGAL DEPARTMENT. Copyright 1988 Paramount Pictures Corporation. All Rights Reserved. This script is not for publication or reproduction. No one is authorized to dispose of same. If lost or destroyed, please notify the Script Department. 3RD REV. FINAL DRAFT SEPTEMBER 20, 1988 STAR TREK: "The Child" - 9/20/88 - CAST STAR TREK: THE NEXT GENERATION "The Child" CAST PICARD HESTER DEALT RIKER DATA PULASKI Voice-Over TROI REPULSE VOICE GEORDI WORF WESLEY GUINAN CREWMEMBER TRANSPORTER CHIEF TEACHER (MISS GLADSTONE) IAN (BABY) IAN (AGE THREE) IAN (AGE EIGHT) Non-Speaking CREWMEMBERS MEDICAL ASSISTANTS SECURITY TEAM GROUP OF KIDS Voice-Over SICKBAY VOICE COMPUTER VOICE STAR TREK: "The Child" - 9/20/88 

In [219]:
t = df[df.series == "The Next Generation"].iloc[0,2]
re.findall(r"(\d+/\d+/\d+) \D+", t)[0]

'9/20/88'

In [220]:
def tng_airdate(x):
    ans = (re.findall(r"(\d+/\d+/\d+) \D+", x))
    if len(ans)>0:
        return ans[0]
    else:
        return np.nan
df.loc[df['series'] == "The Next Generation", 'airdate'] = df.loc[df['series'] == "The Next Generation", 'raw_text'].map(tng_airdate)

Let's check if the date search was successful:

In [221]:
df[df.series == "The Next Generation"].airdate

1       9/20/88
2       3/18/88
3        3/8/88
4       2/22/88
5        2/1/88
6       2/17/88
7       1/25/88
8       1/13/88
9      12/30/87
10     12/11/87
11      12/2/87
12     11/19/87
13      11/9/87
14     10/26/87
15     10/14/87
508    03/14/94
509    03/01/94
510    02/17/94
511    02/10/94
512    01/28/94
513    01/20/94
514    01/07/94
515    12/21/93
516    12/09/93
517    11/30/93
518    11/17/93
519    10/18/93
520    10/27/93
521    10/18/93
522    10/07/93
         ...   
639     4/10/89
640     3/29/89
641     3/17/89
642      3/7/89
643     2/24/89
644     2/10/89
645      2/8/89
646      2/7/89
647    01/10/89
648    01/10/89
649    12/23/88
650    12/14/88
651    12/02/88
652    11/10/88
653     11/4/88
654    10/10/88
655     10/4/88
656    10/12/88
657     9/27/88
658     10/6/87
659     9/25/87
660     9/14/87
661      9/4/87
662     8/21/87
663      8/7/87
664     7/31/87
665     7/13/87
666      7/9/87
667      7/1/87
668         NaN
Name: airdate, Length: 1

In [222]:
df[df.series == "The Next Generation"].airdate.isnull().sum()

1

In [223]:
df[df.series == "The Next Generation"].airdate[668]

nan

In [224]:
df[df.series == "The Next Generation"].raw_text[668]



Again, this is a special case. The airdate according to [Wikipedia](https://en.wikipedia.org/wiki/Encounter_at_Farpoint) is September 28, 1987, so I will set it by hand:

In [225]:
df.set_value(668, 'airdate', '9/28/1987')

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,
1,598a2572fcd2e313d769f6fd,1994,"STAR TREK: THE NEXT GENERATION ""The Child"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt,9/20/88
2,598a2572fcd2e313d769f6fe,1994,"STAR TREK: THE NEXT GENERATION ""The Neutral Zo...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt,3/18/88
3,598a2572fcd2e313d769f6ff,1994,"STAR TREK: THE NEXT GENERATION ""Conspiracy"" #4...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt,3/8/88
4,598a2572fcd2e313d769f700,1994,"STAR TREK: THE NEXT GENERATION ""We'll Always H...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt,2/22/88
5,598a2572fcd2e313d769f701,1994,"STAR TREK: THE NEXT GENERATION Skin of Evil ""f...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/122.txt,2/1/88
6,598a2572fcd2e313d769f702,1994,"STAR TREK: THE NEXT GENERATION ""Symbiosis"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/123.txt,2/17/88
7,598a2572fcd2e313d769f703,1994,"STAR TREK: THE NEXT GENERATION ""The Arsenal Of...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/121.txt,1/25/88
8,598a2572fcd2e313d769f704,1994,"STAR TREK: THE NEXT GENERATION ""Heart Of Glory...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/120.txt,1/13/88
9,598a2572fcd2e313d769f705,1994,"STAR TREK: THE NEXT GENERATION ""Coming Of Age""...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/119.txt,12/30/87


In [226]:
df[df.series == "The Next Generation"].airdate[668]

'9/28/1987'
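
The airdate strings collected so far come in more than one shape (`'Nov 17, 1966'`, `'9/20/88'`, and the hand-entered `'9/28/1987'`), so they will need to be normalized before they can be sorted or plotted over time. Here is a minimal sketch of one way to do that with the standard library; the helper name `parse_airdate` and the list of formats are my own assumptions, and `%b` relies on English month abbreviations being the active locale.

```python
from datetime import datetime

# Formats seen so far: abbreviated month names (Original Series) and numeric
# dates with either a 2-digit or a 4-digit year (The Next Generation).
FORMATS = ["%b %d, %Y", "%m/%d/%y", "%m/%d/%Y"]


def parse_airdate(text):
    """Return a datetime for the first format that fits, or None if none do."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # Wrong layout for this format; try the next one.
            continue
    return None


print(parse_airdate("9/20/88"))
print(parse_airdate("9/28/1987"))
print(parse_airdate("Feb 1, 1966"))
```

With `%y`, Python maps two-digit years 69 to 99 onto 1969 to 1999, which is fine for the 1987 to 1999 dates here but would need care for years written as `00` or `01`.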

### Series "Deep Space 9" airdate extraction:

In [227]:
df[df.series == "Deep Space 9"].shape

(173, 7)

In [228]:
df[df.series == "Deep Space 9"].iloc[0,2]



In [229]:
t = df[df.series == "Deep Space 9"].iloc[0,2]
re.findall(r"(\d+/\d+/\d+) \D+", t)[0]

'03/29/99'

In [230]:
def ds9_airdate(x):
    ans = (re.findall(r"(\d+/\d+/\d+) \D+", x))
    if len(ans)>0:
        return ans[0]
    else:
        return np.nan
df.loc[df['series'] == "Deep Space 9", 'airdate'] = df.loc[df['series'] == "Deep Space 9", 'raw_text'].map(ds9_airdate)
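
Since `ds9_airdate` has the same body as `tng_airdate`, the two could share one helper. A minimal sketch, assuming a hypothetical function `first_match` that takes the pattern as an argument (it returns `None` rather than `np.nan` to stay dependency-free):

```python
import re

# Numeric date followed by a space and at least one non-digit character.
NUMERIC_DATE = r"(\d+/\d+/\d+) \D+"


def first_match(pattern, text, default=None):
    """Return the first capture of `pattern` in `text`, or `default` if none."""
    found = re.findall(pattern, text)
    return found[0] if found else default


# Fragments taken from the script headers printed in this notebook.
with_date = 'FINAL DRAFT SEPTEMBER 20, 1988 STAR TREK: "The Child" - 9/20/88 - CAST'
without_date = "FINAL DRAFT PARAMOUNT PICTURES CORPORATION April 2, 1993"

print(first_match(NUMERIC_DATE, with_date))
print(first_match(NUMERIC_DATE, without_date))
```

Note that the pattern simply returns the first slash-formatted date in the text; in the fragment above that is the date printed next to the draft line of the script header.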

Let's check if the date search was successful:

In [231]:
df[df.series == "Deep Space 9"].airdate

335    03/29/99
336    03/18/99
337    03/05/99
338    02/24/99
339    02/16/99
340    02/03/99
341    01/26/99
342    01/15/99
343    01/04/99
344    12/01/98
345    12/09/98
346    11/18/98
347    11/09/98
348    10/26/98
349    10/19/98
350    10/08/98
351    09/28/98
352    09/18/98
353    09/09/98
354    08/25/98
355    08/14/98
356    08/04/98
357    07/24/98
358    07/15/98
359    07/01/98
360    04/08/98
361    03/27/98
362    03/20/98
363    03/09/98
364    02/27/98
         ...   
478    10/18/93
479    10/08/93
480    09/28/93
481    09/16/93
482    09/03/93
483    08/26/93
484    08/16/93
485    07/29/93
486    07/20/93
487    07/26/93
488    07/02/93
489         NaN
490     3/26/93
491    03/15/93
492      3/4/93
493    11/25/92
494         NaN
495    02/02/93
496    01/21/93
497     1/12/93
498     1/07/93
499    12/07/92
500    11/25/92
501    11/17/92
502    10/30/92
503    10/23/92
504    10/23/92
505    10/05/92
506    09/18/92
507    08/25/92
Name: airdate, Length: 1

In [232]:
df[df.series == "Deep Space 9"].airdate.isnull().sum()

2

In [233]:
df[df.series == "Deep Space 9"].raw_text[489]

'STAR TREK: DEEP SPACE NINE "In the Hands of the Prophets" #40511-420 Written by Robert Hewitt Wolfe Directed by David Livingston THE WRITING CREDITS MAY NOT BE FINAL AND SHOULD NOT BE USED FOR PUBLICITY OR ADVERTISING PURPOSES WITHOUT FIRST CHECKING WITH THE TELEVISION LEGAL DEPARTMENT. Copyright 1992 Paramount Pictures Corporation. All Rights Reserved. This script is not for publication or reproduction. No one is authorized to dispose of same. If lost or destroyed, please notify the Script Department. Return to Script Department FINAL DRAFT PARAMOUNT PICTURES CORPORATION April 2, 1993'

In [234]:
df[df.series == "Deep Space 9"].raw_text[494]

'STAR TREK: DEEP SPACE NINE "Progress" #40511-415 Written by Peter Allan Fields Directed by Les Landau THE WRITING CREDITS MAY NOT BE FINAL AND SHOULD NOT BE USED FOR PUBLICITY OR ADVERTISING PURPOSES WITHOUT FIRST CHECKING WITH THE TELEVISION LEGAL DEPARTMENT. Copyright 1992 Paramount Pictures Corporation. All Rights Reserved. This script is not for publication or reproduction. No one is authorized to dispose of same. If lost or destroyed, please notify the Script Department. Return to Script Department FINAL DRAFT PARAMOUNT PICTURES CORPORATION FEBRUARY 16, 1993'

It seems that my web scraper did not pick up much text from these two episodes. Therefore, I will need to fill them in by hand:

In [235]:
with open("Data/420.txt") as f: 
    t489 = f.readlines()
with open("Data/415.txt") as f: 
    t494 = f.readlines() 

In [236]:
t489

['                  STAR TREK: DEEP SPACE NINE \n',
 '                              \n',
 '                "In the Hands of the Prophets" \n',
 '                          #40511-420 \n',
 '                              \n',
 '                          Written by \n',
 '                      Robert Hewitt Wolfe \n',
 '                              \n',
 '                          Directed by \n',
 '                       David Livingston \n',
 '\n',
 'THE WRITING CREDITS MAY NOT BE FINAL AND SHOULD NOT BE USED \n',
 'FOR PUBLICITY OR ADVERTISING PURPOSES WITHOUT FIRST CHECKING \n',
 'WITH THE TELEVISION LEGAL DEPARTMENT.\n',
 '\n',
 'Copyright 1992 Paramount Pictures Corporation. All Rights \n',
 'Reserved. This script is not for publication or \n',
 'reproduction. No one is authorized to dispose of same. If \n',
 'lost or destroyed, please notify the Script Department.\n',
 '\n',
 'Return to Script Department          FINAL DRAFT\n',
 'PARAMOUNT PICTURES CORPORATION\n',
 '             

In [237]:
t489 = " ".join(" ".join(t489).split())
t494 = " ".join(" ".join(t494).split())

In [238]:
t489

'STAR TREK: DEEP SPACE NINE "In the Hands of the Prophets" #40511-420 Written by Robert Hewitt Wolfe Directed by David Livingston THE WRITING CREDITS MAY NOT BE FINAL AND SHOULD NOT BE USED FOR PUBLICITY OR ADVERTISING PURPOSES WITHOUT FIRST CHECKING WITH THE TELEVISION LEGAL DEPARTMENT. Copyright 1992 Paramount Pictures Corporation. All Rights Reserved. This script is not for publication or reproduction. No one is authorized to dispose of same. If lost or destroyed, please notify the Script Department. Return to Script Department FINAL DRAFT PARAMOUNT PICTURES CORPORATION April 2, 1993 <C>DEEP SPACE: "In the Hands... " - 04/05/93 - CAST STAR TREK: DEEP SPACE NINE "In the Hands of the Prophets" CAST SISKO NEELA O\'BRIEN WINN KIRA VENDOR ODO BAREIL BASHIR VOICES DAX QUARK KEIKO JAKE COMPUTER VOICE Non-Speaking Non-Speaking BAJORAN CREWMEMBERS MONKS VARIOUS STUDENTS DEEP SPACE: "In the Hands... " - 04/02/93 - SETS STAR TREK: DEEP SPACE NINE "In the Hands of the Prophets" SETS INTERIORS E

In [239]:
df.set_value(489, 'raw_text', t489)
df.set_value(494, 'raw_text', t494)

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,
1,598a2572fcd2e313d769f6fd,1994,"STAR TREK: THE NEXT GENERATION ""The Child"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt,9/20/88
2,598a2572fcd2e313d769f6fe,1994,"STAR TREK: THE NEXT GENERATION ""The Neutral Zo...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt,3/18/88
3,598a2572fcd2e313d769f6ff,1994,"STAR TREK: THE NEXT GENERATION ""Conspiracy"" #4...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt,3/8/88
4,598a2572fcd2e313d769f700,1994,"STAR TREK: THE NEXT GENERATION ""We'll Always H...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt,2/22/88
5,598a2572fcd2e313d769f701,1994,"STAR TREK: THE NEXT GENERATION Skin of Evil ""f...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/122.txt,2/1/88
6,598a2572fcd2e313d769f702,1994,"STAR TREK: THE NEXT GENERATION ""Symbiosis"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/123.txt,2/17/88
7,598a2572fcd2e313d769f703,1994,"STAR TREK: THE NEXT GENERATION ""The Arsenal Of...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/121.txt,1/25/88
8,598a2572fcd2e313d769f704,1994,"STAR TREK: THE NEXT GENERATION ""Heart Of Glory...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/120.txt,1/13/88
9,598a2572fcd2e313d769f705,1994,"STAR TREK: THE NEXT GENERATION ""Coming Of Age""...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/119.txt,12/30/87


In [240]:
df[df.series == "Deep Space 9"].raw_text[489]

'STAR TREK: DEEP SPACE NINE "In the Hands of the Prophets" #40511-420 Written by Robert Hewitt Wolfe Directed by David Livingston THE WRITING CREDITS MAY NOT BE FINAL AND SHOULD NOT BE USED FOR PUBLICITY OR ADVERTISING PURPOSES WITHOUT FIRST CHECKING WITH THE TELEVISION LEGAL DEPARTMENT. Copyright 1992 Paramount Pictures Corporation. All Rights Reserved. This script is not for publication or reproduction. No one is authorized to dispose of same. If lost or destroyed, please notify the Script Department. Return to Script Department FINAL DRAFT PARAMOUNT PICTURES CORPORATION April 2, 1993 <C>DEEP SPACE: "In the Hands... " - 04/05/93 - CAST STAR TREK: DEEP SPACE NINE "In the Hands of the Prophets" CAST SISKO NEELA O\'BRIEN WINN KIRA VENDOR ODO BAREIL BASHIR VOICES DAX QUARK KEIKO JAKE COMPUTER VOICE Non-Speaking Non-Speaking BAJORAN CREWMEMBERS MONKS VARIOUS STUDENTS DEEP SPACE: "In the Hands... " - 04/02/93 - SETS STAR TREK: DEEP SPACE NINE "In the Hands of the Prophets" SETS INTERIORS E

In [241]:
df.loc[df['series'] == "Deep Space 9", 'airdate'] = df.loc[df['series'] == "Deep Space 9", 'raw_text'].map(ds9_airdate)

In [242]:
df[df.series == "Deep Space 9"].airdate

335    03/29/99
336    03/18/99
337    03/05/99
338    02/24/99
339    02/16/99
340    02/03/99
341    01/26/99
342    01/15/99
343    01/04/99
344    12/01/98
345    12/09/98
346    11/18/98
347    11/09/98
348    10/26/98
349    10/19/98
350    10/08/98
351    09/28/98
352    09/18/98
353    09/09/98
354    08/25/98
355    08/14/98
356    08/04/98
357    07/24/98
358    07/15/98
359    07/01/98
360    04/08/98
361    03/27/98
362    03/20/98
363    03/09/98
364    02/27/98
         ...   
478    10/18/93
479    10/08/93
480    09/28/93
481    09/16/93
482    09/03/93
483    08/26/93
484    08/16/93
485    07/29/93
486    07/20/93
487    07/26/93
488    07/02/93
489    04/05/93
490     3/26/93
491    03/15/93
492      3/4/93
493    11/25/92
494    02/16/93
495    02/02/93
496    01/21/93
497     1/12/93
498     1/07/93
499    12/07/92
500    11/25/92
501    11/17/92
502    10/30/92
503    10/23/92
504    10/23/92
505    10/05/92
506    09/18/92
507    08/25/92
Name: airdate, Length: 1

### Series "Voyager" airdate extraction:

In [243]:
df[df.series == "Voyager"].shape

(167, 7)

In [244]:
df[df.series == "Voyager"].iloc[0,2]



It seems like the airdate format for "Voyager" follows the same RegEx pattern `(\w+ \d+, \d+)` as "The Original Series", so I am going to reuse the same function to extract airdates:

In [245]:
df.loc[df.series == "Voyager", "airdate"] = df[df.series == "Voyager"].iloc[:,2].map(orig_series_airdate)

Let's check if the date search was successful:

In [246]:
df[df.series == "Voyager"].airdate

168         May 23, 2001
169         May 16, 2001
170          May 9, 2001
171          May 2, 2001
172       April 25, 2001
173       April 18, 2001
174       April 11, 2001
175        March 7, 2001
176    February 28, 2001
177    February 21, 2001
178    February 14, 2001
179     February 7, 2001
180     January 31, 2001
181     January 24, 2001
182     January 17, 2001
183    November 29, 2000
184    November 22, 2000
185    November 15, 2000
186     November 8, 2000
187     November 1, 2000
188     October 25, 2000
189         Oct 18, 2000
190     October 11, 2000
191      October 4, 2000
192                  NaN
193         May 17, 2000
194         May 10, 2000
195          May 3, 2000
196       April 26, 2000
197       April 19, 2000
             ...        
305     February 5, 1996
306                  NaN
307                  NaN
308                  NaN
309                  NaN
310                  NaN
311                  NaN
312                  NaN
313                  NaN


In [247]:
df.raw_text[306]



It looks like the date here is in a slightly different format: it does not have a comma.

In [248]:
def voyager_airdate(x):
    ans1 = (re.findall(r"Original Airdate: (\w+ \d+ \d+) \D+", x))
    ans2 = (re.findall(r"Original Airdate: (\w+ \d+, \d+) \D+", x))
    if len(ans1)>0:
        return ans1[0]
    elif len(ans2)>0:
        return ans2[0]
    else:
        return np.nan
df.loc[df.series == "Voyager", "airdate"] = df[df.series == "Voyager"].iloc[:,2].map(voyager_airdate)
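
An alternative to running two separate searches is to make the comma optional inside a single pattern. A minimal sketch on two hypothetical headers (only the date layouts are taken from the output above):

```python
import re

# ",?" makes the comma after the day optional, so one pattern covers
# both "May 17, 2000" and "May 24 2000".
PATTERN = r"Original Airdate: (\w+ \d+,? \d+) \D+"

samples = [
    "Title: Example Stardate: Unknown Original Airdate: May 17, 2000 [Bridge] JANEWAY: Report.",
    "Title: Example Stardate: Unknown Original Airdate: May 24 2000 [Bridge] JANEWAY: Report.",
]

results = [re.findall(PATTERN, s) for s in samples]
print(results)
```

The trade-off is that the returned strings still differ in whether they contain a comma, so a later date-parsing step has to accept both.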

In [249]:
df[df.series == "Voyager"].airdate

168         May 23, 2001
169         May 16, 2001
170          May 9, 2001
171          May 2, 2001
172       April 25, 2001
173       April 18, 2001
174       April 11, 2001
175        March 7, 2001
176    February 28, 2001
177    February 21, 2001
178    February 14, 2001
179     February 7, 2001
180     January 31, 2001
181     January 24, 2001
182     January 17, 2001
183    November 29, 2000
184    November 22, 2000
185    November 15, 2000
186     November 8, 2000
187     November 1, 2000
188     October 25, 2000
189         Oct 18, 2000
190     October 11, 2000
191      October 4, 2000
192          May 24 2000
193         May 17, 2000
194         May 10, 2000
195          May 3, 2000
196       April 26, 2000
197       April 19, 2000
             ...        
305     February 5, 1996
306      January 29 1996
307      January 22 1996
308      January 15 1996
309     November 27 1995
310     November 20 1995
311     November 13 1995
312      November 6 1995
313      October 30 1995


In [250]:
df[df.series == "Voyager"].airdate.isnull().sum()

16

There are still some NaN values for airdate.

In [262]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
217,598a2572fcd2e313d769f7d5,2001,"Title: Equinox, Part 2 Stardate: Unknown Origi...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
218,598a2572fcd2e313d769f7d6,2001,"Title: Equinox, Part 1 Stardate: Unknown Origi...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
267,598a2572fcd2e313d769f807,2001,Title: The Gift Stardate: Unknown Original Air...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
268,598a2572fcd2e313d769f808,2001,"Title: Scorpion, Part 2 Stardate: 51001.2 Orig...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
269,598a2572fcd2e313d769f809,2001,"Title: Scorpion, Part 1 Stardate: 51001.2 Orig...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
270,598a2572fcd2e313d769f80a,2001,Title: Worst Case Scenario Stardate: 50971.5 O...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
271,598a2572fcd2e313d769f80b,2001,Title: Displaced Stardate: 50912.4 Original Ai...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
272,598a2572fcd2e313d769f80c,2001,Title: Distant Origin Stardate: 50899.1 Origin...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
273,598a2572fcd2e313d769f80d,2001,Title: Real Life Stardate: 50836.2 Original Ai...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
274,598a2572fcd2e313d769f80e,2001,Title: Before And After Stardate: 50601.9 Orig...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,


In [263]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[217]



In [264]:
def voyager_airdate2(x):
    ans1 = (re.findall(r"Original Airdate[s]*: (\w+ \d+ \d+) \D+", x))
    ans2 = (re.findall(r"Original Airdate[s]*: (\w+ \d+, \d+) \D+", x))
    if len(ans1)>0:
        return ans1[0]
    elif len(ans2)>0:
        return ans2[0]
    else:
        return np.nan
df.loc[df.series == "Voyager", "airdate"] = df[df.series == "Voyager"].iloc[:,2].map(voyager_airdate2)

In [265]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
267,598a2572fcd2e313d769f807,2001,Title: The Gift Stardate: Unknown Original Air...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
268,598a2572fcd2e313d769f808,2001,"Title: Scorpion, Part 2 Stardate: 51001.2 Orig...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
269,598a2572fcd2e313d769f809,2001,"Title: Scorpion, Part 1 Stardate: 51001.2 Orig...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
270,598a2572fcd2e313d769f80a,2001,Title: Worst Case Scenario Stardate: 50971.5 O...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
271,598a2572fcd2e313d769f80b,2001,Title: Displaced Stardate: 50912.4 Original Ai...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
272,598a2572fcd2e313d769f80c,2001,Title: Distant Origin Stardate: 50899.1 Origin...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
273,598a2572fcd2e313d769f80d,2001,Title: Real Life Stardate: 50836.2 Original Ai...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
274,598a2572fcd2e313d769f80e,2001,Title: Before And After Stardate: 50601.9 Orig...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
275,598a2572fcd2e313d769f80f,2001,Title: Favourite Son Stardate: 50589.1 Origina...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
276,598a2572fcd2e313d769f810,2001,Title: Rise Stardate: 50567.4 Original Airdate...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,


In [266]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[267]

"Title: The Gift Stardate: Unknown Original Airdate: 10 September, 1997 [Cargo Bay two] JANEWAY: So, how's the newest addition to our family? EMH: At the moment she's stable, but the prognosis isn't clear. Her human physiology has begun to reassert itself. Respiratory system, neurological functions, immune response. But those systems are swarming with Borg implants There's a battle being waged inside her body, between the biological and the technological, and I'm not sure which is going to win. JANEWAY: Well, it's time we brought her up to date. Wake her. SEVEN: Captain Janeway, What have you... The others, I can't hear the others. The voices are gone. JANEWAY: We had to neutralise the neuro-transceiver in your upper spinal column. Your link to the collective has been severed. SEVEN: You will return this drone to the Borg. JANEWAY: I'm afraid I can't do that SEVEN: You will return this drone to the Borg! JANEWAY: If I were to turn this ship around and head back into Borg territory I'd 

In [267]:
def voyager_airdate3(x):
    ans1 = (re.findall(r"Original Airdate[s]*: (\w+ \d+ \d+) \D+", x))
    ans2 = (re.findall(r"Original Airdate[s]*: (\w+ \d+, \d+) \D+", x))
    ans3 = (re.findall(r"Original Airdate[s]*: (\d+ \w+, \d+) \D+", x))
    if len(ans1)>0:
        return ans1[0]
    elif len(ans2)>0:
        return ans2[0]
    elif len(ans3)>0:
        return ans3[0]
    else:
        return np.nan
df.loc[df.series == "Voyager", "airdate"] = df[df.series == "Voyager"].iloc[:,2].map(voyager_airdate3)

In [268]:
df[df.series == "Voyager"].airdate.isnull().sum()

6

In [269]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
268,598a2572fcd2e313d769f808,2001,"Title: Scorpion, Part 2 Stardate: 51001.2 Orig...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
269,598a2572fcd2e313d769f809,2001,"Title: Scorpion, Part 1 Stardate: 51001.2 Orig...",Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
274,598a2572fcd2e313d769f80e,2001,Title: Before And After Stardate: 50601.9 Orig...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
276,598a2572fcd2e313d769f810,2001,Title: Rise Stardate: 50567.4 Original Airdate...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
279,598a2572fcd2e313d769f813,2001,Title: Blood Fever Stardate: 50537.2 Original ...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
284,598a2572fcd2e313d769f818,2001,Title: The Q and the Grey Stardate: 50384.2 27...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,


In [270]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[268]



In [271]:
x = df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[268]
re.findall(r"Original Airdate[s]*: (\w+ \d+[a-z]{2}, \d+) \D+", x)

['Sept 3rd, 1997']

In [272]:
def voyager_airdate4(x):
    ans1 = (re.findall(r"Original Airdate[s]*: (\w+ \d+ \d+) \D+", x))
    ans2 = (re.findall(r"Original Airdate[s]*: (\w+ \d+, \d+) \D+", x))
    ans3 = (re.findall(r"Original Airdate[s]*: (\d+ \w+, \d+) \D+", x))
    ans4 = (re.findall(r"Original Airdate[s]*: (\w+ \d+[a-z]{2}, \d+) \D+", x))
    if len(ans1)>0:
        return ans1[0]
    elif len(ans2)>0:
        return ans2[0]
    elif len(ans3)>0:
        return ans3[0]
    elif len(ans4)>0:
        return ans4[0]
    else:
        return np.nan
df.loc[df.series == "Voyager", "airdate"] = df[df.series == "Voyager"].iloc[:,2].map(voyager_airdate4)

In [273]:
df[df.series == "Voyager"].airdate.isnull().sum()

4

In [274]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
274,598a2572fcd2e313d769f80e,2001,Title: Before And After Stardate: 50601.9 Orig...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
276,598a2572fcd2e313d769f810,2001,Title: Rise Stardate: 50567.4 Original Airdate...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
279,598a2572fcd2e313d769f813,2001,Title: Blood Fever Stardate: 50537.2 Original ...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
284,598a2572fcd2e313d769f818,2001,Title: The Q and the Grey Stardate: 50384.2 27...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,


In [275]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[274]

"Title: Before And After Stardate: 50601.9 Original Airdate: 9th April, 1997 EMH [OC]: Activate the bio-temporal chamber. [Sickbay, 2379] ANDREW: Is she going to be all right? EMH: Not if you don't all clear out of here and let me do my work. LINNIS: She's my mother. I'm staying. EMH: This is a very delicate procedure, and I could use some peace and quiet. KIM: The Doctor's right. Let him do his work. LINNIS: All right. EMH: I wish I'd told you this before, but better late than never. You're the finest friend I've ever had. Prepare to bring the bio-temporal chamber online. We'll begin in approximately five minutes. ANDREW: Grandma? Are you awake? I brought you a present. Grandma Kes? I finally finished your birthday present. Sorry it's late, but I wanted to get it right. KES: I don't know you. ANDREW: What do you mean? I'm Andrew, your grandson. KES: I don't know you. ANDREW: Doctor? Doctor Van Gogh? EMH: What is it? ANDREW: She doesn't recognise me. EMH: Kes? How are you feeling? KES:

In [276]:
x = df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[274]
re.findall(r"Original Airdate[s]*: (\d+[a-z]{2} \w+, \d+) \D+", x)

['9th April, 1997']

In [277]:
def voyager_airdate5(x):
    ans1 = (re.findall(r"Original Airdate[s]*: (\w+ \d+ \d+) \D+", x))
    ans2 = (re.findall(r"Original Airdate[s]*: (\w+ \d+, \d+) \D+", x))
    ans3 = (re.findall(r"Original Airdate[s]*: (\d+ \w+, \d+) \D+", x))
    ans4 = (re.findall(r"Original Airdate[s]*: (\w+ \d+[a-z]{2}, \d+) \D+", x))
    ans5 = (re.findall(r"Original Airdate[s]*: (\d+[a-z]{2} \w+, \d+) \D+", x))
    if len(ans1)>0:
        return ans1[0]
    elif len(ans2)>0:
        return ans2[0]
    elif len(ans3)>0:
        return ans3[0]
    elif len(ans4)>0:
        return ans4[0]
    elif len(ans5)>0:
        return ans5[0]
    else:
        return np.nan
df.loc[df.series == "Voyager", "airdate"] = df[df.series == "Voyager"].iloc[:,2].map(voyager_airdate5)

In [278]:
df[df.series == "Voyager"].airdate.isnull().sum()

3

In [279]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
276,598a2572fcd2e313d769f810,2001,Title: Rise Stardate: 50567.4 Original Airdate...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
279,598a2572fcd2e313d769f813,2001,Title: Blood Fever Stardate: 50537.2 Original ...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,
284,598a2572fcd2e313d769f818,2001,Title: The Q and the Grey Stardate: 50384.2 27...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,


In [283]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[276]

"Title: Rise Stardate: 50567.4 Original Airdate:26 February, 1997 [Bridge] JANEWAY: Fire! TUVOK: The asteroid is fragmenting, but most of the debris is still on a collision course with the planet. JANEWAY: Target the fragments. Destroy them. CHAKOTAY: That asteroid should have been vaporised. What happened? KIM: Not sure. Sensors showed a simple nickel-iron composition. We shouldn't be seeing fragments more than a centimetre in diameter. SKLAR: Ambassador, I'm afraid I was right. This isn't going to work. The same thing happened to us yesterday. We tried to vaporize two incoming asteroids but they fragmented and struck the surface. TUVOK: I've destroyed most of the debris, Captain, however targeting scanners were unable to track two of the fragments. They have already entered the upper atmosphere. The debris impacted on the largest continent, approximately 500 kilometres from the southern tip. NEZU AMBASSADOR: The central desert. Fortunately that region isn't heavily populated. TUVOK: 

In [284]:
x = df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[276]
re.findall(r"Original Airdate[s]*:[ ]*(\d+ \w+, \d+) \D+", x)

['26 February, 1997']

In [285]:
def voyager_airdate5(x):
    ans1 = (re.findall(r"Original Airdate[s]*: (\w+ \d+ \d+) \D+", x))
    ans2 = (re.findall(r"Original Airdate[s]*: (\w+ \d+, \d+) \D+", x))
    ans3 = (re.findall(r"Original Airdate[s]*:[ ]*(\d+ \w+, \d+) \D+", x))
    ans4 = (re.findall(r"Original Airdate[s]*: (\w+ \d+[a-z]{2}, \d+) \D+", x))
    ans5 = (re.findall(r"Original Airdate[s]*: (\d+[a-z]{2} \w+, \d+) \D+", x))
    if len(ans1)>0:
        return ans1[0]
    elif len(ans2)>0:
        return ans2[0]
    elif len(ans3)>0:
        return ans3[0]
    elif len(ans4)>0:
        return ans4[0]
    elif len(ans5)>0:
        return ans5[0]
    else:
        return np.nan
df.loc[df.series == "Voyager", "airdate"] = df[df.series == "Voyager"].iloc[:,2].map(voyager_airdate5)

In [286]:
df[df.series == "Voyager"].airdate.isnull().sum()

1

In [287]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
284,598a2572fcd2e313d769f818,2001,Title: The Q and the Grey Stardate: 50384.2 27...,Voyager,1995,https://scifi.media/wp-content/uploads/t/voy/s...,


In [288]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))].raw_text[284]

"Title: The Q and the Grey Stardate: 50384.2 27 November 1996 [Bridge] JANEWAY: Oh! CHAKOTAY: Incredible. JANEWAY: Absolutely thrilling. NEELIX: All I can say is wow! What about you, Mister Vulcan? Isn't that just wow! TUVOK: Your inarticulate expression of awe notwithstanding, Mister Neelix, it was a fascinating spectacle. KIM: That's the edge of the shock wave. The pressure's over ninety kilopascals, thirty percent more than we predicted. JANEWAY: Tom, back us off at full impulse. I want to stay ahead of the brunt of that wave. PARIS: Yes, ma'am. JANEWAY: Congratulations, everyone. Only two crews in the history of Starfleet have witnessed a supernova explosion. KIM: But neither one was this close. Less than ten billion kilometres. Definitely a record. JANEWAY: Who brought the champagne? NEELIX: Champagne? Captain, if I thought you wanted champagne. JANEWAY: Relax, Neelix. It's a figure of speech. KES: Thanks for inviting us to watch with you, Captain. It's really got me interested in

In [289]:
# This script has no "Original Airdate" label, so set the date by hand.
df.at[284, 'airdate'] = '27 November 1996'
df.head(10)

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,
1,598a2572fcd2e313d769f6fd,1994,"STAR TREK: THE NEXT GENERATION ""The Child"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt,9/20/88
2,598a2572fcd2e313d769f6fe,1994,"STAR TREK: THE NEXT GENERATION ""The Neutral Zo...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt,3/18/88
3,598a2572fcd2e313d769f6ff,1994,"STAR TREK: THE NEXT GENERATION ""Conspiracy"" #4...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt,3/8/88
4,598a2572fcd2e313d769f700,1994,"STAR TREK: THE NEXT GENERATION ""We'll Always H...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt,2/22/88
5,598a2572fcd2e313d769f701,1994,"STAR TREK: THE NEXT GENERATION Skin of Evil ""f...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/122.txt,2/1/88
6,598a2572fcd2e313d769f702,1994,"STAR TREK: THE NEXT GENERATION ""Symbiosis"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/123.txt,2/17/88
7,598a2572fcd2e313d769f703,1994,"STAR TREK: THE NEXT GENERATION ""The Arsenal Of...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/121.txt,1/25/88
8,598a2572fcd2e313d769f704,1994,"STAR TREK: THE NEXT GENERATION ""Heart Of Glory...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/120.txt,1/13/88
9,598a2572fcd2e313d769f705,1994,"STAR TREK: THE NEXT GENERATION ""Coming Of Age""...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/119.txt,12/30/87


In [290]:
df[(df.series == "Voyager") & (pd.isnull(df.airdate))]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
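
The five patterns and the manual fix above could probably be folded into one function. The sketch below is an assumption-laden alternative, not the code that produced the results in this notebook: `extract_airdate`, `LABELED` and `UNLABELED` are hypothetical names, the year is assumed to have four digits, and the function returns `None` instead of `np.nan` so that it runs without NumPy.

```python
import re

# Day number with an optional ordinal suffix, e.g. "3" or "3rd".
_DAY = r"\d{1,2}(?:st|nd|rd|th)?"
# "Month Day" or "Day Month", an optional comma, then a four-digit year.
_DATE = r"((?:[A-Za-z]+ " + _DAY + r"|" + _DAY + r" [A-Za-z]+),? \d{4})\b"

# The label may be followed by zero or more spaces ("Airdate:26 February").
LABELED = re.compile(r"Original Airdates?:\s*" + _DATE)
# Assumed fallback for scripts without the label: a date right after the stardate.
UNLABELED = re.compile(r"Stardate: [\d.]+ " + _DATE)

def extract_airdate(text):
    """Return the first airdate string found in `text`, or None."""
    for pattern in (LABELED, UNLABELED):
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None

# Sample strings shortened from the outputs shown in this notebook.
samples = [
    "Stardate: 51001.2 Original Airdate: Sept 3rd, 1997 [Bridge]",
    "Stardate: 50601.9 Original Airdate: 9th April, 1997 EMH [OC]:",
    "Stardate: 50567.4 Original Airdate:26 February, 1997 [Bridge]",
    "Terra Prime Original Airdate: May 13, 2005 T'POL:",
    "Title: The Q and the Grey Stardate: 50384.2 27 November 1996 [Bridge]",
]
for s in samples:
    print(extract_airdate(s))
```

Since `pd.isnull(None)` is true, the missing-value checks used above would keep working if this function were mapped over `raw_text`.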


### Series "Enterprise" airdate extraction:

In [39]:
df[df.series == "Enterprise"].shape

(96, 6)

Let's try all the tricks we tried for "Voyager" first:

In [291]:
df.loc[df.series == "Enterprise", "airdate"] = df[df.series == "Enterprise"].iloc[:,2].map(voyager_airdate5)

In [293]:
df[df.series == "Enterprise"].airdate.isnull().sum()

0

It looks like it worked: no "Enterprise" episode is left without an airdate.

In [295]:
df[df.series == "Enterprise"].head()

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
72,598a2572fcd2e313d769f744,2005,These Are The Voyages... Original Airdate: May...,Enterprise,2001,https://scifi.media/wp-content/uploads/t/ent/9...,"May 13, 2005"
73,598a2572fcd2e313d769f745,2005,"Terra Prime Original Airdate: May 13, 2005 T'P...",Enterprise,2001,https://scifi.media/wp-content/uploads/t/ent/9...,"May 13, 2005"
74,598a2572fcd2e313d769f746,2005,"Demons Mission Date: Jan 19, 2155 Original Air...",Enterprise,2001,https://scifi.media/wp-content/uploads/t/ent/9...,"May 6, 2005"
75,598a2572fcd2e313d769f747,2005,"In A Mirror, Darkly - part 2 Original Airdate:...",Enterprise,2001,https://scifi.media/wp-content/uploads/t/ent/9...,"Apr 29, 2005"
76,598a2572fcd2e313d769f748,2005,"In A Mirror, Darkly - part 1 Original Airdate:...",Enterprise,2001,https://scifi.media/wp-content/uploads/t/ent/9...,"Apr 22, 2005"
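
A zero count of missing values only shows that every script matched one of the patterns, not that the matched text is the broadcast date. Some Enterprise scripts also carry an in-universe "Mission Date" (for example "Jan 19, 2155" in "Demons"), so one possible sanity check is to confirm that the extracted year falls inside the series' run given by the `start` and `end` columns. `year_in_range` is a hypothetical helper, sketched under the assumption that the date string contains a four-digit year.

```python
import re

def year_in_range(airdate, start, end):
    """True if the last four-digit number in `airdate` lies in [start, end].

    Strings without a four-digit year (such as "9/20/88") return False.
    """
    years = re.findall(r"\b(\d{4})\b", airdate)
    if not years:
        return False
    return int(start) <= int(years[-1]) <= int(end)

print(year_in_range("May 6, 2005", 2001, 2005))   # broadcast date -> True
print(year_in_range("Jan 19, 2155", 2001, 2005))  # mission date   -> False
```

The check only makes sense for rows whose `start` and `end` are years; the movie rows hold the string `various` there.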


### Motion Picture files release date extraction:

In [40]:
df[df.series == "Motion Picture"].shape

(10, 6)

In [298]:
df[(df.series == "Motion Picture")]

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,
669,598a2572fcd2e313d769f999,various,Screenplay by: John Logan SHOOTING SCRIPT INT....,Motion Picture,various,https://scifi.media/wp-content/uploads/t/nem.txt,
670,598a2572fcd2e313d769f99a,various,Screenplay by Michael Pillar & Rick Berman REV...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/ins.txt,
671,598a2572fcd2e313d769f99b,various,Story by Rick Berman & Brannon Braga & Ronald ...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/fc.txt,
672,598a2572fcd2e313d769f99c,various,"Screenplay by Rick Berman, Ronald D. Moore, Br...",Motion Picture,various,https://scifi.media/wp-content/uploads/t/gens.txt,
673,598a2572fcd2e313d769f99d,various,Screenplay by Nicholas Meyer & Denny Martin Fl...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tuc.txt,
674,598a2572fcd2e313d769f99e,various,Screenplay by: David Loughery Story by William...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tff.txt,
675,598a2572fcd2e313d769f99f,various,Screenplay by HARVE BENNETT & NICHOLAS MEYER S...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tvh.txt,
676,598a2572fcd2e313d769f9a0,various,Written by: HARVE BENNETT REV. FINAL DRAFT Oct...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/tsfs.txt,
677,598a2572fcd2e313d769f9a1,various,Written By: Harve Bennett Participating Writer...,Motion Picture,various,https://scifi.media/wp-content/uploads/t/twok.txt,


In [299]:
df[(df.series == "Motion Picture")].raw_text[669]



The movie scripts don't seem to contain release dates, so I will set them manually using the release dates listed on Wikipedia, and replace the generic "Motion Picture" label with each film's title:

In [300]:
df[(df.series == "Motion Picture")].index

Int64Index([0, 669, 670, 671, 672, 673, 674, 675, 676, 677], dtype='int64')

In [301]:
# Set each movie's release date and replace the generic series label with its title.
df.at[0, 'airdate'] = 'December 7, 1979'
df.at[0, 'series'] = 'The Motion Picture'

df.at[669, 'airdate'] = 'December 13, 2002'
df.at[669, 'series'] = 'Nemesis'

df.at[670, 'airdate'] = 'December 11, 1998'
df.at[670, 'series'] = 'Insurrection'

df.at[671, 'airdate'] = 'November 22, 1996'
df.at[671, 'series'] = 'First Contact'

df.at[672, 'airdate'] = 'November 18, 1994'
df.at[672, 'series'] = 'Generations'

df.at[673, 'airdate'] = 'December 6, 1991'
df.at[673, 'series'] = 'The Undiscovered Country'

df.at[674, 'airdate'] = 'June 9, 1989'
df.at[674, 'series'] = 'The Final Frontier'

df.at[675, 'airdate'] = 'November 26, 1986'
df.at[675, 'series'] = 'The Voyage Home'

df.at[676, 'airdate'] = 'June 1, 1984'
df.at[676, 'series'] = 'The Search for Spock'

df.at[677, 'airdate'] = 'June 4, 1982'
df.at[677, 'series'] = 'The Wrath of Khan'

df.head(10)

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,The Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,"December 7, 1979"
1,598a2572fcd2e313d769f6fd,1994,"STAR TREK: THE NEXT GENERATION ""The Child"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt,9/20/88
2,598a2572fcd2e313d769f6fe,1994,"STAR TREK: THE NEXT GENERATION ""The Neutral Zo...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt,3/18/88
3,598a2572fcd2e313d769f6ff,1994,"STAR TREK: THE NEXT GENERATION ""Conspiracy"" #4...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt,3/8/88
4,598a2572fcd2e313d769f700,1994,"STAR TREK: THE NEXT GENERATION ""We'll Always H...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt,2/22/88
5,598a2572fcd2e313d769f701,1994,"STAR TREK: THE NEXT GENERATION Skin of Evil ""f...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/122.txt,2/1/88
6,598a2572fcd2e313d769f702,1994,"STAR TREK: THE NEXT GENERATION ""Symbiosis"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/123.txt,2/17/88
7,598a2572fcd2e313d769f703,1994,"STAR TREK: THE NEXT GENERATION ""The Arsenal Of...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/121.txt,1/25/88
8,598a2572fcd2e313d769f704,1994,"STAR TREK: THE NEXT GENERATION ""Heart Of Glory...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/120.txt,1/13/88
9,598a2572fcd2e313d769f705,1994,"STAR TREK: THE NEXT GENERATION ""Coming Of Age""...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/119.txt,12/30/87
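
Hard-coded row numbers such as 669 depend on the order in which the scripts came out of the database. A lookup keyed on the file name in the `url` column would be less fragile. The sketch below is one way to do that: `MOVIES` and `movie_info` are hypothetical names, while the titles, dates and file-name stems are copied from the cells above.

```python
# File-name stem from the `url` column -> (title, release date).
MOVIES = {
    "tmp":  ("The Motion Picture", "December 7, 1979"),
    "twok": ("The Wrath of Khan", "June 4, 1982"),
    "tsfs": ("The Search for Spock", "June 1, 1984"),
    "tvh":  ("The Voyage Home", "November 26, 1986"),
    "tff":  ("The Final Frontier", "June 9, 1989"),
    "tuc":  ("The Undiscovered Country", "December 6, 1991"),
    "gens": ("Generations", "November 18, 1994"),
    "fc":   ("First Contact", "November 22, 1996"),
    "ins":  ("Insurrection", "December 11, 1998"),
    "nem":  ("Nemesis", "December 13, 2002"),
}

def movie_info(url):
    """Return (title, release date) for a movie script URL, or None."""
    # ".../t/nem.txt" -> "nem"
    stem = url.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return MOVIES.get(stem)

print(movie_info("https://scifi.media/wp-content/uploads/t/nem.txt"))
print(movie_info("https://scifi.media/wp-content/uploads/t/127.txt"))
```

Episode URLs return `None`, so the function could be mapped over the movie rows only, or over the whole `url` column with the non-null results written back.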


In [302]:
df.airdate.isnull().sum()

0

Now we need to convert all these strings to dates:

In [303]:
df.airdate = pd.to_datetime(df.airdate)

In [323]:
df.head()

Unnamed: 0,_id,end,raw_text,series,start,url,airdate
0,598a2572fcd2e313d769f6fc,various,Screenplay by: GENE RODDENBERRY & HAROLD LIVIN...,The Motion Picture,various,https://scifi.media/wp-content/uploads/t/tmp.txt,1979-12-07
1,598a2572fcd2e313d769f6fd,1994,"STAR TREK: THE NEXT GENERATION ""The Child"" #40...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/127.txt,1988-09-20
2,598a2572fcd2e313d769f6fe,1994,"STAR TREK: THE NEXT GENERATION ""The Neutral Zo...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/126.txt,1988-03-18
3,598a2572fcd2e313d769f6ff,1994,"STAR TREK: THE NEXT GENERATION ""Conspiracy"" #4...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/125.txt,1988-03-08
4,598a2572fcd2e313d769f700,1994,"STAR TREK: THE NEXT GENERATION ""We'll Always H...",The Next Generation,1987,https://scifi.media/wp-content/uploads/t/124.txt,1988-02-22


In [308]:
df.airdate.dtypes

dtype('<M8[ns]')
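
The `airdate` column mixes several spellings ("9/20/88", "Sept 3rd, 1997", "9th April, 1997", "27 November 1996"), and `pd.to_datetime` has to guess each one; newer pandas releases are stricter about mixed formats and, as far as I know, offer `format="mixed"` for such columns. A more explicit alternative is to normalise the strings and try a fixed list of formats. The sketch below uses only the standard library; `parse_airdate` and `FORMATS` are hypothetical names, and it assumes an English locale for month names.

```python
import re
from datetime import datetime

# Formats tried in order, after commas and ordinal suffixes are removed.
FORMATS = ("%m/%d/%y", "%B %d %Y", "%b %d %Y", "%d %B %Y", "%d %b %Y")

def parse_airdate(raw):
    """Parse one airdate string from this notebook into a datetime.date."""
    cleaned = raw.replace(",", " ")
    cleaned = re.sub(r"(\d)(st|nd|rd|th)\b", r"\1", cleaned)  # "3rd" -> "3"
    cleaned = re.sub(r"\bSept\b", "Sep", cleaned)             # strptime expects "Sep"
    cleaned = " ".join(cleaned.split())                       # collapse whitespace
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")

for s in ["9/20/88", "Sept 3rd, 1997", "9th April, 1997",
          "27 November 1996", "December 7, 1979"]:
    print(s, "->", parse_airdate(s))
```

Raising on an unknown format, instead of guessing, makes a new spelling show up as an error rather than as a silently wrong date. Note that `%y` maps two-digit years 69 to 99 onto 1969 to 1999, which suits the TNG dates here.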

`<M8[ns]` is NumPy's notation for `datetime64[ns]`, so the column now holds proper datetimes and we can finally move on! Let's save the progress first.

In [311]:
import pickle

with open('Data/df.pkl', 'wb') as picklefile:
    pickle.dump(df, picklefile)
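
To pick the work up in a later session, the pickle has to be read back in binary mode. The sketch below shows the round trip on a stand-in dictionary in a temporary directory; `save_pickle` and `load_pickle` are hypothetical helper names, and in the notebook the object would be `df` and the path `'Data/df.pkl'`.

```python
import os
import pickle
import tempfile

def save_pickle(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip on a stand-in object instead of the real dataframe.
record = {"series": "Voyager", "airdate": "1997-09-03"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df.pkl")
    save_pickle(record, path)
    restored = load_pickle(path)

print(restored == record)
```

Pickle files can execute code when loaded, so only files you created yourself should be opened this way.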

## Step 01: Creating Different Dataframes for Different Purposes  
There is a lot of junk in the scripts that should probably be deleted if I want to extract some semantics from them.

In [314]:
df_nonames = df.copy()  # copy, so that edits to df_nonames don't alter df

In [325]:
df_nojunk[df_nojunk.start == 'various'].raw_text.iloc[2]

