# Lab 4.2 - Regular Expression Practice

Now that we have grouped the data blocks, we need to identify and correct problems in the data.  The go-to tool for this task is the regular expression.  In this lab, we will point out a number of problems in the grouped data and create regular expressions to perform various tasks (matching, splitting, and substitution).

## Problem 1 -- Reading in current progress

Recall that we saved the results of grouping the data in a file named `911_Deaths_Grouped.csv`.  Read in the content of this file and split the content into a list of lines.

In [2]:
# Your code here

#### Key

In [1]:
with open('911_Deaths_Grouped.csv') as f:
    content = f.read()
content[:500]

"Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center.\nEdelmiro Abad, 54, Brooklyn, N.Y., Fiduciary Trust Company International, World Trade Center.\nMarie Rose Abad, 49, Keefe, Bruyette&Woods, Inc., World Trade Center.\nAndrew Anthony Abate, 37, Melville, N.Y., Cantor Fitzgerald, World Trade Center.\nVincent Paul Abate, 40, Brooklyn, N.Y., Cantor Fitzgerald, World Trade Center.\nLaurence Christopher Abel, 37, New York City, Cantor Fitzgerald, World Trade Center.\nAlona Abraham, 3"

In [2]:
grouped_lines = content.split('\n')
grouped_lines

antor Fitzgerald, World Trade Center.',
 'Anthony J. Fallone, Jr., 39, New York City, Cantor Fitzgerald, World Trade Center.',
 'Dolores Brigitte Fanelli, 38, Farmingville, N.Y., Marsh&McLennan Companies, Inc., World Trade Center.',
 'Robert John Fangman, 33, Chelsea, Mass., Flight Crew, United 175, World Trade Center.',
 'John Joseph Fanning, 54, West Hempstead, N.Y., New York City Fire Department, World Trade Center.',
 'Kathleen Anne Faragher, 33, Risk Waters Group conference attendee from Janus Capital Group, World Trade Center.',
 'Thomas James Farino, 37, Bohemia, N.Y., New York City Fire Department, World Trade Center.',
 'Nancy C. Doloszycki Farley, 45, Jersey City, N.J., Reinsurance Solutions, World Trade Center.',
 'Paige Marie Farley-Hackel, 46, Newton, Mass., Passenger, United 11, World Trade Center.',
 'Elizabeth Ann Farmer, 62, Cantor Fitzgerald contractor, World Trade Center.',
 'Douglas Jon Farnum, 33, Brooklyn, N.Y., Marsh&McLennan Companies, Inc., World Trade Center.'
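
One detail worth checking at this point: when a file ends with a newline, `split('\n')` leaves an empty string as the last element of the list. The sketch below illustrates this on a made-up two-line string (`sample` is a stand-in, not the real file) and shows `str.splitlines()` as one possible alternative.

```python
# Hypothetical stand-in for the file contents; the real file is not read here.
sample = "Edelmiro Abad, 54, Brooklyn, N.Y.\nMarie Rose Abad, 49, Keefe\n"

with_split = sample.split('\n')        # keeps an empty string after the final newline
with_splitlines = sample.splitlines()  # does not produce that trailing empty string

print(len(with_split), repr(with_split[-1]))  # → 3 ''
print(len(with_splitlines))                   # → 2
```

Either approach works for this lab, as long as you remember that an empty last line will not match most patterns.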

## Problem 2 -- Inspecting problem lines

I have provided some examples of problems that can be found in this data set below.  Inspect the lines and determine one or more things that are problematic for each line.

In [5]:
example_idx = (0, 33, 75, 76, 150, 232, 1304, 1305, 1343)
examples = [l for i, l in enumerate(grouped_lines) if i in example_idx]
examples

["Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center.",
 'Godwin O. Ajala, 33, Summit Security Services, Inc., World Trade Center, died 9/15/01.',
 'Mary Lynn Edwards Angell, 52, Cape Cod, Mass. and Pasadena, Calif., Passenger, United 11, World Trade Center.',
 'Laura Angilletta, 23, Staten Island, N.Y., Cantor Fitzgerald, World Trade Center.',
 'Lorraine G. Bay, 58, East Windsor, N.J., Flight Crew, United 93, Shanksville, Pa.',
 'Canfield D. Boone, ??, United States Army, Pentagon.',
 'Albert Gunnis Joseph, 79, New York City, Morgan Stanley, World Trade Center, died 1/2/02.',
 'Ingeborg Joseph, 53, Marriott guest, World Trade Center, died 10/9/01.',
 'Brenda Kegler, ??, Capitol Heights, Md., United States Army Civilian, Pentagon.']

> Your answers here

#### Key

Some problems include

1. Commas in names, companies, and locations
2. Missing ages (represented as `??`)
3. Rows missing an entry for hometown, passenger type, flight, or date of death.
4. Non-uniform hometown/state entries.
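
To see why these problems rule out a plain comma split, here is a small sketch that splits three of the example lines on `', '` and counts the resulting pieces. The variable names `rows` and `field_counts` are only for illustration.

```python
# Three lines copied from the examples above.
rows = [
    "Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center.",
    "Canfield D. Boone, ??, United States Army, Pentagon.",
    "Lorraine G. Bay, 58, East Windsor, N.J., Flight Crew, United 93, Shanksville, Pa.",
]

# A naive split treats every comma as a field separator.
field_counts = [len(row.split(', ')) for row in rows]
print(field_counts)  # → [5, 4, 8]
```

Every row yields a different number of pieces, and the age lands in a different position depending on whether the name contains a comma.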


## General procedure

1. Create the expression and test it against a positive case.
2. Match/search against all examples.
3. After you know it works, add the `groups` method for all examples.
4. Look for any non-matches in the full data set.
5. Test on the full data set once all rows match.
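
The steps above can be sketched roughly as follows. This is only an outline under assumptions: `sample_lines` is a tiny hypothetical stand-in for the full data set, and the pattern is a deliberately simple two-digit age matcher.

```python
import re

# Hypothetical stand-in for the full data set (two example lines plus a blank line).
sample_lines = [
    "Laura Angilletta, 23, Staten Island, N.Y., Cantor Fitzgerald, World Trade Center.",
    "Canfield D. Boone, ??, United States Army, Pentagon.",
    "",
]
pattern = re.compile(r', (\d\d),')

# Steps 1-2: search every line and inspect what comes back.
matches = [pattern.search(line) for line in sample_lines]

# Step 4: list the lines that do not match before relying on .groups().
non_matches = [(i, line) for i, line in enumerate(sample_lines) if not pattern.search(line)]

# Steps 3 and 5: only extract groups once there are no non-matches left.
captured = [m.groups() for m in matches] if not non_matches else None

print(non_matches)
print(captured)
```

Here two lines fail to match, so `captured` stays `None` and the pattern needs more work before moving on.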

## What's the big deal?

So why are we being so careful to make sure everything matches? It turns out that `search` returns `None` for any row that fails to match, so adding `groups` will crash the code :/

In [6]:
import re
test = re.compile(r', \d\d,')  # raw string keeps the backslashes intact
[test.search(l) for l in examples]

[<re.Match object; span=(21, 26), match=', 32,'>,
 <re.Match object; span=(15, 20), match=', 33,'>,
 <re.Match object; span=(24, 29), match=', 52,'>,
 <re.Match object; span=(16, 21), match=', 23,'>,
 <re.Match object; span=(15, 20), match=', 58,'>,
 None,
 <re.Match object; span=(20, 25), match=', 79,'>,
 <re.Match object; span=(15, 20), match=', 53,'>,
 None]

The pattern `, \d\d,` looks for a comma, then a space, then exactly two digits, followed by a comma. The two rows whose age is `??` do not match, so `search` returns `None` for them.

In [7]:
[test.search(l).groups() for l in examples]

AttributeError: 'NoneType' object has no attribute 'groups'
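
While a pattern is still under development, one way to avoid this crash is to check for `None` before calling `groups`. The sketch below is just one option; it adds a capture group to the two-digit pattern, and the names `lines` and `results` are illustrative.

```python
import re

test = re.compile(r', (\d\d),')
lines = [
    "Laura Angilletta, 23, Staten Island, N.Y., Cantor Fitzgerald, World Trade Center.",
    "Canfield D. Boone, ??, United States Army, Pentagon.",
]

# search() returns None on a miss, so check before calling .groups().
results = []
for line in lines:
    m = test.search(line)
    results.append(m.groups() if m is not None else None)

print(results)  # → [('23',), None]
```

The guard hides the failure rather than fixing it, so the `None` entries still point to rows the pattern needs to handle.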

## Problem 3 -- Capturing the age field

Notice that all victims have an age field that contains either their age or `??` if the age is unknown.

In this problem, we will build a regular expression to match this field, which will ALSO allow us to capture the name field (even when there are problems with extra commas).

#### Task 1 - Capture the age field.

Write a regular expression that matches and captures the age field.  

**Hints:** Remember that 

* Use `(pat)` to capture a pattern.
* Use `\d` to match digits
* `(p1|p2)` allows you to match `p1` or `p2`.   

In [6]:
# Your code here

#### Key

In [8]:
import re
age = re.compile(r', (\?\?|\d{1,3}),')
type(age.search(examples[0]))

re.Match

The pattern `, (\?\?|\d{1,3}),` looks for a comma, then a space, then either two literal question marks or (`|`) one to three digits, followed by a comma. Because the question marks or digits are inside parentheses, they are captured and can be retrieved with the `.group()` or `.groups()` methods.
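
As a small illustration of the difference between the whole match and the captured part, the sketch below runs the same pattern on one known and one unknown age. The variable names `known` and `unknown` are only for this example.

```python
import re

age = re.compile(r', (\?\?|\d{1,3}),')

known = age.search("Laura Angilletta, 23, Staten Island, N.Y.")
unknown = age.search("Canfield D. Boone, ??, United States Army, Pentagon.")

print(repr(known.group(0)))    # whole match, including the commas → ', 23,'
print(repr(known.group(1)))    # only the captured part → '23'
print(repr(unknown.group(1)))  # → '??'
```

Writing the pattern as a raw string (`r'...'`) keeps Python from interpreting `\?` and `\d` as string escapes before `re` sees them.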

In [9]:
examples[0], age.search(examples[0]).groups()

("Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center.",
 ('32',))

In [10]:
[age.search(l) for l in examples]

[<re.Match object; span=(21, 26), match=', 32,'>,
 <re.Match object; span=(15, 20), match=', 33,'>,
 <re.Match object; span=(24, 29), match=', 52,'>,
 <re.Match object; span=(16, 21), match=', 23,'>,
 <re.Match object; span=(15, 20), match=', 58,'>,
 <re.Match object; span=(17, 22), match=', ??,'>,
 <re.Match object; span=(20, 25), match=', 79,'>,
 <re.Match object; span=(15, 20), match=', 53,'>,
 <re.Match object; span=(13, 18), match=', ??,'>]

In [11]:
[age.search(l).groups() for l in examples]

[('32',),
 ('33',),
 ('52',),
 ('23',),
 ('58',),
 ('??',),
 ('79',),
 ('53',),
 ('??',)]

In [12]:
[(i, l) for i, l in enumerate(grouped_lines) if not age.search(l)]

[(2977, '')]

In [13]:
[age.search(l).groups() for l in grouped_lines]

AttributeError: 'NoneType' object has no attribute 'groups'

#### Task 2 - Capture the age field, as well as everything before and after.

Adapt your work from the last task to capture not only the age field, but also everything before and after it.

**Hint:** Remember that 

* Use greedy wild-cards `.*` and/or `.+` to grab as much as possible.
* Use commas to anchor the three parts, e.g. `(pat1), (pat2), (pat3)`

In [14]:
# Your code here

#### Key

In [15]:
import re
age_plus = re.compile(r'(.+), (\?\?|\d{1,3}), (.+)')

The pattern `(.+), (\?\?|\d{1,3}), (.+)` matches one or more of any character, then a comma and a space, then either two question marks or one to three digits, then a comma, a space, and one or more of any character. Both `.+` runs are captured, along with the question marks or digits between them.
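
Because the age field anchors the split, a comma inside the name (as in "Aamoth, Jr.") stays in the first group. The sketch below unpacks the three groups and, as an optional extra step, converts the age text to a number. The helper `parse_age` is hypothetical and not part of the lab.

```python
import re

age_plus = re.compile(r'(.+), (\?\?|\d{1,3}), (.+)')

def parse_age(text):
    """Hypothetical helper: return the age as an int, or None for '??'."""
    return None if text == '??' else int(text)

line = "Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center."
name, age_text, rest = age_plus.search(line).groups()
record = (name, parse_age(age_text), rest)

print(record)
```

Whether to store unknown ages as `None` or keep the `??` marker is a design choice left open here.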

In [16]:
[age_plus.search(l) for l in examples]

[<re.Match object; span=(0, 74), match="Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Part>,
 <re.Match object; span=(0, 86), match='Godwin O. Ajala, 33, Summit Security Services, In>,
 <re.Match object; span=(0, 109), match='Mary Lynn Edwards Angell, 52, Cape Cod, Mass. and>,
 <re.Match object; span=(0, 81), match='Laura Angilletta, 23, Staten Island, N.Y., Cantor>,
 <re.Match object; span=(0, 81), match='Lorraine G. Bay, 58, East Windsor, N.J., Flight C>,
 <re.Match object; span=(0, 52), match='Canfield D. Boone, ??, United States Army, Pentag>,
 <re.Match object; span=(0, 89), match='Albert Gunnis Joseph, 79, New York City, Morgan S>,
 <re.Match object; span=(0, 70), match='Ingeborg Joseph, 53, Marriott guest, World Trade >,
 <re.Match object; span=(0, 79), match='Brenda Kegler, ??, Capitol Heights, Md., United S>]

In [17]:
[age_plus.search(l).groups() for l in examples]

[('Gordon M. Aamoth, Jr.',
  '32',
  "Sandler O'Neill + Partners, World Trade Center."),
 ('Godwin O. Ajala',
  '33',
  'Summit Security Services, Inc., World Trade Center, died 9/15/01.'),
 ('Mary Lynn Edwards Angell',
  '52',
  'Cape Cod, Mass. and Pasadena, Calif., Passenger, United 11, World Trade Center.'),
 ('Laura Angilletta',
  '23',
  'Staten Island, N.Y., Cantor Fitzgerald, World Trade Center.'),
 ('Lorraine G. Bay',
  '58',
  'East Windsor, N.J., Flight Crew, United 93, Shanksville, Pa.'),
 ('Canfield D. Boone', '??', 'United States Army, Pentagon.'),
 ('Albert Gunnis Joseph',
  '79',
  'New York City, Morgan Stanley, World Trade Center, died 1/2/02.'),
 ('Ingeborg Joseph',
  '53',
  'Marriott guest, World Trade Center, died 10/9/01.'),
 ('Brenda Kegler',
  '??',
 

In [18]:
[(i, l) for i, l in enumerate(grouped_lines) if not age_plus.search(l)]

[(2977, '')]

In [19]:
[age_plus.search(l).groups() for l in grouped_lines]

AttributeError: 'NoneType' object has no attribute 'groups'

## Problem 4 -- Capturing the date of death

While most victims of the attack died on 9/11, a few died at a later date.  Notice that those who died later have an additional field at the end of the line.

In this problem, we will build a regular expression to match this field.

In [20]:
examples[-2]

'Ingeborg Joseph, 53, Marriott guest, World Trade Center, died 10/9/01.'

#### Task 1 - Capture the date of death field.
  
Write a regular expression that matches and captures the date of death (e.g. `10/9/01`).  The captured group should be `None` when this field is missing.

**Hints:** Remember that 

* Use `$` to match the end of the line.
* Escape to match periods exactly, i.e. `\.`
* Use `\d{n,m}` to match between `n` and `m` digits
* `?` allows you to match optional patterns

In [21]:
# Your code here
import re
dod = re.compile(r'(, died \d{1,2}/\d{1,2}/\d{1,2})?(\.)$')

#### Key

In [22]:
dod = re.compile(r'(, died \d{1,2}/\d{1,2}/\d{1,2})?(\.)?$')
dod.search(examples[-2]).groups()

(', died 10/9/01', '.')
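
Two properties of this pattern are worth seeing in isolation: an optional group that does not take part in the match comes back as `None`, and because both groups are optional the pattern can also match an empty line. The following sketch illustrates both; the names `no_date` and `blank` are only for this example.

```python
import re

dod = re.compile(r'(, died \d{1,2}/\d{1,2}/\d{1,2})?(\.)?$')

no_date = dod.search("Canfield D. Boone, ??, United States Army, Pentagon.")
print(no_date.groups())            # → (None, '.')
print(no_date.groups(default=''))  # groups() accepts a default → ('', '.')

# With both groups optional, the end-of-line anchor alone is enough to match.
blank = dod.search('')
print(blank.groups())              # → (None, None)
```

That second property means a "no non-matches" result does not by itself show that every row ends with a period.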

In [23]:
[dod.search(l) for l in examples]

[<re.Match object; span=(73, 74), match='.'>,
 <re.Match object; span=(71, 86), match=', died 9/15/01.'>,
 <re.Match object; span=(108, 109), match='.'>,
 <re.Match object; span=(80, 81), match='.'>,
 <re.Match object; span=(80, 81), match='.'>,
 <re.Match object; span=(51, 52), match='.'>,
 <re.Match object; span=(75, 89), match=', died 1/2/02.'>,
 <re.Match object; span=(55, 70), match=', died 10/9/01.'>,
 <re.Match object; span=(78, 79), match='.'>]

In [24]:
[dod.search(l).groups() for l in examples]

[(None, '.'),
 (', died 9/15/01', '.'),
 (None, '.'),
 (None, '.'),
 (None, '.'),
 (None, '.'),
 (', died 1/2/02', '.'),
 (', died 10/9/01', '.'),
 (None, '.')]

In [25]:
[(i, l) for i, l in enumerate(grouped_lines) if not dod.search(l)]

[]

In [26]:
[dod.search(l).groups() for l in grouped_lines]

[(None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (&#39;, died 9/15/01&#39;, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39;.&#39;),
 (None, &#39

#### Task 2 - Capture the date of death field, as well as everything before.

Adapt your work from the last task to capture not only the date of death field, but also everything before it.

**Hint:** Remember that 

* Use greedy wild-cards `.*` and/or `.+` to grab as much as possible.
* Use commas to anchor the three parts, e.g. `(pat1), (pat2), (pat3)`

In [27]:
examples

["Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center.",
 'Godwin O. Ajala, 33, Summit Security Services, Inc., World Trade Center, died 9/15/01.',
 'Mary Lynn Edwards Angell, 52, Cape Cod, Mass. and Pasadena, Calif., Passenger, United 11, World Trade Center.',
 'Laura Angilletta, 23, Staten Island, N.Y., Cantor Fitzgerald, World Trade Center.',
 'Lorraine G. Bay, 58, East Windsor, N.J., Flight Crew, United 93, Shanksville, Pa.',
 'Canfield D. Boone, ??, United States Army, Pentagon.',
 'Albert Gunnis Joseph, 79, New York City, Morgan Stanley, World Trade Center, died 1/2/02.',
 'Ingeborg Joseph, 53, Marriott guest, World Trade Center, died 10/9/01.',
 'Brenda Kegler, ??, Capitol Heights, Md., United States Army Civilian, Pentagon.']

In [28]:
# Your code here
everything = re.compile(r'^(.*?)(, died \d{1,2}/\d{1,2}/\d{1,2})?(\.)?$')
#[everything.search(l).groups() for l in examples]
everything.search(examples[0]).groups(), examples[1]

(("Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center",
  None,
  '.'),
 'Godwin O. Ajala, 33, Summit Security Services, Inc., World Trade Center, died 9/15/01.')
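
The attempt above uses the lazy wildcard `.*?` for the first group. The sketch below compares it with a greedy `.*` on a line that has a date of death, to show why the choice matters once the following groups are optional. The names `lazy` and `greedy` are only for this comparison.

```python
import re

date_tail = r'(, died \d{1,2}/\d{1,2}/\d{1,2})?(\.)?$'
lazy = re.compile(r'^(.*?)' + date_tail)
greedy = re.compile(r'^(.*)' + date_tail)

line = 'Ingeborg Joseph, 53, Marriott guest, World Trade Center, died 10/9/01.'

lazy_groups = lazy.search(line).groups()
greedy_groups = greedy.search(line).groups()

print(lazy_groups)    # first group stops before ', died 10/9/01'
print(greedy_groups)  # first group takes the whole line; the optional groups are None
```

The greedy version still matches, which is why it can go unnoticed: it simply never fills the optional groups.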

In [29]:
[everything.search(l).groups() for l in examples]

[("Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center",
  None,
  '.'),
 ('Godwin O. Ajala, 33, Summit Security Services, Inc., World Trade Center',
  ', died 9/15/01',
  '.'),
 ('Mary Lynn Edwards Angell, 52, Cape Cod, Mass. and Pasadena, Calif., Passenger, United 11, World Trade Center',
  None,
  '.'),
 ('Laura Angilletta, 23, Staten Island, N.Y., Cantor Fitzgerald, World Trade Center',
  None,
  '.'),
 ('Lorraine G. Bay, 58, East Windsor, N.J., Flight Crew, United 93, Shanksville, Pa',
  None,
  '.'),
 ('Canfield D. Boone, ??, United States Army, Pentagon', None, '.'),
 ('Albert Gunnis Joseph, 79, New York City, Morgan Stanley, World Trade Center',
  ', died 1/2/02',
  '.'),
 ('Ingeborg Joseph, 53, Marriott guest, World Trade Center',
  ', died 10/9/01',
  '.'),
 ('Brenda Kegler, ??, Capitol Heights,

In [30]:
[l for l in grouped_lines if "United" in l or "American" in l]

ssenger, United 175, World Trade Center.',
 'Gerald Francis Hardacre, 61, Carlsbad, Calif., Passenger, United 175, World Trade Center.',
 'Eric Hartono, 19, Passenger, United 175, World Trade Center.',
 'Peter Paul Hashem, 40, Passenger, United 11, World Trade Center.',
 'James Edward Hayden, 47, Passenger, United 175, World Trade Center.',
 'Robert Jay Hayes, 37, Amerbury, Mass., Passenger, United 11, World Trade Center.',
 'Michele M. Heidenberger, 57, Chevy Chase, Md., Flight Crew, American 77, Pentagon.',
 'Sheila M.S. Hein, 51, University Park, Md., United States Army Civilian, Pentagon.',
 'Ronald John Hemenway, 37, Washington, D.C., United States Navy, Pentagon.',
 'Edward R. Hennessy, Jr., 35, Passenger, United 11, World Trade Center.',
 'John A. Hofer, 45, Passenger, United 11, World Trade Center.',
 'Wallace Cole Hogan, Jr., ??, United States Army, Pentagon.',
 'Cora Hidalgo Hollan

## Problem 5 -- Working with passenger data

Notice that 

1. Passengers on the flights have two extra fields: passenger status and flight
2. Other victims are missing these fields.

In this problem, we will build a regular expression to match these fields and use this expression to split the data.  In the process, we will be able to add the missing fields to the other rows.

#### Task 1 - Make an expression that matches the passenger status.

Make a regular expression that matches and extracts the passenger status field.  This expression should match all lines, returning `None` for the other rows.

**Hint:** Remember that 

* `(p1|p2)` allows you to match `p1` or `p2`.   
* `?` allows you to match optional patterns

In [31]:
# Your code here.
passengerstatus = re.compile(r'^(.*?)(, Passenger,|, Flight Crew,)?( United \d{2,3},| American \d{2,3},)? (World Trade Center|Shanksville, Pa|Pentagon)(.*?)$')
[passengerstatus.search(l).groups() for l in examples]

[("Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners,",
  None,
  None,
  'World Trade Center',
  '.'),
 ('Godwin O. Ajala, 33, Summit Security Services, Inc.,',
  None,
  None,
  'World Trade Center',
  ', died 9/15/01.'),
 ('Mary Lynn Edwards Angell, 52, Cape Cod, Mass. and Pasadena, Calif.',
  ', Passenger,',
  ' United 11,',
  'World Trade Center',
  '.'),
 ('Laura Angilletta, 23, Staten Island, N.Y., Cantor Fitzgerald,',
  None,
  None,
  'World Trade Center',
  '.'),
 ('Lorraine G. Bay, 58, East Windsor, N.J.',
  ', Flight Crew,',
  ' United 93,',
  'Shanksville, Pa',
  '.'),
 ('Canfield D. Boone, ??, United States Army,', None, None, 'Pentagon', '.'),
 ('Albert Gunnis Joseph, 79, New York City, Morgan Stanley,',
  None,
  None,
  'World Trade Center',
  ', died 1/2/02.'),


In [32]:
examples

["Gordon M. Aamoth, Jr., 32, Sandler O'Neill + Partners, World Trade Center.",
 'Godwin O. Ajala, 33, Summit Security Services, Inc., World Trade Center, died 9/15/01.',
 'Mary Lynn Edwards Angell, 52, Cape Cod, Mass. and Pasadena, Calif., Passenger, United 11, World Trade Center.',
 'Laura Angilletta, 23, Staten Island, N.Y., Cantor Fitzgerald, World Trade Center.',
 'Lorraine G. Bay, 58, East Windsor, N.J., Flight Crew, United 93, Shanksville, Pa.',
 'Canfield D. Boone, ??, United States Army, Pentagon.',
 'Albert Gunnis Joseph, 79, New York City, Morgan Stanley, World Trade Center, died 1/2/02.',
 'Ingeborg Joseph, 53, Marriott guest, World Trade Center, died 10/9/01.',
 'Brenda Kegler, ??, Capitol Heights, Md., United States Army Civilian, Pentagon.']

#### Task 2 - Capture the flight field

Make a regular expression that matches and extracts the flight field.  This expression should match all lines, returning `None` for the other rows.

**Hint:** Remember that 

* Look through the data file to identify the airlines.
* Use `\d` to match digits
* `(p1|p2)` allows you to match `p1` or `p2`.   
* `?` allows you to match optional patterns

In [33]:
# Your code here.
# see Task 1


#### Task 3 - Combine the last two expressions

Now combine the last two expressions to capture the two flight fields, but also all content before and after these fields.

In [34]:
# Your code here.
# see Task 1
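
For reference, here is one possible way to combine the two flight fields so that the commas and spaces stay outside the captured text. It is a sketch rather than an official key: the group names (`before`, `status`, `flight`, `site`, `after`) are made up for this example, and the airline and site alternatives are taken only from the lines shown above.

```python
import re

# Hypothetical combined pattern using named groups.
flight_fields = re.compile(
    r'^(?P<before>.*?)'
    r'(?:, (?P<status>Passenger|Flight Crew), (?P<flight>(?:United|American) \d{2,3}))?'
    r', (?P<site>World Trade Center|Shanksville, Pa|Pentagon)'
    r'(?P<after>.*)$'
)

rows = [
    'Lorraine G. Bay, 58, East Windsor, N.J., Flight Crew, United 93, Shanksville, Pa.',
    'Canfield D. Boone, ??, United States Army, Pentagon.',
]

# groupdict(default=...) fills in the groups that did not take part in the match,
# which gives every row the same set of fields.
records = [flight_fields.search(row).groupdict(default='') for row in rows]
for record in records:
    print(record)
```

Rows without flight information get empty strings for `status` and `flight`, so every record ends up with the same five keys. As with the earlier patterns, it should be checked for non-matches on the full data set before calling `groupdict` on every row.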