**STEP 1**

The 12-step program to nirvana. The goal of this 12-step program is to introduce an enhanced string pattern-matching library for the Python programming language.

First, let's review regular expressions, which are already built into the Python programming language. The module is named "re", also referred to as RE.

Say we are using regular expressions to analyze a text that contains simple sentences of the form:

* WHO went to WHERE looking for WHAT.

For example:

* I went to Louisiana, Texas and back looking for people and places.

Let's write a Python program using regular expressions that extracts all of these items: WHO, WHERE, and WHAT. The program will store this information in three variables; if we are successful, the result will be the following:

* who = 'I'
* where_to = ['Louisiana', 'Texas', 'back']
* what_for = ['people', 'places']

In [None]:
import re # import the already built-in regular expression module in Python
from pprint import PrettyPrinter # import the pretty-print object
pp = PrettyPrinter(indent=2, width=120)
# Here are samples of text similar to those we want to analyze:
sentences = [
  "I went to Louisiana, Texas and back looking for people and places."
, "He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas."
, "You went to school and work looking for fun, frolic and fantasy."
]
for sentence in sentences:
    print(sentence)

I went to Louisiana, Texas and back looking for people and places.
He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.
You went to school and work looking for fun, frolic and fantasy.


**STEP 2**

Let's try using the *re.search* function.

After a successful match, *re.search* returns an *re.Match* object; if the pattern does not match, it returns None. To display any Python object nicely, the pretty-printer's *pprint* method is used. To access the captured groups, use *re.Match.groups* (or *re.Match.groupdict*): *groups* returns a tuple of the positionally captured groups, and *groupdict* returns a dictionary of the named captured groups.
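A small standalone check of these return values, using a shortened version of the pattern below (the subject is one of the sample sentences):

```python
import re

subject = "You went to school and work looking for fun, frolic and fantasy."
m = re.search(r"^(?P<who>I|He|You) went to (?P<where>[A-Za-z][a-z]*)", subject)
print(isinstance(m, re.Match))  # True: a successful search returns a Match object
print(m.groups())               # ('You', 'school'): a tuple, in group order
print(m.groupdict())            # {'who': 'You', 'where': 'school'}: keyed by group name

# When nothing matches, re.search returns None instead of a Match object.
miss = re.search(r"^She went to", subject)
print(miss)                     # None
```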

This is going to be amazing! You mean all I have to do is create just one regular expression pattern to verify the validity of the text and simultaneously extract all the separate elements into structured data in *one fell swoop*?

In [None]:
for sentence in sentences:
    if results := re.search(
          r"^(I|He|You)"                          # the caret denotes to match beginning of string
                                                  # (___), parentheses, denote to group and to capture
                                                  # _|_|_, the vertical-bar, denotes to match alternatives
          r" went to "                            # nothing special denotes to match this text literally
          r"([A-Za-z][a-z]*)"                     # [A-Za-z][a-z]* denotes to match a word, possibly capitalized
          r"(?:(?:,|,? and) ([A-Za-z][a-z]*))*"   # ___? denotes to match optionally, zero or one instances
                                                  # (___)* denotes matching a repetition of zero or more patterns
                                                  # (?:___) denotes to group but not to capture
                                                  # (?:(?:___) (___))*
                                                  #   comma or conjunction, or both
                                                  #   followed by a space
                                                  #   followed by a word possibly capitalized
          r" looking for "                        # match literal text
          r"([A-Za-z][a-z]*)"                     # group and capture the 1st thing
          r"(?:(?:,|,? and) ([A-Za-z][a-z]*))*"   # also capture all remaining things
                                                  # but ignore commas, spaces, and the word "and"
          r"\.$"                                  # the period character must be escaped with a backslash
                                                  # the dollar-sign denotes to match end of the string
        , sentence):
          pp.pprint(["Matched:", results.groups(), sentence])
    else: pp.pprint(["Unmatched!", sentence])

[ 'Matched:',
  ('I', 'Louisiana', 'back', 'people', 'places'),
  'I went to Louisiana, Texas and back looking for people and places.']
[ 'Matched:',
  ('He', 'Albertsons', 'Kroger', 'pineapples', 'bananas'),
  'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
[ 'Matched:',
  ('You', 'school', 'work', 'fun', 'fantasy'),
  'You went to school and work looking for fun, frolic and fantasy.']


**STEP 3**

What happened!? I can't do my work.\
Where are Texas, Aldi, and frolic?\
Where are the two lists for the places and things?\
I think I might have chosen unwisely.\
\
Maybe there is a bug? Could the non-capturing group construct (?:___) somehow be interfering with capturing the pattern nested inside it? Let's change these to ordinary capturing groups instead.

In [None]:
for sentence in sentences:
    if results := re.search(
          r"^(I|He|You)"
          r" went to"
          r" ([A-Za-z][a-z]*)"
          r"((?:,|,? and) ([A-Za-z][a-z]*))*" # (?:(?:___) (___))* becomes ((?:___) (___))*
          r" looking for"
          r" ([A-Za-z][a-z]*)"
          r"((?:,|,? and) ([A-Za-z][a-z]*))*" # (?:(?:___) (___))* becomes ((?:___) (___))*
          r"\.$"
        , sentence):
          pp.pprint(["Matched:", results.groups(), sentence])
    else: pp.pprint(["Unmatched!", sentence])

[ 'Matched:',
  ('I', 'Louisiana', ' and back', 'back', 'people', ' and places', 'places'),
  'I went to Louisiana, Texas and back looking for people and places.']
[ 'Matched:',
  ('He', 'Albertsons', ', and Kroger', 'Kroger', 'pineapples', ', and bananas', 'bananas'),
  'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
[ 'Matched:',
  ('You', 'school', ' and work', 'work', 'fun', ' and fantasy', 'fantasy'),
  'You went to school and work looking for fun, frolic and fantasy.']


**STEP 4**

What happened!? I still can't do my work.\
If anything, things seem worse than before.\
I think I might have chosen unwisely.\
\
Can it be that capturing the elements of a repetition just isn't possible?\
Can it be that the *re.search* function will simply not return a list of elements?\
Unfortunately, the RE module is working as designed, so this limitation will likely never be lifted. A capturing group inside a repetition keeps only the text of the last repetition; the earlier repetitions are always discarded.\
\
Perhaps the developers of the RE module built this bug into the product as a feature. So, let's try one more time, and at least attempt to capture the entire text of each list rather than its nested elements.
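The last-repetition behavior described above is easy to reproduce in isolation. This minimal sketch uses *re.fullmatch* on a made-up list string; the second call shows that a separate pass, here *re.findall*, is needed to get every element:

```python
import re

items = "fun, frolic, fantasy"
# A capturing group inside a repetition is overwritten on each pass,
# so only the text from the final repetition survives.
m = re.fullmatch(r"([A-Za-z]+)(?:, ([A-Za-z]+))*", items)
print(m.groups())                       # ('fun', 'fantasy'): 'frolic' is lost
# A second pass over the text, e.g. re.findall, returns every match as a list.
print(re.findall(r"[A-Za-z]+", items))  # ['fun', 'frolic', 'fantasy']
```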

In [None]:
for sentence in sentences:
    if results := re.search(
          r"^(I|He|You)"
          r" went to"
          r" ([A-Za-z][a-z]*)"
          r"((?:(?:,|,? and) (?:[A-Za-z][a-z]*))*)" # ((?:___) (___))* becomes ((?:(?:___) (?:___))*)
          r" looking for"
          r" ([A-Za-z][a-z]*)"
          r"((?:(?:,|,? and) (?:[A-Za-z][a-z]*))*)" # ((?:___) (___))* becomes ((?:(?:___) (?:___))*)
          r"\.$"
        , sentence):
          pp.pprint(["Matched:", results.groups(), sentence])
    else: pp.pprint(["Unmatched!", sentence])

[ 'Matched:',
  ('I', 'Louisiana', ', Texas and back', 'people', ' and places'),
  'I went to Louisiana, Texas and back looking for people and places.']
[ 'Matched:',
  ('He', 'Albertsons', ', Aldi, and Kroger', 'pineapples', ', and bananas'),
  'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
[ 'Matched:',
  ('You', 'school', ' and work', 'fun', ', frolic and fantasy'),
  'You went to school and work looking for fun, frolic and fantasy.']


**STEP 5**

What happened? It performed as expected, but it seems I'll never get my work done.

So, it appears this caliber of results is the best we can accomplish using the RE module, and that our initial desire to develop one regular expression pattern to extract structured data from text will not be fulfilled. For this task, it appears more coding will be necessary. Oh, but I desperately wanted to avoid procedural coding altogether. I wanted the pattern to look similar to the subject. I want the solution to resemble the problem.

Since a single capturing group can only return a repetition in its entirety, and the individual elements cannot be captured, let's at least merge the patterns for the first and remaining elements into one capturing group that returns each list as a single comma-separated string for later processing.

In [None]:
for sentence in sentences:
    if results := re.search(
          r"^(I|He|You)"
          r" went to"
          r" ("
              r"(?:[A-Za-z][a-z]*)"
              r"(?:(?:,|,? and) (?:[A-Za-z][a-z]*))*"
          r")" # (___)((?:(?:___) (?:___))*) becomes ((?:___)(?:(?:___) (?:___))*)
          r" looking for"
          r" ("
              r"(?:[A-Za-z][a-z]*)"
              r"(?:(?:,|,? and) (?:[A-Za-z][a-z]*))*"
          r")" # (___)((?:(?:___) (?:___))*) becomes ((?:___)(?:(?:___) (?:___))*)
          r"\.$"
        , sentence):
          pp.pprint(["Matched:", results.groups(), sentence])
    else: pp.pprint(["Unmatched!", sentence])

[ 'Matched:',
  ('I', 'Louisiana, Texas and back', 'people and places'),
  'I went to Louisiana, Texas and back looking for people and places.']
[ 'Matched:',
  ('He', 'Albertsons, Aldi, and Kroger', 'pineapples, and bananas'),
  'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
[ 'Matched:',
  ('You', 'school and work', 'fun, frolic and fantasy'),
  'You went to school and work looking for fun, frolic and fantasy.']


**STEP 6**

So what happens now? I've got so much work and not enough time.\
I must write more Python code! Ooh, somebody stop me.

In [None]:
for sentence in sentences:
    if results := re.search(
          r"^(?P<who>I|He|You)" # (?P<name>___) denotes to match, group and capture by name
          r" went to"
          r" (?P<where>"
            r"(?:[A-Za-z][a-z]*)"
            r"(?:(?:,|,? and)? (?:[A-Za-z][a-z]*))*"
          r")" # (?P<where>___) denotes to capture group named where
          r" looking for"
          r" (?P<what>"
            r"(?:[A-Za-z][a-z]*)"
            r"(?:(?:,|,? and)? (?:[A-Za-z][a-z]*))*"
          r")" # 'who', 'where', and 'what' are keys of groupdict()
          r"\.$"
        , sentence):
          who = results.groupdict()['who']
          where_to = []
          for word_results in re.finditer(
              r"(?:(?:^|,? |,? and )(?!and)(?P<word>[A-Za-z][a-z]*))",
              results.groupdict()['where']
          ):  where_to.append(word_results.groupdict()['word'])
          what_for = []
          for word_results in re.finditer(
              r"(?:(?:^|,? |,? and )(?!and)(?P<word>[A-Za-z][a-z]*))",
              results.groupdict()['what']
          ):  what_for.append(word_results.groupdict()['word'])
          pp.pprint(["Matched:", who, where_to, what_for, sentence])
    else: pp.pprint(["Unmatched!", sentence])

[ 'Matched:',
  'I',
  ['Louisiana', 'Texas', 'back'],
  ['people', 'places'],
  'I went to Louisiana, Texas and back looking for people and places.']
[ 'Matched:',
  'He',
  ['Albertsons', 'Aldi', 'Kroger'],
  ['pineapples', 'bananas'],
  'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
[ 'Matched:',
  'You',
  ['school', 'work'],
  ['fun', 'frolic', 'fantasy'],
  'You went to school and work looking for fun, frolic and fantasy.']


**STEP 7**

So what happened there? I created a beautiful mess.

But it works. It validates the input text and produces three variables: one containing WHO, and two containing the lists for WHERE and WHAT.
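For comparison, here is a more compact sketch of the same two-pass idea, assuming the lists only ever use ', ', ' and ', or ', and ' as separators. The names SENTENCE, SEPARATOR, and analyze are illustrative. One regular expression validates the sentence and captures each whole list; *re.split* then breaks each list apart:

```python
import re

# Pass 1: validate the sentence and capture each list as one string.
SENTENCE = re.compile(
    r"^(?P<who>I|He|You)"
    r" went to (?P<where>[A-Za-z][a-z]*(?:(?:,|,? and) [A-Za-z][a-z]*)*)"
    r" looking for (?P<what>[A-Za-z][a-z]*(?:(?:,|,? and) [A-Za-z][a-z]*)*)"
    r"\.$"
)
# Pass 2: split each captured list on its separators.
# Order matters: ", and " must be tried before ", ".
SEPARATOR = re.compile(r", and |, | and ")

def analyze(sentence):
    """Return (who, where_to, what_for), or None if the sentence does not match."""
    m = SENTENCE.search(sentence)
    if m is None:
        return None
    return m["who"], SEPARATOR.split(m["where"]), SEPARATOR.split(m["what"])
```

Because *re.split* discards the separators, no lookahead such as (?!and) is needed; the trade-off is that the separator list now lives in two places.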

But is there another way?\
Let's try using the SNOBOL4python library instead.\
The following code will install and import the SNOBOL4python package.

In [None]:
!pip install SNOBOL4python==0.4.5
import sys
from pprint import pprint, pformat
## Thirty one (31) flavors of patterns to choose from ...
from SNOBOL4python import ε, σ, π, λ, Λ, ζ, θ, Θ, φ, Φ, α, ω
from SNOBOL4python import ABORT, ANY, ARB, ARBNO, BAL, BREAK, BREAKX, FAIL
from SNOBOL4python import FENCE, LEN, MARB, MARBNO, NOTANY, POS, REM, RPOS
from SNOBOL4python import RTAB, SPAN, SUCCESS, TAB
# Miscellaneous
from SNOBOL4python import GLOBALS, TRACE, PATTERN, Ϩ, STRING
from SNOBOL4python import ALPHABET, DIGITS, UCASE, LCASE, NULL
from SNOBOL4python import nPush, nInc, nPop, Shift, Reduce, Pop
GLOBALS(globals()) # Give SNOBOL4python this namespace, where % captures and λ code store their variables

Collecting SNOBOL4python==0.4.4
  Downloading snobol4python-0.4.4-py3-none-any.whl.metadata (823 bytes)
Downloading snobol4python-0.4.4-py3-none-any.whl (25 kB)
Installing collected packages: SNOBOL4python
Successfully installed SNOBOL4python-0.4.4


**STEP 8**

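To translate a regular expression into the new PATTERN datatype provided by the SNOBOL4python module:

*   r"^" becomes *POS*(0)
*   r"$" becomes *RPOS*(0)
*   r"[a-z]" becomes *ANY*(LCASE)
*   r"[A-Za-z]" becomes *ANY*(UCASE+LCASE)
*   r"[a-z]\*" becomes *SPAN*(LCASE) | ε(), since *SPAN* matches one or more characters
*   r"xyz" becomes σ('xyz'), or alternatively
*   r"xyz" becomes σ('x') + σ('y') + σ('z')
*   r"x|y" becomes σ('x') | σ('y')
*   r"(\_\_\_)*" becomes *ARBNO*(___)
*   re.search(pattern, subject) becomes subject in PATTERN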
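To make the translation above less magical, here is a minimal sketch of how SNOBOL-style patterns with backtracking can be built from Python generators. It is not the SNOBOL4python implementation, and every name in it (lit, any_of, span, empty, pos, rpos, alt, seq, arbno, matches) is hypothetical. Each pattern is a function that, given a subject and a cursor, yields every cursor position where it could finish; when a later piece fails, the scanner simply asks the previous generator for its next option.

```python
import string
from typing import Callable, Iterator

# Hypothetical names throughout: these mimic, but are NOT, the SNOBOL4python API.
# A pattern is a function (subject, cursor) -> iterator of cursors after a match.
Pattern = Callable[[str, int], Iterator[int]]
UCASE, LCASE = string.ascii_uppercase, string.ascii_lowercase

def lit(text: str) -> Pattern:          # roughly σ('text')
    def p(s, i):
        if s.startswith(text, i):
            yield i + len(text)
    return p

def any_of(chars: str) -> Pattern:      # roughly ANY(chars): exactly one character
    def p(s, i):
        if i < len(s) and s[i] in chars:
            yield i + 1
    return p

def span(chars: str) -> Pattern:        # roughly SPAN(chars): longest non-empty run
    def p(s, i):
        j = i
        while j < len(s) and s[j] in chars:
            j += 1
        if j > i:
            yield j
    return p

def empty() -> Pattern:                 # roughly ε(): the empty string
    def p(s, i):
        yield i
    return p

def pos(n: int) -> Pattern:             # roughly POS(n): cursor is at n
    def p(s, i):
        if i == n:
            yield i
    return p

def rpos(n: int) -> Pattern:            # roughly RPOS(n): exactly n characters remain
    def p(s, i):
        if len(s) - i == n:
            yield i
    return p

def alt(*pats: Pattern) -> Pattern:     # roughly p1 | p2 | ...: try alternatives in order
    def p(s, i):
        for q in pats:
            yield from q(s, i)
    return p

def seq(*pats: Pattern) -> Pattern:     # roughly p1 + p2 + ...: backtrack on failure
    def p(s, i, k=0):
        if k == len(pats):
            yield i
            return
        for j in pats[k](s, i):
            yield from p(s, j, k + 1)
    return p

def arbno(pat: Pattern) -> Pattern:     # roughly ARBNO(pat): zero copies first (lazy)
    def p(s, i):
        yield i
        for j in pat(s, i):
            if j > i:                   # skip empty matches to avoid looping forever
                yield from p(s, j)
    return p

def matches(subject: str, pattern: Pattern) -> bool:
    """Roughly `subject in PATTERN` (this sketch only tries starting at cursor 0)."""
    return next(pattern(subject, 0), None) is not None

word = seq(any_of(UCASE + LCASE), alt(span(LCASE), empty()))
delimiter = alt(lit(", and "), lit(" and "), lit(", "))
sentence_pattern = seq(
    pos(0), alt(lit("I"), lit("He"), lit("You")),
    lit(" went to "), word, arbno(seq(delimiter, word)),
    lit(" looking for "), word, arbno(seq(delimiter, word)),
    lit("."), rpos(0),
)
```

With these definitions, matches(s, sentence_pattern) is True for the three sample sentences and False for "I went to Louisiana Texas looking for people.", and list(arbno(lit("ab"))("ababx", 0)) gives [0, 2, 4], shortest first.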

Let's start by just getting the PATTERN to work, and not dealing with capturing any results.

In [None]:
word = ANY(UCASE+LCASE) + (SPAN(LCASE) | ε())
delimiter = σ(', and ') | σ(' and ') | σ(', ')
for sentence in sentences:
    if sentence in \
          ( POS(0)
          + (σ('I') | σ('He') | σ('You'))
          + σ(' went to ')
          + word + ARBNO(delimiter + word)
          + σ(' looking for ')
          + word + ARBNO(delimiter + word)
          + σ('.')
          + RPOS(0)
          ):
          pp.pprint(['Matched.', None, None, None, sentence])
    else: pp.pprint(['Unmatched!', None, None, None, sentence])

['Matched.', None, None, None, 'I went to Louisiana, Texas and back looking for people and places.']
['Matched.', None, None, None, 'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
['Matched.', None, None, None, 'You went to school and work looking for fun, frolic and fantasy.']


**STEP 9**

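Using the theta function, θ("OUTPUT"), which outputs the current cursor position, together with the immediate assignment operator, @ "OUTPUT", which outputs each piece of text the moment it matches, you can trace the progress of the pattern-matching scanner. In the trace below, the numbers are cursor positions and the text between them is what matched. Doubled positions such as 19·19 show ARBNO first trying zero repetitions: ' looking for ' fails at that position, so the scanner backtracks and ARBNO tries one more delimiter and word.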

In [None]:
word = θ("OUTPUT") + (ANY(UCASE+LCASE) + (SPAN(LCASE) | ε())) @ "OUTPUT" # + θ("OUTPUT")
delimiter = θ("OUTPUT") + (σ(', and ') | σ(' and ') | σ(', ')) @ "OUTPUT"
for sentence in sentences:
    if sentence in \
          ( θ("OUTPUT") + POS(0)
          + θ("OUTPUT") + (σ('I') | σ('He') | σ('You')) @ "OUTPUT"
          + θ("OUTPUT") + σ(' went to ') @ "OUTPUT"
                         + word + ARBNO(delimiter + word)
          + θ("OUTPUT") + σ(' looking for ') @ "OUTPUT"
                        + word + ARBNO(delimiter + word)
          + θ("OUTPUT") + σ('.') @ "OUTPUT"
          + θ("OUTPUT") + RPOS(0)
          ):
          print(f'Matched. {sentence}\n')
    else: print(f'Unmatched! {sentence}\n')

0·0·I·1· went to ·10·Louisiana·19·19·, ·21·Texas·26·26· and ·31·back·35· looking for ·48·people·54·54· and ·59·places·65·.·66·
Matched. I went to Louisiana, Texas and back looking for people and places.

0·0·He·2· went to ·11·Albertsons·21·21·, ·23·Aldi·27·27·, and ·33·Kroger·39· looking for ·52·pineapples·62·62·, and ·68·bananas·75·.·76·
Matched. He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.

0·0·You·3· went to ·12·school·18·18· and ·23·work·27· looking for ·40·fun·43·43·, ·45·frolic·51·51· and ·56·fantasy·63·.·64·
Matched. You went to school and work looking for fun, frolic and fantasy.



**STEP 10**

What just happened? It matched! Is there any hope I can complete my work?

Now, let's decorate the above pattern with Python code to capture the PATTERN matching results into variables containing strings and lists.

* r"(?P<name>\_\_\_)" becomes ___ % "name"
* r"*no-can-do*" becomes λ(python_code_string), which executes Python code such as where_to.append(w) as part of the match

In [None]:
word = (ANY(UCASE+LCASE) + (SPAN(LCASE) | ε())) % "w"
delimiter = (σ(', and ') | σ(' and ') | σ(', '))
for sentence in sentences:
    if sentence in \
          ( POS(0)
          + (σ('I') | σ('He') | σ('You')) % "who"
          + σ(' went to ')     + word + λ("where_to = [w]")
          + ARBNO(delimiter    + word + λ("where_to.append(w)"))
          + σ(' looking for ') + word + λ("what_for = [w]")
          + ARBNO(delimiter    + word + λ("what_for.append(w)"))
          + σ('.')
          + RPOS(0)
          ):
          pp.pprint(['Matched:', who, where_to, what_for, sentence])
    else: pp.pprint(['Unmatched!', None, None, None, sentence])

[ 'Matched:',
  'I',
  ['Louisiana', 'Texas', 'back'],
  ['people', 'places'],
  'I went to Louisiana, Texas and back looking for people and places.']
[ 'Matched:',
  'He',
  ['Albertsons', 'Aldi', 'Kroger'],
  ['pineapples', 'bananas'],
  'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
[ 'Matched:',
  'You',
  ['school', 'work'],
  ['fun', 'frolic', 'fantasy'],
  'You went to school and work looking for fun, frolic and fantasy.']


**STEP 11**

What happened? My work is done! It's a miracle.\
The solution does seem to resemble the problem.\
Can it really be that easy?\
Now introducing the phi PATTERN, φ(r'___'), which matches a regular expression at the current position in the subject; named groups such as (?P<w>___) are captured into variables of the same name. Here is a solution that uses regular expressions as an integral part of the new PATTERN datatype.
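As a rough analogy for a regular expression used as one step of a larger pattern, Python's compiled *Pattern.match(string, pos)* anchors a regular expression at a given position. The sketch below chains such anchored matches along a cursor to extract WHO, WHERE, and WHAT. It is only an illustration, not how φ is implemented; the helper names (expect, word_list, analyze) are hypothetical, and unlike a real PATTERN it never backtracks, which works here only because ' looking for ' can never be mistaken for a delimiter.

```python
import re

# Compiled pieces; Pattern.match(string, pos) anchors each one at cursor pos.
WHO = re.compile(r"(?P<who>I|He|You)")
WORD = re.compile(r"(?P<w>[A-Za-z][a-z]*)")
DELIMITER = re.compile(r"(?:,? and|,) ")
WENT_TO = re.compile(" went to ")
LOOKING_FOR = re.compile(" looking for ")
END = re.compile(r"\.$")

def expect(regex, s, pos):
    """Match regex exactly at pos, or raise ValueError."""
    m = regex.match(s, pos)
    if m is None:
        raise ValueError(f"{regex.pattern!r} does not match at position {pos}")
    return m

def word_list(s, pos):
    """Match word (delimiter word)* at pos; return (words, new_pos).

    Greedy and without backtracking, unlike ARBNO.
    """
    m = expect(WORD, s, pos)
    words, pos = [m["w"]], m.end()
    while (d := DELIMITER.match(s, pos)) and (m := WORD.match(s, d.end())):
        words.append(m["w"])
        pos = m.end()
    return words, pos

def analyze(s):
    """Return (who, where_to, what_for) for 'WHO went to WHERE looking for WHAT.'"""
    m = expect(WHO, s, 0)
    who = m["who"]
    pos = expect(WENT_TO, s, m.end()).end()
    where_to, pos = word_list(s, pos)
    pos = expect(LOOKING_FOR, s, pos).end()
    what_for, pos = word_list(s, pos)
    expect(END, s, pos)
    return who, where_to, what_for
```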

In [None]:
word = φ(r'(?P<w>[A-Za-z][a-z]*)')
delimiter = φ(r'(?:,? and|,) ')
for sentence in sentences:
    if sentence in \
          ( φ(r'^')
          + (φ(r'(?P<who>I|He|You)'))
          + φ(r' went to ')     + word + λ("where_to = [w]")
          + ARBNO(delimiter     + word + λ("where_to.append(w)"))
          + φ(r' looking for ') + word + λ("what_for = [w]")
          + ARBNO(delimiter     + word + λ("what_for.append(w)"))
          + φ(r'\.$')
          ):
          pp.pprint(['Matched:', who, where_to, what_for, sentence])
    else: pp.pprint(['Unmatched!', None, None, None, sentence])

[ 'Matched:',
  'I',
  ['Louisiana', 'Texas', 'back'],
  ['people', 'places'],
  'I went to Louisiana, Texas and back looking for people and places.']
[ 'Matched:',
  'He',
  ['Albertsons', 'Aldi', 'Kroger'],
  ['pineapples', 'bananas'],
  'He went to Albertsons, Aldi, and Kroger looking for pineapples, and bananas.']
[ 'Matched:',
  'You',
  ['school', 'work'],
  ['fun', 'frolic', 'fantasy'],
  'You went to school and work looking for fun, frolic and fantasy.']


**STEP 12**

What happens next? You can do any work you want!\
This SNOBOL4python module can process all four levels of the Chomsky hierarchy.\
This concludes this 12-step program. Enjoy Nirvana.

Now, let's try some more examples. Let's process the TASA Treebank trees in the file from assignment #3, but first let's process the same sentences after they have been part-of-speech tagged by the UCREL CLAWS tagger.
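For comparison, the word-to-tag counting that the claws_info pattern below accumulates in mem can be sketched with the standard library alone. This assumes every token has the form word_TAG, separated by whitespace, and that sentences start with markers such as 1_CRD :_PUN; count_tags is a hypothetical helper, and the sample is the first sentence of CLAWS_5_in_TASA.

```python
import re
from collections import defaultdict

sample = """\
1_CRD :_PUN That_CJT the_AT0 power_NN1 of_PRF taxing_VVG it_PNP by_PRP the_AT0
states_NN2 may_VM0 be_VBI exercised_VVN so_AV0 as_AV0 to_TO0 destroy_VVI
it_PNP ,_PUN is_VBZ too_AV0 obvious_AJ0 to_TO0 be_VBI denied_VVN ._PUN
"""

def count_tags(text):
    """Map each word to {tag: count}, ignoring the 'N_CRD :_PUN' sentence markers."""
    text = re.sub(r"\b\d+_CRD :_PUN", " ", text)   # drop sentence-number markers
    mem = defaultdict(lambda: defaultdict(int))
    # a word (no underscore or whitespace), an underscore, then an upper-case tag
    for wrd, tag in re.findall(r"([^_\s]+)_([A-Z][A-Z0-9]+)", text):
        mem[wrd][tag] += 1
    return {wrd: dict(tags) for wrd, tags in mem.items()}

counts = count_tags(sample)
print(counts["the"], counts["be"])   # {'AT0': 2} {'VBI': 2}
```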

In [None]:
CLAWS_5_in_TASA = """\
1_CRD :_PUN That_CJT the_AT0 power_NN1 of_PRF taxing_VVG it_PNP by_PRP the_AT0
states_NN2 may_VM0 be_VBI exercised_VVN so_AV0 as_AV0 to_TO0 destroy_VVI
it_PNP ,_PUN is_VBZ too_AV0 obvious_AJ0 to_TO0 be_VBI denied_VVN ._PUN
2_CRD :_PUN None_PNI ever_AV0 penned_VVD a_AT0 manifesto_NN1 as_AV0
stirring_AJ0 as_CJS the_AT0 one_PNI that_CJT appeared_VVD in_PRP the_AT0
first_ORD issue_NN1 of_PRF the_AT0 liberator_NN1 ,_PUN and_CJC no_AT0
other_AJ0 abolitionist_NN1 document_NN1 is_VBZ so_AV0 well_AV0 remembered_VVN
._PUN
"""

In [None]:
claws_info = \
    ( POS(0)
    + λ("mem = dict()")
    + ARBNO(
        ( SPAN(DIGITS) % "num" + σ('_CRD :_PUN')
        + λ("num = int(num)")
        | (NOTANY("_") + BREAK("_")) % "wrd"
        + σ('_')
        + (ANY(UCASE) + SPAN(DIGITS+UCASE)) % "tag"
        + λ("if wrd not in mem:      mem[wrd] = dict()")
        + λ("if tag not in mem[wrd]: mem[wrd][tag] = 0")
        + λ("mem[wrd][tag] += 1")
        )
      + (σ(' \n') | σ(' '))
      )
    + RPOS(0)
    )
pp.pprint(claws_info)

Σ(*4)


In [None]:
# The next examples use files from your Google Drive
from google.colab import drive
drive.mount('/content/modules', force_remount=True)
sys.path.append('/content/modules/My Drive/')

Mounted at /content/modules


In [None]:
mem = None
with open("/content/modules/My Drive/CLAWS5inTASA.dat", "r") as claws_file:
    lines = []
    while line := claws_file.readline():
        lines.append(line.rstrip("\n"))  # remove only the newline, even on a last line without one
    claws_data = ''.join(lines)
    if claws_data not in claws_info:
        print("Yikes")
    pp.pprint(mem)

In [None]:
VBG_in_TASA = """\
(S (SBAR (IN that) (S (NP (NP (DT the) (NN power)) (PP (IN of) (S (VP (VBG
taxing) (NP (PRP it)) (PP (IN by) (NP (DT the) (NNS states))))))) (VP (MD
may) (VP (VB be) (VP (VBN exercised) (ADVP (RB so) (RB as)) (S (VP (TO to)
(VP (VB destroy) (NP (PRP it))))))))) (, ,)) (VP (VBZ is) (ADJP (RB too)
(JJ obvious)) (S (VP (TO to) (VP (VB be) (VP (VBN denied)))))) (.  .))

(S (S (NP (NN none)) (ADVP (RB ever)) (VP (VBN penned) (NP (DT a) (NN
manifesto)) (PP (IN as) (S (VP (VBG stirring) (PP (IN as) (NP (NP (DT the)
(NN one)) (SBAR (WHNP (WDT that)) (S (VP (VBD appeared) (PP (IN in) (NP (NP
(DT the) (JJ first) (NN issue)) (PP (IN of) (NP (DT the) (NN
liberator))))))))))))))) (, ,) (CC and) (S (NP (DT no) (JJ other) (NN
abolitionist) (NN document)) (VP (VBZ is) (ADVP (RB so) (RB well)) (VP (VBN
remembered)))) (.  .))

"""

In [None]:
delim     = SPAN(" \n")
word      = NOTANY("( )\n") + BREAK("( )\n")
group     = σ('(') + word + ARBNO(delim + (ζ('group') | word)) + σ(')')
treebank  = POS(0) + ARBNO(ARBNO(group) + delim) + RPOS(0)
VBG_in_TASA in treebank # % "OUTPUT"

True

In [None]:
def init_list(v): return λ(f"{v} = None; stack = []")
def push_list(v): return λ(f"stack.append(list()); stack[-1].append({v})")
def push_item(v): return λ(f"stack[-1].append({v})")
def pop_list():   return λ(f"stack[-2].append(tuple(stack.pop()))")
def pop_final(v): return λ(f"{v} = tuple(stack.pop())")
delim =           SPAN(" \n")
word =            NOTANY("( )\n") + BREAK("( )\n")
group =           ( σ('(')
                  + word % "tag"
                  + push_list("tag")
                  + ARBNO(delim + (ζ('group') | word % "wrd" + push_item("wrd")))
                  + pop_list()
                  + σ(')')
                  )
treebank =        ( POS(0)
                  + init_list("bank")
                  + push_list("'BANK'")
                  + ARBNO(push_list("'ROOT'") + ARBNO(group) + pop_list() + delim)
                  + pop_final("bank")
                  + RPOS(0)
                  )
pp.pprint([delim, word, group, treebank])

[SPAN(' \n'), Σ(*2), Σ(*6), Σ(*6)]


In [None]:
bank = None
if VBG_in_TASA in treebank:
    pp.pprint(bank[1])
else: print("Boo!")

( 'ROOT',
  ( 'S',
    ( 'SBAR',
      ('IN', 'that'),
      ( 'S',
        ( 'NP',
          ('NP', ('DT', 'the'), ('NN', 'power')),
          ( 'PP',
            ('IN', 'of'),
            ( 'S',
              ( 'VP',
                ('VBG', 'taxing'),
                ('NP', ('PRP', 'it')),
                ('PP', ('IN', 'by'), ('NP', ('DT', 'the'), ('NNS', 'states'))))))),
        ( 'VP',
          ('MD', 'may'),
          ( 'VP',
            ('VB', 'be'),
            ( 'VP',
              ('VBN', 'exercised'),
              ('ADVP', ('RB', 'so'), ('RB', 'as')),
              ('S', ('VP', ('TO', 'to'), ('VP', ('VB', 'destroy'), ('NP', ('PRP', 'it'))))))))),
      (',', ',')),
    ( 'VP',
      ('VBZ', 'is'),
      ('ADJP', ('RB', 'too'), ('JJ', 'obvious')),
      ('S', ('VP', ('TO', 'to'), ('VP', ('VB', 'be'), ('VP', ('VBN', 'denied')))))),
    ('.', '.')))
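The same nested-tuple structure can also be sketched without SNOBOL4python, using an explicit stack and a tokenizing regular expression. In this hypothetical parse_trees helper each '(' opens a new list, each ')' turns it into a tuple and attaches it to its parent, and the result is a plain list of top-level trees (no 'BANK' or 'ROOT' wrappers).

```python
import re

# Tokens are '(', ')', or any run of characters that are neither parentheses nor whitespace.
TOKEN = re.compile(r"\(|\)|[^()\s]+")

def parse_trees(text):
    """Return a list of nested tuples, one per top-level bracketed tree."""
    stack = [[]]                      # stack[0] collects finished top-level trees
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])          # open a new constituent
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = tuple(stack.pop())
            stack[-1].append(done)    # attach it to the enclosing constituent
        else:
            stack[-1].append(tok)     # a label or a word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]
```

For example, parse_trees("(S (NP (NN none)) (, ,))") returns [('S', ('NP', ('NN', 'none')), (',', ','))], and len(parse_trees(text)) gives the number of trees directly.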


In [None]:
bank = None
with open("/content/modules/My Drive/VBGinTASA.dat", "r") as bank_file:
    bank_source = bank_file.read()
    if bank_source in POS(0) + BAL() + RPOS(0):
        if bank_source in treebank:
            print(len(bank), "trees processed.")
    else: print("Boo!")

250 trees processed.
