# Learning from:

Getting Started with Pyparsing by Paul McGuire Publisher: O'Reilly Media 
http://shop.oreilly.com/product/9780596514235.do


In [19]:
from pyparsing import *

In [20]:
import random

### "Hello, World!" on Steroids
page 9. 

The task is to write a parser for these strings:

Hello, World! <br>
Hi, Mom! <br>
Good morning, Miss Crabtree!   
Yo, Adrian!   
Whattup, G? <br>
How's it goin', Dude? <br>
Hey, Jude! <br>
Goodbye, Mr. Chips! <br>




The input values, given as a list of strings:

In [21]:
tests=['Hello, World!', 'Hi, Mom!', 
      'Good morning, Miss Crabtree!',
      'Yo, Adrian!',
      'Whattup, G?',
      'Hey, Jude!',
      'Goodbye, Mr. Chips!',
      'How\'s it going\', Dude?']

Printing the input values to check:

In [22]:
print(tests)

['Hello, World!', 'Hi, Mom!', 'Good morning, Miss Crabtree!', 'Yo, Adrian!', 'Whattup, G?', 'Hey, Jude!', 'Goodbye, Mr. Chips!', "How's it going', Dude?"]


"The first step is to identify the pattern that they all follow" <br>

Writing this pattern as a BNF:

greeting ::= salutation comma greetee endpunc <br>
salutation ::= word+ <br>
comma ::= , <br>
greetee ::= word+ <br>
word ::= a collection of one or more characters, which are any alpha or ' or . <br>
endpunc ::= ! | ? <br>




In [23]:
word = Word(alphas+"'.")
salutation = OneOrMore(word)
comma = Literal(",")
greetee = OneOrMore(word)
endpunc = oneOf("! ?")
greeting = salutation + comma + greetee + endpunc



The greeting variable holds the 'formula' for the parse: it is an object that can do the parse and other operations. Parsing the third element of the list (Python lists are indexed from 0, so it is tests[2]):

In [24]:
greeting.parseString(tests[2])

(['Good', 'morning', ',', 'Miss', 'Crabtree', '!'], {})
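The value returned by parseString is a ParseResults object, which can be used much like a list. A minimal sketch of that, rebuilding the same grammar so the snippet runs on its own (the variable names `first_token`, `last_token`, `token_count` and `as_plain_list` are only for illustration):

```python
from pyparsing import Word, alphas, OneOrMore, Literal, oneOf

# Same grammar as in the cell above, rebuilt so this snippet is self-contained.
word = Word(alphas + "'.")
greeting = OneOrMore(word) + Literal(",") + OneOrMore(word) + oneOf("! ?")

results = greeting.parseString("Good morning, Miss Crabtree!")

first_token = results[0]       # indexing works as on a list
last_token = results[-1]       # negative indexes too
token_count = len(results)     # number of tokens found
as_plain_list = list(results)  # copy the tokens into an ordinary list

print(first_token, last_token, token_count)
print(as_plain_list)
```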

Doing the parse for all the items in the list:

In [25]:
for t in tests:
    view = greeting.parseString(t)
    print(view)

['Hello', ',', 'World', '!']
['Hi', ',', 'Mom', '!']
['Good', 'morning', ',', 'Miss', 'Crabtree', '!']
['Yo', ',', 'Adrian', '!']
['Whattup', ',', 'G', '?']
['Hey', ',', 'Jude', '!']
['Goodbye', ',', 'Mr.', 'Chips', '!']
["How's", 'it', "going'", ',', 'Dude', '?']


"to identify the tokens that compose the initial part of the greeting--the salutation--we need to iterate over the results until we reach the comma token:"

In [26]:
for t in tests:
    results = greeting.parseString(t)
    salutation = []
    for token in results:
        if token == ",": break
        salutation.append(token)
    print(salutation)

['Hello']
['Hi']
['Good', 'morning']
['Yo']
['Whattup']
['Hey']
['Goodbye']
["How's", 'it', "going'"]
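The same split can also be done without an explicit loop, by looking up the position of the comma token and slicing around it. This is only a sketch on a plain list of tokens (copied from the output above), not the book's method, and the names `comma_at`, `salutation_tokens` and `greetee_tokens` are made up for the example:

```python
# Tokens as returned by the parser for 'Good morning, Miss Crabtree!'
tokens = ['Good', 'morning', ',', 'Miss', 'Crabtree', '!']

comma_at = tokens.index(",")                # position of the first comma token
salutation_tokens = tokens[:comma_at]       # everything before the comma
greetee_tokens = tokens[comma_at + 1:-1]    # between the comma and the end punctuation
end_punctuation = tokens[-1]                # last token is '!' or '?'

print(salutation_tokens, greetee_tokens, end_punctuation)
```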


"Since we know that the salutation and greetee parts of the greeting are logical groups, we can use pyparsing's Group class to give more structure to the returned results. By changing the definitions of salutation and greetee to:"   (not so clear)

In [27]:
salutation = Group( OneOrMore(word))

In [28]:
print(salutation)

Group:({W:(ABCD...)}...)


In [29]:
greetee = Group( OneOrMore(word) )

In [30]:
print(greetee)

Group:({W:(ABCD...)}...)


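The results are not grouped as in the book's example. The greeting expression was built in cell [23] and still refers to the original, ungrouped salutation and greetee expressions; rebinding those two names does not change it.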

In [31]:
for t in tests:
    view = greeting.parseString(t)
    print(view)

['Hello', ',', 'World', '!']
['Hi', ',', 'Mom', '!']
['Good', 'morning', ',', 'Miss', 'Crabtree', '!']
['Yo', ',', 'Adrian', '!']
['Whattup', ',', 'G', '?']
['Hey', ',', 'Jude', '!']
['Goodbye', ',', 'Mr.', 'Chips', '!']
["How's", 'it', "going'", ',', 'Dude', '?']
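A small check of why the output stays flat, under the assumption that the cause is ordinary Python name binding: an expression built with `+` keeps the objects it was given, so assigning new objects to the names `salutation` and `greetee` has no effect on an already-built `greeting`. The names `flat` and `nested` are only for illustration.

```python
from pyparsing import Word, alphas, OneOrMore, Group, Literal, oneOf

word = Word(alphas + "'.")
salutation = OneOrMore(word)
greetee = OneOrMore(word)
greeting = salutation + Literal(",") + greetee + oneOf("! ?")

# Rebinding the names creates new objects; greeting still holds the old ones.
salutation = Group(OneOrMore(word))
greetee = Group(OneOrMore(word))
flat = list(greeting.parseString("Hi, Mom!"))

# Rebuilding greeting picks up the grouped expressions.
greeting = salutation + Literal(",") + greetee + oneOf("! ?")
nested = greeting.parseString("Hi, Mom!")

print(flat)
print(nested)
```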


Declaring the whole grammar again, now with the Group class, so that greeting is rebuilt from the grouped expressions:

In [32]:
word = Word(alphas+"'.")
salutation = Group(OneOrMore(word))
comma = Literal(",")
greetee = Group(OneOrMore(word))
endpunc = oneOf("! ?")
greeting = salutation + comma + greetee + endpunc

In [33]:
for t in tests:
    view = greeting.parseString(t)
    print(view)

[['Hello'], ',', ['World'], '!']
[['Hi'], ',', ['Mom'], '!']
[['Good', 'morning'], ',', ['Miss', 'Crabtree'], '!']
[['Yo'], ',', ['Adrian'], '!']
[['Whattup'], ',', ['G'], '?']
[['Hey'], ',', ['Jude'], '!']
[['Goodbye'], ',', ['Mr.', 'Chips'], '!']
[["How's", 'it', "going'"], ',', ['Dude'], '?']


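Alright, rebuilding the grammar fixed it. The greeting expression keeps references to the expression objects it was built from, so after salutation and greetee are redefined, greeting has to be rebuilt too. Rebuilding only greeting fails later in this notebook, for an unrelated reason, see [errorJustGreeting](#errorJustGreeting) <!-- How to reference another cell http://stackoverflow.com/a/28080529/7896359 @Amit -->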

Using list-to-variable assignment to access the different parts:

In [34]:
for t in tests:
    salutation, dummy, greetee, endpunc = greeting.parseString(t)
    print(salutation, greetee, endpunc)

['Hello'] ['World'] !
['Hi'] ['Mom'] !
['Good', 'morning'] ['Miss', 'Crabtree'] !
['Yo'] ['Adrian'] !
['Whattup'] ['G'] ?
['Hey'] ['Jude'] !
['Goodbye'] ['Mr.', 'Chips'] !
["How's", 'it', "going'"] ['Dude'] ?


"The comma is a very important element during parsing, since it shows where the parser stops reading the salutation and starts the greetee. But in the returned results, the comma is not really very interesting at all, and it would be nice to suppress it from the returned results. You can do this by wrapping the definition of comma in a pyparsing Suppress instance:"

In [35]:
#comma = Suppress( Literal(",")) # or
comma = Literal(",").suppress() #or
#comma = Suppress(",") # the three are equivalent
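To check that the three spellings behave the same, here is a minimal sketch that parses one string with each of them. The simplified grammar (two plain words around the comma) and the names `variants` and `outputs` are assumptions made for the example.

```python
from pyparsing import Word, alphas, Literal, Suppress

word = Word(alphas)

# The three ways of writing a suppressed comma from the cell above.
variants = [
    Suppress(Literal(",")),
    Literal(",").suppress(),
    Suppress(","),
]

# Parse the same input with each variant and collect the tokens.
outputs = [list((word + comma + word).parseString("Hi, Mom")) for comma in variants]
print(outputs)
```

In every case the comma is matched but left out of the returned tokens.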

<a id='errorJustGreeting'></a>
Building the greeting expression again, now with the suppressed comma:

In [46]:
greeting = salutation + comma + greetee + endpunc

AttributeError: 'Suppress' object has no attribute '_ParseResults__tokdict'

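The error does not come from Suppress itself. The for loop in cell [34] (and later the one in cell [41]) unpacks the parse results into salutation, greetee and endpunc, so when this cell runs those names hold parsed tokens (ParseResults objects) instead of parser expressions, and adding a Suppress to a ParseResults fails. Declaring everything all over again restores the parser expressions.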
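One plausible cause of the AttributeError can be checked directly: tuple unpacking in a loop rebinds the names that previously held parser expressions. The sketch below assumes that this is what happened in the notebook; `before` and `after` are illustrative names.

```python
from pyparsing import (Word, alphas, OneOrMore, Group, Literal, oneOf,
                       ParseResults, ParserElement)

word = Word(alphas + "'.")
salutation = Group(OneOrMore(word))
greetee = Group(OneOrMore(word))
endpunc = oneOf("! ?")
greeting = salutation + Literal(",") + greetee + endpunc

# At this point salutation is still a parser expression.
before = isinstance(salutation, ParserElement)

# Unpacking the results reuses the same names, as in the loop of cell [34].
salutation, dummy, greetee, endpunc = greeting.parseString("Hi, Mom!")

# Now salutation holds parsed tokens, not an expression.
after = isinstance(salutation, ParseResults)

print(before, after, type(salutation).__name__)
```

Using different names for the unpacked values (for example `sal_tokens`, a made-up name) would leave the grammar variables intact.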

In [37]:
word = Word(alphas+"'.")
salutation = Group(OneOrMore(word))
comma = Literal(",").suppress()
greetee = Group(OneOrMore(word))
endpunc = oneOf("! ?")
greeting = salutation + comma + greetee + endpunc

In [38]:
for t in tests:
    view = greeting.parseString(t)
    print(view)

[['Hello'], ['World'], '!']
[['Hi'], ['Mom'], '!']
[['Good', 'morning'], ['Miss', 'Crabtree'], '!']
[['Yo'], ['Adrian'], '!']
[['Whattup'], ['G'], '?']
[['Hey'], ['Jude'], '!']
[['Goodbye'], ['Mr.', 'Chips'], '!']
[["How's", 'it', "going'"], ['Dude'], '?']


"Now that we have a decent parser and a good way to get out the results, we can start to have fun with the test data. First, let's accumulate the salutations and greetees into lists of their own:"

In [39]:
salutes=[]

In [40]:
greetees = []

In [41]:
for t in tests:
    salutation, greetee, endpunc = greeting.parseString(t)
    salutes.append( (" ".join(salutation), endpunc) )
    greetees.append( " ".join(greetee) )

What is in salutes:

In [42]:
print(salutes)

[('Hello', '!'), ('Hi', '!'), ('Good morning', '!'), ('Yo', '!'), ('Whattup', '?'), ('Hey', '!'), ('Goodbye', '!'), ("How's it going'", '?')]


In [43]:
print(salutes[2])

('Good morning', '!')


What is in greetees:

In [44]:
print(greetees)

['World', 'Mom', 'Miss Crabtree', 'Adrian', 'G', 'Jude', 'Mr. Chips', 'Dude']


"Now that we have collected these assorted names and salutations, we can use them to contrive some additional, never-before-seen greetings and introductions."

In [45]:
for i in range(50):
    salute = random.choice( salutes )
    greetee = random.choice( greetees )
    print("%s, %s%s" % ( salute[0], greetee, salute[1] ))

Goodbye, Miss Crabtree!
Good morning, Miss Crabtree!
Yo, World!
Whattup, World?
Hi, G!
How's it going', Mr. Chips?
Goodbye, Adrian!
Hello, Dude!
Whattup, Mom?
Hello, World!
Good morning, Miss Crabtree!
Hey, Dude!
How's it going', Dude?
Hi, G!
Yo, Miss Crabtree!
Yo, World!
Good morning, G!
Goodbye, Mom!
Whattup, Mom?
Hi, Jude!
Hello, Jude!
Hey, Dude!
How's it going', Mr. Chips?
Hey, Adrian!
Goodbye, Mom!
Goodbye, Mr. Chips!
Goodbye, Mom!
Yo, Mom!
Whattup, Jude?
Hi, Miss Crabtree!
Good morning, Miss Crabtree!
Hey, Mom!
Hey, Mr. Chips!
Hey, Dude!
Hi, Mr. Chips!
Hey, World!
Goodbye, Mom!
Goodbye, Mom!
Whattup, Miss Crabtree?
Whattup, Mom?
Yo, Adrian!
Whattup, World?
Whattup, G?
Yo, Mom!
Hello, Jude!
Hey, Dude!
Whattup, Jude?
Goodbye, Mr. Chips!
Good morning, Mom!
Hi, Jude!
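Since random.choice picks differently on every run, the output above will not repeat. If repeatable output is wanted, a seeded generator can be used. This is a sketch with a shortened copy of the data; the function name `make_greetings` and the seed value 42 are arbitrary choices for the example.

```python
import random

# Shortened copies of the lists collected above.
salutes = [("Hello", "!"), ("Whattup", "?"), ("Good morning", "!")]
greetees = ["World", "G", "Miss Crabtree"]

def make_greetings(seed, count):
    """Return `count` random greetings, repeatable for a given seed."""
    rng = random.Random(seed)  # private generator, does not touch the global one
    lines = []
    for _ in range(count):
        salute, punct = rng.choice(salutes)
        lines.append("%s, %s%s" % (salute, rng.choice(greetees), punct))
    return lines

first_run = make_greetings(42, 5)
second_run = make_greetings(42, 5)
print(first_run == second_run)
for line in first_run:
    print(line)
```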


"We can also simulate some introductions with the following code:"

In [48]:
for i in range(50):
    print('%s, say "%s" to %s.' % (random.choice( greetees ),
                                   "".join( random.choice( salutes ) ),
                                  random.choice( greetees ) ) )

Adrian, say "Hey!" to Mom.
Mom, say "Good morning!" to G.
Mr. Chips, say "Good morning!" to Miss Crabtree.
Mr. Chips, say "Hello!" to World.
G, say "Hi!" to Mom.
Dude, say "Hello!" to World.
Dude, say "Whattup?" to G.
G, say "Whattup?" to Jude.
Dude, say "How's it going'?" to Mr. Chips.
Mr. Chips, say "Hey!" to Mr. Chips.
Dude, say "Good morning!" to Dude.
G, say "Whattup?" to World.
G, say "Good morning!" to Miss Crabtree.
Jude, say "Whattup?" to Mom.
Mr. Chips, say "Goodbye!" to Adrian.
Adrian, say "Hello!" to Jude.
Miss Crabtree, say "Whattup?" to World.
Dude, say "Good morning!" to Dude.
Miss Crabtree, say "Whattup?" to Miss Crabtree.
Miss Crabtree, say "Good morning!" to Dude.
Mom, say "How's it going'?" to Mr. Chips.
Dude, say "Whattup?" to Adrian.
Mr. Chips, say "Whattup?" to Mr. Chips.
Adrian, say "Hey!" to Miss Crabtree.
Miss Crabtree, say "Yo!" to G.
Mom, say "Hey!" to G.
Jude, say "How's it going'?" to Mr. Chips.
World, say "Hi!" to G.
Jude, say "Hello!" to Miss Crabtree.
Wo