## Context-free Grammars (CFGs)

A context-free grammar (CFG) is a finite set of rewriting rules (or productions), possibly recursive, used to generate the strings of a language.

A CFG consists of the following components:

+ a set of **terminal symbols**, which are the characters of the alphabet that appear in the strings generated by the grammar;
+ a set of **nonterminal symbols**, which are placeholders for the patterns of terminal symbols that can be derived from them;
+ a set of **productions**, which are rules for replacing (or rewriting) a nonterminal symbol (on the left side of the production) in a string with a sequence of nonterminal and terminal symbols (on the right side of the production);
+ a **start symbol**, which is the designated nonterminal symbol that makes up the initial string of every derivation.
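
The four components can be written down directly as a small data structure. The sketch below is only an illustration: the class name `Grammar`, the method `is_well_formed`, and the example grammar `S -> a S b | a b` are assumptions made here, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """The four components of a CFG (all names here are illustrative)."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs (left side, right side); right side is a tuple of symbols
    start: str

    def is_well_formed(self):
        """Check that the four components fit together."""
        symbols = self.terminals | self.nonterminals
        return (
            self.terminals.isdisjoint(self.nonterminals)
            and self.start in self.nonterminals
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# Hypothetical example grammar: S -> a S b | a b
anbn = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    productions=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
    start="S",
)
print(anbn.is_well_formed())
```

Requiring the left side of every production to be a single nonterminal is what makes the grammar context-free: a nonterminal can be rewritten regardless of the symbols around it.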

To generate a string of terminal symbols from a CFG, we:

+ begin with a string consisting of the start symbol;
+ apply one of the productions with the start symbol on the left-hand side, replacing the start symbol with the right-hand side of the production;
+ repeat the process of selecting a nonterminal symbol in the string and replacing it with the right-hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols.
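
The steps above can be sketched as a short loop that records every intermediate string. This is a minimal sketch under a few assumptions: the grammar `S -> a S b | a b` is a hypothetical example, the names `PRODUCTIONS`, `START`, and `derive` are made up for illustration, and the loop always rewrites the leftmost nonterminal, which is only one of the orders the procedure allows.

```python
import random

# Hypothetical grammar: S -> a S b | a b
PRODUCTIONS = {"S": [["a", "S", "b"], ["a", "b"]]}
START = "S"


def derive(rng, max_steps=50):
    """Return the list of strings visited on the way from START to a terminal string."""
    form = [START]                      # step 1: begin with the start symbol
    steps = [form]
    for _ in range(max_steps):
        # Positions in the current string that still hold a nonterminal.
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:               # only terminals are left: done
            return steps
        i = positions[0]                # choose the leftmost nonterminal
        rhs = rng.choice(PRODUCTIONS[form[i]])
        form = form[:i] + rhs + form[i + 1:]   # replace it with a right-hand side
        steps.append(form)
    raise RuntimeError("derivation did not terminate within max_steps")


for form in derive(random.Random(0)):
    print(" ".join(form))
```

The `max_steps` guard is there because a recursive grammar can, in principle, keep producing new nonterminals; for this grammar each step stops with probability 1/2, so the limit is rarely relevant.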

### A toy CFG for English

In [9]:
import random

# Each function plays the role of one nonterminal: calling it expands that
# nonterminal into a string of terminal words, choosing a production at random.

def S():
    # S -> DP VP  (the start symbol; prints the finished sentence)
    print(DP(), VP())

def DP():
    # DP -> D N
    return D() + ' ' + N()

def VP():
    # VP -> V | V DP  (each alternative is chosen with probability 1/2)
    if random.randint(0, 1) == 0:
        return V()
    return V() + ' ' + DP()

def N():
    # N -> 'cat' | 'tree' | 'Bella'
    return random.choice(['cat', 'tree', 'Bella'])

def D():
    # D -> 'the'
    return random.choice(['the'])

def V():
    # V -> 'ran' | 'kissed'
    return random.choice(['ran', 'kissed'])

# Generate 30 random sentences.
for num in range(30):
    S()

the Bella ran the cat
the cat ran
the tree ran the cat
the Bella ran
the tree ran
the Bella kissed
the Bella ran
the cat kissed the Bella
the cat kissed the tree
the cat ran the cat
the cat ran the Bella
the cat ran
the Bella ran
the cat ran
the cat ran the tree
the tree ran
the Bella kissed the cat
the Bella kissed
the cat ran
the cat ran
the tree kissed
the cat ran the tree
the Bella ran
the tree ran the cat
the Bella kissed
the tree ran
the tree ran
the cat kissed
the Bella kissed
the Bella ran
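
Because the toy grammar has no recursive productions, its language is finite, and every sentence can be enumerated instead of sampled. The sketch below restates the same productions as a dictionary; the names `TOY` and `expand` are illustrative assumptions, and the approach would not terminate for a recursive grammar.

```python
from itertools import product

# The toy grammar above, written as data: nonterminal -> list of right-hand sides.
TOY = {
    "S": [["DP", "VP"]],
    "DP": [["D", "N"]],
    "VP": [["V"], ["V", "DP"]],
    "N": [["cat"], ["tree"], ["Bella"]],
    "D": [["the"]],
    "V": [["ran"], ["kissed"]],
}


def expand(symbol):
    """Return every list of words that `symbol` can generate."""
    if symbol not in TOY:               # a terminal generates only itself
        return [[symbol]]
    results = []
    for rhs in TOY[symbol]:
        # Combine every expansion of each right-hand-side symbol, in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))  # 3 determiner phrases x (2 + 2*3) verb phrases = 24
```

With only 24 distinct sentences available, a sample of 30 must contain repeats, which is why lines such as `the Bella ran` appear several times in the output above.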


In [None]:
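import nltk

# A small grammar in NLTK's notation: "|" separates alternative right-hand
# sides, and quoted strings are terminal symbols.
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | Det N | Det N PP
Det -> "a"
N -> "man"
P -> "in"
""")

# Split the input on whitespace. Every word must be a terminal of the grammar
# (matching is case-sensitive); otherwise the parser raises a ValueError.
tokens = input('Your sentence: ').split()
rd_parser = nltk.RecursiveDescentParser(grammar)
# Print every parse tree the grammar assigns to the sentence.
for tree in rd_parser.parse(tokens):
    print(tree)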
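
To make the idea behind recursive descent parsing concrete, here is a minimal pure-Python sketch for the same grammar. It is not NLTK's implementation: the names `GRAMMAR`, `parse`, `parse_sequence`, and `parse_all` are assumptions made for illustration, and trees are plain nested tuples rather than `nltk.Tree` objects.

```python
# The grammar from the cell above: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "V": [["saw"], ["ate"], ["walked"]],
    "NP": [["John"], ["Mary"], ["Det", "N"], ["Det", "N", "PP"]],
    "Det": [["a"]],
    "N": [["man"]],
    "P": [["in"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way `symbol` matches tokens starting at pos."""
    if symbol not in GRAMMAR:           # terminal: must equal the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:         # nonterminal: try each production in turn
        for children, end in parse_sequence(rhs, tokens, pos):
            yield (symbol, children), end


def parse_sequence(symbols, tokens, pos):
    """Yield (trees, next_pos) for each way the symbols match in order from pos."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], tokens, pos):
        for rest, end in parse_sequence(symbols[1:], tokens, mid):
            yield [tree] + rest, end


def parse_all(sentence):
    """Return every tree for S that covers the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


for tree in parse_all("John saw a man in a man"):
    print(tree)
```

The example sentence is ambiguous under this grammar: the prepositional phrase can attach to the noun phrase (`NP -> Det N PP`) or to the verb phrase (`VP -> V NP PP`), so two trees are found. A naive top-down strategy like this one would recurse forever on a left-recursive production such as `NP -> NP PP`; the grammar here avoids that.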