# CFG

This is how we represent a context-free grammar (CFG).

## Symbols

Its symbols are plain Python strings.

A nonterminal is a bracketed string such as

        [S]

In [48]:
from symbol import is_terminal, is_nonterminal

In [49]:
is_terminal('a')

True

In [50]:
is_nonterminal('[X]')

True
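The two predicates above could be written along the following lines. This is only a minimal sketch, assuming that a nonterminal is any string that starts with `[` and ends with `]` and that every other string is a terminal; the actual `symbol` module may apply different or stricter checks.

```python
def is_nonterminal(symbol):
    """Return True for bracketed strings such as '[S]' (assumed convention)."""
    return symbol.startswith('[') and symbol.endswith(']')


def is_terminal(symbol):
    """Return True for any symbol that is not a nonterminal (assumption)."""
    return not is_nonterminal(symbol)


print(is_terminal('a'))       # True
print(is_nonterminal('[X]'))  # True
print(is_nonterminal('a'))    # False
```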

## Rules

Rules are objects made of an LHS symbol, an RHS sequence (of symbols), and a probability.

In [51]:
from rule import Rule

In [52]:
r = Rule('[S]', ['[X]', 'a'], 1.0)

You can print a rule and access its attributes. Rules are also hashable, so they can be stored in containers such as dict and set.

In [53]:
print r

[S] -> [X] a (1.0)


In [54]:
print r.prob

1.0


In [55]:
r in set([r])

True

In [56]:
D = {r: 1}
D

{[S] -> [X] a (1.0): 1}
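For a rule to work as a dict key or set member, it needs consistent `__eq__` and `__hash__` methods. The class below is a hedged sketch of one way to get the behaviour shown above; it is not the actual `rule` module. The `_key` helper, and the attribute names `lhs` and `rhs`, are assumptions made for illustration (only `prob` appears in the cells above).

```python
class Rule(object):
    """Illustrative rule: LHS -> RHS (prob). Not the real rule.Rule."""

    def __init__(self, lhs, rhs, prob):
        self.lhs = lhs
        self.rhs = tuple(rhs)  # tuples are immutable, hence hashable
        self.prob = prob

    def _key(self):
        # hypothetical helper: the fields that define a rule's identity
        return (self.lhs, self.rhs, self.prob)

    def __eq__(self, other):
        return isinstance(other, Rule) and self._key() == other._key()

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # equal rules must produce equal hashes
        return hash(self._key())

    def __str__(self):
        return '{0} -> {1} ({2})'.format(self.lhs, ' '.join(self.rhs), self.prob)

    # containers display their items with repr, so reuse the same format
    __repr__ = __str__


r = Rule('[S]', ['[X]', 'a'], 1.0)
print(r)              # [S] -> [X] a (1.0)
print(r in set([r]))  # True
print({r: 1})         # {[S] -> [X] a (1.0): 1}
```

Storing the RHS as a tuple rather than a list is what makes the rule hashable, since lists cannot be hashed.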

## Grammar

A PCFG is organised essentially as a dictionary that maps each LHS symbol to its rewrite rules.

In [57]:
from cfg import WCFG

In [58]:
G = WCFG()

We can add rules:

In [59]:
G.add(Rule('[S]', ['[X]'], 0.0))

In [60]:
G.add(Rule('[S]', ['[S]', '[X]'], 0.0))
G.add(Rule('[X]', ['a'], 0.0))

We can print the grammar:

In [61]:
print G

[S] -> [X] (0.0)
[S] -> [S] [X] (0.0)
[X] -> a (0.0)


We can test whether there are rewrite rules for a given LHS symbol:

In [62]:
G.can_rewrite('[S]')

True

In [63]:
G.can_rewrite('a')

False

We can get the rewrite rules for a given LHS symbol:

In [64]:
G.get('[S]')

[[S] -> [X] (0.0), [S] -> [S] [X] (0.0)]

In [65]:
G.get('[X]')

[[X] -> a (0.0)]

When a symbol cannot be rewritten, the grammar returns an empty set:

In [66]:
G.get('a')

frozenset()

We can also iterate through rules in the grammar.

Note that the following simply counts how many rules the grammar contains.

In [67]:
sum(1 for r in G)

3

The number of rules can also be obtained more efficiently:

In [68]:
len(G)

3

Finally, we can access the sets of terminals and nonterminals of the grammar:

In [69]:
'[S]' in G.nonterminals

True

In [70]:
'a' in G.terminals

True
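The operations in this section fit naturally on top of a dictionary keyed by LHS symbol. The class below is a rough sketch of that idea, not the actual `cfg.WCFG` implementation: the internal names `_rules_by_lhs` and `_size` are hypothetical, the `Rule` namedtuple is only a stand-in so that the example is self-contained, and bracketed strings are assumed to be nonterminals.

```python
from collections import defaultdict, namedtuple

# Stand-in for the Rule class (hypothetical, for illustration only).
Rule = namedtuple('Rule', ['lhs', 'rhs', 'prob'])


class WCFG(object):
    """Illustrative weighted CFG: a mapping from LHS symbols to rules."""

    def __init__(self, rules=()):
        self._rules_by_lhs = defaultdict(list)
        self._size = 0  # running rule count, so len() does not iterate
        self.nonterminals = set()
        self.terminals = set()
        for rule in rules:
            self.add(rule)

    def add(self, rule):
        self._rules_by_lhs[rule.lhs].append(rule)
        self._size += 1
        self.nonterminals.add(rule.lhs)
        for symbol in rule.rhs:
            # assumption: bracketed strings are nonterminals
            if symbol.startswith('[') and symbol.endswith(']'):
                self.nonterminals.add(symbol)
            else:
                self.terminals.add(symbol)

    def can_rewrite(self, lhs):
        return lhs in self._rules_by_lhs

    def get(self, lhs):
        # dict.get avoids creating an empty entry in the defaultdict
        return self._rules_by_lhs.get(lhs, frozenset())

    def __iter__(self):
        for rules in self._rules_by_lhs.values():
            for rule in rules:
                yield rule

    def __len__(self):
        return self._size


G = WCFG()
G.add(Rule('[S]', ['[X]'], 0.0))
G.add(Rule('[S]', ['[S]', '[X]'], 0.0))
G.add(Rule('[X]', ['a'], 0.0))
print(G.can_rewrite('[S]'))  # True
print(G.can_rewrite('a'))    # False
print(len(G))                # 3
```

Keeping a running count in `add` is one way to make `len(G)` cheaper than iterating over every rule.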

## Read from file

Grammar files contain one rule per line.
Each line is a triple with fields separated by '|||'.

The first field is the rule's LHS symbol, the second field is the rule's RHS sequence, and the last field is the rule's probability.

Example:

    
        [S] ||| [S] [X] ||| 0.5
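
A line in this format can be split into its three fields with ordinary string operations. The functions below are a minimal sketch under stated assumptions, not the actual `read_grammar_rules`: the names `parse_rule_line` and `read_rule_tuples` are hypothetical, RHS symbols are assumed to be separated by whitespace, and blank lines are assumed to be skipped.

```python
def parse_rule_line(line):
    """Split 'LHS ||| RHS ||| prob' into (lhs, rhs_symbols, prob)."""
    fields = [field.strip() for field in line.split('|||')]
    if len(fields) != 3:
        raise ValueError(
            'expected 3 fields, got {0}: {1!r}'.format(len(fields), line))
    lhs, rhs, prob = fields
    # assumption: RHS symbols are whitespace-separated
    return lhs, rhs.split(), float(prob)


def read_rule_tuples(lines):
    """Yield one parsed triple per non-blank line (blank lines are skipped)."""
    for line in lines:
        if line.strip():
            yield parse_rule_line(line)


print(parse_rule_line('[S] ||| [S] [X] ||| 0.5'))
# ('[S]', ['[S]', '[X]'], 0.5)
```

Because `read_rule_tuples` accepts any iterable of lines, it works equally well on an open file or on a list of strings.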
       

In [71]:
from cfg import read_grammar_rules

In [72]:
# first we open a file
istream = open('examples/arithmetic')

In [73]:
# then we read rules from this file initialising a WCFG object
G = WCFG(read_grammar_rules(istream))

In [74]:
print G

[T] -> [P] (0.5)
[T] -> [T] * [P] (0.5)
[E] -> [T] (0.5)
[E] -> [E] + [T] (0.4)
[E] -> [T] + [E] (0.1)
[P] -> a (1.0)
