#Context-Free Languages

Context-free languages (CFLs) are a class of formal languages. They are exactly the languages generated by context-free grammars and, equivalently, the languages recognized by nondeterministic pushdown automata. Context-free languages are widely used in computer science, particularly in the theory of computation and in describing the syntax of programming languages.
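
As a small illustration, the language { a^n b^n : n >= 0 } is a standard example of a context-free language that is not regular. The sketch below assumes the grammar S -> a S b | ε (an example chosen here for illustration) and derives strings from it; `derive_anbn` is a hypothetical helper name.

```python
def derive_anbn(n):
    """Apply S -> a S b exactly n times, then finish with S -> ε."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "aSb")  # the form contains exactly one S
    return form.replace("S", "")         # S -> ε removes the non-terminal

for n in range(4):
    print(n, repr(derive_anbn(n)))
```

Each string has the same number of a's and b's because every application of S -> a S b adds one of each.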



#Context-Free Grammars

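A context-free grammar (CFG) is a formal grammar in which every production rule has the form A → α, where A is a single non-terminal and α is a (possibly empty) string of terminals and non-terminals. Because the left-hand side is always one non-terminal, a rule can be applied regardless of the symbols surrounding it, which is what "context-free" means. Context-free grammars generate the context-free languages and are used in areas of computer science such as compiler design, natural language processing, and artificial intelligence.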



In [1]:
# Example of a context-free grammar for generating arithmetic expressions
# S -> E
# E -> E + T | T
# T -> T * F | F
# F -> (E) | id

from collections import deque

# A Python class to represent a context-free grammar
class ContextFreeGrammar:
    def __init__(self, rules):
        # rules maps each non-terminal to a list of right-hand sides;
        # each right-hand side is a list of symbols, so 'id' is a single terminal
        self.rules = rules

    def generate(self, start_symbol, max_length):
        """Return every terminal string of at most max_length symbols."""
        generated_strings = []
        start = (start_symbol,)
        seen = {start}
        queue = deque([start])
        while queue:
            form = queue.popleft()
            # Expand the leftmost non-terminal (leftmost derivation)
            index = next((i for i, s in enumerate(form) if s in self.rules), None)
            if index is None:
                # Only terminals remain, so this is a string of the language
                generated_strings.append(''.join(form))
                continue
            for rhs in self.rules[form[index]]:
                new_form = form[:index] + tuple(rhs) + form[index + 1:]
                # No rule in this grammar shrinks a sentential form,
                # so forms that are already too long can be dropped
                if len(new_form) <= max_length and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
        return generated_strings

# Example usage:
rules = {
    'S': [['E']],
    'E': [['E', '+', 'T'], ['T']],
    'T': [['T', '*', 'F'], ['F']],
    'F': [['(', 'E', ')'], ['id']]
}

cfg = ContextFreeGrammar(rules)
generated_strings = cfg.generate('S', 3)  # at most 3 symbols, e.g. id+id
print("Generated strings:", generated_strings)


Generated strings: ['id', 'id*id', 'id+id', '(id)']
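
To make the idea of a derivation concrete, here is a minimal sketch that derives `id + id * id` from the expression grammar one step at a time. It always rewrites the leftmost non-terminal. The `RULES` table and the `leftmost_derivation` helper are names introduced here for illustration, and the list of choices (the index of the alternative to apply at each step) was picked by hand.

```python
# Expression grammar from the example above, one list of symbols per alternative
RULES = {
    'E': [['E', '+', 'T'], ['T']],
    'T': [['T', '*', 'F'], ['F']],
    'F': [['(', 'E', ')'], ['id']],
}

def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal with the chosen alternative at each step."""
    form = [start]
    steps = [' '.join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has rules
        index = next(i for i, s in enumerate(form) if s in RULES)
        form = form[:index] + RULES[form[index]][choice] + form[index + 1:]
        steps.append(' '.join(form))
    return steps

# E => E + T => T + T => F + T => id + T => id + T * F => ... => id + id * id
for step in leftmost_derivation('E', [0, 1, 1, 1, 0, 1, 1, 1]):
    print(step)
```

Because `*` is introduced below `+` in the grammar, the derivation groups `id * id` first, which matches the usual operator precedence.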

#Ambiguity

A context-free grammar is ambiguous if some string it generates has more than one parse tree (equivalently, more than one leftmost derivation). Ambiguity complicates parsing because the same input can be given different structures, and therefore different interpretations.



In [2]:
# Example of an ambiguous context-free grammar
# S -> S + S | id

# The expression "id + id + id" has two parse trees:
# (id + id) + id and id + (id + id)

# A simple demonstration that enumerates every parse tree
def parse_trees(tokens):
    """Return all parse trees of tokens under S -> S + S | id as bracketed strings."""
    if tokens == ['id']:
        return ['id']
    trees = []
    # Try each '+' as the top-level operator of S -> S + S
    for i, token in enumerate(tokens):
        if token == '+':
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append(f"({left} + {right})")
    return trees

expression = "id + id + id"
# Splitting at the first '+' is tried first, so the right-associative tree comes first
right_associative, left_associative = parse_trees(expression.split())

print("Left associative parsing:", left_associative)
print("Right associative parsing:", right_associative)

Left associative parsing: ((id + id) + id)
Right associative parsing: (id + (id + id))
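
One common way to remove this kind of ambiguity is to rewrite the grammar so that only one side can recurse. The sketch below counts parse trees for a chain of `n_ids` operands under the ambiguous grammar S -> S + S | id and under a left-recursive rewrite S -> S + id | id. The rewritten grammar and both function names are assumptions made here for illustration.

```python
def count_ambiguous(n_ids):
    """Parse trees for id + ... + id (n_ids operands) under S -> S + S | id."""
    if n_ids == 1:
        return 1
    # Any '+' can be the top-level operator; k operands go to the left subtree
    return sum(count_ambiguous(k) * count_ambiguous(n_ids - k)
               for k in range(1, n_ids))

def count_left_recursive(n_ids):
    """Parse trees under the rewritten grammar S -> S + id | id."""
    # The last '+' must be the top-level operator, so there is never a choice
    return 1 if n_ids == 1 else count_left_recursive(n_ids - 1)

for n in range(1, 6):
    print(n, count_ambiguous(n), count_left_recursive(n))
```

The counts for the ambiguous grammar grow as the Catalan numbers (1, 1, 2, 5, 14, ...), while the rewritten grammar always yields a single, left-associative tree.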


#Chomsky Normal Form

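Chomsky Normal Form (CNF) is a restricted form into which any context-free grammar can be transformed. In CNF, every production rule has the form A → BC or A → a, where A, B, and C are non-terminals and a is a terminal; the rule S → ε is additionally allowed for the start symbol S when the language contains the empty string. Because every rule is either binary or a single terminal, CNF simplifies parsing and analysis, and algorithms such as CYK assume it.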



In [18]:
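import nltk
from nltk import CFG, Nonterminal, Production

def convert_to_cnf(grammar):
    """Lift terminals out of long rules and split long rules into binary ones.

    Simplification: empty and unit productions (A -> B) are passed through
    unchanged; eliminating them is a separate step that is not shown here.
    Fresh non-terminal names (_T1, _X1, ...) are assumed not to clash with
    names already used in the grammar.
    """
    cnf_grammar = []
    terminal_map = {}  # terminal -> fresh non-terminal that derives it
    counter = 0        # numbers the non-terminals created when splitting rules

    for production in grammar.productions():
        lhs, rhs = production.lhs(), production.rhs()

        if len(rhs) < 2:
            # A -> a is already in CNF (see the docstring for A -> B and A -> ε)
            cnf_grammar.append(production)
            continue

        # Replace each terminal inside a longer rule by a non-terminal deriving it
        symbols = []
        for symbol in rhs:
            if isinstance(symbol, str):
                if symbol not in terminal_map:
                    terminal_map[symbol] = Nonterminal(f'_T{len(terminal_map) + 1}')
                    cnf_grammar.append(Production(terminal_map[symbol], [symbol]))
                symbols.append(terminal_map[symbol])
            else:
                symbols.append(symbol)

        # Split A -> B C D ... into a chain of binary rules
        while len(symbols) > 2:
            counter += 1
            fresh = Nonterminal(f'_X{counter}')
            cnf_grammar.append(Production(lhs, [symbols[0], fresh]))
            lhs, symbols = fresh, symbols[1:]
        cnf_grammar.append(Production(lhs, symbols))
    return cnf_grammar

# Example grammar (already in CNF, so the conversion leaves it unchanged)
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the'
    N -> 'cat' | 'dog'
    V -> 'chased'
""")

cnf_grammar = convert_to_cnf(grammar)
for production in cnf_grammar:
    print(production)


S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'cat'
N -> 'dog'
V -> 'chased'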
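
The definition of CNF can also be checked mechanically. The following sketch uses plain Python data rather than NLTK: a grammar is a dictionary from non-terminals to lists of right-hand sides, and every symbol that is not a key of the dictionary is treated as a terminal. The function name `is_cnf` and both toy grammars are assumptions made for illustration, and the optional S → ε rule is not handled.

```python
def is_cnf(rules):
    """Return True if every rule has the form A -> B C or A -> a."""
    nonterminals = set(rules)
    for bodies in rules.values():
        for body in bodies:
            if len(body) == 1 and body[0] not in nonterminals:
                continue  # A -> a: a single terminal
            if len(body) == 2 and all(s in nonterminals for s in body):
                continue  # A -> B C: exactly two non-terminals
            return False
    return True

sentence_grammar = {
    'S': [['NP', 'VP']],
    'NP': [['Det', 'N']],
    'VP': [['V', 'NP']],
    'Det': [['the']],
    'N': [['cat'], ['dog']],
    'V': [['chased']],
}
expression_grammar = {
    'E': [['E', '+', 'T'], ['T']],   # a three-symbol rule and a unit rule
    'T': [['id']],
}

print(is_cnf(sentence_grammar))    # True
print(is_cnf(expression_grammar))  # False
```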


#CYK Algorithm

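The Cocke-Younger-Kasami (CYK) algorithm is a parsing algorithm that decides whether a given string can be generated by a given context-free grammar in Chomsky Normal Form. It fills a table bottom-up: the entry for each substring of the input holds the non-terminals that can derive it, starting with substrings of length 1 and combining shorter substrings into longer ones. For an input of length n the running time is O(n³) times the size of the grammar.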



In [19]:
import nltk

def cyk(grammar, word):
    """Return True if the CNF grammar generates the token list word."""
    productions = grammar.productions()
    n = len(word)
    if n == 0:
        # An empty input would need an S -> ε rule, which is not handled here
        return False

    # table[i][j] holds the non-terminals that derive word[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]

    # Substrings of length 1: rules of the form A -> a
    for i in range(n):
        for production in productions:
            if len(production.rhs()) == 1 and production.rhs()[0] == word[i]:
                table[i][i].add(production.lhs())

    # Longer substrings: rules of the form A -> B C, trying every split point k
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for production in productions:
                    rhs = production.rhs()
                    if len(rhs) == 2 and rhs[0] in table[i][k] and rhs[1] in table[k + 1][j]:
                        table[i][j].add(production.lhs())

    # The string is generated if the start symbol derives the whole input
    return grammar.start() in table[0][n - 1]

# Example usage
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N | 'I'
    VP -> V NP | VP PP
    PP -> P NP
    Det -> 'a' | 'an' | 'the'
    N -> 'boy' | 'dog' | 'telescope'
    V -> 'saw' | 'ate' | 'walked'
    P -> 'with' | 'in'
""")

word = "the boy saw a dog with a telescope".split()
if cyk(grammar, word):
    print("The string can be generated by the grammar.")
else:
    print("The string cannot be generated by the grammar.")


The string can be generated by the grammar.
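
To see what the table contains, here is a minimal sketch of the same bottom-up idea without NLTK. It assumes a small hypothetical CNF grammar for { a^n b^n : n >= 1 }, namely S -> A B | A X, X -> S B, A -> 'a', B -> 'b', stored in two lookup tables, and it returns the whole chart instead of a yes/no answer. All names are illustrative.

```python
# Hypothetical CNF grammar for { a^n b^n : n >= 1 }
UNARY = {'a': {'A'}, 'b': {'B'}}       # A -> 'a', B -> 'b'
BINARY = {('A', 'B'): {'S'},           # S -> A B
          ('A', 'X'): {'S'},           # S -> A X
          ('S', 'B'): {'X'}}           # X -> S B

def cyk_chart(tokens):
    """Map each span (i, j) to the non-terminals that derive tokens[i..j]."""
    n = len(tokens)
    chart = {(i, i): set(UNARY.get(tok, ())) for i, tok in enumerate(tokens)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split into tokens[i..k] and tokens[k+1..j]
                for left in chart[(i, k)]:
                    for right in chart[(k + 1, j)]:
                        cell |= BINARY.get((left, right), set())
            chart[(i, j)] = cell
    return chart

chart = cyk_chart(list("aabb"))
# Print spans from shortest to longest, the order in which they are filled
for span in sorted(chart, key=lambda s: (s[1] - s[0], s[0])):
    print(span, sorted(chart[span]))
```

The input is in the language exactly when the start symbol `S` appears in the cell for the full span, here `(0, 3)`.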


#Pushdown Automata

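A pushdown automaton (PDA) is a finite automaton augmented with a stack. A PDA consists of a finite set of states, an input alphabet, a stack alphabet, a transition function, a start state, an initial stack symbol, and a set of final (accepting) states. Nondeterministic PDAs recognize exactly the context-free languages; deterministic PDAs recognize a proper subset, the deterministic context-free languages. PDAs are commonly used in the theory of computation to model processes that involve nested structures, such as balanced brackets.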



In [33]:
%pip install automata-lib

Collecting automata-lib
  Downloading automata_lib-8.3.0-py3-none-any.whl (124 kB)
Collecting cached-method>=0.1.0 (from automata-lib)
  Downloading cached_method-0.1.0-py3-none-any.whl (4.2 kB)
Installing collected packages: cached-method, automata-lib
Successfully installed automata-lib-8.3.0 cached-method-0.1.0


In [35]:
from automata.pda.dpda import DPDA

# Define a deterministic PDA for the language { a^n b^n : n >= 1 }
pda = DPDA(
    states={'q0', 'q1', 'q2', 'q3'},
    input_symbols={'a', 'b'},
    stack_symbols={'Z', 'a'},
    transitions={
        'q0': {
            '': {'Z': ('q1', ('Z',))}  # move to q1, stack unchanged
        },
        'q1': {
            'a': {'Z': ('q1', ('a', 'Z')), 'a': ('q1', ('a', 'a'))},  # push 'a'
            'b': {'a': ('q2', '')}  # pop 'a'
        },
        'q2': {
            'b': {'a': ('q2', '')},  # pop 'a'
            '': {'Z': ('q3', ('Z',))}  # every 'a' is matched: go to the final state
        }
    },
    initial_state='q0',
    initial_stack_symbol='Z',
    final_states={'q3'},
    acceptance_mode='final_state'
)

# Check if a string is accepted by the PDA
def is_accepted_by_pda(input_string):
    return pda.accepts_input(input_string)

# Test the PDA: "aabbb" has one 'b' too many, so it should be rejected
test_string = "aabbb"
print(f"Input string '{test_string}' is accepted by the PDA: {is_accepted_by_pda(test_string)}")


Input string 'aabbb' is accepted by the PDA: False
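
The role of the stack can also be shown without a library. The sketch below hand-simulates a PDA for { a^n b^n : n >= 1 } with a Python list as the stack; the function name and the two state labels are assumptions made for illustration and are not part of automata-lib.

```python
def accepts_anbn(text):
    """Simulate a PDA for { a^n b^n : n >= 1 } with an explicit stack."""
    stack = ['Z']              # 'Z' marks the bottom of the stack
    state = 'reading_a'
    for symbol in text:
        if state == 'reading_a' and symbol == 'a':
            stack.append('a')  # push one 'a' per input 'a'
        elif symbol == 'b' and stack[-1] == 'a':
            stack.pop()        # match one 'b' against one stored 'a'
            state = 'reading_b'
        else:
            return False       # no transition applies
    # Accept only if at least one 'b' was read and every 'a' was matched
    return state == 'reading_b' and stack == ['Z']

for s in ["ab", "aabb", "aabbb", "aab", "abab", ""]:
    print(repr(s), accepts_anbn(s))
```

A finite automaton alone cannot do this, because the number of unmatched a's is unbounded; the stack is what stores that count.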
