# Chomsky Normal Form (CNF)

Chomsky Normal Form (CNF) is a way of structuring context-free grammars in formal language theory. A context-free grammar is said to be in Chomsky Normal Form if all of its production rules are of one of the following two forms:

1. **A → BC**: A non-terminal symbol (A) produces exactly two non-terminal symbols (B and C).
2. **A → a**: A non-terminal symbol (A) produces exactly one terminal symbol (a).

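Additionally, the grammar may include the rule **S → ε**, where **S** is the start symbol and **ε** represents the empty string, but only if the empty string is in the language generated by the grammar. Many textbook definitions also require that **S** never appear on the right-hand side of any rule, which is why conversions usually begin by introducing a fresh start symbol.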

---

## Key Characteristics of Chomsky Normal Form

- **Binary Productions**: All productions that generate non-terminals are binary (i.e., they produce exactly two non-terminals).
- **Terminal Productions**: Productions that generate terminals produce exactly one terminal symbol.
- **No ε-Productions (except for the start symbol)**: The grammar cannot have rules like **A → ε** unless **A** is the start symbol and the empty string is part of the language.
- **No Unit Productions**: There are no rules of the form **A → B**, where both **A** and **B** are non-terminals.
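
The characteristics above can be checked mechanically. The following is a minimal sketch, assuming a grammar is stored as a dictionary from each non-terminal to a list of right-hand sides, with each right-hand side a tuple of symbols and `()` standing for ε. The function name `is_cnf` and this representation are illustrative choices, not a standard API, and the check covers only the rule forms listed in this section.

```python
def is_cnf(rules, start):
    """Return True if every rule is A -> BC, A -> a, or start -> ε.

    rules: dict mapping a non-terminal to a list of right-hand sides,
           each a tuple of symbols; the empty tuple () stands for ε.
    """
    nonterminals = set(rules)
    for head, bodies in rules.items():
        for rhs in bodies:
            if len(rhs) == 0:
                # ε is allowed only on the start symbol.
                if head != start:
                    return False
            elif len(rhs) == 1:
                # A single symbol must be a terminal (no unit productions).
                if rhs[0] in nonterminals:
                    return False
            elif len(rhs) == 2:
                # A pair must consist of two non-terminals.
                if not all(sym in nonterminals for sym in rhs):
                    return False
            else:
                # Three or more symbols are never allowed.
                return False
    return True


# A CNF grammar for { a^n b^n : n >= 0 } (symbol names are illustrative).
cnf_grammar = {
    "S0": [("A1", "B"), ("A1", "B'"), ()],
    "S": [("A1", "B"), ("A1", "B'")],
    "B": [("S", "B'")],
    "A1": [("a",)],
    "B'": [("b",)],
}

print(is_cnf(cnf_grammar, "S0"))                    # True
print(is_cnf({"S": [("a", "S", "b"), ()]}, "S"))    # False
```

The second call fails because `("a", "S", "b")` has three symbols and mixes terminals with a non-terminal.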

---

## Why Chomsky Normal Form?

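- **Simplifies Parsing**: The CYK (Cocke-Younger-Kasami) algorithm requires its grammar to be in CNF. Given such a grammar, it decides whether a string of length n belongs to the language in O(n³ · |G|) time, where |G| is the size of the grammar. Because every context-free grammar can be converted to CNF, this works for any context-free language.
- **Theoretical Analysis**: CNF simplifies proofs about context-free grammars because every derivation has a predictable shape. For example, any derivation of a non-empty string of length n uses exactly 2n − 1 rule applications: n − 1 of the form **A → BC** and n of the form **A → a**.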
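
To make the parsing point concrete, here is a small sketch of a CYK recognizer. It assumes the same dictionary-of-tuples grammar representation as elsewhere in these examples (an illustrative choice), uses a CNF grammar for the language {aⁿbⁿ : n ≥ 0} with hypothetical non-terminal names, and only answers yes or no rather than building a parse tree.

```python
def cyk(word, rules, start):
    """Return True if `word` is derivable from `start` in a CNF grammar."""
    n = len(word)
    if n == 0:
        # The empty string is accepted only via the rule start -> ε.
        return () in rules.get(start, [])

    # table[i][l] holds the non-terminals that derive word[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]

    # Substrings of length 1 come from rules of the form A -> a.
    for i, ch in enumerate(word):
        for head, bodies in rules.items():
            if (ch,) in bodies:
                table[i][1].add(head)

    # Longer substrings come from rules A -> BC, trying every split point.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, bodies in rules.items():
                    for rhs in bodies:
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            table[i][length].add(head)

    return start in table[0][n]


# CNF grammar for { a^n b^n : n >= 0 }; symbol names are illustrative.
grammar = {
    "S0": [("A1", "B"), ("A1", "B'"), ()],
    "S": [("A1", "B"), ("A1", "B'")],
    "B": [("S", "B'")],
    "A1": [("a",)],
    "B'": [("b",)],
}

print(cyk("aabb", grammar, "S0"))  # True
print(cyk("abab", grammar, "S0"))  # False
```

The three nested loops over `length`, `i`, and `split` are where the cubic dependence on the input length comes from; the two inner loops over rules contribute the grammar-size factor.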

---

## Example Conversion to Chomsky Normal Form

Consider the following context-free grammar with start symbol **S**:

- **S → aSb | ε**

This grammar generates strings like `ε`, `ab`, `aabb`, `aaabbb`, etc.
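
As a quick check of that claim, the sketch below enumerates the short strings derivable from the grammar with rules S → aSb and S → ε. The `generate` helper and the dictionary representation are assumptions made for illustration; the search is only guaranteed to terminate for simple grammars like this one, where every sentential form has at most one non-terminal.

```python
from collections import deque


def generate(rules, start, max_len):
    """Collect all terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal in the sentential form, if any.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in rules[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Prune forms that already contain too many terminals.
            terminal_count = sum(1 for sym in new_form if sym not in rules)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda w: (len(w), w))


# The example grammar: S -> aSb | ε, with () standing for ε.
example = {"S": [("a", "S", "b"), ()]}

print(generate(example, "S", 6))  # ['', 'ab', 'aabb', 'aaabbb']
```

The empty string `''` in the output corresponds to ε.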

### Steps to Convert to CNF:

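1. **Remove ε-Productions** (except for the start symbol):
   - Introduce a new start symbol **S₀** and add the rule **S₀ → S**, so that the start symbol never appears on a right-hand side.
   - **S** is nullable, so every rule with **S** on its right-hand side needs a copy with **S** omitted: **S → aSb** yields the additional rule **S → ab**. Without this rule, **S** could no longer derive any terminal string.
   - Delete **S → ε** and, because the empty string is in the language, add **S₀ → ε**.
   - Result: **S₀ → S | ε** and **S → aSb | ab**.

2. **Remove Unit Productions**:
   - The rule **S₀ → S** introduced in step 1 is a unit production. Replace it with the right-hand sides of **S**.
   - Result: **S₀ → aSb | ab | ε** and **S → aSb | ab**.

3. **Replace Terminals in Long Productions and Break Up Productions with More than Two Symbols**:
   - The productions **aSb** and **ab** are not in CNF: both contain terminals in a right-hand side longer than one symbol, and **aSb** has three symbols. Introduce new non-terminals to break them down:
     - **A₁ → a**
     - **B' → b**
     - **B → SB'**
   - Then **aSb** becomes **A₁B** and **ab** becomes **A₁B'**, for both **S** and **S₀**.

4. **Final CNF Grammar**:
   - **S₀ → A₁B | A₁B' | ε**
   - **S → A₁B | A₁B'**
   - **B → SB'**
   - **A₁ → a**
   - **B' → b**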
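
The ε-removal step is the easiest one to get wrong by hand, so here is a minimal sketch of it. It assumes the grammar S₀ → S, S → aSb | ε stored as a dictionary of tuples; the helper names `nullable_symbols` and `remove_epsilon` are hypothetical. The sketch covers only ε-removal: re-adding ε to the start symbol, removing unit productions, and binarizing are left to the caller.

```python
from itertools import product


def nullable_symbols(rules):
    """Return the set of non-terminals that can derive ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head in nullable:
                continue
            # A head is nullable if some body consists only of nullable symbols
            # (the empty body () satisfies this trivially).
            if any(all(sym in nullable for sym in rhs) for rhs in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon(rules):
    """Return (rules without ε-bodies, set of nullable non-terminals)."""
    nullable = nullable_symbols(rules)
    new_rules = {}
    for head, bodies in rules.items():
        expanded = set()
        for rhs in bodies:
            # Each nullable symbol may either be kept or dropped.
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                body = tuple(sym for part in choice for sym in part)
                if body:  # skip bodies that collapse to ε
                    expanded.add(body)
        new_rules[head] = sorted(expanded)
    return new_rules, nullable


# Grammar after adding the fresh start symbol: S0 -> S, S -> aSb | ε.
with_start = {"S0": [("S",)], "S": [("a", "S", "b"), ()]}

stripped, nullable = remove_epsilon(with_start)
print(stripped["S"])
print("S0" in nullable)  # True, so the caller re-adds S0 -> ε
```

Because **S** is nullable, the body `("a", "S", "b")` expands into both `("a", "S", "b")` and `("a", "b")`, which is where the rule S → ab comes from.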