## **UNIT II: Syntax Analysis**

* Role of Parser
* Types of Grammars
* Error Handling in Parsing
* Context-Free Grammars
* Writing a Grammar
* **Top-Down Parsing Techniques**

  * General Strategy
  * Recursive Descent Parser
  * Predictive Parser – **LL(1) Parser**
* **Bottom-Up Parsing Techniques**

  * Shift Reduce Parser
  * LR Parser
  * LR(0) Items
  * Construction of SLR Parsing Table
  * Introduction to **LALR Parser**
* Error Handling and Recovery in Syntax Analyzer
* **YACC Tool**

---



# **UNIT II: Syntax Analysis**

## 1. **Role of Parser**

* The **parser** is the part of the compiler that:

  * Takes **tokens** from the lexical analyzer.
  * Checks if the sequence of tokens follows the **grammar rules** of the language.
  * Produces a **parse tree / syntax tree** showing the structure of the input.
* **Main jobs:**

  1. Detect **syntax errors**.
  2. Recover from errors if possible.
  3. Help in building an intermediate representation for the compiler.

---

## 2. **Types of Grammars (Chomsky Hierarchy)**

1. **Type 0 – Unrestricted Grammar**

   * No restrictions on production rules.
   * Most powerful, but not practical.

2. **Type 1 – Context-Sensitive Grammar (CSG)**

   * Rule: αAβ → αγβ, where γ is non-empty, so the RHS is never shorter than the LHS.
   * Example: the language {aⁿbⁿcⁿ | n ≥ 1}; also used to model some natural-language constructs.

3. **Type 2 – Context-Free Grammar (CFG)**

   * Rule: A → γ (single non-terminal on LHS).
   * The syntax of most programming languages is specified with a CFG.

4. **Type 3 – Regular Grammar**

   * Rules restricted to the form A → aB or A → a (right-linear).
   * Equivalent in power to **Regular Expressions** and finite automata.
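
The four levels can be told apart by the shape of the productions. Below is a minimal sketch of such a check. It assumes a made-up convention (uppercase symbols are non-terminals, everything else is a terminal), uses only the length condition for Type 1, and the function names are illustrative, not from any library.

```python
def is_nonterminal(sym):
    # Assumed convention for this sketch: uppercase symbols are non-terminals.
    return sym.isupper()

def production_type(lhs, rhs):
    """Most restrictive Chomsky type (3, 2, 1 or 0) that one production fits."""
    if len(lhs) == 1 and is_nonterminal(lhs[0]):
        # Type 3 (right-linear): A -> a  or  A -> a B
        terminal_only = len(rhs) == 1 and not is_nonterminal(rhs[0])
        terminal_then_nt = (len(rhs) == 2 and not is_nonterminal(rhs[0])
                            and is_nonterminal(rhs[1]))
        return 3 if terminal_only or terminal_then_nt else 2
    # Several symbols on the LHS: Type 1 if the RHS is at least as long.
    return 1 if len(rhs) >= len(lhs) else 0

def grammar_type(productions):
    """A grammar is only as restricted as its least restricted production."""
    return min(production_type(lhs, rhs) for lhs, rhs in productions)

regular = [(("S",), ("a", "S")), (("S",), ("b",))]
context_free = [(("E",), ("E", "+", "T")), (("E",), ("T",)), (("T",), ("id",))]
context_sensitive = [(("S",), ("a", "B", "C")), (("C", "B"), ("B", "C"))]
unrestricted = context_sensitive + [(("A", "B"), ("a",))]

print(grammar_type(regular), grammar_type(context_free),
      grammar_type(context_sensitive), grammar_type(unrestricted))
```

Each level contains the one below it, so the check reports the most restrictive type that every production satisfies.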

---

## 3. **Error Handling in Parsing**

* **Types of Errors:**

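  1. **Lexical errors** – characters that do not form a valid token (e.g., a stray `@` or an unterminated string literal). A misspelled keyword such as `intt` is usually scanned as an identifier and reported by a later phase.
  2. **Syntactic errors** – token sequences that violate the grammar (e.g., missing semicolon, unbalanced parentheses).
  3. **Semantic errors** – grammatically valid code with an invalid meaning (e.g., adding int + string, using an undeclared variable).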
* **Error Recovery Methods:**

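  * **Panic Mode**: Discard input tokens until a synchronizing token (like `;` or `}`) is found, then resume parsing.
  * **Phrase-Level Recovery**: Insert, delete, or replace symbols locally so that parsing can continue.
  * **Error Productions**: Add rules for common mistakes to the grammar so the parser can report a specific message.
  * **Global Correction**: Find the minimum set of changes that makes the input valid (too costly in practice).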
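
Panic mode is the simplest of these to implement. The sketch below assumes a toy language whose only statement has the token shape `id = num ;`. The names `SYNC_TOKENS`, `panic_mode_skip`, and `parse_statements` are made up for this illustration.

```python
SYNC_TOKENS = {";", "}"}          # assumed synchronizing tokens

def panic_mode_skip(tokens, pos):
    """Skip tokens from pos up to and including the next synchronizing token."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return min(pos + 1, len(tokens))

def parse_statements(tokens):
    """Parse toy statements `id = num ;` and return the positions of all errors."""
    expected = ["id", "=", "num", ";"]
    errors, pos = [], 0
    while pos < len(tokens):
        for want in expected:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
            else:
                errors.append(pos)                    # report the error ...
                pos = panic_mode_skip(tokens, pos)    # ... and resynchronize
                break
    return errors

tokens = ["id", "=", "num", ";",      # correct
          "id", "num", ";",           # missing "="
          "id", "=", "num", ";",      # correct
          "id", "=", ";"]             # missing number
print(parse_statements(tokens))
```

Because the parser resumes after each `;`, both faulty statements are reported in a single pass.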

---

## 4. **Context-Free Grammars (CFG)**

* A CFG is defined as **G = (V, T, P, S)**:

  * V: set of non-terminal symbols
  * T: set of terminal symbols
  * P: set of productions (rules)
  * S: start symbol (a member of V)
* **Example:**

  ```
  E → E + T | T
  T → T * F | F
  F → (E) | id
  ```
* Used to describe the **syntax of programming languages**.
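
As an illustration, the example grammar can be written down directly as the 4-tuple (V, T, P, S) and used to derive a sentence. This is a sketch only: the `Grammar` class and the helper functions are hypothetical names chosen for this example.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    V: frozenset    # non-terminals
    T: frozenset    # terminals
    P: dict         # non-terminal -> list of right-hand sides (tuples of symbols)
    S: str          # start symbol

G = Grammar(
    V=frozenset({"E", "T", "F"}),
    T=frozenset({"+", "*", "(", ")", "id"}),
    P={
        "E": [("E", "+", "T"), ("T",)],
        "T": [("T", "*", "F"), ("F",)],
        "F": [("(", "E", ")"), ("id",)],
    },
    S="E",
)

def leftmost_step(g, form, rhs):
    """Replace the leftmost non-terminal of a sentential form by `rhs`."""
    for i, sym in enumerate(form):
        if sym in g.V:
            if rhs not in g.P[sym]:
                raise ValueError(f"{sym} -> {rhs} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")

def derive(g, choices):
    """Apply a sequence of production choices, starting from the start symbol."""
    form = (g.S,)
    for rhs in choices:
        form = leftmost_step(g, form, rhs)
    return form

# Leftmost derivation of: id + id * id
sentence = derive(G, [("E", "+", "T"), ("T",), ("F",), ("id",),
                      ("T", "*", "F"), ("F",), ("id",), ("id",)])
print(" ".join(sentence))
```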

---

## 5. **Writing a Grammar**

* Steps:

  1. Identify the **language constructs** (expressions, statements, etc.).
  2. Write rules in terms of **non-terminals → combinations of terminals/non-terminals**.
  3. Avoid **ambiguity** (more than one parse tree for the same string).
  4. Eliminate **left recursion** and apply **left factoring** if required (both are needed for top-down parsers).
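
Step 4 follows a mechanical rule: productions `A → Aα | β` are rewritten as `A → βA'` and `A' → αA' | ε`. The sketch below handles immediate left recursion only; the function name is made up, and the empty tuple `()` stands for ε.

```python
def eliminate_immediate_left_recursion(nt, rhss):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b A',  A' -> a A' | ε."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nt]     # the alphas
    others = [rhs for rhs in rhss if not rhs or rhs[0] != nt]         # the betas
    if not recursive:
        return {nt: list(rhss)}          # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in others],
        new_nt: [alpha + (new_nt,) for alpha in recursive] + [()],    # () is ε
    }

result = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
for lhs, rhss in result.items():
    print(lhs, "→", " | ".join(" ".join(rhs) or "ε" for rhs in rhss))
```

Applied to `E → E + T | T`, this yields `E → T E'` and `E' → + T E' | ε`.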

---

## 6. **Top-Down Parsing Techniques**

### (a) General Strategy

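* Start from the **start symbol (S)**.
* Repeatedly **expand the leftmost non-terminal** with one of its productions until the input string is derived.
* At each step the parser must **predict** which production to use, so the parse tree grows from root to leaves.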

---

### (b) Recursive Descent Parser

* Consists of **recursive procedures**, one for each non-terminal.
* Example:

  ```
  E  → T E'
  E' → + T E' | ε
  T  → F T'
  T' → * F T' | ε
  F  → (E) | id
  ```
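* Each non-terminal becomes a function in code; the function body follows the right-hand sides of its productions.
* **Problem**: A left-recursive rule such as `E → E + T` makes the function call itself without consuming input, causing infinite recursion. Left recursion must be eliminated first.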
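
A minimal sketch of a recursive descent recognizer for the grammar above follows. It assumes the input is already a list of token strings and that `$` marks the end of input; the class and method names are chosen for this example.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" = end of input (assumed convention)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(
                f"expected {expected!r}, found {self.peek()!r} at position {self.pos}")
        self.pos += 1

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | ε
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | ε
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                 # F -> (E) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    """True if the token list is a sentence of the grammar."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")        # all input must be consumed
        return True
    except SyntaxError:
        return False

print(accepts(["id", "+", "id", "*", "id"]), accepts(["id", "+"]))
```

The ε alternatives need no code: when the lookahead is not `+` (or `*`), the function simply returns.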

---

### (c) Predictive Parser – LL(1) Parser

* **LL(1)** means:

  * **L**: Left-to-right scanning of input.
  * **L**: Leftmost derivation.
  * **1**: One lookahead symbol.
* Uses a **parsing table (M)** built from:

  * **FIRST** and **FOLLOW** sets.
* Parser works without **backtracking**.
* **Steps:**

  1. Remove **left recursion**.
  2. Apply **left factoring**.
  3. Build **parsing table**.
  4. Parse using **stack + input buffer**.
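
Steps 3 and 4 can be sketched for the expression grammar as follows. This is an illustrative implementation under stated assumptions: the grammar is already free of left recursion, `ε` and `$` are used as markers, and all function names are made up for this example.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of(seq, first):
    """FIRST of a symbol sequence, given the FIRST sets of the non-terminals."""
    out = set()
    for sym in seq:
        if sym == EPS:
            continue
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                 # every symbol in seq can derive ε
    return out

def compute_first_follow():
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:               # repeat until no set grows any more
        changed = False
        for nt, rhss in GRAMMAR.items():
            for rhs in rhss:
                before = len(first[nt])
                first[nt] |= first_of(rhs, first)
                changed |= len(first[nt]) != before
                for i, sym in enumerate(rhs):
                    if sym in GRAMMAR:
                        tail = first_of(rhs[i + 1:], first)
                        add = (tail - {EPS}) | (follow[nt] if EPS in tail else set())
                        before = len(follow[sym])
                        follow[sym] |= add
                        changed |= len(follow[sym]) != before
    return first, follow

def build_table(first, follow):
    """M[A, a] = production to use for non-terminal A on lookahead a."""
    table = {}
    for nt, rhss in GRAMMAR.items():
        for rhs in rhss:
            f = first_of(rhs, first)
            for t in (f - {EPS}) | (follow[nt] if EPS in f else set()):
                assert (nt, t) not in table, "grammar is not LL(1)"
                table[nt, t] = rhs
    return table

def ll1_parse(tokens, table):
    stack, tokens, i = [END, START], list(tokens) + [END], 0
    while stack:
        top = stack.pop()
        if top not in GRAMMAR:                    # terminal or $: must match input
            if top != tokens[i]:
                return False
            i += 1
        elif (top, tokens[i]) in table:           # expand using the table entry
            stack.extend(s for s in reversed(table[top, tokens[i]]) if s != EPS)
        else:
            return False
    return i == len(tokens)

FIRST, FOLLOW = compute_first_follow()
TABLE = build_table(FIRST, FOLLOW)
print(ll1_parse(["id", "+", "id", "*", "id"], TABLE))
```

A table cell that would receive two productions means the grammar is not LL(1); the sketch signals this with an assertion.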

---

## 7. **Bottom-Up Parsing Techniques**

### (a) Shift-Reduce Parser

* Builds parse tree from **leaves → root**.
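* Uses a **stack** and an input buffer, with four possible actions:

  * **Shift** → push the next input symbol onto the stack.
  * **Reduce** → replace the **handle** on top of the stack (the RHS of a rule) by the rule's LHS.
  * **Accept** → input is consumed and only the start symbol remains: successful parsing.
  * **Error** → no valid action exists: a syntax error is reported.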
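
The four actions can be traced with a deliberately naive sketch. It assumes the tiny grammar `E → E + T | T`, `T → id` and reduces greedily whenever the top of the stack matches a right-hand side (longest first). That shortcut happens to work for this grammar; a real shift-reduce parser consults a parsing table to find the handle.

```python
PRODUCTIONS = [                      # (LHS, RHS), longest right-hand side first
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
]

def shift_reduce(tokens):
    """Return (accepted, trace) for a list of token strings."""
    stack, rest, trace = [], list(tokens), []
    while True:
        for lhs, rhs in PRODUCTIONS:                 # prefer a reduction
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]                # pop the handle
                stack.append(lhs)                    # push the LHS
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if rest:                                 # nothing to reduce: shift
                stack.append(rest.pop(0))
                trace.append(f"shift {stack[-1]}")
            else:
                break                                # no input left
    accepted = stack == ["E"]
    trace.append("accept" if accepted else "error")
    return accepted, trace

ok, steps = shift_reduce(["id", "+", "id"])
print("\n".join(steps))
```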

---

### (b) LR Parser

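* Reads input **Left-to-right** and constructs a **Rightmost derivation in reverse**.
* Types, from least to most powerful: LR(0), SLR(1), LALR(1), Canonical LR(1).
* **Advantages:**

  * Handles a larger class of CFGs than LL(1) predictive parsers.
  * Parses in linear time without backtracking and detects a syntax error as soon as it is possible to do so in a left-to-right scan.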

---

### (c) LR(0) Items

* An **LR(0) item** is a production with a **dot (·)** marking progress.
  Example: `A → α · β`

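  * Symbols to the left of the dot (α) have already been seen; symbols to the right (β) are still expected.
  * Dot at the end (`A → α ·`) → the whole right-hand side has been seen, so a reduction is possible.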
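
Items are grouped into sets with two standard operations, closure and goto. The sketch below represents an item as a `(lhs, rhs, dot)` tuple and uses the expression grammar augmented with `E' → E`; the representation and function names are choices made for this example.

```python
GRAMMAR = {
    "E'": [("E",)],                          # augmented start production
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def show(item):
    """Render an item (lhs, rhs, dot) in the usual dotted notation."""
    lhs, rhs, dot = item
    return " ".join([lhs, "→", *rhs[:dot], "·", *rhs[dot:]])

def closure(items):
    """Add B → · γ for every non-terminal B that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], prod, 0)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
for item in sorted(goto(I0, "E")):
    print(show(item))
```

`I0` is the initial state of the LR(0) automaton; `goto(I0, "E")` is the state reached after an `E` has been recognized.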

---

### (d) Construction of SLR Parsing Table

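1. Augment the grammar with a new start production **S' → S**.
2. Compute the canonical collection of **LR(0) item sets** using **closure** and **goto**; these sets are the states of a **DFA**.
3. Fill the table with:

   * **SHIFT** actions: for an item `A → α · a β` with terminal `a`, shift and go to the state reached on `a`.
   * **REDUCE** actions: for a completed item `A → α ·`, reduce by `A → α`.
   * **GOTO** entries for transitions on non-terminals.
   * **ACCEPT** on `$` in the state containing `S' → S ·`.
4. Use **FOLLOW sets** to decide reduce: enter the reduction by `A → α` only under the terminals in FOLLOW(A). If any table cell receives two different actions, the grammar is not SLR(1).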
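
The construction can be sketched end to end for the expression grammar. This is an illustration under stated assumptions: the FOLLOW sets are written out by hand rather than computed, items are `(production index, dot)` pairs, and all names are made up for this example.

```python
PRODS = [                                  # production 0 is the augmented E' -> E
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMS = {lhs for lhs, _ in PRODS}
# FOLLOW sets written out by hand for this grammar (an assumption of the sketch).
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}

def closure(items):
    items, work = set(items), list(items)
    while work:
        p, d = work.pop()
        rhs = PRODS[p][1]
        if d < len(rhs) and rhs[d] in NONTERMS:
            for q, (lhs, _) in enumerate(PRODS):
                if lhs == rhs[d] and (q, 0) not in items:
                    items.add((q, 0))
                    work.append((q, 0))
    return frozenset(items)

def goto(items, sym):
    return closure({(p, d + 1) for p, d in items
                    if d < len(PRODS[p][1]) and PRODS[p][1][d] == sym})

def set_action(action, state, sym, act):
    assert action.get((state, sym), act) == act, "conflict: grammar is not SLR(1)"
    action[state, sym] = act

def build_slr():
    states, action, goto_tab = [closure({(0, 0)})], {}, {}
    i = 0
    while i < len(states):                 # new states are appended as discovered
        state = states[i]
        symbols = {PRODS[p][1][d] for p, d in state if d < len(PRODS[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            j = states.index(target)
            if sym in NONTERMS:
                goto_tab[i, sym] = j
            else:
                set_action(action, i, sym, ("shift", j))
        for p, d in state:
            lhs, rhs = PRODS[p]
            if d == len(rhs):              # completed item
                if p == 0:
                    set_action(action, i, "$", ("accept",))
                else:
                    for t in FOLLOW[lhs]:
                        set_action(action, i, t, ("reduce", p))
        i += 1
    return states, action, goto_tab

def lr_parse(tokens, action, goto_tab):
    stack, tokens, i = [0], list(tokens) + ["$"], 0
    while True:
        act = action.get((stack[-1], tokens[i]))
        if act is None:
            return False                   # empty cell: syntax error
        if act[0] == "shift":
            stack.append(act[1])
            i += 1
        elif act[0] == "reduce":
            lhs, rhs = PRODS[act[1]]
            del stack[len(stack) - len(rhs):]
            stack.append(goto_tab[stack[-1], lhs])
        else:
            return True                    # accept

STATES, ACTION, GOTO = build_slr()
print(len(STATES), lr_parse(["id", "+", "id", "*", "id"], ACTION, GOTO))
```

The state numbers depend on the order in which states are discovered, so they need not match a textbook table, but the number of states and the accepted language are the same.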

---

### (e) LALR Parser (LookAhead LR)

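* Improves on the **SLR parser** by attaching a precise set of **lookaheads** to each item instead of relying on FOLLOW sets.
* Merges states of the **LR(1) parser** that have the same **LR(0) core**, so it has the same number of states as SLR.
* **Balance between power & efficiency**: more powerful than SLR, with much smaller tables than canonical LR(1).
* Used by parser generators such as YACC and Bison.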
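
The merging step can be sketched as below. The three LR(1) states are written by hand for the small grammar `S → C C`, `C → c C | d` (an assumption of this example, not part of the notes), with each item stored as `(lhs, rhs, dot, lookahead)`.

```python
from collections import defaultdict

def core(state):
    """The LR(0) core of a state: its items with the lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)

def merge_by_core(lr1_states):
    """Union all LR(1) states that share the same core."""
    groups = defaultdict(set)
    for state in lr1_states:
        groups[core(state)] |= state
    return [frozenset(items) for items in groups.values()]

# Two LR(1) states with the core {C → d ·} but different lookaheads,
# plus one state with a different core.
state_a = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
state_b = frozenset({("C", ("d",), 1, "$")})
state_c = frozenset({("S", ("C", "C"), 2, "$")})

merged = merge_by_core([state_a, state_b, state_c])
lookaheads = sorted(la for (_, _, _, la) in merged[0])
print(len(merged), lookaheads)
```

Merging never introduces shift-reduce conflicts, but it can introduce reduce-reduce conflicts, which is why LALR(1) is slightly weaker than canonical LR(1).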

---

## 8. **Error Handling and Recovery in Syntax Analyzer**

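* Uses the same recovery methods as in Section 3 (panic mode, phrase-level recovery, error productions, global correction).
* After reporting an error, the syntax analyzer tries to **continue parsing** so that it can detect further errors in the same run instead of stopping at the first one.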

---

## 9. **YACC Tool**

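* **YACC = Yet Another Compiler-Compiler**
* A tool that **generates parsers automatically** from a grammar specification.
* Works with **Lex** (scanner generator): the generated parser calls `yylex()` to obtain each token.
* Programmer writes grammar rules with attached C actions in a **YACC specification file**, and YACC produces a **parser in C** (the function `yyparse()`).
* Generates **LALR(1)** parsers.
* Common in Unix/Linux environments; GNU **Bison** is a widely used compatible implementation.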

---
