# 🚀 Compiler Design - Understanding Different Phases

## 🔍 Introduction

Previously, we observed that to convert human-readable **source code** into **machine code**, we need a **language translator**. The **language translator** consists of four key components:
1. **Preprocessor**
2. **Compiler**
3. **Assembler**
4. **Linker & Loader**

### 🔹 Preprocessor Phase
The **preprocessor** converts high-level language code into **pure high-level language code** by inserting the contents of required **header files** (`#include`), expanding **macros** (`#define`), and **removing the preprocessor directives** and comments.

### 🔹 Compiler Phase
The **compiler** takes this **pure high-level language code** and converts it into **assembly language code**.

---

## 🔬 Step-by-Step Compilation Process

Tracing an entire program through the compiler would take too long, so let's follow the **compilation of a single arithmetic expression**.

### 📝 Given Expression:
```c
x = a + b * c;
```
We will observe how this expression **passes through the six phases of the compiler**.

![Screenshot (74).png](attachment:50296f6c-0462-4566-9bb2-4504a4413b12.png)

---

## 1️⃣ Lexical Analysis - Token Generation 🏷️

This phase is handled by the **Lexical Analyzer**.

- The **Lexical Analyzer** reads the **source code** character by character and groups the characters into **lexemes**.
- A **lexeme** is the actual character sequence in the source (for example `x`, `=`, or `+`) that matches the pattern of a token, much as a word is a group of letters in a sentence.
- Each lexeme is mapped to a **token**.
- Tokens are categorized into **identifiers, keywords, operators, constants, and punctuation symbols**.
- **Regular expressions (regexes)** describe the pattern that each token type must match.

Example Regex for Identifiers:

```regex
L (L | D | _ )*
```
Where:
- `L` = Letter
- `D` = Digit
- `_` = Underscore



### 🎭 Example Tokenization:
| Lexeme  | Token Type |
|---------|------------|
| x       | Identifier |
| =       | Assignment Operator |
| a       | Identifier |
| +       | Arithmetic Operator |
| b       | Identifier |
| *       | Arithmetic Operator |
| c       | Identifier |
| ;       | Statement Terminator |

**👉 The output of this phase is a stream of tokens.**

📌 **Regular Expressions** and **Finite Automata** are used to recognize tokens.

![Screenshot (78).png](attachment:a46fe489-6bac-491c-a014-4c9cfb7ac4fd.png)

---

## 2️⃣ Syntax Analysis - Parse Tree 🌲

The **Syntax Analyzer** (or **Parser**) takes **tokens** as input and generates a **Parse Tree**.

### 🎭 Context-Free Grammar (CFG) Rules:
```
S → id = E;
E → E + T | T
T → T * F | F
F → id
```

### 🎯 Parse Tree for `x = a + b * c;`

![Screenshot (82).png](attachment:73bc369d-7c15-46ef-92c6-be8933759125.png)

🔹 **If a parse tree can be built whose yield (its leaves read left to right) is exactly the input token stream, the statement has no syntax errors.** ✅

---

## 3️⃣ Semantic Analysis - Logical Verification 🧠

The **Semantic Analyzer** ensures that the parsed structure is **meaningful**. It checks:

✅ **Type Checking**

✅ **Array Bound Checking**

✅ **Scope Resolution**

✅ **Undeclared Variables & Reserved Keywords Misuse**

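Example: the identifier **x** appears on the left-hand side of an assignment (`=`), so it must be a modifiable variable, not a constant. Likewise, `a`, `b`, and `c` must be declared, with types that are valid operands for `+` and `*`.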

![Screenshot (83).png](attachment:77997826-57c1-40ed-b7c7-bedc6c3afdbc.png)

---

## 4️⃣ Intermediate Code Generation - Three Address Code 📝

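This phase generates **platform-independent** intermediate code such as **Three-Address Code (TAC)**, in which each instruction has at most one operator and at most three addresses (two operands and a result).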

### 🎯 TAC Representation:
```c
T0 = b * c;
T1 = a + T0;
x = T1;
```

![Screenshot (89).png](attachment:77a4658c-3a4d-4f56-a6ca-742dcbad10f7.png)

📌 **TAC simplifies further code optimization and target code generation.**

---

## 5️⃣ Code Optimization - Enhancing Efficiency ⚡

This phase improves **code efficiency** by:

✅ Eliminating **redundant operations** 🚫

✅ **Rearranging** instructions for speed 🏎️

✅ **Reducing memory usage** 📉

Example Optimization (the temporary `T1` only carries the sum into `x`, so it can be removed):
```c
T0 = b * c;
T1 = a + T0;
x = T1;
```
🔽 **Optimized TAC:**
```c
T0 = b * c;
x = a + T0;
```
![Screenshot (93).png](attachment:11959aec-0a03-40de-82e7-aeea6baeb50e.png)

---

## 6️⃣ Code Generation - Machine Code 🖥️

Finally, the compiler translates the (optimized) **TAC** into **target-specific assembly code**, which the assembler then converts into machine code.

### 🎯 Sample Assembly Code:
```assembly
MOV R1, b
MUL R1, c
MOV R2, a
ADD R2, R1
MOV x, R2
```
![Screenshot (98).png](attachment:64fc51a2-195b-4ed6-b175-61b7df1fd9a7.png)

🔹 The generated code is specific to the **target platform architecture** (its instruction set and registers).

---

## 🏁 Summary 🎉

| Phase | Purpose |
|------------|----------------------------|
| **Lexical Analysis** | Converts lexemes into tokens |
| **Syntax Analysis** | Constructs the parse tree |
| **Semantic Analysis** | Ensures logical correctness |
| **Intermediate Code Generation** | Produces TAC for optimization |
| **Code Optimization** | Enhances efficiency |
| **Code Generation** | Converts to machine code |

🔹 **This is how a compiler works step by step!** 🔥

---

# 📚 Tools for Practical Implementation of Compiler Phases

## 🔍 Overview
A compiler consists of **six phases**, several of which can be built with the help of dedicated software tools. This lecture briefly reviews those phases and the tools used to implement them.

## 🛠️ Tools for Implementation

![Screenshot (100).png](attachment:a3473d2a-b72d-4da6-8107-87c2306fc60d.png)

### 1️⃣ Lexical Analysis (Lex)
- **Lex** is a standard **lexical analyzer generator** available on many UNIX-based systems.
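- It reads a **specification** of the lexical analyzer (token patterns written as regular expressions, each paired with a C action) and generates **C source code** that implements it as a scanner function named `yylex()`.
- Lex is commonly used together with **Yacc**, whose generated parser calls `yylex()` to obtain the next token.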

### 2️⃣ Syntax Analysis (Yacc)
- **Yacc (Yet Another Compiler-Compiler)** is an **LALR parser generator**: it reads a context-free grammar whose rules carry C actions and generates a parser written in C.
- We will study **LALR parsers** in detail in **Chapter 4**.
- Using **Yacc**, we can implement the **syntax analysis phase** of a compiler.

### 🏗️ Compiler Structure
Among the six phases of the compiler:
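- The **first four phases** (lexical, syntax, and semantic analysis plus intermediate code generation) are collectively called the **Front-End**.
- The **last two phases** (code optimization and code generation) are known as the **Back-End**.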

### 🔧 Implementing the Front-End
Using the **LANCE C Compiler** software platform, we can implement the **entire front-end** of a C-language compiler for an **embedded processor**.

![Screenshot (102).png](attachment:4c14da2e-d961-40e0-9693-295310bf1926.png)

For more details, see the research paper:
**"LANCE: A C Compiler Platform for Embedded Processors"** by **Dr. Reiner Leupers** from **the University of Dortmund, Germany**.

🔗 The link to the paper is provided in the **lecture description**.

---

## 🎯 Summary
In this session, we covered:
- The **six phases** of a compiler.
- Tools like **Lex** and **Yacc** for **lexical and syntax analysis**.
- The **Front-End vs. Back-End** of a compiler.
- The **LANCE C Compiler** platform for embedded processors.

