## **Regular Expressions**

A **Regular Expression** (regex) is a declarative way to represent a group of strings that follow a particular format or pattern. Whenever we need to describe such a group of strings, a regular expression is the tool to use.

### **Examples:**
1. Writing a regular expression to represent all mobile numbers.
2. Writing a regular expression to represent all email IDs.
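
As a rough illustration of the two examples above, the sketch below validates whole strings with `re.fullmatch()`. Both patterns are simplifying assumptions, not definitive rules: the mobile pattern assumes a 10-digit number whose first digit is 6 to 9, and the email pattern is a deliberately loose approximation. The helper names `is_mobile` and `is_email` are hypothetical.

```python
import re

# Assumed format: exactly 10 digits, first digit 6-9 (adjust for your region).
MOBILE_RE = re.compile(r"[6-9][0-9]{9}")

# Loose approximation of an email ID: local part, "@", domain, ".", top-level domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_mobile(text):
    """Return True if the whole string matches the assumed mobile format."""
    return MOBILE_RE.fullmatch(text) is not None


def is_email(text):
    """Return True if the whole string matches the loose email pattern."""
    return EMAIL_RE.fullmatch(text) is not None


print(is_mobile("9876543210"))
print(is_mobile("12345"))
print(is_email("user@example.com"))
print(is_email("user@@example"))
```

`fullmatch()` is used instead of `search()` so that extra characters before or after the pattern cause the validation to fail.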

### **Applications of Regular Expressions:**
1. **Validation Frameworks/Logic:** Used to develop validation rules for inputs.
2. **Pattern Matching Applications:** Examples include `Ctrl+F` in Windows or `grep` in UNIX.
3. **Translators:** Useful in developing compilers, interpreters, etc.
4. **Digital Circuits:** Used for designing circuits based on specific patterns.
5. **Communication Protocols:** Helps in creating protocols like TCP/IP, UDP, etc.
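
To make the pattern-matching application more concrete, here is a minimal sketch of a `grep`-like filter built on the `re` module. The function name `grep_lines` and the sample log lines are hypothetical and chosen only for illustration.

```python
import re


def grep_lines(pattern, lines):
    """Return the lines that contain at least one match of pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical sample input.
sample = [
    "error: disk full",
    "info: backup finished",
    "error: network unreachable",
]

matches = grep_lines(r"^error", sample)
for line in matches:
    print(line)
```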

### **Using Regular Expressions in Python:**
We can develop regular-expression-based applications in Python using the `re` module. This module provides several built-in functions for working with regular expressions.


1. **compile()**  
   The `re` module provides the `compile()` function, which compiles a pattern string into a pattern object (`re.Pattern`).  
   Example:  
   ```python
   import re
   pattern = re.compile("ab")
   ```

2. **finditer()**  
   Returns an iterator that yields a Match object for every non-overlapping match of the pattern in the target string.  
   Example:  
   ```python
   matcher = pattern.finditer("abaababa")
   ```

   On the Match object, the following methods can be called:
   - **start()**: Returns the start index of the match.
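   - **end()**: Returns the index just past the last matched character (that is, the end index + 1), so the match covers `start()` up to but not including `end()`.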
   - **group()**: Returns the matched string.
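
The relationship between `start()`, `end()`, and `group()` can be checked with a short sketch. It reuses the pattern and target string from the examples above; the `spans` list is just an illustrative name.

```python
import re

text = "abaababa"
pattern = re.compile("ab")

spans = []
for match in pattern.finditer(text):
    # end() is exclusive, so slicing with start():end() reproduces group().
    assert text[match.start():match.end()] == match.group()
    spans.append((match.start(), match.end(), match.group()))

print(spans)
```

Because `end()` is exclusive, `end() - start()` always equals the length of the matched string.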



### **Compiler Phases**

The standard compiler phases are:

1.  **Lexical Analysis (Scanning/Tokenization):** This is the first phase. It reads the source code as a stream of characters and groups them into meaningful units called *tokens*. These tokens represent things like keywords, identifiers, operators, and literals. Regular expressions (REs) are heavily used in this phase to define the patterns for these tokens.

2.  **Syntax Analysis (Parsing):** This phase takes the stream of tokens produced by the lexical analyzer and constructs a parse tree (or syntax tree). This tree represents the grammatical structure of the program.

3.  **Semantic Analysis:** This phase checks the program for semantic errors, such as type mismatches and undeclared variables. It also gathers type information for subsequent phases.

4.  **Intermediate Code Generation:** The compiler generates an intermediate representation of the source code, which is easier to manipulate and optimize than the original source code.

5.  **Code Optimization:** This phase attempts to improve the intermediate code so that the generated target code will be more efficient (e.g., faster or smaller).

6.  **Target Code Generation:** The final phase generates the target code, which is typically machine code or assembly language.

### **Role of Regular Expressions (REs)**

The diagram specifically emphasizes the role of REs in **Lexical Analysis**. REs provide a concise and powerful way to specify the patterns for tokens. The lexical analyzer uses these patterns to identify and classify tokens in the source code.

**Example:**

*   The RE `[a-zA-Z][a-zA-Z0-9]*` could be used to define the pattern for identifiers (variable names).
*   The RE `[0-9]+` could be used to define the pattern for integer literals.
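
A minimal sketch of how a lexical analyzer might apply these two patterns is shown below. It is not a complete scanner: the token names `IDENT` and `INT` and the function `tokenize` are hypothetical, and any character that matches neither pattern (operators, whitespace) is silently skipped.

```python
import re

# One named group per token kind, using the two patterns from the text.
TOKEN_RE = re.compile(r"(?P<IDENT>[a-zA-Z][a-zA-Z0-9]*)|(?P<INT>[0-9]+)")


def tokenize(source):
    """Return (kind, text) pairs for every identifier or integer literal found."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        # lastgroup is the name of the group that matched: "IDENT" or "INT".
        tokens.append((match.lastgroup, match.group()))
    return tokens


tokens = tokenize("count1 = count1 + 42")
print(tokens)
```

Combining the patterns with `|` and named groups lets a single pass over the input both find each token and classify it.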



`Lexical Analysis (using REs) --> Tokenization/Scanning --> (Input to) Syntax Analysis (Parsing)`

The other compiler phases (Semantic Analysis, Intermediate Code Generation, Code Optimization, and Target Code Generation) are shown as subsequent steps but are not directly related to the use of REs in the diagram's context.

```python
import re

# Compile the pattern once, then scan the target string for every match.
pattern = re.compile('ab')
matcher = pattern.finditer('abaababbaac')

count = 0
for match in matcher:
    count += 1
    print(f"{match.start()}------{match.end()}")
print(f"total no of occurrences: {count}")
```

Output:

```
0------2
3------5
5------7
total no of occurrences: 3
```


```python
import re

# Same search, now also printing the matched text with group().
pattern = re.compile('ab')
matcher = pattern.finditer('abaababbaac')
count = 0
for match in matcher:
    count += 1
    print(f"{match.start()}------{match.end()}-----{match.group()}")
print(f"total no of occurrences: {count}")
```

Output:

```
0------2-----ab
3------5-----ab
5------7-----ab
total no of occurrences: 3
```


```python
import re

# The character class [a-z] matches any single lowercase letter.
pattern = re.compile('[a-z]')
matcher = pattern.finditer('abaababbaac')
count = 0
for match in matcher:
    count += 1
    print(f"{match.start()}------{match.end()}-----{match.group()}")
print(f"total no of occurrences: {count}")
```

Output:

```
0------1-----a
1------2-----b
2------3-----a
3------4-----a
4------5-----b
5------6-----a
6------7-----b
7------8-----b
8------9-----a
9------10-----a
10------11-----c
total no of occurrences: 11
```


```python
import re

# The character class [0-9] matches any single digit.
pattern = re.compile('[0-9]')
matcher = pattern.finditer('ab6aaba3bbaa2c')
count = 0
for match in matcher:
    count += 1
    print(f"{match.start()}------{match.end()}-----{match.group()}")
print(f"total no of occurrences: {count}")
```

Output:

```
2------3-----6
7------8-----3
12------13-----2
total no of occurrences: 3
```
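
The examples above repeat the same counting loop with a different pattern each time. One possible way to avoid the repetition is to wrap the loop in a small helper; the function name `find_matches` is hypothetical, and this is only a sketch of the idea.

```python
import re


def find_matches(pattern, text):
    """Return (start, end, matched_text) for every match of pattern in text."""
    regex = re.compile(pattern)
    return [(m.start(), m.end(), m.group()) for m in regex.finditer(text)]


results = find_matches("[0-9]", "ab6aaba3bbaa2c")
for start, end, matched in results:
    print(f"{start}------{end}-----{matched}")
print(f"total no of occurrences: {len(results)}")
```

Returning a list instead of printing directly also makes the match count available as `len(results)` without a separate counter.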
