**Name**: Kamran Khan

**Batch**: B1

**Class**: TY - CSE AIML

**Roll No**: 11

<h1 align="center">Computing Lab - II | Experiment No - 5</h1>

**Aim**: Regular Expressions in Python

**Objective**: To introduce Regular Expressions (RegEx) in Python and build proficiency with the `re` module. The experiment covers metacharacters, special sequences, and the functions the `re` module provides for matching and manipulating patterns in text.

**Theory**: A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. In Python, the `re` module is used to work with RegEx. Let's explore some basics of regular expressions.

## Specify Pattern Using RegEx

### Metacharacters

Metacharacters are characters interpreted in a special way by a RegEx engine. Here's a list of metacharacters:

- `[]` - Square brackets: Specifies a set of characters.
- `.` - Period: Matches any single character (except newline `\n`).
- `^` - Caret: Matches at the start of a string. As the first character inside square brackets, it negates the set instead.
- `$` - Dollar: Matches at the end of a string (or just before a trailing newline).
- `*` - Star: Matches zero or more occurrences of the preceding element.
- `+` - Plus: Matches one or more occurrences of the preceding element.
- `?` - Question Mark: Matches zero or one occurrence of the preceding element.
- `{}` - Braces: Specifies how many times the preceding element repeats, e.g. `{2,3}` for two to three repetitions.
- `|` - Alternation: Acts like an OR operator.
- `()` - Group: Groups sub-patterns.
- `\` - Backslash: Escapes various characters, including metacharacters.
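
To make the list above concrete, here is a minimal sketch that exercises each metacharacter once. The sample strings and variable names are made up for illustration; only `re.findall()` and `re.search()` from the standard library are used.

```python
import re

# The sample strings below are invented purely for illustration.
char_set  = re.findall(r'[abc]', 'a big cat')          # any one of a, b, or c
any_char  = re.findall(r'c.t', 'cat cot cut ct')       # exactly one character between c and t
starts    = bool(re.search(r'^Hello', 'Hello world'))  # anchored at the start
ends      = bool(re.search(r'world$', 'Hello world'))  # anchored at the end
zero_more = re.findall(r'ab*', 'a ab abbb')            # 'a' followed by zero or more 'b'
one_more  = re.findall(r'ab+', 'a ab abbb')            # 'a' followed by one or more 'b'
optional  = re.findall(r'ab?', 'a ab abbb')            # 'a' followed by at most one 'b'
repeated  = re.findall(r'\d{2,3}', '1 22 333 4444')    # runs of two or three digits
either    = re.findall(r'cat|dog', 'cat dog bird')     # alternation: 'cat' or 'dog'
grouped   = re.search(r'(ab)+c', 'xababc').group()     # the group 'ab' repeated, then 'c'
escaped   = re.findall(r'\$\d+', 'cost $5 or $10')     # \$ is a literal dollar sign

for name, value in [('[]', char_set), ('.', any_char), ('^', starts), ('$', ends),
                    ('*', zero_more), ('+', one_more), ('?', optional),
                    ('{}', repeated), ('|', either), ('()', grouped), ('\\', escaped)]:
    print(name, value)
```

Comparing the `*`, `+`, and `?` results on the same input shows how the three quantifiers differ only in how many repetitions they allow.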

### Special Sequences
Special sequences make commonly used patterns easier to write. Here are some examples:

- `\A` - Matches only at the start of a string.
- `\b` - Matches at a word boundary, i.e. at the beginning or end of a word.
- `\B` - Opposite of `\b`. Matches at a position that is not a word boundary.
- `\d` - Matches any decimal digit (equivalent to `[0-9]` for ASCII text).
- `\D` - Matches any character that is not a decimal digit (equivalent to `[^0-9]`).
- `\s` - Matches any whitespace character (equivalent to `[ \t\n\r\f\v]`).
- `\S` - Matches any non-whitespace character (equivalent to `[^ \t\n\r\f\v]`).
- `\w` - Matches any word character: a letter, a digit, or an underscore (equivalent to `[a-zA-Z0-9_]` for ASCII text).
- `\W` - Matches any non-word character (equivalent to `[^a-zA-Z0-9_]`).
- `\Z` - Matches only at the end of a string.
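
The special sequences listed above can be tried out with a short sketch like the following. The `text` string and the variable names are hypothetical examples, not part of the experiment's data.

```python
import re

# Hypothetical sample text, chosen so that each sequence has something to match.
text = 'Order 66 shipped to bay_7 at 10:45'

digits     = re.findall(r'\d+', text)   # runs of decimal digits
words      = re.findall(r'\w+', text)   # runs of letters, digits, and underscores
non_spaces = re.findall(r'\S+', text)   # runs of non-whitespace characters

# \b requires a word boundary on that side; \B requires that there is none.
whole_word = re.findall(r'\bcat\b', 'cat concat cats cat.')
inside     = re.findall(r'\Bcat', 'cat concat')

# \A and \Z anchor to the very start and very end of the string.
at_start = bool(re.search(r'\AOrder', text))
at_end   = bool(re.search(r'45\Z', text))

print(digits, words, non_spaces, whole_word, inside, at_start, at_end, sep='\n')
```

Note that `\w+` keeps `bay_7` together because the underscore counts as a word character, while `\S+` keeps `10:45` together because the colon is not whitespace.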


## Python RegEx

Python's `re` module provides functions to work with regular expressions:

### `re.findall()`
Returns a list of strings containing all matches.

In [3]:
import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'  # raw string, so the backslash reaches the RegEx engine unchanged
result = re.findall(pattern, string)
print(result) # Output: ['12', '89', '34']

['12', '89', '34']
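
One detail worth knowing about `re.findall()`: when the pattern contains capturing groups, it returns the groups rather than the whole match. A minimal sketch, using a made-up `log` string:

```python
import re

# Hypothetical input of name=age pairs.
log = 'alice=30, bob=25, carol=41'

# No groups: each list item is the full matched text.
whole_matches = re.findall(r'\w+=\d+', log)

# Two groups: each list item is a tuple with one entry per group.
pairs = re.findall(r'(\w+)=(\d+)', log)

print(whole_matches)
print(pairs)
```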


### `re.split()`
Splits the string where there is a match and returns a list.

In [4]:
import re

string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'  # raw string, so the backslash reaches the RegEx engine unchanged
result = re.split(pattern, string)
print(result)  # Output: ['Twelve:', ' Eighty nine:', '.']

['Twelve:', ' Eighty nine:', '.']
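
`re.split()` also accepts a `maxsplit` argument, and it keeps the separators in the result if the pattern is wrapped in a capturing group. The sketch below reuses the same input string to show both behaviours; the variable names are illustrative.

```python
import re

string = 'Twelve:12 Eighty nine:89.'

# maxsplit=1 stops after the first split; the rest of the string is left intact.
first_only = re.split(r'\d+', string, maxsplit=1)

# A capturing group around the pattern keeps the matched separators in the list.
with_separators = re.split(r'(\d+)', string)

print(first_only)
print(with_separators)
```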


### `re.sub()`
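Returns a new string in which every match of the pattern is replaced with the replacement text (the `replace` variable in the example). The original string is not modified.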

In [5]:
import re

string = 'abc 12 de 23 \n f45 6'
pattern = r'\s+'  # raw string, so the backslash reaches the RegEx engine unchanged
replace = ''
new_string = re.sub(pattern, replace, string)
print(new_string)  # Output: 'abc12de23f456'


abc12de23f456
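
A few variations on `re.sub()` may be useful in practice: limiting the number of replacements with `count`, getting the replacement count back with `re.subn()`, and passing a function as the replacement. This is a small sketch with illustrative variable names:

```python
import re

string = 'abc 12 de 23 \n f45 6'

# count=2 replaces only the first two whitespace runs.
first_two = re.sub(r'\s+', '', string, count=2)

# re.subn() returns a (new_string, number_of_replacements) tuple.
new_string, replacements = re.subn(r'\s+', '', string)

# The replacement can be a function that receives each match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b20')

print(repr(first_two))
print(new_string, replacements)
print(doubled)
```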


### `re.search()`
Scans the string for the first location where the RegEx pattern matches. It returns a match object if a match is found and `None` otherwise.

In [6]:
import re

string = "Python is fun"
match = re.search(r'\APython', string)  # \A anchors the match to the start of the string
if match:
    print("Pattern found inside the string")
else:
    print("Pattern not found")

Pattern found inside the string
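
For comparison, the `re` module also provides `re.match()` and `re.fullmatch()`, which differ from `re.search()` in where the match is allowed to sit. The sketch below, on the same sample string, is only meant to show that difference:

```python
import re

string = "Python is fun"

# search() scans the whole string, so it finds 'fun' at the end.
found_anywhere = re.search(r'fun', string) is not None

# match() only tries at the beginning of the string, so 'fun' is not found.
found_at_start = re.match(r'fun', string) is not None

# fullmatch() requires the pattern to cover the entire string.
covers_all = re.fullmatch(r'Python is \w+', string) is not None

print(found_anywhere, found_at_start, covers_all)
```

All three return `None` when there is no match, so testing the result with `if match:` works the same way for each.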


### Match Object
The match object contains methods and attributes to get information about the match.

In [7]:
import re

string = '39801 356, 2102 1111'
pattern = r'(\d{3}) (\d{2})'  # group 1: three digits, group 2: two digits
match = re.search(pattern, string)

if match:
    print(match.group())  # Output: '801 35'
    print(match.group(1))  # Output: '801'
    print(match.group(2))  # Output: '35'
    print(match.groups())  # Output: ('801', '35')


801 35
801
35
('801', '35')
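
Besides `group()` and `groups()`, a match object also reports where the match was found. The following sketch reuses the same string and pattern to show `start()`, `end()`, and `span()`; the variable names are illustrative.

```python
import re

string = '39801 356, 2102 1111'
match = re.search(r'(\d{3}) (\d{2})', string)

if match:
    start = match.start()        # index of the first matched character
    end = match.end()            # index just past the last matched character
    whole_span = match.span()    # (start, end) as a tuple
    group2_span = match.span(2)  # the same information for group 2 only
    print(start, end, whole_span, group2_span)
    print(string[start:end])     # slicing with the span gives back the match
```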


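When writing RegEx patterns in Python, it's common to put the `r` prefix before the pattern to make it a raw string. In a raw string, Python does not process backslash escapes, so sequences such as `\d` or `\b` reach the RegEx engine unchanged.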

In [8]:
import re

string = '\n and \r are escape sequences.'
result = re.findall(r'[\n\r]', string)
print(result)  # Output: ['\n', '\r']

['\n', '\r']
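
The effect of the `r` prefix is easiest to see with `\b`, which means "backspace character" to Python but "word boundary" to the RegEx engine. A minimal sketch, with made-up variable names:

```python
import re

plain = '\bcat\b'   # Python turns each \b into a single backspace character
raw = r'\bcat\b'    # backslash and 'b' are kept, so RegEx sees word boundaries

print(len(plain), len(raw))

no_match = re.findall(plain, 'a cat sat')  # looks for backspace + 'cat' + backspace
one_match = re.findall(raw, 'a cat sat')   # looks for the whole word 'cat'

print(no_match, one_match)

# Without the r prefix, every backslash has to be doubled to get the same pattern.
same_pattern = ('\\bcat\\b' == raw)
print(same_pattern)
```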


**Conclusion:** The Python `re` module handles pattern matching on text through regular expressions. Knowing the metacharacters, the special sequences, and module functions such as `findall`, `split`, `sub`, and `search` makes it possible to find, split, and rewrite strings with compact patterns. Regular expressions are a powerful and flexible tool for pattern recognition and manipulation in textual data.