# Python with Prof. Chauhan Bhavik
## Regular Expressions

### Part 1: Introduction to Regular Expressions
1) What are Regular Expressions (REs)?
2) Why use Regular Expressions?
3) Using Python's re module
4) Basic regex functions with examples (match, search, findall, split, sub)
5) Small practice exercises

### Part 2: Sequence Characters in Regular Expressions
6) What are sequence characters in regex?
7) Common sequence characters in Python's re module
8) Examples with explanation
9) Practice exercises

### Part 3: Quantifiers in Regular Expressions
10) What are quantifiers in regex?
11) Different types of quantifiers in Python
12) Practical examples with explanation
13) Practice exercises

### Part 4: Special Characters in Regular Expressions
14) Special characters in regex
15) How to use them in Python
16) Examples with explanation
17) Practice exercises

### Part 5: Using Regular Expressions on Files
18) How to use regex with text files
19) Reading a file in Python
20) Applying regex patterns on file content
21) Extracting useful information
22) Practice exercises

### Part 6: Retrieving Information from an HTML File using Regex
23) Reading HTML content
24) Using regex to extract:
  - Titles
  - Headings
  - Hyperlinks
  - Emails
25) Practice exercises

## Part 1: Introduction to Regular Expressions
### 1) What are Regular Expressions (REs)?
- Regular Expressions (REs) are patterns used to match strings. They are very useful for searching, replacing, and extracting text.

- Real-life use cases:
  - Validating email or phone number
  - Extracting data from text/logs
  - Web scraping
### 2) Why use Regular Expressions?
- We use regular expressions to find, validate, extract, and replace patterns in text (like emails, phone numbers, dates, log entries) quickly and efficiently.

### 3) Using Python's re module
- To work with regular expressions in Python, we use the built-in re module.
- Let's import the regular expressions module

In [195]:
import re  

### 4) Basic regex functions with examples (match, search, findall, split, sub)

#### re.match()
- re.match() → checks for a match at the beginning of the string
    - match.start() → Returns the starting index of the matched substring.
    - match.end() → Returns the ending index (exclusive) of the matched substring. 

In [199]:
text = 'Bhavik'
m = re.match("Bh", text) 
print(m)

<re.Match object; span=(0, 2), match='Bh'>


In [201]:
text = 'Energy can be transformed from one form to another form'

m1 = re.match("form", text)
m2 = re.match("Energy can be", text)
print(m1)
print(m2)

None
<re.Match object; span=(0, 13), match='Energy can be'>
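The `start()` and `end()` methods listed above can be tried directly on the result of `re.match()`. The following is a minimal sketch that reuses the sample sentence; the `group()` call, which returns the matched text, is an addition not shown elsewhere in this section.

```python
import re

text = 'Energy can be transformed from one form to another form'

m = re.match('Energy', text)
if m:
    # group() returns the matched text; start()/end() give its position
    print(m.group(), m.start(), m.end())   # Energy 0 6
else:
    print('No match at the beginning')

# re.match() returns None when the pattern is not at the start of the string,
# so always check the result before calling .start() or .end() on it.
print(re.match('form', text) is None)      # True
```

Checking the result with `if m:` first avoids an `AttributeError` when there is no match.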


#### re.search()
- re.search() → searches for a match anywhere in the string

In [204]:
text = 'Energy can be transformed from one form to another form'
pattern = 'form'

match = re.search(pattern, text)
print(match)

<re.Match object; span=(19, 23), match='form'>


In [206]:
if match:
    print('Pattern found at:', match.start(), 'to', match.end())
else:
    print('Pattern not found')

Pattern found at: 19 to 23


#### re.findall()
- re.findall() → returns all non-overlapping matches as a list of strings

In [209]:
print(re.findall(pattern, text))

['form', 'form', 'form']


#### re.split()
- re.split() → splits string by a pattern

In [212]:
print(re.split(' ', text))

['Energy', 'can', 'be', 'transformed', 'from', 'one', 'form', 'to', 'another', 'form']
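The example above splits on a single literal space. As a small sketch of why a pattern can be more useful than a fixed separator, the code below uses a made-up string with repeated spaces (the variable `spaced` is only an illustration). It relies on `\s` and `+`, which are explained in Part 2 and Part 3.

```python
import re

spaced = 'Energy   can be'   # hypothetical sample: three spaces after "Energy"

# A single literal space as the pattern leaves empty strings between adjacent spaces
literal_parts = re.split(' ', spaced)
print(literal_parts)   # ['Energy', '', '', 'can', 'be']

# \s+ treats each run of whitespace as one separator
run_parts = re.split(r'\s+', spaced)
print(run_parts)       # ['Energy', 'can', 'be']

# maxsplit limits how many splits are made
first_only = re.split(r'\s+', spaced, maxsplit=1)
print(first_only)      # ['Energy', 'can be']
```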


#### re.sub()
- re.sub() → replaces every match of the pattern with another string

In [215]:
print(re.sub('Energy', 'Water', text))

Water can be transformed from one form to another form
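Because 'Energy' occurs only once in the sentence, the example above does not show what happens with repeated matches. The sketch below replaces 'form' instead, and uses the optional `count` argument of `re.sub()` to limit the number of replacements. The replacement text 'FORM' is chosen only to make the changes easy to spot.

```python
import re

text = 'Energy can be transformed from one form to another form'

# By default every match is replaced, including the one inside "transformed"
all_replaced = re.sub('form', 'FORM', text)
print(all_replaced)
# Energy can be transFORMed from one FORM to another FORM

# count=1 replaces only the first match, scanning from the left
first_replaced = re.sub('form', 'FORM', text, count=1)
print(first_replaced)
# Energy can be transFORMed from one form to another form
```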


### 5) Small practice exercises

Try these small tasks:
- Find all numbers in the string: 'My phone number is 9876543210 and office number is 079-23232323'
- Replace all spaces in 'Python is fun to learn' with '-'
- Split an email address like 'student@example.com' into username and domain.

## Part 2: Sequence Characters in Regular Expressions
### 6) What are sequence characters in regex?
- Sequence characters in regex are special symbols that represent a class of characters (like digits, words, or spaces) instead of writing them explicitly.
### 7) Common sequence characters in Python's re module
- \d → Any digit (0–9)
- \D → Any non-digit
- \w → Any word character (letters, digits, underscore)
- \W → Any non-word character
- \s → Any whitespace (space, tab, newline)
- \S → Any non-whitespace
### 8) Examples with explanation

In [9]:
import re

txt = "My contact number is 1232459876 and office contact number is 079-14568932"

##### Example 1: Find all runs of digits in text

In [223]:
print(re.findall(r'\d+',txt))

['1232459876', '079', '14568932']


##### Example 2: Find all words

In [226]:
print(re.findall(r'\w+',txt))

['My', 'contact', 'number', 'is', '1232459876', 'and', 'office', 'contact', 'number', 'is', '079', '14568932']


##### Example 3: Find all whitespaces

In [229]:
print(re.findall(r'\s+',txt))

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']


##### Example 4: Extract non-digit characters

In [232]:
print(re.findall(r'\D+',txt))

['My contact number is ', ' and office contact number is ', '-']
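The four examples above cover `\d`, `\w`, `\s` and `\D`. The remaining two sequence characters from the list, `\S` and `\W`, can be sketched on the same `txt` string:

```python
import re

txt = "My contact number is 1232459876 and office contact number is 079-14568932"

# \S+ matches runs of non-whitespace, so the hyphenated number stays in one piece
tokens = re.findall(r'\S+', txt)
print(tokens[-1])              # 079-14568932

# \W matches single non-word characters: here the ten spaces and the hyphen
non_word = re.findall(r'\W', txt)
print(len(non_word))           # 11
print(sorted(set(non_word)))   # [' ', '-']
```

Compare this with Example 2, where `\w+` splits the office number into '079' and '14568932' because the hyphen is not a word character.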


### 9) Practice exercises
- Extract all words from 'Hello_123 World99 Python_3' using \w+.
- Find all numbers in 'Invoice: 555, Price: 1234, Code: 77'.
- Count how many whitespaces are in 'Python is easy to learn'.

## Part 3: Quantifiers in Regular Expressions
### 10) What are quantifiers in regex?
- Quantifiers define how many times a character or group should appear in the string.


### 11) Different types of quantifiers in Python
- Common Quantifiers:
    - `*` → 0 or more times
    - `+` → 1 or more times
    - `?` → 0 or 1 time (optional)
    - `{n}` → exactly n times
    - `{n,}` → n or more times
    - `{n,m}` → between n and m times
### 12) Practical examples with explanation

In [7]:
import re
t = "aaa abc ada"
print(re.findall(r'a*', t))        # * → 0 or more times

['aaa', '', 'a', '', '', '', 'a', '', 'a', '']


In [11]:
print(re.findall(r'a+',t))        # + → 1 or more times

['aaa', 'a', 'a', 'a']


In [27]:
tx = "color colr colour"
print(re.findall(r'colou?r',tx))    # "u?" → matches u once or not at all.

['color', 'colour']


The ? quantifier is useful when you want to handle optional letters (like "color" vs "colour", "bat" vs "bt").

In [35]:
text = """
My favorite color is blue.
My favourite colour is red.
Some people write color, others write colour.
"""

matches = re.findall(r'colou?r', text)

print("Matches found: ", matches)

Matches found:  ['color', 'colour', 'color', 'colour']


In [37]:
m = re.findall(r'favou?rite', text)

print("Here I found : ",m)

Here I found :  ['favorite', 'favourite']


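When processing English text, you might want to match both American (color, honor) and British (colour, honour) spellings.

Regex with ? makes that easy, because the only difference is an optional letter. Pairs such as analyze/analyse swap one letter for another, so ? alone does not cover them.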

In [40]:
text = """
Visit our site at http://example.com
Make sure to use https://secure.com for secure browsing.
"""

matches = re.findall(r'https?', text)
print("Matches found:", matches)

Matches found: ['http', 'https']
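The examples above use `*`, `+` and `?`. The brace quantifiers `{n}`, `{n,}` and `{n,m}` from the list can be sketched in the same way. The sample string `nums` below is made up for illustration.

```python
import re

nums = '7 42 123 4567 89012'   # hypothetical sample

# {3} → exactly 3 digits per match; longer runs still yield their first 3 digits
exactly_three = re.findall(r'\d{3}', nums)
print(exactly_three)       # ['123', '456', '890']

# {3,} → 3 or more digits, as many as possible
three_or_more = re.findall(r'\d{3,}', nums)
print(three_or_more)       # ['123', '4567', '89012']

# {2,3} → between 2 and 3 digits, preferring 3
two_to_three = re.findall(r'\d{2,3}', nums)
print(two_to_three)        # ['42', '123', '456', '890', '12']
```

Note that `\d{3}` also matches inside longer numbers such as 4567. A quantifier limits how much one match consumes, not the length of the surrounding word.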


### 13) Practice exercises
- Find all words that start with `'a'` and have at least 2 `'b'` in `'abb abbb abbbb a ab'`.
- Extract numbers with exactly 3 digits from `'123 45 6789 12 999'`.
- Match words ending with `'ing'` in `'playing run sing walking talking'`.
