

---

# üêç Python Regular Expressions (RegEx) Notes

## 1. üîπ Introduction to RegEx
- **Regular Expressions (RegEx)** are patterns used to match strings.
- Python provides the `re` module to work with RegEx.
- Common use cases:
  - Searching text
  - Validating input (emails, phone numbers, etc.)
  - Substituting text
  - Extracting specific patterns

```python
import re
```

---

## 2. üîπ Dot (`.`) Metacharacter
- Matches **any single character** except newline (`\n`).

```python
import re
pattern = r"a.b"   # 'a' followed by any char, then 'b'
print(re.match(pattern, "acb"))   # ‚úÖ Match
print(re.match(pattern, "a_b"))   # ‚úÖ Match
print(re.match(pattern, "ab"))    # ‚ùå No match
```

---

## 3. üîπ Character Classes
- Defined using square brackets `[]`.
- Matches **any one character** inside the brackets.

Examples:
```python
pattern = r"[aeiou]"   # matches any vowel
print(re.findall(pattern, "hello world"))  # ['e', 'o', 'o']

pattern = r"[0-9]"     # matches any digit
print(re.findall(pattern, "abc123"))       # ['1', '2', '3']
```

---

## 4. üîπ Special Characters in RegEx
- `\d` ‚Üí digit (0‚Äì9)
- `\D` ‚Üí non-digit
- `\w` ‚Üí word character (letters, digits, underscore)
- `\W` ‚Üí non-word character
- `\s` ‚Üí whitespace
- `\S` ‚Üí non-whitespace
- `^` ‚Üí start of string
- `$` ‚Üí end of string

```python
pattern = r"\d+"
print(re.findall(pattern, "Order 123, item 456"))  # ['123', '456']
```

---

## 5. üîπ Quantifiers
- `*` ‚Üí 0 or more
- `+` ‚Üí 1 or more
- `?` ‚Üí 0 or 1
- `{n}` ‚Üí exactly n times
- `{n,}` ‚Üí n or more times
- `{n,m}` ‚Üí between n and m times

```python
pattern = r"a+"
print(re.findall(pattern, "aaabbc"))  # ['aaa']
```

---

## 6. üîπ More Metacharacters
- `|` ‚Üí OR operator
- `()` ‚Üí Grouping
- `(?: )` ‚Üí Non-capturing group
- `\b` ‚Üí Word boundary
- `\B` ‚Üí Not a word boundary

```python
pattern = r"(cat|dog)"
print(re.findall(pattern, "I love cats and dogs"))  # ['cat', 'dog']
```

---

## 7. üîπ Finding All Matches
### `findall()`
- Returns **all matches** as a list.

```python
pattern = r"\d+"
print(re.findall(pattern, "Room 101, Floor 5"))  # ['101', '5']
```

### `finditer()`
- Returns an **iterator of match objects**.

```python
pattern = r"\d+"
for match in re.finditer(pattern, "Room 101, Floor 5"):
    print(match.group(), "at position", match.start())
```

---

## 8. üîπ Substituting the Pattern (`sub()`)
- Replaces matched text with another string.

```python
pattern = r"\d+"
text = "Order number 12345"
new_text = re.sub(pattern, "XXX", text)
print(new_text)  # "Order number XXX"
```

---

## 9. üîπ The `compile()` Function
- Pre-compiles a regex pattern for reuse.
- Improves performance when using the same pattern multiple times.

```python
pattern = re.compile(r"\d+")
print(pattern.findall("Room 101, Floor 5"))  # ['101', '5']
print(pattern.findall("Order 123, item 456"))  # ['123', '456']
```

---

## 10. üîπ Exercise: Finding Valid Emails
### Pattern for email validation:
- Format: `username@domain.extension`
- Username: letters, digits, dots, underscores
- Domain: letters, digits
- Extension: 2‚Äì4 letters

```python
pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"

emails = [
    "test@example.com",
    "hello.world@domain.org",
    "invalid-email@domain",
    "user@site.co.in"
]

for email in emails:
    if re.match(pattern, email):
        print(f"‚úÖ Valid: {email}")
    else:
        print(f"‚ùå Invalid: {email}")
```

---

# ‚úÖ Summary
- **Dot (`.`)** matches any character.
- **Character classes** match sets of characters.
- **Special characters** like `\d`, `\w`, `\s` simplify matching.
- **Quantifiers** control repetition.
- **findall() / finditer()** extract matches.
- **sub()** replaces matches.
- **compile()** optimizes regex usage.
- **Exercise:** Validating emails with regex.

---




In [1]:
message= " the current python version what i m using is  python 3.13.5 and there are many other version also which are   python 2.7 , python 3.11 , python 3.12, python 3.14"

In [2]:
print("python "in message)

True


In [3]:
print("3.13.5" in message)
print("3.13" in message)
print("3.5" in message)

True
True
True


In [4]:
print(message.find("3.13.5"))# it will return the index of the first occurrence of the string

54


In [5]:
print(message.find("python"))# it will return the index of the first occurrence of the string

13


In [6]:
#regular expression is available in re module
import re
#is provide the series of meta characters and pattern matching

In [7]:
# syntax for the "re.search()" is "re.search(pattern, string, flags=0)" flags is optional, and pattern is the regular expression pattern to search for, and string is the string to search in.
# flag is use to modify the behavior of the search.

In [8]:
print(message)

 the current python version what i m using is  python 3.13.5 and there are many other version also which are   python 2.7 , python 3.11 , python 3.12, python 3.14


In [9]:
match_object=re.search("python", message,)#it returns the match object if the pattern is found with span and match (where to where the match is found)

In [10]:
print(match_object)

<re.Match object; span=(13, 19), match='python'>


In [11]:
print(message[13:19])

python


In [12]:
message 

' the current python version what i m using is  python 3.13.5 and there are many other version also which are   python 2.7 , python 3.11 , python 3.12, python 3.14'

In [13]:
import re

message = 'the current python version what i m using is python 3.13.5 and there are many other version also which are python 2.7 , python 3.11 , python 3.12, python 3.14'

match_object = re.search(r"[0-9][0-9]", message)
print(match_object.group())  # Output: 13

13


In [14]:
matches = re.findall(r"[0-9][0-9]", message)
print(matches)  # Output: ['13', '11', '12', '14']

['13', '11', '12', '14']


In [15]:
match_object = re.search(r"[0-9][0-9]", "my house number is 92")
print(match_object)

<re.Match object; span=(19, 21), match='92'>


In [16]:
# use of the metacharater 

match_object = re.search(r"[0-9].[0-9][0-9]",message)
print(match_object)




<re.Match object; span=(52, 56), match='3.13'>


In [17]:


message = "The temperature range today is 23.45 degrees Celsius."
match_object = re.search(r"[0-9][0-9].[0-9][0-9]", message)
print(match_object.group())  # Output: 23.45

23.45


In [18]:
message = "the current python version what i m using is  python 3.13.5 and there are many other version also which are   python 12.7 , python 123.11 , python 3.12, python 3.14"
match_object = re.search(r"[0-9].[0-9][0-9]",message)
print(match_object)
match_object = re.search(r"[0-9][0-9].[0-9][0-9]",message)
print(match_object)
match_object = re.search(r"[0-9][0-9][0-9].[0-9][0-9]",message)
print(match_object)


<re.Match object; span=(53, 57), match='3.13'>
<re.Match object; span=(132, 137), match='23.11'>
<re.Match object; span=(131, 137), match='123.11'>


In [19]:
s1= " python is a programming language "
 
pat = r"old \new"# r make sures that the backslash is not an escape character
print(f"{pat} with r in front of it")
pat = "old \new"
print(f"{pat} without r in front of it")

old \new with r in front of it
old 
ew without r in front of it


In [20]:
s1= "Python is a programming language "

pat=r"[A-z][a-z][a-z][a-z][a-z][a-z]"
match_obj=re.match(pat,s1)
print(match_obj)

<re.Match object; span=(0, 6), match='Python'>


In [21]:
# \d and \D  is used to match the(\d) digit and (\D)non digit character

import re
s1= "python is a programming language . python3.13 is the current version of python"

pat=r"[a-z][a-z][a-z][a-z][a-z][a-z]\D"
match_obj=re.search(pat,s1)
print(match_obj)


<re.Match object; span=(0, 7), match='python '>


In [22]:
import re
s1= "python is a programming language . python3.13 is the current version of python"

pat=r"[a-z][a-z][a-z][a-z][a-z][a-z]\d"
match_obj=re.search(pat,s1)
print(match_obj)

<re.Match object; span=(35, 42), match='python3'>


In [23]:
import re
s1= "python is a programming language . python3.13 is the current version of python"

pat=r"[a-z[a-z][a-z][a-z]\d"
match_obj=re.search(pat,s1)
print(match_obj)

<re.Match object; span=(38, 42), match='hon3'>


In [24]:
s2 = """
hi there
how are you
"""
# \s is use to match any whitespace character


pat = r"[a-zA-Z][a-zA-Z][a-zA-Z]\s"
print(re.search(pat,s2))




<re.Match object; span=(6, 10), match='ere\n'>


In [25]:


text = "Room 101, user_123! Hello World"

# \d ‚Üí digit
print(re.findall(r"\d", text))  # ['1', '0', '1', '1', '2', '3']

# \D ‚Üí non-digit
print(re.findall(r"\D", text))  # ['R', 'o', 'o', 'm', ' ', ',', ' ', 'u', 's', 'e', 'r', '_', '!', ' ', 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']

# \w ‚Üí word character
print(re.findall(r"\w", text))  # ['R', 'o', 'o', 'm', '1', '0', '1', 'u', 's', 'e', 'r', '_', '1', '2', '3', 'H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']

# \W ‚Üí non-word character
print(re.findall(r"\W", text))  # [' ', ',', ' ', '!', ' ']

# \s ‚Üí whitespace
print(re.findall(r"\s", text))  # [' ', ' ', ' ', ' ']

# \S ‚Üí non-whitespace
print(re.findall(r"\S", text))  # ['R', 'o', 'o', 'm', '1', '0', '1', ',', 'u', 's', 'e', 'r', '_', '1', '2', '3', '!', 'H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']

# ^ ‚Üí start of string
print(re.match(r"Room", text))  # <re.Match object; span=(0, 4), match='Room'>

# $ ‚Üí end of string
print(re.search(r"World$", text))  # <re.Match object; span=(25, 30), match='World'>

['1', '0', '1', '1', '2', '3']
['R', 'o', 'o', 'm', ' ', ',', ' ', 'u', 's', 'e', 'r', '_', '!', ' ', 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd']
['R', 'o', 'o', 'm', '1', '0', '1', 'u', 's', 'e', 'r', '_', '1', '2', '3', 'H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
[' ', ',', ' ', '!', ' ', ' ']
[' ', ' ', ' ', ' ']
['R', 'o', 'o', 'm', '1', '0', '1', ',', 'u', 's', 'e', 'r', '_', '1', '2', '3', '!', 'H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
<re.Match object; span=(0, 4), match='Room'>
<re.Match object; span=(26, 31), match='World'>


In [26]:

text = "aaabbc 123456"

# * ‚Üí 0 or more
print(re.findall(r"lo*", "hello"))  # ['lo']

# + ‚Üí 1 or more
print(re.findall(r"a+", text))  # ['aaa']

# ? ‚Üí 0 or 1
print(re.findall(r"b?", text))  # ['b', 'b', '', '']

# {n} ‚Üí exactly n times
print(re.findall(r"\d{3}", text))  # ['123', '456']

# {n,} ‚Üí n or more times
print(re.findall(r"\d{2,}", text))  # ['123456']

# {n,m} ‚Üí between n and m times
print(re.findall(r"\d{2,4}", text))  # ['1234', '56']

['l', 'lo']
['aaa']
['', '', '', 'b', 'b', '', '', '', '', '', '', '', '', '']
['123', '456']
['123456']
['1234', '56']


In [27]:
# example of the findall function and finditer function

text = "Python versions: 3.13, 2.7, 3.11, 3.12, 3.14"

# Find all version numbers
pattern = r"\d+\.\d+"
matches = re.findall(pattern, text)

print("findall result:", matches)
# Output: ['3.13', '2.7', '3.11', '3.12', '3.14']

findall result: ['3.13', '2.7', '3.11', '3.12', '3.14']


In [28]:


text = "Python versions: 3.13, 2.7, 3.11, 3.12, 3.14"

pattern = r"\d+\.\d+"
matches = re.finditer(pattern, text)

print("finditer result:")
for match in matches:
    print("Matched:", match.group(), "at position:", match.start(), "-", match.end())

finditer result:
Matched: 3.13 at position: 17 - 21
Matched: 2.7 at position: 23 - 26
Matched: 3.11 at position: 28 - 32
Matched: 3.12 at position: 34 - 38
Matched: 3.14 at position: 40 - 44


 

---

## üß© What is `re.sub()`?
- `re.sub()` is part of Python‚Äôs **`re`** (regular expression) module.  
- It is used to **replace** parts of a string that match a regular expression pattern with a new string.  
- Think of it as **search + replace**, but with regex power.

---

## ‚öôÔ∏è Syntax
```python
re.sub(pattern, repl, string, count=0, flags=0)
```

### Parameters:
- **`pattern`** ‚Üí The regex pattern to search for.
- **`repl`** ‚Üí The replacement string (or function).
- **`string`** ‚Üí The input string where replacements happen.
- **`count`** ‚Üí Maximum number of replacements (default = 0 ‚Üí replace all).
- **`flags`** ‚Üí Optional regex flags (like `re.IGNORECASE`).

---

## üñ•Ô∏è Example 1: Simple Replacement
```python
import re

text = "I have 2 apples and 3 bananas."
result = re.sub(r"\d", "#", text)

print(result)
```

### üîé Explanation:
- Pattern `\d` ‚Üí matches any digit.  
- Replacement `"#"` ‚Üí replaces each digit with `#`.  
- Output:
```
I have # apples and # bananas.
```

---

## üñ•Ô∏è Example 2: Replace Multiple Digits
```python
import re

text = "My phone number is 12345."
result = re.sub(r"\d+", "XXXXX", text)

print(result)
```

### üîé Explanation:
- Pattern `\d+` ‚Üí matches one or more digits.  
- Replacement `"XXXXX"` ‚Üí replaces the whole number.  
- Output:
```
My phone number is XXXXX.
```

---

## üñ•Ô∏è Example 3: Using a Function as Replacement
```python
import re

def censor(match):
    return "*" * len(match.group())

text = "Password123 is secret."
result = re.sub(r"[A-Za-z0-9]+", censor, text)

print(result)
```

### üîé Explanation:
- Pattern `[A-Za-z0-9]+` ‚Üí matches words/numbers.  
- Replacement function ‚Üí replaces each match with `*` of same length.  
- Output:
```
*********** ** ******
```

---

## üñ•Ô∏è Example 4: Limit Replacements
```python
import re

text = "cat cat cat"
result = re.sub(r"cat", "dog", text, count=2)

print(result)
```

### üîé Explanation:
- Pattern `"cat"` ‚Üí matches the word "cat".  
- Replacement `"dog"` ‚Üí replaces only first **2 occurrences**.  
- Output:
```
dog dog cat
```

---

‚úÖ So, `re.sub()` is your go-to tool for **regex-based search and replace** in Python.  



In [None]:
# sub() function is used to replace the matched string with the given string
# syntx for the sub() function is  sub(pattern, repl, string,)
s1= 'sunday, monday , tuesday, wednesday, thursday, friday, saturday'
pattern = r"monday"
sub_obj = re.sub(pattern, "mon", s1)
print(sub_obj)

sunday, mon , tuesday, wednesday, thursday, friday, saturday


In [30]:
print(s1)

sunday, monday , tuesday, wednesday, thursday, friday, saturday


In [33]:
text = "sunday is a holiday"
pat = "sunday "
repl = "monday"


result=re.sub(pat, repl, text)
print(result)

mondayis a holiday


In [34]:
s1= 'sunday, monday , tuesday, wednesday, thursday, friday, saturday'

pattern= r"monday"
replace= "friday"

re.sub(pattern,replace,s1)

'sunday, friday , tuesday, wednesday, thursday, friday, saturday'

In [35]:
s1= 'sunday, monday , tuesday, wednesday, thursday, friday, saturday'

pattern= r"monday"
replace= "friday"

re.sub(pattern,replace,s1, count=1)

'sunday, friday , tuesday, wednesday, thursday, friday, saturday'

In [43]:
s1= 'sunday, monday , tuesday, wednesday, thursday, friday, saturday'

pattern= r"monday"
replace= "friday"

result=re.sub(pattern,replace,s1, count=1)
print(result)

s1= 'sunday, monday ,friday , friday ,  tuesday, wednesday, thursday, friday, saturday'
pattern= r"friday"
replace= "fry"

result=re.sub(pattern,replace,s1, count= 2)
print(result)

sunday, friday , tuesday, wednesday, thursday, friday, saturday
sunday, monday ,fry , fry ,  tuesday, wednesday, thursday, friday, saturday


In [45]:
# compile function in python is use to compile the regular expression pattern into a regular expression object, which can be used for matching using its match(), search(), and findall() methods.

# lets import random contact number 
phone="alice 1234567890 bob 9876543210 charlie 1122334455 "

# lets create a pattern to match the phone number
pattern=re.compile(r"\d{10}")

# lets find all the phone number
match=pattern.findall(phone)
print(match)

# lets compile the name 
name_pattern=re.compile(r"\w+")
name_match=name_pattern.findall(phone)
print(name_match)


['1234567890', '9876543210', '1122334455']
['alice', '1234567890', 'bob', '9876543210', 'charlie', '1122334455']
