##A Regular Expression, or RegEx, is a sequence of characters that defines a search pattern for finding a string or a set of strings.

It can detect the presence or absence of a text by matching it against a particular pattern, and a pattern can itself be divided into one or more sub-patterns.

#1. Introduction
Regular expressions (regex) are patterns used to match strings, providing a powerful tool for text searching, manipulation, and validation in Python.
They are implemented through the built-in re module.



In [None]:
# importing re module
import re



    start() method returns the starting index of the matched substring
    end() method returns the ending index of the matched substring, i.e. the index just after its last character
    span() method returns a tuple containing the starting and the ending index of the matched substring


In [None]:
import re

s = 'indus university'

match = re.search(r'indus', s)

print('Start Index:', match.start())
print('End Index:', match.end())
print('Span:',match.span())


Start Index: 0
End Index: 5
Span: (0, 5)
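
Because the end index is exclusive, slicing the original string with start() and end() reproduces the matched text. The following is a minimal sketch that reuses the string from the cell above; the names `matched_text` and `no_match` are illustrative only.

```python
import re

s = 'indus university'
match = re.search(r'university', s)

# end() points one past the last matched character, so a slice recovers the match
matched_text = s[match.start():match.end()]
print(matched_text)
print(match.span())

# re.search() returns None when nothing matches, so check before calling methods
no_match = re.search(r'college', s)
print(no_match is None)
```

Checking for None first avoids an AttributeError when the pattern is absent.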


#Note: Here the r prefix (as in r'indus') stands for raw, not regex.
A raw string differs from a regular string in that Python does not interpret the \ character as an escape character inside it. This matters because the regular expression engine uses the \ character for its own escapes, so without the r prefix every backslash meant for the engine would have to be doubled.
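
The difference can be seen with \b, which is a backspace character in a regular Python string but a word boundary for the regex engine. This is a small sketch; the variable names `regular` and `raw` are chosen for illustration.

```python
import re

# In a regular string, '\b' is the backspace character (one character)
regular = '\bcat\b'
# In a raw string the backslash is kept, so the regex engine sees \b (word boundary)
raw = r'\bcat\b'

print(len(regular), len(raw))

text = 'a cat sat'
print(re.search(regular, text))   # looks for backspace + 'cat' + backspace
print(re.search(raw, text))       # looks for the whole word 'cat'

# Without the r prefix, each backslash has to be doubled
print('\\bcat\\b' == raw)
```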

# 2. Metacharacters in Regex
Metacharacters are characters with a special meaning that control how the pattern is interpreted.

They are used in the patterns passed to the functions of the re module. Below is the list of metacharacters.


#**\\-Used to drop the special meaning of character following it**

#**[ ]-Represent a character class**

#**^-Matches the beginning**

#**$-Matches the end**

#**.-Matches any character except newline**

#**|-Means OR (matches any of the patterns separated by it)**

#**?-Matches zero or one occurrence**

#**\*-Any number of occurrences (including 0 occurrences)**

#**+-One or more occurrences**

#**{}-Indicate the number of occurrences of a preceding regex to match.**

#**()-Enclose a group of Regex**

Let’s discuss each of these metacharacters in detail:

| Metacharacter | Description                                      | Example                                            |
| ------------- | ------------------------------------------------ | -------------------------------------------------- |
| `.`           | Matches **any character** except newline         | `a.c` → matches `abc`, `axc`                       |
| `^`           | Matches **start** of string                      | `^hello` matches `hello world` but not `say hello` |
| `$`           | Matches **end** of string                        | `world$` matches `hello world`                     |
| `[]`          | Matches **any single character** inside brackets | `[aeiou]` matches vowels                           |
| `[^ ]`        | Matches any character **not** in brackets        | `[^0-9]` matches non-digits                        |
| `*`           | Matches **0 or more** occurrences                | `go*` matches `g`, `go`, `goo`                     |
| `+`           | Matches **1 or more** occurrences                | `go+` matches `go`, `goo` but not `g`              |
| `?`           | Matches **0 or 1** occurrence                    | `colou?r` matches `color` or `colour`              |
| `{m}`         | Matches exactly **m** occurrences                | `\d{4}` matches a 4-digit number                   |
| `{m,n}`       | Matches **m to n** occurrences                   | `\d{2,4}` matches 2–4 digit numbers                |
| `()`          | Groups patterns                                  | `(ab)+` matches `ab`, `abab`                       |
| `\|`          | Logical OR                                       | `cat\|dog` matches `cat` or `dog`                  |




#1. \ – Backslash
##The backslash (\) makes sure that the character following it is not treated in a special way. This can be considered a way of escaping metacharacters.

For example, a dot (.) in a pattern is treated as a metacharacter that matches any character (as shown in the table above). To search for a literal dot, put a backslash (\) just before it so that it loses its special meaning. See the example below.

Example:
The first search (re.search(r'.', s)) matches any character, so it returns the first character of the string, while the second search (re.search(r'\.', s)) matches only the period character.



In [None]:
import re

s = 'indusuni.versity'

# without using \
match = re.search(r'.', s)
print(match)

# using \
match = re.search(r'\.', s)
print(match)


<re.Match object; span=(0, 1), match='i'>
<re.Match object; span=(8, 9), match='.'>


#2. [  ] – Square Brackets
##Square brackets ([ ]) represent a character class: a set of characters, any one of which may match. For example, the character class [abc] will match any single a, b, or c.

We can also specify a range of characters using - inside the square brackets. For example,

[0-3] is the same as [0123]
[a-c] is the same as [abc]
We can also invert the character class using the caret (^) symbol. For example,

[^0-3] means any character except 0, 1, 2, or 3
[^a-c] means any character except a, b, or c
Example:
This code uses a regular expression to find all the characters in the string that fall within the range 'a' to 'm'. The re.findall() function returns a list of every such character, in order of appearance and including repeats. The uppercase 'T' is not matched because the range covers lowercase letters only.


In [None]:
import re

string = "The quick brown fox jumps over the lazy dog"
pattern = "[a-m]"
result = re.findall(pattern, string)

print(result)


['h', 'e', 'i', 'c', 'k', 'b', 'f', 'j', 'm', 'e', 'h', 'e', 'l', 'a', 'd', 'g']
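
Ranges and negation can be combined in the same way. The sketch below uses a made-up sample string and illustrative variable names to show a digit range, an uppercase range, and a negated class.

```python
import re

text = "Room 42, Block B-7"

digits = re.findall(r'[0-9]', text)             # range: any digit
upper = re.findall(r'[A-Z]', text)              # range: any uppercase letter
not_alnum = re.findall(r'[^a-zA-Z0-9]', text)   # negated class: everything else

print(digits)
print(upper)
print(not_alnum)
```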


#3. ^ – Caret
##The caret (^) symbol matches the beginning of the string, i.e. it checks whether the string starts with the given character(s). For example,

^g will check if the string starts with g, such as geeks, globe, girl, g, etc.
^ge will check if the string starts with ge, such as geeks, geeksforgeeks, etc.
Example:
This code uses a regular expression to check whether each string in a list starts with "The". If a string begins with "The", it is reported as "Matched"; otherwise, it is reported as "Not matched". Note that re.match() only ever matches at the start of the string, so the ^ is redundant here; it is needed when the same check is done with re.search().

In [None]:
import re
regex = r'^The'
strings = ['The quick brown fox', 'The lazy dog', 'A quick brown fox']
for string in strings:
	if re.match(regex, string):
		print(f'Matched: {string}')
	else:
		print(f'Not matched: {string}')


Matched: The quick brown fox
Matched: The lazy dog
Not matched: A quick brown fox
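
The caret behaves differently once the text spans several lines. The sketch below assumes a three-line sample string and shows the default behaviour, the effect of the re.MULTILINE flag, and why ^ matters with re.search().

```python
import re

text = "The first line\nThe second line\nAnd the third"

# By default ^ anchors only at the very start of the string
default_hits = re.findall(r'^The', text)
print(default_hits)

# With re.MULTILINE, ^ also matches right after each newline
multiline_hits = re.findall(r'^The', text, flags=re.MULTILINE)
print(multiline_hits)

# re.search() scans the whole string, so here the ^ is what enforces the start
anchored = re.search(r'^second', text)
unanchored = re.search(r'second', text)
print(anchored, unanchored is not None)
```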


#4. $ – Dollar

##The dollar ($) symbol matches the end of the string, i.e. it checks whether the string ends with the given character(s). For example,

s$ will check for a string that ends with s, such as geeks, ends, s, etc.

This code uses a regular expression to check if the string ends with “World!”. If a match is found, it prints “Match found!” otherwise, it prints “Match not found”.

In [None]:
import re

string = "Hello World!"
pattern = r"World!$"

match = re.search(pattern, string)
if match:
	print("Match found!")
else:
	print("Match not found.")


Match found!


#5. . – Dot
##The dot (.) symbol matches any single character except the newline character (\n).
For example,

a.b will check for a string that contains a and b with exactly one character between them, such as acb, acbd, abbb, etc.
.. will check if the string contains at least 2 consecutive non-newline characters
Example:
This code uses a regular expression to search for the pattern "brown.fox" within the string. The dot (.) in the pattern represents any character, and here it matches the space between the two words. If a match is found, it prints "Match found!"; otherwise, it prints "Match not found."

In [None]:
import re

string = "The quick brown fox jumps over the lazy dog."
pattern = r"brown.fox"

match = re.search(pattern, string)
if match:
	print("Match found!")
else:
	print("Match not found.")


Match found!
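
The newline exception can be lifted with the re.DOTALL flag. This is a small sketch using an assumed two-line variant of the example string.

```python
import re

text = "brown\nfox"

# The dot does not match the newline between the two words
plain_match = re.search(r'brown.fox', text)
print(plain_match)

# With the re.DOTALL flag, the dot matches any character, newline included
dotall_match = re.search(r'brown.fox', text, flags=re.DOTALL)
print(dotall_match is not None)
```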


#6. | – Or
##The | symbol works as an OR operator: it checks whether the pattern before it or the pattern after it is present in the string. For example,

    a|b will match any string that contains a or b such as acd, bcd, abcd, etc.


In [None]:
import re

pattern = r'a|b'
strings = ['acd', 'bcd', 'abcd', 'xyz']

for s in strings:
    if re.search(pattern, s):
        print(f"'{s}' matches the pattern '{pattern}'")
    else:
        print(f"'{s}' does not match the pattern '{pattern}'")


'acd' matches the pattern 'a|b'
'bcd' matches the pattern 'a|b'
'abcd' matches the pattern 'a|b'
'xyz' does not match the pattern 'a|b'


#7. ? – Question Mark

##The question mark ? is a quantifier in regular expressions that indicates that the preceding element should be matched zero or one time.
It allows you to specify that the element is optional, meaning it may occur once or not at all. For example,

    ab?c will match the strings ac, abc, and dabc, but not abbc, because there are two b's between a and c. Similarly, it will not match abdc, because the b is followed by d rather than c.

In [None]:
import re

pattern = r'ab?c'
strings = ['ac', 'abc', 'abbc', 'abdc', 'dabc']

for s in strings:
    if re.search(pattern, s):
        print(f"'{s}' matches the pattern '{pattern}'")
    else:
        print(f"'{s}' does not match the pattern '{pattern}'")


'ac' matches the pattern 'ab?c'
'abc' matches the pattern 'ab?c'
'abbc' does not match the pattern 'ab?c'
'abdc' does not match the pattern 'ab?c'
'dabc' matches the pattern 'ab?c'


#8. * – Star

##The star * symbol matches zero or more occurrences of the regex preceding the * symbol.
    ab*c will match the strings ac, abc, abbbc, dabc, etc., but not abdc, because b is not followed by c.


In [None]:
import re

pattern = r'ab*c'
strings = ['ac', 'abc', 'abbbc', 'abdc', 'dabc']

for s in strings:
    if re.search(pattern, s):
        print(f"'{s}' matches the pattern '{pattern}'")
    else:
        print(f"'{s}' does not match the pattern '{pattern}'")


'ac' matches the pattern 'ab*c'
'abc' matches the pattern 'ab*c'
'abbbc' matches the pattern 'ab*c'
'abdc' does not match the pattern 'ab*c'
'dabc' matches the pattern 'ab*c'


#9. + Plus

##The plus + symbol matches one or more occurrences of the regex preceding the + symbol.
    ab+c will match the strings abc, abbc, and dabc, but not ac or abdc: there is no b in ac, and in abdc the b is not followed by c.


In [None]:
import re

pattern = r'ab+c'
strings = ['ac', 'abc', 'abbc', 'abdc', 'dabc']

for s in strings:
    if re.search(pattern, s):
        print(f"'{s}' matches the pattern '{pattern}'")
    else:
        print(f"'{s}' does not match the pattern '{pattern}'")


'ac' does not match the pattern 'ab+c'
'abc' matches the pattern 'ab+c'
'abbc' matches the pattern 'ab+c'
'abdc' does not match the pattern 'ab+c'
'dabc' matches the pattern 'ab+c'
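
The three quantifiers ?, * and + are easiest to compare side by side. The sketch below uses re.fullmatch(), which requires the whole string to fit the pattern, on a short list of illustrative candidates.

```python
import re

candidates = ['ac', 'abc', 'abbc']
patterns = [r'ab?c', r'ab*c', r'ab+c']

# fullmatch() requires the whole string to fit the pattern,
# which makes the effect of each quantifier easy to compare
results = {}
for pattern in patterns:
    results[pattern] = [s for s in candidates if re.fullmatch(pattern, s)]
    print(pattern, '->', results[pattern])
```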


# 10. {m,n} – Braces

##Braces match any repetitions of the preceding regex from m to n (both inclusive).

The {m,n} syntax in regular expressions is known as a quantifier. It specifies the minimum and maximum number of times the preceding element (a character, group, or character class) must occur for a match to be found. Write it without a space after the comma: Python's re module reads a{2, 4} as literal text rather than as a quantifier. The values m and n are integers where:

    m is the minimum number of occurrences.
    n is the maximum number of occurrences.

The match is inclusive, meaning it will match if the preceding element occurs at least m times and at most n times.
Examples:

    Pattern: a{2,4}
        This pattern matches a run of 2 to 4 consecutive a characters.
        It matches aa, aaa, and aaaa but not a. In aaaaa it matches the first four a's, so re.search() still reports a match inside that longer string.


In [None]:
import re

# Define the pattern
pattern = r'a{2,4}'

# List of test strings
strings = ['aaab', 'baaaac', 'gaad', 'abc', 'bc', 'aaaaa']

# Iterate over each string and check for matches
for s in strings:
    match = re.search(pattern, s)
    if match:
        print(f"'{s}' matches the pattern '{pattern}'")
    else:
        print(f"'{s}' does not match the pattern '{pattern}'")


'aaab' matches the pattern 'a{2,4}'
'baaaac' matches the pattern 'a{2,4}'
'gaad' matches the pattern 'a{2,4}'
'abc' does not match the pattern 'a{2,4}'
'bc' does not match the pattern 'a{2,4}'
'aaaaa' matches the pattern 'a{2,4}'


Output Explanation:

    'aaab'
        The substring aaa contains three as, which is between 2 and 4.
        Therefore, it matches the pattern.

    'baaaac'
        The substring aaaa contains four as, which is between 2 and 4.
        Therefore, it matches the pattern.

    'gaad'
        The substring aa contains two as, which is the minimum required.
        Therefore, it matches the pattern.

    'abc'
        The substring a contains only one a, which is less than the minimum required.
        Therefore, it does not match the pattern.

    'bc'
        There is no a present in the string.
        Therefore, it does not match the pattern.

    'aaaaa'
        The string contains five as, one more than the maximum, but re.search() looks for the pattern anywhere in the string and finds the first four as (aaaa).
        Therefore, it matches the pattern.
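
To require that the entire string consists of 2 to 4 a's, re.fullmatch() can be used instead of re.search(). A minimal sketch, with illustrative variable names:

```python
import re

pattern = r'a{2,4}'

# search() succeeds because 'aaaa' occurs inside 'aaaaa'
found = re.search(pattern, 'aaaaa')
print(found.group())

# fullmatch() succeeds only if the entire string is 2 to 4 a's
whole = {s: re.fullmatch(pattern, s) is not None for s in ['a', 'aa', 'aaaa', 'aaaaa']}
for s, ok in whole.items():
    print(s, ok)
```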

#11. (<regex>) – Group

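##The group symbol (<regex>) is used to group sub-patterns.

Grouping makes an operator such as | or a quantifier apply to the whole group. In the example below, (a|b)cd matches a or b followed by cd, so 'abcd' matches through its substring bcd.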

In [None]:
import re

pattern = r'(a|b)cd'
strings = ['acd', 'abcd', 'gacd', 'xyz']

for s in strings:
    if re.search(pattern, s):
        print(f"'{s}' matches the pattern '{pattern}'")
    else:
        print(f"'{s}' does not match the pattern '{pattern}'")


'acd' matches the pattern '(a|b)cd'
'abcd' matches the pattern '(a|b)cd'
'gacd' matches the pattern '(a|b)cd'
'xyz' does not match the pattern '(a|b)cd'
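
Parentheses also capture the text they match, which can then be read back from the match object. The sketch below shows group(0) and group(1), a non-capturing group written (?:...), and what happens to the alternation when the parentheses are left out.

```python
import re

match = re.search(r'(a|b)cd', 'gacd')
print(match.group(0))   # the whole match
print(match.group(1))   # the text captured by the first group

# A non-capturing group (?:...) groups without capturing
match_nc = re.search(r'(?:a|b)cd', 'gacd')
print(match_nc.groups())

# Without the parentheses, | splits the whole pattern: 'a' OR 'bcd'
ungrouped = re.findall(r'a|bcd', 'acd bcd')
print(ungrouped)
```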


In [None]:
import re

# 1. Dot (.) matches any character except newline
pattern = r"a.c"
text = "abc axc bac adc"
matches = re.findall(pattern, text)
print("Dot (.) :", matches)
# Output: ['abc', 'axc', 'adc']
# Explanation: The dot matches any one character between 'a' and 'c'.
# 'bac' contributes nothing because its 'a' and 'c' are adjacent.

# 2. Caret (^) matches the start of a string
pattern = r"^hello"
text = "world hello"
matches = re.findall(pattern, text)
print("Caret (^):", matches)
# Output: []
# Explanation: 'hello' is not at the beginning of the text, so nothing matches.

Dot (.) : ['abc', 'axc', 'adc']
Caret (^): []

In [None]:

# 3. Dollar ($) matches the end of a string
pattern = r"ld$"
text = "hello world"
matches = re.findall(pattern, text)
print("Dollar ($):", matches)
# Output: ['ld']
# Explanation: Matches 'ld' because it appears at the end of the text.

# 4. Square brackets [] matches any single character inside
pattern = r"[aeiou]"
text = "apple129"
matches = re.findall(pattern, text)
print("Square Brackets []:", matches)
# Output: ['a', 'e']
# Explanation: Matches the vowels 'a' and 'e'. No commas are needed inside
# the brackets; a comma would itself become a member of the set.

Dollar ($): ['ld']
Square Brackets []: ['a', 'e']

In [None]:

# 5. Negated set [^ ] matches characters NOT inside
pattern = r"[^0-9]"
text = "a1b2c3"
matches = re.findall(pattern, text)
print("Negated Set [^ ]:", matches)
# Output: ['a', 'b', 'c']
# Explanation: Matches all non-digit characters.

# 6. Asterisk (*) matches 0 or more occurrences
pattern = r"go*"
text = "ga god good ggggooo"
matches = re.findall(pattern, text)
print("Asterisk (*):", matches)
# Output: ['g', 'go', 'goo', 'g', 'g', 'g', 'gooo']
# Explanation: Matches 'g' followed by 0 or more 'o's. A bare r"o*" would
# also return an empty string at every position that has no 'o'.

Negated Set [^ ]: ['a', 'b', 'c']
Asterisk (*): ['g', 'go', 'goo', 'g', 'g', 'g', 'gooo']


In [None]:

# 7. Plus (+) matches 1 or more occurrences
pattern = r"go+"
text = "g god good godod"
matches = re.findall(pattern, text)
print("Plus (+):", matches)
# Output: ['go', 'goo', 'go']
# Explanation: Requires at least one 'o' after 'g', so the lone 'g' is skipped.

# 8. Question mark (?) matches 0 or 1 occurrence
pattern = r"er?or"
text = "eror,error,eor"
matches = re.findall(pattern, text)
print("Question Mark (?):", matches)
# Output: ['eror', 'eor']
# Explanation: The 'r' after 'e' is optional. 'error' has two of them, so it does not match.

Plus (+): ['go', 'goo', 'go']
Question Mark (?): ['eror', 'eor']


In [None]:

# 9. Curly braces {m} exact match count
pattern = r"\d{4}"
text = "The a044 is 2024"
matches = re.findall(pattern, text)
print("Curly {m}:", matches)
# Output: ['2024']
# Explanation: Matches exactly 4 digits.

# 10. Curly braces {m,n} range match count
pattern = r"\d{2,4}"
text = "abc 123 12345234"
matches = re.findall(pattern, text)
print("Curly {m,n}:", matches)
# Output: ['123', '1234', '5234']
# Explanation: Matches runs of 2 to 4 digits. The quantifier is greedy, so the
# 8-digit run is split into two 4-digit matches.

Curly {m}: ['2024']
Curly {m,n}: ['123', '1234', '5234']


In [None]:

# 11. Parentheses () groups patterns
pattern = r"(ab)+"
text = "ab abab ababab"
matches = re.findall(pattern, text)
print("Grouping ():", matches)
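# Output: ['ab', 'ab', 'ab']
# Explanation: Each match is 'ab' repeated one or more times. Because the
# pattern contains a group, findall() returns the last text captured by the
# group ('ab') instead of the whole match ('ab', 'abab', 'ababab').

# 12. Pipe (|) logical OR
pattern = r"cat|dog"
text = "I love cat"
matches = re.findall(pattern, text)
print("Logical OR (|):", matches)
# Output: ['cat']
# Explanation: Matches either 'cat' or 'dog'; only 'cat' occurs in this text.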

Grouping (): ['ab', 'ab', 'ab']
Logical OR (|): ['cat']


#3. Special Sequences

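Special sequences are written as a backslash followed by a letter. Some of them (\A, \b, \B, \Z) do not match an actual character in the string; instead they specify a location in the search string where the match must occur. The others (\d, \D, \s, \S, \w, \W) are shorthand for common character classes. They make it easier to write commonly used patterns.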

| Sequence | Matches            | Description                                                                                    | Example                                | Matches In                                 |
| -------- | ------------------ | ---------------------------------------------------------------------------------------------- | -------------------------------------- | ------------------------------------------ |
| `\A`     | Start of string    | Matches only at the **beginning** of the string (unlike `^`, which can work in multiline mode) | `re.search(r"\AHello", "Hello World")` | ✅ Matches `"Hello"` at start               |
| `\b`     | Word boundary      | Matches the position between a word character and a non-word character                         | `re.findall(r"\bcat\b", "A cat sat")`  | ✅ Matches `"cat"`, not `"scatter"`         |
| `\B`     | Non-word boundary  | Matches position **not** at a word boundary                                                    | `re.findall(r"\Bcat\B", "scattered")`  | ✅ Matches `"cat"` inside `"scattered"`     |
| `\d`     | Digit (0–9)        | Matches any single digit                                                                       | `re.findall(r"\d", "Room 101")`        | ✅ Matches `['1', '0', '1']`                |
| `\D`     | Non-digit          | Matches any single character **not** a digit                                                   | `re.findall(r"\D", "A1B2")`            | ✅ Matches `['A', 'B']`                     |
| `\s`     | Whitespace         | Matches spaces, tabs, newlines                                                                 | `re.findall(r"\s", "Hello World")`     | ✅ Matches `[' ']`                          |
| `\S`     | Non-whitespace     | Matches any non-whitespace character                                                           | `re.findall(r"\S", "Hi!")`             | ✅ Matches `['H', 'i', '!']`                |
| `\w`     | Word character     | Matches letters, digits, underscore (`_`)                                                      | `re.findall(r"\w", "Hi_123!")`         | ✅ Matches `['H', 'i', '_', '1', '2', '3']` |
| `\W`     | Non-word character | Matches any character **not** a letter, digit, or underscore                                   | `re.findall(r"\W", "Hi_123!")`         | ✅ Matches `['!']`                          |
| `\Z`     | End of string      | Matches only at the **end** of the string (unlike `$`, which can match before a newline)       | `re.search(r"World\Z", "Hello World")` | ✅ Matches `"World"` at end                 |


In [None]:
import re

# 1. \A - Matches only at the start of the string
text = "Hello World"
match = re.search(r"\AHello", text)
print(r"\A :", match.group() if match else None)
# Output: Hello
# Explanation: \A matches 'Hello' only if it's at the very beginning.

# 2. \b - Word boundary
text = "A cat acat"
matches = re.findall(r"\bcat\b", text)
print(r"\b :", matches)
# Output: ['cat']
# Explanation: \b matches 'cat' as a whole word, not part of another word.


\A : Hello
\b : ['cat']


In [None]:

# 3. \B - Non-word boundary
text = "scattered"
matches = re.findall(r"\Bcat\B", text)
print(r"\B :", matches)
# Output: ['cat']
# Explanation: \B matches 'cat' only when it's inside another word.

# 4. \d - Digit (0–9)
text = "Room 101"
matches = re.findall(r"\d", text)
print(r"\d :", matches)
# Output: ['1', '0', '1']
# Explanation: \d matches each digit individually.


\B : ['cat']
\d : ['1', '0', '1']


In [None]:
# 5. \D - Non-digit
text = "A1B2  ,s vjs 78"
matches = re.findall(r"\D", text)
print(r"\D :", matches)
# Output: ['A', 'B', ' ', ' ', ',', 's', ' ', 'v', 'j', 's', ' ']
# Explanation: \D matches any character that is not a digit, including spaces and punctuation.

# 6. \s - Whitespace
text = "Hello World hhkj jkh"
matches = re.findall(r"\s", text)
print(r"\s :", matches)
# Output: [' ', ' ', ' ']
# Explanation: \s matches spaces, tabs, and newlines; this text contains three spaces.

\D : ['A', 'B', ' ', ' ', ',', 's', ' ', 'v', 'j', 's', ' ']
\s : [' ', ' ', ' ']


In [None]:

# 7. \S - Non-whitespace
text = "Hi   gfgh  !"
matches = re.findall(r"\S", text)
print(r"\S :", matches)
# Output: ['H', 'i', 'g', 'f', 'g', 'h', '!']
# Explanation: \S matches every character that is not whitespace.

# 8. \w - Word character
text = "Hi_123!"
matches = re.findall(r"\w", text)
print(r"\w :", matches)
# Output: ['H', 'i', '_', '1', '2', '3']
# Explanation: \w matches letters, numbers, and underscore.

\S : ['H', 'i', 'g', 'f', 'g', 'h', '!']
\w : ['H', 'i', '_', '1', '2', '3']


In [None]:

# 9. \W - Non-word character
text = "Hi_123!"
matches = re.findall(r"\W", text)
print(r"\W :", matches)
# Output: ['!']
# Explanation: \W matches characters that are not letters, digits, or underscore.

# 10. \Z - End of string
text = "Hello World"
match = re.search(r"World\Z", text)
print(r"\Z :", match.group() if match else None)
# Output: World
# Explanation: \Z matches 'World' only if it's at the very end.


\W : ['!']
\Z : World


#\A-Matches if the string begins with the given character.

In [None]:
import re

# \A - Matches if the string begins with the given character
pattern_A = r'\Aahmedabad'
strings_A = ['ahmedabad is great', 'ahmedabad rocks', 'welcome to ahmedabad']
print("Pattern \\A")
for s in strings_A:
    print(f"'{s}':", re.search(pattern_A, s) is not None)



Pattern \A
'ahmedabad is great': True
'ahmedabad rocks': True
'welcome to ahmedabad': False


#**\b**-Matches if the word begins or ends with the given character.


\bword checks for the beginning of the word, and word\b checks for the end of the word.

In [None]:
# \b - Matches if the word begins or ends with the given character
pattern_b_start = r'\bcity'
pattern_b_end = r'city\b'
strings_b = ['cityscape', 'cityline', 'capital city']
print("\nPattern \\b (start)")
for s in strings_b:
    print(f"'{s}':", re.search(pattern_b_start, s) is not None)

print("\nPattern \\b (end)")
for s in strings_b:
    print(f"'{s}':", re.search(pattern_b_end, s) is not None)




Pattern \b (start)
'cityscape': True
'cityline': True
'capital city': True

Pattern \b (end)
'cityscape': False
'cityline': False
'capital city': True


#\B-It is the opposite of \b; it matches only where there is no word boundary, i.e. inside a word.

In [None]:
# \B - Matches if the word does not start or end with the given character
pattern_B = r'\Bcity'
strings_B = ['metropolis', 'diverse city', 'cityscape', 'cityline']
print("\nPattern \\B")
for s in strings_B:
    print(f"'{s}':", re.search(pattern_B, s) is not None)




Pattern \B
'metropolis': False
'diverse city': False
'cityscape': False
'cityline': False
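
None of the strings above contains 'city' in the interior of a word, so every result is False. The following sketch adds two hypothetical test words in which 'city' is preceded by another word character, so that \Bcity does match.

```python
import re

pattern_B = r'\Bcity'
words = ['velocity', 'electricity', 'cityscape', 'big city']

# True only when 'city' is preceded by another word character
inside_word = {w: re.search(pattern_B, w) is not None for w in words}
for w, hit in inside_word.items():
    print(f"'{w}':", hit)
```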


#\d

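Matches any decimal digit. For ASCII text this is equivalent to the set class [0-9]; with str patterns it also matches digits from other scripts unless the re.ASCII flag is used.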

In [None]:
# \d - Matches any decimal digit
pattern_d = r'\d'
strings_d = ['123', 'abc1', 'no digits here']
print("\nPattern \\d")
for s in strings_d:
    print(f"'{s}':", re.search(pattern_d, s) is not None)




Pattern \d
'123': True
'abc1': True
'no digits here': False


#\D

Matches any non-digit character. Equivalent to the set class [^0-9].

In [None]:
# \D - Matches any non-digit character
pattern_D = r'\D'
strings_D = ['abc', 'a1b2', '123']
print("\nPattern \\D")
for s in strings_D:
    print(f"'{s}':", re.search(pattern_D, s) is not None)




Pattern \D
'abc': True
'a1b2': True
'123': False


#\s

Matches any whitespace character.

In [None]:
# \s - Matches any whitespace character
pattern_s = r'\s'
strings_s = ['hello world', 'a b c', 'nospaces']
print("\nPattern \\s")
for s in strings_s:
    print(f"'{s}':", re.search(pattern_s, s) is not None)




Pattern \s
'hello world': True
'a b c': True
'nospaces': False


#\S

Matches any non-whitespace character.

In [None]:
# \S - Matches any non-whitespace character
pattern_S = r'\S'
strings_S = ['hello_world', 'abc 123', '    ']
print("\nPattern \\S")
for s in strings_S:
    print(f"'{s}':", re.search(pattern_S, s) is not None)


Pattern \S
'hello_world': True
'abc 123': True
'    ': False


#\w

Matches any alphanumeric character or underscore. For ASCII text this is equivalent to the class [a-zA-Z0-9_].

In [None]:
# \w - Matches any alphanumeric character
pattern_w = r'\w'
strings_w = ['abc123', 'City_42', '!@#']
print("\nPattern \\w")
for s in strings_w:
    print(f"'{s}':", re.search(pattern_w, s) is not None)


Pattern \w
'abc123': True
'City_42': True
'!@#': False


#\W

Matches any non-alphanumeric character. For ASCII text this is equivalent to the class [^a-zA-Z0-9_].
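
A short sketch in the style of the neighbouring cells, with assumed test strings, shows which strings contain at least one non-word character:

```python
import re

# \W - Matches any character that is not a letter, digit, or underscore
pattern_W = r'\W'
strings_W = ['abc123', 'City_42', 'hello world', '!@#']
results_W = {s: re.search(pattern_W, s) is not None for s in strings_W}
print("Pattern \\W")
for s, hit in results_W.items():
    print(f"'{s}':", hit)
```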

#\Z

Matches only at the end of the string.

In [None]:
# \Z - Matches if the string ends with the given character
pattern_Z = r'end\Z'
strings_Z = ['the end', 'in the end', 'ending']
print("\nPattern \\Z")
for s in strings_Z:
    print(f"'{s}':", re.search(pattern_Z, s) is not None)


Pattern \Z
'the end': True
'in the end': True
'ending': False


#RegEx Functions

#The re module contains many functions that help us to search a string for a match.
#1. re.findall()

##Finds and returns all matching occurrences in a list.

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.

Finding all occurrences of a pattern

This code uses a regular expression (\d+) to find all the sequences of one or more digits in the given string. It searches for numeric values and stores them in a list. In this example, it finds and prints the numbers “123456789” and “987654321” from the input string.

In [None]:
import re

pattern = r'\d+'
text = "123456789 and 987654321 are numbers"

matches = re.findall(pattern, text)
print(matches)

['123456789', '987654321']
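
When the pattern contains groups, re.findall() returns the group contents rather than the whole match, and re.finditer() gives access to the match objects themselves. The sample text and variable names below are assumptions made for illustration.

```python
import re

text = "Order 66 shipped on 2024-05-17, order 67 on 2024-06-02"

# With no groups in the pattern, findall() returns the matched strings
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)
print(dates)

# With several groups, findall() returns one tuple of group texts per match
date_parts = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
print(date_parts)

# finditer() yields match objects, which also expose positions
positions = [m.span() for m in re.finditer(r'\d{4}-\d{2}-\d{2}', text)]
print(positions)
```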


#2. re.compile()

##Compiles a regular expression pattern into a pattern object.
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.

Example 1:

The code uses the regular expression pattern [a-e] to find and list all lowercase letters from 'a' to 'e' in the input string "Hey, said Mr. Gibenson Stark". The output is ['e', 'a', 'd', 'b', 'e', 'a'], which are the matching characters in order of appearance.



In [None]:
import re
p = re.compile('[a-e]')

print(p.findall("Hey, said Mr. Gibenson Stark"))


['e', 'a', 'd', 'b', 'e', 'a']


In [None]:
import re

pattern = re.compile(r'\d+')
text = "There are 42 apples and 23 oranges."

matches = pattern.findall(text)
print(matches)

['42', '23']
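
A compiled pattern object can be reused across several operations without repeating the pattern string. This sketch reuses the sentence from the cell above; the variable names are illustrative.

```python
import re

number = re.compile(r'\d+')
text = "There are 42 apples and 23 oranges."

first = number.search(text)          # first match only
all_numbers = number.findall(text)   # every match
masked = number.sub('#', text)       # replace every match
pieces = number.split(text)          # split around the matches
print(first.group(), all_numbers, masked, pieces, sep='\n')
```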


#Example: The set class [\s,.] will match any whitespace character, ',', or '.'.

The code uses regular expressions to find and list all single digits and sequences of digits in the given input string. It finds single digits with \d and sequences of digits with \d+. The patterns are written as raw strings so that Python does not warn about the \d escape.

In [None]:
import re
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))


['1', '1', '4', '1', '8', '8', '6']
['11', '4', '1886']




#Example :

The code uses regular expressions to find and list word characters, sequences of word characters, and non-word characters in input strings. It provides lists of the matched characters or sequences.

In [None]:
import re

# Raw strings keep the backslashes intact for the regex engine
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))

p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he said *** in some_language."))

p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))


['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_', 'l', 'a', 'n', 'g']
['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said', 'in', 'some_language']
[' ', ' ', '*', '*', '*', ' ', ' ', '.']




#3. re.split()

Splits the string by the occurrences of a character or a pattern.

In [None]:
import re

pattern = r'\s+'
text = "Split this text into words"

split_text = re.split(pattern, text)
print(split_text)


['Split', 'this', 'text', 'into', 'words']


In [None]:
from re import split

# Raw strings keep the backslashes intact for the regex engine
print(split(r'\W+', 'Words, words , Words'))
print(split(r'\W+', "Word's words Words"))
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))


['Words', 'words', 'Words']
['Word', 's', 'words', 'Words']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']
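
re.split() has a few behaviours that are easy to miss: the maxsplit argument, capturing groups that keep the separators, and the empty string produced by a separator at the end of the input. A minimal sketch with illustrative variable names:

```python
import re

text = 'On 12th Jan 2016, at 11:02 AM'

# maxsplit limits the number of splits; the remainder stays in the last item
limited = re.split(r'\d+', text, maxsplit=1)
print(limited)

# A capturing group in the pattern keeps the separators in the result
with_separators = re.split(r'(\d+)', text)
print(with_separators)

# A separator at the very end of the string produces a trailing empty string
trailing = re.split(r'\W+', 'one, two, three.')
print(trailing)
```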




#4. re.sub()

Replaces all occurrences of a character or pattern with a replacement string.

In [None]:
import re

pattern = r'\d+'
text = "There are 42 apples and 23 oranges."
replacement = 'XX'

result = re.sub(pattern, replacement, text)
print(result)


There are XX apples and XX oranges.
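
The replacement does not have to be a fixed string. The sketch below, again on the same sentence, shows a backreference to a captured group, the count argument, and a function used as the replacement; the variable names are illustrative.

```python
import re

text = "There are 42 apples and 23 oranges."

# \1 and \2 in the replacement refer to the text captured by each group
swapped = re.sub(r'(\d+) (\w+)', r'\2: \1', text)
print(swapped)

# count limits how many replacements are made
first_only = re.sub(r'\d+', 'XX', text, count=1)
print(first_only)

# The replacement can be a function that receives each match object
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)
print(doubled)
```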


#5. re.escape()

Escapes special characters in a pattern.

Returns the string with every character that could have a special meaning in a regular expression backslashed. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it. Before Python 3.7, every character other than ASCII letters, digits, and '_' was escaped; the outputs below come from a newer version, which is why ',' and '!' are left alone.

re.escape() therefore makes a string safe to use as a literal inside a regular expression pattern.

In [None]:
import re
print(re.escape("This is Awesome even 1 AM"))
print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))


This\ is\ Awesome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \	\ \^WoW


In [None]:
import re

text = "This is a test. $100 cost!"
escaped_text = re.escape(text)
print(escaped_text)

This\ is\ a\ test\.\ \$100\ cost!
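
A typical use is searching for text that comes from outside the program. In this sketch, `user_input` is a hypothetical literal string that happens to contain metacharacters.

```python
import re

text = "Total: $100 (approx.) and $1000"
user_input = "$100 (approx.)"   # hypothetical literal text to look for

# Used directly, '$', '(', ')' and '.' are read as metacharacters
unescaped = re.search(user_input, text)
print(unescaped)

# After escaping, every character is matched literally
literal = re.search(re.escape(user_input), text)
print(literal.group())
```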


#6. re.search()

Searches for the first occurrence of a character or pattern.

In [None]:
import re

pattern = r'\d+'
text = "There are 42 apples and 23 oranges."

match = re.search(pattern, text)
if match:
    print(f"Found match: {match.group()}")
else:
    print("No match found")


Found match: 42


In [None]:
import re
regex = r"([a-zA-Z]+) (\d+)"

match = re.search(regex, "I was born on June 24")
if match is not None:
    # start() and end() give the span of the whole match
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % (match.group(0)))
    # group(1) and group(2) are the two parenthesized sub-patterns
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))
else:
    print("The regex pattern does not match.")


Match at index 14, 21
Full match: June 24
Month: June
Day: 24
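
The same search can be written with named groups, which may be easier to read than numbered ones. This is a sketch only: the (?P<name>...) syntax is standard in the re module, but the names DATE_PATTERN and find_month_and_day are assumptions introduced for illustration.

```python
import re

# Named groups (?P<name>...) replace the numbered groups used above
DATE_PATTERN = re.compile(r"(?P<month>[a-zA-Z]+) (?P<day>\d+)")

def find_month_and_day(text):
    """Return a dict for the first 'Month day' pair found, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        # re.search() returns None when nothing matches
        return None
    return match.groupdict()

found = find_month_and_day("I was born on June 24")
missing = find_month_and_day("No date in this sentence")
print(found)
print(missing)
```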


#SETS

A Set is a group of characters enclosed in square brackets [ ]. It matches exactly one character of the input string, which may be any one of the characters listed between the brackets. Here is a list of common sets and their uses:

Common Sets in Regular Expressions

    Literal Character Set: [abc]
        Matches: Any one of the characters a, b, or c.
        Example: The pattern [aeiou] matches any single vowel.

    Range of Characters: [a-z], [A-Z], [0-9]
        Matches: Any character within the specified range.
        Example:
            [a-z] matches any lowercase letter.
            [A-Z] matches any uppercase letter.
            [0-9] matches any digit.

    Negation: [^abc]
        Matches: Any character except a, b, or c.
        Example: The pattern [^0-9] matches any character that is not a digit.

    Combined Sets: [a-zA-Z]
        Matches: Any character that is a lowercase or uppercase letter.
        Example: [a-zA-Z] matches any letter regardless of case.

    Special Characters in Sets:
        Dash (-): Inside a set, a dash is used to specify a range unless it is at the beginning or end of the set, where it is treated as a literal character.
            Example: [a-z0-9-] matches any lowercase letter, digit, or the hyphen -.
        Caret (^): At the start of a set, it negates the set, matching any character not in the set.
            Example: [^a-z] matches any character that is not a lowercase letter.
        Bracket ([]): A closing bracket ] can be included in the set by placing it first (after an optional ^) or escaping it with a backslash.
            Example: [abc\]] matches a, b, c, or ].

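    Predefined Character Classes (often used in place of sets). The bracket equivalents below are exact for ASCII-only matching (the re.ASCII flag); by default, Python str patterns also let \d, \w, and \s match Unicode digits, word characters, and whitespace: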
        \d: Matches any digit, equivalent to [0-9].
        \D: Matches any non-digit, equivalent to [^0-9].
        \w: Matches any word character (alphanumeric + underscore), equivalent to [a-zA-Z0-9_].
        \W: Matches any non-word character, equivalent to [^a-zA-Z0-9_].
        \s: Matches any whitespace character, equivalent to [ \t\n\r\f\v].
        \S: Matches any non-whitespace character, equivalent to [^ \t\n\r\f\v].
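
The predefined classes behave slightly differently depending on whether the re.ASCII flag is set. The sketch below uses a made-up sample string containing two Devanagari digits and an accented letter to show the difference.

```python
import re

# "\u096a\u0968" are Devanagari digits; "\u00e9" is the letter e with an acute accent
sample = "42 \u096a\u0968 caf\u00e9_1"

# Default (Unicode) matching for str patterns
unicode_digits = re.findall(r'\d', sample)
unicode_words = re.findall(r'\w+', sample)

# With re.ASCII, \d is [0-9] and \w is [a-zA-Z0-9_]
ascii_digits = re.findall(r'\d', sample, flags=re.ASCII)
ascii_words = re.findall(r'\w+', sample, flags=re.ASCII)

print(unicode_digits, ascii_digits)
print(unicode_words, ascii_words)
```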

Example Uses:

    Matching a Single Vowel:
        Pattern: [aeiou]
        Example String: "apple"
        Matches: a, e

    Matching a Digit or a Hyphen:
        Pattern: [0-9-]
        Example String: "8-3"
        Matches: 8, -, 3

    Matching Any Character Except a Newline:
        Pattern: [^\n]
        Example String: "Hello World\n"
        Matches: All characters in "Hello World" except the newline character.
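
The example uses above can be checked with re.findall(), which returns every non-overlapping match. This is a small sketch; the variable names and the extra "a]d" sample are chosen here for illustration.

```python
import re

# Each set matches one character at a time, so findall returns single characters
vowels = re.findall(r'[aeiou]', "apple")
digits_or_hyphen = re.findall(r'[0-9-]', "8-3")
not_newline = ''.join(re.findall(r'[^\n]', "Hello World\n"))

# An escaped closing bracket is treated as a literal member of the set
with_bracket = re.findall(r'[abc\]]', "a]d")

print(vowels)
print(digits_or_hyphen)
print(repr(not_newline))
print(with_bracket)
```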

#Regular expressions and Python scripts to match binary strings from each of the following sets.


#1. 0 or 11 or 101

Regular Expression: 0|11|101

In [None]:
import re

pattern = re.compile(r'0|11|101')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('0'))    # True
print(match_string('11'))   # True
print(match_string('101'))  # True
print(match_string('10'))    # False


True
True
True
False


#2. Only 0s

Regular Expression: 0*

In [None]:
import re

pattern = re.compile(r'0*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('000'))  # True
print(match_string('0'))    # True
print(match_string('01'))    # False


True
True
False


#3. All binary strings

Regular Expression: (0|1)*

In [None]:
import re

pattern = re.compile(r'(0|1)*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('0101'))  # True
print(match_string(''))      # True
print(match_string('1102'))  # False


True
True
False


#4. All binary strings except the empty string

Regular Expression: (0|1)(0|1)*, written more compactly as (0|1)+ in the code below

In [None]:
import re

pattern = re.compile(r'(0|1)+')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('0'))    # True
print(match_string('101'))  # True
print(match_string(''))     # False


True
True
False
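
Since each alternative here is a single character, a character class is another way to write the same pattern: [01] matches one 0 or one 1. The sketch below compares the two spellings on a few sample strings; the sample list is an assumption chosen for illustration.

```python
import re

alternation = re.compile(r'(0|1)+')
char_class = re.compile(r'[01]+')

samples = ['0', '101', '', '1102', '0011']
for s in samples:
    # Print the verdict of each pattern side by side
    print(repr(s), bool(alternation.fullmatch(s)), bool(char_class.fullmatch(s)))

# True when both patterns agree on every sample
results_agree = all(
    bool(alternation.fullmatch(s)) == bool(char_class.fullmatch(s))
    for s in samples
)
print(results_agree)
```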


#5. Begins with 1, ends with 1

Regular Expression: 1(0|1)*1

In [None]:
import re

pattern = re.compile(r'1(0|1)*1')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('101'))  # True
print(match_string('11'))  # True
print(match_string('110'))  # False
print(match_string('0'))    # False


True
True
False
False


#6. Ends with 00

Regular Expression: (0|1)*00

Python Script:

In [None]:
import re

pattern = re.compile(r'(0|1)*00')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('10000'))  # True
print(match_string('00'))     # True
print(match_string('010'))    # False


True
True
False


#7. Contains at least three 1s

Regular Expression: (0|1)*1(0|1)*1(0|1)*1(0|1)*

In [None]:
import re

pattern = re.compile(r'(0|1)*1(0|1)*1(0|1)*1(0|1)*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('1110'))   # True
print(match_string('101011')) # True
print(match_string('0110'))  # False


True
True
False


#8. Contains at least three consecutive 1s

Regular Expression: (0|1)*111(0|1)*

In [None]:
import re

pattern = re.compile(r'(0|1)*111(0|1)*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('111'))   # True
print(match_string('01111011'))# True
print(match_string('1001'))  # False


True
True
False


#9. Contains the substring 110

Regular Expression: (0|1)*110(0|1)*

In [None]:
import re

pattern = re.compile(r'(0|1)*110(0|1)*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('110'))   # True
print(match_string('0110'))  # True
print(match_string('10101')) # False


True
True
False


#10. Doesn't contain the substring 110

Regular Expression: (0|10)*1*

In [None]:
import re

pattern = re.compile(r'(0|10)*1*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('0'))    # True
print(match_string('01'))   # True
print(match_string('1111000'))  # False



True
True
False
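
A handful of test strings cannot show that a pattern is right for every input. One way to gain more confidence is to compare the pattern against a plain Python check on every binary string up to some length. The sketch below does this for (0|10)*1*; the helper binary_strings and the length limit of 10 are assumptions chosen for illustration.

```python
import re
from itertools import product

pattern = re.compile(r'(0|10)*1*')

def binary_strings(max_len):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product('01', repeat=n):
            yield ''.join(bits)

# Collect every string where the regex and the substring test disagree
mismatches = [
    s for s in binary_strings(10)
    if bool(pattern.fullmatch(s)) != ('110' not in s)
]
print(mismatches)
```

An empty list means the two checks agree on all 2047 strings tested. This is evidence for the pattern, not a proof.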


#11. Contains at least two 0s but not consecutive 0s

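Regular Expression: 1*0(1+0)+1* (here + is the "one or more" quantifier, so every pair of neighbouring 0s is separated by at least one 1)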

In [None]:
import re

pattern = re.compile(r'(1*0(1+0)+1*)')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('010'))  # True
print(match_string('10101'))  # True
print(match_string('1010011'))   # False


True
True
False


#12. Has at least 3 characters, and the third character is 0

Regular Expression: (0|1)(0|1)0(0|1)*

In [None]:
import re

pattern = re.compile(r'(0|1)(0|1)0(0|1)*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('100'))  # True
print(match_string('001'))  # False (third character is 1)
print(match_string('10011'))   # True (third character is 0)


True
False
True


#13. Number of 0s is a multiple of 3

Regular Expression: 1*|(1*01*01*01*)*

In [None]:
import re

# Each repetition of the group consumes exactly three 0s, with any number of 1s around them
pattern = re.compile(r'1*|(1*01*01*01*)*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('000'))    # True
print(match_string('000000')) # True
print(match_string('01'))     # False


True
True
False


#14. Starts and ends with the same character

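Regular Expression: 1(0|1)*1|0(0|1)*0

Note that this pattern needs at least two characters. If the single-character strings 0 and 1 should also count, add |0|1 to the end.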

In [None]:
import re

pattern = re.compile(r'1(0|1)*1|0(0|1)*0')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('101'))  # True
print(match_string('010'))  # True
print(match_string('11'))   # True
print(match_string('10'))   # False


True
True
True
False


#15. Odd length

Regular Expression: (0|1)((0|1)(0|1))*

In [None]:
import re

pattern = re.compile(r'(0|1)((0|1)(0|1))*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('1'))    # True
print(match_string('101'))  # True
print(match_string('1100')) # False


True
True
False


#16. Starts with 0 and has odd length, or starts with 1 and has even length

Regular Expression: 0((0|1)(0|1))*|1(0|1)((0|1)(0|1))*

In [None]:
import re

pattern = re.compile(r'0((0|1)(0|1))*|1(0|1)((0|1)(0|1))*')

def match_string(s):
    return bool(pattern.fullmatch(s))

# Test
print(match_string('0'))       # True
print(match_string('010'))     # True
print(match_string('1100'))    # True
print(match_string('111'))     # False (starts with 1 but has odd length)
print(match_string('1'))       # False


True
True
True
False
False


In [None]:
import re

# Regular expression pattern to match binary strings with an odd number of '1's
pattern = re.compile(r'(0*(10*10*)*10*)')

def has_odd_number_of_ones(s: str) -> bool:
    return bool(pattern.fullmatch(s))

# Test cases
for test in ['101', '1101', '1110', '1001', '0000', '1111','1']:
    print(f"'{test}': {has_odd_number_of_ones(test)}")


'101': False
'1101': True
'1110': True
'1001': False
'0000': False
'1111': False
'1': True


In [None]:
import re

# Regular expression pattern to match binary strings with an even number of '1's
pattern = re.compile(r'^(0*(10*10*)*0*)$')

def has_even_number_of_ones(s: str) -> bool:
    return bool(pattern.fullmatch(s))

# Test cases
for test in ['101', '1101', '1110', '1001', '0000', '1111', '1']:
    print(f"'{test}': {has_even_number_of_ones(test)}")


'101': True
'1101': False
'1110': False
'1001': True
'0000': True
'1111': True
'1': False


In [None]:
import re

# Regular expression pattern to match binary strings with an even number of '0's
# The leading 1* also accepts strings with no 0s at all (zero is an even count)
pattern = re.compile(r'1*(01*01*)*')

def has_even_number_of_zeros(s: str) -> bool:
    return bool(pattern.fullmatch(s))

# Test cases
for test in ['1010', '1100', '1001', '1111', '0000', '0101', '1', '0']:
    print(f"'{test}': {has_even_number_of_zeros(test)}")


'1010': True
'1100': True
'1001': True
'1111': True
'0000': True
'0101': True
'1': True
'0': False


In [None]:
import re

# Regular expression pattern to match binary strings with an even number of '0's and an odd number of '1's.
# Such strings have odd length: an even-length prefix read in pairs, then one final character.
pattern = re.compile(r'(00|11|(01|10)(00|11)*(01|10))*(1|(01|10)(00|11)*0)')

def has_even_zeros_odd_ones(s: str) -> bool:
    return bool(pattern.fullmatch(s))

# Test cases
for test in ['1010', '1101', '1110', '1001', '0000', '1111', '1', '100', '00111', '0']:
    print(f"'{test}': {has_even_zeros_odd_ones(test)}")


'1010': False
'1101': False
'1110': False
'1001': False
'0000': False
'1111': False
'1': True
'100': True
'00111': True
'0': False
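
Parity conditions like these can also be checked without a regular expression by simulating a small automaton that tracks two bits of state: whether the number of 0s seen so far is even, and whether the number of 1s is even. The sketch below is such a reference check under that assumption; the function name even_zeros_odd_ones is introduced here for illustration.

```python
def even_zeros_odd_ones(s: str) -> bool:
    """Accept binary strings with an even number of 0s and an odd number of 1s."""
    zeros_even, ones_even = True, True  # start state: nothing read yet
    for ch in s:
        if ch == '0':
            zeros_even = not zeros_even   # reading 0 flips the parity of 0s
        elif ch == '1':
            ones_even = not ones_even     # reading 1 flips the parity of 1s
        else:
            return False                  # not a binary string
    return zeros_even and not ones_even

checks = {s: even_zeros_odd_ones(s) for s in ['1', '0', '100', '1010', '00111', '12']}
for s, ok in checks.items():
    print(f"'{s}': {ok}")
```

A state-tracking function like this is easy to reason about, so it can serve as the reference when testing a candidate regular expression for the same set.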
