# Regular Expressions in Python

Regular expressions (regex) are a powerful tool for matching patterns in text. Python provides the `re` module to work with regular expressions. Here are some common operations you can perform with regex in Python:

## Importing the `re` Module

To use regular expressions in Python, you need to import the `re` module:

```python
import re
```

## Basic Functions

### `re.search()`

Searches for the first occurrence of the pattern in the string.

```python
match = re.search(r'\d+', 'The price is 100 dollars')
if match:
    print(match.group())  # Output: 100
```
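
Besides `group()`, the match object returned by `re.search()` records where the match was found. When nothing matches, `re.search()` returns `None`, which is why the example above checks `if match:` first. A short sketch:

```python
import re

match = re.search(r'\d+', 'The price is 100 dollars')
print(match.group())               # 100
print(match.start(), match.end())  # 13 16
print(match.span())                # (13, 16)

# No digits in the string, so re.search() returns None
print(re.search(r'\d+', 'No numbers here'))  # None
```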

### `re.findall()`

Returns a list of all non-overlapping matches of the pattern in the string.

```python
matches = re.findall(r'\d+', 'There are 3 cats, 4 dogs, and 5 birds')
print(matches)  # Output: ['3', '4', '5']
```
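
When the pattern contains capturing groups, `re.findall()` returns the group contents instead of the whole match: a list of strings for one group, or a list of tuples for several. `re.finditer()` yields match objects instead, which helps when you also need positions. A minimal sketch:

```python
import re

text = 'There are 3 cats, 4 dogs, and 5 birds'

# Two capturing groups -> a list of (count, animal) tuples
pairs = re.findall(r'(\d+) (\w+)', text)
print(pairs)  # [('3', 'cats'), ('4', 'dogs'), ('5', 'birds')]

# re.finditer() yields one match object per match
for m in re.finditer(r'\d+', text):
    print(m.group(), m.start())  # 3 10, then 4 18, then 5 30
```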

### `re.sub()`

Replaces occurrences of the pattern with a replacement string.

```python
result = re.sub(r'\d+', 'number', 'There are 3 cats, 4 dogs, and 5 birds')
print(result)  # Output: There are number cats, number dogs, and number birds
```
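
The replacement does not have to be a fixed string. It can refer back to captured groups with `\1`, or it can be a function that receives each match object. The `count` argument limits how many replacements are made, and `re.subn()` also reports the number of substitutions. A short sketch:

```python
import re

text = 'There are 3 cats, 4 dogs, and 5 birds'

# Backreference: \1 inserts whatever group 1 matched
print(re.sub(r'(\d+)', r'<\1>', text))
# There are <3> cats, <4> dogs, and <5> birds

# A function receives each match object and returns the replacement text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)
print(doubled)  # There are 6 cats, 8 dogs, and 10 birds

# count limits the number of replacements
print(re.sub(r'\d+', 'number', text, count=1))
# There are number cats, 4 dogs, and 5 birds

# re.subn() returns (new_string, number_of_replacements)
print(re.subn(r'\d+', 'number', text))
# ('There are number cats, number dogs, and number birds', 3)
```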

### `re.split()`

Splits the string by occurrences of the pattern.

```python
result = re.split(r'\s+', 'Split this sentence into words')
print(result)  # Output: ['Split', 'this', 'sentence', 'into', 'words']
```
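
`re.split()` also accepts a `maxsplit` limit, and a character class lets one pattern split on several different separators. If the pattern contains a capturing group, the separators themselves are kept in the result:

```python
import re

sentence = 'Split this sentence into words'

# maxsplit stops after the given number of splits
print(re.split(r'\s+', sentence, maxsplit=2))
# ['Split', 'this', 'sentence into words']

# A character class splits on either ',' or ';' (plus any following spaces)
print(re.split(r'[,;]\s*', 'red, green;blue'))
# ['red', 'green', 'blue']

# A capturing group keeps the separators in the output
print(re.split(r'([,;])', 'red,green;blue'))
# ['red', ',', 'green', ';', 'blue']
```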

## Special Characters

- `.`: Matches any character except a newline (with `re.DOTALL`, it matches newlines too).
- `^`: Matches the start of the string (with `re.MULTILINE`, also the start of each line).
- `$`: Matches the end of the string or just before a trailing newline (with `re.MULTILINE`, also the end of each line).
- `*`: Matches 0 or more repetitions of the preceding pattern.
- `+`: Matches 1 or more repetitions of the preceding pattern.
- `?`: Matches 0 or 1 repetition of the preceding pattern.
- `{m,n}`: Matches between m and n repetitions (inclusive) of the preceding pattern.
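
By default `^` and `$` anchor to the whole string and `.` stops at newlines. The `re.MULTILINE` and `re.DOTALL` flags change that, as this small multi-line example shows:

```python
import re

text = "first line\nsecond line"

# Without flags, ^ and $ match only at the start and end of the whole string
print(re.findall(r'^\w+', text))  # ['first']
print(re.findall(r'\w+$', text))  # ['line']

# re.MULTILINE makes ^ and $ match at every line boundary
print(re.findall(r'^\w+', text, re.MULTILINE))  # ['first', 'second']
print(re.findall(r'\w+$', text, re.MULTILINE))  # ['line', 'line']

# re.DOTALL lets . match a newline as well
print(re.findall(r'line.second', text))             # []
print(re.findall(r'line.second', text, re.DOTALL))  # ['line\nsecond']
```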

## Character Classes


- `\d`: Matches any digit.
- `\D`: Matches any non-digit.
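- `\w`: Matches any word character: letters, digits, and the underscore (Unicode letters and digits are included by default).
- `\W`: Matches any non-word character.
- `\s`: Matches any whitespace character.
- `\S`: Matches any non-whitespace character.
- `\b`: Matches a word boundary, the empty position between a word character and a non-word character (or the start or end of the string).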
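
The difference between "alphanumeric" and "word character" matters in practice: `\w` also matches the underscore, and for `str` patterns it is Unicode-aware unless `re.ASCII` is passed. `\b` consumes no characters; it only checks that a word starts or ends at that position:

```python
import re

# \w matches letters, digits, and the underscore
print(re.findall(r'\w+', 'hello_123 world!'))  # ['hello_123', 'world']

# For str patterns, \w is Unicode-aware by default; re.ASCII restricts it
print(re.findall(r'\w+', 'café'))            # ['café']
print(re.findall(r'\w+', 'café', re.ASCII))  # ['caf']

# \b matches only at word boundaries, so "cat" inside "concatenate" is skipped
print(re.findall(r'\bcat\b', 'cat concatenate cat.'))  # ['cat', 'cat']
```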

Regular expressions are a versatile tool for text processing and can be used for tasks such as validation, parsing, and string manipulation.
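
As a small validation and parsing sketch, the code below checks whether a string has the shape `YYYY-MM-DD` and extracts its parts with named groups `(?P<name>...)`. `DATE_RE` is an illustrative name, and the pattern checks only the digit layout, not whether the date exists on the calendar. `re.compile()` is used because the same pattern is applied to several strings:

```python
import re

# Compile once and reuse; (?P<name>...) defines a named group
DATE_RE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

candidates = ['2024-05-17', '2024-5-17', '2024-05-17T10:00', '2024-13-99']
for candidate in candidates:
    m = DATE_RE.fullmatch(candidate)  # the whole string must match
    if m:
        print(candidate, '->', m.groupdict())
    else:
        print(candidate, '-> invalid format')
```

Note that `'2024-13-99'` is accepted because only the layout is checked; real date validation needs extra logic, for example via the `datetime` module.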


### Example of `re.match()`

The `re.match()` function attempts to match a pattern at the beginning of a string. If the pattern is found at the start of the string, it returns a match object; otherwise, it returns `None`.

#### Example

```python
import re

pattern = r'\d+'
string = '123abc456'

match = re.match(pattern, string)
if match:
    print('Match found:', match.group())  # Output: Match found: 123
else:
    print('No match')
```
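
The three matching functions differ only in how much of the string the pattern must cover: `re.match()` anchors at the start, `re.search()` scans anywhere, and `re.fullmatch()` requires the entire string to match:

```python
import re

text = 'abc123'

print(re.match(r'\d+', text))              # None: text does not start with a digit
print(re.search(r'\d+', text).group())     # 123: search scans the whole string
print(re.fullmatch(r'\d+', text))          # None: the entire string must be digits
print(re.fullmatch(r'\w+', text).group())  # abc123
```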


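Flags such as `re.IGNORECASE` change how a pattern is matched, and several flags can be combined with `|`:

```python
import re

string = "The quick brown fox jumps over the lazy dog"
pattern = "quick"

# Search for the pattern (case-sensitive by default)
match = re.search(pattern, string)
# re.IGNORECASE makes the search case-insensitive
match1 = re.search(pattern, string, re.IGNORECASE)
# Flags combine with |; re.MULTILINE only affects ^ and $, so it changes nothing here
match2 = re.search(pattern, string, re.IGNORECASE | re.MULTILINE)
print(match.group(), match1.group(), match2.group())  # quick quick quick

# IGNORECASE matters when the case differs
print(re.search("QUICK", string))                         # None
print(re.search("QUICK", string, re.IGNORECASE).group())  # quick
```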

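The examples below demonstrate each special character with `re.findall()`:

```python
import re

# Dot (.) matches any single character, including a literal '.'
pattern = r"a.b"
text = "acb aab a.b"
matches = re.findall(pattern, text)
print("Dot (.) matches:", matches)  # Output: ['acb', 'aab', 'a.b']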

# Caret (^)
pattern = r"^Hello"
text = "Hello world! Hello again!"
matches = re.findall(pattern, text)
print("Caret (^) matches:", matches)  # Output: ['Hello']

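# Dollar ($) anchors to the end of the string, so only the final "again!" matches
# (r"world!$" would return [] here because "world!" is not at the end)
pattern = r"again!$"
text = "Hello world! Hello again!"
matches = re.findall(pattern, text)
print("Dollar ($) matches:", matches)  # Output: ['again!']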

# Asterisk (*)
pattern = r"ab*"
text = "a ab abb abbb"
matches = re.findall(pattern, text)
print("Asterisk (*) matches:", matches)  # Output: ['a', 'ab', 'abb', 'abbb']

# Plus (+)
pattern = r"ab+"
text = "a ab abb abbb"
matches = re.findall(pattern, text)
print("Plus (+) matches:", matches)  # Output: ['ab', 'abb', 'abbb']

# Question Mark (?)
pattern = r"ab?"
text = "a ab abb abbb"
matches = re.findall(pattern, text)
print("Question Mark (?) matches:", matches)  # Output: ['a', 'ab', 'ab', 'ab']

# Braces ({}): b{2,3} matches two or three b's; "abbbb" yields "abbb"
pattern = r"ab{2,3}"
text = "a ab abb abbb abbbb"
matches = re.findall(pattern, text)
print("Braces ({}) matches:", matches)  # Output: ['abb', 'abbb', 'abbb']

# Square Brackets ([])
pattern = r"[aeiou]"
text = "hello world"
matches = re.findall(pattern, text)
print("Square Brackets ([]) matches:", matches)  # Output: ['e', 'o', 'o']

# Backslash (\)
pattern = r"\d"
text = "There are 2 apples and 5 oranges."
matches = re.findall(pattern, text)
print("Backslash (\\) matches:", matches)  # Output: ['2', '5']

# Pipe (|)
pattern = r"cat|dog"
text = "I have a cat and a dog."
matches = re.findall(pattern, text)
print("Pipe (|) matches:", matches)  # Output: ['cat', 'dog']

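# Parentheses (()) group a sub-pattern; with a capturing group,
# findall returns only the group's last repetition, not the whole match
pattern = r"(ab)+"
text = "abab ab ababab"
matches = re.findall(pattern, text)
print("Parentheses (()) matches:", matches)  # Output: ['ab', 'ab', 'ab']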

````
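
In the parentheses example, `re.findall()` returns only what the group captured (its last repetition), not the whole match. To get the full matches, use a non-capturing group `(?:...)`, or read `group(0)` from each `re.finditer()` match:

```python
import re

text = "abab ab ababab"

# (ab)+ captures only the last repetition of the group
print(re.findall(r"(ab)+", text))    # ['ab', 'ab', 'ab']

# (?:...) groups without capturing, so findall returns whole matches
print(re.findall(r"(?:ab)+", text))  # ['abab', 'ab', 'ababab']

# group(0) is always the entire match, even with capturing groups
print([m.group(0) for m in re.finditer(r"(ab)+", text)])  # ['abab', 'ab', 'ababab']
```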

## Common RegEx Patterns

| Pattern | Description | Example |
|---|---|---|
| `\d` | Matches any digit (0-9) | `\d` finds "1", "2", "3" in "abc123" |
| `\D` | Matches any non-digit | `\D` finds "a", "b", "c" in "abc123" |
| `\w` | Matches any word character (A-Z, a-z, 0-9, _) | `\w+` matches "hello_123" |
| `\W` | Matches any non-word character | `\W` finds "!", " ", "?" in "hello! world?" |
| `\s` | Matches any whitespace character (space, tab, newline) | `\s` finds " " in "Hello World" |
| `\S` | Matches any non-whitespace character | `\S+` finds "Hello" and "World" in "Hello World" |
| `.` | Matches any character except a newline | `.` finds "a", "b", "c" in "abc" |
| `^` | Matches the start of the string | `^Hello` matches at the start of "Hello world" |
| `$` | Matches the end of the string | `!$` matches the final "!" in "World!" |
| `*` | Matches 0 or more occurrences | `ba*` matches "b", "ba", "baaa" |
| `+` | Matches 1 or more occurrences | `ba+` matches "ba", "baa" but not "b" |
| `?` | Matches 0 or 1 occurrence | `ba?` matches "b", "ba" |
| `{n}` | Matches exactly n occurrences | `\d{3}` matches "123" |
| `{n,}` | Matches at least n occurrences | `\d{2,}` matches "12", "123", "1234" |
| `{n,m}` | Matches between n and m occurrences | `\d{2,4}` matches "12", "123", "1234" |
| `[...]` | Matches any character inside the brackets | `[aeiou]` matches "a", "e", "i" |
| `[^...]` | Matches any character not inside the brackets | `[^aeiou]` matches any character except a lowercase vowel |
| `(x\|y)` | Matches x or y | `(cat\|dog)` matches "cat" or "dog" |
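
A few rows of the table can be checked directly with `re.findall()`. Note that a bare `\d` matches one digit at a time, and a quantifier applies only to the single item just before it:

```python
import re

print(re.findall(r'\d', 'abc123'))               # ['1', '2', '3']
print(re.findall(r'\d+', 'abc123'))              # ['123']
print(re.findall(r'\d{2,}', 'a1 b12 c123'))      # ['12', '123']
print(re.findall(r'[^aeiou ]', 'hi you'))        # ['h', 'y']
print(re.findall(r'ba*', 'b ba baaa'))           # ['b', 'ba', 'baaa']
print(re.findall(r'cat|dog', 'hotdog catalog'))  # ['dog', 'cat']
```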

## Greedy and Non-Greedy Expressions

### Greedy Expressions

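Greedy quantifiers (`*`, `+`, `?`, `{m,n}`) try to match as much text as possible: they expand the match as far as they can while still allowing the overall pattern to match. Adding `?` after a quantifier (`*?`, `+?`, `??`, `{m,n}?`) makes it non-greedy (lazy), so it matches as little text as possible instead.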

#### Example


```python
import re

# Greedy expression example: .* runs as far as possible, up to the last '>'
greedy_pattern = r'<.*>'
text = '<div>Some content</div><div>More content</div>'
greedy_matches = re.findall(greedy_pattern, text)
print("Greedy matches:", greedy_matches)  # Output: ['<div>Some content</div><div>More content</div>']

# Non-greedy expression example: .*? stops at the first '>'
non_greedy_pattern = r'<.*?>'
non_greedy_matches = re.findall(non_greedy_pattern, text)
print("Non-greedy matches:", non_greedy_matches)  # Output: ['<div>', '</div>', '<div>', '</div>']
```
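
A negated character class is a common alternative to the lazy `.*?` for this kind of input: `[^>]*` simply cannot run past a `>`. The lazy forms of the other quantifiers behave the same way, taking the smallest count allowed:

```python
import re

text = '<div>Some content</div><div>More content</div>'

# [^>]* stops at the first '>', giving the same tags without laziness
print(re.findall(r'<[^>]*>', text))  # ['<div>', '</div>', '<div>', '</div>']

# Lazy versions of other quantifiers
print(re.match(r'a{2,4}', 'aaaaa').group())   # aaaa (greedy: as many as allowed)
print(re.match(r'a{2,4}?', 'aaaaa').group())  # aa (lazy: as few as allowed)
print(re.match(r'a+?', 'aaaaa').group())      # a
```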

The following example splits a list of names into two groups by first letter:

```python
import re

# List of class names
class_names = ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "George",
               "Nancy", "Oscar", "Paul", "Quincy", "Rachel", "Steve", "Tom", "Zara"]

# Regex patterns (each range lists both upper- and lowercase letters)
pattern_group1 = r"^[A-Ma-m]"  # Names starting with A-M, either case
pattern_group2 = r"^[N-Zn-z]"  # Names starting with N-Z, either case

# Divide names using regex; re.match() checks only the start of each name
group1 = [name for name in class_names if re.match(pattern_group1, name)]
group2 = [name for name in class_names if re.match(pattern_group2, name)]

# Print the groups
print("Group 1 (A-M):", group1)
print("Group 2 (N-Z):", group2)
```
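
Because `re.match()` already anchors at the start of the string, the `^` in these patterns is optional. An alternative sketch uses `re.IGNORECASE` so each letter range only needs to be written once; it produces the same two groups:

```python
import re

class_names = ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "George",
               "Nancy", "Oscar", "Paul", "Quincy", "Rachel", "Steve", "Tom", "Zara"]

# re.IGNORECASE lets [a-m] also match uppercase A-M
group1 = [name for name in class_names if re.match(r"[a-m]", name, re.IGNORECASE)]
group2 = [name for name in class_names if re.match(r"[n-z]", name, re.IGNORECASE)]

print("Group 1 (A-M):", group1)
# Group 1 (A-M): ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'George']
print("Group 2 (N-Z):", group2)
# Group 2 (N-Z): ['Nancy', 'Oscar', 'Paul', 'Quincy', 'Rachel', 'Steve', 'Tom', 'Zara']
```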

The next example uses negative lookahead assertions `(?!...)` to reject strings by how they start and end:

```python
# Regular expressions for strings that do not start with 'b', 'c', or 'd'
# and do not end with "rst"
import re

# Both negative lookaheads are checked at the start of the string:
#   (?![bcd])   -> the first character is not b, c, or d
#   (?!.*rst$)  -> the string does not end with "rst"
pattern = r"^(?![bcd])(?!.*rst$)"

# A naive alternative: it only accepts strings of exactly 5 characters and
# rejects any string whose last character is r, s, or t (not just "rst")
pattern2 = r"^[^bcd]...[^r-t]$"

# Test strings
test_strings = [
    "apple",      # Should match
    "banana",     # Should not match (starts with 'b')
    "cherry",     # Should not match (starts with 'c')
    "date",       # Should not match (starts with 'd')
    "forest",     # Should match (ends with 'est', not 'rst')
    "first",      # Should not match (ends with 'rst')
    "grape",      # Should match
    "mango",      # Should match
    "orange",     # Should match
    "pqrst",      # Should not match (ends with 'rst')
    "strawberry", # Should match
]

# Check each string against both patterns
for name, p in [("pattern", pattern), ("pattern2", pattern2)]:
    print(f"Results for {name}:")
    for string in test_strings:
        if re.match(p, string):
            print(f"'{string}' matches the pattern")
        else:
            print(f"'{string}' does not match the pattern")
    print()
```
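
Lookarounds are zero-width: they test the text next to the current position without consuming it. Python supports positive and negative lookahead (`(?=...)`, `(?!...)`) and lookbehind (`(?<=...)`, `(?<!...)`). Their position in the pattern matters: a negative lookahead placed after a greedy `.*` only sees the empty end of the string, so it never rejects anything. A short sketch:

```python
import re

# Positive lookahead: words followed by '!', without consuming the '!'
print(re.findall(r'\w+(?=!)', 'Hi! there wow!'))     # ['Hi', 'wow']

# Positive lookbehind: digits preceded by '$'
print(re.findall(r'(?<=\$)\d+', 'cost $30 or $45'))  # ['30', '45']

# After .*, the lookahead runs at the end of the string and cannot fail
print(bool(re.match(r'^(?![bcd]).*(?!.*rst$)', 'first')))  # True
# Checked at the start, it sees the whole string and rejects "first"
print(bool(re.match(r'^(?![bcd])(?!.*rst$)', 'first')))    # False
```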

The next example validates decimal numbers with `re.fullmatch()`, which succeeds only when the entire string matches:

```python
import re

# Regular expression pattern for decimal numbers: an optional sign, then
# either digits with an optional fractional part ("123", "123.456") or a
# fraction with no leading digits (".456"). Requiring at least one digit
# stops the pattern from matching "", "+" or "-"; a trailing decimal point
# ("123.") is not accepted.
pattern = r"[-+]?(?:\d+(?:\.\d+)?|\.\d+)"

# Draft pattern (not used below): digits, optional whitespace,
# an optional '.', ',' or '/', then more digits
pattern2 = r"\d+\s?[.,/]?\d+"

# Test strings
test_strings = [
    "123",          # Integer
    "-123",         # Negative integer
    "+123",         # Positive integer
    "123.456",      # Floating-point number
    "-123.456",     # Negative floating-point number
    "+123.456",     # Positive floating-point number
    ".456",         # Floating-point number without leading digits
    "-.456",        # Negative floating-point number without leading digits
    "+.456",        # Positive floating-point number without leading digits
    "abc",          # Not a number
    "123abc",       # Not a number
    "123.",         # Integer with trailing decimal point
    "-123.",        # Negative integer with trailing decimal point
    "+123.",        # Positive integer with trailing decimal point
    "1 3/4",        # Mixed fraction
    "1,255,2,5,7",  # Comma-separated digits
]

# Check each string against the pattern
for string in test_strings:
    if re.fullmatch(pattern, string):
        print(f"'{string}' matches the pattern")
    else:
        print(f"'{string}' does not match the pattern")
```
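
A number pattern built only from non-capturing groups `(?:...)` can also pull numbers out of running text with `re.findall()`; with capturing groups, `re.findall()` would return the group contents instead of the whole number. This is a sketch under simple assumptions: `number_re` is an illustrative name, and the pattern ignores exponents (`1e5`) and thousands separators:

```python
import re

# Optional sign, then digits with an optional fraction, or a bare fraction
number_re = re.compile(r"[-+]?(?:\d+(?:\.\d+)?|\.\d+)")

text = "Temperatures ranged from -3.5 to +12, averaging .75 degrees"
numbers = [float(s) for s in number_re.findall(text)]
print(numbers)  # [-3.5, 12.0, 0.75]
```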

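Non-greedy matching also works with `re.match()`, which anchors the match at the start of the string:

```python
import re

s = "aditya gupta"
p1 = r"^a\w+?a$"  # 'a', word characters (lazy), 'a', then end of string
p2 = r"^a.*?a"    # 'a', as few characters as possible, then the next 'a'

# \w cannot match the space, so p1 never reaches the end of the string
print(re.match(p1, s))  # None

# The lazy .*? stops at the first 'a' after the start, so p2 finds "aditya"
matches = re.match(p2, s)
print(matches)  # <re.Match object; span=(0, 6), match='aditya'>
```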
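
A non-greedy quantifier matches as little as possible, but it still expands when the rest of the pattern requires more. With `re.fullmatch()`, the same lazy pattern has to reach the end of the string, so it grows to cover all of it:

```python
import re

s = "aditya gupta"

print(re.match(r"a.*?a", s).group())      # aditya
print(re.fullmatch(r"a.*?a", s).group())  # aditya gupta
print(re.findall(r"a\w*?a", s))           # ['aditya']
```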
