
# Regular Expressions (Pattern-Matching Functions)

**re.match():**

*   Searches for a pattern only at the beginning of the string.
*   Returns a match object if found; otherwise, it returns None.

In [None]:
import re

text = "Hello world!"
pattern = r"Hello"

match = re.match(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match found")


Match found: Hello
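
To make the "only at the beginning" rule concrete, here is a minimal sketch. The sample strings and the names `late_match` and `multiline_match` are made up for illustration. It shows that re.match() returns None when the pattern occurs later in the string, and that the re.MULTILINE flag does not change this.

```python
import re

# "Hello" is present, but not at index 0, so re.match() finds nothing.
late_match = re.match(r"Hello", "Say Hello to the world!")
print(late_match)

# re.MULTILINE does not help here: re.match() still anchors at the start
# of the whole string, not at the start of each line.
multiline_match = re.match(r"Hello", "Say hi\nHello world!", flags=re.MULTILINE)
print(multiline_match)
```

Both calls print None, which is why the `if match:` check in the cell above matters before calling `.group()`.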


**re.search():**

*   Searches for a pattern anywhere in the string (not just at the start).
*   Returns a match object for the first occurrence found; otherwise, it returns None.



In [None]:
text = "Say Hello to the world!"
pattern = r"Hello"

search_result = re.search(pattern, text)
if search_result:
    print("Search found:", search_result.group())
else:
    print("No match found")


Search found: Hello
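
The "first occurrence" behavior can be checked with a string that contains the pattern twice. This is only a sketch; the sample sentence and the name `first_hit` are assumptions chosen for the example.

```python
import re

text = "Say Hello, then Hello again."

# re.search() stops at the first occurrence, even though "Hello" appears twice.
first_hit = re.search(r"Hello", text)

print(first_hit.group())  # the matched text
print(first_hit.span())   # (start, end) indices of the first occurrence only
```

To retrieve the later occurrences as well, re.findall() or re.finditer() (covered below) are the usual choices.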


**re.findall():**

*   Returns all non-overlapping matches of the pattern in the string as a list.
*   If no matches are found, it returns an empty list.



In [None]:
text = "Email addresses: alice@example.com, bob@example.org"
pattern = r"\b[\w.-]+@[\w.-]+\.\w+\b"

emails = re.findall(pattern, text)
print("All emails found:", emails)


All emails found: ['alice@example.com', 'bob@example.org']
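
One detail worth knowing is that the shape of the list returned by re.findall() depends on the capturing groups in the pattern. The sketch below reuses the email text from above with simplified patterns; the names `domains` and `pairs` are illustrative.

```python
import re

text = "Email addresses: alice@example.com, bob@example.org"

# One capturing group: each list item is the text of that group only.
domains = re.findall(r"[\w.-]+@([\w.-]+\.\w+)", text)
print(domains)

# Two capturing groups: each list item is a (user, domain) tuple.
pairs = re.findall(r"([\w.-]+)@([\w.-]+\.\w+)", text)
print(pairs)
```

With no groups, as in the cell above, each item is the whole match.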


**re.finditer():**

*   Similar to re.findall() but returns an iterator of match objects instead of a list.
*   Useful if you need more details (like position) about each match.



In [None]:
text = "The numbers are 123, 456, and 789"
pattern = r"\d+"

matches = re.finditer(pattern, text)
for match in matches:
    print("Match found:", match.group(), "at position", match.start())


Match found: 123 at position 16
Match found: 456 at position 21
Match found: 789 at position 30
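
Besides `start()`, each match object also exposes `end()` and `span()`. A small sketch using the same text (the names `spans` and `number` are assumptions for this example) shows how the positions map back to slices of the original string.

```python
import re

text = "The numbers are 123, 456, and 789"

spans = []
for number in re.finditer(r"\d+", text):
    start, end = number.span()
    # end is exclusive, so text[start:end] is exactly the matched substring.
    assert text[start:end] == number.group()
    spans.append((start, end))

print(spans)
```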


**re.fullmatch():**

*   Checks if the entire string matches the pattern (not just a part).
*   Returns a match object if the full string matches; otherwise, it returns None.



In [None]:
text = "Hello123"
pattern = r"\w+"

full_match = re.fullmatch(pattern, text)
if full_match:
    print("Full match:", full_match.group())
else:
    print("No full match found")


Full match: Hello123
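
The difference from re.match() shows up once the string has trailing characters that the pattern cannot consume. In this sketch the input `"Hello123!"` and the names `prefix_match` and `whole_match` are made up for illustration.

```python
import re

text = "Hello123!"
pattern = r"\w+"

# re.match() is satisfied by a matching prefix.
prefix_match = re.match(pattern, text)
print(prefix_match.group())

# re.fullmatch() requires the pattern to cover the whole string;
# "!" is not a word character, so there is no match.
whole_match = re.fullmatch(pattern, text)
print(whole_match)
```

This makes re.fullmatch() a natural fit for validating an entire input value.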


**re.sub():**

*   Replaces all matches of the pattern in the string with a specified replacement string.
*   Returns the modified string.



In [None]:
text = "Replace every number with 'NUM': 123 and 456"
pattern = r"\d+"

modified_text = re.sub(pattern, "NUM", text)
print("Modified text:", modified_text)


Modified text: Replace every number with 'NUM': NUM and NUM
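
re.sub() also accepts a `count` argument and a callable as the replacement. The following sketch reuses the text from above; the names `first_only` and `doubled` are illustrative.

```python
import re

text = "Replace every number with 'NUM': 123 and 456"

# count=1 limits the substitution to the first match.
first_only = re.sub(r"\d+", "NUM", text, count=1)
print(first_only)

# A callable receives each match object and returns the replacement string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)
```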


**re.subn():**

Works like re.sub(), but returns a tuple of two items:

*   The modified string.
*   The number of replacements made.



In [None]:
text = "Replace every number: 123 and 456"
pattern = r"\d+"

result = re.subn(pattern, "NUM", text)
print("Modified text:", result[0], "with", result[1], "replacements")


Modified text: Replace every number: NUM and NUM with 2 replacements
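
Because the result is a two-item tuple, it can be unpacked directly instead of indexed. A minimal sketch follows; the variable names and the second input string are assumptions for the example.

```python
import re

# Unpack the (new_string, number_of_substitutions) tuple.
new_text, n_replaced = re.subn(r"\d+", "NUM", "Replace every number: 123 and 456")
print(new_text, n_replaced)

# With no matches, the string comes back unchanged and the count is 0.
unchanged, n_none = re.subn(r"\d+", "NUM", "no digits here")
print(unchanged, n_none)
```

The count is handy for checking whether a substitution changed anything at all.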


**re.split():**

*   Splits the string by occurrences of the pattern, returning a list.
*   Useful for splitting text based on custom delimiters.



In [None]:
text = "Split on commas, semicolons; or spaces"
pattern = r"[,; ]+"

split_text = re.split(pattern, text)
print("Split text:", split_text)


Split text: ['Split', 'on', 'commas', 'semicolons', 'or', 'spaces']
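
Two further options of re.split() are often useful: `maxsplit` caps the number of splits, and a capturing group in the pattern keeps the delimiters in the result. In this sketch the names `limited` and `with_delims` and the short string `"a, b; c"` are illustrative.

```python
import re

text = "Split on commas, semicolons; or spaces"

# maxsplit=2 performs at most two splits; the rest of the string stays intact.
limited = re.split(r"[,; ]+", text, maxsplit=2)
print(limited)

# A capturing group around the delimiter includes it in the output list.
with_delims = re.split(r"([,;])\s*", "a, b; c")
print(with_delims)
```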
