In [10]:
import re

## Metacharacters

| Character | Description                                      | Example         |
|-----------|--------------------------------------------------|-----------------|
| [ ]       | A set of characters                              | "[a-m]"         |
| \         | Signals a special sequence (also for escaping)   | "\d"            |
| .         | Any character (except newline character)         | "he.o"          |
| ^         | Starts with                                      | "^hello"        |
| $         | Ends with                                        | "planets$"      |
| *         | Zero or more occurrences                         | "he.*o"         |
| +         | One or more occurrences                          | "he.+o"         |
| ?         | Zero or one occurrence                           | "he.?o"         |
| { }       | Exactly the specified number of occurrences ({2}), or a range ({2,4}) | "he.{2}o"       |
| \|        | Either or                                        | "fall\|stays"   |
| ( )       | Capture and group                                | "(fall\|stays)" |
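Several rows of the table can be checked directly with `re.findall` and `re.search`. This is a minimal sketch: the sample strings are made up for illustration, and each expected result is shown as a comment.

```python
import re

# [ ]: a set of characters (here: any letter from a to m)
set_matches = re.findall(r"[a-m]", "hello")            # ['h', 'e', 'l', 'l']

# . : any single character except newline
dot_matches = re.findall(r"he.o", "hello helo")        # ['helo']

# ^ and $: anchor the pattern to the start / end of the string
starts = re.search(r"^hello", "hello planets") is not None   # True
ends = re.search(r"planets$", "hello planets") is not None   # True

# *, +, ?: zero-or-more, one-or-more, zero-or-one of the preceding token.
# "heo" has nothing between "he" and "o", so only + fails.
quantifiers = {pat: re.search(pat, "heo") is not None
               for pat in (r"he.*o", r"he.+o", r"he.?o")}
# {'he.*o': True, 'he.+o': False, 'he.?o': True}

# {2}: exactly two of the preceding token
exact = re.findall(r"he.{2}o", "hello heo")            # ['hello']

# | and ( ): alternation, with the group capturing whichever side matched
alt = re.findall(r"fall|stays", "it falls and it stays")     # ['fall', 'stays']
captured = re.search(r"(fall|stays)", "it stays").group(1)  # 'stays'

print(set_matches, dot_matches, starts, ends, quantifiers, exact, alt, captured)
```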


### Special Sequences 

| Character | Description                                                  | Example      |
|-----------|--------------------------------------------------------------|--------------|
| \A        | Matches only at the beginning of the string                  | r"\AThe"     |
| \b        | Matches at a word boundary, i.e. where the specified characters are at the beginning or at the end of a word | r"\bain" or r"ain\b" |
| \B        | Matches where there is NOT a word boundary, i.e. the specified characters are present but NOT at the beginning (or at the end) of a word | r"\Bain" or r"ain\B" |
| \d        | Matches any digit (0-9)                                      | r"\d"        |
| \D        | Matches any character that is NOT a digit                    | r"\D"        |
| \s        | Matches any whitespace character (space, tab, newline, ...)  | r"\s"        |
| \S        | Matches any character that is NOT whitespace                 | r"\S"        |
| \w        | Matches any word character: letters (a-z, A-Z), digits (0-9), and the underscore _ | r"\w"        |
| \W        | Matches any character that is NOT a word character           | r"\W"        |
| \Z        | Matches only at the end of the string                        | r"Spain\Z"   |
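The sketch below runs each special sequence against a made-up sentence. Python's `re` module writes the end-of-string anchor as `\Z`. Raw strings (`r"..."`) are used so that sequences like `\b` reach the regex engine instead of being read as a Python escape (backspace).

```python
import re

text = "The rain in Spain stays mainly in the plain 2024"

# \A / \Z: anchor to the very start / very end of the whole string
starts_with_the = re.search(r"\AThe", text) is not None   # True
ends_with_year = re.search(r"2024\Z", text) is not None    # True

# \b: "ain" must be followed by a word boundary (end of a word)
ain_at_end = re.findall(r"\w*ain\b", text)      # ['rain', 'Spain', 'plain']

# \B: "ain" must NOT be followed by a word boundary (inside a word)
ain_inside = re.findall(r"\w*ain\B\w*", text)   # ['mainly']

# \d / \D: digits vs. everything that is not a digit
digits = re.findall(r"\d+", text)               # ['2024']
only_digits = re.sub(r"\D", "", "a1b2c3")       # '123' (non-digits removed)

# \s / \S: whitespace vs. non-whitespace characters
spaces = re.findall(r"\s", "a b\tc")            # [' ', '\t']
non_spaces = re.findall(r"\S", "a b\tc")        # ['a', 'b', 'c']

# \w / \W: word characters (letters, digits, underscore) vs. the rest
words = re.findall(r"\w+", "snake_case, 42!")     # ['snake_case', '42']
non_words = re.findall(r"\W", "snake_case, 42!")  # [',', ' ', '!']
```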


### re.findall

> Returns a list of all non-overlapping matches, in the order they were found

Syntax: `re.findall(pattern, string, flags=0)`

`pattern`: The regular expression pattern you want to search for.

`string`: The string in which you want to search for the pattern.

`flags` (optional): Modifiers that change how the pattern is matched. Several flags can be combined with the bitwise OR operator (`|`). Common flags are `re.IGNORECASE` and `re.MULTILINE`.
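As a quick illustration of combining flags with `|`, the sketch below searches a made-up three-line string. Without flags, `^` only matches at the start of the whole string and matching is case-sensitive; `re.MULTILINE` makes `^` match at the start of every line, and `re.IGNORECASE` ignores case.

```python
import re

text = "Cat on the mat\ncat in the hat\nDog at the door"

# No flags: case-sensitive, and ^ only anchors at the start of the string
plain = re.findall(r"^cat", text)                                   # []

# Both flags combined with bitwise OR
combined = re.findall(r"^cat", text, re.IGNORECASE | re.MULTILINE)  # ['Cat', 'cat']
```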


In [1]:
import re

text = "The cat chased the catnip, but the other cat was too fast."
pattern = r'cat'
found_words = re.findall(pattern, text)

print(found_words)

['cat', 'cat', 'cat']
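Note that one of the three matches above comes from inside "catnip". The sketch below reuses the same sentence to show two related behaviours of `re.findall`: `\b` restricts matches to whole words, and when the pattern contains capture groups, `findall` returns the group text (or tuples of groups) instead of the whole match. The `key=value` string is a made-up example.

```python
import re

text = "The cat chased the catnip, but the other cat was too fast."

# \b on both sides keeps "catnip" out: only the standalone word matches
whole_words = re.findall(r"\bcat\b", text)         # ['cat', 'cat']

# One capture group: findall returns the group's text, not the full match
suffixes = re.findall(r"\bcat(\w*)", text)         # ['', 'nip', '']

# Two or more groups: each result is a tuple of all group texts
pairs = re.findall(r"(\w+)=(\d+)", "x=1, y=22")    # [('x', '1'), ('y', '22')]
```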


### re.search

> Returns a Match object for the first match found anywhere in the string, or `None` if there is no match

Syntax: `re.search(pattern, string, flags=0)`

`pattern`: The regular expression pattern you want to search for.

`string`: The string in which you want to search for the pattern.

`flags` (optional): Modifiers that change how the pattern is matched. Several flags can be combined with the bitwise OR operator (`|`). Common flags are `re.IGNORECASE` and `re.MULTILINE`.


In [8]:
import re

text = "The quick brown fox jumps over the lazy dog."
pattern = r'dog'


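# The walrus operator (:=, Python 3.8+) assigns and tests the result in one step
if match := re.search(pattern, text):
    print(match)                      # the Match object, with its span and matched text
    print("Found:", match.group())
else:
    print("No match found.")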

<re.Match object; span=(40, 43), match='dog'>
Found: dog
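Because `re.search` returns `None` when nothing matches, the result should be tested before calling methods on it. The sketch below reuses the sentence above and shows the position methods of a Match object. It also shows that `search` stops at the first match, and that numbered groups pull out its parts. The `cat` and `(\w+) (fox|dog)` patterns are illustrative choices.

```python
import re

text = "The quick brown fox jumps over the lazy dog."

# No match: search returns None
missing = re.search(r"cat", text)                  # None

# A Match object records where the match was found
m = re.search(r"dog", text)
location = (m.start(), m.end(), m.span())          # (40, 43, (40, 43))

# Only the FIRST match is returned; groups extract parts of it
m2 = re.search(r"(\w+) (fox|dog)", text)
parts = (m2.group(0), m2.group(1), m2.group(2))    # ('brown fox', 'brown', 'fox')
```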


### re.split

> Returns a list where the string has been split at each match

Syntax: `re.split(pattern, string, maxsplit=0, flags=0)`

`pattern`: The regular expression pattern to split the string on.

`string`: The string you want to split.

`maxsplit` (optional): The maximum number of splits to perform. The default, 0, means there is no limit.

`flags` (optional): Modifiers that change how the pattern is matched. Several flags can be combined with the bitwise OR operator (`|`). Common flags are `re.IGNORECASE` and `re.MULTILINE`.


In [9]:
import re

text = "The fox,quick,brown,dog"
pattern = r','
split_text = re.split(pattern, text)

print(split_text)

['The fox', 'quick', 'brown', 'dog']
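Splitting on a regular expression is most useful when there is more than one kind of separator. The sketch below uses a made-up string mixing commas, semicolons and spaces. It also shows two behaviours of `re.split`: `maxsplit` stops splitting early, and a capturing group keeps the separators in the result.

```python
import re

text = "The fox, quick;brown  dog"

# A character class plus + splits on any run of commas, semicolons, or whitespace
parts = re.split(r"[,;\s]+", text)                  # ['The', 'fox', 'quick', 'brown', 'dog']

# maxsplit=2: at most two splits; the rest stays in the last item
first_two = re.split(r"[,;\s]+", text, maxsplit=2)  # ['The', 'fox', 'quick;brown  dog']

# A capturing group around the separator keeps it in the output
with_seps = re.split(r"([,;])", "a,b;c")            # ['a', ',', 'b', ';', 'c']
```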


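### re.sub

> Returns a string where every match has been replaced with the replacement

Syntax: `re.sub(pattern, repl, string, count=0, flags=0)`

`pattern`: The regular expression pattern you want to search for.

`repl`: The replacement, either a string or a function that receives each Match object and returns a string.

`string`: The string in which matches are replaced.

`count` (optional): The maximum number of replacements. The default, 0, replaces every match.

`flags` (optional): Modifiers that change how the pattern is matched. Several flags can be combined with the bitwise OR operator (`|`).

In [None]: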
import re

text = "The cat chased the catnip, but the other cat was too fast."
pattern = r'cat'
replacement = 'dog'
new_text = re.sub(pattern, replacement, text)

print(new_text)
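With the plain pattern `cat`, "catnip" is also rewritten (to "dognip"). The sketch below reuses the same sentence to show `\b` limiting the replacement to whole words, the `count` parameter, and a replacement function. The `lambda` that upper-cases each match is just an illustrative choice.

```python
import re

text = "The cat chased the catnip, but the other cat was too fast."

# \b limits the replacement to the whole word "cat"
whole_word = re.sub(r"\bcat\b", "dog", text)
# 'The dog chased the catnip, but the other dog was too fast.'

# count=1: only the first match is replaced
first_only = re.sub(r"\bcat\b", "dog", text, count=1)
# 'The dog chased the catnip, but the other cat was too fast.'

# repl can be a function; it receives each Match object
shouted = re.sub(r"\bcat\w*", lambda m: m.group().upper(), text)
# 'The CAT chased the CATNIP, but the other CAT was too fast.'
```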

### Resources

- Lab 09
- PF8_Regular Expression PPT

## Practice Problems

Write a Python program to find all words that are at least 4 characters long in a string.

In [11]:
sentence = input("Enter a generic sentence ")
pattern = r'\b[a-zA-Z]{4,}\b'  # \b on both sides: only whole words of 4+ letters
words = re.findall(pattern, sentence)
print(words)

['hello', 'name', 'param']
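A pattern like `[a-zA-Z]{4,}` matches any run of four or more letters, even one that sits inside a longer token. The sketch below uses a hard-coded sentence in place of `input()`; the token `abc1defg` is made up to show the difference that `\b` makes.

```python
import re

sentence = "hello, my name is param and my id is abc1defg"

# Without \b, "defg" inside "abc1defg" also counts as a word
loose = re.findall(r"[a-zA-Z]{4,}", sentence)       # ['hello', 'name', 'param', 'defg']

# With \b, only whole words made entirely of 4+ letters match
strict = re.findall(r"\b[a-zA-Z]{4,}\b", sentence)  # ['hello', 'name', 'param']
```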


Check whether a string starts with "The" and ends with "walk".

In [27]:
sentence = input("Enter a generic sentence ")
pattern = r'\AThe.*walk\Z'  # \A = start of the string, \Z = end of the string
# re.IGNORECASE also accepts "the ... WALK"
if match := re.search(pattern, sentence, re.IGNORECASE):
    print("The generic sentence starts with `The` and ends with `walk`")
    print("Match group:", match.group())
else:
    print("The generic sentence does not both start with `The` and end with `walk`")

The generic sentence starts with `The` and ends with `walk`
Match group: the sentence ends with walk
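To try the pattern on several inputs without retyping them, the check can be wrapped in a small function. `starts_the_ends_walk` is a hypothetical helper name, and the test sentences are made up. The examples show the effect of `re.IGNORECASE`, and that `.` does not cross a newline unless `re.DOTALL` is added.

```python
import re

def starts_the_ends_walk(sentence, ignore_case=True):
    """Hypothetical helper: True if sentence starts with 'The' and ends with 'walk'."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(r"\AThe.*walk\Z", sentence, flags) is not None

print(starts_the_ends_walk("The dog went for a walk"))              # True
print(starts_the_ends_walk("the sentence ends with walk"))          # True (case ignored)
print(starts_the_ends_walk("the sentence ends with walk", False))   # False ("the" is lowercase)
print(starts_the_ends_walk("The walk was short"))                   # False (ends with "short")
print(starts_the_ends_walk("The dog went for a\nwalk"))             # False (. skips "\n")
```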


Replace every whitespace character with a hyphen (`-`).

In [15]:
sentence = input("Enter a generic sentence ")
pattern = r'\s'
replacement = "-"
sentence = re.sub(pattern, replacement, sentence)
print(sentence)

hello-world
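`\s` matches one whitespace character at a time, so a run of several spaces becomes several hyphens. Adding `+` treats each run as a single match. The sketch below uses a hard-coded string with repeated spaces and a tab in place of `input()`.

```python
import re

sentence = "hello   big\tworld"

each = re.sub(r"\s", "-", sentence)    # 'hello---big-world' (one hyphen per character)
runs = re.sub(r"\s+", "-", sentence)   # 'hello-big-world'   (one hyphen per run)
```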


Write a Python program to remove all whitespace from a string.

In [16]:
sentence = input("Enter a generic sentence ")
pattern = r'\s'
replacement = ""
sentence = re.sub(pattern, replacement, sentence)
print(sentence)

helloworld


Write a Python program to find all three-, four-, and five-character words in a string.

In [20]:
sentence = input("Enter a generic sentence ")
pattern = r"\b[a-zA-Z]{3,5}\b"
words = re.findall(pattern, sentence)
print(words)

['blah', 'blehh', 'bli']


Write a Python program to separate and print the numbers in a given string.

In [24]:
sentence = input("Enter a generic sentence ")
pattern = r"\d"
words = re.findall(pattern, sentence)
print(words)

['123', '23', '1']
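The difference between `\d` and `\d+` is easiest to see side by side. This sketch uses a hard-coded sentence in place of `input()`. It also converts the extracted strings to integers, since `re.findall` always returns strings.

```python
import re

sentence = "Order 123 shipped in 23 days, 1 left"

single_digits = re.findall(r"\d", sentence)   # ['1', '2', '3', '2', '3', '1']
numbers = re.findall(r"\d+", sentence)        # ['123', '23', '1']
as_ints = [int(n) for n in numbers]           # [123, 23, 1]
```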


Write a Python program to replace all occurrences of a space, comma, or dot with a colon.

In [26]:
sentence = input("Enter a generic sentence ")
pattern = r"(\s|\.|,)"
replacement = ":"
sentence = re.sub(pattern, replacement, sentence)
print(sentence)

hello:how:are:you:
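A character class is the usual way to match "one of these characters". The sketch below compares three variants on a made-up string containing a comma followed by a space, a tab, and a final dot. `[ ,.]` matches only a literal space, `[\s,.]` also accepts tabs and other whitespace, and adding `+` collapses a run like ", " into a single colon.

```python
import re

sentence = "hello, how are\tyou."

spaces_only = re.sub(r"[ ,.]", ":", sentence)      # 'hello::how:are\tyou:'
any_whitespace = re.sub(r"[\s,.]", ":", sentence)  # 'hello::how:are:you:'
collapsed = re.sub(r"[\s,.]+", ":", sentence)      # 'hello:how:are:you:'
```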
