## Regular Expressions (RE)

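- A regular expression is a pattern that describes a set of strings to match.
- Write patterns as raw strings, `r"..."`, so that backslashes reach the `re` module unchanged instead of being interpreted as Python string escapes.
- Use square brackets for character choices, e.g., `[Aa]` matches `A` or `a`.
- `|` separates alternatives, like `[Aa]pple|[Bb]anana`.
- Legal variable names pattern: `r"[A-Za-z_][A-Za-z_0-9]*\Z"` (`\Z` matches only at the very end of the string).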
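
As a minimal sketch of the variable-name pattern in use (the helper name `is_legal_name` is hypothetical, not something from these notes), the check below relies on `re.match` anchoring at the start and `\Z` anchoring at the end. It also shows why the raw-string prefix matters.

```python
import re

# Pattern from the notes: a letter or underscore, then letters, digits, or underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*\Z")

def is_legal_name(candidate):
    """Hypothetical helper: True if candidate matches the variable-name pattern."""
    # re.match anchors at the start; \Z in the pattern anchors at the end.
    return NAME_RE.match(candidate) is not None

print(is_legal_name("total_2"))    # True
print(is_legal_name("2nd_total"))  # False: starts with a digit
print(is_legal_name("my-var"))     # False: '-' is not allowed

# Without the r prefix, "\n" is one newline character; with it, a backslash and an 'n'.
print(len("\n"), len(r"\n"))       # 1 2
```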
## Common RE Notations

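- `.` matches any single character except a newline (unless `re.DOTALL` is set).
- `[...]` matches any one of the characters in the brackets.
- `[^...]` matches any one character not in the brackets.
- `^` matches at the start of the string.
- `$` matches at the end of the string.
- `*` matches zero or more repetitions of the preceding item.
- `+` matches one or more repetitions of the preceding item.
- `{m,n}` matches from m to n repetitions of the preceding item.
- `?` matches zero or one occurrence of the preceding item.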
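
The notations above can be tried one at a time. This is a small sketch on a made-up sample string; the variable names are illustrative only.

```python
import re

text = "cat cot coat ct c9t"  # made-up sample text

any_char   = re.findall(r"c.t", text)           # .      any one character
in_set     = re.findall(r"c[ao]t", text)        # [...]  one of the listed characters
not_in_set = re.findall(r"c[^a-z]t", text)      # [^...] one character that is not listed
zero_more  = re.findall(r"co*t", text)          # *      zero or more 'o'
one_more   = re.findall(r"co+t", text)          # +      one or more 'o'
optional   = re.findall(r"co?t", text)          # ?      zero or one 'o'
counted    = re.findall(r"c[a-z]{1,2}t", text)  # {m,n}  one or two lowercase letters
at_start   = re.findall(r"^cat", text)          # ^      only at the start of the string
at_end     = re.findall(r"c9t$", text)          # $      only at the end of the string

print(any_char)    # ['cat', 'cot', 'c9t']
print(in_set)      # ['cat', 'cot']
print(not_in_set)  # ['c9t']
print(zero_more)   # ['cot', 'ct']
print(one_more)    # ['cot']
print(optional)    # ['cot', 'ct']
print(counted)     # ['cat', 'cot', 'coat']
print(at_start)    # ['cat']
print(at_end)      # ['c9t']
```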
## Grouping and Shorthand

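- Group parts of a pattern with parentheses; each group also captures the text it matched.
- Backreferences with `\number` match the same text that group `number` matched earlier.
- Shorthand notations: `\d` (digit), `\D` (non-digit), `\s` (whitespace), `\S` (non-whitespace), `\w` (word character), `\W` (non-word character).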
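
A short sketch of grouping, backreferences, and the shorthand classes, using made-up sample strings: the pattern `\b(\w+) \1\b` finds a word that is immediately repeated.

```python
import re

sentence = "this is is a test of the the parser"  # made-up sample text

# (\w+) captures a word; \1 must match that same word again.
doubled = re.findall(r"\b(\w+) \1\b", sentence)
print(doubled)  # ['is', 'the']

# The same backreference works in the replacement string of re.sub.
cleaned = re.sub(r"\b(\w+) \1\b", r"\1", sentence)
print(cleaned)  # this is a test of the parser

# Shorthand classes: \d digits, \s whitespace, \w word characters.
numbers = re.findall(r"\d+", "room 101, floor 3")
fields = re.split(r"\s+", "a  b\tc")
words = re.findall(r"\w+", "hi, there!")
print(numbers)  # ['101', '3']
print(fields)   # ['a', 'b', 'c']
print(words)    # ['hi', 'there']
```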
## Matching Functions

- `re.match` for the start of a string.
- `re.search` for matching anywhere.
- `re.findall` finds all occurrences.
- `re.finditer` returns an iterator of match objects.
- `re.sub` replaces pattern matches.
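
To compare the five functions side by side, here is a brief sketch that runs the same pattern through each of them on a made-up string.

```python
import re

text = "order 66 shipped, order 7 pending"  # made-up sample text

at_start = re.match(r"\d+", text)         # None: the string does not start with a digit
first = re.search(r"\d+", text).group()   # first match anywhere in the string
everything = re.findall(r"\d+", text)     # all matches, as a list of strings
spans = [mo.span() for mo in re.finditer(r"\d+", text)]  # match objects carry positions
masked = re.sub(r"\d+", "#", text)        # replace every match

print(at_start)    # None
print(first)       # 66
print(everything)  # ['66', '7']
print(spans)       # [(6, 8), (24, 25)]
print(masked)      # order # shipped, order # pending
```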
## Pattern Compilation

- Precompile patterns for efficiency with `re.compile`.
- Flags like `re.IGNORECASE`, `re.MULTILINE`, and `re.DOTALL` modify matching behavior.
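
The effect of each flag is easiest to see on a two-line string. The sketch below uses a made-up sample and shows one flag at a time, then a compiled pattern being reused.

```python
import re

text = "Hello\nhello world"  # made-up two-line sample

# re.IGNORECASE: letters match regardless of case.
both_cases = re.findall(r"hello", text, re.IGNORECASE)

# re.MULTILINE: ^ also matches right after each newline.
one_line = re.findall(r"^hello", text, re.IGNORECASE)
each_line = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)

# re.DOTALL: . also matches the newline character.
without_dotall = re.search(r"Hello.hello", text)
with_dotall = re.search(r"Hello.hello", text, re.DOTALL)

# A compiled pattern keeps its flags and can be reused many times.
hello_re = re.compile(r"hello", re.IGNORECASE)
count = len(hello_re.findall(text))

print(both_cases)           # ['Hello', 'hello']
print(one_line)             # ['Hello']
print(each_line)            # ['Hello', 'hello']
print(without_dotall)       # None
print(with_dotall.group() == "Hello\nhello")  # True
print(count)                # 2
```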
## Match Objects

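- Returned by `re.match` and `re.search` on success; both return `None` when nothing matches.
- Access the matched substring and groups with methods such as `group()`, `groups()`, `start()`, `end()`, and `span()`.
- A match object is always truthy, so test the result with `if mo:` or convert it with `bool(mo)`.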
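
As a sketch of the accessor methods, the snippet below reuses the search from the match-object cell later in these notes and pulls out the whole match, the individual groups, and their positions.

```python
import re

mo = re.search(r"\d+ (\d+) \d+ (\d+)", "first 123 45 67 890 last")

if mo:  # None (falsy) would mean no match
    whole = mo.group(0)       # the entire matched substring
    second = mo.group(1)      # text captured by the first pair of parentheses
    fourth = mo.group(2)      # text captured by the second pair
    captured = mo.groups()    # all captured groups as a tuple
    position = mo.span()      # (start, end) of the whole match
    group1_pos = (mo.start(1), mo.end(1))  # position of group 1

    print(whole)       # 123 45 67 890
    print(second)      # 45
    print(fourth)      # 890
    print(captured)    # ('45', '890')
    print(position)    # (6, 19)
    print(group1_pos)  # (10, 12)
```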

In [1]:
# Match and search functions
import re
s = "Doing things, going home, staying awake, sleeping later"
re.findall(r'\w+ing\b', s)

['Doing', 'going', 'staying', 'sleeping']

In [2]:
re.findall(r'[+-]?\d+', "23 + -24 = -1")

['23', '-24', '-1']

In [4]:
s = ("if I'm not in a hurry, then I should stay. " + " On the other hand, if I leave, then I can sleep.")
# Greedy matching (.*) tries to match as many characters as possible
re.findall(r'[Ii]f (.*), then', s)

["I'm not in a hurry, then I should stay.  On the other hand, if I leave"]

In [5]:
"""
The repetition specifiers +, *, ?, and {m,n} have corresponding non-greedy versions: +?, *?, ??, and {m,n}?. 
These expressions use as few characters as possible to make the whole pattern match some substring. 
"""
s = ("if I'm not in a hurry, then I should stay. " + " On the other hand, if I leave, then I can sleep.")
# Non-greedy matching (.*?) tries to match as few characters as possible
re.findall(r'[Ii]f (.*?), then', s)

["I'm not in a hurry", 'I leave']
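
The cell above only exercises `*?`. As a small sketch on made-up strings, the other non-greedy specifiers (`+?`, `??`, and `{m,n}?`) behave the same way: each takes the fewest characters that still lets the pattern match.

```python
import re

digits = "1234567"  # made-up sample text

greedy_counted = re.findall(r"\d{2,3}", digits)   # takes three digits when it can
lazy_counted = re.findall(r"\d{2,3}?", digits)    # settles for two digits

greedy_plus = re.search(r"<.+>", "<b>bold</b>").group()   # runs to the last '>'
lazy_plus = re.search(r"<.+?>", "<b>bold</b>").group()    # stops at the first '>'

greedy_opt = re.findall(r"ab?", "ab")    # takes the optional 'b'
lazy_opt = re.findall(r"ab??", "ab")     # skips the optional 'b'

print(greedy_counted)  # ['123', '456']
print(lazy_counted)    # ['12', '34', '56']
print(greedy_plus)     # <b>bold</b>
print(lazy_plus)       # <b>
print(greedy_opt)      # ['ab']
print(lazy_opt)        # ['a']
```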

In [6]:
# Functions in the re module
import re
# Named text rather than str, to avoid shadowing the built-in str type
text = "She goes where she wants to, she's a sheriff."
newstr = re.sub(r'\b[Ss]he\b', 'he', text)
print(newstr)

he goes where he wants to, he's a sheriff.


In [11]:
import re
text = """He is a timelord.
He has a Tardis."""
# count=1 replaces only the first match; \1 reinserts the captured word
newstr = re.sub(r'(\b[Hh]e\b)', r'\1 (The Doctor)', text, count=1)
print(newstr)

He (The Doctor) is a timelord.
He has a Tardis.


In [13]:
# Match Object

mo = re.search(r'\d+ (\d+) \d+ (\d+)', 'first 123 45 67 890 last')
if mo:
    print(mo)

<re.Match object; span=(6, 19), match='123 45 67 890'>


In [15]:
# Precompile a pattern for reuse; combine several flags with |
# Each flag also has an inline form that can be written at the start of the pattern:
# (?i) = re.IGNORECASE
# (?m) = re.MULTILINE
# (?s) = re.DOTALL
import re
pattern = r'hello world'
re.compile(pattern, re.MULTILINE | re.DOTALL)

re.compile(r'hello world', re.MULTILINE|re.DOTALL|re.UNICODE)

In [44]:
"""
Write function integers_in_brackets that finds from a given string all integers that are enclosed in brackets.
Example run: 
integers_in_brackets(" afd [asd] [12 ] [a34] [ -43 ]tt [+12]xxx") 
returns [12, -43, 12]. 
So there can be whitespace between the number and the brackets, 
but no other character besides those that make up the integer.

Test your function from the main function.
"""
import re
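def integers_in_brackets(s):
    # Optional whitespace around an optionally signed integer, and nothing else inside the brackets
    pattern = r'\[\s*([+-]?\d+)\s*\]'
    # findall returns the captured group as strings; convert each one to int
    return [int(n) for n in re.findall(pattern, s)]

def main():
    result = integers_in_brackets(" afd [asd] [12 ] [a34] [ -43 ]tt [+12]xxx")
    print(result)
main()


[12, -43, 12]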
