# Regular Expressions in Python

```
[] A set of characters                            "[a-m]"
\  Signals a special sequence                     "\d"
.  Any character (except newline)                 "he..o"
^  Begins with                                    "^hello"
$  Ends with                                      "plane$"
*  Zero or more occurrences                       "he.*o"
+  One or more occurrences                        "he.+o"
?  Zero or one occurrence                         "he.?o"
{} Exactly the specified number of occurrences    "he.{2}o"
|  Either or
() Capture and group
```
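
The table gives no example for `|` or `()`, so here is a minimal sketch of both. The sample sentence and the variable names are made up for illustration; only `re.findall` and `re.search` come from the standard library.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
txt = "The rain in Spain falls mainly on the plain"

# "|" matches either alternative: the literal "rain" or the literal "plain".
either = re.findall(r"rain|plain", txt)
print(either)  # ['rain', 'plain']

# "()" captures parts of a match so they can be read back separately.
m = re.search(r"(\w+) in (\w+)", txt)
if m:
    print(m.group(0))  # whole match: 'rain in Spain'
    print(m.group(1))  # first group: 'rain'
    print(m.group(2))  # second group: 'Spain'
```

Note that `Spain` and `mainly` are not returned by the first pattern: they contain `pain` and `main`, not `rain` or `plain`.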

In [5]:
import re
txt = "The rain in Spain"

x = re.search("^The.*Spain$", txt)
if x:
    print("Matched text: ", x.group())

Matched text:  The rain in Spain


In [20]:
import re
txt = "I gave you 50 dollars, 720 cents"
x = re.findall(r"\d\d", txt) # or "\d{2}"
print(x)

['50', '72']


In [19]:
import re
txt = "hello, I gave you 50 dollars, 72 cents"
x = re.findall(r"he..o", txt) # or "he.{2}o"
print(x)

['hello']


In [2]:
import re
txt = "hello, hecko, 20, heggo, heo wazzaap"
x = re.findall(r"he.*o", txt)
y = re.findall(r"he.+o", txt)
print(x)
print(y)

['hello, hecko, 20, heggo, heo']
['hello, hecko, 20, heggo, heo']


In [None]:
import re
txt = "hello, hecko, heggo, heo"
x = re.findall(r"he.*?o", txt)
y = re.findall(r"he.+?o", txt)
print(x)
print(y)

"he.*?o"    ['hello', 'hecko', 'heggo', 'heo']
"he.+?o"    ['hello', 'hecko', 'heggo']


In [None]:
import re
text = "abbbbba"
print(re.findall(r"a.*b", text))  
print(re.findall(r"a.*?b", text)) 

['abbbbb']
['ab']


In [None]:
import re
text = "abbbbba"
print(re.findall(r"a.+b", text)) 
print(re.findall(r"a.+?b", text)) 

['abbbbb']
['abb']


In [8]:
import re
text = "aaabaaa"
print(re.findall(r"a{1,3}", text))  
print(re.findall(r"a{1,3}?", text))  

['aaa', 'aaa']
['a', 'a', 'a', 'a', 'a', 'a']


### Trying Assignment Exercises

In [29]:
# regex for matching any 3 digit number
import re
nums = "3 48 276 1057 8542 19 652 7428 583 91 7042 163 7 8245 382 8080"
numstwo = "MCA234 48 276 BOB1057 8542 19 652 XYZ7428 583 91 7042 163 7 8245 A382 8080"

x = re.findall(r"\b\d{3}\b", nums)
y = re.findall(r"(?<!\d)\d{3}(?!\d)", numstwo)

print(x)
print(y)

['276', '652', '583', '163', '382']
['234', '276', '652', '583', '163', '382']


The `?` in `(?<!\d)` and `(?!\d)` is not a quantifier; it is part of the syntax for lookbehind and lookahead assertions. An assertion states a condition that must hold at a position for the pattern to match, but the characters it checks are not included in the match.

Here's a breakdown of each part:

1. **Negative Lookbehind `(?<!\d)`**:
   - `?<!` is the syntax for a negative lookbehind assertion.
   - `\d` inside the lookbehind asserts that there should not be a digit before the current position.
   - So `(?<!\d)` means "match the position only if it is not preceded by a digit."

2. **Three-Digit Number `\d{3}`**:
   - `\d` matches any digit (0-9).
   - `{3}` specifies that exactly three digits should be matched.

3. **Negative Lookahead `(?!\d)`**:
   - `?!` is the syntax for a negative lookahead assertion.
   - `\d` inside the lookahead asserts that there should not be a digit after the current position.
   - So `(?!\d)` means "match the position only if it is not followed by a digit."
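
To see that the assertions only test a position and consume nothing, the following sketch compares the lookaround pattern with a version that uses `\D` (a non-digit) on both sides. The string `"A382B"` is a made-up input for this illustration.

```python
import re

# Hypothetical input: three digits wrapped in letters.
txt = "A382B"

# Lookarounds check the neighbours but leave them out of the match.
look = re.search(r"(?<!\d)\d{3}(?!\d)", txt)
print(look.group(), look.span())  # 382 (1, 4)

# \D consumes the neighbouring characters, so they end up in the match.
consume = re.search(r"\D\d{3}\D", txt)
print(consume.group(), consume.span())  # A382B (0, 5)
```

The `\D` version also fails when the digits sit at the very start or end of the string, because `\D` needs an actual character to match, while a negative lookaround succeeds when there is nothing there.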

Combined, `(?<!\d)\d{3}(?!\d)` matches a run of exactly three digits that is not part of a longer number:
- `(?<!\d)`: no digit immediately before the three digits.
- `\d{3}`: exactly three digits.
- `(?!\d)`: no digit immediately after the three digits.

Unlike `\b\d{3}\b`, this also finds three-digit numbers attached to letters, such as the `234` in `MCA234`. A word boundary `\b` only matches between a word character and a non-word character, and both letters and digits are word characters.
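
As a quick check of how the two exercise patterns differ, this sketch runs both on one short string. The string is a shortened, made-up variant of `numstwo`, not the original data.

```python
import re

# Shortened sample: two numbers glued to letters, one standalone, one too long.
sample = "MCA234 A382 276 1057"

# \b requires a word boundary, and a letter next to a digit is not one.
with_boundary = re.findall(r"\b\d{3}\b", sample)
print(with_boundary)  # ['276']

# The lookarounds only forbid neighbouring digits, so letters are allowed.
with_lookaround = re.findall(r"(?<!\d)\d{3}(?!\d)", sample)
print(with_lookaround)  # ['234', '382', '276']
```

Both patterns reject `1057`: every three-digit window inside it has a digit on at least one side.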
