# Regular Expressions (Regex) Cheat Sheet

## MetaCharacters
```
.       - Any Character Except New Line
\d      - Digit (0-9)
\D      - Not a Digit (0-9)
\w      - Word Character (a-z, A-Z, 0-9, _)
\W      - Not a Word Character
\s      - Whitespace (space, tab, newline)
\S      - Not Whitespace (space, tab, newline)

\b      - Word Boundary
\B      - Not a Word Boundary
^       - Beginning of a String
$       - End of a String

[]      - Matches Characters in brackets
[^ ]    - Matches Characters NOT in brackets
|       - Either Or
( )     - Group
```
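
The boundary and anchor metacharacters are not exercised in the examples further down, so here is a minimal sketch of how they behave. It reuses the `Ha HaHa` line from the sample text; the variable names are only illustrative.

```python
import re

# Same text as the 'Ha HaHa' line in the sample string.
text = 'Ha HaHa'

# \b requires a word boundary before 'Ha' (start of string or after the space).
boundary_spans = [m.span() for m in re.finditer(r'\bHa', text)]
print(boundary_spans)      # [(0, 2), (3, 5)]

# \B requires that there is NO word boundary before 'Ha'.
non_boundary_spans = [m.span() for m in re.finditer(r'\BHa', text)]
print(non_boundary_spans)  # [(5, 7)]

# ^ anchors the match to the start of the string, $ to the end.
start_span = re.search(r'^Ha', text).span()
end_span = re.search(r'Ha$', text).span()
print(start_span, end_span)  # (0, 2) (5, 7)
```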

## Quantifiers
```
*       - 0 or More
+       - 1 or More
?       - 0 or 1
{3}     - Exact Number
{3,4}   - Range of Numbers (Minimum, Maximum)

```
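
As a rough illustration of the table above, the sketch below runs each quantifier against a small made-up string (the inputs are assumptions chosen for the demo, not part of the sample text). The `\b` boundaries keep `{3}` and `{3,4}` from matching inside a longer run.

```python
import re

# Hypothetical input: runs of 'a' with lengths 1 through 5.
text = 'a aa aaa aaaa aaaaa'

exact = re.findall(r'\ba{3}\b', text)      # exactly three
ranged = re.findall(r'\ba{3,4}\b', text)   # three or four
one_or_more = re.findall(r'a+', text)      # one or more
optional = re.findall(r'\baa?\b', text)    # 'a' plus an optional second 'a'
zero_or_more = re.findall(r'ba*', 'b ba baa')  # 'b' plus zero or more 'a'

print(exact)         # ['aaa']
print(ranged)        # ['aaa', 'aaaa']
print(one_or_more)   # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
print(optional)      # ['a', 'aa']
print(zero_or_more)  # ['b', 'ba', 'baa']
```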

In [1]:
import re

In [2]:
sample = '''
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
Ha HaHa
MetaCharacters (Need to be escaped):
. ^ $ * + ? { } [ ] \ | ( )
coreyms.com
321-555-4321
123.555.1234
123*555*1234
800-555-1234
900-555-1234
chiragjuneja.com
Mr. Schafer
Mr Smith
Ms Davis
Mrs. Robinson
Mr. T
cat
mat
pat
bat
'''

In [3]:
urls = '''
https://www.google.com
http://coreyms.com
https://youtube.com
https://www.nasa.gov
'''

### Match a Literal String

In [4]:
def get_matches(pattern, text):
    """Return every match of the compiled pattern in text as a list."""
    return list(pattern.finditer(text))

In [5]:
pattern = re.compile(r'abc')
get_matches(pattern, sample)

[<re.Match object; span=(1, 4), match='abc'>]

### MetaCharacters (Need to be escaped)

In [6]:
pattern = re.compile(r'chiragjuneja\.com')
get_matches(pattern, sample)

[<re.Match object; span=(216, 232), match='chiragjuneja.com'>]
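
To see why the escape matters, the sketch below compares an unescaped and an escaped dot on a small made-up input (the look-alike string `coreyms-com` is an assumption added for the demo). It also uses `re.escape`, which builds the escaped pattern from a literal string.

```python
import re

# Hypothetical two-line input: a real domain and a look-alike without a dot.
text = 'coreyms.com\ncoreyms-com'

# An unescaped '.' matches any character except a newline, so both lines match.
unescaped = re.findall(r'coreyms.com', text)

# '\.' matches a literal dot only.
escaped = re.findall(r'coreyms\.com', text)

# re.escape escapes the metacharacters in a literal string for you.
auto_escaped = re.findall(re.escape('coreyms.com'), text)

print(unescaped)     # ['coreyms.com', 'coreyms-com']
print(escaped)       # ['coreyms.com']
print(auto_escaped)  # ['coreyms.com']
```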

### Match Digits

In [7]:
pattern = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')
get_matches(pattern, sample)

[<re.Match object; span=(151, 163), match='321-555-4321'>,
 <re.Match object; span=(164, 176), match='123.555.1234'>,
 <re.Match object; span=(177, 189), match='123*555*1234'>,
 <re.Match object; span=(190, 202), match='800-555-1234'>,
 <re.Match object; span=(203, 215), match='900-555-1234'>]

### Match Characters in Brackets

In [8]:
pattern = re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')
get_matches(pattern, sample)

[<re.Match object; span=(151, 163), match='321-555-4321'>,
 <re.Match object; span=(164, 176), match='123.555.1234'>,
 <re.Match object; span=(190, 202), match='800-555-1234'>,
 <re.Match object; span=(203, 215), match='900-555-1234'>]

In [9]:
pattern = re.compile(r'[89]00[-.]\d\d\d[-.]\d\d\d\d')
get_matches(pattern, sample)

[<re.Match object; span=(190, 202), match='800-555-1234'>,
 <re.Match object; span=(203, 215), match='900-555-1234'>]

### Match Characters Not in Brackets

In [10]:
pattern = re.compile(r'[^b]at')
get_matches(pattern, sample)

[<re.Match object; span=(283, 286), match='cat'>,
 <re.Match object; span=(287, 290), match='mat'>,
 <re.Match object; span=(291, 294), match='pat'>]

### Quantifiers

In [11]:
pattern = re.compile(r'\d{3}.\d{3}.\d{4}')
get_matches(pattern, sample)

[<re.Match object; span=(151, 163), match='321-555-4321'>,
 <re.Match object; span=(164, 176), match='123.555.1234'>,
 <re.Match object; span=(177, 189), match='123*555*1234'>,
 <re.Match object; span=(190, 202), match='800-555-1234'>,
 <re.Match object; span=(203, 215), match='900-555-1234'>]

In [12]:
pattern = re.compile(r'Mr\.?\s[A-Z]\w*')
get_matches(pattern, sample)

[<re.Match object; span=(233, 244), match='Mr. Schafer'>,
 <re.Match object; span=(245, 253), match='Mr Smith'>,
 <re.Match object; span=(277, 282), match='Mr. T'>]

In [13]:
pattern = re.compile(r'M(r|s|rs)\.?\s[A-Z]\w*')
get_matches(pattern, sample)

[<re.Match object; span=(233, 244), match='Mr. Schafer'>,
 <re.Match object; span=(245, 253), match='Mr Smith'>,
 <re.Match object; span=(254, 262), match='Ms Davis'>,
 <re.Match object; span=(263, 276), match='Mrs. Robinson'>,
 <re.Match object; span=(277, 282), match='Mr. T'>]

In [14]:
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
get_matches(pattern, sample)

[<re.Match object; span=(233, 244), match='Mr. Schafer'>,
 <re.Match object; span=(245, 253), match='Mr Smith'>,
 <re.Match object; span=(254, 262), match='Ms Davis'>,
 <re.Match object; span=(263, 276), match='Mrs. Robinson'>,
 <re.Match object; span=(277, 282), match='Mr. T'>]

In [15]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
get_matches(pattern, urls)

[<re.Match object; span=(1, 23), match='https://www.google.com'>,
 <re.Match object; span=(24, 42), match='http://coreyms.com'>,
 <re.Match object; span=(43, 62), match='https://youtube.com'>,
 <re.Match object; span=(63, 83), match='https://www.nasa.gov'>]

In [16]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
matches = pattern.finditer(urls)
for match in matches:
    print(match.group(2))

google
coreyms
youtube
nasa


In [17]:
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')
subbed_urls = pattern.sub(r'\2\3', urls)
print(subbed_urls)


google.com
coreyms.com
youtube.com
nasa.gov
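
Numbered groups such as `\2\3` can get hard to read as patterns grow. As one possible alternative, the sketch below names the groups with `(?P<name>...)` and refers to them with `\g<name>` in the replacement. The group names `domain` and `tld` are illustrative choices, and the `urls` string is a shortened copy of the one above.

```python
import re

# Shortened copy of the urls string used above.
urls = '''
https://www.google.com
http://coreyms.com
'''

# Same pattern as before, but groups 2 and 3 now carry names (assumed names).
pattern = re.compile(r'https?://(www\.)?(?P<domain>\w+)(?P<tld>\.\w+)')

# Look up a group by name instead of by position.
domains = [m.group('domain') for m in pattern.finditer(urls)]
print(domains)  # ['google', 'coreyms']

# \g<name> is the named equivalent of \2 and \3 in the replacement string.
subbed_urls = pattern.sub(r'\g<domain>\g<tld>', urls)
print(subbed_urls)
```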



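### Find All
`findall` returns plain strings instead of match objects. If the pattern has no capturing group, it returns the whole match; with one group, it returns that group's text; with several groups, it returns a tuple of the groups for each match.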

In [18]:
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s[A-Z]\w*')
pattern.findall(sample)

['Mr', 'Mr', 'Ms', 'Mrs', 'Mr']

In [19]:
pattern = re.compile(r'(Mr|Ms|Mrs)\.?\s([A-Z]\w*)')
pattern.findall(sample)

[('Mr', 'Schafer'),
 ('Mr', 'Smith'),
 ('Ms', 'Davis'),
 ('Mrs', 'Robinson'),
 ('Mr', 'T')]
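
If the full match is wanted from `findall` while still grouping the alternatives, one option is a non-capturing group, `(?:...)`. The sketch below uses a shortened, made-up subset of the names from the sample text.

```python
import re

# Shortened subset of the names in the sample string.
text = 'Mr. Schafer\nMs Davis\nMrs. Robinson'

# (?:...) groups the alternatives without capturing them,
# so findall falls back to returning the whole match.
full_names = re.findall(r'(?:Mr|Ms|Mrs)\.?\s[A-Z]\w*', text)
print(full_names)  # ['Mr. Schafer', 'Ms Davis', 'Mrs. Robinson']
```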

### Match at the Beginning of a String

In [20]:
sentence = 'Start a sentence and then bring it to an end'
pattern = re.compile(r'Start')

print(pattern.match(sentence))

print(re.match(r'sentence', sentence))

<re.Match object; span=(0, 5), match='Start'>
None


### Search for First Occurrence

In [21]:
sentence = 'Start a sentence and then bring it to an end'

print(re.search(r'sentence', sentence))

<re.Match object; span=(8, 16), match='sentence'>


### Flags

In [22]:
sentence = 'Start a sentence and then bring it to an end'
print(re.match(r'start', sentence))
print(re.match(r'start', sentence, re.IGNORECASE))

None
<re.Match object; span=(0, 5), match='Start'>
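
Flags can also be combined with `|`. The sketch below pairs `re.IGNORECASE` with `re.MULTILINE`, which makes `^` match at the start of every line rather than only at the start of the string. The two-line `text` is a made-up input for the demo.

```python
import re

# Hypothetical two-line input, not part of the notebook's data.
text = 'start here\nStart again'

# Without MULTILINE, ^ only matches at the very beginning of the string.
default = re.findall(r'^start', text, re.IGNORECASE)

# With MULTILINE, ^ also matches right after each newline.
multiline = re.findall(r'^start', text, re.IGNORECASE | re.MULTILINE)

print(default)    # ['start']
print(multiline)  # ['start', 'Start']
```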
