# Learn basic regex
For an explanation of what each regex token means, see this extensive FAQ on [Stack Overflow](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075).

Use [Regex101](https://regex101.com/) to test a regex live (select the Python flavor).

# Python-specific info
Prefix pattern strings with `r`. This makes them Python raw strings, in which backslashes are passed through to the regex engine unchanged instead of being treated as Python escape sequences.
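To see why the `r` prefix matters, here is a small sketch (the sample text is made up). In a normal string literal Python turns `\b` into a backspace character before the regex engine ever sees it; in a raw string the backslash survives and `\b` keeps its regex meaning of "word boundary".

```python
import re

text = 'cat concatenate cat'

# Normal string: Python converts '\b' to a backspace (\x08),
# so the pattern looks for backspace characters and finds nothing.
print(re.findall('\bcat\b', text))    # []

# Raw string: the backslash is kept, so \b means "word boundary".
print(re.findall(r'\bcat\b', text))   # ['cat', 'cat']

# A raw string is just a shorter way of writing doubled backslashes.
print(r'\bcat\b' == '\\bcat\\b')      # True
```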

`re.findall` returns a list of all non-overlapping matches.

## Search
`re.search` finds only the first match.

If a match is found, a **match object** (`re.Match`) is returned; otherwise, `None` is returned.

```python
import re

string = 'an example word:cat!! another word:dog'
match = re.search(r'(word):(\w+)', string, re.IGNORECASE)

if match:
    print('Found... ' + match.group())
else:
    print('Did not find')
```

```
Found... word:cat
```


```python
print(match)          # the match object itself
print(type(match))
print(match.start())  # starting index of the match
print(match.end())    # end index of the match (exclusive: one past the last character)
print(match.span())   # (start, end) as a tuple
```

```
<re.Match object; span=(11, 19), match='word:cat'>
<class 're.Match'>
11
19
(11, 19)
```

```python
# group(0) is the full match and works the same as group();
# group(1), group(2), ... are the parenthesized capturing groups
print(match.group())
print(match.group(0))
print(match.group(1))
print(match.group(2))
```

```
word:cat
word:cat
word
cat
```
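The example above always finds something. A short sketch of the other cases, reusing the same sample string: `re.search()` returns `None` when nothing matches, while the related functions `re.match()` and `re.fullmatch()` only succeed when the pattern matches at the start of the string or covers the whole string, respectively.

```python
import re

string = 'an example word:cat!! another word:dog'

# search() scans the whole string; with no match anywhere it returns None
print(re.search(r'word:fish', string))        # None

# match() only tries at position 0, and the string starts with 'an', not 'word'
print(re.match(r'word', string))              # None
print(re.match(r'an', string).group())        # an

# fullmatch() succeeds only if the pattern consumes the entire string
print(re.fullmatch(r'word:\w+', 'word:cat'))  # <re.Match object; span=(0, 8), match='word:cat'>
print(re.fullmatch(r'word', 'word:cat'))      # None
```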


## Splitting with regex

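`re.split(pattern, string)` splits the string at every match of `pattern`. Here the pattern is just the literal character `@`, which has no special meaning in a regex.

```python
# Term to split on
split_term = '@'

phrase = 'What is the domain name of someone with the email: hello@gmail.com'

# Split the phrase
print(re.split(split_term, phrase))
```

```
['What is the domain name of someone with the email: hello', 'gmail.com']
```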
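Splitting on `@` works, but `re.split()` is most useful when the separator varies. A minimal sketch with made-up input: split on a comma or semicolon with any amount of surrounding whitespace, limit the number of splits with `maxsplit`, and keep the separators by wrapping the pattern in a capturing group.

```python
import re

record = 'apples, oranges;pears ;  plums'

# Split on "," or ";" and swallow any whitespace around the separator
print(re.split(r'\s*[,;]\s*', record))
# ['apples', 'oranges', 'pears', 'plums']

# maxsplit limits how many splits are made; the remainder stays intact
print(re.split(r'\s*[,;]\s*', record, maxsplit=1))
# ['apples', 'oranges;pears ;  plums']

# A capturing group in the pattern keeps the separators in the result
print(re.split(r'([,;])', 'a,b;c'))
# ['a', ',', 'b', ';', 'c']
```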

## Findall

```python
demotext = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# re.findall() returns a list of all the matching email strings
emails = re.findall(r'[\w.-]+@[\w.-]+', demotext)
# ['alice@google.com', 'bob@abc.com']

print(len(emails), 'emails found.')

for email in emails:
    print(email)
```

```
2 emails found.
alice@google.com
bob@abc.com
```
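What `re.findall()` returns depends on the pattern's groups: with no groups you get a list of the full matches (as above), with one group a list of that group's text, and with two or more groups a list of tuples. A quick sketch using the same `demotext`:

```python
import re

demotext = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# One capturing group: a list of just that group's text
domains = re.findall(r'@([\w.-]+)', demotext)
print(domains)   # ['google.com', 'abc.com']

# Two capturing groups: one (user, domain) tuple per match
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', demotext)
print(pairs)     # [('alice', 'google.com'), ('bob', 'abc.com')]

for user, domain in pairs:
    print(user, 'at', domain)
```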


```python
string = '''
hi there Abhinav, 21 years old,
Bill Gates 82 years old,
Jeff Bezos 53 yo
'''
ages = re.findall(r'\d{1,3}', string)       # runs of 1 to 3 digits
names = re.findall(r'[A-Z][a-z]*', string)  # a capital letter followed by lowercase letters
print(ages, names)
```

```
['21', '82', '53'] ['Abhinav', 'Bill', 'Gates', 'Jeff', 'Bezos']
```
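One caveat with `\d{1,3}`: it does not require the number to be only 1 to 3 digits long, so a longer number gets chopped into pieces. Adding word boundaries (`\b`) fixes that. A small sketch with a made-up sentence:

```python
import re

text = 'Order 21 shipped to room 4096'

# \d{1,3} happily splits the 4-digit number into two matches
print(re.findall(r'\d{1,3}', text))       # ['21', '409', '6']

# With \b on both sides, the whole number must be 1 to 3 digits long
print(re.findall(r'\b\d{1,3}\b', text))   # ['21']
```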


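```python
message = 'call me at 888-444-1344 tom or at 432-334-4444 today'
# Compile the pattern once, then reuse the Pattern object
phonereg = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# findall() on a compiled pattern returns a list of strings, not a match object
phone_numbers = phonereg.findall(message)
print(phone_numbers)
```

```
['888-444-1344', '432-334-4444']
```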
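`\d\d\d` can also be written with a quantifier as `\d{3}`, and the compiled `Pattern` object offers the same methods as the `re` module functions. A sketch reusing the `message` from above:

```python
import re

# \d{3} is shorthand for \d\d\d, so this matches the same strings
phonereg = re.compile(r'\d{3}-\d{3}-\d{4}')

message = 'call me at 888-444-1344 tom or at 432-334-4444 today'
print(phonereg.findall(message))         # ['888-444-1344', '432-334-4444']

# The same compiled pattern can be reused with search(), sub(), and so on
print(phonereg.search(message).start())  # 11
print(phonereg.sub('XXX-XXX-XXXX', message))
# call me at XXX-XXX-XXXX tom or at XXX-XXX-XXXX today
```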


## Replace
`re.sub(pattern, repl, string)` returns a new string in which every match of `pattern` is replaced by `repl`; the original string is left unchanged.

```python
string = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# In the replacement string, \g<0> is the full match, \g<1> is group(1), \g<2> is group(2)
print(re.sub(r'([\w.-]+)@([\w.-]+)', r'\g<1>@yo-yo-dyne.com', string))
```

```
purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
```
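The replacement does not have to be a string. `re.sub()` also accepts a function, which is called with each match object and returns the replacement text; the optional `count` argument limits how many matches are replaced. In this sketch, `mask_user` is a hypothetical helper and its masking rule is made up for illustration:

```python
import re

string = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

def mask_user(m):
    # m is a re.Match; keep the first letter of the user name and hide the rest
    user, domain = m.group(1), m.group(2)
    return user[0] + '*' * (len(user) - 1) + '@' + domain

print(re.sub(r'([\w.-]+)@([\w.-]+)', mask_user, string))
# purple a****@google.com, blah monkey b**@abc.com blah dishwasher

# count=1 replaces only the first match
print(re.sub(r'([\w.-]+)@([\w.-]+)', r'\g<1>@yo-yo-dyne.com', string, count=1))
# purple alice@yo-yo-dyne.com, blah monkey bob@abc.com blah dishwasher
```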


## Verbose regular expressions for long patterns
Passing the `re.VERBOSE` flag to `re.compile()` lets you spread a long pattern over several lines. A verbose regular expression differs from a compact one in two ways:

1. Whitespace is ignored. Spaces, tabs, and newlines in the pattern are not matched as spaces, tabs, and newlines; they are not matched at all. To match a literal space, escape it with a backslash (`\ `) or put it in a character class (`[ ]`); whitespace inside a character class is not ignored. Use `\s` to match any whitespace character.

2. Comments are ignored. A comment in a verbose regular expression works like a comment in Python code: it starts with a `#` character and runs to the end of the line. Here the comment lives inside a multi-line string rather than in your source code, but it works the same way. A `#` inside a character class or escaped as `\#` is matched literally.
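A minimal sketch of both rules, with a made-up pattern and sample text: the compact and verbose patterns below match the same thing, and in the verbose one the literal space has to be written as `\ ` because bare whitespace is ignored.

```python
import re

# Compact form: three digits, one literal space, four digits
compact = re.compile(r'\d{3} \d{4}')

# Verbose form: layout whitespace and # comments are ignored,
# so the literal space must be escaped as "\ "
verbose = re.compile(r'''
    \d{3}   # first group of digits
    \       # one literal space (escaped)
    \d{4}   # second group of digits
''', re.VERBOSE)

text = 'call 555 0000 or 555-1111'
print(compact.findall(text))   # ['555 0000']
print(verbose.findall(text))   # ['555 0000']
```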

```python
import re
# import pyperclip  # optional third-party module, only needed for clipboard access

# Create a regex for phone numbers
phoneRegex = re.compile(r'''
# e.g. 133-544-3333, (432)-445-8888, 555-0000 ext 23455, 555-0000 ext. 12345, 555-0000 x12354
(
((\d{3})|(\(\d{3}\)))?    # area code (optional)
(\s|-|\.)?                # first separator (optional)
(\d{3})                   # first 3 digits
(\s|-|\.)                 # separator
(\d{4})                   # last 4 digits
((\s*(ext(\.)?\s*|x))     # extension-word part, e.g. " ext. " or " x" (optional)
(\d{2,5}))?               # extension-number part (optional)
)
''', re.VERBOSE | re.IGNORECASE)

# Create a regex for emails
emailRegex = re.compile(r'''
[a-zA-Z0-9.+-]+           # name part
@                         # @
[a-zA-Z0-9.+-]+           # domain-name part
''', re.VERBOSE)

# Get the text off the clipboard
# text = pyperclip.paste()
text = '''No Starch Press, Inc.
245 8th Street
San Francisco, CA 94103 USA
Phone: 800.420.7240 or +1 415.863.9900 (9 a.m. to 5 p.m., M-F, PST)
Fax: +1 415.863.9950

Reach Us by Email
General inquiries: info@nostarch.com
Media requests: media@nostarch.com
Academic requests: academic@nostarch.com (Please see this page for academic review requests)
Help with your order: info@nostarch.com'''

# Extract the phone numbers and emails from the text.
# Flags belong in re.compile(): the second argument of Pattern.findall() is a
# start position, so findall(text, re.IGNORECASE) would start searching at index 2.
extractedPhone = phoneRegex.findall(text)
extractedEmail = emailRegex.findall(text)

# Because phoneRegex has groups, findall() returns a list of tuples, one per match:
# [('full match', 'group 2', 'group 3', ...), ...]
# Entry 0 (group 1, which wraps the whole pattern) is the full number, so collect it.
allPhoneNumbers = []
for phoneNumbers in extractedPhone:
    allPhoneNumbers.append(phoneNumbers[0])

results = '\n'.join(allPhoneNumbers) + '\n' + '\n'.join(extractedEmail)
# pyperclip.copy(results)
print(results)
```

```
800.420.7240
415.863.9900
415.863.9950
info@nostarch.com
media@nostarch.com
academic@nostarch.com
info@nostarch.com
```


```python
# findall() returns one tuple per match, with one entry per capturing group
# (13 groups in phoneRegex). Group 1 wraps the whole pattern, so entry 0 is
# the full phone number; groups that did not take part in a match give ''.
for phoneNumbers in extractedPhone:
    print(phoneNumbers)
```

```
('800.420.7240', '800', '800', '', '.', '420', '.', '7240', '', '', '', '', '')
('415.863.9900', '415', '415', '', '.', '863', '.', '9900', '', '', '', '', '')
('415.863.9950', '415', '415', '', '.', '863', '.', '9950', '', '', '', '', '')
```
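All those empty strings come from capturing groups that did not take part in the match. One way to get plain strings back, sketched below, is to write the groups as non-capturing `(?:...)`. `phone_nc` is a hypothetical variant of the phone pattern above (first separator optional, optional whitespace before the extension), not a drop-in copy, and the sample text is shortened from the one above.

```python
import re

# Every group is non-capturing (?:...), so findall() returns whole matches as strings
phone_nc = re.compile(r'''
(?:\d{3}|\(\d{3}\))?            # area code (optional)
(?:\s|-|\.)?                    # first separator (optional)
\d{3}                           # first 3 digits
(?:\s|-|\.)                     # separator
\d{4}                           # last 4 digits
(?:\s*(?:ext\.?\s*|x)\d{2,5})?  # extension, e.g. " ext. 123" or " x12" (optional)
''', re.VERBOSE | re.IGNORECASE)

text = 'Phone: 800.420.7240 or +1 415.863.9900\nFax: +1 415.863.9950'
print(phone_nc.findall(text))
# ['800.420.7240', '415.863.9900', '415.863.9950']

print(phone_nc.findall('555-0000 x12354'))
# ['555-0000 x12354']
```

If you want to keep the capturing groups, `finditer()` is an alternative: it yields match objects, and `m.group(0)` is always the full match no matter how many groups the pattern has.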

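## Using regex to check phone number input

The loop keeps prompting until the input contains no character outside the allowed set: digits, whitespace, `-`, `(`, `)` and `+`.

```python
import re

# Matches any single character that is NOT allowed in a phone number
phone_check = re.compile(r'[^0-9\s\-()+]')

phone = input("Please enter your phone: ")

# search() returns a match object (truthy) if at least one disallowed character is present
while phone_check.search(phone):
    print("Please enter your phone correctly!")
    phone = input("Please enter your phone: ")
```

Sample session:

```
Please enter your phone: fsa
Please enter your phone correctly!
Please enter your phone: +91-9535059840
```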
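That loop only checks which characters appear, so an empty string or `+++` is accepted. A stricter sketch follows; the rules it enforces (a `+` only at the start, 10 to 13 digits in total) are assumptions chosen for illustration, not a standard, and `is_valid_phone` is a hypothetical helper. It uses `fullmatch()` so the whole input must fit the pattern, and `re.sub()` to count the digits.

```python
import re

# Assumed rule: optional leading '+', then only digits, whitespace, '-', '(' and ')'
ALLOWED = re.compile(r'\+?[0-9\s\-()]+')

def is_valid_phone(phone: str) -> bool:
    # The entire string must consist of allowed characters
    if not ALLOWED.fullmatch(phone):
        return False
    # Remove every non-digit, then check the assumed 10-13 digit range
    digits = re.sub(r'\D', '', phone)
    return 10 <= len(digits) <= 13

for candidate in ['fsa', '', '+91-9535059840', '(415) 863-9900', '12-34']:
    print(repr(candidate), is_valid_phone(candidate))
```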


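## Using regex on a file
```python
import re

# Open the file with a context manager so it is closed automatically
with open('test.txt', 'r', encoding='utf-8') as f:
    # Feed the whole file text into findall(); it returns a list of all the matching strings
    strings = re.findall(r'some pattern', f.read())
```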
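For large files, reading everything with `f.read()` may use a lot of memory. A sketch of a line-by-line alternative that also records line numbers; the temporary file `regex_demo.txt` and its contents are made up so the example runs on its own.

```python
import re
import tempfile
from pathlib import Path

# Write a small sample file (hypothetical name and contents, for illustration only)
sample = Path(tempfile.gettempdir()) / 'regex_demo.txt'
sample.write_text('contact: info@nostarch.com\nno email here\nmedia@nostarch.com and more\n',
                  encoding='utf-8')

email = re.compile(r'[\w.-]+@[\w.-]+')
hits = []

# Iterating over the file object yields one line at a time,
# so the whole file never has to be held in memory at once
with open(sample, 'r', encoding='utf-8') as f:
    for lineno, line in enumerate(f, start=1):
        for m in email.finditer(line):
            hits.append((lineno, m.group()))

print(hits)  # [(1, 'info@nostarch.com'), (3, 'media@nostarch.com')]
```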