# Regular expression (regex or regexp)
A regular expression is a sequence of characters that defines a search pattern. String-searching algorithms typically use such patterns for "find" or "find and replace" operations on strings, or for input validation. The technique was developed in theoretical computer science and formal language theory.

https://en.wikipedia.org/wiki/Regular_expression

## Operator Description
* . ---- Any single character except the newline \n
* ? ---- 0 or 1 occurrence of the pattern on the left
* \+ ---- 1 or more occurrences of the pattern on the left
* \*  ---- 0 or more occurrences of the pattern on the left
* \w  ---- Any letter, digit, or underscore (\W - any other character)
* \d ---- Any digit, e.g. [0-9] (\D - any character except a digit)
* \s ---- Any whitespace character (\S - any non-whitespace character)
* \b ---- Word boundary
* [..] ---- One of the characters in brackets ([^..] - any character except those in brackets)
* \ ---- Escapes a special character (\. means a literal dot, \+ a plus sign)
* ^ and $ ---- Beginning and end of the string (of each line with re.MULTILINE), respectively
* {n,m} ---- From n to m occurrences ({,m} - from 0 to m); write it without spaces
* a|b ---- Matches a or b
* () ---- Groups an expression and captures the matched text
* \t, \n, \r ---- Tab, newline, and carriage return, respectively
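
A few of the operators above (`\b`, `?`, `{n}`, and `()`) can be tried on a small string. This is only a minimal sketch: the `sample` text and the variable names are made up for illustration and are not part of the notebook.

```python
import re

# Hypothetical sample text, chosen only to exercise the operators.
sample = "cat concat cats color colour 2021-10-12"

# \b keeps "cat" from matching inside "concat" or "cats".
whole_word = re.findall(r"\bcat\b", sample)

# ? makes the preceding "u" optional.
spellings = re.findall(r"colou?r", sample)

# {n} fixes the repetition count; each () group captures one date part.
date_parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", sample)

print(whole_word)  # ['cat']
print(spellings)   # ['color', 'colour']
print(date_parts)  # [('2021', '10', '12')]
```

When a pattern contains groups, `re.findall` returns the captured groups (as tuples if there are several) rather than the whole match.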

In [1]:
%load_ext nb_black

In [2]:
import re

In [3]:
line = "Hello, HSE and friends! 9How7 789 are you?"

In [4]:
?re.findall

In [5]:
print(re.findall(".", line))  # every character except a newline

['H', 'e', 'l', 'l', 'o', ',', ' ', 'H', 'S', 'E', ' ', 'a', 'n', 'd', ' ', 'f', 'r', 'i', 'e', 'n', 'd', 's', '!', ' ', '9', 'H', 'o', 'w', '7', ' ', '7', '8', '9', ' ', 'a', 'r', 'e', ' ', 'y', 'o', 'u', '?']


In [6]:
print(re.findall(r"\d+", line))  # runs of digits

['9', '7', '789']


In [7]:
print(re.findall(r"^\w+", line))  # first word of the string

['Hello']


In [8]:
print(re.sub(r"\w+\?$", "<-END->", line))  # replace the last word and its "?"

Hello, HSE and friends! 9How7 789 are <-END->


In [9]:
print(re.findall("[a-gA-Z0-9]+", line))

['He', 'HSE', 'a', 'd', 'f', 'e', 'd', '9H', '7', '789', 'a', 'e']


In [10]:
with_date = "Today is 18.09.2021 Thursday 18-09-21 2021-10-12"

In [11]:
re.findall(r"\d{2,4}[.-]\d{2}[.-](?:\d{2})+|Thursday", with_date)

['18.09.2021', 'Thursday', '18-09-21', '2021-10-12']

### Assignments

In [12]:
emails = "me@me.me, you@you.hse.ru? \
qwerty@qwerty.com! and asdasd@asdsd asdfg.asdfdg@company.com, newaccount11@gmail.com"

In [13]:
re.findall(r"(?:\w+\.*)*[@]\w+(?:\.\w+)+", emails)

['me@me.me',
 'you@you.hse.ru',
 'qwerty@qwerty.com',
 'asdfg.asdfdg@company.com',
 'newaccount11@gmail.com']

In [None]:
dates = 'Days 01.01.1901 and 2019-09-19! New 10 dates are here: \
errors at host 127.0.0.1 at 23.12.2018 and 23.12.2018T19:00:00'
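
The cell above has no solution in the notebook. One possible sketch for pulling the dates out of this string is shown here, assuming that only the `DD.MM.YYYY` and `YYYY-MM-DD` layouts count as dates. The names `date_pattern` and `found_dates` are illustrative, not the notebook's own.

```python
import re

dates = (
    "Days 01.01.1901 and 2019-09-19! New 10 dates are here: "
    "errors at host 127.0.0.1 at 23.12.2018 and 23.12.2018T19:00:00"
)

# Assumption: a date is either DD.MM.YYYY or YYYY-MM-DD.
# Lookarounds are used instead of \b because "2018T" has no word
# boundary between the digit and the letter.
date_pattern = re.compile(
    r"(?<!\d)"                 # not preceded by a digit
    r"(?:\d{2}\.\d{2}\.\d{4}"  # DD.MM.YYYY
    r"|\d{4}-\d{2}-\d{2})"     # or YYYY-MM-DD
    r"(?!\d)"                  # not followed by a digit
)

found_dates = date_pattern.findall(dates)
print(found_dates)
# ['01.01.1901', '2019-09-19', '23.12.2018', '23.12.2018']
```

The IP address `127.0.0.1` is skipped because none of its dot-separated parts has the required two or four digits. The pattern checks only the layout, so an impossible date such as `99.99.9999` would still match.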

### Replace all vowels with '*'

In [14]:
# Each continued line ends with a space so that words are not glued together.
regex_is_vowels = 'is a sequence of characters that define a search pattern. \
Usually such patterns are used by string searching algorithms for "find" \
or "find and replace" operations on strings, or for input validation. \
It is a technique developed in theoretical computer science and formal language theory.'
re.sub(r"[auoeyiAUOEYI]", "*", regex_is_vowels)

'*s * s*q**nc* *f ch*r*ct*rs th*t d*f*n* * s**rch p*tt*rn. *s**ll* s*ch p*tt*rns *r* *s*d b* str*ng s**rch*ng *lg*r*thms f*r "f*nd" *r "f*nd *nd r*pl*c*" *p*r*t**ns *n str*ngs, *r f*r *np*t v*l*d*t**n. *t *s * t*chn*q** d*v*l*p*d *n th**r*t*c*l c*mp*t*r sc**nc* *nd f*rm*l l*ng**g* th**r*.'

### Split by punctuation marks

In [16]:
regex_split = "sad%ksakjd:sadsa;SDSD123!11242:xxxx?s,!"
re.split("!", regex_split)

['sad%ksakjd:sadsa;SDSD123', '11242:xxxx?s,', '']
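
The call above splits on `!` only. A possible way to split on every punctuation mark is sketched here, assuming that "punctuation" means the ASCII characters in `string.punctuation`. The names `punctuation_run` and `pieces` are illustrative.

```python
import re
import string

regex_split = "sad%ksakjd:sadsa;SDSD123!11242:xxxx?s,!"

# Build a character class from every ASCII punctuation mark.
# re.escape protects characters such as ], ^ and - inside the brackets.
punctuation_run = re.compile("[" + re.escape(string.punctuation) + "]+")

# The trailing ",!" leaves an empty string at the end, so drop empty pieces.
pieces = [piece for piece in punctuation_run.split(regex_split) if piece]
print(pieces)
# ['sad', 'ksakjd', 'sadsa', 'SDSD123', '11242', 'xxxx', 's']
```

The `+` after the class treats a run of adjacent marks, such as `,!`, as a single separator.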

### Print YES if the phone number is valid, else NO

In [17]:
seven = re.compile("7")

In [18]:
re.findall(seven, "71231231231")

['7']

In [19]:
is_phones = [
    "+71231231231",
    "89999999999",
    "8-923-123-21-23",
    "8999131231981230214",
    "213X123s213",
    "+79192318123212",
    "+7-92873-12331",
    "+19271312341",
    "+7881221233o1",
]
for phone in is_phones:
    print(phone)
# re.match

+71231231231
89999999999
8-923-123-21-23
8999131231981230214
213X123s213
+79192318123212
+7-92873-12331
+19271312341
+7881221233o1
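
The loop above only echoes each number. The notebook does not say what makes a number valid, so the sketch below makes an assumption: a valid number starts with `+7` or `8` and continues either with ten consecutive digits or with the hyphenated layout `-XXX-XXX-XX-XX`. The names `phone_pattern` and `is_valid_phone` are hypothetical.

```python
import re

is_phones = [
    "+71231231231",
    "89999999999",
    "8-923-123-21-23",
    "8999131231981230214",
    "213X123s213",
    "+79192318123212",
    "+7-92873-12331",
    "+19271312341",
    "+7881221233o1",
]

# Assumed validity rule (not stated in the notebook):
# prefix +7 or 8, then 10 digits, plain or grouped as -XXX-XXX-XX-XX.
phone_pattern = re.compile(
    r"(?:\+7|8)"                   # country prefix
    r"(?:\d{10}"                   # ten digits in a row
    r"|-\d{3}-\d{3}-\d{2}-\d{2})"  # or the hyphenated layout
)


def is_valid_phone(phone):
    """Return True if the whole string matches the assumed phone layout."""
    return phone_pattern.fullmatch(phone) is not None


for phone in is_phones:
    print(phone, "YES" if is_valid_phone(phone) else "NO")
```

`fullmatch` is used rather than `re.match` so that extra trailing digits make a number invalid. Under this rule only the first three entries print YES. A looser rule that strips hyphens before matching would also accept `+7-92873-12331`.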

