# Regular expressions (regex or regexp)
A regular expression is a sequence of characters that defines a search pattern. Such patterns are typically used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. The technique was developed in theoretical computer science and formal language theory.

https://en.wikipedia.org/wiki/Regular_expression
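
The three uses named above (find, find and replace, validation) map onto three functions of Python's `re` module. The snippet below is a minimal sketch; the sample string `text` is made up for illustration.

```python
import re

text = "Order 66 shipped on 2019-09-19."

# Find: locate the first run of digits.
first_number = re.search(r"\d+", text).group()

# Find and replace: mask every digit.
masked = re.sub(r"\d", "#", text)

# Validate: the whole string must look like YYYY-MM-DD.
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2019-09-19") is not None

print(first_number)  # 66
print(masked)        # Order ## shipped on ####-##-##.
print(is_date)       # True
```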

## Operator Description
* . ---- Any single character except the newline \n
* ? ---- 0 or 1 occurrence of the pattern on the left
* \+ ---- 1 or more occurrences of the pattern on the left
* \*  ---- 0 or more occurrences of the pattern on the left
* \w  ---- Any letter, digit, or underscore (\W - any character that is not one of these)
* \d ---- Any digit [0-9] (\D - any character that is not a digit)
* \s ---- Any whitespace character (\S - any non-whitespace character)
* \b ---- Word boundary
* [..] ---- One of the characters in brackets ([^..] - any character except those in brackets)
* \ ---- Escapes a special character (\. means a literal dot, \+ a literal plus sign)
* ^ and $ ---- Beginning and end of the string, respectively (of each line with re.MULTILINE)
* {n,m} ---- From n to m occurrences, written without spaces ({,m} - from 0 to m)
* a|b ---- Matches a or b
* () ---- Groups an expression and captures the matched text
* \t, \n, \r ---- Tab, newline, and carriage return, respectively
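
Several of the operators in the list above (`\b`, `?`, `()`, `{n,m}`, `a|b`, `^` and `$`) are not exercised elsewhere in this notebook. The following is a small sketch of how they behave; the string `sample` is invented for the example.

```python
import re

sample = "cat catalog concat; colour or color? id=42"

# \b restricts the match to the standalone word "cat".
standalone = re.findall(r"\bcat\b", sample)
print(standalone)        # ['cat']

# ? makes the preceding "u" optional.
spellings = re.findall(r"colou?r", sample)
print(spellings)         # ['colour', 'color']

# With a single group, findall returns only the captured text.
ids = re.findall(r"id=(\d+)", sample)
print(ids)               # ['42']

# {2,3} keeps whole words that are two or three characters long.
short_words = re.findall(r"\b\w{2,3}\b", sample)
print(short_words)       # ['cat', 'or', 'id', '42']

# a|b matches either alternative.
either = re.findall(r"catalog|concat", sample)
print(either)            # ['catalog', 'concat']

# ^ anchors at the start of the string, $ at the end.
first_word = re.findall(r"^\w+", sample)
last_number = re.findall(r"\d+$", sample)
print(first_word, last_number)  # ['cat'] ['42']
```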

In [0]:
import re

In [0]:
line = 'Hello, my friends! How are you?'

In [4]:
print(re.findall(r'.', line))  # every character

['H', 'e', 'l', 'l', 'o', ',', ' ', 'm', 'y', ' ', 'f', 'r', 'i', 'e', 'n', 'd', 's', '!', ' ', 'H', 'o', 'w', ' ', 'a', 'r', 'e', ' ', 'y', 'o', 'u', '?']


In [5]:
print(re.findall(r'\w', line))  # every word character (letter, digit, underscore)

['H', 'e', 'l', 'l', 'o', 'm', 'y', 'f', 'r', 'i', 'e', 'n', 'd', 's', 'H', 'o', 'w', 'a', 'r', 'e', 'y', 'o', 'u']


In [6]:
print(re.findall(r'\w*', line))  # zero or more word characters, so empty matches appear too

['Hello', '', '', 'my', '', 'friends', '', '', 'How', '', 'are', '', 'you', '', '']
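
The empty strings in the result above appear because `*` also accepts zero repetitions, so the pattern matches (with zero width) at every position where no word character starts. A small sketch using `re.finditer` on a shortened copy of `line` makes the positions visible:

```python
import re

line = 'Hello, my friends!'

# Each match object records where the match starts and ends.
matches = list(re.finditer(r'\w*', line))
for m in matches:
    print(m.span(), repr(m.group()))

# Keeping only the non-empty matches gives the same words as \w+.
words = [m.group() for m in matches if m.group()]
print(words)  # ['Hello', 'my', 'friends']
```

Every empty match has a span whose start equals its end, which is why `\w+` is usually the better choice when extracting words.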


In [7]:
print(re.findall(r'\w+', line))  # one or more word characters: whole words

['Hello', 'my', 'friends', 'How', 'are', 'you']


In [25]:
print(re.findall(r'\w+\?$', line))  # a word followed by a literal '?' at the end of the string

['you?']


In [0]:
print(re.findall(r'\w\w', line))  # non-overlapping pairs of word characters

In [0]:
with_date = 'Today is 19.09.2019 Thursday'

In [0]:
re.findall(r'\d{2}\.\d{2}\.\d{4}', with_date)  # dates in DD.MM.YYYY form

### Assignments

In [42]:
emails = 'me@me.me, you@you.hse.ru? \
qwerty@qwerty.com! and asdfg.asdfdg@company.com, newaccount11@gmail.com'
# Inside [...], '|' is a literal character, and \w already covers letters and digits.
print(re.findall(r"[\w.]+@[\w.]+", emails))

['me@me.me', 'you@you.hse.ru', 'qwerty@qwerty.com', 'asdfg.asdfdg@company.com', 'newaccount11@gmail.com']
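
As a follow-up to the extraction above, a capturing group can pull out just one part of each address. This is a sketch only: the pattern `[\w.]+@([\w.]+)` is a deliberately loose approximation of an email address, and the shortened `emails` string is redefined here so the snippet runs on its own.

```python
import re

emails = 'me@me.me, you@you.hse.ru? qwerty@qwerty.com!'

# With exactly one group in the pattern, findall returns the group's text only.
domains = re.findall(r'[\w.]+@([\w.]+)', emails)
print(domains)  # ['me.me', 'you.hse.ru', 'qwerty.com']
```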


In [0]:
re.findall('@me',emails)

In [95]:
dates = 'Days 01.01.1901 and 2019-09-19! New 10 dates are here: \
errors at host 127.0.0.1 at 23.12.2018 and 23.12.2018T19:00:00'
# Quantifiers such as {2} do not work inside [...], so spell out both formats: DD.MM.YYYY or YYYY-MM-DD.
print(re.findall(r"\d{2}\.\d{2}\.\d{4}|\d{4}-\d{2}-\d{2}", dates))

['01.01.1901', '2019-09-19', '23.12.2018', '23.12.2018']


### Replace all lowercase vowels with '*'

In [99]:
regex_is_vowels = 'is a sequence of characters that define a search pattern. \
Usually such patterns are used by string searching algorithms for "find"\
or "find and replace" operations on strings, or for input validation.\
It is a technique developed in theoretical computer science and formal language theory.'
re.sub(r'[aeioyu]','*',regex_is_vowels)

'*s * s*q**nc* *f ch*r*ct*rs th*t d*f*n* * s**rch p*tt*rn. Us**ll* s*ch p*tt*rns *r* *s*d b* str*ng s**rch*ng *lg*r*thms f*r "f*nd"*r "f*nd *nd r*pl*c*" *p*r*t**ns *n str*ngs, *r f*r *np*t v*l*d*t**n.It *s * t*chn*q** d*v*l*p*d *n th**r*t*c*l c*mp*t*r sc**nc* *nd f*rm*l l*ng**g* th**r*.'
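
The substitution above uses a lowercase-only character class, so the capital letters in "Usually" and "It" are left untouched. One possible way to cover both cases is the `re.IGNORECASE` flag; the short `text` below is a made-up sample, and treating "y" as a vowel simply follows the pattern used above.

```python
import re

text = 'Usually such patterns are used. It is a technique.'

# re.IGNORECASE makes the character class match uppercase vowels as well.
result = re.sub(r'[aeiouy]', '*', text, flags=re.IGNORECASE)
print(result)  # *s**ll* s*ch p*tt*rns *r* *s*d. *t *s * t*chn*q**.
```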

### Split by punctuation marks

In [107]:
regex_split = 'sad%ksakjd:sadsa;SDSD123!11242:xxxx?s,!'
re.split(r"[.!,;:?]+", regex_split)  # '.' and '?' need no escaping inside [...]

['sad%ksakjd', 'sadsa', 'SDSD123', '11242', 'xxxx', 's', '']
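
The trailing `''` in the result above appears because the string ends with separators, so `re.split` reports an empty piece after the last one. A minimal sketch of one way to drop such empty pieces:

```python
import re

regex_split = 'sad%ksakjd:sadsa;SDSD123!11242:xxxx?s,!'

# Filter out empty strings produced by leading or trailing separators.
parts = [p for p in re.split(r'[.!,;:?]+', regex_split) if p]
print(parts)  # ['sad%ksakjd', 'sadsa', 'SDSD123', '11242', 'xxxx', 's']
```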

### Print whether each phone number is valid

In [126]:
is_phones = ['+71231231231', '89999999999',
             '8-923-123-21-23', '8999131231981230214', '213X123s213',
             '+79192318123212', '+7-92873-12331', '+19271312341', '+7881221233o1']
# Optional leading '+', then exactly 11 digits, each optionally followed by a hyphen.
phone_re = re.compile(r"\+?(?:\d-?){11}$")
for phone in is_phones:
    if phone_re.match(phone):
        print("%s is valid" % phone)
    else:
        print("%s is invalid" % phone)

+71231231231 is valid
89999999999 is valid
8-923-123-21-23 is valid
8999131231981230214 is invalid
213X123s213 is invalid
+79192318123212 is invalid
+7-92873-12331 is valid
+19271312341 is valid
+7881221233o1 is invalid
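
Once a number passes the check above, it is often convenient to store it in one canonical form. The sketch below is an assumption-laden illustration: `normalize_phone` and `PHONE_RE` are hypothetical names, and the pattern is an equivalent way of writing the same rule (optional `+`, then 11 digits with optional hyphens), applied with `re.fullmatch` instead of a trailing `$`.

```python
import re

# Same rule as the validation above, written with a non-capturing group.
PHONE_RE = re.compile(r'\+?(?:\d-?){11}')


def normalize_phone(phone):
    """Return the 11 digits of a valid number, or None (hypothetical helper)."""
    if PHONE_RE.fullmatch(phone) is None:
        return None
    # \D matches every non-digit, so '+' and '-' are stripped.
    return re.sub(r'\D', '', phone)


print(normalize_phone('8-923-123-21-23'))  # 89231232123
print(normalize_phone('+7-92873-12331'))   # 79287312331
print(normalize_phone('213X123s213'))      # None
```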
