# Regular Expressions

1. Also referred to as 'regex' or 'regexp'.
2. Used for matching strings of text, such as particular characters, words, or patterns of characters.

# Regular Expression Quick Guide

1. ^ Matches the beginning of a line

2. $ Matches the end of the line

3. . Matches any character except a newline

4. \s Matches whitespace

5. \S Matches any non-whitespace character

6. '*' Repeats the preceding character zero or more times

7. *? Repeats the preceding character zero or more times (non-greedy)

8. '+' Repeats the preceding character one or more times

9. +? Repeats the preceding character one or more times (non-greedy)

10. [aeiou] Matches a single character in the listed set

11. [^XYZ] Matches a single character not in the listed set

12. [a-z0-9] The set of characters can include a range

13. ( Indicates where string extraction is to start

14. ) Indicates where string extraction is to end
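A minimal sketch combining several of the tokens above with Python's built-in `re` module. The sample strings are made up for illustration; note how the parentheses decide which part `re.findall()` returns.

```python
import re

# Made-up sample line, used only for this illustration.
line = 'From alice@example.com Sat Jan 5 09:14:16 2008'

# ^ anchors at the start, \S+ grabs a run of non-whitespace characters,
# and the parentheses mark the part that findall() returns.
emails = re.findall(r'^From (\S+@\S+)', line)
print(emails)  # ['alice@example.com']

# [aeiou] matches one vowel; + repeats it one or more times.
vowel_runs = re.findall(r'[aeiou]+', 'queueing')
print(vowel_runs)  # ['ueuei']

# A range inside [] matches one character from that range.
numbers = re.findall(r'[0-9]+', 'Room 101, Floor 2')
print(numbers)  # ['101', '2']
```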

# Regular Expressions with Examples

In Python,
* import the built-in module with "import re"
* re.findall() returns a list of all the portions of a string that match the regex
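A short sketch of what `re.findall()` hands back, using the same sample string as the cells that follow. It also shows `re.search()`, which a later cell in this notebook uses: it returns only the first match as a Match object, or `None`.

```python
import re

text = 'this is a aa and aaa'

# findall() returns a list of every non-overlapping match, left to right.
runs = re.findall('a+', text)
print(runs)  # ['a', 'aa', 'a', 'aaa']  (the single 'a' in the middle comes from 'and')

# With no match, findall() returns an empty list instead of raising an error.
missing = re.findall('z', text)
print(missing)  # []

# re.search() returns only the first match, as a Match object (or None).
first = re.search('a+', text)
print(first.group(), first.start())  # a 8
```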

In [None]:
import re
re.findall('a','this is a aa and aaa')

In [None]:
re.findall('a+','this is a aa and aaa')

In [None]:
re.findall('a.','this is a aa and aaa abc')

In [None]:
#beginning of line 

re.findall('^t','this is a aa and aaa, that')
# re.findall('^t...','this is a aa and aaa, that')

In [None]:
#end of line 

print(re.findall('t$','this is a aa and aaa, that'))
print(re.findall('...t$','this is a aa and aaa, that'))
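In Python, `^` and `$` by default match at the start and end of the whole string, not of every line; the `re.MULTILINE` flag makes them match at each line as well. A small sketch with a made-up three-line string:

```python
import re

text = 'this line\nthat line\nother line'

# Without flags, ^ matches only at the very start of the string.
start_only = re.findall(r'^th\w*', text)
print(start_only)  # ['this']

# With re.MULTILINE, ^ also matches right after each newline...
line_starts = re.findall(r'^th\w*', text, re.MULTILINE)
print(line_starts)  # ['this', 'that']

# ...and $ also matches right before each newline.
line_ends = re.findall(r'\w+$', text, re.MULTILINE)
print(line_ends)  # ['line', 'line', 'line']
```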

* abc{2}      matches a string that has ab followed by 2 c
* abc{2,}     matches a string that has ab followed by 2 or more c
* abc{2,5}    matches a string that has ab followed by 2 up to 5 c

In [None]:
# re.findall('abc{2}','a b c abc abcc abccc abcccc')
# re.findall('abc{2,}','a b c abc abcc abccc abcccc')
# re.findall('abc{2,5}','a b c abc abcc abccc abcccc abcccccc')
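Running the three patterns above on one sample string (the longest one from that cell) shows that `{n,m}` applies only to the character right before it (here `c`), and that `findall()` also finds the pattern inside longer runs of c. Adding `\b` word boundaries is one way, among others, to require exactly two c's in a whole word.

```python
import re

words = 'a b c abc abcc abccc abcccc abcccccc'

exact = re.findall(r'abc{2}', words)      # ab + exactly two c
at_least = re.findall(r'abc{2,}', words)  # ab + two or more c
bounded = re.findall(r'abc{2,5}', words)  # ab + two to five c (greedy)
whole_word = re.findall(r'\babc{2}\b', words)

print(exact)       # ['abcc', 'abcc', 'abcc', 'abcc']
print(at_least)    # ['abcc', 'abccc', 'abcccc', 'abcccccc']
print(bounded)     # ['abcc', 'abccc', 'abcccc', 'abccccc']
print(whole_word)  # ['abcc']
```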

* a(bc)*      matches a string that has a followed by zero or more copies of the sequence bc
* a(bc){2,5}  matches a string that has a followed by 2 up to 5 copies of the sequence bc

In [None]:
# re.findall('a(bc)','a b c abc abcc abccc abcbc abccbcccbccc')  # with a group, findall returns only the captured 'bc'
# re.findall('a(bc){2,5}','a b c abc abcc abccc abcbc abccbcccbccc')
# re.search('a(bc){2,5}','a b c abc abcc abccc abcbc abccbcccbccc abcbc')
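One detail the cell above runs into: when a pattern contains a capturing group, `re.findall()` returns the group's text rather than the whole match, and a repeated group keeps only its last repetition. A sketch using the same sample string; `(?:...)` is Python's non-capturing group.

```python
import re

s = 'a b c abc abcc abccc abcbc abccbcccbccc'

# Capturing group: findall() returns only the last 'bc' captured.
captured = re.findall(r'a(bc){2,5}', s)
print(captured)  # ['bc']

# Non-capturing group (?:...): findall() returns the whole match.
whole = re.findall(r'a(?:bc){2,5}', s)
print(whole)  # ['abcbc']

# a(bc)* also matches a lone 'a'; its group did not take part, so findall() gives ''.
star_groups = re.findall(r'a(bc)*', s)
star_whole = re.findall(r'a(?:bc)*', s)
print(star_groups)  # ['', 'bc', 'bc', 'bc', 'bc', 'bc']
print(star_whole)   # ['a', 'abc', 'abc', 'abc', 'abcbc', 'abc']

# re.search() keeps both: group(0) is the whole match, group(1) the group.
m = re.search(r'a(bc){2,5}', s)
print(m.group(0), m.group(1))  # abcbc bc
```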

OR operator — | or []

In [None]:
# re.findall('(abc)','this is a b c d not abcd')

# re.findall('(a|b|c)','this is a b c d not abcd')

# re.findall('[abc]','this is a b c d not abcd')

# re.findall('[a|b|c]','this is a b c d not abcd')  # inside [], | is a literal character, not OR
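A sketch comparing the two OR styles on the same sample string. For single characters `a|b|c` and `[abc]` give the same result; the differences show up with a literal `|` and with multi-character alternatives (the extra sample strings are made up).

```python
import re

s = 'this is a b c d not abcd'

alternation = re.findall(r'a|b|c', s)  # | tries each alternative
char_set = re.findall(r'[abc]', s)     # [] matches one listed character
print(alternation)  # ['a', 'b', 'c', 'a', 'b', 'c']
print(char_set)     # ['a', 'b', 'c', 'a', 'b', 'c']

# Inside [], | is an ordinary character, so [a|b] also matches '|'.
pipe_literal = re.findall(r'[a|b]', 'a|b')
print(pipe_literal)  # ['a', '|', 'b']

# Only | can choose between multi-character alternatives.
animals = re.findall(r'cat|dog', 'cat dog cow')
print(animals)  # ['cat', 'dog']
```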



Character classes — \d \w \s and .

* \d - matches a digit
* \D - matches any character except a digit

* \w - matches alphanumeric characters & underscore
* \W - matches any character except alphanumeric characters & underscore

* \s - matches whitespace (spaces, tabs and line breaks)
* \S - matches any character except whitespace

In [None]:
# re.findall(r'\d','a b c abc 1 2 3 123')
# re.findall(r'\D','a b c abc 1 2 3 123')

In [None]:
# re.findall('[a-zA-Z0-9_]','a b c abc 1 2 3 _ ! @ # $ A B C _')
# re.findall(r'\w','a b c abc 1 2 3 _ ! @ # $')
# re.findall(r'\W','a b c abc 1 2 3 _ ! @ # $')

In [None]:
# re.findall(r'\s','a b c abc 1 2 3 _ ! @ # $')
# re.findall(r'\S','a b c abc 1 2 3 _ ! @ # $')
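A runnable sketch of the character classes above. The patterns are written as raw strings (`r'...'`) so Python passes the backslash through to `re` unchanged; recent Python versions warn about `'\d'` in a normal string. The short sample strings other than the first are made up.

```python
import re

s = 'a b c abc 1 2 3 123'

digits = re.findall(r'\d', s)    # one digit per match
numbers = re.findall(r'\d+', s)  # whole runs of digits
print(digits)   # ['1', '2', '3', '1', '2', '3']
print(numbers)  # ['1', '2', '3', '123']

word_chars = re.findall(r'\w', 'a_1 !@#')
non_word = re.findall(r'\W', 'a_1 !@#')
print(word_chars)  # ['a', '_', '1']
print(non_word)    # [' ', '!', '@', '#']

# \s matches tabs and newlines as well as spaces.
spaces = re.findall(r'\s', 'a\tb\nc d')
print(spaces)  # ['\t', '\n', ' ']
```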

? - question mark: makes the preceding character optional (zero or one time)

In [None]:
#select a with or without b
#re.findall('ab?','a abab abc ac ad cd')
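A sketch of the optional `?` on the sample string from the comment above, plus a made-up spelling example:

```python
import re

# ? makes the b optional, so both 'a' and 'ab' match.
optional_b = re.findall(r'ab?', 'a abab abc ac ad cd')
print(optional_b)  # ['a', 'ab', 'ab', 'ab', 'a', 'a']

# A common use: accept two spellings of the same word.
spellings = re.findall(r'colou?r', 'color colour colouur')
print(spellings)  # ['color', 'colour']
```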




#d(?=r) - lookahead: matches a d only if it is followed by r; the r is not part of the overall match
#re.findall('d(?=r)','third drone')  # - selects the d of drone

#(?<=r)d - lookbehind: matches a d only if it is preceded by r; the r is not part of the overall match
#re.findall('(?<=r)d','third drone')  # - selects the d of third

#d(?!r) - negative lookahead: matches a d only if it is not followed by r
#re.findall('d(?!r)','third drone')  # - selects the d of third

#(?<!r)d - negative lookbehind: matches a d only if it is not preceded by r
#re.findall('(?<!r)d','third drone')  # - selects the d of drone
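Since every pattern above returns just `['d']`, a sketch using `re.finditer()` makes it visible which d each lookaround picks. The price string at the end is a made-up example of a lookbehind in practice.

```python
import re

s = 'third drone'
patterns = [r'd(?=r)', r'(?<=r)d', r'd(?!r)', r'(?<!r)d']

# finditer() yields Match objects, so we can read each match's start index.
# In 'third drone' the d of third is at index 4 and the d of drone at index 6.
positions = {p: [m.start() for m in re.finditer(p, s)] for p in patterns}
for p, pos in positions.items():
    print(p, pos)
# d(?=r) [6]
# (?<=r)d [4]
# d(?!r) [4]
# (?<!r)d [6]

# The lookbehind checks for '$' without including it in the result.
prices = re.findall(r'(?<=\$)\d+', 'cost $30 or $45')
print(prices)  # ['30', '45']
```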




#non-greedy (*?, +?) - match as few characters as possible

# re.findall('^f.*:','from:from: amit.r@from.com')   # greedy: runs to the last ':'
# re.findall('^f.*?:','from:from: amit.r@from.com')  # non-greedy: stops at the first ':'
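A sketch contrasting greedy `.*` with non-greedy `.*?` on the sample string above. Note that `.*?` (non-greedy) is different from `.*:?` (greedy, followed by an optional colon). The HTML snippet is made up to show the same effect.

```python
import re

s = 'from:from: amit.r@from.com'

greedy = re.findall(r'^f.*:', s)  # .* grabs as much as it can
lazy = re.findall(r'^f.*?:', s)   # .*? grabs as little as it can
print(greedy)  # ['from:from:']
print(lazy)    # ['from:']

# The same difference when pulling tags out of a small HTML snippet.
html = '<b>bold</b>'
greedy_tags = re.findall(r'<.+>', html)
lazy_tags = re.findall(r'<.+?>', html)
print(greedy_tags)  # ['<b>bold</b>']
print(lazy_tags)    # ['<b>', '</b>']
```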

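^ - anchors the match to the start of the string, or, as the first character inside [], matches any character not in the set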

In [None]:
# re.findall('^(this)','this is a car')

# re.findall('[^this]','this is a car')
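A sketch of both meanings of `^` on the sample string above. The last line uses a made-up string to show that `^` is a plain character when it is not first inside `[]`.

```python
import re

s = 'this is a car'

# Outside [], ^ anchors the match to the start of the string.
starts = re.findall(r'^(this)', s)
not_at_start = re.findall(r'^(is)', s)
print(starts)        # ['this']
print(not_at_start)  # []

# As the first character inside [], ^ negates the set: any character
# other than t, h, i or s (it excludes letters, not the word 'this').
others = re.findall(r'[^this]', s)
print(others)  # [' ', ' ', 'a', ' ', 'c', 'a', 'r']

# Anywhere else inside [], ^ is just a literal caret.
caret = re.findall(r'[a^]', 'a^b')
print(caret)  # ['a', '^']
```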

Boundaries: \b matches at a word boundary, which allows a "whole words only" search; \B matches only where there is no word boundary

In [None]:
#re.findall(r'\babc\b','aabc abc abcc')
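A sketch on the same sample string showing where `\b` and `\B` allow `abc` to match, using start indexes so the three occurrences can be told apart. It also shows why the pattern needs the `r` prefix: in a normal Python string `'\b'` is a backspace character.

```python
import re

s = 'aabc abc abcc'
# Start indexes of the words: 'aabc' at 0, 'abc' at 5, 'abcc' at 9.

whole = [m.start() for m in re.finditer(r'\babc\b', s)]   # abc as a whole word
inside = [m.start() for m in re.finditer(r'\Babc', s)]    # abc not at a word start
prefix = [m.start() for m in re.finditer(r'\babc\B', s)]  # abc starting a longer word
print(whole, inside, prefix)  # [5] [1] [9]

# Without the r prefix, '\b' becomes a backspace character, so nothing matches.
no_raw = re.findall('\babc\b', s)
print(no_raw)  # []
```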

Miscellaneous

In [None]:
re.findall('th[ai][st]', 'this or that')

# References

1. https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285
2. https://regex101.com/r/

#   <------------- thank you  ----------------->