## Regex

- `findall` - Returns a list containing all matches
- `search` - Returns a match object if there is a match anywhere in the string
- `split` - Returns a list where the string has been split at each match
- `sub` - Replaces one or many matches with a string
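
Of the four functions above, `split` is the only one without a cell of its own in these notes, so here is a minimal sketch of it. The sample text is made up for illustration, and `maxsplit` is the standard optional argument of `re.split`.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
txt = "The rain in Spain"

# Split at every run of whitespace.
parts = re.split(r"\s+", txt)
print(parts)        # ['The', 'rain', 'in', 'Spain']

# maxsplit limits how many splits are performed.
first_split = re.split(r"\s+", txt, maxsplit=1)
print(first_split)  # ['The', 'rain in Spain']
```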


- `\s` -> any whitespace character (space, tab, newline)
- `\S` -> any non-whitespace character
- `\d` -> any digit
- `\D` -> any non-digit character
- `\w` -> any word character (letters, digits, and underscore: [a-zA-Z0-9_])
- `\W` -> any non-word character (special characters: [^a-zA-Z0-9_])
- `.` -> any character except newline
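
A quick way to check the classes above is to run each one against a small string. The sketch below uses a made-up sample containing an underscore, a tab, and a newline, to show that `\w` includes the underscore and that `\s` covers more than the plain space.

```python
import re

# Hypothetical sample string: underscore, digits, a tab, and a newline.
sample = "user_42\tok\n"

word_chars = re.findall(r"\w+", sample)   # runs of word characters
whitespace = re.findall(r"\s", sample)    # each whitespace character
digits = re.findall(r"\d", sample)        # each digit
non_digits = re.findall(r"\D+", sample)   # runs of non-digits

print(word_chars)   # ['user_42', 'ok']
print(whitespace)   # ['\t', '\n']
print(digits)       # ['4', '2']
print(non_digits)   # ['user_', '\tok\n']
```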

- `[]` : a set of characters, e.g. `[a-m]`
- `\` : signals a special sequence such as `\d` (can also be used to escape special characters, e.g. `\.`)
- `.` : any character (except newline character), e.g. `he..o`
- `^` : starts with, e.g. `^hello`
- `$` : ends with, e.g. `world$`
- `*` : zero or more occurrences, e.g. `aix*`
- `+` : one or more occurrences, e.g. `aix+`
- `{}` : exactly the specified number of occurrences, e.g. `al{2}`
- `|` : either or, e.g. `falls|stays`
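
The metacharacters above can be tried one at a time. This is a small sketch using the example patterns from the list; the input strings are made up for illustration.

```python
import re

char_set = re.findall(r"[a-m]", "hello")                 # characters in the range a-m
any_char = re.search(r"he..o", "hello world").group()    # '.' matches any one character
starts = bool(re.search(r"^hello", "hello world"))       # anchored at the start
ends = bool(re.search(r"world$", "hello world"))         # anchored at the end
zero_or_more = re.findall(r"aix*", "ai aix aixx")        # 'ai' followed by zero or more 'x'
one_or_more = re.findall(r"aix+", "ai aix aixx")         # 'ai' followed by at least one 'x'
exactly_two = re.findall(r"al{2}", "al all alll")        # 'a' followed by exactly two 'l'
either = re.findall(r"falls|stays", "The rain falls or stays")
escaped = re.findall(r"\.", "a.b.c")                     # a literal dot, not "any character"

print(char_set)      # ['h', 'e', 'l', 'l']
print(any_char)      # hello
print(starts, ends)  # True True
print(zero_or_more)  # ['ai', 'aix', 'aixx']
print(one_or_more)   # ['aix', 'aixx']
print(exactly_two)   # ['all', 'all']
print(either)        # ['falls', 'stays']
print(escaped)       # ['.', '.']
```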

In [2]:
import re
help(re)

Help on module re:

NAME
    re - Support for regular expressions (RE).

DESCRIPTION
    This module provides regular expression matching operations similar to
    those found in Perl.  It supports both 8-bit and Unicode strings; both
    the pattern and the strings being processed can contain null bytes and
    characters outside the US ASCII range.
    
    Regular expressions can contain both special and ordinary characters.
    Most ordinary characters, like "A", "a", or "0", are the simplest
    regular expressions; they simply match themselves.  You can
    concatenate ordinary characters, so last matches the string 'last'.
    
    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy 

In [7]:
txt = "This is a string where you can find a list of words in the text"
# x = re.findall(r"\s", txt)  # would return every whitespace character
x = re.findall(r"\S+", txt)  # runs of non-whitespace characters, i.e. the words
x

['This',
 'is',
 'a',
 'string',
 'where',
 'you',
 'can',
 'find',
 'a',
 'list',
 'of',
 'words',
 'in',
 'the',
 'text']

In [8]:
txt = "How to use search in regular expression"
x = re.search(r"\s", txt)  # first whitespace character in the string
x

<re.Match object; span=(3, 4), match=' '>

In [12]:
txt = "How to use search in regular expression"
x = re.search(r"\s", txt)
# x.start()  # index where the match begins
x.end()  # index just past the end of the match

4
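
The match object returned by `re.search` exposes a few more accessors besides `start()` and `end()`. The sketch below reuses the same text and also shows the no-match case, where `re.search` returns `None`. The digit pattern is added here only to force a miss.

```python
import re

txt = "How to use search in regular expression"
m = re.search(r"\s", txt)

if m is not None:           # guard: re.search returns None when nothing matches
    print(m.start())        # 3
    print(m.end())          # 4
    print(m.span())         # (3, 4)
    print(repr(m.group()))  # ' '

# The text contains no digits, so this search finds nothing.
no_match = re.search(r"\d", txt)
print(no_match)             # None
```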

In [23]:
txt = "I got 95% in my exam"
x = re.search(r"\d+%", txt)  # one or more digits followed by a percent sign
x

<re.Match object; span=(6, 9), match='95%'>

In [24]:
txt = "How to substitute value in regular expression"
x = re.sub(r"\s", "%20", txt)  # replace every whitespace character with %20
x

'How%20to%20substitute%20value%20in%20regular%20expression'
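
By default `re.sub` replaces every match. As a small sketch of the "one or many" behaviour, the optional `count` argument caps the number of replacements, and `re.subn` also reports how many were made.

```python
import re

txt = "How to substitute value in regular expression"

# Replace only the first two whitespace characters.
first_two = re.sub(r"\s", "%20", txt, count=2)
print(first_two)  # How%20to%20substitute value in regular expression

# subn returns the new string together with the number of replacements.
replaced, n = re.subn(r"\s", "%20", txt)
print(n)          # 6
```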

In [31]:
pattern = r'^\D+'  # non-digits from the start; re.match already anchors at the start
test_string = "python regular expressions learn with 22 examples"
result = re.match(pattern,test_string)
if result:
    print("Success: ",result)
else:
    print("No success: ",result)

Success:  <re.Match object; span=(0, 38), match='python regular expressions learn with '>
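
`re.match` only looks at the beginning of the string, while `re.search` scans the whole string. The sketch below contrasts the two on the same test string; `re.fullmatch` is included as a third variant that requires the entire string to match.

```python
import re

test_string = "python regular expressions learn with 22 examples"

# match: the string does not start with a digit, so there is no match.
at_start = re.match(r"\d+", test_string)
print(at_start)          # None

# search: finds the first run of digits anywhere in the string.
anywhere = re.search(r"\d+", test_string)
print(anywhere.group())  # 22
print(anywhere.span())   # (38, 40)

# fullmatch: fails because the string contains digits.
whole = re.fullmatch(r"\D+", test_string)
print(whole)             # None
```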


In [32]:
string = 'if you are 2 see that 4th row 5th column and 77 records'
pattern = r'\d+'  # one or more consecutive digits
result = re.findall(pattern,string)
print(result)

['2', '4', '5', '77']


In [33]:
string = 'find the employee details in 40th row 2nd file and 30th column % & *'
pattern = r'\d+'  # one or more consecutive digits
result = re.findall(pattern,string)
print(result)

['40', '2', '30']


In [34]:
string = 'yes, i found in 2nd row 5th column \
and 10 duplicate records $ with %'
pattern = r'\W'  # any single non-word character, including spaces
result = re.findall(pattern,string)
print(result)

[',', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '$', ' ', ' ', '%']
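
Because `\W` also matches spaces, most of the list above is whitespace. If only the punctuation and symbols are wanted, one option (a sketch, not the only way) is a negated set that excludes both word characters and whitespace.

```python
import re

string = 'yes, i found in 2nd row 5th column \
and 10 duplicate records $ with %'

# [^\w\s] matches any character that is neither a word character nor whitespace.
symbols = re.findall(r"[^\w\s]", string)
print(symbols)  # [',', '$', '%']
```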
