# documentation link:
[https://docs.python.org/3/library/re.html](https://docs.python.org/3/library/re.html)

## special characters
### .,*(){}-?\
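* 'r' prefix (raw string): backslashes are not handled in any special way inside the literal. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline.
* \ (backslash): to treat a special character as an ordinary character in a pattern, put "\" before it.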
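
A minimal sketch of the two points above, using only the standard `re` module. The sample string `price` is made up for illustration, and `re.escape` is shown as one convenient way to add the backslashes automatically.

```python
import re

# A raw string keeps the backslash: two characters, '\' and 'n'.
raw = r"\n"
# A normal string literal turns \n into a single newline character.
cooked = "\n"
lengths = (len(raw), len(cooked))
print(lengths)  # (2, 1)

# Escaping a special character makes it match literally.
price = "cost: 3.50 (approx.)"  # hypothetical sample text
dots = re.findall(r"\.", price)     # only real dots
anything = re.findall(r".", price)  # every character except a newline

# re.escape adds the backslashes when the text is not known in advance.
literal = re.escape("3.50 (approx.)")
found = re.search(literal, price).group(0)
print(dots, found)
```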

In [15]:
x = """
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.
"""

import re
re.findall(r'\.',x)

['.', '.', '.', '.']

In [21]:
x = """pakistan zinda bad
poland is wounderfull country
united state is powerfull country in the world
"""
x

'pakistan zinda bad\npoland is wounderfull country\nunited state is powerfull country in the world\n'

In [22]:
re.findall('^p.*',x)

['pakistan zinda bad']

In [27]:
re.findall('^p.*',x,re.MULTILINE)

['pakistan zinda bad', 'poland is wounderfull country']

In [29]:
x = """pakistan zinda bad:
poland is wounderfull country:
    hello world
united state is powerfull country in the world
    abcd
"""

re.findall('.*:$',x, re.MULTILINE)

['pakistan zinda bad:', 'poland is wounderfull country:']

In [30]:
re.findall('^    .*',x, re.MULTILINE)

['    hello world', '    abcd']

In [34]:
x = """pakistan zinda bad:
poland is wounderfull country:
    hello world
united state is powerfull country in the world
    abcd
"""
re.findall('pa*',x)

['pa', 'p', 'p']

In [35]:
x = """pakistan zinda bad:
poland is wounderfull country:
    hello world
united state is powerfull country in the world
    abcd
"""
re.findall('pa+',x)

['pa']

In [36]:
x = """pakistan zinda bad:
poland is wounderfull country:
    hello world
united state is powerfull country in the world
    abcd
"""
re.findall('pa?',x)

['pa', 'p', 'p']
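
The last three cells differ only in the quantifier after `a`. A small side-by-side sketch on a made-up string (not taken from the notebook) may make the difference easier to see:

```python
import re

text = "p pa paa paaa"  # hypothetical sample

star = re.findall(r"pa*", text)  # 'p' followed by zero or more 'a'
plus = re.findall(r"pa+", text)  # 'p' followed by one or more 'a'
opt = re.findall(r"pa?", text)   # 'p' followed by zero or one 'a'

print(star)  # ['p', 'pa', 'paa', 'paaa']
print(plus)  # ['pa', 'paa', 'paaa']
print(opt)   # ['p', 'pa', 'pa', 'pa']
```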

In [50]:
x = """

<!DOCTYPE html>


    <html itemscope itemtype="https://schema.org/QAPage" class="html__responsive " lang="en">

    <head>

        <title>python regex findall and multiline - Stack Overflow</title>
    </head>
    <body class="question-page unified-theme">



    </body>
    </html>

"""
print(re.findall("<.*>",x))
len(re.findall("<.*>",x))

['<!DOCTYPE html>', '<html itemscope itemtype="https://schema.org/QAPage" class="html__responsive " lang="en">', '<head>', '<title>python regex findall and multiline - Stack Overflow</title>', '</head>', '<body class="question-page unified-theme">', '</body>', '</html>']


8

In [51]:
print(re.findall("<.*?>",x))
len(re.findall("<.*?>",x))

['<!DOCTYPE html>', '<html itemscope itemtype="https://schema.org/QAPage" class="html__responsive " lang="en">', '<head>', '<title>', '</title>', '</head>', '<body class="question-page unified-theme">', '</body>', '</html>']


9

In [53]:
print(re.findall("<.+?>",x))
len(re.findall("<.+?>",x))

['<!DOCTYPE html>', '<html itemscope itemtype="https://schema.org/QAPage" class="html__responsive " lang="en">', '<head>', '<title>', '</title>', '</head>', '<body class="question-page unified-theme">', '</body>', '</html>']


9

In [54]:
print(re.findall("<.??>",x))
len(re.findall("<.??>",x))

[]


0
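
The empty result above follows from what `.??` allows: at most one character between `<` and `>`, and the sample HTML has no tag that short. A sketch on a hypothetical one-line string that does contain such tags:

```python
import re

html = "<b>bold</b> <> <i>"  # invented sample, not from the page above

greedy = re.findall(r"<.*>", html)        # as much as possible
lazy = re.findall(r"<.*?>", html)         # as little as possible
at_most_one = re.findall(r"<.??>", html)  # zero or one character inside

print(greedy)       # ['<b>bold</b> <> <i>']
print(lazy)         # ['<b>', '</b>', '<>', '<i>']
print(at_most_one)  # ['<b>', '<>', '<i>']
```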

In [56]:
x = """
aaa
aaaaaa
bbbb aaaaaaaaa
cccc aaaaaaaaaaaa
"""
re.findall('a{3}',x)

['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']

In [58]:
x = """
aaa
aaaaaa
bbbb aaaaaaaaa
cccc aaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
"""
re.findall('a{5,12}',x)

['aaaaaa', 'aaaaaaaaa', 'aaaaaaaaaaaa', 'aaaaaaaaaaaa', 'aaaaaaaaaaaa']

In [62]:
# \b matches at a word boundary: the start or end of a word (for example, next to a space or a newline)
x = """
aaa
aaaaaa
bbbb aaaaaaaaa
cccc aaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
"""
re.findall(r'\ba{5,12}\b',x)

['aaaaaa', 'aaaaaaaaa', 'aaaaaaaaaaaa']

In [67]:
# greedy: a{3,5} takes as many 'a' characters as it can (up to 5)
x = """
aaaaaa

"""
re.findall('a{3,5}',x)

['aaaaa']

In [76]:
# non-greedy: a{3,5}? takes as few 'a' characters as it can (3 at a time)
x = """
aaaaaaa

"""
re.findall('a{3,5}?',x)

['aaa', 'aaa']

In [79]:
x = """
Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special * sequence; special sequences are discussed below.
"""

re.findall(r'\*',x)

['*', '*']

In [85]:
x = """
pakistan zinda bad
pai
pia
"""

re.findall("[pia]+",x)

['pa', 'i', 'a', 'i', 'a', 'a', 'pai', 'pia']

In [87]:
x = """
pakistan zinda bad 007
pai 178
pia 90
"""

re.findall("[0-9]+",x)

['007', '178', '90']

In [88]:
x = """
pakistan zinda bad 007
pai 178
pia 90
"""

re.findall("[0-9][0-9]",x)

['00', '17', '90']

In [89]:
x = """
pakistan zinda bad 007
pai 178
pia 90
"""

re.findall("[0-9]{2}",x)

['00', '17', '90']

In [93]:
x = """
pakistan zinda bad 007
pai 178
pia 90 
"""

re.findall(r"\b[0-9]{3}\b",x)

['007', '178']

In [96]:
x = """
Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special * sequence; special sequences are discussed below.
"""

re.findall('[*.?]',x)

['*', '?', '*', '.']

In [103]:
x = """
pakistan zinda bad
we love our country
"""
re.findall(r'pak.*\n|count+',x)

['pakistan zinda bad\n', 'count']

In [104]:
x = """
12:02:17 From asghar ibraheem CNC-012105 To Everyone : sublist
12:05:08 From syed ibad ur rehman To Everyone : PIAIC-166678
12:05:11 From Abdul Qadar To Everyone : PIAIC172941
"""

re.findall('From.* To Everyone',x)

['From asghar ibraheem CNC-012105 To Everyone',
 'From syed ibad ur rehman To Everyone',
 'From Abdul Qadar To Everyone']

In [105]:
x = """
12:02:17 From asghar ibraheem CNC-012105 To Everyone : sublist
12:05:08 From syed ibad ur rehman To Everyone : PIAIC-166678
12:05:11 From Abdul Qadar To Everyone : PIAIC172941
"""

re.findall('From(.*) To Everyone',x)

[' asghar ibraheem CNC-012105', ' syed ibad ur rehman', ' Abdul Qadar']

In [107]:
x = """
12:02:17 From asghar ibraheem CNC-012105 To Everyone : sublist
12:05:08 From syed ibad ur rehman To Everyone : PIAIC-166678
12:05:11 From Abdul Qadar To Everyone : PIAIC172941
"""

re.findall('From(.*) To Everyone.*([0-9]{5,6})',x)

[(' syed ibad ur rehman', '66678'), (' Abdul Qadar', '72941')]
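
The numbers above come back with five digits ('66678') even though the text has six ('166678'): the greedy `.*` in front of the second group takes as much as it can and leaves only the five-digit minimum. One possible fix, sketched on the same chat lines, is to make that `.*` lazy. Moving the space into the literal `From ` also drops the leading blank from the names.

```python
import re

chat = """
12:02:17 From asghar ibraheem CNC-012105 To Everyone : sublist
12:05:08 From syed ibad ur rehman To Everyone : PIAIC-166678
12:05:11 From Abdul Qadar To Everyone : PIAIC172941
"""

# .*? stops as early as possible, so the digit group can take all six digits.
pairs = re.findall(r"From (.*) To Everyone.*?([0-9]{5,6})", chat)
print(pairs)
# [('syed ibad ur rehman', '166678'), ('Abdul Qadar', '172941')]
```

The first line still produces no match, because nothing after "To Everyone" on that line is a run of five or six digits.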

In [111]:
x = """
12:02:17 From asghar ibraheem CNC-012105 To Everyone : sublist
12:05:08 From syed ibad ur rehman To Everyone : PIAIC-166678
12:05:11 From Abdul Qadar To Everyone : PIAIC172941
"""

re.findall('(...)',x)

['12:',
 '02:',
 '17 ',
 'Fro',
 'm a',
 'sgh',
 'ar ',
 'ibr',
 'ahe',
 'em ',
 'CNC',
 '-01',
 '210',
 '5 T',
 'o E',
 'ver',
 'yon',
 'e :',
 ' su',
 'bli',
 '12:',
 '05:',
 '08 ',
 'Fro',
 'm s',
 'yed',
 ' ib',
 'ad ',
 'ur ',
 'reh',
 'man',
 ' To',
 ' Ev',
 'ery',
 'one',
 ' : ',
 'PIA',
 'IC-',
 '166',
 '678',
 '12:',
 '05:',
 '11 ',
 'Fro',
 'm A',
 'bdu',
 'l Q',
 'ada',
 'r T',
 'o E',
 'ver',
 'yon',
 'e :',
 ' PI',
 'AIC',
 '172',
 '941']

In [112]:
x = """
12:02:17 From asghar ibraheem CNC-012105 To Everyone : sublist
12:05:08 From syed ibad ur rehman To Everyone : PIAIC-166678
12:05:11 From Abdul Qadar To Everyone : PIAIC172941
"""

re.findall('(?...)',x)

error: unknown extension ?. at position 1
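
The error above is raised because `(?` starts an extension and must be followed by a recognised character such as `:`, `P`, `=` or `!`. A hedged sketch of catching the error, plus one valid extension, the non-capturing group `(?:...)`; the sample string `line` is invented:

```python
import re

# An invalid extension raises re.error when the pattern is compiled.
try:
    re.compile(r"(?...)")
    compiled_ok = True
    message = ""
except re.error as exc:
    compiled_ok = False
    message = str(exc)
print(compiled_ok, message)

line = "12:05:08 From Abdul"  # hypothetical sample
# With a capturing group, findall returns only the group.
capturing = re.findall(r"(\d\d):", line)
# (?:...) groups without capturing, so findall returns the whole match.
non_capturing = re.findall(r"(?:\d\d):", line)
print(capturing)      # ['12', '05']
print(non_capturing)  # ['12:', '05:']
```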

# class 10 

{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.

In [3]:
import re

x = 'pakistan aaaaaa'
re.findall('a{3,5}?',x)

['aaa', 'aaa']

In [4]:

x = 'pakistan aaaaaa'
re.findall('a{3,5}',x)

['aaaaa']

In [6]:
x = 'pakistan * abc * xyz.'

re.findall(r'\*',x)

['*', '*']

In [7]:
re.findall(r'\.',x)

['.']

In [8]:
re.findall('[*.]',x)

['*', '*', '.']

In [10]:
x = 'abcdefghijklmnopqrstuvwxyz 23 a32 33 . - 23 - 233'

re.findall('[a-z]',x)

['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z',
 'a']

In [11]:
x = 'abcdefghijklmnopqrstuvwxyz 23 a32 33 . - 23 - 233'

re.findall(r'[a\-z]',x)

['a', 'z', 'a', '-', '-']

In [13]:
x = 'abcdefghijklmnopqrstuvwxyz 23 a32 33 . - 23 - 233'

re.findall('[-az]',x)

['a', 'z', 'a', '-', '-']

In [14]:
x = 'abcdefghijklmnopqrstuvwxyz 23 a32 33 . - 23 - 233'

re.findall('[az-]',x)

['a', 'z', 'a', '-', '-']

Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'.

In [18]:
x = "a . b, c* d- x+ (aa) [ssd]"

re.findall(r'[().+*\[\]]',x)

['.', '*', '+', '(', ')', '[', ']']

In [20]:
x = """Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'."""

re.findall(r"\w+",x)

['Special',
 'characters',
 'lose',
 'their',
 'special',
 'meaning',
 'inside',
 'sets',
 'For',
 'example',
 'will',
 'match',
 'any',
 'of',
 'the',
 'literal',
 'characters',
 'or']

In [22]:
x = """Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'."""

print(len(re.findall(r"\s+",x)))
re.findall(r"\s+",x)

22


[' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ']

# ^ vs [^]

In [24]:
x = "x = 'abcdefghijklmnopqrstuvwxyz 23 a32 33 . - 23 - 2337 20 5 8'"
re.findall("[^38a-z]+",x)

[" = '", ' 2', ' ', '2 ', ' . - 2', ' - 2', '7 20 5 ', "'"]
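
The heading contrasts two meanings of `^`. Outside brackets it anchors the match to the start of the string (or of each line with `re.MULTILINE`); as the first character inside brackets it negates the set, as in the cell above. A short sketch on a made-up string:

```python
import re

text = "abc 123\nabd 456"  # hypothetical two-line sample

anchored = re.findall(r"^ab.", text)                # start of the string only
per_line = re.findall(r"^ab.", text, re.MULTILINE)  # start of every line
negated = re.findall(r"[^a-z\s]+", text)            # runs of anything except a-z and whitespace

print(anchored)  # ['abc']
print(per_line)  # ['abc', 'abd']
print(negated)   # ['123', '456']
```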

In [25]:
x = "x = 'abcdefghijklmnopqrstuvwxyz 23 a32 33 . - 23 - 2337 20 5 8'"
re.findall('a|b',x)

['a', 'b', 'a']

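(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name.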

In [29]:
x = "0315-29638211 cnic 4220175767604"

re.findall("([0-9]{4}-[0-9]{7}).*([0-9]{13})",x)

[('0315-2963821', '4220175767604')]

In [33]:
x = "0315-29638211 cnic 4220175767604"

a1 = re.search("([0-9]{4}-[0-9]{7}).*([0-9]{13})",x)
print(a1.group(1))
print(a1.group(2))

0315-2963821
4220175767604


In [39]:
x = "0315-29638211 cnic 4220175767604"

a1 = re.search("(?P<mob>[0-9]{4}-[0-9]{7})",x)
a1.group('mob')

'0315-2963821'

In [41]:
x = "0315-29638211 cnic 4220175767604"

a1 = re.search("(?P<mob>[0-9]{4}-[0-9]{7}).*(?P<cnic>[0-9]{13})",x)
print(a1.group('mob'))
print(a1.group('cnic'))

0315-2963821
4220175767604
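
When several groups are named, `Match.groupdict()` returns them all at once as a dictionary. A small sketch on the same sample string, with a `None` check added because `re.search` returns `None` when nothing matches:

```python
import re

x = "0315-29638211 cnic 4220175767604"
pattern = r"(?P<mob>[0-9]{4}-[0-9]{7}).*(?P<cnic>[0-9]{13})"

m = re.search(pattern, x)
if m is not None:
    fields = m.groupdict()  # group name -> matched text
else:
    fields = {}
print(fields)  # {'mob': '0315-2963821', 'cnic': '4220175767604'}
```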


In [43]:
x = "0315-29638211 cnic 4220175767604"
p1 = re.compile("(?P<mob>[0-9]{4}-[0-9]{7})")

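# (?P=name) is a backreference to a group defined earlier in the SAME pattern;
# it cannot pull in the separately compiled p1, so this call raises re.error.
a1 = re.search("(?P=p1).*(?P<cnic>[0-9]{13})",x)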
print(a1.group('mob'))
print(a1.group('cnic'))

error: unknown group name 'p1' at position 4
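
`(?P=name)` is a backreference: it matches the same text that the group called `name` captured earlier in the same pattern, so it cannot refer to a separately compiled object such as `p1`. A sketch of both ideas follows. The `doubled` string is hypothetical, and concatenating `p1.pattern` is only one possible way to reuse a compiled pattern.

```python
import re

# Backreference: find a word that is immediately repeated.
doubled = "this is is a test"  # invented sample
m = re.search(r"\b(?P<word>\w+) (?P=word)\b", doubled)
repeated = m.group("word")
print(repeated)  # is

# Reusing a compiled pattern: build the larger pattern from its source text.
x = "0315-29638211 cnic 4220175767604"
p1 = re.compile(r"(?P<mob>[0-9]{4}-[0-9]{7})")
a1 = re.search(p1.pattern + r".*(?P<cnic>[0-9]{13})", x)
mob, cnic = a1.group("mob"), a1.group("cnic")
print(mob)   # 0315-2963821
print(cnic)  # 4220175767604
```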

In [50]:
import re
m = re.search('(?<=...)def', 'abcdefe')
m.group(0)

'def'

In [51]:
m = re.search(r'(?<=-)\w+', 'spam-egg')
m.group(0)

'egg'
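
A positive lookbehind `(?<=...)` checks what comes before the match without including it in the result; in Python's `re` module the lookbehind must have a fixed width. The sketch below, on an invented price string, compares it with a plain capturing group and with a lookahead `(?=...)`:

```python
import re

text = "apple $30, pear $45, fig 12"  # hypothetical sample

# Digits preceded by '$' ('\$' is a literal dollar sign).
after_dollar = re.findall(r"(?<=\$)\d+", text)
# Same result with a capturing group instead of a lookbehind.
captured = re.findall(r"\$(\d+)", text)
# Lookahead: words that are followed by ' $'.
before_dollar = re.findall(r"\w+(?= \$)", text)

print(after_dollar)   # ['30', '45']
print(captured)       # ['30', '45']
print(before_dollar)  # ['apple', 'pear']
```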

# \b

In [54]:
x = """
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string. This means that r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'.
"""
re.findall(r'\bstring.*?beginning\b',x)


['string, but only at the beginning']
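
The claim about `r'\bfoo\b'` in the quoted text can be checked directly. A minimal sketch using exactly the sample strings listed in the quotation:

```python
import re

pattern = re.compile(r"\bfoo\b")
samples = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]

# True where 'foo' appears as a whole word.
results = {s: bool(pattern.search(s)) for s in samples}
for s, hit in results.items():
    print(f"{s!r}: {hit}")

# In a normal (non-raw) string, "\b" is the backspace character,
# not a word boundary, which is why the pattern uses the r prefix.
backspace_len = len("\b")
```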

In [56]:
x = "0315-29638211 cnic 4220175767604"
p1 = re.compile("(?P<mob>[0-9]{4}-[0-9]{7})")

a1 = re.findall(r"\d{13}",x)
a1

['4220175767604']
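
Note that `\d{13}` also matches the first 13 digits of a longer number. If only numbers of exactly 13 digits should be accepted, word boundaries are one option. In this sketch the `too_long` string is hypothetical:

```python
import re

x = "0315-29638211 cnic 4220175767604"
too_long = "id 42201757676049999"  # 17 digits, invented sample

loose = re.findall(r"\d{13}", too_long)       # takes the first 13 digits
strict = re.findall(r"\b\d{13}\b", too_long)  # rejects the longer number
cnic = re.findall(r"\b\d{13}\b", x)           # still finds the real value

print(loose)   # ['4220175767604']
print(strict)  # []
print(cnic)    # ['4220175767604']
```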