<h3>Quantifiers</h3>

In the examples so far, each metacharacter and special character we have studied matches just one character at a time. In this notebook, we will study a special type of metacharacter, called a quantifier, that immediately follows a portion of a regex (referred to as an element) and specifies how many times that element must occur for the match to succeed. The figure below lists the quantifiers we will look at.

![Quantifiers.png](attachment:Quantifiers.png)

In [1]:
import re

In [2]:
'''
In this example, the <regex> matches each lowercase letter individually, so findall returns a list
with one element per match.
'''
string = "line"
print('With [a-z]', re.findall('[a-z]', string))

With [a-z] ['l', 'i', 'n', 'e']


In [3]:
'''
Similarly, in this example, the <regex> will match each digit and return a list with as many elements
as there are matches.
'''
string = "4S285D"
print(r'With \d ', re.findall(r'\d', string))  # raw strings avoid invalid-escape warnings for \d

With \d  ['4', '2', '8', '5']
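As a preview of the quantifiers discussed below, here is a minimal sketch (the variable name `digit_runs` is just illustrative) that appends `+` to `\d`, so each run of consecutive digits comes back as a single match rather than one digit at a time:

```python
import re

string = "4S285D"
# \d+ matches one or more consecutive digits, so "285" is returned as one match
digit_runs = re.findall(r'\d+', string)
print(digit_runs)  # ['4', '285']
```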


In [4]:
'''
We can repeat an element as many times as we want it to occur in a row.
In the example below, a match occurs for every two consecutive lowercase letters.
'''
string = "lines"
print(re.findall('[a-z][a-z]', string))

['li', 'ne']
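Writing an element twice is equivalent to following it with the `{2}` quantifier, which is introduced later in this notebook. A quick sketch comparing both spellings on the same string (variable names are illustrative); the trailing `s` is not matched because it has no partner:

```python
import re

string = "lines"
# [a-z]{2} is shorthand for writing [a-z] twice
repeated = re.findall('[a-z][a-z]', string)
quantified = re.findall('[a-z]{2}', string)
print(repeated, quantified)  # ['li', 'ne'] ['li', 'ne']
```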


In [5]:
'''
Similarly, in this example, the <regex> matches any two consecutive digits.
'''
string = "49S258D"
print(re.findall(r'\d\d', string))

['49', '25']
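`re.findall` returns non-overlapping matches, scanning left to right. The sketch below uses `re.finditer` and each match object's `span()` to show where the two matches sit; once `25` consumes indices 3 and 4, only `8` is left before `D`, so no third pair is found. The name `spans` is illustrative.

```python
import re

string = "49S258D"
# span() gives the (start, end) indices of each match; end is exclusive
spans = [(m.group(), m.span()) for m in re.finditer(r'\d\d', string)]
print(spans)  # [('49', (0, 2)), ('25', (3, 5))]
```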


In [6]:
'''
Quantifiers are used to match a preceding element multiple times.

* specifies that the preceding element should occur zero or more times.
'''
string1 = "9ab567A"
string2 = "a9784bnghdjB"
string3 = "_678knl8L"
print('String 1', re.findall(r'[a-z]*\d', string1))
print('String 2', re.findall(r'[a-z]*\d', string2))
print('String 3', re.findall(r'[a-z]*\d', string3))

String 1 ['9', 'ab5', '6', '7']
String 2 ['a9', '7', '8', '4']
String 3 ['6', '7', '8', 'knl8']
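Since `*` also allows zero occurrences, a lone digit satisfies `[a-z]*\d`. This sketch uses `re.fullmatch`, which succeeds only when the whole string matches, to check a few made-up candidate strings:

```python
import re

pattern = r'[a-z]*\d'
# fullmatch returns a match object only if the entire string fits the pattern
results = {c: re.fullmatch(pattern, c) is not None for c in ['9', 'ab5', 'A5']}
print(results)  # {'9': True, 'ab5': True, 'A5': False}
# '9'  -> zero lowercase letters, then a digit
# 'A5' -> uppercase A is not in [a-z]
```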


In [7]:
'''
Quantifiers are used to match a preceding element multiple times.

+ specifies that the preceding element should occur one or more times.
'''
string1 = "9ab567A"
string2 = "a9784ababajB"
string3 = "_678knl8L"
print('String 1', re.findall(r'[a-z]+\d', string1))
print('String 2', re.findall(r'[a-z]+\d', string2))
print('String 3', re.findall(r'[a-z]+\d', string3))

String 1 ['ab5']
String 2 ['a9']
String 3 ['knl8']
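To contrast `*` and `+` directly, here is a small sketch on a made-up string: with `+`, the lone `9` is no longer a match, because at least one lowercase letter must precede the digit.

```python
import re

string = "9ab5"
star = re.findall(r'[a-z]*\d', string)  # letters optional
plus = re.findall(r'[a-z]+\d', string)  # at least one letter required
print(star, plus)  # ['9', 'ab5'] ['ab5']
```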


In [2]:
'''
Quantifiers can also make the preceding element optional.

? specifies that the preceding element should occur zero or one time.
'''
string1 = "9ab567A"
string2 = "a9784ababajB"
string3 = "_678knl8L"
print('String 1', re.findall(r'[a-z]?\d', string1))
print('String 2', re.findall(r'[a-z]?\d', string2))
print('String 3', re.findall(r'[a-z]?\d', string3))

String 1 ['9', 'b5', '6', '7']
String 2 ['a9', '7', '8', '4']
String 3 ['6', '7', '8', 'l8']
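`?` is handy for optional characters, for example two spellings of the same word. It also caps the element at one occurrence, which is why String 3 above yields `'l8'` rather than `'knl8'`. A sketch with illustrative inputs:

```python
import re

# u? makes the 'u' optional: it may appear once or not at all
spellings = re.findall(r'colou?r', "color colour")
print(spellings)  # ['color', 'colour']

# ? allows at most one letter before the digit; * allows any number
at_most_one = re.findall(r'[a-z]?\d', "knl8")
any_number = re.findall(r'[a-z]*\d', "knl8")
print(at_most_one, any_number)  # ['l8'] ['knl8']
```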


In [4]:
'''
{n} specifies that the preceding element should occur exactly n times.
'''
string1 = "9ab567A"
string2 = "a9784ababajB"
string3 = "_678knl8L"
print('String 1', re.findall(r'[a-z]{2}\d', string1))
print('String 2', re.findall(r'[a-z]{2}\d', string2))
print('String 3', re.findall(r'[a-z]{2}\d', string3))

String 1 ['ab5']
String 2 []
String 3 ['nl8']
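A common use of `{n}` is matching fixed-length tokens. The sketch below (input text made up for illustration) pulls out four-digit numbers. Without boundary markers, `\d{4}` would also match the first four digits of a longer number, returning `'1234'` from `12345`.

```python
import re

text = "1999 and 2024, not 123"
# \d{4} requires exactly four consecutive digits, so "123" is skipped
years = re.findall(r'\d{4}', text)
print(years)  # ['1999', '2024']
```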


In [5]:
'''
{m,n} specifies that the preceding element should occur between m and n times (inclusive);
here, 1 or 2 occurrences are acceptable.
'''
string1 = "9ab567A"
string2 = "a9784ababajB"
string3 = "_678knl8L"
print('String 1', re.findall(r'[a-z]{1,2}\d', string1))
print('String 2', re.findall(r'[a-z]{1,2}\d', string2))
print('String 3', re.findall(r'[a-z]{1,2}\d', string3))

String 1 ['ab5']
String 2 ['a9']
String 3 ['nl8']
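With `{m,n}`, a string with fewer than m or more than n letters before the digit cannot fully match, and within the range the quantifier is greedy, taking as many repetitions as it can. A sketch using `re.fullmatch` and `re.match` on made-up inputs:

```python
import re

pattern = r'[a-z]{1,2}\d'
# fullmatch: between 1 and 2 lowercase letters, then one digit, nothing else
results = {s: re.fullmatch(pattern, s) is not None for s in ['5', 'a5', 'ab5', 'abc5']}
print(results)  # {'5': False, 'a5': True, 'ab5': True, 'abc5': False}

# Greedy: given 'abc', {1,2} takes the maximum of 2 letters
first = re.match(r'[a-z]{1,2}', 'abc').group()
print(first)  # ab
```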


<h3>Problem Statement</h3>
Let us suppose that we are searching for a pattern in the following sequence:  

1. 2 word characters,  

2. followed by 2 or 3 digits,  

3. followed by 0 or more lowercase letters,  

4. ending with one uppercase letter

We will define the regex as follows:
1.  2 word characters $\rightarrow$ `\w\w`  
2.  followed by 2 or 3 digits $\rightarrow$ `\d{2,3}`
3.  followed by 0 or more lowercase letters $\rightarrow$ `[a-z]*`
4.  ending with one uppercase letter $\rightarrow$ `[A-Z]$`

Combining the above gives the following regex:
`\w\w\d{2,3}[a-z]*[A-Z]$`
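The same regex can be written one step per line using the `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. This is an optional restyling sketch, not a different pattern; the name `PATTERN` is illustrative.

```python
import re

# re.VERBOSE ignores unescaped whitespace outside character classes and
# treats text after # as a comment, so each step can sit on its own line
PATTERN = re.compile(r"""
    \w\w       # 1. two word characters
    \d{2,3}    # 2. followed by 2 or 3 digits
    [a-z]*     # 3. followed by 0 or more lowercase letters
    [A-Z]$     # 4. ending with one uppercase letter
""", re.VERBOSE)

print(PATTERN.findall("ab567A"))  # ['ab567A']
```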

In [3]:
string1 = "ab567A"  # Match
print('String 1', re.findall(r'\w\w\d{2,3}[a-z]*[A-Z]$', string1))

String 1 ['ab567A']


In [3]:
string2 = "ab567A123"  # Not a match
print('String 2', re.findall(r'\w\w\d{2,3}[a-z]*[A-Z]$', string2))

String 2 []


In [6]:
string3 = "f9784bnghdjB"  # Match
print('String 3', re.findall(r'\w\w\d{2,3}[a-z]*[A-Z]$', string3))

String 3 ['f9784bnghdjB']


In [None]:
string4 = "_b678knlL"  # Match
print('String 4', re.findall(r'\w\w\d{2,3}[a-z]*[A-Z]$', string4))

In [None]:
string5 = "f97bnghdjB" # Not a match
print('String 5', re.findall(r'\w\w\d{2,3}[a-z]*[A-Z]$', string5))
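Because the regex is anchored only at the end with `$`, `re.findall` can still find a match that starts partway into a longer string. If the goal is to check that an entire string follows the pattern, one option (a sketch, not the only approach) is `re.fullmatch`. The extra input `"xxab567A"` is made up for illustration.

```python
import re

pattern = r'\w\w\d{2,3}[a-z]*[A-Z]$'

# findall searches anywhere, so it skips the leading "xx" and still finds a match
partial = re.findall(pattern, "xxab567A")
# fullmatch requires the whole string to fit the pattern
whole = re.fullmatch(pattern, "xxab567A")
print(partial, whole)  # ['ab567A'] None

# Checking the five strings from this section as whole strings
strings = ["ab567A", "ab567A123", "f9784bnghdjB", "_b678knlL", "f97bnghdjB"]
verdicts = {s: re.fullmatch(pattern, s) is not None for s in strings}
print(verdicts)
```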