<h1 align="center">Python regular expressions</h1>

<center><img src="https://learnbyexample.github.io/images/books/pyregex_example.png" alt="Regex example" /></center>

# <center>re module functions</center>

In [None]:
import re

### The function definitions are given below:

- re.search(pattern, string, flags=0)
- re.compile(pattern, flags=0)
- re.sub(pattern, repl, string, count=0, flags=0)
- re.escape(pattern)
- re.split(pattern, string, maxsplit=0, flags=0)
- re.findall(pattern, string, flags=0)
- re.finditer(pattern, string, flags=0)
- re.subn(pattern, repl, string, count=0, flags=0)
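
Three of the functions listed above (``re.compile``, ``re.escape`` and ``re.subn``) have no dedicated example later in this notebook. The sketch below is a minimal illustration of each; the input strings and variable names are chosen for illustration only.

```python
import re

# re.compile builds a reusable pattern object that offers the same
# operations as the module-level functions (search, sub, findall, ...)
word_is = re.compile(r'\bis\b')
found = bool(word_is.search('This is a sample string'))
replaced = word_is.sub('X', 'This is a sample string')
print(found, replaced)   # True This X a sample string

# re.escape backslash-escapes regex metacharacters so text is matched literally
literal = re.escape('a.b')                       # illustrative input
print(bool(re.search(literal, 'a.b')))           # True
print(bool(re.search(literal, 'axb')))           # False: '.' is no longer a wildcard

# re.subn works like re.sub but returns (new_string, number_of_substitutions)
result = re.subn(r'\d+', '#', 'Sample123string42with777numbers')
print(result)            # ('Sample#string#with#numbers', 3)
```

Compiling is mostly a convenience when the same pattern is applied many times; the module-level functions accept the same pattern strings.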

### **1. Example for ``re.search``**


In [None]:
sentence = 'This is a sample string'

In [None]:
# re.search returns a match object if 'sentence' contains the pattern, otherwise None
type(re.search(r'is', sentence))

_sre.SRE_Match

In [None]:
re.search(r'b.*d', 'abc ac adc abbbc')

<_sre.SRE_Match object; span=(1, 9), match='bc ac ad'>

In [None]:
match = re.search(r'\b(is).*(ple)', sentence)

In [None]:
match.group()

'is a sample'

In [None]:
match.groups()

('is', 'ple')
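
Beyond ``group()`` and ``groups()``, a match object also exposes the individual groups and their positions. A small sketch using the same pattern and sentence as above (the variable names are illustrative):

```python
import re

sentence = 'This is a sample string'
match = re.search(r'\b(is).*(ple)', sentence)

whole = match.group(0)    # entire match, same as match.group()
first = match.group(1)    # text captured by the first group
second = match[2]         # subscript access to a group (Python 3.6+)
span = match.span()       # (start, end) of the whole match; end is exclusive
span_2 = match.span(2)    # (start, end) of the second group

print(whole, first, second, span, span_2)
# is a sample is ple (5, 16) (13, 16)
```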

In [None]:
# ignore case while searching for a match
bool(re.search(r'this', sentence, flags=re.I))

True

In [None]:
bool(re.search(r'xyz', sentence))

False

In [None]:
# re.search output can be directly used in conditional expressions
if re.search(r'ring', sentence):
    print('True')

True


In [None]:
words = ['The effort', 'flee', 'facade', 'oddball', 'rat', 'tool']
[w for w in words if re.search(r'\b\w*(\w)\1\w*\b', w)]

['The effort', 'flee', 'oddball', 'tool']

In [None]:
words = ['effort', 'flee', 'facade', 'oddball', 'rat', 'tool']
[w for w in words if re.search(r'\b\w*(\w)\1\w*(\w)\2\w*\b', w)]

['oddball']

In [None]:
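# runs of 2 to 3 digits: the quantifier needs its closing brace and no space,
# and findall needs the string to search as its second argument
test = re.findall(r'\d{2,3}', '0501 035 154 12 26 98234')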

### Use a raw bytes pattern if the input is of type ``bytes``

In [None]:
bool(re.search(rb'is', b'This is a sample string'))
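
The pattern and the input must be of the same type. The sketch below shows what happens when a ``str`` pattern meets ``bytes`` input, and one possible workaround (decoding first, which assumes the encoding of the data is known):

```python
import re

data = b'This is a sample string'

# bytes pattern on bytes input works
bytes_ok = bool(re.search(rb'is', data))

# str pattern on bytes input raises TypeError
try:
    re.search(r'is', data)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# alternative: decode the bytes (assuming ASCII here) and use a str pattern
decoded_ok = bool(re.search(r'is', data.decode('ascii')))

print(bytes_ok, mixed_ok, decoded_ok)   # True False True
```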

## Difference between string and line anchors

### - string anchors

In [None]:
bool(re.search(r'^par$', 'spare\npar\ndare'))


### - line anchors

In [None]:
bool(re.search(r'^par$', 'spare\npar\ndare', flags=re.M))
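
To make the difference concrete: without ``re.M``, ``^`` and ``$`` anchor to the whole string; with ``re.M`` they also match at the start and end of each line. ``\A`` and ``\Z`` always anchor to the whole string. A minimal sketch with the same input as above:

```python
import re

text = 'spare\npar\ndare'

no_flag = bool(re.search(r'^par$', text))                      # False
multiline = bool(re.search(r'^par$', text, flags=re.M))        # True
# \A and \Z ignore re.M and still refer to the whole string
string_only = bool(re.search(r'\Apar\Z', text, flags=re.M))    # False
# with re.M, ^ matches at the start of every line
line_starts = re.findall(r'^\w+', text, flags=re.M)

print(no_flag, multiline, string_only, line_starts)
# False True False ['spare', 'par', 'dare']
```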

### **1. Example for ``re.findall``**

In [None]:
re.findall(r'\bs?pare?\b', 'par spar apparent spare part pare')

In [None]:
# numbers >= 100, allowing leading zeros ({2,} means two or more digits)
re.findall(r'\b0*[1-9]\d{2,}\b', '0501 035 154 12 26 98234')

In [None]:
re.findall(r'(x*):(y*)', 'xx:yyy x: x:yy :y')

In [None]:
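# Vietnamese text: words ending in 'ách' or 'au'
# (for str patterns, \w and \b are Unicode-aware by default)
re.findall(r'\b\w*(?:ách|au)\b', 'bao giờ mới hết cách ly để anh em ta gặp nhau')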
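
The shape of the ``re.findall`` result depends on how many capture groups the pattern has. The sketch below runs variants of the ``x*:y*`` pattern from above on the same input to show the three cases:

```python
import re

text = 'xx:yyy x: x:yy :y'

# no capture groups: list of whole matches
no_groups = re.findall(r'x*:y*', text)
# one capture group: list of that group's text only
one_group = re.findall(r'(x*):y*', text)
# two or more capture groups: list of tuples, one item per group
two_groups = re.findall(r'(x*):(y*)', text)
# non-capturing groups (?:...) do not change the result shape
non_capturing = re.findall(r'(?:x*):(?:y*)', text)

print(no_groups)     # ['xx:yyy', 'x:', 'x:yy', ':y']
print(one_group)     # ['xx', 'x', 'x', '']
print(two_groups)    # [('xx', 'yyy'), ('x', ''), ('x', 'yy'), ('', 'y')]
```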

### **2. Example for ``re.split``**

In [None]:
re.split(r'\d+', 'Sample123string42with777numbers')

In [None]:
re.split(r'[\d\s]+', '**1\f2\n3star\t7 77\r**')

In [None]:
re.split(r'(\d+)', 'Sample123string42with777numbers')

In [None]:
re.split(r'(hand(?:y|ful))', '123handed42handy777handful500')
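
Two details of ``re.split`` are easy to miss: a capture group keeps the separators in the result, and ``maxsplit`` limits the number of splits. A short sketch, reusing the sample string from above (the last input is illustrative):

```python
import re

s = 'Sample123string42with777numbers'

plain = re.split(r'\d+', s)                # separators dropped
kept = re.split(r'(\d+)', s)               # capture group keeps the separators
limited = re.split(r'\d+', s, maxsplit=1)  # at most one split
# a separator at the very start produces an empty first element
leading = re.split(r'\d+', '123abc')

print(plain)     # ['Sample', 'string', 'with', 'numbers']
print(kept)      # ['Sample', '123', 'string', '42', 'with', '777', 'numbers']
print(limited)   # ['Sample', 'string42with777numbers']
print(leading)   # ['', 'abc']
```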

In [None]:
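# with a capture group, findall returns the captured text (the doubled letter),
# not the whole word
ff = re.findall(r'\b\w*(\w)\1\w*\b', words[0])
print(ff)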

### **3. Example for ``re.finditer``**


In [None]:
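# find all numbers; finditer returns a one-shot iterator of match objects
m_iter = re.finditer(r'[0-9]+', '45 349 651 593 4 204')
[m for m in m_iter]

In [None]:
# numbers < 350 (the iterator above is exhausted, so create a fresh one)
m_iter = re.finditer(r'[0-9]+', '45 349 651 593 4 204')
[m[0] for m in m_iter if int(m[0]) < 350]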

In [None]:
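# start and end+1 index of each matching portion
m_iter = re.finditer(r'ab+c', 'abc ac adc abbbc')
[m for m in m_iter]

In [None]:
# recreate the iterator, since the list comprehension above consumed it
m_iter = re.finditer(r'ab+c', 'abc ac adc abbbc')
for m in m_iter:
    print(m.span())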
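
Because ``re.finditer`` returns an iterator, its matches can be consumed only once. The sketch below demonstrates this and shows one way around it, namely storing the match objects in a list (variable names are illustrative):

```python
import re

m_iter = re.finditer(r'ab+c', 'abc ac adc abbbc')
first_pass = [m.span() for m in m_iter]
second_pass = [m.span() for m in m_iter]   # nothing left to iterate over

print(first_pass)    # [(0, 3), (11, 16)]
print(second_pass)   # []

# keep the match objects in a list if they are needed more than once
matches = list(re.finditer(r'[0-9]+', '45 349 651 593 4 204'))
all_numbers = [m[0] for m in matches]
small_numbers = [m[0] for m in matches if int(m[0]) < 350]

print(all_numbers)     # ['45', '349', '651', '593', '4', '204']
print(small_numbers)   # ['45', '349', '4', '204']
```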

### **4. Example for ``re.sub``**


In [None]:
ip_lines = "catapults\nconcatenate\ncat"

In [None]:
print(re.sub(r'^', r'* ', ip_lines, flags=re.M))


In [None]:
# replace 'par' only at start of word
re.sub(r'\bpar', r'X', 'par spar apparent spare part')



In [None]:
# same as: r'part|parrot|parent'
re.sub(r'par(en|ro)?t', r'X', 'par part parrot parent')


In [None]:
# remove first two columns where : is delimiter
re.sub(r'\A([^:]+:){2}', r'', ' foo:123:bar:baz', count=1)

In [None]:
# backreferencing in both the pattern (\1) and the replacement section
# collapse consecutive duplicate words separated by whitespace into one
re.sub(r'\b(\w+)(\s+\1)+\b', r'\1', 'aa a aa a a 42 f_1 f_1 f_13.14')

In [None]:
# add something around the matched strings
re.sub(r'(\d+)', r'(\g<0>0)', '52 apples and 31 mangoes')


In [None]:
# swap words that are separated by a comma
re.sub(r'(\w+),(\w+)', r'\2,\1', 'good,bad 42,24')
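
In the replacement string, ``\g<0>`` refers to the whole match and ``\g<N>`` to group ``N``. The ``\g<...>`` form matters when a digit follows the reference: ``\10`` is read as a reference to group 10, not as group 1 followed by ``0``. A minimal sketch of the difference, reusing the sample sentence from above:

```python
import re

text = '52 apples and 31 mangoes'

# \g<1>0 is unambiguous: group 1 followed by a literal '0'
appended = re.sub(r'(\d+)', r'\g<1>0', text)
print(appended)   # 520 apples and 310 mangoes

# \10 is interpreted as group 10, which does not exist in this pattern
try:
    re.sub(r'(\d+)', r'\10', text)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```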


### **5. Quick test**
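Given the following string:
``Sau khi dịch covid-19 bùng phát giá dầu giảm 10.5%, giá điện giảm 15%, giá khẩu trang tăng 300%``
(in English: "After the covid-19 outbreak, oil prices fell 10.5%, electricity prices fell 15%, and face mask prices rose 300%"), extract the percentage increases (``tăng``) and decreases (``giảm``).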

In [None]:
text = u'Sau khi dịch covid-19 bùng phát giá dầu giảm 10.5%, giá điện giảm 15%, giá khẩu trang tăng 300%'

In [None]:
# list the percentage decreases


In [None]:
# list the percentage increases


In [None]:
# a second test string: 'increased from 10%. increased 10.5%. decreased to 10%'
tt = u'tăng từ 10%. tăng 10.5%. giảm còn 10%'
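
One possible approach to the quick test, offered as a sketch and not as the intended solution: capture a number with an optional decimal part that directly follows the keyword. The helper name ``extract_percentages`` is hypothetical, and the rule that the number must come immediately after the keyword is an assumption; under that rule, ``tăng từ 10%`` ("increased from 10%") and ``giảm còn 10%`` ("decreased to 10%") in the second string are deliberately not counted as changes. Both inputs are normalized to NFC so that composed and decomposed Vietnamese characters compare equal.

```python
import re
import unicodedata

def extract_percentages(text, keyword):
    """Return percentages that directly follow `keyword` (hypothetical helper)."""
    # normalize so accented characters have a single representation
    text = unicodedata.normalize('NFC', text)
    keyword = unicodedata.normalize('NFC', keyword)
    # keyword, whitespace, then digits with an optional decimal part and '%'
    pattern = re.escape(keyword) + r'\s+(\d+(?:\.\d+)?%)'
    return re.findall(pattern, text)

text = 'Sau khi dịch covid-19 bùng phát giá dầu giảm 10.5%, giá điện giảm 15%, giá khẩu trang tăng 300%'
tt = 'tăng từ 10%. tăng 10.5%. giảm còn 10%'

decreases = extract_percentages(text, 'giảm')
increases = extract_percentages(text, 'tăng')
print(decreases)   # ['10.5%', '15%']
print(increases)   # ['300%']

# second string: only 'tăng 10.5%' has the number right after the keyword
print(extract_percentages(tt, 'tăng'))   # ['10.5%']
print(extract_percentages(tt, 'giảm'))   # []
```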
