# REGEX (Regular Expression)

* A regular expression (regex) is a sequence of characters that defines a search pattern. It is used to find a string or a set of strings, for example in 'find and replace' style operations.

* It can detect the presence or absence of text by matching it against a pattern, and a pattern can itself be split into one or more sub-patterns.

* Python provides the re module, which supports the use of regex in Python.

* One of its core functions is search(), which takes a regular expression and a string. It returns the first match, or None if there is no match.

* Regex is powerful enough to extract specific pieces of information from text, which makes it useful for building web crawlers and scrapers in Python.
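
As a minimal sketch of the "first match or None" behaviour described above, the snippet below runs search() on a made-up sentence. The variable names and the sample text are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample sentence, used only for illustration.
sample = "Regex makes text search easy."

hit = re.search(r"text", sample)    # the pattern occurs in the sample
miss = re.search(r"audio", sample)  # the pattern does not occur

# A match object is truthy and None is falsy, so this check is safe.
print(hit.group() if hit else "no match")
print(miss)
```

Checking the result before calling group() avoids an AttributeError when nothing matches.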


In [None]:
# import the re module
import re

In [None]:
text = "Welcome to the world of Python. Welcome you all."
text1 = "Hello all, Welcome you to this show"

In [None]:
# compile() takes the string / pattern that we want to search for and returns a reusable pattern object
result = re.compile("Welcome")

#### search()
Returns None if the pattern does not match; otherwise returns a re.Match object that contains information about the matching part of the string.

In [None]:
# search() - to find/search a string or a pattern 
ans = result.search(text)
ans

In [None]:
# findall() - returns all non-overlapping matches of the pattern as a list
print(result.findall(text))
print(result.findall(text1))
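
A compiled pattern is mainly useful when the same pattern is applied to several strings. The sketch below reuses one pattern object over the two sample sentences; the names pattern, lines and counts are assumptions introduced here.

```python
import re

lines = [
    "Welcome to the world of Python. Welcome you all.",
    "Hello all, Welcome you to this show",
]

# Compile once, then reuse the same pattern object for every line.
pattern = re.compile(r"Welcome")

# Count how many times the pattern occurs in each line.
counts = [len(pattern.findall(line)) for line in lines]
print(counts)
```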

In [None]:
a = 'This is an example for regular expression'

result = re.search('example',a)

# result is a match object

# find the start index of the match
print('Start Index:', result.start())

# find the end index of the match
print('End Index:', result.end())

# find the start and end index of the match as a tuple
print('Span:', result.span())

In [None]:
# to search a pattern/string 
re.search('regular',a)

In [None]:
a[23:30]
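
Rather than typing the indexes by hand, the slice can be built from the match object itself. This is a small sketch under that assumption; phrase, m, start and end are illustrative names.

```python
import re

phrase = "This is an example for regular expression"
m = re.search(r"regular", phrase)

# span() returns (start, end), which can be used directly as slice bounds.
start, end = m.span()
print(start, end)
print(phrase[start:end])
```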

##### Beginning of the String
The ^ character anchors the match at the beginning of the string.

In [None]:
sen = "The ability of a digital computer to perform tasks commonly associated with intelligent beings."


In [None]:
# Beginning of String
a = re.search(r'^The', sen)
print('Beginning of String:', a)

##### Ending of the String
The $ character anchors the match at the end of the string.

In [None]:
# escape the dot so that it matches a literal period
match = re.search(r'beings\.$', sen)
print('End of String:', match)
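
The following sketch shows both anchors on a shortened, made-up sentence. It also shows why the dot before `$` is usually escaped: an unescaped `.` matches any character, not just a period. All variable names here are assumptions.

```python
import re

line = "The ability of a digital computer to perform tasks."

starts_with_the = re.search(r"^The", line) is not None
ends_with_tasks = re.search(r"tasks\.$", line) is not None

# An unescaped dot accepts any final character, such as "!".
loose = re.search(r"tasks.$", "perform tasks!") is not None
strict = re.search(r"tasks\.$", "perform tasks!") is not None

print(starts_with_the, ends_with_tasks, loose, strict)
```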

##### Character Classes
Character classes let you match a single character against a set of allowed characters. The set is written inside square brackets.

In [None]:
# Example 1
a = "Example: This is best example for Regular Expression"

print(re.findall(r'[Ee]xample', a))

In [None]:
# Example 2
text = 'take this time to the next level from a ashok@datamites.com productivity\
             standpoint Time each info@ernet.co.in time you send an pradeep@gmail.com email'

match = re.findall(r'[Tt]ime', text)
match
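
A character class can list any number of alternatives for one position. As an illustrative sketch (the sample string and names are assumptions), the class below collects every vowel in a phrase.

```python
import re

word = "Regular Expression"

# Each match is exactly one character taken from the bracketed set.
vowels = re.findall(r"[aeiouAEIOU]", word)
print(vowels)
```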

##### Range
A range matches any single character between two endpoints, such as a range of digits (0 to 9) or a range of letters (A to Z). A hyphen inside the character class denotes the range.

In [None]:
text = 'The cyber security has become one of the most important ascept of the business. \
$550 million has been invested research. 12.45.65.78 is one the most spammed inject ips. \
ask@web.com. $600 million wasted. save safe.'

In [None]:
# [a-zA-Z] matches the first letter in the text, lowercase or uppercase
print(re.search(r'[a-zA-Z]', text))
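
Several ranges can be combined in one class. The sketch below uses a made-up sample string (an assumption, not taken from the notebook's data) to compare digit, uppercase and mixed ranges.

```python
import re

sample = "Order 66 shipped to Zone B7"

digits = re.findall(r"[0-9]", sample)        # single digits
upper = re.findall(r"[A-Z]", sample)         # single uppercase letters
words = re.findall(r"[a-zA-Z0-9]+", sample)  # runs of letters and digits

print(digits)
print(upper)
print(words)
```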

##### Negation
A caret (^) at the start of a character class negates it: the class matches any single character except the listed characters or ranges.

In [None]:
a = "Hello Harish, how are you? Are you coming to the party?"
print(re.findall(r'H[^e]', a))

In [None]:
print(re.search(r'H[^a]', a))

In [None]:
print(re.search(r'c[^o]', a))
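
A negated class can also hold ranges. In this sketch (the sample text and names are assumptions), the second pattern keeps only the characters that are neither letters nor spaces.

```python
import re

greeting = "Hello Harish, how are you?"

# "H" followed by any character other than "e".
not_he = re.findall(r"H[^e]", greeting)

# Any character that is not a letter and not a space.
punctuation = re.findall(r"[^a-zA-Z ]", greeting)

print(not_he)
print(punctuation)
```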

##### Any Character
The . character matches any single character except a newline. Inside a bracketed character class it is treated as a literal dot.

In [None]:
sen

In [None]:
print(re.search(r'p.rf..m', sen))
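
By default the dot does not match a newline; the re.DOTALL flag changes that. The sketch below illustrates the difference on a made-up string (an assumption for demonstration).

```python
import re

sample = "cat cot c\nt c-t"

# Without flags, "." skips the newline between "c" and "t".
default = re.findall(r"c.t", sample)

# With re.DOTALL, "." matches the newline as well.
dotall = re.findall(r"c.t", sample, flags=re.DOTALL)

print(default)
print(dotall)
```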

##### Optional Character
The ? character makes the preceding character or character class optional: it may occur once or not at all.

In [None]:
se = "to spell humour, some use humour and some use humor"

In [None]:
# here the o is optional, so this matches 'humour' but not 'humor'
print(re.findall(r'humo?ur', se))

In [None]:
# here the u is optional, so both 'humour' and 'humor' match
print(re.findall(r'humou?r', se))
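
The same idea applies to other spelling variants. This is a small sketch with an assumed sample string; note that the ? applies only to the character immediately before it.

```python
import re

spellings = "color colour colr"

# "u" may appear once or not at all; the second "o" is still required.
found = re.findall(r"colou?r", spellings)
print(found)
```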

In [None]:
text = 'The cyber security has become one of the most important ascept of the business. \
$550 million has been invested research. 12.45.65.78 is a most spammed inject ips. \
ask@web.com. $600 million wasted. save safe sale.'

In [None]:
re.findall('sa.e',text)

In [None]:
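# \d matches a single digit; a raw string keeps the backslash intact
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

In [None]:
# \d+ matches a run of one or more digits
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))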

In [None]:
# \d+ matches a group of one or more digits [0-9]
re.findall(r'\d+',text)

In [None]:
# escape $ to match a literal dollar sign
re.findall(r'\$550',text)

In [None]:
# | separates alternatives; findall returns whichever of the patterns occur in the text
re.findall('safe|saze',text)

In [None]:
text

In [None]:
re.findall('[a-z]+',text)

In [None]:
text

In [None]:
# search for email ids; the dot is escaped so that it matches a literal "."
re.findall(r'[a-z0-9]+@[a-z0-9]+\.com',text)
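
In an email pattern, an unescaped `.` before `com` matches any character, while `\.` matches only a literal dot. The sketch below compares the two on a made-up string; the address bob@webxcom is a deliberately malformed, hypothetical example.

```python
import re

note = "mail ask@web.com or bob@webxcom today"

# Unescaped dot: the "x" in "webxcom" is accepted as "any character".
loose = re.findall(r"[a-z0-9]+@[a-z0-9]+.com", note)

# Escaped dot: only a real "." before "com" is accepted.
strict = re.findall(r"[a-z0-9]+@[a-z0-9]+\.com", note)

print(loose)
print(strict)
```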

In [None]:
# For ASCII text, \w is equivalent to [a-zA-Z0-9_].
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))

# \w+ matches a run of one or more word characters.
p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he \
said *** in some_language."))

# \W matches any non-word character (anything that \w does not match).
p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))

In [None]:
# split the string by a character or a pattern
print(re.split(r'\W+', text))

In [None]:
print(re.split(r'\d+', text))

In [None]:
# https://www.geeksforgeeks.org/regular-expression-python-examples-set-1/

#### split()

In [None]:
sen1 = "Take $ 100 and buy 20 oranges 12 Apples, 50 bananas and 30 Guavas. Keep remaining $ with you"

In [None]:
re.split(r'\d', sen1)

In [None]:
print(re.split(r'\W+', "Mohan house is near"))
print(re.split(r'\W+', "Mohan's house is near"))
print(re.split(r'\W+', 'Neeta, Raghu and Harish are playing'))

In [None]:
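# maxsplit=3 stops after three splits; the rest of the string is returned as the last item
print(re.split(r'\d+', 'Complete it on or before 15th Jan 2023. The exam will be on 25th Jan 2023', maxsplit=3))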

In [None]:
print(re.split('[a-f]+', 'Come on Neeta.. Cheer up!'))
print(re.split('[a-f]+', 'Come on Neeta.. Cheer up!', flags=re.IGNORECASE))
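
Two split() behaviours are easy to miss: maxsplit limits the number of splits, and a capturing group in the pattern keeps the separators in the result. The sketch below shows both on short made-up strings (assumptions for illustration).

```python
import re

# Only the first two runs of digits are used as separators.
limited = re.split(r"\d+", "a1b22c333d", maxsplit=2)

# Parentheses around the pattern keep the separators in the output list.
with_separators = re.split(r"(\d+)", "a1b22c")

print(limited)
print(with_separators)
```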

#### sub()

- It finds every substring that matches the pattern in the given string and replaces it with a new string.

In [None]:
print(re.sub('am', '#-', 'Ramya came to the CAMPUS yesterday',
             flags=re.IGNORECASE))

In [None]:
print(re.sub('am', '&', 'Ramya came to the CAMPUS yesterday'))

In [None]:
print(re.sub('y$', 'ies', 'emergency'))
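
sub() also accepts a count argument to limit the number of replacements, and the replacement string can refer back to captured groups. This is a brief sketch with assumed sample strings and names.

```python
import re

# count=1 replaces only the first occurrence.
first_only = re.sub(r"am", "&", "Ramya came to the campus", count=1)

# \1 and \2 in the replacement refer to the first and second groups.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "range 10-20")

print(first_only)
print(swapped)
```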

#### subn()
Similar to sub(), except that it returns a tuple containing the new string and the number of replacements made.

In [None]:
print(re.subn('am', '#-', 'Ramya came to the CAMPUS yesterday',
             flags=re.IGNORECASE))
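
Because subn() returns a tuple, it can be unpacked directly into two variables. A minimal sketch, with new_text and n as assumed names:

```python
import re

# Unpack the (new_string, number_of_replacements) tuple.
new_text, n = re.subn(r"am", "#", "Ramya came to the CAMPUS",
                      flags=re.IGNORECASE)

print(new_text)
print(n)
```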

#### escape()
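It returns the string with a backslash before every character that has a special meaning in a regular expression, so the result can be used as a literal pattern. (Before Python 3.7 it escaped every non-alphanumeric character.)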

In [None]:
print(re.escape("Hurry up! It's 5pm. We don't have time." ))
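
A typical use of escape() is searching for text that contains regex metacharacters. The sketch below, built on a made-up price string (an assumption), compares an escaped and an unescaped pattern.

```python
import re

price = "$5.00"
haystack = "cost: $5.00, not $5x00"

# Escaped: "$" and "." are treated as literal characters.
literal = re.findall(re.escape(price), haystack)

# Unescaped: "$" means end of string, so nothing can match.
unescaped = re.findall(price, haystack)

print(re.escape(price))
print(literal)
print(unescaped)
```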

#### group()
It returns the part of the string that the pattern matched.

In [None]:
sen = "Look into this problem"
 
# here res is the match object
res = re.search(r"\D{3} t", sen)
 
print(res.group())
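
When the pattern contains parentheses, group() can also return individual captured groups by number. A short sketch, using an assumed sample string:

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "write to ask@web.com")

whole = m.group()    # the entire match (same as group(0))
user = m.group(1)    # first parenthesised group
domain = m.group(2)  # second parenthesised group
both = m.groups()    # all groups as a tuple

print(whole, user, domain, both)
```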

#### match()
It tries to match the pattern at the beginning of the string. If a match is found, it returns the match object; otherwise it returns None.

In [None]:
sen = "Engagement is on April 5 and wedding is on May 26"
# "Engagement" is not followed by a number, so match() returns None here
a = re.match(r"([a-zA-Z]+) (\d+)", sen)
a

In [None]:
sen1 = "Jan 26"
a = re.match(r"([a-zA-Z]+) (\d+)", sen1)
a.group()
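
To make the difference concrete, the sketch below applies the same pattern with match(), search(), findall() and fullmatch(). The variable names are assumptions; the sentence is the one used in the example above.

```python
import re

sentence = "Engagement is on April 5 and wedding is on May 26"
pattern = r"([a-zA-Z]+) (\d+)"

at_start = re.match(pattern, sentence)           # only tries position 0
anywhere = re.search(pattern, sentence).group()  # first match anywhere
all_dates = re.findall(pattern, sentence)        # every match, as group tuples

# fullmatch() succeeds only if the whole string fits the pattern.
exact = re.fullmatch(pattern, "Jan 26").groups()
too_long = re.fullmatch(pattern, "Jan 26 2023")

print(at_start, anywhere, all_dates, exact, too_long)
```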