### <center>Regular Expressions</center>

#### If we want to represent a group of strings according to a particular format/pattern, then we should go for Regular Expressions, i.e. a Regular Expression is a declarative mechanism to represent a group of strings according to a particular format/pattern.
#### Eg 1: We can write a regular expression to represent all mobile numbers
#### Eg 2: We can write a regular expression to represent all mail ids.
#### The main application areas of Regular Expressions are:
#### 1. To develop validation frameworks/validation logic
#### 2. To develop pattern matching applications (Ctrl+F in Windows, grep in UNIX etc)
#### 3. To develop Translators like compilers, interpreters etc
#### 4. To develop digital circuits
#### 5. To develop communication protocols like TCP/IP, UDP etc.
#### We can develop Regular Expression based applications by using the Python module: re
#### This module contains several inbuilt functions to use Regular Expressions very easily in our applications.

#### <center>1. compile()</center>
#### The re module contains the compile() function to compile a pattern into a pattern object (re.Pattern).

In [None]:
import re

pattern = re.compile("ab")

#### <center>2. finditer()</center>
#### Returns an iterator object which yields a Match object for every match
#### matcher = pattern.finditer("abaababa")
#### On a Match object we can call the following methods.
#### 1. start() Returns the start index of the match
#### 2. end()  Returns the index just after the last matched character (end index + 1)
#### 3. group() Returns the matched string

In [3]:
import re 
count=0

pattern=re.compile("ab")
matcher=pattern.finditer("abaababa")

for match in matcher:
    count+=1
    print(match.start(),"...",match.end(),"...",match.group())
print("The number of occurrences: ",count)

0 ... 2 ... ab
3 ... 5 ... ab
5 ... 7 ... ab
The number of occurrences:  3


#### Note: We can pass pattern directly as argument to finditer() function.

In [4]:
import re
 
count=0
matcher=re.finditer("ab","abaababa")

for match in matcher:
    count+=1
    print(match.start(),"...",match.end(),"...",match.group())
print("The number of occurrences: ",count) 

0 ... 2 ... ab
3 ... 5 ... ab
5 ... 7 ... ab
The number of occurrences:  3


#### <center>3. Character classes</center>
#### We can use character classes to search for a group of characters
#### 1. [abc] ==> Either a or b or c
#### 2. [^abc] ==> Any character except a, b and c
#### 3. [a-z] ==> Any lower case alphabet symbol
#### 4. [A-Z] ==> Any upper case alphabet symbol
#### 5. [a-zA-Z] ==> Any alphabet symbol
#### 6. [0-9] ==> Any digit from 0 to 9
#### 7. [a-zA-Z0-9] ==> Any alphanumeric character
#### 8. [^a-zA-Z0-9] ==> Any character except alphanumeric characters (Special Characters)

In [7]:
import re
# "x" is a placeholder: replace it with any character class listed above, e.g. "[abc]"
matcher=re.finditer("x","a7b@k9z")
for match in matcher:
    print(match.start(),"......",match.group()) 
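
#### The character classes listed above can be compared side by side. The following is a minimal sketch that applies several of them to the same target string with findall(); the names target, patterns and results are chosen only for this illustration.

```python
import re

target = "a7b@k9z"

# The character classes listed above, applied one by one to the same target.
patterns = ["[abc]", "[^abc]", "[a-z]", "[0-9]", "[a-zA-Z0-9]", "[^a-zA-Z0-9]"]

results = {}
for p in patterns:
    results[p] = re.findall(p, target)
    print(p, "==>", results[p])
```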

#### <center>4. Pre defined Character classes</center>
#### \s  Space character
#### \S  Any character except space character
#### \d  Any digit from 0 to 9
#### \D  Any character except digit
#### \w  Any word character [a-zA-Z0-9]
#### \W  Any character except word character (Special Characters)
#### .  Any character including special characters

In [8]:
import re

# "x" is a placeholder: replace it with any predefined class listed above, e.g. r"\d"
matcher=re.finditer("x","a7b k@9z")
for match in matcher:
    print(match.start(),"......",match.group()) 
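
#### The predefined character classes can be tried in the same way. This is a small sketch, assuming the same target string as the cell above; raw strings (r"...") are used so that the backslash reaches the regex engine unchanged.

```python
import re

target = "a7b k@9z"

# Raw strings (r"...") keep the backslash intact for the regex engine.
patterns = [r"\s", r"\S", r"\d", r"\D", r"\w", r"\W", "."]

results = {}
for p in patterns:
    results[p] = re.findall(p, target)
    print(p, "==>", results[p])
```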

#### <center>5. Quantifiers</center>
#### We can use quantifiers to specify the number of occurrences to match.
#### a  Exactly one 'a'
#### a+  At least one 'a'
#### a*  Any number of a's, including zero
#### a?  At most one 'a', i.e. either zero or one
#### a{m}  Exactly m number of a's
#### a{m,n}  Minimum m number of a's and maximum n number of a's

In [12]:
import re
# "x" is a placeholder: replace it with any quantified pattern listed above, e.g. "a+"
matcher=re.finditer("x","abaabaaab")
for match in matcher:
    print(match.start(),"......",match.group()) 
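
#### As a rough illustration of the quantifiers, the sketch below applies those that always consume at least one 'a' to the same target string. The variable names are chosen only for this example.

```python
import re

target = "abaabaaab"

# Quantifiers that always consume at least one 'a'.
patterns = ["a", "a+", "a{2}", "a{2,3}"]

results = {}
for p in patterns:
    results[p] = re.findall(p, target)
    print(p, "==>", results[p])
```

#### The patterns a* and a? can also match the empty string, so they report additional zero-length matches at positions where there is no 'a'.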

#### Note:
#### 1. ^x It will check whether target string starts with x or not
#### 2. x$  It will check whether target string ends with x or not

#### <center>6. Important functions of re module</center>
#### 1. match()
#### 2. fullmatch()
#### 3. search()
#### 4. findall()
#### 5. finditer()
#### 6. sub()
#### 7. subn()
#### 8. split()
#### 9. compile()

#### <center>1. match()</center>
#### We can use the match() function to check the given pattern at the beginning of the target string.
#### If the match is available then we will get Match object, otherwise we will get None.


In [15]:
import re

s=input("Enter pattern to check: ")
m=re.match(s,"abcabdefg")

if m!= None:
    print("Match is available at the beginning of the String")
    print("Start Index:",m.start(), "and End Index:",m.end())

else:
    print("Match is not available at the beginning of the String") 

Enter pattern to check: abc
Match is available at the beginning of the String
Start Index: 0 and End Index: 3


#### <center>2. fullmatch()</center>
#### We can use the fullmatch() function to match a pattern against the whole target string, i.e. the complete string should be matched according to the given pattern.
#### If the complete string is matched then this function returns a Match object, otherwise it returns None.

In [17]:
import re
s=input("Enter pattern to check: ")
m=re.fullmatch(s,"ababab")

if m!= None:
    print("Full String Matched")
else:
    print("Full String not Matched") 

Enter pattern to check: ababab
Full String Matched


#### <center>3. search()</center>
#### We can use search() function to search the given pattern in the target string.
#### If the match is available then it returns the Match object which represents first occurrence of the match.
#### If the match is not available then it returns None

In [18]:
import re
s=input("Enter pattern to check: ")
m=re.search(s,"abaaaba")
if m!= None:
    print("Match is available")
    print("First Occurrence of match with start index:",m.start(),"and end index:",m.end())

else:
    print("Match is not available") 

Enter pattern to check: aaa
Match is available
First Occurrence of match with start index: 2 and end index: 5


In [19]:
import re
s=input("Enter pattern to check: ")
m=re.search(s,"abaaaba")
if m!= None:
    print("Match is available")
    print("First Occurrence of match with start index:",m.start(),"and end index:",m.end())

else:
    print("Match is not available") 

Enter pattern to check: bbb
Match is not available
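
#### match(), fullmatch() and search() differ only in where the pattern is allowed to match. A minimal sketch comparing the three on the same target string follows; the helper where() exists only for this illustration.

```python
import re

target = "abaaaba"

def where(m):
    # Helper for this sketch only: the span of a match, or None if there is no match
    return m.span() if m is not None else None

at_start = where(re.match("aaa", target))      # pattern must begin at index 0
anywhere = where(re.search("aaa", target))     # first occurrence anywhere
whole = where(re.fullmatch("aaa", target))     # pattern must cover the whole string
print(at_start, anywhere, whole)
```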


#### <center>4. findall()</center>
#### To find all occurrences of the match.
#### This function returns a list object which contains all occurrences.

In [20]:
import re
l=re.findall("[0-9]","a7b9c5kz")
print(l)

['7', '9', '5']


#### <center>5. finditer()</center>
#### Returns the iterator yielding a match object for each match.
#### On each match object we can call start(), end() and group() functions.

In [21]:
import re

itr=re.finditer("[a-z]","a7b9c5k8z")

for m in itr:
    print(m.start(),"...",m.end(),"...",m.group()) 

0 ... 1 ... a
2 ... 3 ... b
4 ... 5 ... c
6 ... 7 ... k
8 ... 9 ... z


#### <center>6. sub()</center>
#### sub means substitution or replacement
#### re.sub(regex,replacement,targetstring)
#### In the target string every matched pattern will be replaced with provided replacement.

In [22]:
import re
s=re.sub("[a-z]","#","a7b9c5k8z")
print(s) 

#7#9#5#8#


#### <center>7. subn()</center>
#### It is exactly the same as sub() except that it also returns the number of replacements.
#### This function returns a tuple where first element is result string and second element is number of replacements.
#### (resultstring, number of replacements)

In [24]:
import re
t=re.subn("[a-z]","#","a7b9c5k8z")
print(t)
print("The Result String:",t[0])
print("The number of replacements:",t[1]) 

('#7#9#5#8#', 5)
The Result String: #7#9#5#8#
The number of replacements: 5


#### <center>8. split()</center>
#### If we want to split the given target string according to a particular pattern then we should go for split() function.
#### This function returns a list of all tokens.

In [25]:
import re
l=re.split(",","sunny,bunny,chinny,vinny,pinny")
print(l)
for t in l:
    print(t) 

['sunny', 'bunny', 'chinny', 'vinny', 'pinny']
sunny
bunny
chinny
vinny
pinny


In [26]:
import re
# Raw string so that \. reaches the regex engine as an escaped (literal) dot
l=re.split(r"\.","www.durgasoft.com")
for t in l:
    print(t) 

www
durgasoft
com
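
#### Because the first argument of split() is a pattern and not just a fixed separator, one call can split on several delimiters. A small sketch, assuming a made-up target string that mixes comma, semicolon and dot:

```python
import re

# One character class handles several delimiters at once.
# Inside [...] the dot is a literal dot, so no escaping is needed.
tokens = re.split("[,;.]", "sunny,bunny;chinny.vinny")
print(tokens)
```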


#### <center>9. ^ symbol</center>
#### We can use ^ symbol to check whether the given target string starts with our provided pattern or not.
#### Eg: res=re.search("^Learn",s)
#### If the target string starts with Learn then it will return a Match object, otherwise it returns None.

In [27]:
import re

s="Learning Python is Very Easy"
res=re.search("^Learn",s)

if res != None:
    print("Target String starts with Learn")
else:
    print("Target String does not start with Learn")

Target String starts with Learn


#### <center> 10. Dollar symbol</center>
#### We can use the dollar symbol to check whether the given target string ends with our provided pattern or not.
#### Eg: res=re.search("Easy$",s)
#### If the target string ends with Easy then it will return a Match object, otherwise it returns None.

In [28]:
import re

s="Learning Python is Very Easy"
res=re.search("Easy$",s)

if res != None:
    print("Target String ends with Easy")

else:
    print("Target String does not end with Easy")

Target String ends with Easy


#### Note: If we want to ignore case then we have to pass 3rd argument re.IGNORECASE for search() function.
#### Eg: res = re.search("easy$",s,re.IGNORECASE)

In [31]:
import re

s="Learning Python is Very Easy"
res=re.search("easy$",s,re.IGNORECASE)

if res != None:
    print("Target String ends with Easy by ignoring case")

else:
    print("Target String does not end with Easy by ignoring case")

Target String ends with Easy by ignoring case


#### <center>App1: Write a Regular Expression to represent all Yava language identifiers</center>
#### Rules:
#### 1. The allowed characters are a-z,A-Z,0-9,#
#### 2. The first character should be a lower case alphabet symbol from a to k
#### 3. The second character should be a digit divisible by 3
#### 4. The length of the identifier should be at least 2.
#### [a-k][0369][a-zA-Z0-9#]*


#### <center>App2: Write a python program to check whether the given string is Yava language identifier or not?</center>

In [32]:
import re

s=input("Enter Identifier:")
m=re.fullmatch("[a-k][0369][a-zA-Z0-9#]*",s)

if m!= None:
    print(s,"is valid Yava Identifier")
 
else:
    print(s,"is invalid Yava Identifier") 

Enter Identifier:a6kk9z##
a6kk9z## is valid Yava Identifier
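
#### To see each rule in action without typing input, the sketch below checks a few sample identifiers in one go. The samples are made up for this illustration: "a" is too short, "z3abc" starts outside a-k, "a7b" has a second character not divisible by 3, and "b0$" contains a character that is not allowed.

```python
import re

yava = re.compile("[a-k][0369][a-zA-Z0-9#]*")

# Sample identifiers chosen for this sketch
samples = ["a6kk9z##", "k9", "a", "z3abc", "a7b", "b0$"]

# Keep only the samples that the whole pattern accepts
valid = [s for s in samples if yava.fullmatch(s)]
print(valid)
```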


#### <center>App3: Write a Regular Expression to represent all 10 digit mobile numbers</center>
#### Rules:
#### 1. Every number should contain exactly 10 digits
#### 2. The first digit should be 7 or 8 or 9
#### [7-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
#### or
#### [7-9][0-9]{9}
#### or
#### [7-9]\d{9}
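
#### The three forms above are meant to accept the same mobile numbers. A quick sketch that checks this on a few sample strings (the samples are assumptions made for this example, all plain ASCII digits):

```python
import re

forms = [
    "[7-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]",
    "[7-9][0-9]{9}",
    r"[7-9]\d{9}",
]

# One valid number, one with a wrong first digit, one too short, one too long
samples = ["9898989898", "6898989898", "989898989", "98989898989"]

verdicts = [[re.fullmatch(f, s) is not None for s in samples] for f in forms]
for f, v in zip(forms, verdicts):
    print(f, "==>", v)
```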

#### <center>App4: Write a Python Program to check whether the given number is valid mobile number or not?</center>


In [33]:
import re

n=input("Enter number:")
m=re.fullmatch(r"[7-9]\d{9}",n)

if m!= None:
    print("Valid Mobile Number")
else:
    print("Invalid Mobile Number") 

Enter number:9898989898
Valid Mobile Number


#### <center>App5: Write a python program to extract all mobile numbers present in input.txt where numbers are mixed with normal text data</center>

In [None]:
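import re

# Open both files with context managers so that they are closed automatically
with open("input.txt","r") as f1, open("output.txt","w") as f2:
    for line in f1:
        # Find every 10 digit mobile number on the current line
        numbers=re.findall(r"[7-9]\d{9}",line)
        # Write inside the loop so that the matches of every line are saved
        for n in numbers:
            f2.write(n+"\n")

print("Extracted all Mobile Numbers into output.txt")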
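
#### When numbers are mixed with normal text, the pattern used with findall() can also pick up 10 digits from the middle of a longer digit sequence. One possible refinement, shown here only as a sketch on a made-up text, is to add lookarounds so that the match is neither preceded nor followed by another digit.

```python
import re

# Made-up text: two mobile numbers and one longer digit sequence
text = "Call 9898989898 or 7000012345. Order id 5598989898981."

# Plain pattern: also matches inside the 13 digit order id
loose = re.findall(r"[7-9]\d{9}", text)

# (?<!\d) = not preceded by a digit, (?!\d) = not followed by a digit
strict = re.findall(r"(?<!\d)[7-9]\d{9}(?!\d)", text)

print(loose)
print(strict)
```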

#### <center>7. Web Scraping by using Regular Expressions</center>
#### The process of collecting information from web pages is called web scraping. In web scraping we can use regular expressions to match the patterns we need, like mail ids and mobile numbers.


In [None]:
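import re
import urllib.request

sites="google rediff".split()
print(sites)

for s in sites:
    print("Searching...",s)
    u=urllib.request.urlopen("http://"+s+".com")
    # read() returns bytes; decode to str before matching
    text=u.read().decode("utf-8",errors="ignore")
    # Non-greedy .*? stops at the first closing tag; re.S lets . match newlines
    title=re.findall("<title>.*?</title>",text,re.I|re.S)
    # Guard against pages where no title is found
    if title:
        print(title[0])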
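
#### The title extraction can also be tried without a network connection. The sketch below uses a hypothetical HTML string in place of a downloaded page and adds a capture group, so that findall() returns only the text between the tags.

```python
import re

# Hypothetical page content, used instead of a live download
html = "<html><head><TITLE>Sample Page</TITLE></head><body>Hello</body></html>"

# With one capture group, findall() returns only the captured text
titles = re.findall("<title>(.*?)</title>", html, re.I | re.S)
print(titles)
```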

#### <center>Eg: Program to get all phone numbers of redbus.in by using web scraping and regular expressions</center>

In [None]:
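import re
import urllib.request

u=urllib.request.urlopen("https://www.redbus.in/info/contactus")
# read() returns bytes; decode to str before matching
text=u.read().decode("utf-8",errors="ignore")
# At least 8 characters made of digits and hyphens; no letters, so re.I is not needed
numbers=re.findall("[0-9-]{7}[0-9-]+",text)

for n in numbers:
    print(n)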

#### <center>Q. Write a Python Program to check whether the given mail id is valid gmail id or not?</center>

In [None]:
import re

s=input("Enter Mail id:")
m=re.fullmatch(r"\w[a-zA-Z0-9_.]*@gmail[.]com",s)

if m!=None:
    print("Valid Mail Id")

else:
    print("Invalid Mail id") 

#### <center>Q. Write a python program to check whether given car registration number is valid Telangana State Registration number or not?</center>

In [None]:
import re

s=input("Enter Vehicle Registration Number:")
m=re.fullmatch(r"TS[012][0-9][A-Z]{2}\d{4}",s)

if m!=None:
    print("Valid Vehicle Registration Number")
else:
    print("Invalid Vehicle Registration Number") 

#### <center>Q. Python Program to check whether the given mobile number is valid OR not (10 digit OR 11 digit OR 12 digit)</center>

In [None]:
import re

s=input("Enter Mobile Number:")
m=re.fullmatch("(0|91)?[7-9][0-9]{9}",s)

if m!=None:
    print("Valid Mobile Number")
else:
    print("Invalid Mobile Number") 
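
#### The optional prefix (0|91)? is what allows 10, 11 and 12 digit numbers. A short sketch that checks a few sample strings without typing input (the samples are made up for this illustration):

```python
import re

mobile = re.compile("(0|91)?[7-9][0-9]{9}")

# 10 digit, 11 digit with 0, 12 digit with 91, wrong prefix, too short
samples = ["9898989898", "09898989898", "919898989898", "19898989898", "98989"]

verdicts = {s: mobile.fullmatch(s) is not None for s in samples}
for s, ok in verdicts.items():
    print(s, "==>", "Valid" if ok else "Invalid")
```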