## Regular Expressions (RegEx)

- A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

- RegEx can be used to check if a string contains the specified search pattern.

### RegEx Module

- Python has a built-in module called re, which can be used to work with regular expressions.

- Import the re module:

In [52]:
import re

In [55]:
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
x

<re.Match object; span=(0, 17), match='The rain in Spain'>

In [58]:
import re

txt = "The rain in Spain Europe"
x = re.search("^The.*Spain$", txt)
x

In [59]:
import re

txt = "The rain in Spain Europe"
x = re.search("^The.*Europe$", txt)
x

<re.Match object; span=(0, 24), match='The rain in Spain Europe'>

In [60]:
import re

txt = "The rain in Spain Europe"
x = re.search("^the.*Europe$", txt)
x

In [62]:
import re

txt = "The Europe"
x = re.search("^The.*Europe$", txt)
x

<re.Match object; span=(0, 10), match='The Europe'>

In [65]:
import re

txt = "The rain in Spain,--------_________ Europe"
x = re.search("^The.*Europe$", txt)
x

<re.Match object; span=(0, 42), match='The rain in Spain,--------_________ Europe'>

## Functions
- findall : Returns a list containing all matches
- search : Returns a Match object if there is a match anywhere in the string
- split : Returns a list where the string has been split at each match
- sub : Replaces one or many matches with a string
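
As a quick side-by-side, the sketch below runs all four functions on the same sentence used throughout this section. The variable names are only illustrative.

```python
import re

txt = "The rain in Spain"

# findall: every non-overlapping match, as a list of strings
all_matches = re.findall(r"ai", txt)

# search: only the first match, as a Match object (or None if there is none)
first_match = re.search(r"ai", txt)

# split: the pieces of the string that lie between the matches
pieces = re.split(r"\s", txt)

# sub: a new string in which every match has been replaced
replaced = re.sub(r"\s", "_", txt)

print(all_matches)         # ['ai', 'ai']
print(first_match.span())  # (5, 7)
print(pieces)              # ['The', 'rain', 'in', 'Spain']
print(replaced)            # The_rain_in_Spain
```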

In [4]:
import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

['ai', 'ai']


In [5]:
import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)

[]


## Metacharacters
- Metacharacters are characters with a special meaning:

- [] : A set of characters. Example: "[a-m]"
- \ : Signals a special sequence (can also be used to escape special characters). Example: "\d"
- . : Any character (except the newline character). Example: "he..o"
- ^ : Starts with. Example: "^hello"
- $ : Ends with. Example: "planet$"
- * : Zero or more occurrences. Example: "he.*o"
- + : One or more occurrences. Example: "he.+o"
- ? : Zero or one occurrence. Example: "he.?o"
- {} : Exactly the specified number of occurrences. Example: "he.{2}o"
- | : Either or. Example: "falls|stays"
- () : Capture and group
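
The example patterns in the list can be tried out with a small table of cases. This is only a sketch: the test strings and the expected True/False values are chosen here for illustration and do not come from the original notebook.

```python
import re

# Each entry: (pattern, text, whether re.search is expected to find a match)
cases = [
    (r"[a-m]", "xyz", False),            # no character in the range a-m
    (r"he..o", "hello", True),           # each . matches exactly one character
    (r"^hello", "hello world", True),    # ^ anchors the match at the start
    (r"planet$", "hello planet", True),  # $ anchors the match at the end
    (r"he.*o", "heo", True),             # * also accepts zero characters
    (r"he.+o", "heo", False),            # + needs at least one character
    (r"he.?o", "hello", False),          # ? accepts at most one character
    (r"he.{2}o", "hello", True),         # {2} needs exactly two characters
    (r"falls|stays", "it stays", True),  # | accepts either alternative
]

results = [bool(re.search(pattern, text)) for pattern, text, _ in cases]

for (pattern, text, expected), found in zip(cases, results):
    print(f"{pattern!r:14} on {text!r:16} -> {found}")
```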



## Special Sequences
- A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

- \A : Returns a match if the specified characters are at the beginning of the string. Example: "\AThe"
- \b : Returns a match where the specified characters are at the beginning or at the end of a word (the "r" prefix makes sure that the string is treated as a "raw string"; without it, "\b" would mean a backspace character). Examples: r"\bain", r"ain\b"
- \B : Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (again written as a "raw string"). Examples: r"\Bain", r"ain\B"
- \d : Returns a match where the string contains digits (numbers from 0-9). Example: "\d"
- \D : Returns a match where the string contains a character that is NOT a digit. Example: "\D"
- \s : Returns a match where the string contains a white space character. Example: "\s"
- \S : Returns a match where the string contains a character that is NOT white space. Example: "\S"
- \w : Returns a match where the string contains any word characters (characters from a to z and A to Z, digits from 0-9, and the underscore _ character). Example: "\w"
- \W : Returns a match where the string contains a character that is NOT a word character. Example: "\W"
- \Z : Returns a match if the specified characters are at the end of the string. Example: "Spain\Z"
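
A few of these sequences can be checked with findall. The sample sentence below has a year appended so that \d has something to find; that string is an assumption made for this sketch.

```python
import re

txt = "The rain in Spain 2024"

at_start = re.findall(r"\AThe", txt)    # "The" is at the very beginning
word_start = re.findall(r"\bain", txt)  # no word begins with "ain"
word_end = re.findall(r"ain\b", txt)    # "rain" and "Spain" both end with "ain"
inside = re.findall(r"\Bain", txt)      # "ain" NOT at the beginning of a word
digits = re.findall(r"\d", txt)         # every single digit
spaces = re.findall(r"\s", txt)         # every white space character
at_end = re.findall(r"2024\Z", txt)     # "2024" is at the very end

print(at_start)    # ['The']
print(word_start)  # []
print(word_end)    # ['ain', 'ain']
print(inside)      # ['ain', 'ain']
print(digits)      # ['2', '0', '2', '4']
print(len(spaces)) # 4
print(at_end)      # ['2024']
```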
## Sets
- A set is a set of characters inside a pair of square brackets [] with a special meaning:

- [arn] : Returns a match where one of the specified characters (a, r, or n) is present
- [a-n] : Returns a match for any lower case character, alphabetically between a and n
- [^arn] : Returns a match for any character EXCEPT a, r, and n
- [0123] : Returns a match where any of the specified digits (0, 1, 2, or 3) is present
- [0-9] : Returns a match for any digit between 0 and 9
- [0-5][0-9] : Returns a match for any two-digit number from 00 to 59
- [a-zA-Z] : Returns a match for any character alphabetically between a and z, lower case OR upper case
- [+] : In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] means: return a match for any + character in the string
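
The sets above can be tried the same way. The sample strings here (a sentence with a time appended, and "2+2=4") are assumptions made for illustration only.

```python
import re

txt = "The rain in Spain at 08:59"

any_of_arn = re.findall(r"[arn]", txt)       # every a, r, or n
minutes = re.findall(r"[0-5][0-9]", txt)     # two-digit numbers from 00 to 59
not_letters = re.findall(r"[^a-zA-Z ]", txt) # anything except letters and spaces
plus_signs = re.findall(r"[+]", "2+2=4")     # + has no special meaning in a set

print(any_of_arn)   # ['r', 'a', 'n', 'n', 'a', 'n', 'a']
print(minutes)      # ['08', '59']
print(not_letters)  # ['0', '8', ':', '5', '9']
print(plus_signs)   # ['+']
```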

## The search() Function
- The search() function searches the string for a match, and returns a Match object if there is a match.

- If there is more than one match, only the first occurrence of the match will be returned:

In [24]:
import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("x",x)
print("The first white-space character is located in position:", x.start())

x <re.Match object; span=(3, 4), match=' '>
The first white-space character is located in position: 3


In [66]:
import re

txt = "Therain in Spain"
x = re.search(r"\s", txt)
print(x.start())

7


In [67]:
import re

txt = "The rain in Spain"
x = re.search("India", txt)
print(x)

None
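
Because search() returns None when nothing matches, calling a method such as .start() on the result would raise an AttributeError. A minimal sketch of guarding against that follows; the -1 value is a hypothetical "not found" marker chosen just for this example.

```python
import re

txt = "The rain in Spain"
match = re.search(r"India", txt)

# match is None here, so match.start() would fail with AttributeError.
if match is None:
    position = -1  # hypothetical sentinel meaning "not found"
else:
    position = match.start()

print(position)  # -1
```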


In [20]:
import re

txt = "The rain in Spain"
x = re.search("rain", txt)
print(x)

<re.Match object; span=(4, 8), match='rain'>



## The split() Function
- The split() function returns a list where the string has been split at each match:

In [25]:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']


You can limit the number of splits by specifying the maxsplit parameter:

In [27]:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, 1)
print(x)

['The', 'rain in Spain']


In [68]:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, 2)
print(x)

['The', 'rain', 'in Spain']
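
The limit can also be passed by name, which makes the call easier to read. Newer Python releases (3.13 and later) deprecate passing maxsplit positionally, so the keyword form is the safer habit. A short sketch:

```python
import re

txt = "The rain in Spain"

# maxsplit=0 is the default and means "no limit"
no_limit = re.split(r"\s", txt, maxsplit=0)

# At most two splits, so at most three pieces; the rest of the string stays intact
two_splits = re.split(r"\s", txt, maxsplit=2)

print(no_limit)    # ['The', 'rain', 'in', 'Spain']
print(two_splits)  # ['The', 'rain', 'in Spain']
```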


## The sub() Function
- The sub() function replaces the matches with the text of your choice:

In [28]:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

The9rain9in9Spain


You can control the number of replacements by specifying the count parameter:

In [71]:
import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, 1)
print(x)

The9rain in Spain
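
As with maxsplit, count can be given as a keyword argument, which newer Python releases (3.13 and later) prefer over the positional form. A small sketch using the same sentence:

```python
import re

txt = "The rain in Spain"

# count=0 is the default and means "replace every match"
all_replaced = re.sub(r"\s", "9", txt, count=0)

# Replace only the first two matches
first_two = re.sub(r"\s", "9", txt, count=2)

print(all_replaced)  # The9rain9in9Spain
print(first_two)     # The9rain9in Spain
```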


## Match Object
- A Match Object is an object containing information about the search and the result.

In [36]:
import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #this will print an object

<re.Match object; span=(5, 7), match='ai'>


- The Match object has properties and methods used to retrieve information about the search and the result:

- .span() returns a tuple containing the start and end positions of the match.
- .string returns the string passed into the function.
- .group() returns the part of the string where there was a match.

In [38]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS", txt)
print(x.span())

(12, 13)


In [79]:
import re

txt = "The rain in Spain"
x = re.search(r"\br", txt)
print(x.span())

(4, 5)


In [77]:
import re

txt = "The rain in Spain"
x = re.search("Spain$", txt)
print(x.span())

(12, 17)



In [42]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

(12, 17)


### Print the string passed into the function:

In [43]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

The rain in Spain



### Print the part of the string where there was a match:

- The regular expression looks for any word that starts with an upper case "S":

In [81]:
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print("group------",x.group())
print("string------",x.string)

group------ Spain
string------ The rain in Spain
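
When the pattern contains parentheses (the "capture and group" metacharacter), .group() and .span() also accept a group number. The pattern below is an assumption made for this sketch; it captures the word on each side of " in ".

```python
import re

txt = "The rain in Spain"
m = re.search(r"(\w+) in (\w+)", txt)

whole = m.group()    # the entire match
first = m.group(1)   # text captured by the first pair of parentheses
second = m.group(2)  # text captured by the second pair
both = m.groups()    # all captured groups as a tuple
where = m.span(2)    # start and end positions of the second group

print(whole)   # rain in Spain
print(first)   # rain
print(second)  # Spain
print(both)    # ('rain', 'Spain')
print(where)   # (12, 17)
```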


In [83]:
import re

pattern = '^a...s$'
test_string = 'a+-/s'
result = re.match(pattern, test_string)

if result:
  print("Search successful.")
else:
  print("Search unsuccessful.")	

print(result)

Search successful.
<re.Match object; span=(0, 5), match='a+-/s'>
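
The cell above uses re.match() rather than re.search(). The difference is that match() only looks for the pattern at the beginning of the string, while search() scans the whole string. A brief sketch of that behavior:

```python
import re

txt = "The rain in Spain"

match_rain = re.match(r"rain", txt)    # "rain" is not at the start of the string
search_rain = re.search(r"rain", txt)  # search() finds it further along
match_the = re.match(r"The", txt)      # "The" is at the start, so match() succeeds

print(match_rain)          # None
print(search_rain.span())  # (4, 8)
print(match_the.span())    # (0, 3)
```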


In [47]:

# Program to extract numbers from a string

import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'

result = re.findall(pattern, string) 
print(result)

['12', '89', '34']


In [85]:

import re

string = 'Twelve:12 Eighty nine:89'
pattern = r'\d+'

result = re.split(pattern, string) 
print(result)

# Output: ['Twelve:', ' Eighty nine:', '']

['Twelve:', ' Eighty nine:', '']


In [49]:

import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'

# maxsplit = 1
# split only at the first occurrence
result = re.split(pattern, string, 1) 
print(result)

# Output: ['Twelve:', ' Eighty nine:89 Nine:9.']

['Twelve:', ' Eighty nine:89 Nine:9.']


In [86]:

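# Program to replace each run of whitespace with a hyphen
import re

# the backslash at the end of the first line joins the two source lines,
# so the only newline in the string is the explicit \n
string = 'abc 12\
de 23 \n f45 6'

# matches one or more whitespace characters
pattern = r'\s+'

# replacement text
replace = '-'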

new_string = re.sub(pattern, replace, string) 
print(new_string)

abc-12de-23-f45-6
