### Regex
- A RegEx, or regular expression, is a sequence of characters that forms a search pattern
- RegEx can be used to check if a string contains the specified search pattern
#### Module
- Python has a built-in package called `re`, which can be used to work with regular expressions

### Regex functions
- `re.findall` Returns a list containing all matches
- `re.search` Returns a match object if there is a match anywhere in the string
- `re.split` Returns a list where the string has been split at each match
- `re.sub` Replaces one or many matches with a string

In [1]:
import re

1. The findall() function
- Returns a list containing all matches

In [3]:
txt = 'The rain in Spain'
search = re.findall('ai',txt)
print(search)

['ai', 'ai']


In [6]:
txt = 'The total bill was $130 for lunch. Eshaan, Parth, Srushti and Aditi need to pay the amount equally'
# \d matches a single digit, so each digit is returned as its own match
re.findall(r'\d',txt)

['1', '3', '0']

In [10]:
txt = 'The total bill was $120 for lunch. Eshaan, Parth, Srushti and Aditi need to pay the amount equally. They arrived at the restaurant at 4.30 pm to 7.30pm and ordered 6 dishes and 2 bottles of wine'
# \$ alone would return only the dollar sign; + means "one or more of the preceding token",
# so \d+ captures the whole run of digits that follows it
re.findall(r'\$+\d+',txt)

['$120']
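
Since the sentence says the bill is to be paid equally, the extracted amount can be turned into a number and divided. The following is a minimal sketch, not part of the original notebook: it assumes a capture group around the digits and a headcount of four taken from the names in the text. The names `amounts`, `diners` and `share` are illustrative.

```python
import re

txt = 'The total bill was $120 for lunch. Eshaan, Parth, Srushti and Aditi need to pay the amount equally.'

# Parentheses form a capture group, so findall returns only the digits after the dollar sign.
amounts = re.findall(r'\$(\d+)', txt)
print(amounts)  # ['120']

diners = 4  # assumption: the four people named in the sentence
share = int(amounts[0]) / diners
print(share)  # 30.0
```

When a pattern contains exactly one group, `re.findall` returns the group contents instead of the full match, which is why the dollar sign is dropped here.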

In [11]:
re.findall(r'\d+\.+\d+',txt) # one or more digits, one or more literal dots, then one or more digits

['4.30', '7.30']

In [13]:
re.findall(r'\d+\.+\d+\s*[pm]+',txt)

['4.30 pm', '7.30pm']
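
The `[pm]+` part of the pattern above is a character class, so it accepts any run of the letters `p` and `m`, not only the suffix "pm". A possible stricter variant is sketched below, assuming that "am" times should be matched too. The sample strings are made up for illustration.

```python
import re

txt = 'They arrived at 4.30 pm, left at 7.30pm and ordered 6 dishes at 11.15 am'

# [pm]+ accepts any combination of the letters p and m, such as "mp".
loose = re.findall(r'\d+\.\d+\s*[pm]+', '4.30 mp')
print(loose)  # ['4.30 mp']

# (?:am|pm) is a non-capturing group that accepts exactly "am" or "pm".
strict = re.findall(r'\d+\.\d+\s*(?:am|pm)', txt)
print(strict)  # ['4.30 pm', '7.30pm', '11.15 am']
```

The group is written as `(?:...)` rather than `(...)` so that `re.findall` still returns the full match instead of only the suffix.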

2. The search() function
- The `search()` function searches the string for a match and returns a match object if there is one. If there is more than one match, only the first occurrence is returned

In [15]:
txt = 'The rain in Spain'
search = re.search(r'\s',txt)
print('The first white-space character is located in position: ',search.start())

The first white-space character is located in position:  3


- If no matches are found the value None is returned

In [16]:
txt = 'The rain in Spain'
search = re.search('Portugal',txt)
print(search)

None
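
Because `re.search` returns `None` when nothing matches, calling `.start()` directly on the result would raise an `AttributeError` in that case. One way to guard against this is sketched below. The helper `first_position` and its `-1` sentinel are hypothetical choices for illustration, not part of the `re` module.

```python
import re

txt = 'The rain in Spain'

def first_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return -1
    return match.start()

print(first_position(r'Spain', txt))     # 12
print(first_position(r'Portugal', txt))  # -1
```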


3. The split() function
- The split() function returns a list where the string is split at each match

In [17]:
txt = 'The rain in Spain'
search = re.split(r'\s',txt)
print(search)

['The', 'rain', 'in', 'Spain']


You can control the number of splits by specifying the `maxsplit` parameter

In [18]:
txt = 'The rain in Spain'
re.split(r'\s',txt,maxsplit=1)

['The', 'rain in Spain']

In [19]:
txt = 'The rain in Spain'
re.split(r'\s',txt,maxsplit=2)

['The', 'rain', 'in Spain']
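
Splitting on `\s` treats every single white-space character as a separator, which matters once the text contains repeated spaces. The sketch below uses a made-up string with double and triple spaces to compare `\s` with `\s+`, and passes `maxsplit` by keyword.

```python
import re

txt = 'The  rain in   Spain'  # note the repeated spaces

# Each space is its own separator, so empty strings appear between adjacent spaces.
single = re.split(r'\s', txt)
print(single)  # ['The', '', 'rain', 'in', '', '', 'Spain']

# \s+ treats a whole run of white space as one separator.
runs = re.split(r'\s+', txt)
print(runs)  # ['The', 'rain', 'in', 'Spain']

# maxsplit limits the number of splits; the rest of the string is kept as-is.
limited = re.split(r'\s+', txt, maxsplit=1)
print(limited)  # ['The', 'rain in   Spain']
```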

4. The sub() function
- The sub() function replaces the match with the text of our choice

In [20]:
txt = 'The rain in Spain'
re.sub(r'\s','_',txt)

'The_rain_in_Spain'

You can control the number of replacements by specifying the `count` parameter

In [22]:
txt = 'The rain in Spain'
re.sub(r'\s','_',txt,count=2)

'The_rain_in Spain'
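
In `re.sub` the limit on replacements is the `count` argument, and passing it by keyword keeps it from being confused with `flags`. The short sketch below also uses `re.subn`, a related function from the same module that reports how many replacements were made. The variable names are illustrative.

```python
import re

txt = 'The rain in Spain'

# Replace only the first white-space character.
once = re.sub(r'\s', '_', txt, count=1)
print(once)  # The_rain in Spain

# subn returns a tuple: (new string, number of replacements made).
new_txt, n = re.subn(r'\s', '_', txt)
print(new_txt, n)  # The_rain_in_Spain 3
```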

### Match Object
A match object is an object containing information about the search and the result
- If there's no match, the value **None** will be returned, instead of the Match object

In [23]:
txt = 'The rain in Spain'
re.search('ai',txt)

<re.Match object; span=(5, 7), match='ai'>

The match object has properties and methods used to retrieve information about the search and the result:
- `.span()` returns a tuple containing the start and end positions of the match
- `.group()` returns the part of the string where there was a match

In [26]:
txt = 'The rain in Spain'
search = re.search(r'\bS\w+',txt)

In [27]:
# span()
search.span()

(12, 17)

In [29]:
search.group()

'Spain'
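
The two methods are related: slicing the original string with the positions from `.span()` gives the same text as `.group()`. A small sketch of that relationship, which also uses the match object's `.start()`, `.end()` and `.string` members:

```python
import re

txt = 'The rain in Spain'
match = re.search(r'\bS\w+', txt)

# span() returns (start, end); end is exclusive, like a Python slice.
start, end = match.span()
print(start, end)      # 12 17
print(txt[start:end])  # Spain

# start() and end() return the same positions individually.
print(match.start(), match.end())  # 12 17

# string holds the text that was searched.
print(match.string)  # The rain in Spain
```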