<h1>Regular Expressions</h1>

A regular expression (regex) is a sequence of characters that defines a search pattern. The pattern is written in a specialised syntax and is used to match, search, or modify other strings.


To use regular expressions in Python, import the <i>re</i> module.

[re — Regular expression operations documentation](https://docs.python.org/3/library/re.html)

In [None]:
import re

# Functions and descriptions
- match - Returns a Match object if the regular expression matches at the beginning of the string; otherwise returns None.
- fullmatch - Returns a Match object if the whole string matches the regular expression; otherwise returns None.
- search - Scans the string and returns a Match object for the first location where the regular expression matches; otherwise returns None.
- findall - Returns a list of all non-overlapping matches.
- sub - Returns the string with every match replaced by the given replacement string.
- split - Returns a list of the substrings obtained by splitting the string at each match.
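
The difference between the first three functions is easiest to see on one short string. The following is a minimal sketch; the sample string and pattern are made up for illustration and are not part of the original notebook. It also shows `group()` and `span()`, which read the matched text and its position from a Match object.

```python
import re

sample = "cat hat"    # illustrative string (assumption, not from the notebook)
pattern = r"[ch]at"   # matches "cat" or "hat"

at_start = re.match(pattern, sample)    # only tries to match at position 0
whole = re.fullmatch(pattern, sample)   # must consume the entire string
first_hat = re.search(r"hat", sample)   # scans forward for the first match

print(at_start.group())   # the text that was matched
print(whole)              # None, because " hat" is left over
print(first_hat.span())   # (start, end) indices of the match
```

Because `match`, `fullmatch`, and `search` return None when nothing matches, check the result before calling `group()` or `span()` on it.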

## Two Ways to Apply a Pattern

In [None]:
text = "abc."

In [None]:
pattern = r"...\."  # raw string: any three characters followed by a literal dot

x = re.match(pattern, text)
print(x)

<re.Match object; span=(0, 4), match='abc.'>


The same example using a precompiled pattern:

In [None]:
pattern = r"...\."  # raw string: any three characters followed by a literal dot
pat = re.compile(pattern) # compile pattern into regular expression object

x = pat.match(text)
print(x)

<re.Match object; span=(0, 4), match='abc.'>


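**Note:** Calling re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression is used many times in a single program. The re module also caches the most recently used patterns, so programs that use only a few regular expressions see little difference.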
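
As a rough sketch of the reuse described in the note, a pattern can be compiled once and then applied to many strings. The pattern and the sample strings below are hypothetical and chosen only for illustration.

```python
import re

# Compile once, then reuse the same object for every string.
word_dot = re.compile(r"[a-z]+\.")  # hypothetical pattern: lowercase letters, then a dot

samples = ["abc.", "hello. world", "123."]  # illustrative inputs (assumption)

# match() only succeeds if the pattern matches at the start of each string.
results = [bool(word_dot.match(s)) for s in samples]
print(results)
```

The compiled object exposes the same functions as the module (`match`, `search`, `findall`, `sub`, `split`) without the pattern argument.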

## Examples

### Simple Pattern

In [None]:
text = "abc. def"

pattern = r"...\."  # raw string: any three characters followed by a literal dot
pat = re.compile(pattern) # compile pattern into regular expression object

x = pat.match(text)
print(f'match output: {x}\n')

x = pat.fullmatch(text)
print(f'fullmatch output: {x}\n')

x = pat.search(text)
print(f'search output: {x}\n')

x = pat.findall(text)
print(f'findall output: {x}\n')

x = pat.sub("PERCENTAGE", text) # For sub, we need to input a string as a replacement for the matches in addition to the original text.
print(f'sub output: {x}\n')

x = pat.split(text)
print(f'split output: {x}\n')

match output: <re.Match object; span=(0, 4), match='abc.'>

fullmatch output: None

search output: <re.Match object; span=(0, 4), match='abc.'>

findall output: ['abc.']

sub output: PERCENTAGE def

split output: ['', ' def']



### Percentages

In [None]:
pattern = r"\d+\.?\d*%"  # pattern to match percentages (e.g. 8.7%, 2.4%, 20%)
pat = re.compile(pattern)

In [None]:
text = "Monthly GDP grew by 8.7% in June 2020 as lockdown measures eased, following upwardly revised growth of 2.4% in May and a record fall of 20% in April 2020."

x = pat.match(text)
print(f'match output: {x}\n')

x = pat.fullmatch(text)
print(f'fullmatch output: {x}\n')

x = pat.search(text)
print(f'search output: {x}\n')

x = pat.findall(text)
print(f'findall output: {x}\n')

x = pat.sub("PERCENTAGE", text) # For sub, we need to input a string as a replacement for the matches in addition to the original text.
print(f'sub output: {x}\n')

x = pat.split(text)
print(f'split output: {x}\n')

match output: None

fullmatch output: None

search output: <re.Match object; span=(20, 24), match='8.7%'>

findall output: ['8.7%', '2.4%', '20%']

sub output: Monthly GDP grew by PERCENTAGE in June 2020 as lockdown measures eased, following upwardly revised growth of PERCENTAGE in May and a record fall of PERCENTAGE in April 2020.

split output: ['Monthly GDP grew by ', ' in June 2020 as lockdown measures eased, following upwardly revised growth of ', ' in May and a record fall of ', ' in April 2020.']
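
The strings returned by findall can be processed further with ordinary Python. The sketch below converts the matched percentages to numbers; the shortened text and the variable names are assumptions made for illustration.

```python
import re

pat = re.compile(r"\d+\.?\d*%")  # same percentage pattern as above

# Shortened, illustrative version of the GDP sentence.
short_text = "growth of 2.4% in May and a record fall of 20% in April"

matches = pat.findall(short_text)               # list of matched strings
values = [float(m.rstrip("%")) for m in matches]  # drop the % sign and convert
print(matches)
print(values)
```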



<b>Exercise 1</b> <br>
Use regular expressions to remove the links from the following tweets.

In [None]:
tweets = ["This is a much better tool than some I have come across http://www.tweepular.com",
          "says Finally, Im home.  http://plurk.com/p/rr121",
          "http://twitpic.com/66xlm -  hate when my PARKED car gets hit",
          "I want it BACK NOW!: http://bit.ly/PP1WZ",
          "https://t.co/OAsMV2N3oZ HALFTIME WATCH"]