# Regular Expression (Regex) 

- A regular expression (RegEx) is a special text string that describes a pattern to search for in text.
- A RegEx can be used to check whether a pattern occurs in a string, and to extract, split, or replace the parts that match.

In [None]:
# The "re" module
import re

### Methods in <i>re</i> module
To find a pattern, we combine regex characters into a pattern string and pass it to one of the following functions, which search a string for a match.
- *re.compile(r"", re.I)*: compiles a pattern into a RegexObject (re.Pattern); the re.I flag makes matching case-insensitive.
- *re.match()*: looks for a match only at the beginning of the string and returns a match object if found, else returns None.
- *re.search()*: scans the whole string, including multiline strings, and returns a match object for the first match, else returns None.
- *re.findall()*: Returns a list containing all non-overlapping matches.
- *re.split()*: Takes a string, splits it at the match points, returns a list.
- *re.sub()*: Replaces one or many matches within a string.
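
As a quick illustration of the functions listed above, here is a minimal sketch that calls each one at module level on the same string. The sample sentence and the variable names are made up for this example.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Python is fun. I teach python and JavaScript."

# re.match only looks at the start of the string.
at_start = re.match(r"python", text, re.I)
not_at_start = re.match(r"teach", text)

# re.search scans the whole string and stops at the first match.
first_hit = re.search(r"teach", text)

# re.findall returns every non-overlapping match as a list of strings.
all_hits = re.findall(r"python", text, re.I)

# re.split cuts the string wherever the pattern matches (here: ". ").
parts = re.split(r"\. ", text)

# re.sub replaces every match with the replacement string.
replaced = re.sub(r"python", "Rust", text, flags=re.I)

print(at_start.group())  # Python
print(not_at_start)      # None
print(first_hit.span())  # (17, 22)
print(all_hits)          # ['Python', 'python']
print(parts)             # ['Python is fun', 'I teach python and JavaScript.']
print(replaced)          # Rust is fun. I teach Rust and JavaScript.
```

Note that `re.sub` takes `flags` as a keyword argument here, because its fourth positional parameter is `count`, not `flags`.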

### RegexObject (one-time compile, multiple-time use)

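- Compile a pattern into a RegexObject with re.compile(regex).
- It is the compiled form of the pattern string, the same form that `re` builds internally whenever you call a module-level function such as re.search().
- Reusing it avoids looking the pattern up again on every call, which can help when the same pattern is applied many times. `re` also caches recently compiled patterns, so the gain is often small.
- The code is slightly longer, but the pattern and its flags are defined once and given a name.
- It has the methods .search(), .match(), .findall(), .sub(), ...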
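
To make the points above concrete, the following sketch compiles a pattern once, reuses it for two operations, and checks that the module-level call gives the same result. The sample text and variable names are assumptions for illustration only.

```python
import re

# Hypothetical sample text for this illustration.
text = "I love to teach Python and python tooling"

# Compile once; the flag is stored together with the pattern.
pattern = re.compile(r"python", re.I)

# Reuse the same compiled object for several operations.
compiled_hits = pattern.findall(text)
compiled_first = pattern.search(text)

# The equivalent module-level call needs the pattern and flag again.
module_hits = re.findall(r"python", text, re.I)

print(type(pattern).__name__)        # Pattern
print(compiled_hits)                 # ['Python', 'python']
print(compiled_hits == module_hits)  # True
print(compiled_first.span())         # (16, 22)
```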

In [None]:
# pattern.match(string)
# pattern is the compiled regex we search with; string is the text we search in

import re
from re import Pattern

text: str = "I love to teach python and javascript"

pattern: Pattern[str] = re.compile("I love to teach", re.I)

match = pattern.match(text)

print(match)
if match is not None:
    span: tuple[int, int] = match.span() 
    print(span) 

    start, end = span # unpacking
    print(start, end)

    substring: str = text[start:end] # slicing

    print(substring) 

<re.Match object; span=(0, 15), match='I love to teach'>
(0, 15)
0 15
I love to teach


In [1]:
# pattern.search(string) 

from re import Pattern
import re

text: str = """Python is the most beautiful language that a human being has ever created. I recommend python for a first programming language"""

pattern: Pattern[str] = re.compile("first", re.I) 
match = pattern.search(text)
print(match) 

if match is not None:
    span: tuple[int, int] = match.span()
    print(span) 

    start, end = span 
    substring: str = text[start:end] 
    print(substring) 


<re.Match object; span=(100, 105), match='first'>
(100, 105)
first


In [None]:
# pattern.findall(string) 

import re
from re import Pattern

text: str = """Python is the most beautiful language that a human being has ever created. I recommend python for a first programming language"""

pattern: Pattern[str] = re.compile("language", re.I) 

matches = pattern.findall(text) 
print(matches) 

['language', 'language']


In [None]:
# pattern.sub(replacement, string): replaces every match in string with replacement

import re
from re import Pattern 

text: str = """Python is the most beautiful language that a human being has ever created. I recommend python for a first programming language"""

pattern: Pattern[str] = re.compile("python", re.I)  # re.I already matches both "Python" and "python"

match_replaced: str = pattern.sub("JavaScript", text)

print(match_replaced)

JavaScript is the most beautiful language that a human being has ever created. I recommend JavaScript for a first programming language


In [9]:
# pattern.split(string)

import re
from re import Pattern

txt = '''I am teacher and  I love teaching.
There is nothing as rewarding as educating and empowering people.
I found teaching more interesting than any other jobs.
Does this motivate you to be a teacher?'''

pattern: Pattern[str] = re.compile("\n")  # split at every newline character

lines: list[str] = pattern.split(txt)  # one list item per line of txt

print(lines)

['I am teacher and  I love teaching.', 'There is nothing as rewarding as educating and empowering people.', 'I found teaching more interesting than any other jobs.', 'Does this motivate you to be a teacher?']


### Writing RegEx Patterns

- []: a set of characters
  - [a-c] means a or b or c
  - [a-z] means any lowercase letter from a to z
  - [A-Z] means any uppercase letter from A to Z
  - [0-3] means 0 or 1 or 2 or 3
  - [0-9] means any digit from 0 to 9
  - [A-Za-z0-9] means any single character that is a to z, A to Z or 0 to 9
- \\: used to escape special characters and to introduce special sequences
  - \d matches any single digit (0-9)
  - \D matches any single character that is not a digit
- . : any character except the newline character (\n)
- ^: starts with
  - r'^substring' eg r'^love', a string that starts with the word love
  - r'[^abc]' means not a, not b, not c (inside a set, ^ negates the set)
- $: ends with
  - r'substring$' eg r'love$', a string that ends with the word love
- *: zero or more times
  - r'[a]*' means a is optional or can occur many times
- +: one or more times
  - r'[a]+' means a occurs at least once
- ?: zero or one time
  - r'[a]?' means a occurs zero times or once
- {3}: exactly 3 repetitions of the preceding item
- {3,}: at least 3 repetitions of the preceding item
- {3,8}: from 3 to 8 repetitions of the preceding item
- |: either or
  - r'apple|banana' means either apple or banana
- (): capture and group
<img src="images/regex.png">
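
Alternation with `|` and capturing with `()` from the pattern list are easiest to see side by side. This is a minimal sketch; the sample strings, the date format, and the variable names are assumptions for illustration.

```python
import re

# Hypothetical sample text for this illustration.
text = "apple pie, banana split, cherry tart"

# |: match either alternative.
fruit = re.findall(r"apple|banana", text)

# (): each pair of parentheses captures the part of the match inside it.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2019-10-14.")
whole = match.group(0)            # the entire match
year, month, day = match.groups() # the three captured parts

# With groups in the pattern, findall returns one tuple per match.
pairs = re.findall(r"(\w+) (pie|split)", text)

print(fruit)             # ['apple', 'banana']
print(whole)             # 2019-10-14
print(year, month, day)  # 2019 10 14
print(pairs)             # [('apple', 'pie'), ('banana', 'split')]
```

When a pattern contains groups, `re.findall()` returns the captured parts instead of the whole match, which is why `pairs` holds tuples while `fruit` holds plain strings.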