## Introduction to Regular Expressions

A regular expression (regex) is a pattern written in a small mini-language that can look cryptic and mysterious at first.
A regex describes a pattern of characters, which lets you search through strings and find, extract, or validate the pieces of text that fit that pattern. If you often scan documents by hand or slice out substrings just to find text patterns, regular expressions can save you a lot of work. In data science and data engineering in particular, they help with a wide range of tasks, from wrangling data to validating and categorizing it.

## Regex in Python

#### Importing the Library

```python
import re
```

### Literal Characters in Regex

```python
txt = 'Cats and dogs'
```

```python
# match() returns a match object if the pattern is found at the start of the string, otherwise None
re.match('Cats', txt)
# Output: <re.Match object; span=(0, 4), match='Cats'>
```
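The sketch below shows two properties of literal patterns that the example above does not: matching is case-sensitive unless you pass the `re.IGNORECASE` flag, and `re.match()` only tries the very start of the string.

```python
import re

txt = 'Cats and dogs'

# A literal pattern matches exactly these characters, in this order.
exact = re.match('Cats', txt)
print(exact.group())     # Cats

# Matching is case-sensitive by default, so 'cats' does not match 'Cats'.
wrong_case = re.match('cats', txt)
print(wrong_case)        # None

# The re.IGNORECASE flag makes the comparison case-insensitive.
any_case = re.match('cats', txt, re.IGNORECASE)
print(any_case.group())  # Cats

# match() only tries position 0, so a literal later in the string is not found.
later = re.match('dogs', txt)
print(later)             # None
```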

### Special Characters in Regex

The following characters and escape sequences have special meanings in a regex: <br>
1. Period (.): Matches any single character except a newline. <br>
2. Lowercase w (\w): Matches any single letter, digit, or underscore (a "word" character). <br>
3. Uppercase W (\W): Matches any single character that \w does not match. <br>
4. Lowercase s (\s): Matches a single whitespace character (space, tab, newline, and so on). <br>
5. Uppercase S (\S): Matches any single character that \s does not match. <br>
6. Lowercase t (\t): Matches a single tab character. <br>
7. Lowercase n (\n): Matches a newline character. <br>
8. Lowercase d (\d): Matches a single decimal digit, such as 0-9. <br>
9. Caret (^): Matches at the start of the string. <br>
10. Dollar ($): Matches at the end of the string. <br>
11. Square brackets ([]): Define a character class that matches any one of the characters listed inside; for example, [aeiou] matches any lowercase vowel and [A-Za-z] matches any ASCII letter.
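A few of the items above in action, as a small sketch on made-up sample strings. The patterns are written as raw strings (`r'...'`) so that backslashes such as `\d` reach the regex engine unchanged; recent Python versions warn about unrecognized escapes like `'\d'` in ordinary strings.

```python
import re

# '.' matches any character except a newline, so 'c\nt' is skipped.
dots = re.findall(r'c.t', 'cat cut c\nt')
print(dots)        # ['cat', 'cut']

# '\d+' matches runs of one or more digits.
numbers = re.findall(r'\d+', 'Room 42, floor 7')
print(numbers)     # ['42', '7']

# '\s' matches single whitespace characters; '\S+' matches runs of non-whitespace.
spaces = re.findall(r'\s', 'a b\tc\nd')
tokens = re.findall(r'\S+', 'a b\tc\nd')
print(spaces)      # [' ', '\t', '\n']
print(tokens)      # ['a', 'b', 'c', 'd']

# '[...]' matches any one character listed inside the brackets.
vowels = re.findall(r'[aeiou]', 'regex')
print(vowels)      # ['e', 'e']
```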



```python
txt = 'We are learning Regular Expressions (Regex)!'
```

```python
# search() returns a match object; .group() returns the matched text
re.search(r'\w+', txt).group()
# Output: 'We'
```

```python
re.findall(r'\W', txt)
# Output: [' ', ' ', ' ', ' ', ' ', '(', ')', '!']
```

```python
re.findall('Regular', txt)
# Output: ['Regular']
```

```python
# ^ anchors the match to the start of the string
re.findall(r'^We', txt)
# Output: ['We']
```

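```python
# [A-Za-z] covers ASCII letters only; the range A-z would also include [ \ ] ^ _ and `
re.findall(r'[A-Za-z]+', txt)
# Output: ['We', 'are', 'learning', 'Regular', 'Expressions', 'Regex']
```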

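```python
# +? is a lazy quantifier: it stops at the first point where 'Regular' can follow
re.findall(r'[A-Za-z\s]+?Regular', txt)
# Output: ['We are learning Regular']
```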
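The `$` anchor is listed above but not yet demonstrated, so here is a short sketch contrasting `^` and `$` on the same sentence. Parentheses also have a special meaning in regex (they create groups), so to match them literally they are escaped with a backslash.

```python
import re

txt = 'We are learning Regular Expressions (Regex)!'

# '^' anchors to the start of the string: 'We' is there, 'are' is not.
at_start = re.findall(r'^We', txt)
not_at_start = re.findall(r'^are', txt)
print(at_start, not_at_start)  # ['We'] []

# '$' anchors to the end; '\(' and '\)' match literal parentheses.
ending = re.search(r'\(\w+\)!$', txt)
print(ending.group())          # (Regex)!
```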

### Quantifiers in Regex

1. (+): Matches one or more occurrences of the character or group to its left. <br>
2. (*): Matches zero or more occurrences of the character or group to its left. <br>
3. (?): Matches zero or one occurrence of the character or group to its left. <br>
4. {x}: Matches exactly x occurrences. <br>
5. {x,}: Matches at least x occurrences. <br>
6. {x,y}: Matches at least x but no more than y occurrences. <br>
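A compact way to compare the quantifiers is to run each one against the same hypothetical sample, where `a` and `c` are separated by zero to three `b`s. The last two lines show greedy versus lazy matching: adding `?` after a quantifier (as in the `+?` used earlier) makes it stop at the shortest possible match.

```python
import re

# Hypothetical sample: 'a' and 'c' separated by 0 to 3 'b's.
words = 'ac abc abbc abbbc'

one_or_more = re.findall(r'ab+c', words)     # ['abc', 'abbc', 'abbbc']
zero_or_more = re.findall(r'ab*c', words)    # ['ac', 'abc', 'abbc', 'abbbc']
zero_or_one = re.findall(r'ab?c', words)     # ['ac', 'abc']
exactly_two = re.findall(r'ab{2}c', words)   # ['abbc']
two_or_more = re.findall(r'ab{2,}c', words)  # ['abbc', 'abbbc']
one_to_two = re.findall(r'ab{1,2}c', words)  # ['abc', 'abbc']

# Quantifiers are greedy by default; a trailing '?' makes them lazy.
greedy = re.findall(r'<.+>', '<a><b>')       # ['<a><b>']
lazy = re.findall(r'<.+?>', '<a><b>')        # ['<a>', '<b>']

print(one_or_more, zero_or_more, zero_or_one)
print(exactly_two, two_or_more, one_to_two)
print(greedy, lazy)
```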

```python
txt2 = 'Phone Number:+917564861236, Fax: A8955PA'
```

```python
# \+ matches a literal '+'; [0-9]{10} then requires exactly ten digits
re.findall(r'\+91[0-9]{10}', txt2)
# Output: ['+917564861236']
```
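The same string also contains a fax code. Assuming, from this single sample only, that such codes are one uppercase letter, four digits, and two uppercase letters, `{x}` can describe that shape directly; the second pattern shows `{x,y}` and how a greedy range takes as many characters as it is allowed.

```python
import re

txt2 = 'Phone Number:+917564861236, Fax: A8955PA'

# Assumed fax format: one uppercase letter, four digits, two uppercase letters.
fax = re.findall(r'[A-Z]\d{4}[A-Z]{2}', txt2)
print(fax)     # ['A8955PA']

# Between 10 and 12 digits; greedy, so all 12 available digits are taken.
digits = re.findall(r'\d{10,12}', txt2)
print(digits)  # ['917564861236']
```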

### Functions in Regex

1. re.match(pattern, string, flags=0) <br>
2. re.search(pattern, string, flags=0) <br>
3. re.findall(pattern, string, flags=0) <br>
4. re.sub(pattern, repl, string, count=0, flags=0) <br>
5. re.compile(pattern, flags=0) <br>


###### 1. re.match(pattern, string, flags=0)

Checks for a match only at the beginning of the string and returns a match object if one is found. If no match is found, it returns None.
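A sketch of the usual way to work with the result of `re.match()`, on a hypothetical log line: test the return value before using it (a match object is truthy, None is falsy), then read the matched text with `group()` and its position with `span()`.

```python
import re

# Hypothetical sample log line.
line = '2021-05-04 backup finished'

m = re.match(r'\d{4}-\d{2}-\d{2}', line)
if m:                  # a match object is truthy; None is falsy
    print(m.group())   # 2021-05-04
    print(m.span())    # (0, 10)

# 'backup' occurs in the string, but not at position 0, so match() fails.
miss = re.match(r'backup', line)
print(miss)            # None
```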

###### 2. re.search(pattern, string, flags=0)
Scans through the string looking for the first location where the pattern produces a match, and returns a corresponding match object. Returns None if no position in the string matches the pattern.
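Using the same kind of hypothetical log line, this sketch shows the practical difference from `re.match()`: `re.search()` finds a pattern anywhere in the string, but it still stops at the first match.

```python
import re

line = '2021-05-04 backup finished'  # hypothetical sample

# search() scans the whole string, so it finds 'backup' even though it is not at the start.
found = re.search(r'backup', line)
print(found.group(), found.span())   # backup (11, 17)

# Only the first match is returned, even if the pattern occurs again later.
first_number = re.search(r'\d+', line)
print(first_number.group())          # 2021

# No match anywhere in the string gives None.
nothing = re.search(r'restore', line)
print(nothing)                       # None
```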


###### 3. re.findall(pattern, string, flags=0)

Returns all non-overlapping matches of the pattern in the string as a list of strings. The string is scanned left to right, and matches are returned in the order found. If the pattern contains one or more groups, findall() returns a list of groups instead; this is a list of tuples if the pattern has more than one group.
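The effect of groups on the return type is easiest to see side by side. This sketch runs three patterns over one hypothetical `key=value` string: no groups, one group, and two groups.

```python
import re

# Hypothetical sample text.
settings = 'width=640, height=480'

# No groups: a list of the whole matches.
whole = re.findall(r'\d+', settings)
print(whole)   # ['640', '480']

# One group: a list of just the group's text.
keys = re.findall(r'(\w+)=\d+', settings)
print(keys)    # ['width', 'height']

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r'(\w+)=(\d+)', settings)
print(pairs)   # [('width', '640'), ('height', '480')]
```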

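###### 4. re.sub(pattern, repl, string, count=0, flags=0)

Returns a copy of the string in which non-overlapping occurrences of the pattern are replaced by repl. The count argument limits the number of replacements; the default, 0, replaces every occurrence.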
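A short sketch of `re.sub()`: masking every digit in the phone/fax string from earlier, and collapsing runs of whitespace in a made-up string, with and without `count`. Passing `count` as a keyword argument keeps the call unambiguous.

```python
import re

txt2 = 'Phone Number:+917564861236, Fax: A8955PA'

# Replace every digit with '#'.
masked = re.sub(r'\d', '#', txt2)
print(masked)       # Phone Number:+############, Fax: A####PA

# count limits how many replacements are made (0, the default, means all).
messy = 'a   b    c'
all_fixed = re.sub(r'\s+', ' ', messy)
first_fixed = re.sub(r'\s+', ' ', messy, count=1)
print(all_fixed)    # a b c
print(first_fixed)  # a b    c
```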

###### 5. re.compile(pattern, flags=0)
If you want to use the same regular expression more than once, you can compile it into a regular expression object and reuse it. The compiled object provides the same operations as methods, such as match(), search(), findall(), and sub().
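A sketch of reusing one compiled pattern across several strings. The date pattern and the sample lines are hypothetical. The `re` module also caches recently used patterns internally, so the speed gain from compiling is often small; the main benefit is that the pattern is defined once and reused by name.

```python
import re

# Hypothetical pattern for YYYY-MM-DD dates, with one group per field.
date_re = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

lines = ['backup 2021-05-04 ok', 'no date here', 'restore 2022-01-15 failed']

results = []
for line in lines:
    m = date_re.search(line)   # like re.search(), minus the pattern argument
    results.append(m.groups() if m else None)
print(results)
# [('2021', '05', '04'), None, ('2022', '01', '15')]

# The compiled object also offers findall(), sub(), match(), and others.
all_dates = date_re.findall('2020-12-31 and 2021-01-01')
print(all_dates)  # [('2020', '12', '31'), ('2021', '01', '01')]
```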