# RegEx

A Regular Expression (RegEx) is a sequence of characters that defines a search pattern.

For example:

```md
^a...s$

```

The above code defines a RegEx pattern. The pattern is: any five-character string starting with `a` and ending with `s`.

## Python module for Regular Expressions

Python has the `re` module for working with regular expressions.

In [2]:
# import re
import re

In [6]:
# checking if there is match for `^a...s$`

pattern = '^a...s$'
string = "abbas"

result = re.match(pattern, string)

if result:
    print('matched')
else:
    print('No match')

matched


### Python re.match method

`re.match` checks whether the given pattern matches at the beginning of the string.

* If a match is found, it returns a `Match` object.
* Otherwise, it returns `None`.
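
As a small illustration of the two return values described above, the sketch below calls `re.match` once with a pattern that is found at the start of the string and once with a pattern that occurs only later. The variable names `hit` and `miss` are chosen here for illustration only.

```python
import re

# re.match only tries to match at the beginning of the string.
hit = re.match("ab", "abc")    # "ab" is found at position 0
miss = re.match("bc", "abc")   # "bc" is present, but not at the start

print(hit is not None)   # True: a Match object was returned
print(miss)              # None
print(hit.group())       # the matched text: ab
print(hit.span())        # (start, end) indices of the match: (0, 2)
```

Because a `Match` object is truthy and `None` is falsy, the result can be used directly in an `if` statement, as in the cell above.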

## Specify Pattern Using RegEx


To create regular expression patterns, we use metacharacters.

### MetaCharacters
Metacharacters are characters that are interpreted in a special way by a RegEx engine. 

Here's a list of metacharacters:

**[] . ^ $ * + ? {} () \ |**

### 1. **`[]` - Square brackets**

Square brackets specify a set of characters you wish to match.

| Expression | String | Matched       |
|------------|--------|---------------|
|            | hi     | No Match      |
|[abcde]     | a man  | Matched('a')  |
|            | boss   | Matched('b')  |
|            | ABP New| No Match      |



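The expression above matches a single character that is `a`, `b`, `c`, `d`, or `e`. If the character at the position being tested is in the set, a match object is returned. The table shows `re.match` results, so only the first character of each string is tested.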


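#### You can also specify a range of characters using `-` inside square brackets.
* `[a-f]` - same as `[abcdef]`
* `[0-9]` - same as `[0123456789]`
* `[A-Z]` - any uppercase letter from A to Z
* `[a-z]` - any lowercase letter from a to z
* `[0-59]` - same as `[0123459]`
* `[a-zA-Z]` - any uppercase or lowercase letter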
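
The ranges listed above can be checked with a short sketch. The test strings and the names `checks` and `results` are made up for this example. Since `re.match` is used, only the first character of each string is compared against the set.

```python
import re

# Each pair is (pattern, text); re.match tests only the first character here.
checks = [
    ("[a-f]", "cat"),        # 'c' lies inside a-f
    ("[a-f]", "zebra"),      # 'z' lies outside a-f
    ("[0-59]", "9 lives"),   # 9 is listed explicitly after the range 0-5
    ("[0-59]", "7 seas"),    # 7 is neither in 0-5 nor equal to 9
    ("[a-zA-Z]", "Hello"),   # any ASCII letter, upper or lower case
]

results = [bool(re.match(pattern, text)) for pattern, text in checks]
print(results)   # [True, False, True, False, True]
```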

#### You can complement (invert) the character set by using the caret `^` symbol at the start of a square bracket.
* `[^a-z]` - any character except a lowercase letter
* `[^0-9]` - any character except a digit
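
A minimal sketch of the inverted sets above. It uses `re.search`, which scans the whole string instead of testing only the start. The sample strings and variable names are assumptions made for this example.

```python
import re

# [^a-z] matches one character that is NOT a lowercase letter.
first_non_lower = re.search("[^a-z]", "abcXyz")
print(first_non_lower.group())   # X

# [^0-9] matches one character that is NOT a digit.
first_non_digit = re.search("[^0-9]", "123-456")
print(first_non_digit.group())   # -

# A string made only of lowercase letters leaves nothing for [^a-z] to match.
nothing = re.search("[^a-z]", "abc")
print(nothing)                   # None
```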


<h3 style="color:red; font-weight:800">Note</h3>


If you use a square-bracket pattern with `re.match`, it only checks whether the string starts with one of the characters in the set.

In [30]:
# check for above strings

pattern = "[abcde]"
strings = ["hi", "a man", "boss", "ABP New"]

for string in strings:
    print(f"{string}\t", "matched" if re.match(pattern, string) else "No Match")

hi	 No Match
a man	 matched
boss	 matched
ABP New	 No Match
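
To make the note about `re.match` concrete, the sketch below compares it with `re.search` on the string `"ABP New"`, which contains an `e` but does not start with one. The variable names `at_start` and `anywhere` are illustrative.

```python
import re

pattern = "[abcde]"
text = "ABP New"

at_start = re.match(pattern, text)    # looks only at position 0 ('A')
anywhere = re.search(pattern, text)   # scans the whole string

print(at_start)           # None
print(anywhere.group())   # e
print(anywhere.start())   # 5
```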


### 2. **`.` Period**

A period matches any <b style="color:yellow">single character</b> (except newline '\n').

To make the period match a newline '\n' as well, pass the `re.DOTALL` flag.
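
A short sketch of the newline behaviour described above: the same pattern is tried with and without `re.DOTALL`. The sample text and variable names are chosen for this example.

```python
import re

text = "a\nb"

default = re.match("a.b", text)             # '.' does not match '\n' by default
dotall = re.match("a.b", text, re.DOTALL)   # with DOTALL, '.' matches '\n' too

print(default)                # None
print(repr(dotall.group()))   # 'a\nb'
```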


| Expression | String | Matched       |
|------------|--------|---------------|
|            | hi     | No Match      |
|   `...`    | man    | Matched       |
|            | boss   | Matched       |
|            | ABP    | Matched       |


The pattern above matches the first three characters, so any string with at least three characters matches; extra characters after the third are ignored.


**To match exactly three characters** with `re.match`, change the pattern to `...$`.

In [7]:
strings = ["hi", "man", "boss", "ABP"]
pattern = "..."

for string in strings:
    print(f'{string}\t', "Matched" if re.match(pattern, string) else "No Match")

hi	 No Match
man	 Matched
boss	 Matched
ABP	 Matched


In [13]:
# check for only three characters

strings = ["hi", "man", "boss", "ABP"]
pattern = "...$"

for string in strings:
    print(f'{string}\t', "Matched" if re.match(pattern, string) else "No Match")

hi	 No Match
man	 Matched
boss	 No Match
ABP	 Matched
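
An alternative way to require exactly three characters is `re.fullmatch`, which succeeds only when the whole string matches the pattern. The sketch below is an addition for comparison, not part of the original cells. It also shows one difference: `$` accepts a trailing newline, while `re.fullmatch` does not.

```python
import re

strings = ["hi", "man", "boss", "ABP"]

# re.fullmatch succeeds only if the entire string matches the pattern.
exact = [s for s in strings if re.fullmatch("...", s)]
print(exact)   # ['man', 'ABP']

# '$' also matches just before a trailing newline; fullmatch is stricter.
with_dollar = bool(re.match("...$", "man\n"))
with_fullmatch = bool(re.fullmatch("...", "man\n"))
print(with_dollar, with_fullmatch)   # True False
```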


### 3. **`^` Caret**

The caret symbol <b style="color:yellow">^</b> is used to check if a string <b style="color:yellow">starts with a certain character or set of characters</b>

| Expression | String | Matched       |
|------------|--------|---------------|
|            | ganesh | Matched       |
| `^g.....`  | gunkesh| Matched       |
|            | gaaaah | Matched       |
|            | ganes  | No Match      |
| `^ab`      | abc    | Matched       |
|            | acb    | No Match      |

In [18]:
strings = ["ganesh", "gunkesh", "gaaaah", "ganes" ]

pattern = "^g....."

for string in strings:
    print(f"{string}\t", "Matched" if re.match(pattern, string) else "No Match")

print("\n")
    
pattern = '^ab'
strings = ["ab", "abc", "acb", "ababc"]
for string in strings:
    print(f"{string}\t", "Matched" if re.match(pattern, string) else "No Match")


ganesh	 Matched
gunkesh	 Matched
gaaaah	 Matched
ganes	 No Match


ab	 Matched
abc	 Matched
acb	 No Match
ababc	 Matched
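
Since `re.match` already starts at the beginning of the string, the effect of `^` is easier to see with `re.search`, which otherwise scans the whole string. A minimal sketch, with sample strings and variable names chosen for illustration:

```python
import re

# Without ^, re.search finds "ab" anywhere in the string.
unanchored = bool(re.search("ab", "cab"))

# With ^, the match must begin at the start of the string.
anchored_miss = bool(re.search("^ab", "cab"))
anchored_hit = bool(re.search("^ab", "abc"))

print(unanchored, anchored_miss, anchored_hit)   # True False True
```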


### 4. **`$` Dollar**

The `$` symbol is used to check if a string <b style="color:yellow">ends with a certain character or set of characters</b>.


| Expression | String | Matched? |
|------------|--------|----------|
| `b$`       | b      | Matched  |
|            | ab     | Matched  |
|            | bba    | No Match |
| `..ab$`    | fdab   | Matched  |
|            | abfb   | No Match (ends with b, but the b is not preceded by a)|
|            | aab    | No Match (ends with ab, but has only three characters) |

In [13]:
import re

strings = ["b", "ab", "bba"]
pattern = "b$"

# compiled = re.compile(pattern)

for string in strings:
    print(f"{string}\t", "Matched" if re.search(pattern,string) else "No Match")


b	 Matched
ab	 Matched
bba	 No Match
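
The second expression in the table, `..ab$`, can be checked the same way. The sketch below reuses the three strings from the table and collects the results in a dictionary named `results`, a name chosen for this example.

```python
import re

pattern = "..ab$"
strings = ["fdab", "abfb", "aab"]

# re.search scans the string; '$' then forces the match to end at the end.
results = {s: bool(re.search(pattern, s)) for s in strings}
print(results)   # {'fdab': True, 'abfb': False, 'aab': False}
```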


### 5. **`*` Star**

The `*` symbol is used to match <b style="color:yellow">zero or more</b> occurrences of the pattern <b style="color:yellow">to its left</b>.


| Expression | String | Matched       |
|------------|--------|---------------|
|            | hi     | Matched (0 occurrences of 'a') |
|     `a*`   | a man  | Matched ('a' at the start of the string) |
|            | boss   | Matched (0 occurrences of 'a') |
|            | ABP New| Matched (0 occurrences of 'a') |

In [29]:
# using the * symbol
# it matches even when the pattern occurs zero times,
# e.g. for "web"

strings = ["man", "main", "manan", "mnam", "web"]
pattern = "(ma)*"

for string in strings:
    print(f"{string}\t", "Matched" if re.match(pattern, string) else "No Match")

man	 Matched
main	 Matched
manan	 Matched
mnam	 Matched
web	 Matched
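
Every string above matches because `(ma)*` is allowed to match nothing at all. To see which text is actually consumed, the sketch below prints the matched substring. The helper `matched_text` is a hypothetical function written for this example, and `ma*n` is an additional pattern used for contrast.

```python
import re

def matched_text(pattern, string):
    """Return the text matched at the start of `string`, or None."""
    m = re.match(pattern, string)
    return m.group() if m else None

# "ma*n": 'm', then zero or more 'a', then 'n'.
star_results = {s: matched_text("ma*n", s) for s in ["mn", "man", "maaan", "main"]}
print(star_results)
# {'mn': 'mn', 'man': 'man', 'maaan': 'maaan', 'main': None}

# "(ma)*" can match the empty string, so re.match succeeds on any input.
print(repr(matched_text("(ma)*", "mama")))   # 'mama'
print(repr(matched_text("(ma)*", "web")))    # ''
```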


### 6. **`+` Plus**

The `+` symbol is used to match <b style="color:yellow">one or more</b> occurrences of the pattern <b style="color:yellow">to its left</b>.

| Expression | String | Matched? |
|------------|--------|----------|
| `ma+n`     | man    | Matched  |
|            | mn     | No Match (no 'a' present)|
|            | maan   | Matched  |


The pattern above matches 'm' followed by one or more occurrences of 'a', followed by 'n'.

In [60]:
strings = ["man", "mn","manan"]
pattern = "ma+n"

for string in strings:
    print(f'{string}\t', "Matched" if re.match(pattern, string) else "No Match")

man	 Matched
mn	 No Match
manan	 Matched
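
Note that `manan` matches because the pattern is satisfied by its first three characters. The sketch below shows the matched text for each string. The helper `matched_text` is a hypothetical function written for this example.

```python
import re

def matched_text(pattern, string):
    """Return the text matched at the start of `string`, or None."""
    m = re.match(pattern, string)
    return m.group() if m else None

strings = ["man", "mn", "maaan", "manan"]
plus_results = {s: matched_text("ma+n", s) for s in strings}
print(plus_results)
# {'man': 'man', 'mn': None, 'maaan': 'maaan', 'manan': 'man'}
```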


### 7. **`?` Question Mark**

The `?` symbol is used to match <b style="color:yellow">zero or one</b> occurrence of the pattern <b style="color:yellow">to its left</b>.


| Expression | String | Matched? |
|------------|--------|----------|
|  `ab?`     | abc    | Matched  |
|            | abbc   | Matched ('a' is followed by one 'b'; the rest is ignored) |
|            | aabc   | Matched (the first 'a' is followed by zero occurrences of 'b') |
|            | aac    | Matched (zero occurrences of 'b') |
|            | bc     | No Match (no 'a') |


In [67]:
strings = ["abc", "abbc", "aabc", "aac", "bc"]
pattern = "ab?"

for string in strings:
    print(f"{string}\t", "Matched" if re.match(pattern, string) else "No Match")

abc	 Matched
abbc	 Matched
aabc	 Matched
aac	 Matched
bc	 No Match
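
The matched text makes the "zero or one" behaviour visible: `ab?` consumes at most one `b` after the `a`. The sketch below reuses the strings from the cell above. The helper `matched_text` is a hypothetical function written for this example.

```python
import re

def matched_text(pattern, string):
    """Return the text matched at the start of `string`, or None."""
    m = re.match(pattern, string)
    return m.group() if m else None

strings = ["abc", "abbc", "aabc", "aac", "bc"]
optional_results = {s: matched_text("ab?", s) for s in strings}
print(optional_results)
# {'abc': 'ab', 'abbc': 'ab', 'aabc': 'a', 'aac': 'a', 'bc': None}
```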
