# Regular Expression 
A Regular Expression, or RegEx, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern or not.

## RegEx Module 
Python has a built-in package called re, which can be used to work with regular expressions.

After importing the re module, we can start using regular expressions.


In [4]:
import re 
txt = "Feel the pain"

data = re.search("pain$",txt)

if data:
    print("Yes!")

else:
    print("No match!!!")


Yes!


# RegEx functions
The re module offers a set of functions that allow us to search a string for a match.

|Function    |Description                                              |
|------------|---------------------------------------------------------|
| findall()  | Returns a list containing all matches.                  |
| search()   | Returns a match object if there is a match anywhere in the string.|
| split()    | Returns a list where the string has been split at each match.|
| sub()      | Replaces one or many matches with a string.|

# Metacharacters
Metacharacters are characters with a special meaning.

| Character | Description                                                                | Example           |
|-----------|----------------------------------------------------------------------------|-------------------|
| [ ]       | A set of characters                                                        | "[a-m]"           |
|  \        | Signals a special sequence; can also be used to escape special characters  | "\d"              |
|  .        | Any character except the newline character                                 | "S....a"          |
|  ^        | Starts with                                                                | "^he"             |
|  $        | Ends with                                                                  | "pain$"           |
|  *        | Zero or more occurrences                                                   | "sh.*a"           |
|  +        | One or more occurrences                                                    | "ki.+g"           |
|  ?        | Zero or one occurrence                                                     | "lo.?e"           |
| { }       | Exactly the specified number of occurrences                                | "i.{2}q"          |
| Pipe      | Either or                                                                  | "happy\|sad"      |
| ( )       | Capture and group: the captured values can be retrieved with the match object's .group() method | "(s.{2}g).+song$" |
| (?: )     | Non-capturing group: groups part of the pattern without capturing it       | "(?:a).*sh"       |


Here, "Pipe" means the "|" character.

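As a rough illustration of a few metacharacters from the table, here is a minimal sketch. The sample text and the variable names (starts_with_feel, two_between, and so on) are made up for this example.

```python
import re

txt = "Feel the pain"

# ^ anchors the pattern to the start of the string, $ to the end.
starts_with_feel = re.search("^Feel", txt) is not None
ends_with_pain = re.search("pain$", txt) is not None

# . matches any single character; {2} repeats it exactly twice.
two_between = re.findall("p.{2}n", txt)

# | matches either alternative.
either = re.findall("pain|joy", txt)

# ( ) captures part of the match; .group(1) returns the first captured group.
captured = re.search("the (p.+)$", txt).group(1)

print(starts_with_feel, ends_with_pain)
print(two_between, either, captured)
```

Swapping the capturing group for a non-capturing one, "the (?:p.+)$", would still match, but .group(1) would then raise an error because no group is captured.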
# Flags
We can add flags to the pattern when using regular expressions.

| Flag              |   Shorthand   |   Description                        |
|-------------------|---------------|--------------------------------------|
| re.ASCII          |   re.A        | Makes \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching. |
| re.DEBUG          |               | Displays debug information about the compiled pattern. |
| re.DOTALL         |   re.S        | Makes the . character match all characters, including newline. |
| re.IGNORECASE     |   re.I        | Case-insensitive matching. |
| re.MULTILINE      |   re.M        | Makes ^ and $ match at the beginning and end of each line, not only of the whole string. |
| re.NOFLAG         |               | Specifies that no flag is set for this pattern (available from Python 3.11). |
| re.UNICODE        |   re.U        | Unicode matching; this is the default in Python 3. |
| re.VERBOSE        |   re.X        | Allows whitespace and comments inside patterns, making them more readable. |

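The effect of a flag is easiest to see by running the same pattern with and without it. The following is a small sketch; the two-line sample text and the variable names are assumptions made for the example.

```python
import re

txt = "The King is back.\nthe king never left."

# re.IGNORECASE (re.I): "king" also matches "King".
ignore_case = re.findall("king", txt, flags=re.IGNORECASE)

# Without re.MULTILINE, ^ matches only at the very start of the string.
single_line = re.findall("^the", txt, flags=re.IGNORECASE)

# With re.MULTILINE (re.M), ^ also matches at the start of each line.
# Several flags can be combined with the | operator.
multi_line = re.findall("^the", txt, flags=re.IGNORECASE | re.MULTILINE)

# By default . does not match a newline; with re.DOTALL (re.S) it does.
no_dotall = re.search("back.+the", txt)
with_dotall = re.search("back.+the", txt, flags=re.DOTALL)

print(ignore_case)
print(single_line, multi_line)
print(no_dotall, with_dotall is not None)
```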
# Special Sequences
A special sequence is a \ followed by one of the characters in the table below, and it has a special meaning.

| Characters |          Description                         |   Example    |
|------------|----------------------------------------------|--------------|
|   \A       | Returns a match if the specified characters are at the beginning of the string. | r"\AThe" |
|   \b       | Returns a match where the specified characters are at the beginning or at the end of a word. | r"\bpain" or r"pain\b" |
|   \B       | Returns a match where the specified characters are present, but not at the beginning (or at the end) of a word. | r"\Bpain" or r"pain\B" |
|   \d       | Returns a match where the string contains a digit (0-9). | r"\d" |
|   \D       | Returns a match where the string contains a character that is not a digit. | r"\D" |
|   \s       | Returns a match where the string contains a whitespace character. | r"\s" |
|   \S       | Returns a match where the string contains a non-whitespace character. | r"\S" |
|   \w       | Returns a match where the string contains a word character (a letter, a digit or the underscore). | r"\w" |
|   \W       | Returns a match where the string contains a non-word character (anything other than a letter, a digit or the underscore). | r"\W" |
|   \Z       | Returns a match if the specified characters are at the end of the string. | r"brain\Z" |

The "r" prefix at the beginning makes sure that the string is treated as a raw string, so backslashes reach the re module unchanged.

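To see how some of these sequences behave, here is a minimal sketch. The sample sentence and the variable names are invented for illustration.

```python
import re

txt = "The pain in Spain is painful: 12 cases in 3 days."

# \d+ matches runs of one or more digits.
digits = re.findall(r"\d+", txt)

# \bpain matches "pain" only at the start of a word,
# so the "pain" inside "Spain" is skipped.
word_start = re.findall(r"\bpain\w*", txt)

# \Bpain matches "pain" only when it is NOT at the start of a word.
inside_word = re.findall(r"\Bpain", txt)

# pain\b matches "pain" only at the end of a word ("pain" and "Spain").
word_end_count = len(re.findall(r"pain\b", txt))

# \A anchors the pattern to the beginning of the string.
starts_with_the = re.search(r"\AThe", txt) is not None

print(digits, word_start, inside_word, word_end_count, starts_with_the)
```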
# Sets 
A set is a group of characters inside square brackets [] with a special meaning.

|   Set                     |           Description                                               |
|---------------------------|---------------------------------------------------------------------|
|   [amz]                   |   Returns a match where one of the specified characters (a, m or z) is present. |
|   [a-n]                   |   Returns a match for any lowercase character between a and n.      |
|   [^amz]                  |   Returns a match for any character except a, m and z.              |
|   [1762]                  |   Returns a match where any of the specified digits (1, 7, 6 or 2) is present. |
|   [0-9]                   |   Returns a match for any digit between 0 and 9.                    |
|   [0-5][0-7]              |   Returns a match for any two digits where the first is 0-5 and the second is 0-7 (for example 00, 17 or 57, but not 08). |
|   [+]                     |   Returns a match for any + character in the string; inside a set, + has no special meaning. |

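A short sketch of how these sets behave in practice. The input strings and variable names are chosen only for illustration.

```python
import re

# [a-n] keeps only the lowercase letters from a to n.
a_to_n = re.findall("[a-n]", "zebra")

# [^amz] matches every character except a, m and z.
not_amz = re.findall("[^amz]", "maze")

# [0-5][0-7] needs a first digit in 0-5 and a second digit in 0-7,
# so "08" and "60" are not matched.
two_digits = re.findall("[0-5][0-7]", "57 08 17 60")

# Inside a set, + is a literal plus sign, not a quantifier.
plus_signs = re.findall("[+]", "1+1=2")

print(a_to_n, not_amz, two_digits, plus_signs)
```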

# findall() method
It returns a list containing all matches.
- The list contains the matches in the order they are found.
- If no matches are found, an empty list is returned.

In [18]:
new = re.findall("in|an","loving someone can also be painful.")
print(new)


['in', 'an', 'in']


In [19]:
print(re.findall("happy","Loving you is a losing game."))


[]


# search() method
- It searches the string for a match, and returns a match object if there is a match.
- If there is more than one match, only the first occurrence of the match will be returned.
- If no matches are found, the value None is returned.


In [26]:
new = re.search(r"\s","the king is back.")

if new:
    print("Yes!!!")
else:
    print(new)
    


Yes!!!


In [28]:
print(re.search(r"spiderman","The king is back."))


None


# Match Object
A match object is an object containing info about the search and the result.

__Note__ : If there is no match, the value None will be returned instead of the match object.

In [42]:
txt = "I want freedom, even if it means taking it away from someone else."

new = re.search("freedom",txt)
print(new)



<re.Match object; span=(7, 14), match='freedom'>


The match object has properties and methods used to retrieve information about the search and the matched result.

- .span() => returns a tuple containing the start and end positions of the match.
- .string => returns the string passed into the function.
- .group() => returns the part of the string where there was a match.

In [44]:
print("Position of matching:",new.span())
print("The string:",new.string)
print("The matched value:",new.group())


Position of matching: (7, 14)
The string: I want freedom, even if it means taking it away from someone else.
The matched value: freedom

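When the pattern contains a capturing group ( ), the match object can also return just the captured part. Below is a minimal sketch; the pattern and the variable names (match, whole, word) are assumptions for this example.

```python
import re

txt = "I want freedom, even if it means taking it away from someone else."

# The parentheses capture the word that follows "want ".
match = re.search(r"want (\w+)", txt)

# search() returns None when nothing matches, so check before using the object.
if match:
    whole = match.group()       # the entire matched text
    word = match.group(1)       # only the first captured group
    start, end = match.span()   # start and end positions of the whole match
    print(whole, "|", word, "|", start, end)
else:
    print("No match")
```

Calling .group() or .span() directly on the result of a failed search would raise an AttributeError, which is why the check comes first.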

# split() method
This returns a list where the string has been split at each match.

In [30]:
txt ="Eren wants freedom but the sad part is freedom is a lie."

new = re.split("freedom",txt)
print(new)


['Eren wants ', ' but the sad part is ', ' is a lie.']


We can control the number of splits by specifying the maxsplit parameter.


In [None]:
new = re.split("freedom",txt,maxsplit=1)   # split only at the first "freedom"
print(new)


['Eren wants ', ' but the sad part is freedom is a lie.']


# sub() method
This method replaces the matches with the text of our choice.


In [38]:
txt = "Eren: I hate you mikasa the most. this hate is real from the starting, i hate the person."
new = re.sub("hate","love",txt)
print(new)


Eren: I love you mikasa the most. this love is real from the starting, i love the person.


We can control the number of replacements by specifying the count parameter.

In [39]:
new = re.sub("hate","love",txt,count=2)   # replace only the first two matches
print(new)


Eren: I love you mikasa the most. this love is real from the starting, i hate the person.
