# Regular Expression

While programming, we often need to check whether a string contains a certain pattern, for example whether it contains a valid email address. Regular expressions, also called regex or regexp, are a general way to describe such patterns and match them against sequences of characters.

The topic of regular expressions is huge. In this article, we will learn the basics of regular expressions. We will learn how to use regular expressions in Python.

## Identifier

The table below lists the most basic identifiers:

<table>
    <tr>
     <td><strong>Identifier</strong>
     </td>
     <td><strong>Meaning</strong>
     </td>
    </tr>
    <tr>
     <td><strong>\d</strong>
     </td>
     <td>Matches any decimal digit; for ASCII text this is equivalent to the class [0-9].
     </td>
    </tr>
    <tr>
     <td><strong>\D</strong>
     </td>
     <td>Matches any non-digit character.
     </td>
    </tr>
    <tr>
     <td><strong>\s</strong>
     </td>
     <td>Matches any whitespace character.
     </td>
    </tr>
    <tr>
     <td><strong>\S</strong>
     </td>
     <td>Matches any non-whitespace character.
     </td>
    </tr>
    <tr>
     <td><strong>\w</strong>
     </td>
     <td>Matches any word character (letters, digits, and the underscore); for ASCII text this is equivalent to the class [a-zA-Z0-9_].
     </td>
    </tr>
    <tr>
     <td><strong>\W</strong>
     </td>
     <td>Matches any non-word character, i.e. anything that \w does not match.
     </td>
    </tr>
    <tr>
        <td><strong>.</strong>
        </td>
        <td>Matches any character except a newline.
        </td>
    </tr>
    <tr>
        <td><strong>\b</strong>
        </td>
        <td>Matches a word boundary: the position between a word character (\w) and a non-word character (\W) or the start or end of the string.
        </td>
    </tr>
    <tr>
        <td><strong>\.</strong>
        </td>
        <td>Matches a literal dot.
        </td>
    </tr>
</table>
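
To make the table more concrete, here is a minimal sketch that tries each identifier with *re.findall()* and *re.split()*. The sample strings and variable names are made up for illustration.

```python
import re

sample = "Room 42, floor 3.\nCall 408-555-1234 now"

# \d+ matches runs of one or more digits
digits = re.findall(r"\d+", sample)

# \w+ matches runs of word characters; the underscore counts as one
words = re.findall(r"\w+", "snake_case x1")

# \s+ matches runs of whitespace (spaces, tabs, newlines)
parts = re.split(r"\s+", "a  b\tc\nd")

# \b marks a word boundary, so "cat" inside "concat" is not matched
whole_word = re.findall(r"\bcat\b", "cat concat cat.")

# . matches any character except a newline
any_char = re.findall(r"a.c", "abc a\nc a-c")

# \. matches only a literal dot
literal_dot = re.findall(r"\d\.\d", "3.5 and 3x5")

print(digits, words, parts, whole_word, any_char, literal_dot)
```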


## Modifier

Modifiers extend the basic syntax. They anchor a pattern to a position, control how many times an element may repeat, or combine and group elements.

<table>
    <tr>
        <td><strong>Modifier</strong>
        </td>
        <td><strong>Meaning</strong>
        </td>
    </tr>
    <tr>
        <td><strong>^</strong>
        </td>
        <td>Matches the beginning of the line.
        </td>
    </tr>
    <tr>
        <td><strong>$</strong>
        </td>
        <td>Matches the end of the line.
        </td>
    </tr>
    <tr>
        <td><strong>*</strong>
        </td>
        <td>Zero or more repetitions (greedy: it matches as many as it can get)
        </td>
    </tr>
    <tr>
        <td><strong>?</strong>
        </td>
        <td>Zero or one repetition
        </td>
    </tr>
    <tr>
        <td><strong>+</strong>
        </td>
        <td>One or more repetitions
        </td>
    </tr>
    <tr>
        <td><strong>{}</strong>
        </td>
        <td>Repetition count or range, e.g. {3} or {2,4}
        </td>
    </tr>
    <tr>
        <td><strong>|</strong>
        </td>
        <td>Or
        </td>
    </tr>
    <tr>
        <td><strong>[]</strong>
        </td>
        <td>Set of characters
        </td>
    </tr>
    <tr>
        <td><strong>()</strong>
        </td>
        <td>Group
        </td>
    </tr>
</table>
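
The following sketch shows each modifier in a small pattern. The test strings and variable names are invented for illustration; only functions from the standard *re* module are used.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string
starts = bool(re.search(r"^The", "The end"))
ends = bool(re.search(r"end$", "The end"))

# * allows zero or more "b", + requires at least one
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# ? makes the preceding "u" optional
optional = re.findall(r"colou?r", "color colour colouur")

# {2,3} asks for two to three digits; \b keeps longer numbers out
counted = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")

# | matches either alternative
either = re.findall(r"cat|dog", "cat, dog, bird")

# [] matches any single character from the set
char_set = re.findall(r"[aeiou]", "regex")

# () captures groups; findall then returns one tuple per match
groups = re.findall(r"(\d+)-(\d+)", "10-20 30-40")

print(starts, ends, star, plus, optional, counted, either, char_set, groups)
```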

## Escape Characters

<table>
    <tr>
        <td><strong>Character</strong>
        </td>
        <td><strong>Meaning</strong>
        </td>
    </tr>
    <tr>
        <td><strong>\n</strong>
        </td>
        <td>Newline
        </td>
    </tr>
    <tr>
        <td><strong>\t</strong>
        </td>
        <td>Tab
        </td>
    </tr>
    <tr>
        <td><strong>\r</strong>
        </td>
        <td>Carriage return
        </td>
    </tr>
    <tr>
        <td><strong>\f</strong>
        </td>
        <td>Form feed
        </td>
    </tr>
    <tr>
        <td><strong>\v</strong>
        </td>
        <td>Vertical tab
        </td>
    </tr>
    <tr>
        <td><strong>\ooo</strong>
        </td>
        <td>Octal value
        </td>
    </tr>
    <tr>
        <td><strong>\xhh</strong>
        </td>
        <td>Hex value
        </td>
    </tr>
    <tr>
        <td><strong>\s</strong>
        </td>
        <td>Any whitespace character (a character class rather than a single escaped character)
        </td>
    </tr>
</table>
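
As a small illustration of these escapes inside a pattern, the sketch below splits a made-up record on tabs and newlines and matches the letter "A" by its hex and octal codes. The sample data is an assumption for the example.

```python
import re

line = "name\tage\nAda\t36"

# \t and \n in a pattern match a tab and a newline
fields = re.split(r"\t|\n", line)

# \x41 is the hex code of "A"; \101 is the same character in octal
hex_match = re.findall(r"\x41", "ABA")
octal_match = re.findall(r"\101", "ABA")

print(fields, hex_match, octal_match)
```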

## Applying Regular Expression in Python

### Finding a string

To use regular expressions in Python, we first need to import the *re* module:

In [29]:
import re

text = "The agent's phone number is 408-555-1234, names John"

Imagine that we want to find the agent's name in the example above. We can use the following code:

In [45]:
name = re.findall(r'names (\w+)', text)
print(name)

['John']


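As you can see, we use the *re.findall()* function to find all the matches. The first argument is the pattern we want to find, and the second argument is the string we want to search. Because the pattern contains one group in parentheses, the result lists only the text captured by that group, not the whole match.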
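
As a further sketch of how *re.findall()* treats groups, the same sample text can be searched for the phone number. The phone pattern here is an assumption that fits this one example, not a general phone number validator.

```python
import re

text = "The agent's phone number is 408-555-1234, names John"

# No group: findall returns the whole match
phone = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# Several groups: findall returns one tuple of captured parts per match
phone_parts = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)

# No match at all: findall returns an empty list
missing = re.findall(r"names (\w+)", "no name here")

print(phone, phone_parts, missing)
```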

We put an *r* before the opening quote to make the pattern a raw string. In a raw string Python leaves backslashes alone, so sequences such as *\w* reach the regular expression engine unchanged.
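
The effect of the *r* prefix is easiest to see with *\b*, which Python would otherwise read as a backspace character. This is a minimal sketch with made-up strings:

```python
import re

# Without r, "\b" is a single backspace character
plain_length = len("\b")
# With r, it stays as two characters: a backslash and "b"
raw_length = len(r"\b")

# The plain string looks for backspace characters and finds nothing
without_raw = re.findall("\bcat\b", "a cat")
# The raw string passes \b through as a word boundary
with_raw = re.findall(r"\bcat\b", "a cat")
# Doubling the backslashes works too, but is harder to read
doubled = re.findall("\\bcat\\b", "a cat")

print(plain_length, raw_length, without_raw, with_raw, doubled)
```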

### Matching Strings

We can also check whether an entire string matches a certain regular expression:

In [47]:
text = "bestmail@hotmail.com"

result = re.fullmatch(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$",
    text,
)

if result:
    print("Valid email")
else:
    print("Invalid email")

Valid email


We won't walk through the pattern above, because it is quite complicated. What matters here is the new function *re.fullmatch()*. It returns a match object if the whole string matches the regular expression and *None* otherwise, so the result can be used directly as the condition of the *if* statement.
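
A simpler pattern makes the return value of *re.fullmatch()* easier to inspect. The sketch below also compares it with *re.search()* and *re.match()*; the phone pattern and the test strings are assumptions chosen for illustration.

```python
import re

pattern = r"\d{3}-\d{3}-\d{4}"

# fullmatch succeeds only if the whole string fits the pattern
full = re.fullmatch(pattern, "408-555-1234")
full_text = full.group()  # the matched text

# Extra text around the number makes fullmatch return None
partial = re.fullmatch(pattern, "call 408-555-1234")

# search finds the pattern anywhere in the string
found = re.search(pattern, "call 408-555-1234").group()

# match only requires the pattern at the start of the string
at_start = re.match(pattern, "408-555-1234 now").group()

print(full_text, partial, found, at_start)
```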

### Manipulating Strings

Finally, we're going to take a look at how we can manipulate strings using regular expressions. We're going to use the *re.sub()* function for this.

In [55]:
text = "The agent's phone number is 408-555-1234, names John"
print(text)

The agent's phone number is 408-555-1234, names John


We'll try to replace "agent's" with "secret agent's". We can do this with the following code:

In [56]:
# Replace "agent's" with "secret agent's"

text = re.sub(r"agent's", "secret agent's", text)

print(text)

The secret agent's phone number is 408-555-1234, names John


In this example, we replace "agent's" with "secret agent's". The first argument is the pattern we want to find, the second argument is the replacement string, and the third argument is the string we want to search.
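
The replacement string can also refer back to groups in the pattern, and the optional *count* argument limits how many matches are replaced. Here is a minimal sketch; the masking format and the "banana" string are invented for the example.

```python
import re

text = "The agent's phone number is 408-555-1234, names John"

# \1 in the replacement inserts the text captured by the first group
masked = re.sub(r"\d{3}-\d{3}-(\d{4})", r"XXX-XXX-\1", text)

# count=1 replaces only the first match
first_only = re.sub(r"a", "A", "banana", count=1)

print(masked)
print(first_only)
```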