# Regular expressions  

Regular expressions are used to find text patterns.

They can help with:
- editing code
- verifying user input
- web scraping, etc.

## Raw strings in Python
A raw string in Python is a string literal with the prefix `r`. In a raw string, backslashes (`\`) are not treated as escape characters.  
For example, in a normal string `\t` is an escape sequence for a tab:

In [2]:
print('\tTab')

	Tab


We can see that `\t` is interpreted as an actual tab character.  
A raw string, on the other hand, prints every character exactly as written:

In [3]:
print(r'\tTab')

\tTab
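A quick way to see the difference is to compare string lengths. This is a minimal sketch; the variable names `normal` and `raw` are just for illustration.

```python
normal = '\tTab'    # '\t' is an escape sequence: a single tab character
raw = r'\tTab'      # backslash and 't' are kept as two separate characters

print(len(normal))  # 4: tab, 'T', 'a', 'b'
print(len(raw))     # 5: '\\', 't', 'T', 'a', 'b'
print(raw == '\\tTab')  # True: the same text written as an escaped normal string
```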


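Raw strings are commonly used to write regular expression patterns, because patterns contain many backslashes (for example `\d` or `\.`) that must reach the `re` module unchanged.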
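A small sketch of why this matters: `r"\d+"` and `"\\d+"` are the same pattern text, but the raw form is easier to read, especially when the pattern has to match a literal backslash. The sample strings below are made up for illustration.

```python
import re

text = "Order 66 shipped in 2024"

# Raw string and escaped normal string produce identical pattern text: '\', 'd', '+'
raw_pattern = r"\d+"
escaped_pattern = "\\d+"
print(raw_pattern == escaped_pattern)  # True

# Both find every run of digits
print(re.findall(raw_pattern, text))   # ['66', '2024']

# Matching a literal backslash: r"\\" here, versus "\\\\" without a raw string
path = r"C:\Users\docs"
print(len(re.findall(r"\\", path)))    # 2
```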

# Verifying a user's email

One of the most popular uses of regular expressions is email validation.  
Email addresses follow a specific structure (a name, the `@` symbol, a domain), so they can be checked with a regular expression.

In [4]:
import re

Let's build an email pattern step by step.  
The first part of an email contains uppercase letters, lowercase letters, or digits:

In [5]:
pattern = r"[a-zA-Z0-9]"

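Square brackets `[]` define a character class: it matches exactly one character from the given set. Here `a-z`, `A-Z` and `0-9` are ranges of lowercase letters, uppercase letters and digits. The first part of the email runs from the beginning of the string up to the `@` symbol.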
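To see that a character class matches only one character at a time, here is a short sketch with made-up input strings:

```python
import re

pattern = r"[a-zA-Z0-9]"

# Each match is a single character from the set; '_' and '@' are not in it
print(re.findall(pattern, "jo_1@x"))  # ['j', 'o', '1', 'x']

# re.match() only looks at the beginning of the string
print(re.match(pattern, "john@mail.com").group())  # prints: j
```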

In [6]:
pattern = r"[a-zA-Z0-9]+@"

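`+` applies to the character class before it: `[a-zA-Z0-9]+` means "one or more letters or digits". The `@` that follows is matched literally, so the pattern matches the whole first part of the email together with the `@` symbol.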
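A minimal check of the pattern so far, using hypothetical addresses. Note that this simplified version rejects a dot before the `@`:

```python
import re

pattern = r"[a-zA-Z0-9]+@"

# '+' repeats the character class; the '@' is then matched literally
m = re.match(pattern, "john123@mail.com")
print(m.group())  # prints: john123@

# '.' is not in the class, so the run stops at "john" and no '@' follows
print(re.match(pattern, "john.doe@mail.com"))  # prints: None
```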

In [7]:
pattern = r"[a-zA-Z0-9]+@[a-zA-Z]"

Since emails can use many different domains, after the `@` we look for one or more letters, up to the `.` symbol:

In [8]:
pattern = r"[a-zA-Z0-9]+@[a-zA-Z]+\."

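`\.` (a backslash followed by a dot) matches a literal dot. Without the backslash, `.` is a special character that matches any character except a newline.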
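The sketch below shows the difference between `.` and `\.`, and then finishes the step-by-step pattern with a top-level domain. Allowing only letters after the dot is a simplifying assumption for this example.

```python
import re

# Unescaped '.' matches any character (except newline)
print(bool(re.fullmatch(r"mail.com", "mailXcom")))   # True
# Escaped '\.' matches only a literal dot
print(bool(re.fullmatch(r"mail\.com", "mailXcom")))  # False
print(bool(re.fullmatch(r"mail\.com", "mail.com")))  # True

# Step-by-step pattern plus a letters-only top-level domain (a simplification)
pattern = r"[a-zA-Z0-9]+@[a-zA-Z]+\.[a-zA-Z]+"
print(bool(re.fullmatch(pattern, "john123@mail.com")))  # True
print(bool(re.fullmatch(pattern, "john123@mail")))      # False
```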

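Let's create a Python function that receives a list of strings and returns a list of valid emails. Its pattern extends the steps above: the first part may also contain `_`, `.`, `+` and `-`, the part after the dot may contain further dots (e.g. `co.uk`), and `^`/`$` anchor the pattern to the start and end of the string: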

In [4]:
import re
from typing import List

def valid_emails(strings: List[str]) -> List[str]:
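    """Take a list of potential emails and return only the valid ones."""

    # first part: letters, digits, '_', '.', '+', '-'; then '@';
    # domain name: letters, digits, '-'; then a literal '.';
    # the rest (e.g. "com" or "co.uk"): letters, digits, '-', '.'
    valid_email_regex = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

    def is_valid_email(email: str) -> bool:
        # fullmatch() requires the whole string to match the pattern
        return bool(re.fullmatch(valid_email_regex, email))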

    emails = []
    for email in strings:
        if is_valid_email(email):
            emails.append(email)

    return emails
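To try the function, here is a self-contained copy of it with a few hypothetical test strings, each annotated with why it passes or fails:

```python
import re
from typing import List


def valid_emails(strings: List[str]) -> List[str]:
    """Take a list of potential emails and return only the valid ones."""
    valid_email_regex = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

    emails = []
    for email in strings:
        if re.fullmatch(valid_email_regex, email):
            emails.append(email)
    return emails


candidates = [
    "john.doe+news@mail.com",  # '.' and '+' are allowed before '@'
    "jane@sub.example.org",    # dots are allowed after the first '.'
    "no-at-sign.com",          # no '@'
    "bad@domain",              # no '.' after the domain name
    "two@@mail.com",           # a second '@' is not allowed
]
print(valid_emails(candidates))
# ['john.doe+news@mail.com', 'jane@sub.example.org']
```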



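With module-level functions such as `re.search()`, the pattern is passed as a string on every call, so the `re` module has to look it up (and compile it the first time it is seen) each time. Python keeps a cache of recently compiled patterns, so this overhead is small, but it still adds up inside long loops: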

In [5]:
different_emails = [
    'example1@mail.com',
    'example2@mail.com',
]
for one_email in different_emails:
    m = re.search(r'^\S+@\S+\.\S+$', one_email)
    if m:
        print(one_email)

example1@mail.com
example2@mail.com


How can we use regular expressions more efficiently?  
We can compile the pattern once with `re.compile()` and then use the compiled pattern object in each iteration:

In [9]:
different_emails = [
    'example1@mail.com',
    'example2@mail.com',
]

regular_expression_object = re.compile(r'^\S+@\S+\.\S+$')

for one_email in different_emails:
    m = regular_expression_object.search(one_email)
    if m:
        print(one_email)

example1@mail.com
example2@mail.com
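To measure the effect on your own machine, a rough sketch with `timeit` can compare both approaches. The list size and repeat count are arbitrary choices, and the timings depend on the machine and Python version; because of the `re` cache, the difference may be small.

```python
import re
import timeit

emails = ["example1@mail.com", "example2@mail.com"] * 1000
pattern_text = r"^\S+@\S+\.\S+$"
compiled = re.compile(pattern_text)


def with_module_function():
    # re.search() looks the pattern up in re's internal cache on every call
    return [e for e in emails if re.search(pattern_text, e)]


def with_compiled_pattern():
    # the compiled Pattern object is reused directly
    return [e for e in emails if compiled.search(e)]


# both approaches find the same matches
assert with_module_function() == with_compiled_pattern()

t_module = timeit.timeit(with_module_function, number=20)
t_compiled = timeit.timeit(with_compiled_pattern, number=20)
print(f"re.search:       {t_module:.4f} s")
print(f"compiled.search: {t_compiled:.4f} s")
```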


Another adjustment is to use a list comprehension instead of a `for` loop. This makes the code shorter and collects the matching emails into a list rather than printing them one by one:

In [10]:
[email for email in different_emails if regular_expression_object.search(email)]

['example1@mail.com', 'example2@mail.com']

As a result, we get the following function:

In [None]:
# valid email function:
import re
from typing import List


def valid_emails(strings: List[str]) -> List[str]:
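    """Take a list of potential emails and return only the valid ones."""

    # compile the pattern once and reuse it for every string in the list
    valid_email_regex = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

    # fullmatch() checks the whole string; search() with '$' would also accept a trailing '\n'
    return [email for email in strings if valid_email_regex.fullmatch(email)]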
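A subtle point when anchoring a pattern with `^` and `$`: in Python, `$` also matches just before a newline at the end of the string. The sketch below, with a made-up input, shows that `search()` therefore accepts an email followed by `\n`, while `fullmatch()` rejects it.

```python
import re

pattern = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

tricky = "example1@mail.com\n"

# '$' matches before the trailing newline, so search() finds a match
print(bool(pattern.search(tricky)))     # True
# fullmatch() must consume the whole string, including '\n', so it fails
print(bool(pattern.fullmatch(tricky)))  # False
```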