# REGULAR EXPRESSIONS
Regular expressions are a powerful language for matching text patterns. This page gives a basic introduction
to regular expressions themselves sufficient for our Python exercises and shows how regular
expressions work in Python. The Python "re" module provides regular expression support.

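`re.search(pattern, string, flags=0)` scans the string for the first place where the pattern matches and returns a match object, or `None` if there is no match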

`.` any character except a newline
`*` 0 or more repetitions
`+` 1 or more repetitions
`?` 0 or 1 repetition
`{m}` m repetitions
`{m,n}` from m to n repetitions (written without a space after the comma)
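
As a quick check of the quantifiers listed above, here is a minimal sketch. The helper `matches_fully` is a hypothetical name introduced only for this illustration; it relies on `re.fullmatch`, which succeeds only when the whole string matches the pattern.

```python
import re

def matches_fully(pattern, text):
    """Hypothetical helper: True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches_fully(r"ab*", "a"))         # True: "*" allows zero "b"s
print(matches_fully(r"ab+", "a"))         # False: "+" needs at least one "b"
print(matches_fully(r"ab?", "abb"))       # False: "?" allows at most one "b"
print(matches_fully(r"a.c", "a-c"))       # True: "." matches any one character
print(matches_fully(r"a{3}", "aaa"))      # True: exactly three "a"s
print(matches_fully(r"a{2,4}", "aaaaa"))  # False: five "a"s is more than four
```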

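`r` prefix on a string literal (`r"..."`) tells Python to treat it as a raw string, so backslashes reach the regular expression unchanged
`\` the escape character (backslash) either makes a special character literal (e.g. `\.` matches a dot) or starts a special sequence such as `\d`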
`^` matches the start of the string
`$` matches the end of the string or just before the newline at the end of the string
`[]` set of characters
`[^]` complements the set, e.g. `[^a]` accepts every character except the `a` character
`[a-zA-Z0-9_]` accepts the letters `a` to `z`, the capitals `A` to `Z`, the digits `0` to `9`,
     and the underscore `_`
`\d` decimal digit
`\D` not a decimal digit
`\s` whitespace character
`\S` not a whitespace character
`\w` word character: a letter, a digit, or the underscore
`\W` not a word character
`A|B` either A or B
`(...)` a capturing group, e.g. `(com|edu|gov)`
`(?:...)` non-capturing version of a group
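
The symbols above can be tried out with a short sketch like the one below. The sample strings and variable names are made up for illustration only.

```python
import re

text = "Order 66 shipped to unit_7 on 2024-05-01"

# \d+ finds runs of digits; \w+ finds runs of letters, digits and underscores
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
print(digits)  # ['66', '7', '2024', '05', '01']
print(words)   # ['Order', '66', 'shipped', 'to', 'unit_7', 'on', '2024', '05', '01']

# A set matches one of the listed characters; [^...] matches anything else
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
print(vowels, non_vowels)  # ['e', 'e'] ['r', 'g', 'x']

# ^ and $ anchor the pattern to the start and end of the string
starts_with_order = re.search(r"^Order", text) is not None
ends_with_01 = re.search(r"01$", text) is not None
print(starts_with_order, ends_with_01)  # True True

# (...) captures what it matched; (?:...) groups without capturing
date_parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()
action = re.search(r"(?:cat|dog)s? (\w+)", "dogs bark").groups()
print(date_parts)  # ('2024', '05', '01')
print(action)      # ('bark',)
```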

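 `=> flags`
`re.IGNORECASE` matches letters regardless of case
`re.MULTILINE` makes `^` and `$` match at the start and end of every line
`re.DOTALL` makes `.` match a newline as well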
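
A small sketch of what each flag changes, using a made-up two-line string. Flags can be combined with `|`.

```python
import re

text = "first line\nSecond line"

# IGNORECASE: "second" also matches "Second"
plain = re.search(r"second", text)
ignoring_case = re.search(r"second", text, re.IGNORECASE)
print(plain, ignoring_case.group())  # None Second

# MULTILINE: ^ matches at the start of every line, not only of the string
first_words = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, re.MULTILINE)
print(first_words, first_words_multiline)  # ['first'] ['first', 'Second']

# DOTALL: "." also matches the newline between the two lines
without_dotall = re.search(r"line.Second", text)
with_dotall = re.search(r"line.Second", text, re.DOTALL)
print(without_dotall, with_dotall is not None)  # None True

# Several flags at once
combined = re.findall(r"^second", text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['Second']
```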

NB. instead of `r""`, we can use `rf""` so as to include variables in the pattern; wrap a variable in `re.escape()` if it should match literally, and double any literal braces (`{{2,3}}`)
`re.sub(pattern, repl, string, count=0, flags=0)` replaces each match of `pattern` in `string` with `repl`; `count` limits the number of replacements (0 means all)
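
Here is a minimal sketch of `re.sub`, including one case where the pattern is built from a variable. The strings and variable names are made up for illustration; `re.escape` is used on the assumption that the variable should match literally.

```python
import re

text = "cat, Cat and CAT"

# Replace every match, ignoring case
all_replaced = re.sub(r"cat", "dog", text, flags=re.IGNORECASE)
print(all_replaced)  # dog, dog and dog

# count=1 stops after the first replacement
first_only = re.sub(r"cat", "dog", text, count=1, flags=re.IGNORECASE)
print(first_only)  # dog, Cat and CAT

# \1 and \2 in the replacement refer back to the captured groups
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Owusu, Elijah")
print(swapped)  # Elijah Owusu

# Pattern built from a variable: rf"" keeps backslashes raw, and
# re.escape() makes the "." in the variable match only a literal dot
word = "a.b"
pattern = rf"\b{re.escape(word)}\b"
from_variable = re.sub(pattern, "X", "a.b axb")
print(from_variable)  # X axb
```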

`re.split(pattern, string, maxsplit=0, flags=0)` splits a string on every match of a pattern, so the separator can be any of several characters (e.g. `.`, `,` or `?`) rather than one fixed character
`re.findall(pattern, string, flags=0)` returns a list of all non-overlapping matches of the pattern
in the string
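
The two functions above can be sketched as follows, with made-up sample strings. Note that when the pattern contains groups, `re.findall` returns the groups rather than the whole match.

```python
import re

sentence = "one, two;three  four"

# Split on any run of commas, semicolons or whitespace
parts = re.split(r"[,;\s]+", sentence)
print(parts)  # ['one', 'two', 'three', 'four']

# maxsplit=1 splits only at the first separator
head_and_rest = re.split(r"[,;\s]+", sentence, maxsplit=1)
print(head_and_rest)  # ['one', 'two;three  four']

basket = "3 apples, 12 pears, 7 plums"

# Without groups, findall returns the matched text
numbers = re.findall(r"\d+", basket)
print(numbers)  # ['3', '12', '7']

# With groups, findall returns one tuple of groups per match
pairs = re.findall(r"(\d+) (\w+)", basket)
print(pairs)  # [('3', 'apples'), ('12', 'pears'), ('7', 'plums')]
```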


In [2]:
import re

def validateEmail():
    email = input("What's your email? ").strip()
    # Earlier attempt: r"^[a-zA-Z0-9_]+@[a-zA-Z0-9_]+\.com$"
    #
    # ^ means the start of our string and $ means the end
    # + means 1 or more of the thing to its left
    # \w means the same thing as [a-zA-Z0-9_] for ASCII text
    # (\w|\.)+ accepts word characters and literal dots before the @
    # (\w+\.)? accepts an optional subdomain: one or more word characters
    #   followed by a literal dot, or nothing at all
    # (com|edu|gov) accepts any one of the listed endings; more can be
    #   added inside the parentheses, e.g. (com|edu|gov|net)
    # [^@] (not used here) means any character except the @ sign
    if re.search(r"^(\w|\.)+@(\w+\.)?\w+\.(com|edu|gov)$", email, re.IGNORECASE):

        print("Valid")
    else:
        print("invalid")

validateEmail()

Valid


In [3]:
import re


text = 'an example word:cat!!'  # named text so the built-in str is not shadowed
match = re.search(r'word:\w\w\w\b', text)
# If-statement after search() tests if it succeeded
if match:
  print('found', match.group()) ## 'found word:cat'
else:
  print('did not find')


found word:cat



The `r` at the start of the pattern string designates a Python `raw` string,
which passes backslashes through unchanged. This is very handy for
regular expressions (Java needs this feature badly!).
I recommend that you always write pattern strings with the `r` prefix, just as a habit.
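
To see why the raw prefix matters, here is a small sketch comparing ordinary and raw string literals. The variable names are made up for illustration.

```python
import re

# In an ordinary string "\n" is one newline character;
# in a raw string it stays as a backslash followed by "n"
normal_len = len("\n")
raw_len = len(r"\n")
print(normal_len, raw_len)  # 1 2

# Without the r prefix every backslash has to be doubled
same_pattern = "\\d+" == r"\d+"
print(same_pattern)  # True

# "\b" in an ordinary string is a backspace character, so the
# pattern never gets to see the word-boundary sequence \b
without_raw = re.findall("\bcat\b", "cat concat")
with_raw = re.findall(r"\bcat\b", "cat concat")
print(without_raw, with_raw)  # [] ['cat']
```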



In [4]:
import re

def formatName():
    name = input("What is your name? ").strip()

    # The walrus operator (:=) assigns a value and tests it in one expression:
    #     elia = "No!"
    #     if elia:
    #         print("I am Fine")
    # OR
    #     if elia := "Yes":
    #         print("I am Good")

    # Accept "Last, First" and turn it into "First Last"
    if matches := re.search(r"^(.+), ?(.+)$", name):
        # Capture groups are numbered from 1, not 0;
        # group(0) is the whole match
        name = matches.group(2) + " " + matches.group(1)
    print(f"hello, {name}")

formatName()

hello, Elijah Owusu
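
The group numbering used in the cell above can be checked without typing any input. This is a minimal sketch: `swap_name` is a hypothetical helper that takes the name as a parameter instead of calling `input()`.

```python
import re

def swap_name(name):
    """Hypothetical helper: turn "Last, First" into "First Last"."""
    if matches := re.search(r"^(.+), ?(.+)$", name):
        return f"{matches.group(2)} {matches.group(1)}"
    return name  # no comma found, so the name is left as it is

m = re.search(r"^(.+), ?(.+)$", "Owusu, Elijah")
print(m.group(0))  # Owusu, Elijah  (the whole match)
print(m.group(1))  # Owusu
print(m.group(2))  # Elijah
print(m.groups())  # ('Owusu', 'Elijah')

print(swap_name("Owusu, Elijah"))  # Elijah Owusu
print(swap_name("Elijah Owusu"))   # Elijah Owusu
```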


In [4]:
import re

# re.sub(pattern, repl, string, count=0, flags=0)

def extractUsername():
    url = input("URL: ").strip()
    # Plain string methods: replace() removes the prefix wherever it occurs,
    # removeprefix() (Python 3.9+) removes it only at the start
    username = url.replace("https://twitter.com/", "")
    username1 = url.removeprefix("https://twitter.com/")
    # print(f"Username: {username}")

    # re.sub() also copes with http://, a missing protocol and an optional www.
    username2 = re.sub(r"^(https?://)?(www\.)?twitter\.com/", "", url)
    # print(f"Username: {username2}")

    if matches := re.search(r"^https?://(?:www\.)?twitter\.com/([a-z0-9_]+)", url, re.IGNORECASE):
        # (?:www\.) groups the optional "www." without capturing it,
        # so the username is group(1)
        print(f"Username: {matches.group(1)}")
    else:
        print("I specifically need your Twitter address!")

extractUsername()

Username: elia
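
For trying the pattern on several URLs at once, one option is a variant that takes the URL as a parameter and returns the result. This is only a sketch: `extract_username` is a hypothetical name, and the sample URLs are made up.

```python
import re

def extract_username(url):
    """Hypothetical helper: return the Twitter username in url, or None."""
    pattern = r"^https?://(?:www\.)?twitter\.com/([a-z0-9_]+)"
    if matches := re.search(pattern, url.strip(), re.IGNORECASE):
        return matches.group(1)
    return None

print(extract_username("https://twitter.com/elia"))        # elia
print(extract_username("http://www.twitter.com/Elia_1"))   # Elia_1
print(extract_username("https://example.com/elia"))        # None
# The escaped dot in twitter\.com rejects look-alike hosts
print(extract_username("https://twitterXcom/elia"))        # None
```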
