- Regular Expressions
- Case Sensitivity
- Cleaning Up User Input
- Extracting User Input
- Summing Up


# Regular Expressions
Regular expressions, or "regexes", let our code examine patterns within text.
For example, we might want to validate that an email address is formatted correctly.
The cells below start with plain string methods and then move to regular expressions for the same check.

In [None]:
email = input("What's your email?").strip()

if "@" in email:
    print("Valid")
else:
    print("Invalid")

In [None]:
email = input().strip()

if "@" in email and "." in email:
    print("Valid")
else:
    print("Invalid")

In [None]:
# Improved: split the address into a username and a domain

email = input().strip()

username, domain = email.split("@")

if username and "." in domain:
    print("Valid")
else:
    print("Invalid")

In [None]:
email = input("What's your email?").strip()

username, domain = email.split("@")

if username and domain.endswith(".edu"):
    print("Valid")
else:
    print("Invalid")
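
One caveat with the `split`-based cells above: unpacking `email.split("@")` into two names raises `ValueError` when the input has no `@` or more than one. The following is a minimal sketch of a guarded variant; the function name `is_valid_edu` is hypothetical and used only for illustration.

```python
def is_valid_edu(email):
    """Return True if email looks like username@something.edu (hypothetical helper)."""
    parts = email.strip().split("@")
    # Exactly one "@" yields exactly two parts; anything else is rejected
    # here instead of raising ValueError during unpacking.
    if len(parts) != 2:
        return False
    username, domain = parts
    return bool(username) and domain.endswith(".edu")


print(is_valid_edu("malan@harvard.edu"))   # True
print(is_valid_edu("malan"))               # False
print(is_valid_edu("malan@@harvard.edu"))  # False
```

This is still a loose check (for instance, `malan@.edu` passes), which is part of the motivation for the regular expressions that follow.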

In [None]:
# Use the re library

import re

email = input("What's your email?").strip()

if re.search("@", email):
    print("Valid")
else:
    print("Invalid")

## Vocabulary
`.` any character except a newline
`*` 0 or more repetitions
`+` 1 or more repetitions
`?` 0 or 1 repetition
`{m}` m repetitions
`{m,n}` m-n repetitions
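
To make the vocabulary above concrete, here is a small sketch that tries each symbol on a short string. The patterns and test strings are made up for illustration; `re.fullmatch` is used so that a pattern succeeds only when it accounts for the whole string.

```python
import re

# Each entry pairs a pattern with a test string.
examples = [
    (r"a.c", "abc"),        # . matches any single character
    (r"ab*", "a"),          # * allows zero b's
    (r"ab+", "a"),          # + needs at least one b, so this fails
    (r"colou?r", "color"),  # ? makes the u optional
    (r"a{3}", "aaa"),       # exactly three a's
    (r"a{2,3}", "aaaa"),    # two to three a's, so four fails
]

results = [bool(re.fullmatch(pattern, text)) for pattern, text in examples]
for (pattern, text), matched in zip(examples, results):
    print(f"{pattern!r} vs {text!r}: {matched}")
```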

In [None]:
import re

email = input().strip()

if re.search(".+@.+", email):
    print("Valid")
else:
    print("Invalid")

In [None]:
import re

email = input("What's your email?").strip()

if re.search(".+@.+.edu", email):
    print("Valid")
else:
    print("Invalid")

In [ ]:
import re

email = input("What's your email?").strip()

if re.search(".+@.+\.edu", email):  # escape the dot so it matches a literal "."
    print("Valid")
else:
    print("Invalid")
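
The difference between the last two cells can be seen with an address that has the wrong character where the dot should be. This is a small sketch with a made-up address; the patterns are written with an `r` prefix so that Python leaves the backslash alone.

```python
import re

address = "malan@harvard?edu"  # note the "?" where a "." should be

# Unescaped: the "." before "edu" matches any character, including "?".
loose = re.search(r".+@.+.edu", address)
# Escaped: "\." matches only a literal dot, so this address is rejected.
strict = re.search(r".+@.+\.edu", address)

print(bool(loose))   # True
print(bool(strict))  # False
```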


## Raw String

In [ ]:
import re

email = input("What's your email?").strip()

if re.search(r"^.+@.+\.edu", email):
    print("Valid")
else:
    print("Invalid")
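
As a brief illustration of what the `r` prefix changes: in a regular string literal Python interprets backslash escapes itself, while in a raw string the backslash is kept as an ordinary character and passed through to `re` untouched.

```python
plain = "\n"  # one character: a newline
raw = r"\n"   # two characters: a backslash followed by the letter n

print(len(plain))    # 1
print(len(raw))      # 2
print(raw == "\\n")  # True: the raw string equals an explicitly escaped backslash
```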


# Matching the start and end of the string
^ matches the start of the string
$ matches the end of the string or just before the newline at the end of the string
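
A short sketch of what the two anchors change, using a made-up sentence that merely contains an email address:

```python
import re

sentence = "My email is malan@harvard.edu, I think"

# Without anchors, the pattern may match anywhere inside the string.
unanchored = re.search(r".+@.+\.edu", sentence)
# With ^ and $, the pattern must account for the whole string.
anchored = re.search(r"^.+@.+\.edu$", sentence)
# $ also matches just before a single newline at the end of the string.
trailing_newline = re.search(r"^.+@.+\.edu$", "malan@harvard.edu\n")

print(bool(unanchored))        # True
print(bool(anchored))          # False
print(bool(trailing_newline))  # True
```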

In [None]:
import re

email = input().strip()

if re.search(r"^.+@.+\.edu$", email):
    print("Valid")
else:
    print("Invalid")

In [None]:
import re

email = input("What's your email? ").strip()

if re.search(r"^.+@.+\.edu$", email):
    print("Valid")
else:
    print("Invalid")

## Sets and Complementary Sets
[] set of characters
[^] complementing the set
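
As a rough illustration of the two forms, with made-up strings: `[abc]` matches any one of the listed characters, while `[^abc]` matches any one character that is not listed. The same idea is what lets `[^@]` rule out stray `@` signs.

```python
import re

in_set = re.fullmatch(r"[abc]+", "cab")       # every character is a, b, or c
not_in_set = re.fullmatch(r"[^abc]+", "cab")  # fails: these characters are excluded

pattern = r"^[^@]+@[^@]+\.edu$"
single_at = re.search(pattern, "malan@harvard.edu")
many_ats = re.search(pattern, "malan@@@harvard.edu")

print(bool(in_set), bool(not_in_set))   # True False
print(bool(single_at), bool(many_ats))  # True False
```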

In [None]:
import re

email = input().strip()

if re.search(r"^[^@]+@[^@]+\.edu$", email):
    print("Valid")
else:
    print("Invalid")

In [None]:
import re

email = input().strip()

if re.search(r"^[a-zA-Z0-9_]+@[a-zA-Z0-9_]+\.edu$", email):  # `_` is just a literal underscore
    print("Valid")
else:
    print("Invalid")

In [None]:
import re

email = input().strip()

if re.search(r"^\w+@\w+\.edu$", email):  # for ASCII text, `\w` is shorthand for `[a-zA-Z0-9_]`
    print("Valid")
else:
    print("Invalid")

# Additional patterns 
\d decimal digit
\D not a decimal digit
\s whitespace character
\S not a whitespace character
\w word character, as well as numbers and the underscore
\W not a word character
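
These shorthand classes can be tried out on a sample sentence. The sketch below uses `re.findall`, which returns every non-overlapping match as a list; the sentence itself is made up for illustration.

```python
import re

text = "CS50 meets at 1:30 pm"

digits = re.findall(r"\d+", text)    # runs of decimal digits
words = re.findall(r"\w+", text)     # runs of word characters
spaces = re.findall(r"\s", text)     # individual whitespace characters
non_words = re.findall(r"\W", text)  # everything that is not a word character

print(digits)     # ['50', '1', '30']
print(words)      # ['CS50', 'meets', 'at', '1', '30', 'pm']
print(non_words)  # [' ', ' ', ' ', ':', ' ']
```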

In [ ]:
import re

email = input().strip()

if re.search(r"^\w+@\w+\.(com|edu|gov|net|org)$", email):  # `|` means "or"
    print("Valid")
else:
    print("Invalid")

## Adding even more symbols to our vocabulary
A|B either A or B
(...) a group
(?:...) non-capturing version
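
The difference between a capturing group and its non-capturing version shows up in what the match object remembers. A minimal sketch, with a sample address chosen for illustration:

```python
import re

address = "malan@harvard.edu"

# (com|edu) both groups the alternatives and captures whichever one matched.
capturing = re.search(r"^(\w+)@(\w+)\.(com|edu)$", address)
# (?:com|edu) groups the alternatives without capturing anything.
non_capturing = re.search(r"^(\w+)@(\w+)\.(?:com|edu)$", address)

print(capturing.groups())      # ('malan', 'harvard', 'edu')
print(non_capturing.groups())  # ('malan', 'harvard')
```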

## Case Sensitivity
To illustrate how you might address issues around case sensitivity, where there is a difference between `EDU` and `edu`, consider the following code, which accepts only the lowercase form:

In [None]:
import re

email = input().strip()

if re.search(r"^\w+@\w+\.edu$", email):
    print("Valid")
else:
    print("Invalid")

## Some built-in flag variables are:
re.IGNORECASE
re.MULTILINE
re.DOTALL
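
A compact sketch of what each of these three flags changes, using made-up strings:

```python
import re

# re.IGNORECASE: letters match regardless of case.
ignorecase = re.search(r"\.edu$", "MALAN@HARVARD.EDU", re.IGNORECASE)

# re.MULTILINE: ^ and $ also match at the start and end of each line.
lines = "first\nsecond"
without_multiline = re.findall(r"^\w+$", lines)
with_multiline = re.findall(r"^\w+$", lines, re.MULTILINE)

# re.DOTALL: . also matches a newline.
without_dotall = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", re.DOTALL)

print(bool(ignorecase))                          # True
print(without_multiline, with_multiline)         # [] ['first', 'second']
print(bool(without_dotall), bool(with_dotall))   # False True
```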

In [None]:
import re

email = input().strip()

if re.search(r"^\w+@\w+\.edu$", email, re.IGNORECASE):
    print("Valid")
else:
    print("Invalid")

In [None]:
# Handle addresses with a subdomain, such as `malan@cs50.harvard.edu`

import re

email = input().strip()

if re.search(r"^\w+@(\w+\.)?\w+\.edu$", email, re.IGNORECASE):
    print("Valid")
else:
    print("Invalid")
# Notice that `(\w+\.)?` tells the regex engine that this group can appear once or not at all.
# Hence, both `malan@cs50.harvard.edu` and `malan@harvard.edu` are considered valid.
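
The optional group can be checked against a few sample addresses. The sketch below also tries one possible variation, not taken from the cell above, that swaps `?` for `*` so that any number of subdomains is accepted; the third address is made up to show the difference.

```python
import re

addresses = ["malan@harvard.edu", "malan@cs50.harvard.edu", "malan@a.cs50.harvard.edu"]

one_optional = r"^\w+@(\w+\.)?\w+\.edu$"  # pattern from the cell above
any_number = r"^\w+@(\w+\.)*\w+\.edu$"    # variation: * instead of ?

optional_results = [bool(re.search(one_optional, a, re.IGNORECASE)) for a in addresses]
star_results = [bool(re.search(any_number, a, re.IGNORECASE)) for a in addresses]

print(optional_results)  # [True, True, False]
print(star_results)      # [True, True, True]
```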

In [ ]:
# Assumes `email` was read in an earlier cell
re.search(r"^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$", email)

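Interestingly, the full expression that one would have to type to check that an email address is valid is the one used in the cell above:
``^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$``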

## Cleaning Up User Input
You should never expect your users to provide clean input. Indeed, users will often violate your intentions as a programmer.

In [None]:
name = input("What's your name?").strip()
print(f"hello, {name}")

In [None]:
name = input().strip()
if "," in name:
    last, first = name.split(",")
    name = f"{first} {last}"
print(f"hello, {name}")

In [None]:
import re

name = input().strip()
matches = re.search(r"^(.+),(.+)$", name)
if matches:
    last, first = matches.groups()
    name = first + " " + last
print(f"hello, {name}")

In [None]:
import re

name = input().strip()
matches = re.search(r"^(.+), (.+)$", name)
print(matches)
if matches:
    name = matches.group(2) + " " + matches.group(1)
print(f"hello, {name}")

In [None]:
import re

name = input().strip()
matches = re.search(r"^(.+), *(.+)$", name)
if matches:
    name = matches.group(2) + " " + matches.group(1)
print(f"hello, {name}")

In [None]:
import re

name = input("What's your name?").strip()
if matches := re.search(r"^(.+), *(.+)$", name):
    name = matches.group(2) + " " + matches.group(1)
print(f"hello, {name}")
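
The same logic can be wrapped in a function so that it is easy to try on several inputs without typing them in. This is only a sketch: the name `format_name` is hypothetical, and the walrus operator `:=` it uses requires Python 3.8 or later.

```python
import re


def format_name(name):
    """Turn "Last, First" into "First Last"; leave other input unchanged (hypothetical helper)."""
    name = name.strip()
    # ", *" accepts a comma followed by zero or more spaces.
    if matches := re.search(r"^(.+), *(.+)$", name):
        name = matches.group(2) + " " + matches.group(1)
    return name


print(format_name("Malan, David"))    # David Malan
print(format_name("Malan,David"))     # David Malan
print(format_name("Malan,   David"))  # David Malan
print(format_name("David Malan"))     # David Malan
```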

# Extracting User Input
Now, let's extract some specific information from user input.

In [ ]:
# twitter.py

url = input("URL: ").strip()
print(url)

In [17]:
url = input("URL:").strip()

username = url.replace("https://twitter.com/","")

print(f"Username: {username}")

Username: davidjmalan


In [19]:
url = input("URL:").strip()

username = url.removeprefix("https://twitter.com/")

print(f"Username: {username}")


Username: tps://twitter.com/davidjmalan
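
The two string methods used so far behave differently on input that does not have the expected shape. `str.replace` removes the text wherever it occurs, while `str.removeprefix` (available from Python 3.9) removes it only when the string starts with exactly that text and otherwise returns the string unchanged. A small sketch with made-up inputs:

```python
prefix = "https://twitter.com/"

# The prefix is present, so it is stripped.
exact = "https://twitter.com/davidjmalan".removeprefix(prefix)
# The string starts with "http://", not "https://", so nothing is removed.
mismatch = "http://twitter.com/davidjmalan".removeprefix(prefix)
# replace() removes the text even when it is not at the start.
replaced = "My URL is https://twitter.com/davidjmalan".replace(prefix, "")

print(exact)     # davidjmalan
print(mismatch)  # http://twitter.com/davidjmalan
print(replaced)  # My URL is davidjmalan
```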


In [21]:
import re

url = input().strip()

username = re.sub(r"https://twitter.com/", "", url)
print(f"Username: {username}")

Username: davidjmalan


In [ ]:
import re

url = input().strip()

username = re.sub(r"^(https?://)?(www\.)?twitter\.com/", "", url)
print(f"Username: {username}")
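
To see which URL forms this pattern tolerates, it can be run over a handful of sample inputs. The URLs below are illustrative, and `example.com` is included only to show that `re.sub` returns its input unchanged when nothing matches.

```python
import re

pattern = r"^(https?://)?(www\.)?twitter\.com/"

urls = [
    "https://twitter.com/davidjmalan",
    "http://www.twitter.com/davidjmalan",
    "twitter.com/davidjmalan",
    "https://example.com/davidjmalan",  # not a Twitter URL
]
usernames = [re.sub(pattern, "", url) for url in urls]

for url, username in zip(urls, usernames):
    print(f"{url} -> {username}")
```

Because a non-matching URL comes back untouched, substitution alone cannot tell us whether the input was valid, which is one reason to prefer `re.search` with a capturing group.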


In [22]:
import re

url = input().strip()

matches = re.search(r"^https?://(www\.)?twitter\.com/(.+)$", url, re.IGNORECASE)
if matches:
    print(f"Username: {matches.group(2)}")

Username: davidjmalan


In [24]:
import re

url = input().strip()

if matches := re.search(r"^https?://(?:www\.)?twitter\.com/(.+)$", url, re.IGNORECASE):
    print(f"Username: {matches.group(1)}")


Username: davidjmalan


In [25]:
import re

url = input().strip()

if matches := re.search(r"^https?://(?:www\.)?twitter\.com/([a-z0-9_]+)", url, re.IGNORECASE):
    print(f"Username: {matches.group(1)}")


Username: davidjmalan
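
For reuse, the final pattern could be wrapped in a function that returns the username or `None`. This is a sketch under stated assumptions: the name `extract_username` is hypothetical, and the sample URLs are made up for illustration.

```python
import re


def extract_username(url):
    """Return the Twitter username in url, or None if the URL does not match (hypothetical helper)."""
    pattern = r"^https?://(?:www\.)?twitter\.com/([a-z0-9_]+)"
    if matches := re.search(pattern, url.strip(), re.IGNORECASE):
        return matches.group(1)
    return None


print(extract_username("https://twitter.com/davidjmalan"))              # davidjmalan
print(extract_username("https://www.twitter.com/davidjmalan?lang=en"))  # davidjmalan
print(extract_username("https://example.com/davidjmalan"))              # None
```

Since the character class `[a-z0-9_]+` stops at the first character that is not a letter, digit, or underscore, a trailing query string such as `?lang=en` is left out of the captured username.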


# Summing Up
- Regular Expressions
- Case Sensitivity
- Cleaning Up User Input
- Extracting User Input