# Regular Expressions
Regular expressions are a way to describe sets of strings that meet certain criteria, and are incredibly useful for pattern matching.
## Character classes
A character class makes it possible to search for **any one** of a set of characters. You can **specify the set** yourself or **use a pre-defined** set.

|Class| Explanation |
| --- | --- |
| ```[abc]``` | Matches **a, b, or c** |
| ```[a-z]``` | Matches **ANY** character **between a and z**|
| ```[^A-Z]``` | Matches **ANY** character that is **NOT** between **A and Z**|
| ```\d``` | Matches **digits**, equivalent to ```[0-9]``` |
| ```\D``` | Matches any **NON-digit** character, the opposite of ```\d```; the shortcuts below have opposites too |
| ```\w``` | Matches **ANY word character**; equivalent to ```[A-Za-z0-9_]``` for ASCII text (*opposite: ```\W```*) |
| ```\s``` | Matches **whitespace characters** (spaces, tabs, line breaks) (*opposite: ```\S```*) |
| ```.``` | Matches **ANY** non-newline character |

Character classes can be combined, like in ```[a-zA-Z0-9]```.

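To make the table concrete, here is a minimal sketch using Python's `re.findall`. The sample strings are made up for illustration; only the classes themselves come from the table above.

```python
import re

text = "Room 42, floor B_7!"  # hypothetical sample text

# Explicit set: any one lowercase vowel.
vowels = re.findall(r"[aeiou]", text)

# \d is shorthand for a single digit.
digits = re.findall(r"\d", text)

# Several ranges combined inside one class.
alnum = re.findall(r"[a-zA-Z0-9]", text)

# A leading ^ inside the brackets negates the class.
not_upper = re.findall(r"[^A-Z]", "aBcD")

print(vowels)          # ['o', 'o', 'o', 'o']
print(digits)          # ['4', '2', '7']
print("".join(alnum))  # Room42floorB7
print(not_upper)       # ['a', 'c']
```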

## Combining Patterns
There are multiple ways to combine patterns in RegEx.

|Combo| Explanation |
| --- | --- |
| ```AB``` | A match for **A followed immediately by one for B**. **Example**: **```x[.,]y```** matches ```x.y``` or ```x,y``` |
|```A\|B```| Matches **either A or B**. **Example**: **```\d+\|Inf```** matches either a sequence of digits or Inf|

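A short sketch of both combinations, using `re.fullmatch` so the whole string has to match. The `cat`/`dog` strings are invented for illustration; they show that `|` binds more loosely than concatenation, which is easy to trip over when patterns are glued together.

```python
import re

# AB: "x", then one of "." or ",", then "y".
concat_ok = bool(re.fullmatch(r"x[.,]y", "x,y"))
concat_bad = bool(re.fullmatch(r"x[.,]y", "x;y"))

# A|B: either a run of digits or the literal text "Inf".
number = r"\d+|Inf"
digits_ok = bool(re.fullmatch(number, "2024"))
inf_ok = bool(re.fullmatch(number, "Inf"))

# "|" binds more loosely than concatenation: this reads as "cat" OR "dog food".
ungrouped = bool(re.fullmatch(r"cat|dog food", "cat food"))
# Parentheses limit the alternation to the two animal names.
grouped = bool(re.fullmatch(r"(cat|dog) food", "cat food"))

print(concat_ok, concat_bad)  # True False
print(digits_ok, inf_ok)      # True True
print(ungrouped, grouped)     # False True
```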
A pattern can be followed by one of these **quantifiers** to **specify how many instances** of that pattern **can occur**.

| Quantifier | Explanation |
| --- | --- |
| ```a*``` | **zero or more** occurrences of ```a``` |
| ```a+``` | **one or more** occurrences of ```a``` |
| ```a?``` | **zero or one** occurrence of ```a``` |
| ```a{2,6}``` | **2 to 6** occurrences (inclusive) of ```a```. Additionally, ```a{2,}``` means **at least 2** occurrences of ```a```, and ```a{2}``` means **exactly 2** occurrences of ```a```. Do not put a space after the comma: Python treats ```a{2, 6}``` as literal text rather than a quantifier |

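The quantifiers can be compared side by side with a small sketch. The helper `accepted` and the sample list are hypothetical names introduced here; each call reports which samples a pattern matches from start to end.

```python
import re

samples = ["", "a", "aa", "aaaaaa", "aaaaaaa"]  # 0, 1, 2, 6 and 7 copies of "a"

def accepted(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(accepted(r"a*"))      # ['', 'a', 'aa', 'aaaaaa', 'aaaaaaa']
print(accepted(r"a+"))      # ['a', 'aa', 'aaaaaa', 'aaaaaaa']
print(accepted(r"a?"))      # ['', 'a']
print(accepted(r"a{2,6}"))  # ['aa', 'aaaaaa']
print(accepted(r"a{2,}"))   # ['aa', 'aaaaaa', 'aaaaaaa']
print(accepted(r"a{2}"))    # ['aa']

# With a space inside the braces, Python reads the braces as literal text.
spaced = bool(re.fullmatch(r"a{2, 6}", "aaa"))
print(spaced)               # False
```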

## Groups
* **Parentheses** are used as in arithmetic expressions, to **create groups**. For example, ```(Mahna)+``` matches strings with **1 or more "Mahna"**, like ```"MahnaMahna"```.
* By comparison, **without** the **parentheses**, ```Mahna+``` would match strings with **"Mahn"** followed by 1 or more "a" characters, like **"Mahnaaaa"**.

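The two bullets above can be checked directly. The sketch below also shows that a group captures the text it matched, which `match.group(n)` returns; the `oski@berkeley` string is only an illustrative input.

```python
import re

# The quantifier applies to the whole group.
grouped = bool(re.fullmatch(r"(Mahna)+", "MahnaMahna"))
# Without parentheses, + applies only to the final "a".
ungrouped = bool(re.fullmatch(r"Mahna+", "MahnaMahna"))
trailing_a = bool(re.fullmatch(r"Mahna+", "Mahnaaaa"))

print(grouped, ungrouped, trailing_a)  # True False True

# Groups are numbered from 1, left to right, and remember what they matched.
match = re.fullmatch(r"(\w+)@(\w+)", "oski@berkeley")
print(match.group(1), match.group(2))  # oski berkeley
```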
## Anchors
Anchors are **unique** in that they **don't match characters**. Instead, they **match positions** in a string where an expression could land.

| Anchors | Explanation |
| --- | --- |
| ```^``` | matches the **beginning** of a string |
| ```$``` | matches the **end** of a string|
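| ```\b``` | matches a **word boundary**: the position between a word character (```\w```) and a non-word character such as whitespace or punctuation, or the start or end of the string |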

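A brief sketch of how anchors restrict where a match may land. The sample text is invented for illustration.

```python
import re

text = "cat concat cats"  # hypothetical sample text

# Without anchors, "cat" is found inside every word.
anywhere = re.findall(r"cat", text)
# \b on both sides keeps only the standalone word.
whole_word = re.findall(r"\bcat\b", text)

# ^ and $ tie the pattern to the start and end of the string.
starts_with_cat = bool(re.search(r"^cat", text))
starts_with_concat = bool(re.search(r"^concat", text))
ends_with_cats = bool(re.search(r"cats$", text))

print(anywhere)                             # ['cat', 'cat', 'cat']
print(whole_word)                           # ['cat']
print(starts_with_cat, starts_with_concat)  # True False
print(ends_with_cats)                       # True
```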
## Special Characters
The following **special characters** are used above to denote types of patterns:

```\ / ( ) [ ] { } + * ? | $ ^ .```

This means if you want to match one of those characters, you have to escape it using a backslash. For example, when trying to match ```"(1+3)"```, the corresponding **pattern** should be:

```python
r'\(1\+3\)'
```
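As a quick check of the escaped pattern, the sketch below compares it with the unescaped version. It also uses `re.escape` from the standard library, which adds the backslashes for you; that helper is not mentioned above and is shown here only as a convenience.

```python
import re

literal = "(1+3)"

# Hand-escaped pattern from the text above.
by_hand = r"\(1\+3\)"
# re.escape builds an equivalent pattern from the literal string.
generated = re.escape(literal)

by_hand_ok = bool(re.fullmatch(by_hand, literal))
generated_ok = bool(re.fullmatch(generated, literal))

# Unescaped, the parentheses form a group and + is a quantifier,
# so the pattern means "one or more 1s, then a 3".
unescaped_literal = bool(re.fullmatch(r"(1+3)", literal))
unescaped_digits = bool(re.fullmatch(r"(1+3)", "113"))

print(by_hand_ok, generated_ok)             # True True
print(unescaped_literal, unescaped_digits)  # False True
```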

In [22]:
import re

## Q1: Email Domain Validator
Create a regular expression that makes sure a given email string is a **valid email address** and that **its domain name is in the provided list** of domains.

* An email address is valid if it consists of letters, numbers, or underscores, followed by an @ symbol, then a domain.
* All domains will have a 3-letter extension following the period.

```python
    >>> email_validator("oski@berkeley.edu", ["berkeley.edu", "gmail.com"])
    True
    >>> email_validator("oski@gmail.com", ["berkeley.edu", "gmail.com"])
    True
    >>> email_validator("oski@berkeley.com", ["berkeley.edu", "gmail.com"])
    False
    >>> email_validator("oski@berkeley.edu", ["yahoo.com"])
    False
    >>> email_validator("xX123_iii_OSKI_iii_123Xx@berkeley.edu", ["berkeley.edu", "gmail.com"])
    True
    >>> email_validator("oski@oski@berkeley.edu", ["berkeley.edu", "gmail.com"])
    False
    >>> email_validator("oski@berkeleysedu", ["berkeley.edu", "gmail.com"])
    False
```

In [23]:
def email_validator(email, domains):
    # Group 1: username of word characters plus "@"; group 2: "name.ext" domain.
    pattern = r"(\w+@)(\w+\.\w+)"
    match = re.fullmatch(pattern, email)
    # Valid only if the whole string matches and the captured domain is listed.
    return bool(match) and match.group(2) in domains

In [24]:
email_validator("oski@berkeley.edu", ["berkeley.edu", "gmail.com"])

True

In [25]:
email_validator("oski@gmail.com", ["berkeley.edu", "gmail.com"])

True

In [26]:
email_validator("oski@berkeley.com", ["berkeley.edu", "gmail.com"])

False

In [27]:
email_validator("oski@berkeley.edu", ["yahoo.com"])

False

In [28]:
email_validator("xX123_iii_OSKI_iii_123Xx@berkeley.edu", ["berkeley.edu", "gmail.com"])

True

In [29]:
email_validator("oski@oski@berkeley.edu", ["berkeley.edu", "gmail.com"])

False

In [30]:
email_validator("oski@berkeleysedu", ["berkeley.edu", "gmail.com"])

False

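One alternative worth sketching, not the solution above, is to build the allowed domains into the pattern itself. The name `email_validator_alt` is hypothetical, and `re.escape` is used on the assumption that each domain should be matched literally, so that the `.` in `berkeley.edu` cannot match an arbitrary character.

```python
import re

def email_validator_alt(email, domains):
    """Hypothetical variant: encode the allowed domains in the pattern."""
    if not domains:
        # An empty alternation would match an empty domain, so reject early.
        return False
    # Escape each domain so "." is literal, then join them with "|".
    allowed = "|".join(re.escape(domain) for domain in domains)
    # The parentheses keep the alternation from swallowing the username part.
    pattern = r"\w+@(" + allowed + r")"
    return bool(re.fullmatch(pattern, email))

print(email_validator_alt("oski@berkeley.edu", ["berkeley.edu", "gmail.com"]))  # True
print(email_validator_alt("oski@berkeleysedu", ["berkeley.edu", "gmail.com"]))  # False
```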
## Q2: Basic URL validation

```python
>>> match_url("https://cs61a.org/resources/#regular-expressions")
True
>>> match_url("https://pythontutor.com/composingprograms.html")
True
>>> match_url("https://pythontutor.com/should/not.match.this")
False
>>> match_url("https://link.com/nor.this/")
False
>>> match_url("http://insecure.net")
True
>>> match_url("htp://domain.org")
False
```


In [126]:
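def match_url(text):
    # Group the alternatives: a bare 'http://|https://' lets the "|" split the
    # whole concatenated pattern, so any text starting with "http://" matched.
    scheme = r'(http://|https://)'
    domain = r'\w+\.\w{3}'           # name, a period, then a 3-character extension
    path = r'(/\w+)*(\.\w+)?'        # optional /segments and one file extension
    anchor = r'(/\#[\w-]+)?$'        # optional /#fragment, then end of string
    full_string = scheme + domain + path + anchor
    return bool(re.match(full_string, text))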

In [127]:
scheme = r'http://|https://'
print(re.search(scheme, 'http://'))

<re.Match object; span=(0, 7), match='http://'>


In [128]:
match_url("https://cs61a.org/resources/#regular-expressions")

True

In [129]:
match_url("https://pythontutor.com/composingprograms.html")

True

In [130]:
match_url("https://pythontutor.com/should/not.match.this")

False

In [131]:
match_url("https://link.com/nor.this/")

False

In [132]:
match_url("http://insecure.net")

True

In [133]:
match_url("htp://domain.org")

False

# SQL