# Regular Expression
A regular expression (abbreviated as Regex) is a pattern used to match, search, and manipulate text based on specific rules. Regex is very useful for finding or changing specific parts of a text string, such as:
* Searching for all phone numbers in a document.
* Extracting email addresses from text.
* Ensuring password formatting.
* Cleaning text from unwanted symbols or characters.

## Brief Examples of Regex Uses
| Purpose | Regex Pattern | Matching Examples |
| ---------------------------------- | -------------------------------- | ------------------------------------- |
| Search for 3-digit numbers | `\d{3}` | `123`, `456` |
| Roughly check whether text looks like an email address | `\w+@\w+\.\w+` | `nama@email.com` |
| Find words that begin with an uppercase letter | `\b[A-Z]\w+` | `Indonesia`, `Python` |
| Replace all spaces with `-` | use `re.sub(" ", "-", text)` | `Hello World` → `Hello-World` |
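
The rows of the table above can be tried out directly. The following is a minimal sketch; the sample sentence in `text` is made up for illustration and is not part of the original examples.

```python
import re

# Hypothetical sample text used only for this illustration
text = "Call 123 or 4567. Contact nama@email.com about Python in Indonesia."

# \d{3} finds runs of exactly three digits; it also matches inside "4567"
three_digits = re.findall(r"\d{3}", text)

# A deliberately simple email pattern; real addresses can be more complex
emails = re.findall(r"\w+@\w+\.\w+", text)

# Words that start with an uppercase ASCII letter
capitalized = re.findall(r"\b[A-Z]\w+", text)

# Replace every space with a hyphen
hyphenated = re.sub(" ", "-", "Hello World")

print(three_digits)   # ['123', '456']
print(emails)         # ['nama@email.com']
print(capitalized)    # ['Call', 'Contact', 'Python', 'Indonesia']
print(hyphenated)     # Hello-World
```

Note that `\d{3}` also matches the first three digits of a longer number, so a stricter pattern such as `\b\d{3}\b` may be preferable when only standalone 3-digit numbers are wanted.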

## Search for the word "application" in a sentence
* __re.search(r'application', s)__ searches for the first occurrence of the word "application" in the string s.
* __match.start()__ returns the index of the first character in the match.
* __match.end()__ returns the index of the character after the last character in the match.

In [1]:
import re

s = 'Facebook: an application that connects people regardless of space and time'

# Search for the word "application" in the string
match = re.search(r'application', s)

# Print the starting index of the match
print('Start Index:', match.start())

# Print the ending index of the match
print('End Index:', match.end())


Start Index: 13
End Index: 24


## Searching for a Number (GPA) in a String
* The string `name` contains a number (287) representing the GPA.
* The `\d+` pattern matches one or more consecutive digits.
* The __re.search()__ function returns the first sequence of digits found in the string.
* __.start()__ returns the index where that sequence begins, and __.end()__ returns the index just past its last digit.

In [2]:
import re

name = 'Andra with GPA 287'

# Search for the first sequence of digits (the GPA)
match = re.search(r'\d+', name)
print('GPA starts at index:', match.start())
print('GPA ends at index:', match.end())

GPA starts at index: 15
GPA ends at index: 18


## Searching for the Word "to" in a Sentence
* __re.search(r'to', sentence)__ searches for the first occurrence of the substring "to" in the string `sentence`. The search is case-sensitive, so the "To" in "Today" is skipped.
* The __.start()__ method returns the index where the match begins.
* The __.end()__ method returns the index just past the end of the match.

In [3]:
sentence = 'Today I want to go to the national library'

# Search for the first occurrence of 'to'
match = re.search(r'to', sentence)
print('First index of "to":', match.start())
print('Last index of "to":', match.end())

First index of "to": 13
Last index of "to": 15


## Detecting the First Number in a Sentence
* The string `text` contains a sentence that ends with the number 63.
* The \d+ pattern is used to match numbers with one or more digits.
* The __re.search()__ function finds the first occurrence of a number in a string.
* __.start()__ and __.end()__ are used to indicate the position index of the number in the string.

In [4]:
text = 'Hallo my name is Winardi, my house number is 63'

# Search for the first occurrence of a number in the text
match = re.search(r'\d+', text)

# Print the start and end index of the matched number
print('Start index of the number:', match.start())
print('End index of the number:', match.end())

Start index of the number: 45
End index of the number: 47


## Detecting Dates and Specific Words
* __re.search(r'\d+', trial)__ finds the first number that appears, which is 10.
* Then __re.search(r'born', trial)__ finds the word "born" in the string.
* __.start()__ and __.end()__ return the start index and the index just past the end of the match for "born".

In [5]:
trial = "I was born on April 10, 1997"
match = re.search(r'\d+', trial)
print('First date index:', match.start())
match = re.search(r'born', trial)
print('First index:', match.start())
print('Last index:', match.end())

First date index: 20
First index: 6
Last index: 10


## Find All Numbers at Once
* __re.findall(r, trial)__ returns every non-overlapping match of the pattern stored in `r`, which is `(\d+)`.
* This pattern matches any run of one or more digits.
* The result of __re.findall__ is a list of all the numbers found in the string `trial`.
* In the example string "I was born on April 10, 1997", two numbers are found: '10' and '1997'.

In [6]:
r = r"(\d+)"  # raw string, so \d reaches the regex engine unchanged
match = re.findall(r, trial)
print(match)

['10', '1997']
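
The pattern above wraps `\d+` in a capture group. With groups present, `re.findall` returns the group contents rather than the full match. The sketch below illustrates this; the two-group pattern is a hypothetical addition for comparison, not part of the original example.

```python
import re

trial = "I was born on April 10, 1997"

# No group: findall returns the full text of each match
no_group = re.findall(r"\d+", trial)

# One group: findall returns the group's text
# (identical here, because the group covers the whole pattern)
one_group = re.findall(r"(\d+)", trial)

# Two groups (hypothetical pattern): findall returns one tuple per match
pairs = re.findall(r"([A-Za-z]+) (\d+)", trial)

print(no_group)    # ['10', '1997']
print(one_group)   # ['10', '1997']
print(pairs)       # [('April', '10')]
```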


# Metacharacter
A metacharacter is a special symbol in regular expressions that does not represent a literal character but has a specific function for matching patterns in strings. Metacharacters help make searching flexible and powerful.

Here is a summary of commonly used metacharacters:
| Metacharacter | Meaning | Examples |
| ------------- | ----------------------------------------------------- | ----------------------------------------------------- |
| `.` | Matches any single character except a newline | `a.c` matches `abc`, `axc` |
| `^` | Marks the beginning of a string | `^Hello` matches if the string begins with "Hello" |
| `$` | Marks the end of a string | `rumah$` matches if the string ends with "rumah" |
| `*` | **0 or more** of the previous character | `lo*` matches `l`, `lo`, `loo`, etc. |
| `+` | **1 or more** of the previous character | `lo+` matches `lo`, `loo`, not `l` |
| `?` | **0 or 1** of the previous character (optional) | `ru?mah` matches `rmah`, `rumah` |
| `[]` | **Character set**: matches any one of the listed characters | `[aeiou]` matches a single vowel |
| `[^]` | **Negation** of the character set | `[^0-9]` matches any character that is not a digit |
| `{m}` | **Exactly m** of the previous character | `\d{4}` matches exactly four digits |
| `{m,n}` | **Between m and n** of the previous character | `a{2,4}` matches `aa`, `aaa`, `aaaa` |
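
Each row of the table can be checked with `re.findall`. This is a small sketch; the sample strings are invented for illustration only.

```python
import re

# Each entry: (pattern, sample text); the samples are made up for illustration
checks = [
    (r"a.c", "abc axc ac"),        # . = any single character
    (r"^Hello", "Hello world"),    # ^ = start of string
    (r"rumah$", "ini rumah"),      # $ = end of string
    (r"lo*", "l lo loo"),          # * = zero or more 'o'
    (r"lo+", "l lo loo"),          # + = one or more 'o'
    (r"ru?mah", "rmah rumah"),     # ? = optional 'u'
    (r"[aeiou]", "regex"),         # character set
    (r"[^0-9]", "a1b2"),           # negated set
    (r"\d{4}", "year 1997"),       # exactly four digits
    (r"a{2,4}", "a aa aaaaa"),     # two to four 'a'
]

# Map every pattern to the list of matches found in its sample text
results = {pattern: re.findall(pattern, text) for pattern, text in checks}

for pattern, found in results.items():
    print(f"{pattern!r:12} -> {found}")
```

For instance, `a{2,4}` applied to `"a aa aaaaa"` yields `['aa', 'aaaa']`: the lone `a` is too short, and the run of five is matched greedily as four, leaving a single `a` that does not qualify.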

## Matching a Single Character with `.`
1. __re.search(r'.', s)__:
   * The period `.` is a metacharacter that matches any single character except the newline character (`\n`).
   * The __re.search()__ function returns the first match in the string.
   * In the string `'Adgan167@gmail.com'`, the first character is `'A'`, so the match is `'A'`.
2. __print(match)__ displays the match object: `<re.Match object; span=(0, 1), match='A'>`

In [7]:
s = 'Adgan167@gmail.com'

match = re.search(r'.', s)
print(match)

<re.Match object; span=(0, 1), match='A'>


## Matching Literal Dot Characters with `\.`
1. __re.search(r'\.', s)__:
   * The dot `.` is a metacharacter, so it must be escaped with `\` to be matched as a literal dot.
   * The pattern `r'\.'` matches an actual dot, not an arbitrary character.
   * In the string `'Adgan167@gmail.com'`, the only dot is the one after `gmail`, so the regex matches that dot.
2. __print(match)__ displays the match object for the dot: `<re.Match object; span=(14, 15), match='.'>`

In [8]:
match = re.search(r'\.', s)
print(match)

<re.Match object; span=(14, 15), match='.'>


## Search for Specific Numbers and Letters Using __re.findall__

### Target String

In [9]:
# Define a multiline string containing student numbers and a phone number
string = """Hello, my student number is 923472387 and
          my friend's student number is 234130324
          phone number of Aditya is 082298221785
       """

# Find all sequences of digits (i.e., numbers) in the text
matches = re.findall(r'\d+', string)

# Print all matched numbers
print("Extracted numbers from the text:")
for number in matches:
    print(number)


Extracted numbers from the text:
923472387
234130324
082298221785


### Digit Search Pattern `(\d+)`
* `\d` is a metacharacter that matches a single digit (0–9).
* `+` means one or more digits.
* __re.findall(r'\d+', string)__ searches for all sequences of digits in a string.
* Result:\
  ['923472387', '234130324', '082298221785']\
  because those are the numbers in the string.

In [10]:
regex = r'\d+'
match = re.findall(regex, string)
print(match)

['923472387', '234130324', '082298221785']


### Search Pattern for Letter a or e
* `[ae]` is a character class, meaning it matches either a or e.
* __re.findall(r'[ae]', string)__ returns all the characters 'a' or 'e' in the string, as a list.
* For the target string above, the result is:\
  ['e', 'e', 'e', 'a', 'e', 'e', 'e', 'e', 'e', 'a']\
  The uppercase 'A' in "Aditya" is not included because the character class is case-sensitive.

In [12]:
regex = r'[ae]'
match = re.findall(regex, string)
print(match)

['e', 'e', 'e', 'a', 'e', 'e', 'e', 'e', 'e', 'a']


## re.compile()

## Searching for Specific Characters with re.compile() and findall()

### Creating Regex Patterns with re.compile()
* __re.compile()__ is used to create reusable regex pattern objects.
* [a-e] is a character class that matches any letter from a to e (i.e., a, b, c, d, e).
* __flags=re.IGNORECASE__ makes the search case-insensitive, so it will match both `A-E` and `a-e`.

In [13]:
p = re.compile('[a-e]', flags=re.IGNORECASE)

### Find All Matching Characters
* __findall()__ returns all characters in the string "Muhamad Petra Piyunandra" that match the pattern [a-e], both lowercase and uppercase.
* Only 'a', 'd', and 'e' occur in this string; it contains no 'b' or 'c'.
* Result:\
  ['a', 'a', 'd', 'e', 'a', 'a', 'd', 'a']

In [14]:
print(p.findall("Muhamad Petra Piyunandra"))

['a', 'a', 'd', 'e', 'a', 'a', 'd', 'a']


## Match Numbers in Text with \d and \d+

### `\d` Pattern for Searching for Individual Digits
* `\d` is a metacharacter in regex that matches a single digit (equivalent to [0-9] for ASCII text).
* __findall()__ searches for and returns each digit as a separate element in the list.

In [15]:
p = re.compile(r'\d')
print(p.findall("I met him at 11:00 a.m. on July 4, 1886"))

['1', '1', '0', '0', '4', '1', '8', '8', '6']


### `\d+` Pattern for Finding Consecutive Numbers
* `\d+` matches one or more consecutive digits.
* The `+` symbol means "one or more."
* Adjacent digits (such as 11, 00, 1886) are grouped into a single match.

In [16]:
p = re.compile(r'\d+')
print(p.findall("I met him at 11:00 a.m. on July 4, 1886"))

['11', '00', '4', '1886']


## re.split()

__re.split(pattern, string, maxsplit=0, flags=0)__

### Splitting by Non-Alphanumeric Characters
* `r'\W+'` is a regular expression that matches one or more non-word characters (not letters, numbers, or underscores).
* In `'Word, word, Word'`:
  - The non-word characters are the comma and the space.
  - Each `, ` sequence is one separator, so the string is split in two places: between `Word` and `word`, and between `word` and `Word`.

In [18]:
print(re.split(r'\W+', 'Word, word, Word'))

['Word', 'word', 'Word']


### Splitting on Spaces Only
* Same pattern as before, but here the only non-word characters are spaces.
* Each space acts as a separator.

In [19]:
print(re.split(r'\W+', "Word word Word"))

['Word', 'word', 'Word']


### Split complex sentences based on non-alphanumeric characters
* The `\W+` regex splits a string at every space, comma (,), and colon (:)
* The string is split into text and number parts.

In [21]:
print(re.split(r'\W+', 'On January 12, 2016, at 11:02 AM'))

['On', 'January', '12', '2016', 'at', '11', '02', 'AM']


### Split string by number
* Regex `\d+` means match one or more numbers
* Whenever a number (12, 2016, 11, 02) is found, the string will be cut there

In [23]:
print(re.split(r'\d+', 'On January 12, 2016, at 11:02 AM'))

['On January ', ', ', ', at ', ':', ' AM']


### Split with a limit on the number of splits (maxsplit=1)
* The regular expression `\d+` matches one or more digits.
* The `maxsplit=1` argument limits the operation to a single split, here at the first number (`12`), right after "On January ".

In [48]:
print(re.split(r'\d+', 'On January 12, 2016, at 11:02 AM', maxsplit=1))

['On January ', ', 2016, at 11:02 AM']


### Split without limits (default maxsplit)
* No maxsplit argument is given, so the default `maxsplit=0` applies and every `\d+` match is used as a separator.
* This repeats the earlier "Split string by number" example for comparison with `maxsplit=1`.
* The split is performed at all numbers: 12, 2016, 11, and 02.

In [25]:
print(re.split(r'\d+','On January 12, 2016, at 11:02 AM'))

['On January ', ', ', ', at ', ':', ' AM']


### Split with flags=re.IGNORECASE
* The pattern `[a-f]+` matches one or more letters from a to f.
* __flags=re.IGNORECASE__ makes the match case-insensitive.
* Note: because matching is case-insensitive, uppercase letters such as A and B also act as separators. The result begins and ends with an empty string because the input starts with a match ("Ae") and ends with one ("e").

In [26]:
print(re.split(r'[a-f]+', 'Aey, Boy oh boy, come here', flags=re.IGNORECASE))

['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']


### Split without flags=re.IGNORECASE (case-sensitive)
* The pattern `[a-f]+` (without IGNORECASE) only matches lowercase a-f.
* Capital letters such as A and B are not matched, so they stay in the output.
* In this string, only the lowercase letters e, b, and c act as separators.

In [27]:
print(re.split(r'[a-f]+', 'Aey, Boy oh boy, come here'))

['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']


## re.sub()

__re.sub(pattern, repl, string, count=0, flags=0)__\
Replaces all occurrences of a regex pattern in `string` with `repl`, or only the first `count` occurrences if `count` is given and nonzero.

### Replacement with IGNORECASE Flag
* Replaces `'ub'` (case-insensitive) with `'~*'` in the string 'Subject has ordered an Uber'.
* The pattern matches `'ub'` in 'Subject' and `'Ub'` in 'Uber'.

In [28]:
print(re.sub('ub', '~*', 'Subject has ordered an Uber', flags=re.IGNORECASE))

S~*ject has ordered an ~*er


### Without the IGNORECASE Flag
* Without the flag, matching is case-sensitive.
* `'ub'` only matches the lowercase sequence; it does not match `'Ub'`.
* Only the `'ub'` in 'Subject' is replaced; 'Uber' is left unchanged.

In [29]:
print(re.sub('ub', '~*', 'Subject has ordered an Uber'))

S~*ject has ordered an Uber


### Limit Number of Replacements (count=1)
* Use IGNORECASE, but count=1 limits the number of replacements to 1.
* `'ub'` in `'Subject'` is replaced, while `'Uber'` remains.

In [30]:
print(re.sub('ub', '~*', 'Subject has ordered an Uber', count=1, flags=re.IGNORECASE))

S~*ject has ordered an Uber


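### Matching Surrounding Whitespace with `\s`
* `\s` matches a single whitespace character, so the pattern `\sAND\s` means whitespace + 'AND' + whitespace.
* In the first example, the sentence contains the Indonesian word 'dan' rather than 'and', so nothing matches and the string is returned unchanged.
* In the second example, 'and' matches 'AND' because of IGNORECASE.
* The match, including the surrounding whitespace, is replaced with ' & '.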

In [31]:
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans dan Spam', flags=re.IGNORECASE))

Baked Beans dan Spam


In [32]:
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans and Spam', flags=re.IGNORECASE))

Baked Beans & Spam


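### Replace 'ti' with 'ty' Once
* With IGNORECASE, both 'ti' in 'Aditia' and 'Ti' in 'adiTia' are candidates, but `count=1` replaces only the first one.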

In [33]:
print(re.sub('ti','ty','Aditia adiTia pramana putra', count = 1, flags = re.IGNORECASE))

Aditya adiTia pramana putra


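### Replace 'ti' without the IGNORECASE flag
* Without the flag, only lowercase 'ti' matches, so 'Ti' in 'adiTia' is left alone. The output is the same as in the previous example, but for a different reason.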

In [34]:
print(re.sub(r'ti','ty','Aditia adiTia pramana putra'))

Aditya adiTia pramana putra


## re.subn()

__re.subn(pattern, repl, string, count=0, flags=0)__
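
__re.subn()__ works like __re.sub()__ but returns a tuple of the new string and the number of substitutions made. A minimal sketch of unpacking that tuple follows; the variable names `new_text` and `n_changes` are chosen here for illustration.

```python
import re

# re.subn returns (new_string, number_of_substitutions)
new_text, n_changes = re.subn(r"\d+", "#", "On January 12, 2016, at 11:02 AM")

print(new_text)    # On January #, #, at #:# AM
print(n_changes)   # 4
```

The count is handy when you need to know whether anything was replaced at all, for example to log how many values were masked.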

### Substitute the Empty Pattern '' with '~*' (Twice)
* This inserts `'~*'` twice into the string `'The subject has ordered an Uber'`.
* The empty pattern `''` matches at every position between characters, including the beginning and end of the string.
* `'~*'` is the replacement string to be inserted.
* `count=2` limits the operation to the first two substitutions: one before `'T'` and one before `'h'`.
* The result is the tuple `('~*T~*he subject has ordered an Uber', 2)`: the new string and the number of substitutions made.

In [35]:
print(re.subn('', '~*', 'The subject has ordered an Uber', count=2))

('~*T~*he subject has ordered an Uber', 2)


### Case-insensitive substitution
* Replaces all occurrences of `'ub'` with `'~*'` in a case-insensitive manner (re.IGNORECASE).
* The pattern matches `'ub'` in `"subject"` and `'Ub'` in `"Uber"`, so the count is 2.

In [36]:
print(re.subn('ub', '~*', 'The subject has ordered an Uber', flags=re.IGNORECASE))

('The s~*ject has ordered an ~*er', 2)


## re.escape()

__re.escape(string)__
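
A typical reason to call __re.escape()__ is to search for literal text that happens to contain metacharacters. The sketch below uses a hypothetical file name and sample text to show the difference between an escaped and an unescaped pattern.

```python
import re

filename = "report.v1.txt"   # hypothetical literal text we want to find
text = "Files: reportXv1Ytxt, report.v1.txt"

# Unescaped, each '.' matches any character, so the wrong name is found first
loose = re.search(filename, text)

# Escaped, the dots are treated as literal dots
strict = re.search(re.escape(filename), text)

print(loose.group())    # reportXv1Ytxt
print(strict.group())   # report.v1.txt
```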

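### Adding escapes to spaces
* __re.escape()__ puts a backslash `(\)` before every character that could have a special meaning in a regex.
* In this string, the spaces are the only such characters, so each space gets a backslash. The apostrophe is left unchanged (behavior of Python 3.7 and later, which the output below reflects).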

In [37]:
print(re.escape("It's Amazing Even at 1 AM"))

It's\ Amazing\ Even\ at\ 1\ AM


### Escape various non-alphanumeric (complex) characters
* `\[` and `\]`: square brackets normally delimit a character class, so they are escaped.
* `\-`: inside a character class the minus sign can denote a range, so it is escaped.
* `,`: the comma has no special meaning in a regex, so it is not escaped (as the output below shows).
* The tab: `"\t"` in the Python string literal is a real tab character, and `re.escape()` puts a backslash in front of it.
* `\^`: the caret is the start-of-string anchor, so it is escaped to make it literal.

In [38]:
print(re.escape("I Asked What This Was [a-9], He Said \t ^WoW"))

I\ Asked\ What\ This\ Was\ \[a\-9\],\ He\ Said\ \	\ \^WoW


## re.search()

### Regex Pattern Definitions
* `r"..."` means a raw string literal so that the backslash `(\)` is not interpreted as an escape character.
* `([a-zA-Z]+)` means the first group matches one or more letters of the alphabet (month names).
* `(\d+)` means the second group matches one or more digits (day/date).

In [39]:
regex = r"([a-zA-Z]+) (\d+)"

### Matching with re.search()
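* __re.search()__ searches for the first occurrence of a regular expression pattern in a string.
* Returns a match object if the pattern is found, or None if it is not.
* The example below redefines `regex` with the groups in the opposite order (day first, then month) so that it matches "24 June".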

In [40]:
# Python program to demonstrate how re.search() works
import re

# Regex pattern definition:
# - (\d+)         → matches digits (day)
# - ([a-zA-Z]+)   → matches month name (letters only)
regex = r"(\d+) ([a-zA-Z]+)"

# String to be tested
sentence = "I was born on 24 June"

# Searching for the regex pattern in the sentence
match = re.search(regex, sentence)

# If a match is found
if match is not None:
    # Display the start and end index of the match
    print("Match found at index %s, %s" % (match.start(), match.end()))

    # Display the full match (group 0)
    print("Full match: %s" % (match.group(0)))

    # Display the first group result (Day)
    print("Day: %s" % (match.group(1)))

    # Display the second group result (Month)
    print("Month: %s" % (match.group(2)))

# If no match is found
else:
    print("No regex match found.")


Match found at index 14, 21
Full match: 24 June
Day: 24
Month: June


# Match Object

In [41]:
import re

# Example 1: Search for the letter 'N' at the beginning of a word in a string
s = "Universitas Negeri terbaik di Indonesia adalah ..."

# \bN means: find the letter 'N' at the beginning of a word
res = re.search(r"\bN", s)

# Display the regex used
print("Regex used (Example 1):", res.re)

# Display the source string
print("Source string (Example 1):", res.string)

# Display the start index of the matched 'N'
print("Start index of the match for 'N':", res.start())

# Display the end index of the match (right after 'N')
print("End index of the match for 'N':", res.end())

# Display the match range as a tuple (start, end)
print("Match span (start, end):", res.span())


print("\n" + "-"*50 + "\n")  # Separator between examples


# Example 2: Search for the letter 'M' at the beginning of a word in a string with newline
a = 'Gw kece dari lahir\nMuhamad Petra Piyunandra'

# \bM means: find the letter 'M' at the beginning of a word
res = re.search(r"\bM", a)

# Display the regex used
print("Regex used (Example 2):", res.re)

# Display the source string
print("Source string (Example 2):", res.string)

# Display the start index of the matched 'M'
print("Start index of the match for 'M':", res.start())

# Display the end index of the match (right after 'M')
print("End index of the match for 'M':", res.end())

# Display the match range as a tuple (start, end)
print("Match span (start, end):", res.span())

Regex used (Example 1): re.compile('\\bN')
Source string (Example 1): Universitas Negeri terbaik di Indonesia adalah ...
Start index of the match for 'N': 12
End index of the match for 'N': 13
Match span (start, end): (12, 13)

--------------------------------------------------

Regex used (Example 2): re.compile('\\bM')
Source string (Example 2): Gw kece dari lahir
Muhamad Petra Piyunandra
Start index of the match for 'M': 19
End index of the match for 'M': 20
Match span (start, end): (19, 20)


## Matching a Pattern at the Start of a String with re.match()

In [42]:
# Python program to demonstrate the working of re.match()
import re

# Example function using regular expressions
# to extract the month and day from a given date string.
def findMonthAndDate(string):

    regex = r"([a-zA-Z]+) (\d+)"
    match = re.match(regex, string)

    if match is None:
        print("Not a valid date format")
    else:
        print("Provided data: %s" % (match.group()))
        print("Month: %s" % (match.group(1)))
        print("Day: %s" % (match.group(2)))

# Driver Code
findMonthAndDate("24 April")
print("")
findMonthAndDate("24 Juni")
print("")
findMonthAndDate("24 Desember")
print("")
findMonthAndDate("Januari 79")


Not a valid date format

Not a valid date format

Not a valid date format

Provided data: Januari 79
Month: Januari
Day: 79
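
Unlike __re.search()__, __re.match()__ only tries the pattern at the very beginning of the string, and __fullmatch()__ requires the whole string to match. The following sketch contrasts the three on a made-up sentence (the string `s` is an assumption for illustration).

```python
import re

regex = r"([a-zA-Z]+) (\d+)"
s = "Born on April 24 in Jakarta"   # made-up sample text

at_start = re.match(regex, s)                  # must match at index 0
anywhere = re.search(regex, s)                 # first match anywhere
whole_bad = re.fullmatch(regex, s)             # entire string must match
whole_ok = re.fullmatch(regex, "April 24")

print(at_start)           # None
print(anywhere.group())   # April 24
print(whole_bad)          # None
print(whole_ok.groups())  # ('April', '24')
```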


## Find All Occurrences of a Pattern

In [43]:
# Python program to demonstrate the working of findall()
# Example text string where regular expressions
# will be applied.

string = """Hello, my number is 123456789 and
        my friend's number is 987654321"""

# Example regular expression to find digits.
regex = r'\d+'

# Using re.search() to find the first match
match = re.search(regex, string)
print(match)
print("")

# Using re.findall() to find all matches
match = re.findall(regex, string)
print(match)

<re.Match object; span=(20, 29), match='123456789'>

['123456789', '987654321']


## re.VERBOSE

In [44]:
# Without using VERBOSE
regex_email = re.compile(r'^([a-z0-9_\.-]+)@([0-9a-z\.-]+)\.([a-z]{2,6})$',
                         re.IGNORECASE)

# Using VERBOSE for better readability
regex_email = re.compile(r"""
                ^([a-z0-9_\.-]+)     # Local part
                @                    # Single @ symbol
                ([0-9a-z\.-]+)       # Domain name
                \.                   # A single dot .
                ([a-z]{2,6})$        # Top-level domain (TLD)
                """, re.VERBOSE | re.IGNORECASE)


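The `re.VERBOSE` flag lets a pattern span several lines: unescaped whitespace outside character classes is ignored, and `#` starts a comment. The flag is passed as an argument to re.compile(), i.e., re.compile(RegularExpression, re.VERBOSE). re.compile() returns a compiled pattern object that is then matched against the given string.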
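
To see that the flag changes only how the pattern is written and not what it matches, here is a small sketch comparing a compact pattern with its verbose form. The date pattern and the sample string are assumptions chosen for brevity.

```python
import re

# Compact form of a simple YYYY-MM-DD pattern
compact = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# The same pattern written with re.VERBOSE: whitespace and comments are ignored
verbose = re.compile(r"""
    (\d{4})   # year
    -         # literal hyphen
    (\d{2})   # month
    -         # literal hyphen
    (\d{2})   # day
    """, re.VERBOSE)

sample = "2016-01-12"   # made-up date string
print(compact.fullmatch(sample).groups())   # ('2016', '01', '12')
print(verbose.fullmatch(sample).groups())   # ('2016', '01', '12')
```

In verbose mode, a literal space or `#` in the pattern has to be escaped (for example `\ ` or `[ ]`), otherwise it is skipped.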

Consider an example where a user enters an email address and we need to validate it using regex. The expected email format is:

- Local part, such as john123
- A single @
- Domain name, such as gmail or yahoo
- A single dot (.)
- Top-level domain, such as com, org, or net

Input : expectationvsreality@gmail.com\
Output : Valid

Input : greatwonderland@yahoo.com@\
Output: Invalid

The second input is invalid because there is an @ after the top-level domain.

In [45]:
# Python3 program to demonstrate implementation of VERBOSE in RegEX
import re

def validate_email(email):

    # RegexObject = re.compile( Regular expression, flag )
    # Compiling the regular expression pattern into a
    # regular expression object
    regex_email = re.compile(r"""
                ^([a-z0-9_\.-]+)     # Local part
                @
                ([0-9a-z\.-]+)       # Domain name
                \.                   # Single dot .
                ([a-z]{2,6})$        # Top-level domain
                """, re.VERBOSE | re.IGNORECASE)

    # RegexObject is matched with the input
    # string using the fullmatch() function
    # If a match is found, fullmatch()
    # returns a MatchObject instance

    res = regex_email.fullmatch(email)

    # If match is found, the string is valid
    if res:
        print("{} is Valid. Details are as follows:".format(email))

        # Print the local/personal part of the email ID
        print("Local: {}".format(res.group(1)))

        # Print the domain name of the email ID
        print("Domain: {}".format(res.group(2)))

        # Print the top-level domain of the email ID
        print("Top-Level Domain: {}".format(res.group(3)))
        print()

    else:
        # If no match is found, the string is invalid
        print("{} is Invalid".format(email))

# Driver Code
validate_email("aliceinwonderland@gmail.com")
validate_email("harrypotter@yahoo.com@")
validate_email("Crucialllife@.com")


aliceinwonderland@gmail.com is Valid. Details are as follows:
Local: aliceinwonderland
Domain: gmail
Top-Level Domain: com

harrypotter@yahoo.com@ is Invalid
Crucialllife@.com is Invalid


In [46]:
# Python3 program to demonstrate implementation of VERBOSE in RegEX
import re

def user_instagram(instagram):

    regex_instagram = re.compile(r"""
                    ^@              # single @ symbol
                    ([a-z0-9_\.-]+) # username part
                    """, re.VERBOSE | re.IGNORECASE)

    res = regex_instagram.fullmatch(instagram)

    if res:
        print("{} is Valid. Details are as follows:".format(instagram))

        print("Username: {}".format(res.group(1)))

    else:
        # If no match is found, the string is invalid
        print("{} is Invalid".format(instagram))

# Driver Code
user_instagram("@aditpramna")
user_instagram("aditya167")
user_instagram("aditya@pram")
user_instagram("@winardi1004")


@aditpramna is Valid. Details are as follows:
Username: aditpramna
aditya167 is Invalid
aditya@pram is Invalid
@winardi1004 is Valid. Details are as follows:
Username: winardi1004


In [47]:
import re

def validate_email(email):

    regex_email = re.compile(r"""
    ^([a-z0-9_\.-]+)     # Local part
    @                    # Single @ symbol
    ([0-9a-z\.-]+)       # Domain name
    \.                   # Single dot .
    ([a-z]{2,6})$        # Top-level domain
    """, re.VERBOSE | re.IGNORECASE)

    res = regex_email.fullmatch(email)

    if res:
        print("{} is Valid. The details are as follows:".format(email))

        print("Local: {}".format(res.group(1)))

        print("Domain: {}".format(res.group(2)))
    
        print("Top-level Domain: {}".format(res.group(3)))
        print()

    else:
        print("{} is not a valid email format".format(email))

# Driver Code
validate_email("adgan167@gmail.com")
validate_email("aditpramna1@@gmail.com")
validate_email("adityapramana107@gmail.com")
validate_email("Awinardi1004@gmail.com")


adgan167@gmail.com is Valid. The details are as follows:
Local: adgan167
Domain: gmail
Top-level Domain: com

aditpramna1@@gmail.com is not a valid email format
adityapramana107@gmail.com is Valid. The details are as follows:
Local: adityapramana107
Domain: gmail
Top-level Domain: com

Awinardi1004@gmail.com is Valid. The details are as follows:
Local: Awinardi1004
Domain: gmail
Top-level Domain: com



# Regex Function Summary

| Function or flag | Purpose |
| ----------- | ------------------------------------------------------------- |
| `findall()` | Gets all matches as a list |
| `compile()` | Creates a reusable regex pattern object |
| `split()` | Splits a string based on a pattern |
| `sub()` | Replaces a pattern with a new string |
| `subn()` | Like `sub`, but also gives the number of replacements |
| `escape()` | Protects special characters from being interpreted as regex |
| `search()` | Finds the **first** match anywhere in the string |
| `match()` | Matches only at the beginning of the string |
| `fullmatch()` | Matches only if the whole string fits the pattern |
| `VERBOSE` | Flag that makes regex more readable with comments and spaces |

# Thank you