# Regular Expressions

# Tasks today:
1) <b>Importing</b> <br>
2) <b>Using Regular Expressions</b> <br>
 &nbsp;&nbsp;&nbsp;&nbsp; a) re.compile() <br>
 &nbsp;&nbsp;&nbsp;&nbsp; b) re.match() <br>
 &nbsp;&nbsp;&nbsp;&nbsp; c) re.findall() <br>
 &nbsp;&nbsp;&nbsp;&nbsp; d) re.search() <br>
3) <b>Sets</b> <br>
 &nbsp;&nbsp;&nbsp;&nbsp; a) Integer Ranges <br>
 &nbsp;&nbsp;&nbsp;&nbsp; b) Character Ranges <br>
4) <b>Counting Occurrences</b> <br>
 &nbsp;&nbsp;&nbsp;&nbsp; a) {x} <br>
 &nbsp;&nbsp;&nbsp;&nbsp; b) {x,y} <br>
 &nbsp;&nbsp;&nbsp;&nbsp; c) ? <br>
 &nbsp;&nbsp;&nbsp;&nbsp; d) * <br>
 &nbsp;&nbsp;&nbsp;&nbsp; e) + <br>
5) <b>In-Class Exercise #1</b> <br>
6) <b>Escaping Characters</b> <br>
7) <b>Grouping</b> <br>
8) <b>In-Class Exercise #2</b> <br>
9) <b>Opening a File</b> <br>
 &nbsp;&nbsp;&nbsp;&nbsp; a) open() <br>
 &nbsp;&nbsp;&nbsp;&nbsp; b) with open() <br>
 &nbsp;&nbsp;&nbsp;&nbsp; c) re.match() <br>
 &nbsp;&nbsp;&nbsp;&nbsp; d) re.search() <br>
 &nbsp;&nbsp;&nbsp;&nbsp; e) Store the String in a Variable <br>
10) <b>Regex Project</b> <br>

### Importing <br>
<p>Regular expressions are available in most programming languages. In Python they are provided by the built-in module 're', which has to be imported before use.</p>

In [19]:
# import re
import re


### Using Regular Expressions <br>
<p>Regular expressions give us the ability to search for patterns within text, strings, files, etc. They serve several uses, such as security measures, searching, filtering, and pattern recognition.</p>

##### RegEx Cheatsheet

##### re.compile()

In [None]:
# compile() builds a reusable pattern object that the regular expression methods below can use
pattern = re.compile(r'abcd')
print(pattern)

##### re.match()

In [None]:
# match() only matches at the beginning of the string
match = pattern.match('abcd123')
print(match)

##### re.findall()

In [None]:
# return a list
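
A minimal sketch of `findall()`, reusing the literal `abcd` pattern from the compile cell. The sample strings are made up for illustration.

```python
import re

# Same literal pattern as in the re.compile() cell
pattern = re.compile(r'abcd')

# findall() returns every non-overlapping match as a list of strings
found = pattern.findall('abcd123 abcd abc abcdabcd')
print(found)

# When nothing matches, the result is an empty list rather than None
nothing = pattern.findall('xyz')
print(nothing)
```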


##### re.search()
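
A short sketch of how `search()` differs from `match()`, again using the literal `abcd` pattern. The sample string is an assumption chosen so that the pattern does not sit at the start.

```python
import re

pattern = re.compile(r'abcd')
text = '123abcd abcd'  # made-up sample string

# match() fails here because the string does not start with 'abcd'
at_start = pattern.match(text)
print(at_start)

# search() scans the whole string and stops at the first hit
result = pattern.search(text)
if result:
    print(result.group(), result.span())
```

Both methods return a match object or `None`, so checking the result before calling `.group()` avoids an `AttributeError`.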

### Sets <br>
<p>Sets are written in square brackets and match any single character from the listed values or range, such as the digits 1 through 4 with [1-4].</p>

##### [a-z] or [A-Z] - any lowercase/uppercase letters from a to z<br/>[^2] - anything that's not 2

##### Integer Ranges

In [20]:
pattern_int = re.compile(r'[0-7]')
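
As a rough illustration of what `pattern_int` picks up, the sketch below runs it over a made-up sentence. Note that a set matches one character at a time.

```python
import re

pattern_int = re.compile(r'[0-7]')
sample = 'Room 8 seats 47 people, room 9 seats 120'  # made-up sample

# Each digit from 0 to 7 is matched on its own; 8 and 9 are skipped
digits = pattern_int.findall(sample)
print(digits)
```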

##### Character Ranges
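
A minimal sketch of character ranges and a negated set, using made-up sample strings.

```python
import re

sample = 'Hello World 2022'  # made-up sample

upper = re.findall(r'[A-Z]', sample)        # uppercase letters only
lower_runs = re.findall(r'[a-z]+', sample)  # runs of lowercase letters
not_two = re.findall(r'[^2]', '2022')       # ^ inside a set negates it

print(upper)
print(lower_runs)
print(not_two)
```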

### Counting Occurrences

##### {x} - something that occurs exactly x times

##### {x,y} - something that occurs between x and y times
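
A small sketch of counted repetition on a made-up string of numbers. It borrows `\b` (word boundary, covered under Escaping Characters) so that a match cannot start or end in the middle of a longer number.

```python
import re

sample = '7 42 365 2024 90210'  # made-up sample

# Exactly three digits
exactly_three = re.findall(r'\b[0-9]{3}\b', sample)
# Between two and four digits; write the range without a space inside the braces
two_to_four = re.findall(r'\b[0-9]{2,4}\b', sample)

print(exactly_three)
print(two_to_four)
```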

##### ? - something that occurs 0 or 1 time

##### * - something that occurs 0 or more times

##### + - something that occurs at least once
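
To compare `?`, `*`, and `+` side by side, the sketch below applies each one to the letter `u` in a few made-up spellings. `fullmatch()` is used so the whole word has to fit the pattern.

```python
import re

words = ['color', 'colour', 'colouur', 'colr']  # made-up samples

results = {}
for label, regex in [('?', r'colou?r'), ('*', r'colou*r'), ('+', r'colou+r')]:
    pattern = re.compile(regex)
    # keep only the words that fit the pattern from start to end
    results[label] = [w for w in words if pattern.fullmatch(w)]
    print(label, results[label])
```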

##### In-class exercise 1: 

Use a regular expression to find every number in the given string

In [1]:
import re
y_string = "This string has 10909090 numbers, but it is only 1 string. I hope you solve this 2day."
pattern = re.compile(r'[0-9]+')
found = pattern.findall(y_string)
print(found)


['10909090', '1', '2']


### Escaping Characters

##### \w - look for any word character (letters, digits, and underscore, including Unicode letters)<br/>\W - look for anything that isn't a word character

[History on Unicode](http://unicode.org/standard/WhatIsUnicode.html)

[More on Unicode Characters](https://en.wikipedia.org/wiki/List_of_Unicode_characters)
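
A quick sketch of `\w` and `\W` on a made-up string. The accented letter is written as an escape (`\u00e9`) only to keep the source plain ASCII.

```python
import re

sample = 'snake_case, caf\u00e9 & 42!'  # made-up sample with an accented e

# \w+ grabs runs of letters, digits, and underscores (accented letters included)
words = re.findall(r'\w+', sample)
# \W+ grabs the runs in between: spaces and punctuation
separators = re.findall(r'\W+', sample)

print(words)
print(separators)
```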

##### \d - look for any digit 0-9<br/>\D - look for anything that isn't a digit

In [None]:
# one or two digits in a row
pattern = re.compile(r'\d{1,2}')
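
A minimal sketch of `\d` and `\D` in use, assuming the intent of the cell above is "one or two digits". The sample string is made up.

```python
import re

pattern = re.compile(r'\d{1,2}')
sample = 'Meet at 9:05 on 12/25'  # made-up sample

# Runs of one or two digits
digits = pattern.findall(sample)
# \D+ collects everything between the digits
non_digits = re.findall(r'\D+', sample)

print(digits)
print(non_digits)
```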

##### \s - look for any whitespace<br/>\S - look for anything that isn't whitespace
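
A short sketch of `\s` and `\S` on a made-up string that mixes spaces, a tab, and a newline.

```python
import re

sample = 'tabs\tand   spaces\nand newlines'  # made-up sample

# \s+ treats any run of spaces, tabs, or newlines as one separator
tokens = re.split(r'\s+', sample)
# \S+ matches the non-whitespace runs directly
chunks = re.findall(r'\S+', sample)

print(tokens)
print(chunks)
```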

##### \b - look for boundaries or edges of a word<br/>\B - look for anything that isn't a boundary
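
To see what a boundary does in practice, this sketch counts the matches for `cat` with and without `\b` and `\B`. The sample string is made up.

```python
import re

sample = 'cat concat category cat.'  # made-up sample

# Without boundaries, 'cat' is found inside longer words too
anywhere = re.findall(r'cat', sample)
# \b on both sides keeps only the standalone word
whole_word = re.findall(r'\bcat\b', sample)
# \B matches only where there is no boundary, so 'cat' must follow another word character
inside = re.findall(r'\Bcat', sample)

print(len(anywhere), len(whole_word), len(inside))
```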

### Grouping

In [None]:
# two groups: a capitalized word, a space, then a second capitalized word
pattern = re.compile(r'([A-Z]\w+) ([A-Z][A-Za-z]+)')
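
A minimal sketch of reading groups back out of a match, assuming the pattern above is meant to capture two capitalized words. The sample sentence is made up.

```python
import re

pattern = re.compile(r'([A-Z]\w+) ([A-Z][A-Za-z]+)')
match = pattern.search('Signed by Ada Lovelace in 1843')  # made-up sample

if match:
    print(match.group(0))   # the whole match
    print(match.group(1))   # first group
    print(match.group(2))   # second group
    print(match.groups())   # all groups as a tuple
```

Groups are numbered from 1 in the order their opening parentheses appear, and `group(0)` is always the full match.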

##### In-class Exercise 2:

Write a function using regular expressions to find the domain name in the given email addresses (and return None for the invalid email addresses)<br><b>HINT: Use '|' for either or</b>

In [20]:
# You can also put $ at the end of your pattern -- it anchors the match to the end of the string

#Expected output:
#jordanw@codingtemple.orgcom
#pocohontas1776@gmail.com
#None
#yourfavoriteband@g6.org
#None

import re

my_emails = ["jordanw@codingtemple.orgcom", "pocohontas1776@gmail.com", "helloworld@aol..com",
             "yourfavoriteband@g6.org", "@codingtemple.com"]

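def validate_email(email):
    # letters/digits, '@', a domain label, a literal (escaped) dot, then org or com
    pattern = re.compile(r'[A-Za-z0-9]+@[A-Za-z0-9]+\.(org|com)')
    
    # match() anchors at the start only, so trailing text such as "orgcom" still passes
    if pattern.match(email):
        return email
    else:
        return None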
    
for email in my_emails:
    print(validate_email(email))



jordanw@codingtemple.orgcom
pocohontas1776@gmail.com
None
yourfavoriteband@g6.org
None


### Opening a File <br>
<p>Python gives us a couple of ways to open files. Below are the two used most often.</p>

##### open()

In [None]:
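# open the file first, so file.close() is only reached if open() succeeded
file = open("files/names.txt")
try:
    data = file.read()
finally:
    # close the file even if read() raises
    file.close()
print(data)
print(type(data))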

##### with open()

In [None]:
# preferred: the with block closes the file automatically
with open("files/names.txt") as file:
    data = file.read()
    print(data)

##### re.match()

##### re.search()

##### Store the String to a Variable
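
A rough sketch that ties these steps together: read a file into a variable, then run `match()` and `search()` on the stored string. The contents of `files/names.txt` are not shown in these notes, so the sketch writes a small hypothetical stand-in file (`names_sample.txt`, with an assumed layout) to a temporary folder.

```python
import os
import re
import tempfile

# Hypothetical stand-in for files/names.txt; the real file's layout is an assumption
sample_text = "Hawkins, Derek\t@derekhawkins\nVader, Darth\t@darthvader\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names_sample.txt")
    with open(path, "w") as file:
        file.write(sample_text)

    # Store the whole file in a variable
    with open(path) as file:
        data = file.read()

# match() only looks at the very start of the stored string
first = re.match(r"\w+", data)
# search() finds the first handle anywhere in it
handle = re.search(r"@\w+", data)

print(first.group())
print(handle.group())
```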

### In-Class Exercise #3 <br>
<p>Print each person's name and Twitter handle using groups. The output should look like:</p>
<p>==============<br>
   Full Name / Twitter<br>
   ==============</p>
Derek Hawkins / @derekhawkins

 Erik Sven-Osterberg / @sverik

 Ryan Butz / @ryanbutz

 Example Exampleson / @example

 Ripal Pael / @ripalp

 Darth Vader / @darthvader

In [29]:
def get_twitter():
    with open("files/names.txt", "r") as file:
        for line in file:
            twitter = re.search(r"(?P<twit>@[a-zA-Z0-9]+)(?!.)",line)
            name = re.search(r"(?P<first_name>\w+), (?P<last_name>\w+)",line)
            # search() returns None when a line has no match, so check before calling .group()
            if name and twitter:
                full_name = f"{name.group('last_name')} {name.group('first_name')}"
                print(f"{full_name} / {twitter.group('twit')}")

get_twitter()

### Regex project

Use Python to read the file regex_test.txt and print the last name on each line using regular expressions and groups (return None for names with no first and last name, or names that aren't properly capitalized)
##### Hint: use with open() and readlines()

In [None]:
"""
Expected Output
Abraham Lincoln
Andrew P Garfield
Connor Milliken
Jordan Alexander Williams
None
None
"""

In [27]:
names = ['Abraham Lincoln', 'Andrew P Garfield', 'Connor Milliken', 'Jordan Alexander Williams', 
         'Madonna', 'programming is cool']


def validate_name(name):
    # capitalized first name, optional capitalized middle names or initials, capitalized last name
    pattern = re.compile(r'[A-Z][a-z]+(?: [A-Z][a-z]*)* [A-Z][a-z]+$')
    if pattern.match(name):
        return name
    else:
        return None
    
for name in names:
    print(validate_name(name))

Abraham Lincoln
Andrew P Garfield
Connor Milliken
Jordan Alexander Williams
None
None
