In [44]:
import re

# re functions
* match
    * Pattern must match at the *beginning* of the string
* search
    * Pattern can be anywhere in the string
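To make the difference concrete, here is a small check on a string where `foo` is not at the start. The sample string is made up for illustration.

```python
import re

text = 'my foo string'

# match only tries the pattern at index 0, so it fails here
print(re.match('foo', text))   # None
# search scans the whole string and finds 'foo' starting at index 3
print(re.search('foo', text))  # <re.Match object; span=(3, 6), match='foo'>
```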

## Introduction

If there is no match, ```match``` and ```search``` return ```None```. Technically, ```None``` is the single instance of the type ```NoneType```. To get a plain yes/no answer, we convert the result to a boolean: ```None``` becomes ```False```, while a match object becomes ```True```.

In [22]:
# This will output the 'NoneType', which is the class of None.
type(re.match('foo', 'myfavestring'))

NoneType

In [25]:
# Converts the None into a boolean False
bool(re.match('foo', 'myfavestring'))

False

If there is a match, ```match``` and ```search``` return a match object. Its printed form shows where the match was found (the ```span```) and the text that was actually matched.

In [26]:
re.match('foo', 'foo myfavestring')

<re.Match object; span=(0, 3), match='foo'>
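Because a match object is always truthy and ```None``` is falsy, the result can also be used directly in an ```if``` statement. The sketch below assumes Python 3.8+ for the assignment expression (```:=```) in the second test.

```python
import re

# A Match object is truthy and None is falsy, so no bool() is needed in an if
m = re.match('foo', 'foo myfavestring')
if m:
    print(m.span(), m.group())  # (0, 3) foo

# Python 3.8+ assignment expression: test and keep the match in one step
if (m2 := re.search('fave', 'foo myfavestring')) is not None:
    print(m2.start())  # 6
```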

## Begins with foo

In [32]:
# Without regular expressions
def begins_with_foo_no_re(s):
    """Returns True if s begins with substring 'foo' (case-sensitive)"""
    return s[0:3] == 'foo'

# With regular expressions
def begins_with_foo_re_match(s):
    """Returns True if s begins with substring 'foo' (case-sensitive)"""
    return bool(re.match('foo', s))

# Instead of match, we can use search with ^, which anchors the pattern to the start of the string
def begins_with_foo_re_search(s):
    """Returns True if s begins with substring 'foo' (case-sensitive)"""
    return bool(re.search('^foo', s))

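The ```match``` version and the ```search('^foo')``` version agree for ordinary calls, but they diverge once the ```re.MULTILINE``` flag is used. A small sketch with a made-up two-line string:

```python
import re

text = 'bar\nfoo'

# By default ^ only matches at the very start of the string
print(bool(re.search('^foo', text)))              # False
# With re.MULTILINE, ^ also matches right after each newline
print(bool(re.search('^foo', text, flags=re.M)))  # True
# re.match still only tries position 0, even in MULTILINE mode
print(bool(re.match('foo', text, flags=re.M)))    # False
```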

In [None]:
# Without regular expressions
def begins_with_foo_insensitive_no_re(s):
    """Returns True if s begins with substring 'foo' (case-insensitive)"""
    # We use .casefold() rather than .lower() since casefold is more aggressive and meant for caseless comparison.
    s = s.casefold()
    return s[0:3] == 'foo'
    
# We can use a flag to make it case-insensitive through re.IGNORECASE or re.I
def begins_with_foo_insensitive_re(s):
    """Returns True if s begins with substring 'foo' (case-insensitive)"""
    return bool(re.match('foo', s, flags=re.I))

In [38]:
begins_with_foo_insensitive_re('Fooooo')

True
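The reason for preferring ```.casefold()``` shows up with characters whose lowercase form is not the "caseless" form. The German sharp s is the classic example; the two words below are chosen only for illustration.

```python
# str.casefold() goes further than str.lower() when normalizing case
word = 'STRASSE'
other = 'straße'

print(word.lower() == other.lower())        # False: 'strasse' vs 'straße'
print(word.casefold() == other.casefold())  # True: both become 'strasse'
```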

## Ends with foo
Checking whether a string **ends** with foo, in both case-sensitive and case-insensitive versions.

In [30]:
# Case-sensitive versions
def ends_with_foo_no_re(s):
    """Returns True if s ends with substring 'foo' (case-sensitive)"""
    return s.endswith('foo')

def ends_with_foo_re(s):
    """Returns True if s ends with substring 'foo' (case-sensitive)"""
    return bool(re.search("foo$", s))

In [31]:
ends_with_foo_re('uwufoo')

True
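One subtlety: ```$``` matches at the end of the string *or* just before a single trailing newline, so the regex version is slightly looser than ```str.endswith```. The ```\Z``` anchor matches only at the absolute end. A sketch with a made-up input:

```python
import re

s = 'uwufoo\n'  # note the trailing newline

# '$' matches at the end of the string or just before a final newline
print(bool(re.search('foo$', s)))    # True
# '\Z' matches only at the absolute end of the string
print(bool(re.search(r'foo\Z', s)))  # False
# str.endswith is strict as well
print(s.endswith('foo'))             # False
```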

In [41]:
# Without regular expressions
def ends_with_foo_insensitive_no_re(s):
    """Returns True if s ends with substring 'foo' (case-insensitive)"""
    # We use .casefold() since it is more general than .lower()
    s = s.casefold()
    return s[-3:] == 'foo'
    
# We can use a flag to make it case-insensitive through re.IGNORECASE or re.I
def ends_with_foo_insensitive_re(s):
    """Returns True if s ends with substring 'foo' (case-insensitive)"""
    return bool(re.search('foo$', s, flags=re.I))

In [43]:
ends_with_foo_insensitive_re('faveFOo')

True

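If you use ```"^foo$"```, for example, the pattern accepts only the string ```foo``` itself (plus the edge case of ```foo``` followed by one trailing newline, because ```$``` may also match just before a final newline). Anything else gives False.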

In [34]:
bool(re.search("^foo$", 'foo'))

True
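When the goal is "the whole string must be exactly this pattern", ```re.fullmatch``` (available since Python 3.4) states that directly and avoids the trailing-newline edge case of ```^...$```:

```python
import re

# '^foo$' also accepts 'foo' plus one trailing newline
print(bool(re.search('^foo$', 'foo\n')))    # True

# re.fullmatch requires the pattern to cover the entire string
print(bool(re.fullmatch('foo', 'foo')))     # True
print(bool(re.fullmatch('foo', 'foo\n')))   # False
print(bool(re.fullmatch('foo', 'foobar')))  # False
```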

## Has foo

In [51]:
# Case-sensitive
def has_foo_sensitive_no_re(s):
    """Returns True if s has substring 'foo' (case-sensitive)"""
    return 'foo' in s

def has_foo_sensitive_re(s):
    """Returns True if s has substring 'foo' (case-sensitive)"""
    return bool(re.search('foo', s))

In [54]:
# Case-insensitive
def has_foo_insensitive_no_re(s):
    """Returns True if s has substring 'foo' (case-insensitive)"""
    return 'foo' in s.lower()

def has_foo_insensitive_re(s):
    """Returns True if s has substring 'foo' (case-insensitive)"""
    return bool(re.search('foo', s, flags=re.I))

has_foo_insensitive_re('uwuFOO$')

True
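The test string above happens to contain ```$```, which is a special character in a pattern. To search for text that may contain such characters literally, ```re.escape``` can be used to backslash-escape them first. A small sketch:

```python
import re

s = 'uwuFOO$'

# In a pattern, '$' is an anchor, so 'FOO$' means "FOO at the end of the string".
# Here FOO is followed by a literal '$' character, so there is no match.
print(bool(re.search('FOO$', s)))             # False

# re.escape escapes special characters so the text is matched literally
print(re.escape('FOO$'))                      # FOO\$
print(bool(re.search(re.escape('FOO$'), s)))  # True
```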

## Begins with a number

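We will now use *raw* strings, written as ```r'some string'```. In a raw string, backslashes are kept literally instead of starting Python escape sequences, so a pattern such as ```r'\d'``` reaches the regex engine exactly as written.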
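A short sketch of why this matters: in an ordinary string literal, ```'\b'``` is the backspace character, while in a raw string it stays as the two characters ```\b```, which the regex engine reads as a word boundary.

```python
import re

# '\b' in a normal literal is one character (backspace); r'\b' is two characters
print(len('\b'), len(r'\b'))        # 1 2

# The pattern below looks for a backspace followed by 'foo', so it finds nothing
print(re.search('\bfoo', 'a foo'))  # None
# With a raw string, \b means "word boundary" to the regex engine
print(re.search(r'\bfoo', 'a foo'))  # <re.Match object; span=(2, 5), match='foo'>

# Doubling the backslash is the non-raw way to write the same pattern
print('\\d' == r'\d')               # True
```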

In [60]:
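def begins_with_num_no_re(s):
    # s[:1] is '' for an empty string, so this never raises IndexError;
    # isdecimal() accepts the same characters as \d
    return s[:1].isdecimal()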


def begins_with_num_re_match(s):
    return bool(re.match(r'\d', s))

def begins_with_num_re_search(s):
    return bool(re.search(r'^\d', s))

In [61]:
begins_with_num_re_search('12')

True
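The string-method and regex versions are not interchangeable for every character. ```str.isnumeric()``` accepts characters such as ```½``` that ```\d``` rejects, and ```\d``` matches any Unicode decimal digit unless the ```re.ASCII``` flag is given. The characters below are picked only to show these edge cases.

```python
import re

# isnumeric() is looser than \d: '½' is numeric but not a decimal digit
print('½'.isnumeric(), bool(re.match(r'\d', '½')))  # True False

# \d matches any Unicode decimal digit, e.g. the Arabic-Indic digit three
print(bool(re.match(r'\d', '٣')))                   # True
# re.ASCII restricts \d to [0-9]
print(bool(re.match(r'\d', '٣', flags=re.ASCII)))   # False

# s[0] would raise IndexError on '', while the regex simply returns None
print(re.match(r'\d', ''))                          # None
```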

## Begins with a number then letter

In [68]:
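def begins_number_letter_no_re(s):
    """Returns True if s begins with a number then a letter"""
    # Check the length first so short strings don't raise IndexError;
    # isdecimal() mirrors \d and isascii() limits the letter to [a-zA-Z]
    return (len(s) >= 2 and s[0].isdecimal()
            and s[1].isascii() and s[1].isalpha())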

def begins_number_letter_re(s):
    """Returns True if s begins with a number then a letter"""
    return bool(re.match(r'\d[a-zA-Z]', s))

begins_number_letter_re('1s')

True
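Note that ```str.isalpha()``` accepts any Unicode letter, while the character class ```[a-zA-Z]``` covers only ASCII letters, so a plain ```isdigit()```/```isalpha()``` check and the regex can disagree. The input ```'1é'``` is a made-up example:

```python
import re

s = '1é'

# str.isalpha() accepts 'é' ...
print(s[0].isdigit() and s[1].isalpha())  # True
# ... but [a-zA-Z] only covers ASCII letters
print(bool(re.match(r'\d[a-zA-Z]', s)))   # False
```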

## Has a substring of number then letter

In [70]:
def number_then_letter_no_re(s):
    """Returns True if s contains a number followed by a letter"""
    for i in range(1, len(s)):
        if s[i-1].isdigit() and s[i].isalpha():
            return True
    return False

def number_then_letter_re(s):
    """Returns True if s contains a number followed by a letter"""
    return bool(re.search(r'\d[a-zA-Z]', s))
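Wrapping the result in ```bool``` throws away *where* the pair was found. Keeping the match object shows which digit-letter pair matched first, and ```re.findall``` lists all of them. The sample strings are made up:

```python
import re

s = 'abc123 999xyz'

# search returns the first place where a digit is immediately followed by a letter
m = re.search(r'\d[a-zA-Z]', s)
print(m.group(), m.span())                   # 9x (9, 11)

# findall lists every non-overlapping occurrence
print(re.findall(r'\d[a-zA-Z]', '1a 2b 3c'))  # ['1a', '2b', '3c']
```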

# Using lazydog

Let us look more closely at what a ```match``` object is. Its ```group``` method returns the text captured by a particular group. The most important one is ```group(0)``` (the same as ```group()```), which is the entire match.

* group
    * group(0) is entire match
    * group(n) is the $n$-th parenthesized group
    * group(name) is the group named name, defined in the pattern with ```(?P<name>...)```
* groups
    * Returns a tuple of all subgroups
* start
    * Index in the string where the match starts
* end
    * Index just past the end of the match. The end is exclusive, following Python's convention for string slicing
* span
    * Gives the start and end indices as a tuple
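A small sketch of these methods on a pattern with two groups, one of them named with ```(?P<name>...)```. The pattern and input string are chosen only for illustration:

```python
import re

m = re.search(r'(?P<num>\d+)(?P<word>[a-z]+)', 'abc123 999xyz')

print(m.group(0))                    # 999xyz  (entire match)
print(m.group(1))                    # 999     (first parenthesized group)
print(m.group('word'))               # xyz     (group named 'word')
print(m.groups())                    # ('999', 'xyz')
print(m.start(), m.end(), m.span())  # 7 13 (7, 13)
```

The ```123``` near the start is skipped because it is followed by a space, not a letter, so the first full match begins at index 7.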

In [71]:
lazydog = '''The quick brown fox jumps over the lazydog. The quick brown fox jumps over the lazy dog.
The quick brown then fox 123 abc123 999xyz jumps he over the -he lazy      she dog. The quick brown fox jumps over the lazy$!@dog.'''

In [75]:
match = re.match('The', lazydog)
# This returns us a match object
match

<re.Match object; span=(0, 3), match='The'>

In [76]:
match.group()

'The'

The ```.start()```, ```.end()```, and ```.span()``` methods are handy for string slicing: they give exactly the indices needed to pull the matched text back out of the original string.

In [79]:
lazydog[match.start(): match.end()]

'The'

In [82]:
start, end = match.span()
lazydog[start : end]

'The'

The ```search``` method only returns the first match. To get every match, use ```findall``` (a list of the matched strings when the pattern has no groups) or ```finditer``` (an iterator of match objects).

In [84]:
match = re.search('The', lazydog)
# You will see only one match, not any of the other "The"s
match

<re.Match object; span=(0, 3), match='The'>

In [87]:
re.findall('The', lazydog)

['The', 'The', 'The', 'The']

In [89]:
matches = re.finditer('The', lazydog)

for match in matches:
    print(match)

<re.Match object; span=(0, 3), match='The'>
<re.Match object; span=(44, 47), match='The'>
<re.Match object; span=(89, 92), match='The'>
<re.Match object; span=(173, 176), match='The'>
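One behavior worth knowing: once the pattern contains parenthesized groups, ```findall``` returns the groups instead of the whole match (a tuple per match when there are several groups), whereas ```finditer``` still yields full match objects. The sample string is made up:

```python
import re

text = 'abc123 999xyz 7q'

# With no groups, findall returns the whole matches
print(re.findall(r'\d+[a-z]+', text))      # ['999xyz', '7q']
# With groups, findall returns the groups (one tuple per match)
print(re.findall(r'(\d+)([a-z]+)', text))  # [('999', 'xyz'), ('7', 'q')]

# finditer yields Match objects, so positions are still available
for m in re.finditer(r'(\d+)([a-z]+)', text):
    print(m.span(), m.groups())
# (7, 13) ('999', 'xyz')
# (14, 16) ('7', 'q')
```

Since ```finditer``` returns an iterator, it can only be looped over once; call it again (or wrap it in ```list```) to reuse the matches.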
