## Lecture 5: [1/27]

## Regular Expressions

---

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

**Initialization**
* `lazydog`: This variable is assigned a long string containing multiple occurrences of the word "The".

**Importing the re Module**:
* `import re`: This line imports the `re` module, which provides functions for working with regular expressions in Python.

**Defining the `find_all` Function:**
* `def find_all(pattern, s)`: This defines a function named find_all that takes two arguments:
    * `pattern`: The regular expression pattern to search for.
    * `s`: The string in which to search for the pattern.
    * The function is designed to find all occurrences of the given pattern in the string and return a list of all the matches.
    * `start_index = 0`: This line initializes a variable `start_index` to 0. This variable will be used to keep track of the starting position for each search within the string.
    * `matches = []`: This line creates an empty list called `matches`. This list will be used to store all the matches found.
    * `while True`: This line starts a `while` loop that continues until it is explicitly stopped (here, by the `return` statement).

* `match = re.search(pattern, s[start_index:])`: 
    * This line uses the `re.search()` function from the `re` module to search for the specified `pattern` within the string `s`, starting from the `start_index`.
    * `re.search()` returns a `match` object if the pattern is found, otherwise it returns `None`.

* `if match`: This line checks if a match was found.

* `matches.append(match.group())`
    * If a match is found, this line appends the matched text `(match.group())` to the `matches` list.

* `start_index += match.end()`
    * `match.end()` is measured from the start of the slice `s[start_index:]`, so adding it to `start_index` moves the search position to just past the current match in the full string. The next search starts from this position.

* `else`: This block executes if no match is found.

* `return matches`: Once no further match is found, the function returns the list of all matches collected so far.

**Finding All Occurrences of "The"**:
* `for match in re.finditer('The', lazydog)`: This loop iterates over all the matches of the pattern "The" found in the `lazydog` string using the `re.finditer()` function.
* `print(match)`: Inside the loop, each match object is printed. The output shows the starting and ending indices of each match within the string.



In [3]:
lazydog = '''The quick brown fox jumps over the lazydog. The quick brown fox jumps over the lazy dog.
The quick brown then fox 123 abc123 999xyz jumps he over the -he lazy      she dog. The quick brown fox jumps over the lazy$!@dog.'''

In [4]:
import re

In [5]:
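def find_all(pattern, s):
    """Return list of all matches of pattern in s"""
    start_index = 0  # where the next search begins in s
    matches = []
    while True:
        # Search only the part of s that has not been scanned yet
        match = re.search(pattern, s[start_index:])
        if match:
            matches.append(match.group())
            # match.end() is relative to the slice, so add it to start_index
            start_index += match.end()
        else:
            return matches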
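
The loop above can also be written without slicing, by passing a start position to a compiled pattern's `search()` method. The sketch below is only an illustration, not the lecture's code: the name `find_all_loop` and the sample strings are assumptions. It steps one character forward after an empty match so that a pattern such as `a*` cannot stall the loop.

```python
import re

def find_all_loop(pattern, s):
    """Return a list of non-overlapping matches of pattern in s (illustrative sketch)."""
    compiled = re.compile(pattern)
    pos = 0
    matches = []
    while pos <= len(s):
        match = compiled.search(s, pos)  # search the full string from index pos
        if match is None:
            break
        matches.append(match.group())
        # Move past the match; after an empty match, step one character
        # forward so the loop always makes progress.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return matches

print(find_all_loop('The', 'The cat. The dog.'))
print(find_all_loop('9+', 'a9b99'))
```

Because positions here refer to the full string, `match.end()` can be assigned directly instead of being added to a running offset.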

In [6]:
for match in re.finditer('The', lazydog):
    print(match)

<re.Match object; span=(0, 3), match='The'>
<re.Match object; span=(44, 47), match='The'>
<re.Match object; span=(89, 92), match='The'>
<re.Match object; span=(173, 176), match='The'>
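
The span shown for each match object can also be read programmatically. The following sketch uses a short made-up string (`text` is an assumption, not the lecture's `lazydog`) to show `group()`, `start()`, `end()`, and `span()` side by side.

```python
import re

text = "The cat saw The dog."  # hypothetical sample string

for match in re.finditer('The', text):
    # span() is (start, end); end is one past the last matched character
    print(match.group(), match.start(), match.end(), match.span())

spans = [m.span() for m in re.finditer('The', text)]
print(spans)
```

Slicing the original string with a span, as in `text[12:15]`, gives back the matched text.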


<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

* This code uses the `re.finditer()` function from the `re` (regular expressions) module to find all occurrences of the word "The" within a given string.

1. `for match in re.finditer('The', lazydog)`:
* This line iterates over all the matches of the pattern "The" found in the `lazydog` string using the `re.finditer()` function.
* `re.finditer('The', lazydog)` finds all non-overlapping matches of the pattern `'The'` in the string `lazydog` and returns an iterator over the corresponding match objects.

2. `print(match.group())`
* Inside the loop, this line prints the actual text that was matched by the regular expression. `match.group()` extracts the matched substring from the original string.


In [7]:
for match in re.finditer('The', lazydog):
    print(match.group())

The
The
The
The


<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`matches = []`
* This line creates an empty list named `matches`. This list will be used to store all the matches found.

`for match in re.finditer('The', lazydog)`:
* This line iterates over all the matches of the pattern "The" found in the `lazydog` string using the `re.finditer()` function.
* `re.finditer('The', lazydog)` finds all non-overlapping matches of the pattern `'The'` in the string `lazydog` and returns an iterator over the corresponding match objects.

`matches.append(match.group())`
* Inside the loop, this line appends the actual text that was matched by the regular expression to the `matches` list. `match.group()` extracts the matched substring from the original string.

`matches`
* As the last expression in the cell, this displays the `matches` list, which now contains all the occurrences of the word "The" found in the string.


In [8]:
matches = []
for match in re.finditer('The', lazydog):
    matches.append(match.group())

matches

['The', 'The', 'The', 'The']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`matches = [match.group() for match in re.finditer('The', lazydog)]`
* This line is the core of the code. It uses a list comprehension to create a list of all the matches found.
* `re.finditer('The', lazydog)`: This part finds all non-overlapping matches of the pattern `'The'` in the string `lazydog` and returns an iterator over the corresponding match objects.
* `for match in ...`: This iterates through each match object returned by `re.finditer()`.
* `match.group()`: This extracts the actual text that was matched by the regular expression from each match object.
* `[match.group() for match in ...]`: This list comprehension creates a new list where each element is the result of calling `match.group()` for each `match` in the iterator.

`matches`
* As the last expression in the cell, this displays the `matches` list, which now contains all the occurrences of the word "The" found in the string.


In [9]:
matches = [match.group() for match in re.finditer('The', lazydog)]
matches

['The', 'The', 'The', 'The']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`def find_all(pattern, s)`:
* This line defines a function named `find_all` that takes two arguments:
* `pattern`: The regular expression pattern to search for.
* `s`: The string in which to search for the pattern.

`return [match.group() for match in re.finditer(pattern, s)]`
* This line is the core of the function. It uses list comprehension to create a list of all the matches found.
* `re.finditer(pattern, s)`: This part finds all non-overlapping matches of the `pattern` in the string `s` and returns an iterator over these matches.
* `for match in ...`: This iterates through each `match` object returned by `re.finditer()`.
* `match.group()`: This extracts the actual text that was matched by the regular expression from each `match` object.
* `[match.group() for match in ...]`: This list comprehension creates a new list where each element is the result of calling `match.group()` for each `match` in the iterator.

`find_all('The', lazydog)`
* This line calls the `find_all` function with the pattern `'The'` and the string `lazydog` as arguments; the notebook displays the returned list of matches as the cell's output.


In [13]:
def find_all(pattern, s):
    """Return list of all matches of pattern in s"""
    return [match.group() for match in re.finditer(pattern, s)]

In [10]:
find_all('The', lazydog)

['The', 'The', 'The', 'The']
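
For patterns without capture groups, this `find_all` returns the same list as the built-in `re.findall()`. The two differ once the pattern contains a group, as the sketch below shows. The string `sample` is an assumption made for illustration.

```python
import re

def find_all(pattern, s):
    """Return list of all matches of pattern in s"""
    return [match.group() for match in re.finditer(pattern, s)]

sample = "The fox. The dog."  # hypothetical sample string

print(find_all('The', sample))      # whole matches
print(re.findall('The', sample))    # same result for a group-free pattern
print(find_all('(T)he', sample))    # group() is still the whole match
print(re.findall('(T)he', sample))  # findall returns the group's text instead
```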

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

1. `re.findall('[ab]azy', lazydog)`
* This line searches for the pattern `[ab]azy` within the string `lazydog`.
* `[ab]` is a character set that matches either 'a' or 'b'.
* The output is an empty list `[]` because there are no occurrences of "aazy" or "bazy" in the string.

2. `re.findall('[abl]azy', lazydog)`
* This line searches for the pattern `[abl]azy` within the string `lazydog`.
* `[abl]` is a character set that matches 'a', 'b', or 'l'.
* The output is `['lazy', 'lazy', 'lazy', 'lazy']` because there are four occurrences of "lazy" in the string, and "l" matches the character set.

3. `re.findall('[^ab]azy', lazydog)`
* This line searches for the pattern `[^ab]azy` within the string `lazydog`.
* `[^ab]` is a negated character set that matches any character except 'a' or 'b'.
* The output is also `['lazy', 'lazy', 'lazy', 'lazy']` because all occurrences of "azy" in the string are preceded by 'l', which is neither 'a' nor 'b'.

In [11]:
re.findall('[ab]azy', lazydog)

[]

In [14]:
re.findall('[abl]azy', lazydog)

['lazy', 'lazy', 'lazy', 'lazy']

In [15]:
re.findall('[^ab]azy', lazydog)

['lazy', 'lazy', 'lazy', 'lazy']
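
A smaller string makes the difference between the three character sets easier to see. The sketch below assumes a made-up `sample` in which "azy" is preceded by 'l', 'h', and 'b'.

```python
import re

sample = "lazy hazy bazy"  # hypothetical sample string

in_ab = re.findall('[ab]azy', sample)    # first character must be 'a' or 'b'
in_abl = re.findall('[abl]azy', sample)  # 'a', 'b', or 'l'
not_ab = re.findall('[^ab]azy', sample)  # anything except 'a' or 'b'

print(in_ab)
print(in_abl)
print(not_ab)
```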

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

* This code uses the `re.findall()` function from the `re` (regular expressions) module to find all occurrences of patterns within the string `lazydog`. These patterns use character sets in which the position of `^` (caret) decides whether it negates the set or is matched literally.

1. `re.findall('[at]he', lazydog)`
* This line searches for the pattern `[at]he` within the string `lazydog`.
* `[at]` is a character set that matches either 'a' or 't'.
* The output is `['the', 'the', 'the', 'the', 'the']` because "the" occurs five times in the string (once inside "then"), while "ahe" never occurs.

2. `re.findall('[^at]he', lazydog)`
* This line searches for the pattern `[^at]he` within the string `lazydog`.
* `[^at]` is a negated character set that matches any character except 'a' or 't'.
* The output is `['The', 'The', 'The', ' he', '-he', 'she', 'The']` because these occurrences of "he" are preceded by a character other than 'a' or 't' (here 'T', a space, '-', or 's').

3. `re.findall('[a^t]he', lazydog)`
* This line searches for the pattern `[a^t]he` within the string `lazydog`.
* `[a^t]` is a character set that matches 'a', '^', or 't'. A caret that is not the first character of the set is matched literally and does not negate it.
* The output is `['the', 'the', 'the', 'the', 'the']` because 't' is the only member of the set that ever precedes "he"; the string contains neither "ahe" nor "^he".

In [16]:
re.findall('[at]he', lazydog)

['the', 'the', 'the', 'the', 'the']

In [17]:
re.findall('[^at]he', lazydog)

['The', 'The', 'The', ' he', '-he', 'she', 'The']

In [18]:
re.findall('[a^t]he', lazydog)

['the', 'the', 'the', 'the', 'the']
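
The role of the caret depends only on its position inside the brackets. The following sketch uses a made-up `sample` that actually contains "^he" and "ahe", so the literal caret has something to match.

```python
import re

sample = "the ^he ahe she"  # hypothetical sample string

negated = re.findall('[^at]he', sample)  # leading ^ negates: not 'a', not 't'
literal = re.findall('[a^t]he', sample)  # ^ in the middle is an ordinary character

print(negated)
print(literal)
```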

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

1. `re.findall('[a-zA-Z]he', lazydog)`
* This line searches for the pattern `[a-zA-Z]he` within the string `lazydog`.
* `[a-zA-Z]` is a character set that matches any lowercase or uppercase letter.
* The output is `['The', 'the', 'The', 'the', 'The', 'the', 'the', 'she', 'The', 'the']`. The occurrences " he" and "-he" are missing because a space and a hyphen are not letters.

2. `re.findall('[a-zA-Z-]he', lazydog)`
* This line searches for the pattern `[a-zA-Z-]he` within the string `lazydog`.
* `[a-zA-Z-]` is a character set that matches any lowercase letter, any uppercase letter, and the hyphen '-'. A hyphen placed last in the set is matched literally instead of forming a range.
* The output is `['The', 'the', 'The', 'the', 'The', 'the', 'the', '-he', 'she', 'The', 'the']` because now the hyphen '-' is also included in the set of matching characters.

3. `re.findall('[-a-zA-Z]he', lazydog)`
* This line searches for the pattern `[-a-zA-Z]he` within the string `lazydog`.
* `[-a-zA-Z]` matches the same characters as the previous set. A hyphen placed first in the set is also matched literally.
* The output is the same as for the previous line because the two sets are equivalent.

**Key Points**:
* A hyphen between two characters of a set defines a range (`a-z`); a hyphen at the very start or end of the set is a literal '-'.
* The `re.findall()` function returns a list of all the non-overlapping matches found in the string.

In [19]:
re.findall('[a-zA-Z]he', lazydog)

['The', 'the', 'The', 'the', 'The', 'the', 'the', 'she', 'The', 'the']

In [20]:
re.findall('[a-zA-Z-]he', lazydog)

['The', 'the', 'The', 'the', 'The', 'the', 'the', '-he', 'she', 'The', 'the']

In [21]:
re.findall('[-a-zA-Z]he', lazydog)

['The', 'the', 'The', 'the', 'The', 'the', 'the', '-he', 'she', 'The', 'the']
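
The three hyphen placements can be compared on a short string. In this sketch `sample` is a made-up string, and the fourth pattern, which escapes the hyphen with a backslash, is an addition that does not appear in the lecture.

```python
import re

sample = "the -he She"  # hypothetical sample string

letters_only = re.findall('[a-zA-Z]he', sample)     # no hyphen in the set
hyphen_last = re.findall('[a-zA-Z-]he', sample)     # trailing hyphen is literal
hyphen_first = re.findall('[-a-zA-Z]he', sample)    # leading hyphen is literal
hyphen_escaped = re.findall(r'[a-z\-A-Z]he', sample)  # escaped hyphen is literal anywhere

print(letters_only)
print(hyphen_last, hyphen_first, hyphen_escaped)
```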

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

1. `re.findall('[a-zA-Z1-]he', lazydog)`
* This line searches for the pattern `[a-zA-Z1-]he` within the string `lazydog`.
* `[a-zA-Z1-]` is a character set that matches any lowercase letter, any uppercase letter, the digit '1', and the hyphen '-'.
* The output is `['The', 'the', 'The', 'the', 'The', 'the', 'the', '-he', 'she', 'The', 'the']`, the same as without the '1', because the string contains no "1he".

2. `re.findall('.he', lazydog)`
* This line searches for the pattern `.he` within the string `lazydog`.
* `.` is a special character in regular expressions that matches any character (except newline).
* The output is a list containing all occurrences of "he" preceded by any character, including spaces, hyphens, and letters.

**Key Points**:
* Character sets can include ranges of characters (e.g., `a-z`, `A-Z`), individual characters, and special characters like `-` (hyphen) and `^` (caret).
* The period `.` is a special character in regular expressions that matches any single character except for a newline.
* The `re.findall()` function returns a list of all the non-overlapping matches found in the string.

In [22]:
re.findall('[a-zA-Z1-]he', lazydog)

['The', 'the', 'The', 'the', 'The', 'the', 'the', '-he', 'she', 'The', 'the']

In [23]:
re.findall('.he', lazydog)

['The',
 'the',
 'The',
 'the',
 'The',
 'the',
 ' he',
 'the',
 '-he',
 'she',
 'The',
 'the']
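
The "except newline" behaviour of `.` can be checked directly. The sketch below uses a made-up two-line `sample` and the `re.DOTALL` flag, which is not used in the lecture but is the standard way to let `.` match a newline as well.

```python
import re

sample = "x\nhe the"  # hypothetical string; "he" directly follows a newline

default = re.findall('.he', sample)                   # '.' does not match '\n'
dotall = re.findall('.he', sample, flags=re.DOTALL)   # '.' matches '\n' too

print(default)
print(dotall)
```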

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

1. `re.findall('\\d', lazydog)`
* This line searches for the pattern `\d` within the string `lazydog`. In an ordinary string literal the backslash is itself escaped, so `'\\d'` denotes the two characters `\d`.
* `\d` is a special sequence that matches any digit (0-9).
* The output is `['1', '2', '3', '1', '2', '3', '9', '9', '9']` because it extracts all the digits from the string.

2. `re.findall(r'\d', lazydog)`
* This line does the same as the previous line. The `r` prefix makes the literal a raw string, in which backslashes are kept exactly as written, so `r'\d'` is the same two-character pattern as `'\\d'` without the doubled backslash.

3. `re.findall(r'\dhe', lazydog)`
* This line searches for the pattern `\dhe` within the string `lazydog`.
* `\dhe` matches a digit followed by "he".
* The output is `[]` because there are no occurrences of a digit followed by "he" in the string.

4. `re.findall(r'\Dhe', lazydog)`
* This line searches for the pattern `\Dhe` within the string `lazydog`.
* `\D` is a special sequence that matches any character that is not a digit.
* The output is a list containing all occurrences of "he" preceded by any character that is not a digit.

**In essence, this code snippet demonstrates how to use special sequences like `\d` (digit) and `\D` (non-digit) within regular expressions to define patterns and extract specific information from a string.**




In [24]:
re.findall('\\d', lazydog)

['1', '2', '3', '1', '2', '3', '9', '9', '9']

In [25]:
re.findall(r'\d', lazydog)

['1', '2', '3', '1', '2', '3', '9', '9', '9']

In [26]:
re.findall(r'\dhe', lazydog)

[]

In [27]:
re.findall(r'\Dhe', lazydog)

['The',
 'the',
 'The',
 'the',
 'The',
 'the',
 ' he',
 'the',
 '-he',
 'she',
 'The',
 'the']
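
Raw strings matter most when a regex escape collides with a Python string escape. The sketch below first confirms that `'\\d'` and `r'\d'` are the same two characters, then uses `\b` as an example. `\b` is not covered in this lecture: in a regex it means a word boundary, while in an ordinary Python string it is a backspace character. The string `sample` is made up for illustration.

```python
import re

same_pattern = ('\\d' == r'\d')  # both are a backslash followed by 'd'
pattern_length = len(r'\d')
print(same_pattern, pattern_length)

sample = "he then she he"  # hypothetical sample string

raw = re.findall(r'\bhe\b', sample)     # regex sees \b: "he" as a whole word
not_raw = re.findall('\bhe\b', sample)  # Python turns \b into a backspace character

print(raw)
print(not_raw)
```

Writing every pattern as a raw string avoids having to remember which escapes Python would otherwise interpret first.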

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

1. `re.findall(r'\she', lazydog)`
* This line searches for the pattern `\she` within the string `lazydog`.
* `\s` is a special sequence that matches any whitespace character (space, tab, newline, etc.).
* The output is `[' he']` because it finds the one occurrence of "he" preceded by a whitespace character; the space is part of the match.

2. `re.findall(r'\She', lazydog)`
* This line searches for the pattern `\She` within the string `lazydog`.
* `\S` is a special sequence that matches any non-whitespace character.
* The output is `['The', 'the', 'The', 'the', 'The', 'the', 'the', '-he', 'she', 'The', 'the']` because it finds all occurrences of "he" preceded by a non-whitespace character.

3. `re.findall(r'\whe', lazydog)`
* This line searches for the pattern `\whe` within the string `lazydog`.
* `\w` is a special sequence that matches any word character (letters, digits, and underscores).
* The output is `['The', 'the', 'The', 'the', 'The', 'the', 'the', 'she', 'The', 'the']` because it finds all occurrences of "he" preceded by a word character.

4. `re.findall(r'\Whe', lazydog)`
* This line searches for the pattern `\Whe` within the string `lazydog`.
* `\W` is a special sequence that matches any non-word character (anything that is not a letter, digit, or underscore).
* The output is `[' he', '-he']` because it finds the occurrences of "he" preceded by a non-word character (a space and a hyphen).

**In essence, this code snippet demonstrates how to use special sequences like `\s`, `\S`, `\w`, and `\W` within regular expressions to match whitespace, non-whitespace, word, and non-word characters, enabling you to extract specific patterns from a string.**

In [28]:
re.findall(r'\she', lazydog)

[' he']

In [29]:
re.findall(r'\She', lazydog)

['The', 'the', 'The', 'the', 'The', 'the', 'the', '-he', 'she', 'The', 'the']

In [30]:
re.findall(r'\whe', lazydog)

['The', 'the', 'The', 'the', 'The', 'the', 'the', 'she', 'The', 'the']

In [31]:
re.findall(r'\Whe', lazydog)

[' he', '-he']
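
The four classes can be compared on one short string in which "he" follows a space, a hyphen, a letter, a digit, and an underscore. The string `sample` below is an assumption made for illustration.

```python
import re

sample = "x he -he she 9he _he"  # hypothetical sample string

whitespace = re.findall(r'\she', sample)      # space, tab, newline, ...
non_whitespace = re.findall(r'\She', sample)
word = re.findall(r'\whe', sample)            # letter, digit, or underscore
non_word = re.findall(r'\Whe', sample)

print(whitespace)
print(non_whitespace)
print(word)
print(non_word)
```

Note that the digit and the underscore count as word characters, while the hyphen does not.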

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

* This code uses the `re.findall()` function from the `re` (regular expressions) module to find all occurrences of patterns within the string `lazydog`. These patterns use the `*` quantifier, which matches zero or more occurrences of whatever precedes it.

1. `re.findall(r'a*', lazydog)`
* This line searches for the pattern `a*` within the string `lazydog`.
* `*` is a quantifier that matches zero or more occurrences of the preceding character or group. In this case, it matches zero or more occurrences of the letter 'a'.
* The output is a long list containing an empty string `''` for every position where zero 'a' characters match and `'a'` wherever the letter occurs, because a match of zero characters still counts as a match.

2. `re.findall(r'.*', lazydog)`
* This line searches for the pattern `.*` within the string `lazydog`.
* `.` matches any character (except newline), and `*` matches zero or more occurrences.
* Because `.` stops at a newline, `.*` matches each line of the string in full rather than the whole string at once; each full-line match is followed by an empty match at the end of that line.

3. `re.findall(r'*', lazydog)`
* A bare `*` has nothing before it to repeat, so the pattern is invalid and `re` raises `error: nothing to repeat at position 0`.

**Key Points**:
* `*` is a quantifier that matches zero or more occurrences of the preceding character or group; it cannot stand at the start of a pattern.
* `.` is a special character that matches any character (except newline).
* The `re.findall()` function returns a list of all the non-overlapping matches found in the string.

In [32]:
re.findall(r'a*', lazydog)

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'a',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'a',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'a',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'a',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 

In [10]:
re.findall(r'*', lazydog)

error: nothing to repeat at position 0
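
The error above can be caught as `re.error`, and a literal asterisk can be matched by escaping it. The sketch below is illustrative: `sample` is a made-up string, and `re.escape()` is a standard-library helper that the lecture does not use.

```python
import re

try:
    re.compile('*')  # a quantifier with nothing before it
except re.error as exc:
    message = str(exc)
    print(message)

sample = "2*3*4"  # hypothetical sample string

escaped = re.findall(r'\*', sample)  # backslash makes * literal
in_set = re.findall('[*]', sample)   # inside a character set * is literal too
print(escaped, in_set)
print(re.escape('*'))                # builds the escaped pattern for you
```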

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

1. `re.findall('99*', lazydog)`
* This line searches for the pattern `99*` within the string `lazydog`.
* `*` is a quantifier that matches zero or more occurrences of the preceding character or group. Here the pattern is one '9' followed by zero or more further '9's.
* The output is `['999']` because `*` is greedy and takes all three consecutive '9's in one match.

2. `re.findall('99*?', lazydog)`
* This line searches for the pattern `99*?` within the string `lazydog`.
* The `?` after `*` makes the quantifier non-greedy, so after the first '9' it matches as few further '9's as possible, which is zero.
* The output is `['9', '9', '9']` because each '9' is matched on its own.

3. `re.findall('.*', lazydog)`
* This line searches for the pattern `.*` within the string `lazydog`.
* `.` matches any character (except newline), and `*` matches zero or more occurrences.
* The output has four elements: each of the two lines of `lazydog` in full, each followed by an empty string. `.` cannot cross the newline, so every line is matched separately, and `.*` then matches zero characters at the end of each line.

**Key Points**:
* `*` matches zero or more occurrences of the preceding character or group.
* `?` on its own matches zero or one occurrence of the preceding character or group; placed after another quantifier (as in `*?`), it makes that quantifier non-greedy.

In [34]:
re.findall('99*', lazydog)

['999']

In [35]:
re.findall('99*?', lazydog)

['9', '9', '9']

In [36]:
re.findall('.*', lazydog)

['The quick brown fox jumps over the lazydog. The quick brown fox jumps over the lazy dog.',
 '',
 'The quick brown then fox 123 abc123 999xyz jumps he over the -he lazy      she dog. The quick brown fox jumps over the lazy$!@dog.',
 '']
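
The difference between a greedy and a non-greedy `*` is easiest to see when a closing delimiter appears more than once. The sketch below uses a made-up `sample` with angle-bracket tags; the tag example is an illustration, not part of the lecture.

```python
import re

sample = "<b>bold</b> and <i>italic</i>"  # hypothetical sample string

greedy = re.findall('<.*>', sample)  # .* runs to the LAST '>' in the string
lazy = re.findall('<.*?>', sample)   # .*? stops at the FIRST '>' it can

print(greedy)
print(lazy)
```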

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

1. `re.findall('.*?', lazydog)`
* This line searches for the pattern `.*?` within the string `lazydog`.
* `.` matches any character (except newline).
* `*` is a quantifier that matches zero or more occurrences of the preceding character or group.
* `?` placed after a quantifier (`*` in this case) makes that quantifier non-greedy.

**Non-greedy matching**:
* By default, quantifiers like `*` are greedy. This means they try to match as much of the string as possible.
* The `?` after the quantifier makes it non-greedy. This means it will match as few characters as possible while still satisfying the pattern.

**In this case**:
* The shortest string that satisfies `.*?` is the empty string, so the first match at every position is `''`.
* After an empty match, the next match at the same position must be non-empty, so `.*?` then takes exactly one character. The output therefore alternates between empty strings and the individual characters of `lazydog` (this is the behaviour of Python 3.7 and later).




In [37]:
re.findall('.*?', lazydog)

['',
 'T',
 '',
 'h',
 '',
 'e',
 '',
 ' ',
 '',
 'q',
 '',
 'u',
 '',
 'i',
 '',
 'c',
 '',
 'k',
 '',
 ' ',
 '',
 'b',
 '',
 'r',
 '',
 'o',
 '',
 'w',
 '',
 'n',
 '',
 ' ',
 '',
 'f',
 '',
 'o',
 '',
 'x',
 '',
 ' ',
 '',
 'j',
 '',
 'u',
 '',
 'm',
 '',
 'p',
 '',
 's',
 '',
 ' ',
 '',
 'o',
 '',
 'v',
 '',
 'e',
 '',
 'r',
 '',
 ' ',
 '',
 't',
 '',
 'h',
 '',
 'e',
 '',
 ' ',
 '',
 'l',
 '',
 'a',
 '',
 'z',
 '',
 'y',
 '',
 'd',
 '',
 'o',
 '',
 'g',
 '',
 '.',
 '',
 ' ',
 '',
 'T',
 '',
 'h',
 '',
 'e',
 '',
 ' ',
 '',
 'q',
 '',
 'u',
 '',
 'i',
 '',
 'c',
 '',
 'k',
 '',
 ' ',
 '',
 'b',
 '',
 'r',
 '',
 'o',
 '',
 'w',
 '',
 'n',
 '',
 ' ',
 '',
 'f',
 '',
 'o',
 '',
 'x',
 '',
 ' ',
 '',
 'j',
 '',
 'u',
 '',
 'm',
 '',
 'p',
 '',
 's',
 '',
 ' ',
 '',
 'o',
 '',
 'v',
 '',
 'e',
 '',
 'r',
 '',
 ' ',
 '',
 't',
 '',
 'h',
 '',
 'e',
 '',
 ' ',
 '',
 'l',
 '',
 'a',
 '',
 'z',
 '',
 'y',
 '',
 ' ',
 '',
 'd',
 '',
 'o',
 '',
 'g',
 '',
 '.',
 '',
 '',
 'T',
 '',
 'h',
 '',


<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('9?', lazydog)`
* This line searches for the pattern `9?` within the string `lazydog`.
* `?` is a quantifier that matches zero or one occurrence of the preceding character or group. In this case, it matches zero or one occurrence of the digit '9'.

**What the output looks like**:
* At every position that does not hold a '9', the pattern matches zero characters, which produces an empty string `''`. Because `?` is greedy, each '9' of "999" is matched as `'9'`, so the output is a long run of empty strings with `'9', '9', '9'` in the middle.

**In essence, this code snippet demonstrates how to use the `?` quantifier in regular expressions to match zero or one occurrence of a character or group of characters.**

**Key Points**:
* `?` matches zero or one occurrence of the preceding character or group.
* The `re.findall()` function returns a list of all the non-overlapping matches found in the string.



In [38]:
re.findall('9?', lazydog)

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '9',
 '9',
 '9',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 ...

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

* `re.findall('9??', lazydog)`
* This line uses the `re` module (regular expressions) in Python to find all matches of the pattern `9??` within the string `lazydog`.
* `re.findall()`: This function is used to find all non-overlapping matches of a pattern in a given string.
* `'9??'`: This is the regular expression pattern:
* `9`: Matches the digit "9" literally.
* The first `?`: This is a quantifier that matches zero or one occurrence of the preceding element (in this case, the digit "9").
* The second `?`: This makes the preceding quantifier non-greedy, so zero occurrences are preferred over one.
* As a result, every position first yields an empty match. In the output, each `'9'` of "999" appears only after an empty string found at the same position.


In [39]:
re.findall('9??', lazydog)

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '9',
 '',
 '9',
 '',
 '9',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 ...

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('9+', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9+'` within the string `lazydog`.
* `'9+'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `'+'` is a quantifier that matches one or more occurrences of the preceding element (in this case, the digit "9").
* Since there is a sequence of "999" in the string and `+` is greedy, the output is `['999']`.

`re.findall('9+?', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9+?'` within the string `lazydog`.
* `'9+?'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `'+'` is a quantifier that matches one or more occurrences of the preceding element (in this case, the digit "9").
* `'?'` after the `'+'` makes it a non-greedy match. It tries to match as few characters as possible while still satisfying the pattern.
* This will find each individual "9" in the sequence "999", resulting in the output `['9', '9', '9']`.


`re.findall('9{2}', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9{2}'` within the string `lazydog`.
* `'9{2}'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{2}` is a quantifier that matches exactly two occurrences of the preceding element (in this case, the digit "9").
* In the sequence "999" it matches the first two "9"s; the single "9" that remains is not enough for another match, so the output is `['99']`.

In [40]:
re.findall('9+', lazydog)

['999']

In [41]:
re.findall('9+?', lazydog)

['9', '9', '9']

In [42]:
re.findall('9{2}', lazydog)

['99']
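
The three quantifiers behave differently once the string contains runs of several lengths. The sketch below assumes a made-up `sample` with runs of one, two, and four nines.

```python
import re

sample = "9 99 9999"  # hypothetical sample string

one_or_more = re.findall('9+', sample)        # greedy: each run in one piece
one_or_more_lazy = re.findall('9+?', sample)  # lazy: one '9' at a time
exactly_two = re.findall('9{2}', sample)      # pairs only; a lone '9' is skipped

print(one_or_more)
print(one_or_more_lazy)
print(exactly_two)
```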

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('9{1}', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern '9{1}' within the string "lazydog".
* `'9{1}'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{1}` is a quantifier that matches exactly one occurrence of the preceding element (in this case, the digit "9").
* Since there is a sequence of "999" in the string, it will find each individual "9", resulting in the output `['9', '9', '9']`.

`re.findall('9{3}', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9{3}'` within the string "lazydog".
* `'9{3}'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{3}` is a quantifier that matches exactly three occurrences of the preceding element (in this case, the digit "9").
* Since there is a sequence of "999" in the string, it will find the entire sequence, resulting in the output `['999']`.

`re.findall('9{2,3}', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9{2,3}'` within the string "lazydog".
* `'9{2,3}'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{2,3}` is a quantifier that matches between 2 and 3 occurrences of the preceding element (in this case, the digit "9").
* Since there is a sequence of "999" in the string, it will find the entire sequence, resulting in the output `['999']`.

In [43]:
re.findall('9{1}', lazydog)

['9', '9', '9']

In [44]:
re.findall('9{3}', lazydog)

['999']

In [45]:
re.findall('9{2,3}', lazydog)

['999']
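
With a run longer than three, the difference between `{3}` and `{2,3}` becomes visible. The sketch below assumes a made-up `sample` of five nines.

```python
import re

sample = "99999"  # hypothetical sample string

exactly_one = re.findall('9{1}', sample)     # same as the plain pattern '9'
exactly_three = re.findall('9{3}', sample)   # the leftover '99' is too short
two_to_three = re.findall('9{2,3}', sample)  # greedy: three first, then two

print(exactly_one)
print(exactly_three)
print(two_to_three)
```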

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('9{1,2}', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `'9{1,2}'` within the string `lazydog`.
* `'9{1,2}'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{1,2}` is a quantifier that matches between 1 and 2 occurrences of the preceding element (in this case, the digit "9").
* Since matches cannot overlap, the greedy quantifier first takes two "9"s from the sequence "999" and then matches the one "9" that is left, resulting in the output `['99', '9']`.

`re.findall('9{1,2}?', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9{1,2}?'` within the string `lazydog`.
* `'9{1,2}?'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{1,2}` is a quantifier that matches between 1 and 2 occurrences of the preceding element (in this case, the digit "9").
* `'?'` after the quantifier makes it non-greedy, meaning it will try to match as few characters as possible while still satisfying the pattern.
* Since there is a sequence of "999" in the string, it will find each individual "9", resulting in the output `['9', '9', '9']`.

`re.findall('9{1,}', 'lazydog')`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9{1,}'` within the string "lazydog".
* `'9{1,}'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{1,}` is a quantifier that matches one or more occurrences of the preceding element (in this case, the digit "9").
* Since there is a sequence of "999" in the string, it will find the entire sequence, resulting in the output `['999']`.

`re.findall('9{,2}', 'lazydog')`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9{,2}'` within the string "lazydog".
* `'9{,2}'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{,2}` is a quantifier that matches zero to two occurrences of the preceding element (in this case, the digit "9").
* This will find all empty strings (zero occurrences of "9") between characters and the individual "9"s in the sequence "999", resulting in multiple empty strings being found.
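
The difference between these four quantifiers is easier to see on a short string. This is a minimal sketch on a made-up `sample` string (an assumption, not the lecture's `lazydog` text); the empty matches produced by `{,2}` are filtered out so the interesting part is visible.

```python
import re

# Hypothetical sample text: one run of three 9s between two letters.
sample = "a999b"

greedy = re.findall('9{1,2}', sample)      # takes two, then the one left over
lazy = re.findall('9{1,2}?', sample)       # takes as few as possible each time
open_ended = re.findall('9{1,}', sample)   # one or more, greedy
zero_allowed = re.findall('9{,2}', sample) # also matches the empty string

# Keep only the non-empty matches of the zero-to-two pattern.
non_empty = [m for m in zero_allowed if m]

print(greedy)      # ['99', '9']
print(lazy)        # ['9', '9', '9']
print(open_ended)  # ['999']
print(non_empty)   # ['99', '9']
print(len(zero_allowed) > len(non_empty))  # True: empty matches were present
```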

In [46]:
re.findall('9{1,2}', lazydog)

['99', '9']

In [47]:
re.findall('9{1,2}?', lazydog)

['9', '9', '9']

In [48]:
re.findall('9{1,}', lazydog)

['999']

In [49]:
re.findall('9{,2}', lazydog)

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '99',
 '9',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('9{1,}?', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'9{1,}?'` within the string stored in `lazydog`.
* `'9{1,}?'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{1,}` is a quantifier that matches one or more occurrences of the preceding element (in this case, the digit "9").
* `'?'` after the quantifier makes it non-greedy, meaning it will try to match as few characters as possible while still satisfying the pattern.
* Since there is a sequence of "999" in the string, it will find each individual "9", resulting in the output `['9', '9', '9']`.

`re.findall('9{,2}?', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `'9{,2}?'` within the string stored in `lazydog`.
* `'9{,2}?'` is a regular expression pattern.
* `'9'` matches the digit "9" literally.
* `{,2}` is a quantifier that matches zero to two occurrences of the preceding element (in this case, the digit "9").
* `'?'` after the quantifier makes it non-greedy, meaning it will try to match as few characters as possible while still satisfying the pattern.
* Because the minimum is zero, the non-greedy pattern prefers the empty match at every position. The output is a long list of empty strings, and the three "9"s of the sequence "999" show up one at a time, each separated by an empty string: `'9', '', '9', '', '9'`.

**Summary**
* Both cells add `?` to make the quantifier non-greedy. With `{1,}?` at least one "9" is required, so the output is `['9', '9', '9']`. With `{,2}?` zero repetitions are allowed, so most of the output consists of empty strings, with the three individual "9"s in between.
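
A short experiment makes the effect of a zero minimum visible. This is a sketch on a made-up `sample` string (an assumption), run on Python 3.7 or later, where a non-empty match may start right after an empty one.

```python
import re

# Hypothetical sample text: one run of three 9s between two letters.
sample = "a999b"

lazy_one_or_more = re.findall('9{1,}?', sample)  # at least one "9" per match
lazy_zero_to_two = re.findall('9{,2}?', sample)  # prefers the empty match

# Separate the empty matches from the matched digits.
empty_count = lazy_zero_to_two.count('')
digits = [m for m in lazy_zero_to_two if m]

print(lazy_one_or_more)  # ['9', '9', '9']
print(digits)            # ['9', '9', '9']
print(empty_count > 0)   # True
```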

In [50]:
re.findall('9{1,}?', lazydog)

['9', '9', '9']

In [51]:
re.findall('9{,2}?', lazydog)

['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '9',
 '',
 '9',
 '',
 '9',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall(r'quick\w*', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `r'quick\w*'` within the string stored in `lazydog`.
* `r'quick\w*'` is a raw string regular expression pattern.
* `'quick'` matches the literal string "quick".
* `\w*` matches zero or more word characters (letters, digits, and underscores).
* The word "quick" occurs four times in the string, and each time it is followed by a non-word character, so `\w*` matches nothing extra. The output is `['quick', 'quick', 'quick', 'quick']`.


`re.findall(r'lazy\w*', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `r'lazy\w*'` within the string stored in `lazydog`.
* `r'lazy\w*'` is a raw string regular expression pattern.
* `'lazy'` matches the literal string "lazy".
* `\w*` matches zero or more word characters.
* The output is `['lazydog', 'lazy', 'lazy', 'lazy']`. It finds all occurrences of the string "lazy" followed by any number of word characters. The match stops at the first non-word character, such as a space, `$`, or `.`.

`re.findall(r'lazy\S*', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `r'lazy\S*'` within the string stored in `lazydog`.
* `r'lazy\S*'` is a raw string regular expression pattern.
* `'lazy'` matches the literal string "lazy".
* `\S*` matches zero or more non-whitespace characters (any character except spaces, tabs, and newlines).
* The output is `['lazydog.', 'lazy', 'lazy', 'lazy$!@dog.']`. It finds all occurrences of the string "lazy" followed by any number of non-whitespace characters, so punctuation such as `.` and `$!@` is now included in the match.
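
To compare `\w*` and `\S*` side by side, here is a minimal sketch on a made-up `sample` string (an assumption, chosen to contain both punctuation and a missing word).

```python
import re

# Hypothetical sample text; note that "quick" does not occur in it.
sample = "the lazy$!@dog. the lazydog."

word_chars = re.findall(r'lazy\w*', sample)   # stops at the first non-word character
non_space = re.findall(r'lazy\S*', sample)    # stops only at whitespace
missing = re.findall(r'quick\w*', sample)     # no match at all

print(word_chars)  # ['lazy', 'lazydog']
print(non_space)   # ['lazy$!@dog.', 'lazydog.']
print(missing)     # []
```

An empty list is returned only when the literal part of the pattern never occurs in the string.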


In [52]:
re.findall(r'quick\w*', lazydog)

['quick', 'quick', 'quick', 'quick']

In [53]:
re.findall(r'lazy\w*', lazydog)

['lazydog', 'lazy', 'lazy', 'lazy']

In [54]:
re.findall(r'lazy\S*', lazydog)

['lazydog.', 'lazy', 'lazy', 'lazy$!@dog.']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall(r'\w{1,3}', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `r'\w{1,3}'` within the string stored in `lazydog`.
* `r'\w{1,3}'` is a raw string regular expression pattern.
* `\w` matches any word character (letters, digits, and underscores).
* `{1,3}` is a quantifier that matches between 1 and 3 occurrences of the preceding element (in this case, any word character).
* The output splits every run of word characters into chunks of at most three characters. For example, "quick" becomes `'qui'` and `'ck'`, while "fox" stays whole.

`re.findall(r'\w{1,3}?', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `r'\w{1,3}?'` within the string stored in `lazydog`.
* `r'\w{1,3}?'` is a raw string regular expression pattern.
* `\w` matches any word character (letters, digits, and underscores).
* `{1,3}` is a quantifier that matches between 1 and 3 occurrences of the preceding element (in this case, any word character).
* `'?'` after the quantifier makes it non-greedy, meaning it will try to match as few characters as possible while still satisfying the pattern.
* The output shows a list of individual characters extracted from the string, as the non-greedy quantifier stops after the minimum of one character.

**Summary**
* Both cells use the `re.findall()` function to extract substrings from the string stored in `lazydog`. The difference lies in the `?` after the quantifier in the second cell. It makes the quantifier non-greedy, so the pattern matches the shortest possible substrings that satisfy the condition, resulting in individual characters instead of chunks of up to three characters.
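
The chunking behaviour can be checked on two words. This is a minimal sketch; `sample` is a made-up string, not the lecture's data.

```python
import re

# Hypothetical sample text with a five-letter and a three-letter word.
sample = "quick fox"

greedy_chunks = re.findall(r'\w{1,3}', sample)  # up to three characters per match
lazy_chunks = re.findall(r'\w{1,3}?', sample)   # one character per match

print(greedy_chunks)  # ['qui', 'ck', 'fox']
print(lazy_chunks)    # ['q', 'u', 'i', 'c', 'k', 'f', 'o', 'x']
```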

In [55]:
re.findall(r'\w{1,3}', lazydog)

['The',
 'qui',
 'ck',
 'bro',
 'wn',
 'fox',
 'jum',
 'ps',
 'ove',
 'r',
 'the',
 'laz',
 'ydo',
 'g',
 'The',
 'qui',
 'ck',
 'bro',
 'wn',
 'fox',
 'jum',
 'ps',
 'ove',
 'r',
 'the',
 'laz',
 'y',
 'dog',
 'The',
 'qui',
 'ck',
 'bro',
 'wn',
 'the',
 'n',
 'fox',
 '123',
 'abc',
 '123',
 '999',
 'xyz',
 'jum',
 'ps',
 'he',
 'ove',
 'r',
 'the',
 'he',
 'laz',
 'y',
 'she',
 'dog',
 'The',
 'qui',
 'ck',
 'bro',
 'wn',
 'fox',
 'jum',
 'ps',
 'ove',
 'r',
 'the',
 'laz',
 'y',
 'dog']

In [56]:
re.findall(r'\w{1,3}?', lazydog)

['T',
 'h',
 'e',
 'q',
 'u',
 'i',
 'c',
 'k',
 'b',
 'r',
 'o',
 'w',
 'n',
 'f',
 'o',
 'x',
 'j',
 'u',
 'm',
 'p',
 's',
 'o',
 'v',
 'e',
 'r',
 't',
 'h',
 'e',
 'l',
 'a',
 'z',
 'y',
 'd',
 'o',
 'g',
 'T',
 'h',
 'e',
 'q',
 'u',
 'i',
 'c',
 'k',
 'b',
 'r',
 'o',
 'w',
 'n',
 'f',
 'o',
 'x',
 'j',
 'u',
 'm',
 'p',
 's',
 'o',
 'v',
 'e',
 'r',
 't',
 'h',
 'e',
 'l',
 'a',
 'z',
 'y',
 'd',
 'o',
 'g',
 'T',
 'h',
 'e',
 'q',
 'u',
 'i',
 'c',
 'k',
 'b',
 'r',
 'o',
 'w',
 'n',
 't',
 'h',
 'e',
 'n',
 'f',
 'o',
 'x',
 '1',
 '2',
 '3',
 'a',
 'b',
 'c',
 '1',
 '2',
 '3',
 '9',
 '9',
 '9',
 'x',
 'y',
 'z',
 'j',
 'u',
 'm',
 'p',
 's',
 'h',
 'e',
 'o',
 'v',
 'e',
 'r',
 't',
 'h',
 'e',
 'h',
 'e',
 'l',
 'a',
 'z',
 'y',
 's',
 'h',
 'e',
 'd',
 'o',
 'g',
 'T',
 'h',
 'e',
 'q',
 'u',
 'i',
 'c',
 'k',
 'b',
 'r',
 'o',
 'w',
 'n',
 'f',
 'o',
 'x',
 'j',
 'u',
 'm',
 'p',
 's',
 'o',
 'v',
 'e',
 'r',
 't',
 'h',
 'e',
 'l',
 'a',
 'z',
 'y',
 'd',
 'o',
 'g']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall(r'\w{,3}', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `r'\w{,3}'` within the string stored in `lazydog`.
* `r'\w{,3}'` is a raw string regular expression pattern.
* `\w` matches any word character (letters, digits, and underscores).
* `{,3}` is a quantifier that matches zero to three occurrences of the preceding element (in this case, any word character).
* The output contains chunks of up to three word characters, plus an empty string at every position where no word character starts (spaces, punctuation, and the end of the string).

`re.findall(r'\w{,3}?', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `r'\w{,3}?'` within the string stored in `lazydog`.
* `r'\w{,3}?'` is a raw string regular expression pattern.
* `\w` matches any word character (letters, digits, and underscores).
* `{,3}` is a quantifier that matches zero to three occurrences of the preceding element (in this case, any word character).
* `'?'` after the quantifier makes it non-greedy, meaning it will try to match as few characters as possible while still satisfying the pattern.
* Because the minimum is zero, the non-greedy pattern prefers the empty match. The output alternates between empty strings and individual word characters.

**Summary**
* Both cells use the `re.findall()` function with a quantifier whose minimum is zero, so both outputs contain empty strings. The difference lies in the `?` after the quantifier in the second cell. The greedy version returns chunks of up to three characters, while the non-greedy version returns individual characters separated by empty strings.
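
The role of the empty matches is easier to follow on a tiny input. This sketch uses a made-up `sample` string (an assumption) and Python 3.7 or later; the comments show where each empty string comes from in the greedy case.

```python
import re

# Hypothetical sample text: two short words separated by one space.
sample = "hi you"

# Greedy: 'hi', then an empty match at the space, then 'you',
# then an empty match at the end of the string.
greedy = re.findall(r'\w{,3}', sample)

# Non-greedy: empty matches alternate with single characters.
lazy = re.findall(r'\w{,3}?', sample)
lazy_chars = [m for m in lazy if m]  # drop the empty matches

print(greedy)      # ['hi', '', 'you', '']
print(lazy_chars)  # ['h', 'i', 'y', 'o', 'u']
```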

In [57]:
re.findall(r'\w{,3}', lazydog)

['The',
 '',
 'qui',
 'ck',
 '',
 'bro',
 'wn',
 '',
 'fox',
 '',
 'jum',
 'ps',
 '',
 'ove',
 'r',
 '',
 'the',
 '',
 'laz',
 'ydo',
 'g',
 '',
 '',
 'The',
 '',
 'qui',
 'ck',
 '',
 'bro',
 'wn',
 '',
 'fox',
 '',
 'jum',
 'ps',
 '',
 'ove',
 'r',
 '',
 'the',
 '',
 'laz',
 'y',
 '',
 'dog',
 '',
 '',
 'The',
 '',
 'qui',
 'ck',
 '',
 'bro',
 'wn',
 '',
 'the',
 'n',
 '',
 'fox',
 '',
 '123',
 '',
 'abc',
 '123',
 '',
 '999',
 'xyz',
 '',
 'jum',
 'ps',
 '',
 'he',
 '',
 'ove',
 'r',
 '',
 'the',
 '',
 '',
 'he',
 '',
 'laz',
 'y',
 '',
 '',
 '',
 '',
 '',
 '',
 'she',
 '',
 'dog',
 '',
 '',
 'The',
 '',
 'qui',
 'ck',
 '',
 'bro',
 'wn',
 '',
 'fox',
 '',
 'jum',
 'ps',
 '',
 'ove',
 'r',
 '',
 'the',
 '',
 'laz',
 'y',
 '',
 '',
 '',
 'dog',
 '',
 '']

In [58]:
re.findall(r'\w{,3}?', lazydog)

['',
 'T',
 '',
 'h',
 '',
 'e',
 '',
 '',
 'q',
 '',
 'u',
 '',
 'i',
 '',
 'c',
 '',
 'k',
 '',
 '',
 'b',
 '',
 'r',
 '',
 'o',
 '',
 'w',
 '',
 'n',
 '',
 '',
 'f',
 '',
 'o',
 '',
 'x',
 '',
 '',
 'j',
 '',
 'u',
 '',
 'm',
 '',
 'p',
 '',
 's',
 '',
 '',
 'o',
 '',
 'v',
 '',
 'e',
 '',
 'r',
 '',
 '',
 't',
 '',
 'h',
 '',
 'e',
 '',
 '',
 'l',
 '',
 'a',
 '',
 'z',
 '',
 'y',
 '',
 'd',
 '',
 'o',
 '',
 'g',
 '',
 '',
 '',
 'T',
 '',
 'h',
 '',
 'e',
 '',
 '',
 'q',
 '',
 'u',
 '',
 'i',
 '',
 'c',
 '',
 'k',
 '',
 '',
 'b',
 '',
 'r',
 '',
 'o',
 '',
 'w',
 '',
 'n',
 '',
 '',
 'f',
 '',
 'o',
 '',
 'x',
 '',
 '',
 'j',
 '',
 'u',
 '',
 'm',
 '',
 'p',
 '',
 's',
 '',
 '',
 'o',
 '',
 'v',
 '',
 'e',
 '',
 'r',
 '',
 '',
 't',
 '',
 'h',
 '',
 'e',
 '',
 '',
 'l',
 '',
 'a',
 '',
 'z',
 '',
 'y',
 '',
 '',
 'd',
 '',
 'o',
 '',
 'g',
 '',
 '',
 '',
 'T',
 '',
 'h',
 '',
 'e',
 '',
 '',
 'q',
 '',
 'u',
 '',
 'i',
 '',
 'c',
 '',
 'k',
 '',
 '',
 'b',
 '',
 'r',
 '',
 'o',
 '',

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall(r'\w{2,}', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `r'\w{2,}'` within the string stored in `lazydog`.
* `r'\w{2,}'` is a raw string regular expression pattern.
* `\w` matches any word character (letters, digits, and underscores).
* `{2,}` is a quantifier that matches two or more occurrences of the preceding element (in this case, any word character).
* The output shows a list of words extracted from the string, where each word has a length of two or more characters.

`re.findall(r'\w{2,}?', lazydog)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `r'\w{2,}?'` within the string stored in `lazydog`.
* `r'\w{2,}?'` is a raw string regular expression pattern.
* `\w` matches any word character (letters, digits, and underscores).
* `{2,}` is a quantifier that matches two or more occurrences of the preceding element (in this case, any word character).
* `'?'` after the quantifier makes it non-greedy, meaning it will try to match as few characters as possible while still satisfying the pattern.
* The output shows a list of substrings that are each exactly two characters long, because the non-greedy quantifier stops at the minimum. A single character left over at the end of a word, such as the "k" in "quick", is not matched at all.

`re.findall(r'[a-c]{2,}?', lazydog)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `r'[a-c]{2,}?'` within the string stored in `lazydog`.
* `r'[a-c]{2,}?'` is a raw string regular expression pattern.
* `[a-c]` is a character class that matches any single character between 'a' and 'c' (i.e., 'a', 'b', or 'c').
* `{2,}` is a quantifier that matches two or more occurrences of the preceding element (in this case, any character within the character class).
* `'?'` after the quantifier makes it non-greedy, meaning it will try to match as few characters as possible while still satisfying the pattern.
* In the string, the only place where two characters from the class appear in a row is "abc". The non-greedy quantifier stops after the minimum of two characters, and the remaining "c" is too short to match on its own. Therefore, the output of this code is `['ab']`.



**Summary**
* All three cells use the `re.findall()` function to extract substrings of at least two characters from the string stored in `lazydog`. The first cell is greedy and returns whole words. The second and third cells add the non-greedy `?`, which results in substrings of exactly two characters.
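
The three patterns can be compared on a short input. This is a minimal sketch; `sample` is a made-up string (an assumption) that contains a word with an odd number of letters.

```python
import re

# Hypothetical sample text: "quick" has five letters, "abc123" has six characters.
sample = "quick abc123"

greedy = re.findall(r'\w{2,}', sample)         # whole runs of word characters
lazy = re.findall(r'\w{2,}?', sample)          # pairs; the odd "k" is dropped
class_lazy = re.findall(r'[a-c]{2,}?', sample) # pairs made only of a, b, c

print(greedy)      # ['quick', 'abc123']
print(lazy)        # ['qu', 'ic', 'ab', 'c1', '23']
print(class_lazy)  # ['ab']
```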

In [59]:
re.findall(r'\w{2,}', lazydog)

['The',
 'quick',
 'brown',
 'fox',
 'jumps',
 'over',
 'the',
 'lazydog',
 'The',
 'quick',
 'brown',
 'fox',
 'jumps',
 'over',
 'the',
 'lazy',
 'dog',
 'The',
 'quick',
 'brown',
 'then',
 'fox',
 '123',
 'abc123',
 '999xyz',
 'jumps',
 'he',
 'over',
 'the',
 'he',
 'lazy',
 'she',
 'dog',
 'The',
 'quick',
 'brown',
 'fox',
 'jumps',
 'over',
 'the',
 'lazy',
 'dog']

In [60]:
re.findall(r'\w{2,}?', lazydog)

['Th',
 'qu',
 'ic',
 'br',
 'ow',
 'fo',
 'ju',
 'mp',
 'ov',
 'er',
 'th',
 'la',
 'zy',
 'do',
 'Th',
 'qu',
 'ic',
 'br',
 'ow',
 'fo',
 'ju',
 'mp',
 'ov',
 'er',
 'th',
 'la',
 'zy',
 'do',
 'Th',
 'qu',
 'ic',
 'br',
 'ow',
 'th',
 'en',
 'fo',
 '12',
 'ab',
 'c1',
 '23',
 '99',
 '9x',
 'yz',
 'ju',
 'mp',
 'he',
 'ov',
 'er',
 'th',
 'he',
 'la',
 'zy',
 'sh',
 'do',
 'Th',
 'qu',
 'ic',
 'br',
 'ow',
 'fo',
 'ju',
 'mp',
 'ov',
 'er',
 'th',
 'la',
 'zy',
 'do']

In [61]:
re.findall(r'[a-c]{2,}?', lazydog)

['ab']

# Metacharacters

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`doobi = """..."""`
* This line defines a multiline string named `doobi` containing the lyrics of the song. The `"""` (triple quotes) are used to define multiline strings in Python.

`re.findall('[Dd]oobi', doobi)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `'[Dd]oobi'` within the string `doobi`.
* `'[Dd]oobi'` is a regular expression pattern.
* `[Dd]` is a character class that matches either the uppercase 'D' or the lowercase 'd'.
* `oobi` matches the literal string "oobi".
* The output shows a list of all the occurrences of "Doobi" and "doobi" found within the lyrics.

`re.findall('Doobi|doobi', doobi)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'Doobi|doobi'` within the string `doobi`.
* `'Doobi|doobi'` is a regular expression pattern in which `|` is the OR operator: it matches either the literal string "Doobi" or the literal string "doobi".
* The output is the same as the output of the previous cell, because `[Dd]oobi` and `Doobi|doobi` describe the same two strings.
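
The equivalence of the character class and the alternation can be confirmed directly. This is a minimal sketch on a single made-up line (an assumption) in the style of the lyrics.

```python
import re

# Hypothetical one-line sample in the style of the lyrics.
line = "Doobi doobi dapp dapp"

with_class = re.findall('[Dd]oobi', line)          # character class for the first letter
with_alternation = re.findall('Doobi|doobi', line) # two full alternatives

print(with_class)                      # ['Doobi', 'doobi']
print(with_class == with_alternation)  # True
```

The character class is shorter when the alternatives differ in a single character; alternation is the more general tool when whole words differ.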



In [62]:
doobi = """Doobi doobi dapp dapp(3x)
Doobi doobi dipp dipp
Doobi doobi dapp dapp
Du dapp da dapp
Bee-beep beep beep beep beep beep beep beep
Dibby dabba
Dooboo dabba (3x)
Beep beep beep beep beep beep beep beep
Sabi ng jeep, sabi ng jeep, sabi ng
Bee bee bee bee bee bee bee bee bee bee bee beep (fast)
Pubo purro bap bap
Purro pab bap bap jollibee
Beep (18x)
Beep, beep, beep, beep"""

In [63]:
re.findall('[Dd]oobi', doobi)

['Doobi', 'doobi', 'Doobi', 'doobi', 'Doobi', 'doobi']

In [64]:
re.findall('Doobi|doobi', doobi)

['Doobi', 'doobi', 'Doobi', 'doobi', 'Doobi', 'doobi']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('Beep|beep', doobi)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `'Beep|beep'` within the string `doobi`.
* `'Beep|beep'` is a regular expression pattern in which `|` is the OR operator: it matches either the literal string "Beep" or the literal string "beep".
* The output shows a list of all the occurrences of "Beep" and "beep" found within the lyrics, in the order in which they appear. This includes the "beep" inside "Bee-beep".

`re.findall('[Bb]eep', doobi)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `'[Bb]eep'` within the string `doobi`.
* `'[Bb]eep'` is a regular expression pattern.
* `[Bb]` is a character class that matches either the uppercase 'B' or the lowercase 'b'.
* `eep` matches the literal string "eep".
* The output is the same as that of the previous cell, because `[Bb]eep` and `Beep|beep` describe the same two strings.


In [65]:
re.findall('Beep|beep', doobi)

['beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'Beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'Beep',
 'Beep',
 'beep',
 'beep',
 'beep']

In [66]:
re.findall('[Bb]eep', doobi)

['beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'Beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'Beep',
 'Beep',
 'beep',
 'beep',
 'beep']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('[Dd]oobi|[Bb]eep', doobi)`
* This line uses the `re.findall()` function from the `re` module to find all non-overlapping occurrences of the pattern `'[Dd]oobi|[Bb]eep'` within the string `doobi`.

* `'[Dd]oobi|[Bb]eep'` is a regular expression pattern.
* `[Dd]oobi` matches either "Doobi" or "doobi".
* `|` is the OR operator, meaning it matches either the pattern before or after it.
* `[Bb]eep` matches either "Beep" or "beep".
* The output shows a list of all the occurrences of "Doobi", "doobi", "Beep", and "beep" found within the lyrics.

`re.findall(r'^\s*[Dd]oobi', doobi)`
* This line uses the `re.findall()` function to find all non-overlapping occurrences of the pattern `r'^\s*[Dd]oobi'` within the string `doobi`.
* `r'^\s*[Dd]oobi'` is a regular expression pattern.
* `^` matches the beginning of the string.
* `\s*` matches zero or more whitespace characters (spaces, tabs, newlines).
* `[Dd]oobi` matches either "Doobi" or "doobi".
* Without a flag, `^` anchors only at the very start of the string, so this pattern can match at most once: a "Doobi" or "doobi" at the beginning of the whole string (possibly preceded by whitespace).


* Both patterns use the `re.findall()` function to extract specific patterns from the lyrics stored in the `doobi` string. The first searches for all occurrences of "Doobi", "doobi", "Beep", and "beep". The second searches for "Doobi" or "doobi" only at the beginning of the string, allowing for potential whitespace before the word.

In [67]:
re.findall('[Dd]oobi|[Bb]eep', doobi)

['Doobi',
 'doobi',
 'Doobi',
 'doobi',
 'Doobi',
 'doobi',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'Beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'beep',
 'Beep',
 'Beep',
 'beep',
 'beep',
 'beep']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('^[Dd]oobi', doobi, flags=re.M)`
* This line uses the `re.findall()` function with the `re.M` flag to find all non-overlapping occurrences of the pattern `'^[Dd]oobi'` within the string `doobi`.
* `re.M` is the `MULTILINE` flag, which modifies the behavior of the `^` and `$` anchors. With this flag:
* `^` matches the beginning of each line, not just the beginning of the entire string.
* This allows the pattern to find occurrences of "Doobi" or "doobi" at the beginning of any line within the multiline string `doobi`.

`re.findall('^[Dd]oobi', doobi, flags=re.MULTILINE)`
* This line uses the `re.findall()` function with the `re.MULTILINE` flag to find all non-overlapping occurrences of the pattern `'^[Dd]oobi'` within the string `doobi`.
* `re.MULTILINE` is the same as `re.M`, the `MULTILINE` flag. It modifies the behavior of the `^` and `$` anchors so that `^` matches the beginning of each line.
* This allows the pattern to find occurrences of "Doobi" or "doobi" at the beginning of any line within the multiline string `doobi`.

* All three cells use the `re.findall()` function to extract occurrences of "Doobi" or "doobi" from the lyrics stored in the `doobi` string.
* Cell 1 uses no flag, so it searches only at the beginning of the entire string and returns `['Doobi']`.
* Cells 2 and 3 use the `re.M` and `re.MULTILINE` flags to search at the beginning of each line within the string. The two names refer to the same flag, so both cells return `['Doobi', 'Doobi', 'Doobi']`.
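
The effect of the flag on `^` can be reproduced with a three-line string. This is a minimal sketch; `text` is a made-up excerpt (an assumption) rather than the full `doobi` lyrics.

```python
import re

# Hypothetical three-line sample; two lines start with "Doobi".
text = "Doobi doobi dapp\nDoobi doobi dipp\nDu dapp da dapp"

start_of_string = re.findall('^[Dd]oobi', text)             # ^ = start of string only
start_of_lines = re.findall('^[Dd]oobi', text, flags=re.M)  # ^ = start of every line

print(start_of_string)          # ['Doobi']
print(start_of_lines)           # ['Doobi', 'Doobi']
print(re.M == re.MULTILINE)     # True: two names for the same flag
```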

In [68]:
re.findall('^[Dd]oobi', doobi)

['Doobi']

In [69]:
re.findall('^[Dd]oobi', doobi, flags=re.M)

['Doobi', 'Doobi', 'Doobi']

In [70]:
re.findall('^[Dd]oobi', doobi, flags=re.MULTILINE)

['Doobi', 'Doobi', 'Doobi']

<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

`re.findall('(?m)^[Dd]oobi', doobi)`
* `(?m)`: An inline flag that enables multiline mode, so `^` matches the start of each line (not just the beginning of the string). It has the same effect as passing `flags=re.M`.
* `[Dd]`: Matches either an uppercase D or lowercase d.
* `oobi`: Matches the exact sequence "oobi".
* Overall: Matches "Doobi" or "doobi" at the start of a line.
* Input (`doobi`): The multiline lyrics string defined above.
* Output: `['Doobi', 'Doobi', 'Doobi']`
* Indicates that "Doobi" appeared at the beginning of three different lines.

Regex Pattern: `[Bb]eep$`
* `[Bb]`: Matches either an uppercase B or lowercase b.
* `eep$`: Matches the exact sequence "eep" at the end of the string. Without multiline mode, `$` matches only at the end of the entire string (or just before a trailing newline).
* Input (`doobi`): The same multiline string as before.
* Output: `['beep']`
* Indicates that the string as a whole ends with "beep". Lines in the middle of the string that end with "beep" are not matched.

Regex Pattern: `[Bb]eep$` (same as the previous cell)
* Flags: `re.M` (multiline mode):
* Causes `$` to match the end of each line, not just the end of the entire string.
* Input (`doobi`): The same multiline string.
* Output: `['beep', 'beep', 'beep']`
* Indicates that "beep" appeared at the end of three separate lines.
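
The same comparison can be made for `$`, including the inline form of the flag. This is a minimal sketch; `text` is a made-up excerpt (an assumption) rather than the full `doobi` lyrics.

```python
import re

# Hypothetical three-line sample; the first and last lines end with "beep".
text = "Beep beep beep\nDibby dabba\nBeep, beep"

end_of_string = re.findall('[Bb]eep$', text)             # $ = end of string only
end_of_lines = re.findall('[Bb]eep$', text, flags=re.M)  # $ = end of every line
inline_flag = re.findall('(?m)[Bb]eep$', text)           # (?m) behaves like flags=re.M

print(end_of_string)  # ['beep']
print(end_of_lines)   # ['beep', 'beep']
print(inline_flag)    # ['beep', 'beep']
```

The inline form is convenient when the pattern is stored as a plain string and the call site cannot pass flags.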


In [71]:
re.findall('(?m)^[Dd]oobi', doobi)

['Doobi', 'Doobi', 'Doobi']

In [72]:
re.findall('[Bb]eep$', doobi)

['beep']

In [73]:
re.findall('[Bb]eep$', doobi, flags=re.M)

['beep', 'beep', 'beep']


<div style="border: 2px solid black; padding: 10px; border-radius: 5px;"> 

Regex Pattern: `[Bb]ee\w*`
* `[Bb]`: Matches "B" or "b".
* `ee`: Matches the exact sequence "ee".
* `\w*`: Matches zero or more word characters (letters, digits, or underscores).
* Flags: `re.M` (multiline mode):
* The flag changes only the behavior of `^` and `$`. This pattern uses neither anchor, so the flag has no effect here; `re.findall()` scans the whole string, including all lines, either way.
* Input (`doobi`): The multiline lyrics string defined above.
* Output: A list of all substrings starting with "Bee" or "bee", followed by any word characters (`\w*`).
* The output includes variations like "Bee" and "bee" as well as multiple "beep" entries, coming from matches on various lines.

Regex Pattern: `\b\w+\b`
* `\b`: Matches a word boundary (the start or end of a word).
* `\w+`: Matches one or more word characters.
* `\b`: Ensures the match ends at a word boundary.
* Input String: `'Hi, p0whz'`
* Contains the words "Hi" and "p0whz", separated by a comma and space.
* Output: `['Hi', 'p0whz']`
* Matches all individual words in the string. It ignores punctuation (like the comma).

**Key Differences**
* First Cell: Matches substrings starting with "Bee" or "bee" and includes any additional word characters that follow. The `re.M` flag is passed but does not change the result, because the pattern contains no `^` or `$`.
* Second Cell: Extracts all individual words from a single string, ensuring each match is a complete word bounded by non-word characters. It doesn't involve multiline functionality.
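
Both points can be checked on a small input: that `re.M` makes no difference for a pattern without anchors, and that `\b` restricts matches to whole words. This is a minimal sketch; `text` and `phrase` are made-up strings (assumptions), not the lecture's data.

```python
import re

# Hypothetical two-line sample in the style of the lyrics.
text = "Bee-beep beep\nBee bee"

with_flag = re.findall(r'[Bb]ee\w*', text, flags=re.M)
without_flag = re.findall(r'[Bb]ee\w*', text)

print(with_flag)                  # ['Bee', 'beep', 'beep', 'Bee', 'bee']
print(with_flag == without_flag)  # True: no ^ or $ in the pattern

# Hypothetical phrase for the word-boundary comparison.
phrase = "bee beep"

anywhere = re.findall(r'bee', phrase)        # also matches inside "beep"
whole_word = re.findall(r'\bbee\b', phrase)  # only the standalone word

print(anywhere)    # ['bee', 'bee']
print(whole_word)  # ['bee']
```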

In [None]:
re.findall(r'[Bb]ee\w*', doobi, flags=re.M)

In [81]:
re.findall(r'\b\w+\b', 'Hi, p0whz')

['Hi', 'p0whz']