## 1. Question

What is the name of the feature responsible for generating Regex objects?

## Answer

In Python, the `re.compile()` function generates Regex (regular expression) objects. It is part of the `re` module, which provides support for regular expressions: patterns used for matching and manipulating text.

To use regular expressions, first import the `re` module, then pass a pattern string to `re.compile()`. The call returns a Regex object (a compiled pattern) whose methods, such as `search()` and `findall()`, apply that pattern to strings.

Example:


In [3]:
import re

pattern = re.compile(r'\b[A-Za-z]+\b')
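
As a minimal sketch of what the compiled object offers, the snippet below reuses one Regex object for a search; the names `word_regex`, `source`, and `words` are illustrative and not part of the original notes.

```python
import re

# Compile once, then reuse the resulting Regex object.
word_regex = re.compile(r'\b[A-Za-z]+\b')

# The object remembers its source pattern and exposes matching methods.
source = word_regex.pattern
words = word_regex.findall('Regex objects are reusable')

print(source)
print(words)  # ['Regex', 'objects', 'are', 'reusable']
```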

## 2. Question

Why do raw strings often appear in Regex objects?

## Answer

Raw strings (strings prefixed with `r`) are often used in Regex objects to avoid issues with backslashes (`\`). In Python, backslashes are used as escape characters in regular strings, which can lead to problems when working with regex patterns that also use backslashes as metacharacters.

For example, consider the regex pattern `\d+`, which matches one or more digits. In a regular string, you would need to escape the backslash like this: `\\d+`. However, in a raw string, you can use `\d+` directly without escaping the backslash: `r'\d+'`.

Using raw strings makes regex patterns more readable and reduces the likelihood of errors caused by incorrect handling of backslashes. Therefore, it's a common practice to use raw strings in Regex objects in Python.
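
The equivalence described above can be checked directly. This is a small sketch with illustrative variable names; it compares an escaped regular string with its raw-string counterpart.

```python
import re

# Both literals produce the same three-character pattern: backslash, d, plus.
escaped = '\\d+'
raw = r'\d+'
same_pattern = escaped == raw

# '\n' is a single newline character; r'\n' is two characters (backslash and n).
lengths = (len('\n'), len(r'\n'))

digits = re.findall(raw, 'room 101, floor 3')

print(same_pattern)  # True
print(lengths)       # (1, 2)
print(digits)        # ['101', '3']
```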


## 3. Question

What is the return value of the `search()` method?

## Answer

The `search()` method in the `re` module in Python is used to search for a specified pattern in a string. It returns a match object if the pattern is found, and `None` if the pattern is not found.

A match object contains information about the match, including the matched string, the start and end positions of the match in the input string, and any captured groups.

Example:

In [8]:
import re

pattern = r'\b\d{3}\b'
text = 'The code is 123 and 456'
match = re.search(pattern, text)

if match:
    print('Match found:', match.group())
else:
    print('No match found')


Match found: 123
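
To illustrate the information a match object carries, here is a short sketch using the same text; the variable names are illustrative.

```python
import re

text = 'The code is 123 and 456'
match = re.search(r'\b\d{3}\b', text)

# A match object records what matched and where it matched.
matched_text = match.group()
position = match.span()

# When nothing matches, search() returns None instead of raising an error.
missing = re.search(r'\b\d{5}\b', text)

print(matched_text)  # 123
print(position)      # (12, 15)
print(missing)       # None
```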


## 4. Question

From a Match item, how do you get the actual strings that match the pattern?

## Answer

In Python's `re` module, `search()` returns a Match object when the pattern is found. (`findall()` does not: it returns the matched strings or tuples directly.) To get the actual text that matched the pattern from a Match object, call its `group()` method.

With no argument (or with `0`), `group()` returns the entire string matched by the regular expression. If the regular expression contains capturing groups (defined by parentheses `()`), you can pass a group number to `group()` to get the string matched by that specific group.

Example:


In [9]:
import re

pattern = r'\b\d{3}\b'
text = 'The code is 123 and 456'
match = re.search(pattern, text)

if match:
    print('Match found:', match.group())
else:
    print('No match found')


Match found: 123


If you have capturing groups in your regular expression pattern, you can use `match.group(n)` to get the string matched by the n-th group, where `n` is the group number (starting from 1).

In [12]:
pattern = r'(\b\d{3}\b) and (\b\d{3}\b)'
text = 'The code is 123 and 456'
match = re.search(pattern, text)

if match:
    print('First group:', match.group(1))
    print('Second group:', match.group(2))
else:
    print('No match found')


First group: 123
Second group: 456
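
When every match is needed as a Match object rather than only the first, `re.finditer()` is one option. The sketch below contrasts it with `findall()`; the variable names are illustrative.

```python
import re

text = 'The code is 123 and 456'

# findall() returns the matched strings directly, not Match objects.
strings = re.findall(r'\b\d{3}\b', text)

# finditer() yields one Match object per match, so group() and start() are available.
details = [(m.group(), m.start()) for m in re.finditer(r'\b\d{3}\b', text)]

print(strings)  # ['123', '456']
print(details)  # [('123', 12), ('456', 20)]
```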


## 5. Question

In the regex which is created from `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, what does group zero cover? Group 2? Group 1?

## Answer

In a regex pattern, capturing groups are defined by parentheses `()`. The regex `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'` defines two capturing groups, in addition to the implicit group 0:

- Group 0 (the entire match): The entire match of the pattern is considered as group 0. In this case, group 0 covers the entire matched string, including both the area code and the phone number, separated by a hyphen. To access group 0, you can use `match.group(0)`.

- Group 1: The first group, `(\d\d\d)`, matches and captures three digits (the area code) before the hyphen. To access group 1, you can use `match.group(1)`.

- Group 2: The second group, `(\d\d\d-\d\d\d\d)`, matches and captures the rest of the phone number after the first hyphen: three digits, a hyphen, and four digits. To access group 2, you can use `match.group(2)`.

Example:

In [13]:
import re

pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
text = 'My phone number is 123-456-7890'
match = re.search(pattern, text)

if match:
    print('Group 0 (entire match):', match.group(0))
    print('Group 1 (area code):', match.group(1))
    print('Group 2 (phone number):', match.group(2))
else:
    print('No match found')


Group 0 (entire match): 123-456-7890
Group 1 (area code): 123
Group 2 (phone number): 456-7890
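
As a complement, `groups()` returns all numbered groups at once, which makes unpacking convenient. This is a sketch with illustrative variable names.

```python
import re

match = re.search(r'(\d\d\d)-(\d\d\d-\d\d\d\d)', 'My phone number is 123-456-7890')

# groups() returns groups 1, 2, ... as a tuple; group 0 is not included.
area_code, main_number = match.groups()

# group() also accepts several group numbers in one call.
whole_and_area = match.group(0, 1)

print(area_code)       # 123
print(main_number)     # 456-7890
print(whole_and_area)  # ('123-456-7890', '123')
```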


## 6. Question

In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

## Answer

In regex syntax, certain characters are metacharacters with special meanings: parentheses `(` and `)` delimit groups, and a period `.` matches any single character except a newline. To match one of these characters literally, escape it with a preceding backslash `\`.

For example, `\(` matches a literal opening parenthesis, `\)` a literal closing parenthesis, and `\.` a literal period.

Example:

In [15]:
import re

pattern = r'\(\d{3}\) \d{3}\.\d{4}'
text = 'My phone number is (123) 456.7890'
match = re.search(pattern, text)

if match:
    print('Match found:', match.group())
else:
    print('No match found')


Match found: (123) 456.7890
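
When the literal text comes from a variable, `re.escape()` can add the backslashes for you. The following sketch assumes the same phone-number text; the variable names are illustrative.

```python
import re

literal = '(123) 456.7890'
text = 'My phone number is (123) 456.7890'

# re.escape() backslash-escapes the characters that are special in a pattern.
escaped = re.escape(literal)
found = re.search(escaped, text).group()

# Unescaped, the parentheses form a group instead of matching themselves,
# so the pattern looks for '123 456' and finds nothing in this text.
unescaped = re.search(literal, text)

print(found)      # (123) 456.7890
print(unescaped)  # None
```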


## 7. Question

The `findall()` method returns a string list or a list of string tuples. What causes it to return one of the two options?

## Answer

The `findall()` method in the `re` module in Python returns a list of all non-overlapping matches of a pattern in a string. The return value of `findall()` depends on whether the pattern contains capturing groups (defined by parentheses `()`).

1. If the pattern contains no capturing groups, `findall()` returns a list of strings, where each string is a match of the pattern in the input string.

Example:


In [18]:
import re

text = 'Hello 123 and 456'
pattern = r'\d+'
matches = re.findall(pattern, text)
print(matches)

['123', '456']


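2. If the pattern contains capturing groups, `findall()` returns the text of the groups rather than the whole match. With exactly one group, the result is a list of strings holding that group's text; with two or more groups, it is a list of tuples, where each tuple contains one string per capturing group.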

Example:

In [24]:
import re

text = 'John has 2 apples, and Mary has 3 oranges.'
pattern = r'(\w+) has (\d+) (\w+)'
matches = re.findall(pattern, text)
print(matches)


[('John', '2', 'apples'), ('Mary', '3', 'oranges')]
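
The single-group case, and the way a non-capturing group `(?:...)` restores whole-match results, can be sketched as follows (the text and variable names are illustrative):

```python
import re

text = '2 apples and 3 apples'

# One capturing group: a list of strings containing only the group's text.
one_group = re.findall(r'(\d+) apples', text)

# A non-capturing group does not count, so whole matches are returned.
non_capturing = re.findall(r'(?:\d+) apples', text)

print(one_group)      # ['2', '3']
print(non_capturing)  # ['2 apples', '3 apples']
```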


## 8. Question

In standard expressions, what does the `|` character mean?

## Answer

In ordinary Python expressions (as opposed to regex patterns), the `|` character is the bitwise OR operator. Applied to two integers, it sets every bit in the result that is set in either operand, so `5 | 3` (binary `101 | 011`) evaluates to `7` (binary `111`).

For example:

In [31]:
a = 5  
b = 3  
result = a | b  
print(result)   


7


## 9. Question

In regular expressions, what does the `|` character mean?

## Answer

In regular expressions (regex), the `|` character is used as a logical OR operator. It allows you to specify multiple alternatives for a pattern. The `|` character matches either the pattern on its left or the pattern on its right.

For example, the regex `r'cat|dog'` matches either the string "cat" or the string "dog". If either "cat" or "dog" is found in the input string, it is considered a match.

Example:

In [32]:
import re

text = 'I have a cat and a dog'
pattern = r'cat|dog'
matches = re.findall(pattern, text)
print(matches)

['cat', 'dog']
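
Alternation is often combined with parentheses so that only part of the pattern varies. A minimal sketch, using an illustrative `Bat...` pattern that is not from the original notes:

```python
import re

# Only the suffix inside the parentheses is subject to the alternation.
bat_regex = re.compile(r'Bat(man|mobile|copter)')
match = bat_regex.search('Batmobile lost a wheel')

whole = match.group()    # the full matched text
suffix = match.group(1)  # just the alternative that matched

print(whole)   # Batmobile
print(suffix)  # mobile
```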


## 10. Question

In regular expressions, what is the difference between the `+` and `*` characters?

## Answer

In regular expressions (regex), the `+` and `*` characters are quantifiers that specify how many times a preceding character or group can occur in a match.

- The `+` character matches one or more occurrences of the preceding character or group. It requires at least one occurrence for a match.

Example: 


In [43]:
import re

text = 'aaaabbbb'
pattern = r'a+b+'
matches = re.findall(pattern, text)
print(matches)

['aaaabbbb']


- The `*` character matches zero or more occurrences of the preceding character or group. It allows for zero occurrences as well.

In [44]:
import re

text = 'aaaabbbb'
pattern = r'a*b*'
matches = re.findall(pattern, text)
print(matches)


['aaaabbbb', '']
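
The trailing `''` in the output above appears because `a*b*` can also match the empty string at the end of the text. The difference between the two quantifiers is easier to see when the repeated character is absent, as in this sketch (illustrative names):

```python
import re

# 'ac' contains no 'b' between 'a' and 'c'.
plus_match = re.search(r'ab+c', 'ac')  # + needs at least one 'b'
star_match = re.search(r'ab*c', 'ac')  # * accepts zero 'b' characters

print(plus_match)          # None
print(star_match.group())  # ac
```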


## 11. Question

What is the difference between `{4}` and `{4,5}` in regular expressions?

## Answer

In regular expressions (regex), `{4}` and `{4,5}` are quantifiers that specify how many times the preceding character or group must occur: the first gives an exact count, the second a range.

- `{4}` specifies exactly 4 occurrences of the preceding character or group.

Example: 

In [49]:
import re

text = 'aaaabbbb'
pattern = r'a{4}b{4}'
matches = re.findall(pattern, text)
print(matches)

['aaaabbbb']


- `{4,5}` specifies a range of occurrences, from 4 to 5, of the preceding character or group.

In [59]:
import re

text = 'aaaabbbb'
pattern = r'a{4,5}b{4,5}'
matches = re.findall(pattern, text)
print(matches)


['aaaabbbb']
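
Both examples above give the same result because the text contains exactly four of each letter. With five repetitions the two quantifiers behave differently, as this sketch shows (illustrative names):

```python
import re

text = 'aaaaa'  # five 'a' characters

exact = re.search(r'a{4}', text).group()     # stops after four
ranged = re.search(r'a{4,5}', text).group()  # greedy, so it takes all five
exact_full = re.fullmatch(r'a{4}', text)     # the whole string is too long

print(exact)       # aaaa
print(ranged)      # aaaaa
print(exact_full)  # None
```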


## 12. Question

What do the `\d`, `\w`, and `\s` shorthand character classes signify in regular expressions?

## Answer

In regular expressions, the `\d`, `\w`, and `\s` are shorthand character classes that represent certain types of characters:

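- `\d`: Represents any digit character. For ASCII text it is equivalent to `[0-9]`; with Python 3 string patterns it also matches other Unicode decimal digits unless the `re.ASCII` flag is set.
- `\w`: Represents any word character: letters, digits, and the underscore. With `re.ASCII` it is equivalent to `[a-zA-Z0-9_]`; otherwise it also matches Unicode letters and digits.
- `\s`: Represents any whitespace character, including spaces, tabs, newlines, carriage returns, form feeds, and vertical tabs.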

These shorthand character classes are often used to match specific types of characters in a regular expression pattern. For example, the pattern `\d{3}-\d{3}-\d{4}` can be used to match a phone number in the format "###-###-####", where each `#` represents a digit character.


In [68]:
import re

text = 'abc 123 def'
pattern = r'\d+'
matches = re.findall(pattern, text)
print(matches)


['123']


In [69]:
import re

text = 'hello_world 123'
pattern = r'\w+'
matches = re.findall(pattern, text)
print(matches)


['hello_world', '123']


In [70]:
import re

text = 'hello\tworld\n123'
pattern = r'\s+'
matches = re.findall(pattern, text)
print(matches)


['\t', '\n']
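
The effect of the `re.ASCII` flag on `\d` can be sketched with a string that mixes ASCII digits and Arabic-Indic digits; the sample text and variable names are illustrative.

```python
import re

# '\u0664\u0662' is "42" written with Arabic-Indic digits.
text = '42 \u0664\u0662'

# By default, \d in a str pattern matches any Unicode decimal digit.
unicode_digits = re.findall(r'\d+', text)

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r'\d+', text, re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['42']
```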


## 13. Question

What do the `\D`, `\W`, and `\S` shorthand character classes signify in regular expressions?

## Answer

In regular expressions, the `\D`, `\W`, and `\S` are shorthand character classes that represent the negation of `\d`, `\w`, and `\s` respectively:

- `\D`: Represents any non-digit character. For ASCII matching it is equivalent to `[^0-9]`.
- `\W`: Represents any non-word character. For ASCII matching it is equivalent to `[^a-zA-Z0-9_]`. Non-word characters include symbols, punctuation, and whitespace.
- `\S`: Represents any non-whitespace character. It is equivalent to `[^\s]`. Non-whitespace characters include any character other than space, tab, and newline.

These shorthand character classes are useful for matching characters that are not of a certain type. For example, `\D+` can be used to match one or more non-digit characters in a string.


In [62]:
import re

text = '123abc456'
pattern = r'\D+'
matches = re.findall(pattern, text)
print(matches)


['abc']


In [67]:
import re

text = 'hello!world123'
pattern = r'\W+'
matches = re.findall(pattern, text)
print(matches)


['!']


In [66]:
import re

text = 'hello world'
pattern = r'\S+'
matches = re.findall(pattern, text)
print(matches)


['hello', 'world']


## 14. Question

What is the difference between `.*?` and `.*` in regular expressions?

## Answer

In regular expressions, `.*?` and `.*` are both used to match any character (except for newline) zero or more times, but they differ in their behavior:

- `.*?`: Matches as few characters as possible, i.e., it performs a non-greedy or lazy match. It will try to match the smallest possible substring that satisfies the rest of the pattern.

- `.*`: Matches as many characters as possible, i.e., it performs a greedy match. It will try to match the largest possible substring that satisfies the rest of the pattern.

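For example, consider the string `'abc def ghi'` and the patterns `r'(.*)'` and `r'(.*?)'`, where nothing follows the quantifier:

- Using `.*`, the pattern matches the entire string `'abc def ghi'` because it greedily consumes all characters. `findall()` then reports one more empty match at the end of the string.
- Using `.*?`, the pattern matches the empty string first, because zero characters already satisfy it. `findall()` therefore returns empty strings interleaved with single characters.

The difference matters most when something follows the quantifier: against `'<a><b>'`, the greedy `<.*>` matches `'<a><b>'`, while the lazy `<.*?>` matches only `'<a>'`.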

Example:

In [71]:
import re

text = 'abc def ghi'
pattern_greedy = r'(.*)'
pattern_non_greedy = r'(.*?)'

print(re.findall(pattern_greedy, text))      
print(re.findall(pattern_non_greedy, text))   

['abc def ghi', '']
['', 'a', '', 'b', '', 'c', '', ' ', '', 'd', '', 'e', '', 'f', '', ' ', '', 'g', '', 'h', '', 'i', '']
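
A more typical use has a delimiter after the quantifier, which is where greedy and lazy matching visibly diverge. A minimal sketch with an illustrative tag-like string:

```python
import re

text = '<a><b>'

# Greedy: .* runs to the last '>' it can reach.
greedy = re.findall(r'<.*>', text)

# Lazy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r'<.*?>', text)

print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']
```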


## 15. Question

What is the syntax for matching both numbers and lowercase letters with a character class?

## Answer

To match both numbers and lowercase letters, use a character class that combines both ranges inside square brackets: `[a-z0-9]` (or `[0-9a-z]`; the order of the ranges does not matter).


In [79]:
import re

text = 'a1b2c3'
pattern = r'[a-z0-9]+'
matches = re.findall(pattern, text)
print(matches)

['a1b2c3']
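
To see what the class excludes, the sketch below checks a few strings against it; `allowed` and the sample inputs are illustrative.

```python
import re

allowed = re.compile(r'[a-z0-9]+')

# fullmatch() succeeds only if the entire string consists of allowed characters.
lower_ok = allowed.fullmatch('a1b2c3') is not None
upper_rejected = allowed.fullmatch('A1B2') is None

# Uppercase letters and hyphens split the text into separate matches.
pieces = allowed.findall('a1-B2-c3')

print(lower_ok)        # True
print(upper_rejected)  # True
print(pieces)          # ['a1', '2', 'c3']
```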


## 16. Question

What is the procedure for making a regular expression case-insensitive?

## Answer

To make a regular expression case-insensitive in Python, pass the `re.IGNORECASE` flag (or its shorthand `re.I`) to `re.compile()` or to a module-level function such as `re.findall()`. This flag tells the regex engine to ignore case when matching letters.

Here's how you can use it:

In [82]:
import re

text = 'Hello World'
pattern = r'hello'
matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)

['Hello']
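
The same effect is available through the shorthand flag, a compiled pattern, or the inline `(?i)` marker at the start of the pattern. A brief sketch with illustrative variable names:

```python
import re

text = 'Hello World'

with_shorthand = re.findall(r'hello', text, re.I)
compiled = re.compile(r'hello', re.IGNORECASE).findall(text)

# (?i) at the very start of the pattern switches on case-insensitive matching.
inline = re.findall(r'(?i)hello', text)

print(with_shorthand, compiled, inline)  # ['Hello'] ['Hello'] ['Hello']
```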


## 17. Question

What does the `.` character normally match? What does it match if `re.DOTALL` is passed as the second argument in `re.compile()`?

## Answer

- Normally, the `.` character in a regular expression matches any character except a newline (`\n`). It matches any single character, including letters, digits, whitespace, and symbols, but not newline characters.

- If `re.DOTALL` (or `re.S`) is passed as the second argument in `re.compile()`, it changes the behavior of the `.` character to match any character, including newline (`\n`). This flag allows the dot to match newline characters as well, making it match across multiple lines.

Example:


In [83]:
import re

text = 'hello\nworld'
pattern_normal = r'.+'
pattern_dotall = r'.+'
regex_normal = re.compile(pattern_normal)
regex_dotall = re.compile(pattern_dotall, re.DOTALL)

print('Normal match:', regex_normal.findall(text))

print('DOTALL match:', regex_dotall.findall(text))

Normal match: ['hello', 'world']
DOTALL match: ['hello\nworld']
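
Because the second argument of `re.compile()` takes a single value, several flags are combined with the bitwise OR operator `|`. A minimal sketch, with illustrative names:

```python
import re

# Combine DOTALL with IGNORECASE in one flags argument.
regex = re.compile(r'hello.world', re.IGNORECASE | re.DOTALL)
matched = regex.search('HELLO\nWORLD').group()

# Without DOTALL the dot cannot match the newline between the words.
without_dotall = re.search(r'hello.world', 'HELLO\nWORLD', re.IGNORECASE)

print(repr(matched))   # 'HELLO\nWORLD'
print(without_dotall)  # None
```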


## 18. Question

If `numReg = re.compile(r'\d+')`, what will `numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')` return?

## Answer

The `sub()` method of a Regex object replaces occurrences of its pattern in a string. In this case, `numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')` replaces every sequence of digits in the input string with the letter 'X' and returns the new string `'X drummers, X pipers, five rings, X hen'`. The word `five` is left alone because it contains no digits.


In [84]:
import re

numReg = re.compile(r'\d+')
result = numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')
print(result)

X drummers, X pipers, five rings, X hen
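
Two related options are worth knowing: the `count` argument limits how many replacements are made, and `subn()` also reports how many were made. A sketch with illustrative variable names:

```python
import re

num_regex = re.compile(r'\d+')
text = '11 drummers, 10 pipers, five rings, 4 hen'

# Replace only the first run of digits.
first_only = num_regex.sub('X', text, count=1)

# subn() returns the new string together with the number of replacements.
replaced, how_many = num_regex.subn('X', text)

print(first_only)  # X drummers, 10 pipers, five rings, 4 hen
print(replaced)    # X drummers, X pipers, five rings, X hen
print(how_many)    # 3
```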


## 19. Question

What does passing `re.VERBOSE` as the second argument to `re.compile()` allow you to do?

## Answer

Passing `re.VERBOSE` as the second argument to `re.compile()` lets you spread a regular expression over several lines and annotate it with comments, which makes it more readable. In verbose mode, whitespace within the pattern is ignored unless it is escaped or inside a character class, and everything from an unescaped `#` to the end of the line is treated as a comment.

Example:


In [85]:
import re

pattern = re.compile(r'''
    (\d{3}|\(\d{3}\))?  # area code
    (\s|-|\.)?          # separator
    \d{3}               # first 3 digits
    (\s|-|\.)           # separator
    \d{4}               # last 4 digits
''', re.VERBOSE)

text = 'Phone numbers: 123-456-7890, (123) 456-7890, 123 456 7890, 123.456.7890'

matches = pattern.findall(text)
for match in matches:
    print(match)

('123', '-', '-')
('(123)', ' ', '-')
('123', ' ', ' ')
('123', '.', '.')
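
One consequence of verbose mode is that a literal space has to be written explicitly. The sketch below, with illustrative names, compares an unescaped space, an escaped space, and a space inside a character class.

```python
import re

loose = re.compile(r'a b', re.VERBOSE)       # the space is ignored: same as 'ab'
strict = re.compile(r'a\ b', re.VERBOSE)     # the escaped space must be present
in_class = re.compile(r'a[ ]b', re.VERBOSE)  # a space in a class is kept too

results = (
    loose.fullmatch('ab') is not None,
    loose.fullmatch('a b') is not None,
    strict.fullmatch('a b') is not None,
    in_class.fullmatch('a b') is not None,
)

print(results)  # (True, False, True, True)
```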


## 20. Question

How would you write a regex that matches a number with commas for every three digits? It must match the following:
- '42'
- '1,234'
- '6,368,745'

but not the following:
- '12,34,567' (which has only two digits between the commas)
- '1234' (which lacks commas)

## Answer

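The following pattern matches numbers with a comma after every three digits. It requires one to three leading digits, followed by zero or more groups consisting of a comma and exactly three digits, and it is anchored at both ends with `^` and `$`: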


In [86]:
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

numbers = ['42', '1,234', '6,368,745', '12,34,567', '1234']
for number in numbers:
    if pattern.match(number):
        print(f'Match: {number}')
    else:
        print(f'No match: {number}')


Match: 42
Match: 1,234
Match: 6,368,745
No match: 12,34,567
No match: 1234
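
One subtlety: `$` also matches just before a trailing newline, so the anchored pattern accepts `'42\n'`. If that matters, `fullmatch()` is a stricter alternative. The helper `is_comma_number` below is a hypothetical name used only for this sketch.

```python
import re

comma_number = re.compile(r'\d{1,3}(,\d{3})*')

def is_comma_number(s):
    """Return True only if the entire string is a comma-grouped number."""
    return comma_number.fullmatch(s) is not None

samples = ['42', '1,234', '6,368,745', '12,34,567', '1234', '42\n']
results = {s: is_comma_number(s) for s in samples}

# The ^...$ version with match() tolerates a trailing newline.
anchored_accepts_newline = re.match(r'^\d{1,3}(,\d{3})*$', '42\n') is not None

print(results)
print(anchored_accepts_newline)  # True
```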


## 21. Question

How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
- 'Haruto Watanabe'
- 'Alice Watanabe'
- 'RoboCop Watanabe'

but not the following:
- 'haruto Watanabe' (where the first name is not capitalized)
- 'Mr. Watanabe' (where the preceding word has a non-letter character)
- 'Watanabe' (which has no first name)
- 'Haruto watanabe' (where Watanabe is not capitalized)

## Answer

The pattern below requires a capital letter, then zero or more letters, one whitespace character, and the literal, case-sensitive `Watanabe`. Because `match()` only looks at the start of the string, the first name must begin the string:


In [90]:
import re

pattern = re.compile(r'[A-Z][a-zA-Z]*\sWatanabe')

names = [
    'Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe',
    'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe'
]

for name in names:
    if pattern.match(name):
        print(f'Match: {name}')
    else:
        print(f'No match: {name}')

Match: Haruto Watanabe
Match: Alice Watanabe
Match: RoboCop Watanabe
No match: haruto Watanabe
No match: Mr. Watanabe
No match: Watanabe
No match: Haruto watanabe
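
To find such names inside longer text, one possible variation adds word boundaries (`\b`) and uses `findall()`. This is a sketch under that assumption; the sample sentence and variable names are illustrative.

```python
import re

# \b keeps the match from starting or ending in the middle of a word.
name_regex = re.compile(r'\b[A-Z][a-zA-Z]*\sWatanabe\b')

text = 'Yesterday Haruto Watanabe met Mr. Watanabe and alice Watanabe.'
found = name_regex.findall(text)

print(found)  # ['Haruto Watanabe']
```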


## 22. Question

How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
- 'Alice eats apples.'
- 'Bob pets cats.'
- 'Carol throws baseballs.'
- 'Alice throws Apples.'
- 'BOB EATS CATS.'

but not the following:
- 'RoboCop eats apples.'
- 'ALICE THROWS FOOTBALLS.'
- 'Carol eats 7 cats.'

## Answer

The pattern below anchors the whole sentence with `^` and `$`, uses one alternation group per word, escapes the final period, and passes `re.IGNORECASE` so that letter case is ignored:

In [91]:
import re

pattern = re.compile(r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$', re.IGNORECASE)

sentences = [
    'Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.',
    'Alice throws Apples.', 'BOB EATS CATS.',
    'RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.'
]

for sentence in sentences:
    if pattern.match(sentence):
        print(f'Match: {sentence}')
    else:
        print(f'No match: {sentence}')


Match: Alice eats apples.
Match: Bob pets cats.
Match: Carol throws baseballs.
Match: Alice throws Apples.
Match: BOB EATS CATS.
No match: RoboCop eats apples.
No match: ALICE THROWS FOOTBALLS.
No match: Carol eats 7 cats.
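
If the word lists are likely to change, the same pattern can be assembled from Python lists. This is only a sketch of one way to do it; the helper `alternation` is a hypothetical name, not part of the `re` module.

```python
import re

names = ['Alice', 'Bob', 'Carol']
verbs = ['eats', 'pets', 'throws']
objects = ['apples', 'cats', 'baseballs']

def alternation(words):
    """Build a group such as (Alice|Bob|Carol), escaping each word."""
    return '(' + '|'.join(re.escape(w) for w in words) + ')'

sentence_regex = re.compile(
    alternation(names) + r'\s' + alternation(verbs) + r'\s' + alternation(objects) + r'\.',
    re.IGNORECASE,
)

accepted = sentence_regex.fullmatch('BOB EATS CATS.') is not None
rejected = sentence_regex.fullmatch('Carol eats 7 cats.') is None

print(accepted, rejected)  # True True
```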
