### 1. What is the name of the feature responsible for generating Regex objects?

The re.compile() function, part of Python's built-in 're' module, generates regular expression (Regex) objects. It compiles a regular expression pattern, given as a string, into a pattern object whose methods (such as search(), findall(), and sub()) can then be used to work with text.

### 2. Why do raw strings often appear in Regex objects?

Raw strings often appear in Regex objects because regular expressions often contain special characters such as backslashes (\), which are also used to represent escape sequences in Python string literals. By using a raw string, you can avoid the need to escape special characters in the regular expression pattern.

In [1]:
import re

pattern = r'\d+'
regex = re.compile(pattern)
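
A minimal sketch of why the raw string matters, assuming a sample sentence chosen only for illustration. In a regular string, Python converts '\b' to a backspace character before the regex engine sees it, so the word-boundary pattern silently stops matching:

```python
import re

text = 'the cat sat'  # sample text, chosen only for illustration

# Raw string: the regex engine receives a backslash followed by 'b',
# which it interprets as a word boundary.
raw_match = re.search(r'\bcat\b', text)

# Regular string: Python turns '\b' into a backspace character (0x08)
# before the regex engine sees it, so the pattern no longer matches.
plain_match = re.search('\bcat\b', text)

print(raw_match is not None)  # True
print(plain_match)            # None
```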

### 3. What is the return value of the search() method?

The search() method is used to search a string for a match to a regular expression pattern, and it returns a match object if a match is found, otherwise it returns None. The match object contains information about the match, such as the starting and ending positions of the match within the string.
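
Both outcomes can be sketched as follows. The pattern and the sample strings are assumptions made up for this illustration:

```python
import re

# Hypothetical pattern and sample strings, for illustration only.
phone_regex = re.compile(r'\d{3}-\d{4}')

found = phone_regex.search('Call 555-1234 today')
missing = phone_regex.search('No number here')

print(found.span())  # (start, end) indexes of the match in the string
print(missing)       # None, because nothing matched
```

Because search() can return None, it is usually worth checking the result before calling methods on it.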

### 4. From a Match item, how do you get the actual strings that match the pattern?

You can get the actual string(s) that match the pattern from a match object using the group() method. The group() method returns the matched string(s) or a specific subgroup within the match.
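
A short sketch of the common ways to read text out of a match object, assuming a made-up pattern with two capturing groups:

```python
import re

# Hypothetical pattern and sample text, for illustration only.
date_regex = re.compile(r'(\d{4})-(\d{2})')
mo = date_regex.search('Released 2023-07')

whole = mo.group()   # same as mo.group(0): the entire matched text
year = mo.group(1)   # text captured by the first group
both = mo.groups()   # tuple holding the text of every capturing group

print(whole, year, both)
```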

### 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?


The regular expression r'(\d\d\d)-(\d\d\d-\d\d\d\d)' creates two capturing groups:

Group 1: (\d\d\d) matches the first three digits (0-9), before the first hyphen.

Group 2: (\d\d\d-\d\d\d\d) matches three digits, a hyphen, and then four more digits.

Group 0 (the entire match) covers the entire string that matches the pattern, which includes both groups 1 and 2, as well as any characters that are matched outside of the capturing groups (here, the hyphen between the two groups).
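
The three groups can be checked directly. The sample sentence below is an assumption chosen for illustration:

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

# Sample text is made up for this sketch.
mo = phone_regex.search('My number is 415-555-4242.')

print(mo.group(0))  # 415-555-4242 (entire match)
print(mo.group(1))  # 415
print(mo.group(2))  # 555-4242
```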

### 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

To match literal parentheses and periods (i.e., to treat them as ordinary characters and not as special regex syntax), you can use the backslash character "\" to escape them in your regular expression.

Here are some examples:

To match a left parenthesis: r'\('.

To match a right parenthesis: r'\)'.

To match a period: r'\.'.
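
As a sketch, the escapes can be combined in one pattern. The phone-number format and sample text here are assumptions for illustration:

```python
import re

# Hypothetical pattern: literal parentheses around the area code
# and a literal period inside the number.
paren_regex = re.compile(r'\(\d{3}\) \d{3}\.\d{4}')

mo = paren_regex.search('Call (415) 555.4242 now')
matched = mo.group()

# An escaped period only matches a real period.
no_period = re.search(r'\.', 'abc')

print(matched)    # (415) 555.4242
print(no_period)  # None
```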

### 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

The findall() method returns a list of all non-overlapping matches of a regular expression in a string. The structure of the resulting list depends on the number of capturing groups in the regular expression:

If the regular expression contains no capturing groups, findall() returns a list of strings, where each string is the full text of one match.

If the regular expression contains exactly one capturing group, findall() still returns a list of strings, but each string is the text captured by that group rather than the whole match.

If the regular expression contains two or more capturing groups, findall() returns a list of tuples, where each tuple represents one match and each element of the tuple corresponds to one of the capturing groups.
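
A small sketch comparing the return shapes, assuming a made-up sample string. Note that with a single capturing group, findall() returns the text of that group as plain strings; tuples appear once there are two or more groups:

```python
import re

# Sample text is an assumption for this sketch.
text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)

print(no_groups)   # ['415-555-9999', '212-555-0000']
print(one_group)   # ['415', '212']
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```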

### 8. In regular expressions, what does the | character mean?

In regular expressions, the | character (the pipe) is the alternation operator. It acts as a logical OR, matching either the pattern on its left or the pattern on its right.
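
A minimal sketch of alternation, using made-up names and sample text. When both alternatives occur in the string, search() returns whichever match starts first:

```python
import re

# Hypothetical alternatives, for illustration only.
hero_regex = re.compile(r'Batman|Tina Fey')

first = hero_regex.search('Batman and Tina Fey').group()
second = hero_regex.search('Tina Fey and Batman').group()

print(first)   # Batman
print(second)  # Tina Fey
```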

### 9. In regular expressions, what does the . character stand for?

In regular expressions, the dot . character is called a "wildcard" or "dot metacharacter", and it matches any single character except for a newline character. It can be used to represent any character in a pattern, and can be used to match a wide variety of different strings.
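
The wildcard behaviour can be sketched like this, with a sample sentence chosen only for illustration:

```python
import re

at_regex = re.compile(r'.at')

# Each match is exactly one arbitrary character followed by 'at'.
matches = at_regex.findall('The cat in the hat sat on the flat mat.')

# By default the dot does not match a newline.
across_newline = re.search(r'a.b', 'a\nb')

print(matches)         # ['cat', 'hat', 'sat', 'lat', 'mat']
print(across_newline)  # None
```

The match 'lat' (from "flat") shows that the dot stands for exactly one character.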

### 10. In regular expressions, what is the difference between the + and * characters?

In regular expressions, the + and * characters are both quantifiers that specify how many times the preceding character or group should be matched.

The * (asterisk) matches zero or more occurrences of the preceding character or group. For example, the regular expression ab*c would match strings that have an a, followed by zero or more b characters, followed by a c. This would match strings like ac, abc, abbc, abbbc, and so on.

The + (plus) matches one or more occurrences of the preceding character or group. For example, the regular expression ab+c would match strings that have an a, followed by one or more b characters, followed by a c. This would match strings like abc, abbc, abbbc, and so on, but would not match the string ac.
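
The two quantifiers can be compared side by side. This sketch uses re.fullmatch() so that each candidate string must match the pattern in its entirety:

```python
import re

candidates = ['ac', 'abc', 'abbbc']

# '*' allows zero b's, so 'ac' is accepted.
star = [s for s in candidates if re.fullmatch(r'ab*c', s)]

# '+' requires at least one b, so 'ac' is rejected.
plus = [s for s in candidates if re.fullmatch(r'ab+c', s)]

print(star)  # ['ac', 'abc', 'abbbc']
print(plus)  # ['abc', 'abbbc']
```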

### 11. What is the difference between {4} and {4,5} in regular expression?

The {4} specifies that the preceding character or group should be matched exactly 4 times. For example, the regular expression a{4} would match strings that have four consecutive a characters.

The {4,5} specifies that the preceding character or group should be matched between 4 and 5 times. For example, the regular expression a{4,5} would match strings that have either four or five consecutive a characters.
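
A brief sketch of the difference, using a string of five a characters as an assumed input. The range quantifier is greedy, so it takes the fifth a when one is available:

```python
import re

exact = re.search(r'a{4}', 'aaaaa').group()     # stops after four
ranged = re.search(r'a{4,5}', 'aaaaa').group()  # takes all five
too_short = re.search(r'a{4}', 'aaa')           # only three available

print(exact)      # aaaa
print(ranged)     # aaaaa
print(too_short)  # None
```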

### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

In regular expressions, \d, \w, and \s are shorthand character classes that match certain types of characters:

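\d matches any digit character. For ASCII text this is equivalent to the character class [0-9]; with Python 3 string patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set.

\w matches any word character, which includes alphanumeric characters and underscore (_). For ASCII text this is equivalent to the character class [a-zA-Z0-9_]; with Python 3 string patterns it also matches Unicode letters and digits unless the re.ASCII flag is set.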

\s matches any whitespace character, including space, tab, and newline.
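
The three classes can be tried on a single string. The sample text below is an assumption made up for this sketch:

```python
import re

# Sample text containing digits, an underscore, spaces, and a tab.
text = 'Room 42, floor_3\tnorth'

digits = re.findall(r'\d', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)

print(digits)  # ['4', '2', '3']
print(words)   # ['Room', '42', 'floor_3', 'north']
print(spaces)  # [' ', ' ', '\t']
```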

### 14. What is the difference between .*? and .* in regular expressions?

.*? is a non-greedy (lazy) match for any character (except a newline), repeated zero or more times. The ? makes the * quantifier lazy, which means it matches as few characters as possible while still satisfying the pattern. For example, in the string abcdabcd, the pattern a.*?d matches only the first abcd.

.* is a greedy match for any character (except a newline), repeated zero or more times. The * quantifier is greedy by default, which means it matches as many characters as possible while still satisfying the pattern. For example, in the string abcdabcd, the pattern a.*d matches the entire string abcdabcd.
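
The difference only shows when the closing character appears more than once in the text. A minimal sketch, assuming a sample string with two closing angle brackets:

```python
import re

# Sample text is an assumption chosen so that '>' appears twice.
text = '<To serve man> for dinner.>'

greedy = re.search(r'<.*>', text).group()  # runs to the last '>'
lazy = re.search(r'<.*?>', text).group()   # stops at the first '>'

print(greedy)  # <To serve man> for dinner.>
print(lazy)    # <To serve man>
```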

### 15. What is the syntax for matching both numbers and lowercase letters with a character class?

To match both numbers and lowercase letters with a character class, you can use the following syntax: [0-9a-z]
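
A quick sketch of the class in use, with a made-up sample string. Uppercase letters and punctuation fall outside the class and split the matches:

```python
import re

# Sample text is an assumption for illustration.
matches = re.findall(r'[0-9a-z]+', 'Order ID: ab12-XY-9z')

print(matches)  # ['rder', 'ab12', '9z']
```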

### 16. What is the procedure for making a regular expression case-insensitive?

To make a regular expression case-insensitive, you can pass the "re.IGNORECASE" (or its short form "re.I") flag as the second argument to the re.compile() function. Here's an example:

In [10]:
pattern = re.compile(r"hello", re.IGNORECASE)
matches = pattern.findall("Hello World, hello there!")
print(matches)  

['Hello', 'hello']


### 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

In regular expressions, the . character normally matches any character except a newline character. However, if re.DOTALL is passed as the second argument in re.compile(), then the . character will match any character, including a newline character.
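
The effect of the flag can be sketched on a two-line string, chosen here only for illustration:

```python
import re

text = 'first line\nsecond line'

# Without the flag, '.*' stops at the newline.
default = re.compile(r'.*').search(text).group()

# With re.DOTALL, the dot also matches the newline.
dotall = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(default))  # 'first line'
print(repr(dotall))   # 'first line\nsecond line'
```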

### 18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

The sub() method of a regular expression object in Python replaces all occurrences of the pattern in a string with a new string. In this case, numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') will replace all occurrences of one or more digits with the character 'X' in the input string '11 drummers, 10 pipers, five rings, 4 hen'. Therefore, the return value will be the following string:

'X drummers, X pipers, five rings, X hen'
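
The substitution can be confirmed by running it directly:

```python
import re

numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

print(result)  # X drummers, X pipers, five rings, X hen
```

Each run of digits is replaced by a single 'X' because \d+ matches the whole run at once, and "five" is untouched because it contains no digits.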

### 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

Passing re.VERBOSE as the second argument to re.compile() allows you to write regular expressions that are easier to read and understand by allowing you to add whitespace and comments.

Normally, every whitespace character in a pattern is matched literally and there is no comment syntax, so a complex pattern has to be written as one dense string. With re.VERBOSE, whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and everything from an unescaped # to the end of the line is treated as a comment. To match a literal space or # in verbose mode, escape it or place it inside a character class.
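
A minimal sketch of a verbose pattern, assuming a simple phone-number format chosen for illustration. The pattern is spread over several lines with a comment on each part:

```python
import re

# Hypothetical phone-number pattern written in verbose mode.
phone_regex = re.compile(r'''
    (\d{3})        # area code
    -              # separator
    (\d{3}-\d{4})  # main number
''', re.VERBOSE)

mo = phone_regex.search('415-555-4242')

print(mo.groups())  # ('415', '555-4242')
```

The indentation, line breaks, and comments are all ignored, so this behaves the same as the compact pattern r'(\d{3})-(\d{3}-\d{4})'.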

### 20. How would you write a regex that matches a number with a comma for every three digits? It must match the following:
#### '42'
#### '1,234'
#### '6,368,745'
#### but not the following:
#### '12,34,567' (which has only two digits between the commas)
#### '1234' (which lacks commas)


In [11]:
import re

regex = re.compile(r'^\d{1,3}(,\d{3})*$')

print(regex.match('42'))           
print(regex.match('1,234'))       
print(regex.match('6,368,745'))    
print(regex.match('12,34,567'))    
print(regex.match('1234'))

<re.Match object; span=(0, 2), match='42'>
<re.Match object; span=(0, 5), match='1,234'>
<re.Match object; span=(0, 9), match='6,368,745'>
None
None


### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
#### 'Haruto Watanabe'
#### 'Alice Watanabe'
#### 'RoboCop Watanabe'
#### but not the following:
#### 'haruto Watanabe' (where the first name is not capitalized)
#### 'Mr. Watanabe' (where the preceding word has a nonletter character)
#### 'Watanabe' (which has no first name)
#### 'Haruto watanabe' (where Watanabe is not capitalized)


In [15]:
names = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe', 'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

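# Raw string avoids the invalid '\s' escape; '^' anchors the first name at the
# start, and [a-zA-Z]* allows inner capitals so 'RoboCop' matches as a whole word.
regex = re.compile(r'^[A-Z][a-zA-Z]*\sWatanabe$')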

for name in names:
    if regex.search(name):
        print(name)

Haruto Watanabe
Alice Watanabe
RoboCop Watanabe


### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
#### 'Alice eats apples.'
#### 'Bob pets cats.'
#### 'Carol throws baseballs.'
#### 'Alice throws Apples.'
#### 'BOB EATS CATS.'
#### but not the following:
#### 'RoboCop eats apples.'
#### 'ALICE THROWS FOOTBALLS.'
#### 'Carol eats 7 cats.'


In [14]:

pattern = re.compile(r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$', re.IGNORECASE)

# Define the example sentences
sentences = [
    'Alice eats apples.',
    'Bob pets cats.',
    'Carol throws baseballs.',
    'Alice throws Apples.',
    'BOB EATS CATS.',
    'RoboCop eats apples.',
    'ALICE THROWS FOOTBALLS.',
    'Carol eats 7 cats.',
]

# Find all matching sentences
matching_sentences = []
for sentence in sentences:
    if pattern.search(sentence):
        matching_sentences.append(sentence)

# Print the matching sentences
print(matching_sentences)

['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.', 'Alice throws Apples.', 'BOB EATS CATS.']
