**1. What is the name of the function responsible for creating Regex objects?**

**Ans:** `re.compile()` compiles a regular expression pattern (a string) into a regular expression (Regex) object.

**Syntax:**
```python
re.compile(pattern, flags=0)
```
            
where,
1. `pattern`: the regex pattern, given as a string, to match against the target string.
2. `flags`: optional parameter that modifies how the pattern is matched (for example, `re.IGNORECASE`).
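As a minimal sketch of both parameters in use (the sample text and the name `word_re` are made up for illustration):

```python
import re

# Compile once, then reuse the resulting Pattern object.
# The pattern, flag choice, and sample text are illustrative only.
word_re = re.compile(r'hello', flags=re.IGNORECASE)

first = word_re.search('Say HELLO, then hello again')
all_hits = word_re.findall('Say HELLO, then hello again')

print(first.group())  # HELLO
print(all_hits)       # ['HELLO', 'hello']
```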

**2. Why do raw strings often appear in Regex objects?**

**Ans:** Regular expressions use the backslash character `\` to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python's usage of the same character for the same purpose in string literals.

In short, to match a literal backslash, the regex itself must be `\\`, and each of those backslashes must in turn be written as `\\` inside a regular Python string literal, so the pattern string becomes `'\\\\'`. All these repeated backslashes make the resulting strings difficult to read.

The solution is to use Python's raw string notation for regular expressions: in a string literal prefixed with `r`, the backslash is not treated as an escape character. For this reason, regular expressions in Python code are usually written as raw strings.
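A small sketch comparing the two spellings of the same pattern (the path string is a made-up example):

```python
import re

# Both literals below produce the same two-character regex: backslash, backslash.
plain = '\\\\'   # normal literal: four typed backslashes become two in the string
raw = r'\\'      # raw literal: two typed backslashes stay two in the string

same = plain == raw

path = 'C:\\temp'            # the text C:\temp, containing one real backslash
hit = re.search(raw, path)   # the regex matches that single literal backslash

print(same)         # True
print(hit.span())   # (2, 3)
```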

**3. What is the return value of the search() method?**

**Ans:** `search()` scans through the target string for the first location where the regular expression pattern matches and returns a Match object. It returns `None` if no position in the string matches the pattern.

In [1]:
#if found
import re
match = re.search('he','Hi there, hello! abcd aa he he is target string.', flags=re.IGNORECASE)
print(match)

<re.Match object; span=(4, 6), match='he'>


In [2]:
#if not found
import re
match = re.search('z','Hi there, hello! abcd aa this is target string.', flags=re.IGNORECASE)
print(match)

None


**4. From a Match item, how do you get the actual strings that match the pattern?**

**Ans:** We can call `match.group()` to get the actual string that matches the pattern.

In [3]:
import re
match = re.search('he','Hi there, hello! abcd aa this is target string.', flags=re.IGNORECASE)
print(match.group())

he


**5. In a regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1?**

**Ans:** Group 0 covers the entire match, group 1 covers the first set of parentheses, and group 2 covers the second set of parentheses, so:

- **Group 0** : `(\d\d\d)-(\d\d\d-\d\d\d\d)`
- **Group 1** : `(\d\d\d)`
- **Group 2** : `(\d\d\d-\d\d\d\d)`

Let's verify this with an example:

In [4]:
import re
regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = regex.search('012-345-6789')
print(mo.group(0))
print(mo.group(1))
print(mo.group(2))

012-345-6789
012
345-6789


**6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?**

**Ans:** To match literal parentheses and periods, escape them with a backslash: `\.`, `\(`, and `\)`.

Let's see an example

In [5]:
import re
regex = re.compile(r'(\(\d\d\d\))-(\d\d\d-\d\d\d\d\.)')
mo = regex.search('(012)-345-6789.')
print(mo.group(0))
print(mo.group(1))
print(mo.group(2))

(012)-345-6789.
(012)
345-6789.


**7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?**

**Ans:** The result of `findall()` depends on the number of capturing groups in the pattern:

- If the pattern has no capturing groups, `findall()` returns a list of strings that match the whole pattern.
- If the pattern has one capturing group, `findall()` returns a list of strings that match the group.
- If the pattern has multiple capturing groups, `findall()` returns a list of tuples of strings, one tuple per match, with one string per group.
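The three cases can be sketched as follows (the phone numbers are made-up sample data):

```python
import re

text = 'Call 415-555-1234 or 212-555-0000'  # illustrative sample text

no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)       # whole matches
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)     # only the group
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)  # tuples of groups

print(no_groups)   # ['415-555-1234', '212-555-0000']
print(one_group)   # ['415', '212']
print(two_groups)  # [('415', '555-1234'), ('212', '555-0000')]
```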

**8. In regular expressions, what does the | character mean?**

**Ans:** In regular expressions, `|` is the alternation (OR) operator: `A|B` matches either `A` or `B`.
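A short sketch with a made-up sentence; `pet_re` is an illustrative name:

```python
import re

pet_re = re.compile(r'cat|dog')
sentence = 'I have a dog and a cat'  # illustrative sample text

first_pet = pet_re.search(sentence).group()  # leftmost match in the string
all_pets = pet_re.findall(sentence)

print(first_pet)  # dog
print(all_pets)   # ['dog', 'cat']
```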

**9. In regular expressions, what does the ? character stand for?**

**Ans:** `?` can be read as "this may or may not be here". It marks the preceding character or group as optional, so it matches zero or one occurrence.
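For instance, a sketch using an optional `u` (the sample words are chosen only for illustration):

```python
import re

color_re = re.compile(r'colou?r')  # the 'u' may or may not be present

spellings = color_re.findall('color and colour')
no_match = color_re.search('colouur')  # two u's is more than "zero or one"

print(spellings)  # ['color', 'colour']
print(no_match)   # None
```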

**10. In regular expressions, what is the difference between the + and * characters?**

**Ans:**
- `+` means "one or more" of the preceding character or group
- `*` means "zero or more" of the preceding character or group
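The difference shows up when the repeated item is absent. A minimal sketch with made-up sample strings:

```python
import re

samples = ['ac', 'abc', 'abbbc']  # zero, one, and three b's

plus_hits = [s for s in samples if re.fullmatch(r'ab+c', s)]
star_hits = [s for s in samples if re.fullmatch(r'ab*c', s)]

print(plus_hits)  # ['abc', 'abbbc']
print(star_hits)  # ['ac', 'abc', 'abbbc']
```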

**11. What is the difference between {4} and {4,5} in regular expression?**

**Ans:**
- `{4}` matches exactly four instances of the preceding group.
- `{4,5}` matches between four and five instances of the preceding group.
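A quick sketch on a six-digit string (sample data is illustrative):

```python
import re

digits = '123456'

exactly_four = re.search(r'\d{4}', digits).group()
four_or_five = re.search(r'\d{4,5}', digits).group()  # greedy: takes five if it can

print(exactly_four)  # 1234
print(four_or_five)  # 12345
```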

**12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?**

**Ans:**
- `\d`: matches any digit character. For ASCII text this is equivalent to `[0-9]`.
- `\w`: matches any word character (letter, digit, or underscore). For ASCII text this is equivalent to `[a-zA-Z0-9_]`.
- `\s`: matches any whitespace character (space, tab `\t`, newline `\n`, carriage return `\r`, and so on).
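A small sketch of the three classes on one made-up string:

```python
import re

text = 'Room_42 opens\tat 9'  # illustrative sample text containing a tab

digits = re.findall(r'\d', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)

print(digits)  # ['4', '2', '9']
print(words)   # ['Room_42', 'opens', 'at', '9']
print(spaces)  # [' ', '\t', ' ']
```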

**13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?**

**Ans:** In regex, the uppercase metacharacter denotes the inverse of the lowercase counterpart.

- `\D`: matches any non-digit character
- `\W`: matches any non-word character
- `\S`: matches anything that is NOT matched by `\s`, i.e., non-whitespace
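The inverse classes are handy for stripping or splitting. A sketch with made-up inputs:

```python
import re

# \D: remove everything that is not a digit
digits_only = re.sub(r'\D', '', '(415) 555-1234')

# \W: find the characters that are not letters, digits, or underscore
non_word = re.findall(r'\W', 'a-b c!')

# \S+: runs of non-whitespace, i.e. whitespace-separated tokens
tokens = re.findall(r'\S+', 'two  words')

print(digits_only)  # 4155551234
print(non_word)     # ['-', ' ', '!']
print(tokens)       # ['two', 'words']
```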

**14. What is the difference between .*? and .* ?**

**Ans:**
- `.*`: greedy. Quantifiers are greedy by default and grab as many characters as possible for a match. For example, the regex `xy{2,4}` tries to match **"xyyyy"** first, then **"xyyy"**, and then **"xyy"**.
- `.*?`: lazy (non-greedy). Adding an extra `?` after a repetition operator curbs its greediness, so it stops at the shortest possible match.
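A minimal sketch of the difference on a made-up tag string:

```python
import re

html = '<b>bold</b> and <i>italic</i>'  # illustrative sample text

greedy = re.search(r'<.*>', html).group()   # runs to the last '>'
lazy = re.search(r'<.*?>', html).group()    # stops at the first '>'

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```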

**15. What is the syntax for matching both numbers and lowercase letters with a character class?**

**Ans:** The syntax for matching both digits and lowercase letters is `[a-z0-9]`.
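For example, in this sketch (sample text is made up) the uppercase run is skipped:

```python
import re

runs = re.findall(r'[a-z0-9]+', 'abc123 XYZ x9')

print(runs)  # ['abc123', 'x9']
```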

**16. What is the procedure for making a regular expression case insensitive?**

**Ans:** We can pass the `re.IGNORECASE` flag as a parameter to make the regex case insensitive. The syntax is shown below:
```python
import re
match = re.search('he','Hi there, hello! abcd aa he he is target string.', flags=re.IGNORECASE)
```

**17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?**

**Ans:** The dot (`.`) matches any character except a newline. If `re.DOTALL` is passed as the second argument to `re.compile()`, it matches any character, including a newline.
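A short sketch on a made-up two-line string:

```python
import re

text = 'first line\nsecond line'  # illustrative sample text

default = re.compile(r'.*').search(text).group()
dotall = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(default))  # 'first line'
print(repr(dotall))   # 'first line\nsecond line'
```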

**18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?**

**Ans:** As the execution below shows, the output will be `'X drummers, X pipers, five rings, X hen'`.

In [6]:
import re
numReg = re.compile(r'\d+')
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'

**19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?**

**Ans:** `re.VERBOSE` is a flag that lets you write more readable regular expressions by visually separating logical sections of the pattern and adding comments. Whitespace within the pattern is ignored, except when in a character class, or when preceded by an unescaped backslash, or within tokens like `*?`, `(?:` or `(?P<...>`.

For example, `(? :` and `* ?` are not allowed. When a line contains a `#` that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such `#` through the end of the line are ignored.
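As a sketch, here is a phone-number pattern spread over several commented lines (the name `phone_re` and the sample text are illustrative):

```python
import re

phone_re = re.compile(r'''
    (\d{3})         # area code
    -               # separator
    (\d{3}-\d{4})   # main number
''', re.VERBOSE)

mo = phone_re.search('Call 415-555-1234 today')

print(mo.group(1))  # 415
print(mo.group(2))  # 555-1234
```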

**20. How would you write a regex that matches a number with a comma for every three digits? It must match the following:**
- '42'
- '1,234'
- '6,368,745'

**but not the following:**
- '12,34,567' (which has only two digits between the commas)
- '1234' (which lacks commas)

In [7]:
import re
pattern = r'^\d{1,3}(,\d{3})*$'
reg = re.compile(pattern)

print(reg.search('42'))
print(reg.search('1,234'))
print(reg.search('6,368,745'))
print(reg.search('12,34,567'))
print(reg.search('1234'))

<re.Match object; span=(0, 2), match='42'>
<re.Match object; span=(0, 5), match='1,234'>
<re.Match object; span=(0, 9), match='6,368,745'>
None
None


**21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:**
- 'Haruto Watanabe'
- 'Alice Watanabe'
- 'RoboCop Watanabe'

**but not the following:**
- 'haruto Watanabe' (where the first name is not capitalized)
- 'Mr. Watanabe' (where the preceding word has a nonletter character)
- 'Watanabe' (which has no first name)
- 'Haruto watanabe' (where Watanabe is not capitalized)

In [8]:
import re
# [a-zA-Z]* lets the first name contain inner capitals, as in 'RoboCop'
pattern = r'[A-Z][a-zA-Z]*\sWatanabe'
reg = re.compile(pattern)

print(reg.search('Haruto Watanabe'))
print(reg.search('Alice Watanabe'))
print(reg.search('RoboCop Watanabe'))
print(reg.search('haruto Watanabe'))
print(reg.search('Mr. Watanabe'))
print(reg.search('Watanabe'))
print(reg.search('Haruto watanabe'))

<re.Match object; span=(0, 15), match='Haruto Watanabe'>
<re.Match object; span=(0, 14), match='Alice Watanabe'>
<re.Match object; span=(0, 16), match='RoboCop Watanabe'>
None
None
None
None


**22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:**
- 'Alice eats apples.'
- 'Bob pets cats.'
- 'Carol throws baseballs.'
- 'Alice throws Apples.'
- 'BOB EATS CATS.'

**but not the following:**
- 'RoboCop eats apples.'
- 'ALICE THROWS FOOTBALLS.'
- 'Carol eats 7 cats.'

In [9]:
import re
pattern = r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.'
reg = re.compile(pattern, flags = re.IGNORECASE)

print(reg.search('Alice eats apples.'))
print(reg.search('Bob pets cats.'))
print(reg.search('Carol throws baseballs.'))
print(reg.search('Alice throws Apples.'))
print(reg.search('BOB EATS CATS.'))
print(reg.search('RoboCop eats apples.'))
print(reg.search('ALICE THROWS FOOTBALLS.'))
print(reg.search('Carol eats 7 cats.'))

<re.Match object; span=(0, 18), match='Alice eats apples.'>
<re.Match object; span=(0, 14), match='Bob pets cats.'>
<re.Match object; span=(0, 23), match='Carol throws baseballs.'>
<re.Match object; span=(0, 20), match='Alice throws Apples.'>
<re.Match object; span=(0, 14), match='BOB EATS CATS.'>
None
None
None
