### 1. What is the name of the function responsible for creating Regex objects?

The **re.compile()** function returns Regex objects.

In [2]:
import re
re.compile("String")

re.compile(r'String', re.UNICODE)

### 2. Why do raw strings often appear in Regex objects?

Raw strings are used so that backslashes do not have to be escaped.
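
As a small sketch of the difference (the variable names and the sample phone number are illustrative only), the same pattern can be written with doubled backslashes or as a raw string:

```python
import re

# Without the r prefix, every backslash meant for the regex must be doubled.
escaped = '\\d\\d\\d-\\d\\d\\d\\d'

# With the r prefix, backslashes reach the regex engine exactly as typed.
raw = r'\d\d\d-\d\d\d\d'

same_pattern = (escaped == raw)
found = re.search(raw, 'Call 555-1234').group()

print(same_pattern)  # True
print(found)         # 555-1234
```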

### 3. What is the return value of the search() method?

The **search()** method scans a string for the first location where the pattern matches and returns a Match object, or None if no match is found. It scans the entire string, including text after newline characters, so a match can be found on any line of a multi-line string.
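
A minimal sketch of what search() hands back, using made-up sample strings: a Match object (with methods such as group() and span()) when the pattern is found, and None otherwise.

```python
import re

pattern = re.compile(r'\d+')

# A successful search returns a Match object.
match = pattern.search('Order 66 shipped')
matched_text = match.group()
matched_span = match.span()

# An unsuccessful search returns None.
no_match = pattern.search('no digits here')

# The whole string is scanned, including lines after a newline.
second_line = pattern.search('first line\nline 2').group()

print(matched_text)  # 66
print(matched_span)  # (6, 8)
print(no_match)      # None
print(second_line)   # 2
```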

### 4. From a Match object, how do you get the actual strings that match the pattern?

- Import the regex module with import re.
- Create a Regex object with the re.compile() function.
- Pass the string you want to search into the Regex object's search() method.
- Call the Match object's group() method to return a string of the actual matched text.

In [2]:
import re
NumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
Num = NumRegex.search('My number is 345-444-4542.')
print(Num.group())

345-444-4542


### 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group 0 cover? Group 1? Group 2?

Group 0 is the entire match, group 1 covers the first set of parentheses, and group 2 covers the second set of parentheses.

In [14]:
NumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
Num = NumRegex.search('My number is 345-444-4542.')

In [15]:
Num.group(0)

'345-444-4542'

In [16]:
Num.group(1)

'345'

In [17]:
Num.group(2)

'444-4542'

### 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

In [21]:
'''Periods and parentheses can be escaped with a backslash: \., \(, and \).'''

'Periods and parentheses can be escaped with a backslash: \\., \\(, and \\).'

### 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

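If the regex has no groups, a list of strings is returned, one per match. If the regex has exactly one group, the list contains the text captured by that group for each match. If the regex has two or more groups, a list of tuples of strings is returned, one tuple per match.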
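
The following sketch, with an invented sample string, shows how the number of groups changes what findall() returns:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: a list of strings, one per match.
no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)

# One group: a list of strings holding only that group's text.
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)

print(no_groups)   # ['415-555-9999', '212-555-0000']
print(one_group)   # ['415', '212']
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```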

### 8. In regular expressions, what does the | character mean?

The | character is called a pipe. It signifies matching "either, or" between two alternative patterns.

- For example, the regular expression r'Cricket|Soccer Sport' will match either 'Cricket' or 'Soccer Sport'.

When both Cricket and Soccer Sport occur in the searched string, the first occurrence of matching text will be returned as the Match object.
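
A short sketch of that behavior, with sample sentences chosen only for illustration:

```python
import re

sport = re.compile(r'Cricket|Soccer Sport')

# Whichever alternative appears first in the searched string is returned.
first = sport.search('Soccer Sport and Cricket').group()
second = sport.search('Cricket and Soccer Sport').group()

print(first)   # Soccer Sport
print(second)  # Cricket
```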

### 9. In regular expressions, what does the ? character stand for?

The ? character can either mean "match zero or one of the preceding group" or be used to signify nongreedy matching.
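
Both uses can be sketched as follows; the patterns and strings are illustrative examples, not part of the original answer:

```python
import re

# Use 1: the preceding group is optional (zero or one occurrence).
optional = re.compile(r'Bat(wo)?man')
without_group = optional.search('Batman').group()
with_group = optional.search('Batwoman').group()

# Use 2: a ? after a quantifier makes it non-greedy.
greedy = re.search(r'(Ha){3,5}', 'HaHaHaHaHa').group()
non_greedy = re.search(r'(Ha){3,5}?', 'HaHaHaHaHa').group()

print(without_group)  # Batman
print(with_group)     # Batwoman
print(greedy)         # HaHaHaHaHa
print(non_greedy)     # HaHaHa
```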

### 10. In regular expressions, what is the difference between the + and * characters?

The + matches one or more. The * matches zero or more.
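
A brief sketch of the difference, using an illustrative pattern where the group (wo) may or may not appear:

```python
import re

star = re.compile(r'Bat(wo)*man')  # zero or more 'wo'
plus = re.compile(r'Bat(wo)+man')  # one or more 'wo'

# With zero occurrences, only * matches.
star_zero = star.search('Batman') is not None
plus_zero = plus.search('Batman') is not None

# With repeated occurrences, both match.
star_many = star.search('Batwowoman').group()
plus_many = plus.search('Batwowoman').group()

print(star_zero, plus_zero)  # True False
print(star_many, plus_many)  # Batwowoman Batwowoman
```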

### 11. What is the difference between {4} and {4,5} in regular expressions?

The {4} matches exactly four instances of the preceding group. The {4,5} matches between four and five instances.
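
As a sketch of the {4} and {4,5} quantifiers named in the question, the patterns below are anchored with ^ and $ (an added assumption) so the repetition count applies to the whole string:

```python
import re

exact = re.compile(r'^\d{4}$')     # exactly four digits
ranged = re.compile(r'^\d{4,5}$')  # four or five digits

exact_four = exact.search('1234') is not None
exact_five = exact.search('12345') is not None
ranged_five = ranged.search('12345') is not None
ranged_six = ranged.search('123456') is not None

print(exact_four, exact_five)   # True False
print(ranged_five, ranged_six)  # True False
```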

### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

The \d, \w, and \s shorthand character classes match a single digit, word, or space character, respectively.
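
A quick sketch on an invented string; note that \w covers letters, digits, and the underscore:

```python
import re

text = 'Room 42_b is open'

digits = re.findall(r'\d', text)   # single digit characters
spaces = re.findall(r'\s', text)   # single whitespace characters
words = re.findall(r'\w+', text)   # runs of word characters

print(digits)  # ['4', '2']
print(spaces)  # [' ', ' ', ' ']
print(words)   # ['Room', '42_b', 'is', 'open']
```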

### 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

The \D, \W, and \S shorthand character classes match a single character that is not a digit, word, or space character, respectively.
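
The uppercase classes can be sketched the same way, again on an illustrative string:

```python
import re

text = 'A1 b2!'

non_digits = re.findall(r'\D', text)  # anything that is not a digit
non_words = re.findall(r'\W', text)   # anything that is not a word character
non_spaces = re.findall(r'\S', text)  # anything that is not whitespace

print(non_digits)  # ['A', ' ', 'b', '!']
print(non_words)   # [' ', '!']
print(non_spaces)  # ['A', '1', 'b', '2', '!']
```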

### 14. What is the difference between .\* and .\*?

- .\* - The dot-star is greedy: it always tries to match as much text as possible.

- .\*? - To match any and all text in a non-greedy fashion, use the dot, star, and question mark (.\*?). As with braces, the question mark tells Python to match as little text as possible.
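
A minimal sketch of greedy versus non-greedy dot-star matching, on a sample string with two closing angle brackets:

```python
import re

text = '<To serve man> for dinner.>'

# Greedy: runs to the last '>' in the string.
greedy = re.search(r'<.*>', text).group()

# Non-greedy: stops at the first '>' it can.
non_greedy = re.search(r'<.*?>', text).group()

print(greedy)      # <To serve man> for dinner.>
print(non_greedy)  # <To serve man>
```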

### 15. What is the syntax for matching both numbers and lowercase letters with a character class?

The syntax for matching both numbers and lowercase letters with a character class is **[0-9a-z]** or **[a-z0-9]**.
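
For instance, in the following sketch (the sample text is made up) the uppercase word and the spaces are skipped:

```python
import re

# One or more characters that are digits or lowercase letters.
tokens = re.findall(r'[0-9a-z]+', 'abc123 XYZ 7up')

print(tokens)  # ['abc123', '7up']
```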

### 16. What is the procedure for making a regular expression case-insensitive?

Passing re.I or re.IGNORECASE as the second argument to re.compile() will make the matching case insensitive.

In [3]:
casesen = re.compile(r'machine', re.I)
casesen.search('MACHINE learning is part of data science').group()

'MACHINE'

### 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

The . character normally matches any character except the newline character. If re.DOTALL is passed as the second argument to re.compile(), then the dot will also match newline characters.
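
The effect can be sketched on a two-line string (the text itself is only an example):

```python
import re

text = 'first line\nsecond line'

# Default: the dot stops at the newline.
default_match = re.compile(r'.*').search(text).group()

# With re.DOTALL: the dot also matches the newline.
dotall_match = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(default_match))  # 'first line'
print(repr(dotall_match))   # 'first line\nsecond line'
```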

### 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

'X drummers, X pipers, five rings, X hen'

In [5]:
numReg = re.compile(r'\d+')
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'

### 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

The re.VERBOSE argument allows you to add whitespace and comments to the string passed to re.compile().
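
As a sketch, the phone-number pattern used earlier in this notebook could be spread over several commented lines; the split into three groups is an illustrative choice:

```python
import re

phone = re.compile(r'''
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
''', re.VERBOSE)

# Whitespace and comments inside the pattern are ignored when matching.
parts = phone.search('My number is 345-444-4542.').groups()

print(parts)  # ('345', '444', '4542')
```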

### 20. How would you write a regex that matches a number with a comma for every three digits? It must match the following:
- '42'
- '1,234'
- '6,368,745'
### but not the following:
- '12,34,567' (which has only two digits between the commas)
- '1234' (which lacks commas)

re.compile(r'^\d{1,3}(,\d{3})*$') will create this regex, but other regex strings can produce a similar regular expression.

In [7]:
reg = re.compile(r'^\d{1,3}(,\d{3})*$')
reg.search('42').group()

'42'

In [8]:
reg.search('1,234').group()

'1,234'

In [9]:
reg.search('6,368,745').group()

'6,368,745'

In [10]:
reg.search('12,34,567').group()

AttributeError: 'NoneType' object has no attribute 'group'
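
The cells above call group() directly, which raises AttributeError when search() returns None. A sketch that checks every listed case, including '1234', without raising could look like this (the list and dictionary names are hypothetical):

```python
import re

reg = re.compile(r'^\d{1,3}(,\d{3})*$')

should_match = ['42', '1,234', '6,368,745']
should_not_match = ['12,34,567', '1234']

# search() returns None on failure, so compare against None instead of calling group().
results = {s: reg.search(s) is not None for s in should_match + should_not_match}

for text, matched in results.items():
    print(text, matched)
```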

### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
- 'Haruto Watanabe'
- 'Alice Watanabe'
- 'RoboCop Watanabe'
### but not the following:
- 'haruto Watanabe' (where the first name is not capitalized)
- 'Mr. Watanabe' (where the preceding word has a nonletter character)
- 'Watanabe' (which has no first name)
- 'Haruto watanabe' (where Watanabe is not capitalized)

re.compile(r'[A-Z][a-z]*\sWatanabe')

In [15]:
name = re.compile(r'[A-Z][a-z]*\sWatanabe')
name.search('Haruto Watanabe').group()

'Haruto Watanabe'

In [16]:
name.search('Alice Watanabe').group()

'Alice Watanabe'

In [18]:
name.search('RoboCop Watanabe').group()

'Cop Watanabe'

In [19]:
name.search('haruto Watanabe').group()

AttributeError: 'NoneType' object has no attribute 'group'
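
Note that the pattern above finds only 'Cop Watanabe' inside 'RoboCop Watanabe', because [a-z]* cannot cross the capital C. If the full name is wanted, one possible variant (an assumption, not the original answer) adds a word boundary and allows mixed-case letters after the first capital:

```python
import re

# \b anchors the first name at a word boundary; [A-Za-z]* allows inner capitals.
full_name = re.compile(r'\b[A-Z][A-Za-z]*\sWatanabe')

matches = [full_name.search(s).group()
           for s in ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe']]

non_matches = [full_name.search(s)
               for s in ['haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']]

print(matches)      # ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe']
print(non_matches)  # [None, None, None, None]
```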

### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
- 'Alice eats apples.'
- 'Bob pets cats.'
- 'Carol throws baseballs.'
- 'Alice throws Apples.'
- 'BOB EATS CATS.'
### but not the following:
- 'RoboCop eats apples.'
- 'ALICE THROWS FOOTBALLS.'
- 'Carol eats 7 cats.'

re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.IGNORECASE)

In [21]:
name = re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.IGNORECASE)
name.search('Alice eats apples.').group()

'Alice eats apples.'

In [24]:
name.search('Carol throws baseballs.').group()

'Carol throws baseballs.'
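
The remaining cases from the question can be checked in one pass. This is a sketch using the same pattern; the variable names are hypothetical:

```python
import re

sentence = re.compile(
    r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.',
    re.IGNORECASE,
)

should_match = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.',
                'Alice throws Apples.', 'BOB EATS CATS.']
should_not_match = ['RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.',
                    'Carol eats 7 cats.']

matched = [s for s in should_match if sentence.search(s)]
rejected = [s for s in should_not_match if sentence.search(s) is None]

print(len(matched), len(rejected))  # 5 3
```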