## 1. What is the name of the feature responsible for generating Regex objects?

The `re` module in Python provides regular-expression support.  
**re.compile()** creates a Regex object (an `re.Pattern` instance) from a pattern string.

In [11]:
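import re
re.compile(r'\d\d\d-\d\d\d')  # returns a compiled Pattern object

re.compile(r'\d\d\d-\d\d\d', re.UNICODE)  # optional flags go in the 2nd argument; re.UNICODE is already the default for str patterns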
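A compiled pattern can be reused on many strings through its methods (`search()`, `findall()`, `sub()`, ...). The sketch below checks the object's type (shown as `re.Pattern` on Python 3.7+) and reuses one compiled object; the sample strings are invented for illustration.

```python
import re

# Compile once, reuse many times
phone_regex = re.compile(r'\d\d\d-\d\d\d')
print(type(phone_regex))      # <class 're.Pattern'>
print(phone_regex.pattern)    # the original pattern string: \d\d\d-\d\d\d

for text in ['call 415-555', 'no number here']:
    match = phone_regex.search(text)
    print(text, '->', match.group() if match else None)
```

The module-level functions such as `re.search(pattern, text)` also work without an explicit compile step; compiling mainly pays off when the same pattern is used repeatedly.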

## 2. Why do raw strings often appear in Regex objects?

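Regex patterns use the backslash `\` heavily (`\d`, `\s`, `\w`, ...), but in an ordinary Python string literal the backslash is also an escape character, so each one would have to be doubled (`'\\d'`).  
A raw string (`r'\d'`) leaves backslashes untouched, which makes patterns much easier to write and read.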

In [2]:
import re
pattern_match = re.compile(r'\d\d\d-\d\d\d') ## Matching a number with pattern like 999-999
res = pattern_match.search('The average income of an employee is $ 678-444')
print('Pattern Found: ',res.group())

Pattern Found:  678-444
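The classic trap is `\b`: in a regex it means "word boundary", but in an ordinary string literal it is the backspace character. This small sketch compares the raw and non-raw spellings; the sentence is invented.

```python
import re

# Without the r prefix, '\b' is a single backspace character
print(len('\b'), len(r'\b'))              # 1 2

# The same pattern written two ways: raw string vs. doubled backslashes
raw_pattern = r'\bcat\b'
escaped_pattern = '\\bcat\\b'
print(raw_pattern == escaped_pattern)     # True

print(re.search(raw_pattern, 'the cat sat'))     # a Match object for 'cat'
print(re.search('\bcat\b', 'the cat sat'))       # None: searches for backspace characters
```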


## 3. What is the return value of the search() method?

- If the pattern is not found, search() returns **None**.  
- If the pattern is found, search() returns a **Match object** (`re.Match`) for the first match.

In [6]:
some_res = pattern_match.search('The average income of an employee is $ 678-444')
print('Search Return Type: ',type(some_res))
none_res = pattern_match.search('The average income of an employee is $ abc-def')
print('Search Return Type: ',type(none_res))

Search Return Type:  <class 're.Match'>
Search Return Type:  <class 'NoneType'>
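Because search() can return None, code usually checks the result before calling `.group()`. A minimal sketch; `first_number` is a hypothetical helper name, not part of `re`.

```python
import re

pattern_match = re.compile(r'\d\d\d-\d\d\d')

def first_number(text):
    """Hypothetical helper: return the first 999-999 style match, or None."""
    match = pattern_match.search(text)
    if match is None:          # calling .group() on None would raise AttributeError
        return None
    return match.group()

print(first_number('income $ 678-444'))   # 678-444
print(first_number('income $ abc-def'))   # None
```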


## 4. From a Match object, how do you get the actual strings that match the pattern?

- Use the **group()** method

In [12]:
print('Pattern Found: ',res.group())

Pattern Found:  678-444


## 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group 0 cover? Group 1? Group 2?

- Group 0 is the entire matched text
- Groups 1 and 2 are the text matched by the first and second pair of parentheses, respectively

In [13]:
grp_pattern = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
grp_search = grp_pattern.search('The average income of an employee is $ 678-444-9999')
print('Group Zero: ',grp_search.group(0))
print('Group One: ',grp_search.group(1))
print('Group Two: ',grp_search.group(2))

Group Zero:  678-444-9999
Group One:  678
Group Two:  444-9999
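To get all groups at once, `groups()` returns them as a tuple (group 0 is not included), which pairs well with tuple unpacking. A short sketch reusing the same pattern and text; the variable names `area_code` and `main_number` are illustrative.

```python
import re

grp_pattern = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
grp_search = grp_pattern.search('The average income of an employee is $ 678-444-9999')

# groups() returns every numbered group as a tuple
print(grp_search.groups())          # ('678', '444-9999')

area_code, main_number = grp_search.groups()
print(area_code, main_number)       # 678 444-9999
```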


## 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?

- Escape them with a backslash: `\(`, `\)` and `\.`

In [16]:
grp_pattern = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
grp_search = grp_pattern.search('The average income of an employee is $ (678) 444-9999')
print('Group Zero: ',grp_search.group(0))
print('Group One: ',grp_search.group(1))
print('Group Two: ',grp_search.group(2))

Group Zero:  (678) 444-9999
Group One:  (678)
Group Two:  444-9999
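The example above escapes parentheses; the sketch below does the same for the period, and shows `re.escape()`, which backslash-escapes every special character in a literal string. The sample text is invented.

```python
import re

text = 'Version 3x14 and version 3.14'

# Unescaped '.' matches any character, so '3x14' is found first
print(re.search(r'3.14', text).group())      # 3x14
# Escaped '\.' matches only a literal period
print(re.search(r'3\.14', text).group())     # 3.14

# re.escape() adds the backslashes for you
literal = re.escape('(678).')
print(literal)                                        # \(678\)\.
print(re.search(literal, 'call (678). now').group())  # (678).
```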


## 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

- `findall()` returns every match in the searched string, whereas `search()` returns only the first one  
- If the regex has **no groups**, findall() returns a list of the matched strings
- If the regex has **two or more groups**, findall() returns a list of tuples of strings, one string per group (with exactly one group, it returns a list of the strings matched by that group)

In [17]:
pattern_match = re.compile(r'\d\d\d-\d\d\d') ## Matching a number with pattern like 999-999
res = pattern_match.search('The average income of an employee is $ 678-444 in ABC Company and $ 897-543 in XYZ')
print('Pattern Found: ',res.group())

Pattern Found:  678-444


In [24]:
pattern_match = re.compile(r'\d\d\d-\d\d\d') ## Matching a number with pattern like 999-999
res = pattern_match.findall('The average income of an employee is $ 678-444 in ABC Company and $ 897-543 in XYZ')
print(res) #Returns a List of Strings

['678-444', '897-543']


In [25]:
pattern_match = re.compile(r'(\d\d\d)-(\d\d\d)') ## Matching a number with pattern like 999-999
res = pattern_match.findall('The average income of an employee is $ 678-444 in ABC Company and $ 897-543 in XYZ')
print(res) ## Returns a List of Tuples of Strings

[('678', '444'), ('897', '543')]
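The one-group case is easy to miss: findall() then returns only the text of that group. A non-capturing group `(?:...)` does not count as a group, so whole matches come back. A sketch on the same sentence:

```python
import re

text = 'The average income of an employee is $ 678-444 in ABC Company and $ 897-543 in XYZ'

# Exactly one capturing group: findall() returns that group's text only
one_group = re.compile(r'(\d\d\d)-\d\d\d')
print(one_group.findall(text))        # ['678', '897']

# Non-capturing group (?:...): whole matches are returned
non_capturing = re.compile(r'(?:\d\d\d)-\d\d\d')
print(non_capturing.findall(text))    # ['678-444', '897-543']
```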


## 8. In regular expressions, what does the | character mean?

The `|` (pipe) matches one of several alternatives; it acts as an "either/or" between expressions.  
When more than one alternative appears in the text, search() returns a Match object for the first occurrence only.

In [57]:
car_reg_ex = re.compile(r'Hyundai|Honda|Ford')
pipe_test = car_reg_ex.search('India is home for Honda, Hyundai and Ford Carmakers')
pipe_test.group() ## Only the first occurrence of the matched text is returned

'Honda'
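To get every alternative that occurs, use findall(). Parentheses restrict the alternation to part of the pattern. A short sketch; the `Bat(...)` pattern and its text are illustrative.

```python
import re

car_reg_ex = re.compile(r'Hyundai|Honda|Ford')
# findall() returns every alternative found, in order of appearance
print(car_reg_ex.findall('India is home for Honda, Hyundai and Ford Carmakers'))
# ['Honda', 'Hyundai', 'Ford']

# The alternation applies only inside the parentheses
bat_regex = re.compile(r'Bat(man|mobile|copter)')
mo = bat_regex.search('Batmobile lost a wheel')
print(mo.group(), mo.group(1))    # Batmobile mobile
```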

## 9. In regular expressions, what does `?` character stand for?

The `?` marks the group (or single character) that precedes it as optional: it matches zero or one occurrence of it.

In [39]:
hero_reg_ex = re.compile(r'Wonder(wo)?man')
question_test = hero_reg_ex.search('DC comics has Wonderwoman')
question_test.group() ## The optional 'wo' is present here

'Wonderwoman'

In [35]:
question_test = hero_reg_ex.search('DC comics has Wonderman')
question_test.group() 

'Wonderman'
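`?` also works on a single character, and an optional group is handy for parts like an area code. A sketch with invented strings:

```python
import re

# '?' after one character makes only that character optional
colour_regex = re.compile(r'colou?r')
print(colour_regex.findall('color and colour'))   # ['color', 'colour']

# '?' after a group makes the whole group optional
phone_regex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
print(phone_regex.search('My number is 555-4242').group())       # 555-4242
print(phone_regex.search('My number is 415-555-4242').group())   # 415-555-4242
```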

## 10. In regular expressions, what is the difference between the + and * characters?

- The `*` matches zero or more of the preceding group.
- The `+` matches one or more of the preceding group.

In [37]:
hero_reg_ex = re.compile(r'Wonder(wo)*man')
asterik_test = hero_reg_ex.search('DC comics has Wonderman')
print(asterik_test.group())
asterik_test2 = hero_reg_ex.search('DC comics has Wonderwoman')
print(asterik_test2.group())
asterik_test3 = hero_reg_ex.search('DC comics has Wonderwowowowoman')
print(asterik_test3.group())

Wonderman
Wonderwoman
Wonderwowowowoman


In [41]:
hero_reg_ex = re.compile(r'Wonder(wo)+man')
plus_test = hero_reg_ex.search('DC comics has Wonderman')
print(plus_test is None) ## '+' needs at least one 'wo', so there is no match
plus_test2 = hero_reg_ex.search('DC comics has Wonderwoman')
print(plus_test2.group())
plus_test3 = hero_reg_ex.search('DC comics has Wonderwowowowoman')
print(plus_test3.group())

True
Wonderwoman
Wonderwowowowoman


## 11. What is the difference between {4} and {4,5} in regular expression?

- {4} matches the preceding group exactly 4 times
- {4,5} matches the preceding group at least 4 and at most 5 times (greedily, so 5 when available)

In [47]:
ha_regex = re.compile(r'(ha){4}')
# Testing with 4 ha
res = ha_regex.search('hahahaha')
print(res.group())
# Testing with 3 ha
res1 = ha_regex.search('hahaha')
print(res1 is None) ## Only 3 'ha', so there is no match
ha_2_regex = re.compile(r'(ha){4,5}')
# Testing with 5 ha
res3 = ha_2_regex.search('hahahahaha')
print(res3.group())

hahahaha
True
hahahahaha
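Ranges are greedy by default; adding `?` after `{m,n}` makes them non-greedy, and either bound may be left out. A sketch on five `ha`:

```python
import re

text = 'hahahahaha'   # five 'ha'

# {m,n} is greedy: it takes as many repetitions as possible
print(re.search(r'(ha){3,5}', text).group())    # hahahahaha
# {m,n}? is non-greedy: it stops at the minimum
print(re.search(r'(ha){3,5}?', text).group())   # hahaha

# Omitting a bound makes the range open-ended
print(re.search(r'(ha){3,}', text).group())     # hahahahaha (three or more)
print(re.search(r'(ha){,2}', text).group())     # haha (zero to two)
```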


## 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

- **\d** matches any numeric digit from 0 to 9
- **\w** matches any letter, numeric digit, or the underscore character
- **\s** matches any whitespace character: space, tab, newline (also carriage return, form feed and vertical tab)

In [49]:
xmas_regex = re.compile(r'\d+\s\w+')
xmas_regex.findall('11 players, 10 overs, 9 fielders')

['11 players', '10 overs', '9 fielders']

## 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

- **\D** matches any character that is not a numeric digit from 0 to 9
- **\W** matches any character that is not a letter, numeric digit, or the underscore character
- **\S** matches any character that is not whitespace (space, tab, newline)
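The uppercase classes are the complements of the lowercase ones. A sketch with invented strings, including a common use of `\D`: stripping everything except digits.

```python
import re

print(re.findall(r'\D+', 'a1b22c'))              # ['a', 'b', 'c']
print(re.findall(r'\W', 'hi, you_2!'))           # [',', ' ', '!']  ('_' counts as a word character)
print(re.findall(r'\S+', 'one  two\tthree\n'))   # ['one', 'two', 'three']

# Remove every non-digit character
print(re.sub(r'\D', '', '(415) 555-4242'))       # 4155554242
```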

## 14. What is the difference between `.*` and `.*?`?

- `.*` is greedy: it matches as much text as possible
- `.*?` is non-greedy (lazy): it matches as little text as possible

In [52]:
# Greedy (.*) groups used to capture each field
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = nameRegex.search('First Name: Ineuron Last Name: Technologies')
print(mo.group(1))
print(mo.group(2))

# Non-greedy: stops at the first '>'
nongreedyRegex = re.compile(r'<.*?>')
mo = nongreedyRegex.search('<To serve man> for dinner.>')
print(mo.group())
# Greedy: runs on to the last '>'
greedyRegex = re.compile(r'<.*>')
mo = greedyRegex.search('<To serve man> for dinner.>')
print(mo.group())

Ineuron
Technologies
<To serve man>
<To serve man> for dinner.>


## 15. What is the syntax for matching both numbers and lowercase letters with a character class?

- [a-z0-9] or [0-9a-z] 

In [55]:
test1 = re.compile(r'[a-z0-9]')
result1 = test1.findall('india AUSTRALIA 12398 winner')
print(result1)

['i', 'n', 'd', 'i', 'a', '1', '2', '3', '9', '8', 'w', 'i', 'n', 'n', 'e', 'r']


In [56]:
test2 = re.compile(r'[0-9a-z]')
result1 = test2.findall('india AUSTRALIA 12398 winner')
print(result1)

['i', 'n', 'd', 'i', 'a', '1', '2', '3', '9', '8', 'w', 'i', 'n', 'n', 'e', 'r']


## 16. What is the procedure for making a regular expression case-insensitive?

- Passing re.I or re.IGNORECASE as the second argument to re.compile() will make the matching case insensitive.

In [102]:
robocop = re.compile(r'robocop', re.I)
print(robocop.search('RoboCop is part man, part machine, all cop.').group())
print(robocop.search('ROBOCOP protects the innocent.').group())
print(robocop.search('Al, why does your programming book talk about robocop so much?').group())

RoboCop
ROBOCOP
robocop


## 17. What does the . character normally match? What does it match if re.DOTALL is passed as the 2nd argument to re.compile()?

- `.` normally matches any character except the newline character
- If re.DOTALL is passed as the 2nd argument to re.compile(), `.` also matches newline characters

In [101]:
noNewlineRegex = re.compile('.*')
print(noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group())
print()
newlineRegex = re.compile('.*', re.DOTALL)
print(newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group())


Serve the public trust.

Serve the public trust.
Protect the innocent.
Uphold the law.


## 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

- Every run of digits is replaced by `X`; 'five' is spelled out in letters, so it is left unchanged.

In [99]:
numReg = re.compile(r'\d+')
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'
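The replacement passed to sub() does not have to be a fixed string: it can refer to groups with `\1`, `\2`, ..., or be a function that receives each Match object. A sketch; the "Agent" pattern and sentences are illustrative.

```python
import re

numReg = re.compile(r'\d+')
# A function replacement receives each Match object and returns the new text
print(numReg.sub(lambda m: str(int(m.group()) * 2), '11 drummers, 10 pipers, 4 hen'))
# 22 drummers, 20 pipers, 8 hen

# \1 in the replacement string refers to the pattern's first group
agent_regex = re.compile(r'Agent (\w)\w*')
print(agent_regex.sub(r'\1****', 'Agent Alice told Agent Carol the secret.'))
# A**** told C**** the secret.
```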

## 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow to do?

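- re.VERBOSE lets you add whitespace and comments inside the pattern string passed to re.compile(): unescaped whitespace is ignored (except inside a character class or when escaped with a backslash), and `#` starts a comment that runs to the end of the line.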
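A verbose pattern can be spread over several lines with a comment per part. Flags are combined with `|`. A sketch assuming a simple 999-999-9999 phone format:

```python
import re

# Whitespace is ignored and '#' starts a comment in VERBOSE mode
phone_regex = re.compile(r'''
    (\d{3})      # area code
    [-\s]        # separator: hyphen or whitespace
    (\d{3})      # first three digits
    -            # literal hyphen
    (\d{4})      # last four digits
''', re.VERBOSE)

print(phone_regex.search('Call 415-555-4242 today').groups())   # ('415', '555', '4242')

# Combine flags with |
ci_verbose = re.compile(r'robo cop  # spaces are ignored here', re.VERBOSE | re.IGNORECASE)
print(ci_verbose.search('ROBOCOP rules').group())   # ROBOCOP
```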

## 20. How would you write a regex that matches a number with a comma every three digits? It must match the following:
'42'  
'1,234'  
'6,368,745'  
**but not the following:**  

'12,34,567' (which has only two digits between the commas)  
'1234' (which lacks commas)


In [97]:
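number_string = re.compile(r'^\d{1,3}(,\d{3})*$')  # each comma must be followed by exactly three digits
for s in ['42', '1,234', '6,368,745', '12,34,567', '1234']:
    print(s, '->', bool(number_string.search(s)))

42 -> True
1,234 -> True
6,368,745 -> True
12,34,567 -> False
1234 -> False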


## 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
'Haruto Watanabe'  
'Alice Watanabe'  
'RoboCop Watanabe'  

**but not the following:**  

'haruto Watanabe' (where the first name is not capitalized)  
'Mr. Watanabe' (where the preceding word has a nonletter character)  
'Watanabe' (which has no first name)  
'Haruto watanabe' (where Watanabe is not capitalized)  


In [79]:
name_regex = re.compile(r'[A-Z][a-zA-Z]*\sWatanabe')  # [a-zA-Z]* also allows mixed-case names like 'RoboCop'
search_string = name_regex.findall('Haruto Watanabe')
print(search_string)
search_string2 = name_regex.findall('haruto Watanabe')
print('search_string2 has ',len(search_string2),' elements')
print(name_regex.findall('RoboCop Watanabe'))

['Haruto Watanabe']
search_string2 has  0  elements
['RoboCop Watanabe']
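To check every listed case at once, `fullmatch()` is stricter than `findall()`/`search()`: the whole string must fit the pattern. The sketch uses `[A-Z][a-zA-Z]*` for the first name so that a mixed-case name such as `RoboCop` is matched whole (with `[a-z]*`, `search()` would only find `Cop Watanabe`).

```python
import re

name_regex = re.compile(r'[A-Z][a-zA-Z]*\sWatanabe')

should_match = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe']
should_not_match = ['haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

# fullmatch() only succeeds when the entire string fits the pattern
for name in should_match + should_not_match:
    print(repr(name), '->', bool(name_regex.fullmatch(name)))
```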


## 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
-	 'Alice eats apples.'
-	 'Bob pets cats.'
-	 'Carol throws baseballs.'
-	 'Alice throws Apples.'
-	 'BOB EATS CATS.'  

**but not the following:**  
-	 'RoboCop eats apples.'
-	 'ALICE THROWS FOOTBALLS.'
-	 'Carol eats 7 cats.'

In [72]:
question_22 = re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.',re.IGNORECASE)
result = question_22.findall('Alice eats apples.')
print(result)
result_2 = question_22.findall('RoboCop eats apples.')
print('Result_2 has ',len(result_2),' elements')

[('Alice', 'eats', 'apples')]
Result_2 has  0  elements
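The same pattern can be checked against every sentence listed in the question. This sketch uses `fullmatch()` so that each sentence must match from start to end, not merely contain a matching part.

```python
import re

question_22 = re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.IGNORECASE)

should_match = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.',
                'Alice throws Apples.', 'BOB EATS CATS.']
should_not_match = ['RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']

for sentence in should_match + should_not_match:
    print(repr(sentence), '->', bool(question_22.fullmatch(sentence)))
```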
