**1. What is the name of the function that creates Regex objects?**

**Solution:-** The re.compile() function is responsible for generating Regex objects.
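
A minimal sketch of what re.compile() gives back: a compiled pattern object (an instance of re.Pattern) whose search() and findall() methods mirror the module-level functions. The phone-number pattern and sample strings are made up for illustration.

```python
import re

# Compile once, then reuse the Pattern object for several searches
phone_re = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")

print(type(phone_re))                                      # <class 're.Pattern'>
print(phone_re.search("Call 415-555-4242 today").group())  # 415-555-4242
print(phone_re.findall("415-555-4242 or 212-555-0000"))    # ['415-555-4242', '212-555-0000']
```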

**2. Why are raw strings often used when creating Regex objects?**

**Solution:-** Regular expressions use the backslash character (`\`) to indicate special sequences (such as \d or \w) or to allow special characters (metacharacters such as . or *) to be used without invoking their special meaning. This collides with Python's use of the same character for escape sequences in ordinary string literals (for example, "\n" is a newline). Raw strings (e.g. r"\n", which is a backslash followed by the letter n) pass backslashes through unchanged, so they do not have to be doubled.
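
The collision is easiest to see with \b, which is a word boundary in a regex but a backspace character in a normal Python string. A small sketch with made-up sample text:

```python
import re

# Without the r prefix, Python turns "\b" into a single backspace character (\x08)
print(len("\b"), len(r"\b"))                  # 1 2

# So "\bcat\b" looks for backspace characters around 'cat', not word boundaries
print(re.search("\bcat\b", "the cat sat"))    # None
# The raw string passes \b through to the regex engine as a word boundary
print(re.search(r"\bcat\b", "the cat sat"))   # <re.Match object; span=(4, 7), match='cat'>

# The non-raw spelling works only if every backslash is doubled
print("\\d+" == r"\d+")                       # True
```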

**3. What is the return value of the search() method?**

**Solution:-** re.search(pattern, string) returns a Match object for the first location where the pattern matches in the string; if there is no match anywhere in the string, it returns None.

**Example:-**

In [1]:
import re
match = re.search('i','I am full Full Stack Data Science Batch', flags=re.IGNORECASE)
print('Output:',match)
match = re.search('X','I am full Full Stack Data Science Batch', flags=re.IGNORECASE)
print('Output:',match)

Output: <re.Match object; span=(0, 1), match='I'>
Output: None


In [None]:
#If no matches are found, the value None is returned:
import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)

None


**4. From a Match object, how do you get the actual strings that match the pattern?**

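**Solution:-** The group() method returns the matched text: group() or group(0) gives the entire match, and group(n) gives the text captured by the nth parenthesized group.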

**Example:-**

In [None]:
#Search for an upper case "S" character in the beginning of a word, and print the word:
import re
txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())

Spain
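
Besides group(), a Match object offers groups(), which returns all captured groups as a tuple, and named groups written as (?P<name>...) can be retrieved by name. A short sketch on the same sample sentence; the group name place is illustrative.

```python
import re

txt = "The rain in Spain"
m = re.search(r"(\w+) in (\w+)", txt)
print(m.group())    # rain in Spain
print(m.group(1))   # rain
print(m.groups())   # ('rain', 'Spain')

# A named group can be fetched by name instead of by number
m = re.search(r"(?P<place>\bS\w+)", txt)
print(m.group("place"))  # Spain
```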


**5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group 0 cover? Group 1? Group 2?**

**Solution:-** Group 0 is the entire match, group 1 covers the first set of parentheses, and group 2 covers the second set of parentheses.

**Example:-**

In [5]:
import re 

match_object = re.match(r'(\w+)@(\w+)\.(\w+)', 'gouravrshi6324@gmail.com') 

# for entire match 
print(match_object.group()) 
# also print(match_object.group(0)) can be used 

# for the first parenthesized subgroup 
print(match_object.group(1)) 

# for the second parenthesized subgroup 
print(match_object.group(2)) 

# for the third parenthesized subgroup 
print(match_object.group(3)) 

# for a tuple of all matched subgroups 
print(match_object.group(1, 2, 3)) 


gouravrshi6324@gmail.com
gouravrshi6324
gmail
com
('gouravrshi6324', 'gmail', 'com')
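
Applying the exact pattern from the question makes the three groups concrete. The phone number below is a made-up sample.

```python
import re

phone_re = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")
mo = phone_re.search("My number is 415-555-4242.")
print(mo.group(0))  # 415-555-4242  (entire match)
print(mo.group(1))  # 415           (first set of parentheses)
print(mo.group(2))  # 555-4242      (second set of parentheses)
```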


**6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods?**

**Solution:-** Periods and parentheses can be escaped with a backslash: 

**`\.`, `\(`, and `\)`**
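
A small sketch contrasting escaped and unescaped characters; the sample text is made up. re.escape(), part of the standard re module, escapes every special character in a literal string for you.

```python
import re

text = "Total (USD): 3.50 and 3x50"
# \( and \) match literal parentheses; \. matches a literal period
print(re.findall(r"\(\w+\)", text))          # ['(USD)']
print(re.findall(r"3\.50", text))            # ['3.50']
# An unescaped '.' matches any character, so it also matches '3x50'
print(re.findall(r"3.50", text))             # ['3.50', '3x50']

# re.escape() backslash-escapes the special characters in a literal string
print(re.findall(re.escape("(USD)"), text))  # ['(USD)']
```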

**7. The findall() method returns either a list of strings or a list of tuples of strings. What determines which one it returns?**

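**Solution:-** If the regex has no groups, findall() returns a list of strings, each one a full match. If it has exactly one group, it still returns a list of strings, but each string is the text captured by that group. If it has two or more groups, it returns a list of tuples of strings, with one tuple per match and one string per group.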

Whereas re.search() (used above) returns only the first match for a pattern, findall() finds all non-overlapping matches and returns them as a list. When the pattern has no groups, each string in the list is one complete match.

In [8]:
import re

## Suppose we have a text with many email addresses
text = 'hotelgouravrshi6324@gmail.com, gouravrshi6324@gmail.com saurab johar'

## Here re.findall() returns a list of all the found email strings
emails = re.findall(r'[\w\.-]+@[\w\.-]+', text)  ## ['hotelgouravrshi6324@gmail.com', 'gouravrshi6324@gmail.com']
for email in emails:
    # do something with each found email string
    print(email)

hotelgouravrshi6324@gmail.com
gouravrshi6324@gmail.com
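
The three cases from the answer can be seen side by side by adding groups to one pattern. The addresses below are made-up samples.

```python
import re

text = "alice@example.com, bob@test.org"

# No groups: each item is the full match
print(re.findall(r"\w+@\w+\.\w+", text))      # ['alice@example.com', 'bob@test.org']
# Exactly one group: each item is that group's text (still a list of strings)
print(re.findall(r"(\w+)@\w+\.\w+", text))    # ['alice', 'bob']
# Two or more groups: each item is a tuple holding every group's text
print(re.findall(r"(\w+)@(\w+)\.\w+", text))  # [('alice', 'example'), ('bob', 'test')]
```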


**8. In regular expressions, what does the | character mean?**

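**Solution:-** The | character signifies matching "either, or": the regex matches either the expression on its left or the expression on its right, trying the alternatives from left to right. Because | has the lowest precedence of the regex operators, it splits the whole pattern unless it is placed inside a group.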

**Example:-**

In [22]:
import re
txt = "The rain in Delhi falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":
x = re.findall("falls|stays", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['falls']
Yes, there is at least one match!
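
Because | splits the whole pattern, a group is needed when only part of the pattern should alternate. A minimal sketch on a made-up string, using a non-capturing group (?:...) so that findall() still returns full matches:

```python
import re

text = "cats and dogs and cat"

# Without a group, | splits the whole pattern: 'cat' OR 'dogs'
print(re.findall(r"cat|dogs", text))      # ['cat', 'dogs', 'cat']

# The group limits the alternation to 'cat' or 'dog',
# so the trailing 's' applies to both alternatives
print(re.findall(r"(?:cat|dog)s", text))  # ['cats', 'dogs']
```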


**9. In regular expressions, what does the ? character stand for?**

**Solution:-** The ? character can either mean "match zero or one of the preceding group" or be used to signify nongreedy matching.
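
Both meanings of ? in one short sketch (sample strings are made up): after an element it makes the element optional, and after a quantifier it makes the quantifier nongreedy.

```python
import re

# '?' after an element makes it optional (zero or one occurrence)
print(re.findall(r"colou?r", "color or colour"))  # ['color', 'colour']

# '?' after a quantifier makes it nongreedy (as few repetitions as possible)
print(re.search(r"\d+", "12345").group())         # 12345
print(re.search(r"\d+?", "12345").group())        # 1
```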

**10. In regular expressions, what is the difference between the + and * characters?**

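**Solution:-** The + quantifier matches one or more occurrences of the preceding element, while the * quantifier matches zero or more, so it also succeeds when the element is absent.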

**Example:-**

In [23]:
#Check if the string contains "ai" followed by 0 or more "x" characters:
import re
txt = "The rain in Delhi falls mainly in the plain!"

x = re.findall("aix*", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['ai', 'ai', 'ai']
Yes, there is at least one match!


In [24]:
#Check if the string contains "ai" followed by 1 or more "x" characters:
import re
txt = "The rain in Delhi falls mainly in the plain!"

x = re.findall("aix+", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

[]
No match


**11. What is the difference between {4} and {4,5} in regular expressions?**

**Solution:-** {4} matches exactly four instances of the preceding group. {4,5} matches between four and five instances; like other quantifiers it is greedy, so it takes five when five are available.

**Example:-**

In [25]:
#Check if the string contains "a" followed by exactly two "l" characters:
import re
txt = "The rain in Delhi falls mainly in the plain!"

x = re.findall("al{2}", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


['all']
Yes, there is at least one match!
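
Running both quantifiers over the same made-up string of digits shows the difference, including the greedy preference for five:

```python
import re

digits = "123456789"
# {4}: exactly four digits per match
print(re.findall(r"\d{4}", digits))    # ['1234', '5678']
# {4,5}: four or five digits; greedy, so five are taken when available
print(re.findall(r"\d{4,5}", digits))  # ['12345', '6789']
```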


**12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?**

**Solution:-** 

1] \d matches any digit character (0-9; for str patterns, other Unicode decimal digits as well)

2] \w matches any word character: letters (a-z and A-Z), digits (0-9), and the underscore _ character (for str patterns, Unicode letters such as é also count unless re.ASCII is set)

3] \s matches any whitespace character (space, tab, newline, and so on)

**Example:-**

In [26]:
#Check if the string contains any digits (numbers from 0-9):
import re
txt = "The rain in Delhi"

x = re.findall(r"\d", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")


[]
No match


In [27]:
#Return a match at every word character (letters a-z and A-Z, digits from 0-9, and the underscore _ character):
import re
txt = "The rain in Delhi"

x = re.findall(r"\w", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'D', 'e', 'l', 'h', 'i']
Yes, there is at least one match!


In [28]:
#Return a match at every white-space character:
import re
txt = "The rain in Delhi"

x = re.findall(r"\s", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

[' ', ' ', ' ']
Yes, there is at least one match!
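
The examples above only use ASCII letters and spaces. This sketch, on made-up strings, shows that for str patterns \w is Unicode-aware unless the standard re.ASCII flag is passed, and that \s also covers tabs and newlines.

```python
import re

word = "Caf\u00e9_2"   # 'Café_2', written with a precomposed é
# For str patterns, \w is Unicode-aware by default, so it matches 'é'
print(re.findall(r"\w", word))            # ['C', 'a', 'f', 'é', '_', '2']
# With re.ASCII, \w is limited to [a-zA-Z0-9_]
print(re.findall(r"\w", word, re.ASCII))  # ['C', 'a', 'f', '_', '2']

# \s matches tabs and newlines as well as spaces
print(re.findall(r"\s", "a b\tc\nd"))     # [' ', '\t', '\n']
```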


**13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?**

**Solution:-**

1] \D matches any character that is not a digit (the opposite of \d)

2] \W matches any character that is not a word character (the opposite of \w)

3] \S matches any character that is not whitespace (the opposite of \s)

**Example:-**

In [29]:
#Return a match at every non-digit character:
import re
txt = "The rain in Delhi"

x = re.findall(r"\D", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['T', 'h', 'e', ' ', 'r', 'a', 'i', 'n', ' ', 'i', 'n', ' ', 'D', 'e', 'l', 'h', 'i']
Yes, there is at least one match!


In [30]:
#Return a match at every NON word character (anything other than letters, digits and the underscore, such as "!", "?" or white-space):
import re
txt = "The rain in Delhi"

x = re.findall(r"\W", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

[' ', ' ', ' ']
Yes, there is at least one match!


In [17]:
#Return a match at every NON white-space character:
import re
txt = "The rain in Spain"

x = re.findall(r"\S", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!
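
The negated classes are handy with re.sub() and re.split(). A sketch on made-up strings: \D strips everything except digits, and \W+ splits text on runs of punctuation and spaces.

```python
import re

# \D: delete every character that is not a digit
print(re.sub(r"\D", "", "Phone: (415) 555-4242"))      # 4155554242
# \W+: split on runs of non-word characters (the trailing '?' leaves an empty string)
print(re.split(r"\W+", "Hello, world! How are you?"))  # ['Hello', 'world', 'How', 'are', 'you', '']
```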


**14. What is the difference between `.*` and `.*?`?**

**Solution:-** The **`.*` performs a greedy match** (it matches as much text as possible), and **the `.*?` performs a nongreedy match** (it matches as little text as possible).
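
The difference shows up when the closing character occurs more than once. A minimal sketch with a made-up string:

```python
import re

text = "<To serve man> for dinner.>"
greedy = re.search(r"<.*>", text).group()   # runs to the LAST '>'
lazy = re.search(r"<.*?>", text).group()    # stops at the FIRST '>'
print(greedy)  # <To serve man> for dinner.>
print(lazy)    # <To serve man>
```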

**15. What is the syntax for matching both numbers and lowercase letters with a character class?**

**Solution:-** 

1] The syntax for matching a digit is [0-9]: it matches any single digit from 0 to 9.

2] The syntax for matching a lowercase letter is [a-z]: it matches any single letter from a to z. Adding a second range, as in [a-zA-Z], matches a letter in either case.

3] The syntax for matching both digits and lowercase letters is [0-9a-z] or [a-z0-9]; the order of the ranges inside the brackets does not matter.

**Example:-**

In [18]:
# For The Number
import re
txt = "8 times before 11:45 AM"

#Check if the string has any digits:
x = re.findall("[0-9]", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['8', '1', '1', '4', '5']
Yes, there is at least one match!


In [19]:
# For letters in either case ([a-zA-Z])
import re
txt = "8 times before 11:45 AM"

#Check if the string has any letters:
x = re.findall("[a-zA-Z]", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['t', 'i', 'm', 'e', 's', 'b', 'e', 'f', 'o', 'r', 'e', 'A', 'M']
Yes, there is at least one match!


In [20]:
# For digits and lowercase letters
import re
txt = "8 times before 11:45 AM"

#Check if the string has any digits or lowercase letters:
x = re.findall("[0-9a-z]", txt)
print(x)

if x:
  print("Yes, there is at least one match!")
else:
  print("No match")

['8', 't', 'i', 'm', 'e', 's', 'b', 'e', 'f', 'o', 'r', 'e', '1', '1', '4', '5']
Yes, there is at least one match!


**16. How do you make a regular expression case-insensitive?**

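**Solution:-** Passing re.I or re.IGNORECASE as the second argument to re.compile() makes the matching case-insensitive. The same flag can also be passed as the flags argument of module-level functions such as re.search() and re.findall().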

**Example:-**

In [36]:
# Importing re package
import re

def validating_name(name):
    # Title (Mr., Mrs. or Ms.) followed by one or more space-separated names
    regex_name = re.compile(r'^(Mr\.|Mrs\.|Ms\.) ([a-z]+)( [a-z]+)*$', re.IGNORECASE)

    res = regex_name.search(name)

    if res:
        # A match was found, so the string is valid
        print("Valid")
    else:
        # No match was found, so the string is invalid
        print("Invalid")

# Driver Code 
validating_name('Mr. Mayur Ravindra Borkar') 
validating_name('Mr and Mrs. Smith') 
validating_name('Mr. Potter')
validating_name('Mr. Suraj Kale') 

Valid
Invalid
Valid
Valid
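
The flag does not require re.compile(). This sketch, on a made-up string, passes it directly to re.findall() and also shows the equivalent inline (?i) flag written at the start of the pattern.

```python
import re

text = "Python, PYTHON and python"
# Flag passed to a module-level function
print(re.findall(r"python", text, flags=re.IGNORECASE))  # ['Python', 'PYTHON', 'python']
# Same effect with the inline (?i) flag at the start of the pattern
print(re.findall(r"(?i)python", text))                   # ['Python', 'PYTHON', 'python']
```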


**17. What does the . character normally match? What does it match if re.DOTALL is passed as the second argument to re.compile()?**

**Solution:-** The **.** character normally matches any character except the newline character. If re.DOTALL is passed as the second argument to re.compile(), the dot also matches newline characters.
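
A two-line string shows the effect of re.DOTALL. The text is a made-up sample, and repr() is used so the newline is visible in the output.

```python
import re

text = "first line\nsecond line"
# Without DOTALL, '.' stops at the newline
print(re.search(r"first.*", text).group())                   # first line
# With DOTALL, '.' also matches '\n', so the match runs to the end
print(repr(re.search(r"first.*", text, re.DOTALL).group()))  # 'first line\nsecond line'
```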

**18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?**

**Solution:-** \d+ matches one or more consecutive digits, so each run of digits (11, 10, and 4) is replaced by a single X, while the word "five" is left alone. The return value is **'X drummers, X pipers, five rings, X hen'**.



In [35]:
import re
numReg = re.compile(r'\d+')
numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

'X drummers, X pipers, five rings, X hen'
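
sub() can do more than a fixed replacement. This sketch reuses the same pattern and string to show the count argument and a replacement function that receives each Match object. Doubling the numbers is just an illustrative choice.

```python
import re

numReg = re.compile(r"\d+")
text = "11 drummers, 10 pipers, five rings, 4 hen"

# count=1 replaces only the first run of digits
print(numReg.sub("X", text, count=1))
# X drummers, 10 pipers, five rings, 4 hen

# A function can compute each replacement from the Match object
print(numReg.sub(lambda m: str(int(m.group()) * 2), text))
# 22 drummers, 20 pipers, five rings, 8 hen
```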

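**19. What does passing re.VERBOSE as the second argument to re.compile() allow you to do?**

**Solution:-** The re.VERBOSE argument allows you to add whitespace and comments to the pattern string passed to re.compile(). Unescaped whitespace is ignored (except inside a character class), and a # outside a character class starts a comment that runs to the end of the line.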

In [34]:
import re

# Without Using VERBOSE
regex_email = re.compile(r'^([a-z0-9_\.-]+)@([0-9a-z\.-]+)\.([a-z]{2,6})$', re.IGNORECASE)

# Using VERBOSE: the same pattern, split across lines with comments
regex_email_verbose = re.compile(r"""
                            ^([a-z0-9_\.-]+)              # local part, like a username
                            @                             # single @ sign
                            ([0-9a-z\.-]+)                # domain name, like google
                            \.                            # single dot
                            ([a-z]{2,6})$                 # top-level domain, like com/in/org
                         """, re.VERBOSE | re.IGNORECASE)
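
One consequence of re.VERBOSE to watch for: a literal space in the pattern is ignored too. A small sketch with made-up strings:

```python
import re

# In VERBOSE mode an unescaped space is ignored, so this pattern means "NewYork"
loose = re.compile(r"New York", re.VERBOSE)
print(loose.search("New York"))   # None
print(loose.search("NewYork"))    # <re.Match object; span=(0, 7), match='NewYork'>

# Escape the space (or use \s) to match a literal space
strict = re.compile(r"New\ York", re.VERBOSE)
print(strict.search("New York"))  # <re.Match object; span=(0, 8), match='New York'>
```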

**20. How would you write a regex that matches a number with commas for every three digits? It must match the following:**

'42'

'1,234'

'6,368,745'

but not the following:

'12,34,567' (which has only two digits between the commas)

'1234' (which lacks commas)

**Solution:-** re.compile(r'^\d{1,3}(,\d{3})*$') creates this regex, although other regex strings can produce an equivalent pattern.

In [33]:
import re
pattern = r'^\d{1,3}(,\d{3})*$'
pagex = re.compile(pattern)
for ele in ['42','1,234', '6,368,745','12,34,567','1234']:
    print('Output:',ele, '->', pagex.search(ele))

Output: 42 -> <re.Match object; span=(0, 2), match='42'>
Output: 1,234 -> <re.Match object; span=(0, 5), match='1,234'>
Output: 6,368,745 -> <re.Match object; span=(0, 9), match='6,368,745'>
Output: 12,34,567 -> None
Output: 1234 -> None
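
One such alternative, sketched here: re.fullmatch() requires the whole string to match, so the ^ and $ anchors can be dropped, and a non-capturing group (?:...) avoids creating a group that is never read.

```python
import re

comma_num = re.compile(r"\d{1,3}(?:,\d{3})*")
for s in ["42", "1,234", "6,368,745", "12,34,567", "1234"]:
    print(s, "->", bool(comma_num.fullmatch(s)))
# 42 -> True
# 1,234 -> True
# 6,368,745 -> True
# 12,34,567 -> False
# 1234 -> False
```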


**21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:**

'Haruto Watanabe'

'Alice Watanabe'

'RoboCop Watanabe'

but not the following:

'haruto Watanabe' (where the first name is not capitalized)

'Mr. Watanabe' (where the preceding word has a nonletter character)

'Watanabe' (which has no first name)

'Haruto watanabe' (where Watanabe is not capitalized)

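**Solution:-** re.compile(r'[A-Z][a-z]*\sWatanabe') finds a match in every required name and in none of the rejected ones. Note that for 'RoboCop Watanabe' it matches only 'Cop Watanabe', because [a-z]* cannot match the uppercase C, so search() starts the match at C.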


In [32]:
import re
pattern = r'[A-Z]{1}[a-z]*\sWatanabe'
namex = re.compile(pattern)
for name in ['Haruto Watanabe','Alice Watanabe','RoboCop Watanabe','haruto Watanabe','Mr. Watanabe','Watanabe','Haruto watanabe']:
    print('Output: ',name,'->',namex.search(name))

Output:  Haruto Watanabe -> <re.Match object; span=(0, 15), match='Haruto Watanabe'>
Output:  Alice Watanabe -> <re.Match object; span=(0, 14), match='Alice Watanabe'>
Output:  RoboCop Watanabe -> <re.Match object; span=(4, 16), match='Cop Watanabe'>
Output:  haruto Watanabe -> None
Output:  Mr. Watanabe -> None
Output:  Watanabe -> None
Output:  Haruto watanabe -> None
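
If the whole name should be matched, one possible variant (an alternative, not the pattern used above) anchors the pattern and allows letters of either case after the initial capital:

```python
import re

# Alternative pattern: anchored, and letters of either case after the capital
namex_full = re.compile(r"^[A-Z][a-zA-Z]*\sWatanabe$")
names = ["Haruto Watanabe", "Alice Watanabe", "RoboCop Watanabe",
         "haruto Watanabe", "Mr. Watanabe", "Watanabe", "Haruto watanabe"]
for name in names:
    m = namex_full.search(name)
    print(name, "->", m.group() if m else None)
# RoboCop Watanabe -> RoboCop Watanabe (the full name this time)
```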


**22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:**

'Alice eats apples.'

'Bob pets cats.'

'Carol throws baseballs.'

'Alice throws Apples.'

'BOB EATS CATS.'

but not the following:

'RoboCop eats apples.'

'ALICE THROWS FOOTBALLS.'

'Carol eats 7 cats.'

**Solution:-** re.compile(r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', re.IGNORECASE)

In [31]:
import re
pattern = r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.'
casex = re.compile(pattern,re.IGNORECASE)
for ele in ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.',
            'Alice throws Apples.', 'BOB EATS CATS.', 'RoboCop eats apples.',
            'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']:
    print('Output: ',ele,'->',casex.search(ele))

Output:  Alice eats apples. -> <re.Match object; span=(0, 18), match='Alice eats apples.'>
Output:  Bob pets cats. -> <re.Match object; span=(0, 14), match='Bob pets cats.'>
Output:  Carol throws baseballs. -> <re.Match object; span=(0, 23), match='Carol throws baseballs.'>
Output:  Alice throws Apples. -> <re.Match object; span=(0, 20), match='Alice throws Apples.'>
Output:  BOB EATS CATS. -> <re.Match object; span=(0, 14), match='BOB EATS CATS.'>
Output:  RoboCop eats apples. -> None
Output:  ALICE THROWS FOOTBALLS. -> None
Output:  Carol eats 7 cats. -> None
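
The three groups in the pattern make the words easy to pull out. Note that search() also finds a match inside a longer string, while fullmatch() accepts only the sentence by itself. The string "Xbob eats cats." is a made-up counterexample.

```python
import re

casex = re.compile(r"(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.", re.IGNORECASE)

print(casex.search("BOB EATS CATS.").groups())   # ('BOB', 'EATS', 'CATS')

# search() matches a substring; fullmatch() requires the whole string
print(casex.search("Xbob eats cats.").group())   # bob eats cats.
print(casex.fullmatch("Xbob eats cats."))        # None
```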
