#Question 1


What is the name of the feature responsible for generating Regex objects?


...............


Answer 1 - 


In Python, Regex objects are created with the **`re.compile()`** function from the built-in **`re`** module. The `re` module provides the functions and methods for working with regular expressions, and `re.compile()` takes a regular-expression pattern string and compiles it into a Regex (pattern) object that can then be used to search, match, and manipulate strings.
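A minimal sketch of this: `re.compile()` returns a compiled pattern object (its class is exposed as `re.Pattern` in recent Python 3 versions), and that object carries its own matching methods such as `search()`. The phone-number pattern and sample text are illustrative only.

```python
import re

# Compile the pattern once; the result is a Regex (pattern) object.
phone_regex = re.compile(r"\d{3}-\d{3}-\d{4}")

# The compiled object exposes matching methods directly.
mo = phone_regex.search("Call 415-555-4242 today.")
print(isinstance(phone_regex, re.Pattern))  # True
print(mo.group())                           # 415-555-4242
```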

#Question 2 

Why do raw strings often appear in Regex objects?


...............


Answer 2 - 


Raw strings, denoted by prefixing a string with the letter **'r'** (e.g., r"pattern"), are commonly used in Python regular expressions (regex) to avoid unwanted character escapes.

Regular expressions contain special characters with specific meanings, such as the backslash (`\`) and certain combinations of characters. These special characters are used to define patterns and match specific sequences of characters. However, in Python strings the backslash has its own meaning as the escape character, allowing you to include special characters like newline (`\n`) or tab (`\t`).

When constructing regular expressions, it is often necessary to use backslashes for special characters, such as matching a literal backslash or specifying word boundaries. However, using regular Python strings would require escaping backslashes with additional backslashes, resulting in more complex and harder-to-read patterns.

In a raw string, Python does not process escape sequences, so every backslash is passed through unchanged to the regex engine. This is particularly useful when working with regular expressions because they often contain a significant number of backslashes.

For example, consider the regex pattern `\d+`, which matches one or more digits. Written as a regular Python string, the backslash should be escaped: `"\\d+"` (an unrecognized escape such as `"\d"` happens to be left as-is, but recent Python versions warn about it). By using a raw string, you can write it as `r"\d+"`, which is simpler and more readable.
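A quick check of the points above. The `\b` case shows why raw strings matter most: in a normal string, `"\b"` is converted into a single backspace character before the regex engine ever sees it, so a word-boundary pattern silently stops matching. The sample text is made up for illustration.

```python
import re

# A raw string and an explicitly escaped string produce the same pattern text.
print(r"\d+" == "\\d+")        # True

# "\b" in a normal string is the backspace character (one char);
# r"\b" keeps the backslash, so the regex engine sees a word boundary.
print(len("\b"), len(r"\b"))   # 1 2

text = "a cat sat on the concatenated mat"
print(re.findall(r"\bcat\b", text))  # ['cat']
print(re.findall("\bcat\b", text))   # [] (the pattern contains backspace characters)
```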

#Question 3

What is the return value of the search() method?

..............


Answer 3 - 

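The `search()` function in Python's regular expression module (`re`), like the `search()` method of a compiled Regex object, returns a match object for the first location in the string where the pattern matches. If the pattern does not match anywhere in the string, it returns `None`.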

The match object contains information about the match and provides various methods to access and manipulate the matched data. Some of the commonly used methods and attributes of the match object include:

- `group()` : Returns the string matched by the regular expression.

- `start()` : Returns the starting index of the match.

- `end()` : Returns the index just past the end of the match (so the matched text is `text[start:end]`).

- `span()`: Returns a tuple containing the starting and ending indices of the match.

Here's an example that demonstrates the usage of the *`search()`* method and accessing the match object:

In [1]:
import re

pattern = r"apple"
text = "I have an apple and a banana."

match = re.search(pattern, text)

if match:
    print("Match found!")
    print("Matched string:", match.group())
    print("Starting index:", match.start())
    print("Ending index:", match.end())
    print("Starting and ending indices:", match.span())
else:
    print("No match found.")


Match found!
Matched string: apple
Starting index: 10
Ending index: 15
Starting and ending indices: (10, 15)


In the example above, the `search()` method is used to find the first occurrence of the pattern "apple" in the given text. Since there is a match, it returns a match object. The match object is then used to retrieve information about the match, such as the matched string, starting and ending indices, and the span of the match.
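The other half of the return value is worth seeing too: `search()` returns `None` when nothing matches, and it reports only the first match even if the pattern occurs several times. A small sketch with made-up text:

```python
import re

text = "apple pie, apple tart"

first = re.search(r"apple", text)
print(first.span())        # (0, 5): only the first occurrence is reported

missing = re.search(r"cherry", text)
print(missing)             # None

# Because None is falsy, test the result before calling group().
if missing is None:
    print("No cherry found.")
```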

#Question 4

From a Match item, how do you get the actual strings that match the pattern?

...............


Answer 4 - 

To retrieve the actual strings that match the pattern from a match object in Python, you can use the `group()` method. The `group()` method without any arguments returns the entire string that matches the pattern. If you have used capturing groups in your regular expression, you can pass an argument to `group()` to retrieve specific matched groups.

Here's an example that demonstrates how to use the `group()` method to retrieve the matched strings:

In [2]:
import re

pattern = r"(\d{2})-(\d{2})-(\d{4})"
text = "Date: 12-31-2022"

match = re.search(pattern, text)

if match:
    print("Match found!")
    print("Entire match:", match.group())
    print("Month:", match.group(1))
    print("Day:", match.group(2))
    print("Year:", match.group(3))
else:
    print("No match found.")

Match found!
Entire match: 12-31-2022
Month: 12
Day: 31
Year: 2022


In the example above, the regular expression pattern `(\d{2})-(\d{2})-(\d{4})` matches a date in the format **"mm-dd-yyyy"**. The `search()` method finds the first occurrence of the pattern in the given text. The `group()` method is then used to retrieve the entire matched string (`match.group()`), as well as the individual groups defined by the capturing parentheses in the pattern (`match.group(1)`, `match.group(2)`, `match.group(3)`).

Note that `group(0)` is equivalent to `group()` and returns the entire match. The subsequent groups, starting from `group(1)` , correspond to the captured groups in the regular expression pattern.
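Besides `group()`, a match object's `groups()` method returns all captured groups at once as a tuple, which is handy for unpacking. A sketch using an illustrative phone-number pattern and made-up text:

```python
import re

match = re.search(r"(\d{3})-(\d{3}-\d{4})", "Call 415-555-4242")

print(match.group(0) == match.group())  # True: group 0 is the whole match
print(match.groups())                   # ('415', '555-4242')

# Unpack the captured groups directly.
area_code, number = match.groups()
print(area_code, number)                # 415 555-4242
```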

#Question 5

In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover?
Group 2? Group 1?

..............

Answer 5 - 

In the regular expression `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`, group zero (`group(0)`) covers the entire match of the regular expression pattern, that is, the whole string that matched.

In this specific pattern, `group(2)` refers to the second capturing group, which is `(\d\d\d-\d\d\d\d)` . It captures a sequence of three digits followed by a hyphen, and then four more digits.

Similarly, `group(1)` refers to the first capturing group, which is `(\d\d\d)` . It captures a sequence of three digits.

Here's an example to illustrate the usage of these groups:

In [4]:
import re

pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
text = 'Phone number: 123-456-7890'

match = re.search(pattern, text)

if match:
    print('Match found!')
    print('Group 0 (Entire match):', match.group(0))
    print('Group 1:', match.group(1))
    print('Group 2:', match.group(2))
else:
    print('No match found.')

Match found!
Group 0 (Entire match): 123-456-7890
Group 1: 123
Group 2: 456-7890


In the example above, the regular expression pattern `(\d\d\d)-(\d\d\d-\d\d\d\d)` matches a phone number pattern in the format `"ddd-ddd-dddd"` . The search() method is used to find the first occurrence of the pattern in the given text. The `group(0)` represents the entire match `(123-456-7890)` , `group(1)` represents the first three-digit sequence `(123)`, and  `group(2)` represents the remaining part of the phone number after the hyphen `(456-7890)` .

#Question 6

In regular expression syntax, parentheses and periods have special meanings. How can you tell
a regex that you want it to match real parentheses and periods?

.................


Answer 6 - 



In regular expression syntax, parentheses and periods have special meanings, but if you want to match literal parentheses and periods, you can use the backslash (`\`) to escape them. By preceding a special character with a backslash, you tell the regular expression engine to match that character literally.

To match a literal opening parenthesis, closing parenthesis, or period, you would use `\(`, `\)`, and `\.` respectively.

Here's an example that demonstrates how to match literal parentheses and periods using regular expressions:

In [6]:
import re

text = "The value is (123.45)."

# Match literal parentheses and period
pattern = r"\(\d+\.\d+\)"

match = re.search(pattern, text)

if match:
    print("Match found!")
    print("Matched string:", match.group())
else:
    print("No match found.")

Match found!
Matched string: (123.45)


In the example above, the regular expression pattern `r"\(\d+\.\d+\)"` matches a literal opening parenthesis followed by one or more digits, a period, one or more digits, and a closing parenthesis. The escapes `\(` and `\)` match the literal parentheses, and `\.` matches the literal period.

By escaping the parentheses and period with backslashes, you can ensure that the regular expression matches the literal characters rather than interpreting them with their special meanings.
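When the literal text comes from a variable (for example, user input), escaping every special character by hand is error-prone. Python's `re.escape()` adds the backslashes for you. A short sketch; the sample strings are illustrative:

```python
import re

literal = "(123.45)"
escaped = re.escape(literal)
print(escaped)                             # \(123\.45\)

text = "The value is (123.45)."
print(re.search(escaped, text).group())    # (123.45)

# Unescaped, "." is a wildcard and matches any character.
print(re.search(r"123.45", "123x45") is not None)   # True: "." matched "x"
print(re.search(r"123\.45", "123x45") is None)      # True: "\." needs a real period
```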

#Question 7

The **findall()** method returns a string list or a list of string tuples. What causes it to return one of
the two options?

...............


Answer 7 - 

The `findall()` method in Python's regular expression module (`re`) returns a list of strings or a list of string tuples depending on the number of capturing groups in the regular expression pattern.

If the regular expression pattern contains capturing groups (expressions enclosed in parentheses), the result depends on how many there are. With exactly one group, **findall()** returns a list of strings, each being the text captured by that group rather than the whole match. With two or more groups, it returns a list of string tuples: each tuple represents one match, and each element in the tuple corresponds to one of the captured groups in the pattern.

If the regular expression pattern does not contain any capturing groups, `findall()` will return a list of strings. Each string represents a match of the entire pattern.

Here's an example to illustrate the behavior of `findall()` :

In [12]:
import re

text = "The prices are Rs. 100, Rs. 200, Rs. and Rs. 300."

# Pattern with one capturing group: findall() returns only the captured text
pattern_with_group = r"Rs\.\s*(\d+)"

matches_with_group = re.findall(pattern_with_group, text)
print("Matches with one group (list of strings):", matches_with_group)

# Pattern without capturing groups: findall() returns the entire matches
pattern_without_group = r"Rs\.\s*\d+"

matches_without_group = re.findall(pattern_without_group, text)
print("Matches without group (list of strings):", matches_without_group)

Matches with one group (list of strings): ['100', '200', '300']
Matches without group (list of strings): ['Rs. 100', 'Rs. 200', 'Rs. 300']


In the above code, the pattern `r"Rs\.\s*(\d+)"` is used to match prices in the format `"Rs. xxx"` in the given text. The `\s*` matches zero or more whitespace characters after `"Rs."`, and `(\d+)` captures one or more digits. The `"Rs."` with no number after it is skipped because `\d+` requires at least one digit.

Because the pattern has exactly one capturing group, `findall()` returns a list of strings, `['100', '200', '300']`, for **matches_with_group**, where each string is the captured digits.

The pattern `r"Rs\.\s*\d+"` has no capturing groups, so it returns a list of strings, `['Rs. 100', 'Rs. 200', 'Rs. 300']`, for **matches_without_group**, where each string is a match of the entire pattern.
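To see the tuple case, the pattern needs two or more capturing groups. A sketch reusing the price text from the example above, with the currency label captured as a second (purely illustrative) group:

```python
import re

text = "The prices are Rs. 100, Rs. 200, Rs. and Rs. 300."

# Two capturing groups: findall() returns one tuple per match.
pairs = re.findall(r"(Rs)\.\s*(\d+)", text)
print(pairs)   # [('Rs', '100'), ('Rs', '200'), ('Rs', '300')]
```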

#Question 8

In regular expressions, what does the | character mean?

.................


Answer 8 - 


In regular expressions, the `|` character is known as the `pipe` or `alternation operator` . It is used to specify alternatives or choices within the pattern. It allows you to match one pattern or another.

The `|` operator acts like a logical `OR`: it matches either the pattern on its left side or the pattern on its right side, which lets a single pattern match multiple possibilities.

Here's an example to illustrate the usage of the | operator in regular expressions:

In [13]:
text = "I love cats and dogs."

# Pattern with alternation
pattern = r"cats|dogs"

matches = re.findall(pattern, text)
print("Matches:", matches)

Matches: ['cats', 'dogs']


In the example above, the regular expression pattern `r"cats|dogs"` uses the `|` operator to specify two alternatives: **"cats"** and **"dogs"** . It matches either `"cats"` or `"dogs"` in the given text.

You can use the `|` operator to specify multiple alternatives within a pattern, and it will match any of those alternatives if they occur in the text.
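Two details are worth knowing about `|`. First, it has the lowest precedence, so it splits the whole pattern (or the enclosing group) into alternatives; parentheses limit its scope. Second, Python's engine tries the alternatives from left to right and takes the first one that matches at a given position, not the longest. A small sketch with made-up text:

```python
import re

# Parentheses limit the alternation to one part of the pattern.
print(re.search(r"gr(a|e)y", "The sky is grey.").group())   # grey

# Without parentheses, the pattern means "gra" OR "ey".
print(re.findall(r"gra|ey", "grey gray"))                   # ['ey', 'gra']

# The first alternative that matches wins, even if a later one is longer.
print(re.findall(r"cat|category", "category"))              # ['cat']
```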

#Question 9

In regular expressions, what does the character stand for?

..............


Answer 9 - 

The question text is missing the character it refers to, so it cannot be answered as written.

#Question 10

In regular expressions, what is the difference between the + and * characters?

..............


Answer 10 - 

In regular expressions, the `"+"` and `"*"` characters are quantifiers that define the number of occurrences of the preceding pattern. The main difference between the two is:

1) `"+" (Plus)` : The `"+"` quantifier matches one or more occurrences of the preceding pattern. It requires at least one occurrence of the pattern for a match.
For example:

- Pattern: `a+` matches one or more consecutive occurrences of the letter `"a"` . It would match "a", "aa", "aaa", and so on, but not an empty string.

2) `"*" (Asterisk)` : The `"*"` quantifier matches zero or more occurrences of the preceding pattern. It allows the pattern to be repeated zero times or any number of times.

For example:

- Pattern: `a*` matches zero or more consecutive occurrences of the letter `"a"` . It would match an empty string, "a", "aa", "aaa", and so on.
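The difference shows up clearly when the preceding pattern is absent. A sketch using `re.fullmatch()` (which requires the whole string to match) and `findall()` on made-up text:

```python
import re

# "+" needs at least one "a"; "*" also accepts zero.
print(re.fullmatch(r"a+", ""))              # None
print(re.fullmatch(r"a*", "") is not None)  # True

# "ba+" skips a lone "b", while "ba*" accepts it.
print(re.findall(r"ba+", "b ba baa"))       # ['ba', 'baa']
print(re.findall(r"ba*", "b ba baa"))       # ['b', 'ba', 'baa']
```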

#Question 11

What is the difference between {4} and {4,5} in regular expression?

................

Answer 11 - 


In regular expressions, the expressions {4} and {4,5} are both used as quantifiers to specify the number of occurrences of the preceding pattern. However, they have slightly different meanings:

1) `{4}`: The `{4}` quantifier matches exactly four occurrences of the preceding pattern.

For example:

- Pattern: `a{4}` matches exactly four consecutive occurrences of the letter **"a"**. As a whole-string match, it would match **"aaaa"**, but not **"aa"** or **"aaaaa"** (an unanchored search would still find four `a`s inside **"aaaaa"**).

2) `{4,5}`: The `{4,5}` quantifier matches a range of occurrences of the preceding pattern, specifically between four and five occurrences (inclusive).

For example:

- Pattern: `a{4,5}` matches between four and five consecutive occurrences of the letter **"a"**. As a whole-string match, it would match **"aaaa"** and **"aaaaa"**, but not **"aa"** or **"aaaaaa"**.
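A sketch with `re.fullmatch()` confirms the whole-string behavior of both quantifiers, and the last line shows why anchoring matters: `re.search()` happily finds four `a`s inside a longer run. The test strings are illustrative.

```python
import re

for s in ["aa", "aaaa", "aaaaa", "aaaaaa"]:
    exact = re.fullmatch(r"a{4}", s) is not None
    ranged = re.fullmatch(r"a{4,5}", s) is not None
    print(s, exact, ranged)
# aa False False
# aaaa True True
# aaaaa False True
# aaaaaa False False

# Unanchored search finds the first four "a"s inside "aaaaa".
print(re.search(r"a{4}", "aaaaa").group())   # aaaa
```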

#Question 12

What do the \d, \w, and \s shorthand character classes signify in regular
expressions?


.............


Answer 12 - 


In regular expressions, the shorthand character classes \d, \w, and \s are used to match specific types of characters. Here's what each of these shorthand character classes signifies:

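1) `\d` : The \d shorthand character class matches any digit character (0-9). In Python 3, for `str` patterns it also matches other Unicode decimal digits unless the `re.ASCII` flag is used.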

- For example: `\d` matches a single digit character. It would match "0", "5", "9", but not "A" or "$".

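2) `\w` : The \w shorthand character class matches any "word" character: letters (a-z, A-Z), digits (0-9), and the underscore (_). For `str` patterns in Python 3 this also includes other Unicode letters and digits, unless the `re.ASCII` flag is used.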

- For example: `\w` matches a single alphanumeric character or underscore. It would match "a", "A", "0", "_", but not "#" or "@".

3) `\s` : The \s shorthand character class matches any whitespace character, including spaces, tabs, and newlines.

- For example: `\s` matches a single whitespace character. It would match a space, a tab, a newline, but not a letter or a digit.
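A quick sketch of the three classes with `findall()` on made-up strings. The last two lines illustrate the Unicode detail of Python's `re` module: for `str` patterns, `\d` also matches non-ASCII decimal digits (here the Arabic-Indic digit three, U+0663) unless `re.ASCII` is passed.

```python
import re

print(re.findall(r"\d", "a1b22"))            # ['1', '2', '2']
print(re.findall(r"\w+", "hi_there, 42!"))   # ['hi_there', '42']
print(re.findall(r"\s", "a b\tc\n"))         # [' ', '\t', '\n']

arabic_three = "\u0663"
print(re.findall(r"\d", arabic_three) == [arabic_three])  # True
print(re.findall(r"\d", arabic_three, re.ASCII))          # []
```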

#Question 13 

What do the \D, \W, and \S shorthand character classes signify in regular expressions?

..............


Answer 13 - 


1) `\D` : The \D shorthand character class matches any character that is not a digit (0-9).

- For example: `\D` matches a single character that is not a digit. It would match letters, symbols, spaces, and any character except for 0-9.

2) `\W` : The \W shorthand character class matches any character that is not an alphanumeric character (a-z, A-Z, 0-9) or underscore (_).

- For example: `\W` matches a single character that is not an alphanumeric character or underscore. It would match symbols, spaces, punctuation marks, and any character except for letters, digits, and underscores.

3) `\S` : The \S shorthand character class matches any character that is not whitespace, i.e. anything other than spaces, tabs, newlines, and other whitespace characters.

- For example: `\S` matches a single character that is not a whitespace character. It would match letters, digits, symbols, and any character except for spaces, tabs, and newlines.
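Each uppercase class is the complement of its lowercase counterpart. A short sketch on made-up strings:

```python
import re

print(re.findall(r"\D", "a1 b2"))        # ['a', ' ', 'b']
print(re.findall(r"\W", "hi, there!"))   # [',', ' ', '!']

# Replace every non-whitespace character; the space is kept.
print(re.sub(r"\S", "x", "ab cd"))       # xx xx
```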


#Question 14

What is the difference between .* and .*?

.............

Answer 14 - 

1) `.*?` - `Lazy or Non-Greedy Matching` :

The .*? pattern matches any character (except for a newline) zero or more times, but it does so lazily or non-greedily. It means that it matches as few characters as possible to satisfy the overall pattern.

- For example: `a.*?b` matches the shortest substring that starts with "a" and ends with "b". In the string "ababab", it would match "ab" three times separately, instead of matching the entire string.

2) `.*` - `Greedy Matching` :

The .* pattern matches any character (except for a newline) zero or more times, but it does so greedily. It means that it matches as many characters as possible to satisfy the overall pattern.

- For example: `a.*b` matches the longest substring that starts with "a" and ends with "b". In the string "ababab", it would match the entire string "ababab" in one match.
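A sketch comparing the two with `findall()`, first on the string from the examples above and then on a small, made-up HTML snippet where the difference is easy to see:

```python
import re

text = "ababab"
print(re.findall(r"a.*?b", text))   # ['ab', 'ab', 'ab']
print(re.findall(r"a.*b", text))    # ['ababab']

html = "<b>bold</b> and <i>italic</i>"
print(re.findall(r"<.*?>", html))   # ['<b>', '</b>', '<i>', '</i>']
print(re.findall(r"<.*>", html))    # ['<b>bold</b> and <i>italic</i>']
```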


#Question 15

What is the syntax for matching both numbers and lowercase letters with a character class?

.............


Answer 15 - 

To match both numbers and lowercase letters using a character class in regular expressions, you can use the range notation within the character class. The syntax for matching both numbers `(0-9)` and lowercase letters `(a-z)` is `[0-9a-z]` .

Here's a breakdown of the syntax:

- **[ ]** : Denotes a character class.

- **0-9** : Matches any digit from 0 to 9.

- **a-z** : Matches any lowercase letter from a to z.

Combining them within the character class, `[0-9a-z]` will match any character that is either a digit or a lowercase letter.

Example:

In [2]:
import re

pattern = r'[0-9a-z]'
text = 'Aa1Bb2Cc3'

matches = re.findall(pattern, text)
print(matches)

['a', '1', 'b', '2', 'c', '3']


You can modify the character class `[0-9a-z]` to suit your specific needs.

For example, if you want to match uppercase letters as well, you can include A-Z within the character class like `[0-9a-zA-Z]`
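With `A-Z` added and a `+` quantifier, a sketch that pulls out runs of ASCII letters and digits (the sample text is made up):

```python
import re

print(re.findall(r"[0-9a-zA-Z]+", "Aa1-Bb2_Cc3 d4!"))   # ['Aa1', 'Bb2', 'Cc3', 'd4']
```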

#Question 16

What is the procedure for making a regular expression case-insensitive?

..............


Answer 16 - 

To make a regular expression case-insensitive in Python, you can use the `re.IGNORECASE` flag or its shorthand `re.I`. With this flag, letters in the pattern match both their uppercase and lowercase forms.

Here's the procedure for making a regular expression case-insensitive:

1) Import the `re` module.

2) Define your regular expression pattern.

3) Compile the pattern with the `re.compile()` function, passing `re.IGNORECASE` (or `re.I`) as the second argument.

4) Use the compiled regular expression object (`regex`) to match, search, or perform other operations on your input string.



In [3]:
pattern = r'apple'
text = 'I have an Apple and an orange.'

regex = re.compile(pattern, re.IGNORECASE)
match = regex.search(text)

if match:
    print('Match found:', match.group())
else:
    print('No match found.')

Match found: Apple
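The flag is not tied to `re.compile()`: module-level functions such as `re.search()` accept the same `flags` argument, and a pattern can also switch on case-insensitivity inline with `(?i)` at its start. A short sketch reusing the sample text above:

```python
import re

text = "I have an Apple and an orange."

# Flag passed to a module-level function instead of re.compile().
print(re.search(r"apple", text, re.I).group())   # Apple

# Inline flag at the start of the pattern.
print(re.search(r"(?i)apple", text).group())     # Apple

# Without any flag, the lowercase pattern does not match "Apple".
print(re.search(r"apple", text))                 # None
```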


#Question 17

What does the . character normally match? What does it match if re.DOTALL is passed as 2nd
argument in re.compile()?

..............

Answer 17 - 

In regular expressions, the `.` (dot) character normally matches any character except a newline character (`\n`). It is a wildcard that can represent any single character.

For example:

In [12]:
pattern = r'a.b'
text = 'axb a\nb a_b'

matches = re.findall(pattern, text)
print(matches)

['axb', 'a_b']


In this example, the pattern `a.b` matches strings that have an `'a'` , followed by any character, and then followed by `'b'` . It matches `'axb'` and `'a_b'` in the input text because the dot matches any character between `'a'` and `'b'` , except for a newline character.

However, when the re.DOTALL flag is passed as the second argument in `re.compile()` , the behavior of the dot character changes. The re.DOTALL flag (or re.S shorthand) allows the dot to match any character, including newline characters `(\n)` .

For example:

In [11]:
pattern = r'a.b'
text = 'axb a\nb a_b'

regex = re.compile(pattern, re.DOTALL)
matches = regex.findall(text)
print(matches)

['axb', 'a\nb', 'a_b']


In this case, the `re.DOTALL` flag enables the dot to match any character, including newline characters. As a result, the pattern `a.b` matches `'axb'`, `'a\nb'`, and `'a_b'` in the input text, where `'a\nb'` represents the sequence `'a'`, newline character, `'b'`.

Using the **re.DOTALL** flag can be useful when you want the dot to match any character, including newlines, within your regular expression pattern.

#Question 18

If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

..............

Answer 18 - 

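If **numRegex = re.compile(r'\d+')**, then **numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')** returns `'X drummers, X pipers, five rings, X hen'`. Each run of one or more consecutive digits is replaced by a single 'X', while the spelled-out word "five" is left alone because it contains no digits.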

In [13]:
numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')
print(result)

X drummers, X pipers, five rings, X hen
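The `+` in `\d+` is what collapses each run of digits into a single `X`. For comparison, a sketch with `\d` alone replaces every digit individually:

```python
import re

text = "11 drummers, 10 pipers, five rings, 4 hen"
print(re.sub(r"\d+", "X", text))   # X drummers, X pipers, five rings, X hen
print(re.sub(r"\d", "X", text))    # XX drummers, XX pipers, five rings, X hen
```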


#Question 19

What does passing re.VERBOSE as the 2nd argument to re.compile() allow to do?

..............

Answer 19 - 

Passing **re.VERBOSE** as the second argument to `re.compile()` in Python allows you to use verbose mode in regular expressions. Verbose mode enables you to write more readable and organized regular expressions by allowing you to add comments and whitespace to the pattern without affecting its functionality.

Here's what passing re.VERBOSE as the second argument enables you to do:

1) **Add Comments** : You can add comments within the regular expression pattern using the # symbol. These comments are ignored by the regex engine.

2) **Add Whitespace** : You can add whitespace `(spaces, tabs, newlines)` within the regular expression pattern. This helps improve readability by allowing you to format the pattern across multiple lines and indent it as needed.

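3) **Exceptions** : Whitespace inside a character class (`[ ]`) or preceded by a backslash (`\ `) is not ignored; it is still matched literally. Likewise, a `#` inside a character class or escaped as `\#` is treated as a literal character rather than the start of a comment.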

Here's an example to illustrate the usage of re.VERBOSE:

In [14]:
pattern = r'''                   # Start of the pattern
    ^                           # Match the start of the string
    \d{3}-\d{3}-\d{4}           # Match a phone number in the format xxx-xxx-xxxx
    $                           # Match the end of the string
'''

phone_number = '123-456-7890'

regex = re.compile(pattern, re.VERBOSE)
match = regex.match(phone_number)

if match:
    print('Valid phone number.')
else:
    print('Invalid phone number.')


Valid phone number.
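A sketch of the whitespace rules from point 3: in verbose mode an unescaped space is dropped from the pattern, while an escaped space (`\ `) or a space inside a character class is still matched literally. The "hello world" strings are illustrative.

```python
import re

# Unescaped space is ignored: this pattern is effectively "helloworld".
print(re.fullmatch(r"hello world", "helloworld", re.VERBOSE) is not None)    # True
print(re.fullmatch(r"hello world", "hello world", re.VERBOSE))               # None

# Escaped space and a space in a character class are kept.
print(re.fullmatch(r"hello\ world", "hello world", re.VERBOSE) is not None)  # True
print(re.fullmatch(r"hello[ ]world", "hello world", re.VERBOSE) is not None) # True
```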


#Question 20

How would you write a regex that matches a number with commas for every three digits? It must
match the following:

'42'

'1,234'

'6,368,745'


but not the following:

'12,34,567'  (which has only two digits between the commas)

'1234' (which lacks commas)

.................


Answer 20 - 

To write a regex that matches a number with a comma for every three digits, you can use the following pattern:

In [15]:
pattern = r'^\d{1,3}(,\d{3})*$'

numbers = ['42', '1,234', '6,368,745', '12,34,567', '1234']

for number in numbers:
    if re.match(pattern, number):
        print(f'Match: {number}')
    else:
        print(f'No match: {number}')


Match: 42
Match: 1,234
Match: 6,368,745
No match: 12,34,567
No match: 1234


Explanation of the pattern `r'^\d{1,3}(,\d{3})*$'`:

- `^` and `$` are anchors that match the start and end of the string, respectively.

- `\d{1,3}` matches one to three digits at the beginning of the string.

- `(,\d{3})*` is a group that matches zero or more occurrences of a comma followed by exactly three digits.

- The pattern allows for the first group of one to three digits, followed by zero or more groups of a comma and three digits.

This pattern ensures that the number has commas separating every three digits, except for the first group of one to three digits. Numbers like '42', '1,234', and '6,368,745' match the pattern and are considered valid. On the other hand, numbers like '12,34,567' (which has only two digits between the commas) and '1234' (which lacks commas) do not match the pattern and are considered invalid.
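One subtlety with `re.match()` plus `$`: in Python, `$` also matches just before a trailing newline, so the pattern above accepts `'1,234\n'`. A sketch showing that `re.fullmatch()` rejects it; the anchors are dropped there because `fullmatch()` already requires the whole string to match.

```python
import re

pattern = r'^\d{1,3}(,\d{3})*$'
print(re.match(pattern, '1,234\n') is not None)       # True: $ matched before '\n'

strict = r'\d{1,3}(,\d{3})*'
print(re.fullmatch(strict, '1,234\n'))                 # None
print(re.fullmatch(strict, '6,368,745') is not None)   # True
print(re.fullmatch(strict, '1234'))                    # None
```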

#Question 21

How would you write a regex that matches the full name of someone whose last name is
Watanabe? You can assume that the first name that comes before it will always be one word that
begins with a capital letter. The regex must match the following:

'Haruto Watanabe'

'Alice Watanabe'

'RoboCop Watanabe'

but not the following:

'haruto Watanabe' (where the first name is not capitalized)

'Mr. Watanabe' (where the preceding word has a nonletter character)

'Watanabe' (which has no first name)

'Haruto watanabe' (where Watanabe is not capitalized)

..............

Answer 21 - 



In [16]:
pattern = r'^[A-Z][a-zA-Z]*\sWatanabe$'

names = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe', 'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

for name in names:
    if re.match(pattern, name):
        print(f'Match: {name}')
    else:
        print(f'No match: {name}')

Match: Haruto Watanabe
Match: Alice Watanabe
Match: RoboCop Watanabe
No match: haruto Watanabe
No match: Mr. Watanabe
No match: Watanabe
No match: Haruto watanabe


Explanation of the pattern `r'^[A-Z][a-zA-Z]*\sWatanabe$'` :

- `^` and `$` are anchors that match the start and end of the string, respectively.

- `[A-Z]` matches a capital letter, representing the first letter of the first name.

- `[a-zA-Z]*` matches zero or more letters (both lowercase and uppercase) for the remaining part of the first name.

- `\s` matches a whitespace character.

- `Watanabe` matches the literal last name "Watanabe".


#Question 22

How would you write a regex that matches a sentence where the first word is either Alice, Bob,
or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs;
and the sentence ends with a period? This regex should be case-insensitive. It must match the
following:

'Alice eats apples.'

'Bob pets cats.'

'Carol throws baseballs.'

'Alice throws Apples.'

'BOB EATS CATS.'

but not the following:

'RoboCop eats apples.'

'ALICE THROWS FOOTBALLS.'

'Carol eats 7 cats.'


...................


Answer 22 - 


In [17]:
pattern = r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$'

sentences = [
    'Alice eats apples.',
    'Bob pets cats.',
    'Carol throws baseballs.',
    'Alice throws Apples.',
    'BOB EATS CATS.',
    'RoboCop eats apples.',
    'ALICE THROWS FOOTBALLS.',
    'Carol eats 7 cats.'
]

for sentence in sentences:
    if re.match(pattern, sentence, re.IGNORECASE):
        print(f'Match: {sentence}')
    else:
        print(f'No match: {sentence}')


Match: Alice eats apples.
Match: Bob pets cats.
Match: Carol throws baseballs.
Match: Alice throws Apples.
Match: BOB EATS CATS.
No match: RoboCop eats apples.
No match: ALICE THROWS FOOTBALLS.
No match: Carol eats 7 cats.


Explanation of the pattern `r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$'`:

- `^` and `$` are anchors that match the start and end of the string, respectively.

- `(Alice|Bob|Carol)` matches either **"Alice"** , **"Bob"** , or **"Carol"** as the first word. The `|` character acts as an OR operator for the options.

- `\s` matches a whitespace character.

- `(eats|pets|throws)` matches either **"eats"** , **"pets"** , or **"throws"** as the second word.

- `(apples|cats|baseballs)` matches either **"apples"** , **"cats"** , or **"baseballs"** as the third word.

- `\.` matches a period at the end of the sentence.

The **re.IGNORECASE** flag is used to make the pattern case-insensitive, allowing it to match sentences regardless of the letter case.
