1. What is the name of the feature responsible for generating Regex objects?

Ans: Regex objects are created by the re.compile() function in Python's re module. You pass it a pattern string, and it returns a compiled pattern (Regex) object whose methods, such as search() and findall(), perform the matching.
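As a minimal sketch of how a Regex object is created, the cell below compiles a phone-number pattern. The variable name phone_regex and the sample text are illustrative assumptions, not part of the question.

```python
import re

# re.compile() turns a pattern string into a reusable Regex (re.Pattern) object.
phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# The compiled object exposes the matching methods.
match = phone_regex.search('Call 415-555-4242 today')
print(match.group())  # 415-555-4242
```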

2. Why do raw strings often appear in Regex objects?

Ans: 
Raw strings are common in regexes because Python does not process backslash escapes inside them; each backslash (\) is kept as a literal character. Regular expressions use many backslashes (for example \d or \b), and in a normal string each one would have to be doubled (\\d). Prefixing the string with r passes the backslashes through to the regex engine unchanged, which keeps patterns shorter and avoids accidental escapes such as \b being read as a backspace character.
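The effect of the r prefix can be checked directly. This is a small illustrative sketch; the variable names plain and raw are assumptions chosen for the example.

```python
import re

# Without the r prefix, every regex backslash must be doubled.
plain = '\\d\\d\\d'
raw = r'\d\d\d'
print(plain == raw)  # True: both strings hold the same six characters

# In a normal string '\b' is one backspace character; in a raw string it stays
# as backslash + b, which the regex engine reads as a word boundary.
print(len('\b'), len(r'\b'))  # 1 2
word = re.search(r'\bcat\b', 'a cat sat').group()
print(word)  # cat
```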

3. What is the return value of the search() method?

Ans: The return value of the search() method in regular expressions is an object of the re.Match class. This object represents the first occurrence of a pattern within a string that matches the regular expression. If a match is found, you can access information about the match using various methods and attributes of the re.Match object. If no match is found, the search() method returns None.
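Both outcomes of search() can be seen in a short sketch. The pattern and sample strings here are assumptions for illustration only.

```python
import re

digits = re.compile(r'\d+')

found = digits.search('Room 101, floor 3')
missing = digits.search('no numbers here')

# search() stops at the first match, so only '101' is reported.
print(found.group(), found.span())  # 101 (5, 8)
print(missing)                      # None
```

Because the result may be None, it is usual to test it (for example with `if found:`) before calling methods on it.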

4. From a Match item, how do you get the actual strings that match the pattern?

Ans: 
To get the actual strings that match the pattern from a Match object, you can use the group() method. The group() method returns the specific substring that matched the pattern.

In [1]:
import re

text = "Hello, World!"

# Match the word "Hello" in the text
match = re.search(r"Hello", text)

if match:
    # Get the matched substring
    matched_string = match.group()
    print(matched_string)  # Output: Hello

Hello


5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover?
Group 2? Group 1?

Ans: 
In the regex pattern r'(\d\d\d)-(\d\d\d-\d\d\d\d)', the groups are defined using parentheses. Let's break down the groups:

1)Group 0 (or group zero): It covers the entire matched substring. In this case, it covers the entire pattern (\d\d\d)-(\d\d\d-\d\d\d\d).

2)Group 1: It covers the first capturing group (\d\d\d). This group captures a sequence of three digits.

3)Group 2: It covers the second capturing group (\d\d\d-\d\d\d\d). This group captures a sequence of three digits followed by a hyphen and then another sequence of four digits.

In [2]:
import re

text = "Phone number: 123-456-7890"

# Match the phone number pattern and capture groups
match = re.search(r'(\d\d\d)-(\d\d\d-\d\d\d\d)', text)

if match:
    # Group 0: Entire matched substring
    print("Group 0:", match.group(0))  

    # Group 1: First capturing group
    print("Group 1:", match.group(1))  

    # Group 2: Second capturing group
    print("Group 2:", match.group(2))  


Group 0: 123-456-7890
Group 1: 123
Group 2: 456-7890


6. In regular expression syntax, parentheses and periods have special meanings. How can you tell
a regex that you want it to match literal parentheses and periods?

Ans: 
To tell a regex that you want to match literal parentheses and periods instead of using their special meanings in regular expression syntax, you can use the backslash \ to escape them. The backslash \ is used to indicate that the following character should be treated as a literal character and not as a special metacharacter.

Here are the escape sequences for parentheses and periods:

\( and \): Matches literal parentheses.

\.: Matches a literal period.

For example, if you want to match the string "(example.com)", including the parentheses and the period, you can use the following regex:

In [3]:
import re

text = "(example.com)"

# Match literal parentheses and period
match = re.search(r'\(example\.com\)', text)

if match:
    print("Match found:", match.group(0))


Match found: (example.com)


7. The findall() method returns a string list or a list of string tuples. What causes it to return one of
the two options?

Ans: The findall() method in regular expressions returns different types of results depending on the pattern used in the regex:

If the regex pattern contains no capturing groups (parentheses), findall() returns a list of strings. Each element in the list represents a complete match of the pattern.

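If the regex pattern contains exactly one capturing group, findall() still returns a list of strings, but each string is the text captured by that group rather than the whole match.

If the regex pattern contains two or more capturing groups, findall() returns a list of tuples. Each tuple corresponds to one match of the entire pattern, and each element of the tuple holds the text of one captured group.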
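The return types of findall() can be compared side by side. This is an illustrative sketch with made-up phone numbers; note that a pattern with a single group yields plain strings (the group's text), and tuples appear only with two or more groups.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: a list of strings, one per full match.
no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)
print(no_groups)   # ['415-555-9999', '212-555-0000']

# One group: still a list of strings, but only the group's text.
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)
print(one_group)   # ['415', '212']

# Two or more groups: a list of tuples, one element per group.
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```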

8. In regular expressions, what does the '|' character mean?

Ans: In regular expressions, the '|' character (pipe) denotes alternation, a logical OR between patterns. It lets you list several alternative patterns, and the regex matches any one of them. With search(), the alternative that occurs first in the text is the one returned.
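A short sketch of the pipe in use, both at the top level and inside a group. The names and sample sentences are assumptions for illustration.

```python
import re

hero = re.compile(r'Batman|Tina Fey')

# Whichever alternative appears first in the text is the one search() returns.
first = hero.search('Batman and Tina Fey').group()
second = hero.search('Tina Fey and Batman').group()
print(first)   # Batman
print(second)  # Tina Fey

# Inside parentheses, the pipe chooses between endings that share a prefix.
bat = re.compile(r'Bat(man|mobile|copter)')
m = bat.search('The Batmobile lost a wheel')
print(m.group(), m.group(1))  # Batmobile mobile
```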

9. In regular expressions, what does the . character stand for?

Ans: In regular expressions, the . character (dot) is a metacharacter that matches any single character except a newline.
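The dot's behaviour can be sketched as follows; the sample sentence is an assumption chosen so that several words end in "at".

```python
import re

# The dot stands in for exactly one character before 'at'.
words = re.findall(r'.at', 'The cat in the hat sat on the flat mat.')
print(words)  # ['cat', 'hat', 'sat', 'lat', 'mat']

# By default the dot does not match a newline.
across_lines = re.search(r'a.b', 'a\nb')
print(across_lines)  # None
```

Note that 'flat' contributes 'lat', since the dot matches only a single character.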

10. In regular expressions, what is the difference between the + and * characters?

Ans: + matches one or more occurrences, while * matches zero or more occurrences. The main difference is that + requires at least one occurrence of the preceding element, while * allows for zero occurrences as well.
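The difference shows up only when the repeated element is absent. A minimal sketch, with the patterns and test words assumed for illustration:

```python
import re

star = re.compile(r'Bat(wo)*man')  # 'wo' may appear zero or more times
plus = re.compile(r'Bat(wo)+man')  # 'wo' must appear at least once

results = {}
for word in ['Batman', 'Batwoman', 'Batwowowoman']:
    results[word] = (bool(star.search(word)), bool(plus.search(word)))
    print(word, results[word])

# Batman (True, False)
# Batwoman (True, True)
# Batwowowoman (True, True)
```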

11. What is the difference between {4} and {4,5} in regular expression?

Ans: {4} matches exactly four repetitions of the preceding element, while {4,5} matches either four or five repetitions.
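A brief sketch of both quantifiers applied to runs of digits; the inputs are assumptions for illustration. Because quantifiers are greedy by default, {4,5} takes five digits when five are available.

```python
import re

exact = re.compile(r'\d{4}')
ranged = re.compile(r'\d{4,5}')

four = exact.search('123456').group()
five = ranged.search('123456').group()
print(four)  # 1234
print(five)  # 12345

# Both need at least four digits to match at all.
too_short = ranged.search('123')
print(too_short)  # None
```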

12. What do the \d, \w, and \s shorthand character classes signify in regular
expressions?

Ans: In regular expressions, the shorthand character classes \d, \w, and \s have the following meanings:

1) \d:

Matches any digit character (0-9).
Equivalent to the character class [0-9].
Example: The pattern r"\d\d\d" matches any three consecutive digits.

2) \w:

Matches any alphanumeric character (a-z, A-Z, 0-9) and underscore (_).
Equivalent to the character class [a-zA-Z0-9_].
Example: The pattern r"\w+" matches one or more consecutive alphanumeric characters or underscores.

3) \s:

Matches any whitespace character (space, tab, newline).
Equivalent to the character class [\t\n\r\f\v ].
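Example: The pattern r"\s+" matches one or more consecutive whitespace characters.

These shorthand character classes provide a convenient way to match commonly used character groups in regular expressions. The bracketed equivalents given above hold for ASCII text or when the re.ASCII flag is set; for ordinary str patterns in Python 3, \d, \w, and \s also match Unicode digits, word characters, and whitespace.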
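The three classes can be tried on one sample string. This is an illustrative sketch; the text is an assumption, and the last two lines show that, for str patterns in Python 3, \d is wider than [0-9] unless re.ASCII is passed.

```python
import re

text = 'Order 66: 12 drums,\t7_swans'

digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s+', text)
print(digits)  # ['66', '12', '7']
print(words)   # ['Order', '66', '12', 'drums', '7_swans']
print(spaces)  # [' ', ' ', ' ', '\t']

# \d also matches non-ASCII decimal digits unless re.ASCII is set.
arabic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE
print(bool(re.search(r'\d', arabic_three)))            # True
print(bool(re.search(r'\d', arabic_three, re.ASCII)))  # False
```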








13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

Ans: In regular expressions, the shorthand character classes \D, \W, and \S have the following meanings:

1)\D:

Matches any non-digit character.
Equivalent to the negated character class [^0-9].
Example: The pattern r"\D+" matches one or more consecutive non-digit characters.

2)\W:

Matches any character that is not a letter, digit, or underscore.
Equivalent to the negated character class [^a-zA-Z0-9_].
Example: The pattern r"\W+" matches one or more consecutive non-word characters, such as punctuation and spaces.

3)\S:

Matches any non-whitespace character.
Equivalent to the negated character class [^\t\n\r\f\v ].
Example: The pattern r"\S+" matches one or more consecutive non-whitespace characters.

These shorthand character classes provide a convenient way to match characters that are not included in specific character groups in regular expressions.
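The negated classes can be seen on a short sample. This is a sketch with an assumed input string.

```python
import re

text = 'Tel: 555-0199 (home)'

non_digits = re.findall(r'\D+', text)
non_words = re.findall(r'\W+', text)
non_spaces = re.findall(r'\S+', text)
print(non_digits)  # ['Tel: ', '-', ' (home)']
print(non_words)   # [': ', '-', ' (', ')']
print(non_spaces)  # ['Tel:', '555-0199', '(home)']
```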

14. What is the difference between .* and .*?

Ans: .*? performs a non-greedy match, while .* performs a greedy match. The presence of the ? makes the quantifier non-greedy, causing it to match as little as possible.
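Greedy and non-greedy matching give different results when the closing character occurs more than once. A minimal sketch, with the sample text assumed for illustration:

```python
import re

text = '<To serve man> for dinner.>'

# Greedy: runs to the last '>' in the string.
greedy = re.search(r'<.*>', text).group()
# Non-greedy: stops at the first '>' that completes a match.
lazy = re.search(r'<.*?>', text).group()

print(greedy)  # <To serve man> for dinner.>
print(lazy)    # <To serve man>
```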

15. What is the syntax for matching both numbers and lowercase letters with a character class?

Ans: To match both numbers and lowercase letters using a character class in regular expressions, you can use the following syntax:

    [0-9a-z]
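As a quick check of this character class, the sketch below applies it to an assumed sample string. Uppercase letters, spaces, and the underscore fall outside the class and so split the matches.

```python
import re

found = re.findall(r'[0-9a-z]+', 'abc123 XYZ def_456')
print(found)  # ['abc123', 'def', '456']
```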


16. What is the procedure for making a regular expression case insensitive?

Ans: To make a regular expression case insensitive in Python, use the re.IGNORECASE flag (or its short alias, re.I). This flag enables case-insensitive matching when passed to the regular expression functions in the re module.

Here's the procedure to make a regular expression case insensitive:

1)Import the re module: import re

2)Define your regular expression pattern.

3)Pass re.IGNORECASE (or re.I) as the second argument to re.compile(), or as the flags argument (after the pattern and the string) to module-level functions such as re.match(), re.search(), and re.findall().
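The steps above can be sketched as follows. The pattern and sample strings are assumptions for illustration; with module-level functions the flag is passed through the flags parameter, which comes after the pattern and the string.

```python
import re

# With re.compile(), the flag is the second argument.
robocop = re.compile(r'robocop', re.IGNORECASE)
hit = robocop.search('RoboCop is part man, part machine.').group()
print(hit)  # RoboCop

# With module-level functions, pass it via flags.
all_hits = re.findall(r'robocop', 'ROBOCOP and robocop', flags=re.I)
print(all_hits)  # ['ROBOCOP', 'robocop']
```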



17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd
argument in re.compile()?

Ans: In a regular expression, the . (dot) character normally matches any single character except a newline (\n).

However, if you pass re.DOTALL or re.S as the second argument when compiling the regular expression pattern using re.compile(), then the . (dot) character will match any character including newline characters.
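The effect of re.DOTALL is easiest to see on text containing a newline. A minimal sketch with an assumed two-line string:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

# Default: .* stops at the newline.
one_line = re.compile(r'.*').search(text).group()
# With re.DOTALL: the dot also matches the newline, so .* spans both lines.
all_lines = re.compile(r'.*', re.DOTALL).search(text).group()

print(one_line)           # Serve the public trust.
print(all_lines == text)  # True
```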

18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4
hen') return?

Ans: The numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') statement will return the string 'X drummers, X pipers, five rings, X hen'.

The sub() method of a compiled regular expression (numRegex in this case) is used to substitute occurrences of the pattern in a string with a specified replacement. In this case, the pattern r'\d+' matches one or more digits.

By calling numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen'), all occurrences of one or more digits in the input string are replaced with the character 'X'. Therefore, the resulting string will have all the numeric values replaced with 'X', while the other parts of the string remain unchanged.
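The substitution described above can be run directly; the variable name result is an assumption added for the example.

```python
import re

numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')
print(result)  # X drummers, X pipers, five rings, X hen
```

Each run of digits is replaced by a single 'X', so '11' becomes 'X' rather than 'XX'.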

19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

Ans: Passing re.VERBOSE as the second argument to re.compile() allows you to add whitespace and comments to the regular expression pattern for better readability.

By default, in a regular expression pattern, whitespace characters are significant and contribute to the pattern matching. However, when re.VERBOSE flag is used, whitespace characters (excluding those within character classes) are ignored, allowing you to add spaces, line breaks, and comments to make the pattern more readable and organized.
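A verbose pattern might look like the sketch below. The phone-number pattern and its comments are assumptions for illustration; a triple-quoted raw string is used so the pattern can span several lines.

```python
import re

phone = re.compile(r'''
    (\d{3})     # area code
    -           # separator
    (\d{3})     # first three digits
    -           # separator
    (\d{4})     # last four digits
''', re.VERBOSE)

m = phone.search('Call 415-555-4242 today')
print(m.groups())  # ('415', '555', '4242')
```

To match a literal space or # in verbose mode, escape it with a backslash or place it inside a character class.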

20. How would you write a regex that matches a number with a comma for every three digits? It must
match the following:
'42'
'1,234'
'6,368,745'

Ans: You can write a regex pattern to match a number with commas for every three digits using the following pattern:

In [5]:
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

strings = ['42', '1,234', '6,368,745']

for string in strings:
    if pattern.match(string):
        print(f"Matched: {string}")
    else:
        print(f"Not matched: {string}")


Matched: 42
Matched: 1,234
Matched: 6,368,745
