Q1. What is the name of the feature responsible for generating Regex objects ?

Ans ->

The `re.compile()` function, provided by Python's `re` module, is responsible for creating Regex (regular expression) objects.

- `re` module: This module provides regular expression support in Python. It includes functions and methods for compiling patterns and using them to search, match, and modify text strings.

For instance:
```python
import re

pattern = re.compile(r'\d+')  # Compile a regular expression pattern
matches = pattern.findall('There are 123 apples and 456 oranges.')  # Find all matches of digits in the string
```

Calling `re.compile()` with a pattern string returns a Regex object (an instance of `re.Pattern`), which can then be used to search, match, and replace text based on that pattern.

Q2. Why do raw strings often appear in Regex objects ?

Ans ->

Raw strings (`r''`) are frequently used for regex patterns in Python because they stop Python from interpreting backslashes (`\`) as escape characters in the string literal.

In regular expressions, backslashes are typically used to escape special characters and denote metacharacters or special sequences. However, in Python string literals, backslashes also serve as escape characters for special sequences such as newline (`\n`) or tab (`\t`).

Utilizing raw strings (`r''`) in Regex objects ensures that backslashes are treated as literal characters, simplifying the writing and reading of regular expressions by eliminating the need for double escaping. This is particularly beneficial when working with complex regular expressions that contain numerous backslashes, as it prevents confusion and reduces the chance of errors.

For instance:
```python
import re

# Without raw string
pattern = re.compile('\\d+')  # Compiles a regular expression to match digits

# With raw string
pattern = re.compile(r'\d+')  # Compiles the same regular expression using a raw string
```

Employing a raw string (`r''`) in regular expressions enhances the expression's simplicity and readability by eliminating the need for extra escaping of backslashes.
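
As a small illustration of the difference, the sketch below compares how Python stores a normal literal and a raw literal. The variable names are chosen only for this example.

```python
# A normal literal turns the escape sequence into a single character.
normal = '\n'   # one character: a newline
# A raw literal keeps the backslash as an ordinary character.
raw = r'\n'     # two characters: a backslash followed by 'n'

print(len(normal))   # 1
print(len(raw))      # 2
print(raw == '\\n')  # True: the raw form equals the double-escaped form
```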

Q3. What is the return value of the search() method ?

Ans ->

The `search()` method of a Regex object (like the module-level `re.search()` function) returns a `Match` object for the first match found anywhere in the string, or `None` if there is no match.
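
A minimal sketch of both outcomes, using sample strings invented for this example:

```python
import re

digits = re.compile(r'\d+')

found = digits.search('There are 123 apples.')  # a Match object
missing = digits.search('No numbers here.')     # None

# Check for None before using the result, since None has no group() method.
if found is not None:
    print(found.group())  # 123
print(missing)            # None
```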

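Q4. From a Match object, how do you get the actual strings that match the pattern ?

Ans ->

You can extract the matched text from a `Match` object using the `group()` method. Calling `group()` or `group(0)` returns the entire match, `group(n)` returns the text matched by the n-th capturing group, and `groups()` returns all captured groups as a tuple.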

Q5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover? Group 2? Group 1 ?

Ans ->

In the regular expression formed from the pattern `r'(\d\d\d)-(\d\d\d-\d\d\d\d)'`:

- Group 0 encompasses the whole matched string.
- Group 1 encompasses the substring matched by the first set of parentheses `(\d\d\d)`.
- Group 2 encompasses the substring matched by the second set of parentheses `(\d\d\d-\d\d\d\d)`.
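
The groups can be inspected directly, as in this short sketch. The phone number and the variable names are made up for illustration.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')

print(mo.group(0))  # 415-555-4242  (the whole match)
print(mo.group(1))  # 415           (first set of parentheses)
print(mo.group(2))  # 555-4242      (second set of parentheses)
print(mo.groups())  # ('415', '555-4242')
```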

Q6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match literal parentheses and periods ?

Ans ->

To match literal parentheses and periods, escape them with a backslash: `\(`, `\)`, and `\.`. The backslash tells the regex engine to treat these characters as literals rather than as metacharacters (grouping for parentheses, "any character" for the period).
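
A brief sketch of escaping in practice. The sample text is invented, and the second pair of searches shows what changes when the period is left unescaped.

```python
import re

# \( and \) match literal parentheses; \. matches a literal period.
escaped = re.compile(r'\(\d{3}\) \d{3}\.\d{4}')
mo = escaped.search('Call (415) 555.4242 today')
print(mo.group())  # (415) 555.4242

# An unescaped period matches any character, so 'a.c' also matches 'abc'.
loose = re.search(r'a.c', 'abc')
strict = re.search(r'a\.c', 'abc')
print(loose is not None)  # True
print(strict)             # None
```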

Q7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options ?

Ans ->

The `findall()` method returns either a list of strings or a list of tuples of strings, depending on how many capturing groups the regular expression pattern contains.

- If the pattern has no capturing groups, `findall()` returns a list of strings. Each string is the entire text of one match.

- If the pattern has exactly one capturing group (indicated by parentheses `()`), `findall()` still returns a list of strings, but each string is the text matched by that group rather than the whole match.

- If the pattern has two or more capturing groups, `findall()` returns a list of tuples of strings. Each tuple corresponds to one match, and each element of the tuple holds the text matched by one capturing group.

The number of capturing groups in the pattern therefore determines the structure of the list returned by `findall()`.
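
The sketch below runs `findall()` with zero, one, and two capturing groups on the same invented sample text, to show how the shape of the result changes.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: a list of whole matches.
no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)
print(no_groups)   # ['415-555-9999', '212-555-0000']

# One group: a list of strings holding just that group.
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)
print(one_group)   # ['415', '212']

# Two groups: a list of tuples, one element per group.
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
```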

Q8. In regular expressions, what does the | character mean ?

Ans ->

In regular expressions, the `|` symbol, also referred to as the pipe or alternation operator, functions as a logical OR operator. It enables you to define multiple alternatives for a match.

For instance, in the pattern `cat|dog`, the regex engine will try to match either `cat` or `dog`. If any of these alternatives are detected in the input string, the pattern is deemed to have matched.

The `|` symbol is beneficial for defining multiple alternative patterns within a single regular expression, offering flexibility in matching a range of possibilities.
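
A short sketch of alternation with the `cat|dog` pattern. The sample sentences are invented; note that `search()` reports whichever alternative occurs first in the text.

```python
import re

pets = re.compile(r'cat|dog')

first = pets.search('I have a dog and a cat').group()
every = pets.findall('I have a dog and a cat')
nothing = pets.search('I have a bird')

print(first)    # dog
print(every)    # ['dog', 'cat']
print(nothing)  # None
```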

Q9. In regular expressions, what does the character stand for ?

Ans ->

In regular expressions, the `|` symbol, also referred to as the pipe or alternation operator, functions as a logical OR operator. It enables you to define multiple alternatives for a match.

For instance, in the pattern `cat|dog`, the regex engine will try to match either `cat` or `dog`. If any of these alternatives are detected in the input string, the pattern is deemed to have matched.

The `|` symbol is beneficial for defining multiple alternative patterns within a single regular expression, offering flexibility in matching a range of possibilities.

Q10. In regular expressions, what is the difference between the + and * characters ?

Ans ->

In regular expressions, the `+` symbol signifies "one or more" instances of the preceding element, whereas the `*` symbol signifies "zero or more" instances of the preceding element.
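
The difference shows up when the preceding element is absent, as this small sketch with made-up patterns illustrates:

```python
import re

one_or_more = re.compile(r'ab+c')   # at least one 'b' is required
zero_or_more = re.compile(r'ab*c')  # 'b' may be missing entirely

plus_on_ac = one_or_more.fullmatch('ac')
star_on_ac = zero_or_more.fullmatch('ac')
plus_on_abbc = one_or_more.fullmatch('abbc')
star_on_abbc = zero_or_more.fullmatch('abbc')

print(plus_on_ac)                # None: '+' needs at least one 'b'
print(star_on_ac is not None)    # True: '*' accepts zero 'b' characters
print(plus_on_abbc is not None)  # True
print(star_on_abbc is not None)  # True
```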

Q11. What is the difference between {4} and {4,5} in regular expression ?

Ans ->

In regular expressions, `{4}` denotes exactly four instances of the preceding element, whereas `{4,5}` denotes a range of instances for the preceding element, from four to five times, inclusive.
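
A quick sketch using `\d` as the preceding element. The inputs are arbitrary digit strings chosen for this example.

```python
import re

exactly_four = re.compile(r'\d{4}')
four_or_five = re.compile(r'\d{4,5}')

four_ok = exactly_four.fullmatch('1234')
four_too_long = exactly_four.fullmatch('12345')
range_ok = four_or_five.fullmatch('12345')
range_too_long = four_or_five.fullmatch('123456')

print(four_ok is not None)   # True
print(four_too_long)         # None: five digits is not exactly four
print(range_ok is not None)  # True
print(range_too_long)        # None: six digits exceeds the upper bound

# With search(), the greedy range takes five digits when they are available.
longest = four_or_five.search('123456').group()
print(longest)               # 12345
```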

Q12. What do the \d, \w, and \s shorthand character classes signify in regular expressions ?

Ans ->

In regular expressions, the shorthand character classes `\d`, `\w`, and `\s` represent the following:

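- `\d`: Matches any digit character. With the `re.ASCII` flag it is equivalent to the character class `[0-9]`; for ordinary Python 3 string patterns it also matches other Unicode decimal digits.
- `\w`: Matches any word character (i.e., letters, digits, or the underscore). With the `re.ASCII` flag it is equivalent to the character class `[a-zA-Z0-9_]`; without it, Unicode letters and digits also match.
- `\s`: Matches any whitespace character (i.e., spaces, tabs, newlines, carriage returns, form feeds, or vertical tabs). With the `re.ASCII` flag it is equivalent to the character class `[ \t\n\r\f\v]`.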

These shorthand character classes offer convenient alternatives for frequently used character classes, enabling more succinct regular expression patterns.
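
The following sketch applies each class to one invented sample string (which contains a tab character) to show what each class picks out.

```python
import re

text = 'Room 42, floor_3\tok'

digit_chars = re.findall(r'\d', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)

print(digit_chars)  # ['4', '2', '3']
print(words)        # ['Room', '42', 'floor_3', 'ok']
print(spaces)       # [' ', ' ', '\t']
```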

Q13. What do the \D, \W, and \S shorthand character classes signify in regular expressions ?

Ans ->

In regular expressions, the shorthand character classes `\D`, `\W`, and `\S` represent the following:

- `\D`: Matches any character that is not a numeric digit. It is the opposite of the `\d` shorthand character class.
- `\W`: Matches any character that is not a word character. It is the opposite of the `\w` shorthand character class.
- `\S`: Matches any character that is not a whitespace character. It is the opposite of the `\s` shorthand character class.

These shorthand character classes offer convenient alternatives for matching characters that do not fall into certain categories, enabling more succinct regular expression patterns.
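
A small sketch of the negated classes on a short made-up string:

```python
import re

text = 'a1 b2'

non_digits = re.findall(r'\D', text)
non_words = re.findall(r'\W', text)
non_spaces = re.findall(r'\S', text)

print(non_digits)  # ['a', ' ', 'b']
print(non_words)   # [' ']
print(non_spaces)  # ['a', '1', 'b', '2']
```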

Q14. What is the difference between .*? and .* ?

Ans ->

In regular expressions, both `.*` and `.*?` match any character (except a newline, by default) zero or more times. They differ in how much text they consume:

- `.*`: The `*` quantifier is greedy: it matches as many characters as possible while still allowing the overall pattern to match, which can capture more text than intended.
- `.*?`: Adding `?` makes the quantifier non-greedy (lazy): it matches as few characters as possible while still allowing the overall pattern to match.

In short, `.*` takes the longest possible match, while `.*?` takes the shortest.
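
The contrast is easiest to see on text with repeated delimiters. This sketch uses an invented HTML-like string.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: runs from the first '<' to the last '>' in the string.
greedy = re.search(r'<.*>', text).group()
# Lazy: stops at the first '>' it can.
lazy = re.search(r'<.*?>', text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```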

Q15. What is the syntax for matching both numbers and lowercase letters with a character class ?

Ans ->

The regular expression `[0-9a-z]` is a character class syntax used to match both numbers and lowercase letters.

The character class `[0-9a-z]` defines a range that encompasses all digits (`0-9`) and all lowercase letters (`a-z`). It corresponds to any single character that is either a digit or a lowercase letter.

The utilization of this character class enables the efficient and succinct matching of alphanumeric characters.
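
A brief sketch on an invented sample string. The `+` quantifier is added here only to collect runs of matching characters.

```python
import re

runs = re.findall(r'[0-9a-z]+', 'abc123 XYZ def-456')

# Uppercase letters, spaces, and the hyphen are outside the class.
print(runs)  # ['abc123', 'def', '456']
```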

Q16. What is the procedure for making a regular expression case insensitive ?

Ans ->

To make a regular expression case insensitive, pass the `re.IGNORECASE` (or `re.I`) flag when compiling the pattern. With this flag, letters in the pattern match both their uppercase and lowercase forms.

Here's an example:
```python
import re

pattern = re.compile(r'hello', re.IGNORECASE)
# or equivalently:
# pattern = re.compile(r'hello', re.I)
```

By applying the `re.IGNORECASE` or `re.I` flag, the pattern `r'hello'` would not only match `hello`, but also `Hello`, `HELLO`, `HeLLo`, and so forth, regardless of case. This renders the regular expression case insensitive.
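
To check this behavior, the sketch below compares the same pattern compiled with and without the flag. The list of inputs is taken from the variants mentioned above.

```python
import re

insensitive = re.compile(r'hello', re.IGNORECASE)
sensitive = re.compile(r'hello')

variants = ['hello', 'Hello', 'HELLO', 'HeLLo']

with_flag = [v for v in variants if insensitive.fullmatch(v)]
without_flag = [v for v in variants if sensitive.fullmatch(v)]

print(with_flag)     # ['hello', 'Hello', 'HELLO', 'HeLLo']
print(without_flag)  # ['hello']
```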

Q17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile() ?

Ans ->

In the context of regular expressions, the `.` (dot) character typically matches any single character, with the exception of a newline character (`\n`). It acts as a wildcard symbol, standing in for any character within a pattern.

However, if `re.DOTALL` is supplied as the second argument in `re.compile()`, the `.` character will match any character, newline characters (`\n`) included. This flag essentially activates the "dot-all" mode, permitting the dot to match newline characters as well.

Here's an illustration:
```python
import re

pattern_normal = re.compile(r'.')
pattern_dotall = re.compile(r'.', re.DOTALL)
```

In the case of `pattern_normal`, the dot `.` would match any character excluding newline characters. On the other hand, with `pattern_dotall`, the dot `.` would match any character, newline characters included.
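
The effect is clearer with `.*` on a multi-line string, as in this sketch (the two-line text is invented for the example):

```python
import re

text = 'first line\nsecond line'

# Without the flag, '.*' stops at the newline.
normal = re.compile(r'.*').search(text).group()
# With re.DOTALL, '.*' runs across the newline to the end of the string.
dotall = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(normal))  # 'first line'
print(repr(dotall))  # 'first line\nsecond line'
```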

Q18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return ?

Ans ->

If `numRegex = re.compile(r'\d+')`, then `numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')` returns the string `'X drummers, X pipers, five rings, X hen'`.

Here's the breakdown:
- The regular expression `r'\d+'` is designed to match one or more digits.
- The `sub()` function replaces all instances of the pattern identified by the regular expression with the specified replacement string `'X'`.
- In the provided input string `'11 drummers, 10 pipers, five rings, 4 hen'`, the numbers `'11'`, `'10'`, and `'4'` are matched by the pattern `\d+`.
- Consequently, these numbers are substituted with the string `'X'`, leading to the final output string `'X drummers, X pipers, five rings, X hen'`.
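
The result can be confirmed by running the call directly. This sketch assumes the compiled pattern is bound to the name `numRegex`.

```python
import re

numRegex = re.compile(r'\d+')
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

# 'five' is spelled out, so it contains no digits and is left unchanged.
print(result)  # X drummers, X pipers, five rings, X hen
```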

Q19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do ?

Ans ->

By supplying `re.VERBOSE` as the second argument to `re.compile()`, you can construct regular expressions that incorporate whitespace and comments, enhancing their readability.

- With `re.VERBOSE` in use, whitespace within the regular expression pattern is disregarded, unless it's escaped or within a character class `[ ]`.
- Furthermore, you can incorporate comments in the regular expression by starting them with `#`. These comments extend until the line's end.

This feature proves beneficial for crafting intricate regular expressions that are more comprehensible and maintainable.

For instance:
```python
import re

pattern = re.compile(r'''
    \d+  # Matches one or more digits
    \s*  # Matches zero or more whitespace characters
    [a-z]+  # Matches one or more lowercase letters
''', re.VERBOSE)
```

In this example, the `re.VERBOSE` flag facilitates the inclusion of comments (`#`) and whitespace within the regular expression pattern, thereby enhancing its readability.
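
As a quick check that the layout does not change what the pattern matches, the sketch below compares a verbose pattern with its compact one-line equivalent. The sample inputs are invented.

```python
import re

verbose = re.compile(r'''
    \d+     # one or more digits
    \s*     # zero or more whitespace characters
    [a-z]+  # one or more lowercase letters
''', re.VERBOSE)
compact = re.compile(r'\d+\s*[a-z]+')

spaced = verbose.search('Order 42   apples').group()
joined = verbose.search('Order 42apples').group()
same = compact.search('Order 42   apples').group()

print(repr(spaced))  # '42   apples'
print(repr(joined))  # '42apples'
print(spaced == same)  # True
```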

Q20. How would you write a regex that matches a number with commas for every three digits? It must match the following:
'42'
'1,234'
'6,368,745'
but not the following:
'12,34,567' (which has only two digits between the commas)
'1234' (which lacks commas)

Ans ->

To identify a number that uses commas for every three digits, you can employ the following regular expression:

```python
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')
```

Here's the breakdown:
- `^`: Corresponds to the beginning of the string.
- `\d{1,3}`: Matches one to three digits at the start.
- `(,\d{3})*`: Matches zero or more instances of a comma followed by precisely three digits.
- `$`: Corresponds to the end of the string.

This regular expression will match strings that are composed of one to three digits followed by zero or more sets of a comma and exactly three digits. This enables it to match numbers that use commas for every three digits.
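
The pattern can be checked against the examples from the question. The list names in this sketch are chosen for illustration.

```python
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

should_match = ['42', '1,234', '6,368,745']
should_not_match = ['12,34,567', '1234']

matched = [s for s in should_match if pattern.search(s)]
rejected = [s for s in should_not_match if pattern.search(s) is None]

print(matched)   # ['42', '1,234', '6,368,745']
print(rejected)  # ['12,34,567', '1234']
```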

Q21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
'Haruto Watanabe'
'Alice Watanabe'
'RoboCop Watanabe'
but not the following:
'haruto Watanabe' (where the first name is not capitalized)
'Mr. Watanabe' (where the preceding word has a nonletter character)
'Watanabe' (which has no first name)
'Haruto watanabe' (where Watanabe is not capitalized)

Ans ->

To match the full name of a person whose surname is Watanabe, where the first name is a single word beginning with a capital letter, you can use the following regular expression:

```python
import re

# [a-zA-Z]* allows capitals inside the first name, so 'RoboCop' is accepted
pattern = re.compile(r'^[A-Z][a-zA-Z]*\sWatanabe$')
```

Here's the explanation:
- `^`: Corresponds to the beginning of the string.
- `[A-Z]`: Matches a single uppercase letter (the initial letter of the first name).
- `[a-zA-Z]*`: Matches zero or more letters of either case (the rest of the first name). Allowing uppercase letters here is what lets `RoboCop` match, while a nonletter such as the period in `Mr.` still fails.
- `\s`: Matches a whitespace character (space).
- `Watanabe`: Matches the surname "Watanabe" literally, including its capital W.
- `$`: Corresponds to the end of the string.

This regular expression matches strings that consist of an uppercase letter followed by zero or more letters (the first name), a whitespace character, and the surname "Watanabe". This ensures that the surname is exactly "Watanabe" and that the first name is a single word beginning with a capital letter.
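
The examples from the question can be used as a check. Note the assumption in this sketch: the rest of the first name is written as `[a-zA-Z]*` rather than `[a-z]*`, because a lowercase-only class would reject `RoboCop`.

```python
import re

# Assumption: letters of either case are allowed after the initial capital.
name_regex = re.compile(r'^[A-Z][a-zA-Z]*\sWatanabe$')

should_match = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe']
should_not_match = ['haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

matched = [s for s in should_match if name_regex.search(s)]
rejected = [s for s in should_not_match if name_regex.search(s) is None]

print(matched == should_match)       # True
print(rejected == should_not_match)  # True
```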

Q22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
'Alice eats apples.'
'Bob pets cats.'
'Carol throws baseballs.'
'Alice throws Apples.'
'BOB EATS CATS.'
but not the following:
'RoboCop eats apples.'
'ALICE THROWS FOOTBALLS.'
'Carol eats 7 cats.'

Ans ->

To identify a sentence where the initial word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is either apples, cats, or baseballs; and the sentence concludes with a period, while disregarding case, you can employ the following regular expression:

```python
import re

pattern = re.compile(r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$', re.IGNORECASE)
```

Here's the explanation:
- `^`: Corresponds to the beginning of the string.
- `(Alice|Bob|Carol)`: Matches either "Alice", "Bob", or "Carol" as the first word.
- `\s`: Matches a whitespace character (space).
- `(eats|pets|throws)`: Matches either "eats", "pets", or "throws" as the second word.
- `(apples|cats|baseballs)`: Matches either "apples", "cats", or "baseballs" as the third word.
- `\.`: Matches a period at the end of the string.
- `$`: Corresponds to the end of the string.

This regular expression will match strings that adhere to the specified pattern, irrespective of the case of the words.
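
Running the pattern over the sentences listed in the question gives a quick check. The list names in this sketch are illustrative.

```python
import re

sentence_regex = re.compile(
    r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$',
    re.IGNORECASE,
)

should_match = [
    'Alice eats apples.',
    'Bob pets cats.',
    'Carol throws baseballs.',
    'Alice throws Apples.',
    'BOB EATS CATS.',
]
should_not_match = [
    'RoboCop eats apples.',
    'ALICE THROWS FOOTBALLS.',
    'Carol eats 7 cats.',
]

matched = [s for s in should_match if sentence_regex.search(s)]
rejected = [s for s in should_not_match if sentence_regex.search(s) is None]

print(len(matched))   # 5
print(len(rejected))  # 3
```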