# 1. What is the name of the feature responsible for generating Regex objects?
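
Regex objects are created by the re.compile() function, which takes a pattern string and returns a compiled pattern object. Here's a minimal sketch; the pattern and the sample text are illustrative assumptions, not part of the original question.

```python
import re

# re.compile() turns a pattern string into a reusable Regex (Pattern) object.
phone_regex = re.compile(r'\d{3}-\d{4}')   # hypothetical example pattern

is_pattern = isinstance(phone_regex, re.Pattern)
found = phone_regex.search('Call 555-1234').group()

print(is_pattern)  # True
print(found)       # 555-1234
```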

# 2. Why do raw strings often appear in Regex objects?

In [None]:
Raw strings are often used in Regex objects to avoid unwanted interpretation of escape characters. In Python, backslashes (\) have a special meaning in string literals as they are used to represent escape sequences. However, in regular expressions, backslashes are frequently used as part of the pattern syntax.

By using a raw string in a Regex object, denoted by the 'r' prefix, backslashes are treated as literal characters rather than escape characters. This allows you to write regular expressions without having to double-escape backslashes. It simplifies the readability and avoids potential errors caused by misinterpretation of backslashes.

For example, consider a regular expression pattern that matches a literal backslash followed by a digit. As a raw string it is written r'\\(\d)'. As a non-raw string it would have to be written '\\\\(\\d)', with every backslash doubled so that Python passes it through to the regex engine unchanged. The raw form is shorter and easier to read.
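
The raw-string point can be checked directly. In this sketch the two literals spell the same pattern, a literal backslash followed by a captured digit; the sample text 'path\7' is an assumption made up for illustration.

```python
import re

raw = r'\\(\d)'        # raw string: the regex engine receives \\(\d)
plain = '\\\\(\\d)'    # normal string: every backslash has to be doubled

same = (raw == plain)
match = re.search(raw, r'path\7')   # hypothetical sample text

print(same)                             # True
print(match.group(), match.group(1))    # \7 7
```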

# 3. What is the return value of the search() method?

In [None]:
The search() method of a Regex object in Python returns a match object if a match is found, and None if no match is found.
Here's an example demonstrating the usage of the search() method:

import re

pattern = r'Hello'  # search() is case-sensitive by default
text = 'Hello, world!'

match = re.search(pattern, text)

if match:
    print('Match found:', match.group())
else:
    print('No match found') 
    
If a match is found, the program prints 'Match found:' followed by the matched string obtained using match.group(). In this case, the output is 'Match found: Hello'.

If no match is found, the program prints 'No match found'.


# 4. From a Match item, how do you get the actual strings that match the pattern?

In [None]:
To obtain the actual strings that match the pattern from a match object in Python, you can use the group() method or its variants.

The group() method, without any arguments, returns the entire matched string. If capturing groups are present in the pattern, you can pass an argument to group() to access the specific captured group.

Here's an example:

import re

pattern = r'(\d+)-(\w+)'
text = '123-abc'

match = re.search(pattern, text)

if match:
    print('Match found:', match.group())      # Output: Match found: 123-abc
    print('Group 1:', match.group(1))          # Output: Group 1: 123
    print('Group 2:', match.group(2))          # Output: Group 2: abc
else:
    print('No match found')

    
In this example, the pattern r'(\d+)-(\w+)' consists of two capturing groups: (\d+) to match one or more digits and (\w+) to match one or more word characters.

After performing the search using re.search(), the match object is assigned to the variable match. By calling match.group(), the entire matched string '123-abc' is returned.

You can also access specific captured groups using match.group(n), where n represents the index of the capturing group. In this case, match.group(1) returns '123', and match.group(2) returns 'abc', representing the matched substrings within the respective capturing groups.
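
The "variants" of group() mentioned above can be sketched as follows. The group names number and word are assumptions added for illustration; the original pattern uses unnamed groups.

```python
import re

# Same shape as the pattern above, with hypothetical group names added.
match = re.search(r'(?P<number>\d+)-(?P<word>\w+)', '123-abc')

all_groups = match.groups()        # every captured group as a tuple
pair = match.group(1, 2)           # several groups in one call
by_name = match.group('word')      # a named group
as_dict = match.groupdict()        # named groups as a dictionary

print(all_groups)  # ('123', 'abc')
print(pair)        # ('123', 'abc')
print(by_name)     # abc
print(as_dict)     # {'number': '123', 'word': 'abc'}
```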

# 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group zero cover?
# Group 2? Group 1?

In [None]:
In the regular expression r'(\d\d\d)-(\d\d\d-\d\d\d\d)', the groups are defined by the parentheses (). Here's the breakdown of the groups:

Group 0: The entire matched string.
Group 1: The first capturing group (\d\d\d), which matches three digits.
Group 2: The second capturing group (\d\d\d-\d\d\d\d), which matches a pattern of three digits followed by a hyphen and four digits.

To illustrate this, let's consider an example:

import re

pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
text = '123-456-7890'

match = re.search(pattern, text)

if match:
    print('Match found:', match.group())      # Output: Match found: 123-456-7890
    print('Group 0:', match.group(0))         # Output: Group 0: 123-456-7890
    print('Group 1:', match.group(1))         # Output: Group 1: 123
    print('Group 2:', match.group(2))         # Output: Group 2: 456-7890
else:
    print('No match found')
    
In this example, the input string '123-456-7890' matches the pattern. Using re.search(), we obtain the match object assigned to the variable match.

match.group() returns the entire matched string '123-456-7890'.
match.group(0) is equivalent to match.group(), returning the entire matched string '123-456-7890'.
match.group(1) returns the string '123', which represents the content of the first capturing group (\d\d\d).
match.group(2) returns the string '456-7890', corresponding to the second capturing group (\d\d\d-\d\d\d\d).


# 6. In regular expression syntax, parentheses and periods have special meanings. How can you tell a regex that you want it to match real parentheses and periods?

In [None]:
In regular expression syntax, parentheses and periods have special meanings, representing grouping and any character (except newline) respectively. To match real parentheses and periods as literal characters in a regex, you can use the backslash \ to escape them.

Here's how you can use the backslash to match real parentheses and periods:

To match a literal parenthesis, use \( to escape the opening parenthesis and \) to escape the closing parenthesis. For example, to match the string "(Hello)", the regex pattern would be \(Hello\).

To match a literal period, use \. to escape the period. For example, to match the string "www.example.com", the regex pattern would be www\.example\.com.

By escaping parentheses and periods with a backslash, you can indicate to the regex that you want them to be treated as literal characters rather than having their special meanings.

Here's an example illustrating the usage of escaped parentheses:

import re

pattern = r'\(Hello\)'
text = '(Hello)'

match = re.search(pattern, text)

if match:
    print('Match found:', match.group())   # Output: Match found: (Hello)
else:
    print('No match found')

In this example, the regex pattern r'\(Hello\)' matches the string "(Hello)" exactly. The escaped parentheses \( and \) represent literal parentheses. If a match is found, the program prints 'Match found: (Hello)'.
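
The same idea applies to periods. The sketch below compares an unescaped and an escaped dot, and also shows re.escape(), which adds the backslashes for you. The sample text is an assumption made up for illustration.

```python
import re

text = 'Visit www.example.com, not wwwXexampleYcom'   # hypothetical sample

unescaped = re.findall(r'www.example.com', text)    # '.' matches any character
escaped = re.findall(r'www\.example\.com', text)    # '\.' matches only a period
auto = re.escape('(Hello).')                        # escapes the special characters

print(unescaped)  # ['www.example.com', 'wwwXexampleYcom']
print(escaped)    # ['www.example.com']
print(auto)       # \(Hello\)\.
```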

# 7. The findall() method returns a string list or a list of string tuples. What causes it to return one of the two options?

In [None]:
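The findall() method in Python's regular expression module, re, returns different results depending on the capturing groups in the pattern: with no groups it returns a list of strings, one per complete match; with exactly one group it returns a list of the strings captured by that group; with two or more groups it returns a list of tuples, one tuple per match and one string per group.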

If the regular expression pattern has no capturing groups (i.e., no parentheses), findall() returns a list of strings. Each element in the list represents a complete match found in the input string.

For example:
import re

pattern = r'\d+'
text = 'I have 42 apples and 7 oranges.'

matches = re.findall(pattern, text)

print(matches)  # Output: ['42', '7']

In this example, the pattern \d+ matches one or more digits. Since there are no capturing groups in the pattern, findall() returns a list of strings ['42', '7'], representing the complete matches found in the input string.
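
The effect of capturing groups on findall() can be sketched like this. The phone-number text and patterns are assumptions chosen for illustration.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'   # hypothetical sample

no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)        # list of whole matches
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)   # list of tuples
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)      # list of group-1 strings

print(no_groups)   # ['415-555-9999', '212-555-0000']
print(two_groups)  # [('415', '555-9999'), ('212', '555-0000')]
print(one_group)   # ['415', '212']
```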

# 8. In regular expressions, what does the | character mean?

In [None]:
In regular expressions, the | character is known as the pipe or alternation operator. It is used to specify alternatives or choices within a pattern. The | character allows you to match either the pattern to its left or the pattern to its right.

Here's an example to illustrate the usage of the | character:

import re

pattern = r'apple|banana'
text = 'I like apple and banana.'

matches = re.findall(pattern, text)

print(matches)  # Output: ['apple', 'banana']
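
Two details of alternation are worth a quick sketch: parentheses limit how far the | reaches, and alternatives are tried left to right, so the first one that succeeds wins. The sample strings here are assumptions made up for illustration.

```python
import re

# (?:...) groups without capturing, so findall() still returns whole matches.
scoped = re.findall(r'gr(?:a|e)y', 'gray or grey')

# Alternatives are tried in order: 'a' succeeds first, so 'ab' is never tried.
first_wins = re.search(r'a|ab', 'ab').group()

print(scoped)      # ['gray', 'grey']
print(first_wins)  # a
```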


# 9. In regular expressions, what does the character stand for? 

In [None]:
. (dot): Matches any character except a newline.
* (asterisk): Matches zero or more occurrences of the preceding element.
+ (plus): Matches one or more occurrences of the preceding element.
? (question mark): Matches zero or one occurrence of the preceding element.
^ (caret): Matches the start of a string.
$ (dollar sign): Matches the end of a string.
[ ] (square brackets): Defines a character class, matching any single character within the brackets.
| (pipe): Acts as the alternation operator, allowing the choice between two or more patterns.
\ (backslash): Used for escaping special characters or indicating special sequences.
These are just a few examples of the special characters used in regular expressions. Each character has its own significance and meaning, enabling you to construct powerful patterns for matching and manipulating strings.


# 10. In regular expressions, what is the difference between the + and * characters?

In [None]:
1- * (asterisk):

Matches zero or more occurrences of the preceding element.
The preceding element can occur zero times, making it optional.
Examples: ab* matches "a", "ab", "abb", "abbb", and so on.

2- + (plus):

Matches one or more occurrences of the preceding element.
The preceding element must occur at least once.
Examples: ab+ matches "ab", "abb", "abbb", and so on, but not "a" alone.

To summarize:

* matches zero or more occurrences, including the possibility of zero occurrences.
+ matches one or more occurrences, requiring at least one occurrence.

Here's an example to illustrate the difference:

import re

pattern1 = r'ab*'  # Matches "a", "ab", "abb", "abbb", ...
pattern2 = r'ab+'  # Matches "ab", "abb", "abbb", ...

text = 'ab abb a'

matches1 = re.findall(pattern1, text)
matches2 = re.findall(pattern2, text)

print(matches1)  # Output: ['ab', 'abb', 'a']
print(matches2)  # Output: ['ab', 'abb']


# 11. What is the difference between {4} and {4,5} in regular expression?

In [None]:
In regular expressions, the curly braces {} are used as quantifiers to specify how many times the preceding element may repeat. The difference between {4} and {4,5} is as follows:

{4}:

Matches exactly four occurrences of the preceding element.
Examples: a{4} matches "aaaa", (\d{4}) matches a group of four digits.
{4,5}:

Matches a range of occurrences of the preceding element, from the minimum specified to the maximum specified.
Matches at least four occurrences and up to five occurrences.
Examples: a{4,5} matches "aaaa" and "aaaaa", (\d{4,5}) matches a group of four or five digits.
To summarize:

{4} matches exactly four occurrences of the preceding element.
{4,5} matches a range of occurrences, from four to five occurrences.

Here's an example to illustrate the difference:

import re

pattern1 = r'a{4}'    # Matches exactly four 'a' characters.
pattern2 = r'a{4,5}'  # Matches four or five 'a' characters.

text = 'aaaa aaaaa aa'

matches1 = re.findall(pattern1, text)
matches2 = re.findall(pattern2, text)

print(matches1)  # Output: ['aaaa', 'aaaa']
print(matches2)  # Output: ['aaaa', 'aaaaa']

In this example, the pattern a{4} finds "aaaa" twice: once in the first word and once as the first four letters of "aaaaa", because it consumes exactly four occurrences of the letter "a" and then stops. The pattern a{4,5} is greedy within its range, so it finds "aaaa" and then all of "aaaaa". Neither pattern matches "aa".

# 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?

In [None]:
\d:

Represents any digit character (0-9).
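For ASCII text it is equivalent to the character class [0-9]; with Python 3 str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set.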
Example: \d matches any single digit.

\w:

Represents any word character (alphanumeric character or underscore).
It includes uppercase and lowercase letters, digits, and underscore (_).
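For ASCII text it is equivalent to the character class [a-zA-Z0-9_]; with Python 3 str patterns it also matches Unicode letters and digits unless the re.ASCII flag is set.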
Example: \w matches any single alphanumeric character or underscore.

\s:

Represents any whitespace character, including spaces, tabs, and newline characters.
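For ASCII text it is equivalent to the character class [\t\n\r\f\v ]; with Python 3 str patterns it also matches other Unicode whitespace unless the re.ASCII flag is set.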
Example: \s matches any single whitespace character.
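
Here's a short sketch of the three classes applied to one string. The sample text is an assumption made up for illustration.

```python
import re

sample = 'Room 42,\tlevel_3\n'   # hypothetical sample with a tab and a newline

digits = re.findall(r'\d', sample)    # single digits
words = re.findall(r'\w+', sample)    # runs of letters, digits, and underscores
spaces = re.findall(r'\s', sample)    # single whitespace characters

print(digits)  # ['4', '2', '3']
print(words)   # ['Room', '42', 'level_3']
print(spaces)  # [' ', '\t', '\n']
```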

# 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?

In [None]:
\D:

Represents any non-digit character.
For ASCII text it is equivalent to the character class [^0-9].
Example: \D matches any single non-digit character.

\W:

Represents any non-word character.
It matches any character that is not an alphanumeric character or underscore (_).
For ASCII text it is equivalent to the character class [^a-zA-Z0-9_].
Example: \W matches any single non-word character.

\S:

Represents any non-whitespace character.
It matches any character that is not a whitespace character (spaces, tabs, newlines, etc.).
For ASCII text it is equivalent to the character class [^\t\n\r\f\v ].
Example: \S matches any single non-whitespace character.
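
The negated classes can be sketched the same way. The sample text is an assumption made up for illustration.

```python
import re

sample = 'Room 42, level_3'   # hypothetical sample

non_digits = re.findall(r'\D+', sample)   # runs of anything except digits
non_words = re.findall(r'\W+', sample)    # runs of punctuation and spaces
non_spaces = re.findall(r'\S+', sample)   # runs of anything except whitespace

print(non_digits)  # ['Room ', ', level_']
print(non_words)   # [' ', ', ']
print(non_spaces)  # ['Room', '42,', 'level_3']
```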

# 14. What is the difference between .*? and .*

In [None]:
.*? (Lazy or Non-greedy):

The ? after .* makes the .* pattern match lazily or non-greedily.
It matches the shortest possible substring that satisfies the pattern.
It stops as soon as the subsequent part of the pattern can be matched.
Example: a.*?b matches "aab" in the string "aabab": the match starts at the first "a" and ends at the first "b" that follows it.

.* (Greedy):

Without the ?, the .* pattern matches greedily.
It matches the longest possible substring that satisfies the pattern.
It consumes as much text as it can, then backtracks only as far as needed for the rest of the pattern to match.
Example: a.*b matches "aabab" in the string "aabab".
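
The difference is easiest to see side by side. In this sketch the tag-like text is an assumption made up for illustration; the "aabab" string is the one used in the examples above.

```python
import re

text = '<b>bold</b> and <i>italic</i>'   # hypothetical sample

greedy = re.findall(r'<.*>', text)    # one match from the first '<' to the last '>'
lazy = re.findall(r'<.*?>', text)     # each match stops at the nearest '>'

lazy_ab = re.search(r'a.*?b', 'aabab').group()
greedy_ab = re.search(r'a.*b', 'aabab').group()

print(greedy)     # ['<b>bold</b> and <i>italic</i>']
print(lazy)       # ['<b>', '</b>', '<i>', '</i>']
print(lazy_ab)    # aab
print(greedy_ab)  # aabab
```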

# 15. What is the syntax for matching both numbers and lowercase letters with a character class?

In [None]:
To match both numbers and lowercase letters using a character class in regular expressions, you can use the range of characters within the character class. The syntax for matching both numbers (0-9) and lowercase letters (a-z) in a character class is [0-9a-z].
Here is an example:

import re

text = 'abc123xyz'

pattern = r'[0-9a-z]'  # Matches any single digit or lowercase letter

matches = re.findall(pattern, text)

print(matches)  # Output: ['a', 'b', 'c', '1', '2', '3', 'x', 'y', 'z']

In this example, the pattern [0-9a-z] matches any single character that is a digit (0-9) or a lowercase letter (a-z) in the string text. The re.findall() function is used to find all matches of the pattern in the text. The output will be ['a', 'b', 'c', '1', '2', '3', 'x', 'y', 'z'], which represents the matching characters in the text.

# 16. What is the procedure for making a regular expression case-insensitive?

In [None]:
To make a regular expression case-insensitive in Python, you can use the re.IGNORECASE flag or the re.I flag as an argument to the regex functions. Here's the procedure:

Import the re module:
import re

Define your regular expression pattern as a string.

Use the re.IGNORECASE flag as an argument to the regex functions.

Alternatively, you can use the re.I flag, which is an abbreviation for re.IGNORECASE.
Here's an example that demonstrates how to make a regular expression case-insensitive:

import re

text = 'Hello, World!'

pattern = r'hello'  # Case-sensitive pattern

matches = re.findall(pattern, text)
print(matches)  # Output: []

pattern = r'hello'  # Same pattern; the flag below makes the match case-insensitive

matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)  # Output: ['Hello']
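
The flag can also be passed to re.compile(), or written inside the pattern as the inline flag (?i). Here's a minimal sketch; the variable names and sample strings are assumptions made up for illustration.

```python
import re

# Flag supplied once at compile time, then reused for every call.
hello_regex = re.compile(r'hello', re.I)
compiled_hits = hello_regex.findall('Hello, HELLO, hello')

# Inline flag at the start of the pattern, no extra argument needed.
inline_hits = re.findall(r'(?i)hello', 'Hello, World!')

print(compiled_hits)  # ['Hello', 'HELLO', 'hello']
print(inline_hits)    # ['Hello']
```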


# 17. What does the . character normally match? What does it match if re.DOTALL is passed as 2nd argument in re.compile()?

In [None]:
In regular expressions, the . (dot) character normally matches any character except a newline character (\n). It matches a single character from the input text.

However, if the re.DOTALL flag is passed as the second argument in the re.compile() function or used as a flag in other regex functions, it changes the behavior of the . character. With the re.DOTALL flag, the . character matches any character, including newline characters.
Example:
    
import re

text = '''Hello
World'''

pattern = r'Hello.World'  # '.' must match the character between 'Hello' and 'World'
matches = re.findall(pattern, text)
print(matches)  # Output: []

regex = re.compile(pattern, re.DOTALL)  # Same pattern, but '.' may now match '\n'
matches = regex.findall(text)
print(matches)  # Output: ['Hello\nWorld']

In the first call, without the re.DOTALL flag, the pattern Hello.World finds nothing, because the only character between 'Hello' and 'World' is a newline and the . does not match it.

In the second call, the pattern is compiled with re.DOTALL, which makes the . match any character, including newline characters. As a result, the pattern matches the entire string 'Hello\nWorld'.

# 18. If numReg = re.compile(r'\d+'), what will numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') return?

In [None]:
If numReg is defined as numReg = re.compile(r'\d+'), and we use numReg.sub('X', '11 drummers, 10 pipers, five rings, 4 hen'), the sub() method will replace all occurrences of one or more digits with the string 'X'. Here's the result:

import re

numReg = re.compile(r'\d+')
text = '11 drummers, 10 pipers, five rings, 4 hen'
result = numReg.sub('X', text)
print(result)  # Output: X drummers, X pipers, five rings, X hen

In this case, the sub() method replaces all sequences of digits with the string 'X'. The resulting string is 'X drummers, X pipers, five rings, X hen'. All numeric values in the original string are replaced with the letter 'X'.
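
sub() also accepts a count argument and a function as the replacement. The sketch below reuses the text from the question; the variable names and the doubling function are assumptions made up for illustration.

```python
import re

num_regex = re.compile(r'\d+')
text = '11 drummers, 10 pipers, five rings, 4 hen'

# count=1 replaces only the first match.
first_only = num_regex.sub('X', text, count=1)

# A function receives each match object and returns the replacement string.
doubled = num_regex.sub(lambda m: str(int(m.group()) * 2), text)

print(first_only)  # X drummers, 10 pipers, five rings, 4 hen
print(doubled)     # 22 drummers, 20 pipers, five rings, 8 hen
```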

# 19. What does passing re.VERBOSE as the 2nd argument to re.compile() allow you to do?

In [None]:
Passing re.VERBOSE as the second argument to re.compile() allows you to include whitespace and comments in your regular expression pattern for improved readability. It enables the use of multi-line regular expressions with inline comments.

Here's an example to illustrate the usage of re.VERBOSE:

import re

pattern = r'''
    \d{3}  # Match three digits
    -      # Match a dash
    \d{4}  # Match four digits
'''

text = 'Phone numbers: 123-4567, 987-6543'

regex = re.compile(pattern, re.VERBOSE)
matches = regex.findall(text)
print(matches)  # Output: ['123-4567', '987-6543']

In this example, the regular expression pattern for matching phone numbers is defined with re.VERBOSE as the second argument to re.compile(). The pattern includes whitespace and inline comments to make it more readable and understandable.

The pattern is defined as a multi-line string, allowing you to structure it across multiple lines. Inline comments using the # character can be added to provide explanations for different parts of the pattern.
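
One consequence of re.VERBOSE is that ordinary spaces in the pattern are ignored, so a literal space has to be written inside a character class or escaped with a backslash. Here's a small sketch; the pattern and sample text are assumptions made up for illustration.

```python
import re

spaced_regex = re.compile(r'''
    \d{3}   # three digits
    [ ]     # a literal space, kept because it sits inside a character class
    \d{4}   # four digits
''', re.VERBOSE)

hits = spaced_regex.findall('call 555 1234 or 5551234')

print(hits)  # ['555 1234']
```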

# 20. How would you write a regex that match a number with comma for every three digits? It must match the given following:
# '42'
# '1,234'
# '6,368,745'
# but not the following:
# '12,34,567' (which has only two digits between the commas)
# '1234' (which lacks commas)

In [None]:
To write a regex that matches a number with a comma for every three digits, but not when there are only two digits between the commas or no commas at all, you can use the following pattern:

import re

pattern = r'^\d{1,3}(,\d{3})*(?<!,\d{2})$'

Let's break down the pattern:

^ asserts the start of the string.
\d{1,3} matches one to three digits at the beginning.
(,\d{3})* matches zero or more occurrences of a comma followed by exactly three digits.
(?<!,\d{2}) is a negative lookbehind assertion that rejects a string ending in a comma followed by exactly two digits. It is redundant here, because (,\d{3})* already requires exactly three digits after every comma, but it does no harm.
$ asserts the end of the string.

Example:
import re

pattern = r'^\d{1,3}(,\d{3})*(?<!,\d{2})$'
strings = ['42', '1,234', '6,368,745', '12,34,567', '1234']

for string in strings:
    if re.match(pattern, string):
        print(f"'{string}' is a match.")
    else:
        print(f"'{string}' is not a match.")
        
Output:
'42' is a match.
'1,234' is a match.
'6,368,745' is a match.
'12,34,567' is not a match.
'1234' is not a match.
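
As an alternative sketch, and not the pattern used above, the same test can be written without the lookbehind and without anchors by using re.fullmatch(), which requires the whole string to match. One difference is that $ also matches just before a trailing newline, while fullmatch() does not allow one. The names number_regex, samples, and results are assumptions made up for illustration.

```python
import re

# Simplified pattern: one to three digits, then zero or more ',ddd' groups.
number_regex = re.compile(r'\d{1,3}(,\d{3})*')

samples = ['42', '1,234', '6,368,745', '12,34,567', '1234']
results = {s: bool(number_regex.fullmatch(s)) for s in samples}

for s, ok in results.items():
    print(f"'{s}' is {'a' if ok else 'not a'} match.")
```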


 

# 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:
# 'Haruto Watanabe'
# 'Alice Watanabe'
# 'RoboCop Watanabe'
# but not the following:
# 'haruto Watanabe' (where the first name is not capitalized)
# 'Mr. Watanabe' (where the preceding word has a nonletter character)
# 'Watanabe' (which has no first name)
# 'Haruto watanabe' (where Watanabe is not capitalized)

In [None]:
To write a regex that matches the full name of someone whose last name is "Watanabe" and the first name is one word that begins with a capital letter, you can use the following pattern:

import re

pattern = r'^[A-Z][a-zA-Z]*\sWatanabe$'

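Let's break down the pattern:

^ asserts the start of the string.
[A-Z] matches one uppercase letter, the first letter of the first name.
[a-zA-Z]* matches zero or more further letters, so the first name is a single word with no digits or punctuation.
\s matches a whitespace character.
Watanabe matches the last name exactly, with a capital W.
$ asserts the end of the string.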

Example:
import re

pattern = r'^[A-Z][a-zA-Z]*\sWatanabe$'
strings = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe', 'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

for string in strings:
    if re.match(pattern, string):
        print(f"'{string}' is a match.")
    else:
        print(f"'{string}' is not a match.")

Output:
'Haruto Watanabe' is a match.
'Alice Watanabe' is a match.
'RoboCop Watanabe' is a match.
'haruto Watanabe' is not a match.
'Mr. Watanabe' is not a match.
'Watanabe' is not a match.
'Haruto watanabe' is not a match.


# 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:
# 'Alice eats apples.'
# 'Bob pets cats.'
# 'Carol throws baseballs.'
# 'Alice throws Apples.'
# 'BOB EATS CATS.'
# but not the following:
# 'RoboCop eats apples.'
# 'ALICE THROWS FOOTBALLS.'
# 'Carol eats 7 cats.'

In [None]:
To write a regex that matches a sentence with specific word patterns, you can use the following pattern:
import re

pattern = r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$'

Let's break down the pattern:

^ asserts the start of the string.
(Alice|Bob|Carol) matches either "Alice", "Bob", or "Carol" as the first word.
\s matches a whitespace character.
(eats|pets|throws) matches either "eats", "pets", or "throws" as the second word.
\s matches a whitespace character.
(apples|cats|baseballs) matches either "apples", "cats", or "baseballs" as the third word.
\. matches a period at the end of the sentence.
$ asserts the end of the string.
Here's an example of using this regex pattern to match the given strings:

import re

pattern = r'^(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.$'
strings = ['Alice eats apples.', 'Bob pets cats.', 'Carol throws baseballs.', 'Alice throws Apples.', 'BOB EATS CATS.', 'RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.', 'Carol eats 7 cats.']

for string in strings:
    if re.match(pattern, string, re.IGNORECASE):
        print(f"'{string}' is a match.")
    else:
        print(f"'{string}' is not a match.")

Output:
'Alice eats apples.' is a match.
'Bob pets cats.' is a match.
'Carol throws baseballs.' is a match.
'Alice throws Apples.' is a match.
'BOB EATS CATS.' is a match.
'RoboCop eats apples.' is not a match.
'ALICE THROWS FOOTBALLS.' is not a match.
'Carol eats 7 cats.' is not a match.
