# 1.

In Python, regular expression (regex) objects are created by the re.compile() function, which is part of the standard-library re module.

re.compile() compiles a regular expression pattern into a regex object, which can then be used for operations such as matching, searching, and replacing text based on that pattern.

In [1]:
import re

pattern = re.compile(r'\b\w+\b')  # Compile a regex pattern to match words


# 2.

Raw strings are commonly used in regular expression (regex) objects in Python because they allow you to specify regular expression patterns without the need to escape backslashes (\).

In Python, backslashes are used as escape characters in string literals. However, in regular expressions, backslashes are also used as escape characters to represent special characters such as \d (digit), \w (word character), \s (whitespace), etc.

When you use a raw string (prefixing the string literal with r), Python treats backslashes as literal characters and does not interpret them as escape characters. This is particularly useful in regular expressions, where backslashes are frequently used to represent special characters.

In [2]:
import re

pattern = re.compile(r'\b\w+\b')  # Using a raw string to specify the regex pattern
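
To make the difference concrete, here is a minimal sketch comparing a normal string literal with a raw one. The sample text and variable names are made up for illustration.

```python
import re

# In a normal string literal, '\b' is the backspace character (one character).
normal = '\bcat\b'
# In a raw string literal, r'\b' stays as a backslash followed by 'b',
# which the regex engine reads as a word boundary.
raw = r'\bcat\b'

print(len(normal))  # 5: backspace, c, a, t, backspace
print(len(raw))     # 7: \, b, c, a, t, \, b

text = 'the cat sat'
print(re.search(normal, text))       # None: text contains no backspace characters
print(re.search(raw, text).group())  # cat
```

Without the r prefix, the same pattern would have to be written as '\\bcat\\b'.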


# 3.

The search() method in Python's re module returns a match object if the pattern is found in the string being searched. If the pattern is not found, it returns None.

re.search(pattern, string) takes two required arguments:

pattern: The regular expression pattern to search for.

string: The string in which to search for the pattern.

In [4]:
import re

pattern = r'fox'
string = 'The quick brown fox jumps over the lazy dog.'

match_object = re.search(pattern, string)

if match_object:
    print('Pattern found:', match_object.group())  
else:
    print('Pattern not found')


Pattern found: fox


# 4.

In [5]:
import re

pattern = r'fox'
string = 'The quick brown fox jumps over the lazy dog.'

match_object = re.search(pattern, string)

if match_object:
    print('Pattern found:', match_object.group())  
else:
    print('Pattern not found')


Pattern found: fox


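Calling group() with no argument (or group(0)) on a match object returns the full text that matched, as in the cell above. If you have capturing groups in your regular expression pattern, you can also use group(n) to access the substring matched by a specific capturing group, where n is the index of that group, counting from 1.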

In [6]:
import re

pattern = r'(\w+)\s(\w+)'
string = 'John Doe'

match_object = re.search(pattern, string)

if match_object:
    print('First name:', match_object.group(1))  
    print('Last name:', match_object.group(2))   
else:
    print('Pattern not found')


First name: John
Last name: Doe


# 5.

In the regular expression r'(\d\d\d)-(\d\d\d-\d\d\d\d)', which contains two capturing groups denoted by parentheses, the groups are numbered based on the order of their opening parentheses from left to right.

In [7]:
import re

pattern = r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
string = '123-456-7890'

match_object = re.search(pattern, string)

if match_object:
    print('Group 0 (entire match):', match_object.group(0))  
    print('Group 1:', match_object.group(1))                
    print('Group 2:', match_object.group(2))                
    
else:
    print('Pattern not found')


Group 0 (entire match): 123-456-7890
Group 1: 123
Group 2: 456-7890


# 6.

In regular expressions, parentheses and periods have special meanings. If you want to match literal parentheses and periods in a regex pattern, you need to escape them using a backslash (\). This tells the regex engine to treat them as literal characters rather than as part of the regex syntax.

To match a literal parenthesis, use \( for an opening parenthesis and \) for a closing parenthesis.

To match a literal period, use \. (a backslash followed by a period).

In [8]:
import re

pattern = r'\(123\)\.456'
string = '(123).456'

match_object = re.search(pattern, string)

if match_object:
    print('Match found:', match_object.group())  
else:
    print('Match not found')


Match found: (123).456


# 7.

The findall() method in Python's re module returns a list of strings when the pattern contains no capturing groups or exactly one capturing group. It returns a list of tuples of strings when the pattern contains two or more capturing groups.

No Capturing Groups:

If the regular expression pattern passed to findall() does not contain any capturing groups (i.e., no parentheses), the method returns a list of strings.
Each string in the list represents a non-overlapping match of the pattern within the searched string.

In [11]:
import re

pattern = r'\d+'  # Matches one or more digits
string = '123 456 789'

matches = re.findall(pattern, string)
print(matches)  


['123', '456', '789']


With Capturing Groups:

If the regular expression pattern contains exactly one capturing group, findall() returns a list of strings holding only the text matched by that group. If it contains two or more capturing groups (i.e., two or more pairs of parentheses), findall() returns a list of tuples of strings.
Each tuple in the list represents one match of the entire pattern, and the strings within the tuple correspond to the substrings matched by the capturing groups.

In [13]:
import re

pattern = r'(\d+)-(\d+)'  # Matches two sequences of digits separated by a hyphen
string = '123-456 789-012'

matches = re.findall(pattern, string)
print(matches)  


[('123', '456'), ('789', '012')]
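
The single-group case is easy to overlook, so here is a small sketch that runs three variants of the pattern on the same input. The variable names are illustrative only.

```python
import re

text = '123-456 789-012'

no_groups = re.findall(r'\d+-\d+', text)       # no capturing groups
one_group = re.findall(r'(\d+)-\d+', text)     # exactly one capturing group
two_groups = re.findall(r'(\d+)-(\d+)', text)  # two capturing groups

print(no_groups)   # ['123-456', '789-012']
print(one_group)   # ['123', '789']
print(two_groups)  # [('123', '456'), ('789', '012')]
```

Only the two-group pattern produces tuples; with one group, findall() returns the text of that group as plain strings.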


# 8.

In regular expressions, the | character, known as the "pipe" or "alternation" operator, is used to specify alternatives within a pattern. It allows you to match one of several possible patterns at a particular location in the input string.
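
As a brief illustration, the sketch below uses alternation both at the top level of a pattern and inside a group. The patterns and sample strings are made up for this example.

```python
import re

animal = re.compile(r'cat|dog')
sentence = 'I have a dog and a cat'

first = animal.search(sentence).group()  # search() returns the leftmost match
every = animal.findall(sentence)

print(first)  # dog
print(every)  # ['dog', 'cat']

# Parentheses limit the alternation to part of the pattern.
grey = re.compile(r'gr(a|e)y')
match = grey.search('a grey sky')
print(match.group())   # grey
print(match.group(1))  # e
```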

# 9.

Regular expressions use various special characters, each with its own meaning and functionality.

. (dot): Matches any single character except newline characters.
^ (caret): Matches the start of the string.
$ (dollar): Matches the end of the string.
* (asterisk): Matches zero or more occurrences of the preceding character or group.
+ (plus): Matches one or more occurrences of the preceding character or group.
? (question mark): Matches zero or one occurrence of the preceding character or group (makes it optional). Placed after another quantifier, as in *? or +?, it makes that quantifier non-greedy.
| (pipe): Alternation operator, matches either the pattern on its left or the pattern on its right.
[ ] (square brackets): Character class, matches any one of the characters within the brackets.
{ } (curly braces): Quantifiers, specify the number of occurrences of the preceding character or group.
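
A few of these characters can be tried out in a short sketch like the one below. The sample strings are invented for illustration.

```python
import re

# . matches any character except a newline, so 'c\nt' is skipped.
dot = re.findall(r'c.t', 'cat cot c\nt')
# ^ and $ anchor the match to the start and end of the string.
starts = re.search(r'^Hello', 'Hello world') is not None
ends = re.search(r'world$', 'Hello world') is not None
# ? makes the preceding 'u' optional.
optional = re.findall(r'colou?r', 'color colour')
# [ ] matches any one character from the set.
vowels = re.findall(r'[aeiou]', 'regex')
# { } repeats the preceding item a fixed number of times.
pairs = re.findall(r'\d{2}', '7 42 2024')

print(dot)           # ['cat', 'cot']
print(starts, ends)  # True True
print(optional)      # ['color', 'colour']
print(vowels)        # ['e', 'e']
print(pairs)         # ['42', '20', '24']
```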

# 10.

+ (plus):

Matches one or more occurrences of the preceding character or group.
The preceding character or group must appear at least once for the match to be successful.
Example: a+ matches one or more occurrences of the character 'a'. It would match 'a', 'aa', 'aaa', and so on, but not an empty string.

* (asterisk):

Matches zero or more occurrences of the preceding character or group.
The preceding character or group may appear zero or more times for the match to be successful.
Example: a* matches zero or more occurrences of the character 'a'. It would match an empty string, 'a', 'aa', 'aaa', and so on.
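
The difference shows up most clearly on input where the repeated character is missing. The following is a minimal sketch with made-up sample strings.

```python
import re

# An empty string satisfies a* but not a+.
plus_empty = re.fullmatch(r'a+', '')
star_empty = re.fullmatch(r'a*', '')
print(plus_empty is None)      # True
print(star_empty is not None)  # True

# 'b' alone matches ba* (zero a's) but not ba+ (at least one a).
plus_hits = re.findall(r'ba+', 'b ba baa')
star_hits = re.findall(r'ba*', 'b ba baa')
print(plus_hits)  # ['ba', 'baa']
print(star_hits)  # ['b', 'ba', 'baa']
```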

# 11.

In regular expressions, {4} and {4,5} are both quantifiers that control how many times the preceding character or group may repeat. {4} requires an exact count, while {4,5} allows a range:

{4}: Matches exactly 4 occurrences of the preceding character or group.

{4,5}: Matches between 4 and 5 occurrences (inclusive) of the preceding character or group.
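
A short sketch comparing the two quantifiers on the same digits (sample values chosen for illustration):

```python
import re

digits = '123456'

exact = re.search(r'\d{4}', digits).group()
ranged = re.search(r'\d{4,5}', digits).group()  # greedy: takes 5 if available

print(exact)   # 1234
print(ranged)  # 12345

# fullmatch() requires the whole string to fit the pattern.
five_exact = re.fullmatch(r'\d{4}', '12345')
five_ranged = re.fullmatch(r'\d{4,5}', '12345')
print(five_exact)           # None
print(five_ranged.group())  # 12345
```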

# 12.

In regular expressions, shorthand character classes such as \d, \w, and \s are used to represent certain types of characters. They are shortcuts that match specific character sets, making it easier to specify patterns in regular expressions.


Here's what each of these shorthand character classes signifies:

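\d: Matches any digit character. For ASCII text this is equivalent to the character class [0-9]; with str patterns, Python also matches other Unicode decimal digits unless the re.ASCII flag is set.

\w: Matches any word character (i.e., a letter, a digit, or the underscore). With the re.ASCII flag this is equivalent to the character class [a-zA-Z0-9_]; by default, str patterns also match Unicode letters and digits.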

\s: Matches any whitespace character (i.e., space, tab, newline, etc.).
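
The three classes can be compared on one sample string, as in this small sketch (the text is made up for illustration):

```python
import re

text = 'Room 42,\tfloor_3\n'

digits = re.findall(r'\d', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)

print(digits)  # ['4', '2', '3']
print(words)   # ['Room', '42', 'floor_3']
print(spaces)  # [' ', '\t', '\n']
```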

# 13.

In regular expressions, the \D, \W, and \S shorthand character classes are the negated versions of the \d, \w, and \s character classes, respectively. Each matches any character that is not in the corresponding lowercase class.

Here's what each of these shorthand character classes signifies:

\D: Matches any character that is not a digit. It is the negation of the \d shorthand.
    
\W: Matches any character that is not a word character (i.e., not alphanumeric or underscore). It is the negation of the \w shorthand.
    
\S: Matches any character that is not a whitespace character. It is the negation of the \s shorthand.
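
A minimal sketch showing what each negated class picks out of a short, made-up string:

```python
import re

text = 'a1 b2!'

non_digits = re.findall(r'\D', text)
non_word = re.findall(r'\W', text)
non_space = re.findall(r'\S', text)

print(non_digits)  # ['a', ' ', 'b', '!']
print(non_word)    # [' ', '!']
print(non_space)   # ['a', '1', 'b', '2', '!']
```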

# 14.

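In regular expressions, .* and .*? match the same kinds of characters but differ in greediness. .* is greedy: it matches as many characters as possible. .*? combines .* with a trailing ? to create a non-greedy match, which consumes as few characters as possible. In both cases the dot excludes newline characters unless re.DOTALL is set.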

Here's a breakdown of each part:

.*: Matches zero or more occurrences of any character (except newline characters).
    
?: Modifies the behavior of .* to make it non-greedy, meaning it matches as few characters as possible while still allowing the overall pattern to match.
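
The effect is easiest to see by running the greedy .* and the non-greedy .*? against the same input. The HTML-like sample string below is made up for illustration.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

greedy = re.search(r'<.*>', text).group()   # runs to the last '>'
lazy = re.search(r'<.*?>', text).group()    # stops at the first '>'

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```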

# 15.

To match both digits and lowercase letters with a character class, use [0-9a-z]. It combines the ranges 0-9 and a-z, so it matches any single character that is either a digit or a lowercase letter. Uppercase letters, such as the 'H' and 'W' in the example below, are not matched.

In [14]:
import re

pattern = r'[0-9a-z]'
string = 'Hello123World'

matches = re.findall(pattern, string)
print(matches)  


['e', 'l', 'l', 'o', '1', '2', '3', 'o', 'r', 'l', 'd']


# 16.

To make a regular expression case-insensitive:

Import the re module.

Compile the regular expression pattern using re.compile() and pass re.IGNORECASE (or its short form, re.I) as the second argument.

Use the resulting compiled pattern object, or pass the re.IGNORECASE flag directly to a module-level function such as re.search().

In [15]:
import re

pattern = re.compile(r'hello', re.IGNORECASE)  # Compile the pattern with the IGNORECASE flag
string = 'Hello World'

match = pattern.search(string)  # Search for the pattern in the string (case insensitive)

if match:
    print('Pattern found:', match.group())  
else:
    print('Pattern not found')


Pattern found: Hello


# 17.

The . (dot) character normally matches any single character except newline characters (\n).

If re.DOTALL (or re.S) is passed as the second argument in re.compile(), the . (dot) character matches any character including newline characters (\n).

In [18]:
# Normal behavior (without re.DOTALL):

import re

pattern = re.compile(r'.')
string = 'abc\ndef'

matches = pattern.findall(string)
print(matches)  


['a', 'b', 'c', 'd', 'e', 'f']


In [19]:
# With re.DOTALL (or re.S):

import re

pattern = re.compile(r'.', re.DOTALL)
string = 'abc\ndef'

matches = pattern.findall(string)
print(matches) 


['a', 'b', 'c', '\n', 'd', 'e', 'f']


# 18.

If numRegex = re.compile(r'\d+'), the statement numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') replaces every sequence of digits in the input string with the letter 'X'.

re.compile(r'\d+'): Compiles the regular expression pattern \d+, which matches one or more digits.
sub('X', '11 drummers, 10 pipers, five rings, 4 hen'): Replaces all matches of the pattern \d+ in the input string with the replacement string 'X'.
In the input string '11 drummers, 10 pipers, five rings, 4 hen', the following replacements occur:

'11' is replaced with 'X'

'10' is replaced with 'X'

'4' is replaced with 'X'

The word 'five' contains no digits, so it is left unchanged. Therefore, the result of numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen') is:

'X drummers, X pipers, five rings, X hen'
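
The statement can be run directly to check the result. This sketch assumes numRegex was created with re.compile(r'\d+'), as described above.

```python
import re

numRegex = re.compile(r'\d+')  # one or more digits
result = numRegex.sub('X', '11 drummers, 10 pipers, five rings, 4 hen')

print(result)  # X drummers, X pipers, five rings, X hen
```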

# 19.

Passing re.VERBOSE as the second argument to re.compile() in Python allows you to write more readable and organized regular expressions by ignoring whitespace and adding comments within the pattern string. This flag enables "verbose mode" for the regular expression, which provides several benefits:

Ignore Whitespace: Whitespace (spaces, tabs, and newline characters) in the pattern string is ignored, except when it is escaped or within a character class. This allows you to format the regular expression pattern more clearly, making it easier to read and maintain.

Add Comments: You can add comments to the pattern string using the # character. Comments can be used to explain parts of the regular expression pattern, making it easier for others (and yourself) to understand its purpose and logic.

Multiline Patterns: You can spread the regular expression pattern across multiple lines, making it easier to organize and visualize complex patterns.
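
As an illustration of these three points, here is a minimal sketch of a verbose pattern. The phone-number format and the name phone are assumptions made for this example.

```python
import re

phone = re.compile(r'''
    (\d{3})   # area code
    -         # separator
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
''', re.VERBOSE)

match = phone.search('Call 415-555-1234 today')
print(match.group())   # 415-555-1234
print(match.groups())  # ('415', '555', '1234')
```

Without re.VERBOSE, the same pattern would be written on one line as r'(\d{3})-(\d{3})-(\d{4})'.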

# 20.

In [20]:
import re

pattern = re.compile(r'^\d{1,3}(,\d{3})*$')

# Test cases
strings = ['42', '1,234', '6,368,745', '12,34,567', '1234']

for string in strings:
    if pattern.match(string):
        print(f"Match: {string}")
    else:
        print(f"No match: {string}")


Match: 42
Match: 1,234
Match: 6,368,745
No match: 12,34,567
No match: 1234


# 21.

In [21]:
import re

pattern = re.compile(r'^[A-Z][a-zA-Z]*\sWatanabe$')

# Test cases
names = ['Haruto Watanabe', 'Alice Watanabe', 'RoboCop Watanabe',
         'haruto Watanabe', 'Mr. Watanabe', 'Watanabe', 'Haruto watanabe']

for name in names:
    if pattern.match(name):
        print(f"Match: {name}")
    else:
        print(f"No match: {name}")


Match: Haruto Watanabe
Match: Alice Watanabe
Match: RoboCop Watanabe
No match: haruto Watanabe
No match: Mr. Watanabe
No match: Watanabe
No match: Haruto watanabe


# 22.

In [22]:
import re

pattern = re.compile(r'^(Alice|Bob|Carol)\s+(eats|pets|throws)\s+(apples|cats|baseballs)\.$', re.IGNORECASE)

# Test cases
sentences = [
    'Alice eats apples.',
    'Bob pets cats.',
    'Carol throws baseballs.',
    'Alice throws Apples.',
    'BOB EATS CATS.',
    'RoboCop eats apples.',
    'ALICE THROWS FOOTBALLS.',
    'Carol eats 7 cats.'
]

for sentence in sentences:
    if pattern.match(sentence):
        print(f"Match: {sentence}")
    else:
        print(f"No match: {sentence}")


Match: Alice eats apples.
Match: Bob pets cats.
Match: Carol throws baseballs.
Match: Alice throws Apples.
Match: BOB EATS CATS.
No match: RoboCop eats apples.
No match: ALICE THROWS FOOTBALLS.
No match: Carol eats 7 cats.
