# 1.

In [1]:
# In visual terms, a greedy match grabs as much text as possible, like a monster eating everything in its path. A non-greedy
# (lazy) match grabs as little as necessary, like a bird carefully picking up only one seed.

# To transform a greedy pattern into a non-greedy one, add a ? immediately after the quantifier: *, +, ?, and {m,n} become
# *?, +?, ??, and {m,n}?. The trailing ? makes the quantifier match as few repetitions as possible.
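
# A minimal sketch of the difference, using a made-up HTML-like string and digit string (both are illustrative assumptions,
# not data from a real program). It compares each greedy quantifier with its non-greedy counterpart.

```python
import re

# Hypothetical sample text for illustration
text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string
greedy = re.search(r'<.*>', text).group()
# Non-greedy: .*? stops at the first '>' it can
lazy = re.search(r'<.*?>', text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>

# The same ? suffix works on a counted quantifier
greedy_digits = re.search(r'\d{2,4}', '123456').group()
lazy_digits = re.search(r'\d{2,4}?', '123456').group()

print(greedy_digits)  # 1234
print(lazy_digits)    # 12
```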

# 2.

In [2]:
# The distinction between greedy and non-greedy matching matters when the part of the pattern that follows the quantifier can
# match at more than one place in the input, for example when a closing delimiter occurs several times in the text. When only
# one match is possible, both forms return the same result.

# If you need a non-greedy match but only a greedy quantifier is available, the pattern typically captures more text than 
# intended: it runs from the first opening delimiter to the last closing one and merges several items into a single match. 
# A common workaround is to replace the dot with a negated character class, such as [^"]*, so that the match cannot run past
# the closing delimiter. It's essential to design your regular expressions so they align with your intended matching logic.
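
# A short sketch of the over-capturing problem, assuming a made-up line of key="value" pairs. It shows the greedy result,
# the non-greedy result, and a greedy pattern that behaves correctly because a negated character class limits it.

```python
import re

# Hypothetical input line for illustration
line = 'name="alice" role="admin"'

# Greedy: the capture runs from the first quote to the last quote
greedy_values = re.findall(r'"(.*)"', line)
# Non-greedy: each capture stops at the nearest closing quote
lazy_values = re.findall(r'"(.*?)"', line)
# Greedy but bounded: [^"]* cannot cross a quote character
negated_values = re.findall(r'"([^"]*)"', line)

print(greedy_values)   # ['alice" role="admin']
print(lazy_values)     # ['alice', 'admin']
print(negated_values)  # ['alice', 'admin']
```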

# 3.

In [3]:
# In a simple match where you only need to know whether (and where) the pattern matches, and you perform no replacements, a
# non-capturing (nontagged) group makes no difference to the overall match: group 0 is the same either way.

# However, there are a few nuances to consider:

# a) Group Capturing: Regular expression engines generally capture the matched substrings within capturing groups. If you use a 
#     capturing group (pattern), the matched substring will be stored in memory and can be accessed later in your code if needed.
#     On the other hand, a non-capturing group (?:pattern) does not capture the matched substring, which can be advantageous if 
#     you don't need to retrieve or use that specific match.

# b) Performance Considerations: In some cases, using non-capturing groups can lead to slightly better performance compared to
#     capturing groups because the engine doesn't need to store the matched substring in memory.

# c) Clarity and Readability: Non-capturing groups can also improve the readability and clarity of your regular expressions, 
#     especially when dealing with complex patterns or when you want to explicitly indicate that a particular group is for 
#     grouping purposes only and not for capturing.
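
# The points above can be sketched as follows, assuming a made-up order identifier. The overall match is identical, but the
# non-capturing version stores one group fewer, so the remaining groups are numbered differently.

```python
import re

# Hypothetical identifier for illustration
text = 'order-2024-0042'

capturing = re.search(r'(order|invoice)-(\d{4})-(\d+)', text)
non_capturing = re.search(r'(?:order|invoice)-(\d{4})-(\d+)', text)

# The overall match (group 0) does not depend on the kind of group
same_match = capturing.group(0) == non_capturing.group(0)
print(same_match)              # True

# Only capturing groups are stored, which shifts the group numbers
print(capturing.groups())      # ('order', '2024', '0042')
print(non_capturing.groups())  # ('2024', '0042')
```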

# 4.

In [4]:
# One scenario where using a non-capturing group (nontagged category) has a significant impact on a program's outcome is any
# call whose return value depends on the groups in the pattern, such as re.findall or re.split. When the pattern contains a 
# capturing group, re.findall returns only the captured text; with a non-capturing group it returns each whole match.

# Here's an example scenario:

# Imagine you're developing a web server that needs to handle incoming HTTP requests and extract certain information from the 
# request headers, such as user agents. You want to list specific browser tokens together with their versions, so the 
# alternation of browser names must be grouped without being captured.

# example(code):
import re

user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'

# Capturing group: findall returns only the text captured by the group
print(re.findall(r'(Chrome|Safari)/\d+\.\d+', user_agent))

# Non-capturing group: findall returns each whole match, including the version
print(re.findall(r'(?:Chrome|Safari)/\d+\.\d+', user_agent))

['Chrome', 'Safari']
['Chrome/91.0', 'Safari/537.36']
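
# re.split is a case where the choice between a capturing and a non-capturing group changes the result: a capturing group
# makes re.split include the separators in the returned list. A small sketch, assuming a made-up delimited string:

```python
import re

# Hypothetical delimited record for illustration
record = 'a, b;c'

# Capturing group: the separators appear in the result
with_separators = re.split(r'(,|;)\s*', record)
# Non-capturing group: only the fields are returned
fields_only = re.split(r'(?:,|;)\s*', record)

print(with_separators)  # ['a', ',', 'b', ';', 'c']
print(fields_only)      # ['a', 'b', 'c']
```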


# 5.

In [5]:
# One scenario where the non-consuming nature of a look-ahead condition in regular expressions can make a significant difference
# is when you need to match patterns that are followed by specific conditions without including those conditions in the actual
# match.

# For example, consider a text-processing application that needs to extract the user-name part of email addresses from a given
# text, but only when the address belongs to a specific domain. The look-ahead checks for the domain without adding it to 
# the match.

# example(code):
import re

text = "Contact us at email@example.com or support@example.com"

# Using a positive look-ahead to match user names that are followed by "@example.com"
matches = re.findall(r'\b[\w.]+(?=@example\.com\b)', text)

print(matches)

['email', 'support']
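
# Another place where the non-consuming behavior matters is finding overlapping matches. A sketch, assuming a made-up digit
# string: an ordinary pattern consumes what it matches, while a look-ahead leaves the characters available for the next match.

```python
import re

# Hypothetical digit string for illustration
digits = '1234'

# Consuming pattern: each match uses up its two digits
consumed_pairs = re.findall(r'\d\d', digits)
# Look-ahead with a group: nothing is consumed, so pairs can overlap
overlapping_pairs = re.findall(r'(?=(\d\d))', digits)

print(consumed_pairs)     # ['12', '34']
print(overlapping_pairs)  # ['12', '23', '34']
```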


# 6.

In [6]:
# Look-ahead assertions check whether a specific pattern exists (positive look-ahead) or does not exist (negative look-ahead) 
# ahead of the current position in the string, without consuming characters.

# The main difference between positive and negative look-ahead is their condition for success:

# a) Positive Look-ahead (?=...):

# 1. Syntax: (?=pattern)
# 2. Condition: The pattern inside the look-ahead must match immediately ahead of the current position for the overall match to 
#     succeed.
# 3. Example: foo(?=bar) matches "foo" only if it is followed by "bar".

# b) Negative Look-ahead (?!...):

# 1. Syntax: (?!pattern)
# 2. Condition: The pattern inside the negative look-ahead must not match immediately ahead of the current position for the overall match to succeed.
# 3. Example: foo(?!bar) matches "foo" only if it is not followed by "bar".

# example(code):
import re

text = "foobar"

# Positive Look-ahead
match_positive = re.search(r'foo(?=bar)', text)
print(bool(match_positive))  # Output: True (matches "foo" followed by "bar")

# Negative Look-ahead
match_negative = re.search(r'foo(?!bar)', text)
print(bool(match_negative))  # Output: False (no match because "foo" is followed by "bar")

True
False
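
# A further sketch, assuming a made-up string with three words. It shows that the text checked by the look-ahead is not part
# of the match, and that the negative form accepts "foo" whenever anything other than "bar" follows.

```python
import re

# Hypothetical sample text for illustration
words = 'foobar foobaz food'

# Positive look-ahead: "bar" is checked but not included in the match
positive = re.search(r'foo(?=bar)', words)
print(positive.group())  # foo
print(positive.span())   # (0, 3)

# Negative look-ahead: keep words starting with "foo" unless "bar" follows
not_bar = re.findall(r'foo(?!bar)\w*', words)
print(not_bar)           # ['foobaz', 'food']
```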


# 7.

In [7]:
# Referring to groups by name rather than by number in a standard regular expression has several benefits:

# a) Readability: Using names makes the regular expression more readable and understandable, especially when dealing with 
#     complex patterns that involve multiple groups. It allows you to give meaningful names to different parts of your pattern,
#     making it easier to maintain and debug.

# b) Clarity: Named groups provide clarity in your code by explicitly stating the purpose of each group. This helps other 
#     developers (and even yourself in the future) understand the intent of the regular expression without having to decipher 
#     group numbers or their meanings.

# c) Flexibility: Group names are more flexible than numbers because they are independent of the order of groups in the regular 
#     expression. If you modify the pattern and change the order of groups, named references will still work correctly, whereas
#     numeric references may become invalid.

# d) Self-Documenting: By using meaningful names for groups, your regular expression becomes self-documenting. It acts as a 
#     form of inline documentation that explains what each part of the pattern represents, enhancing code readability and 
#     maintainability.

# e) Avoiding Errors: Named groups reduce the likelihood of errors caused by mistakenly referencing the wrong group number. 
#     They provide a reliable and consistent way to access specific parts of the matched text without relying on position-based 
#     indices.
    
# example:
import re

text = "John Doe"
pattern_with_name = re.compile(r'(?P<first_name>\w+) (?P<last_name>\w+)')
pattern_with_number = re.compile(r'(\w+) (\w+)')

# Using named groups
match_with_name = pattern_with_name.match(text)
print(match_with_name.group('first_name'))  # Output: John
print(match_with_name.group('last_name'))   # Output: Doe

# Using numbered groups (less readable and prone to errors if order changes)
match_with_number = pattern_with_number.match(text)
print(match_with_number.group(1))  # Output: John (if order is not changed)
print(match_with_number.group(2))  # Output: Doe (if order is not changed)

John
Doe
John
Doe
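
# Names also carry over to other parts of the re API. A minimal sketch, assuming a made-up sentence containing a date:
# groupdict() returns the captures keyed by name, and \g<name> refers to a group inside a replacement string.

```python
import re

# Hypothetical sentence for illustration
sentence = 'Released on 2024-03-15.'
date_pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

# All named captures as a dictionary
parts = date_pattern.search(sentence).groupdict()
print(parts)      # {'year': '2024', 'month': '03', 'day': '15'}

# Reorder the date by name; this keeps working if groups are added or moved
reordered = date_pattern.sub(r'\g<day>/\g<month>/\g<year>', sentence)
print(reordered)  # Released on 15/03/2024.
```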


# 8.

In [8]:
# Yes. A named group captures a piece of text, and a named backreference (?P=name) then requires the same text to appear again,
# so together they identify repeated items within a target string. The group on its own only captures; it is the backreference
# that detects the repetition.

# example:
import re

text = "The cow jumped over the moon moon"
# The trailing \b stops a partial repeat such as "the theory" from matching
pattern = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

# findall returns the text of the single group for each match
matches = pattern.findall(text)
print(matches)  # Output: ['moon']

# With search, read the repeated word from the match object by group name
match = pattern.search(text)
if match:
    print(match.group('word'))  # Output: moon

['moon']
moon
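
# Building on the idea of a named backreference, here is a sketch that removes repeated words instead of only reporting them.
# The sample sentence and the choice to ignore case are assumptions made for illustration.

```python
import re

# Hypothetical sample text for illustration
text = "The the cow jumped over the moon moon moon"

# (?P<word>...) captures a word; (?:\s+(?P=word)\b)+ matches one or more repeats of it
repeat_pattern = re.compile(r'\b(?P<word>\w+)(?:\s+(?P=word)\b)+', re.IGNORECASE)

# Replace each run of repeats with its first occurrence, referenced by name
deduplicated = repeat_pattern.sub(r'\g<word>', text)
print(deduplicated)  # The cow jumped over the moon
```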


# 9.

In [9]:
# Here are two key things that the Scanner interface can do for you while parsing a string that re.findall cannot. (In Python
# this is re.Scanner, which is undocumented, so the details depend on the specific implementation.)

# a) Maintaining State: A scanner works through the string from the start, token by token, and keeps track of its position. 
#     When it reaches text that no pattern matches, it stops and returns the unscanned remainder, so unrecognized input is 
#     reported. re.findall simply returns all occurrences of the pattern as a list and silently skips anything in between.

# b) Handling Complex Tokenization: A scanner takes a list of patterns, each paired with its own action, so different token
#     types can be labeled, converted, or discarded (for example whitespace) as they are encountered. re.findall applies a 
#     single regular expression and returns only the matched text.
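
# A minimal sketch of a scanner, assuming CPython's undocumented re.Scanner class; the token labels and the input string are
# made up for illustration. Each pattern is paired with an action, and scan() returns the tokens plus any unscanned text.

```python
import re

# Each entry is (pattern, action); an action of None discards the match.
# The labels 'FLOAT', 'INT', and 'OP' are hypothetical names for this sketch.
arithmetic_scanner = re.Scanner([
    (r'\d+\.\d+', lambda scanner, token: ('FLOAT', float(token))),
    (r'\d+', lambda scanner, token: ('INT', int(token))),
    (r'[-+*/]', lambda scanner, token: ('OP', token)),
    (r'\s+', None),
])

# scan() stops at the first character that no pattern matches
tokens, remainder = arithmetic_scanner.scan('3 + 4.5 * 2 ?')

print(tokens)     # [('INT', 3), ('OP', '+'), ('FLOAT', 4.5), ('OP', '*'), ('INT', 2)]
print(remainder)  # ?
```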

# 10.

In [10]:
# No, a scanner object does not have to be named "scanner".  In fact, it's generally considered good practice to use more 
# descriptive names that reflect the specific purpose of the scanner.

# Here's why using a more descriptive name is preferred:

# a) Clarity and Readability: Using a name that conveys the scanner's function makes your code easier to understand for yourself 
#     and others. For example, if you're parsing a mathematical expression, a name like math_expression_scanner would be clearer
#     than just scanner.

# b) Context-Specific Naming: If you have multiple scanners in your code for different purposes, using descriptive names helps
#     differentiate between them.
    
# Here are some examples of alternative names for scanner objects, depending on the context:

# File scanner: file_reader, text_parser
# Log scanner: log_processor, log_analyzer
# Network scanner: packet_scanner, connection_scanner
# Code scanner: syntax_analyzer, code_tokenizer