## Python_Advanced_Assignment_17
1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?
2. When exactly does greedy versus non-greedy make a difference? What if you're looking for a non-greedy match but the only one available is greedy?
3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?
4. Describe a scenario in which using a nontagged group would have a significant impact on the program's outcomes.
5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your program.
6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead?
7. What is the benefit of referring to groups by name rather than by number in a regular expression?
8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?
9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall function does not?
10. Does a scanner object have to be named scanner?

In [None]:
'''Ans 1:- Greedy syntax aims for the longest match: .* expands as far as it can while
still letting the rest of the pattern match. Non-greedy syntax aims for the shortest
match: .*? stops at the first point where the rest of the pattern can match. To
transform greedy to non-greedy, just add ? after the quantifier. For instance, in the
text "AxFyF", the pattern A.*F greedily matches "AxFyF", while A.*?F non-greedily
matches "AxF". In "ABCDEF", which contains only one "F", both patterns match "ABCDEF".

Greedy: Grabs the longest match: .*
Non-greedy: Grabs the shortest match: .*?
Change * to *? (likewise + to +?, ? to ??, and {m,n} to {m,n}?)
'''
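
The difference is easiest to see when the closing delimiter occurs more than once. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample text with two closing tags
text = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the LAST "</b>" that still lets the pattern match
greedy = re.search(r"<b>.*</b>", text).group()

# Non-greedy: .*? stops at the FIRST "</b>"
lazy = re.search(r"<b>.*?</b>", text).group()

print(greedy)  # <b>one</b> and <b>two</b>
print(lazy)    # <b>one</b>
```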

In [None]:
'''Ans 2:- Greedy versus non-greedy matters only when the quantified part could end in
more than one place, for example when the closing delimiter occurs several times in
the text. Greedy takes the longest candidate, often spanning multiple occurrences.
Non-greedy takes the shortest, ending at the closest occurrence. If only one match is
possible, both forms return that same match, so a non-greedy pattern simply yields
the "greedy" result. If that result is still too long, we can constrain the pattern
with the surrounding context or a more specific character class (such as [^>]*
instead of .*), or combine the regular expression with additional logic.'''
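
A small sketch of the point above, using two hypothetical strings: with a single possible end point the two forms agree, and they diverge only when there is a choice.

```python
import re

single = "A-123-F"        # only one "F": one possible end point
double = "A-123-F-456-F"  # two "F"s: the quantifier has a choice

greedy_single = re.search(r"A.*F", single).group()
lazy_single = re.search(r"A.*?F", single).group()

greedy_double = re.search(r"A.*F", double).group()
lazy_double = re.search(r"A.*?F", double).group()

print(greedy_single == lazy_single)  # True: same match either way
print(greedy_double)                 # A-123-F-456-F
print(lazy_double)                   # A-123-F
```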

In [None]:
'''Ans 3:- In a simple string match where only one match is being sought and no
replacement is involved, the use of a non-capturing group (also known as a non-tagged
group) typically doesn't make a practical difference. Non-capturing groups are mainly
used to group expressions together for quantifiers or alternations without
capturing the matched content. Since the captured content isn't used in this
scenario, the overall match (match.group(0)) is the same either way; the only visible
difference is that match.groups() omits the non-capturing group.'''
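
As a quick check of this claim, the sketch below matches the same made-up string with a tagged and a nontagged group and compares the results.

```python
import re

text = "abcabcabc!"  # hypothetical input

tagged = re.match(r"(abc)+", text)      # capturing (tagged) group
untagged = re.match(r"(?:abc)+", text)  # non-capturing (nontagged) group

# The overall match is identical
print(tagged.group(0), untagged.group(0))  # abcabcabc abcabcabc

# Only the stored groups differ
print(tagged.groups())    # ('abc',)
print(untagged.groups())  # ()
```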

In [1]:
'''Ans 4:- Let's consider a scenario involving regular expressions in a text
processing task. Suppose we are parsing log files to extract specific information from
each log entry. Each log entry has a standardized format that includes fields such
as timestamp, severity level, and message content. We want to extract the timestamp
and message content from each log entry.

The severity level is not relevant to our extraction task, so we use a non-capturing
group for it. With capturing groups everywhere, the pattern would be:

\[(.*?)\] \[(.*?)\] Message: (.*)

and the message would be group 3. With the non-capturing group (?:.*?) for the
severity level, the regex engine does not create a group for it, so the message
becomes group 2 and match.groups() returns only the two fields we care about. This
changes the program's results wherever groups are consumed directly: group numbers
shift, and functions such as re.findall return one tuple element per capturing
group. Skipping the capture also avoids storing an unused substring, although any
performance gain is usually small.

'''

import re

log_entries = [
    "[2023-08-27 12:34:56] [INFO] Message: This is an informational message",
    "[2023-08-27 13:45:23] [ERROR] Message: An error occurred",
    "[2023-08-27 14:56:42] [WARNING] Message: This is a warning",
]

pattern = r'\[(.*?)\] \[(?:.*?)\] Message: (.*)'

for log_entry in log_entries:
    match = re.match(pattern, log_entry)
    if match:
        timestamp = match.group(1)
        message = match.group(2)
        print(f"Timestamp: {timestamp}, Message: {message}")
    else:
        print("No match found")

Timestamp: 2023-08-27 12:34:56, Message: This is an informational message
Timestamp: 2023-08-27 13:45:23, Message: An error occurred
Timestamp: 2023-08-27 14:56:42, Message: This is a warning
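
A case where the choice visibly changes the result is re.findall, which returns the contents of the capturing groups instead of the whole match whenever the pattern contains any. The sketch below uses a hypothetical string of weights; only the grouping syntax differs between the two patterns.

```python
import re

text = "10.5 kg, 3 kg, 7.25 kg"  # hypothetical input

# Capturing group: findall returns only what the group captured
tagged = re.findall(r"\d+(\.\d+)?", text)

# Non-capturing group: findall returns the whole match
untagged = re.findall(r"\d+(?:\.\d+)?", text)

print(tagged)    # ['.5', '', '.25']
print(untagged)  # ['10.5', '3', '7.25']
```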


In [3]:
'''Ans 5:- Consider a scenario where we are processing a list of email addresses and
want to extract the user names of addresses at a "business" domain. We need to
check the domain, but we don't want to include it in the extracted result.

user1@example.com
user2@business.com
user3@personal.biz
user4@business.org

We can use a positive look-ahead assertion to achieve this:
\w+(?=@business\.)

Explanation of the regex: \w+ matches one or more word characters (letters,
digits, or underscores). (?=@business\.) is a positive look-ahead assertion. It
ensures that the characters matched by \w+ are followed by "@business." without
consuming that text, so the match is just "user2" or "user4". Because the
look-ahead text is not consumed, it is not part of the result and remains available
to the next match, which matters when matches would otherwise overlap.'''
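
A minimal sketch of a look-ahead on the sample addresses above, assuming the goal is to pull out the user names of addresses at a "business" domain. The second part uses a made-up string to show how non-consumption allows overlapping matches.

```python
import re

addresses = [
    "user1@example.com",
    "user2@business.com",
    "user3@personal.biz",
    "user4@business.org",
]

# The look-ahead checks for "@business." but leaves it out of the match
pattern = re.compile(r"\w+(?=@business\.)")

users = []
for address in addresses:
    match = pattern.search(address)
    if match:
        users.append(match.group())

print(users)  # ['user2', 'user4']

# Non-consumption also permits overlapping matches
consuming = re.findall(r"aba", "ababa")        # second "aba" is skipped
overlapping = re.findall(r"(?=(aba))", "ababa")

print(consuming)    # ['aba']
print(overlapping)  # ['aba', 'aba']
```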

In [23]:
'''Ans 6:- The main difference between positive look-ahead and negative look-ahead is
that positive look-ahead lets a pattern match only if it is followed by the specified
text, while negative look-ahead lets it match only if it is not followed by the
specified text.

In more detail:

Positive Look-Ahead (?=...):-

1. Syntax: (?=...)

2. Description: Asserts that the text following the current position matches the
enclosed pattern.

3. Outcome: The match only occurs if the specified pattern is found after the
current position.

4. Example: foo(?=bar) matches "foo" only if it's followed by "bar".

Negative Look-Ahead (?!...):-

1. Syntax: (?!...) 

2. Description: Asserts that the text following the current
position does not match the enclosed pattern.

3. Outcome: The match only occurs if the specified pattern is not found after the
current position.

4. Example: foo(?!bar) matches "foo" only if it's not followed by "bar".

In both cases, these assertions don't consume any characters from the string;
they only assert whether the pattern following the current position matches or
doesn't match the specified condition.

Here's an example Python code snippet that demonstrates the use of positive
look-ahead and negative look-ahead assertions in regular expressions:'''

import re

def main():
    text = "foobar foobaz"

    # Positive lookahead: match "foo" only where "bar" follows
    positive = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
    print("foo(?=bar) matches at:", positive)

    # Negative lookahead: match "foo" only where "bar" does not follow
    negative = [m.start() for m in re.finditer(r"foo(?!bar)", text)]
    print("foo(?!bar) matches at:", negative)

if __name__ == "__main__":
    main()

foo(?=bar) matches at: [0]
foo(?!bar) matches at: [7]


In [19]:
'''Ans 7:- Referring to groups by name instead of by number makes a regular expression
more readable and maintainable, because each name documents what the group
captures. This is especially valuable in complex patterns. Names also shield the code
from changes in group order: adding or removing a group while editing the regex
renumbers the groups after it, but their names stay the same. For instance, with the
pattern (?P<name>\w+)\s+(?P<age>\d+), the matched values are available as
match['name'] and match['age'].'''

import re

pattern = r"(?P<name>\w+)\s+(?P<age>\d+)"
text = "John 30"

match = re.match(pattern, text)
print("Name:", match['name'])
print("Age:", match['age'])

Name: John
Age: 30


In [49]:
'''Ans 8:- Yes. Named groups are capturing groups that we give a name to, and a named
group can be referenced later in the same pattern with (?P=name). This makes it
possible to identify repeated items within a target string.

This code uses a regular expression to find words that repeat in a given
string. A look-ahead containing the backreference (?P=word) checks whether the
captured word occurs again later in the string. Matching is case-insensitive, so "The"
and "the" count as the same word. The repeated words are collected in a set, and a
dictionary stores how many times each one occurs as a whole word.'''

import re

target_string = "The cow jumped over the moon"
# (?P=word) is a backreference to the named group "word"
pattern = r'\b(?P<word>\w+)\b(?=.*\b(?P=word)\b)'

matches = re.finditer(pattern, target_string, re.IGNORECASE)
repeated_items = {match.group('word').lower() for match in matches}

# Count whole words only, so "the" is not counted inside a word like "other"
words = re.findall(r'\w+', target_string.lower())
repeated_counts = {word: words.count(word) for word in repeated_items}

print("Repeated words:", repeated_items)
print("Counts:", repeated_counts)

Repeated words: {'the'}
Counts: {'the': 2}


In [50]:
'''Ans 9:- The code below compares re.Scanner (an undocumented class in the re module)
with re.findall() for parsing and extracting specific patterns from a string.

The scanner tokenizes the input string into integers, floats, and words. Each
pattern is paired with an action that is called on every token it matches, so tokens
can be labeled and converted while scanning; an action of None skips the matched
text. The scanner also returns the unscanned remainder of the input, so text that
matches no rule is reported rather than silently ignored. re.findall() extracts the
same substrings but returns only plain strings, with no per-pattern processing and
no indication of what was left unmatched.'''

import re

# Using re.Scanner: one (pattern, action) pair per token type
scanner = re.Scanner([
    (r'\d+\.\d+', lambda s, t: ('FLOAT', float(t))),
    (r'\d+', lambda s, t: ('INT', int(t))),
    (r'[a-zA-Z]+', lambda s, t: ('WORD', t)),
    (r'[\s,]+', None),  # None skips whitespace and commas
])

input_string = "42 apples, 3.14 oranges, and 7 bananas"
tokens, remainder = scanner.scan(input_string)
print("Scanner tokens:", tokens)
print("Unmatched remainder:", repr(remainder))

# Using re.findall()
findall_pattern = r'\d+\.\d+|\d+|[a-zA-Z]+'
findall_tokens = re.findall(findall_pattern, input_string)
print("findall tokens:", findall_tokens)

Scanner tokens: [('INT', 42), ('WORD', 'apples'), ('FLOAT', 3.14), ('WORD', 'oranges'), ('WORD', 'and'), ('INT', 7), ('WORD', 'bananas')]
Unmatched remainder: ''
findall tokens: ['42', 'apples', '3.14', 'oranges', 'and', '7', 'bananas']


In [None]:
'''Ans 10:- No, a scanner object doesn't have to be named "scanner". We can choose any
valid variable name for the object. A name that reflects the object's purpose makes
its role easier to identify and keeps the code readable and maintainable.'''
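
As a brief illustration, the sketch below binds a scanner to a different, hypothetical name. It assumes the undocumented re.Scanner class from the standard library; the name `tokenizer` and the sample input are made up.

```python
import re

# Any valid identifier works; "tokenizer" is just an example name
tokenizer = re.Scanner([
    (r"\d+", lambda s, t: int(t)),  # convert digit runs to int
    (r"\s+", None),                 # skip whitespace
])

numbers, rest = tokenizer.scan("1 22 333")
print(numbers)     # [1, 22, 333]
print(repr(rest))  # ''
```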