Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words
as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy
one? What characters or characters can you introduce or change?



ANS--->
        In regular expressions, a greedy quantifier matches as much text as possible, while a non-greedy (lazy) quantifier matches as little as possible. Visually, greedy stretches the match to the last point where the pattern can still succeed; non-greedy stops at the first. The minimum change is one character: add a question mark (?) immediately after the quantifier (*, +, ?, or {m,n}). For example, the greedy pattern ".*" becomes the non-greedy ".*?", which matches the minimum possible text.
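
As a minimal sketch of the difference, assuming Python's re module and a made-up sample string, the two patterns below differ only by the added "?":

```python
import re

# Hypothetical sample text containing several "<...>" runs.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" runs to the last ">" in the string.
greedy = re.search(r"<.*>", text).group()

# Non-greedy: ".*?" stops at the first ">" it can reach.
lazy = re.search(r"<.*?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```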

Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a
non-greedy match but the only one available is greedy?



ANS--->
        Greedy and non-greedy syntax in regular expressions make a difference when there is more than one possible match for a pattern in a string.

Greedy syntax tries to match the longest possible substring that matches the pattern. In contrast, non-greedy (or lazy) syntax matches the shortest possible substring that matches the pattern.

For example, given the string "Hello world, how are you?" and the pattern "H.*o", the greedy match returns the substring "Hello world, how are yo" (it runs to the last "o"), while the non-greedy pattern "H.*?o" returns the substring "Hello" (it stops at the first "o").

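If only one match is possible, greedy and non-greedy return the same text: a non-greedy quantifier still expands as far as it must for the overall pattern to succeed, so it returns the longer match instead of failing. The two forms differ only when the pattern could end at more than one position. To make a greedy pattern non-greedy, append a question mark "?" to the quantifier. For example, the greedy pattern "H.*o" becomes the non-greedy pattern "H.*?o".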
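
The sketch below illustrates both cases with Python's re module. The strings "abcabc" and "abc" and the pattern "a.*c" are invented for illustration only.

```python
import re

# Two possible end points ("c" appears twice): the forms disagree.
many = "abcabc"
greedy_many = re.search(r"a.*c", many).group()
lazy_many = re.search(r"a.*?c", many).group()

# One possible end point: both forms return the same text.
single = "abc"
greedy_single = re.search(r"a.*c", single).group()
lazy_single = re.search(r"a.*?c", single).group()

print(greedy_many, lazy_many)      # abcabc abc
print(greedy_single, lazy_single)  # abc abc
```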

Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is
the use of a nontagged group likely to make any practical difference?


ANS--->In a simple match of a string, which looks only for one match and does not do any replacement, a non-tagged (non-capturing) group, written (?:...), is unlikely to make any practical difference: the same text is matched either way. The only change is that the group's text is not saved and the group gets no number. Non-tagged groups are more useful in complex regular expressions that use multiple groups, where it is important to avoid capturing unwanted text and to keep the numbering of the remaining groups simple. Skipping the capture can also save a small amount of work.
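
A small sketch, assuming Python's re module and an invented sample string, shows that the overall match is identical and only the saved groups differ:

```python
import re

text = "order banana today"  # hypothetical sample text

tagged = re.search(r"ba(na)+", text)      # capturing group
untagged = re.search(r"ba(?:na)+", text)  # non-capturing group

# The matched text is the same in both cases.
same_text = tagged.group(0) == untagged.group(0)

print(tagged.group(0), untagged.group(0))  # banana banana
print(tagged.groups())                     # ('na',)
print(untagged.groups())                   # ()
```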

Q4. Describe a scenario in which using a nontagged category would have a significant impact on the
program's outcomes.

ANS--->
        A nontagged category is a non-capturing group, written (?:...). It groups part of a pattern without saving the text it matches. (It should not be confused with a lookahead or lookbehind, which are separate constructs.) It has a significant impact whenever a function's return value depends on the groups in the pattern. For example, re.findall returns only the captured group text when the pattern contains a capturing group, and re.split includes captured delimiters in its result. Suppose you want every quantity-and-fruit phrase such as "10 apples" and you group the alternatives with (apples|pears): re.findall returns just "apples" and "pears". Switching to (?:apples|pears) makes it return the full phrases. Using a nontagged group in this scenario is crucial to obtaining the intended results.
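
The following sketch, assuming Python's re module and an invented sample string, shows how the choice of group type changes what re.findall and re.split return:

```python
import re

text = "10 apples, 20 pears"  # hypothetical sample text

# Capturing group: findall returns only the group's text.
with_tag = re.findall(r"\d+ (apples|pears)", text)

# Non-capturing group: findall returns the whole match.
without_tag = re.findall(r"\d+ (?:apples|pears)", text)

# Capturing group: split keeps the delimiter in the result.
split_tag = re.split(r"(, )", text)

# Non-capturing group: split drops the delimiter.
split_no_tag = re.split(r"(?:, )", text)

print(with_tag)      # ['apples', 'pears']
print(without_tag)   # ['10 apples', '20 pears']
print(split_tag)     # ['10 apples', ', ', '20 pears']
print(split_no_tag)  # ['10 apples', '20 pears']
```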

Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it
examines. Describe a situation in which this could make a difference in the results of your
programme.

ANS--->A look-ahead assertion in regular expressions checks whether a pattern exists ahead of the current position without including it in the match. Because the examined characters are not consumed, they are left out of the returned text and remain available to the rest of the pattern or to the next match. This is useful when we want to match something only if it is followed by a specific pattern, without including the latter in the match. For example, suppose we have a list of email addresses, and we want to match all email addresses that end with ".com", but we don't want to include ".com" in the final match. In this scenario, a look-ahead assertion can be used as follows:

import re

emails = ['john@example.com', 'jane@example.net', 'mike@example.com']

# Positive look-ahead: match the address only if ".com" follows,
# without including ".com" in the match
pattern = r'\b\w+@[a-zA-Z_]+(?=\.com\b)'

for email in emails:
    match = re.search(pattern, email)
    if match:
        print(match.group())  # prints john@example, then mike@example
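
The non-consuming behaviour matters most when matches would otherwise overlap. In the sketch below (the string "1,2,3" is an invented example), the plain pattern consumes the "2" and so cannot reuse it, while the look-ahead version leaves it available for the next match:

```python
import re

text = "1,2,3"  # hypothetical sample text

# Consuming: "1,2" uses up the "2", so "2,3" is never found.
consuming = re.findall(r"\d,\d", text)

# Look-ahead: ",\d" is only examined, so "2" can start the next match.
non_consuming = re.findall(r"\d(?=,\d)", text)

print(consuming)      # ['1,2']
print(non_consuming)  # ['1', '2']
```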


Q6. In standard expressions, what is the difference between positive look-ahead and negative look-
ahead?

ANS--->In regular expressions, a positive lookahead is a construct that matches a group of characters only if they are followed by another specific group of characters. Positive lookahead is denoted by (?=...), where ... is the lookahead pattern.

On the other hand, a negative lookahead is a construct that matches a group of characters only if they are not followed by another specific group of characters. Negative lookahead is denoted by (?!...), where ... is the lookahead pattern.

For example, let's consider the string "apple pie" and the following regular expressions:

apple(?=\s) matches "apple" only if it is followed by a whitespace character.
pie(?!crust) matches "pie" only if it is not immediately followed by the string "crust".

In the first example, the positive lookahead ensures that "apple" is only matched if it is followed by a space character, allowing us to match "apple" in "apple pie" without also matching "apple" in "applepie". In the second example, the negative lookahead ensures that "pie" is only matched if it is not immediately followed by the string "crust", allowing us to match "pie" in "apple pie" without also matching "pie" in "piecrust". Note that "pie" in "apple pie crust" still matches, because the character right after it is a space, not "crust".
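
A short sketch of both assertions, assuming Python's re module and an invented sample string, records the start position of every match:

```python
import re

text = "apple pie, applepie, piecrust"  # hypothetical sample text

# Positive lookahead: "apple" must be followed by whitespace.
positive = [m.start() for m in re.finditer(r"apple(?=\s)", text)]

# Negative lookahead: "pie" must not be followed directly by "crust".
negative = [m.start() for m in re.finditer(r"pie(?!crust)", text)]

print(positive)  # [0]      only the "apple" in "apple pie"
print(negative)  # [6, 16]  the "pie" in "piecrust" is rejected
```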

Q7. What is the benefit of referring to groups by name rather than by number in a standard
expression?

ANS--->Referring to groups by name rather than by number makes a regular expression more readable and easier to maintain. The name documents what each group is meant to capture, so the code is closer to self-documenting. Accessing a group by name is also less error-prone than accessing it by index: if a group is later added, removed, or reordered, the numbers shift but the names stay the same. Additionally, a named group can be referenced by name inside the pattern, in a replacement string, and on the match object.
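
As a brief sketch, assuming Python's re module (the date string and the group names year, month, and day are invented for illustration), the same group can be read by name or by number:

```python
import re

# Hypothetical date pattern with three named groups.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, "Released on 2023-04-15")

by_name = match.group("year")  # readable, survives reordering
by_number = match.group(1)     # depends on the group's position
fields = match.groupdict()     # all named groups as a dict

print(by_name, by_number)  # 2023 2023
print(fields)              # {'year': '2023', 'month': '04', 'day': '15'}
```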

Q8. Can you identify repeated items within a target string using named groups, as in "The cow
jumped over the moon"?

ANS--->Yes, we can identify repeated items within a target string using named groups in regular expressions. We can use the syntax (?P<name>...) to define a named group, where name is the name of the group and ... is the regular expression pattern for the group. We can then refer to the named group later in the same pattern using (?P=name), which matches whatever text the group matched. (The \g<name> form is used in the replacement string of re.sub, not inside the pattern.)

For example, let's say we want to match any word that appears twice in a row in the string "The cat cat jumped over the dog dog". We can use a named group to match the repeated word as follows:

import re

pattern = r'\b(?P<word>\w+)\s+(?P=word)\b'
text = 'The cat cat jumped over the dog dog'
matches = re.findall(pattern, text)

print(matches)  # Output: ['cat', 'dog']


['cat', 'dog']


Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the
re.findall feature does not?

ANS--->In Python, the Scanner interface refers to re.Scanner, an undocumented class in the re module (not to be confused with Java's java.util.Scanner). It is built from a list of (pattern, action) pairs and works through a string from left to right, turning it into tokens.

One thing the Scanner interface does for you that re.findall does not is let you specify a separate pattern for each token type and run a custom action on each token as it is matched. This is useful when you need to parse complex input text and handle each token type differently. Its scan() method also returns the unmatched remainder of the string, so you can tell where scanning stopped. On the other hand, re.findall only returns a list of all non-overlapping matches of a pattern in a string, without any additional parsing or custom actions.
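
The sketch below shows one way this could look. Since re.Scanner is undocumented, the behaviour shown is as observed in CPython and may change; the token names INT, NAME, and OP and the variable name tokenizer are invented for illustration.

```python
import re

# Each entry pairs a pattern with an action. The action receives the
# scanner object and the matched text; None means "match and discard".
tokenizer = re.Scanner([
    (r"\d+", lambda scanner, token: ("INT", int(token))),
    (r"[a-zA-Z_]\w*", lambda scanner, token: ("NAME", token)),
    (r"[+\-*/=]", lambda scanner, token: ("OP", token)),
    (r"\s+", None),
])

# scan() returns the list of action results and any unmatched remainder.
tokens, remainder = tokenizer.scan("total = price * 3")

print(tokens)
print(repr(remainder))
```

Here the object is stored under the name tokenizer, which also shows that nothing requires it to be called scanner.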

Q10. Does a scanner object have to be named scanner?

ANS--->No, a scanner object can be named anything as long as it follows the rules for valid identifier names in Python.