# Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?

In regular expressions, a greedy quantifier matches as much text as it can, while a non-greedy (lazy) quantifier matches as little as it can. In both cases the rest of the pattern must still be able to match. The quantifiers `*`, `+`, `?` and `{m,n}` are greedy by default. Following one of them with `?` (`*?`, `+?`, `??`, `{m,n}?`) makes it non-greedy.

To transform a greedy pattern into a non-greedy one, the bare minimum is to add a single `?` immediately after the greedy quantifier (`*`, `+`, `?` or `{m,n}`). No other characters need to change.

For example, on the string `<b>bold</b>`, the greedy pattern `<.*>` matches the entire string, while the non-greedy pattern `<.*?>` matches only `<b>`.
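A minimal sketch of the difference, using a made-up sample string with Python's `re.search`:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end of the string, then backtracks to the LAST ">".
greedy = re.search(r"<.*>", text).group()
# Non-greedy: .*? stops at the FIRST ">" that lets the match succeed.
lazy = re.search(r"<.*?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```

The only difference between the two patterns is the extra `?` after `*`.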

# Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a non-greedy match but the only one available is greedy?

Greedy versus non-greedy makes a difference only when a quantifier such as `*`, `+`, `?` or `{m,n}` could stop at more than one place and the rest of the pattern would still succeed. A greedy quantifier takes as many repetitions as possible, whereas a non-greedy one takes as few as possible while still allowing the overall pattern to match. If only one match length can succeed (for example, `\d+$` and `\d+?$` both match all of `123`), both forms return the same result. Neither form changes where the match begins: the engine still reports the leftmost match.

If the only match available is the longer one, a non-greedy pattern still finds it. A lazy quantifier starts with as few repetitions as possible and keeps adding more until the rest of the pattern succeeds, so "non-greedy" means "the shortest match that works", not "fail if the match is long". For example, `a.*?b` matches all of `aXXXXb` because there is no earlier `b`. Non-greedy syntax only changes the result when several matches are possible: to get the text between the first and second occurrences of "foo" in "xfooAfooBfoo", the greedy pattern `foo(.*)foo` captures `AfooB`, whereas the non-greedy `foo(.*?)foo` captures `A`. In a regex flavor without lazy quantifiers, a negated character set gives the same effect for single-character delimiters: `"[^"]*"` cannot run past the first closing quote.
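The cases above can be checked with a short sketch; the sample strings are illustrative only:

```python
import re

# Only one possible match: the lazy quantifier keeps expanding until "b" is found.
only_one = re.search(r"a.*?b", "aXXXXb").group()

# Several possible matches: greedy and lazy choose different ones.
text = "xfooAfooBfoo"
greedy_group = re.search(r"foo(.*)foo", text).group(1)
lazy_group = re.search(r"foo(.*?)foo", text).group(1)

# Negated character set: stops at the first closing quote without any lazy quantifier.
quoted = re.search(r'"([^"]*)"', 'say "hi" and "bye"').group(1)

print(only_one)      # aXXXXb
print(greedy_group)  # AfooB
print(lazy_group)    # A
print(quoted)        # hi
```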

# Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?

No. A nontagged (non-capturing) group, written `(?:...)`, groups part of a pattern, for example so that a quantifier applies to all of it, without saving the matched text as a numbered group. In a simple match that looks for one occurrence and uses only the overall match (`group()` or `group(0)`), `(...)` and `(?:...)` produce exactly the same match, so the choice makes no practical difference to the result. The only differences are that the tagged version also records the group's text, which costs a negligible amount of time, and that it shifts the numbering of any later groups. Nontagged groups can still help readability by signaling that a group exists only for structure.
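A quick check with `re.search` on a made-up string: the overall match is identical, and only the recorded groups differ.

```python
import re

text = "xxababab yy"

tagged = re.search(r"(ab)+", text)      # tagged (capturing) group
untagged = re.search(r"(?:ab)+", text)  # nontagged (non-capturing) group

print(tagged.group(), untagged.group())    # ababab ababab
print(tagged.groups(), untagged.groups())  # ('ab',) ()
```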

# Q4. Describe a scenario in which using a nontagged category would have a significant impact on the program's outcomes.

Let's consider a scenario where we want to extract the domain names from all the URLs in a large text file using `re.findall`. The pattern `https?://(?:www\.)?([a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)/` uses the nontagged group `(?:www\.)` so that `?` can make the whole `www.` prefix optional without capturing it, and a tagged group for the domain name. This has a significant impact because `re.findall` returns the contents of the tagged groups rather than the whole match. With only one tagged group, it returns a simple list of domain names such as `python.org`. If `(www\.)` were written as a tagged group instead, every result would become a tuple such as `('www.', 'python.org')`, and any code expecting a list of strings would break. Note that this pattern strips only the protocol and a leading `www.` (other subdomains are kept) and expects a `/` after the domain. Avoiding an unnecessary capture also saves a little work, which can add up when dealing with large amounts of data.
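A sketch of how the return value of `re.findall` changes, using invented sample text; the only difference between the two patterns is whether the `www.` group is tagged:

```python
import re

text = "See https://www.python.org/ and http://docs.example.com/ for details."

# Nontagged www group: findall returns just the domain strings.
untagged = re.findall(r"https?://(?:www\.)?([a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)/", text)
# Tagged www group: findall returns a tuple of both groups per match.
tagged = re.findall(r"https?://(www\.)?([a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)/", text)

print(untagged)  # ['python.org', 'docs.example.com']
print(tagged)    # [('www.', 'python.org'), ('', 'docs.example.com')]
```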

# Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme.

A look-ahead condition in a regular expression allows you to match a pattern only if it is followed by another pattern, without actually including the following pattern in the match. This can be useful in situations where you need to search for a specific pattern that is only valid if it is followed by another pattern.

For example, suppose you have a list of email addresses and you want to extract only the user names of addresses whose domain begins with "example". You could use a look-ahead condition to check for `@example.` after the user name without making it part of the match:

```
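import re

text = "jane@example.com, john@example.org, alice@foo.com, bob@example.com"

# Match a user name only when "@example." follows it; the look-ahead
# checks the domain but leaves it out of the match.
pattern = r"\w+(?=@example\.)"

matches = re.findall(pattern, text)

print(matches)  # ['jane', 'john', 'bob']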
```

In this case, the regular expression `r"\w+(?=@example\.)"` matches a run of word characters only when it is immediately followed by `@example.`. The look-ahead condition `(?=@example\.)` checks for the domain after the user name, but it does not include it in the match, so `re.findall` returns just the user names.

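If you were to use a normal pattern such as `\w+@example\.` instead, the domain characters would be consumed and become part of each match, so you would need a capturing group to recover the user name. Consumption also affects the next match: characters consumed by one match cannot take part in another, while a look-ahead leaves them available, which is what makes overlapping matches possible.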
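A small illustration of that last point, on a toy string: a consuming pattern finds non-overlapping pairs of characters, while a capturing group inside a look-ahead finds every overlapping pair.

```python
import re

text = "abcd"

# Each match consumes two characters, so the next search starts after them.
consuming = re.findall(r"\w\w", text)
# The look-ahead consumes nothing; findall advances one position at a time.
overlapping = re.findall(r"(?=(\w\w))", text)

print(consuming)    # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```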

# Q6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead?

In regular expressions, a look-ahead is a type of zero-width assertion. It does not consume any characters; it only tests whether the text ahead of the current position matches (or does not match) a pattern.

Positive look-ahead is denoted by `(?=pattern)` and succeeds only if `pattern` matches immediately ahead of the current position. For example, the regular expression `foo(?=bar)` would match "foo" only if it is followed by "bar", and the match itself is just "foo".

Negative look-ahead is denoted by `(?!pattern)` and succeeds only if `pattern` does not match immediately ahead of the current position. For example, the regular expression `foo(?!bar)` would match "foo" only if it is not followed by "bar".

In summary, positive look-ahead ensures that a pattern exists ahead of the current match position, while negative look-ahead ensures that a pattern does not exist ahead of the current match position.
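Both assertions can be compared on one illustrative string by collecting the start position of each "foo" they accept:

```python
import re

text = "foobar foobaz"

positive = [m.start() for m in re.finditer(r"foo(?=bar)", text)]  # foo followed by bar
negative = [m.start() for m in re.finditer(r"foo(?!bar)", text)]  # foo not followed by bar
matched = re.search(r"foo(?=bar)", text).group()  # the look-ahead text is not included

print(positive)  # [0]
print(negative)  # [7]
print(matched)   # foo
```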

# Q7. What is the benefit of referring to groups by name rather than by number in a regular expression?

Referring to groups by name rather than by number in a regular expression has several benefits:

1. Readability: Using names makes the expression more readable, especially when dealing with complex expressions with many groups.

2. Maintainability: Adding or removing a group shifts the numbers of every group after it, which silently breaks code that relies on group numbers. Named groups keep their names when the expression changes.

3. Reusability: A named group can be referred to again within the same pattern with `(?P=name)` and in replacement strings with `\g<name>`.

4. Self-documentation: Named groups can help document the purpose of the group and the intended use of the captured text.

5. Flexibility: The names can be any valid Python identifier, so they can be used to provide more meaningful names than the numbered groups.
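A short sketch of named groups in practice, using a hypothetical date format: the groups are read by name from the match object, collected with `groupdict()`, and reused by name in a `re.sub` replacement.

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

m = re.search(pattern, "Released on 2023-04-15.")
print(m.group("year"))  # 2023
print(m.groupdict())    # {'year': '2023', 'month': '04', 'day': '15'}

# Reorder the parts by name; no group numbers to keep track of.
us_date = re.sub(pattern, r"\g<month>/\g<day>/\g<year>", "2023-04-15")
print(us_date)  # 04/15/2023
```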

# Q8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?

Yes, you can identify repeated items within a target string by capturing an item in a named group and then referring back to it with a named backreference, `(?P=name)`. For example, to find the word that appears twice in the string "The cow jumped over the moon" ("The" and "the"), you could use a named group like this:

```
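import re

string = "The cow jumped over the moon"
# (?P<word>\w+) captures a whole word; (?P=word) requires the same text again later.
# re.IGNORECASE lets the backreference treat "The" and "the" as equal.
pattern = r'\b(?P<word>\w+)\b.*?\b(?P=word)\b'
matches = re.findall(pattern, string, re.IGNORECASE)

print(matches)  # Output: ['The']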
```

In this example, `(?P<word>\w+)` captures a run of word characters into the group named `word`, and the `\b` word boundaries on either side ensure it is a complete word. `.*?` then skips as few characters as possible, and `(?P=word)` requires the exact text captured by `word` to appear again, followed by another word boundary. Because "The" and "the" differ in case, the `re.IGNORECASE` flag is needed for the backreference to treat them as the same word.

When the regular expression is applied to the string "The cow jumped over the moon" using the `re.findall()` function, it returns the text captured by the named group for each match, which in this case is `['The']`. Because the pattern has a single group, `re.findall()` returns plain strings rather than tuples.
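For items that repeat back to back, such as an accidental "the the", the `.*?` can be replaced by `\s+`. A minimal sketch on a made-up sentence, which also reuses the named group in a `re.sub` replacement to remove the duplicate:

```python
import re

text = "The cow jumped over the the moon"
doubled = r"\b(?P<word>\w+)\s+(?P=word)\b"  # a word immediately repeated

found = re.findall(doubled, text)
cleaned = re.sub(doubled, r"\g<word>", text)  # keep one copy of the word

print(found)    # ['the']
print(cleaned)  # The cow jumped over the moon
```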

# Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall feature does not?

The `Scanner` class in Python's `re` module (undocumented, but long present) tokenizes a string using a list of (pattern, action) pairs. It is useful when parsing a string that contains different types of tokens such as keywords, identifiers, and operators. It does at least two things that `re.findall` does not. First, it calls the action associated with whichever pattern matched each token (or discards the token when the action is `None`), so each token is processed as you encounter it rather than collected and filtered afterwards. Second, it stops at the first character that no pattern matches and returns the unscanned remainder, so invalid input is detected instead of being silently skipped, as `re.findall` does.
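A minimal tokenizer sketch; the token categories and the `number`, `name` and `op` helpers are hypothetical choices for illustration. Because `re.Scanner` is undocumented, treat its behavior here as observed CPython behavior rather than a guaranteed API.

```python
import re

# Action functions receive the Scanner object and the matched text.
def number(scanner, token):
    return ("NUM", int(token))

def name(scanner, token):
    return ("NAME", token)

def op(scanner, token):
    return ("OP", token)

scanner = re.Scanner([
    (r"\d+", number),
    (r"[A-Za-z_]\w*", name),
    (r"[-+*/=]", op),
    (r"\s+", None),  # whitespace is matched but discarded
])

source = "x = 42 + y ? 7"
tokens, remainder = scanner.scan(source)
print(tokens)           # [('NAME', 'x'), ('OP', '='), ('NUM', 42), ('OP', '+'), ('NAME', 'y')]
print(repr(remainder))  # '? 7'

# re.findall silently skips the unrecognized "?" and keeps going.
flat = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/=]", source)
print(flat)             # ['x', '=', '42', '+', 'y', '7']
```

The non-empty remainder is what lets a parser report an error at the `?`, which the `re.findall` result gives no sign of.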

# Q10. Does a scanner object have to be named scanner?

No, a scanner object does not have to be named "scanner." You can name it anything you want as long as it follows the rules for valid variable names in Python.
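For instance, in this sketch the object is bound to `lexer`, a hypothetical name; the action's first parameter still receives the `Scanner` object, whatever the variable is called.

```python
import re

# "lexer" works just as well as "scanner"; it is only a variable name.
lexer = re.Scanner([
    (r"\d+", lambda s, tok: int(tok)),  # s is the Scanner object itself
    (r"\s+", None),                     # skip whitespace
])

tokens, rest = lexer.scan("1 22 333")
print(tokens, repr(rest))  # [1, 22, 333] ''
```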