**Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?**

**Ans:** `Greedy` syntax matches the **longest possible sequence** that satisfies the pattern, while `non-greedy` syntax matches the **shortest possible sequence** that satisfies the pattern.

To transform a greedy pattern into a non-greedy one, add a **question mark** `?` immediately after the **quantifier** (`*`, `+`, `?`, or `{m,n}`). That single character makes the quantifier non-greedy.

For example, the greedy pattern `".*"` matches as much of the string as it can, while the non-greedy pattern `".*?"` matches as little as it can while still letting the rest of the pattern succeed.

Check out the code sample for further understanding:

In [1]:
import re

# Example string
string = "hello world"

# Greedy pattern
greedy_pattern = re.compile("h.*o")
greedy_match = greedy_pattern.search(string)
print(greedy_match.group())

# Non-greedy pattern
non_greedy_pattern = re.compile("h.*?o")
non_greedy_match = non_greedy_pattern.search(string)
print(non_greedy_match.group())

hello wo
hello
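
The same one-character change applies to every quantifier, not just `*`. The following is a minimal sketch, assuming an illustrative input string `"aaaa"` that is not part of the original exercise:

```python
import re

text = "aaaa"  # illustrative input, chosen so each quantifier has a choice

# Each pair is (greedy result, non-greedy result) for the same quantifier.
plus_pair = (re.match(r"a+", text).group(), re.match(r"a+?", text).group())
optional_pair = (re.match(r"a?", text).group(), re.match(r"a??", text).group())
range_pair = (re.match(r"a{2,3}", text).group(), re.match(r"a{2,3}?", text).group())

print(plus_pair)      # ('aaaa', 'a')
print(optional_pair)  # ('a', '')
print(range_pair)     # ('aaa', 'aa')
```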


**Q2. When exactly does greedy versus non-greedy make a difference?  What if you're looking for a non-greedy match but the only one available is greedy?**

**Ans:** Greedy versus non-greedy makes a difference only when the quantified part of the pattern can end at more than one position and still let the overall match succeed. In that case the greedy form takes the longest candidate and the non-greedy form takes the shortest.

If only one match is possible, both forms return it. A non-greedy quantifier prefers the shortest match but still expands as far as needed for the rest of the pattern to succeed, so when the only available match is the "greedy" one, the non-greedy pattern returns that same match.

In [9]:
import re
string = "abc<123>def<456>ghi"

#Extracting the substrings that are enclosed in angle brackets

##Greedy pattern
greedy_pattern = re.compile("<.*>")
greedy_match = greedy_pattern.search(string)
print(greedy_match.group())  # Output: "<123>def<456>"

##Non-greedy pattern
non_greedy_pattern = re.compile("<.*?>")
non_greedy_match = non_greedy_pattern.search(string)
print(non_greedy_match.group())  # Output: "<123>"
#we have to call the search method again, in order to find the next match.
non_greedy_match = non_greedy_pattern.search(string, non_greedy_match.end())
print(non_greedy_match.group())  # Output: "<456>"

<123>def<456>
<123>
<456>
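
As a rough illustration of the case where only one match is available, the sketch below uses two assumed inputs: a string with a single bracketed part, and a pattern anchored with `$` so that the short candidate is ruled out.

```python
import re

# Only one "<...>" exists, so greedy and non-greedy agree.
single = "abc<123>def"
greedy_single = re.search(r"<.*>", single).group()
lazy_single = re.search(r"<.*?>", single).group()
print(greedy_single, lazy_single)  # <123> <123>

# The "$" anchor rejects the short candidate "<123>", so the
# non-greedy quantifier keeps expanding until the pattern succeeds.
anchored = "<123>def<456>"
lazy_anchored = re.search(r"<.*?>$", anchored).group()
print(lazy_anchored)  # <123>def<456>
```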


**Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a nontagged group likely to make any practical difference?**

**Ans:** In a simple match of a string that looks for only one match and does not do any replacement, the use of a non-tagged group is not likely to make any practical difference. Non-tagged groups, also known as non-capturing groups, group parts of a regular expression pattern together without capturing them as a separate group.

They are useful in more complex patterns where you want to group parts of the pattern for repetition, alternation, or other purposes but do not need the group's text afterwards. When you only read the overall match, the result is the same whether the group is tagged or not.

For example, consider the following code that matches a string "hello" followed by either "world" or "python" using a non-tagged group:

In [10]:
import re

string = "hello python"
pattern = re.compile("hello (?:world|python)")
match = pattern.search(string)
print(match.group())  # Output: "hello python"

hello python


In this case, the non-tagged group `(?:world|python)` groups the options "world" and "python" together for alternation, but the group is not captured as a separate match. Since we are only looking for one match and not doing any replacement, we get the same overall result with an ordinary tagged group:

In [11]:
import re

string = "hello python"
pattern = re.compile("hello (world|python)")
match = pattern.search(string)
print(match.group())  # Output: "hello python"

hello python


Here, we use a tagged group `(world|python)` to group the options together and capture them as a separate match, which doesn't make a practical difference since we are only looking for one match and not doing any replacement.
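
To make the comparison concrete, here is a small sketch (the variable names `tagged` and `untagged` are illustrative) showing that the overall match is identical and only the captured groups differ:

```python
import re

text = "hello python"

tagged = re.search(r"hello (world|python)", text)
untagged = re.search(r"hello (?:world|python)", text)

# The overall match is the same either way.
same_match = tagged.group() == untagged.group()
print(same_match)         # True

# Only the list of captured groups differs.
print(tagged.groups())    # ('python',)
print(untagged.groups())  # ()
```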

**Q4. Describe a scenario in which using a nontagged category would have a significant impact on the program's outcomes.**

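**Ans:** If "nontagged category" is read as a non-tagged (non-capturing) group, it has a significant impact whenever a function returns captured groups rather than the whole match: `re.findall()` returns only the group contents when the pattern contains a capturing group, and `re.split()` includes captured separators in its result. Writing the group as `(?:...)` avoids both effects. The cell below reads the term differently, as a character set that matches specific characters outside any predefined category, such as the hyphens in a phone number.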

In [18]:
import re
phone_number = "123-456-7890"
pattern = re.compile(r"[-]") # matches hyphens
cleaned_number = pattern.sub("", phone_number)
print(cleaned_number)


1234567890
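
For the non-capturing-group reading of the question, the following sketch shows how `re.findall()` and `re.split()` change their results depending on whether the group is tagged. The sample strings are assumptions made for illustration.

```python
import re

text = "call 123-456-7890 or 555-1234"  # illustrative input

# With a capturing group, findall returns the group's last capture,
# not the whole phone number.
captured = re.findall(r"(\d{3}-)+\d{4}", text)
print(captured)      # ['456-', '555-']

# With a non-capturing group, findall returns the full matches.
whole = re.findall(r"(?:\d{3}-)+\d{4}", text)
print(whole)         # ['123-456-7890', '555-1234']

# re.split keeps captured separators but drops non-captured ones.
with_separators = re.split(r"(,|;)", "a,b;c")
without_separators = re.split(r"(?:,|;)", "a,b;c")
print(with_separators)     # ['a', ',', 'b', ';', 'c']
print(without_separators)  # ['a', 'b', 'c']
```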


**Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme.**

**Ans:** Because a look-ahead condition does not consume the characters it examines, the same characters remain available to the rest of the pattern. This makes a difference when matching overlapping patterns or when checking several independent conditions against the same input, as in password validation.

Here is an example illustrating password validation:

In [19]:
import re

password = "Abcd1234"
# validates password criteria
pattern = re.compile(r"(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}")
is_valid = pattern.search(password) is not None
print(is_valid)

True


In this example, the pattern uses three positive look-ahead conditions, `(?=.*[A-Z])`, `(?=.*[a-z])`, and `(?=.*\d)`, to check for an uppercase letter, a lowercase letter, and a digit. Because none of them consumes any characters, each one scans from the same starting position, so the three checks are independent of the order in which the characters appear. Only the final `.{8,}` consumes characters, and it requires at least 8 of them. With ordinary consuming sub-patterns, the three checks would instead have to match in sequence, one after another.
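
Overlapping matches are the other case mentioned in the answer. A minimal sketch, assuming the illustrative input `"ababa"`: a plain pattern consumes the characters it matches, so the second occurrence of `aba` is missed, while a look-ahead wrapped around a capturing group finds both.

```python
import re

text = "ababa"  # illustrative input with two overlapping "aba" occurrences

# A consuming pattern uses up "aba" at index 0, so the one at index 2 is skipped.
consuming = re.findall(r"aba", text)
print(consuming)    # ['aba']

# The look-ahead matches an empty string at each position and
# captures "aba" without consuming it, so both occurrences are reported.
overlapping = re.findall(r"(?=(aba))", text)
print(overlapping)  # ['aba', 'aba']
```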

**Q6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead?**

**Ans:** A positive look-ahead `(?=...)` condition **matches the search pattern only if it is followed by the specified pattern**. A negative look-ahead `(?!...)` condition, on the other hand, **matches the search pattern only if it is not followed by the specified pattern**.

See the example below:

In [22]:
import re
text = "Hello World"

# using positive look-ahead
pattern1 = re.compile(r"Hello(?= World)") # match "Hello" only if it is followed by " World"
match1 = pattern1.search(text)
if match1:
    print("Positive look-ahead: Match found:", match1.group(0))
else:
    print("Positive look-ahead: No match found")

# using negative look-ahead
pattern2 = re.compile(r"Hello(?! Universe)") # match "Hello" only if it is not followed by " Universe"
match2 = pattern2.search(text)
if match2:
    print("Negative look-ahead: Match found:", match2.group(0))
else:
    print("Negative look-ahead: No match found")

Positive look-ahead: Match found: Hello
Negative look-ahead: Match found: Hello
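
In the cell above both patterns happen to match. As a complementary sketch, the assumed input `"foobar foobaz"` makes the two look-aheads select different occurrences of the same word:

```python
import re

text = "foobar foobaz"  # illustrative input

# Positive look-ahead: keep "foo" only where "bar" follows.
positive_starts = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
print(positive_starts)  # [0]

# Negative look-ahead: keep "foo" only where "bar" does not follow.
negative_starts = [m.start() for m in re.finditer(r"foo(?!bar)", text)]
print(negative_starts)  # [7]
```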


**Q7. What is the benefit of referring to groups by name rather than by number in a regular expression?**

**Ans:** Referring to groups by name rather than by number in a regular expression has several benefits:

1. Readability: Naming groups can make the regular expression more readable and easier to understand, especially for complex patterns with multiple groups.
2. Self-documenting: By naming groups, the regular expression becomes self-documenting, making it easier for others (or your future self) to understand the intent of the pattern.
3. Flexibility: If groups are added, removed, or reordered, references by name keep working, whereas references by number must be updated wherever the group numbers have shifted.
4. Code reuse: Named groups can be referenced by name in the replacement string when doing string substitutions, making it easier to reuse the same regular expression in different contexts.
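
The points above can be sketched with a date pattern. The group names `year`, `month`, and `day` and the sample sentence are illustrative choices, not part of the original question.

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
text = "released on 2023-04-15"  # illustrative input

match = date_pattern.search(text)

# Access by name reads more clearly than match.group(1).
year = match.group("year")
print(year)          # 2023

# groupdict() returns every named group at once.
parts = match.groupdict()
print(parts)         # {'year': '2023', 'month': '04', 'day': '15'}

# Named references in a replacement string use \g<name>.
reordered = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", text)
print(reordered)     # released on 15/04/2023
```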

**Q8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?**

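**Ans:** Yes, we can use named groups to identify repeated items within a target string. A named group captures an item once, and a named back-reference `(?P=name)` then requires the same text to appear again. The cell below takes a simpler route: it names a group for the literal word "The" and lists every case-insensitive occurrence of it.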

In [31]:
import re
text = "The cow jumped over the moon"
pattern=re.compile(r'(?P<w1>The)',re.I)
pattern.findall(text)

['The', 'the']
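
A more general approach does not hard-code the word. The sketch below is one possible pattern, assuming the group name `word`: it captures any word and uses a named back-reference inside a look-ahead to check that the same word occurs again later. With `re.I`, the back-reference also matches case-insensitively.

```python
import re

text = "The cow jumped over the moon"

# Capture a whole word, then look ahead for a later occurrence of it.
repeat_pattern = re.compile(r"\b(?P<word>\w+)\b(?=.*\b(?P=word)\b)", re.I)

repeated = [m.group("word") for m in repeat_pattern.finditer(text)]
print(repeated)  # ['The']
```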

**Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall feature does not?**

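**Ans:** The `re.findall()` function searches a string and returns all non-overlapping matches of a pattern in a single step (whereas `re.search()` returns only the first match). It reports only the matched text and silently skips anything that does not match. A scanner built with `re.Scanner` (an undocumented class in Python's `re` module) does at least two things that `re.findall()` does not: it lets you attach a separate action (callback) to each pattern, so every match is converted into a token as it is found, and it returns the unmatched remainder of the string, so you can tell where scanning stopped.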
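
As a rough sketch of the scanner interface: `re.Scanner` is undocumented, so its behaviour could change between Python versions, and the token names `NUMBER`, `NAME`, and `OP` are illustrative assumptions. Each entry pairs a pattern with an action; an action of `None` discards the match.

```python
import re

# Each lexicon entry is (pattern, action). An action receives the
# scanner and the matched text and returns the token to emit.
tokenizer = re.Scanner([
    (r"\d+", lambda scanner, tok: ("NUMBER", int(tok))),
    (r"[A-Za-z_]\w*", lambda scanner, tok: ("NAME", tok)),
    (r"[+\-*/=]", lambda scanner, tok: ("OP", tok)),
    (r"\s+", None),  # skip whitespace
])

tokens, remainder = tokenizer.scan("x = 42 + y")
print(tokens)
# [('NAME', 'x'), ('OP', '='), ('NUMBER', 42), ('OP', '+'), ('NAME', 'y')]
print(repr(remainder))  # ''

# Scanning stops at the first character no pattern matches,
# and the unscanned tail is returned as the remainder.
partial_tokens, leftover = tokenizer.scan("x = 4 ? 2")
print(partial_tokens)   # [('NAME', 'x'), ('OP', '='), ('NUMBER', 4)]
print(repr(leftover))   # '? 2'
```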

**Q10. Does a scanner object have to be named scanner?**

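**Ans:** No. A scanner object is an ordinary Python object, so it can be bound to any valid variable name.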