Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words
as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy
one? What character or characters can you introduce or change?

**Greedy vs. Non-Greedy Matching**:

- **Greedy**: Greedy matching tries to match as much text as possible while still allowing the overall pattern to match. It grabs the longest possible substring that fits the pattern.

   Visual Representation: `.*` (Matches the longest possible substring)
  
- **Non-Greedy (Lazy)**: Non-greedy matching, also known as lazy or minimal matching, tries to match as little text as possible while still allowing the pattern to match. It grabs the shortest possible substring that fits the pattern.

   Visual Representation: `.*?` (Matches the shortest possible substring)

**Transforming Greedy to Non-Greedy**:

To transform a greedy pattern into a non-greedy one, add a single `?` immediately after the quantifier: `*` becomes `*?`, `+` becomes `+?`, `?` becomes `??`, and `{m,n}` becomes `{m,n}?`. For example:

- Greedy: `.*` (Matches the longest substring)
- Non-Greedy: `.*?` (Matches the shortest substring)

By adding `?` after the quantifier, you change the behavior from greedy to non-greedy, making it match the shortest possible substring instead of the longest.
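
As a minimal sketch of this one-character change, the snippet below runs a greedy and a lazy version of the same pattern with Python's `re` module. The sample strings and variable names are illustrative only.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.search(r"<.*>", html).group()

# Lazy: adding '?' after '*' stops at the first '>' that lets the pattern match.
lazy = re.search(r"<.*?>", html).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>

# The same suffix works on counted quantifiers.
digits_greedy = re.search(r"\d{2,4}", "123456").group()
digits_lazy = re.search(r"\d{2,4}?", "123456").group()

print(digits_greedy)  # 1234
print(digits_lazy)    # 12
```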

Q2. When exactly does greedy versus non-greedy make a difference? What if you're looking for a
non-greedy match but the only one available is greedy?

**Greedy vs. Non-Greedy Matching**:

Greedy versus non-greedy matching only matters for patterns that contain quantifiers (e.g., `*`, `+`, `?`, `{m,n}`), and only when the quantified part could match more than one length of text:

- **Greedy Matching**: Greedy matching tries to match as much text as possible while still allowing the overall pattern to match. It grabs the longest possible substring that fits the pattern.

- **Non-Greedy (Lazy) Matching**: Non-greedy matching, on the other hand, tries to match as little text as possible while still allowing the pattern to match. It grabs the shortest possible substring that fits the pattern.

**When It Makes a Difference**:

Greedy versus non-greedy matching makes a difference when whatever follows the quantifier in the pattern can match at more than one position in the input, for example when a closing delimiter occurs several times. In such cases:

- Greedy Matching: It captures the longest substring that fits the pattern. This can result in capturing more than you intended, especially when there are multiple occurrences of the pattern.

- Non-Greedy Matching: It captures the shortest substring that fits the pattern. This is useful when you want to capture individual occurrences of the pattern without including extra text.

**Example**:

Consider the input string: `"The cat and the dog are friends. The cat is black."`

If you want to capture the text from "The" through "cat," you can use a regular expression:

- Greedy: `The.*cat` (Matches the longest substring)
   - Result: "The cat and the dog are friends. The cat"

- Non-Greedy: `The.*?cat` (Matches the shortest substring)
   - Result: "The cat"
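
The two results above can be reproduced with a short script. This is a sketch using `re.search` and `re.findall`; the variable names are illustrative.

```python
import re

text = "The cat and the dog are friends. The cat is black."

# Greedy: one long match running to the last "cat".
greedy_match = re.search(r"The.*cat", text).group()
print(greedy_match)  # The cat and the dog are friends. The cat

# Lazy: each occurrence is matched on its own.
lazy_matches = re.findall(r"The.*?cat", text)
print(lazy_matches)  # ['The cat', 'The cat']
```

The lowercase "the" in "the dog" is not a starting point for either pattern because matching is case-sensitive by default.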

**If Only Greedy Matches Are Available**:

If the input allows only one match, greedy and non-greedy quantifiers return the same result. A lazy quantifier starts with the fewest repetitions but keeps extending until the rest of the pattern can match, so it produces the longer, "greedy-looking" match when that is the only one available. Laziness is a preference, not a constraint. If that match is longer than you want, modify the pattern itself (for example, use a negated character class such as `[^>]*` in place of `.*?`) or post-process the match to extract the desired part.
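
A small check of this case, assuming a made-up input string in which the closing text occurs only once: the lazy quantifier has to keep expanding until the rest of the pattern can match, so both forms end up with the same result.

```python
import re

text = "start-123-end"  # hypothetical input with a single possible match

greedy = re.search(r"start.*end", text).group()
lazy = re.search(r"start.*?end", text).group()

print(greedy)          # start-123-end
print(lazy)            # start-123-end
print(greedy == lazy)  # True
```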

Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is
the use of a nontagged group likely to make any practical difference?

In a simple match of a string where you are looking for one match and not performing any replacement, the use of a non-tagged group is unlikely to make a practical difference in the result of the match. Non-tagged groups (also known as non-capturing groups) are primarily used for grouping and specifying the structure of the regular expression but do not capture the matched text.

Here's a brief explanation:

1. **Tagged Group (Capturing Group)**: A tagged group is created by enclosing a part of the regular expression in parentheses `( )`. It captures the text that matches the enclosed pattern and allows you to access the captured text as a separate group in the match result. Tagged groups are useful when you want to extract specific parts of the matched text.

   Example with tagged group:
   ```python
   import re

   text = "The price is $50 and $25 for additional items."
   pattern = r"\$(\d+)"  # Capturing group captures the digits following '$'

   match = re.search(pattern, text)
   if match:
       captured_text = match.group(1)
       print(captured_text)  # Output: '50'
   ```

2. **Non-Tagged Group (Non-Capturing Group)**: A non-tagged group is created by using `(?: )` instead of `( )`. It groups the enclosed pattern without capturing the matched text as a separate group in the match result. Non-tagged groups are used when you want to group parts of the regular expression for logical grouping or applying quantifiers without capturing the text.

   Example with non-tagged group:
   ```python
   import re

   text = "The price is $50 and $25 for additional items."
   pattern = r"\$(?:\d+)"  # Non-capturing group for grouping digits following '$'

   match = re.search(pattern, text)
   if match:
       matched_text = match.group()  # Access the entire match
       print(matched_text)  # Output: '$50'
   ```

In a simple match where you are not interested in capturing and extracting specific parts of the matched text, the choice between tagged (capturing) and non-tagged (non-capturing) groups is often a matter of personal coding style and readability. If you don't need to capture the matched text for later use, using a non-tagged group is a reasonable choice for logical grouping or applying quantifiers without affecting the match result. However, it is unlikely to make a practical difference in the outcome of the match itself.
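
To make the comparison concrete, here is a small sketch that runs both patterns from the examples above on the same text. The overall match is identical, and only the list of captured groups differs.

```python
import re

text = "The price is $50 and $25 for additional items."

capturing = re.search(r"\$(\d+)", text)
non_capturing = re.search(r"\$(?:\d+)", text)

# The matched text is the same either way.
print(capturing.group())      # $50
print(non_capturing.group())  # $50

# Only the captured groups differ.
print(capturing.groups())      # ('50',)
print(non_capturing.groups())  # ()
```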

Q4. Describe a scenario in which using a nontagged group would have a significant impact on the
program's outcomes.


**Scenario: Extracting URLs from Text**:

```
Here are some links:
- http://www.example.com
- (https://www.example2.com)
- Visit our website (https://www.example3.com)
```

In this scenario, if you want to capture the URLs but exclude any surrounding parentheses, you can use non-capturing groups. Here's how it can be done:

```python
import re

text = """
Here are some links:
- http://www.example.com
- (https://www.example2.com)
- Visit our website (https://www.example3.com)
"""

# Regular expression with non-capturing group
pattern = r"(?:\()(https?://\S+)(?:\))"

matches = re.findall(pattern, text)
print(matches)
```

In this regular expression, the non-capturing groups `(?: )` wrap the literal opening and closing parentheses around each URL. The parentheses must be present for a match, but only the middle group `(https?://\S+)` is captured, so `re.findall` returns just the URL text. The first link, which has no surrounding parentheses, is not matched.

As a result, the program's outcome is a list of extracted URLs:

```
['https://www.example2.com', 'https://www.example3.com']
```

The choice of group type matters here because `re.findall` returns the contents of the capturing groups whenever a pattern has any. If the parentheses were wrapped in ordinary capturing groups, each result would be a tuple such as `('(', 'https://www.example2.com', ')')` instead of a plain string, and any code consuming the list would have to change. Keeping those groups non-capturing yields a clean list of URLs.
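
The effect of the group type on `re.findall` can be seen by running both variants side by side. This is a sketch on a shortened, illustrative input, and the variable names are not from the original program.

```python
import re

text = "See (https://www.example2.com) and (https://www.example3.com)"

# All three groups capture: findall returns one tuple per match.
with_capturing = re.findall(r"(\()(https?://\S+)(\))", text)
print(with_capturing)
# [('(', 'https://www.example2.com', ')'), ('(', 'https://www.example3.com', ')')]

# Only the URL group captures: findall returns plain strings.
with_non_capturing = re.findall(r"(?:\()(https?://\S+)(?:\))", text)
print(with_non_capturing)
# ['https://www.example2.com', 'https://www.example3.com']
```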

Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it
examines. Describe a situation in which this could make a difference in the results of your
programme.

Look-ahead conditions in regular expressions, specifically positive lookaheads (`(?= )`), indeed do not consume the characters they examine. This unique behavior can make a significant difference in the results of your program in scenarios where you need to match patterns that are followed by specific conditions without consuming the characters that follow. Here's a situation to illustrate the impact:

**Scenario: Validating Passwords**

Suppose you are developing a program that validates user passwords according to specific criteria, such as requiring at least one uppercase letter, one digit, and a minimum length. The required characters may appear anywhere in the password and in any order, so each criterion has to be checked against the whole string from the same starting position.

In this scenario, the regular expression `^(?=.*[A-Z])(?=.*\d).{8,}$` is used to validate passwords. Let's break down the components:

- `^`: Anchors the match at the start of the string.
- `(?=.*[A-Z])`: Positive lookahead that checks for at least one uppercase letter without consuming characters.
- `(?=.*\d)`: Positive lookahead that checks for at least one digit without consuming characters.
- `.{8,}`: Matches any characters (password) with a minimum length of 8 characters.
- `$`: Anchors the match at the end of the string.

The key here is the use of positive lookaheads `(?= )`. Because a lookahead does not consume characters, matching resumes at the start of the string after each check. The uppercase and digit requirements are therefore tested independently and in any order, and `.{8,}` can then match the entire password from the beginning.

If the checks consumed characters, as in `^.*[A-Z].*\d.{8,}$`, the pattern would demand an uppercase letter, then a digit, then eight more characters, in that order, and it would wrongly reject valid passwords such as `123Ab@rgrgs`.

```python
import re

def valid_password(password):
    pattern = r"^(?=.*[A-Z])(?=.*\d).{8,}$"

    if re.match(pattern, password):
        return "valid password"
    else:
        return "invalid password"

valid_password('123Ab@rgrgs')
```

```
'valid password'
```
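
To see where non-consumption changes the outcome, the sketch below compares the lookahead pattern with a hypothetical consuming variant, then shows a second effect: matches that share characters. The name `consuming_pattern` and the digit strings are illustrative assumptions.

```python
import re

password = "123Ab@rgrgs"

lookahead_pattern = r"^(?=.*[A-Z])(?=.*\d).{8,}$"
# Consuming variant: forces uppercase, then a digit, then 8+ characters, in that order.
consuming_pattern = r"^.*[A-Z].*\d.{8,}$"

print(bool(re.match(lookahead_pattern, password)))  # True
print(bool(re.match(consuming_pattern, password)))  # False

# Because the lookahead leaves the next digit unconsumed,
# that digit can start the following match.
with_lookahead = re.findall(r"\d(?=\d)", "1234")
without_lookahead = re.findall(r"\d\d", "1234")

print(with_lookahead)     # ['1', '2', '3']
print(without_lookahead)  # ['12', '34']
```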

Q6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead?

In regular expressions, both positive look-ahead and negative look-ahead are used to assert whether a specific condition is met at a particular position in the string without consuming characters. However, they serve opposite purposes:

1. **Positive Look-Ahead (`(?= )`)**:
   - Positive look-ahead asserts that a particular condition **must be true** at a specific position in the string.
   - It succeeds if the condition is met, and it does not consume characters.
   - Syntax: `(?=expression)`

   Example: `foo(?=bar)` matches "foo" only if it is followed by "bar" without including "bar" in the match.

2. **Negative Look-Ahead (`(?! )`)**:
   - Negative look-ahead asserts that a particular condition **must not be true** at a specific position in the string.
   - It succeeds if the condition is not met, and it does not consume characters.
   - Syntax: `(?!expression)`

   Example: `foo(?!bar)` matches "foo" only if it is not followed by "bar."
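
The two `foo` patterns above can be compared on one input. In this sketch the sample string is illustrative. Both patterns match only the three characters "foo", but at different positions.

```python
import re

text = "foobar foobaz"

# Positive look-ahead: "foo" must be followed by "bar".
positive = [m.span() for m in re.finditer(r"foo(?=bar)", text)]

# Negative look-ahead: "foo" must not be followed by "bar".
negative = [m.span() for m in re.finditer(r"foo(?!bar)", text)]

print(positive)  # [(0, 3)]
print(negative)  # [(7, 10)]
```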

**Positive Look-Ahead Example**:

Suppose you want to match the user name of every email address at a specific domain (e.g., "example.com") without including the domain in the match:

```python
import re

text = "Send an email to john@example.com and jane@example.net."
pattern = r"\b\w+(?=@example\.com\b)"

matches = re.findall(pattern, text)
print(matches)  # Output: ['john']
```

In this example, the positive look-ahead `(?=@example\.com\b)` requires that the name be followed by "@example.com". Because the look-ahead does not consume those characters, only "john" is returned. "jane" is skipped because it is followed by "@example.net."

**Negative Look-Ahead Example**:

Suppose you want to find instances of "apple" in a string but only if they are not followed by "pie." You want to match "apple" but exclude cases like "apple pie":

```python
import re

text = "I like apple, but not apple pie."
pattern = r"apple(?! pie)"

matches = re.findall(pattern, text)
print(matches)  # Output: ['apple']
```

In this example, negative look-ahead `(?! pie)` ensures that "apple" is matched only if it is not followed by " pie."

In summary, positive look-ahead and negative look-ahead are powerful tools in regular expressions to assert conditions at specific positions in the string, allowing you to control and fine-tune your pattern matching based on whether a condition is met or not met at those positions.

Q7. What is the benefit of referring to groups by name rather than by number in a regular
expression?

Referring to groups by name rather than by number in a regular expression provides several benefits, particularly in terms of code readability, maintainability, and self-documentation. Here are the key advantages of using named groups:

1. **Improved Readability**: Named groups make the regular expression pattern more human-readable. Instead of referring to groups by numeric indices, you can use descriptive names that convey the purpose of each group. This makes it easier for others (and your future self) to understand the intent of the pattern.

2. **Self-Documenting**: Named groups act as self-documentation for the regular expression. By using meaningful names for groups, you provide a clear explanation of what each group captures, which can serve as inline documentation for the pattern.

3. **Easier Maintenance**: When you revisit or update your regular expressions, using named groups reduces the risk of errors. You don't have to worry about renumbering groups if you add, remove, or reorder them in the pattern. Named groups are less prone to breaking when the pattern changes.

4. **Accessibility**: Named groups are accessible in a more structured and intuitive way within your code. You can access captured text using the names as keys, which is more intuitive than using numeric indices, especially when dealing with complex patterns.

5. **Avoid Confusion**: In patterns with multiple capturing groups, using numeric indices can lead to confusion, especially when groups are nested or when the pattern evolves over time. Named groups help prevent such confusion by providing clear labels.

6. **Reuse**: A named group can be referenced by name elsewhere: inside the same pattern with a backreference `(?P=name)`, and in `re.sub` replacement strings with `\g<name>`. Note that in Python each group name must be unique within a pattern; defining the same name twice raises an error.

**Example**:

Consider a scenario where you want to extract dates from a text in the format "MM/DD/YYYY." Using named groups, you can create a more readable and maintainable pattern:

```python
import re

text = "Event on 07/15/2023 and meeting on 12/31/2023"
pattern = r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})"

matches = re.finditer(pattern, text)
for match in matches:
    print(match.group("year"), match.group("month"), match.group("day"))
```

In this example, named groups like `(?P<month>...)`, `(?P<day>...)`, and `(?P<year>...)` make it clear what each part of the pattern captures. When accessing captured text, you use the group names as keys, enhancing code clarity and maintainability.

In summary, using named groups in regular expressions offers clarity, documentation, and ease of maintenance, making your patterns more accessible and less error-prone, especially in complex matching scenarios.
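
As a sketch of the maintenance point, the snippet below adds a new leading group to the date pattern. The `label` group and the variable names are hypothetical additions for illustration. Numeric indices shift, while name-based access keeps working.

```python
import re

text = "Event on 07/15/2023"

original = r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})"
# A new leading group shifts every numeric index by one.
extended = r"(?P<label>\w+) on (?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})"

m1 = re.search(original, text)
m2 = re.search(extended, text)

# Index 1 now means something different.
print(m1.group(1), m2.group(1))  # 07 Event

# The name still refers to the same thing.
print(m1.group("month"), m2.group("month"))  # 07 07

# groupdict() returns every named group as a dictionary.
print(m2.groupdict())
# {'label': 'Event', 'month': '07', 'day': '15', 'year': '2023'}
```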

Q8. Can you identify repeated items within a target string using named groups, as in "The cow
jumped over the moon"?

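Yes. A named group can be matched again later in the pattern with the named backreference `(?P=name)`. The example below uses a variant of the sentence with "moon" repeated:

```python
import re

text = "The cow jumped over the moon moon."

# Define a pattern with a named group 'word' and a named backreference '(?P=word)'
pattern = r"\b(?P<word>\w+)\b.*\b(?P=word)\b"

matches = re.finditer(pattern, text)
for match in matches:
    repeated_word = match.group("word")
    print(f"Repeated word: {repeated_word}")
```

```
Repeated word: moon
```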
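
For the sentence exactly as given in the question, the only repetition is "The" and "the", which differ in case. A possible sketch, assuming case-insensitive comparison is acceptable, uses `re.IGNORECASE` so that the backreference ignores case as well:

```python
import re

text = "The cow jumped over the moon"
pattern = r"\b(?P<word>\w+)\b.*\b(?P=word)\b"

# Case-sensitive: no word is repeated exactly, so there is no match.
case_sensitive = re.search(pattern, text)
print(case_sensitive)  # None

# Case-insensitive: "The" is matched again by "the".
case_insensitive = re.search(pattern, text, flags=re.IGNORECASE)
print(case_insensitive.group("word"))  # The
print(case_insensitive.group())        # The cow jumped over the
```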
