#Question 1

Explain the difference between greedy and non-greedy syntax with visual terms in as few words
as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy
one? What character or characters can you introduce or change?

..............

Answer 1 -

Greedy vs. Non-Greedy:

- `Greedy` : Matches the longest possible substring.

- `Non-Greedy`: Matches the shortest possible substring.

Transform Greedy to Non-Greedy:

Append a single `?` to the quantifier: `*` becomes `*?`, `+` becomes `+?`, `?` becomes `??`, and `{m,n}` becomes `{m,n}?`.
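
As a minimal sketch of that one-character change, the snippet below runs the same pattern with and without the trailing `?`. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text containing four tags.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first '>' that lets the pattern match.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```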

#Question 2

When exactly does greedy versus non-greedy make a difference? What if you're looking for a
non-greedy match but the only one available is greedy?

...............

Answer 2 -

Greedy versus non-greedy behavior makes a difference only when a quantified part of the pattern could stop at more than one place in the input text and still let the overall pattern match. It determines how much text the pattern consumes during the matching process.

**Greedy Behavior** :

- Greedy quantifiers (`*`, `+`, `?`, `{m,n}`) match as much text as possible while still allowing the overall pattern to match.

- They try to maximize the length of the matched substring.

**Non-Greedy (Lazy) Behavior** :

- Non-greedy quantifiers (`*?`, `+?`, `??`, `{m,n}?`) match as little text as possible while still allowing the overall pattern to match.

- They try to minimize the length of the matched substring.

**When It Matters** :

- `Greedy` : When you want to capture the longest possible substring that satisfies the pattern.

- `Non-Greedy` : When you want to capture the shortest possible substring that satisfies the pattern, often used in cases where the pattern is repeated.

**What If Non-Greedy Isn't Available?**

- If you're looking for a non-greedy match, but the only option is a greedy quantifier, you might need to adjust your pattern or use additional techniques to achieve the desired behavior.

For example, consider the following input text and pattern:

In [1]:
text = "apple banana orange apple"
pattern = r'apple.*orange'

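- Greedy match (`.*`): apple banana orange

- Non-greedy match (`.*?`): apple banana orange

Both results are identical because the text contains only one `orange`, so the shortest substring that satisfies the pattern is also the longest. A non-greedy quantifier still has to let the whole pattern match, so when the only available match is the "greedy" one, that is the match you get. The two would differ only if a second `orange` appeared later in the text.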

If a non-greedy option isn't available, you might need to modify your pattern or post-process the matched substring to achieve the desired result. In some cases, you might even consider using negative character sets or lookahead/lookbehind assertions to create a more specific pattern.
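
The negated character set idea can be sketched as follows. This is only an illustration under assumed input: the quoted sample text and the variable names are hypothetical, and the point is that `[^"]*` reaches the same result as `.*?` here without relying on a lazy quantifier.

```python
import re

# Hypothetical sample text with two quoted words.
text = 'say "one" then "two"'

# Greedy .* overshoots to the last quote in the string.
greedy = re.findall(r'"(.*)"', text)

# Lazy .*? stops at the first closing quote.
lazy = re.findall(r'"(.*?)"', text)

# A greedy quantifier on a negated set cannot run past a quote at all.
negated = re.findall(r'"([^"]*)"', text)

print(greedy)   # ['one" then "two']
print(lazy)     # ['one', 'two']
print(negated)  # ['one', 'two']
```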

#Question 3

In a simple match of a string, which looks only for one match and does not do any replacement, is
the use of a nontagged group likely to make any practical difference?

..............

Answer 3 -

In a simple match of a string where you are only looking for one match and not performing any replacement, the use of a non-capturing group (also known as a non-tagged group) is not likely to make a significant practical difference in most cases.

Non-capturing groups (`(?:...)`) are used when you want to group a part of the pattern for quantification or alternation purposes but don't need to capture the matched content for later use.

In a simple match where you're only interested in whether the pattern matches or not, and you're not capturing any groups for extraction or manipulation, the use of capturing or non-capturing groups is less critical. The regular expression engine's main focus is to determine if the pattern matches the input string, and the choice between capturing and non-capturing groups doesn't significantly impact this aspect.

The main benefits of using non-capturing groups come into play when you're dealing with more complex patterns, alternations, or repetition. They keep group numbering limited to the parts you actually need, may save the engine a small amount of bookkeeping, and make the intent of your pattern clearer to other developers who read your code.

In summary, for simple matching scenarios where you're not capturing groups for later use, the choice between using a non-capturing group or not is not likely to have a practical impact on the outcome of the match. However, using non-capturing groups can be a good practice for maintaining code consistency and clarity, especially as your regular expressions become more complex.
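
A small sketch of this point, using a made-up input string: both patterns find the same overall match, and the only visible difference is what `groups()` reports.

```python
import re

# Hypothetical input for illustration.
text = "order id: ab12ab12"

capturing = re.search(r"(ab\d\d)+", text)
non_capturing = re.search(r"(?:ab\d\d)+", text)

# The overall match is the same either way.
print(capturing.group(0))      # ab12ab12
print(non_capturing.group(0))  # ab12ab12

# Only the stored groups differ.
print(capturing.groups())      # ('ab12',)
print(non_capturing.groups())  # ()
```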

#Question 4

Describe a scenario in which using a nontagged category would have a significant impact on the
program's outcomes.

...............

Answer 4 -

Here's a scenario where using a non-capturing group (`(?:...)`) can have a significant impact on a program's outcomes:

**Scenario: Extracting URLs from Text**

Suppose you are building a program that needs to extract URLs from a large text document. URLs can have various formats, such as `"http://", "https://", "www."` , etc. You want to extract and process these URLs.

Using a non-capturing group can make a difference in the accuracy and efficiency of your extraction process:

In [2]:
import re

# Text containing URLs
text = "Visit my website at http://www.example.com and check out https://www.example.org"

# Without non-capturing group: the optional "www." becomes its own captured group
pattern_without_non_capturing = r'(http|https)://(www\.)?(\S+)'
matches_without_non_capturing = re.findall(pattern_without_non_capturing, text)

# With non-capturing group: "www." is matched but not captured
pattern_with_non_capturing = r'(http|https)://(?:www\.)?(\S+)'
matches_with_non_capturing = re.findall(pattern_with_non_capturing, text)

print("Without non-capturing group:", matches_without_non_capturing)
print("With non-capturing group:", matches_with_non_capturing)

Without non-capturing group: [('http', 'www.', 'example.com'), ('https', 'www.', 'example.org')]
With non-capturing group: [('http', 'example.com'), ('https', 'example.org')]


In this example, we are using regular expressions to extract URLs from the text. The pattern `(http|https)://(?:www\.)?(\S+)` captures the URL protocol (http or https) and the remainder of the URL after the optional `www.` prefix. Notice the use of the non-capturing group `(?:www\.)?`.

Impact of Using Non-Capturing Group:

- **Accuracy** : The non-capturing group matches the optional "`www.`" prefix without capturing it. With a capturing group `(www\.)?` instead, `re.findall` would return three-element tuples with "`www.`" (or an empty string) as an extra field, and any code that unpacks two values per match would fail or read the wrong field.

- **Efficiency**: The non-capturing group avoids storing the "`www.`" portion for every match. The saving is usually small, but fewer captured groups means less data to build and carry around when processing large inputs.

In this scenario, using a non-capturing group improves the accuracy of the extracted URLs and can have a noticeable impact on the program's outcomes, especially when processing a large amount of text with complex patterns.

#Question 5

Unlike a normal regex pattern, a look-ahead condition does not consume the characters it
examines. Describe a situation in which this could make a difference in the results of your
programme.

...............

Answer 5 -

Consider a scenario where you're working on a program that needs to validate passwords based on specific criteria. Let's say the criteria include the following:

a) The password must be at least 8 characters long.

b) The password must contain at least one uppercase letter.

c) The password must contain at least one lowercase letter.

d) The password must contain at least one digit.

Using look-ahead assertions in this scenario can make a significant difference in the results of your program:

In [4]:
import re

passwords = ["P@ssw0rd", "Secr3t", "12345678", "AbC!123"]

for password in passwords:
    # Using look-ahead assertions
    is_valid = re.match(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$", password)
    status = "Valid" if is_valid else "Invalid"

    print(f"Password: {password}, Status: {status}")

Password: P@ssw0rd, Status: Valid
Password: Secr3t, Status: Invalid
Password: 12345678, Status: Invalid
Password: AbC!123, Status: Invalid


In this example, the regular expression `^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$` uses three positive look-ahead assertions to check the password against the criteria without consuming any characters:

- `(?=.*[A-Z])` : Positive look-ahead for at least one uppercase letter.

- `(?=.*[a-z])` : Positive look-ahead for at least one lowercase letter.

- `(?=.*\d)` : Positive look-ahead for at least one digit.

The main advantage of using look-ahead assertions is that they allow you to check multiple conditions without advancing the cursor in the string. This is crucial when you want to enforce certain criteria without altering the string itself.

If you replaced the look-aheads with ordinary consuming sub-patterns, each one would advance the match position, so the conditions would have to appear in a fixed order (for example, an uppercase letter before a lowercase letter before a digit). Passwords that meet every criterion in a different order would then be rejected, and the final `.{8,}` would count only the characters left over rather than the whole password.
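
Another place where non-consumption changes the result is overlapping matches. The following is a small sketch with a made-up input string: the consuming pattern uses up the shared character, while the look-ahead version leaves it available for the next match.

```python
import re

# Hypothetical input in which "aba" occurs twice, sharing the middle "a".
text = "ababa"

# Consuming pattern: the first match uses up "aba", leaving only "ba".
consuming = re.findall(r"aba", text)

# Look-ahead with a capturing group: nothing is consumed, so the scan
# moves on one position at a time and sees the overlapping occurrence.
overlapping = re.findall(r"(?=(aba))", text)

print(consuming)    # ['aba']
print(overlapping)  # ['aba', 'aba']
```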

#Question 6

In regular expressions, what is the difference between positive look-ahead and negative look-ahead?

..............

Answer 6 -

Positive look-ahead and negative look-ahead are both types of look-ahead assertions in regular expressions, but they have different purposes and behaviors:

1) **Positive Look-Ahead (`(?=...)`)** :

- Positive look-ahead asserts that a specific pattern must be present ahead in the string, without actually consuming any characters.

- It is used to ensure that a particular condition is met without including the matched content in the final match.

- Example: `foo(?=bar)` matches "`foo`" only if it is `followed` by "`bar`".

2) **Negative Look-Ahead (`(?!...)`)** :

- Negative look-ahead asserts that a specific pattern must NOT be present ahead in the string, without consuming characters.

- It is used to ensure that a particular condition is not met without including the matched content in the final match.

- Example: `foo(?!bar)` matches "`foo`" only if it is `NOT followed` by "`bar`".

Here's a visual comparison of the two:

In [None]:
Input String: "foobar"

Positive Look-Ahead:
- Pattern: foo(?=bar)
- Matches: foo

Negative Look-Ahead:
- Pattern: foo(?!bar)
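- Matches: (no match)

In the positive look-ahead example, "foo" is matched only when followed by "bar", but "bar" is not included in the match.

In the negative look-ahead example, "foo" is matched only when NOT followed by "bar". Because "foo" is followed by "bar" in "foobar", the pattern finds no match at all; against an input such as "foobaz" it would match "foo".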

Positive and negative look-ahead assertions are powerful tools for specifying conditions that must or must not be met ahead in the string, without consuming characters. They are useful for complex pattern matching and can significantly enhance the capabilities of regular expressions.
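
The comparison can be checked with a short Python sketch. The second input, "foobaz", is an assumed extra example added to show a case where the negative look-ahead succeeds.

```python
import re

# Positive look-ahead: "foo" matches because "bar" follows it.
positive = re.search(r"foo(?=bar)", "foobar")

# Negative look-ahead: fails on "foobar", succeeds on "foobaz".
negative = re.search(r"foo(?!bar)", "foobar")
negative_ok = re.search(r"foo(?!bar)", "foobaz")

print(positive.group())     # foo
print(positive.span())      # (0, 3), "bar" is not part of the match
print(negative)             # None
print(negative_ok.group())  # foo
```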

#Question 7

What is the benefit of referring to groups by name rather than by number in a regular
expression?

...............

Answer 7 -

Referring to groups by name in a regular expression, as opposed to by number, offers several benefits that enhance code readability, maintainability, and flexibility:

1) **Improved Readability** : Using named groups makes your regular expressions more self-explanatory and easier to understand. The names you assign to groups can convey the purpose or meaning of each captured portion, making the pattern's intent clearer.

2) **Self-Documenting Patterns** : Named groups act as documentation within the pattern itself. Someone reading the code can understand the significance of each group without having to refer to external documentation.

3) **Enhanced Maintenance** : Named groups make your code more resilient to changes. If you modify the regular expression or reorder capturing groups, the references to named groups remain accurate, whereas numbered references could become incorrect.

4) **Flexibility in Group Order** : Named groups allow you to reorder, add, or remove groups without affecting code that references them by name. This flexibility simplifies pattern adjustments and evolution.

5) **Code Clarity** : Named groups reduce the need to remember group indices and make your code more self-explanatory. This is especially helpful when working with complex patterns.

6) **Avoiding Magic Numbers** : Using named groups eliminates the need to remember and manage the numeric indices of groups. This avoids "magic numbers" and reduces the potential for errors.

Here's a comparison of referencing groups by name and by number:

In [7]:
import re

# Using Named Groups
pattern_named = r'(?P<month>\d{2})-(?P<day>\d{2})-(?P<year>\d{4})'
match_named = re.match(pattern_named, '08-14-2023')
print(match_named.group('year'))

# Using Numbered Groups
pattern_numbered = r'(\d{2})-(\d{2})-(\d{4})'
match_numbered = re.match(pattern_numbered, '08-14-2023')

print(match_numbered.group(3))

2023
2023
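
To illustrate the maintenance benefit, here is a sketch in which the same helper works with two differently ordered date patterns. The names `us_style`, `iso_style`, and `extract_year` are hypothetical and exist only for this example.

```python
import re

# Two hypothetical date layouts with the groups in a different order.
us_style = re.compile(r"(?P<month>\d{2})-(?P<day>\d{2})-(?P<year>\d{4})")
iso_style = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

def extract_year(pattern, text):
    """Return the 'year' group, or None if the text does not match."""
    match = pattern.match(text)
    return match.group("year") if match else None

# The name-based lookup is unaffected by the group order.
print(extract_year(us_style, "08-14-2023"))   # 2023
print(extract_year(iso_style, "2023-08-14"))  # 2023

# A number-based lookup silently returns a different field.
print(us_style.match("08-14-2023").group(3))   # 2023
print(iso_style.match("2023-08-14").group(3))  # 14
```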


#Question 8

Can you identify repeated items within a target string using named groups, as in "The cow
jumped over the moon"?

...............

Answer 8 -

Yes. The named group `w1` below matches the word "the" case-insensitively, so `findall` reports both occurrences:

In [20]:
import re
text = "The cow jumped over the moon"
regobj=re.compile(r'(?P<w1>The)',re.I)
regobj.findall(text)

['The', 'the']
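
A more general variant, sketched here as one possible approach, uses a named backreference `(?P=name)` so the repeated word does not have to be spelled out in the pattern. The group name `word` and the variable names are chosen for illustration only.

```python
import re

text = "The cow jumped over the moon"

# (?P<word>...) captures a word; (?P=word) requires the same text again later.
# re.IGNORECASE lets "the" count as a repeat of "The".
repeat = re.compile(r"\b(?P<word>\w+)\b.*?\b(?P=word)\b", re.IGNORECASE)

match = repeat.search(text)
print(match.group("word"))  # The
print(match.group(0))       # The cow jumped over the
```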

#Question 9

When parsing a string, what is at least one thing that the Scanner interface does for you that the
re.findall feature does not?

...............

Answer 9 -

The `Scanner` interface and the `re.findall()` function both serve the purpose of parsing strings to find specific patterns, but there is a key difference in how they operate and the functionality they provide.

One thing that the Scanner interface in Java does for you that the re.findall() feature in Python does not is that the Scanner allows you to iterate through tokens (segments of the input) and apply different processing to each token based on its type. This is especially useful when dealing with more complex parsing scenarios.

For example, with the Scanner interface, you can:

1) **Tokenize** : The Scanner can break down an input string into tokens based on delimiters or regular expressions. Each token can represent a specific part of the input data.

2) **Differentiate Types** : You can use the `hasNextXXX()` and `nextXXX()` methods of the Scanner to determine if the next token matches a specific data type (like integers, floats, or words) and then retrieve that token as the corresponding type.

3) **Conditional Parsing** : You can conditionally process tokens differently based on their types, allowing you to apply custom logic for different types of tokens.

4) **Custom Delimiters** : The Scanner allows you to set custom delimiters to determine how tokens are separated, which can be useful in scenarios where tokens are separated by specific patterns.

5) **Reading from Various Sources** : The Scanner can read from various sources like strings, files, and input streams.

The `re.findall()` function in Python is primarily focused on pattern matching and returns a list of matched substrings. While it's powerful for finding specific patterns in text, it doesn't provide the same level of control and differentiation that the `Scanner` interface does for token-based parsing.
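
The question may equally refer to Python's own `re.Scanner` class, which exists in the `re` module but is not described in the official documentation, so its behavior should be treated as subject to change. As a hedged sketch: it takes a list of `(pattern, action)` pairs and calls the action for each token it recognizes, which lets each token type be processed differently, something `re.findall` does not do. The token labels and the input string below are made up for illustration.

```python
import re

# Each entry pairs a pattern with a callback; a None action skips the token.
token_scanner = re.Scanner([
    (r"\d+", lambda scanner, tok: ("INT", int(tok))),
    (r"[A-Za-z_]\w*", lambda scanner, tok: ("NAME", tok)),
    (r"\s+", None),
])

# scan() returns the processed tokens and any unmatched remainder.
tokens, remainder = token_scanner.scan("x1 42 foo")

print(tokens)     # [('NAME', 'x1'), ('INT', 42), ('NAME', 'foo')]
print(remainder)  # (empty string)
```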

#Question 10

Does a scanner object have to be named scanner?

..............

Answer 10 -

No, a Scanner object does not have to be named "`scanner`". The name you choose for the Scanner object is entirely up to you. When you create an instance of the Scanner class, you can assign it any valid identifier name that adheres to Java's naming rules.

Here's an example of creating a Scanner object with a different name:

In [None]:
import java.util.Scanner;

public class Example {
    public static void main(String[] args) {
        // Create a Scanner object with a custom name
        Scanner inputScanner = new Scanner(System.in);

        System.out.print("Enter a number: ");
        int number = inputScanner.nextInt();
        System.out.println("You entered: " + number);

        // Remember to close the Scanner when done
        inputScanner.close();
    }
}

In this example, the Scanner object is named `inputScanner` . You can use any meaningful and appropriate name that makes sense in the context of your code. The key is to choose a name that helps improve the readability and understanding of your code.