<h1> <u> <font color= green > Advanced_Assignment_17 </font> </u> </h1>

## Q1. Explain the difference between greedy and non-greedy syntax with visual terms in as few words as possible. What is the bare minimum effort required to transform a greedy pattern into a non-greedy one? What character or characters can you introduce or change?

> * **Greedy**: A greedy quantifier matches as much text as possible while still letting the overall pattern succeed. For example, `.*` matches any character zero or more times and extends as far as it can.
> * **Non-greedy**: A non-greedy (lazy) quantifier matches as little text as possible. For example, `.*?` also matches any character zero or more times, but it stops at the first point where the rest of the pattern can match.

> The bare minimum effort required to transform a greedy pattern into a non-greedy one is to ***add a question mark (?)*** immediately after the quantifier. For example, `.*` becomes `.*?`, `.+` becomes `.+?`, and `{2,5}` becomes `{2,5}?`. <br>
>> The added question mark tells the regular expression engine to match the least amount of text possible. This is useful when we want to match a pattern without matching too much text.
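The visual difference can be sketched on a made-up HTML snippet (the sample string is an assumption chosen only for illustration): the greedy pattern stretches to the last `>`, while the non-greedy one stops at the first.

```python
import re

# Hypothetical sample text, chosen only to illustrate the difference.
html = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r"<.*>", html).group()   # stretches to the last '>'
lazy = re.search(r"<.*?>", html).group()    # stops at the first '>'

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```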

## Q2. When exactly does greedy versus non-greedy make a difference? What if you're looking for a non-greedy match but the only one available is greedy?

> **Greedy matching** tries to match the longest possible substring that satisfies the pattern, while<br>
> **non-greedy** (also known as lazy) matching tries to match the shortest possible substring that satisfies the pattern.

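>The two only differ when a quantifier such as *, +, ?, or {} could stop at more than one position and still let the rest of the pattern match. If only one match is possible, the greedy and non-greedy forms return the same text: a non-greedy quantifier still expands as far as needed for the overall pattern to succeed.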

In [None]:
import re

text = "Hello World"

greedy_match = re.search(r'H.*o', text)    # Greedy matching
print(greedy_match.group())  # Output: Hello Wo


non_greedy_match = re.search(r'H.*?o', text)    # Non-greedy matching
print(non_greedy_match.group())  # Output: Hello
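For the second half of the question, here is a minimal sketch of what happens when only one match is available. The sample strings are assumptions for illustration. With a single candidate both forms agree, and a non-greedy quantifier still grows when the rest of the pattern requires it.

```python
import re

# Only one 'o' is available, so both forms must stop at the same place.
single = "Hello"
same_greedy = re.search(r"H.*o", single).group()
same_lazy = re.search(r"H.*?o", single).group()
print(same_greedy, same_lazy)  # Hello Hello

# The rest of the pattern ('d' followed by end of string) forces the lazy
# quantifier to keep expanding until the overall match can succeed.
forced = re.search(r"H.*?d$", "Hello World").group()
print(forced)  # Hello World
```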


## Q3. In a simple match of a string, which looks only for one match and does not do any replacement, is the use of a non-tagged group likely to make any practical difference?

> The use of a non-tagged (non-capturing) group, written (?:...), is not likely to make any practical difference in a simple match that looks only for one match and does not do any replacement because:
> * **Non-tagged groups are not captured**: The text matched by a non-tagged group is not saved as a numbered group, so it cannot be retrieved with group(1) or referenced in a replacement string.
> * **The overall match is unchanged**: Tagging affects only which substrings are saved, not what the pattern matches. The full match returned by group(0) is the same whether the group is tagged or not.

> In a simple match, only the overall match (or whether a match exists) is used, and the saved groups are never inspected. Therefore, the use of a non-tagged group is not likely to make any practical difference.
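As a small sketch of this point, assuming a made-up date string as input, the same pattern written with tagged and non-tagged groups produces the same overall match. Only the saved groups differ.

```python
import re

text = "2024-05-17"  # hypothetical sample input

tagged = re.search(r"(\d{4})-(\d{2})", text)
untagged = re.search(r"(?:\d{4})-(?:\d{2})", text)

# The overall match is identical either way.
print(tagged.group(0), untagged.group(0))  # 2024-05 2024-05

# Only the saved groups differ, and a simple match never inspects them.
print(tagged.groups())    # ('2024', '05')
print(untagged.groups())  # ()
```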

## Q4. Describe a scenario in which using a non-tagged group would have a significant impact on the program's outcomes.

> A scenario in which using a non-tagged group would have a significant impact on the program's outcomes is:
> * **Scenario**: We are writing a program that parses a text file and extracts the email addresses from the file. We want to store the email addresses in a list.
> * **Solution**: We could use re.findall() with a regular expression that matches email addresses. If part of the pattern needs grouping, for example to repeat the dot-separated parts of the domain, we write it as a non-tagged group (?:...).
> * **Impact**: When a pattern contains tagged groups, re.findall() returns the text of those groups instead of the whole match. With a tagged group the list would hold only domain fragments; with a non-tagged group it holds the complete email addresses. re.split() is affected similarly, because it inserts the text of tagged groups into its result.


In [None]:
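# Here is an example of a regular expression that could be used to match email addresses in a text file:
import re

# The domain labels are repeated with a non-tagged group (?:...), so
# re.findall() returns whole matches rather than the group's text.
pattern = r'[a-zA-Z0-9._%+-]+@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]+'

with open('text.txt', 'r') as f:
    text = f.read()

# re.findall() already returns a list of strings, not match objects.
email_addresses = re.findall(pattern, text)
print(email_addresses)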
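To see how much the choice of group type can change the result in the email scenario, here is a minimal sketch. The inline sample text stands in for the file contents, and both patterns are simplified assumptions rather than full email validators.

```python
import re

# Hypothetical sample text standing in for the file contents.
text = "Contact alice@example.com or bob@mail.example.org today."

capturing = r"[\w.%+-]+@(\w+\.)+[a-zA-Z]+"
non_capturing = r"[\w.%+-]+@(?:\w+\.)+[a-zA-Z]+"

# With a tagged group, findall returns only the group's last repetition.
fragments = re.findall(capturing, text)
print(fragments)  # ['example.', 'example.']

# With a non-tagged group, findall returns the whole match.
emails = re.findall(non_capturing, text)
print(emails)  # ['alice@example.com', 'bob@mail.example.org']
```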

## Q5. Unlike a normal regex pattern, a look-ahead condition does not consume the characters it examines. Describe a situation in which this could make a difference in the results of your programme.

> A situation in which a look-ahead condition could make a difference in the results of a program:
> * **Scenario**: You are writing a program that parses a text file and extracts the phone numbers from the file. You want to store the phone numbers in a list. The numbers are separated by single spaces.
> * **Solution**: You could use a regular expression to match the phone numbers, with a look-ahead condition such as (?=\s|$) to check that each number is followed by whitespace or the end of the text.
> * **Impact**: Because the look-ahead does not consume the separator, that character is not part of the match and is still available when the search for the next number starts. If the pattern consumed the separator instead, it would end up in the result, and a following match that needs the same separator could be skipped.


In [None]:
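# Here is an example of a regular expression that could be used to match phone numbers in a text file:
import re

# The look-ahead (?=\s|$) checks that the number is followed by whitespace
# or the end of the text, without making that character part of the match.
pattern = r'\d{3}-\d{3}-\d{4}(?=\s|$)'

with open('text.txt', 'r') as f:
    text = f.read()

# re.findall() already returns a list of strings, not match objects.
phone_numbers = re.findall(pattern, text)
print(phone_numbers)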
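To make the "does not consume" point concrete, here is a minimal sketch assuming a hypothetical line of space-separated phone numbers. Both patterns require a separator on each side of a number. The first consumes the trailing separator, the second only looks ahead at it.

```python
import re

# Hypothetical sample: three numbers separated by single spaces.
text = "555-123-4567 555-987-6543 555-000-1111"

# The trailing separator is consumed, so it is no longer available as the
# leading separator of the next number, and the middle number is skipped.
consuming = re.findall(r"(?:^|\s)(\d{3}-\d{3}-\d{4})(?:\s|$)", text)
print(consuming)  # ['555-123-4567', '555-000-1111']

# The look-ahead only checks the separator, leaving it for the next match.
with_lookahead = re.findall(r"(?:^|\s)(\d{3}-\d{3}-\d{4})(?=\s|$)", text)
print(with_lookahead)  # ['555-123-4567', '555-987-6543', '555-000-1111']
```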

## Q6. In regular expressions, what is the difference between positive look-ahead and negative look-ahead?

> * **Positive look-ahead**: A positive look-ahead, written (?=...), succeeds only if the specified pattern matches immediately after the current position. The characters it examines are not consumed.
> * **Negative look-ahead**: A negative look-ahead, written (?!...), succeeds only if the specified pattern does not match immediately after the current position. It also consumes nothing.

> *Here is an example of a positive look-ahead:*
```pattern = r'\d{3}-\d{3}-\d{4}(?=\d)' ```
> This pattern matches a phone-number-shaped string only when it is immediately followed by a digit. The (?=\d) part is a positive look-ahead that checks for the digit without including it in the match.

> *Here is an example of a negative look-ahead:*
```pattern = r'\d{3}-\d{3}-\d{4}(?!\d)' ```
> This pattern matches a phone number that is not followed by another digit. The (?!\d) part is a negative look-ahead that rejects candidates that run on into a longer string of digits.
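The contrast between the two look-aheads can be sketched on one input. The sample text is an assumption, and the patterns are simplified versions without the ^ and $ anchors so that they can be searched for inside a longer string.

```python
import re

# Hypothetical sample: the second number has one digit too many.
text = "555-123-4567 and 555-987-65432"

# Positive look-ahead: keep only candidates followed by a digit.
followed_by_digit = re.findall(r"\d{3}-\d{3}-\d{4}(?=\d)", text)
print(followed_by_digit)  # ['555-987-6543']

# Negative look-ahead: keep only candidates NOT followed by a digit.
not_followed_by_digit = re.findall(r"\d{3}-\d{3}-\d{4}(?!\d)", text)
print(not_followed_by_digit)  # ['555-123-4567']
```

In both cases the examined digit is not part of the returned match.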

## Q7. What is the benefit of referring to groups by name rather than by number in a regular expression?

> Benefits of referring to groups by name rather than by number in a regular expression:
> * **Readability**: Regular expressions with named groups are easier to read and understand, especially for complex patterns: match.group('year') says more than match.group(1).
> * **Flexibility**: A named group can be referenced later in the same pattern with (?P=name) and in a replacement string with \g<name>, which keeps the code easier to maintain.
> * **Extensibility**: Patterns with named groups can grow without breaking existing code. Names stay valid when groups are added or reordered, whereas group numbers shift. match.groupdict() also returns all named groups as a dictionary.
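A minimal sketch of these benefits, assuming a hypothetical date format: the same group can be read by name or by position, but only the name survives a later reordering of the pattern.

```python
import re

# Hypothetical date format, used only for illustration.
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = pattern.search("Released on 2024-05-17.")

print(match.group("year"))  # 2024
print(match.group(1))       # 2024 (same group, addressed by position)
print(match.groupdict())    # {'year': '2024', 'month': '05', 'day': '17'}

# Named groups can also be used in a replacement string via \g<name>.
european = pattern.sub(r"\g<day>.\g<month>.\g<year>", "2024-05-17")
print(european)  # 17.05.2024
```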


## Q8. Can you identify repeated items within a target string using named groups, as in "The cow jumped over the moon"?

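> We can identify repeated items within a target string using named groups: capture a word in a named group and refer back to it with (?P=name). In "The cow jumped over the moon", the repeated item is "the" (ignoring case).

In [None]:
import re

# Capture a word, then use a look-ahead with the named backreference
# (?P=word) to require that the same word appears again later.
pattern = r'\b(?P<word>\w+)\b(?=.*\b(?P=word)\b)'
text = 'The cow jumped over the moon'

match = re.search(pattern, text, re.IGNORECASE)

if match:
    word = match.group('word')
    print(f'The word "{word}" is repeated in the text.')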


## Q9. When parsing a string, what is at least one thing that the Scanner interface does for you that the re.findall feature does not?

> Here is one thing that the Scanner interface does for you when parsing a string that the re.findall feature does not:
> * **The Scanner interface attaches an action to each pattern**: re.findall() returns a flat list of matched strings and silently skips any text that does not match. re.Scanner (an undocumented class in Python's re module) takes a list of (pattern, action) pairs, calls the action for whichever pattern matched at the current position, and so tells you which kind of token was found. It also returns the unmatched remainder of the string, so you can detect where scanning stopped.


In [None]:
import re

text = "Numbers: 123 456 789"
# re.findall() returns plain strings, so the conversion to int is done by hand.
numbers = [int(num) for num in re.findall(r'\d+', text)]

for number in numbers:
    print("Parsed integer:", number)
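For comparison, here is a sketch of the same kind of input handled with re.Scanner. The class is undocumented, so treat this as an illustration of its behaviour in CPython rather than a guaranteed API. The token labels and the trailing "???" in the sample string are assumptions added to show the remainder.

```python
import re

# Each entry pairs a pattern with an action; the action receives the
# scanner object and the matched text and returns the token to emit.
scanner = re.Scanner([
    (r"\d+", lambda s, tok: ("INT", int(tok))),
    (r"[A-Za-z]+", lambda s, tok: ("WORD", tok)),
    (r"[\s:]+", None),  # None means: match this text and discard it
])

tokens, remainder = scanner.scan("Numbers: 123 456 789 ???")
print(tokens)     # [('WORD', 'Numbers'), ('INT', 123), ('INT', 456), ('INT', 789)]
print(remainder)  # ???
```

Scanning stops at the first position where no pattern matches, and the rest of the string comes back as the remainder. re.findall would simply skip over such text.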


## Q10. Does a scanner object have to be named scanner?

> No, a scanner object does not have to be named scanner. The name scanner is only a common convention. <br>
> When using the re.Scanner class in Python's ***re module***, we can assign the scanner object to any valid variable name.

In [None]:
import re

text = "Hello, World!"

# Creating a scanner object with a different name
my_scanner = re.Scanner([
    (r'\w+', lambda scanner, token: ('WORD', token)),
    (r'\W+', None),   # skip punctuation and spaces so scanning continues past "Hello"
])

tokens, remainder = my_scanner.scan(text)   # Using the scanner object
for token in tokens:
    print(token)
