# 1.

In [1]:
# Regular expressions (regex) are a powerful tool for pattern matching and text manipulation in programming languages like Python.

# Here are some key benefits of using regular expressions:

# a) Pattern Matching: Regular expressions allow you to define complex patterns to search for specific substrings or patterns 
#     within text data. This includes simple tasks like finding all occurrences of a word, to more complex tasks like extracting 
#     email addresses or phone numbers from a block of text.

# b) Flexible and Expressive: Regular expressions provide a concise and expressive way to specify patterns. They allow you to
#     define rules for matching strings based on characters, character classes, repetitions, alternations, anchors, and more.
#     This flexibility makes it easier to handle various text processing tasks efficiently.

# c) Text Validation: Regular expressions are commonly used for validating user input. For example, you can use regex to validate
#     email addresses, URLs, phone numbers, dates, and other formats, ensuring that the input data meets specific criteria before
#     processing.

# d) Efficiency: A single regular expression can often replace several hand-written string operations and scan the text
#     once instead of looping over it repeatedly. Badly written patterns (for example, nested quantifiers) can backtrack
#     heavily, so this benefit depends on how the pattern is written.

# e) Largely Shared Syntax: The core regex syntax is similar across programming languages and tools, although dialects
#     (such as POSIX, PCRE, and Python's re module) differ in details. Once you understand the basics, most of that
#     knowledge carries over to other environments.

# f) Versatility: Regular expressions are widely supported in programming languages, text editors, databases, command-line tools,
#     and other software. This versatility allows you to use regex in different contexts and integrate text processing 
#     functionalities seamlessly into your applications.
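
# A minimal sketch of benefits a) and c) above: extracting email-like tokens and checking a date format.
# The sample text, the simplified email pattern, and the helper name is_iso_date are illustrative assumptions;
# real email and date validation needs stricter rules than this.

```python
import re

text = "Contact alice@example.com or bob@example.org before 2024-05-17."

# Simplified email-like pattern (assumption: real address syntax is broader)
email_pattern = r"[\w.+-]+@[\w-]+\.[\w.-]+"
emails = re.findall(email_pattern, text)
print(emails)  # ['alice@example.com', 'bob@example.org']


def is_iso_date(value):
    """Return True if value has the shape YYYY-MM-DD (format only, no calendar check)."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


print(is_iso_date("2024-05-17"))  # True
print(is_iso_date("17/05/2024"))  # False
```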

# 2.

In [2]:
# Differences between the two regular expressions:

# 1. (ab)c+:

# a) This regular expression matches the pattern where "ab" is followed by one or more occurrences of the character "c".
# b) For example, it matches strings like "abc", "abcc", "abccc", and so on.
# c) The parentheses (ab) create a capturing group, which captures the sequence "ab" as a unit.
# d) The c+ part specifies that the character "c" must appear one or more times consecutively after the captured group "(ab)".

# 2. a(bc)+:

# a) This regular expression matches the pattern where the character "a" is followed by one or more occurrences of the sequence "bc".
# b) For example, it matches strings like "abc", "abcbc", "abcbcbc", and so on.
# c) The a part matches the literal character "a". It does not by itself anchor the match to the start of the string.
# d) The (bc)+ part creates a capturing group for the sequence "bc" and requires this group to appear one or more times 
# consecutively after the "a". When the group repeats, it captures only the last repetition.
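
# The difference can be checked with a short sketch. re.fullmatch is used here (an illustrative choice) so that
# the whole string must fit the pattern; the three test strings are assumptions picked to separate the two cases.

```python
import re

results = {}
for s in ["abc", "abccc", "abcbc"]:
    m1 = re.fullmatch(r"(ab)c+", s)   # "ab" once, then "c" repeated
    m2 = re.fullmatch(r"a(bc)+", s)   # "a" once, then "bc" repeated
    results[s] = (m1 is not None, m2 is not None)
    print(s, results[s])

# abc   (True, True)
# abccc (True, False)
# abcbc (False, True)

# A repeated group keeps only its last repetition
print(re.fullmatch(r"a(bc)+", "abcbc").group(1))  # bc
```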

# 3.

In [3]:
# 1. You need to use the import re statement at the beginning of your Python code whenever you want to work with regular expressions.
# 2. This statement imports the re module, which provides support for working with regular expressions in Python. 
# 3. Without this import statement, you won't have access to the regular expression functions and methods provided by the re module.

# 4.

In [4]:
# In a regular expression pattern, square brackets ([]) are used to define a character class, which matches any single 
#  character within the brackets. Inside square brackets, certain characters have special significance when expressing a range:

# a) Hyphen (-): When a hyphen - appears between two characters inside square brackets, it represents a range of characters. 

# example:
# [a-z]: Matches any lowercase letter from 'a' to 'z'.
# [0-9]: Matches any digit from '0' to '9'.

# b) Caret (^): When the caret ^ appears as the first character inside square brackets, it negates the character class, 
#     meaning it matches any character that is not in the specified range. 
    
# example:
# [^0-9]: Matches any character that is not a digit.

# c) Backslash (\): In some cases, you may need to escape certain characters with a backslash \ inside square brackets to match
#     them literally. 

# example:
# [\[\]]: Matches either '[' or ']'.
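
# A small sketch of the three cases above. The sample string is an illustrative assumption.

```python
import re

sample = "Room [B12]"

lowercase = re.findall(r"[a-z]", sample)     # range: any lowercase letter
digits = re.findall(r"[0-9]", sample)        # range: any digit
digits_only = re.sub(r"[^0-9]", "", sample)  # negated class: drop every non-digit
brackets = re.findall(r"[\[\]]", sample)     # escaped brackets matched literally

print(lowercase)    # ['o', 'o', 'm']
print(digits)       # ['1', '2']
print(digits_only)  # 12
print(brackets)     # ['[', ']']
```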

# 5.

In [5]:
# Compiling a regular expression object in Python using the re.compile() function provides several benefits:

# a) Performance: re.compile() parses the pattern once and returns a pattern object, so reusing that object skips the
#     per-call pattern lookup. The re module also caches recently used patterns internally, so the gain is usually
#     modest and matters most in tight loops.

# b) Code Readability and Reusability: By compiling a regular expression pattern into a regex object, you can assign a meaningful
#     name to the object, making your code more readable and self-explanatory. It also allows you to reuse the compiled regex
#     object multiple times without recompiling the pattern each time.

# c) Precompilation Checks: When you compile a regex pattern using re.compile(), Python checks the syntax of the pattern and 
#     raises any syntax errors immediately. This helps catch regex pattern errors early in the development process.

# d) Reduced Redundancy: If you have multiple places in your code where you need to use the same regex pattern, compiling it once
#     into a regex object eliminates the redundancy of writing the same pattern multiple times, leading to cleaner and more
#     maintainable code.

# e) Flags and Options: The re.compile() function allows you to specify flags and options such as case-insensitivity (re.IGNORECASE),
#     multiline mode (re.MULTILINE), and others, which can be applied to the compiled regex object for consistent behavior 
#     across multiple regex matches.
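
# A minimal sketch of points b), c) and e): a named, reusable pattern object with a flag, and an invalid
# pattern failing at compile time. The variable names and sample lines are illustrative assumptions.

```python
import re

# Compile once with a flag, then reuse the named pattern object
hello_pattern = re.compile(r"hello", re.IGNORECASE)

lines = ["Hello there", "say HELLO", "goodbye"]
hits = [line for line in lines if hello_pattern.search(line)]
print(hits)  # ['Hello there', 'say HELLO']

# A malformed pattern raises re.error as soon as it is compiled
try:
    re.compile(r"(unclosed")
    compile_failed = False
except re.error as exc:
    compile_failed = True
    print("Invalid pattern:", exc)
```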

# 6.

In [6]:
# Here are some examples of how to use the match object returned by re.match and re.search in Python:

# a) Using Match Object Attributes:
import re

text = "Hello, World!"
pattern = r"Hello"

# Using re.match to find pattern at the beginning of the string
match_obj = re.match(pattern, text)
if match_obj:
    print("Match found at the beginning:", match_obj.group())
else:
    print("No match at the beginning")

# Using re.search to find pattern anywhere in the string
search_obj = re.search(pattern, text)
if search_obj:
    print("Match found anywhere:", search_obj.group())
else:
    print("No match anywhere")

Match found at the beginning: Hello
Match found anywhere: Hello


In [7]:
# b) Accessing Matched Groups:
import re

text = "The price is $10.50"
pattern = r"\$\d+\.\d+"

# Using re.search to find the pattern; group() with no arguments returns the entire match
search_obj = re.search(pattern, text)
if search_obj:
    print("Matched price:", search_obj.group())
else:
    print("Price not found")

Matched price: $10.50
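
# The pattern above has no parentheses, so group() can only return the whole match. A sketch with capturing
# groups follows; the group names dollars and cents are illustrative assumptions.

```python
import re

text = "The price is $10.50"
m = re.search(r"\$(?P<dollars>\d+)\.(?P<cents>\d+)", text)

if m:
    print(m.group(0))        # $10.50  (entire match)
    print(m.group(1))        # 10      (first group, by number)
    print(m.group("cents"))  # 50      (second group, by name)
    print(m.groups())        # ('10', '50')
    print(m.span())          # (13, 19)
```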


In [9]:
# c) Iterating Over Matches:
import re

text = "apple banana cherry"
pattern = r"\w+"

# Using re.finditer to find all matches and iterate over them
for match_obj in re.finditer(pattern, text):
    print("Match:", match_obj.group())


Match: apple
Match: banana
Match: cherry


# 7.

In [10]:
# The vertical bar | and square brackets [] serve different purposes in regular expressions.

# a) Vertical Bar (|) as an Alternation:

# 1. The vertical bar | is used to specify alternatives (alternation) in a regular expression.
# 2. It allows you to match either one pattern or another.
# 3. For example, the pattern a|b will match either the character 'a' or the character 'b'.
# 4. This is useful when you want to match different patterns in a single regular expression.

# example:
import re

text = "apple banana cherry"
pattern = r"apple|banana"

matches = re.findall(pattern, text)
print(matches)  # Output: ['apple', 'banana']

['apple', 'banana']


In [12]:
# b) Square Brackets ([]) as a Character Set:

# 1. Square brackets [] are used to create a character set, which matches any single character inside the brackets.
# 2. It allows you to specify a range of characters or specific characters that you want to match.
# 3. For example, the pattern [aeiou] will match any vowel ('a', 'e', 'i', 'o', 'u').
# 4. This is useful when you want to match a specific set of characters in a string.

# example:
import re

text = "apple banana cherry"
pattern = r"[aeiou]"

matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'e', 'a', 'a', 'a', 'e']

['a', 'e', 'a', 'a', 'a', 'e']
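
# A side-by-side sketch of the two constructs. The sample strings are illustrative assumptions. Note that
# inside square brackets the | character is an ordinary literal, not an alternation operator.

```python
import re

# Single-character choice: a character set and an alternation give the same result
set_matches = re.findall(r"gr[ae]y", "gray grey groy")
alt_matches = re.findall(r"gr(?:a|e)y", "gray grey groy")
print(set_matches)  # ['gray', 'grey']
print(alt_matches)  # ['gray', 'grey']

# Multi-character choice needs alternation
words = re.findall(r"cat|dog", "cat dog cow")
print(words)  # ['cat', 'dog']

# Inside brackets, | is just another character in the set
literal_bar = re.findall(r"[cat|dog]", "c|w")
print(literal_bar)  # ['c', '|']
```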


# 8.

In [13]:
# In regular expressions, using the raw-string indicator (r) is not strictly necessary, but it's a good practice to do so, 
# especially when dealing with backslashes (\) in search patterns or replacement strings. 

# Here's why:

# a) Search Patterns:

# 1. When defining a regular expression pattern, you may need to use backslashes to escape certain characters like \d for digits,
#     \s for whitespace, or \. to match a literal dot.
# 2. If you use a raw string (prefixed with r) to define your regular expression pattern, Python treats backslashes as literal 
#     characters rather than escape characters.
# 3. This prevents Python from interpreting backslashes in unexpected ways, ensuring that your regular expression pattern matches
#     exactly what you intend.
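# 4. Example without raw string: pattern = "\bcat" (Python turns \b into a backspace character before the regex engine
#     sees it). Unrecognized escapes such as "\d+" currently reach the engine unchanged, but recent Python versions
#     emit a warning for them.
# 5. Example with raw string: pattern = r"\bcat" (the regex engine receives \b and treats it as a word boundary)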

# b) Replacement Strings:

# 1. In regular expressions, replacement strings are often used with functions like re.sub() to replace matched patterns with 
#     specific content.
# 2. Backreferences to captured groups in the search pattern are commonly used in replacement strings (e.g., \1, \2, etc.).
# 3. Using a raw string for replacement strings ensures that backslashes reach re.sub() unchanged instead of being
#     processed by Python first.
# 4. Example without raw string: replacement = "\1" is the control character \x01, so you would have to write "\\1" to get \1
# 5. Example with raw string: replacement = r"\1" (passed to re.sub() as \1, a reference to group 1)
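
# A short sketch of what goes wrong without the r prefix, in both a search pattern and a replacement string.
# The sample strings are illustrative assumptions.

```python
import re

# In a normal string "\b" is one backspace character; in a raw string it is backslash + b
print(len("\b"), len(r"\b"))  # 1 2

text = "cat concat"
no_raw = re.findall("\bcat\b", text)     # searches for backspace characters
with_raw = re.findall(r"\bcat\b", text)  # word-boundary match
print(no_raw)    # []
print(with_raw)  # ['cat']

# Replacement strings: r"\2-\1" swaps the groups, "\2-\1" inserts control characters
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")
broken = re.sub(r"(\d+)-(\d+)", "\2-\1", "10-20")
print(swapped)       # 20-10
print(repr(broken))  # '\x02-\x01'
```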