## Regular Expression

- `Definition`
  - Regular expressions are sequences of characters that define a search pattern. 
  - Other names
    - regex
    - regexp
  - They are commonly used for
    - string manipulation
    - pattern matching
    - text search operations
- `Library in Python`
  - The `re` module provides functions and methods for working with regular expressions.
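
As a rough orientation, the sketch below runs a few commonly used `re` functions on one sample string. The sample text and the variable names are made up for illustration and are not part of the original notes.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "2 apples, 15 oranges"

first = re.search(r"\d+", text)        # first match anywhere in the string
at_start = re.match(r"\d+", text)      # match only at the very beginning
whole = re.fullmatch(r"\d+", text)     # the entire string must match
numbers = re.findall(r"\d+", text)     # list of all non-overlapping matches
masked = re.sub(r"\d+", "N", text)     # replace every match
parts = re.split(r",\s*", text)        # split on a comma plus optional spaces

print(first.group())     # 2
print(at_start.group())  # 2
print(whole)             # None
print(numbers)           # ['2', '15']
print(masked)            # N apples, N oranges
print(parts)             # ['2 apples', '15 oranges']
```

`re.compile()` builds a reusable pattern object that offers the same operations as methods.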

## Basic Patterns

- `Literal Characters:`
  - Match the characters exactly as they appear.

    - Example: apple matches the string "apple".

- `Metacharacters:`
  - Special characters with a reserved meaning.

    - Examples: `. ^ $ * + ? { } [ ] \ | ( )`

- `Quantifiers:`
 
  - `*:` Matches 0 or more occurrences of the preceding character or group.

    - Example: a* matches "", "a", "aa", "aaa", ...

  - `+:` Matches 1 or more occurrences of the preceding character or group.

    - Example: a+ matches "a", "aa", "aaa", ...

  - `?:` Matches 0 or 1 occurrence of the preceding character or group.

    - Example: a? matches "", "a".
  
  - `{n}:` Matches exactly n occurrences of the preceding character or group.

    - Example: a{3} matches "aaa".

  - `{n,}:` Matches n or more occurrences of the preceding character or group.

    - Example: a{2,} matches "aa", "aaa", ...

  - `{n,m}:` Matches between n and m occurrences (inclusive) of the preceding character or group.

    - Example: a{2,4} matches "aa", "aaa", "aaaa".

- `Character Classes:`

  - `[ ]:` Matches any single character within the square brackets.

    - Example: [aeiou] matches any vowel.

  - `[^ ]:` Matches any single character not within the square brackets.

    - Example: [^aeiou] matches any single character that is not a lowercase vowel, including consonants, digits, spaces, and uppercase letters.

- `Ranges:` Specify a range of characters within square brackets.

  - Example: [a-z] matches any lowercase letter.

- `Anchors:`

  - `^:` Matches the start of a string.

    - Example: ^abc matches "abc" at the start of a string.

  - `$:` Matches the end of a string, or the position just before a newline at the very end of the string.

    - Example: xyz$ matches "xyz" at the end of a string.

- `Escape Characters:`
  
  - `\:` Escapes a metacharacter, allowing it to be treated as a literal character.

    - Example: \. matches a literal dot.

- `Grouping and Capturing:`
  
  - `( ):` Groups patterns together. Captures the matched text for later use.

    - Example: (ab)+ matches "ab", "abab", "ababab", ...

- `Special Sequences:`
  
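  - `\d:` Matches any digit (0-9). For `str` patterns in Python 3 this also covers other Unicode decimal digits unless `re.ASCII` is set.
  
  - `\D:` Matches any non-digit character.
  
  - `\w:` Matches any word character (letters, digits, and underscore).

  - `\W:` Matches any non-word character.
  
  - `\s:` Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab).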
  
  - `\S:` Matches any non-whitespace character.

- `Quantifiers (Greedy and Non-Greedy):`
  
  - `Greedy:` Matches as much as possible.

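    - Example: .* matches as many characters as it can, up to the end of the line (by default `.` does not match a newline).

  - `Non-Greedy:` Matches as little as possible.

    - Example: .*? matches as few characters as possible while still letting the rest of the pattern match.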

- `Lookahead and Lookbehind:`

  - `(?= ...):` Positive lookahead assertion. Matches the preceding pattern only if it is followed by the specified pattern.

    - Example: apple(?= pie) matches "apple" only if it is followed by " pie".

  - `(?! ...):` Negative lookahead assertion. Matches the preceding pattern only if it is not followed by the specified pattern.

    - Example: apple(?! pie) matches "apple" only if it is not followed by " pie".

  - `(?<= ...):` Positive lookbehind assertion. Matches the following pattern only if it is preceded by the specified pattern.

    - Example: (?<=good )morning matches "morning" only if it is preceded by "good ".

  - `(?<! ...):` Negative lookbehind assertion. Matches the following pattern only if it is not preceded by the specified pattern.

    - Example: (?<!bad )apple matches "apple" only if it is not preceded by "bad ".

- `Flags:`
  
  - `re.IGNORECASE (re.I):` Case-insensitive matching.
    
    - Example: re.compile(r'apple', re.I) matches "apple", "Apple", "APPLE", etc.

In [2]:
# Import the regular expression module
import re

### Literal Characters

In [4]:
# Example: Matching the literal string "apple". 
import re

pattern = re.compile(r'apple')
text = 'I love apples and oranges.'

# search() scans the whole string, so "apple" is found inside "apples"
match = pattern.search(text)
if match:
    print("Match found:", match.group())
else:
    print("No match.")


Match found: apple


### Quantifiers

In [7]:
# Example: Matching one or more occurrences of the letter 'a'.

import re

pattern = re.compile(r'a+')
text = 'aaabbbccc'

matches = pattern.findall(text)
print("Matches:", matches)


Matches: ['aaa']


In [9]:
# Example: 'a+' finds every separate run of one or more 'a' characters.

import re

pattern = re.compile(r'a+')
# Any character other than 'a' ends the current run; the next 'a' starts a new match.
text = 'aaabbbcccadaahjga'

matches = pattern.findall(text)
print("Matches:", matches)


Matches: ['aaa', 'a', 'aa', 'a']
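
The counted quantifiers `{n}`, `{n,}`, `{n,m}` and the optional quantifier `?` from the list above can be tried out the same way. This is a minimal sketch: the sample strings are invented, and the word boundary `\b` (not covered in these notes) is an added assumption used only to keep each run of letters separate.

```python
import re

# Hypothetical sample: runs of 'a' with lengths 1 to 5.
text = "a aa aaa aaaa aaaaa"

# \b (word boundary) forces the quantifier to cover a whole run.
exactly_three = re.findall(r"\ba{3}\b", text)
two_or_more = re.findall(r"\ba{2,}\b", text)
two_to_four = re.findall(r"\ba{2,4}\b", text)

# '?' makes the preceding character optional.
optional_u = re.findall(r"colou?r", "color colour")

print(exactly_three)  # ['aaa']
print(two_or_more)    # ['aa', 'aaa', 'aaaa', 'aaaaa']
print(two_to_four)    # ['aa', 'aaa', 'aaaa']
print(optional_u)     # ['color', 'colour']
```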


### Character Classes

In [11]:
# Example: Matching any vowel.
import re

pattern = re.compile(r'[aeiou]')
text = 'Hello, World!'

matches = pattern.findall(text)
print("Vowels found:", matches)

Vowels found: ['e', 'o', 'o']
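
Negated classes and ranges behave slightly differently from what the word "non-vowel" suggests, so a small sketch may help. The sample strings are made up for illustration.

```python
import re

# [^aeiou] excludes only lowercase vowels; uppercase letters and spaces still match.
not_lower_vowel = re.findall(r"[^aeiou]", "Ace it")

# A range matches one character; '+' extends it to a run of such characters.
lowercase_runs = re.findall(r"[a-z]+", "Hello, World 42!")

# Several ranges can be combined inside one pair of brackets.
alnum_runs = re.findall(r"[A-Za-z0-9]+", "Hello, World 42!")

print(not_lower_vowel)  # ['A', 'c', ' ', 't']
print(lowercase_runs)   # ['ello', 'orld']
print(alnum_runs)       # ['Hello', 'World', '42']
```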


### Anchors

In [12]:
# Example: Matching the string that starts with 'abc'.
import re

pattern = re.compile(r'^abc')
text = 'abc123'

# search() is used so that '^' does the anchoring; match() would anchor at the start on its own
if pattern.search(text):
    print("Match found!")
else:
    print("No match.")

Match found!
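
The `$` anchor has no example above, so here is a brief sketch covering it, together with a case where `^` rejects a string. The `re.MULTILINE` flag is not discussed in these notes; it is included as an extra to show that anchors can also apply per line. All sample strings are hypothetical.

```python
import re

ends_with_xyz = re.compile(r"xyz$")

at_end = ends_with_xyz.search("123xyz")       # 'xyz' is at the end: match
not_at_end = ends_with_xyz.search("xyz123")   # 'xyz' is not at the end: None
not_at_start = re.search(r"^abc", "123abc")   # 'abc' is not at the start: None

# By default '^' matches only at the start of the whole string.
lines = "one two\nthree four"
first_word = re.findall(r"^\w+", lines)
# With re.MULTILINE, '^' also matches right after each newline.
first_word_per_line = re.findall(r"^\w+", lines, flags=re.MULTILINE)

print(at_end is not None)   # True
print(not_at_end)           # None
print(not_at_start)         # None
print(first_word)           # ['one']
print(first_word_per_line)  # ['one', 'three']
```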


### Escape Characters


In [13]:
# Example: Matching a literal dot in a string
import re

pattern = re.compile(r'\.')
text = 'file.txt'

if pattern.search(text):
    print("Match found!")
else:
    print("No match.")

Match found!
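
To see why escaping matters, the sketch below compares an unescaped dot with an escaped one, and uses `re.escape()` to escape a literal string automatically. The sample strings are invented for illustration.

```python
import re

text = "1.5 and 125"

# Unescaped '.' matches any character, so '125' matches too.
unescaped = re.findall(r"1.5", text)

# Escaped '\.' matches only a literal dot.
escaped = re.findall(r"1\.5", text)

# re.escape() adds the backslashes for you, which helps with user-supplied text.
auto_pattern = re.escape("1.5")
auto_escaped = re.findall(auto_pattern, text)

print(unescaped)     # ['1.5', '125']
print(escaped)       # ['1.5']
print(auto_escaped)  # ['1.5']
```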


### Grouping and Capturing

In [3]:
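# Example: Repeating the group "ab". With one capturing group, findall() returns
# the text captured by the group (its last repetition, 'ab'), not the whole match.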
import re

pattern = re.compile(r'(ab)+')
text = 'abababab'
text1 = 'ababaababaab'

matches = pattern.findall(text)
matches1 = pattern.findall(text1)
print("Matches:", matches)
print("Matches:", matches1)


Matches: ['ab']
Matches: ['ab', 'ab', 'ab']
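
The output above shows only the captured group, not the full repeated match. The following sketch shows a few ways to get at both. The non-capturing group `(?:...)` is not covered in these notes and is added here as an assumption about what a reader may want next; the "pages 10-25" string is made up.

```python
import re

text = "ababaababaab"

# One capturing group: findall() returns what the group captured.
captured = re.findall(r"(ab)+", text)

# Non-capturing group: findall() returns the whole match instead.
whole_matches = re.findall(r"(?:ab)+", text)

# finditer() yields match objects; group(0) is always the whole match.
via_finditer = [m.group(0) for m in re.finditer(r"(ab)+", text)]

# Several groups can be read individually or all at once.
m = re.search(r"(\d+)-(\d+)", "pages 10-25")
first_page, last_page = m.group(1), m.group(2)

print(captured)       # ['ab', 'ab', 'ab']
print(whole_matches)  # ['abab', 'abab', 'ab']
print(via_finditer)   # ['abab', 'abab', 'ab']
print(m.groups())     # ('10', '25')
```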


### Special Sequences

In [17]:
# Example: Matching digits in a string
import re

pattern = re.compile(r'\d+')
text = 'There are 123 apples in the basket.'

matches = pattern.findall(text)
print("Digits found:", matches)


Digits found: ['123']
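
The remaining special sequences (`\w`, `\s`, `\D`) can be explored in the same way. This is a small sketch on an invented string; the last two lines illustrate that `\d` in a `str` pattern also accepts non-ASCII decimal digits unless `re.ASCII` is passed.

```python
import re

# Hypothetical sample containing a tab and a trailing newline.
text = "Order #42:\tshipped on 2024-01-15\n"

words = re.findall(r"\w+", text)            # runs of word characters
fields = re.split(r"\s+", text.strip())     # split on any whitespace
digits_only = re.sub(r"\D", "", text)       # remove every non-digit

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
unicode_digits = re.findall(r"\d", "7 \u0663")
ascii_digits = re.findall(r"\d", "7 \u0663", flags=re.ASCII)

print(words)        # ['Order', '42', 'shipped', 'on', '2024', '01', '15']
print(fields)       # ['Order', '#42:', 'shipped', 'on', '2024-01-15']
print(digits_only)  # 4220240115
print(len(unicode_digits), len(ascii_digits))  # 2 1
```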


### Quantifiers (Greedy and Non-Greedy)

In [5]:
# Example: A non-greedy quantifier stops at the first "end" it can reach.
# The match still begins at the leftmost possible position (the start of the string).
import re

pattern = re.compile(r'.*?end')
text = 'Start of the sentence and the end of the sentence.'

match = pattern.search(text)
if match:
    print("Match found:", match.group())  
else:
    print("No match.")


Match found: Start of the sentence and the end
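
Because the sentence above contains "end" only once, a greedy pattern would return the same result. A sketch with two possible stopping points makes the difference visible; the tag-like sample string is hypothetical and is not meant as a way to parse HTML.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: '.*' runs to the last '>' in the string.
greedy = re.findall(r"<.*>", text)

# Non-greedy: '.*?' stops at the first '>' after each '<'.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```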


### Lookahead and Lookbehind

In [19]:
# Example: Using positive lookahead to match "apple" only if it is followed by " pie".
import re

pattern = re.compile(r'apple(?= pie)')
text = 'I like apple pie.'

match = pattern.search(text)
if match:
    print("Match found:", match.group())
else:
    print("No match.")


Match found: apple
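
The other three assertions (negative lookahead, positive lookbehind, negative lookbehind) can be sketched along the same lines. The sample strings are invented; start positions are printed so it is clear which occurrence matched.

```python
import re

# Negative lookahead: "apple" not followed by " pie".
text = "apple pie, apple tart"
not_pie = [m.start() for m in re.finditer(r"apple(?! pie)", text)]

# Positive lookbehind: "morning" preceded by "good ".
greeting = "good morning, early morning"
after_good = [m.start() for m in re.finditer(r"(?<=good )morning", greeting)]

# Negative lookbehind: "apple" not preceded by "bad ".
fruit = "bad apple, red apple"
not_bad = [m.start() for m in re.finditer(r"(?<!bad )apple", fruit)]

print(not_pie)     # [11]  (the "apple" in "apple tart")
print(after_good)  # [5]   (the "morning" in "good morning")
print(not_bad)     # [15]  (the "apple" in "red apple")
```

In each case the assertion itself consumes no text: the match contains only "apple" or "morning".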


### Flags

In [20]:
# Example: Using the case-insensitive flag to match "apple" regardless of case.
import re

pattern = re.compile(r'apple', re.I)
text = 'I have an Apple tree.'

match = pattern.search(text)
if match:
    print("Match found:", match.group())
else:
    print("No match.")


Match found: Apple
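
For comparison, the sketch below contrasts case-sensitive and case-insensitive matching on one string, and shows the inline form `(?i)` as an alternative to passing the flag. The inline form is not mentioned in these notes and is included as an extra; the sample string is made up.

```python
import re

text = "Apple, apple, APPLE"

# Without a flag, only the exact lowercase spelling matches.
case_sensitive = re.findall(r"apple", text)

# re.IGNORECASE (short form: re.I) ignores letter case.
with_flag = re.findall(r"apple", text, flags=re.IGNORECASE)

# The same flag can be written inline at the start of the pattern.
inline_flag = re.findall(r"(?i)apple", text)

print(case_sensitive)  # ['apple']
print(with_flag)       # ['Apple', 'apple', 'APPLE']
print(inline_flag)     # ['Apple', 'apple', 'APPLE']
```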
