# Using Dot (`.`) Metacharacter

The dot (`.`) metacharacter is used to match any character except a newline. Let's see how this works with a simple example:

In [10]:
import re

# Sample text
text = "cat bat hat mat rat pat"

# Pattern using dot to match any character between 'c' and 't'
pattern = r"c.t"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['cat']


### Explanation
- **Text:** `"cat bat hat mat rat pat"`
- **Pattern:** `"c.t"`
  - The pattern `c.t` means:
    - `c` - Match the letter 'c'
    - `.` - Match any single character except a newline (like 'a', 'o', '9', or a space)
    - `t` - Match the letter 't'

- **Result:** `['cat']`
  - The `findall` function returns every non-overlapping piece of text made of 'c', any single character, and 't'.
  - In this example, it finds only "cat". The other words ("bat", "hat", "mat", "rat", "pat") end in "at" but do not start with 'c', so they are not matched.
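
The "except a newline" part is easy to overlook, so here is a small sketch of it. The sample text is made up for illustration; `re.DOTALL` is the standard flag that lets the dot match a newline as well.

```python
import re

# Hypothetical sample text: the last item has a newline between 'c' and 't'
text = "cat cot c9t c t c\nt"

# By default the dot matches any single character except a newline
default_matches = re.findall(r"c.t", text)

# With re.DOTALL the dot also matches the newline
dotall_matches = re.findall(r"c.t", text, flags=re.DOTALL)

print(default_matches)  # ['cat', 'cot', 'c9t', 'c t']
print(dotall_matches)   # ['cat', 'cot', 'c9t', 'c t', 'c\nt']
```

Note that the dot happily matches a digit or a space, so `c.t` is broader than "c, a letter, t".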

# Using Caret (`^`) Metacharacter

The caret (`^`) metacharacter is used to match the **start** of a string. Let's see how this works with a simple example:

In [13]:
import re

# Sample text
text = "Python is fun. Python is powerful. Learning Python is great."

# Pattern using caret to match the text "Python is fun" at the start of a string
pattern = "^Python is fun"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['Python is fun']


### Explanation
- **Text:** `"Python is fun. Python is powerful. Learning Python is great."`
- **Pattern:** `"^Python is fun"`
  - The pattern `^Python is fun` means:
    - `^` - Match the start of the string.
    - `Python is fun` - Match the exact text "Python is fun".

- **Result:** `['Python is fun']`
  - The `findall` function returns the text "Python is fun" **only if it appears at the very beginning** of the string.
  - In this example, it finds one match because the string starts with "Python is fun". The later occurrences of "Python" are not at the start, so they are ignored.

#### Note
If the word "Python" appears anywhere else in the string, but not at the start, it won't be matched.
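
When the text spans several lines, the caret can be made to match at the start of every line with the standard `re.MULTILINE` flag. The sketch below uses a made-up three-line version of the sample text to show the difference.

```python
import re

# Hypothetical multi-line variant of the sample text
text = "Python is fun.\nPython is powerful.\nLearning Python is great."

# Without flags, ^ matches only at the very start of the string
start_only = re.findall(r"^Python", text)

# With re.MULTILINE, ^ also matches right after each newline
each_line = re.findall(r"^Python", text, flags=re.MULTILINE)

print(start_only)  # ['Python']
print(each_line)   # ['Python', 'Python']
```

The third line starts with "Learning", so its "Python" is skipped in both cases.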

# Using Dollar Sign (`$`) Metacharacter

The dollar sign (`$`) metacharacter is used to match the **end** of a string. Let's see how this works with a simple example:

In [14]:
import re

# Sample text
text = "Learning is fun. Python is powerful. I love Python."

# Pattern using dollar sign to match "Python" plus one final character at the end of the string
pattern = "Python.$"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['Python.']


### Explanation
- **Text:** `"Learning is fun. Python is powerful. I love Python."`
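- **Pattern:** `"Python.$"`
  - The pattern `Python.$` means:
    - `Python` - Match the exact word "Python".
    - `.` - Match any single character (here, the final period). To match only a literal period, escape it as `\.`.
    - `$` - Match only at the end of the string.

- **Result:** `['Python.']`
  - The `findall` function returns "Python" plus one trailing character **only if they appear at the very end** of the string.
  - In this example, it matches "Python." because the string ends with "Python.". The pattern `Python$` alone would find nothing here, since the last character of the string is the period.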

#### Note
If the word "Python" appears anywhere else in the string but not at the end, it won’t be matched by this pattern.
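
To make the role of the final character concrete, the following sketch compares three variants of the pattern. The second sample string ("I love Python!") is an assumption added for illustration.

```python
import re

text = "Learning is fun. Python is powerful. I love Python."

# "Python$" fails: the string ends with a period, not with "Python"
no_period = re.findall(r"Python$", text)

# Escaping the dot matches a literal period at the end
with_period = re.findall(r"Python\.$", text)

# An unescaped dot accepts any final character, such as '!'
any_char = re.findall(r"Python.$", "I love Python!")

print(no_period)    # []
print(with_period)  # ['Python.']
print(any_char)     # ['Python!']
```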

# Using Asterisk (`*`) Metacharacter

The asterisk (`*`) metacharacter is used to match **zero or more occurrences** of the character or pattern before it. This means it will match if the character appears any number of times, including zero times. Let's see how this works with a simple example:

In [24]:
import re

# Sample text
text = "cat, caat, caaat, ct"

# Pattern using asterisk to match 'c' followed by zero or more 'a's, then 't'
pattern = "ca*t"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['cat', 'caat', 'caaat', 'ct']


### Explanation
- **Text:** `"cat, caat, caaat, ct"`
- **Pattern:** `"ca*t"`
  - The pattern `ca*t` means:
    - `c` - Match the letter 'c'.
    - `a*` - Match zero or more occurrences of the letter 'a'.
    - `t` - Match the letter 't'.

- **Result:** `['cat', 'caat', 'caaat', 'ct']`
  - The `findall` function will return a list of all occurrences where the letter 'c' is followed by zero or more 'a's and then the letter 't'.
  - In this example, it matches:
    - "cat" (one 'a'),
    - "caat" (two 'a's),
    - "caaat" (three 'a's),
    - "ct" (zero 'a's).

#### Note
The `*` metacharacter is very flexible because it can match any number of the specified character, including none at all.
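
The asterisk applies only to the single element right before it. To repeat a longer sequence, the sequence has to be grouped first. The sketch below uses `re.fullmatch` (which succeeds only if the whole string matches) and a non-capturing group `(?:...)`; the test strings are made up for illustration.

```python
import re

# "ab*" means 'a' followed by zero or more 'b's
single = bool(re.fullmatch(r"ab*", "abbb"))
not_sequence = bool(re.fullmatch(r"ab*", "abab"))

# "(?:ab)*" repeats the whole sequence "ab" zero or more times
grouped = bool(re.fullmatch(r"(?:ab)*", "abab"))
empty = bool(re.fullmatch(r"(?:ab)*", ""))

print(single)        # True
print(not_sequence)  # False
print(grouped)       # True
print(empty)         # True
```

The last line shows the "zero times" case: a pattern made only of a starred element also matches the empty string.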

# Using Plus (`+`) Metacharacter

The plus (`+`) metacharacter is used to match **one or more occurrences** of the character or pattern before it. This means it will match only if the character appears at least once. Let's see how this works with a simple example:

In [4]:
import re

# Sample text
text = "cat, caat, caaat, ct"

# Pattern using plus to match 'c' followed by one or more 'a's, then 't'
pattern = "ca+t"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['cat', 'caat', 'caaat']


### Explanation
- **Text:** `"cat, caat, caaat, ct"`
- **Pattern:** `"ca+t"`
  - The pattern `ca+t` means:
    - `c` - Match the letter 'c'.
    - `a+` - Match one or more occurrences of the letter 'a'.
    - `t` - Match the letter 't'.

- **Result:** `['cat', 'caat', 'caaat']`
  - The `findall` function will return a list of all occurrences where the letter 'c' is followed by one or more 'a's and then the letter 't'.
  - In this example, it matches:
    - "cat" (one 'a'),
    - "caat" (two 'a's),
    - "caaat" (three 'a's).
  - It does **not** match "ct" because there are zero 'a's, and the plus (`+`) requires at least one occurrence.

#### Note
The `+` metacharacter ensures that there is at least one occurrence of the specified character.
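
A common use of `+` is pulling out runs of digits with `\d+`. The sketch below shows that, along with one more word example; both sample strings are invented for illustration.

```python
import re

# \d+ matches one or more consecutive digits
numbers = re.findall(r"\d+", "Order 66 shipped 3 items in 2024")

# o+ requires at least one 'o', so "ggle" is skipped
words = re.findall(r"go+gle", "ggle gogle google gooogle")

print(numbers)  # ['66', '3', '2024']
print(words)    # ['gogle', 'google', 'gooogle']
```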

# Using Question Mark (`?`) Metacharacter

The question mark (`?`) metacharacter is used to match **zero or one occurrence** of the character or pattern before it. This means it will match if the character appears either once or not at all. Let's see how this works with a simple example:

In [27]:
import re

# Sample text
text = "color, colour, colr"

# Pattern using question mark to match 'colo' followed by an optional 'u', then 'r'
pattern = "colou?r"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['color', 'colour']


### Explanation
- **Text:** `"color, colour, colr"`
- **Pattern:** `"colou?r"`
  - The pattern `colou?r` means:
    - `colo` - Match the exact letters "colo".
    - `u?` - Match zero or one occurrence of the letter 'u'.
    - `r` - Match the letter 'r'.

- **Result:** `['color', 'colour']`
  - The `findall` function will return a list of all occurrences where:
    - "colo" is followed by zero or one 'u' and then 'r'.
  - In this example, it matches:
    - "color" (zero 'u'),
    - "colour" (one 'u').
  - It does **not** match "colr" because it lacks the second 'o' that the pattern requires before the optional 'u' and the final 'r'.

#### Note
The `?` metacharacter makes the preceding character (in this case, 'u') optional. It will match both the presence and absence of that character.
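
Since `*`, `+`, and `?` differ only in how many repetitions they allow, it can help to see them side by side. This is a small comparison sketch; the word list and the URL string are assumptions made up for the example, and `re.fullmatch` is used so that each word must match in full.

```python
import re

# Hypothetical test words with zero, one, and two 'a's
words = ["ct", "cat", "caat"]

# For each quantifier, keep the words that the pattern matches in full
results = {}
for pattern in (r"ca*t", r"ca+t", r"ca?t"):
    results[pattern] = [w for w in words if re.fullmatch(pattern, w)]

print(results[r"ca*t"])  # ['ct', 'cat', 'caat']
print(results[r"ca+t"])  # ['cat', 'caat']
print(results[r"ca?t"])  # ['ct', 'cat']

# A typical use of '?': an optional 's' in a URL scheme
schemes = re.findall(r"https?://", "http://a.example https://b.example ftp://c.example")
print(schemes)  # ['http://', 'https://']
```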

# Using Square Brackets (`[]`) Metacharacter

The square brackets (`[]`) are used to define a **character class** in regex. A character class matches any **one** of the characters inside the brackets. Let's see how this works with a simple example:

In [28]:
import re

# Sample text
text = "cat, cot, cut, cit, cet"

# Pattern using square brackets to match 'c' followed by any vowel, then 't'
pattern = "c[aeiou]t"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['cat', 'cot', 'cut', 'cit', 'cet']


### Explanation
- **Text:** `"cat, cot, cut, cit, cet"`
- **Pattern:** `"c[aeiou]t"`
  - The pattern `c[aeiou]t` means:
    - `c` - Match the letter 'c'.
    - `[aeiou]` - Match **any one** vowel (a, e, i, o, u).
    - `t` - Match the letter 't'.

- **Result:** `['cat', 'cot', 'cut', 'cit', 'cet']`
  - The `findall` function will return a list of all occurrences where:
    - "c" is followed by any single vowel and then "t".
  - In this example, it matches all words: "cat," "cot," "cut," "cit," "cet," since each word has a single vowel between 'c' and 't'.

#### Note
- You can use square brackets to match a range of characters as well:
  - `[a-z]` matches any lowercase letter.
  - `[0-9]` matches any digit.
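
The ranges mentioned in the note can be tried out as follows. The sample strings are made up for illustration, and the last pattern combines several ranges in one class, which is an extra step beyond the note.

```python
import re

sample = "a1B2c3"

# [a-z] matches one lowercase letter at a time
lowercase = re.findall(r"[a-z]", sample)

# [0-9] matches one digit at a time
digits = re.findall(r"[0-9]", sample)

# Several ranges can share one class; '+' then collects whole runs
runs = re.findall(r"[A-Za-z0-9]+", "room-42, Unit_7")

print(lowercase)  # ['a', 'c']
print(digits)     # ['1', '2', '3']
print(runs)       # ['room', '42', 'Unit', '7']
```

The hyphen and underscore are not inside the class, so they split the runs.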

# Using Pipe (`|`) Metacharacter

The pipe (`|`) metacharacter is used to match **either** of the patterns separated by it. It works like a logical "OR". Let's see how this works with a simple example:

In [22]:
import re

# Sample text
text = "cat, dog, bat, rat,man,fan,cat,dog"

# Pattern using pipe to match 'cat', 'dog', or 'man'
pattern = "cat|dog|man"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['cat', 'dog', 'man', 'cat', 'dog']


### Explanation
- **Text:** `"cat, dog, bat, rat,man,fan,cat,dog"`
- **Pattern:** `"cat|dog|man"`
  - The pattern `cat|dog|man` means:
    - Match the exact text "cat", **or** "dog", **or** "man".

- **Result:** `['cat', 'dog', 'man', 'cat', 'dog']`
  - The `findall` function returns every occurrence of any of the alternatives, in the order they appear in the text.
  - In this example, it matches:
    - "cat" and "dog" (twice each)
    - "man" (once)
  - It does **not** match "bat", "rat", or "fan" because none of them is one of the alternatives.

#### Note
The `|` metacharacter is useful when you want to search for multiple possible patterns in a single search.
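
One caveat worth sketching: alternation matches text anywhere, including inside longer words. If only whole words should count, one option is to wrap the alternatives in a group and add the word-boundary anchor `\b`. The sample text here is an assumption chosen to show the difference.

```python
import re

# Hypothetical text where "cat" and "dog" also appear inside longer words
text = "concatenate hotdog cat"

# Plain alternation finds the substrings wherever they occur
loose = re.findall(r"cat|dog", text)

# \b on both sides of a non-capturing group keeps only whole words
strict = re.findall(r"\b(?:cat|dog)\b", text)

print(loose)   # ['cat', 'dog', 'cat']
print(strict)  # ['cat']
```

The group matters: without it, `\bcat|dog\b` would attach each `\b` to only one of the alternatives.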

# Using Parentheses (`()`) Metacharacter

Parentheses (`()`) in regex are used for **grouping** and **capturing** parts of a pattern. This allows you to create subpatterns within your regular expression, which can be useful when you want to extract specific parts of a match.

In [16]:
import re

# Sample text
text = "apple 123, banana 456, cherry 789"

# Pattern using parentheses to capture the fruit name and its number
pattern = r"(\w+) (\d+)"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

[('apple', '123'), ('banana', '456'), ('cherry', '789')]


#### Explanation
- **Text:** `"apple 123, banana 456, cherry 789"`
- **Pattern:** `r"(\w+) (\d+)"`
  - The pattern `(\w+) (\d+)` means:
    - `(\w+)` - Match and capture one or more word characters (letters, digits, or underscores).
    - The literal space between the two groups matches the single space that separates the word from the number.
    - `(\d+)` - Match and capture one or more digits.
    - The parentheses around `\w+` and `\d+` create **capture groups**, which store the matched text separately.

- **Result:** `[('apple', '123'), ('banana', '456'), ('cherry', '789')]`
  - The `findall` function returns a list of tuples, where each tuple contains the captured groups.
  - In this example, it matches:
    - "apple" and "123"
    - "banana" and "456"
    - "cherry" and "789"

#### Note
- Parentheses can also be used for grouping without capturing by using `(?: ...)`, which creates a non-capturing group.
- Captured groups can be referenced later in regex patterns or in replacement strings.
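
The two points in the note can be sketched as follows, reusing the fruit text from above. The repeated-word sentence is a made-up sample; `re.sub`, `\1`, and `\2` are standard parts of the `re` module.

```python
import re

text = "apple 123, banana 456, cherry 789"

# Non-capturing group: the word is still required but not returned
numbers_only = re.findall(r"(?:\w+) (\d+)", text)

# Groups referenced in a replacement string: swap word and number
swapped = re.sub(r"(\w+) (\d+)", r"\2 \1", text)

# Group referenced inside the pattern: find a word repeated twice in a row
repeated = re.findall(r"\b(\w+) \1\b", "this is is a test")

print(numbers_only)  # ['123', '456', '789']
print(swapped)       # 123 apple, 456 banana, 789 cherry
print(repeated)      # ['is']
```

With a single capture group, `findall` returns a flat list of strings instead of a list of tuples.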

# Using Curly Braces (`{}`) Metacharacter

Curly braces (`{}`) in regex are used to specify the **number of occurrences** of the character or group preceding them. This allows you to match a specific number of times, a minimum number of times, or a range of times.

In [17]:
import re

# Sample text
text = "aa aaaa aaaaa aaaa aaa"

# Pattern using curly braces to match between 2 and 4 'a' characters
pattern = "a{2,4}"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['aa', 'aaaa', 'aaaa', 'aaaa', 'aaa']


### Explanation
- **Text:** `"aa aaaa aaaaa aaaa aaa"`
- **Pattern:** `"a{2,4}"`
  - The pattern `a{2,4}` means:
    - `a` - Match the letter 'a'.
    - `{2,4}` - Match between 2 and 4 occurrences of the preceding character ('a'), taking as many as possible.

- **Result:** `['aa', 'aaaa', 'aaaa', 'aaaa', 'aaa']`
  - The `findall` function returns every run of 2 to 4 consecutive 'a' characters.
  - In this example, it matches:
    - "aa" (2 'a's), "aaaa" (4 'a's, twice), and "aaa" (3 'a's) in full.
    - Only the first four 'a's of "aaaaa". The single 'a' left over is too short to match on its own.

### Different Uses of Curly Braces

1. **Exact Number: `{n}`**
   - Matches exactly `n` occurrences.
   - Example: `"a{3}"` matches "aaa".

2. **At Least Number: `{n,}`**
   - Matches at least `n` occurrences.
   - Example: `"a{3,}"` matches "aaa", "aaaa", "aaaaa", etc.

3. **Range: `{n,m}`**
   - Matches between `n` and `m` occurrences (inclusive).
   - Example: `"a{2,4}"` matches "aa", "aaa", and "aaaa", but not a lone "a". In "aaaaa" it matches only the first four characters.
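
The three forms can be checked against strings of known length. This is a minimal sketch: the helper `full_matches` is a hypothetical name introduced here, and `re.fullmatch` is used so that a string counts only if the pattern covers all of it.

```python
import re

# Hypothetical test strings: runs of 1 to 5 'a's
runs = ["a" * n for n in range(1, 6)]

def full_matches(pattern):
    """Return the lengths of the runs that the pattern matches in full."""
    return [len(s) for s in runs if re.fullmatch(pattern, s)]

exact = full_matches(r"a{3}")      # exactly 3
at_least = full_matches(r"a{3,}")  # 3 or more
between = full_matches(r"a{2,4}")  # from 2 to 4

print(exact)     # [3]
print(at_least)  # [3, 4, 5]
print(between)   # [2, 3, 4]
```

With `findall` instead of `fullmatch`, a longer run would still produce a match on part of it.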

In [19]:
import re

# Sample text
text = "abc abbc abbbc abbbbbc"

# Pattern using curly braces to match between 2 and 4 'b' characters
pattern = "ab{2,4}c"

# Find all matches
matches = re.findall(pattern, text)

print(matches)

['abbc', 'abbbc']


### Explanation
- **Pattern:** `"ab{2,4}c"`
  - Matches:
    - "abbc" (2 'b's)
    - "abbbc" (3 'b's)
  - Does **not** match:
    - "abc" (only 1 'b')
    - "abbbbbc" (5 'b's, so no 'c' follows within the allowed 4 'b's)

- **Result:** `['abbc', 'abbbc']`