# 🔍 Mastering Regular Expressions (Regex) in Python

---

## 🚀 Introduction

Regular expressions (regex) are powerful tools for pattern matching and string manipulation in Python. Understanding regex allows you to perform advanced text processing tasks with ease and precision.

---

## 🔑 Key Concepts

### 📝 Basic Syntax
- **Literals:** Characters that match themselves.
    - Example: The regex `hello` matches the string "hello" exactly.
- **Metacharacters:** Characters that carry a special meaning in regex instead of matching themselves.
    - Example: The regex `.` matches any single character except a newline (unless `re.DOTALL` is set).
- **Character Classes:** Define a set of characters to match.
    - Example: The regex `[aeiou]` matches any vowel.
- **Quantifiers:** Specify the number of occurrences of a character or group.
    - Example: The regex `a{2,4}` matches 'aa', 'aaa', or 'aaaa' (between two and four repetitions).
- **Anchors:** Specify positions in the text.
    - Example: The regex `^hello` matches 'hello' only at the start of the string (or at the start of each line when `re.MULTILINE` is set).
- **Groups and Capture Groups:** Create subpatterns for quantification or capturing matched text.
    - Example: The regex `(ab)+` matches 'ab', 'abab', 'ababab', etc.
- **Escaping:** Use `\` to escape metacharacters if you want to match them literally.
    - Example: The regex `\.` matches a literal period.
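
The syntax elements listed above can be tried out directly with the `re` module. The following is a minimal sketch; the sample strings and variable names are chosen only for illustration.

```python
import re

# Literal: matches the exact text.
literal = re.search(r"hello", "say hello").group()

# Metacharacter: `.` matches any single character except a newline.
any_char = re.findall(r"h.t", "hat hot h\nt")

# Character class: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")

# Quantifier: between two and four consecutive 'a' characters.
quantified = [s for s in ("a", "aa", "aaaa", "aaaaa") if re.fullmatch(r"a{2,4}", s)]

# Anchor: `^` ties the match to the start of the string.
anchored = [bool(re.search(r"^hello", s)) for s in ("hello world", "oh hello")]

# Group: the quantifier applies to the whole group `ab`.
repeated_group = re.fullmatch(r"(ab)+", "ababab") is not None

# Escaping: `\.` matches a literal period.
literal_dots = re.findall(r"\.", "a.b.c")

print(literal)         # hello
print(any_char)        # ['hat', 'hot']
print(vowels)          # ['e', 'e']
print(quantified)      # ['aa', 'aaaa']
print(anchored)        # [True, False]
print(repeated_group)  # True
print(literal_dots)    # ['.', '.']
```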

### 🌀 Common Patterns
- **Literal Matches:** Match specific characters.
- **Character Classes:** Match any character from a specified set.
- **Quantifiers:** Control the number of occurrences of a character or group.
- **Anchors:** Specify the position in the text.
- **Alternation:** Match one of several alternatives.
- **Groups:** Create subpatterns for quantification or capturing.
- **Escape Sequences:** Match special characters or character classes.
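
Alternation is the one pattern in this list that has no example elsewhere in this guide, so here is a short sketch. The sample sentences are illustrative. Note that `|` has the lowest precedence of any regex operator, which is why the alternatives are usually wrapped in a group.

```python
import re

# Match "cat" or "dog", optionally plural, as whole words only.
animal_pattern = re.compile(r"\b(?:cat|dog)s?\b")
animals = animal_pattern.findall("Two cats, one dog, and a catalog.")
print(animals)  # ['cats', 'dog']

# With a group, the alternation covers only the single letter...
grouped = [bool(re.fullmatch(r"gr(?:a|e)y", word)) for word in ("gray", "grey", "groy")]
print(grouped)  # [True, True, False]

# ...without it, the pattern means "gra" OR "ey".
ungrouped = re.findall(r"gra|ey", "gray grey")
print(ungrouped)  # ['gra', 'ey']
```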

### 💻 Usage in Python
- Python's `re` module provides support for regex operations.
- Common functions include `re.match()`, `re.search()`, `re.findall()`, `re.sub()`, etc.
- Patterns can be compiled into regex objects with `re.compile()` for efficient reuse.
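
The functions named above differ mainly in where they look and what they return. The sketch below compares them on one sample string; the text and variable names are made up for illustration.

```python
import re

text = "Order 66 shipped; order 7 pending."

# re.match() only tries to match at the beginning of the string.
at_start = re.match(r"\d+", text)

# re.search() scans the string and returns the first match.
first_number = re.search(r"\d+", text).group()

# re.findall() returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# re.sub() replaces every match with the given replacement.
masked = re.sub(r"\d+", "#", text)

# re.compile() builds a reusable pattern object; flags are passed once here.
order_pattern = re.compile(r"order (\d+)", re.IGNORECASE)
order_ids = order_pattern.findall(text)

print(at_start)      # None
print(first_number)  # 66
print(all_numbers)   # ['66', '7']
print(masked)        # Order # shipped; order # pending.
print(order_ids)     # ['66', '7']
```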

---

## 💡 Practical Applications

### 🎯 Text Search and Extraction
- Find specific patterns or substrings within text data.
```python
import re

# Search for email addresses in a text document
text = "Contact us at email@example.com or info@example.org for inquiries."
# Note: inside a character class, "|" is a literal pipe, so the TLD class is [A-Za-z]
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
emails_found = re.findall(email_pattern, text)
print("Email addresses found:", emails_found)
```

## ✅ Validation-  
Verify the format or structure of user input
.

In [1]:
import re

# Search for URLs in a text document
text = "Visit our website at https://www.example.com for more information."
url_pattern = r'https?://(?:www\.)?\w+\.\w+'
urls_found = re.findall(url_pattern, text)
print("URLs found:", urls_found)

URLs found: ['https://www.example.com']


```python
import re

# Validate phone number format: "+", a 1-3 digit country code, then 9 more digits
phone_number = "+1234567890"
phone_pattern = r'^\+\d{1,3}\d{9}$'
is_valid = bool(re.match(phone_pattern, phone_number))
print("Is the phone number valid?", is_valid)
```

```
Is the phone number valid? True
```


```python
import re

# Validate password strength: each lookahead requires one character type
# (lowercase, uppercase, digit, special); the final class enforces 8+ allowed characters
password = "MySecurePassword123!"
password_pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$'
is_strong = bool(re.match(password_pattern, password))
print("Is the password strong?", is_strong)
```

```
Is the password strong? True
```


## 🧹 Data Cleaning-  
Remove or replace unwanted characters, normalize text data.

In [4]:
import re

# Remove punctuation from text
text_with_punctuation = "Hello, world! How are you?"
cleaned_text = re.sub(r'[^\w\s]', '', text_with_punctuation)
print("Cleaned text:", cleaned_text)


Cleaned text: Hello world How are you


```python
import re

# Remove HTML tags from text (adequate for simple snippets; use an HTML parser for real documents)
html_text = "<p>This is <b>bold</b> and <i>italic</i>.</p>"
cleaned_text = re.sub(r'<[^>]+>', '', html_text)
print("Cleaned text:", cleaned_text)
```

```
Cleaned text: This is bold and italic.
```


## 🔖 Tokenization-  
Split text into meaningful tokens (words, sentences, etc.).

In [6]:
import re

# Tokenize a sentence into words
sentence = "This is a sample sentence."
words = re.findall(r'\b\w+\b', sentence)
print("Words in the sentence:", words)


Words in the sentence: ['This', 'is', 'a', 'sample', 'sentence']


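```python
import re

# Tokenize a document into sentences.
# The pattern matches the whitespace that follows ., ! or ?, so it must be used
# with re.split(); re.findall() would return only the separators ([' ']).
document = "This is the first sentence. This is the second sentence."
sentences = re.split(r'(?<=[.!?])\s+', document)
print("Sentences in the document:", sentences)
```

```
Sentences in the document: ['This is the first sentence.', 'This is the second sentence.']
```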


## 📊 Data Parsing-  
Extract structured information from unstructured text.

In [9]:
import re

# Parse log files for specific data fields
log_entry = "2022-02-25 12:30:45 - INFO - User logged in: username123"
log_pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - (\w+) - (.+)'
parsed_data = re.match(log_pattern, log_entry)
if parsed_data:
    print("Timestamp:", parsed_data.group(1))
    print("Log level:", parsed_data.group(2))
    print("Message:", parsed_data.group(3))


Timestamp: 2022-02-25 12:30:45
Log level: INFO
Message: User logged in: username123


```python
import re

# Extract fields from simple CSV-formatted text (no quoted or empty fields)
csv_data = "John,Doe,30\nJane,Smith,25\n"
csv_pattern = r'(\w+),(\w+),(\d+)'
matches = re.findall(csv_pattern, csv_data)
print("CSV data:", matches)
```

```
CSV data: [('John', 'Doe', '30'), ('Jane', 'Smith', '25')]
```


# ✅ CODE EXPLANATION

# Password Generation with Constraints

This Python script generates a random password with specific constraints.

---

## Dependencies

- **re**: Provides support for working with regular expressions.
- **secrets**: Generates cryptographically strong random numbers.
- **string**: Contains string constants and functions for manipulating strings.

---

## Functionality

The `generate_password` function generates a password with the following constraints:

- **length**: Length of the password (default: 16).
- **nums**: Minimum number of digits (default: 1).
- **special_chars**: Minimum number of special characters (default: 1).
- **uppercase**: Minimum number of uppercase letters (default: 1).
- **lowercase**: Minimum number of lowercase letters (default: 1).

The password is generated by randomly selecting characters from letters (both uppercase and lowercase), digits, and special characters. The function then ensures that the generated password meets the specified constraints by using regular expressions to count the occurrences of digits, special characters, uppercase letters, and lowercase letters.
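
The script itself is not reproduced here, so the following is only a sketch of how a function matching this description might look. The parameter names and defaults come from the list above; the use of `string.punctuation` as the special-character set, the retry loop, and the `ValueError` guard are assumptions made for illustration.

```python
import re
import secrets
import string


def generate_password(length=16, nums=1, special_chars=1, uppercase=1, lowercase=1):
    # Assumption: reject constraint sets that can never fit in `length` characters.
    if nums + special_chars + uppercase + lowercase > length:
        raise ValueError("constraints cannot be satisfied within the requested length")

    # Assumption: "special characters" means string.punctuation.
    symbols = string.punctuation
    all_characters = string.ascii_letters + string.digits + symbols

    # Each entry pairs a regex with the minimum number of matches required.
    constraints = [
        (r"\d", nums),
        (f"[{re.escape(symbols)}]", special_chars),
        (r"[A-Z]", uppercase),
        (r"[a-z]", lowercase),
    ]

    # Draw random candidates until one satisfies every constraint.
    while True:
        password = "".join(secrets.choice(all_characters) for _ in range(length))
        if all(len(re.findall(pattern, password)) >= minimum
               for pattern, minimum in constraints):
            return password
```

Drawing a fresh candidate on each failure keeps every character position uniformly random, at the cost of an occasional retry when the constraints are tight.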

---

## Usage

To use the script, call the `generate_password` function without any arguments:

```python
new_password = generate_password()
print('Generated password:', new_password)
```
