# Regular Expressions Basics

In this notebook, we will learn about regular expressions (regex) and how they are useful for text processing tasks.

Let's explore how regex can help us identify patterns in text, validate data like emails and phone numbers, and perform text cleaning tasks.

## Common Regex Patterns

A regular expression is a pattern built from literal characters and special sequences. Here are some common building blocks:

- `\d`: Matches any digit (0-9)
- `\w`: Matches word characters (letters, digits, underscore)
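- `\s`: Matches whitespace characters (spaces, tabs, newlines)
- `+`: One or more occurrences of the preceding element
- `*`: Zero or more occurrences of the preceding element
- `{m,n}`: Between m and n occurrences of the preceding element (`{3}` means exactly three)
- `[...]`: Any one character from the set, e.g. `[A-Za-z]`
- `\b`: A word boundary (matches a position between characters, not a character itself)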
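A quick sketch of these building blocks using `re.findall` and `re.split`. The sample strings are made up for illustration and are not data from this notebook.

```python
import re

sample = "Order 42 shipped on 2024-05-01 to user_7"

# \d+ : runs of one or more digits
digits = re.findall(r'\d+', sample)
print(digits)  # ['42', '2024', '05', '01', '7']

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r'\w+', sample)
print(words)   # ['Order', '42', 'shipped', 'on', '2024', '05', '01', 'to', 'user_7']

# \s+ : runs of whitespace, used here as a separator
tokens = re.split(r'\s+', sample)
print(tokens)  # ['Order', '42', 'shipped', 'on', '2024-05-01', 'to', 'user_7']

# * allows zero occurrences of 'b', + requires at least one
star = re.findall(r'ab*', "a ab abbb")
plus = re.findall(r'ab+', "a ab abbb")
print(star, plus)  # ['a', 'ab', 'abbb'] ['ab', 'abbb']

# {2,4} : between two and four digits, as in the phone pattern later on
date_parts = re.findall(r'\d{2,4}', "2024-05-01")
print(date_parts)  # ['2024', '05', '01']
```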

## String Encoding & Decoding

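Encoding converts a text string (`str`) into raw bytes (`bytes`) using a character encoding such as UTF-8. This is necessary whenever text leaves Python as bytes, for example when writing binary files or sending data to web APIs.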

- `str.encode()`: Converts a string to bytes
- `bytes.decode()`: Converts bytes back to a string

Both sides must agree on the encoding: bytes decoded with a different encoding than the one used to create them come out as garbled text or raise a `UnicodeDecodeError`.
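A small sketch of what can go wrong in the round trip. It assumes cp1252 (a common Windows encoding) as the mismatched codec; a different wrong codec would produce different garbage or an error. Note that one emoji is a single character but four UTF-8 bytes.

```python
message = "Hello AI! 🤖"
data = message.encode('utf-8')

# Characters vs. bytes: the emoji is one character but four UTF-8 bytes
print(len(message), len(data))  # 11 14

# Decoding with the SAME encoding restores the original text
restored = data.decode('utf-8')
print(restored == message)  # True

# Decoding with a DIFFERENT encoding (cp1252 here) silently yields mojibake
garbled = data.decode('cp1252')
print(garbled)  # Hello AI! ðŸ¤–

# Decoding incomplete UTF-8 bytes raises UnicodeDecodeError by default
try:
    data[:-1].decode('utf-8')
    truncated_ok = True
except UnicodeDecodeError as exc:
    truncated_ok = False
    print("Decode failed:", exc.reason)
```

The mismatched decode is the sneakier failure: no exception is raised, and the damage only shows up later as strange characters in your data.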

```python
import re

# Regular expressions for AI text processing
text = "Contact us at support@ai-company.com or call +1-555-123-4567"

# Find email addresses
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
emails = re.findall(email_pattern, text)
print("Emails found:", emails)  # ['support@ai-company.com']

# Find phone numbers in the +<country code>-XXX-XXX-XXXX format
phone_pattern = r'\+\d{1,3}-\d{3}-\d{3}-\d{4}'
phones = re.findall(phone_pattern, text)
print("Phones found:", phones)  # ['+1-555-123-4567']

# String encoding/decoding
message = "Hello AI! 🤖"
encoded = message.encode('utf-8')
print("Encoded:", encoded)      # b'Hello AI! \xf0\x9f\xa4\x96'
decoded = encoded.decode('utf-8')
print("Decoded:", decoded)      # Hello AI! 🤖

# AI use case: Clean social media text
social_text = "Check out this amazing #AI tool! 🚀 Visit https://example.com @user123"
# Remove URLs
clean_text = re.sub(r'https?://[^\s]+', '', social_text)
# Remove mentions
clean_text = re.sub(r'@\w+', '', clean_text)
# Collapse the leftover runs of spaces and trim the ends
clean_text = re.sub(r'\s+', ' ', clean_text).strip()
print("Cleaned:", clean_text)   # Check out this amazing #AI tool! 🚀 Visit
```
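The social media cleaning step above throws away URLs and mentions. For AI analysis you may want to keep them as features first and clean afterwards. Below is a minimal sketch of that idea; `preprocess_post` and the pattern names are hypothetical, and the choice to keep `#AI` in the text is an assumption you may want to change for your task.

```python
import re

# Compiled patterns (names are illustrative, not a library API)
URL_RE = re.compile(r'https?://\S+')
MENTION_RE = re.compile(r'@\w+')
HASHTAG_RE = re.compile(r'#(\w+)')  # the group captures the tag without '#'

def preprocess_post(post: str) -> dict:
    """Hypothetical helper: pull out features first, then clean the text."""
    features = {
        "hashtags": HASHTAG_RE.findall(post),  # findall returns only the group
        "mentions": MENTION_RE.findall(post),
        "urls": URL_RE.findall(post),
    }
    text = URL_RE.sub('', post)
    text = MENTION_RE.sub('', text)
    text = re.sub(r'\s+', ' ', text).strip()  # collapse leftover gaps
    return {"text": text, **features}

result = preprocess_post(
    "Check out this amazing #AI tool! 🚀 Visit https://example.com @user123"
)
print(result["text"])      # Check out this amazing #AI tool! 🚀 Visit
print(result["hashtags"])  # ['AI']
print(result["mentions"])  # ['@user123']
print(result["urls"])      # ['https://example.com']
```

Compiling the patterns once with `re.compile` is a convenience when the same patterns run over many posts; the module-level `re.sub` and `re.findall` calls used earlier work just as well for one-off use.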

## Key Takeaway

Regular expressions are **power tools** for text processing. They let you extract, validate, and clean textual data efficiently, which is especially useful when preparing data for AI and data analysis tasks.
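The introduction mentions validating emails and phone numbers, which asks a different question than extraction: does the *whole* input have the expected shape? A sketch using `re.fullmatch` with the same patterns as the extraction cell (minus the `\b` anchors, since `fullmatch` already requires the entire string to match). The helper names are hypothetical, and these patterns are practical approximations: they do not accept every address allowed by the email standard (RFC 5322) or every phone format.

```python
import re

# Same patterns as the extraction cell, minus the \b anchors:
# fullmatch already requires the entire string to match.
EMAIL_PATTERN = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
PHONE_PATTERN = r'\+\d{1,3}-\d{3}-\d{3}-\d{4}'

def is_valid_email(value: str) -> bool:
    return re.fullmatch(EMAIL_PATTERN, value) is not None

def is_valid_phone(value: str) -> bool:
    return re.fullmatch(PHONE_PATTERN, value) is not None

print(is_valid_email("support@ai-company.com"))         # True
print(is_valid_email("email: support@ai-company.com"))  # False: extra text
print(is_valid_phone("+1-555-123-4567"))                # True
print(is_valid_phone("555-123-4567"))                   # False: no country code
```

`re.findall` or `re.search` would report an email inside `"email: support@ai-company.com"`, which is right for extraction but wrong for validating a form field.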

## Discussion

"How would regex help in preprocessing social media data for AI analysis?"