# Python Regex & Datetime Cheatsheet

---

## Regular Expressions (Regex)

### Common Regex Metacharacters
| Symbol | Meaning |
|--------|-------------------------------|
| `.`    | Any character except newline |
| `^`    | Start of string              |
| `$`    | End of string                |
| `*`    | 0 or more                    |
| `+`    | 1 or more                    |
| `?`    | 0 or 1 (optional)            |
| `{n}`  | Exactly n times              |
| `{n,}` | n or more times              |
| `{n,m}`| Between n and m times        |
| `[]`   | Character set / class        |
| `[^]`  | Negated set                  |
| `\d`   | Digit (0–9)                  |
| `\D`   | Non-digit                    |
| `\w`   | Word character (a-zA-Z0-9_, plus Unicode letters and digits in `str` patterns) |
| `\W`   | Non-word character           |
| `\s`   | Whitespace (space, tab, etc) |
| `\S`   | Non-whitespace               |
| `\b`   | Word boundary                |
| `()`   | Grouping                     |
| `\|`   | OR (alternation)             |
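
A few of the metacharacters above in action. This is a minimal sketch; the sample sentence and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, used only for this illustration
text = "Order A-102 shipped 3 items; order B-7 shipped 12 items."

# \d+ matches runs of one or more digits
numbers = re.findall(r"\d+", text)

# \b[A-Z]-\d{1,3}\b matches a capital letter, a hyphen, then 1 to 3 digits
order_ids = re.findall(r"\b[A-Z]-\d{1,3}\b", text)

# With one capture group, findall returns only the captured part
item_counts = re.findall(r"(\d+) items", text)

# ^ and $ anchor the pattern to the start and end of the string
starts_with_order = re.search(r"^Order", text) is not None
ends_with_period = re.search(r"\.$", text) is not None

print(numbers)      # ['102', '3', '7', '12']
print(order_ids)    # ['A-102', 'B-7']
print(item_counts)  # ['3', '12']
```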

---

## Common Regex Patterns (Reusable Snippets)

### Email Regex (permissive but practical)
```regex
\b[\w\.-]+@[\w\.-]+\.\w{2,}\b
```

### Phone Numbers (Various Formats)
```regex
(?<!\w)(?:\+1\s?)?(?:\(?\d{3}\)?[\s.-]?)\d{3}[\s.-]?\d{4}\b
```

### Phone Numbers (Strict Format Only — `XXX-XXX-XXXX`)
```regex
\b\d{3}-\d{3}-\d{4}\b
```
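
When validating a single field rather than searching free text, `re.fullmatch` is one way to make sure the whole value fits the strict format. The sketch below assumes a hypothetical helper name (`is_strict_phone`) and made-up sample values.

```python
import re

# Compile once and reuse; fullmatch makes the \b anchors unnecessary
STRICT_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_strict_phone(value: str) -> bool:
    """Hypothetical helper: True only if the entire string is XXX-XXX-XXXX."""
    return STRICT_PHONE.fullmatch(value) is not None

candidates = ["555-123-4567", "5551234567", "555-123-45678"]
results = {c: is_strict_phone(c) for c in candidates}

print(results)
# {'555-123-4567': True, '5551234567': False, '555-123-45678': False}
```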

### Product ID (Alphanumeric with optional hyphens)
```regex
^[a-zA-Z0-9\-]+$
```

### Date Matching (Multiple Styles)
```regex
\b(?:\d{4}[-/]\d{2}[-/]\d{2}|\d{2}[-/]\d{2}[-/]\d{4}|\d{4}\d{2}\d{2})\b
```

---

## Regex Utility Functions (Python)

### Validate Email
```python
import re

def validate_email(email: str) -> bool:
    # fullmatch rejects trailing junk that re.match would let through
    pattern = r"\b[\w\.-]+@[\w\.-]+\.\w{2,}\b"
    return re.fullmatch(pattern, email) is not None
```

### Standardize Phone Number
```python
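import re
from typing import Optional

def standardize_phone(phone: str) -> Optional[str]:
    # Keep only the digits, dropping spaces, dots, dashes, and parentheses
    number = re.sub(r"\D", "", phone)

    # Drop a leading US country code
    if number.startswith("1") and len(number) == 11:
        number = number[1:]

    # Return None when the input is not a 10-digit number
    return f"{number[:3]}-{number[3:6]}-{number[6:]}" if len(number) == 10 else None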
```

### Validate Product ID
```python
def is_valid_product_id(product_id: str) -> bool:
    if not product_id:
        return False
    # fullmatch checks the whole string; "$" with re.match would accept a trailing newline
    pattern = r"[a-zA-Z0-9\-]+"
    return re.fullmatch(pattern, product_id) is not None
```

### Extract Dates from Text
```python
def extract_dates(text: str):
    pattern = r"\b(?:\d{4}[-/]\d{2}[-/]\d{2}|\d{2}[-/]\d{2}[-/]\d{4}|\d{4}\d{2}\d{2})\b"
    return re.findall(pattern, text)
```
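
Note that the date pattern checks shape only, not calendar validity. A rough sketch of pairing it with `datetime.strptime` follows; the sample text and the `is_real_date` helper are assumptions for illustration.

```python
import re
from datetime import datetime

DATE_PATTERN = r"\b(?:\d{4}[-/]\d{2}[-/]\d{2}|\d{2}[-/]\d{2}[-/]\d{4}|\d{4}\d{2}\d{2})\b"

# Hypothetical sample text
text = "Ordered 2025-03-19, shipped 03/21/2025, invoice ref 20250322."
found = re.findall(DATE_PATTERN, text)

# The regex happily accepts an impossible date because it only counts digits
shape_only = re.findall(DATE_PATTERN, "2025-13-45")

def is_real_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Hypothetical helper: let strptime decide whether the date exists."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

print(found)                       # ['2025-03-19', '03/21/2025', '20250322']
print(shape_only)                  # ['2025-13-45']
print(is_real_date("2025-13-45"))  # False
```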

---

## Python Datetime Handling

### Get Current Date/Time
```python
from datetime import datetime, timedelta, date

now = datetime.now()
today = date.today()
```

### Format Datetime → String
```python
now.strftime("%Y-%m-%d %H:%M:%S")
```

### Parse String → Datetime
```python
datetime.strptime("2025-03-19 12:30", "%Y-%m-%d %H:%M")
```
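
A short sketch of what to expect from parsing: the format string must match the input exactly, otherwise `strptime` raises `ValueError`. The variable names are illustrative only.

```python
from datetime import datetime

parsed = datetime.strptime("2025-03-19 12:30", "%Y-%m-%d %H:%M")

# Formatting with the same codes gives back the original string
round_trip = parsed.strftime("%Y-%m-%d %H:%M")

# ISO 8601 strings can also be parsed without a format string
iso = datetime.fromisoformat("2025-03-19T12:30:00")

# A mismatched format raises ValueError instead of returning None
try:
    datetime.strptime("19/03/2025", "%Y-%m-%d")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(round_trip)       # 2025-03-19 12:30
print(iso == parsed)    # True
print(mismatch_raised)  # True
```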

### Date Arithmetic
```python
delta = timedelta(days=3)
future = now + delta
past = now - timedelta(days=7)
```
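
Subtracting two datetimes yields a `timedelta`, which is often what you want for durations. A minimal sketch with made-up timestamps:

```python
from datetime import datetime, timedelta

# Hypothetical start and end times
start = datetime(2025, 3, 1, 9, 0)
end = datetime(2025, 3, 19, 17, 30)

elapsed = end - start                       # a timedelta
whole_days = elapsed.days                   # full days only
total_hours = elapsed.total_seconds() / 3600

# Adding a timedelta handles month boundaries for you
rolled_over = datetime(2025, 2, 27) + timedelta(days=3)

print(whole_days)          # 18
print(total_hours)         # 440.5
print(rolled_over.date())  # 2025-03-02
```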

### Extract Parts
```python
now.year, now.month, now.day
now.hour, now.minute, now.second
```

### Common Format Codes
| Code | Meaning         | Example     |
|------|------------------|-------------|
| `%Y` | Full year        | 2025        |
| `%m` | Month (01–12)    | 03          |
| `%d` | Day of month     | 19          |
| `%H` | Hour (24hr)      | 14          |
| `%M` | Minute           | 30          |
| `%S` | Second           | 45          |
| `%A` | Weekday name     | Wednesday   |

---

## Pandas Datetime Handling

### Convert Column to Datetime
```python
import pandas as pd
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
```
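
With `errors="coerce"`, values that cannot be parsed become `NaT` instead of raising. A small sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical sample data with one junk value and one missing value
df = pd.DataFrame({"order_date": ["2025-03-01", "2025-03-05", "not a date", None]})

df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# Unparseable and missing entries are both NaT now
bad_rows = int(df["order_date"].isna().sum())

print(df["order_date"].dtype)
print(bad_rows)  # 2
```

Counting `NaT` right after conversion is a quick way to see how many rows failed to parse.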

### Extract Date Parts
```python
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["day"] = df["order_date"].dt.day
df["weekday"] = df["order_date"].dt.day_name()
```

### Generate Date Ranges
```python
pd.date_range(start="2025-03-01", end="2025-03-10", freq="D")
```

### Filter by Date
```python
df[df["order_date"] >= "2025-03-01"]
```
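
For a bounded window, two comparisons combined with `&` work well. This sketch assumes a made-up frame and treats the upper bound as exclusive.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2025-02-27", "2025-03-01", "2025-03-10", "2025-03-11"]),
    "amount": [10, 20, 30, 40],
})

start = pd.Timestamp("2025-03-01")
end = pd.Timestamp("2025-03-11")  # exclusive upper bound

# Parentheses are required because & binds tighter than the comparisons
in_window = df[(df["order_date"] >= start) & (df["order_date"] < end)]

print(in_window["amount"].tolist())  # [20, 30]
```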

### Group by Date
```python
df.groupby(df["order_date"].dt.month)["amount"].sum()
```
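
Grouping by `.dt.month` alone merges the same month from different years. If that is not intended, grouping by a year-month period is one option, sketched here on made-up data:

```python
import pandas as pd

# Hypothetical sample data spanning two years
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-03-10", "2025-03-02", "2025-03-15", "2025-04-01"]),
    "amount": [100.0, 50.0, 25.0, 10.0],
})

# Month number only: March 2024 and March 2025 land in the same bucket
by_month_number = df.groupby(df["order_date"].dt.month)["amount"].sum()

# Year-month period: each calendar month stays separate
by_year_month = df.groupby(df["order_date"].dt.to_period("M"))["amount"].sum()

print(by_month_number.tolist())  # [175.0, 10.0]
print(by_year_month.tolist())    # [100.0, 75.0, 10.0]
```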

---

## Custom Multi-Format Date Parser
```python
from datetime import datetime

def parse_date(date_str: str):
    date_formats = [
        "%Y-%m-%d",
        "%m/%d/%Y",
        "%d-%m-%Y",
        "%Y%m%d",
        "%b %d, %Y",
        "%Y-%m-%dT%H:%M:%S"
    ]
    for fmt in date_formats:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None
```
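
The first format that parses wins, so order matters if two formats can both accept the same string. The sketch below uses a hypothetical `try_formats` helper and a `%d/%m/%Y` format that is not in the list above, purely to show the effect.

```python
from datetime import datetime
from typing import List, Optional

def try_formats(date_str: str, formats: List[str]) -> Optional[datetime]:
    """Hypothetical helper: return the first successful parse, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None

# The same string parses differently depending on which format comes first
us_first = try_formats("01/02/2025", ["%m/%d/%Y", "%d/%m/%Y"])
eu_first = try_formats("01/02/2025", ["%d/%m/%Y", "%m/%d/%Y"])
no_match = try_formats("March 2025", ["%m/%d/%Y", "%d/%m/%Y"])

print(us_first.date())  # 2025-01-02
print(eu_first.date())  # 2025-02-01
print(no_match)         # None
```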

---

## Useful Pandas Checks
```python
df.isnull().sum()                # Missing values (NaN/NaT) per column
df.describe()                    # Summary stats for numeric columns
df["order_date"].value_counts()  # Row count per distinct date
```

---

## Final Data Cleaning Summary
- Clean `email` using regex
- Standardize `phone_number` formats
- Validate `product_id` format
- Use `pd.to_datetime(errors="coerce")` to parse dates (pandas 2.0+ needs `format="mixed"` when one column mixes formats)
- Convert `amount` with `pd.to_numeric(errors='coerce')`
- Filter rows with invalid or missing data
- Group by `region`, `order_date`, etc. for analysis
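
The steps above might be strung together roughly as follows. This is a sketch on made-up data, not a complete pipeline: it skips phone standardization, and the frame, the `email_ok` / `product_ok` columns, and the compiled pattern names are all assumptions for illustration.

```python
import re
import pandas as pd

EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w{2,}")
PRODUCT_RE = re.compile(r"[a-zA-Z0-9-]+")

# Hypothetical raw data with one problem per bad row
raw = pd.DataFrame({
    "email": ["ana@example.com", "bad-email", "li@example.org"],
    "product_id": ["AB-100", "AB 200", "CD-300"],
    "amount": ["19.99", "oops", "5"],
    "order_date": ["2025-03-01", "2025-03-02", "not a date"],
    "region": ["east", "west", "east"],
})

df = raw.copy()

# Flag rows whose email and product ID fit the expected shape
df["email_ok"] = df["email"].map(lambda v: EMAIL_RE.fullmatch(v) is not None)
df["product_ok"] = df["product_id"].map(lambda v: PRODUCT_RE.fullmatch(v) is not None)

# Coerce bad numbers and dates to NaN / NaT
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# Keep only rows that pass every check, then aggregate
clean = df[df["email_ok"] & df["product_ok"]].dropna(subset=["amount", "order_date"])
totals = clean.groupby("region")["amount"].sum()

print(len(clean))  # 1
print(totals)
```

Only the first row survives here: the second fails the email and product checks, and the third has an unparseable date.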

---

🚀 **Use this markdown as a study guide, or drop it directly into your code repo for quick reference.**
