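# Removing duplicates (each call returns a new DataFrame; df itself is unchanged)
# SQL: SELECT DISTINCT * FROM table
df.drop_duplicates()

# SQL: SELECT DISTINCT column1, column2 FROM table
df[['column1', 'column2']].drop_duplicates()

# Keep all columns, but only the first row for each (column1, column2) pair
df.drop_duplicates(subset=['column1', 'column2'])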

# Conditional columns
# SQL: UPDATE table SET category = CASE WHEN total_amount > 1000 THEN 'High Value' ELSE 'Regular' END
df['category'] = np.where(df['total_amount'] > 1000, 'High Value', 'Regular')

# Multiple conditions (SQL: CASE WHEN ... WHEN ... ELSE)
conditions = [
    df['total_amount'] > 2000,
    df['total_amount'] > 1000,
    df['total_amount'] > 500
]
choices = ['Premium', 'High Value', 'Medium Value']
df['customer_tier'] = np.select(conditions, choices, default='Regular')
# Adds a new column, customer_tier, to the DataFrame.
# Conditions are checked in order and the first match wins,
# so a value that is > 2000 (and therefore also > 1000) is labeled 'Premium'.
# Rows matching no condition (total_amount <= 500 or missing) get 'Regular'.

# Creating new columns from existing ones
df['profit_margin'] = (df['revenue'] - df['cost']) / df['revenue'] * 100
df['full_name'] = df['first_name'] + ' ' + df['last_name']

# Working with dates
df['order_date'] = pd.to_datetime(df['order_date'])
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month
df['day_of_week'] = df['order_date'].dt.day_name()
df['days_since_order'] = (pd.Timestamp.now() - df['order_date']).dt.days

# Binning continuous data (SQL: CASE WHEN with ranges)
df['age_group'] = pd.cut(df['age'], bins=[0, 25, 40, 60, 100],
                         labels=['Young', 'Adult', 'Middle Age', 'Senior'])

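The first-match-wins ordering noted for `np.select` earlier in this cell can be checked on a small DataFrame. The sketch below is only illustrative: the sample values and the `tier_wrong_order` column are made up for the demonstration, and only `total_amount` and `customer_tier` come from the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name total_amount comes from the cell above.
df = pd.DataFrame({'total_amount': [2500, 1500, 750, 500, 100]})

conditions = [
    df['total_amount'] > 2000,
    df['total_amount'] > 1000,
    df['total_amount'] > 500,
]
choices = ['Premium', 'High Value', 'Medium Value']

# Ordered from most to least restrictive, so each row gets its highest tier.
df['customer_tier'] = np.select(conditions, choices, default='Regular')

# Reversed order: the broad "> 500" test is checked first and shadows the stricter ones.
df['tier_wrong_order'] = np.select(conditions[::-1], choices[::-1], default='Regular')

print(df)
```

With the conditions reversed, every amount above 500 ends up as `'Medium Value'`, which is why the most restrictive condition should usually come first.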

### Binning Continuous Data with `pd.cut()`

This example demonstrates how to convert continuous numeric data (like age) into categorical bins:

```python
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 25, 40, 60, 100],
                         labels=['Young', 'Adult', 'Middle Age', 'Senior'])
```
### 🔍 How it Works

- `df['age']`: The numeric column to bin.
- `bins=[0, 25, 40, 60, 100]`: Defines interval edges (exclusive on left, inclusive on right):
  - `(0, 25]` → `'Young'`
  - `(25, 40]` → `'Adult'`
  - `(40, 60]` → `'Middle Age'`
  - `(60, 100]` → `'Senior'`
- `labels=[...]`: Category labels for each bin.
- A new column `age_group` is added with these labels.

> ℹ️ Default bin behavior is **right-inclusive**: a value exactly on a bin edge falls into the **lower bin** (e.g., `25` → `'Young'`, `26` → `'Adult'`). Values outside `(0, 100]`, including `0` itself, become `NaN`.

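The edge behavior described above can be verified with a few values placed on or near the bin edges. This is a minimal sketch: the `ages` series and the variable names are hypothetical, while `right` and `include_lowest` are standard `pd.cut()` parameters.

```python
import pandas as pd

# Hypothetical values chosen to sit on or just past the bin edges.
ages = pd.Series([0, 25, 26, 100, 101])
bins = [0, 25, 40, 60, 100]
labels = ['Young', 'Adult', 'Middle Age', 'Senior']

# Default (right=True): intervals are (0, 25], (25, 40], (40, 60], (60, 100].
default_cut = pd.cut(ages, bins=bins, labels=labels)

# include_lowest=True also captures the lowest edge, so 0 falls into 'Young'.
with_lowest = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

# right=False switches to left-inclusive intervals: [0, 25), [25, 40), [40, 60), [60, 100).
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'default': default_cut,
                    'include_lowest': with_lowest, 'right=False': left_closed}))
```

With the defaults, both `0` and `101` come back as `NaN`. With `right=False`, `25` moves up to `'Adult'` and `100` falls outside the last interval.
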
### Why use `pd.cut()`?

| Use Case                        | Why                                      |
| ------------------------------- | ---------------------------------------- |
| Turn numeric ranges into labels | Useful for reporting, modeling, grouping |
| Easy interval definition        | Cleaner than multiple `np.where()` calls |

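As an illustration of the grouping use case in the table, the binned column can serve directly as a `GROUP BY` key. The sketch below assumes a made-up DataFrame; the column names mirror those used earlier, and `summary` is a hypothetical variable name.

```python
import pandas as pd

# Hypothetical data: column names follow the earlier examples, values are made up.
df = pd.DataFrame({
    'age': [22, 31, 38, 45, 67],
    'total_amount': [100.0, 250.0, 350.0, 400.0, 900.0],
})
df['age_group'] = pd.cut(df['age'], bins=[0, 25, 40, 60, 100],
                         labels=['Young', 'Adult', 'Middle Age', 'Senior'])

# Roughly: SELECT age_group, COUNT(*), AVG(total_amount) FROM table GROUP BY age_group
# observed=True keeps only the categories that actually occur in the data.
summary = (df.groupby('age_group', observed=True)['total_amount']
             .agg(['count', 'mean']))
print(summary)
```

Because `pd.cut()` returns an ordered categorical, the groups come out in bin order rather than alphabetical order.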
