### Day 2 Exercise: Data Enrichment & Basic Fraud Rule Implementation 🚀

#### Objective
This exercise builds on the cleaned data from Day 1. You will enrich transaction data with customer information, implement basic rule-based fraud detection logic using Python and Pandas, and design, at a conceptual level, the target database schema and the initial ETL steps.

#### Scenario
Continuing with the "Real-time Transaction Fraud Detection System," you now need to combine the cleaned customer and transaction data. Based on this enriched dataset, you will apply a simple fraud detection rule and prepare the data for storage.

#### Data to Use 📊
You will use the cleaned Pandas DataFrames, `df_customers` and `df_transactions`, resulting from the Day 1 exercise.

---

### Part 1: Python/Pandas Implementation 💻

#### 1.1 Data Enrichment
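* **Merge** `df_transactions` with `df_customers` on `customer_id`, using a left join so that transactions without a matching customer are kept rather than silently dropped (see Edge Case 1).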
* The merged DataFrame should include relevant customer details such as `customer_name`, `customer_tier`, and `registration_date`.
* Name the resulting DataFrame `df_enriched_transactions`.
* **Print** the first 5 rows and the `.info()` of `df_enriched_transactions`.
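A minimal sketch of step 1.1, assuming the Day 1 frames use the column names listed above. The two small DataFrames are hypothetical stand-ins (made-up IDs, names, and dates) so the snippet runs on its own; swap in your cleaned Day 1 data. The left merge keeps the orphaned `C999` transaction visible, and `validate="many_to_one"` is an optional safety check that raises if `customer_id` is duplicated in `df_customers`.

```python
import pandas as pd

# Hypothetical stand-ins for the cleaned Day 1 DataFrames.
df_customers = pd.DataFrame({
    "customer_id": ["C001", "C002"],
    "customer_name": ["Alice", "Bob"],
    "customer_tier": ["Gold", "Silver"],
    "registration_date": pd.to_datetime(["2024-01-10", "2023-06-01"]),
})
df_transactions = pd.DataFrame({
    "transaction_id": ["T1", "T2", "T3"],
    "customer_id": ["C001", "C002", "C999"],
    "amount": [750.0, 120.0, 900.0],
    "timestamp": pd.to_datetime(["2024-01-12 10:00", "2024-01-12 11:30", "2024-01-13 09:15"]),
})

# Left merge: every transaction survives; unmatched customer_ids get NaN/NaT customer fields.
df_enriched_transactions = df_transactions.merge(
    df_customers[["customer_id", "customer_name", "customer_tier", "registration_date"]],
    on="customer_id",
    how="left",
    validate="many_to_one",  # raises MergeError if df_customers has duplicate customer_ids
)

print(df_enriched_transactions.head(5))
df_enriched_transactions.info()  # .info() prints directly and returns None
```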

#### 1.2 Basic Fraud Rule Implementation
* You will implement "**Rule 1: High Transaction Value for New Customers**".
* Define a "**new customer**" as a customer whose `registration_date` falls within the 7 days up to and including the latest `timestamp` in `df_transactions`; use that timestamp, not the current date, as the reference point.
* Define a "**high transaction value**" as an `amount` greater than $500.
* Create a new boolean column, `is_fraudulent_rule1`, in `df_enriched_transactions` that is `True` if both conditions are met, and `False` otherwise.
* **Count and print** the number of transactions flagged by this rule.
* **Display** the `transaction_id`, `customer_id`, `amount`, `timestamp`, and `is_fraudulent_rule1` for the flagged transactions.
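One way to express Rule 1 with vectorized boolean masks, sketched under a few assumptions: the sample rows are hypothetical, `registration_date` and `timestamp` are already `datetime64` columns after Day 1 cleaning, and the 7-day window is inclusive at both ends (the default for `Series.between`).

```python
import pandas as pd

# Hypothetical enriched data; in the exercise this comes from step 1.1.
df_enriched_transactions = pd.DataFrame({
    "transaction_id": ["T1", "T2", "T3", "T4"],
    "customer_id": ["C001", "C002", "C001", "C999"],
    "amount": [750.0, 620.0, 300.0, 900.0],
    "timestamp": pd.to_datetime(["2024-01-12 10:00", "2024-01-12 11:30",
                                 "2024-01-13 09:15", "2024-01-13 12:00"]),
    "registration_date": pd.to_datetime(["2024-01-10", "2023-06-01",
                                         "2024-01-10", pd.NaT]),  # NaT: unmatched customer
})

# Reference point: the latest transaction timestamp, not the wall clock.
reference_time = df_enriched_transactions["timestamp"].max()
window_start = reference_time - pd.Timedelta(days=7)

# New customer: registered inside [window_start, reference_time].
# Comparisons against NaT evaluate to False, so unmatched customers are never flagged.
is_new_customer = df_enriched_transactions["registration_date"].between(window_start, reference_time)
is_high_value = df_enriched_transactions["amount"] > 500

df_enriched_transactions["is_fraudulent_rule1"] = is_new_customer & is_high_value

flagged = df_enriched_transactions[df_enriched_transactions["is_fraudulent_rule1"]]
print(f"Transactions flagged by Rule 1: {len(flagged)}")
print(flagged[["transaction_id", "customer_id", "amount", "timestamp", "is_fraudulent_rule1"]])
```

Because `registration_date` is a date while `reference_time` carries a time of day, a customer who registered exactly 7 calendar days earlier falls just outside the window here; using `reference_time.normalize()` instead is a reasonable alternative if you prefer calendar-day semantics.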

---

### Part 2: Conceptual Questions 🤔

#### 2.1 Conceptual ETL Pipeline Design & SQL Considerations

Based on the `df_enriched_transactions` DataFrame, describe your design for the following.

* **Target Database Schema** 🗃️:
    * `dim_customer` table with columns: `customer_sk` (Surrogate Key, INTEGER, Primary Key), `customer_id` (VARCHAR, Unique), `customer_name` (VARCHAR), `customer_email` (VARCHAR), `registration_date` (DATE), `customer_tier` (VARCHAR), `last_login_date` (DATE).
    * `fact_transactions` table with columns: `transaction_sk` (Surrogate Key, INTEGER, Primary Key), `transaction_id` (VARCHAR, Unique), `customer_sk` (INTEGER, Foreign Key to `dim_customer`), `amount` (DECIMAL), `timestamp` (DATETIME), `currency` (VARCHAR), `ip_address` (VARCHAR), `transaction_hour` (INTEGER), `is_fraudulent_rule1` (BOOLEAN).
    * `fact_fraud_events` table for flagged transactions with columns: `fraud_event_sk` (Surrogate Key, INTEGER, Primary Key), `transaction_sk` (INTEGER, Foreign Key to `fact_transactions`), `rule_id` (VARCHAR), `detection_timestamp` (DATETIME), `severity` (VARCHAR).
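The schema above could be written as DDL roughly like this. The sketch uses SQLite through Python's built-in `sqlite3` module so it runs anywhere; the VARCHAR lengths, the `NOT NULL` constraints, and the composite `UNIQUE (transaction_sk, rule_id)` on `fact_fraud_events` are assumptions, not part of the specification. SQLite treats `BOOLEAN`, `DECIMAL`, and `DATETIME` as type affinities rather than strict types, so a production warehouse would enforce them more tightly. In SQLite, a column declared `INTEGER PRIMARY KEY` gets its surrogate values assigned automatically.

```python
import sqlite3

# SQLite-flavoured DDL for the star schema (lengths and extra constraints are assumptions).
DDL = """
CREATE TABLE dim_customer (
    customer_sk        INTEGER PRIMARY KEY,           -- surrogate key, auto-assigned
    customer_id        VARCHAR(20) NOT NULL UNIQUE,   -- business (natural) key
    customer_name      VARCHAR(100),
    customer_email     VARCHAR(255),
    registration_date  DATE,
    customer_tier      VARCHAR(20),
    last_login_date    DATE
);

CREATE TABLE fact_transactions (
    transaction_sk      INTEGER PRIMARY KEY,
    transaction_id      VARCHAR(40) NOT NULL UNIQUE,
    customer_sk         INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
    amount              DECIMAL(12, 2),
    timestamp           DATETIME,
    currency            VARCHAR(3),
    ip_address          VARCHAR(45),                  -- 45 chars fits an IPv6 address
    transaction_hour    INTEGER,
    is_fraudulent_rule1 BOOLEAN
);

CREATE TABLE fact_fraud_events (
    fraud_event_sk      INTEGER PRIMARY KEY,
    transaction_sk      INTEGER NOT NULL REFERENCES fact_transactions (transaction_sk),
    rule_id             VARCHAR(20) NOT NULL,
    detection_timestamp DATETIME NOT NULL,
    severity            VARCHAR(10),
    UNIQUE (transaction_sk, rule_id)                  -- one event per rule per transaction
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript(DDL)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```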

* **ETL Steps** ⚙️ (Describe in Markdown or with pseudocode):
    * **Extraction**: How would you conceptually extract the `df_enriched_transactions` data for loading?
    * **Transformation (`dim_customer`)**: How would you manage new vs. existing customers in the `dim_customer` table (e.g., INSERT vs. UPDATE)? What SQL operations would be needed?
    * **Transformation (`fact_transactions`)**: How would you look up the `customer_sk` from `dim_customer`? What SQL operations would be required for this table?
    * **Transformation (`fact_fraud_events`)**: How would you select only the flagged transactions to populate this table, including the `rule_id` and `detection_timestamp`?
    * **Loading**: How would you conceptually load the transformed data into the target tables?
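A compact sketch of these steps, again using `sqlite3`, with `pandas.DataFrame.to_sql` writing the enriched frame into a staging table. The staging table name `stg_enriched`, the `'RULE_1'` rule ID, the `'HIGH'` severity, and the sample rows are hypothetical, and the schema is trimmed to the columns the sample carries. The `INSERT ... ON CONFLICT` upsert syntax needs SQLite 3.24 or newer; other databases offer `MERGE` or their own upsert syntax for the same job.

```python
import sqlite3
from datetime import datetime, timezone

import pandas as pd

conn = sqlite3.connect(":memory:")
# Trimmed target schema (only the columns the sample data carries).
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY, customer_id TEXT NOT NULL UNIQUE,
    customer_name TEXT, customer_tier TEXT, registration_date TEXT);
CREATE TABLE fact_transactions (
    transaction_sk INTEGER PRIMARY KEY, transaction_id TEXT NOT NULL UNIQUE,
    customer_sk INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
    amount REAL, timestamp TEXT, is_fraudulent_rule1 INTEGER);
CREATE TABLE fact_fraud_events (
    fraud_event_sk INTEGER PRIMARY KEY,
    transaction_sk INTEGER NOT NULL REFERENCES fact_transactions (transaction_sk),
    rule_id TEXT NOT NULL, detection_timestamp TEXT NOT NULL, severity TEXT,
    UNIQUE (transaction_sk, rule_id));
""")

# Extraction: hypothetical enriched rows (C999 has no customer record) land in staging.
df_enriched_transactions = pd.DataFrame({
    "transaction_id": ["T1", "T2", "T3", "T4"],
    "customer_id": ["C001", "C002", "C001", "C999"],
    "customer_name": ["Alice", "Bob", "Alice", None],
    "customer_tier": ["Gold", "Silver", "Gold", None],
    "registration_date": ["2024-01-10", "2023-06-01", "2024-01-10", None],
    "amount": [750.0, 620.0, 300.0, 900.0],
    "timestamp": ["2024-01-12 10:00:00", "2024-01-12 11:30:00",
                  "2024-01-13 09:15:00", "2024-01-13 12:00:00"],
    "is_fraudulent_rule1": [True, False, False, False],
})
df_enriched_transactions.to_sql("stg_enriched", conn, index=False)

# dim_customer: upsert on the business key (new IDs inserted, existing rows overwritten).
# Rows with no customer match are skipped here. SQLite needs a WHERE clause on
# INSERT ... SELECT ... ON CONFLICT to avoid a parsing ambiguity.
conn.execute("""
    INSERT INTO dim_customer (customer_id, customer_name, customer_tier, registration_date)
    SELECT DISTINCT customer_id, customer_name, customer_tier, registration_date
    FROM stg_enriched
    WHERE customer_name IS NOT NULL
    ON CONFLICT (customer_id) DO UPDATE SET
        customer_name = excluded.customer_name,
        customer_tier = excluded.customer_tier
""")

# fact_transactions: swap the business key for the surrogate key via a join.
# The inner join leaves out orphaned transactions, and DO NOTHING makes reruns idempotent.
conn.execute("""
    INSERT INTO fact_transactions (transaction_id, customer_sk, amount, timestamp, is_fraudulent_rule1)
    SELECT s.transaction_id, d.customer_sk, s.amount, s.timestamp, s.is_fraudulent_rule1
    FROM stg_enriched AS s
    JOIN dim_customer AS d ON d.customer_id = s.customer_id
    WHERE true
    ON CONFLICT (transaction_id) DO NOTHING
""")

# fact_fraud_events: one row per flagged transaction, stamped with the load time.
detected_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
conn.execute("""
    INSERT INTO fact_fraud_events (transaction_sk, rule_id, detection_timestamp, severity)
    SELECT transaction_sk, 'RULE_1', ?, 'HIGH'
    FROM fact_transactions
    WHERE is_fraudulent_rule1 = 1
    ON CONFLICT (transaction_sk, rule_id) DO NOTHING
""", (detected_at,))
conn.commit()

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("dim_customer", "fact_transactions", "fact_fraud_events")}
print(counts)
```

Staging first and then running set-based `INSERT ... SELECT` statements keeps the surrogate-key lookups inside the database. Overwriting `dim_customer` attributes in place, as done here, is SCD Type 1 behaviour; Edge Case 2 asks about the history-preserving alternative.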

#### 2.2 Edge Cases & Robustness

* **Edge Case 1: Referential Integrity Violation** ⚠️:
    * A `customer_id` in `df_transactions` does not exist in `df_customers` (e.g., 'C999' from the sample data). Describe how a robust ETL pipeline should handle this (e.g., use a left join to identify unmatched rows, quarantine them for review, or map them to a default "unknown customer" record).
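A sketch of the identify/quarantine and default-record strategies in pandas, using `merge(..., indicator=True)` to tag orphans. The sample data, the `quarantine_reason` column, the `sk_lookup` mapping, and the reserved `customer_sk = -1` "unknown customer" convention are all hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical inputs: C999 has a transaction but no customer record.
df_customers = pd.DataFrame({
    "customer_id": ["C001", "C002"],
    "customer_name": ["Alice", "Bob"],
})
df_transactions = pd.DataFrame({
    "transaction_id": ["T1", "T2", "T3"],
    "customer_id": ["C001", "C999", "C002"],
    "amount": [750.0, 900.0, 120.0],
})

# indicator=True adds a "_merge" column: "both" for matches, "left_only" for orphans.
merged = df_transactions.merge(df_customers, on="customer_id", how="left", indicator=True)
orphans = merged["_merge"] == "left_only"

# Strategy A: quarantine orphans for review and load only the matched rows.
df_quarantine = merged.loc[orphans, list(df_transactions.columns)].copy()
df_quarantine["quarantine_reason"] = "customer_id not found in df_customers"
df_matched = merged.loc[~orphans].drop(columns="_merge")

# Strategy B: keep every row but point orphans at a reserved "unknown customer"
# dimension row. The lookup dict and the -1 key are hypothetical conventions.
sk_lookup = {"C001": 1, "C002": 2}
UNKNOWN_CUSTOMER_SK = -1
merged["customer_sk"] = (
    merged["customer_id"].map(sk_lookup).fillna(UNKNOWN_CUSTOMER_SK).astype(int)
)

print(f"{int(orphans.sum())} orphaned transaction(s) quarantined")
print(df_quarantine)
```

Quarantining keeps the fact table clean but delays the orphaned transactions; the default-record approach keeps fraud rules running on every transaction, at the cost of a later reconciliation step once the real customer record arrives.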

* **Edge Case 2: Slowly Changing Dimensions (SCD Type 2)** ⏳:
    * Imagine a customer's `customer_tier` can change, and you must preserve the historical tier for past transactions. Explain conceptually how you would modify the `dim_customer` table and the ETL process to support SCD Type 2 (e.g., by adding `start_date`, `end_date`, and `is_current` columns).
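A minimal SCD Type 2 sketch in SQLite, assuming `customer_tier` is the only tracked attribute and dates are ISO-8601 strings (which compare correctly as text). The helper names `apply_scd2` and `lookup_sk`, the `'9999-12-31'` open-ended sentinel, and the half-open `[start_date, end_date)` validity interval are illustrative choices, not a prescribed design. The partial unique index guarantees at most one current row per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk   INTEGER PRIMARY KEY,                 -- new surrogate key for every version
    customer_id   TEXT NOT NULL,                       -- business key, no longer UNIQUE
    customer_tier TEXT,                                -- tracked (Type 2) attribute
    start_date    TEXT NOT NULL,                       -- version valid from (inclusive)
    end_date      TEXT NOT NULL DEFAULT '9999-12-31',  -- valid until (exclusive), sentinel = open
    is_current    INTEGER NOT NULL DEFAULT 1
);
CREATE UNIQUE INDEX ux_dim_customer_current ON dim_customer (customer_id) WHERE is_current = 1;
""")


def apply_scd2(conn, customer_id, tier, effective_date):
    """Expire the current version if the tier changed, then insert a new current version."""
    row = conn.execute(
        "SELECT customer_sk, customer_tier FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is not None and row[1] == tier:
        return  # tracked attribute unchanged: nothing to do
    if row is not None:
        conn.execute(
            "UPDATE dim_customer SET end_date = ?, is_current = 0 WHERE customer_sk = ?",
            (effective_date, row[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, customer_tier, start_date) VALUES (?, ?, ?)",
        (customer_id, tier, effective_date),
    )
    conn.commit()


def lookup_sk(conn, customer_id, ts):
    """Point-in-time lookup: return the version that was valid at timestamp ts."""
    return conn.execute(
        "SELECT customer_sk FROM dim_customer "
        "WHERE customer_id = ? AND start_date <= ? AND ? < end_date",
        (customer_id, ts, ts),
    ).fetchone()[0]


apply_scd2(conn, "C001", "Silver", "2024-01-01")
apply_scd2(conn, "C001", "Gold", "2024-03-01")  # tier upgrade creates a second version

print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_sk").fetchall())
print(lookup_sk(conn, "C001", "2024-02-15"), lookup_sk(conn, "C001", "2024-03-10"))
```

In the ETL, the dimension update runs before the fact load, and `fact_transactions` stores whichever `customer_sk` was valid at the transaction's `timestamp`. Later tier changes therefore add new dimension rows instead of rewriting the tier seen by past transactions.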