# Discovery

This document describes the initial design of a simplified Know Your Client (KYC) solution, including the source tables, the derived and final tables, metadata, and data-typing decisions.

---

### Pipeline Objective

Build a data pipeline that ingests, transforms, and processes customer and transaction data, with a focus on assessing customer risk.

---

### Source Tables (Generated CSV Files)

#### 1. clients.csv – Customer Information
| Column    | Type   | Description                |
|-----------|--------|----------------------------|
| client_id | INT    | Unique customer identifier |
| name      | STRING | Full name                  |
| age       | INT    | Customer age (years)       |
| country   | STRING | Country of residence       |
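
As a minimal sketch, assuming `clients.csv` has a header row with exactly these column names, a Python loader could enforce the declared types and the documented uniqueness of `client_id` as follows. The `Client` class and `load_clients` function are hypothetical names used for illustration, not part of the design.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Client:
    """One row of clients.csv, typed as in the table above (hypothetical model)."""
    client_id: int  # INT
    name: str       # STRING
    age: int        # INT
    country: str    # STRING


def load_clients(lines: Iterable[str]) -> List[Client]:
    """Parse CSV lines (header first, e.g. an open file) into typed Client records.

    Raises ValueError if client_id or age is not an integer, or if a
    client_id appears more than once (it is documented as unique).
    """
    clients: List[Client] = []
    seen_ids: Set[int] = set()
    for row in csv.DictReader(lines):
        client = Client(
            client_id=int(row["client_id"]),
            name=row["name"].strip(),
            age=int(row["age"]),
            country=row["country"].strip(),
        )
        if client.client_id in seen_ids:
            raise ValueError(f"duplicate client_id: {client.client_id}")
        seen_ids.add(client.client_id)
        clients.append(client)
    return clients
```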

---
#### 2. transactions.csv – Transaction Details
| Column             | Type  | Description                                 |
|--------------------|-------|---------------------------------------------|
| transaction_id     | INT   | Unique transaction identifier               |
| client_id          | INT   | Foreign key referencing `clients.client_id` |
| transaction_amount | FLOAT | Transaction value                           |
| transaction_date   | DATE  | Transaction date                            |
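
A similar sketch for `transactions.csv`, assuming dates are written in ISO 8601 form (`YYYY-MM-DD`) and that the set of valid client IDs has already been loaded from `clients.csv`. `Transaction` and `load_transactions` are hypothetical names, and rejecting orphaned rows with an exception is one possible foreign-key policy, not a decision recorded in this design.

```python
import csv
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Transaction:
    """One row of transactions.csv, typed as in the table above (hypothetical model)."""
    transaction_id: int
    client_id: int
    transaction_amount: float
    transaction_date: date


def load_transactions(lines: Iterable[str], known_client_ids: Set[int]) -> List[Transaction]:
    """Parse transaction rows and enforce the client_id foreign key.

    Assumes transaction_date is an ISO 8601 date (YYYY-MM-DD).
    Raises ValueError for a row whose client_id is not in known_client_ids.
    """
    result: List[Transaction] = []
    for row in csv.DictReader(lines):
        tx = Transaction(
            transaction_id=int(row["transaction_id"]),
            client_id=int(row["client_id"]),
            transaction_amount=float(row["transaction_amount"]),
            transaction_date=date.fromisoformat(row["transaction_date"]),
        )
        if tx.client_id not in known_client_ids:
            raise ValueError(
                f"transaction {tx.transaction_id} references unknown client_id {tx.client_id}"
            )
        result.append(tx)
    return result
```

A production pipeline might instead route orphaned rows to a quarantine table so that one bad record does not stop the whole load.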

---
#### 3. high_risk_countries.csv – High-Risk Countries
| Column       | Type   | Description                               |
|--------------|--------|-------------------------------------------|
| country_name | STRING | Name of a country classified as high risk |
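
Because the high-risk check compares free-text country names, both files must spell countries consistently. The sketch below assumes a normalization rule (collapse whitespace, then case-fold) that this design does not specify; `normalize_country` and `load_high_risk_countries` are hypothetical helpers.

```python
import csv
from typing import FrozenSet, Iterable


def normalize_country(name: str) -> str:
    """Normalize a country name for comparison (assumed rule: collapse whitespace, case-fold)."""
    return " ".join(name.split()).casefold()


def load_high_risk_countries(lines: Iterable[str]) -> FrozenSet[str]:
    """Read high_risk_countries.csv into a set of normalized names for O(1) lookups."""
    return frozenset(
        normalize_country(row["country_name"])
        for row in csv.DictReader(lines)
        if row["country_name"].strip()  # skip rows containing only whitespace
    )
```

Storing ISO 3166-1 country codes instead of names would remove the need for normalization, but the current design uses names.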

---

### Derived Table: risk_events.csv

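This table is generated during processing by applying business rules to the source tables (for example, flagging high transaction amounts or residence in a high-risk country).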

#### Fields
| Column          | Type      | Description                                                |
|-----------------|-----------|------------------------------------------------------------|
| event_id        | STRING    | Unique risk event identifier (UUID)                        |
| client_id       | INT       | Foreign key referencing `clients.client_id`                |
| risk_score      | FLOAT     | Calculated risk score                                      |
| trigger_reason  | STRING    | Alert reason(s) (e.g., "High Amount", "High Risk Country") |
| evaluation_date | TIMESTAMP | Timestamp of the evaluation                                |

> *Partition field:* evaluation_date
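
The business rules themselves are not yet fixed in this design. As an illustration only, the sketch below assumes two rules matching the example trigger reasons: a client is flagged for "High Amount" if any transaction exceeds a hypothetical threshold of 10,000, and for "High Risk Country" if their country of residence appears in the (already normalized) high-risk set. The threshold, the rule weights, and the additive scoring are placeholders, and `evaluate_clients` and `RiskEvent` are hypothetical names. Each flagged client yields one event whose `trigger_reason` lists every rule that fired.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical rule parameters: the design does not define thresholds or weights.
HIGH_AMOUNT_THRESHOLD = 10_000.0
RULE_WEIGHTS: Dict[str, float] = {"High Amount": 0.6, "High Risk Country": 0.4}


@dataclass(frozen=True)
class RiskEvent:
    event_id: str              # UUID stored as STRING
    client_id: int
    risk_score: float
    trigger_reason: str        # comma-separated list of fired rules
    evaluation_date: datetime  # partition field


def evaluate_clients(
    client_countries: Dict[int, str],
    transactions: Iterable[Tuple[int, float]],
    high_risk_countries: Set[str],
    evaluated_at: Optional[datetime] = None,
) -> List[RiskEvent]:
    """Apply the two example rules and emit one RiskEvent per flagged client.

    high_risk_countries is assumed to hold whitespace-collapsed, case-folded names.
    """
    if evaluated_at is None:
        evaluated_at = datetime.now(timezone.utc)

    # Clients with at least one transaction strictly above the threshold.
    high_amount_clients = {
        client_id for client_id, amount in transactions if amount > HIGH_AMOUNT_THRESHOLD
    }

    events: List[RiskEvent] = []
    for client_id, country in sorted(client_countries.items()):
        reasons: List[str] = []
        if client_id in high_amount_clients:
            reasons.append("High Amount")
        if " ".join(country.split()).casefold() in high_risk_countries:
            reasons.append("High Risk Country")
        if not reasons:
            continue  # no rule fired, so no event is recorded
        events.append(
            RiskEvent(
                event_id=str(uuid.uuid4()),
                client_id=client_id,
                risk_score=sum(RULE_WEIGHTS[r] for r in reasons),
                trigger_reason=", ".join(reasons),
                evaluation_date=evaluated_at,
            )
        )
    return events
```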
---

### Final Table: client_risk_summary

Aggregated table with one row per customer, consolidating transaction totals and risk.

| Column             | Type      | Description                             |
|--------------------|-----------|-----------------------------------------|
| client_id          | INT       | Customer identifier                     |
| name               | STRING    | Customer name                           |
| age                | INT       | Customer age (years)                    |
| country            | STRING    | Country of residence                    |
| total_transactions | INT       | Number of transactions carried out      |
| total_amount       | FLOAT     | Sum of transaction amounts              |
| risk_score         | FLOAT     | Aggregated final risk score             |
| last_risk_event    | TIMESTAMP | Timestamp of the most recent risk event |
| evaluation_date    | TIMESTAMP | Evaluation timestamp (partition key)    |
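
A sketch of the final aggregation, assuming the aggregated `risk_score` is the highest score among a client's risk events (the design does not yet say how scores are combined) and that clients without transactions or events are still included, with zero totals, a score of 0.0, and a null `last_risk_event`. `build_summary` and `ClientRiskSummary` are hypothetical names; inputs are plain tuples so the example stands on its own.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ClientRiskSummary:
    client_id: int
    name: str
    age: int
    country: str
    total_transactions: int
    total_amount: float
    risk_score: float
    last_risk_event: Optional[datetime]  # None when the client has no events
    evaluation_date: datetime            # partition key


def build_summary(
    clients: Iterable[Tuple[int, str, int, str]],
    transactions: Iterable[Tuple[int, float]],
    risk_events: Iterable[Tuple[int, float, datetime]],
    evaluated_at: datetime,
) -> List[ClientRiskSummary]:
    """Aggregate transactions and risk events into one row per client."""
    tx_count: Dict[int, int] = {}
    tx_total: Dict[int, float] = {}
    for client_id, amount in transactions:
        tx_count[client_id] = tx_count.get(client_id, 0) + 1
        tx_total[client_id] = tx_total.get(client_id, 0.0) + amount

    # Assumption: the final score is the highest score among the client's events.
    max_score: Dict[int, float] = {}
    last_event: Dict[int, datetime] = {}
    for client_id, score, event_ts in risk_events:
        max_score[client_id] = max(score, max_score.get(client_id, score))
        if client_id not in last_event or event_ts > last_event[client_id]:
            last_event[client_id] = event_ts

    return [
        ClientRiskSummary(
            client_id=cid,
            name=name,
            age=age,
            country=country,
            total_transactions=tx_count.get(cid, 0),
            total_amount=tx_total.get(cid, 0.0),
            risk_score=max_score.get(cid, 0.0),
            last_risk_event=last_event.get(cid),
            evaluation_date=evaluated_at,
        )
        for cid, name, age, country in clients
    ]
```

Taking the maximum keeps one severe event from being diluted by later, milder ones; a sum or a time-weighted average would be equally reasonable once the business rules are defined.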

---