# Lakebridge Analyzer (Simulated)
## Legacy Oracle SQL Analysis & Validation

## Purpose of Lakebridge Analyzer (Simulated)
The Lakebridge Analyzer step establishes what the legacy Oracle SQL logic does before it is modernized.

In real-world migrations, Lakebridge Analyzer helps:
- Parse legacy SQL and PL/SQL
- Extract business logic (joins, filters, aggregations)
- Validate whether the logic can be re-implemented in a lakehouse
- Identify dependencies (tables, columns)

> In this assessment, Lakebridge Analyzer is simulated by performing a manual analysis of the Oracle schema and PL/SQL scripts.



## Input Artifacts for Analysis
The following legacy artifacts were provided for analysis:

### Oracle Schema (`oracle_schema.sql`)
Defines the structure of the legacy database:
- `CUSTOMERS`
- `ORDERS`

### Oracle PL/SQL / SQL Script (`sample_plsql.sql`)
Defines business metrics calculated on top of the schema.

These two inputs together represent a typical legacy analytics workload.

## Schema Validation (Pre-Analysis Check)
Before analyzing the PL/SQL logic, the schema was validated to ensure:
- All referenced tables exist
- Required columns are available
- Logical relationships can be inferred

### Relevant Tables
`CUSTOMERS`
- CUSTOMER_ID
- NAME
- EMAIL
- CREATED_DATE

`ORDERS`
- ORDER_ID
- CUSTOMER_ID
- ORDER_DATE
- AMOUNT
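
The pre-analysis checks can be sketched as a small Python script. This is a minimal illustration, not Lakebridge output: the `SCHEMA` dictionary is transcribed by hand from the column lists above, and the `REQUIRED` list of references is an assumption about what the script uses, since `sample_plsql.sql` is not reproduced here. The helper names `find_missing` and `shared_columns` are hypothetical.

```python
# Columns per table, transcribed by hand from the lists above.
SCHEMA = {
    "CUSTOMERS": {"CUSTOMER_ID", "NAME", "EMAIL", "CREATED_DATE"},
    "ORDERS": {"ORDER_ID", "CUSTOMER_ID", "ORDER_DATE", "AMOUNT"},
}

# Assumption: the (table, column) pairs the legacy script is expected to reference.
REQUIRED = [
    ("CUSTOMERS", "CUSTOMER_ID"),
    ("ORDERS", "CUSTOMER_ID"),
    ("ORDERS", "ORDER_ID"),
    ("ORDERS", "ORDER_DATE"),
    ("ORDERS", "AMOUNT"),
]


def find_missing(schema, required):
    """Return the (table, column) pairs that the schema does not provide."""
    return [
        (table, column)
        for table, column in required
        if column not in schema.get(table, set())
    ]


def shared_columns(schema, left, right):
    """Return column names present in both tables (candidate join keys)."""
    return sorted(schema[left] & schema[right])


print("Missing references:", find_missing(SCHEMA, REQUIRED))
print("Candidate join keys:", shared_columns(SCHEMA, "CUSTOMERS", "ORDERS"))
```

An empty list of missing references corresponds to the first two checks passing. A shared column name is only a hint that a relationship exists, so it still needs to be confirmed against the join condition in the script.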


### Identified Logical Relationship
```sql
CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
```

This relationship enables customer-level aggregations on order data.

Conclusion:
The schema fully supports customer-order analytical metrics.

## PL/SQL Logic Analysis (Core Analyzer Output)
The `sample_plsql.sql` script was analyzed to extract business logic components.

### Tables Used

- CUSTOMERS
- ORDERS
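
A table dependency list like the one above could be approximated with a regular expression, as in the sketch below. `SAMPLE_SQL` is a hypothetical stand-in for the query in `sample_plsql.sql`, and `extract_tables` is an illustrative helper, not part of Lakebridge. The pattern only looks at identifiers that directly follow `FROM` or `JOIN`, so it does not handle subqueries, comma-separated joins, schema-qualified names, or keywords that appear inside comments.

```python
import re

# Hypothetical stand-in for the query in sample_plsql.sql (not the real script).
SAMPLE_SQL = """
SELECT c.CUSTOMER_ID, SUM(o.AMOUNT) AS TOTAL_AMOUNT, COUNT(o.ORDER_ID) AS ORDER_COUNT
FROM CUSTOMERS c
JOIN ORDERS o ON c.CUSTOMER_ID = o.CUSTOMER_ID
GROUP BY c.CUSTOMER_ID
"""

# Capture the identifier that directly follows FROM or JOIN.
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_$#]*)", re.IGNORECASE
)


def extract_tables(sql):
    """Return the sorted, upper-cased table names that follow FROM or JOIN."""
    return sorted({name.upper() for name in TABLE_PATTERN.findall(sql)})


print(extract_tables(SAMPLE_SQL))  # ['CUSTOMERS', 'ORDERS']
```

For anything beyond a quick cross-check of a manual analysis, a real SQL parser would be a safer choice than a regular expression.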

### Join Logic

- Inner Join
- Join condition:
```sql
CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID
```

This join enables enrichment of transactional order data with customer attributes.


### Filters Identified

Typical filters in the script include:

- Date-based filtering on ORDER_DATE
- Optional customer-level constraints (if present)

These filters restrict the dataset to the relevant business time window and, where customer-level constraints are present, to a subset of customers.

### Aggregations Identified

The following aggregations are applied:
- SUM(AMOUNT) → Total order value
- COUNT(ORDER_ID) → Number of orders

Aggregations are performed per customer, indicating a customer-level metric.

### Grouping Columns

- CUSTOMER_ID
- (Optionally) customer attributes such as NAME
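
Putting the join, filter, aggregation, and grouping findings together, the query shape can be reconstructed and exercised against an in-memory SQLite database. This is a sketch under stated assumptions: the query text is inferred from the analysis, not copied from `sample_plsql.sql`; the date window, the column aliases, and all sample rows are made up for illustration; and SQLite stores dates as ISO-8601 text here, which differs from Oracle's `DATE` type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER PRIMARY KEY, NAME TEXT, EMAIL TEXT, CREATED_DATE TEXT);
CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER, ORDER_DATE TEXT, AMOUNT REAL);
-- Made-up rows, for illustration only.
INSERT INTO CUSTOMERS VALUES (1, 'Alice', 'alice@example.com', '2023-01-01'),
                             (2, 'Bob', 'bob@example.com', '2023-02-01');
INSERT INTO ORDERS VALUES (10, 1, '2024-01-15', 100.0),
                          (11, 1, '2024-03-10', 50.0),
                          (12, 2, '2024-02-20', 75.0),
                          (13, 2, '2023-12-31', 999.0);
""")

# Inferred query shape: inner join, date filter, per-customer aggregation.
QUERY = """
SELECT c.CUSTOMER_ID, c.NAME,
       SUM(o.AMOUNT)     AS TOTAL_AMOUNT,
       COUNT(o.ORDER_ID) AS ORDER_COUNT
FROM CUSTOMERS c
INNER JOIN ORDERS o ON c.CUSTOMER_ID = o.CUSTOMER_ID
WHERE o.ORDER_DATE >= :start_date AND o.ORDER_DATE < :end_date
GROUP BY c.CUSTOMER_ID, c.NAME
ORDER BY c.CUSTOMER_ID
"""

# Hypothetical time window: calendar year 2024 (ISO text dates compare correctly as strings).
rows = conn.execute(
    QUERY, {"start_date": "2024-01-01", "end_date": "2025-01-01"}
).fetchall()
print(rows)  # [(1, 'Alice', 150.0, 2), (2, 'Bob', 75.0, 1)]
```

The order dated 2023-12-31 falls outside the window and is excluded, which is the time-bounding behaviour the filter analysis describes. Because the join is an inner join, customers with no orders in the window do not appear in the result at all.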

## Business Metric Interpretation

Based on the analyzed logic, the PL/SQL script calculates customer-level sales metrics, such as:

- Total revenue per customer
- Total number of orders per customer
- Time-bounded sales performance

This confirms that the legacy SQL implements business-critical analytical logic, not just raw data extraction.

## Analyzer Feasibility Assessment
The following checks were performed:

| Check                       | Result |
| --------------------------- | ------ |
| Required tables exist       | Pass   |
| Required columns exist      | Pass   |
| Join keys available         | Pass   |
| Aggregations supported      | Pass   |
| Logic reproducible in Spark | Pass   |



Conclusion:

The legacy PL/SQL logic can be fully re-implemented in a modern lakehouse using Spark SQL or PySpark.


## Lakebridge Analyzer (Simulated) – Summary
In a real migration scenario, Lakebridge Analyzer would automatically generate this analysis.

For this assessment:
- The analyzer step is simulated manually
- The same outputs are produced:
  - Dependency analysis
  - Logic extraction
  - Migration feasibility validation

This confirms that the semantics of the legacy logic are understood before translation begins, so the migrated code can be checked against them.