# Transform orders data - string to JSON
1. Preprocess the JSON string to fix the data quality issues
2. Transform the JSON string to a JSON object
3. Write the transformed data to the silver schema

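## 1. Preprocess the JSON string to fix the data quality issues
The `order_date` value can appear as an unquoted date (for example `2024-01-05` instead of `"2024-01-05"`), which is not valid JSON. The query below wraps such values in double quotes.
* Function [regexp_replace](https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/regexp_replace)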

In [0]:
%sql
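create or replace temp view tv_orders_fixed
as
select
  value,
  -- Wrap an unquoted order_date (yyyy-MM-dd) in double quotes so the string is valid JSON.
  -- The capture group reference is escaped with a backslash so the notebook does not read it as a widget parameter.
  regexp_replace(value, '"order_date": (\\d{4}-\\d{2}-\\d{2})', '"order_date": "\$1"') as fixed_value
from gizmobox.bronze.v_orders;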
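
To see what the replacement does in isolation, the same pattern can be run against a single literal. This is only a sketch: the sample record below is made up for illustration and is not taken from `gizmobox.bronze.v_orders`.

```sql
-- Hypothetical sample record with an unquoted order_date (invalid JSON).
select
  regexp_replace(
    '{"order_id": 1, "order_date": 2024-01-05}',
    '"order_date": (\\d{4}-\\d{2}-\\d{2})',
    '"order_date": "\$1"'
  ) as fixed_value;
```

The result should be the same record with the date wrapped in double quotes. A value that is already quoted does not match the pattern, because a quote rather than a digit follows `"order_date": `, so it is left unchanged.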

## 2. Transform the JSON string to a JSON object
* Function [schema_of_json](https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/schema_of_json)
* Function [from_json](https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/from_json)

In [0]:
%sql
select
  schema_of_json(fixed_value)
from tv_orders_fixed
limit 1;
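
The Databricks documentation describes the argument of `schema_of_json` as a JSON string literal. If the column-based query above is rejected on your runtime, one option is to copy a single `fixed_value` from the view and pass it as a literal. A minimal sketch, assuming a made-up and heavily shortened record:

```sql
-- Hypothetical, shortened sample record; paste a real fixed_value here instead.
select
  schema_of_json(
    '{"order_id": 1, "order_date": "2024-01-05", "items": [{"item_id": 10, "price": 25}]}'
  ) as inferred_schema;
```

The schema is inferred from that one record only, so a numeric field that happens to hold a whole number in the sample is inferred as `BIGINT` even if other records carry decimals. It is worth checking the inferred types against more than one record before reusing the schema.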

In [0]:
%sql
select
  from_json(
    fixed_value,
    'STRUCT<customer_id: BIGINT, items: ARRAY<STRUCT<category: STRING, details: STRUCT<brand: STRING, color: STRING>, item_id: BIGINT, name: STRING, price: BIGINT, quantity: BIGINT>>, order_date: STRING, order_id: BIGINT, order_status: STRING, payment_method: STRING, total_amount: BIGINT, transaction_timestamp: STRING>' -- schema copied from the schema_of_json output
    ) as json_value
from tv_orders_fixed;

## 3. Write the transformed data to the silver schema

In [0]:
%sql
create or replace table gizmobox.silver.orders_json
as
select
  from_json(
    fixed_value,
    'STRUCT<customer_id: BIGINT, items: ARRAY<STRUCT<category: STRING, details: STRUCT<brand: STRING, color: STRING>, item_id: BIGINT, name: STRING, price: BIGINT, quantity: BIGINT>>, order_date: STRING, order_id: BIGINT, order_status: STRING, payment_method: STRING, total_amount: BIGINT, transaction_timestamp: STRING>'
    ) as json_value
from tv_orders_fixed;

In [0]:
%sql
select *
from gizmobox.silver.orders_json;
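
Once `json_value` is a struct, its fields can be read with dot notation and the `items` array can be expanded with `explode`. The query below is a sketch of that, using only field names from the schema above; the aliases `exploded`, `item`, and `line_amount` are chosen here for illustration.

```sql
-- One output row per order item; struct fields are accessed with dot notation.
select
  json_value.order_id,
  json_value.customer_id,
  to_date(json_value.order_date) as order_date,
  item.item_id,
  item.details.brand,
  item.price * item.quantity as line_amount
from gizmobox.silver.orders_json
lateral view explode(json_value.items) exploded as item;
```

`order_date` is stored as `STRING` in the table, so `to_date` is applied here to get a `DATE` for filtering or grouping.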