# Handle JSON normalization and schema evolution


A practical demo of how dlt flattens deeply nested JSON and handles schema changes for you—no custom pipelines, no headaches.



#### **Install `dlt`⏳**

In [None]:
%%capture
!pip install "dlt[duckdb]"  # Install dlt with the DuckDB extras (quoted so the shell does not expand the brackets)
!dlt --version

## Chapter 1: From JSON to clean tables

We begin with a nested JSON record from a coffee shop API.

In [None]:
# Coffee shop order with nested objects & arrays
coffee_orders = [
    {
        "order_id": 1001,
        "timestamp": "2024-12-18T09:15:30Z",
        "customer": {  # ← Nested object
            "name": "Alice",
            "email": "alice@example.com",
            "loyalty_tier": "gold"
        },
        "items": [  # ← Array of objects
            {
                "item": "Latte",
                "size": "large",
                "price": 5.50,
                "customizations": ["extra shot", "oat milk"]  # ← Nested array
            },
            {
                "item": "Croissant",
                "price": 3.25,
                "customizations": ["warmed"]
            }
        ],
        "payment": {  # ← Another nested object
            "method": "card",
            "tip": 1.50
        }
    }
]


## 1.1. Create and run your pipeline

Just hit go and dlt handles the heavy lifting:

- Auto-detects your JSON schema

- Flattens nested data into clean, queryable tables

- Links each child table to its parent with generated keys (`_dlt_id` and `_dlt_parent_id`), no sweat



In [None]:
import dlt

pipeline = dlt.pipeline(
    pipeline_name="coffee_normalization_demo",
    destination="duckdb",
    dataset_name="coffee_shop",
    dev_mode=True  # Load into a fresh, timestamp-suffixed dataset for each new pipeline instance
)

load_info = pipeline.run(coffee_orders, table_name="orders")
print(load_info)

Pipeline coffee_normalization_demo load step completed in 0.52 seconds
1 load package(s) were loaded to destination duckdb and into dataset coffee_shop_20250730120157
The duckdb destination used duckdb:////content/coffee_normalization_demo.duckdb location to store data
Load package 1753876917.1535316 is LOADED and contains no failed jobs


In [None]:
dataset = pipeline.dataset()
print(dataset.row_counts().df())

print("\n" + "-" * 80 + "\n")

orders_relation = dataset.orders
print("🗂️ Table: orders")
orders_df = orders_relation.df()
display(orders_df)

print("\n" + "-" * 80 + "\n")

orders__items = dataset.orders__items
print("🗂️ Table: orders__items")
orders__items_df = orders__items.df()
display(orders__items_df)

print("\n" + "-" * 80 + "\n")

orders__items__customizations = dataset.orders__items__customizations
print("🗂️ Table: orders__items__customizations")
orders__items__customizations_df = orders__items__customizations.df()
display(orders__items__customizations_df)

print("\n" + "-" * 80 + "\n")


                      table_name  row_count
0                         orders          1
1                  orders__items          2
2  orders__items__customizations          3

--------------------------------------------------------------------------------

🗂️ Table: orders


|  | `order_id` | `timestamp` | `customer__name` | `customer__email` | `customer__loyalty_tier` | `payment__method` | `payment__tip` | `_dlt_load_id` | `_dlt_id` |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2024-12-18 09:15:30+00:00 | Alice | alice@example.com | gold | card | 1.5 | 1753876917.1535316 | zvSxg0yG9YbJcg |



--------------------------------------------------------------------------------

🗂️ Table: orders__items


|  | `item` | `size` | `price` | `_dlt_parent_id` | `_dlt_list_idx` | `_dlt_id` |
|---|---|---|---|---|---|---|
| 0 | Latte | large | 5.5 | zvSxg0yG9YbJcg | 0 | gU+hxgOQSXHIUQ |
| 1 | Croissant |  | 3.25 | zvSxg0yG9YbJcg | 1 | HYuqkMXYniblag |



--------------------------------------------------------------------------------

🗂️ Table: orders__items__customizations


|  | `value` | `_dlt_parent_id` | `_dlt_list_idx` | `_dlt_id` |
|---|---|---|---|---|
| 0 | extra shot | gU+hxgOQSXHIUQ | 0 | Rq8TbbDagkwBzA |
| 1 | oat milk | gU+hxgOQSXHIUQ | 1 | kBfjzyV6Pl8X2Q |
| 2 | warmed | HYuqkMXYniblag | 0 | s+q79Vy4T2wlaw |



--------------------------------------------------------------------------------



## 1.3. Let’s explore the tables dlt generated




## What just happened?

How did dlt figure all this out?

**Step 1: Schema inference and type detection**

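dlt scans your JSON, works out how to structure it into tables, and picks a type for each field (for example, `timestamp` for the ISO 8601 string, a number type for `order_id` and `price`, and text for names). [Learn more](https://dlthub.com/docs/general-usage/schema).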
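
To get a feel for what type detection involves, here's a toy sketch in plain Python. It is not dlt's code: `infer_type` is a hypothetical helper, and the rules are heavily simplified. The type names (`bigint`, `double`, `text`, `timestamp`, `bool`) are the ones dlt uses for columns.

```python
from datetime import datetime


def infer_type(value):
    """Map a Python value to a dlt-style column type name (simplified)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # Treat ISO 8601 strings as timestamps; a trailing "Z" means UTC.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "timestamp"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value: {value!r}")


sample = {
    "order_id": 1001,
    "timestamp": "2024-12-18T09:15:30Z",
    "tip": 1.50,
    "name": "Alice",
}
inferred = {key: infer_type(val) for key, val in sample.items()}
print(inferred)
```

In the notebook itself, you can look at the schema dlt actually inferred with `print(pipeline.default_schema.to_pretty_yaml())`.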

**Step 2: Automatic normalization**

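Nested objects become flat columns (like `customer__name`), and arrays turn into separate child tables. Each child row carries a `_dlt_parent_id` that points to the parent row's `_dlt_id`, plus a `_dlt_list_idx` that records its position in the original array.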
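
The flattening rules are easier to see in a toy version. The sketch below is plain Python and only mimics the naming convention: `normalize` is a hypothetical function, and the row ids it makes up (`orders-0`, `orders__items-1`, ...) stand in for the random `_dlt_id` values dlt generates.

```python
def normalize(record, table, tables, parent_id=None, list_idx=None):
    """Flatten one record into `tables`, a dict of table name -> list of rows."""
    # Made-up id: table name plus the row's position in that table.
    row_id = f"{table}-{len(tables.setdefault(table, []))}"
    row = {"_dlt_id": row_id}
    if parent_id is not None:
        row["_dlt_parent_id"] = parent_id
        row["_dlt_list_idx"] = list_idx

    def walk(obj, prefix):
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested object: keep it in this row, join names with "__".
                walk(value, f"{name}__")
            elif isinstance(value, list):
                # Array: each element becomes a row in a child table.
                for idx, item in enumerate(value):
                    child = item if isinstance(item, dict) else {"value": item}
                    normalize(child, f"{table}__{name}", tables, row_id, idx)
            else:
                row[name] = value

    walk(record, "")
    tables[table].append(row)
    return tables


order = {
    "order_id": 1001,
    "customer": {"name": "Alice"},
    "items": [
        {"item": "Latte", "customizations": ["extra shot", "oat milk"]},
        {"item": "Croissant", "customizations": ["warmed"]},
    ],
}
tables = normalize(order, "orders", {})
for name, rows in tables.items():
    print(name, len(rows))
```

Run on a trimmed-down version of the first order, it produces the same three table names and the same 1 / 2 / 3 row counts as the real pipeline.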

### Key points

* `orders` – Main table with orders and flattened `customer__*` and `payment__*` fields
* `orders__items` – Table for each item in the `items` array, linked to `orders`
* `orders__items__customizations` – Table for `customizations` inside each item, linked to `orders__items`
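
Because every child row points at its parent, you can stitch the three tables back together with two joins. The sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs anywhere. The rows are copied from the outputs above, trimmed to a few columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, customer__name TEXT, _dlt_id TEXT);
CREATE TABLE orders__items (
    item TEXT, price REAL, _dlt_parent_id TEXT, _dlt_list_idx INTEGER, _dlt_id TEXT
);
CREATE TABLE orders__items__customizations (
    value TEXT, _dlt_parent_id TEXT, _dlt_list_idx INTEGER, _dlt_id TEXT
);
""")
con.execute("INSERT INTO orders VALUES (1001, 'Alice', 'zvSxg0yG9YbJcg')")
con.executemany(
    "INSERT INTO orders__items VALUES (?, ?, ?, ?, ?)",
    [
        ("Latte", 5.5, "zvSxg0yG9YbJcg", 0, "gU+hxgOQSXHIUQ"),
        ("Croissant", 3.25, "zvSxg0yG9YbJcg", 1, "HYuqkMXYniblag"),
    ],
)
con.executemany(
    "INSERT INTO orders__items__customizations VALUES (?, ?, ?, ?)",
    [
        ("extra shot", "gU+hxgOQSXHIUQ", 0, "Rq8TbbDagkwBzA"),
        ("oat milk", "gU+hxgOQSXHIUQ", 1, "kBfjzyV6Pl8X2Q"),
        ("warmed", "HYuqkMXYniblag", 0, "s+q79Vy4T2wlaw"),
    ],
)

# Walk from order -> item -> customization via _dlt_parent_id = _dlt_id.
rows = con.execute("""
    SELECT o.order_id, o.customer__name, i.item, c.value
    FROM orders AS o
    JOIN orders__items AS i ON i._dlt_parent_id = o._dlt_id
    JOIN orders__items__customizations AS c ON c._dlt_parent_id = i._dlt_id
    ORDER BY i._dlt_list_idx, c._dlt_list_idx
""").fetchall()
for row in rows:
    print(row)
```

The same join conditions should carry over to the DuckDB dataset the pipeline created, since they rely only on the `_dlt_id` and `_dlt_parent_id` columns shown in the outputs.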


## Chapter 2: Schema Evolution

Now, say the coffee shop gets a new order, and this one has some extra fields.
Let’s add it and see how dlt updates the schema automatically.

**What’s new in this order:**

* `customer.phone`: customer’s phone number
* `items[].temperature`: temperature preference for each item
* `payment.promo_code`: a promo code used at checkout
* `delivery[]`: a new array with structured delivery info


In [None]:

# New order with additional fields - schema evolution!
new_order = [
    {
        "order_id": 1002,
        "timestamp": "2024-12-18T10:30:00Z",
        "customer": {
            "name": "Bob",
            "email": "bob@example.com",
            "loyalty_tier": "silver",
            "phone": "+1-555-0123"  # 🆕 NEW FIELD!
        },
        "items": [
            {
                "item": "Cappuccino",
                "size": "medium",
                "price": 4.75,
                "temperature": "extra hot",  # 🆕 NEW FIELD!
                "customizations": ["almond milk", "extra foam"]
            }
        ],
        "payment": {
            "method": "mobile",
            "tip": 0.75,
            "promo_code": "WELCOME10"  # 🆕 NEW FIELD!
        },
        "delivery": [  # 🆕 COMPLETELY NEW ARRAY!
            {
                "address": "123 Main St",
                "driver": "Emma",
                "estimated_time": 15
            }
        ]
    }
]

## 2.1 Run the pipeline again to apply changes

In [None]:
load_info = pipeline.run(new_order, table_name="orders")
print(load_info)


Pipeline coffee_normalization_demo load step completed in 0.18 seconds
1 load package(s) were loaded to destination duckdb and into dataset coffee_shop_20250730120157
The duckdb destination used duckdb:////content/coffee_normalization_demo.duckdb location to store data
Load package 1753876959.2236512 is LOADED and contains no failed jobs


## 2.2 Let’s check out the evolved schema


In [None]:
dataset = pipeline.dataset()
print(dataset.row_counts().df())

print("\n" + "-" * 80 + "\n")

orders_relation = dataset.orders
print("🗂️ Table: orders")
orders_df = orders_relation.df()
display(orders_df)

print("\n" + "-" * 80 + "\n")

orders__delivery_relation = dataset.orders__delivery
print("🗂️ Table: orders__delivery")
orders__delivery_df = orders__delivery_relation.df()
display(orders__delivery_df)

print("\n" + "-" * 80 + "\n")

orders__items = dataset.orders__items
print("🗂️ Table: orders__items")
orders__items_df = orders__items.df()
display(orders__items_df)

print("\n" + "-" * 80 + "\n")

orders__items__customizations = dataset.orders__items__customizations
print("🗂️ Table: orders__items__customizations")
orders__items__customizations_df = orders__items__customizations.df()
display(orders__items__customizations_df)

print("\n" + "-" * 80 + "\n")


                      table_name  row_count
0                         orders          2
1                  orders__items          3
2  orders__items__customizations          5
3               orders__delivery          1

--------------------------------------------------------------------------------

🗂️ Table: orders


|  | `order_id` | `timestamp` | `customer__name` | `customer__email` | `customer__loyalty_tier` | `payment__method` | `payment__tip` | `_dlt_load_id` | `_dlt_id` | `customer__phone` | `payment__promo_code` |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 2024-12-18 09:15:30+00:00 | Alice | alice@example.com | gold | card | 1.5 | 1753876917.1535316 | zvSxg0yG9YbJcg |  |  |
| 1 | 1002 | 2024-12-18 10:30:00+00:00 | Bob | bob@example.com | silver | mobile | 0.75 | 1753876959.2236512 | 9FmjPmEO+cqXDA | +1-555-0123 | WELCOME10 |



--------------------------------------------------------------------------------

🗂️ Table: orders__delivery


|  | `address` | `driver` | `estimated_time` | `_dlt_parent_id` | `_dlt_list_idx` | `_dlt_id` |
|---|---|---|---|---|---|---|
| 0 | 123 Main St | Emma | 15 | 9FmjPmEO+cqXDA | 0 | 6yAFAlLgZ4zHXw |



--------------------------------------------------------------------------------

🗂️ Table: orders__items


|  | `item` | `size` | `price` | `_dlt_parent_id` | `_dlt_list_idx` | `_dlt_id` | `temperature` |
|---|---|---|---|---|---|---|---|
| 0 | Latte | large | 5.5 | zvSxg0yG9YbJcg | 0 | gU+hxgOQSXHIUQ |  |
| 1 | Croissant |  | 3.25 | zvSxg0yG9YbJcg | 1 | HYuqkMXYniblag |  |
| 2 | Cappuccino | medium | 4.75 | 9FmjPmEO+cqXDA | 0 | EqLbgmucVCgh1w | extra hot |



--------------------------------------------------------------------------------

🗂️ Table: orders__items__customizations


|  | `value` | `_dlt_parent_id` | `_dlt_list_idx` | `_dlt_id` |
|---|---|---|---|---|
| 0 | extra shot | gU+hxgOQSXHIUQ | 0 | Rq8TbbDagkwBzA |
| 1 | oat milk | gU+hxgOQSXHIUQ | 1 | kBfjzyV6Pl8X2Q |
| 2 | warmed | HYuqkMXYniblag | 0 | s+q79Vy4T2wlaw |
| 3 | almond milk | EqLbgmucVCgh1w | 0 | XYcb4WK3+UlAfg |
| 4 | extra foam | EqLbgmucVCgh1w | 1 | Xo0GAG9SyrAFFg |



--------------------------------------------------------------------------------



## What changed?

✅ New columns were added to existing tables (`customer__phone` and `payment__promo_code` in `orders`, `temperature` in `orders__items`)

✅ A new table was created (`orders__delivery`) for the `delivery` array

✅ Old rows are untouched; they simply show empty values in the new columns

✅ No manual work or migrations needed!
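
Conceptually, this kind of evolution is additive: compare the incoming columns with the existing table, add whatever is missing, then insert. The sketch below shows the idea with Python's built-in `sqlite3`. It is a stand-in, not what dlt runs against DuckDB, and it hard-codes `TEXT` for the new column where dlt would use the inferred type.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer__name TEXT)")
con.execute("INSERT INTO orders VALUES (1001, 'Alice')")

# Incoming row with a column the table does not have yet.
new_row = {"order_id": 1002, "customer__name": "Bob", "customer__phone": "+1-555-0123"}

# PRAGMA table_info returns one tuple per column; index 1 is the column name.
existing = {info[1] for info in con.execute("PRAGMA table_info(orders)")}
for column in new_row:
    if column not in existing:
        # Assumption for this sketch: every new column is stored as TEXT.
        con.execute(f'ALTER TABLE orders ADD COLUMN "{column}" TEXT')

columns = ", ".join(f'"{name}"' for name in new_row)
placeholders = ", ".join("?" for _ in new_row)
con.execute(
    f"INSERT INTO orders ({columns}) VALUES ({placeholders})",
    list(new_row.values()),
)

rows = con.execute(
    "SELECT order_id, customer__name, customer__phone FROM orders ORDER BY order_id"
).fetchall()
print(rows)
```

Alice's row keeps its original values and gets `None` for `customer__phone`, which matches the empty cells in the `orders` output above.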


## Chapter 3: Under the hood and what’s next

## Under the hood

Here’s how dlt works behind the scenes:

* **Schema inference**: dlt reads your data and builds a relational schema automatically. [Learn more](https://dlthub.com/docs/general-usage/schema)
* **Schema evolution**: New fields? No problem — dlt updates the schema safely and keeps you informed. [Details here](https://dlthub.com/docs/general-usage/schema-evolution)
* **Schema contracts**: Lock your schema or approve changes before they apply. [Docs](https://dlthub.com/docs/general-usage/schema-contracts)
* **Evolution alerts**: Stay in control with notifications when the schema changes. [How it works](https://dlthub.com/docs/running-in-production/alerting)
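
Schema contracts are the piece most likely to change how a load behaves, so here's a toy model of the column-level modes described in the dlt docs: `evolve`, `freeze`, `discard_row`, and `discard_value`. The function `apply_column_contract` is hypothetical and only imitates the documented semantics. In dlt itself you set a contract through the `schema_contract` argument, for example on `pipeline.run`.

```python
def apply_column_contract(known_columns, row, mode):
    """Return (row_to_load, columns_after) for one row under a contract mode.

    `row_to_load` is None when the whole row is dropped.
    """
    new_columns = [name for name in row if name not in known_columns]
    if not new_columns or mode == "evolve":
        # Nothing new, or new columns are allowed: load as-is, extend the schema.
        return row, known_columns | set(row)
    if mode == "freeze":
        # Any schema change is an error.
        raise ValueError(f"new columns not allowed: {new_columns}")
    if mode == "discard_row":
        # Drop the entire row, leave the schema alone.
        return None, known_columns
    if mode == "discard_value":
        # Keep the row but strip the values that do not fit the schema.
        kept = {key: val for key, val in row.items() if key in known_columns}
        return kept, known_columns
    raise ValueError(f"unknown mode: {mode}")


known = {"order_id", "customer__name"}
bob = {"order_id": 1002, "customer__name": "Bob", "customer__phone": "+1-555-0123"}

for mode in ("evolve", "discard_value", "discard_row"):
    loaded, columns = apply_column_contract(known, bob, mode)
    print(mode, loaded, sorted(columns))
```

With `evolve` (the behavior this notebook relied on), Bob's phone number is loaded and the column set grows. With `freeze`, the same row would raise an error instead.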

## What’s next?

Give it a try with the [REST API example](https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api/).

**Great for:**

* ✅ APIs that keep changing
* ✅ Event tracking with flexible data
* ✅ SaaS tools that add new fields often
* ✅ Quick builds without worrying about schemas

Want more? Explore the [dlt docs](https://dlthub.com/docs/intro) or try it out with our [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources/).
