# dlt assessment - Jaffle API ingestion

1. Describe how you would use dlt in a previous job or project. If you don’t have a real example, pick a realistic one (e.g., analytics platform, marketing data, finance reporting, product telemetry). Aim for 5–10 sentences, focused on clarity and trade-offs.

2. Build a dlt pipeline using our Jaffle Shop API as the source and DuckDB as the destination.
Use the API spec and “https://jaffle-shop.dlthub.com/api/v1” as the API base URL.
Make sure you use some of dlt's more advanced functionality, such as the merge write disposition and incremental loading.

3. Assume this pipeline is the starting point for a real engagement with a client. Describe how you would take it to production.
List the key questions you’d ask before recommending a setup.
Highlight the aspects the client needs to pay attention to. You don’t need to go super deep; highlight one or two aspects you would ask the client to consider when running this pipeline in production.
What best practices would you recommend for dlt code and deployment? You can cover environment and configuration, testing and validation, CI/CD, and orchestration and runtime.

## 1. dlt use cases

1. Web marketing PoC

2. Ingestion pipeline for a solo data team

3. EL tool for an open table format with DuckLake

## 2. Build a dlt pipeline for the Jaffle API


```python
import dlt
import duckdb

# BASE_URL is read from dlt config, e.g. [runtime] BASE_URL = "..." in .dlt/config.toml
base_url = dlt.config["runtime.BASE_URL"]
print(base_url)
```

```text
https://jaffle-shop.dlthub.com/api/v1
```
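`dlt.config["runtime.BASE_URL"]` is resolved through dlt's config providers, so the same value can come from `.dlt/config.toml` or from an environment variable, and environment variables take precedence over the TOML files. That is the usual way to point the pipeline at a different API per environment without code changes. The sketch below only illustrates the naming convention (sections and key upper-cased and joined with a double underscore); `dlt_env_var_name` is a hypothetical helper, not part of dlt.

```python
import os


def dlt_env_var_name(config_key: str) -> str:
    """Hypothetical helper: map a dotted dlt config key to its env var name."""
    # Convention assumed here: sections and key are upper-cased and joined with "__"
    return "__".join(part.upper() for part in config_key.split("."))


name = dlt_env_var_name("runtime.BASE_URL")
print(name)  # RUNTIME__BASE_URL

# A CI job or container would export this instead of editing config.toml;
# setdefault keeps any value that is already present in the environment
os.environ.setdefault(name, "https://jaffle-shop.dlthub.com/api/v1")
print(os.environ[name])
```

In a deployment, exporting `RUNTIME__BASE_URL=...` in the job environment should be enough for `dlt.config` to pick up the new value.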


```python
from dlt.sources.rest_api import rest_api_source

jaffle_config = {
    "client": {
        "base_url": base_url,
    },
    "resource_defaults": {
        "primary_key": "id",
        # small dimension tables are fully reloaded on each run
        "write_disposition": "replace",
        "endpoint": {
            "data_selector": "$",  # the API returns a top-level JSON list
            "params": {
                "page_size": 100,  # default page size from the API docs
            },
        },
    },
    "resources": [
        {
            "name": "customers",
            "endpoint": {"path": "customers"},
        },
        {
            "name": "orders",
            # merge upserts on primary_key "id", so re-fetched orders update rows instead of duplicating them
            "write_disposition": "merge",
            "endpoint": {
                "path": "orders",
                "params": {
                    # fixed upper bound keeps the PoC window small
                    "end_date": "2017-01-10",
                },
                "incremental": {
                    "cursor_path": "ordered_at",
                    # first run starts here; later runs start from the last ordered_at stored in state
                    "initial_value": "2017-01-01T00:00:00Z",
                    # dlt fills the start_date query param with the current cursor value,
                    # so start_date must not also be hard-coded in "params"
                    "start_param": "start_date",
                },
            },
        },
        {
            "name": "products",
            "primary_key": "sku",  # products are keyed by sku, not id
            "endpoint": {"path": "products"},
        },
        {
            "name": "supplies",
            "endpoint": {"path": "supplies"},
        },
        {
            "name": "stores",
            "endpoint": {"path": "stores"},
        },
    ],
}

jaffle_source = rest_api_source(jaffle_config)

pipeline = dlt.pipeline(
    pipeline_name="jaffle_pipeline",
    destination="duckdb",
    dataset_name="jaffle_data",
    progress="log",
)

load_info = pipeline.run(jaffle_source)
print(load_info)
```

```text
------------------------------- Extract rest_api -------------------------------
Resources: 0/5 (0.0%) | Time: 0.28s | Rate: 0.00/s
customers: 100  | Time: 0.21s | Rate: 480.80/s
orders: 100  | Time: 0.12s | Rate: 803.32/s
products: 10  | Time: 0.06s | Rate: 159.69/s
supplies: 65  | Time: 0.00s | Rate: 20971520.00/s
Memory usage: 223.16 MB (67.40%) | CPU usage: 0.00%
```
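The `orders` resource combines two mechanisms: the incremental cursor (dlt stores the highest `ordered_at` it has seen and sends it as `start_date` on the next run) and the merge write disposition (rows are upserted on `id`). The pure-Python simulation below is a hypothetical illustration of how the two interact, not dlt's implementation: it filters client-side instead of through the API, and it assumes cursor values are ISO strings in one consistent format so that string comparison orders them correctly. The start boundary is inclusive here (as dlt's incremental is by default), so the boundary order is fetched again on the next run; merge turns that repeat into an update rather than a duplicate.

```python
from typing import Any, Dict, Iterable

Order = Dict[str, Any]


def incremental_merge(
    table: Dict[str, Order],
    state: Dict[str, str],
    fetched: Iterable[Order],
    cursor: str = "ordered_at",
    primary_key: str = "id",
    initial_value: str = "2017-01-01T00:00:00",
) -> None:
    """Hypothetical simulation of one pipeline run: cursor filter plus merge by primary key."""
    start_value = state.get("last_value", initial_value)
    last_value = start_value
    for row in fetched:
        # Incremental: drop rows older than the stored cursor (boundary is inclusive)
        if row[cursor] < start_value:
            continue
        # Merge: upsert on the primary key, so a re-delivered row replaces the old one
        table[row[primary_key]] = row
        last_value = max(last_value, row[cursor])
    # Persist the new cursor for the next run
    state["last_value"] = last_value


table: Dict[str, Order] = {}
state: Dict[str, str] = {}

run1 = [
    {"id": "a", "ordered_at": "2017-01-01T10:00:00", "order_total": 10},
    {"id": "b", "ordered_at": "2017-01-02T09:00:00", "order_total": 20},
]
incremental_merge(table, state, run1)

# Second run: an old row, the boundary row "b" with an updated total, and a new order "c"
run2 = [
    {"id": "z", "ordered_at": "2016-12-31T23:00:00", "order_total": 5},
    {"id": "b", "ordered_at": "2017-01-02T09:00:00", "order_total": 25},
    {"id": "c", "ordered_at": "2017-01-03T12:00:00", "order_total": 30},
]
incremental_merge(table, state, run2)

print(sorted(table))               # ['a', 'b', 'c']
print(table["b"]["order_total"])   # 25
print(state["last_value"])         # 2017-01-03T12:00:00
```

With `replace` instead of `merge`, the second run would wipe the orders loaded in the first run, which is why the incremental resource uses `merge`.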

```python
con = duckdb.connect('/Users/macbook/Development/jaffle_poc/jaffle_pipeline.duckdb')
print(con)
con.sql('show databases;').show()
con.sql("""
    use jaffle_pipeline;
    show tables;
    """).show()
```


```text
<_duckdb.DuckDBPyConnection object at 0x1240c1ab0>
┌─────────────────┐
│  database_name  │
│     varchar     │
├─────────────────┤
│ jaffle_pipeline │
└─────────────────┘

┌─────────┐
│  name   │
│ varchar │
├─────────┤
│ 0 rows  │
└─────────┘
```



## 3. Take the pipeline to production

### Key questions before recommending a setup
- 

### What the client needs to pay attention to
- 
- 

### When running this pipeline in production
- 
- 

