In [1]:
pip show dlt

Name: dlt
Version: 1.12.3
Summary: dlt is an open-source python-first scalable data loading library that does not require any backend to run.
Home-page: https://github.com/dlt-hub
Author: 
Author-email: "dltHub Inc." <services@dlthub.com>
License-Expression: Apache-2.0
Location: /usr/local/python/3.12.1/lib/python3.12/site-packages
Requires: click, fsspec, gitpython, giturlparse, hexbytes, humanize, jsonpath-ng, orjson, packaging, pathvalidate, pendulum, pluggy, pytz, pyyaml, requests, requirements-parser, rich-argparse, semver, setuptools, simplejson, sqlglot, tenacity, tomlkit, typing-extensions, tzdata
Required-by: cognee
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install -q "dlt[qdrant]" "qdrant-client[fastembed]"

Note: you may need to restart the kernel to use updated packages.


In [3]:
import dlt
import requests
import pandas as pd
from datetime import datetime

# Step 1: Create a dlt resource
@dlt.resource(write_disposition="replace", name="zoomcamp_data")
def zoomcamp_data():
    # Download the FAQ documents, which are grouped by course
    docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
    docs_response = requests.get(docs_url)
    docs_response.raise_for_status()  # fail early on a bad HTTP response
    documents_raw = docs_response.json()

    # Flatten: tag each document with its course name and yield it as one row
    for course in documents_raw:
        course_name = course['course']

        for doc in course['documents']:
            doc['course'] = course_name
            yield doc
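
The flattening inside `zoomcamp_data` can be checked without a network call. The following is a minimal sketch, assuming the input has the same nested shape the resource iterates over (a list of courses, each with a `documents` list). `flatten_courses`, the two-course sample, and its field values are hypothetical, introduced only for illustration; they are not part of the notebook or of dlt.

```python
from typing import Any, Dict, Iterable, Iterator, List


def flatten_courses(courses: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per document, tagged with its course name."""
    for course in courses:
        course_name = course["course"]
        for doc in course["documents"]:
            # Build a new dict instead of mutating the input document
            yield {**doc, "course": course_name}


# Hypothetical sample mirroring the assumed structure of documents.json
sample: List[Dict[str, Any]] = [
    {
        "course": "course-a",
        "documents": [
            {"section": "General", "question": "Q1", "text": "A1"},
            {"section": "General", "question": "Q2", "text": "A2"},
        ],
    },
    {
        "course": "course-b",
        "documents": [{"section": "Setup", "question": "Q3", "text": "A3"}],
    },
]

rows = list(flatten_courses(sample))
print(len(rows))          # 3
print(rows[2]["course"])  # course-b
```

Applied to the real file, the number of rows produced this way should match the row count that dlt reports for `zoomcamp_data` in the normalize step, since each yielded document becomes one row.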

In [5]:
# Step 2: Create and run the pipeline
pipeline = dlt.pipeline(
    pipeline_name="zoomcamp_pipeline",
    destination="duckdb",
    dataset_name="zoomcamp_tagged_data"
)
load_info = pipeline.run(zoomcamp_data())
print(pipeline.last_trace)

Run started at 2025-07-08 00:27:43.991045+00:00 and COMPLETED in 3.22 seconds with 4 steps.
Step extract COMPLETED in 0.94 seconds.

Load package 1751934464.3615549 is EXTRACTED and NOT YET LOADED to the destination and contains no failed jobs

Step normalize COMPLETED in 0.48 seconds.
Normalized data for the following tables:
- zoomcamp_data: 948 row(s)
- _dlt_pipeline_state: 1 row(s)

Load package 1751934464.3615549 is NORMALIZED and NOT YET LOADED to the destination and contains no failed jobs

Step load COMPLETED in 1.47 seconds.
Pipeline zoomcamp_pipeline load step completed in 0.51 seconds
1 load package(s) were loaded to destination duckdb and into dataset zoomcamp_tagged_data
The duckdb destination used duckdb:////workspaces/LLMcourseDT/dlt_workshop/zoomcamp_pipeline.duckdb location to store data
Load package 1751934464.3615549 is LOADED and contains no failed jobs

Step run COMPLETED in 3.22 seconds.
Pipeline zoomcamp_pipeline load step completed in 0.51 seconds
1 load package

In [6]:
# Q2: configure a local, file-based Qdrant destination
from dlt.destinations import qdrant

qdrant_destination = qdrant(
    qd_path="db.qdrant",
)

In [7]:
pipeline = dlt.pipeline(
    pipeline_name="zoomcamp_pipeline",
    destination=qdrant_destination,
    dataset_name="zoomcamp_tagged_data"
)
load_info = pipeline.run(zoomcamp_data())
print(pipeline.last_trace)

Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00,  7.32it/s]


Run started at 2025-07-08 00:28:54.980956+00:00 and COMPLETED in 11.01 seconds with 4 steps.
Step extract COMPLETED in 0.26 seconds.

Load package 1751934539.4975917 is EXTRACTED and NOT YET LOADED to the destination and contains no failed jobs

Step normalize COMPLETED in 0.08 seconds.
Normalized data for the following tables:
- zoomcamp_data: 948 row(s)
- _dlt_pipeline_state: 1 row(s)

Load package 1751934539.4975917 is NORMALIZED and NOT YET LOADED to the destination and contains no failed jobs

Step load COMPLETED in 6.31 seconds.
Pipeline zoomcamp_pipeline load step completed in 6.29 seconds
1 load package(s) were loaded to destination qdrant and into dataset zoomcamp_tagged_data
The qdrant destination used /workspaces/LLMcourseDT/dlt_workshop/db.qdrant location to store data
Load package 1751934539.4975917 is LOADED and contains no failed jobs

Step run COMPLETED in 11.01 seconds.
Pipeline zoomcamp_pipeline load step completed in 6.29 seconds
1 load package(s) were loaded to dest