### Q1 - dlt version

In [1]:
%pip install -q "dlt[qdrant]" "qdrant-client[fastembed]"

Note: you may need to restart the kernel to use updated packages.





In [3]:
# Windows shell; on Linux/macOS use: %pip freeze | grep "dlt"
%pip freeze | findstr "dlt"

dlt==1.12.3
Note: you may need to restart the kernel to use updated packages.
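
The shell filter above depends on the operating system (`findstr` vs `grep`). A shell-independent check can be sketched with the standard library; the helper name `installed_version` is made up for this illustration and is not part of dlt.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if dlt is installed in this environment, else None.
print(installed_version("dlt"))
```

Because this reads the package metadata of the running interpreter, it reports the version the notebook kernel actually imports.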


### Q2 - dlt pipeline

The resource is responsible for yielding data.

In [4]:
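import requests
import dlt

@dlt.resource
def zoomcamp_data():
    # Download the raw FAQ documents, grouped by course.
    docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
    docs_response = requests.get(docs_url, timeout=30)
    docs_response.raise_for_status()  # fail early on an HTTP error
    documents_raw = docs_response.json()

    for course in documents_raw:
        course_name = course['course']

        # Tag each document with its course and yield it as one row.
        for doc in course['documents']:
            doc['course'] = course_name
            yield doc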
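
To see what the resource yields without touching the network, the same flattening logic can be sketched on a small hand-made sample. The sample data and the helper name `flatten_courses` are hypothetical; only the nesting (a list of courses, each with a `documents` list) follows the code above.

```python
def flatten_courses(documents_raw):
    """Yield every document with the name of its parent course attached."""
    for course in documents_raw:
        for doc in course["documents"]:
            # Build a new dict so the input structure is left unchanged.
            yield {**doc, "course": course["course"]}


# Hypothetical sample that mirrors the assumed shape of documents.json.
sample = [
    {
        "course": "course-a",
        "documents": [
            {"section": "General", "question": "q1", "text": "t1"},
            {"section": "General", "question": "q2", "text": "t2"},
        ],
    },
    {
        "course": "course-b",
        "documents": [{"section": "Setup", "question": "q3", "text": "t3"}],
    },
]

rows = list(flatten_courses(sample))
print(len(rows))
print(rows[0])
```

Each yielded dict becomes one row in the destination, which is why the row count reported by the pipeline equals the total number of documents across all courses.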

Defining the Qdrant destination (a local folder). This can also be configured through the dlt secrets file.

In [5]:
from dlt.destinations import qdrant

qdrant_destination = qdrant(
    qd_path="db.qdrant",
)

Now we can create the pipeline with the destination and dataset name, then run it on the resource generator.

In [8]:
pipeline = dlt.pipeline(
    pipeline_name="zoomcamp_pipeline",
    destination=qdrant_destination,
    dataset_name="zoomcamp_tagged_data"
)

load_info = pipeline.run(zoomcamp_data())
print(pipeline.last_trace)

Run started at 2025-07-04 19:34:06.528762+00:00 and COMPLETED in 8.06 seconds with 4 steps.
Step extract COMPLETED in 0.37 seconds.

Load package 1751657647.9283087 is EXTRACTED and NOT YET LOADED to the destination and contains no failed jobs

Step normalize COMPLETED in 0.12 seconds.
Normalized data for the following tables:
- zoomcamp_data: 948 row(s)
- _dlt_pipeline_state: 1 row(s)

Load package 1751657647.9283087 is NORMALIZED and NOT YET LOADED to the destination and contains no failed jobs

Step load COMPLETED in 6.18 seconds.
Pipeline zoomcamp_pipeline load step completed in 6.15 seconds
1 load package(s) were loaded to destination qdrant and into dataset zoomcamp_tagged_data
The qdrant destination used c:\Users\usuario\Programming\llm-zoomcamp\cohorts\2025\workshops\db.qdrant location to store data
Load package 1751657647.9283087 is LOADED and contains no failed jobs

Step run COMPLETED in 8.06 seconds.

Number of rows inserted, as reported by the normalize step:

In [22]:
pipeline.last_trace.last_normalize_info.row_counts

{'zoomcamp_data': 948, '_dlt_pipeline_state': 1}
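
The dictionary mixes the data table with dlt's own bookkeeping table. A minimal sketch for counting only the data rows, assuming that internal tables can be recognised by the `_dlt` name prefix seen in the output above (the dictionary literal is copied from that output):

```python
# Row counts copied from the output above.
row_counts = {"zoomcamp_data": 948, "_dlt_pipeline_state": 1}

# Assumption: tables whose names start with "_dlt" are dlt's internal tables.
user_rows = {
    table: count
    for table, count in row_counts.items()
    if not table.startswith("_dlt")
}
total_user_rows = sum(user_rows.values())

print(user_rows)
print(total_user_rows)
```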

### Q3 - Embedding Model

We can check which embedding model was used by reading the metadata file in the local Qdrant folder that the pipeline created.

In [28]:
import json

with open("db.qdrant/meta.json", "r") as file:
    metadata = json.load(file)

metadata["collections"]["zoomcamp_tagged_data"]["vectors"]

{'fast-bge-small-en': {'size': 384,
  'distance': 'Cosine',
  'hnsw_config': None,
  'quantization_config': None,
  'on_disk': None,
  'datatype': None,
  'multivector_config': None}}
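
The vector name and its size together identify the model. A short sketch that pulls out just those fields, using a trimmed copy of the output above in place of the file (the variable names are illustrative):

```python
# Trimmed copy of the "vectors" entry shown above.
vectors = {
    "fast-bge-small-en": {"size": 384, "distance": "Cosine"},
}

# One (name, dimensions, distance) tuple per named vector in the collection.
summary = [(name, cfg["size"], cfg["distance"]) for name, cfg in vectors.items()]

for name, size, distance in summary:
    print(f"{name}: {size} dimensions, {distance} distance")
```

The `fast-` prefix appears to come from the fastembed integration in `qdrant-client`, so the underlying model is most likely `bge-small-en`, which produces 384-dimensional embeddings; treat that mapping as an assumption rather than something the metadata states directly.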