# Essentials: 5 Minute RAG with Qdrant and DeepSeek

https://qdrant.tech/documentation/rag-deepseek/

In [1]:
from qdrant_client import QdrantClient, models

In [3]:
client = QdrantClient(":memory:")
client.set_model("BAAI/bge-base-en-v1.5")
client.add(
    collection_name="knowledge-base",
    # The collection is automatically created if it doesn't exist.
    documents=[
        "Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!",
        "Docker helps developers build, share, and run applications anywhere — without tedious environment configuration or management.",
        "PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing.",
        "MySQL is an open-source relational database management system (RDBMS). A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database.",
        "NGINX is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. NGINX is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.",
        "FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.7+ based on standard Python type hints.",
        "SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning. This can be useful for semantic textual similar, semantic search, or paraphrase mining.",
        "The cron command-line utility is a job scheduler on Unix-like operating systems. Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts), also known as cron jobs, to run periodically at fixed times, dates, or intervals.",
    ],
)

Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

special_tokens_map.json:   0%|          | 0.00/695 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.24k [00:00<?, ?B/s]

model_optimized.onnx:   0%|          | 0.00/218M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/740 [00:00<?, ?B/s]

['60aabd4c0a9a4323a4d613a068c36e33',
 'b8f3fbe4a4734476a25d6e07ce6f962c',
 '2d5bec3dd79e415187c34dcd23332281',
 '5f30e4332ed745e3921756ba3b6db0f3',
 'e714a72590ee451588adce8f05debdf5',
 '1e2833b9c65a4101918c4b8fd80d311d',
 '908559de554048d4bdd6342e6dfc96f6',
 '7034b9f4df0f4e1bbd849de14b2a8601']

In [4]:
prompt = """
What tools should I need to use to build a web service using vector embeddings for search?
"""

In [8]:
from ollama import ChatResponse, chat


def query_deepseek(prompt) -> str:
    response: ChatResponse = chat(
        model="deepseek-r1:8b",
        messages=[
            {
                "role": "user",
                "content": prompt,
            },
        ],
    )
    return response.message.content

In [9]:
query_deepseek(prompt)

"<think>\nOkay, so I'm trying to figure out what tools I need to build a web service that uses vector embeddings for search. I remember hearing about vector embeddings in the context of machine learning and natural language processing, but I'm not exactly sure how they work or how they can be applied to building a search function.\n\nFirst, I think I need some way to generate these vectors from text inputs. Maybe there are libraries that can create embeddings based on the input data. The assistant mentioned TensorFlow and PyTorch as deep learning frameworks. Since I have some Python experience, maybe I should start with those. But what exactly do they offer in terms of vector embeddings?\n\nI remember there's something called Word2Vec which creates word vectors. So perhaps using a pre-trained model or implementing one could help generate these embeddings for each query. Also, FastText comes up often as an alternative to Word2 Vec, and it can handle subwords, which might be useful if my

In [10]:
results = client.query(
    collection_name="knowledge-base",
    query_text=prompt,
    limit=3,
)
results

[QueryResponse(id='60aabd4c0a9a4323a4d613a068c36e33', embedding=None, sparse_embedding=None, metadata={'document': 'Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!'}, document='Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!', score=0.6944532100424724),
 QueryResponse(id='908559de554048d4bdd6342e6dfc96f6', embedding=None, sparse_embedding=None, metadata={'document': 'SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. You can use this framew

In [None]:
context = "\n".join(r.document for r in results)
context

In [None]:
metaprompt = f"""
You are a software architect. 
Answer the following question using the provided context. 
If you can't find the answer, do not pretend you know it, but answer "I don't know".

Question: {prompt.strip()}

Context: 
{context.strip()}

Answer:
"""

# Look at the full metaprompt
print(metaprompt)

In [None]:
query_deepseek(metaprompt)

In [14]:
def rag(question: str, n_points: int = 3) -> str:
    results = client.query(
        collection_name="knowledge-base",
        query_text=question,
        limit=n_points,
    )

    context = "\n".join(r.document for r in results)

    metaprompt = f"""
    You are a software architect. 
    Answer the following question using the provided context. 
    If you can't find the answer, do not pretend you know it, but only answer "I don't know".
    
    Question: {question.strip()}
    
    Context: 
    {context.strip()}
    
    Answer:
    """

    return query_deepseek(metaprompt)

In [18]:
print(rag("What can the stack for a web api look like?"))

<think>
Okay, so I need to figure out what a stack for a web API might look like. Let me start by understanding the question. The user is asking about the structure of a web API's stack, given some context about FastAPI, NGINX, and Docker.

From the context, FastAPI is described as a modern, high-performance web framework for building APIs using Python 3.7+. So that might be part of the application layer or backend.

NGINX is mentioned as a high-performance HTTP server and reverse proxy, which typically sits in front of other applications. It's known for its performance, so maybe it's part of the infrastructure layer handling incoming requests.

Docker is a containerization platform that helps build, share, and run applications without managing environments. So Docker would be related to the deployment or orchestration layer, allowing the application to be containerized and deployed efficiently.

Putting this together, the stack for a web API might include FastAPI as the backend framew

In [19]:
print(rag("Where is the nearest grocery store?"))

<think>
Okay, so the user is asking where the nearest grocery store is. Hmm, but looking at the context provided, it's all about software stuff: Qdrant, FastAPI, NGINX. None of that mentions anything related to grocery stores or locations. So I can't find any information here that would help answer their question. I should just say I don't know because there's no relevant data in the context. No point in making something up or guessing; better to be honest.
</think>

I don't know.


In [20]:
print(rag(prompt))

<think>
Okay, so I need to figure out the tools required to build a web service that uses vector embeddings for search. Let me start by understanding what each part means.

First, vector embeddings are like converting text or data into numerical vectors, right? These vectors can then be used to find similar items, which is useful for things like search or recommendations.

Looking at the context provided, there are three tools mentioned: Qdrant, SentenceTransformers, and FastAPI. Let me break them down.

Qdrant is described as a vector database and search engine. It acts as an API service, so it probably handles the actual storage and searching of these vectors. That makes sense because building a database from scratch would be time-consuming. So if I need to deploy a vector search service quickly, Qdrant might be the way to go.

Next, SentenceTransformers is a Python framework for generating embeddings. It can handle text in over 100 languages and provides sentence embeddings. This to