Assignment Code: DS-AG-033
VectorDB, Hugging Face & Ollama |
Assignment

Question 1: What is a Vector Database (VectorDB) and how is it different from traditional databases?

Answer:

-	When data (e.g., a sentence or image) is passed through an ML model, it’s converted into a high-dimensional vector (embedding).

A VectorDB stores these vectors and allows similarity search using distance metrics such as cosine similarity, Euclidean distance, or dot product.

-	Vector database differs from a traditional database mainly in how data is stored and searched. Traditional databases store structured data in rows and columns and are optimized for exact-match queries using SQL operations such as WHERE, JOIN, or LIKE, relying on indexes like B-trees or hash indexes. They work well when the query requires precise values or clearly defined conditions but do not understand the semantic meaning of data. In contrast, a vector database stores high-dimensional vector embeddings generated by machine learning models from unstructured data such as text, images, or audio. Instead of exact matching, it performs similarity searches using distance metrics like cosine similarity or Euclidean distance and employs approximate nearest neighbor algorithms for efficiency. As a result, vector databases can capture context and meaning, making them suitable for tasks such as semantic search, recommendation systems, and AI applications like Retrieval-Augmented Generation, where understanding similarity is more important than exact matches.

Question 2: Explain the various types of VectorDBs available and describe their suitability for different use cases.

 Answer:

-	Vector databases can be classified into in-memory, disk-based, standalone systems, vector search libraries, and hybrid databases. In-memory VectorDBs are best for fast, real-time search on small datasets, while disk-based and standalone VectorDBs are suitable for large-scale, production AI applications like semantic search and RAG systems. Vector search libraries are mainly used for research and prototyping, and hybrid VectorDBs are ideal when both structured queries and semantic (vector) search are required in the same application.

Question 3: Why is Chroma DB considered important in the context of AI/ML projects? Describe its key features.

Answer:

-	Chroma DB is considered important in AI/ML projects because it provides a simple, lightweight, and developer-friendly vector database for storing and retrieving embeddings, which is essential for building applications based on semantic search and large language models.

-	Its key features include easy integration with LLM frameworks like LangChain and LlamaIndex, making it highly suitable for Retrieval-Augmented Generation (RAG) systems. Chroma supports efficient similarity search using vector embeddings, along with metadata filtering to refine results. It offers persistent storage, allowing embeddings to be saved and reused across sessions, and can run locally without complex setup, which is ideal for experimentation and small-to-medium AI projects. Overall, Chroma DB helps developers quickly prototype, test, and deploy embedding-based AI applications with minimal overhead.

Question 4: What are the benefits of using Hugging Face Hub for generative AI tasks?

Answer:

-	The Hugging Face Hub offers several benefits for generative AI tasks by providing a unified platform for accessing, sharing, and deploying state-of-the-art models and resources. It hosts a vast collection of pretrained generative models for text, image, audio, and multimodal tasks, which significantly reduces development time and computational cost. The Hub supports easy model loading and fine-tuning through well-documented APIs and libraries like Transformers and Diffusers. It also promotes reproducibility and collaboration by enabling version control, model cards, and community contributions. Additionally, Hugging Face Hub integrates well with modern MLOps workflows, offering tools for evaluation, deployment, and inference, making it highly suitable for both research and production-level generative AI applications.

Question 5: Describe the process and advantages of navigating and using pre-trained models from the Hugging Face Hub.

Answer:

-	Navigating and using pre-trained models from the Hugging Face Hub is a simple and efficient process. Users can browse or search the Hub by task, model type, framework, or language, and review model cards that describe the model’s architecture, training data, intended use, and limitations. Once a suitable model is identified, it can be easily loaded into an application using Hugging Face libraries such as Transformers or Diffusers with just a few lines of code. These models can be used directly for inference or fine-tuned on custom datasets to improve task-specific performance.

-	The main advantages of using pre-trained models from the Hub include significant time and cost savings, as there is no need to train models from scratch. The models are well-tested and community-validated, improving reliability. The Hub also supports reproducibility, version control, and easy sharing, which benefits both research and production workflows. Overall, Hugging Face Hub enables faster experimentation, easier deployment, and scalable development of AI applications.


In [1]:
# Question 6: Install and set up Chroma DB, and insert sample vector data for semantic search.

!pip install chromadb



Collecting chromadb
  Downloading chromadb-1.4.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting build>=1.0.3 (from chromadb)
  Downloading build-1.4.0-py3-none-any.whl.metadata (5.8 kB)
Collecting pybase64>=1.4.1 (from chromadb)
  Downloading pybase64-1.4.3-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.metadata (8.7 kB)
Collecting posthog<6.0.0,>=2.4.0 (from chromadb)
  Downloading posthog-5.4.0-py3-none-any.whl.metadata (5.7 kB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.23.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.1 kB)
Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)
  Downloading opentelemetry_exporter_otlp_proto_grpc-1.39.1-py3-none-any.whl.metadata (2.5 kB)
Collecting pypika>=0.48.9 (from chromadb)
  Downloading pypika-0.50.0-py2.py3-none-any.whl.metadata (51 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [2]:
!pip install sentence-transformers




In [3]:
import chromadb
from sentence_transformers import SentenceTransformer

# Initialize Chroma client
client = chromadb.Client()

# Create or get a collection
collection = client.get_or_create_collection(name="demo_collection")

# Load embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [4]:
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks",
    "Python is widely used in data science",
    "Vector databases enable semantic search"
]

# Generate embeddings
embeddings = model.encode(documents).tolist()

# Add data to Chroma
collection.add(
    documents=documents,
    embeddings=embeddings,
    ids=[f"doc{i}" for i in range(len(documents))]
)


In [5]:
query = "What is semantic search?"
query_embedding = model.encode([query]).tolist()

results = collection.query(
    query_embeddings=query_embedding,
    n_results=2
)

print(results["documents"])


[['Vector databases enable semantic search', 'Machine learning is a subset of artificial intelligence']]


In [2]:
# Question 7: Demonstrate how to download and fine-tune a Hugging Face model for a text generation task.

!pip install transformers datasets torch

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Set the padding token
tokenizer.pad_token = tokenizer.eos_token

from datasets import load_dataset
dataset = load_dataset("imdb", split="train[:1%]")

def tokenize(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

from transformers import Trainer, TrainingArguments
from transformers import DataCollatorForLanguageModeling

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    num_train_epochs=1
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

trainer.train()


from transformers import pipeline

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("AI is changing the world", max_length=40))




Map:   0%|          | 0/250 [00:00<?, ? examples/s]

  | |_| | '_ \/ _` / _` |  _/ -_)
[34m[1mwandb[0m: (1) Create a W&B account
[34m[1mwandb[0m: (2) Use an existing W&B account
[34m[1mwandb[0m: (3) Don't visualize my results
[34m[1mwandb[0m: Enter your choice:

 3


[34m[1mwandb[0m: You chose "Don't visualize my results"
[34m[1mwandb[0m: Using W&B in offline mode.
[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin


`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


Step,Training Loss


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'AI is changing the world. Its first film is a masterpiece of cinema. It is a masterpiece. Its movie is a masterpiece. Its movie is a masterpiece. Its film is a masterpiece. Its movie is a masterpiece. Its film is a masterpiece. Its film is a masterpiece. Its film is a masterpiece. Its film is a masterpiece. Its film is a masterpiece. Its film is a masterpiece. Its film is a masterpiece. Its film is a masterpiece. Its movie is a masterpiece. Their movie is a masterpiece. Their film is a masterpiece. Their movie is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a masterpiece. Their film is a. Their film is a masterpiece. Their film is a. Their film is a masterpiece. Their film is a masterpiece. Their

In [6]:
# Question 8: Create a custom LLM using Ollama and Llama2, and run it locally for basic text prompts.

!pip install ollama

# Install zstd, a dependency for Ollama installation
!apt-get install zstd -y

# Install Ollama binary (if not already installed, though it might be from Question 10)
!curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server in the background
import subprocess
import time


subprocess.Popen(['nohup', 'ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
time.sleep(5) # Give the server a few seconds to start

!ollama --version

!ollama pull llama2

# Define the Modelfile content
modelfile_content = """
FROM llama2
SYSTEM You are a helpful teacher who explains in simple words.
"""

# Write the Modelfile content to a file
with open("Modelfile", "w") as f:
    f.write(modelfile_content)

!ollama create mymodel -f Modelfile

# Run the custom model with a prompt
!ollama run mymodel "Explain machine learning."

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  zstd
0 upgraded, 1 newly installed, 0 to remove and 2 not upgraded.
Need to get 603 kB of archives.
After this operation, 1,695 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 zstd amd64 1.4.8+dfsg-3build1 [603 kB]
Fetched 603 kB in 1s (581 kB/s)
Selecting previously unselected package zstd.
(Reading database ... 117540 files and directories currently installed.)
Preparing to unpack .../zstd_1.4.8+dfsg-3build1_amd64.deb ...
Unpacking zstd (1.4.8+dfsg-3build1) ...
Setting up zstd (1.4.8+dfsg-3build1) ...
Processing triggers for man-db (2.10.2-1) ...
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding

In [7]:
# Question 9: Implement a basic RAG (Retrieval-Augmented Generation) system using Ollama with Llama3.

!pip install ollama numpy

# Install zstd, a dependency for Ollama installation
!apt-get install zstd -y

# Install Ollama binary
!curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server in the background
import subprocess
import time

subprocess.Popen(['nohup', 'ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
time.sleep(5) # Give the server a few seconds to start

# Pull Ollama models
!ollama pull llama3
!ollama pull nomic-embed-text

# Data for retrieval
doc_data = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with many layers.",
    "RAG combines retrieval and generation for better answers."
]

# Write the data to a file as expected by the code
with open("data.txt", "w") as f:
    for line in doc_data:
        f.write(line + "\n")

import ollama
import numpy as np

# Load text
with open("data.txt") as f:
    documents = f.readlines()

# Create embeddings
doc_embeddings = []
for doc in documents:
    emb = ollama.embeddings(
        model="nomic-embed-text",
        prompt=doc
    )["embedding"]
    doc_embeddings.append(emb)

doc_embeddings = np.array(doc_embeddings)

def retrieve(query, top_k=1):
    query_emb = ollama.embeddings(
        model="nomic-embed-text",
        prompt=query
    )["embedding"]

    scores = np.dot(doc_embeddings, query_emb)
    top_index = np.argmax(scores)
    return documents[top_index]

query = "What is deep learning?"

context = retrieve(query)

prompt = f"""
Use the following context to answer the question.

Context:
{context}

Question:
{query}
"""

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": prompt}]
)

print(response["message"]["content"])


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zstd is already the newest version (1.4.8+dfsg-3build1).
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?20

In [16]:
'''Question 10: A health-tech startup wants to build a chatbot that can answer user
queries based on medical research articles. Propose and explain a solution using
Hugging Face models for understanding, VectorDB for retrieval, and Ollama for
generation.'''


'''Answer:

A suitable solution for the health-tech startup is to build a Retrieval-Augmented Generation (RAG)–based medical chatbot that combines Hugging Face models, a Vector Database, and Ollama for local generation.

First, medical research articles are collected, cleaned, and split into smaller text chunks. A Hugging Face biomedical embedding model (such as BioBERT or sentence-transformers fine-tuned on medical text) is used to convert each chunk into vector embeddings. These embeddings, along with metadata like article title and source, are stored in a VectorDB (e.g., Chroma, Milvus, or Pinecone). This enables fast semantic retrieval of relevant passages rather than keyword-based search.

When a user asks a medical question, the query is first embedded using the same Hugging Face model and sent to the VectorDB, which retrieves the most semantically similar research passages. This ensures that the chatbot’s responses are grounded in trusted medical literature, improving accuracy and reducing hallucinations.

The retrieved passages are then passed as context to a locally running LLM via Ollama (for example, Llama 2 or Mistral). Ollama generates a natural-language answer using the provided evidence while following a system prompt that enforces a medical-safe tone, such as avoiding diagnosis and encouraging professional consultation.

Overall, this architecture combines Hugging Face models for medical understanding, VectorDB for reliable knowledge retrieval, and Ollama for private, controllable text generation, making it well suited for a health-tech chatbot that is accurate, scalable, and compliant with data-privacy requirements.



Below is a minimal end-to-end RAG code example using
Hugging Face (embeddings) + ChromaDB (retrieval) + Ollama (generation) for a medical chatbot.'''


!pip install chromadb sentence-transformers ollama

# Install zstd, a dependency for Ollama installation
!apt-get install zstd -y

# Install Ollama binary
!curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server in the background
import subprocess
import time

# Start Ollama server as a background process
# nohup: Run a command immune to hangups, with output to a non-tty.
# > /dev/null 2>&1: Redirect stdout and stderr to /dev/null
subprocess.Popen(['nohup', 'ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
time.sleep(5) # Give the server a few seconds to start

# Attempt to pull the Llama2 model. This will connect to the Ollama server.
!ollama pull llama2


from sentence_transformers import SentenceTransformer
import chromadb

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
col = client.get_or_create_collection("medical")

docs = ["Diabetes causes high blood sugar.", "Vaccines boost immunity."]
emb = model.encode(docs).tolist()

col.add(documents=docs, embeddings=emb, ids=["1","2"])


import ollama

def chatbot(query):
    q_emb = model.encode([query]).tolist()
    ctx = col.query(query_embeddings=q_emb, n_results=1)["documents"][0][0]

    prompt = f"Answer using this context only:\n{ctx}\nQuestion: {query}"
    res = ollama.chat(model="llama2", messages=[{"role":"user","content":prompt}])
    return res["message"]["content"]

print(chatbot("What is diabetes?"))

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zstd is already the newest version (1.4.8+dfsg-3build1).
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?20