**Question 1: What is a Vector Database (VectorDB) and how is it different from
traditional databases?**

A Vector Database (VectorDB) is a database that stores data as vector embeddings, which are numerical representations of items like text, images, or audio. It is mainly used to perform similarity search, meaning it finds data that is semantically similar rather than exactly the same.

The key difference from traditional databases is how data is stored and queried. Traditional databases store structured data in tables and are optimized for exact matches using SQL queries. Vector databases store high-dimensional vectors and use distance metrics (like cosine similarity) to find the closest matches, making them ideal for AI use cases such as semantic search, recommendation systems, and chatbots.
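To make the contrast concrete, here is a minimal sketch of brute-force similarity search using cosine similarity. The three-dimensional vectors are made-up values for illustration. Real embeddings come from a model and usually have hundreds of dimensions, and production vector databases replace the full scan with an approximate nearest-neighbour index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (not produced by a real model).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.2, 0.95],
}

query = vectors["cat"]
# Rank every stored item by similarity to the query (brute-force nearest neighbours).
ranked = sorted(vectors, key=lambda k: cosine_similarity(query, vectors[k]), reverse=True)
print(ranked)
```

An exact-match lookup such as SQL `WHERE word = 'cat'` would never return "kitten", whereas the similarity ranking places it directly after "cat" and well ahead of "car".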

**Question 2: Explain the various types of VectorDBs available and describe their
suitability for different use cases**

Standalone (Pure) Vector Databases are designed only for storing vectors and performing fast similarity searches. They are best suited for use cases like semantic search, recommendation systems, and RAG-based AI applications where high-performance vector retrieval is the main requirement, but complex relational queries are not needed.

Hybrid Vector Databases combine vector search with metadata or attribute-based filtering. They are suitable for real-world AI applications where semantic similarity must be combined with conditions like date, category, or user information, such as AI search engines and personalized assistants.
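The idea behind hybrid search can be sketched in a few lines. The sketch below assumes a simple pre-filtering strategy: records are first filtered on metadata, and only the survivors are ranked by cosine similarity. The record fields (`category`, `year`) and the `hybrid_search` helper are hypothetical. Real systems may filter before, during, or after the vector search.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical records: each has an embedding plus metadata attributes.
records = [
    {"id": "a1", "embedding": [0.9, 0.1], "category": "news", "year": 2023},
    {"id": "a2", "embedding": [0.8, 0.3], "category": "blog", "year": 2024},
    {"id": "a3", "embedding": [0.7, 0.2], "category": "news", "year": 2024},
    {"id": "a4", "embedding": [0.1, 0.9], "category": "news", "year": 2024},
]


def hybrid_search(query_vec, records, where, k=2):
    """Keep only records whose metadata matches every key in `where`,
    then rank the remaining records by cosine similarity to the query."""
    candidates = [r for r in records if all(r.get(key) == val for key, val in where.items())]
    candidates.sort(key=lambda r: cosine_similarity(query_vec, r["embedding"]), reverse=True)
    return [r["id"] for r in candidates[:k]]


print(hybrid_search([1.0, 0.0], records, where={"category": "news", "year": 2024}))
```

Without the filter, `a1` would be the closest match. With the filter, it is excluded because its `year` does not match.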

Vector-Enabled Traditional Databases are conventional databases that have added vector search capabilities. They are suitable when vector search is needed alongside existing relational or document-based workloads, allowing teams to integrate AI features without migrating to a completely new database system.
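As a toy illustration of vector search living next to relational data, the sketch below uses Python's built-in `sqlite3` module. Storing vectors as JSON text and registering a `cosine_sim` SQL function are assumptions made purely for illustration. Real vector-enabled databases (for example PostgreSQL with the pgvector extension) provide native vector column types and indexes instead of this brute-force scan.

```python
import json
import math
import sqlite3


def cosine_similarity(a_json: str, b_json: str) -> float:
    """SQL-callable cosine similarity; vectors are stored as JSON text."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call cosine_sim(col, ?).
conn.create_function("cosine_sim", 2, cosine_similarity)

# An ordinary relational table with an extra column holding the embedding.
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, embedding TEXT)"
)
conn.executemany(
    "INSERT INTO products (name, price, embedding) VALUES (?, ?, ?)",
    [
        ("running shoes", 80.0, json.dumps([0.9, 0.1, 0.0])),
        ("trail sneakers", 120.0, json.dumps([0.8, 0.2, 0.1])),
        ("coffee maker", 60.0, json.dumps([0.0, 0.1, 0.9])),
    ],
)

query_vec = json.dumps([1.0, 0.0, 0.0])  # hypothetical query embedding
# Regular SQL filtering (price) combined with similarity ordering in one query.
rows = conn.execute(
    "SELECT name, cosine_sim(embedding, ?) AS score FROM products "
    "WHERE price < 100 ORDER BY score DESC",
    (query_vec,),
).fetchall()
print([name for name, _ in rows])
```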

**Question 3: Why is Chroma DB considered important in the context of AI/ML projects?
Describe its key features.**

Chroma DB is considered important in AI/ML projects because it provides a simple and developer-friendly way to store, manage, and retrieve vector embeddings, which are essential for modern AI applications like semantic search, chatbots, and Retrieval-Augmented Generation (RAG) systems. It allows seamless integration between machine-learning models and data storage without heavy infrastructure overhead.

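Key features of Chroma DB include efficient vector similarity search, simple storage of embeddings together with their documents and metadata (Chroma can compute embeddings with a default embedding function or accept your own), and integrations with popular AI frameworks such as LangChain and LlamaIndex. It supports metadata filtering, persistent storage (for example with `chromadb.PersistentClient`), and both in-memory and local on-disk deployment, making it especially suitable for prototyping, experimentation, and small-to-medium-scale AI applications.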

**Question 4: What are the benefits of using Hugging Face Hub for generative AI tasks?**

Hugging Face Hub is highly beneficial for generative AI tasks because it provides easy access to a vast ecosystem of pre-trained models for text, image, audio, and multimodal generation. This allows developers to quickly experiment, fine-tune, and deploy state-of-the-art generative models without building everything from scratch.

Additionally, the Hub supports model versioning, dataset hosting, and collaboration, making it ideal for research and production workflows. Its seamless integration with popular frameworks like PyTorch and TensorFlow, along with inference APIs and community support, significantly accelerates development and reduces the complexity of building generative AI applications.

**Question 5: Describe the process and advantages of navigating and using pre-trained
models from the Hugging Face Hub.**

The process of using pre-trained models from the Hugging Face Hub involves browsing or searching models by task, selecting a suitable pre-trained model, and loading it using the Transformers or Diffusers libraries with just a few lines of code. Developers can then directly use the model for inference or fine-tune it on their own datasets, making experimentation fast and efficient.

The main advantages include saved development time, access to state-of-the-art architectures, and transparency through model cards (which document a model's intended use and limitations), along with community signals such as download counts and benchmark results. The Hub also provides git-based version control, easy sharing, and integration with deployment tools, enabling reliable and scalable use of generative AI models in real-world applications.

**Question 6: Install and set up Chroma DB, and insert sample vector data for semantic
search.**

Chroma DB can be installed with `pip install chromadb` (prefix the command with `!` when running it inside a notebook cell).

In [None]:
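import chromadb

# Create an in-memory Chroma client
# (use chromadb.PersistentClient(path="...") to keep data on disk)
client = chromadb.Client()

# Create the collection, or reuse it if the cell is run again
collection = client.get_or_create_collection(name="sample_collection")

# Insert sample data; Chroma embeds the documents with its default embedding function
collection.add(
    documents=[
        "Artificial Intelligence is transforming technology",
        "Machine Learning is a subset of AI",
        "Databases store and manage data efficiently"
    ],
    metadatas=[
        {"topic": "AI"},
        {"topic": "ML"},
        {"topic": "DB"}
    ],
    ids=["doc1", "doc2", "doc3"]
)

# Semantic search: retrieve the documents closest in meaning to the query
results = collection.query(
    query_texts=["How do computers learn from data?"],
    n_results=2
)
print(results["documents"])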


**Question 7: Demonstrate how to download and fine-tune a Hugging Face model for a
text generation task.**


In [None]:
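# Install required libraries
# pip install transformers datasets torch accelerate

from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

# Step 1: Download pre-trained model and tokenizer
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# GPT-2 has no pad token by default; reuse the end-of-sequence token.
# No new token is added, so the embedding matrix does not need resizing.
tokenizer.pad_token = tokenizer.eos_token

# Step 2: Prepare sample dataset
texts = [
    "Artificial intelligence is transforming the world.",
    "Machine learning helps systems learn from data."
]

dataset = Dataset.from_dict({"text": texts})

def tokenize_function(examples):
    # Padding is left to the data collator, which pads each batch dynamically
    return tokenizer(examples["text"], truncation=True, max_length=64)

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# For causal LM (mlm=False) the collator pads each batch and copies input_ids
# into labels, setting padding positions to -100 so they are ignored by the loss.
# Without labels the model returns no loss and Trainer raises a ValueError.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Step 3: Fine-tuning configuration
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    logging_steps=5,
    save_steps=50,
    report_to="none"
)

# Step 4: Train the model
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator
)

trainer.train()

# Step 5: Save the fine-tuned model and generate text with it
trainer.save_model("./fine-tuned-gpt2")
tokenizer.save_pretrained("./fine-tuned-gpt2")

model.eval()
inputs = tokenizer("Artificial intelligence", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))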


**Question 8: Create a custom LLM using Ollama and Llama2, and run it locally for basic
text prompts.**

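Run the following shell commands in a terminal (they are not Python, so they will not run as-is in a notebook cell):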
# Step 1: Install Ollama (Linux / macOS)
curl -fsSL https://ollama.com/install.sh | sh

# Step 2: Pull the Llama 2 model
ollama pull llama2

# Step 3: Create a Modelfile for a custom LLM
cat <<EOF > Modelfile
FROM llama2
SYSTEM You are a helpful AI assistant that gives clear and concise answers.
EOF

# Step 4: Build the custom model
ollama create custom-llama2 -f Modelfile

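# Step 5: Run the custom LLM locally (opens an interactive chat session)
ollama run custom-llama2

# Example prompt (inside the session)
# > Explain vector databases in simple terms

# Alternatively, pass the prompt directly for a one-off answer
ollama run custom-llama2 "Explain vector databases in simple terms"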
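Beyond the interactive CLI, the custom model can also be called from Python. The sketch below uses only the standard library against Ollama's local REST API. It assumes the server is listening on its default address, `http://localhost:11434`, and that the model was created as `custom-llama2` above. The helper names `build_generate_request` and `ask` are hypothetical. The `ollama` Python package used in Question 9 offers the same functionality through `ollama.generate` and `ollama.chat`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires the Ollama server to be running):
# print(ask("custom-llama2", "Explain vector databases in simple terms"))
```

Setting `"stream": False` makes Ollama return a single JSON object instead of a stream of partial responses, which keeps the helper simple.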

**Question 9: Implement a basic RAG (Retrieval-Augmented Generation) system using Ollama with Llama3.**

In [None]:
# Basic RAG system using Ollama (Llama3) + ChromaDB

# Install dependencies
# pip install chromadb ollama sentence-transformers
# Requires a running Ollama server with the model pulled first: ollama pull llama3

import chromadb
from sentence_transformers import SentenceTransformer
import ollama

# Step 1: Load embedding model
embed_model = SentenceTransformer("all-MiniLM-L6-v2")

# Step 2: Create ChromaDB client and collection
client = chromadb.Client()
# get_or_create_collection avoids an error if the cell is run more than once
collection = client.get_or_create_collection(name="rag_docs")

# Step 3: Add documents to VectorDB
documents = [
    "Vector databases store embeddings for semantic search.",
    "Retrieval Augmented Generation improves LLM accuracy.",
    "Llama models can be run locally using Ollama."
]

embeddings = embed_model.encode(documents).tolist()

collection.add(
    documents=documents,
    embeddings=embeddings,
    ids=["doc1", "doc2", "doc3"]
)

# Step 4: User query
query = "What is RAG and why is it useful?"

query_embedding = embed_model.encode([query]).tolist()

# Step 5: Retrieve relevant documents
results = collection.query(
    query_embeddings=query_embedding,
    n_results=2
)

retrieved_context = " ".join(results["documents"][0])

# Step 6: Generate answer using Ollama (Llama3)
prompt = f"""
Use the following context to answer the question.

Context:
{retrieved_context}

Question:
{query}
"""

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": prompt}]
)

print("Answer:")
print(response["message"]["content"])


**Question 10: A health-tech startup wants to build a chatbot that can answer user
queries based on medical research articles. Propose and explain a solution using
Hugging Face models for understanding, VectorDB for retrieval, and Ollama for
generation.**

**Proposed Solution Architecture**

The chatbot can be built using a Retrieval-Augmented Generation (RAG) approach, which combines document retrieval with text generation to produce accurate and context-aware answers grounded in medical research articles.

**Understanding Using Hugging Face Models**

Medical research articles are first split into text chunks and processed with Hugging Face embedding models (such as sentence-transformers-based models). These models convert each chunk into a vector embedding that captures its semantic meaning. User queries are embedded the same way, so the system can match questions to relevant passages beyond simple keyword matching.
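Chunking is the step that prepares articles for embedding. The following minimal sketch splits text into overlapping word-based chunks. The `chunk_text` helper and its default sizes are hypothetical. Real pipelines often measure chunk size in model tokens rather than words, and may split on sentence or section boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of `chunk_size` words. Each chunk repeats the
    last `overlap` words of the previous one, so content that straddles a
    boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Each resulting chunk would then be embedded and stored in the VectorDB,
# ideally with metadata such as the article ID for later citation.
```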


**Retrieval Using Vector Database**

All embeddings generated from the medical articles are stored in a VectorDB (such as Chroma, or a FAISS index). When a user asks a question, the query is converted into an embedding with the same model and compared with the stored vectors to retrieve the most relevant research passages. This grounds responses in the ingested medical literature rather than in the model's internal knowledge alone.



**Generation Using Ollama (Local LLM)**

The retrieved passages are passed as context to a locally running LLM via Ollama (for example, Llama 2). The model uses this context to generate a clear, natural-language response aligned with the retrieved medical information, which helps improve factual accuracy and reduce hallucinations.
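The way the context is presented to the model matters. The following minimal sketch builds a grounded prompt in which each retrieved passage is numbered and labelled with its source, so the answer can cite the articles it relies on. The `build_grounded_prompt` helper, the passage fields, and the instruction wording are assumptions, and the passages shown are placeholders rather than real retrieval results.

```python
def build_grounded_prompt(question: str, passages: list[dict]) -> str:
    """Number each retrieved passage and tag it with its source so the model
    can cite which article supports each part of its answer."""
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered context passages below. "
        "Cite the passages you use, e.g. [1]. If the context does not contain "
        "the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder passages; in practice these come from the VectorDB query.
passages = [
    {"source": "article_A#chunk2", "text": "Example passage text retrieved from article A."},
    {"source": "article_B#chunk7", "text": "Example passage text retrieved from article B."},
]
prompt = build_grounded_prompt("Example user question?", passages)
print(prompt)
# The prompt could then be sent to the local model, e.g. with the ollama package:
# ollama.chat(model="llama2", messages=[{"role": "user", "content": prompt}])
```

Telling the model to admit when the context is insufficient is a common safeguard in medical settings, where a confident but unsupported answer is worse than no answer.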


**Advantages of This Approach**

- Grounds responses in research articles, improving accuracy
- Preserves data privacy by running the LLM locally
- Scales to large medical document collections, since only the most relevant chunks are retrieved for each query
- Supports explainable answers by referencing the retrieved articles