<a href="https://colab.research.google.com/github/AnmolHemani/Advanced-Python/blob/main/Vector_Database.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install chromadb google-generativeai sentence-transformers

In [2]:
# Set up our Gemini API key

In [15]:
import google.generativeai as genai
import os
from google.colab import userdata
os.environ['GOOGLE_API_KEY'] = userdata.get('Google_Api_Key')
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
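
Outside Colab, `google.colab.userdata` is not importable, so the cell above fails. A minimal sketch of a fallback, assuming the key may also be supplied through a `GOOGLE_API_KEY` environment variable. The helper name `load_api_key` is hypothetical and not part of any library:

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY", colab_secret="Google_Api_Key"):
    """Return the API key from the environment, else from Colab secrets."""
    key = os.environ.get(env_var)
    if key:
        return key
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        raise RuntimeError(
            f"Set the {env_var} environment variable or run inside Colab."
        ) from None
    return userdata.get(colab_secret)
```

With this helper, the configuration call would read `genai.configure(api_key=load_api_key())`.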

In [4]:
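# Initialize ChromaDB

import chromadb

# Initialize the ChromaDB client. Use ONE of the two options below;
# creating both would simply overwrite the first client with the second.
# chroma_client = chromadb.PersistentClient(path="./chroma_db")  # persistent storage on disk
chroma_client = chromadb.Client()  # in-memory storage (data is lost when the session ends)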

# Create a new Collection
collection = chroma_client.get_or_create_collection(name="gemini_embeddings")

In [5]:
# Generate Text Embeddings using Sentence Transformers

# The Gemini API also provides embedding models, but this notebook uses sentence-transformers to generate embeddings locally from text.

In [6]:
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

def generate_embedding(text):
    """Return the embedding of `text` as a plain list of floats."""
    return embedding_model.encode(text).tolist()

In [7]:
# Insert Data into ChromaDB

In [8]:
documents = [
    {"id": "1", "text": "What is machine learning?"},
    {"id": "2", "text": "Explain deep learning and its applications."},
    {"id": "3", "text": "What are transformers in NLP?"}
]

# Store documents with their embeddings
for doc in documents:
    embedding = generate_embedding(doc["text"])
    collection.add(
        ids=[doc["id"]],
        embeddings=[embedding],
        metadatas=[{"text": doc["text"]}]
    )

print("Documents inserted successfully!")


Documents inserted successfully!
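
Because `collection.add` takes lists, the per-document loop could be collapsed into a single call. The sketch below only builds the keyword arguments, so it runs without ChromaDB or a model download. `build_batch` and `fake_embed` are hypothetical names introduced for illustration, and storing the text under `documents` as well as `metadatas` is an optional choice, not something the notebook does:

```python
def build_batch(documents, embed):
    """Turn a list of {"id", "text"} dicts into keyword arguments for one add() call."""
    texts = [doc["text"] for doc in documents]
    return {
        "ids": [doc["id"] for doc in documents],
        "embeddings": [embed(text) for text in texts],
        "documents": texts,
        "metadatas": [{"text": text} for text in texts],
    }


# Stand-in embedding function so the sketch runs without downloading a model.
def fake_embed(text):
    return [float(len(text)), float(text.count(" "))]


batch = build_batch(
    [{"id": "1", "text": "What is machine learning?"},
     {"id": "2", "text": "What are transformers in NLP?"}],
    fake_embed,
)
print(batch["ids"])

# In the notebook, in place of the loop, this would become (assuming the same
# `collection`, `documents`, and `generate_embedding` as above):
# collection.add(**build_batch(documents, generate_embedding))
```

Keeping the `metadatas` entry means the later cells that read `result['text']` continue to work unchanged.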


In [9]:
# Query the Database (Semantic Search)

# To retrieve similar documents, generate an embedding for the query and perform a nearest neighbor search.
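
To make the nearest-neighbor step concrete, here is a brute-force sketch in plain Python. It is an illustration only: ChromaDB uses an approximate HNSW index rather than a linear scan, and its default distance is, as far as I know, squared L2 rather than the cosine similarity used here. The toy vectors and the function names are made up for this example:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to `query`."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy 3-dimensional vectors (made up for illustration; real embeddings from
# all-MiniLM-L6-v2 have 384 dimensions).
toy_vectors = {
    "1": [0.9, 0.1, 0.0],
    "2": [0.8, 0.3, 0.1],
    "3": [0.0, 0.2, 0.9],
}
print(nearest_neighbors([1.0, 0.2, 0.0], toy_vectors, k=2))  # ['1', '2']
```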

In [10]:
query_text = "Tell me about neural networks"
query_embedding = generate_embedding(query_text)

# Perform similarity search
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=2  # Number of closest matches
)

# Display search results
for i, result in enumerate(results["metadatas"][0]):
    print(f"Result {i+1}: {result['text']}")


Result 1: Explain deep learning and its applications.
Result 2: What is machine learning?
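
Note that a query returns the `n_results` closest items however far away they are, so the result list is non-empty whenever the collection holds data. Query results also include a `distances` list by default (to my knowledge), which can serve as a relevance cutoff. A sketch, where `filter_by_distance`, the sample values, and the threshold are all hypothetical:

```python
def filter_by_distance(results, max_distance):
    """Keep only hits for the first query whose distance is at most max_distance."""
    metadatas = results["metadatas"][0]
    distances = results["distances"][0]
    return [
        (meta, dist)
        for meta, dist in zip(metadatas, distances)
        if dist <= max_distance
    ]


# Hand-written stand-in with the same nesting as a query result; the distance
# values are made up for illustration.
sample_results = {
    "metadatas": [[{"text": "Explain deep learning and its applications."},
                   {"text": "What is machine learning?"}]],
    "distances": [[0.8, 1.4]],
}
for meta, dist in filter_by_distance(sample_results, max_distance=1.0):
    print(f"{dist:.2f}  {meta['text']}")
```

A suitable threshold depends on the embedding model and distance metric, so it is best chosen by inspecting the distances of a few known-good and known-bad queries.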


In [11]:
# Use Google Gemini for Text Generation

# You can use the Google Gemini API to generate answers based on the retrieved documents.
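
One possible way to ground the answer in the retrieved documents is to pass several retrieved snippets together with the original question. The sketch below only assembles the prompt string; `build_prompt` and the prompt wording are assumptions made for illustration, not the notebook's method:

```python
def build_prompt(question, contexts):
    """Combine a user question with retrieved snippets into one prompt string."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "Tell me about neural networks",
    ["Explain deep learning and its applications.", "What is machine learning?"],
)
print(prompt)
```

The resulting string can be handed to any text-generation call that accepts a single prompt.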

In [16]:
def get_gemini_response(prompt):
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content(prompt)
    return response.text

# Generate response based on retrieved data
if results["metadatas"][0]:
    context = results["metadatas"][0][0]["text"]
    response = get_gemini_response(f"Explain in detail: {context}")
    print("\nAI Response:\n", response)
else:
    print("No relevant results found.")



AI Response:
 ## Deep Learning: A Deep Dive

Deep learning is a subfield of machine learning that focuses on artificial neural networks with multiple layers (hence, "deep"). These layers learn increasingly complex representations of the input data, allowing the network to perform intricate tasks like image recognition, natural language processing, and audio analysis. In essence, deep learning mimics the way the human brain processes information, but on a massive scale.

**Here's a breakdown of deep learning:**

**1. Foundation: Artificial Neural Networks (ANNs)**

At its core, deep learning relies on artificial neural networks.  Think of these as simplified models of the human brain. An ANN consists of interconnected nodes (neurons) organized in layers.

*   **Input Layer:** Receives the raw input data (e.g., pixels of an image, words in a sentence).
*   **Hidden Layers:**  Transform the input data into more abstract and meaningful representations.  This is where the "deep" part comes