In [1]:
%pip install chromadb

Note: you may need to restart the kernel to use updated packages.
Collecting chromadb
  Downloading chromadb-1.1.1-cp39-abi3-win_amd64.whl (19.8 MB)
Collecting orjson>=3.9.12
  Downloading orjson-3.11.3-cp310-cp310-win_amd64.whl (131 kB)
Collecting jsonschema>=4.19.0
  Downloading jsonschema-4.25.1-py3-none-any.whl (90 kB)
Collecting kubernetes>=28.1.0
  Downloading kubernetes-34.1.0-py2.py3-none-any.whl (2.0 MB)
Collecting opentelemetry-api>=1.2.0
  Downloading opentelemetry_api-1.37.0-py3-none-any.whl (65 kB)
Collecting pybase64>=1.4.1
  Downloading pybase64-1.4.2-cp310-cp310-win_amd64.whl (35 kB)
Collecting tenacity>=8.2.3
  Using cached tenacity-9.1.2-py3-none-any.whl (28 kB)
Collecting opentelemetry-sdk>=1.2.0
  Downloading opentelemetry_sdk-1.37.0-py3-none-any.whl (131 kB)
Collecting typer>=0.9.0
  Downloading typer-0.19.2-py3-none-any.whl (46 kB)
Collecting overrides>=7.3.1
  Using cached overrides-7.7.0-py3-none-any.whl (17 kB)
Collecting bcrypt>=4.0.1
  Downloading bcrypt-5.0.

You should consider upgrading via the 'c:\Users\kumar\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip' command.


In [2]:
pip show chromadb

Name: chromadb
Version: 1.1.1
Summary: Chroma.
Home-page: 
Author: 
Author-email: Jeff Huber <jeff@trychroma.com>, Anton Troynikov <anton@trychroma.com>
License: 
Location: c:\users\kumar\appdata\local\programs\python\python310\lib\site-packages
Requires: tokenizers, opentelemetry-exporter-otlp-proto-grpc, typer, tqdm, onnxruntime, numpy, pydantic, uvicorn, build, pyyaml, overrides, bcrypt, jsonschema, opentelemetry-api, tenacity, typing-extensions, orjson, mmh3, httpx, posthog, grpcio, pybase64, opentelemetry-sdk, pypika, kubernetes, importlib-resources, rich
Required-by: 
Note: you may need to restart the kernel to use updated packages.


# 🧠 **Local ChromaDB Setup**

This guide explains how to create a **local vector database using ChromaDB**, generate **embeddings from text using an external API**, and perform **semantic search**, with the database stored **locally on your computer** rather than in the cloud.

---

## 📘 **What You’ll Learn**

By the end of this, you’ll know:

1. What embeddings are and why they’re used.
2. How to generate embeddings using an external API.
3. How to store and search data locally using **ChromaDB**.
4. How to run semantic queries to find similar text.

---

## ⚙️ **Step 1: Install Required Libraries**

You need three Python packages:

* `chromadb` → Local vector database
* `requests` → To send API requests
* `numpy` → To handle numerical vectors

Install them using this command:

```bash
pip install chromadb requests numpy
```

---

## 🧩 **Step 2: Import the Libraries**

```python
import requests      # For sending requests to the embedding API
import numpy as np   # For numerical array operations
import chromadb      # For creating and managing the local vector database
```

✅ **Explanation:**

* `requests` helps us send text data to an API and receive embeddings (numbers).
* `numpy` stores those numbers in a numerical format that’s easy to process.
* `chromadb` is used to create, store, and search through these embeddings.

---

## 🧱 **Step 3: Initialize a Local ChromaDB Client**

```python
client = chromadb.PersistentClient(path="./chroma_data")
```

✅ **Explanation:**

* This line **starts a local ChromaDB database** on your computer.
* `PersistentClient` is the current way to get on-disk storage. The older `Settings(chroma_db_impl="duckdb+parquet", ...)` configuration is deprecated and is rejected by recent releases such as the 1.1.1 version installed above.
* `path="./chroma_data"` means all data will be saved in a folder called `chroma_data` on your machine.

👉 So even if you close your program or restart your system, your stored data won’t be lost.

---

## 📄 **Step 4: Create Your Text Dataset**

```python
texts = [
    "Deepak Kumar Mohanty was born in Balasore, Odisha, India, to a humble and supportive family.",
    "From an early age, he was deeply curious about how machines work and how technology shapes the world.",
    "Despite challenges, Deepak’s determination to learn and grow never faded.",
    "He earned his Bachelor’s degree in Computer Applications (BCA) from Bhadrak Autonomous College, Odisha.",
    "Deepak is a passionate Python developer and aspiring Data Scientist with strong analytical and problem-solving skills.",
    "He has hands-on experience with Python, Django, Flask, HTML, CSS, JavaScript, and various Data Science tools.",
    "Driven by curiosity, he constantly explores AI, machine learning, and data visualization to expand his expertise.",
    "Deepak created multiple real-world projects — from a Netflix clone to data analysis dashboards — showcasing both creativity and logic.",
    "He actively shares valuable Python insights and learning tips on LinkedIn to help others grow in their tech journey.",
    "Deepak’s story reflects passion, perseverance, and the belief that continuous learning can transform one’s life."
]
```

✅ **Explanation:**
This is your **dataset** — a list of sentences (called *documents* in ChromaDB).
Each sentence will be converted into a **numerical embedding**, stored in the database, and later used for **similarity searches**.

---

## 🧮 **Step 5: Generate Embeddings Using an External API**

```python
def generate_embeddings(text):
    url = "https://api.euron.one/api/v1/euri/embeddings"  # API endpoint
    headers = {
        "Content-Type": "application/json",
        # Placeholder: substitute your own key and never commit a real one
        "Authorization": "Bearer YOUR_EURI_API_KEY"
    }
    payload = {
        "input": text,
        "model": "text-embedding-3-small"
    }

    response = requests.post(url, headers=headers, json=payload, timeout=30)  # Send POST request
    response.raise_for_status()  # Stop early on HTTP errors such as an invalid key
    data = response.json()  # Convert API response to Python dictionary

    embedding = np.array(data['data'][0]['embedding'])  # Extract the actual numbers
    return embedding
```

✅ **Explanation:**

1. We define a function `generate_embeddings(text)` that takes a sentence as input.
2. The API (Euron’s Embedding API) converts your text into an **embedding vector** (a list of floating-point numbers).
3. `requests.post()` sends the text to the API, and `response.json()` gets the reply.
4. `embedding` stores the numeric vector returned by the API as a **NumPy array** for easy handling.
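
If the API returns an error body instead of embeddings, indexing into `data['data'][0]['embedding']` fails with an unhelpful `KeyError`. The sketch below is one way to check the response shape first. It assumes only the `data[0].embedding` layout used above; the helper name `extract_embedding` and the sample payload are illustrative and not part of the Euron API.

```python
def extract_embedding(payload):
    """Return the embedding list from a parsed API response, or raise ValueError."""
    try:
        embedding = payload["data"][0]["embedding"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected embedding response: {payload!r}") from exc
    if not embedding:
        raise ValueError("API returned an empty embedding")
    return [float(x) for x in embedding]


# Hypothetical payload in the same shape as the real response
sample = {"data": [{"embedding": [0.1, -0.2, 0.3]}]}
vector = extract_embedding(sample)
print(len(vector))
```

Inside `generate_embeddings` you could call `extract_embedding(data)` and wrap the result in `np.array(...)`.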

👉 **Why embeddings?**
Embeddings turn words into numbers that capture meaning.
For example:

* “Python developer” and “Software engineer” will have similar embeddings.
* “Banana” and “Car” will have very different embeddings.
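
To make "similar embeddings" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three-dimensional vectors are invented for illustration (real embeddings from this API have 1536 dimensions), and the function uses only the standard library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, not real model output
python_developer = [0.9, 0.8, 0.1]
software_engineer = [0.8, 0.9, 0.2]
banana = [0.1, 0.0, 0.9]

print(cosine_similarity(python_developer, software_engineer))  # close to 1
print(cosine_similarity(python_developer, banana))             # much lower
```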

---

## 🔢 **Step 6: Generate Embeddings for All Sentences**

```python
embeddings = [generate_embeddings(t) for t in texts]
```

✅ **Explanation:**
This line **loops through every sentence** in the `texts` list and calls the `generate_embeddings()` function for each one.
The result is a list of numerical vectors — one for each text.

Now you have:

* `texts` → the original text data
* `embeddings` → the numerical version of each text
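
Before storing anything, it can help to confirm that every vector has the same length, since a collection expects one embedding dimension. The helper below is a small sketch with a hypothetical name (`embedding_dimension`); it works on plain lists as well as NumPy arrays.

```python
def embedding_dimension(embeddings):
    """Return the shared vector length, or raise ValueError if vectors differ."""
    if not embeddings:
        raise ValueError("No embeddings were generated")
    sizes = {len(vector) for vector in embeddings}
    if len(sizes) != 1:
        raise ValueError(f"Mixed embedding sizes: {sorted(sizes)}")
    return sizes.pop()


# Toy vectors for illustration
toy_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(embedding_dimension(toy_embeddings))  # 3
```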

---

## 🧱 **Step 7: Create a Collection in ChromaDB**

```python
collection = client.create_collection(name="kumar_collection")
```

✅ **Explanation:**

* A **collection** is like a folder or table inside ChromaDB.
* You can store multiple documents and embeddings inside it.
* Here, the collection is named `"kumar_collection"`.

👉 You can later create other collections like `"resume_data"`, `"project_notes"`, etc.

---

## 🧩 **Step 8: Add Your Data to the Collection**

```python
collection.add(
    documents=texts,                        # Original text data
    embeddings=embeddings,                  # Numeric embeddings
    ids=[str(i) for i in range(len(texts))] # Unique IDs for each text
)
```

✅ **Explanation:**
This stores everything inside ChromaDB.

* `documents`: the actual sentences.
* `embeddings`: the numerical representations.
* `ids`: unique identifiers (`"0"`, `"1"`, `"2"`, …) for each text.

Now ChromaDB knows which embedding belongs to which document.

---

## 📊 **Step 9: Check What’s Stored**

```python
print(collection.count())   # Shows total number of stored items
print(collection.get())     # Retrieves stored IDs and documents (embeddings are omitted by default)
```

✅ **Explanation:**

* `count()` tells you how many documents exist in your collection.
* `get()` returns the stored IDs and documents. Embeddings are left out by default (the `embeddings` field comes back as `None`); pass `include=["embeddings"]` to retrieve them too.

---

## 🔍 **Step 10: Query Your Database (Semantic Search)**

Now let’s find which stored sentences are **most similar** to a new input query.

```python
query = "hands-on experience with Python"
embed_query = generate_embeddings(query)

results = collection.query(
    query_embeddings=[embed_query],
    n_results=2
)

print(results)
```

✅ **Explanation:**

1. You create a new query text (`"hands-on experience with Python"`).
2. Convert it into an embedding using the same API (`generate_embeddings(query)`).
3. `collection.query()` searches inside your database for **the most similar stored embeddings**.
4. `n_results=2` means you want the top 2 most similar sentences.
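
Conceptually, a query ranks stored vectors by their distance to the query vector. The sketch below does this by brute force on toy two-dimensional vectors. It is only an illustration of the idea: ChromaDB itself uses an approximate nearest-neighbour index rather than scanning every vector, and the squared L2 distance here is assumed to match its default metric.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_query(query_embedding, stored, n_results=2):
    """stored maps id -> embedding; returns the n closest (id, distance) pairs."""
    scored = [(doc_id, squared_l2(query_embedding, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n_results]


# Toy data: id "1" is identical to the query, id "2" is close, id "0" is far
stored = {"0": [0.0, 1.0], "1": [1.0, 0.0], "2": [0.9, 0.1]}
print(brute_force_query([1.0, 0.0], stored))
```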

---

## 🧾 **Sample Output**

```python
{
  'ids': [['5', '4']],
  'documents': [[
      "He has hands-on experience with Python, Django, Flask, HTML, CSS, JavaScript, and various Data Science tools.",
      "Deepak is a passionate Python developer and aspiring Data Scientist with strong analytical and problem-solving skills."
  ]],
  'distances': [[0.7931717038154602, 1.0989384651184082]]
}
```

✅ **Explanation:**

* `'ids'`: IDs of the most similar documents found.
* `'documents'`: The actual text results.
* `'distances'`: How far each result is from your query. With Chroma’s default settings this is the squared Euclidean (L2) distance.

  * Smaller distance = higher similarity.

So the result shows that:

> The query “hands-on experience with Python” is most similar to the sentence about Deepak’s hands-on Python experience (ID 5).
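
Each field in the result is a list of lists, with one inner list per query embedding. A minimal sketch for turning the sample output above into readable rows:

```python
results = {
    "ids": [["5", "4"]],
    "documents": [[
        "He has hands-on experience with Python, Django, Flask, HTML, CSS, JavaScript, and various Data Science tools.",
        "Deepak is a passionate Python developer and aspiring Data Scientist with strong analytical and problem-solving skills.",
    ]],
    "distances": [[0.7931717038154602, 1.0989384651184082]],
}

# Index 0 selects the hits for the first (and here only) query embedding
hits = list(zip(results["ids"][0], results["documents"][0], results["distances"][0]))
for doc_id, text, distance in hits:
    print(f"{doc_id}  {distance:.4f}  {text}")
```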

---

## 💾 **Step 11: Saving and Reloading Data**

Your data is already saved locally (because the client was created with the storage folder `./chroma_data`).
If you restart your Python program, you can reload the same collection like this:

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_data")

collection = client.get_collection("kumar_collection")
```

✅ **Explanation:**

* This reopens the database stored in `./chroma_data`.
* You can now continue querying or adding new data without losing anything.

---

## 🧠 **Step 12: Full Working Code (Everything Together)**

```python
import requests
import numpy as np
import chromadb

# 1️⃣ Initialize ChromaDB (local persistent mode)
client = chromadb.PersistentClient(path="./chroma_data")

# 2️⃣ Prepare the text dataset
texts = [
    "Deepak Kumar Mohanty was born in Balasore, Odisha, India, to a humble and supportive family.",
    "From an early age, he was deeply curious about how machines work and how technology shapes the world.",
    "Despite challenges, Deepak’s determination to learn and grow never faded.",
    "He earned his Bachelor’s degree in Computer Applications (BCA) from Bhadrak Autonomous College, Odisha.",
    "Deepak is a passionate Python developer and aspiring Data Scientist with strong analytical and problem-solving skills.",
    "He has hands-on experience with Python, Django, Flask, HTML, CSS, JavaScript, and various Data Science tools.",
    "Driven by curiosity, he constantly explores AI, machine learning, and data visualization to expand his expertise.",
    "Deepak created multiple real-world projects — from a Netflix clone to data analysis dashboards — showcasing both creativity and logic.",
    "He actively shares valuable Python insights and learning tips on LinkedIn to help others grow in their tech journey.",
    "Deepak’s story reflects passion, perseverance, and the belief that continuous learning can transform one’s life."
]

# 3️⃣ Function to generate embeddings
def generate_embeddings(text):
    url = "https://api.euron.one/api/v1/euri/embeddings"
    headers = {
        "Content-Type": "application/json",
        # Placeholder: substitute your own key and never commit a real one
        "Authorization": "Bearer YOUR_EURI_API_KEY"
    }
    payload = {"input": text, "model": "text-embedding-3-small"}

    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # Stop early on HTTP errors such as an invalid key
    data = response.json()
    embedding = np.array(data['data'][0]['embedding'])
    return embedding

# 4️⃣ Generate embeddings for all texts
embeddings = [generate_embeddings(t) for t in texts]

# 5️⃣ Create the collection (or reopen it on later runs) and add data
collection = client.get_or_create_collection(name="kumar_collection")
collection.add(documents=texts, embeddings=embeddings, ids=[str(i) for i in range(len(texts))])

# 6️⃣ Query for similar text
query = "hands-on experience with Python"
embed_query = generate_embeddings(query)
results = collection.query(query_embeddings=[embed_query], n_results=2)

# 7️⃣ Show results
print(results)
```

---

## 🎯 **Final Summary**

| Step | What It Does        | Explanation                                   |
| ---- | ------------------- | --------------------------------------------- |
| 1    | Initialize ChromaDB | Creates a local database folder to store data |
| 2    | Prepare texts       | List of sentences you want to store           |
| 3    | Generate embeddings | Converts text → numeric vectors               |
| 4    | Add to collection   | Stores texts + embeddings in ChromaDB         |
| 5    | Query               | Searches for the most similar texts           |
| 6    | Reload              | Allows reuse of the same local database later |

---

In [6]:
texts = [
    "Deepak Kumar Mohanty was born in Balasore, Odisha, India, to a humble and supportive family.",
    "From an early age, he was deeply curious about how machines work and how technology shapes the world.",
    "Despite challenges, Deepak’s determination to learn and grow never faded.",
    "He earned his Bachelor’s degree in Computer Applications (BCA) from Bhadrak Autonomous College, Odisha.",
    "Deepak is a passionate Python developer and aspiring Data Scientist with strong analytical and problem-solving skills.",
    "He has hands-on experience with Python, Django, Flask, HTML, CSS, JavaScript, and various Data Science tools.",
    "Driven by curiosity, he constantly explores AI, machine learning, and data visualization to expand his expertise.",
    "Deepak created multiple real-world projects — from a Netflix clone to data analysis dashboards — showcasing both creativity and logic.",
    "He actively shares valuable Python insights and learning tips on LinkedIn to help others grow in their tech journey.",
    "Deepak’s story reflects passion, perseverance, and the belief that continuous learning can transform one’s life."
]


In [8]:

import requests
import numpy as np

def generate_embeddings(text):
    url = "https://api.euron.one/api/v1/euri/embeddings"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_EURI_API_KEY"  # Placeholder: never commit a real key
    }
    payload = {
        "input": text,
        "model": "text-embedding-3-small"
    }

    response = requests.post(url, headers=headers, json=payload)
    data = response.json()
    
    embedding = np.array(data['data'][0]['embedding'])
    
    return embedding

In [17]:
embeddings = [generate_embeddings(text) for text in texts]

In [18]:
embeddings

[array([ 0.05878288,  0.01717264, -0.0338271 , ...,  0.00438903,
        -0.0014833 , -0.02586778], shape=(1536,)),
 array([ 0.00860733,  0.00794669, -0.03793576, ..., -0.0218264 ,
         0.00357315, -0.02698444], shape=(1536,)),
 array([ 0.00157107, -0.00291296,  0.01863574, ..., -0.00658583,
         0.01002349, -0.01296661], shape=(1536,)),
 array([ 0.01711475, -0.01486012,  0.04820827, ..., -0.01766816,
         0.02066069,  0.0024327 ], shape=(1536,)),
 array([ 0.03916692, -0.0105021 ,  0.01395733, ..., -0.02891487,
         0.02814199, -0.01163869], shape=(1536,)),
 array([-0.03450021,  0.01461001,  0.0322181 , ..., -0.02877255,
        -0.0201475 , -0.02002444], shape=(1536,)),
 array([-0.01525764, -0.01136182,  0.02394184, ..., -0.03263809,
        -0.03666659, -0.01676531], shape=(1536,)),
 array([ 0.02259899,  0.0227703 ,  0.02242768, ..., -0.02432639,
         0.02302727, -0.02324141], shape=(1536,)),
 array([-0.01527787, -0.06384856,  0.00627351, ..., -0.02923696,
       

In [19]:
type(embeddings)

list

In [20]:
len(embeddings)

10

In [21]:
len(embeddings[0])

1536

In [None]:
import chromadb
client = chromadb.Client()

In [22]:
collection = client.create_collection(name="kumar_collection")

In [None]:
collection.add(
    documents=texts,
    embeddings=embeddings,
    ids=[str(i) for i in range(len(texts))]
)

In [27]:
for i in range(len(texts)):
    print(str(i),type(str(i)))

0 <class 'str'>
1 <class 'str'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
5 <class 'str'>
6 <class 'str'>
7 <class 'str'>
8 <class 'str'>
9 <class 'str'>


In [26]:
for i in range(len(texts)):
    print(i,type(i))

0 <class 'int'>
1 <class 'int'>
2 <class 'int'>
3 <class 'int'>
4 <class 'int'>
5 <class 'int'>
6 <class 'int'>
7 <class 'int'>
8 <class 'int'>
9 <class 'int'>


In [29]:
collection.count()

10

In [30]:
collection.get()

{'ids': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
 'embeddings': None,
 'documents': ['Deepak Kumar Mohanty was born in Balasore, Odisha, India, to a humble and supportive family.',
  'From an early age, he was deeply curious about how machines work and how technology shapes the world.',
  'Despite challenges, Deepak’s determination to learn and grow never faded.',
  'He earned his Bachelor’s degree in Computer Applications (BCA) from Bhadrak Autonomous College, Odisha.',
  'Deepak is a passionate Python developer and aspiring Data Scientist with strong analytical and problem-solving skills.',
  'He has hands-on experience with Python, Django, Flask, HTML, CSS, JavaScript, and various Data Science tools.',
  'Driven by curiosity, he constantly explores AI, machine learning, and data visualization to expand his expertise.',
  'Deepak created multiple real-world projects — from a Netflix clone to data analysis dashboards — showcasing both creativity and logic.',
  'He actively 

In [48]:
query = "hands-on experience with Python"

In [49]:
embed_query = generate_embeddings(query)

In [50]:
embed_query

array([-0.00996748,  0.00263056,  0.00653042, ..., -0.03171746,
       -0.00762787, -0.01103478], shape=(1536,))

In [52]:
collection.query(query_embeddings=[embed_query], n_results=2)

{'ids': [['5', '4']],
 'embeddings': None,
 'documents': [['He has hands-on experience with Python, Django, Flask, HTML, CSS, JavaScript, and various Data Science tools.',
   'Deepak is a passionate Python developer and aspiring Data Scientist with strong analytical and problem-solving skills.']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[None, None]],
 'distances': [[0.7931717038154602, 1.0989384651184082]]}

# ☁️ **ChromaDB Cloud Setup**

## 📘 **Overview**

This guide explains how to:

* Connect to **ChromaDB Cloud** using your API key and tenant ID.
* Create a collection in your cloud database.
* Upload texts and embeddings to store them online.
* Manage your data in the cloud just like in the local version.

The cloud version works the same way as local ChromaDB — but all your data is saved on Chroma’s servers, not your computer.

---

## ⚙️ **Step 1: Install Required Packages**

Before starting, make sure the `chromadb` library is installed:

```bash
pip install chromadb
```

✅ **Explanation:**
This installs the official ChromaDB client library, which allows your Python code to communicate with both **local** and **cloud** databases.

---

## 🧩 **Step 2: Import the Library**

```python
import chromadb
```

✅ **Explanation:**
You import the main ChromaDB module that lets you create clients, collections, and add/query data.

---

## ☁️ **Step 3: Connect to Chroma Cloud**

```python
client = chromadb.CloudClient(
    api_key='YOUR_CHROMA_API_KEY',  # Placeholder: use your own key and keep it private
    tenant='YOUR_TENANT_ID',        # Placeholder: your tenant ID
    database='test'
)
```

✅ **Explanation:**

Here’s what each parameter means:

| Parameter  | Description                                                                                         |
| ---------- | --------------------------------------------------------------------------------------------------- |
| `api_key`  | Your **personal secret key** that allows access to your cloud database. (Always keep this private!) |
| `tenant`   | Your **tenant ID**, which identifies your organization or account in Chroma Cloud.                  |
| `database` | The specific **database name** you want to work with inside your tenant.                            |

💡 **In simple words:**
You are logging in to your cloud-based Chroma account using credentials.
Think of it like connecting to a remote SQL or Firebase database — but this one stores embeddings and vectors.

Once connected, `client` acts as your gateway to the cloud database.

---

## 🧱 **Step 4: Create a Collection**

```python
collection = client.create_collection(name="kumar_collection")
```

✅ **Explanation:**

* A **collection** is like a *table* or *folder* inside your ChromaDB cloud database.
* You can create multiple collections — each one storing a different set of data (for example: user profiles, project notes, articles, etc.).
* Here, you’re creating a new collection named `"kumar_collection"`.

---

## 🧠 **Step 5: Add Data (Texts and Embeddings)**

```python
collection.add(
    documents=texts,
    embeddings=embeddings,
    ids=[str(i) for i in range(len(texts))]
)
```

✅ **Explanation:**

Let’s break this down:

| Parameter    | Meaning                                                                                     |
| ------------ | ------------------------------------------------------------------------------------------- |
| `documents`  | The original text data (like sentences, paragraphs, or articles).                           |
| `embeddings` | The numeric vectors (lists of numbers) representing the meaning of each text.               |
| `ids`        | Unique identifiers (for example `"0"`, `"1"`, `"2"`, etc.) used to reference each document. |

💡 You must make sure that:

* The number of documents = number of embeddings = number of IDs.
  Otherwise, Chroma will throw an error.
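
A quick pre-check can catch a mismatch before the request is sent. The sketch below uses a hypothetical helper name (`validate_batch`) and also verifies that the IDs are unique:

```python
def validate_batch(documents, embeddings, ids):
    """Raise ValueError if the three lists cannot be added together."""
    if not (len(documents) == len(embeddings) == len(ids)):
        raise ValueError(
            f"Length mismatch: {len(documents)} documents, "
            f"{len(embeddings)} embeddings, {len(ids)} ids"
        )
    if len(set(ids)) != len(ids):
        raise ValueError("IDs must be unique")


# Toy data: two documents, two vectors, two distinct IDs
validate_batch(["a", "b"], [[0.1], [0.2]], ["0", "1"])  # passes silently
```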

---

### ✅ Example Data

```python
texts = [
    "Deepak Kumar Mohanty was born in Balasore, Odisha, India, to a humble and supportive family.",
    "He has hands-on experience with Python, Django, and Data Science tools."
]

embeddings = [
    [0.12, 0.54, 0.33, 0.89],  # Example embedding for text 1
    [0.77, 0.65, 0.49, 0.21]   # Example embedding for text 2
]

collection.add(
    documents=texts,
    embeddings=embeddings,
    ids=["1", "2"]
)
```

✅ **Explanation:**
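Here you added 2 texts, each with its embedding vector and a unique ID. The 4-number vectors are placeholders for illustration only.
In real usage, you’ll generate embeddings using an API (like OpenAI or Euron) before adding them here. Every embedding in a collection must have the same dimension, so these toy vectors cannot be added to a collection that already holds the 1536-dimensional embeddings from Step 5.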

---

## 🔍 **Step 6: Verify That Data Was Uploaded**

Once your data is uploaded, you can check it by calling:

```python
print(collection.count())  # Number of stored items
print(collection.get())    # Retrieve stored documents and IDs
```

✅ **Explanation:**

* `count()` → Tells you how many entries exist in the collection.
* `get()` → Returns the stored texts and IDs. Embeddings are returned only if you pass `include=["embeddings"]`.

This helps confirm that your upload worked successfully.

---

## 🧠 **Step 7: Query the Cloud Database (Optional)**

If you want to search for similar sentences (like in the local setup), you can use:

```python
query = "Python developer with strong analytical skills"
embed_query = generate_embeddings(query)  # Same API function as before

results = collection.query(
    query_embeddings=[embed_query],
    n_results=2
)

print(results)
```

✅ **Explanation:**

* `generate_embeddings(query)` converts your query text into a vector.
* `collection.query()` searches through the cloud database for **most similar embeddings**.
* `n_results=2` returns the top 2 closest matches.

💡 This works the same way as local ChromaDB but uses the **remote (cloud) version**.

---

## 💾 **Step 8: Key Differences (Local vs Cloud)**

| Feature      | Local ChromaDB                     | Cloud ChromaDB                              |
| ------------ | ---------------------------------- | ------------------------------------------- |
| Data Storage | Saved on your local machine        | Stored on Chroma’s cloud servers            |
| Access       | Only accessible on your computer   | Can be accessed anywhere (with API key)     |
| Persistence  | Controlled via the client’s storage path | Automatically handled by Chroma Cloud       |
| Setup        | No API key needed                  | Requires API key, tenant, and database name |
| Use Case     | Personal or small projects         | Team collaboration or production-scale apps |

---

## 🧩 **Step 9: Complete Cloud Example**

Here’s the full working code for your cloud setup 👇

```python
import chromadb

# 1️⃣ Connect to Chroma Cloud
client = chromadb.CloudClient(
    api_key='YOUR_CHROMA_API_KEY',  # Placeholder: use your own key and keep it private
    tenant='YOUR_TENANT_ID',        # Placeholder: your tenant ID
    database='test'
)

# 2️⃣ Create or get a collection
collection = client.get_or_create_collection(name="kumar_collection")

# 3️⃣ Your text data
texts = [
    "Deepak Kumar Mohanty was born in Balasore, Odisha, India, to a humble and supportive family.",
    "He has hands-on experience with Python, Django, Flask, and Data Science tools."
]

# 4️⃣ Example embeddings (you can generate these using an API)
embeddings = [
    [0.12, 0.54, 0.33, 0.89],
    [0.77, 0.65, 0.49, 0.21]
]

# 5️⃣ Add data to the collection
collection.add(
    documents=texts,
    embeddings=embeddings,
    ids=[str(i) for i in range(len(texts))]
)

# 6️⃣ Check stored data
print("Total documents:", collection.count())
print("Data stored in collection:", collection.get())
```

---

## 🔒 **Step 10: Security Notes**

* Never share your **API key** publicly (like on GitHub or in screenshots).
* If it’s accidentally exposed, regenerate a new key from your Chroma Cloud dashboard.
* Only use keys for trusted code or environments.
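
One common way to keep keys out of notebooks is to read them from environment variables. The sketch below is a minimal example of that approach; the helper name `load_secret` and the variable name `CHROMA_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_secret(name):
    """Read a required secret from the environment, failing with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return value


# Hypothetical usage: export the variable in your shell, not in the notebook
# api_key = load_secret("CHROMA_API_KEY")
```

The returned value can then be passed as `api_key=` when creating the client, so the key never appears in the saved notebook.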

---

## 🎯 **Summary Table**

| Step | What It Does      | Example                                           |
| ---- | ----------------- | ------------------------------------------------- |
| 1    | Install ChromaDB  | `pip install chromadb`                            |
| 2    | Import            | `import chromadb`                                 |
| 3    | Connect to Cloud  | `chromadb.CloudClient(...)`                       |
| 4    | Create Collection | `client.create_collection("kumar_collection")`    |
| 5    | Add Data          | `collection.add(documents=..., embeddings=..., ids=...)` |
| 6    | Check Data        | `collection.count()` / `collection.get()`         |
| 7    | Query             | `collection.query(query_embeddings, n_results=2)` |

---

## 🧠 **In Short**

You’ve now learned how to:

* Connect to **Chroma Cloud**
* Create a **collection**
* Add **documents + embeddings**
* Perform **semantic search queries** — all stored and managed securely in the cloud.

---

In [53]:
import chromadb

client = chromadb.CloudClient(
  api_key='YOUR_CHROMA_API_KEY',  # Placeholder: never commit a real key
  tenant='YOUR_TENANT_ID',
  database='test')

In [54]:
collection = client.create_collection(name="kumar_collection")

In [55]:
collection.add(
    documents=texts,
    embeddings=embeddings,
    ids=[str(i) for i in range(len(texts))]
)