
This guide installs Docker on an Ubuntu EC2 instance from Docker's official APT repository, then runs Chroma DB in a container and connects to it from Python. Run the following steps on the VM:

1. **Update Your System**:
    - Before installing Docker, update the package database with the command:
        ```bash
        sudo apt-get update
        ```

2. **Install Prerequisites**:
    - Install the necessary packages that allow apt to use a repository over HTTPS:
        ```bash
        sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
        ```

3. **Add Docker’s Official GPG Key**:
    - Next, add the GPG key for the official Docker repository to your system:
        ```bash
        curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
        ```

4. **Add Docker Repository**:
    - Add the Docker repository to APT sources:
        ```bash
        echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
        ```

5. **Update the Package Database with Docker Packages**:
    - Update the package database with the Docker packages from the newly added repo:
        ```bash
        sudo apt-get update
        ```

6. **Install Docker CE**:
    - Now you can install Docker:
        ```bash
        sudo apt-get install docker-ce docker-ce-cli containerd.io
        ```

7. **Verify Installation**:
    - Check the Docker version to ensure the installation was successful:
        ```bash
        docker --version
        ```
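
If you would rather script the final verification step from Python on the VM, the check above can be sketched as follows. This is a minimal illustration, not part of the original guide: `docker_version` is a hypothetical helper name, and it only confirms that the CLI runs, not that the Docker daemon is healthy.

```python
import shutil
import subprocess
from typing import Optional


def docker_version(binary: str = "docker") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if it is unavailable."""
    path = shutil.which(binary)
    if path is None:
        return None  # the binary is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # the binary exists but did not run cleanly
    return result.stdout.strip()


print(docker_version() or "Docker not found on PATH")
```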

### Pull and Run Docker Image

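Pull the Chroma image from Docker Hub, then start it in the background (`-d`) with container port 8000 published on host port 8000 (`-p 8000:8000`). Prefix the commands with `sudo` if your user is not in the `docker` group.

    docker pull chromadb/chroma
    docker run -d -p 8000:8000 chromadb/chroma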
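
Before connecting a client, it can help to confirm that something is actually listening on port 8000. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper name, and `127.0.0.1` assumes you run it on the VM itself. To test from another machine, pass the instance's public address instead, and note that the EC2 security group must allow inbound traffic on that port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out


# Assumes the container from the step above is publishing port 8000 locally.
print(port_is_open("127.0.0.1", 8000))
```

A `True` result only shows that the port accepts connections; it does not prove the process behind it is Chroma.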

### Watch this Video if you are not familiar with Chroma DB

Semantic Search with Open-Source Vector DB: Chroma DB | Pinecone Alternative

https://youtu.be/eCCHDxMaFIk?si=ROUy2n5wVvGQKLei

https://github.com/PradipNichite/Youtube-Tutorials/blob/main/chroma_db/Chroma_DB_Tutorial.ipynb

In [1]:
!pip install chromadb

Collecting chromadb
  Downloading chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)

In [2]:
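import chromadb

# Connect to the Chroma server running in Docker on the EC2 instance.
# Replace the host with your own instance's public IP or DNS name; the
# security group must allow inbound traffic on port 8000.
chroma_client = chromadb.HttpClient(host='47.128.217.39', port=8000)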
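
Hardcoding the server address ties the notebook to one instance. One possible alternative, sketched here under assumptions, is to read the address from environment variables. The variable names `CHROMA_HOST` and `CHROMA_PORT` and the helpers `chroma_endpoint` and `connect` are illustrative choices, not something the `chromadb` library defines or reads on its own.

```python
import os
from typing import Tuple


def chroma_endpoint(default_host: str = "localhost",
                    default_port: int = 8000) -> Tuple[str, int]:
    """Read the server address from CHROMA_HOST / CHROMA_PORT (hypothetical names)."""
    host = os.environ.get("CHROMA_HOST", default_host)
    port = int(os.environ.get("CHROMA_PORT", default_port))
    return host, port


def connect():
    """Create an HttpClient for the configured endpoint and ping the server."""
    import chromadb  # imported lazily so chroma_endpoint() works without chromadb

    host, port = chroma_endpoint()
    client = chromadb.HttpClient(host=host, port=port)
    client.heartbeat()  # raises if the server cannot be reached
    return client
```

With the variables set in the notebook environment, `chroma_client = connect()` would stand in for the hardcoded call.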

In [None]:
# Get the collection named "test", creating it if it does not exist yet.
collection = chroma_client.get_or_create_collection(name="test")


In [None]:
# Fetch an existing collection by name; this raises an error if "test" does not exist.
collection = chroma_client.get_collection(name="test")

In [None]:
# No embeddings are supplied, so Chroma computes them with its default embedding model.
collection.add(
    documents=["This is a document about cat", "This is a document about car"],
    metadatas=[{"category": "animal"}, {"category": "vehicle"}],
    ids=["id1", "id2"]
)

/root/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:01<00:00, 69.4MiB/s]
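
`collection.add` takes parallel lists, so the i-th document, metadata entry, and ID must line up. If your data starts out as one record per document, a small conversion step like the sketch below may help. The record layout (`id`, `text`, `metadata` keys) and the helper name `to_add_kwargs` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def to_add_kwargs(records: List[Dict[str, Any]]) -> Dict[str, list]:
    """Turn per-document records into the parallel lists that collection.add expects."""
    ids = [r["id"] for r in records]
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    return {
        "ids": ids,
        "documents": [r["text"] for r in records],
        "metadatas": [r["metadata"] for r in records],
    }


records = [
    {"id": "id1", "text": "This is a document about cat", "metadata": {"category": "animal"}},
    {"id": "id2", "text": "This is a document about car", "metadata": {"category": "vehicle"}},
]
kwargs = to_add_kwargs(records)
# With a live collection object: collection.add(**kwargs)
```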


In [None]:
# Return the single closest stored document to the query text "bike".
collection.query(
    query_texts=["bike"],
    n_results=1
)

{'ids': [['id2']],
 'distances': [[1.4951326056823981]],
 'embeddings': None,
 'metadatas': [[{'category': 'vehicle'}]],
 'documents': [['This is a document about car']]}
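
The result is a dictionary of nested lists: the outer list has one entry per query text, and each inner list holds the matches for that query. A minimal sketch for flattening one query's matches into rows, using the output above as input (`flatten_query_result` is a hypothetical helper name, not part of `chromadb`):

```python
from typing import Any, Dict, List, Tuple


def flatten_query_result(result: Dict[str, Any],
                         query_index: int = 0) -> List[Tuple[str, str, float]]:
    """Return (id, document, distance) rows for one query in a query() result."""
    ids = result["ids"][query_index]
    documents = result["documents"][query_index]
    distances = result["distances"][query_index]
    return list(zip(ids, documents, distances))


# The result printed above, copied verbatim.
result = {
    "ids": [["id2"]],
    "distances": [[1.4951326056823981]],
    "embeddings": None,
    "metadatas": [[{"category": "vehicle"}]],
    "documents": [["This is a document about car"]],
}

for doc_id, text, distance in flatten_query_result(result):
    print(f"{doc_id}  {distance:.4f}  {text}")
```

A smaller distance means a closer match, so with `n_results` greater than 1 the rows come back ordered from nearest to farthest.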