### Creating a persistent Chroma client

In [1]:
import chromadb

* You can configure Chroma to save data to, and load it from, your local machine.
* Data will be persisted automatically and loaded on start (if it exists).

In [2]:
client = chromadb.PersistentClient(path="lecture2")

* The **path** is where Chroma will store its database files on disk, and load them on start.

* The client object has a few useful convenience methods.

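* client.heartbeat() - returns a heartbeat timestamp in nanoseconds.
  * Useful for making sure the client remains connected.

* client.reset() - empties and completely resets the database.
  * ⚠️ This is destructive and not reversible.
  * Reset is disabled by default: the client must be created with settings=Settings(allow_reset=True) (Settings is imported from chromadb.config) before reset() can be called.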

In [3]:
client.heartbeat() # returns a nanosecond heartbeat. Useful for making sure the client remains connected.
# client.reset() # Empties and completely resets the database. ⚠️ This is destructive and not reversible.

1703344297306534500

### Running Chroma in client/server mode

* Chroma can also be configured to run in client/server mode.
* In this mode, the Chroma client connects to a Chroma server running in a separate process.

* To start the Chroma server, run the following command:
  * chroma run --path /db_path

In [None]:
!chroma run --path lecture2  # serves ./lecture2 on localhost:8000 by default; the cell blocks while the server runs

* Then use the Chroma HTTP client to connect to the server:

In [None]:
import chromadb
chroma_client = chromadb.HttpClient(host='localhost', port=8000)

* That's it! Chroma's API will run in client-server mode with just this change.

### Using the Python HTTP-only client

* If you are running Chroma in client-server mode, you may not need the full Chroma library.
* Instead, you can use the lightweight client-only library.
* In this case, you can install the chromadb-client package.
* This package is a lightweight HTTP client for the server with a minimal dependency footprint.

In [None]:
# !pip install chromadb-client

In [None]:
import chromadb
# Example setup of the client to connect to your chroma server
client = chromadb.HttpClient(host='localhost', port=8000)

* Note that the chromadb-client package is a subset of the full Chroma library and does not include all the dependencies.
* If you want to use the full Chroma library, you can install the chromadb package instead.
* Most importantly, there is no default embedding function.
* If you add() documents without embeddings, you must specify an embedding function yourself and install its dependencies.

### Using collections

* Chroma lets you manage collections of embeddings using the collection primitive.

* Creating, inspecting, and deleting collections
    * Chroma uses collection names in the URL, so there are a few restrictions on naming them:
        * The length of the name must be between 3 and 63 characters
        * The name must start and end with a lowercase letter or a digit, and it can contain dots, dashes, and underscores in between.
        * The name must not contain two consecutive dots.
        * The name must not be a valid IP address.
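The four naming rules above can be checked before calling create_collection. The function is_valid_collection_name below is a hypothetical helper that re-implements the rules exactly as listed; it is not part of chromadb, and Chroma's own validation may differ slightly between versions.

```python
import ipaddress
import re

# Hypothetical helper mirroring the rules listed above; not a chromadb API.
# First and last characters: lowercase letter or digit; in between, letters,
# digits, dots, dashes, and underscores.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def is_valid_collection_name(name: str) -> bool:
    # Rule 1: length between 3 and 63 characters.
    if not 3 <= len(name) <= 63:
        return False
    # Rule 2: start/end characters and allowed characters in between.
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # Rule 3: no two consecutive dots.
    if ".." in name:
        return False
    # Rule 4: must not parse as an IP address.
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return True
    return False


for candidate in ["my_collection", "ab", "My_collection", "a..b", "192.168.0.1", "docs-v2.1"]:
    print(candidate, is_valid_collection_name(candidate))
```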

* Chroma collections are created with a name and an optional embedding function.
* If you supply an embedding function, you must supply it every time you get the collection.

In [4]:
# emb_fn is an embedding function you have defined or imported beforehand
# collection = client.create_collection(name="my_collection", embedding_function=emb_fn)
# collection = client.get_collection(name="my_collection", embedding_function=emb_fn)

* **CAUTION*** 
If you later wish to get_collection, you MUST do so with the embedding function you supplied while creating the collection

* The embedding function takes text as input and performs tokenization and embedding.
* If no embedding function is supplied, the full chromadb package uses a default Sentence Transformers model (all-MiniLM-L6-v2).

* Existing collections can be retrieved by name with **.get_collection**, and deleted with **.delete_collection**.
* You can also use **.get_or_create_collection** to get a collection if it exists, or create it if it doesn't.

In [5]:
# Get a collection object from an existing collection, by name. Will raise an exception if it's not found.
collection = client.get_collection(name="test") 

ValueError: Collection test does not exist.

In [6]:
# Get a collection object from an existing collection, by name. If it doesn't exist, create it.
collection = client.get_or_create_collection(name="test") 

In [8]:
# Delete a collection and all associated embeddings, documents, and metadata. ⚠️ This is destructive and not reversible
client.delete_collection(name="test") 

* Collections have a few useful convenience methods.

* collection.peek() - returns the first 10 items in the collection.
* collection.count() - returns the number of items in the collection.
* collection.modify(name="new_name") - renames the collection.

### Changing the distance function

* create_collection also takes an optional metadata argument, which can be used to customize the distance function of the embedding space by setting the value of hnsw:space.

In [None]:
collection = client.create_collection(
    name="collection_name",
    metadata={"hnsw:space": "cosine"},  # "l2" is the default
)

* Valid options for hnsw:space are "l2", "ip" (inner product), and "cosine". The default is "l2", the squared L2 (Euclidean) distance.
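To see what each hnsw:space setting measures, the three distances can be written out in plain Python, following the formulas given in Chroma's documentation: squared L2 is the sum of squared differences, "ip" is 1 minus the dot product, and "cosine" is 1 minus the cosine similarity. These functions are illustrative only; Chroma computes the distances inside its HNSW index, not with this code.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    # "l2": sum of squared component differences (no square root).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # "ip": 1 minus the dot product; can be negative for large dot products.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # "cosine": 1 minus cosine similarity; ignores vector magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = [1.0, 2.0], [2.0, 4.0]
print(squared_l2(a, b))              # 5.0
print(inner_product_distance(a, b))  # -9.0
print(cosine_distance(a, b))         # approximately 0.0 (same direction)
```

The example pair points in the same direction but has different lengths, which is why cosine distance treats it as identical while squared L2 does not.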

![Alt text](img/1.png)
