Skip to content

Commit

Permalink
[ENH] Simplify auth and correctly overwrite singleton tenant+db (#1970)
Browse files Browse the repository at this point in the history
## Reviewer notes

- I would recommend avoiding the diff and reviewing the end state as if
it were fresh code. Trying to understand the diff will lead to madness.
- Crossed-out notes left for posterity but outdated.
- <s>`ServerAuthenticationProvider` returns `None` if it fails but
`ServerAuthorizationProvider` raises an error -- should we standardize
this behavior?
- The FastAPI boilerplate in `chromadb/server/fastapi/__init__.py` is a
little unwieldy. I'm fine with it since this file is mostly boilerplate
and we very infrequently update it, but I'm open to ideas for how to
make it cleaner.
  - However, everything should probably be async yeah?
- `ServerAuthenticationProvider` now has the job of overwriting
singleton tenant and database if the setting is specified and it's
possible. This leads to a bit of weirdness where we have to call into
this twice -- once in the authn/authz flow and then again in the core
request flow once authn/authz is complete. Two questions about this:
- Should the overwrite functionality live somewhere else?
`ServerAuthenticationProvider` is the arbiter of user identity so it
seems like a natural place in theory..
- Should `authenticate_and_authorize_or_raise()` just return the new
tenant and db if they exist? That feels like a weird mix of
responsibilities but would mean we only need to call the method once.
The method is very very cheap though.</s>

## Description of changes

### Summary and end state

- This PR takes away two minor functionalities from users. Other than
these, it does not reduce user abilities.
- If anyone overrode `ClientAuthProtocolAdapter` to periodically
re-authenticate their clients (unlikely imo) they are no longer able to
do so. If someone tells us they have actually been doing this it will be
easy to add re-auth functionality back.
- We don’t offer a baked-in auth scheme which requires
re-authentication.
- Users now must store auth info (e.g. usernames and passwords) in files
instead of passing them directly as config to their `Client` or
`Server`. I’m open to adding this back.
- Client auth is simpler. It now all takes place in a single
configurable `ClientAuthProvider` class.
- Server authn is simpler. It now all takes place in a single
configurable `ServerAuthnProvider`.
- Server authz is simpler. It now all takes place in a single
configurable `ServerAuthzProvider`.
- Server authn and authz are now explicitly executed as part of each API
handler. This is also where we overwrite request tenant and db if we
configured to. See `authenticate_and_authorize_or_raise` in
`chromadb/server/fastapi/__init__.py`.
- Overwriting singleton tenant and db from auth is now much clearer and
well-tested.

### Overall changes

- Organized `Settings`.
- Wrote or cleaned up docstrings for all auth-related classes.
- Best-effort clean up of every formatting and linting error in every
file I touched.
- Combined `chroma_server_auth_token_transport_header` and
`chroma_client_server_auth_transport_header` into
`chroma_auth_token_transport_header`.
- Deleted `AuthenticationError` as it was unused.
- Deleted the registry (`registry.py`). It’s a pattern we should either
have across our entire codebase or nowhere — as it stands, it only
served to make the code harder to read while barely simplifying users’
configs.
- Renamed everything server-side to “authn” or “authz”. Nothing
server-side has only “auth” in it now unless it applies to both (e.g.
`chroma_auth_token_transport_header`,
`chroma_server_auth_ignore_paths`).
- I eliminated all server middleware. I would like to use it, but that
would require doing deep request inspection (especially once we enable
e.g. collection-level auth). This would mean we essentially maintain a
completely separate schema of paths and request fields — it seems best
in the short, medium, and long runs to explicitly do authentication and
authorization as part of handler bodies.

### Client-side

- Deleted `ClientAuthConfigurationProvider` as it was unused.
- Deleted `ClientAuthCredentialsProvider` as it was extremely small and
only ever used by `ClientAuthProvider`. Folded its functionality into
`ClientAuthProvider`.
- Deleted `AuthInfoType` as it was doing nothing (we only supported
`AuthInfoType.HEADER` in practice).
- Deleted `ClientAuthResponse` as it was just a dict for headers.
Replaced with `ClientAuthHeaders` type.
- Deleted `ClientAuthProtocolAdapter` as it was just a complicated way
to call `ClientAuthProvider.authenticate()` and inject headers. Folded
this functionality into `ClientAuthProvider` and the FastAPI client
directly.
- Deleted `TokenAuthHeader` as it was only used in one place
(`TokenAuthClientProvider.authenticate()`). Folded it into said place.

### Server-side authn

- Configuration
- Deleted `chroma_server_auth_configuration_provider` as it was unused.
    - Deleted `chroma_server_auth_configuration_file` as it was unused.
    - Deleted `ServerAuthConfigurationProvider` as it was unused.
- This leaves no configuration besides the creds file. This is fine.
There’s no configuration for our built-in authn providers. If users
create their own, they can configure them as they’d like with files or
env vars.
- Credentials
- Deleted `chroma_server_auth_credentials` — we now require users to set
up auth through a credentials file.
- Deleted `HtpasswdConfigurationServerAuthCredentialsProvider` and
`TokenConfigServerAuthCredentialsProvider` as they are unused now that
`chroma_server_auth_credentials` is gone.
- With `HtpasswdConfigurationServerAuthCredentialsProvider` gone, the
only subclass of `HtpasswdServerAuthCredentialsProvider` was
`HtpasswdFileServerAuthCredentialsProvider` so I combined them.
- User identity + request representation
- Deleted `UserIdentity` as there was only one instance of it
(`SimpleUserIdentity`).
- Renamed `SimpleUserIdenity` to `UserIdentity` and made it a
`dataclass`.
- If anyone needs to, they can subclass this and add whatever they want.
- Deleted `AuthorizationRequestContext`. It was just a wrapper around
the request and nothing else.
- Request and Response representation
- Deleted `ServerAuthenticationRequest` as it was only ever used to wrap
a dict lookup in several layers of indirection. Replaced with raw
starlette `Headers`.
- Deleted the `ServerAuthenticationResponse` abstraction as it was only
ever used as a `SimpleServerAuthenticationResponse` and
`FastAPIServerAuthenticationResponse`, two subclasses which contained
the same data.
- Replaced all instances of `ServerAuthenticationResponse`s with an
`Optional[UserIdentity]` which is all they really were.
- Middleware
- Deleted `ChromaAuthMiddleware`, `FastAPIChromaAuthMiddleware` and
`FastAPIChromaAuthMiddlewareWrapper` in favor of explicit authn within
request handlers.
- `ServerAuthProviders`, `AbstractCredentials`, and
`ServerAuthCredentialsProviders`
- We previously had two each of `SecretStrAbstractCreds`,
`ServerAuthProvider`, and `ServerAuthCredentialsProvider`: one `Basic`
and one `Token`. (So we had `BasicAuthCredentials` +
`TokenAuthCredentials`, `BasicAuthServerProvider` +
`TokenAuthServerProvider`, etc.) You could only ever use all `Basic` or
all `Token`; they didn’t mix and match as they had a lot of shared
assumptions about e.g. which keys existed in which dicts.
- To make things extra confusing, the
`BasicAuthServerCredentialsProvider` was actually called
`HtpasswdServerAuthCredentialsProvider` and lived in `providers.py`.
- I deleted all the `Credentials` classes and `AuthCredentialsProvider`
classes. Folded all this functionality into two auth providers:
`BasicAuthServerProvider` and `TokenAuthServerProvider`.
- Deleted `chroma_server_auth_credentials_provider` since we no longer
have separate `ServerAuthCredentialsProvider`s.
- Added functionality to `ServerAuthenticationProvider` to get a user’s
singleton tenant and database if they exist and if the relevant setting
is `True`.

### Server-side authz

- Deleted `chroma_server_authz_config` as it seems like nobody was using
it.
- Deleted `chroma_server_authz_ignore_paths` since it is pretty much the
same as `chroma_server_auth_ignore_paths`. We now use
`chroma_server_auth_ignore_paths` for both.
- Deleted `AuthzUser` as it was identical to `UserIdentity` except for
one unused field.
- Deleted `ChromaAuthzMiddleware`, `FastAPIChromaAuthzMiddleware` and
`FastAPIChromaAuthzMiddlewareWrapper` in favor of explicit authz within
request handlers.
- Deleted `AuthzResourceTypes` as it only ever contained data which was
trivially derivable from the action being taken and was only used to
print.
- Modified `AuthzResource` to explicitly have all the fields it may
need. We can extend it in the future, or users can override it.
- Deleted `AuthzAction` as it was only ever used to print an error
string and only ever contained information included in
`AuthzResourceActions`.
- Renamed `AuthzResourceActions` to `AuthzAction` since it’s just an
enum representing the set of actions we might want to run authz checks
on.
- Deleted `AuthorizationContext` as it was just a wrapper around some
args. We pass args directly now.
- Deleted `LocalUserConfigAuthorizationConfigurationProvider` as all it
did was load a config file and it was only ever used in
`SimpleRBACAuthorizationProvider`. Folded its functionality (opening a
file) into `SimpleRBACAuthorizationProvider`.
- Deleted all the dynamic auth stuff. We now explicitly do authn and
authz in request handlers.

### Testing

- Added tests for the new `ServerAuthenticationProvider` behavior
(deciding which paths to ignore for auth based on config, and finding a
user’s singleton tenant and database if
`chroma_overwrite_singleton_tenant_database_access_from_auth` = `True`.
- Added tests for the newly supported
multiple-usernames-and-passwords-in-htpasswd-file flow.
- Split out everything to manage token + RBAC config for tests into
`chromadb/test/auth/strategies.py`. Open to a different home for these
utilities.
- Simplified the token authn and rbac authz tests.
- Wrote a new property test for `ServerAuthenticationProvider's` new
functionality to specify a tenant and database to overwrite those passed
by the user. We want to make sure this is done correctly on all current
and future API endpoints.

## Test plan
*How are these changes tested?*

- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
*Are all docstrings for user-facing APIs updated if required? Do we need
to make documentation changes in the [docs
repository](https://github.com/chroma-core/docs)?*
  • Loading branch information
beggers committed Apr 17, 2024
1 parent f44d9e8 commit 729e657
Show file tree
Hide file tree
Showing 61 changed files with 3,014 additions and 3,547 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/chroma-integration-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,4 @@ jobs:
- name: Install test dependencies
run: python -m pip install -r requirements.txt && python -m pip install -r requirements_dev.txt
- name: Integration Test
run: bin/integration-test ${{ matrix.testfile }}
run: bin/python-integration-test ${{ matrix.testfile }}
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: Chroma Client Integration Tests
name: Chroma Python Client Integration Tests

on:
push:
Expand Down Expand Up @@ -27,5 +27,5 @@ jobs:
python-version: ${{ matrix.python }}
- name: Install test dependencies
run: python -m pip install -r requirements.txt && python -m pip install -r requirements_dev.txt
- name: Test
- name: Python Client Test
run: clients/python/integration-test.sh
21 changes: 21 additions & 0 deletions .github/workflows/chroma-ts-client-integration-test.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
name: Chroma Typescript Client Integration Tests

on:
push:
branches:
- main
pull_request:
branches:
- main
- '**'
workflow_dispatch:

jobs:
test:
timeout-minutes: 90
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: TS Client Test
run: bin/ts-integration-test
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ go/bin/
go/**/testdata/
go/coordinator/bin/

clients/js/package-lock.json

*.log

**/data__nogit
Expand All @@ -19,6 +21,8 @@ index_data
chroma/
chroma_test_data/
server.htpasswd
server.authz
server.authn

.venv
venv
Expand Down
6 changes: 5 additions & 1 deletion .vscode/settings.json
Original file line number Diff line number Diff line change
@@ -1,10 +1,14 @@
{
"git.ignoreLimitWarning": true,
"editor.rulers": [
88
79
],
"editor.formatOnSave": true,
"python.formatting.provider": "black",
"python.formatting.blackArgs": [
"--line-length",
"79"
],
"files.exclude": {
"**/__pycache__": true,
"**/.ipynb_checkpoints": true,
Expand Down
24 changes: 24 additions & 0 deletions bin/python-integration-test
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
#!/usr/bin/env bash

set -e

export CHROMA_PORT=8000

function cleanup {
docker compose -f docker-compose.test.yml down --rmi local --volumes
}

trap cleanup EXIT

docker compose -f docker-compose.test.yml up --build -d

export CHROMA_INTEGRATION_TEST_ONLY=1
export CHROMA_API_IMPL=chromadb.api.fastapi.FastAPI
export CHROMA_SERVER_HOST=localhost
export CHROMA_SERVER_HTTP_PORT=8000
export CHROMA_SERVER_NOFILE=65535

echo testing: python -m pytest "$@"
python -m pytest "$@"

docker compose down
33 changes: 10 additions & 23 deletions bin/integration-test → bin/ts-integration-test
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,7 @@

set -e

export CHROMA_PORT=8000

function cleanup {
docker compose -f docker-compose.test.yml down --rmi local --volumes
rm server.htpasswd .chroma_env
}

Expand All @@ -15,25 +12,22 @@ function setup_auth {
basic)
docker run --rm --entrypoint htpasswd httpd:2 -Bbn admin admin > server.htpasswd
cat <<EOF > .chroma_env
CHROMA_SERVER_AUTH_CREDENTIALS_FILE="/chroma/server.htpasswd"
CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.providers.HtpasswdFileServerAuthCredentialsProvider"
CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.basic.BasicAuthServerProvider"
CHROMA_SERVER_AUTHN_CREDENTIALS_FILE="/chroma/server.htpasswd"
CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.basic_authn.BasicAuthenticationServerProvider"
EOF
;;
token)
cat <<EOF > .chroma_env
CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
CHROMA_SERVER_AUTH_TOKEN_TRANSPORT_HEADER="AUTHORIZATION"
CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
CHROMA_AUTH_TOKEN_TRANSPORT_HEADER="AUTHORIZATION"
CHROMA_SERVER_AUTHN_CREDENTIALS="test-token"
CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.token_authn.TokenAuthenticationServerProvider"
EOF
;;
xtoken)
cat <<EOF > .chroma_env
CHROMA_SERVER_AUTH_CREDENTIALS="test-token"
CHROMA_SERVER_AUTH_TOKEN_TRANSPORT_HEADER="X_CHROMA_TOKEN"
CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER="chromadb.auth.token.TokenConfigServerAuthCredentialsProvider"
CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.token.TokenAuthServerProvider"
CHROMA_AUTH_TOKEN_TRANSPORT_HEADER="X_CHROMA_TOKEN"
CHROMA_SERVER_AUTHN_CREDENTIALS="test-token"
CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.token_authn.TokenAuthenticationServerProvider"
EOF
;;
*)
Expand All @@ -45,25 +39,18 @@ EOF

trap cleanup EXIT

docker compose -f docker-compose.test.yml up --build -d

export CHROMA_INTEGRATION_TEST_ONLY=1
export CHROMA_API_IMPL=chromadb.api.fastapi.FastAPI
export CHROMA_SERVER_HOST=localhost
export CHROMA_PORT=8000
export CHROMA_SERVER_HTTP_PORT=8000
export CHROMA_SERVER_NOFILE=65535

echo testing: python -m pytest "$@"
python -m pytest "$@"

cd clients/js

# moved off of yarn to npm to fix issues with jackspeak/cliui/string-width versions #1314
npm install
npm run test:run

docker compose down
cd ../..

for auth_type in basic token xtoken; do
echo "Testing $auth_type auth"
setup_auth "$auth_type"
Expand Down
6 changes: 3 additions & 3 deletions chromadb/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
import logging
from chromadb.api.client import Client as ClientCreator
from chromadb.api.client import AdminClient as AdminClientCreator
from chromadb.auth.token import TokenTransportHeader
from chromadb.auth.token_authn import TokenTransportHeader
import chromadb.config
from chromadb.config import DEFAULT_DATABASE, DEFAULT_TENANT, Settings
from chromadb.api import AdminAPI, ClientAPI
Expand Down Expand Up @@ -246,9 +246,9 @@ def CloudClient(
# Always use SSL for cloud
settings.chroma_server_ssl_enabled = enable_ssl

settings.chroma_client_auth_provider = "chromadb.auth.token.TokenAuthClientProvider"
settings.chroma_client_auth_provider = "chromadb.auth.token_authn.TokenAuthClientProvider"
settings.chroma_client_auth_credentials = api_key
settings.chroma_client_auth_token_transport_header = (
settings.chroma_auth_token_transport_header = (
TokenTransportHeader.X_CHROMA_TOKEN.name
)

Expand Down
31 changes: 8 additions & 23 deletions chromadb/api/fastapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,6 @@
from chromadb.auth import (
ClientAuthProvider,
)
from chromadb.auth.providers import RequestsClientAuthProtocolAdapter
from chromadb.auth.registry import resolve_provider
from chromadb.config import DEFAULT_DATABASE, DEFAULT_TENANT, Settings, System
from chromadb.telemetry.opentelemetry import (
OpenTelemetryClient,
Expand Down Expand Up @@ -114,33 +112,20 @@ def __init__(self, system: System):
default_api_path=system.settings.chroma_server_api_default_path,
)

self._session = requests.Session()

self._header = system.settings.chroma_server_headers
if (
system.settings.chroma_client_auth_provider
and system.settings.chroma_client_auth_protocol_adapter
):
self._auth_provider = self.require(
resolve_provider(
system.settings.chroma_client_auth_provider, ClientAuthProvider
)
)
self._adapter = cast(
RequestsClientAuthProtocolAdapter,
system.require(
resolve_provider(
system.settings.chroma_client_auth_protocol_adapter,
RequestsClientAuthProtocolAdapter,
)
),
)
self._session = self._adapter.session
else:
self._session = requests.Session()
if self._header is not None:
self._session.headers.update(self._header)
if self._settings.chroma_server_ssl_verify is not None:
self._session.verify = self._settings.chroma_server_ssl_verify

if system.settings.chroma_client_auth_provider:
self._auth_provider = self.require(ClientAuthProvider)
_headers = self._auth_provider.authenticate()
for header, value in _headers.items():
self._session.headers[header] = value.get_secret_value()

@trace_method("FastAPI.heartbeat", OpenTelemetryGranularity.OPERATION)
@override
def heartbeat(self) -> int:
Expand Down

0 comments on commit 729e657

Please sign in to comment.