[ENH] Simplify auth and correctly overwrite singleton tenant+db (#1970)
## Reviewer notes

- I would recommend avoiding the diff and reviewing the end state as if it were fresh code. Trying to understand the diff will lead to madness.
- Crossed-out notes left for posterity but outdated.
- <s>`ServerAuthenticationProvider` returns `None` if it fails but `ServerAuthorizationProvider` raises an error -- should we standardize this behavior?</s>
- <s>The FastAPI boilerplate in `chromadb/server/fastapi/__init__.py` is a little unwieldy. I'm fine with it since this file is mostly boilerplate and we very infrequently update it, but I'm open to ideas for how to make it cleaner.</s>
  - <s>However, everything should probably be async yeah?</s>
- <s>`ServerAuthenticationProvider` now has the job of overwriting singleton tenant and database if the setting is specified and it's possible. This leads to a bit of weirdness where we have to call into this twice -- once in the authn/authz flow and then again in the core request flow once authn/authz is complete. Two questions about this:</s>
  - <s>Should the overwrite functionality live somewhere else? `ServerAuthenticationProvider` is the arbiter of user identity so it seems like a natural place in theory.</s>
  - <s>Should `authenticate_and_authorize_or_raise()` just return the new tenant and db if they exist? That feels like a weird mix of responsibilities but would mean we only need to call the method once. The method is very very cheap though.</s>

## Description of changes

### Summary and end state

- This PR takes away two minor functionalities from users. Other than these, it does not reduce user abilities.
  - If anyone overrode `ClientAuthProtocolAdapter` to periodically re-authenticate their clients (unlikely imo) they are no longer able to do so. If someone tells us they have actually been doing this it will be easy to add re-auth functionality back.
    - We don’t offer a baked-in auth scheme which requires re-authentication.
  - Users now must store auth info (e.g. usernames and passwords) in files instead of passing them directly as config to their `Client` or `Server`. I’m open to adding this back.
- Client auth is simpler. It now all takes place in a single configurable `ClientAuthProvider` class.
- Server authn is simpler. It now all takes place in a single configurable `ServerAuthnProvider`.
- Server authz is simpler. It now all takes place in a single configurable `ServerAuthzProvider`.
- Server authn and authz are now explicitly executed as part of each API handler. This is also where we overwrite the request tenant and db if we are configured to. See `authenticate_and_authorize_or_raise` in `chromadb/server/fastapi/__init__.py`.
- Overwriting singleton tenant and db from auth is now much clearer and well-tested.

### Overall changes

- Organized `Settings`.
- Wrote or cleaned up docstrings for all auth-related classes.
- Best-effort clean up of every formatting and linting error in every file I touched.
- Combined `chroma_server_auth_token_transport_header` and `chroma_client_server_auth_transport_header` into `chroma_auth_token_transport_header`.
- Deleted `AuthenticationError` as it was unused.
- Deleted the registry (`registry.py`). It’s a pattern we should either have across our entire codebase or nowhere. As it stands, it only served to make the code harder to read while barely simplifying users’ configs.
- Renamed everything server-side to “authn” or “authz”. Nothing server-side has only “auth” in it now unless it applies to both (e.g. `chroma_auth_token_transport_header`, `chroma_server_auth_ignore_paths`).
- I eliminated all server middleware. I would like to use it, but that would require doing deep request inspection (especially once we enable e.g. collection-level auth). This would mean we essentially maintain a completely separate schema of paths and request fields. It seems best in the short, medium, and long run to explicitly do authentication and authorization as part of handler bodies.
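
For reviewers who want a mental model of the handler-level flow described above (authenticate, authorize, then optionally overwrite the request's tenant and database), here is a minimal, self-contained sketch. It is not the code in `chromadb/server/fastapi/__init__.py`: the toy provider classes, the `AuthError` type, the `UserIdentity` field names, the `AuthzAction` members, the function signature, and the choice to return the tenant and database are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class AuthError(Exception):
    """Hypothetical error type; a real handler would respond with HTTP 401/403."""


@dataclass
class UserIdentity:
    # Field names are assumptions for this sketch.
    user_id: str
    tenant: Optional[str] = None
    databases: Optional[List[str]] = None


class AuthzAction(Enum):
    # Illustrative members only.
    GET_COLLECTION = "get_collection"
    DELETE_COLLECTION = "delete_collection"


class ToyTokenAuthn:
    """Toy authn provider: looks up a bearer token taken from the request headers."""

    def __init__(self, users_by_token: Dict[str, UserIdentity]) -> None:
        self._users_by_token = users_by_token

    def authenticate(self, headers: Dict[str, str]) -> Optional[UserIdentity]:
        value = headers.get("authorization", "")
        if not value.startswith("Bearer "):
            return None
        return self._users_by_token.get(value[len("Bearer "):])

    def singleton_tenant_database(
        self, user: UserIdentity
    ) -> Tuple[Optional[str], Optional[str]]:
        # Assumption: a database counts as "singleton" when the user has exactly one.
        database = None
        if user.databases is not None and len(user.databases) == 1:
            database = user.databases[0]
        return user.tenant, database


class ToyRBACAuthz:
    """Toy authz provider: a per-user set of allowed actions."""

    def __init__(self, allowed: Dict[str, Set[AuthzAction]]) -> None:
        self._allowed = allowed

    def authorize(self, user: UserIdentity, action: AuthzAction) -> bool:
        return action in self._allowed.get(user.user_id, set())


def authenticate_and_authorize_or_raise(
    headers: Dict[str, str],
    action: AuthzAction,
    tenant: str,
    database: str,
    authn: ToyTokenAuthn,
    authz: ToyRBACAuthz,
    overwrite_from_auth: bool,
) -> Tuple[str, str]:
    """Called at the top of each handler body; returns the tenant/db to use."""
    user = authn.authenticate(headers)
    if user is None:
        raise AuthError("Unauthorized")
    if not authz.authorize(user, action):
        raise AuthError("Forbidden")
    if overwrite_from_auth:
        new_tenant, new_database = authn.singleton_tenant_database(user)
        tenant = new_tenant or tenant
        database = new_database or database
    return tenant, database


authn = ToyTokenAuthn({"s3cret": UserIdentity("alice", tenant="acme", databases=["prod"])})
authz = ToyRBACAuthz({"alice": {AuthzAction.GET_COLLECTION}})
tenant, database = authenticate_and_authorize_or_raise(
    {"authorization": "Bearer s3cret"},
    AuthzAction.GET_COLLECTION,
    "default_tenant",
    "default_database",
    authn,
    authz,
    overwrite_from_auth=True,
)
print(tenant, database)  # acme prod
```

Because the check is an ordinary function call inside the handler, it can see the already-parsed tenant, database, and action, which is the deep request inspection that middleware would otherwise have to duplicate.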

### Client-side

- Deleted `ClientAuthConfigurationProvider` as it was unused.
- Deleted `ClientAuthCredentialsProvider` as it was extremely small and only ever used by `ClientAuthProvider`. Folded its functionality into `ClientAuthProvider`.
- Deleted `AuthInfoType` as it was doing nothing (we only supported `AuthInfoType.HEADER` in practice).
- Deleted `ClientAuthResponse` as it was just a dict for headers. Replaced with `ClientAuthHeaders` type.
- Deleted `ClientAuthProtocolAdapter` as it was just a complicated way to call `ClientAuthProvider.authenticate()` and inject headers. Folded this functionality into `ClientAuthProvider` and the FastAPI client directly.
- Deleted `TokenAuthHeader` as it was only used in one place (`TokenAuthClientProvider.authenticate()`). Folded it into said place.

### Server-side authn

- Configuration
  - Deleted `chroma_server_auth_configuration_provider` as it was unused.
  - Deleted `chroma_server_auth_configuration_file` as it was unused.
  - Deleted `ServerAuthConfigurationProvider` as it was unused.
  - This leaves no configuration besides the creds file. This is fine. There’s no configuration for our built-in authn providers. If users create their own, they can configure them as they’d like with files or env vars.
- Credentials
  - Deleted `chroma_server_auth_credentials`; we now require users to set up auth through a credentials file.
  - Deleted `HtpasswdConfigurationServerAuthCredentialsProvider` and `TokenConfigServerAuthCredentialsProvider` as they are unused now that `chroma_server_auth_credentials` is gone.
  - With `HtpasswdConfigurationServerAuthCredentialsProvider` gone, the only subclass of `HtpasswdServerAuthCredentialsProvider` was `HtpasswdFileServerAuthCredentialsProvider` so I combined them.
- User identity + request representation
  - Deleted `UserIdentity` as there was only one instance of it (`SimpleUserIdentity`).
  - Renamed `SimpleUserIdentity` to `UserIdentity` and made it a `dataclass`.
    - If anyone needs to, they can subclass this and add whatever they want.
  - Deleted `AuthorizationRequestContext`. It was just a wrapper around the request and nothing else.
- Request and Response representation
  - Deleted `ServerAuthenticationRequest` as it was only ever used to wrap a dict lookup in several layers of indirection. Replaced with raw starlette `Headers`.
  - Deleted the `ServerAuthenticationResponse` abstraction as it was only ever used as a `SimpleServerAuthenticationResponse` and `FastAPIServerAuthenticationResponse`, two subclasses which contained the same data.
  - Replaced all instances of `ServerAuthenticationResponse` with an `Optional[UserIdentity]`, which is all they really were.
- Middleware
  - Deleted `ChromaAuthMiddleware`, `FastAPIChromaAuthMiddleware` and `FastAPIChromaAuthMiddlewareWrapper` in favor of explicit authn within request handlers.
- `ServerAuthProviders`, `AbstractCredentials`, and `ServerAuthCredentialsProviders`
  - We previously had two each of `SecretStrAbstractCreds`, `ServerAuthProvider`, and `ServerAuthCredentialsProvider`: one `Basic` and one `Token`. (So we had `BasicAuthCredentials` + `TokenAuthCredentials`, `BasicAuthServerProvider` + `TokenAuthServerProvider`, etc.) You could only ever use all `Basic` or all `Token`; they didn’t mix and match as they had a lot of shared assumptions about e.g. which keys existed in which dicts.
    - To make things extra confusing, the `BasicAuthServerCredentialsProvider` was actually called `HtpasswdServerAuthCredentialsProvider` and lived in `providers.py`.
  - I deleted all the `Credentials` classes and `AuthCredentialsProvider` classes. Folded all this functionality into two auth providers: `BasicAuthServerProvider` and `TokenAuthServerProvider`.
  - Deleted `chroma_server_auth_credentials_provider` since we no longer have separate `ServerAuthCredentialsProvider`s.
- Added functionality to `ServerAuthenticationProvider` to get a user’s singleton tenant and database if they exist and if the relevant setting is `True`.

### Server-side authz

- Deleted `chroma_server_authz_config` as it seems like nobody was using it.
- Deleted `chroma_server_authz_ignore_paths` since it is pretty much the same as `chroma_server_auth_ignore_paths`. We now use `chroma_server_auth_ignore_paths` for both.
- Deleted `AuthzUser` as it was identical to `UserIdentity` except for one unused field.
- Deleted `ChromaAuthzMiddleware`, `FastAPIChromaAuthzMiddleware` and `FastAPIChromaAuthzMiddlewareWrapper` in favor of explicit authz within request handlers.
- Deleted `AuthzResourceTypes` as it only ever contained data which was trivially derivable from the action being taken and was only used to print.
- Modified `AuthzResource` to explicitly have all the fields it may need. We can extend it in the future, or users can override it.
- Deleted `AuthzAction` as it was only ever used to print an error string and only ever contained information included in `AuthzResourceActions`.
- Renamed `AuthzResourceActions` to `AuthzAction` since it’s just an enum representing the set of actions we might want to run authz checks on.
- Deleted `AuthorizationContext` as it was just a wrapper around some args. We pass args directly now.
- Deleted `LocalUserConfigAuthorizationConfigurationProvider` as all it did was load a config file and it was only ever used in `SimpleRBACAuthorizationProvider`. Folded its functionality (opening a file) into `SimpleRBACAuthorizationProvider`.
- Deleted all the dynamic auth stuff. We now explicitly do authn and authz in request handlers.

### Testing

- Added tests for the new `ServerAuthenticationProvider` behavior (deciding which paths to ignore for auth based on config, and finding a user’s singleton tenant and database if `chroma_overwrite_singleton_tenant_database_access_from_auth` is `True`).
- Added tests for the newly supported multiple-usernames-and-passwords-in-htpasswd-file flow.
- Split out everything to manage token + RBAC config for tests into `chromadb/test/auth/strategies.py`. Open to a different home for these utilities.
- Simplified the token authn and RBAC authz tests.
- Wrote a new property test for `ServerAuthenticationProvider`'s new functionality to specify a tenant and database that overwrite those passed by the user. We want to make sure this is done correctly on all current and future API endpoints.

## Test plan

*How are these changes tested?*

- [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust

## Documentation Changes

*Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs repository](https://github.com/chroma-core/docs)?*