8 changes: 8 additions & 0 deletions backend/app/api/docs/collections/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
The collections interface manages document relationships with RAG
pipelines, where a RAG pipeline is any framework that aligns LLM
responses with information from a focused corpus of documents.

Right now this endpoint is tightly coupled with OpenAI [File
Search](https://platform.openai.com/docs/assistants/tools/file-search). Its
functionality, along with the descriptions in this section, is therefore
centered around File Search.
26 changes: 26 additions & 0 deletions backend/app/api/docs/collections/create.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
Set up and configure the document store that is pertinent to the RAG
pipeline:

* Make OpenAI
[Files](https://platform.openai.com/docs/api-reference/files) from
documents stored in the cloud (see the `documents` interface).
* Create an OpenAI [Vector
Store](https://platform.openai.com/docs/api-reference/vector-stores)
based on those Files.
* Attach the Vector Store to an OpenAI
[Assistant](https://platform.openai.com/docs/api-reference/assistants). Use
parameters in the request body relevant to an Assistant to flesh out
its configuration.

If any one of the OpenAI interactions fails, all OpenAI resources are
cleaned up. If a Vector Store cannot be created, for example,
all Files that were uploaded to OpenAI are removed from
OpenAI. Failure can occur because OpenAI is down or because some parameter
value is invalid. It can also occur because a document type is not
accepted. This is especially true for PDFs that may not be parseable.
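
The cleanup behavior described above can be illustrated with a minimal rollback sketch. It is not the code in this PR: `create_with_rollback` and the `(create, cleanup)` pairs are hypothetical stand-ins for the OpenAI File, Vector Store, and Assistant calls, and the sketch assumes each create step returns a resource ID that its cleanup step accepts.

```python
from typing import Callable, List, Tuple

# Each step is a (create, cleanup) pair. Both callables are hypothetical
# stand-ins for the real OpenAI File, Vector Store, and Assistant calls.
Step = Tuple[Callable[[], str], Callable[[str], None]]


def create_with_rollback(steps: List[Step]) -> List[str]:
    """Run each create step in order; on failure, undo the completed steps."""
    undo: List[Callable[[], None]] = []
    created: List[str] = []
    try:
        for create, cleanup in steps:
            resource_id = create()
            created.append(resource_id)
            # Bind the current values so each undo removes its own resource.
            undo.append(lambda c=cleanup, r=resource_id: c(r))
    except Exception:
        # Remove resources in reverse order of creation, then re-raise.
        for action in reversed(undo):
            action()
        raise
    return created
```

If the second step raises, only the resource from the first step is cleaned up, and the original exception still reaches the caller so it can be reported through the callback.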

The immediate response from the endpoint is a packet containing a
`key`. Once the collection has been created, information about the
collection will be returned to the user via the callback URL. If a
callback URL is not provided, clients can poll the `info` endpoint
with the `key` to retrieve the same information.
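
For clients that do not supply a callback URL, the polling flow might look like the sketch below. `wait_for_collection` and `fetch_info` are hypothetical names; in particular, the assumption that the lookup yields `None` while creation is still in progress is for illustration only, since the docs above do not say how the `info` endpoint signals an unfinished request.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_collection(
    fetch_info: Callable[[str], Optional[Dict[str, Any]]],
    key: str,
    attempts: int = 10,
    delay: float = 1.0,
) -> Dict[str, Any]:
    """Poll until fetch_info returns collection data or attempts run out.

    fetch_info is a hypothetical wrapper around the `info` endpoint that
    returns None while the collection is still being created.
    """
    for attempt in range(attempts):
        info = fetch_info(key)
        if info is not None:
            return info
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"collection {key} not ready after {attempts} attempts")
```

A real client would wrap its HTTP call in `fetch_info` and choose `attempts` and `delay` to match how long collection creation usually takes.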
9 changes: 9 additions & 0 deletions backend/app/api/docs/collections/delete.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
Remove a collection from the platform. This is a two-step process:

1. Delete all OpenAI resources that were allocated: Files, the Vector
Store, and the Assistant.
2. Delete the collection entry from the AI platform database.

No action is taken on the documents themselves: the contents of the
documents that were a part of the collection remain unchanged, and those
documents can still be accessed via the documents endpoints.
3 changes: 3 additions & 0 deletions backend/app/api/docs/collections/docs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
List document IDs associated with a given collection. Documents
returned are stored not only by the AI platform, but also by OpenAI.
5 changes: 5 additions & 0 deletions backend/app/api/docs/collections/info.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Retrieve all AI-platform information about a collection given its
ID. This route is very helpful for:

* Understanding whether a `create` request has finished
* Obtaining the OpenAI assistant ID (`llm_service_id`)
2 changes: 2 additions & 0 deletions backend/app/api/docs/collections/list.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
List _active_ collections -- collections that have been created but
not deleted.
3 changes: 3 additions & 0 deletions backend/app/api/docs/documents/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
The documents interface manages client documents intended to drive LLM
chat interactions. The platform stores documents in AWS S3, but may
put copies in other databases to facilitate RAG pipelines.
8 changes: 8 additions & 0 deletions backend/app/api/docs/documents/delete.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
Perform a soft delete of the document. A soft delete makes the
document invisible. It does not delete the document from cloud storage
or its information from the database.

If the document is part of any active collections, those collections
will be deleted using the collections delete interface. Notably, this
means all OpenAI Vector Stores and Assistants to which this document
belongs will be deleted.
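
As a rough illustration of what "soft delete" means here, the sketch below marks a record rather than removing it. The `DocumentRecord` and `DocumentStore` classes and the `deleted_at` field are hypothetical; the platform's actual model and column names are not shown in this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class DocumentRecord:
    id: UUID
    name: str
    deleted_at: Optional[datetime] = None  # hypothetical soft-delete marker


class DocumentStore:
    """In-memory stand-in for the documents table."""

    def __init__(self) -> None:
        self._rows: Dict[UUID, DocumentRecord] = {}

    def add(self, name: str) -> DocumentRecord:
        record = DocumentRecord(id=uuid4(), name=name)
        self._rows[record.id] = record
        return record

    def soft_delete(self, doc_id: UUID) -> DocumentRecord:
        # Mark the row instead of removing it; the data stays in place.
        record = self._rows[doc_id]
        record.deleted_at = datetime.now(timezone.utc)
        return record

    def visible(self) -> List[DocumentRecord]:
        # Listings skip rows that carry a deletion marker.
        return [r for r in self._rows.values() if r.deleted_at is None]
```

The record remains retrievable by ID after `soft_delete`, which mirrors the statement that neither the stored file nor its database information is removed.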
1 change: 1 addition & 0 deletions backend/app/api/docs/documents/info.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Retrieve all information about a given document.
1 change: 1 addition & 0 deletions backend/app/api/docs/documents/list.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
List documents uploaded to the AI platform.
2 changes: 2 additions & 0 deletions backend/app/api/docs/documents/upload.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Upload a document to the AI platform. The response will contain an ID,
which is the document ID required by other routes.
75 changes: 60 additions & 15 deletions backend/app/api/routes/collections.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,8 @@

from openai import OpenAI, OpenAIError
from fastapi import APIRouter, HTTPException, BackgroundTasks, Query
from pydantic import BaseModel, HttpUrl
from fastapi import Path as FastPath
from pydantic import BaseModel, Field, HttpUrl
from sqlalchemy.exc import NoResultFound, MultipleResultsFound, SQLAlchemyError

from app.api.deps import CurrentUser, SessionDep
Expand All @@ -17,7 +18,7 @@
from app.crud import DocumentCrud, CollectionCrud, DocumentCollectionCrud
from app.crud.rag import OpenAIVectorStoreCrud, OpenAIAssistantCrud
from app.models import Collection, Document
from app.utils import APIResponse
from app.utils import APIResponse, load_description

router = APIRouter(prefix="/collections", tags=["collections"])

Expand All @@ -40,8 +41,17 @@ def now(cls):


class DocumentOptions(BaseModel):
documents: List[UUID]
batch_size: int = 1
documents: List[UUID] = Field(
description="List of document IDs",
)
batch_size: int = Field(
default=1,
description=(
"Number of documents to send to OpenAI in a single "
"transaction. See the `file_ids` parameter in the "
"vector store [create batch](https://platform.openai.com/docs/api-reference/vector-stores-file-batches/createBatch)."
),
)

def model_post_init(self, __context: Any):
self.documents = list(set(self.documents))
Expand All @@ -61,13 +71,33 @@ class AssistantOptions(BaseModel):
# Fields to be passed along to OpenAI. They must be a subset of
# parameters accepted by the OpenAI.client.beta.assistants.create
# API.
model: str
instructions: str
temperature: float = 1e-6
model: str = Field(
description=(
"OpenAI model to attach to this assistant. The model "
"must be compatible with the assistants API; see the "
"OpenAI [model documentation](https://platform.openai.com/docs/models/compare) for more."
),
)
instructions: str = Field(
description=(
"Assistant instruction. Sometimes referred to as the " '"system" prompt.'
),
)
temperature: float = Field(
default=1e-6,
description=(
"Model temperature. The default is slightly "
"greater than zero because it is [unknown how OpenAI "
"handles zero](https://community.openai.com/t/clarifications-on-setting-temperature-0/886447/5)."
),
)


class CallbackRequest(BaseModel):
callback_url: Optional[HttpUrl] = None
callback_url: Optional[HttpUrl] = Field(
default=None,
description="URL to call to report endpoint status",
)


class CreationRequest(
Expand All @@ -82,7 +112,7 @@ def extract_super_type(self, cls: "CreationRequest"):


class DeletionRequest(CallbackRequest):
collection_id: UUID
collection_id: UUID = Field(description="Collection to delete")


class CallbackHandler:
Expand Down Expand Up @@ -200,7 +230,10 @@ def do_create_collection(
callback.success(collection.model_dump(mode="json"))


@router.post("/create")
@router.post(
"/create",
description=load_description("collections/create.md"),
)
def create_collection(
session: SessionDep,
current_user: CurrentUser,
Expand Down Expand Up @@ -248,7 +281,10 @@ def do_delete_collection(
callback.fail(str(err))


@router.post("/delete")
@router.post(
"/delete",
description=load_description("collections/delete.md"),
)
def delete_collection(
session: SessionDep,
current_user: CurrentUser,
Expand All @@ -270,11 +306,15 @@ def delete_collection(
return APIResponse.success_response(data=None, metadata=asdict(payload))


@router.post("/info/{collection_id}", response_model=APIResponse[Collection])
@router.post(
"/info/{collection_id}",
description=load_description("collections/info.md"),
response_model=APIResponse[Collection],
)
def collection_info(
session: SessionDep,
current_user: CurrentUser,
collection_id: UUID,
collection_id: UUID = FastPath(description="Collection to retrieve"),
):
collection_crud = CollectionCrud(session, current_user.id)
try:
Expand All @@ -289,7 +329,11 @@ def collection_info(
return APIResponse.success_response(data)


@router.post("/list", response_model=APIResponse[List[Collection]])
@router.post(
"/list",
description=load_description("collections/list.md"),
response_model=APIResponse[List[Collection]],
)
def list_collections(
session: SessionDep,
current_user: CurrentUser,
Expand All @@ -307,12 +351,13 @@ def list_collections(

@router.post(
"/docs/{collection_id}",
description=load_description("collections/docs.md"),
response_model=APIResponse[List[Document]],
)
def collection_documents(
session: SessionDep,
current_user: CurrentUser,
collection_id: UUID,
collection_id: UUID = FastPath(description="Collection to retrieve"),
skip: int = Query(0, ge=0),
limit: int = Query(100, gt=0, le=100),
):
Expand Down
26 changes: 20 additions & 6 deletions backend/app/api/routes/documents.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,13 @@
from pathlib import Path

from fastapi import APIRouter, File, UploadFile, HTTPException, Query
from fastapi import Path as FastPath

from sqlalchemy.exc import NoResultFound, MultipleResultsFound, SQLAlchemyError

from app.crud import DocumentCrud, CollectionCrud
from app.models import Document
from app.utils import APIResponse
from app.utils import APIResponse, load_description
from app.api.deps import CurrentUser, SessionDep
from app.core.util import raise_from_unknown
from app.core.cloud import AmazonCloudStorage, CloudStorageError
Expand All @@ -17,7 +18,11 @@
router = APIRouter(prefix="/documents", tags=["documents"])


@router.get("/list", response_model=APIResponse[List[Document]])
@router.get(
"/list",
description=load_description("documents/list.md"),
response_model=APIResponse[List[Document]],
)
def list_docs(
session: SessionDep,
current_user: CurrentUser,
Expand All @@ -35,7 +40,11 @@ def list_docs(
return APIResponse.success_response(data)


@router.post("/upload", response_model=APIResponse[Document])
@router.post(
"/upload",
description=load_description("documents/upload.md"),
response_model=APIResponse[Document],
)
def upload_doc(
session: SessionDep,
current_user: CurrentUser,
Expand Down Expand Up @@ -69,12 +78,13 @@ def upload_doc(

@router.get(
"/remove/{doc_id}",
description=load_description("documents/delete.md"),
response_model=APIResponse[Document],
)
def remove_doc(
session: SessionDep,
current_user: CurrentUser,
doc_id: UUID,
doc_id: UUID = FastPath(description="Document to delete"),
):
a_crud = OpenAIAssistantCrud()
(d_crud, c_crud) = (
Expand All @@ -91,11 +101,15 @@ def remove_doc(
return APIResponse.success_response(data)


@router.get("/info/{doc_id}", response_model=APIResponse[Document])
@router.get(
"/info/{doc_id}",
description=load_description("documents/info.md"),
response_model=APIResponse[Document],
)
def doc_info(
session: SessionDep,
current_user: CurrentUser,
doc_id: UUID,
doc_id: UUID = FastPath(description="Document to retrieve"),
):
crud = DocumentCrud(session, current_user.id)
try:
Expand Down
15 changes: 15 additions & 0 deletions backend/app/utils.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import functools as ft
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
Expand Down Expand Up @@ -143,3 +144,17 @@ def verify_password_reset_token(token: str) -> str | None:
return str(decoded_token["sub"])
except InvalidTokenError:
return None


@ft.singledispatch
def load_description(filename: Path) -> str:
    if not filename.exists():
        this = Path(__file__)
        filename = this.parent.joinpath("api", "docs", filename)

    return filename.read_text()


@load_description.register
def _(filename: str) -> str:
    return load_description(Path(filename))
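
To see how the `singledispatch` pair above behaves, here is a self-contained sketch. `read_description` and its explicit `root` parameter are hypothetical: the function in this PR resolves the docs directory from `__file__` instead, and the temporary directory below only stands in for `api/docs`.

```python
import functools as ft
import tempfile
from pathlib import Path


@ft.singledispatch
def read_description(filename: Path, root: Path) -> str:
    # Fall back to a path under the docs root when the file is not found
    # as given (relative to the current working directory).
    if not filename.exists():
        filename = root.joinpath(filename)
    return filename.read_text()


@read_description.register
def _(filename: str, root: Path) -> str:
    # Strings are converted to Path and handed to the base implementation.
    return read_description(Path(filename), root)


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    docs.joinpath("collections").mkdir()
    docs.joinpath("collections", "list.md").write_text("List active collections.")
    # Both call styles resolve to the same file.
    from_str = read_description("collections/list.md", docs)
    from_path = read_description(Path("collections/list.md"), docs)
```

Dispatch happens on the type of the first argument only, so route decorators can pass a plain string such as `"collections/list.md"` while other callers pass a `Path`.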