-
Notifications
You must be signed in to change notification settings - Fork 5
Refactor Document Module and Cloud Storage to use project-based organization and add signed URL support #346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughModels, storage, CRUD, API routes, and tests were migrated from user-scoped to project-scoped ownership: documents now reference project_id and project.storage_path; public response models (DocumentPublic, Message) added; CloudStorage gained project-aware APIs (signed URL, size, delete); soft-delete (is_deleted) and an Alembic migration were added. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant API as Documents API
participant Auth as CurrentUserOrgProject
participant Proj as ProjectService
participant Store as AmazonCloudStorage(project_id)
participant CRUD as DocumentCrud(project_id)
Client->>API: POST /docs/upload (file, X-API-KEY)
API->>Auth: resolve user + project_id
API->>Proj: get_project_by_id(project_id)
Proj-->>API: Project(storage_path)
API->>Store: put(file, path=storage_path/...)
Store-->>API: object_store_url
API->>CRUD: create(document with project_id, object_store_url)
CRUD-->>API: Document
API-->>Client: APIResponse(DocumentPublic)
sequenceDiagram
autonumber
actor Client
participant API as Documents API
participant Auth as CurrentUserOrgProject
participant CRUD as DocumentCrud(project_id)
participant Store as AmazonCloudStorage(project_id)
Client->>API: GET /docs/{id}?include_url=true (X-API-KEY)
API->>Auth: resolve user + project_id
API->>CRUD: read_one(id) [filters is_deleted=False]
CRUD-->>API: Document
alt include_url == true
API->>Store: get_signed_url(object_store_url, expires_in)
Store-->>API: signed_url
end
API-->>Client: APIResponse(DocumentPublic with optional signed_url)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/document.py (1)
46-56: Fix log context keys: use project_id (not owner_id) in error logsThe structured log payload still uses 'owner_id'. Replace with 'project_id'.
Apply this diff:
- logger.error( - f"[DocumentCrud.read_many] Invalid skip value | {{'owner_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", + logger.error( + f"[DocumentCrud.read_many] Invalid skip value | {{'project_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", @@ - logger.error( - f"[DocumentCrud.read_many] Invalid limit value | {{'owner_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}", + logger.error( + f"[DocumentCrud.read_many] Invalid limit value | {{'project_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}",Also applies to: 58-68
🧹 Nitpick comments (21)
backend/app/models/project.py (1)
1-1: Nit: remove unused import UUIDOnly uuid4 is used.
Apply this diff:
-from uuid import UUID, uuid4 +from uuid import uuid4backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py (1)
8-11: Remove unused importsqlmodel.sql.sqltypes is not used.
Apply this diff:
-from alembic import op -import sqlalchemy as sa -import sqlmodel.sql.sqltypes +from alembic import op +import sqlalchemy as sabackend/app/crud/document.py (3)
42-45: Optional: add deterministic ordering to read_manyConsider ordering results (e.g., newest first) for stable pagination.
Apply this diff:
- statement = select(Document).where( + statement = select(Document).where( and_(Document.project_id == self.project_id, Document.is_deleted.is_(False)) - ) + ).order_by(Document.inserted_at.desc())Also applies to: 70-71
83-91: Tighten error message grammar and detail in read_eachMinor but user-facing in logs; fix “atleast” and pluralization, and include missing IDs for easier debugging.
Apply this diff:
- raise ValueError( - f"Requested atleast {requested_count} document retrieved {retrieved_count}" - ) + missing = set(doc_ids) - {d.id for d in results} + raise ValueError( + f"Requested at least {requested_count} documents, retrieved {retrieved_count}. Missing: {sorted(missing)}" + )
98-106: Clarify ownership error messageThe placeholders are reversed and “attempter” is awkward. Make the message unambiguous.
Apply this diff:
- elif document.project_id != self.project_id: - error = "Invalid document ownership: project={} attempter={}".format( - self.project_id, - document.project_id, - ) + elif document.project_id != self.project_id: + error = "Invalid document ownership: document.project_id={} request.project_id={}".format( + document.project_id, + self.project_id, + )backend/app/models/document.py (3)
10-16: Index project_id for query performanceMost queries filter by project_id and is_deleted. Adding an index on project_id (and optionally a composite (project_id, is_deleted)) will speed reads.
Apply this diff (minimal):
- project_id: int = Field( + project_id: int = Field( description="The ID of the project to which the document belongs", foreign_key="project.id", - nullable=False, + nullable=False, + index=True, ondelete="CASCADE", )If you prefer a composite index, we can add it via Alembic instead.
37-47: Enable from-attributes mapping for DocumentPublic (SQLModel/Pydantic v2)When returning ORM instances as DocumentPublic, set model_config so model_validate(..., from_attributes=True) works cleanly.
Apply this diff:
class DocumentPublic(DocumentBase): id: UUID = Field(description="The unique identifier of the document") signed_url: str | None = Field( default=None, description="A signed URL for accessing the document" ) inserted_at: datetime = Field( description="The timestamp when the document was inserted" ) updated_at: datetime = Field( description="The timestamp when the document was last updated" ) + # Pydantic v2 / SQLModel v0.0.22+ + model_config = {"from_attributes": True}And add this import once at the top if you prefer a typed config:
# optional typed config from pydantic import ConfigDict # ... model_config = ConfigDict(from_attributes=True)
37-47: Confirm public exposure of project_id is intentionalDocumentPublic inherits project_id from DocumentBase. If APIs are always project-scoped, returning project_id may be redundant and could leak internal IDs to clients. Keep if needed; otherwise consider a slimmer public DTO without project_id.
backend/app/models/__init__.py (1)
5-5: Silence Ruff F401 for package re-exportsIn init.py, importing for re-export triggers F401. Easiest is to add a noqa on the line.
Apply this diff:
-from .document import Document, DocumentPublic +from .document import Document, DocumentPublic # noqa: F401Alternative: define all and include these names explicitly.
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py (1)
10-10: Remove unused import.
sqlmodel.sql.sqltypesis not used.-import sqlmodel.sql.sqltypesbackend/app/api/routes/collections.py (2)
27-27: Drop unused importDocument.
Documentisn’t referenced in this module. Keeps lints clean.-from app.models import Collection, Document, DocumentPublic +from app.models import Collection, DocumentPublic
426-439: Response model now uses DocumentPublic — filter out soft-deleted docs and prefer built-inlisttyping.
- The route returns
DocumentPublic, butDocumentCollectionCrud.readdoesn’t currently excludeDocument.is_deleteddocuments or enforce project scoping. You can prevent leaking soft-deleted or cross-project docs by filtering in the CRUD.- Prefer
listoverListper modern typing style.- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],Outside this file (CRUD suggestion for
backend/app/crud/document_collection.py):# inside DocumentCollectionCrud.read(...) statement = ( select(Document) .join(DocumentCollection, DocumentCollection.document_id == Document.id) .where( and_( DocumentCollection.collection_id == collection.id, Document.is_deleted.is_(False), Document.project_id == collection.project_id, # ensure project-scoping ) ) )backend/app/api/routes/documents.py (5)
1-6: Adopt built-inlisttyping and removetyping.Listimport.Modern Python favors
list[...]overList[...]. Also drop the now-unused import.-from typing import List +from typing import List # remove @@ - response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]], @@ - response_model=APIResponse[DocumentPublic], + response_model=APIResponse[DocumentPublic], @@ - response_model=APIResponse[DocumentPublic], + response_model=APIResponse[DocumentPublic],And actually remove the import:
-from typing import ListAlso applies to: 20-24, 123-126
4-4: Build S3 keys with POSIX semantics to avoid backslashes on Windows.
pathlib.Pathcan emit backslashes on Windows, which becomes part of the S3 key. UsePurePosixPathto guarantee forward slashes in object keys.-from pathlib import Path +from pathlib import PurePosixPath @@ - key = Path(str(project.storage_path), str(document_id)) + key = PurePosixPath(str(project.storage_path)) / str(document_id)Also applies to: 52-54
46-54: Validate project existence before upload — good. Consider early content-type/size checks.404 on missing project is correct. As a follow-up, consider validating
src.content_typeand limiting size before uploading to avoid wasted bandwidth.
84-85: Remove unused variable.The result from
c_crud.delete(document, a_crud)is not used.- data = c_crud.delete(document, a_crud) + c_crud.delete(document, a_crud)
106-116: “Permanent” delete still soft-deletes DB row. Consider adding a hard-delete path.You delete the S3 object, but
d_crud.delete(doc_id)only marks the document as deleted. If “permanent” should also remove the DB row, add apurgetoDocumentCrud(hard delete) and call it here.Outside this file (in
backend/app/crud/document.py):def purge(self, doc_id: UUID): document = self.read_one(doc_id) self.session.delete(document) self.session.commit() logger.info(f"[DocumentCrud.purge] Document hard-deleted | {{'doc_id': '{doc_id}', 'project_id': {self.project_id}}}")Then update this route to:
c_crud.delete(document, a_crud) storage.delete(document.object_store_url) d_crud.purge(doc_id)Also applies to: 117-119
backend/app/core/cloud/storage.py (4)
13-13: Remove unused import.
UserProjectOrgisn’t referenced.-from app.models import UserProjectOrg
110-116: Align abstractputsignature with subclass.Base class uses
basename: str, subclass useskey: Path. Aligning keeps contracts consistent and helps tooling.- def put(self, source: UploadFile, basename: str): + def put(self, source: UploadFile, key: Path): raise NotImplementedError()
126-141: Normalize S3 key formatting withas_posix().Ensure forward slashes regardless of OS.
- def put(self, source: UploadFile, key: Path) -> SimpleStorageName: - destination = SimpleStorageName(str(key)) + def put(self, source: UploadFile, key: Path) -> SimpleStorageName: + destination = SimpleStorageName(key.as_posix())
190-216: Presigned URL implementation — solid. Consider validating scheme.Implementation looks correct. As a minor hardening, validate that
urlhass3scheme before presigning and raise a clear error otherwise.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py(1 hunks)backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py(1 hunks)backend/app/api/routes/collections.py(3 hunks)backend/app/api/routes/documents.py(6 hunks)backend/app/core/cloud/storage.py(7 hunks)backend/app/crud/document.py(6 hunks)backend/app/models/__init__.py(1 hunks)backend/app/models/document.py(1 hunks)backend/app/models/project.py(2 hunks)backend/app/models/user.py(0 hunks)
💤 Files with no reviewable changes (1)
- backend/app/models/user.py
🧰 Additional context used
🧬 Code graph analysis (9)
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py (2)
backend/app/alembic/versions/99f4fc325617_add_organization_project_setup.py (1)
upgrade(20-72)backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
upgrade(21-73)
backend/app/models/project.py (3)
backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
upgrade(21-73)backend/app/alembic/versions/66abc97f3782_user_id_from_uuid_to_int.py (1)
upgrade(20-152)backend/app/alembic/versions/0f205e3779ee_add_api_key_table.py (1)
upgrade(20-39)
backend/app/models/__init__.py (1)
backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py (4)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)backend/app/alembic/versions/8e7dc5eab0b0_add_fk_constraint_to_user_id_columns.py (1)
upgrade(20-42)backend/app/alembic/versions/60b6c511a485_add_project_id_in_api_key_table.py (1)
upgrade(20-47)backend/app/alembic/versions/99f4fc325617_add_organization_project_setup.py (1)
upgrade(20-72)
backend/app/api/routes/collections.py (5)
backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/core/cloud/storage.py (1)
AmazonCloudStorage(121-232)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/utils.py (1)
APIResponse(27-48)backend/app/crud/document_collection.py (2)
DocumentCollectionCrud(8-48)create(12-23)
backend/app/crud/document.py (5)
backend/app/crud/collection.py (3)
read_one(84-93)delete(107-115)CollectionCrud(17-146)backend/app/models/document.py (1)
Document(19-34)backend/app/tests/crud/documents/test_crud_document_delete.py (2)
document(13-21)TestDatabaseDelete(24-44)backend/app/tests/crud/documents/test_crud_document_update.py (2)
TestDatabaseUpdate(15-67)test_update_respects_owner(57-67)backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)
backend/app/models/document.py (6)
backend/app/core/util.py (1)
now(13-14)backend/app/models/document_collection.py (1)
DocumentCollection(9-23)backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)backend/app/models/user.py (1)
User(48-60)backend/app/tests/utils/document.py (1)
DocumentMaker(30-48)backend/app/tests/crud/documents/test_crud_document_update.py (1)
TestDatabaseUpdate(15-67)
backend/app/api/routes/documents.py (7)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/collection.py (3)
CollectionCrud(17-146)delete(107-115)read_one(84-93)backend/app/crud/project.py (1)
get_project_by_id(26-28)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/models/message.py (1)
Message(5-6)backend/app/utils.py (2)
APIResponse(27-48)success_response(34-37)backend/app/core/cloud/storage.py (5)
AmazonCloudStorage(121-232)put(114-115)put(126-150)delete(217-232)get_signed_url(190-215)
backend/app/core/cloud/storage.py (1)
backend/app/models/user.py (1)
UserProjectOrg(65-66)
🪛 Ruff (0.12.2)
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py
10-10: sqlmodel.sql.sqltypes imported but unused
Remove unused import: sqlmodel.sql.sqltypes
(F401)
backend/app/models/__init__.py
5-5: .document.Document imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
5-5: .document.DocumentPublic imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
10-10: sqlmodel.sql.sqltypes imported but unused
Remove unused import: sqlmodel.sql.sqltypes
(F401)
backend/app/api/routes/collections.py
27-27: app.models.Document imported but unused
Remove unused import: app.models.Document
(F401)
426-426: Use list instead of List for type annotation
Replace with list
(UP006)
backend/app/api/routes/documents.py
23-23: Use list instead of List for type annotation
Replace with list
(UP006)
84-84: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
backend/app/core/cloud/storage.py
13-13: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/crud/document.py (2)
19-25: LGTM: read_one adds correct project-scoped + soft-delete filtersThe query adds both project_id equality and is_deleted=False. This matches the new ownership model and soft delete semantics.
125-136: LGTM: delete implements soft-deletion via is_deleted/deleted_atSoft delete flows through update() and logs context. Behavior aligns with list/read filters.
backend/app/api/routes/collections.py (1)
228-230: Project-scoped storage and CRUD initialization — good change.Switching to
AmazonCloudStorage(current_user.project_id)andDocumentCrud(session, current_user.project_id)aligns the collections flow with the new per-project storage model.backend/app/api/routes/documents.py (2)
31-33: Project-scoped listing — good alignment.
DocumentCrud(session, current_user.project_id)ensures list operations respect project boundaries andis_deletedfiltering via the CRUD. Looks good.
131-144: Signed URL support indoc_info— looks correct.The
DocumentPublic.model_validate(..., from_attributes=True)conversion and conditional presign are clean. Good defaults.
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
Outdated
Show resolved
Hide resolved
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (1)
18-24: Remove importedcrawlerfixture to avoid shadowing parameterImporting
crawlerand also using a parameter namedcrawlercan trip linters (F811) and is unnecessary.from app.tests.utils.document import ( DocumentStore, DocumentMaker, Route, WebCrawler, - crawler, crawler, )Replace with:
from app.tests.utils.document import ( DocumentStore, DocumentMaker, Route, WebCrawler, - crawler, +)backend/app/tests/api/routes/documents/test_route_document_info.py (1)
4-12: Fix linter error: importedcrawlershadows the fixture parameterRuff flags F811 (redefinition/shadowing). Drop the import; rely on the fixture parameter.
from app.tests.utils.document import ( DocumentComparator, DocumentMaker, DocumentStore, Route, WebCrawler, - crawler, httpx_to_standard, )backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
80-81: Don’t persist absolute local file paths as fname.
Storing full paths can leak local env/PII; store basename only. Update test expectation accordingly (and ensure route normalizes to basename).Apply this diff:
- assert result.fname == str(scratch) + assert result.fname == Path(scratch).nameIf the route currently stores the full path, normalize it server-side using Path(file.filename).name before persisting. I can provide a patch if desired.
backend/app/tests/utils/document.py (1)
136-150: Fix incorrect@singledispatchmethodusage (don’t combine with@staticmethod).Current implementation will raise due to wrong signature. Accept
selfand register typed variants accordingly.- @ft.singledispatchmethod - @staticmethod - def to_string(value): - return value + @ft.singledispatchmethod + def to_string(self, value): + return value @@ - @to_string.register - @staticmethod - def _(value: UUID): - return str(value) + @to_string.register + def _(self, value: UUID): + return str(value) @@ - @to_string.register - @staticmethod - def _(value: datetime): - return value.isoformat() + @to_string.register + def _(self, value: datetime): + return value.isoformat()
🧹 Nitpick comments (20)
backend/app/tests/crud/collections/test_crud_collection_read_one.py (1)
8-8: Remove unused import: settingsKeeps tests lint-clean.
-from app.core.config import settingsbackend/app/tests/crud/collections/test_crud_collection_delete.py (2)
61-62: Prefer.is_(False)for SQLAlchemy boolean comparisonsAvoid E712 and generate correct SQL.
-stmt = select(APIKey).where( - APIKey.project_id == project.id, APIKey.is_deleted == False -) +stmt = select(APIKey).where( + APIKey.project_id == project.id, APIKey.is_deleted.is_(False) +)
6-6: Remove unused import: settingsKeep lint green.
-from app.core.config import settingsbackend/app/tests/crud/collections/test_crud_collection_read_all.py (2)
8-8: Add missing import for DocumentMakerRequired if adopting the proposed fix.
-from app.tests.utils.document import DocumentStore +from app.tests.utils.document import DocumentStore, DocumentMaker
29-33: Fixture signature looks wrong and unusedThis fixture takes self (invalid for pytest fixtures) and appears unused. Remove or fix to avoid confusion.
-@pytest.fixture(scope="class") -def refresh(self, db: Session): - db.query(Collection).delete() - db.commit() +@pytest.fixture(scope="class") +def refresh(db: Session): + db.query(Collection).delete() + db.commit()If unused, delete the fixture entirely.
backend/app/tests/crud/documents/test_crud_document_read_one.py (2)
40-43: Avoid dependency on a hard-coded project name ("Dalgo").
This can flake if such a project isn’t seeded. Use a guaranteed different project_id without relying on DB seed data.Apply this diff:
- other_project = get_project(db, name="Dalgo") # Different project - - crud = DocumentCrud(db, other_project.id) + other_project_id = store.project.id + 1 # Different project + crud = DocumentCrud(db, other_project_id)
38-48: Add coverage for soft-deleted read behavior.
Reading a soft-deleted document should 404; add a test.Example:
def test_cannot_read_soft_deleted_document(self, db: Session, store: DocumentStore): doc = store.put() crud = DocumentCrud(db, store.project.id) crud.delete(doc.id) with pytest.raises(HTTPException) as exc_info: crud.read_one(doc.id) assert exc_info.value.status_code == 404backend/app/tests/api/routes/documents/test_route_document_remove.py (2)
57-66: Assert deleted_at as well to fully validate soft delete.
Currently only checks is_deleted.Apply this diff:
result = db.exec(statement).one() - assert result.is_deleted is True + assert result.is_deleted is True + assert result.deleted_at is not None
83-87: Strengthen assertion on unknown-document removal.
Also verify 404 and error detail.Apply this diff:
- response = crawler.delete(route.append(next(maker))) - - assert response.is_error + response = crawler.delete(route.append(next(maker))) + assert response.status_code == 404 + body = response.json() + assert body.get("detail") == "Document not found"backend/app/tests/api/routes/documents/test_route_document_upload.py (2)
93-104: Add S3-specific assertions for scheme and bucket.
Makes the storage contract explicit.Apply this diff:
- url = urlparse(result.object_store_url) + url = urlparse(result.object_store_url) + assert url.scheme == "s3" + assert url.netloc == settings.AWS_S3_BUCKET key = Path(url.path) key = key.relative_to(key.root)Optionally, also assert the key prefix matches the project’s storage_path.
62-64: Missing tests for signed URL support.
Add an API test that requests a signed GET URL for an uploaded document and validates: (a) 200 response, (b) URL host is the configured S3 endpoint, (c) query contains X-Amz-Signature and X-Amz-Expires, (d) access is denied after expiry in a time-mocked context.I can draft this test against your current route shape if you share the endpoint signature.
backend/app/tests/crud/documents/test_crud_document_update.py (3)
52-58: Rename tests to reflect “project” terminology.
The implementation moved away from “owner”; align test names.Apply this diff:
-def test_update_sets_default_owner( +def test_update_sets_default_project( @@ -def test_update_respects_owner( +def test_update_respects_project(
65-70: Valid negative ownership check.
For robustness, consider creating a real second project instead of +1 id if/when fixtures are available.
19-20: Leftover “owner_id” in DocumentCrud logs.
Log fields still reference owner_id; rename to project_id for consistency across the refactor.Proposed patch (in app/crud/document.py):
- f"[DocumentCrud.read_many] Invalid skip value | {{'owner_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", + f"[DocumentCrud.read_many] Invalid skip value | {{'project_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", @@ - f"[DocumentCrud.read_many] Invalid limit value | {{'owner_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}", + f"[DocumentCrud.read_many] Invalid limit value | {{'project_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}",backend/app/tests/crud/documents/test_crud_document_delete.py (1)
37-47: Cross-project delete correctly yields 404.
Good negative test for access scoping.Consider adding a test that a second delete on the same id returns 404 to confirm idempotent soft-delete behavior.
backend/app/tests/crud/documents/test_crud_document_read_many.py (2)
12-16: Fail fast if project lookup returns None in the fixture.Add an assertion so the fixture errors early with a clear message when no active project is found (avoids later AttributeError on
store.project.id).project = get_project(db) ds = DocumentStore(db, project.id) ds.fill(TestDatabaseReadMany._ndocs) + assert ds.project is not None, "DocumentStore.project not initialized"
27-27: DRY: provide acrudfixture instead of re-creatingDocumentCrudin each test.Reduces duplication and keeps test setup consistent.
@pytest.fixture def store(db: Session): project = get_project(db) ds = DocumentStore(db, project.id) ds.fill(TestDatabaseReadMany._ndocs) return ds + +@pytest.fixture +def crud(db: Session, store: DocumentStore): + return DocumentCrud(db, store.project.id)Then update tests to accept
crudand remove local instantiations:- crud = DocumentCrud(db, store.project.id) docs = crud.read_many()(Apply similarly in other tests.)
Also applies to: 44-44, 55-55, 64-64, 73-73, 82-82, 93-93, 101-101, 110-110
backend/app/tests/utils/document.py (3)
27-31: Type the lazy-loaded project as optional.Clarifies intent and avoids static analysis warnings.
- self.project: Project = None + self.project: Project | None = None
43-43: Align file extension betweenkeyandfnamefor consistency.Using different extensions is confusing in tests and storage expectations.
- key = f"{self.project.storage_path}/{doc_id}.txt" + key = f"{self.project.storage_path}/{doc_id}.xyz"(Alternatively, switch both to
.txtif downstream expects that.)Also applies to: 49-52
155-156: Minor: redundantdict()wrapper.
to_public_dict()already returns a dict.- this = dict(self.to_public_dict()) + this = self.to_public_dict()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (18)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py(1 hunks)backend/app/models/project.py(2 hunks)backend/app/tests/api/routes/collections/test_create_collections.py(3 hunks)backend/app/tests/api/routes/documents/test_route_document_info.py(3 hunks)backend/app/tests/api/routes/documents/test_route_document_list.py(2 hunks)backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py(2 hunks)backend/app/tests/api/routes/documents/test_route_document_remove.py(3 hunks)backend/app/tests/api/routes/documents/test_route_document_upload.py(5 hunks)backend/app/tests/crud/collections/test_crud_collection_create.py(1 hunks)backend/app/tests/crud/collections/test_crud_collection_delete.py(2 hunks)backend/app/tests/crud/collections/test_crud_collection_read_all.py(1 hunks)backend/app/tests/crud/collections/test_crud_collection_read_one.py(1 hunks)backend/app/tests/crud/documents/test_crud_document_delete.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_read_many.py(9 hunks)backend/app/tests/crud/documents/test_crud_document_read_one.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_update.py(3 hunks)backend/app/tests/utils/collection.py(1 hunks)backend/app/tests/utils/document.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/models/project.py
🧰 Additional context used
🧬 Code graph analysis (16)
backend/app/tests/utils/collection.py (4)
backend/app/models/collection.py (1)
Collection(21-54)backend/app/tests/utils/utils.py (1)
get_user_id_by_email(52-54)backend/app/crud/collection.py (3)
__init__(18-20)read_one(84-93)CollectionCrud(17-146)backend/app/api/routes/collections.py (1)
collection_info(399-406)
backend/app/tests/crud/collections/test_crud_collection_read_all.py (3)
backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
store(11-16)backend/app/tests/crud/documents/test_crud_document_read_one.py (1)
store(13-15)backend/app/tests/utils/document.py (2)
DocumentStore(55-82)fill(81-82)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py (4)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)backend/app/alembic/versions/60b6c511a485_add_project_id_in_api_key_table.py (1)
upgrade(20-47)backend/app/alembic/versions/0f205e3779ee_add_api_key_table.py (1)
upgrade(20-39)backend/app/alembic/versions/99f4fc325617_add_organization_project_setup.py (1)
upgrade(20-72)
backend/app/tests/api/routes/documents/test_route_document_list.py (4)
backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
store(11-16)backend/app/tests/crud/documents/test_crud_document_read_one.py (1)
store(13-15)backend/app/tests/utils/document.py (2)
DocumentStore(55-82)crawler(171-172)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (2)
backend/app/tests/utils/document.py (3)
DocumentStore(55-82)crawler(171-172)DocumentMaker(26-52)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)
backend/app/tests/crud/collections/test_crud_collection_create.py (4)
backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
store(11-16)backend/app/tests/crud/documents/test_crud_document_read_one.py (1)
store(13-15)backend/app/tests/utils/document.py (2)
DocumentStore(55-82)fill(81-82)backend/app/tests/crud/documents/test_crud_document_update.py (1)
documents(11-14)
backend/app/tests/crud/collections/test_crud_collection_delete.py (5)
backend/app/crud/collection.py (3)
CollectionCrud(17-146)_(118-125)_(128-146)backend/app/models/api_key.py (1)
APIKey(28-38)backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (5)
project(67-68)DocumentStore(55-82)fill(81-82)_(143-144)_(148-149)backend/app/tests/utils/collection.py (1)
get_collection(24-55)
backend/app/tests/api/routes/documents/test_route_document_upload.py (3)
backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/tests/conftest.py (3)
user_api_key(89-91)client(52-55)db(24-41)backend/app/models/document.py (1)
Document(19-34)
backend/app/tests/crud/collections/test_crud_collection_read_one.py (2)
backend/app/tests/utils/document.py (2)
DocumentStore(55-82)fill(81-82)backend/app/tests/utils/collection.py (1)
get_collection(24-55)
backend/app/tests/utils/document.py (6)
backend/app/crud/project.py (1)
get_project_by_id(37-39)backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/tests/utils/utils.py (1)
SequentialUuidGenerator(140-153)backend/app/core/config.py (1)
AWS_S3_BUCKET(76-77)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/tests/api/routes/test_api_key.py (1)
test_create_api_key(14-29)backend/app/tests/api/routes/collections/test_collection_info.py (1)
test_collection_info_successful(57-77)
backend/app/tests/crud/documents/test_crud_document_read_one.py (3)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (2)
DocumentCrud(14-135)read_one(19-35)
backend/app/tests/api/routes/documents/test_route_document_remove.py (4)
backend/app/tests/utils/document.py (5)
DocumentStore(55-82)delete(126-130)append(107-112)put(70-75)DocumentMaker(26-52)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)backend/app/tests/crud/documents/test_crud_document_delete.py (1)
document(14-23)backend/app/models/document.py (1)
Document(19-34)
backend/app/tests/crud/documents/test_crud_document_update.py (3)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)DocumentMaker(26-52)backend/app/crud/document.py (2)
DocumentCrud(14-135)update(98-123)
backend/app/tests/crud/documents/test_crud_document_read_many.py (3)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (2)
project(67-68)DocumentStore(55-82)backend/app/crud/document.py (2)
DocumentCrud(14-135)read_many(37-71)
backend/app/tests/crud/documents/test_crud_document_delete.py (4)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/models/document.py (1)
Document(19-34)
🪛 Ruff (0.12.2)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
44-44: Trailing whitespace
Remove trailing whitespace
(W291)
45-45: Trailing whitespace
Remove trailing whitespace
(W291)
backend/app/tests/crud/collections/test_crud_collection_delete.py
6-6: app.core.config.settings imported but unused
Remove unused import: app.core.config.settings
(F401)
61-61: Avoid equality comparisons to False; use not APIKey.is_deleted: for false checks
Replace with not APIKey.is_deleted
(E712)
backend/app/tests/crud/collections/test_crud_collection_read_one.py
8-8: app.core.config.settings imported but unused
Remove unused import: app.core.config.settings
(F401)
10-10: app.tests.utils.collection.uuid_increment imported but unused
Remove unused import: app.tests.utils.collection.uuid_increment
(F401)
backend/app/tests/api/routes/documents/test_route_document_info.py
47-47: Redefinition of unused crawler from line 10
(F811)
🪛 GitHub Actions: AI Platform CI
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
[error] 1-1: Trailing whitespace detected by pre-commit (hook: trailing-whitespace). The file was updated.
[error] 1-1: Black formatting changed the file. Hook: black. Please re-run 'pre-commit run --all-files'.
🔇 Additional comments (32)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py (1)
34-52: Backfill document.project_id with earliest active API key and guard against NULLs
Use a CTE to deterministically pick each user’s earliest non-deleted API key and update only NULL project_id values, then raise an error if any NULLs remain. Sincealembic/psqlweren’t available here, manually run the migration and confirmSELECT COUNT(*) FROM document WHERE project_id IS NULL;returns 0.backend/app/tests/api/routes/documents/test_route_document_list.py (1)
51-51: LGTM: DocumentStore now scoped by project_idPer-project instantiation aligns with storage/CRUD changes.
Also applies to: 81-81
backend/app/tests/crud/collections/test_crud_collection_read_one.py (1)
18-19: LGTM: Create documents under the collection’s projectEnsures project-consistent fixtures.
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (2)
62-62: LGTM: Store scoped to projectCreating DocumentStore with project_id derived from the crawler aligns with project-scoped storage.
96-97: LGTM: Using DocumentMaker with project_id for non-existent doc caseCorrectly generates an in-memory doc tied to the right project.
backend/app/tests/crud/collections/test_crud_collection_create.py (1)
16-19: LGTM: Documents are created under the collection’s projectInitializing DocumentStore with collection.project_id matches the project-based refactor.
backend/app/tests/api/routes/collections/test_create_collections.py (3)
9-9: LGTM: Use APIKeyPublic for per-project contextSwitching the fixture type clarifies inputs and aligns with the API surface.
52-55: LGTM: Project-scoped DocumentStore in API testDocuments seeded correctly for the API call using the key’s project_id.
66-66: LGTM: Header built from API keyMatches expected auth header consumption.
backend/app/tests/api/routes/documents/test_route_document_info.py (4)
27-27: LGTM: Project-scoped store for info routeConsistent with project-based ownership.
38-38: LGTM: Same hereProject-scoped seeding looks correct.
47-47: LGTM: Correct fixture typingUsing WebCrawler matches how the tests authenticate.
50-50: LGTM: Proper maker usage for unknown docCreates an id not present in DB to exercise error path.
backend/app/tests/crud/documents/test_crud_document_read_one.py (4)
8-8: Import aligns with project-scoped tests.
Matches the new project-based access pattern.
14-15: Project-scoped DocumentStore fixture is correct.
Ensures all generated docs are tied to a specific project.
22-24: Correct: DocumentCrud now receives project_id.
Scoping reads to the project is enforced.
30-36: Good negative case using a non-persisted document.
Validates 404 on unknown id in the same project.backend/app/tests/api/routes/documents/test_route_document_remove.py (1)
38-40: Correct project scoping via API key.
Route call uses the API key’s project_id consistently.backend/app/tests/api/routes/documents/test_route_document_upload.py (2)
29-31: Header standardization looks good.
Fixed header "X-API-KEY" matches the test client helpers.
49-51: Fixture update aligns uploader with APIKeyPublic.
Clean and explicit dependency on user_api_key.backend/app/tests/crud/documents/test_crud_document_update.py (3)
19-26: Project-scoped CRUD usage is correct.
Update path respects per-project ownership.
32-36: Sequential updates ordering check is sound.
Covers monotonic insert timestamps.
42-46: Active document remains undeleted on update.
Good invariant check on is_deleted.backend/app/tests/crud/documents/test_crud_document_delete.py (2)
15-21: Project-scoped deletion path is correct.
DocumentCrud constructed with the document’s project_id and soft-deletes the record.
31-31: Proper soft-delete assertion.
Pairing this with inserted_at ≤ deleted_at is a solid check.backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
44-48: LGTM on skip/limit semantics.Assertions correctly reflect offset/limit behavior and boundary cases (0, negative, and oversized values).
Also applies to: 55-57, 64-66, 73-75, 82-86, 93-95, 101-103, 110-115
backend/app/tests/utils/document.py (6)
15-16: LGTM on imports and model surface update.Imports reflect project-scoped ownership and public models.
Also applies to: 19-19
56-60: LGTM: deterministic DB state for tests.Clearing the table on init makes test runs isolated.
118-130: LGTM: API key handling moved to model, header usage correct.Header name is consistent and avoids dict plumbing across tests.
134-135: Helpful docstring.Clarifies the comparator’s intent against
DocumentPublic.
158-167: LGTM: public-shape projection matchesDocumentPublicfields.Covers inherited fields and converts UUID/datetime to strings.
171-173: LGTM: crawler fixture signature updated toAPIKeyPublic.Header wiring is correct.
...p/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
Show resolved
Hide resolved
...p/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
Outdated
Show resolved
Hide resolved
...p/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/api/routes/documents.py (1)
83-84: Use sentence case for user-facing messageMatches prior feedback and reads better.
- Message(message="Document Deleted Successfully") + Message(message="Document deleted successfully")
🧹 Nitpick comments (12)
backend/app/core/cloud/__init__.py (1)
1-8: Export get_cloud_storage to satisfy Ruff F401Expose the public surface via all.
from .storage import ( AmazonCloudStorage, AmazonCloudStorageClient, CloudStorage, CloudStorageError, get_cloud_storage, ) + +__all__ = [ + "AmazonCloudStorage", + "AmazonCloudStorageClient", + "CloudStorage", + "CloudStorageError", + "get_cloud_storage", +]backend/app/api/routes/collections.py (2)
27-28: Remove unused importDocumentKeep imports minimal.
-from app.models import Collection, Document, DocumentPublic +from app.models import Collection, DocumentPublic
426-427: Prefer built-in generics (list) overListAlign with PEP 585 and silence UP006.
- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],backend/app/api/routes/documents.py (6)
6-6: Drop unused importHTTPExceptionSilence Ruff F401.
-from fastapi import APIRouter, File, UploadFile, Query, HTTPException +from fastapi import APIRouter, File, UploadFile, Query
9-10: Trim imports: remove unusedget_project_by_idIt’s not used after scoping to project via
CurrentUserOrgProject.-from app.crud import DocumentCrud, CollectionCrud, get_project_by_id +from app.crud import DocumentCrud, CollectionCrud
12-13: Remove unusedCurrentUserProject-scoped deps are used everywhere.
-from app.api.deps import CurrentUser, SessionDep, CurrentUserOrgProject +from app.api.deps import SessionDep, CurrentUserOrgProject
23-24: Prefer built-in generics (list) overListAlign with PEP 585 and silence UP006.
- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],
80-81: Remove unused variabledataResult isn’t used; call for side-effects only.
- data = c_crud.delete(document, a_crud) + c_crud.delete(document, a_crud)
100-105: Remove unusedprojectfetch; not needed hereAvoid dead code.
- project = get_project_by_id(session=session, project_id=current_user.project_id) a_crud = OpenAIAssistantCrud(client) d_crud = DocumentCrud(session, current_user.project_id) c_crud = CollectionCrud(session, current_user.id) storage = get_cloud_storage(session=session, project_id=current_user.project_id)backend/app/core/cloud/storage.py (3)
16-17: Remove unused importUserProjectOrgSilence Ruff F401.
-from app.models import UserProjectOrg
139-141: Ensure POSIX S3 keys; avoid OS-specific separatorsPath() may emit backslashes on Windows. Use PurePosixPath for S3 keys.
- def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: - key = Path(self.storage_path, file_path) + def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: + from pathlib import PurePosixPath + key = PurePosixPath(self.storage_path) / PurePosixPath(str(file_path))
139-141: Align base/derived method signatures forputBase accepts
basename: str; derived expectsfile_path: Path. Unify to Path + return type.Apply in base class (outside this hunk):
- def put(self, source: UploadFile, basename: str): + def put(self, source: UploadFile, file_path: Path) -> "SimpleStorageName": raise NotImplementedError()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
backend/app/api/routes/collections.py(4 hunks)backend/app/api/routes/documents.py(6 hunks)backend/app/core/cloud/__init__.py(1 hunks)backend/app/core/cloud/storage.py(7 hunks)backend/app/crud/project.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
backend/app/crud/project.py (4)
backend/app/models/project.py (2)
Project(29-60)ProjectPublic(62-66)backend/app/tests/crud/test_project.py (2)
test_get_project_by_id(53-60)test_get_non_existent_project(118-121)backend/app/api/routes/project.py (3)
update_project(76-91)read_project(59-67)delete_project(100-111)backend/app/crud/openai_conversation.py (1)
get_conversation_by_id(11-23)
backend/app/core/cloud/__init__.py (1)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(249-259)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
backend/app/api/routes/documents.py (5)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/models/message.py (1)
Message(5-6)backend/app/core/cloud/storage.py (8)
get_cloud_storage(249-259)put(118-119)put(139-164)delete(130-131)delete(231-246)client(29-41)get_signed_url(127-128)get_signed_url(204-229)
backend/app/api/routes/collections.py (3)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(249-259)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/crud/document.py (1)
DocumentCrud(14-135)
🪛 Ruff (0.12.2)
backend/app/core/cloud/__init__.py
6-6: .storage.get_cloud_storage imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/core/cloud/storage.py
16-16: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
backend/app/api/routes/documents.py
6-6: fastapi.HTTPException imported but unused
Remove unused import: fastapi.HTTPException
(F401)
12-12: app.api.deps.CurrentUser imported but unused
Remove unused import: app.api.deps.CurrentUser
(F401)
23-23: Use list instead of List for type annotation
Replace with list
(UP006)
80-80: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
100-100: Local variable project is assigned to but never used
Remove assignment to unused variable project
(F841)
backend/app/api/routes/collections.py
27-27: app.models.Document imported but unused
Remove unused import: app.models.Document
(F401)
426-426: Use list instead of List for type annotation
Replace with list
(UP006)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/crud/project.py (1)
37-39: LGTM: switch to session.get() is correct and simplerThe change improves clarity and leverages the identity map semantics. No issues.
backend/app/api/routes/documents.py (3)
31-33: LGTM: per-project scoping for listing is correct
DocumentCrud(session, current_user.project_id)ensures project-level isolation.
46-52: LGTM: storage + CRUD correctly scoped per projectKey construction uses project storage path; CRUD sets
project_idon new docs.
135-139: LGTM: signed-URL gating via query flag is cleanPublic projection + on-demand presign is appropriate.
backend/app/core/cloud/storage.py (1)
248-260: LGTM: project-scoped storage factoryValidates project, propagates storage_path, returns configured instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
backend/app/api/routes/documents.py (1)
100-105: “Permanent delete” still soft-deletes in DB; implement hard-delete and remove unusedproject.S3 object is removed, but
d_crud.deleteonly setsis_deleted=True. Provide apurgethat actually deletes the DB row; also drop the unusedprojectvariable.- project = get_project_by_id(session=session, project_id=current_user.project_id) @@ - d_crud.delete(doc_id) + d_crud.purge(doc_id)Add to
backend/app/crud/document.py:def purge(self, doc_id: UUID) -> None: document = self.session.get(Document, doc_id) if document is None or document.project_id != self.project_id: raise HTTPException(status_code=404, detail="Document not found") self.session.delete(document) self.session.commit() logger.info( f"[DocumentCrud.purge] Document permanently deleted | " f"{{'doc_id': '{doc_id}', 'project_id': {self.project_id}}}" )Also applies to: 111-111
🧹 Nitpick comments (4)
backend/app/api/routes/documents.py (4)
6-6: Prune unused imports (ruff F401).Remove
HTTPExceptionandCurrentUser.-from fastapi import APIRouter, File, UploadFile, Query, HTTPException +from fastapi import APIRouter, File, UploadFile, Query-from app.api.deps import CurrentUser, SessionDep, CurrentUserOrgProject +from app.api.deps import SessionDep, CurrentUserOrgProjectAlso applies to: 12-12
23-23: Modernize typing (UP006).Use built-in
listin the response model.- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],
9-9: Remove now-unused import after droppingprojectlookup.
get_project_by_idbecomes unused; prune it.-from app.crud import DocumentCrud, CollectionCrud, get_project_by_id +from app.crud import DocumentCrud, CollectionCrud
127-139: Signed URL integration looks correct; consider optional TTL parameter.Current default (1h) is fine; optionally expose
expires_in: int = Query(3600, ge=60, le=86400)and pass it toget_signed_url.- include_url: bool = Query( - False, description="Include a signed URL to access the document" - ), + include_url: bool = Query(False, description="Include a signed URL to access the document"), + expires_in: int = Query(3600, ge=60, le=86400, description="Signed URL TTL (seconds)"), @@ - if include_url: + if include_url: storage = get_cloud_storage(session=session, project_id=current_user.project_id) - doc_schema.signed_url = storage.get_signed_url(document.object_store_url) + doc_schema.signed_url = storage.get_signed_url(document.object_store_url, expires_in=expires_in)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/api/routes/documents.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/api/routes/documents.py (6)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/models/message.py (1)
Message(5-6)backend/app/utils.py (2)
APIResponse(27-48)success_response(34-37)backend/app/core/cloud/storage.py (8)
get_cloud_storage(249-259)put(118-119)put(139-164)delete(130-131)delete(231-246)client(29-41)get_signed_url(127-128)get_signed_url(204-229)
🪛 Ruff (0.12.2)
backend/app/api/routes/documents.py
6-6: fastapi.HTTPException imported but unused
Remove unused import: fastapi.HTTPException
(F401)
12-12: app.api.deps.CurrentUser imported but unused
Remove unused import: app.api.deps.CurrentUser
(F401)
23-23: Use list instead of List for type annotation
Replace with list
(UP006)
80-80: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
100-100: Local variable project is assigned to but never used
Remove assignment to unused variable project
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/api/routes/documents.py (3)
27-33: Project-scoped listing looks good.
CurrentUserOrgProject+DocumentCrud(session, current_user.project_id)is correct.
113-115: LGTM on message phrasing.Sentence case “Document permanently deleted successfully” matches previous nitpick.
46-56: Persistingstr(object_store_url)is correct. SimpleStorageName.str returns the canonicals3://bucket/keyURL viaurlunparse(self.to_url()), so no change is needed.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved with comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/core/cloud/storage.py (1)
95-104: Fix S3 URI construction: missing leading “/” and None fields in ParseResult.Without a leading slash,
str(SimpleStorageName(...))can render likes3://bucketkey. Also, defaultingparams/query/fragmenttoNoneis unnecessary—use empty strings.Apply:
def to_url(self): - kwargs = { - "scheme": "s3", - "netloc": self.Bucket, - "path": self.Key, - } - for k in ParseResult._fields: - kwargs.setdefault(k) + kwargs = { + "scheme": "s3", + "netloc": self.Bucket, + "path": "/" + self.Key.lstrip("/"), + } + for k in ParseResult._fields: + kwargs.setdefault(k, "") return ParseResult(**kwargs)
♻️ Duplicate comments (1)
backend/app/core/cloud/storage.py (1)
16-18: Remove unused import.
UserProjectOrgis unused and flagged by Ruff F401.Apply:
-from app.models import UserProjectOrg
🧹 Nitpick comments (1)
backend/app/core/cloud/storage.py (1)
155-162: Seek to start before uploading to avoid partial uploads.Apply:
try: - self.aws.client.upload_fileobj( + source.file.seek(0) + self.aws.client.upload_fileobj( source.file,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/core/cloud/storage.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
🪛 Ruff (0.12.2)
backend/app/core/cloud/storage.py
17-17: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
backend/app/core/cloud/storage.py (3)
114-122: Abstractputsignature mismatches implementation — align to Path and return type.ABC takes
(basename: str) -> SimpleStorageNamewhile the impl uses(file_path: Path) -> SimpleStorageName. This breaks LSP and confuses callers.@abstractmethod - def put(self, source: UploadFile, basename: str) -> SimpleStorageName: - """Upload a file to storage""" + def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: + """Upload a file to storage at the given relative file_path and return its storage name""" passRun to verify no lingering call sites use the old API:
#!/bin/bash set -euo pipefail # Find any abstract/override still using basename or -> str rg -nP 'def\s+put\s*\([^)]*\bbasename\b' -C2 rg -nP 'def\s+put\s*\([^)]*\)\s*->\s*str' -C2 # Calls passing basename= or a string literal as 2nd arg (likely old API) rg -nP '\bput\s*\([^)]*basename\s*=' -C2 || true rg -nP '\bput\s*\(\s*[^,]+,\s*["\']' -n -C2 || true
217-235: Validateexpires_in(AWS limit is 7 days) to prevent boto ParamValidationError.Reject values outside 1–604800 seconds with a clear error.
def get_signed_url(self, url: str, expires_in: int = 3600) -> str: @@ - name = SimpleStorageName.from_url(url) + if not isinstance(expires_in, int) or not (1 <= expires_in <= 604800): + raise ValueError("expires_in must be an integer between 1 and 604800 seconds (7 days).") + name = SimpleStorageName.from_url(url)Check for call sites passing excessive TTLs:
#!/bin/bash set -euo pipefail rg -nP '\bget_signed_url\s*\([^)]*expires_in\s*=\s*(\d+)' | \ python - << 'PY' import re,sys for line in sys.stdin: m = re.search(r'expires_in\s*=\s*(\d+)', line) if m and int(m.group(1))>604800: print("[exceeds 7 days] ", line.strip()) PY
262-272: Fail fast whenstorage_pathis missing (unmigrated project).Avoid constructing a client with an empty/None root.
storage_path = project.storage_path + if not storage_path: + raise ValueError(f"Project {project_id} has no storage_path configured") return AmazonCloudStorage(project_id=project_id, storage_path=storage_path)
🧹 Nitpick comments (2)
backend/app/core/cloud/storage.py (2)
17-17: Remove unused import.
UserProjectOrgisn’t used; drop it to satisfy Ruff F401 and reduce noise.-from app.models import UserProjectOrg
157-164: Default Content-Type to octet-stream when missing.
UploadFile.content_typecan be None; boto will error or store without type. Set a sane default.self.aws.client.upload_fileobj( source.file, - ExtraArgs={ - "ContentType": source.content_type, - }, + ExtraArgs={"ContentType": (source.content_type or "application/octet-stream")}, **kwargs, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/core/cloud/storage.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
🪛 Ruff (0.12.2)
backend/app/core/cloud/storage.py
17-17: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/core/cloud/storage.py (1)
167-169: Good: project-scoped telemetry with masked bucket/key.Logging now consistently includes
project_idand masks sensitive values.Also applies to: 186-189, 205-208, 250-252
backend/app/tests/crud/documents/test_crud_document_read_one.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/core/cloud/storage.py (2)
92-104: Fix S3 URL formatting and POSIX key handling.
to_urlshould ensure a leading slash;from_urlshould normalize to POSIX. Current code risks malformeds3://bucketkeyand backslashes on Windows.Apply:
class SimpleStorageName: @@ - def to_url(self): - kwargs = { - "scheme": "s3", - "netloc": self.Bucket, - "path": self.Key, - } - for k in ParseResult._fields: - kwargs.setdefault(k) - return ParseResult(**kwargs) + def to_url(self): + path = self.Key if str(self.Key).startswith("/") else f"/{self.Key}" + kwargs = {"scheme": "s3", "netloc": self.Bucket, "path": path} + for k in ParseResult._fields: + kwargs.setdefault(k, "") + return ParseResult(**kwargs) @@ - def from_url(cls, url: str): - url = urlparse(url) - path = Path(url.path) - if path.is_absolute(): - path = path.relative_to(path.root) - return cls(Bucket=url.netloc, Key=str(path)) + def from_url(cls, url: str): + url = urlparse(url) + key = PurePosixPath(url.path.lstrip("/")).as_posix() + return cls(Bucket=url.netloc, Key=key)And update the import:
from pathlib import Path, PurePosixPathAlso applies to: 105-112
44-85: Robust bucket checks: avoidint()on error code; handleus-east-1; catch ParamValidationError.Casting
Error.Codeto int is brittle (e.g., "NoSuchBucket"). Also, omitCreateBucketConfigurationinus-east-1.Apply:
- except ValueError as err: + except (ValueError, ParamValidationError) as err: @@ - except ClientError as err: - response = int(err.response["Error"]["Code"]) - if response != 404: + except ClientError as err: + code = str(err.response.get("Error", {}).get("Code", "")) + if code not in ("404", "NoSuchBucket"): @@ - self.client.create_bucket( - Bucket=settings.AWS_S3_BUCKET, - CreateBucketConfiguration={ - "LocationConstraint": settings.AWS_DEFAULT_REGION, - }, - ) + create_kwargs = {"Bucket": settings.AWS_S3_BUCKET} + if settings.AWS_DEFAULT_REGION and settings.AWS_DEFAULT_REGION != "us-east-1": + create_kwargs["CreateBucketConfiguration"] = { + "LocationConstraint": settings.AWS_DEFAULT_REGION + } + self.client.create_bucket(**create_kwargs)Add (if not already present):
from botocore.exceptions import ClientError, ParamValidationError
♻️ Duplicate comments (4)
backend/app/core/cloud/storage.py (4)
16-17: Remove unused import and keep imports lean.
UserProjectOrgis not used in this module.Apply:
-from app.models import UserProjectOrg
262-273: Fail fast if project has nostorage_path.Older/missing rows will otherwise produce confusing errors downstream.
Apply:
storage_path = project.storage_path + if not storage_path: + raise ValueError(f"Project {project_id} has no storage_path configured") return AmazonCloudStorage(project_id=project_id, storage_path=storage_path)
150-155: Harden key construction: block traversal segments and keep keys relative.Reject
./..path segments to avoid odd S3 keys and potential traversal issues. You already reject absolute paths and normalize to POSIX—good.Apply:
def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: if file_path.is_absolute(): raise ValueError("file_path must be relative to the project's storage root") + if any(part in (".", "..") for part in file_path.parts): + raise ValueError("file_path must not contain '.' or '..' segments'") key = Path(self.storage_path) / file_path destination = SimpleStorageName(key.as_posix())
217-236: Validateexpires_inand surface clear errors for presigned URLs.Constrain to AWS limits (1–604800s) and catch
ParamValidationError.Apply:
def get_signed_url(self, url: str, expires_in: int = 3600) -> str: @@ - name = SimpleStorageName.from_url(url) + if not isinstance(expires_in, int) or not (1 <= expires_in <= 604800): + raise ValueError("expires_in must be an integer between 1 and 604800 seconds (7 days).") + name = SimpleStorageName.from_url(url) try: signed_url = self.aws.client.generate_presigned_url( "get_object", Params={"Bucket": name.Bucket, "Key": name.Key}, ExpiresIn=expires_in, ) @@ - except ClientError as err: + except (ClientError, ParamValidationError) as err: logger.error( f"[AmazonCloudStorage.get_signed_url] AWS presign error | " f"{{'project_id': '{self.project_id}', 'bucket': '{mask_string(name.Bucket)}', 'key': '{mask_string(name.Key)}', 'error': '{str(err)}'}}", exc_info=True, ) raise CloudStorageError(f'AWS Error: "{err}" ({url})') from errAlso add the missing import:
from botocore.exceptions import ClientError, ParamValidationError
🧹 Nitpick comments (1)
backend/app/core/cloud/storage.py (1)
157-164: Set a safe default ContentType to avoid boto ParamValidationError.
UploadFile.content_typecan be empty; boto3 expects a string.Apply:
self.aws.client.upload_fileobj( source.file, ExtraArgs={ - "ContentType": source.content_type, + "ContentType": source.content_type or "application/octet-stream", }, **kwargs, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/core/cloud/storage.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/core/cloud/storage.py (2)
166-168: LGTM: Project-scoped logging.Including
project_idand masking bucket/key improves traceability without leaking secrets.Also applies to: 171-173
179-196: LGTM: S3 interactions and project-scoped logging.
stream,get_file_size_kb, anddeletelook correct and consistent with the new project-based design.Also applies to: 197-216, 244-259
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
backend/app/api/routes/documents.py (2)
100-114: Permanent delete currently soft-deletes; call purge + fix CollectionCrud argEnsure DB row is removed after S3 delete and collections cleanup. Also pass project_id to CollectionCrud.
- d_crud = DocumentCrud(session, current_user.project_id) - c_crud = CollectionCrud(session, current_user.id) + d_crud = DocumentCrud(session, current_user.project_id) + c_crud = CollectionCrud(session, current_user.project_id) @@ - storage.delete(document.object_store_url) - d_crud.delete(doc_id) + storage.delete(document.object_store_url) + d_crud.purge(doc_id)
76-84: Fix scope: use project_id for collections; drop unused var; sentence case message
- Pass project_id to CollectionCrud (project-scoped).
- Remove unused variable (ruff F841).
- Standardize message casing.
- d_crud = DocumentCrud(session, current_user.project_id) - c_crud = CollectionCrud(session, current_user.id) + d_crud = DocumentCrud(session, current_user.project_id) + c_crud = CollectionCrud(session, current_user.project_id) @@ - document = d_crud.delete(doc_id) - data = c_crud.delete(document, a_crud) + document = d_crud.delete(doc_id) + c_crud.delete(document, a_crud) @@ - return APIResponse.success_response( - Message(message="Document Deleted Successfully") - ) + return APIResponse.success_response(Message(message="Document deleted successfully"))
🧹 Nitpick comments (2)
backend/app/crud/document.py (1)
73-96: Edge case + message wording in read_each
- Short-circuit empty doc_ids to avoid generating a no-op query.
- Fix wording (“at least”, pluralize “documents”) for clearer errors.
- def read_each(self, doc_ids: list[UUID]): + def read_each(self, doc_ids: list[UUID]): + if not doc_ids: + return [] @@ - raise ValueError( - f"Requested atleast {requested_count} document retrieved {retrieved_count}" - ) + raise ValueError( + f"Requested at least {requested_count} documents; retrieved {retrieved_count}" + )backend/app/api/routes/documents.py (1)
23-33: Optional: explicitly coerce to DocumentPublicFastAPI can coerce, but making it explicit keeps parity with info() and avoids surprises.
- data = crud.read_many(skip, limit) - return APIResponse.success_response(data) + docs = crud.read_many(skip, limit) + public = [DocumentPublic.model_validate(d, from_attributes=True) for d in docs] + return APIResponse.success_response(public)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
backend/app/api/routes/documents.py(6 hunks)backend/app/crud/document.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/crud/document.py (4)
backend/app/crud/collection.py (3)
read_one(84-93)delete(107-115)CollectionCrud(17-146)backend/app/models/document.py (1)
Document(19-34)backend/app/tests/crud/documents/test_crud_document_update.py (2)
TestDatabaseUpdate(15-67)test_update_respects_owner(57-67)backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)
backend/app/api/routes/documents.py (6)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/collection.py (3)
CollectionCrud(17-146)delete(107-115)read_one(84-93)backend/app/models/document.py (3)
Document(19-34)DocumentPublic(37-47)Document(10-30)backend/app/models/message.py (1)
Message(5-6)backend/app/utils.py (3)
APIResponse(27-48)get_openai_client(170-200)success_response(34-37)backend/app/core/cloud/storage.py (7)
get_cloud_storage(262-272)put(120-122)put(150-177)delete(140-142)delete(244-259)get_signed_url(135-137)get_signed_url(217-242)
🪛 Ruff (0.12.2)
backend/app/api/routes/documents.py
80-80: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/crud/document.py (3)
15-25: Good: project-scoped access + soft-delete filter applied in lookupsConstructor and read_one correctly scope by project_id and exclude soft-deleted rows.
39-71: Input validation on pagination looks soundNegative skip/limit checks with structured logging are fine.
98-114: Ownership enforcement on update is correctPopulate missing project_id and reject mismatches; timestamp update is in place.
backend/app/api/routes/documents.py (2)
46-58: Upload path looks goodPer-project storage via get_cloud_storage and deferring project_id set to CRUD is fine.
130-139: LGTM: signed URL opt-in on info()The from_attributes conversion and conditional presign are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
backend/app/tests/crud/documents/test_crud_document_read_one.py (3)
15-16: Avoid flaky fixture: create a dedicated test project instead of fetching any active one.get_project(db) can raise when no active projects exist and may couple this test to seeded data. Prefer an isolated test project.
Apply:
- project = get_project(db) - return DocumentStore(db, project.id) + project = create_test_project(db) + return DocumentStore(db, project.id)Additionally, to avoid cross-test interference, scope DocumentStore.clear to the project (change in backend/app/tests/utils/document.py):
class DocumentStore: def __init__(self, db: Session, project_id: int): self.db = db self.documents = DocumentMaker(project_id=project_id, session=db) - self.clear(self.db) + self.clear(self.db, project_id) - @staticmethod - def clear(db: Session): - db.exec(delete(Document)) + @staticmethod + def clear(db: Session, project_id: int | None = None): + if project_id is None: + db.exec(delete(Document)) + else: + db.exec(delete(Document).where(Document.project_id == project_id)) db.commit()
31-31: OK; consider a simpler invalid-ID path.Use a random UUID instead of generating an unpersisted Document to reduce coupling to DocumentMaker internals.
Apply:
+from uuid import uuid4 @@ - document = next(store.documents) + random_id = uuid4() @@ - crud.read_one(document.id) + crud.read_one(random_id)
43-43: Correct: enforce project boundary.Using other_project.id validates 404 on cross-project reads. Consider adding a companion test for is_deleted=True returning 404.
backend/app/tests/crud/documents/test_crud_document_delete.py (4)
9-10: Prefer deterministic project creation to avoid env-coupled flakesRelying on get_project(db) assumes an active project already exists. Using create_test_project(db) makes tests self-contained and stable across environments.
Apply:
-from app.tests.utils.utils import get_project from app.tests.utils.test_data import create_test_project- project = get_project(db) + project = create_test_project(db)- project = get_project(db) + project = create_test_project(db)
16-21: Initialize CRUD with the known project id, not the generated document’sPassing project.id is clearer and avoids coupling the test to DocumentMaker’s behavior of setting document.project_id.
- crud = DocumentCrud(db, document.project_id) + crud = DocumentCrud(db, project.id)
38-43: Strengthen the negative test by asserting no side effects on the source docAfter the 404, re-fetch the original doc and assert it wasn’t mutated.
with pytest.raises(HTTPException) as exc_info: crud.delete(document.id) assert exc_info.value.status_code == 404 assert "Document not found" in str(exc_info.value.detail) + # Verify no side effects on the original document + fresh = db.exec(select(Document).where(Document.id == document.id)).one() + assert fresh.is_deleted is False
27-33: Add a targeted read_one test to lock in soft-delete semanticsValidate that CRUD reads exclude deleted documents by asserting 404 on read_one after deletion.
class TestDatabaseDelete: @@ def test_delete_marks_deleted(self, document: Document): assert document.is_deleted is True + + def test_read_one_404_after_delete(self, db: Session): + project = create_test_project(db) + store = DocumentStore(db, project.id) + doc = store.put() + crud = DocumentCrud(db, project.id) + crud.delete(doc.id) + with pytest.raises(HTTPException) as exc_info: + crud.read_one(doc.id) + assert exc_info.value.status_code == 404
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
backend/app/tests/crud/documents/test_crud_document_delete.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_read_one.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_update.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/tests/crud/documents/test_crud_document_update.py
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/tests/crud/documents/test_crud_document_delete.py (5)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/test_data.py (1)
create_test_project(34-50)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/models/document.py (1)
Document(19-34)
backend/app/tests/crud/documents/test_crud_document_read_one.py (4)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/test_data.py (1)
create_test_project(34-50)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (2)
DocumentCrud(14-135)read_one(19-35)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/tests/crud/documents/test_crud_document_read_one.py (3)
8-9: Imports align with project-scoped tests.These imports are appropriate for the new project-based organization.
23-23: Correct project scoping in CRUD instantiation.Passing store.project.id ensures reads are constrained to the project.
41-41: Good isolation for cross-project access.Creating a separate project for this check is the right approach.
backend/app/tests/crud/documents/test_crud_document_delete.py (2)
31-33: Good assertion on soft-delete flagAsserting is_deleted is True verifies the core behavior post-delete.
17-18: Be cautious: DocumentStore.clear() purges all documentsBecause DocumentStore.init calls clear(), creating a store inside a test wipes all documents in the DB. Ensure tests run in isolated transactions or a per-test database to avoid cross-test interference, especially under parallel runs.
If isolation isn’t guaranteed, consider moving the clear() call into a dedicated fixture with explicit scope or using transaction rollbacks per test.
Also applies to: 39-39
Summary
Target issuen: #85
This PR refactors the document module and cloud storage system to organize files by project rather than user, adds support for generating signed URLs for secure file access, and implements a project-based storage path structure.
Database Schema Updates
Storage Architecture Refactor
API Improvements
Migration Impact
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
Refactor
Chores
Tests