Feature/document module/dev #114

jerome-white · 2025-04-02T08:34:00Z

Summary

Target issue is #113

The document module has been developed on this branch. This PR merges those additions into main and lays the groundwork for continuing document module development in closer step with the main branch.

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
If you've fixed a bug or added code, it is tested and has test cases.

Notes

All code was reviewed in previous issues:

This pull request introduces a new document management module to the ai-platform project. Key changes include:

.gitignore Enhancements: Added entries to exclude macOS and Emacs temporary files, improving project maintenance by keeping the repository clean from unnecessary files.
API Enhancements:
- Reorganized API router imports in main.py for better structure.
- Introduced a new FastAPI router in routes/documents.py for document management, featuring CRUD operations, cloud storage integration, and error handling.
Cloud Storage Integration:
- Added AWS S3 functionality in storage.py, including client configuration, bucket management, and file upload capabilities.
- Configured AWS settings in config.py for S3 bucket integration, including access credentials and environment-based bucket name computation.
- Introduced a script in initial_storage.py for initializing cloud storage with error handling and logging.
Model and CRUD Updates:
- Added Document and DocumentList models in document.py using SQLModel, defining the database schema with UUID-based identification and user ownership.
- Updated user.py to include a documents relationship and a count field in the UsersPublic class.
- Improved model organization and type safety in models/__init__.py.
Utility and Script Modifications:
- Added a utility function in util.py to get the current UTC time without timezone information.
- Modified the prestart.sh script to handle multiple initialization services using an array-based approach.

These changes enhance the document management capabilities of the platform, integrate cloud storage, and improve overall project organization and initialization processes.
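The utility mentioned above (current UTC time without timezone information) might look roughly like the following. This is a minimal sketch, not the code in util.py: the function name `now_utc_naive` is an assumption made for illustration.

```python
from datetime import datetime, timezone


def now_utc_naive() -> datetime:
    """Return the current UTC time with tzinfo stripped (hypothetical helper name)."""
    # Take an aware UTC timestamp first, then drop tzinfo so the value
    # can be stored in a timezone-naive database column.
    return datetime.now(timezone.utc).replace(tzinfo=None)


stamp = now_utc_naive()
print(stamp.tzinfo)  # None
```

Starting from an aware timestamp and then stripping tzinfo avoids relying on the deprecated `datetime.utcnow()`.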

* Basic document model
* Document route skeleton
  Implements document list based on current user.
* Imports for document route exposure
* Add relationship between users and documents

* Cleaner syntax
* Interface for cloud storage functionality
* Create module dedicated to cloud functionality
* Take the user in the constructor
* Rename the document list route
  Route names will conform to common Unix operations.
* Implementation of file upload
* Default placeholders for AWS variables
* Take AWS credentials from .env
* Perform bucket creation at startup
* Use a single bucket for all clients #59 (comment)
* Changes to support new bucket semantics
* Since there is a single bucket per client: storage class no longer responsible for name generation
* Since the bucket is created at platform startup: storage class no longer needs to worry about it, and client creation needs to be globally accessible.
* Missing imports
* Unused imports
* Lift timestamp generation to common location
* Timestamp generation is global resource
* Updating a document is not yet supported
* Flesh out document remove and stat
* Must specify region when creating a bucket
  See: https://stackoverflow.com/a/49665620
* Add boto3 requirement
* Repeating AWS environment variables in the settings
* Client expected to pass the basename of the destination
  Storage class builds bucket path based on user information and the basename. This allows the basename to follow the client's convention.
* Corrected upload file specification
* Build basename that matches the UUID expectation of the model
* SQLAlchemy cannot process Path types natively
* Ensure document ID is passed to route body
* Corrected database interaction when deleting
* More graceful handling of non-singular results
* Move document database interactions to CRUD
* Ignore Emacs backup files
* Allow document list to be iterable
* Corrected parameter naming
* Initial document CRUD tests
* Test for read_* methods
* Infrastructure for making it work (utils)
* Document update returns inserted document
* Lift document creation function
* Test document update
* Whitespace
* Linted
* Corrected update_at test
* Appropriate class variable naming
* Move document creation into a class
* Linted
* Test document CRUD delete
* All test methods clean up after themselves
* Gracefully handle negative skips and limits
* Fixture usage is more explicit and straightforward
* Better usage of fixtures
* Import reordering
* Lift document crud testing utils to global testing utilities
* Take read error into account
* Test document list route
* Simplify test parameters
* Lift common document endpoint testing resources to utils
* Return number of rows deleted
* Test document endpoint deletion
* Consistent Session variable naming
* Special list document type
* Ability to add component to URL path
* Corrections to injected types
* Simplification of Route semantics
* Route type now handles all URL operations required for the crawler
* Crawler assumes it will be called with a single type (Route)
* Linted
* Delete uses update
* Lift document comparison to test utilities
* Tests assert
* Must refresh the session before interacting with the database
* Tests for document stat route
* Linted
* Better temporary bucket naming
* Move from deprecated way of Pydantic to dict
* Lift bucket creation to global module
* Unused import
* Return all information about an uploaded document
* Updates to Python packages
* Bump boto3 version
* Add moto dependency
* Test upload endpoint
* Use object to upload documents
* General document test cleanup
* Fewer fixtures that are more general
* Remove class fixtures; assume someone else manages database state
* Consolidate manual database interaction into a single class
* Fix test class name clashes
* Remove unused imports
* Remove extraneous code
* Document CRUD respects user throughout
  Each method takes into account whether the client is allowed to access the documents they are trying to touch. To do this, owner is dropped from the method signatures; instead being specified at construction.
* Routes respect new document CRUD interface
* Integer to UUID now a standalone generator
* Better variable naming
* Tests take into account new document CRUD interface
* Unused imports
* More descriptive AWS error type
* Catch generic exceptions to ensure service does not go down
* Ensure boto3 respects environment
  The preference for client creation is to take parameter values from the environment; platform "settings" are a backup. This ensures that when mocking S3, things work as expected.
* Log cloud storage errors during startup
* Corrected variable naming
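The "Ensure boto3 respects environment" note describes an environment-first lookup with platform settings as a backup. One way this could be sketched is below. The helper name `client_kwargs` and the shape of the `settings` mapping are assumptions for illustration, not the code in storage.py. The keyword arguments and environment variable names are the standard ones boto3 recognizes; explicit keyword arguments override the environment, which is why the sketch omits them whenever the environment already provides a value.

```python
import os
from typing import Dict, Mapping, Optional

# boto3.client keyword argument -> environment variable boto3 reads on its own.
ENV_FOR_KWARG = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "region_name": "AWS_DEFAULT_REGION",
}


def client_kwargs(
    settings: Mapping[str, Optional[str]],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Build keyword arguments for boto3.client (hypothetical helper).

    `settings` is assumed to be keyed by the same names as the environment
    variables; it is consulted only when the environment has no value.
    """
    kwargs: Dict[str, str] = {}
    for kwarg, env_name in ENV_FOR_KWARG.items():
        if environ.get(env_name):
            continue  # leave it to boto3 to read the environment
        fallback = settings.get(env_name)
        if fallback:
            kwargs[kwarg] = fallback
    return kwargs


# Environment is empty here, so the settings value is used as the backup.
print(client_kwargs({"AWS_DEFAULT_REGION": "us-east-1"}, environ={}))
```

The result would then be passed along as `boto3.client("s3", **kwargs)`. Because a mocking library such as moto typically works through environment-provided credentials, staying out of the way when the environment is populated keeps tests predictable.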

kody-ai · 2025-04-02T08:35:32Z

Code Review Completed! 🔥

The code review was successfully completed based on your current configurations.

Kody Guide: Usage and Configuration

Interacting with Kody

Request a Review: Ask Kody to review your PR manually by adding a comment with the @kody start-review command at the root of your PR.
Provide Feedback: Help Kody learn and improve by reacting to its comments with a 👍 for helpful suggestions or a 👎 if improvements are needed.

Current Kody Configuration

Review Options

The following review options are enabled or disabled:

Options	Enabled
Security	✅
Code Style	✅
Kody Rules	✅
Refactoring	✅
Error Handling	✅
Maintainability	✅
Potential Issues	✅
Documentation And Comments	✅
Performance And Optimization	✅
Breaking Changes	❌

Access your configuration settings here.

backend/app/api/routes/documents.py

backend/app/core/cloud/storage.py

kody-ai · 2025-04-02T08:38:41Z

backend/app/api/main.py

+from app.api.routes import (
+    api_keys,
+    documents,
+    items,
+    login,
+    organization,
+    project,
+    project_user,
+    private,
+    threads,
+    users,
+    utils,
+)


from app.api.routes import (
    api_keys,
    documents,
    items,
    login,
    organization,
    private,
    project,
    project_user,
    threads,
    users,
    utils,
)

Multiple instances of missing type hints or unclear type definitions reduce code maintainability and IDE support.

This issue appears in multiple locations:

backend/app/api/main.py: Lines 3-15

backend/app/core/util.py: Lines 3-3

backend/app/models/document.py: Lines 29-30
Please add type hints and clear type definitions to improve code clarity and maintainability.

_{Talk to Kody by mentioning @kody}

_{Was this suggestion helpful? React with 👍 or 👎 to help Kody learn from this interaction.}

kody-ai · 2025-04-02T08:38:44Z

backend/app/api/main.py

 api_router.include_router(login.router)
 api_router.include_router(users.router)
 api_router.include_router(utils.router)
 api_router.include_router(items.router)
+api_router.include_router(documents.router)
 api_router.include_router(threads.router)
 api_router.include_router(organization.router)
 api_router.include_router(project.router)
 api_router.include_router(project_user.router)
 api_router.include_router(api_keys.router)


# Auth routes
api_router.include_router(login.router)
api_router.include_router(users.router)

# Document management routes
api_router.include_router(documents.router)
api_router.include_router(threads.router)

# Organization routes
api_router.include_router(organization.router)
api_router.include_router(project.router)
api_router.include_router(project_user.router)

# Utility routes
api_router.include_router(utils.router)
api_router.include_router(items.router)
api_router.include_router(api_keys.router)

Group related routers together in the inclusion order for better code organization and maintenance. For example, group authentication-related routers (login, users), document-related routers (documents, threads), and organization-related routers (organization, project, project_user) together.


kody-ai · 2025-04-02T08:38:47Z

backend/app/api/routes/documents.py

+    except Exception as err:
+        raise_from_unknown(err)
+
+@router.get("/rm/{doc_id}")


@router.delete("/rm/{doc_id}")

The HTTP DELETE method should be used for the delete operation instead of GET, as per REST conventions. GET requests should be safe (read-only) and must not modify server state.


kody-ai · 2025-04-02T08:38:50Z

backend/app/core/config.py

+    @computed_field  # type: ignore[prop-decorator]
+    @property
+    def AWS_S3_BUCKET(self) -> str:
+        return f'ai-platform-documents-{self.ENVIRONMENT}'


@computed_field  # type: ignore[prop-decorator]
@property
def AWS_S3_BUCKET(self) -> str:
    bucket_name = f'ai-platform-documents-{self.ENVIRONMENT}'.lower()
    if not bucket_name.islower() or not all(c.isalnum() or c in '-.' for c in bucket_name):
        raise ValueError(
            'S3 bucket name must contain only lowercase letters, numbers, hyphens, and periods'
        )
    return bucket_name

The S3 bucket name could potentially contain invalid characters from the environment variable. Add validation to ensure the bucket name follows AWS S3 naming conventions.
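A somewhat stricter check than a character whitelist could look like the sketch below. It covers only a subset of the general-purpose bucket naming rules (length 3 to 63, lowercase letters, digits, hyphens and periods, alphanumeric first and last character, no adjacent periods, not shaped like an IPv4 address). Reserved prefixes and suffixes are not checked. The function names `is_valid_bucket_name` and `bucket_for` are hypothetical and not part of config.py.

```python
import re

# First and last character alphanumeric, 3 to 63 characters in total.
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names shaped like an IPv4 address are rejected.
_IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 bucket naming rules (illustrative only)."""
    if not _ALLOWED.fullmatch(name):
        return False
    if ".." in name:
        return False
    if _IPV4.fullmatch(name):
        return False
    return True


def bucket_for(environment: str) -> str:
    """Build the per-environment bucket name and validate it (hypothetical helper)."""
    name = f"ai-platform-documents-{environment}".lower()
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name


print(bucket_for("Staging"))  # ai-platform-documents-staging
```

Keeping the validation in a standalone function makes it testable without instantiating the settings class.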


kody-ai · 2025-04-02T08:38:53Z

backend/app/models/document.py

+    owner_id: UUID = Field(
+        foreign_key='user.id',
+        nullable=False,
+        ondelete='CASCADE',
+    )


owner_id: UUID = Field(
    foreign_key='user.id',
    nullable=False,
    ondelete='CASCADE',
    index=True,
)

Add an index on owner_id to improve query performance when fetching documents by owner, which is likely a common operation.
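The effect of such an index can be illustrated with the standard-library sqlite3 module. This is a sketch under assumptions: the project's actual database, table layout, and index name may differ, and `ix_document_owner_id` is used here only as an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id TEXT PRIMARY KEY, owner_id TEXT NOT NULL, fname TEXT)"
)

QUERY = "SELECT * FROM document WHERE owner_id = ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan description for the owner lookup."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("some-owner",)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)
conn.execute("CREATE INDEX ix_document_owner_id ON document (owner_id)")
plan_with_index = query_plan(conn)

print(plan_without_index)  # a full table scan
print(plan_with_index)     # a search that names ix_document_owner_id
```

Without the index, the lookup by owner scans every row; with it, the planner can jump straight to the matching rows.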


backend/app/models/user.py

kody-ai · 2025-04-02T10:56:56Z

Code Review Completed! 🔥

The code review was successfully completed based on your current configurations.


avirajsingh7 · 2025-04-03T10:12:23Z

backend/app/api/routes/documents.py

+):
+    crud = DocumentCrud(session, current_user.id)
+    try:
+        return crud.read_many(skip, limit)


We are implementing a standardized API response format across all API responses to ensure consistency throughout the application. You can refer to the implementation details in Pull Request #67.

Additionally, we can now add metadata to the response if required.
Any HTTPException raised will be automatically converted to the standardized format, maintaining uniformity across all responses.

jerome-white · Apr 3, 2025

The goal of this PR is to bring my code in line with main. Given that, is it okay if I do that as a new PR once this is merged? I'm not even sure, for example, that this branch has the standard response format definitions. I've been away for too long.

avirajsingh7 · 2025-04-03T10:14:45Z

backend/app/api/routes/documents.py

+    )
+
+    try:
+        return crud.update(document)


We need to add the standard API response to all of the endpoints.

avirajsingh7 · 2025-04-03T10:18:23Z

backend/app/models/document.py

+from .user import User
+
+class Document(SQLModel, table=True):
+    id: UUID = Field(


We will be using id as int for all entities.
The Project and Organization models are already set up this way, but uuid was previously used for the user_id as part of the default template. To address this, I have created an issue to convert user_id to int for consistency.

jerome-white · Apr 3, 2025

I prefer ints, so I welcome this. However, as with the other change, is it okay if we do this after the merge? Making this change will have bigger implications for how some of the other document-related stuff works.

avirajsingh7 · Apr 3, 2025

No worries, @jerome-white. I am approving the PR.
You can create an issue to ensure this doesn't get ignored, and make sure to include the lock file, as it's essential.

backend/pyproject.toml

kody-ai · 2025-04-04T04:23:09Z

Code Review Completed! 🔥

The code review was successfully completed based on your current configurations.


kody-ai · 2025-04-04T05:18:54Z

Kody Review Complete

Great news! 🎉
No issues were found that match your current review configurations.

Keep up the excellent work! 🚀


backend/uv.lock

jerome-white and others added 3 commits March 12, 2025 15:55

[ Feature: document module ] Schema setup and basic route (#29)

a5a9518

* Basic document model
* Document route skeleton
  Implements document list based on current user.
* Imports for document route exposure
* Add relationship between users and documents

Merge branch 'main' into feature/document-module/dev

24e599c

jerome-white requested review from AkhileshNegi, avirajsingh7, nishika26 and sourabhlodha April 2, 2025 08:34

jerome-white self-assigned this Apr 2, 2025

jerome-white added this to Kaapi-dev Apr 2, 2025

jerome-white moved this to In progress in Kaapi-dev Apr 2, 2025

jerome-white linked an issue Apr 2, 2025 that may be closed by this pull request

Merge partial document module into main #113

Closed

kody-ai bot reviewed Apr 2, 2025

backend/app/api/routes/documents.py

kody-ai bot reviewed Apr 2, 2025

backend/app/core/cloud/storage.py

kody-ai bot reviewed Apr 2, 2025

backend/app/models/user.py

Merge branch 'main' into feature/document-module/dev

faa9830

avirajsingh7 requested changes Apr 3, 2025

sourabhlodha approved these changes Apr 3, 2025

avirajsingh7 approved these changes Apr 3, 2025

Merge branch 'main' into feature/document-module/dev

6c7f973

jerome-white added 3 commits April 4, 2025 10:44

Provide default values for AWS credentials

6c4d654

Unused import

0f40527

Proper handling of missing AWS keys

ca59ce7

Updated UV lock

b2d400e

jerome-white commented Apr 4, 2025

backend/uv.lock

jerome-white merged commit f6caf60 into main Apr 4, 2025
1 check failed

jerome-white deleted the feature/document-module/dev branch April 4, 2025 05:28

github-project-automation bot moved this from In progress to Closed in Kaapi-dev Apr 4, 2025

jerome-white mentioned this pull request Apr 6, 2025

Standard API responses for document endpoints #120

Closed

4 participants