[ Feature: document module ] document endpoints #59
Route names will conform to common Unix operations.
There are several questions I have so far (relevant to 4cbb94f):

AWS
How are we managing AWS credentials across development and production:

S3 management
The current version of this PR assumes a bucket per project/organization. Alternatives include:
Using buckets per project/org (

Project and organization information
How are we passing the project and organization to routes? Right now I assume this information is carried in the "current user", although in my branch the current user has no awareness of an org/project breakdown. I believe #63 will bring this functionality in, but it's not clear whether that will hide the details behind the user facade (like I'm hoping it will).
@jerome-white I can share AWS credentials with you that you can add to .env. Let's use the same bucket and a different directory for each organisation.
* Since there is a single bucket per client: storage class no longer responsible for name generation
* Since the bucket is created at platform startup: storage class no longer needs to worry about it, and client creation needs to be globally accessible.
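A minimal sketch of what one-time bucket creation at startup could look like, not the code in this PR. It assumes boto3's S3 `create_bucket` call, which needs a `CreateBucketConfiguration` with a `LocationConstraint` for any region other than `us-east-1`. The function names `create_bucket_kwargs` and `ensure_bucket` are hypothetical.

```python
from typing import Any


def create_bucket_kwargs(bucket: str, region: str) -> dict[str, Any]:
    """Build keyword arguments for an S3 ``create_bucket`` call (hypothetical helper)."""
    kwargs: dict[str, Any] = {"Bucket": bucket}
    # us-east-1 is the S3 default region and must not be passed as a
    # LocationConstraint; every other region has to be named explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def ensure_bucket(client: Any, bucket: str, region: str) -> None:
    """Create the shared bucket once, e.g. from an application startup hook.

    ``client`` is assumed to be an S3 client such as ``boto3.client("s3")``.
    """
    client.create_bucket(**create_bucket_kwargs(bucket, region))
```

Keeping the argument building separate from the call makes the region rule easy to check without touching AWS.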
Storage class builds bucket path based on user information and the basename. This allows the basename to follow the client's convention.
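To illustrate the idea of deriving the object path from user information plus a caller-supplied basename, here is a rough sketch. The `Owner` type, its `organization` and `project` fields, and the `object_key` function are assumptions made for illustration; the PR does not show how the user information is actually structured.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Owner:
    """Hypothetical stand-in for the information carried by the current user."""

    organization: str
    project: str


def object_key(owner: Owner, basename: str) -> str:
    """Return the key inside the shared bucket for ``basename``."""
    name = PurePosixPath(basename)
    # Accept only a bare file name so a caller cannot step outside
    # the directory that belongs to its organization and project.
    if name.name != basename or basename in ("", ".", ".."):
        raise ValueError(f"not a plain basename: {basename!r}")
    return str(PurePosixPath(owner.organization, owner.project, name))
```

With this layout every organization gets its own directory in the single bucket, while the file name itself stays under the client's control.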
* Fewer fixtures that are more general
* Remove class fixtures; assume someone else manages database state
* Consolidate manual database interaction into a single class
* Fix test class name clashes
Based on various Discord discussions and meetings:
Each method takes into account whether the client is allowed to access the documents they are trying to touch. To do this, owner is dropped from the method signatures and is instead specified at construction.
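The pattern described here can be sketched as follows. This is an illustration only: an in-memory dictionary stands in for the database session, and the names `Document`, `DocumentCrud`, `read_one`, `read_many`, and `delete` are assumptions rather than the PR's actual interface.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Document:
    id: UUID
    owner_id: int
    fname: str


class DocumentCrud:
    """CRUD helper bound to one owner at construction (hypothetical sketch)."""

    def __init__(self, store: dict[UUID, Document], owner_id: int) -> None:
        self.store = store  # stand-in for a database session
        self.owner_id = owner_id

    def read_one(self, doc_id: UUID) -> Document:
        doc = self.store.get(doc_id)
        # A document owned by someone else is treated as if it did not exist.
        if doc is None or doc.owner_id != self.owner_id:
            raise KeyError(doc_id)
        return doc

    def read_many(self) -> list[Document]:
        return [d for d in self.store.values() if d.owner_id == self.owner_id]

    def delete(self, doc_id: UUID) -> int:
        self.read_one(doc_id)  # raises unless the owner may touch it
        del self.store[doc_id]
        return 1  # number of rows deleted
```

Because the owner is fixed in the constructor, no method can be called with a mismatched owner by accident.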
The preference for client creation is to take parameter values from the environment; platform "settings" are a backup. This ensures that when mocking S3, things work as expected.
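One way to express "environment first, settings as a backup" is sketched below. The helper name `client_kwargs` and the shape of `settings` (a plain mapping keyed by variable name) are assumptions; the variable names are the standard AWS ones, and the resulting keys match keyword arguments accepted by `boto3.client`.

```python
import os
from typing import Mapping, Optional

# boto3.client keyword argument -> environment variable name
_PARAMS = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "region_name": "AWS_DEFAULT_REGION",
}


def client_kwargs(
    settings: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Resolve client parameters, preferring the environment over settings."""
    env = os.environ if env is None else env
    kwargs: dict[str, str] = {}
    for param, variable in _PARAMS.items():
        # The environment wins; platform settings are only a fallback.
        value = env.get(variable) or settings.get(variable)
        if value:
            kwargs[param] = value
    return kwargs
```

The result could then be passed along as `boto3.client("s3", **client_kwargs(settings))`, so credentials placed in the environment by a test harness take priority over whatever the platform settings contain.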
I can no longer wait for a second review. Merging this.
Merged commit 612462c into feature/document-module/dev
* [ Feature: document module ] Schema setup and basic route (#29)
* Basic document model
* Document route skeleton
  Implements document list based on current user.
* Imports for document route exposure
* Add relationship between users and documents
* [ Feature: document module ] document endpoints (#59)
* Cleaner syntax
* Interface for cloud storage functionality
* Create module dedicated to cloud functionality
* Take the user in the constructor
* Rename the document list route
  Route names will conform to common Unix operations.
* Implementation of file upload
* Default placeholders for AWS variables
* Take AWS credentials from .env
* Perform bucket creation at startup
* Use a single bucket for all clients
  #59 (comment)
* Changes to support new bucket semantics
  * Since there is a single bucket per client: storage class no longer responsible for name generation
  * Since the bucket is created at platform startup: storage class no longer needs to worry about it, and client creation needs to be globally accessible.
* Missing imports
* Unused imports
* Lift timestamp generation to common location
* Timestamp generation is global resource
* Updating a document is not yet supported
* Flesh out document remove and stat
* Must specify region when creating a bucket
  See: https://stackoverflow.com/a/49665620
* Add boto3 requirement
* Repeating AWS environment variables in the settings
* Client expected to pass the basename of the destination
  Storage class builds bucket path based on user information and the basename. This allows the basename to follow the client's convention.
* Corrected upload file specification
* Build basename that matches the UUID expectation of the model
* SQLAlchemy cannot process Path types natively
* Ensure document ID is passed to route body
* Corrected database interaction when deleting
* More graceful handling of non singular results
* Move document database interactions to CRUD
* Ignore Emacs backup files
* Allow document list to be iterable
* Corrected parameter naming
* Initial document CRUD tests
* Test for read_* methods
* Infrastructure for making it work (utils)
* Document update returns inserted document
* Lift document creation function
* Test document update
* Whitespace
* Linted
* Corrected update_at test
* Appropriate class variable naming
* Move document creation into a class
* Linted
* Test document CRUD delete
* All test methods clean up after themselves
* Gracefully handle negative skips and limits
* Fixture usage is more explicit and straightforward
* Better usage of fixtures
* Import reordering
* Lift document crud testing utils to global testing utilities
* Take read error into account
* Test document list route
* Simplify test parameters
* Lift common document endpoint testing resources to utils
* Return number of rows deleted
* Test document endpoint deletion
* Consistent Session variable naming
* Special list document type
* Ability to add component to URL path
* Corrections to injected types
* Simplification of Route semantics
* Route type now handles all URL operations required for the crawler
* Crawler assumes it will be called with a single type (Route)
* Linted
* Delete uses update
* Lift document comparison to test utilities
* Tests assert
* Must refresh the session before interacting with the database
* Tests for document stat route
* Linted
* Better temporary bucket naming
* Move from deprecated way of Pydantic to dict
* Lift bucket creation to global module
* Unused import
* Return all information about an uploaded document
* Updates to Python packages
* Bump boto3 version
* Add moto dependency
* Test upload endpoint
* Use object to upload documents
* General document test cleanup
  * Fewer fixtures that are more general
  * Remove class fixtures; assume someone else manages database state
  * Consolidate manual database interaction into a single class
  * Fix test class name clashes
* Remove unused imports
* Remove extraneous code
* Document CRUD respects user throughout
  Each method takes into account whether the client is allowed to access the documents they are trying to touch. To do this, owner is dropped from the method signatures and is instead specified at construction.
* Routes respect new document CRUD interface
* Integer to UUID now a standalone generator
* Better variable naming
* Tests take into account new document CRUD interface
* Unused imports
* More descriptive AWS error type
* Catch generic exceptions to ensure service does not go down
* Ensure boto3 respects environment
  The preference for client creation is to take parameter values from the environment; platform "settings" are a backup. This ensures that when mocking S3, things work as expected.
* Log cloud storage errors during startup
* Corrected variable naming
* Provide default values for AWS credentials
* Unused import
* Proper handling of missing AWS keys
* Updated UV lock
Summary
Target issue is #32
Endpoints for manipulating documents meant for RAG interaction.
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
poetry run uvicorn src.app.main:app --reload in the repository root and test.
Notes
This is designed to be a "working" PR: it will remain in draft state as it is fleshed out to provide transparency and get ongoing discussion around its practices.
Update on moving from draft
This has become a large PR. It was started before a lot of our development processes were stabilized. Notably:
To address the second point, I ask that the reviewers look at this code in isolation, not necessarily how the changes fit with what is currently on main. "Approving" and merging this branch will start the following process: