Classification: Fine tuning Initiation and retrieve endpoint #315
Conversation
Force-pushed b1a4479 to 04c7269
Approving this with the assumption that the comments will be resolved.
```python
error_msg = handle_openai_error(e)
logger.error(
    f"[Retrieve_fine_tune_status] Failed to retrieve OpenAI job | "
    f"provider_job_id={mask_string(job.provider_job_id)}, "
```
not sure why we should mask the provider_job_id; maybe it is required and I don't have full context
It's just a security measure we take with all such IDs (thread ID, assistant ID, response ID, etc.). We don't want them exposed in logs; while this doesn't pose any immediate threat, we mask them to avoid the risk of exposure.
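The masking helper referenced in the log call above could look like the following minimal sketch. This is a hypothetical implementation for illustration; the repository's actual `mask_string` (and its signature) may differ.

```python
def mask_string(value: str, visible: int = 4) -> str:
    """Mask an identifier for logging, keeping only the last few characters.

    Hypothetical sketch: replaces all but the trailing `visible` characters
    with asterisks so IDs (provider job IDs, thread IDs, etc.) are not
    exposed verbatim in log output.
    """
    if not value:
        return ""
    if len(value) <= visible:
        # Too short to partially reveal; mask everything.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Example: only the tail of the ID survives in logs.
masked = mask_string("ftjob-abc123XYZ")  # → '***********3XYZ'
```

The log line stays correlatable (the visible tail is usually enough to match against the provider dashboard) without leaking the full identifier.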
* Classification: db models and migration script (#305)
  * db models and migration script
* Classification: Fine tuning Initiation and retrieve endpoint (#315)
  * Fine-tuning core, initiation, and retrieval
  * separate session for bg task, and formatting fixes
  * fixing alembic revision
* Classification: Model evaluation of fine tuned models (#326)
  * Model evaluation of fine tuned models
  * fixing alembic revision
  * alembic revision fix
* Classification: train and test data to s3 (#343)
  * alembic file for adding and removing columns
  * train and test s3 url column
  * updating alembic revision
  * formatting fix
* Classification: retaining prediction and fetching data from s3 for model evaluation (#359)
  * adding new columns to model eval table
  * test data and prediction data s3 url changes
  * single migration file
  * status enum columns
  * document seeding
* Classification: small fixes and storage related changes (#365)
  * first commit covering all
  * changing model name to fine tuned model in model eval
  * error handling in get cloud storage and document not found error handling
  * fixing alembic revision
  * uv lock
  * new uv lock file
  * updated uv lock file
  * coderabbit suggestions and removing unused imports
  * changes in uv lock file
  * making csv a supported file format, changing uv lock and pyproject toml
Summary
Target issue is #301
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
This PR introduces a complete flow to create and manage OpenAI fine-tuning jobs:

- API to create jobs (one per split_ratio), refresh status from OpenAI, and list jobs by document.
- CRUD with idempotency checks to prevent duplicate jobs for the same (document_id, base_model, split_ratio, project).
- Preprocessing that converts a CSV (via object storage) into OpenAI chat-format JSONL (train/test), with stratified splits and temp-file cleanup.
- Background processing so heavy steps (preprocess, upload, job creation) don't block the request thread.
- Seeded document table for use in testing, and a "fetch doc object from db" helper added to test utils.
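The idempotency check described above can be sketched as follows. This is an illustrative, in-memory version under assumed names (`get_or_create_job`, a list standing in for the jobs table); the actual PR would implement this as a database query in the CRUD layer, ideally backed by a unique constraint.

```python
def get_or_create_job(existing_jobs, document_id, base_model, split_ratio, project_id):
    """Return an existing job for the same (document_id, base_model,
    split_ratio, project) instead of creating a duplicate.

    Hypothetical sketch: `existing_jobs` is a plain list standing in for
    the fine-tuning jobs table. Returns (job, created) where `created`
    is True only if a new job record was appended.
    """
    key = (document_id, base_model, split_ratio, project_id)
    for job in existing_jobs:
        if (job["document_id"], job["base_model"],
                job["split_ratio"], job["project_id"]) == key:
            return job, False  # duplicate request: reuse the existing job
    job = {
        "document_id": document_id,
        "base_model": base_model,
        "split_ratio": split_ratio,
        "project_id": project_id,
        "status": "pending",
    }
    existing_jobs.append(job)
    return job, True
```

In a real service the check and the insert should happen in one transaction (or rely on the unique constraint and catch the integrity error), otherwise two concurrent requests could still both create a job.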
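The preprocessing step (CSV rows to stratified train/test chat-format JSONL) could be sketched like this. All names here (`to_chat_jsonl`, the prompt, the row shape) are assumptions for illustration, not the repository's actual helpers; the real code also streams the CSV from object storage and cleans up temp files.

```python
import json
import random
from collections import defaultdict


def to_chat_jsonl(rows, system_prompt, split_ratio=0.8, seed=42):
    """Stratified train/test split of (text, label) rows, emitted as
    OpenAI chat-format JSONL lines.

    Hypothetical sketch: groups rows by label, shuffles each group with
    a fixed seed, and cuts each group at `split_ratio` so both splits
    keep roughly the same label distribution.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        cut = int(len(items) * split_ratio)
        train.extend(items[:cut])
        test.extend(items[cut:])

    def lines(split):
        # One JSON object per line, in the chat format fine-tuning expects.
        return [
            json.dumps({"messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
                {"role": "assistant", "content": label},
            ]})
            for text, label in split
        ]

    return lines(train), lines(test)
```

Stratifying per label (rather than shuffling the whole dataset once) matters for classification fine-tuning: a plain random split can starve the test set of rare labels.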