Classification: small fixes and storage related changes #365
Conversation
Approving with comments; please try to handle errors for all AWS storage calls before merging.
top_model = fetch_top_model_by_doc_id(session, document_id, current_user.project_id)
storage = get_cloud_storage(session=session, project_id=current_user.project_id)
Should we not handle errors here? What if get_signed_url fails?
Error handling is already in place in the get_signed_url method; I will add it to the get_cloud_storage method as well, so that errors there are also handled.
Added error handling to get_cloud_storage as well.
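To illustrate the change being discussed, here is a minimal sketch of how get_cloud_storage could wrap storage initialization so backend failures surface as one handled error. The AmazonCloudStorage stand-in and the CloudStorageError name are assumptions for illustration, not the project's actual classes:

```python
import logging

logger = logging.getLogger(__name__)


class CloudStorageError(Exception):
    """Hypothetical error raised when the storage backend cannot be initialized."""


class AmazonCloudStorage:
    """Stand-in for the real storage class; assume its constructor
    can raise on missing credentials or misconfiguration."""

    def __init__(self, project_id):
        if project_id is None:
            raise ValueError("missing project_id")
        self.project_id = project_id


def get_cloud_storage(session=None, project_id=None):
    """Initialize storage, converting any backend failure into one handled error."""
    try:
        return AmazonCloudStorage(project_id)
    except Exception as exc:
        # Log and re-raise as a single well-known exception type so callers
        # only need to handle CloudStorageError.
        logger.warning("Cloud storage init failed for project %s: %s", project_id, exc)
        raise CloudStorageError("cloud storage unavailable") from exc
```

Callers can then catch CloudStorageError in one place instead of anticipating every AWS-specific exception.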
    attach a signed prediction data file URL (if available).
    """
    s3_key = getattr(model_obj, "prediction_data_s3_object", None)
    prediction_data_file_url = storage.get_signed_url(s3_key) if s3_key else None
If the s3_key is None, what happens? Should we send an empty/null value in the response JSON?
Also, maybe include error handling there, something like this:
try:
    prediction_data_file_url = storage.get_signed_url(s3_key)
except Exception as e:
    logger.warning(f"Failed to generate signed URL for {s3_key}: {e}")
    prediction_data_file_url = None
If the s3_key is None, then there's simply no prediction file to sign, so the function sets prediction_data_file_url to None. In the API response this shows up as null, which is fine since the ModelEvaluationPublic schema already defaults this field to None.
We don't need extra error handling inside attach_prediction_file_url, because any errors from generating the signed URL are already logged and handled in the get_signed_url method itself. This keeps the logic clean: attach_prediction_file_url just decides whether or not to attempt URL generation, while get_signed_url is responsible for managing AWS-specific failures.
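The division of responsibility described above can be sketched as follows. FakeStorage is a hypothetical in-memory stand-in (not the project's real storage wrapper) whose get_signed_url swallows backend errors and returns None, while attach_prediction_file_url only decides whether signing should be attempted:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class FakeStorage:
    """Illustrative stand-in for the storage wrapper: get_signed_url
    handles backend failures internally and returns None on error."""

    def __init__(self, fail: bool = False):
        self.fail = fail  # flip on to simulate an AWS failure

    def get_signed_url(self, s3_key: str) -> Optional[str]:
        try:
            if self.fail:
                raise RuntimeError("simulated AWS failure")
            return f"https://example.com/{s3_key}?signed=1"
        except Exception as e:
            logger.warning("Failed to generate signed URL for %s: %s", s3_key, e)
            return None


def attach_prediction_file_url(model_obj, storage) -> Optional[str]:
    # No key means there is no prediction file to sign; the resulting
    # None serializes as null in the API response.
    s3_key = getattr(model_obj, "prediction_data_s3_object", None)
    return storage.get_signed_url(s3_key) if s3_key else None
```

With this split, a missing key and a failed signing attempt both surface to the client as null, and AWS-specific handling lives in exactly one place.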
* Classification: db models and migration script (#305)
  * db models and migration script
* Classification: Fine tuning initiation and retrieve endpoint (#315)
  * Fine-tuning core, initiation, and retrieval
  * separate session for bg task, and formatting fixes
  * fixing alembic revision
* Classification: Model evaluation of fine tuned models (#326)
  * Model evaluation of fine tuned models
  * fixing alembic revision
  * alembic revision fix
* Classification: train and test data to s3 (#343)
  * alembic file for adding and removing columns
  * train and test s3 url column
  * updating alembic revision
  * formatting fix
* Classification: retaining prediction and fetching data from s3 for model evaluation (#359)
  * adding new columns to model eval table
  * test data and prediction data s3 url changes
  * single migration file
  * status enum columns
  * document seeding
* Classification: small fixes and storage related changes (#365)
  * first commit covering all
  * changing model name to fine tuned model in model eval
  * error handling in get cloud storage and document not found error handling
  * fixing alembic revision
  * uv lock
  * new uv lock file
  * updated uv lock file
  * coderabbit suggestions and removing unused imports
  * changes in uv lock file
  * making csv a supported file format, changing uv lock and pyproject toml
Summary
Target issue is #364
Checklist
Before submitting a pull request, please ensure that you complete these tasks.
Run fastapi run --reload app/main.py or docker compose up in the repository root and test.
Notes
Object storage field renamed for clarity: *_s3_url → *_s3_object across models, CRUD, migrations, preprocessing, and business logic.
Public API enrichment: added signed file URLs (derived from object keys) to public responses for fine-tuning datasets and evaluation predictions.
Error message column now populated with the actual error instead of just "failed during background job processing", in both cases: fine tuning and model evaluation.
Consistency fixes:
Metric key standardized: mcc → mcc_score.
Route parameter/response corrections (e.g., /fine_tuning/{fine_tuning_id}/refresh; response shape for /model_evaluation/evaluate_models/). Renamed this endpoint's input parameter from "job_id" to "fine_tuning_id".
Use get_cloud_storage(...) instead of AmazonCloudStorage(...); also use the project id when initializing the document CRUD.
Preprocessor: storage.put now uses the file_path= kwarg, per the recent changes to the storage.put method.
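As a rough sketch of the keyword-only file_path= call style mentioned in the last note, here is a hypothetical in-memory stand-in for the storage wrapper (the object_name parameter is an assumption; the real signature may differ):

```python
from pathlib import Path


class StorageSketch:
    """Illustrative in-memory stand-in for the project's storage wrapper."""

    def __init__(self):
        self.objects = {}  # object key -> raw bytes

    def put(self, *, file_path, object_name):
        # Keyword-only file_path mirrors the updated storage.put call style;
        # passing it positionally would now raise a TypeError.
        data = Path(file_path).read_bytes()
        self.objects[object_name] = data
        return object_name
```

A caller such as the preprocessor would then invoke it as storage.put(file_path=local_csv_path, object_name=key), making the intent of each argument explicit at the call site.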