DEV-1311 - Add the list_models and get_training_metadata functions #138
Conversation
def _get_automation_model(cls, automation_model_str: str) -> AutomationModels:
    try:
        return AutomationModels(automation_model_str)
    except ValueError as e:
@frederik-encord I guess in hindsight this can be a bit annoying for clients with the enum.
Not sure why this would be an issue? The main idea of the "endpoint" is to allow people to find the model they want to run inference with or train. So the enum just allows filtering on the type of model you are looking for.
What would be the annoying part @denis-cord?
I'm just thinking of the following scenario:
- client is at SDK version X
- we add a new model, which means that here in version X we would throw
- client is forced to update their SDK version
Not a big issue for now that we only do minor version updates, but definitely annoying once we introduce a breaking change and force the client to deal with it.
Yea, that is not great. @denis-cord is there a way to get both intellisense and avoid the need to update the SDK for new models in the future?
https://github.com/encord-team/cord-backend/pull/862 needs to be rolled out before we can release the SDK to clients.
Just one comment on listing models through the Project. Looks great otherwise.
@@ -375,6 +376,12 @@ def add_classification(
        """
        return self._client.add_classification(name, classification_type, required, options)

    def list_models(self) -> List[ModelConfiguration]:
If we want to be consistent with, e.g., datasets and label rows, this should probably be a property called models instead. What do you think @denis-cord?
I think that is a good point.
What I was thinking about is that we might want to introduce a filter at some point, like we do in list_label_rows. In that case, we could add the filter methods in a new non-property accessor.
So I guess the consideration is:
- do we believe that we might want to have filters at some point for models (I think it may be possible)
- in that case - would we rather have two different methods or one (I thought one would be a bit nicer).
But I am also happy to have only the property accessor for now if you think that would be more usable.
I see your point. Personally, I don't like preparing code for the 🔮 potential future.
However, I don't see a likely future where developers wouldn't filter the models in code afterwards, so why not help them do so.
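To illustrate the property-accessor direction with client-side filtering, here is a minimal sketch; the `Project` wiring, enum members, and configuration fields are simplified stand-ins for the real SDK classes, not the actual implementation:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AutomationModels(Enum):
    """Illustrative subset of model types."""
    CLASSIFICATION = "classification"
    OBJECT_DETECTION = "object_detection"


@dataclass
class ModelConfiguration:
    """Trimmed-down stand-in for the real configuration object."""
    model_uid: str
    title: str
    model: AutomationModels


class Project:
    def __init__(self, configs: List[ModelConfiguration]):
        self._configs = configs

    @property
    def models(self) -> List[ModelConfiguration]:
        # Property accessor, consistent with e.g. datasets and label rows;
        # callers filter the returned list in plain Python instead of
        # passing filter arguments to a method.
        return list(self._configs)


project = Project([
    ModelConfiguration("abc123", "Car detector", AutomationModels.OBJECT_DETECTION),
    ModelConfiguration("def456", "Weather classifier", AutomationModels.CLASSIFICATION),
])
detection = [m for m in project.models if m.model is AutomationModels.OBJECT_DETECTION]
```

If filters are needed later, a separate `list_models(...)` method could be added alongside the property without breaking existing callers.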
def _get_automation_model(cls, automation_model_str: str) -> AutomationModels:
    try:
        return AutomationModels(automation_model_str)
    except ValueError as e:
The user needs to be able to find model iteration uids to be able to run inference.
encord/orm/model.py
def from_dict(cls, json_dict: dict):
    return ModelConfiguration(
        model_uid=json_dict["model_hash"],
        title=json_dict["title"],
        description=json_dict["description"],
        feature_node_hashes=cls._get_feature_node_hashes(json_dict["features"]),
        model=cls._get_automation_model(json_dict["model"]),
    )
There is a distinction between model uids and model iteration uids (the model itself versus the different iterations of training). I think these iteration uids should also be available through the ModelConfiguration.
Otherwise, this functionality is useless for running inference, as the iteration uids are required for that operation.
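One way to surface the iteration uids would be an extra field on `ModelConfiguration`. A minimal sketch, assuming a hypothetical `model_iteration_hashes` key in the payload (the field and key names are illustrative, not the SDK's actual API, and the helper methods from the real class are inlined for self-containment):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelConfiguration:
    model_uid: str
    title: str
    description: str
    feature_node_hashes: List[str]
    # Hypothetical field: uids of the trained iterations of this model,
    # needed to run inference against a specific training run.
    model_iteration_uids: List[str]

    @classmethod
    def from_dict(cls, json_dict: dict) -> "ModelConfiguration":
        return cls(
            model_uid=json_dict["model_hash"],
            title=json_dict["title"],
            description=json_dict["description"],
            # The real class parses these via a helper; taken verbatim here.
            feature_node_hashes=json_dict["features"],
            # Assumed payload key; defaults to no iterations if absent.
            model_iteration_uids=json_dict.get("model_iteration_hashes", []),
        )
```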
@frederik-encord I had a chance to look into this again. Please let me know how this looks now. Also, in terms of naming, what would feel most natural to you?
TODO for Denis: Merge this once https://github.com/encord-team/cord-backend/pull/862 is rolled out to prod
After a discussion with @denis-cord, I think that we should include some metadata for the training rows, which would allow users to filter their data. Off the top of my head, these are the use cases for the data available:
- created_at: used to choose the latest model, or "the one from last week that you know worked".
- training_final_loss: used to choose the model with the smallest loss.
- training_labels: used to choose the model that corresponds to a specific label row. The whole micro-model idea is to train very targeted models, so applying the right model to the right data is important. (However, this could live on the model level rather than the training-log level?)
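As a sketch of how such metadata would let users filter, assuming a hypothetical `TrainingMetadata` row shape carrying the fields discussed above (names and shape are illustrative, not the SDK's actual API):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TrainingMetadata:
    """Hypothetical shape of one training-log row."""
    model_iteration_uid: str
    created_at: datetime
    training_final_loss: float


rows: List[TrainingMetadata] = [
    TrainingMetadata("iter-a", datetime(2022, 1, 17), 0.31),
    TrainingMetadata("iter-b", datetime(2022, 1, 10), 0.12),
]

# "The latest model" use case: pick by created_at.
latest = max(rows, key=lambda r: r.created_at)

# "Smallest loss" use case: pick by training_final_loss.
best = min(rows, key=lambda r: r.training_final_loss)
```

The chosen row's `model_iteration_uid` is then what the user would pass on to run inference against that specific training run.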
@frederik-encord now you can inspect more metadata in a composable way.
I've discussed with @frederik-encord that the payload of
Looking great. Didn't find any issues.
Introduction and Explanation
Allow users to query all the models that are available for a project. Additionally, allow users to get more information about the training instances.
JIRA
DEV-1311