
DEV-1311 - Add the list_models and get_training_metadata functions #138

Merged
merged 8 commits into from Aug 17, 2022

Conversation

denis-cord (Contributor) commented Jun 15, 2022

Introduction and Explanation

Allow users to query all the models available for a project. Additionally, allow users to get more information about the training instances.

JIRA

DEV-1311

def _get_automation_model(cls, automation_model_str: str) -> AutomationModels:
    try:
        return AutomationModels(automation_model_str)
    except ValueError as e:
denis-cord (Contributor, Author):
@frederik-encord I guess in hindsight this can be a bit annoying for clients with the enum.

Contributor:

Not sure why this would be an issue? The main idea of the "endpoint" is to allow people to find the model they want to run inference with or train. So the enum just allows filtering on the type of model you are looking for.

What would be the annoying part @denis-cord?

denis-cord (Contributor, Author):

I'm just thinking of the following scenario:

  1. client is at SDK version X
  2. we add a new model, which means that here in version X we would throw
  3. client is forced to update their SDK version

Not a big issue for now while we only do minor version updates, but it will definitely be annoying once we introduce a breaking change and force the client to deal with it.
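The scenario above can be sketched as follows: a strict enum parse throws as soon as the server returns a model type the installed SDK doesn't know, while a lenient parse degrades gracefully. The enum values here are made up for illustration; the real AutomationModels members differ.

```python
# Sketch only: hypothetical enum members, not the SDK's real AutomationModels.
from enum import Enum
from typing import Union


class AutomationModels(Enum):
    OBJECT_DETECTION = "object_detection"
    CLASSIFICATION = "classification"


def parse_strict(automation_model_str: str) -> AutomationModels:
    # Raises ValueError for any model the server adds after this SDK release.
    return AutomationModels(automation_model_str)


def parse_lenient(automation_model_str: str) -> Union[AutomationModels, str]:
    # Falls back to the raw string, so an older SDK keeps working when the
    # server starts returning new model types.
    try:
        return AutomationModels(automation_model_str)
    except ValueError:
        return automation_model_str
```

The lenient variant trades type safety (callers must handle a plain string) for not forcing an SDK upgrade on every new model.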

Contributor:

Yea that is not great. @denis-cord is there a way to get both intellisense and avoid the need to update sdk for new models in the future?

github-actions bot commented Jun 15, 2022

Unit test report

45 tests: 35 passed ✔️, 10 skipped 💤, 0 failed — 1 suite, 1 file, 2s ⏱️

Results for commit 48efc4f.

♻️ This comment has been updated with latest results.

denis-cord (Contributor, Author):

https://github.com/encord-team/cord-backend/pull/862 this needs to be rolled out before we can release the SDK to clients.

frederik-encord (Contributor) left a comment:

Just one comment on listing models through the Project.
Looks great otherwise.

@@ -375,6 +376,12 @@ def add_classification(
        """
        return self._client.add_classification(name, classification_type, required, options)

    def list_models(self) -> List[ModelConfiguration]:
Contributor:

If we want to be consistent with, e.g., datasets and label rows, this should probably be a property called models instead. What do you think @denis-cord?

denis-cord (Contributor, Author):

I think that is a good point.

What I was thinking is that we might want to introduce a filter at some point, like we do in list_label_rows. In that case we could add the filter parameters to a new non-property accessor.

So I guess the consideration is:

  • do we believe we might want filters for models at some point? (I think it's possible)
  • if so, would we rather have two different methods or one? (I thought one would be a bit nicer)

But I am also happy to have only the property accessor for now if you say that would seem more usable.

Contributor:

I see your point. Personally, I don't like preparing code for the 🔮 potential future.
However, I don't see a likely future where developers wouldn't filter the models in code afterwards anyway, so why not help them do so.
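The two API shapes being weighed in this thread can be sketched side by side — a `models` property (consistent with datasets and label rows) versus a `list_models` method that leaves room for filter parameters. Class and field names here are hypothetical, not the SDK's real API.

```python
# Sketch only: hypothetical Project/ModelConfiguration shapes for comparison.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelConfiguration:
    model_uid: str
    title: str


@dataclass
class Project:
    _model_configs: List[ModelConfiguration] = field(default_factory=list)

    @property
    def models(self) -> List[ModelConfiguration]:
        # Property accessor, consistent with datasets and label rows.
        return list(self._model_configs)

    def list_models(self, title: Optional[str] = None) -> List[ModelConfiguration]:
        # Method accessor that leaves room for filter parameters later,
        # mirroring list_label_rows.
        configs = self._model_configs
        if title is not None:
            configs = [c for c in configs if c.title == title]
        return list(configs)
```

With the property, filtering happens client-side in user code; with the method, filters can later be pushed down to the server without changing the call site shape.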


frederik-encord (Contributor) left a comment:

The user needs to be able to find model iteration uids to be able to run inference.

Comment on lines 59 to 66

def from_dict(cls, json_dict: dict):
    return ModelConfiguration(
        model_uid=json_dict["model_hash"],
        title=json_dict["title"],
        description=json_dict["description"],
        feature_node_hashes=cls._get_feature_node_hashes(json_dict["features"]),
        model=cls._get_automation_model(json_dict["model"]),
    )
Contributor:

There is a distinction between model uids and model iteration uids (the model itself vs. the different iterations of training).
I think these iteration uids should also be available through the ModelConfiguration?
Otherwise, this functionality is useless for running inference, as the iteration uids are necessary for that operation.

@denis-cord denis-cord marked this pull request as draft June 21, 2022 10:34
@denis-cord denis-cord marked this pull request as ready for review July 6, 2022 16:17
denis-cord (Contributor, Author):

@frederik-encord I had a chance to look into this again. Please let me know how this looks now. Also, in terms of naming, what would be the most natural for you?

@frederik-encord frederik-encord self-requested a review July 8, 2022 09:15
denis-cord (Contributor, Author):

TODO for Denis: Merge this once https://github.com/encord-team/cord-backend/pull/862 is rolled out to prod

frederik-encord (Contributor) left a comment:

After a discussion with @denis-cord, I think we should include some metadata for the training rows, which would allow users to filter their data.
Off the top of my head, these are the use cases for the data available:

  • created_at: basically used to choose the latest model, or “the one from last week that you know worked”
  • training_final_loss: used to choose the model with the smallest loss
  • training_labels: choose the model that corresponds to a specific label row. The whole micro-model idea is to train very targeted models, so applying the right model to the right data is important. (However, this could be on the model level rather than the training_log level?)
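The first two use cases above reduce to simple selections over the training metadata. A sketch, with hypothetical field names:

```python
# Sketch only: hypothetical TrainingMetadata fields for the use cases above.
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TrainingMetadata:
    model_iteration_uid: str
    created_at: datetime
    training_final_loss: float


def latest_run(runs: List[TrainingMetadata]) -> TrainingMetadata:
    # "The one from last week that you know worked" would instead filter
    # on an explicit created_at range.
    return max(runs, key=lambda r: r.created_at)


def lowest_loss_run(runs: List[TrainingMetadata]) -> TrainingMetadata:
    return min(runs, key=lambda r: r.training_final_loss)
```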

denis-cord (Contributor, Author) commented Aug 12, 2022

@frederik-encord now you can inspect more metadata in a composable way.

TODO (done) for Denis: double-check the docs generation.

denis-cord (Contributor, Author):

I've discussed with @frederik-encord that the payload of training_labels can be quite large. Hence, there is now a second helper function responsible for retrieving data about individual training instances.
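The split described here might look like the following: the heavy training_labels payload is only fetched when the caller opts in. Names and the in-memory "server" are made up for illustration.

```python
# Sketch only: hypothetical helper, not the SDK's real get_training_metadata.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingMetadata:
    model_iteration_uid: str
    training_final_loss: float
    training_labels: Optional[dict] = None  # populated only on request


_FAKE_SERVER = {
    "it-1": {"loss": 0.12, "labels": {"label_rows": ["lr-1", "lr-2"]}},
}


def get_training_metadata(iteration_uid: str, get_training_labels: bool = False) -> TrainingMetadata:
    # Skip the large labels payload unless explicitly requested.
    record = _FAKE_SERVER[iteration_uid]
    return TrainingMetadata(
        model_iteration_uid=iteration_uid,
        training_final_loss=record["loss"],
        training_labels=record["labels"] if get_training_labels else None,
    )
```

Defaulting the flag to False keeps the common "browse and compare runs" path cheap, while still making the full payload reachable in one call.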

frederik-encord (Contributor) left a comment:

Looking great. I didn't find any issues.

@denis-cord denis-cord changed the title DEV-1311 - Add the list_models function DEV-1311 - Add the list_models and get_training_metadata functions Aug 16, 2022

@denis-cord denis-cord merged commit b0218d3 into master Aug 17, 2022
@denis-cord denis-cord deleted the dg-add-list-models branch August 17, 2022 10:18