
DEV-1311 - Add the list_models and get_training_metadata functions #138

Merged
merged 8 commits into from Aug 17, 2022

Conversation

denis-cord (Contributor) commented Jun 15, 2022

Introduction and Explanation

Allow users to query all the models available for a project. Additionally, allow users to get more information about the training instances.

JIRA

DEV-1311

def _get_automation_model(cls, automation_model_str: str) -> AutomationModels:
    try:
        return AutomationModels(automation_model_str)
    except ValueError as e:
denis-cord (Contributor, Author):
@frederik-encord I guess in hindsight this can be a bit annoying for clients with the enum.

Contributor:

Not sure why this would be an issue? The main idea of the "endpoint" is to allow people to find the model they want to run inference with or train. So the enum just allows filtering on the type of model you are looking for.

What would be the annoying part @denis-cord?

denis-cord (Contributor, Author):

I'm just thinking of the following scenario:

  1. client is at SDK version X
  2. we add a new model, which means that here in version X we would throw
  3. client is forced to update their SDK version

Not a big issue for now while we only do minor version updates, but it will definitely be annoying once we introduce a breaking change and force the client to deal with it.
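The scenario above can be sketched as follows: a strict enum parse throws as soon as the server returns a model type the installed SDK doesn't know, while a lenient parse degrades gracefully. The enum values here are made up for illustration; the real AutomationModels members differ.

```python
# Sketch only: hypothetical enum members, not the SDK's real AutomationModels.
from enum import Enum
from typing import Union


class AutomationModels(Enum):
    OBJECT_DETECTION = "object_detection"
    CLASSIFICATION = "classification"


def parse_strict(automation_model_str: str) -> AutomationModels:
    # Raises ValueError for any model the server adds after this SDK release.
    return AutomationModels(automation_model_str)


def parse_lenient(automation_model_str: str) -> Union[AutomationModels, str]:
    # Falls back to the raw string, so an older SDK keeps working when the
    # server starts returning new model types.
    try:
        return AutomationModels(automation_model_str)
    except ValueError:
        return automation_model_str
```

The lenient variant trades type safety (callers must handle a plain string) for not forcing an SDK upgrade on every new model.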

Contributor:

Yea that is not great. @denis-cord is there a way to get both intellisense and avoid the need to update sdk for new models in the future?

github-actions bot commented Jun 15, 2022

Unit test report

45 tests: 35 passed ✔️, 10 skipped 💤, 0 failed — 1 suite, 1 file, 2s ⏱️

Results for commit 48efc4f.

♻️ This comment has been updated with latest results.

denis-cord (Contributor, Author):

https://github.com/encord-team/cord-backend/pull/862 this needs to be rolled out before we can release the SDK to clients.

frederik-encord (Contributor) left a comment:

Just one comment on listing models through the Project.
Looks great otherwise.

@@ -375,6 +376,12 @@ def add_classification(
        """
        return self._client.add_classification(name, classification_type, required, options)

    def list_models(self) -> List[ModelConfiguration]:
Contributor:

If we want to be consistent with, e.g., datasets and label rows, this should probably be a property called models instead. What do you think @denis-cord?

denis-cord (Contributor, Author):

I think that is a good point.

What I was thinking is that we might want to introduce a filter at some point, like we do in list_label_rows. In that case we could add the filter parameters to a new non-property accessor.

So I guess the consideration is:

  • do we believe we might want filters for models at some point? (I think it's possible)
  • if so, would we rather have two different methods or one? (I thought one would be a bit nicer)

But I am also happy to have only the property accessor for now if you say that would seem more usable.

Contributor:

I see your point. Personally, I don't like preparing code for the 🔮 potential future.
However, I don't see a likely future where developers wouldn't filter the models in code afterwards anyway, so why not help them do so.
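The two API shapes being weighed in this thread can be sketched side by side — a `models` property (consistent with datasets and label rows) versus a `list_models` method that leaves room for filter parameters. Class and field names here are hypothetical, not the SDK's real API.

```python
# Sketch only: hypothetical Project/ModelConfiguration shapes for comparison.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelConfiguration:
    model_uid: str
    title: str


@dataclass
class Project:
    _model_configs: List[ModelConfiguration] = field(default_factory=list)

    @property
    def models(self) -> List[ModelConfiguration]:
        # Property accessor, consistent with datasets and label rows.
        return list(self._model_configs)

    def list_models(self, title: Optional[str] = None) -> List[ModelConfiguration]:
        # Method accessor that leaves room for filter parameters later,
        # mirroring list_label_rows.
        configs = self._model_configs
        if title is not None:
            configs = [c for c in configs if c.title == title]
        return list(configs)
```

With the property, filtering happens client-side in user code; with the method, filters can later be pushed down to the server without changing the call site shape.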


frederik-encord (Contributor) left a comment:

The user needs to be able to find model iteration uids to be able to run inference.

Comment on lines 59 to 66

def from_dict(cls, json_dict: dict):
    return ModelConfiguration(
        model_uid=json_dict["model_hash"],
        title=json_dict["title"],
        description=json_dict["description"],
        feature_node_hashes=cls._get_feature_node_hashes(json_dict["features"]),
        model=cls._get_automation_model(json_dict["model"]),
    )
Contributor:

There is a distinction between model uids and model iteration uids (the model itself vs. the different iterations of training).
I think these iteration uids should also be available through the ModelConfiguration?
Otherwise, this functionality is useless for running inference, as the iteration uids are necessary for that operation.

@denis-cord denis-cord marked this pull request as draft June 21, 2022 10:34
@denis-cord denis-cord marked this pull request as ready for review July 6, 2022 16:17
denis-cord (Contributor, Author):

@frederik-encord I had a chance to look into this again. Please let me know how this looks now. Also, in terms of naming, what would be the most natural for you?

@frederik-encord frederik-encord self-requested a review July 8, 2022 09:15
denis-cord (Contributor, Author):

TODO for Denis: Merge this once https://github.com/encord-team/cord-backend/pull/862 is rolled out to prod

frederik-encord (Contributor) left a comment:

After a discussion with @denis-cord, I think we should include some metadata for the training rows, which would allow users to filter their data.
Off the top of my head, these are the use cases for the data available:

  • created_at: basically used to choose the latest model, or “the one from last week that you know worked”
  • training_final_loss: used to choose the model with the smallest loss
  • training_labels: choose the model that corresponds to a specific label row. The whole micro-model idea is to train very targeted models, so applying the right model to the right data is important. (However, this could be on the model level rather than the training_log level?)
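The first two use cases above reduce to simple selections over the training metadata. A sketch, with hypothetical field names:

```python
# Sketch only: hypothetical TrainingMetadata fields for the use cases above.
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TrainingMetadata:
    model_iteration_uid: str
    created_at: datetime
    training_final_loss: float


def latest_run(runs: List[TrainingMetadata]) -> TrainingMetadata:
    # "The one from last week that you know worked" would instead filter
    # on an explicit created_at range.
    return max(runs, key=lambda r: r.created_at)


def lowest_loss_run(runs: List[TrainingMetadata]) -> TrainingMetadata:
    return min(runs, key=lambda r: r.training_final_loss)
```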

denis-cord (Contributor, Author) commented Aug 12, 2022

@frederik-encord now you can inspect more metadata in a composable way.

TODO (done) for Denis: double-check the docs generation.

denis-cord (Contributor, Author):

I've discussed with @frederik-encord that the payload of training_labels can be quite large. Hence, there is now a second helper function responsible for retrieving data about individual training instances.
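The split described here might look like the following: the heavy training_labels payload is only fetched when the caller opts in. Names and the in-memory "server" are made up for illustration.

```python
# Sketch only: hypothetical helper, not the SDK's real get_training_metadata.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingMetadata:
    model_iteration_uid: str
    training_final_loss: float
    training_labels: Optional[dict] = None  # populated only on request


_FAKE_SERVER = {
    "it-1": {"loss": 0.12, "labels": {"label_rows": ["lr-1", "lr-2"]}},
}


def get_training_metadata(iteration_uid: str, get_training_labels: bool = False) -> TrainingMetadata:
    # Skip the large labels payload unless explicitly requested.
    record = _FAKE_SERVER[iteration_uid]
    return TrainingMetadata(
        model_iteration_uid=iteration_uid,
        training_final_loss=record["loss"],
        training_labels=record["labels"] if get_training_labels else None,
    )
```

Defaulting the flag to False keeps the common "browse and compare runs" path cheap, while still making the full payload reachable in one call.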

frederik-encord (Contributor) left a comment:

Looking great. I didn't find any issues.

@denis-cord denis-cord changed the title DEV-1311 - Add the list_models function DEV-1311 - Add the list_models and get_training_metadata functions Aug 16, 2022

@denis-cord denis-cord merged commit b0218d3 into master Aug 17, 2022
@denis-cord denis-cord deleted the dg-add-list-models branch August 17, 2022 10:18