-
Notifications
You must be signed in to change notification settings - Fork 68
[AL-4398] Add Model Run exports #840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
73c45f1 to
798f518
Compare
d1c9aca to
45dc93a
Compare
147583e to
8f0bf2c
Compare
8f0bf2c to
b26f235
Compare
2d965a9 to
598db8d
Compare
4ffc435 to
2347a2b
Compare
labelbox/schema/model_run.py
Outdated
| """ | ||
|
|
||
| def export_labels_v2(self, task_name: str, | ||
| filter: Optional[ModelRunExportFilter]) -> Task: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should pass in a default value of None for the filter. Even though the type hint says optional, it is a required arg here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter={"media_attributes": True}
hm, this does not look like a filter to me but rather a parameter of the export
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed from ModelRunExportFilter to ModelRunExportParams
labelbox/schema/model_run.py
Outdated
| "includeAttachments": | ||
| filter["attachments"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also weird that the gql fields are different than the name we expose in the SDK. I can see a future where this causes confusion :).
We should make these the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you update SDK or the API? Also, should SDK use camelCase or snake_case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update the api. I think it is fine if the api casing is camelcase and sdk is snake case just that the names should be the same.
| task_name, filter={"media_attributes: true"}) | ||
| assert task.name == task_name | ||
| task.wait_till_done() | ||
| assert task.status == "COMPLETE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to check at least that the high level keys exist in the payload and the number of items returned was expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't that be in the scope of integration tests of API, not the SDK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SDK is the most useful end to end integration test. So it is nice to have it here to guarantee that things are working (also it blocks deployments). But technically it is weird that it serves that purpose.
labelbox/schema/model_run.py
Outdated
| """ | ||
|
|
||
| def export_labels_v2(self, task_name: str, | ||
| filter: Optional[ModelRunExportFilter]) -> Task: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter={"media_attributes": True}
hm, this does not look like a filter to me but rather a parameter of the export
labelbox/schema/model_run.py
Outdated
|
|
||
| def export_labels_v2(self, task_name: str, | ||
| filter: Optional[ModelRunExportFilter]) -> Task: | ||
| mutation_name = "exportDataRows" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this general mutation should be removed and two specialized mutations be added instead:
- exportDataRowsInProject
- exportDataRowsInModelRun
Right now, the user can call this mutation without any filters and that would export all the data rows in his organization, and if he has 100M data rows, it's going to add a huge load to our system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@attila-papai won't it be constrained to a specific model run? I think the consensus was to merge the SDK as it is (as API won't change, and then add API endpoints -> update SDK)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so when you do
model_run = client.get_model_run("model_run_id")
model_run.export_labels_v2(...)
isn't that constrained to a single model run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is constrained to a single model run from within that SDK call:
"modelRunIds": [self.uid],
"projectIds": []
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it would make sense if mutation exportDataRowsInModelRun accepts only a single model run id, because users can't export multiple model runs in a single call, right?
I can add this new mutation, shouldn't be more than half an hour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@attila-papai https://github.com/Labelbox/intelligence/pull/13104 I've added it, but it accepts multiple modelRunIds, should we:
a) make it general purpose and rename to exportDataRowsInModelRuns
b) limit it to just one modelRunId and keep the exportDataRowsInModelRun name
b587d87 to
639501c
Compare
639501c to
dc7d985
Compare
labelbox/schema/model_run.py
Outdated
| """ | ||
|
|
||
| def export_labels_v2(self, task_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from @karenkyang:
We should call these functions .export_v2(...)
labelbox/schema/model_run.py
Outdated
| """ | ||
| Creates a model run export task with the given filter and returns the task. | ||
| >>> export_task = export_labels_v2("my_export_task", filter={"media_attributes": True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter={"media_attributes": True})
^^ params
| class DataRowParams(TypedDict): | ||
| include_data_row_details: Optional[bool] | ||
| include_media_attributes: Optional[bool] | ||
| include_metadata_fields: Optional[bool] | ||
| include_attachments: Optional[bool] | ||
|
|
||
|
|
||
| class ProjectExportParams(DataRowParams): | ||
| include_project_details: Optional[bool] | ||
| include_label_details: Optional[bool] | ||
| include_performance_details: Optional[bool] | ||
|
|
||
|
|
||
| class ModelRunExportParams(DataRowParams): | ||
| # TODO: Add model run fields | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like these should be pydantic models. It is really bothering me that we have yet another pattern for our interfaces. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we've discussed this point with Karen and ended up on picking TypedDict, because it requires less ceremony from the end user
50d2f4f to
4863f15
Compare
4863f15 to
32c30f2
Compare
Adds model run exports method method