Skip to content

Conversation

@mnoszczak
Copy link
Contributor

@mnoszczak mnoszczak commented Jan 23, 2023

Adds model run exports method method

@mnoszczak mnoszczak requested a review from msokoloff1 January 23, 2023 10:36
@mnoszczak mnoszczak force-pushed the mno/al-4398 branch 2 times, most recently from d1c9aca to 45dc93a Compare January 23, 2023 11:18
@mnoszczak mnoszczak force-pushed the mno/al-4398 branch 2 times, most recently from 147583e to 8f0bf2c Compare January 23, 2023 11:28
"""

def export_labels_v2(self, task_name: str,
filter: Optional[ModelRunExportFilter]) -> Task:
Copy link
Contributor

@msokoloff1 msokoloff1 Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should pass in a default value of None for the filter. Even though the type hint says optional, it is a required arg here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter={"media_attributes": True}

hm, this does not look like a filter to me but rather a parameter of the export

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed from ModelRunExportFilter to ModelRunExportParams

Comment on lines 473 to 474
"includeAttachments":
filter["attachments"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also weird that the gql fields are different than the name we expose in the SDK. I can see a future where this causes confusion :).

We should make these the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you update SDK or the API? Also, should SDK use camelCase or snake_case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the api. I think it is fine if the api casing is camelcase and sdk is snake case just that the names should be the same.

task_name, filter={"media_attributes: true"})
assert task.name == task_name
task.wait_till_done()
assert task.status == "COMPLETE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to check at least that the high level keys exist in the payload and the number of items returned was expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be in the scope of integration tests of API, not the SDK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SDK is the most useful end to end integration test. So it is nice to have it here to guarantee that things are working (also it blocks deployments). But technically it is weird that it serves that purpose.

"""

def export_labels_v2(self, task_name: str,
filter: Optional[ModelRunExportFilter]) -> Task:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter={"media_attributes": True}

hm, this does not look like a filter to me but rather a parameter of the export


def export_labels_v2(self, task_name: str,
filter: Optional[ModelRunExportFilter]) -> Task:
mutation_name = "exportDataRows"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this general mutation should be removed and two specialized mutations be added instead:

  • exportDataRowsInProject
  • exportDataRowsInModelRun

Right now, the user can call this mutation without any filters and that would export all the data rows in his organization, and if he has 100M data rows, it's going to add a huge load to our system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@attila-papai won't it be constrained to a specific model run? I think the consensus was to merge the SDK as it is (as API won't change, and then add API endpoints -> update SDK)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so when you do

model_run = client.get_model_run("model_run_id")
model_run.export_labels_v2(...)

isn't that constrained to a single model run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is constrained to a single model run from within that SDK call:

                    "modelRunIds": [self.uid],
                    "projectIds": []
                },

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it would make sense if mutation exportDataRowsInModelRun accepts only a single model run id, because users can't export multiple model runs in a single call, right?
I can add this new mutation, shouldn't be more than half an hour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@attila-papai https://github.com/Labelbox/intelligence/pull/13104 I've added it, but it accepts multiple modelRunIds, should we:
a) make it general purpose and rename to exportDataRowsInModelRuns
b) limit it to just one modelRunId and keep the exportDataRowsInModelRun name

@mnoszczak mnoszczak force-pushed the mno/al-4398 branch 2 times, most recently from b587d87 to 639501c Compare January 31, 2023 12:47
"""

def export_labels_v2(self, task_name: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from @karenkyang:
We should call these functions .export_v2(...)

"""
Creates a model run export task with the given filter and returns the task.
>>> export_task = export_labels_v2("my_export_task", filter={"media_attributes": True})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter={"media_attributes": True})
^^ params

Comment on lines 10 to 25
class DataRowParams(TypedDict):
include_data_row_details: Optional[bool]
include_media_attributes: Optional[bool]
include_metadata_fields: Optional[bool]
include_attachments: Optional[bool]


class ProjectExportParams(DataRowParams):
include_project_details: Optional[bool]
include_label_details: Optional[bool]
include_performance_details: Optional[bool]


class ModelRunExportParams(DataRowParams):
# TODO: Add model run fields
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like these should be pydantic models. It is really bothering me that we have yet another pattern for our interfaces. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we've discussed this point with Karen and ended up on picking TypedDict, because it requires less ceremony from the end user

@mnoszczak mnoszczak force-pushed the mno/al-4398 branch 3 times, most recently from 50d2f4f to 4863f15 Compare February 2, 2023 01:38
@mnoszczak mnoszczak merged commit dbadecd into develop Feb 2, 2023
@mnoszczak mnoszczak deleted the mno/al-4398 branch February 2, 2023 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants