Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOM-38906] - update datasets endpoints #136

Merged

Conversation

ddl-olsonJD
Copy link
Collaborator

@ddl-olsonJD ddl-olsonJD commented Jun 22, 2022

Link to JIRA

DOM-38906

What issue does this pull request solve?

Domino had api endpoints available for dataset access, however, those endpoints were not usable via python API. Some updates were made to the datasets endpoints, which are also reflected in this PR.

This PR's goal is to complete the work that had started a while back to bring dataset access via python-domino, and is being updated.

What is the solution?

Updated PR code base to reflect current version of python-domino and domino.

Testing

Added Unit tests, as well as example test file file

  • Unit test(s)

Pull Request Reminders

References (optional)

katieshakman and others added 19 commits December 30, 2019 13:15
Merging in upstream changes from original repo into fork.
…rks with or without a project id argument). Also added an example file get_datasets.py in the examples folder, which shows examples with and without specifying a project id.

Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0b9ea9ed0ad80006aa053d
@ddl-olsonJD
Copy link
Collaborator Author

image

@ddl-olsonJD ddl-olsonJD marked this pull request as ready for review June 23, 2022 00:29
@ddl-olsonJD ddl-olsonJD requested a review from a team June 23, 2022 00:29
@ddl-olsonJD ddl-olsonJD changed the title [DOM-38906] - update and merge datasets endpoints [DOM-38906] - update datasets endpoints Jun 23, 2022
README.md Outdated

list the names of a filtered datasets for a particular project

- _project_id:_ id that identify the specific project to be used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*identifies

@@ -247,6 +247,70 @@ parameter in `job_start` method

<hr>

## Datasets

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more just a nice to have, but maybe it could be helpful to define what a Dataset is.

I think the Domino documentation defines it as "A Domino Dataset is a collection of files that are available in user executions as a filesystem directory" (https://docs.dominodatalab.com/en/latest/user_guide/0a8d11/datasets-overview/)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -423,7 +427,7 @@ def validate_spark_executor_count(executor_count, max_executor_count):
def get_default_spark_settings():
self.log.debug("Getting default spark settings")
default_spark_setting_url = self._routes.default_spark_setting(
self._project_id
self.project_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the tradeoffs involved in making this no longer private?

Copy link
Collaborator Author

@ddl-olsonJD ddl-olsonJD Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to using it internally (as in the example above), it can also be used externally, particularly within datasets to filter current project datasets. It can also be used, can cases where someone may want to switch between projects, in ensuring current project.
Having it private means that it 'should not' be used as a public attribute, but it does not prevent its usage.

domino/domino.py Outdated
@@ -925,6 +929,74 @@ def model_version_publish(
response = self.request_manager.post(url, json=request)
return response.json()

# Dataset Functions
def datasets_list(self, project_id=None):
self.requires_at_least("3.6.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"3.6.0" seems to be used in several places, perhaps it could be a constant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps requires_at_least should have a default, but 2.5.0 seems to be the most frequent currently.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As 3.6.x is the current minimum version available, that will be set as min required, until 3.6 is fully deprecated.

@@ -4,6 +4,18 @@ class DominoException(Exception):
pass


class DatasetNotFoundException(DominoException):
"""Run Not Found Exception"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Dataset instead of Run?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this.


# Get the details of a dataset, if one exists for the current project
current_project_datasets = domino.datasets_list(domino.project_id)
dataset_id = str(current_project_datasets[1]["datasetId"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is str needed here? maybe can just do interpolation in the print instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. they are not.

@ddl-olsonJD ddl-olsonJD merged commit 08f22ad into master Jun 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants