-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DOM-38906] - update datasets endpoints #136
[DOM-38906] - update datasets endpoints #136
Conversation
Merging in upstream changes from original repo into fork.
…rks with or without a project id argument). Also added an example file get_datasets.py in the examples folder, which shows examples with and without specifying a project id. Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0b9ea9ed0ad80006aa053d
Datasets list
README.md
Outdated
|
||
list the names of a filtered datasets for a particular project | ||
|
||
- _project_id:_ id that identify the specific project to be used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*identifies
@@ -247,6 +247,70 @@ parameter in `job_start` method | |||
|
|||
<hr> | |||
|
|||
## Datasets | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more just a nice to have, but maybe it could be helpful to define what a Dataset is.
I think the Domino documentation defines it as "A Domino Dataset is a collection of files that are available in user executions as a filesystem directory" (https://docs.dominodatalab.com/en/latest/user_guide/0a8d11/datasets-overview/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
@@ -423,7 +427,7 @@ def validate_spark_executor_count(executor_count, max_executor_count): | |||
def get_default_spark_settings(): | |||
self.log.debug("Getting default spark settings") | |||
default_spark_setting_url = self._routes.default_spark_setting( | |||
self._project_id | |||
self.project_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the tradeoffs involved in making this no longer private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to using it internally (as in the example above), it can also be used externally, particularly within datasets to filter current project datasets. It can also be used, can cases where someone may want to switch between projects, in ensuring current project.
Having it private means that it 'should not' be used as a public attribute, but it does not prevent its usage.
domino/domino.py
Outdated
@@ -925,6 +929,74 @@ def model_version_publish( | |||
response = self.request_manager.post(url, json=request) | |||
return response.json() | |||
|
|||
# Dataset Functions | |||
def datasets_list(self, project_id=None): | |||
self.requires_at_least("3.6.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"3.6.0" seems to be used in several places, perhaps it could be a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps requires_at_least should have a default, but 2.5.0 seems to be the most frequent currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As 3.6.x is the current minimum version available, that will be set as min required, until 3.6 is fully deprecated.
domino/exceptions.py
Outdated
@@ -4,6 +4,18 @@ class DominoException(Exception): | |||
pass | |||
|
|||
|
|||
class DatasetNotFoundException(DominoException): | |||
"""Run Not Found Exception""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Dataset instead of Run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching this.
examples/example_dataset.py
Outdated
|
||
# Get the details of a dataset, if one exists for the current project | ||
current_project_datasets = domino.datasets_list(domino.project_id) | ||
dataset_id = str(current_project_datasets[1]["datasetId"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is str needed here? maybe can just do interpolation in the print instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. they are not.
… into olsonJD.DOM-38906.update-and-merge-datasets-endpoints
Link to JIRA
DOM-38906
What issue does this pull request solve?
Domino had api endpoints available for dataset access, however, those endpoints were not usable via python API. Some updates were made to the datasets endpoints, which are also reflected in this PR.
This PR's goal is to complete the work that had started a while back to bring dataset access via python-domino, and is being updated.
What is the solution?
Updated PR code base to reflect current version of python-domino and domino.
Testing
Added Unit tests, as well as example test file file
Pull Request Reminders
References (optional)