[DOM-38906] - update datasets endpoints #136

ddl-olsonJD · 2022-06-22T18:59:17Z

Link to JIRA

What issue does this pull request solve?

Domino had API endpoints for dataset access; however, those endpoints were not usable through the Python API. The datasets endpoints have since been updated, and those updates are also reflected in this PR.

This PR completes the work started a while back to bring dataset access to python-domino, updated to match the current codebase.

What is the solution?

Updated the PR codebase to reflect the current versions of python-domino and Domino.

Testing

Added unit tests, as well as an example test file.

Unit test(s)

Pull Request Reminders

Has relevant documentation been updated?
Does the code follow the [Python Style Guide](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html)?
Has the changelog been updated?
Are the existing unit tests still passing?
Have new unit tests been added to cover any changes to the code?
Has the JIRA ticket(s) been linked above?

References (optional)

Datasets list #50

Merging in upstream changes from original repo into fork.

Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0a728cecce460006318842

…python-domino-dev/runs/5e0a8037c47b3f0006feada2

…rks with or without a project id argument). Also added an example file get_datasets.py in the examples folder, which shows examples with and without specifying a project id. Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0b9ea9ed0ad80006aa053d

…tasets.py example. Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0b9ea9ed0ad80006aa053d

…ils api wrapper. Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0b9ea9ed0ad80006aa053d

…python-domino-dev/runs/5e17b14bfd96490006ab0548

Datasets list

ddl-olsonJD · 2022-06-23T00:18:52Z

ddl-awroblicky · 2022-06-23T19:42:17Z

README.md

+
+list the names of a filtered datasets for a particular project
+
+- _project_id:_ id that identify the specific project to be used 


*identifies
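The README entry above describes listing the names of datasets filtered to a particular project. A minimal sketch of that filtering, assuming each dataset record is a dict with hypothetical `datasetName` and `projectId` keys (only `datasetId` appears in this PR) and that omitting the project id returns every dataset, in line with the `datasets_list(self, project_id=None)` signature in the diff. `filter_dataset_names` is a hypothetical helper, not part of python-domino.

```python
from typing import Any, Dict, List, Optional


def filter_dataset_names(
    datasets: List[Dict[str, Any]], project_id: Optional[str] = None
) -> List[str]:
    """Return dataset names, optionally restricted to a single project.

    Hypothetical helper: the "datasetName" and "projectId" keys are
    assumptions, not the confirmed Domino response schema.
    """
    if project_id is None:
        # No project id given: keep every dataset.
        selected = datasets
    else:
        # Keep only datasets that belong to the requested project.
        selected = [d for d in datasets if d.get("projectId") == project_id]
    return [d["datasetName"] for d in selected]


# Hypothetical sample records standing in for an API response.
sample = [
    {"datasetId": "d1", "datasetName": "raw-data", "projectId": "p1"},
    {"datasetId": "d2", "datasetName": "features", "projectId": "p2"},
]
print(filter_dataset_names(sample))        # ['raw-data', 'features']
print(filter_dataset_names(sample, "p2"))  # ['features']
```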

ddl-awroblicky · 2022-06-23T19:45:10Z

README.md

@@ -247,6 +247,70 @@ parameter in `job_start` method

 <hr>

+## Datasets
+


More of a nice-to-have, but it could be helpful to define what a Dataset is.

I think the Domino documentation defines it as "A Domino Dataset is a collection of files that are available in user executions as a filesystem directory" (https://docs.dominodatalab.com/en/latest/user_guide/0a8d11/datasets-overview/)
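Since that definition treats a Dataset as a directory of files visible to executions, here is a sketch of enumerating its contents with the standard library. The directory is passed in by the caller rather than hard-coded, because the mount location inside a Domino execution is not stated in this PR; `list_dataset_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def list_dataset_files(dataset_dir: str) -> List[str]:
    """Return all regular files under dataset_dir as relative POSIX paths.

    The directory is supplied by the caller; where Domino actually mounts
    a dataset is left as an assumption.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{dataset_dir} is not a directory")
    # rglob("*") walks the tree recursively; keep files, skip directories.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Using `as_posix()` keeps the returned paths identical across operating systems, which makes them easy to compare or log.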

ddl-awroblicky · 2022-06-23T19:46:03Z

domino/domino.py

@@ -423,7 +427,7 @@ def validate_spark_executor_count(executor_count, max_executor_count):
        def get_default_spark_settings():
            self.log.debug("Getting default spark settings")
            default_spark_setting_url = self._routes.default_spark_setting(
-                self._project_id
+                self.project_id


What are the tradeoffs involved in making this no longer private?

In addition to internal use (as in the example above), it can also be used externally, particularly within the datasets functions to filter datasets for the current project. It can also be used in cases where someone wants to switch between projects, to confirm which project is current.
Keeping it private only signals that it "should not" be used as a public attribute; it does not prevent its usage.
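One way to get both, sketched under the assumption that the client keeps the id in `_project_id` internally: expose it through a read-only `property`, so external code such as the datasets helpers can read `project_id` while accidental assignment fails. `ClientSketch` is a hypothetical stand-in, not the actual `Domino` class.

```python
from typing import Optional


class ClientSketch:
    """Hypothetical stand-in for a client that tracks the current project."""

    def __init__(self, project_id: Optional[str] = None) -> None:
        # Leading underscore is a naming convention only; Python does not
        # enforce privacy, so _project_id is still reachable from outside.
        self._project_id = project_id

    @property
    def project_id(self) -> Optional[str]:
        """Public, read-only view of the current project id."""
        return self._project_id


client = ClientSketch("abc123")
print(client.project_id)  # abc123
try:
    client.project_id = "other"  # no setter is defined
except AttributeError:
    print("project_id is read-only")
```

If switching projects is meant to be supported, a setter or a dedicated method would be needed; this sketch deliberately leaves that out.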

ddl-awroblicky · 2022-06-23T19:46:37Z

domino/domino.py

@@ -925,6 +929,74 @@ def model_version_publish(
        response = self.request_manager.post(url, json=request)
        return response.json()

+    # Dataset Functions
+    def datasets_list(self, project_id=None):
+        self.requires_at_least("3.6.0")


"3.6.0" seems to be used in several places, perhaps it could be a constant?

Perhaps `requires_at_least` should have a default, though 2.5.0 currently seems to be the most frequent value.

Since 3.6.x is the oldest version currently available, it will be set as the minimum required version until 3.6 is fully deprecated.
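A minimal sketch of the reviewer's two suggestions, hoisting "3.6.0" into a constant and giving the check a default. `MINIMUM_DOMINO_VERSION`, `parse_version`, and `version_at_least` are hypothetical names, not python-domino's `requires_at_least`. Versions are compared as integer tuples, which assumes plain three-part numeric strings without suffixes such as `-rc1`.

```python
from typing import Tuple

MINIMUM_DOMINO_VERSION = "3.6.0"  # hypothetical module-level constant


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "3.6.0" into (3, 6, 0); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def version_at_least(current: str, required: str = MINIMUM_DOMINO_VERSION) -> bool:
    """True if current >= required, compared component by component."""
    return parse_version(current) >= parse_version(required)


print(version_at_least("3.6.2"))           # True
print(version_at_least("2.5.0"))           # False
print(version_at_least("4.0.0", "2.5.0"))  # True
```

Comparing tuples rather than raw strings matters once a component reaches two digits: as strings, "3.10.0" sorts before "3.6.0".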

ddl-awroblicky · 2022-06-23T19:47:35Z

domino/exceptions.py

@@ -4,6 +4,18 @@ class DominoException(Exception):
    pass


+class DatasetNotFoundException(DominoException):
+    """Run Not Found Exception"""


Should this be Dataset instead of Run?

Thank you for catching this.
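With the docstring corrected, a sketch of how such an exception might be raised when a lookup by id fails. `DominoException` mirrors the base class shown in the diff; `find_dataset` is a hypothetical helper operating on dicts keyed by `datasetId`, the field used in the example file.

```python
from typing import Any, Dict, List


class DominoException(Exception):
    pass


class DatasetNotFoundException(DominoException):
    """Dataset Not Found Exception"""


def find_dataset(datasets: List[Dict[str, Any]], dataset_id: str) -> Dict[str, Any]:
    """Return the dataset whose "datasetId" matches, or raise."""
    for dataset in datasets:
        if dataset.get("datasetId") == dataset_id:
            return dataset
    raise DatasetNotFoundException(f"Dataset {dataset_id} not found")


try:
    find_dataset([{"datasetId": "d1"}], "d2")
except DominoException as exc:
    # Catching the base class also catches the dataset-specific subclass.
    print(type(exc).__name__, exc)  # DatasetNotFoundException Dataset d2 not found
```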

ddl-awroblicky · 2022-06-23T19:49:17Z

examples/example_dataset.py

+
+# Get the details of a dataset, if one exists for the current project
+current_project_datasets = domino.datasets_list(domino.project_id)
+dataset_id = str(current_project_datasets[1]["datasetId"])


Is `str` needed here? Maybe the interpolation could just be done in the print instead?

Nope, it is not.
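Following the reviewer's suggestion, the id can be interpolated directly in the output instead of being converted with `str()` first. This sketch uses a hard-coded, hypothetical list in place of a `datasets_list` response. It picks the first entry (`[0]`) and guards against a project with no datasets; note that the example in the diff indexes `[1]`, which requires at least two datasets. The `datasetName` field is an assumption.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a datasets_list response.
current_project_datasets: List[Dict[str, Any]] = [
    {"datasetId": 101, "datasetName": "raw-data"},
]

if current_project_datasets:
    dataset_id = current_project_datasets[0]["datasetId"]
    # The f-string formats the int itself, so str() is unnecessary.
    message = f"Dataset id: {dataset_id}"
else:
    message = "No datasets found for the current project"
print(message)  # Dataset id: 101
```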

… into olsonJD.DOM-38906.update-and-merge-datasets-endpoints

katieshakman and others added 19 commits December 30, 2019 13:15

Merge pull request #1 from dominodatalab/master

bccde74

Merging in upstream changes from original repo into fork.

Edits in progress to add datasets_list route and function.

cf1f299

Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0a728cecce460006318842

Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/…

7ed2133

…python-domino-dev/runs/5e0a8037c47b3f0006feada2

Cleaned up after adding datasets_list route and function and a get_da…

4ad47fb

…tasets.py example. Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0b9ea9ed0ad80006aa053d

Finished create a dataset api wrapper, working on update dataset deta…

ca198b9

…ils api wrapper. Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/python-domino-dev/runs/5e0b9ea9ed0ad80006aa053d

Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/…

0565524

…python-domino-dev/runs/5e17b14bfd96490006ab0548

Cleaned up comment in get_datasets.py.

9ad90bb

Pushed from Domino: https://staging.domino.tech/u/domino-katie-test2/…

2ba1370

…python-domino-dev/runs/5e17b14bfd96490006ab0548

Cleaned up comments in get_datasets.py example.

f6a1a00

Merge branch 'master' into datasets_list

8ac6a39

Merge pull request #2 from katieshakman/datasets_list

457dc17

Datasets list

DOM-38906 - updating from master

9884f4c

DOM-38906 - updating for linters

b4664e1

DOM-38906 - fixing dataset bugs

d1e0ade

DOM-38906 - update datasets routes

521bee3

DOM-38906 - update datasets updates

21a57a6

DOM-38906 - added dataset tests

93bd354

DOM-38906 - updated dataset examples

32dca5d

DOM-38906 - updated changelog and fix for linters

a8d709b

ddl-olsonJD marked this pull request as ready for review June 23, 2022 00:29

ddl-olsonJD requested a review from a team June 23, 2022 00:29

ddl-olsonJD added 3 commits June 22, 2022 17:44

DOM-38906 - updated dataset test

03d31da

DOM-38906 - updated dataset test with dataset name fixtures

b669801

DOM-38906 - updated readme with datasets

1ef4039

ddl-olsonJD changed the title ~~[DOM-38906] - update and merge datasets endpoints~~ [DOM-38906] - update datasets endpoints Jun 23, 2022

ddl-awroblicky reviewed Jun 23, 2022


ddl-olsonJD added 3 commits June 24, 2022 06:12

DOM-38906 - updated gitignore

312a1d6

Merge branch 'master' of https://github.com/dominodatalab/python-domino…

9433769

… into olsonJD.DOM-38906.update-and-merge-datasets-endpoints

DOM-38906 - updated from comments

63d22bb

ddl-olsonJD requested review from ddl-awroblicky and ddl-abayly June 24, 2022 14:15

ddl-olsonJD mentioned this pull request Jun 24, 2022

Datasets list #50

Closed

ddl-awroblicky approved these changes Jun 24, 2022


DOM-38906 - updated minimum domino version

5d211fe

ddl-olsonJD merged commit 08f22ad into master Jun 24, 2022

