feat: add project files to datasets from service #877

jsam · 2019-12-18T11:14:22Z

closes: #769
depends on: #875

fixes: #881
fixes: #880

Fixes additional problems with:

dataset creators
linting problems
fixes broken tests
adds more tests around dataset create and add commands

mohammad-alisafaee

Thanks for the PR! Some minor comments. My main comment is regarding #881 which I believe has a simpler fix.

mohammad-alisafaee · 2019-12-20T08:02:36Z

renku/core/commands/dataset.py

        creators = [Person.from_string(c) for c in creators]

+    elif hasattr(creators, '__iter__') and isinstance(creators[0], dict):
+        creators = [Person.from_dict(creator) for creator in creators]


Should we pass the correct type (i.e. a list of Person) to this function when calling it? These type checks do not look nice.

Creators can be any collection type (list, tuple, str, ...), so checking for the correct type would make this even uglier.

I am not talking about type checking for correctness but about taking action based on the type. This code also does not check for type correctness (if creators is not an iterable nothing happens and we might end up with wrong metadata).
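
The alternative raised in this thread, converting creators to one type at the call boundary and failing loudly on anything unexpected, could look roughly like the sketch below. `Person` here is a minimal stand-in written for this example (the diff only names `from_string` and `from_dict`; their real behaviour in renku is not shown), and `normalize_creators` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Stand-in for the real model; fields and parsing are assumptions."""
    name: str
    email: str = ''

    @classmethod
    def from_string(cls, text):
        # Assumed format for this sketch: "Name <email>" or just "Name".
        name, _, rest = text.partition('<')
        return cls(name=name.strip(), email=rest.rstrip('>').strip())

    @classmethod
    def from_dict(cls, data):
        return cls(name=data['name'], email=data.get('email', ''))


def normalize_creators(creators):
    """Return a list of Person, raising instead of silently doing nothing."""
    if isinstance(creators, (str, dict, Person)):
        creators = [creators]  # a single creator was passed, wrap it

    result = []
    for creator in creators:
        if isinstance(creator, Person):
            result.append(creator)
        elif isinstance(creator, str):
            result.append(Person.from_string(creator))
        elif isinstance(creator, dict):
            result.append(Person.from_dict(creator))
        else:
            raise TypeError(
                'unsupported creator type: {}'.format(type(creator).__name__)
            )
    return result
```

With a helper like this, the command itself would only ever see a list of `Person`, and a non-iterable or unsupported value raises `TypeError` rather than leaving the metadata unchanged.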

mohammad-alisafaee · 2019-12-20T08:14:13Z

renku/core/management/datasets.py

        file_paths = {str(data['path']) for data in files if str(data['path'])}
-        self.repo.git.add(*(file_paths - set(ignored)))
+        new_files_count = len(file_paths) - len(ignored)
+        new_files = (file_paths - set(ignored))


These are not new files but the files that are not ignored. Ignored files can be new as well.

Ignored files are handled by the --force option. There are tests for it and you can check that it works fine. But the count might be off, let me check.
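
The concern about the count can be illustrated with plain sets. The paths below are made up, and whether `ignored` can contain such entries in the real code depends on how it is built, which is not shown here. The point is only that `len(file_paths) - len(ignored)` and `len(file_paths - set(ignored))` agree only when every ignored entry is unique and also present in `file_paths`.

```python
file_paths = {'data/a/one.csv', 'data/a/two.csv', 'data/a/.scrapy'}
# Hypothetical: `ignored` lists one path that is not among file_paths.
ignored = ['data/a/.scrapy', 'data/b/cache.tmp']

count_by_subtraction = len(file_paths) - len(ignored)
not_ignored = file_paths - set(ignored)

print(count_by_subtraction)  # 1
print(len(not_ignored))      # 2
```

Deriving the count from the set difference itself keeps the number and the file list consistent with each other.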

mohammad-alisafaee · 2019-12-20T08:22:05Z

renku/core/management/datasets.py


-        if not self.repo.is_dirty():
-            return warning_message
+        if dataset.contains(new_files) is False and (new_files_count or force):


This condition is not right. For example, if the dataset contains at least one file from new_files, then the if block won't execute and the rest of the files won't be added to the dataset.

Which, I would argue, is exactly what we want as the default behaviour: a transactional add. The operation should not partially succeed, since from the user's perspective partial success is very strange; users might add something by accident that they do not want. We should force users to be explicit on the CLI, rather than silently succeeding and not letting them know what exactly happened (which is currently the case).

edit: Let's add a flag where that behavior would be possible. That would make things more explicit and straightforward.

The following is obviously a bug:

[~/playground/tmp]$ renku dataset create a
Use the name "a" to refer to this dataset.
OK
[~/playground/tmp]$ renku dataset add a ~/some-file
[~/playground/tmp]$ renku dataset add a ~/some-file ~/some-other-file
[~/playground/tmp]$ git grep some-file
.gitattributes:data/a/some-file filter=lfs diff=lfs merge=lfs -text
.renku/datasets/f2960ea8-65a3-4a97-92e2-e87768d21f05/metadata.yml: _id: file://blob/1f7d45364cbe7e7de19bfdfa01da7585351a9e5c/data/a/some-file
.renku/datasets/f2960ea8-65a3-4a97-92e2-e87768d21f05/metadata.yml: _label: data/a/some-file@1f7d45364cbe7e7de19bfdfa01da7585351a9e5c
.renku/datasets/f2960ea8-65a3-4a97-92e2-e87768d21f05/metadata.yml: name: some-file
.renku/datasets/f2960ea8-65a3-4a97-92e2-e87768d21f05/metadata.yml: path: data/a/some-file
.renku/datasets/f2960ea8-65a3-4a97-92e2-e87768d21f05/metadata.yml: url: file://../../some-file
[~/playground/tmp]$ git grep some-other-file
.gitattributes:data/a/some-other-file filter=lfs diff=lfs merge=lfs -text
[~/playground/tmp]$ git log --pretty=oneline
762b5de192477613185d8e90900a0f38c920b8b9 (HEAD -> master) renku dataset add a /home/mohammad/some-file /home/mohammad/some-other-file
bf50499e866daec0a1aa09170e36a422a6ba26ee renku dataset add a /home/mohammad/some-file
1f7d45364cbe7e7de19bfdfa01da7585351a9e5c renku dataset: committing 1 newly added files
e6dcaae7418a8c53f47ec412bae405032393a863 renku dataset create a
11b817913f9b58601b19aa1bc20d4f21ad196f06 renku init

It partially succeeds and does not print an error message, which I believe is the exact opposite of what you are aiming to do here.

That is truly a bug! I will add a test case for it and fix it.

Thank you.
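
A transactional add in the sense argued for in this thread checks every path before touching anything, so the call either adds all files or none. The sketch below uses a plain set as the dataset and hypothetical names (`DatasetFileExists`, `add_files`, `allow_existing`); it is not the renku implementation.

```python
class DatasetFileExists(Exception):
    """Hypothetical error raised when a file is already in the dataset."""


def add_files(dataset_files, new_files, allow_existing=False):
    """Add new_files to dataset_files, all or nothing.

    dataset_files: set of paths already in the dataset (mutated in place).
    allow_existing: hypothetical opt-in flag; skip known files
        instead of failing.
    """
    new_files = set(new_files)
    existing = dataset_files & new_files

    # Validate first so a failure leaves dataset_files untouched.
    if existing and not allow_existing:
        raise DatasetFileExists(
            'already in dataset: {}'.format(', '.join(sorted(existing)))
        )

    added = new_files - existing
    dataset_files |= added
    return added
```

With `dataset_files = {'some-file'}`, adding `['some-file', 'some-other-file']` raises and leaves the set unchanged; passing `allow_existing=True` adds only `some-other-file` and returns it, so the caller can report exactly what happened.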

mohammad-alisafaee · 2019-12-20T08:27:41Z

renku/core/management/datasets.py

+        new_files = (file_paths - set(ignored))

-        if not self.repo.is_dirty():
-            return warning_message


The bug reported in #881 can be solved by moving the commit statement to this if block and removing return from here:

        if not self.repo.is_dirty():
            self.repo.index.commit(
                'renku dataset: committing {} newly added files'.
                format(len(file_paths) + len(ignored))
            )

Like this we avoid an empty commit if there is no change in data files and we allow the metadata to be updated (which was prevented previously by returning early).

And in that case, would ignored files fail to be added?

No. Added ignored files show up as dirty.

How? If they are ignored it means that they are part of the .gitignore and git won't register them at all.

[~/playground/tmp]$ rm -rf * .* ; renku init
OK
Initialized empty project in /home/mohammad/playground/tmp
[~/playground/tmp]$ grep .scrapy .gitignore
.scrapy
[~/playground/tmp]$ echo .scrapy > .scrapy
[~/playground/tmp]$ git add .scrapy
The following paths are ignored by one of your .gitignore files:
.scrapy
Use -f if you really want to add them.
[~/playground/tmp]$ git add -f .scrapy
[~/playground/tmp]$ git status -s
A  .scrapy

If users do not use --force and there are ignored files then this function raises an exception and won't reach this point.
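
One reading of the intent described in this thread is: commit only when the working tree has changes, and in either case continue to the metadata update instead of returning early. The sketch below models that control flow with a hypothetical `FakeRepo` stand-in and a hypothetical `finish_add` function; it does not call GitPython and is not the code that was merged.

```python
class FakeRepo:
    """Hypothetical stand-in that records commits instead of running git."""

    def __init__(self, dirty):
        self.dirty = dirty
        self.commits = []

    def is_dirty(self):
        return self.dirty

    def commit(self, message):
        self.commits.append(message)
        self.dirty = False


def finish_add(repo, metadata_files, file_paths):
    """Commit data changes if there are any, then always update metadata."""
    if repo.is_dirty():
        repo.commit(
            'renku dataset: committing {} newly added files'.format(
                len(file_paths)
            )
        )
    # No early return: metadata is updated even when nothing was committed.
    metadata_files.update(file_paths)
    return metadata_files
```

Under this reading, re-adding an unchanged file produces no empty commit, while the dataset metadata still gets a chance to record the file.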

mohammad-alisafaee · 2019-12-20T08:29:51Z

renku/core/models/datasets.py

        data = {field_: obj.pop(field_) for field_ in self.EDITABLE_FIELDS}
        return data

+    def contains(self, files):


This name is a bit misleading. I'd rename it to contains_any or contains_at_least_one.

Why is it misleading? It follows a common pattern that can be found in the standard library (and in other languages' standard libraries).

Since it accepts multiple files, I expect contains to return true only if all files are contained. This function returns true when any of the files is in the dataset.

Ok, renamed to contains_any.
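
The distinction behind the rename can be shown with two small helpers. This is a sketch over a plain set of paths; `contains_all` and the free-function form are assumptions for illustration, not the method added in the PR.

```python
def contains_any(dataset_paths, files):
    """True if at least one of files is already in the dataset."""
    return any(path in dataset_paths for path in files)


def contains_all(dataset_paths, files):
    """True only if every one of files is already in the dataset."""
    return all(path in dataset_paths for path in files)


dataset_paths = {'data/a/some-file'}
candidates = ['data/a/some-file', 'data/a/some-other-file']

print(contains_any(dataset_paths, candidates))  # True
print(contains_all(dataset_paths, candidates))  # False
```

Note that for an empty `files` argument the two disagree in the other direction: `any` yields False and `all` yields True.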

mohammad-alisafaee · 2019-12-20T09:20:59Z

tests/cli/test_datasets.py

+
+    monkeypatch.setattr(requests, 'get', get)
+    dataset = client.load_dataset('my-dataset')
+    o = client._add_from_url(dataset, url, client.path)


Is there a way to avoid calling a non-public method in this test?

That client abstraction should really be fixed (IMHO), because isolating pieces into testable units is almost impossible without resorting to such calls.

I don't get why you removed the CLI call to add files. I thought perhaps it was because of the requests mocking that something wasn't working in between.
Perhaps it's better to remove this test and have it only as an integration test, because it is mocking an implementation detail and it is calling a protected method.

I didn't remove anything. The test, the way it was written, was an integration test but was not marked as such. I just marked it as an integration test and wrote an actual unit test for the part that test is supposed to be testing, since @rokroskar requested to have it.

I would also be fine just with having it as an integration test, tbh.

mohammad-alisafaee · 2019-12-20T09:21:26Z

tests/cli/test_integration_datasets.py

+
+    with client.with_dataset('my-dataset') as dataset:
+        file_ = dataset.files[0]
+        assert file_.url == 'https://example.com/index.html'


Ah right, it should be the other way around - thanks!

Let's not start the discussion about expected and actual ;).

mohammad-alisafaee · 2019-12-20T09:27:31Z

tests/service/test_dataset_views.py

+
+    assert {'result'} == set(response.json.keys())
+    assert {'dataset_name', 'files',
+            'project_id'} == set(response.json['result'].keys())


This might fail depending on the order. Consider using sorted(...).

Since when does order matter in a set? 😄

Try this in your interpreter:

print({1,2,3} == {2,3,1})

Oops! Technically, the order in sets matters! :)
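
For reference, Python set equality depends on membership only, so the assertion in the diff does not depend on key order; `sorted(...)` matters only when ordered sequences such as lists are compared. A small sketch, using the key names from the test and an arbitrary made-up ordering:

```python
# Set equality ignores the order in which elements were written or inserted.
sets_equal = {1, 2, 3} == {2, 3, 1}
print(sets_equal)  # True

keys = ['project_id', 'files', 'dataset_name']  # arbitrary order
expected_set = {'dataset_name', 'files', 'project_id'}
print(expected_set == set(keys))  # True

# Lists compare element by element, so order matters here.
expected_list = ['dataset_name', 'files', 'project_id']
print(expected_list == keys)          # False
print(expected_list == sorted(keys))  # True
```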

rokroskar · 2019-12-20T09:38:07Z

fixes: #877

I think you mean #876?

jsam · 2019-12-20T10:09:29Z

fixes: #877

I think you mean #876?

It was a typo, sorry!

jsam

Oki, thanks for the review. I will look into it.

mohammad-alisafaee

LGTM. Thanks!

jsam added the needs ✋ testing label Dec 18, 2019

jsam added 4 commits December 18, 2019 12:40

fix: service dataset.create with additional metadata

ee1277a

feat: add project files to datasets from service

341ed22

Merge branch 'master' into 769_project_files0

61e78c4

test: test_add_removes_credentials marked as integration

2f76b4d

jsam removed the needs ✋ testing label Dec 19, 2019

jsam marked this pull request as ready for review December 19, 2019 17:04

jsam requested a review from a team as a code owner December 19, 2019 17:04

jsam added 2 commits December 19, 2019 18:54

fix: bugfixes related to dataset create and add

13d7e85

Merge branch '769_project_files0' of github.com:SwissDataScienceCente…

be1e125

…r/renku-python into 769_project_files0

mohammad-alisafaee reviewed Dec 20, 2019

jsam commented Dec 20, 2019

jsam added needs ✋ testing kind/bug labels Dec 22, 2019

fix: raise dataset file exists

2f4f2b5

jsam removed kind/bug needs ✋ testing labels Jan 6, 2020

mohammad-alisafaee approved these changes Jan 6, 2020

Merge branch 'master' into 769_project_files0

5d9b02f

jsam merged commit faf4c9a into master Jan 6, 2020

jsam deleted the 769_project_files0 branch January 6, 2020 13:28

mohammad-alisafaee mentioned this pull request Jan 16, 2020

Dataset add overrides existing files in a dataset #884

Closed
