Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): add dataset entries to renku log #2633

Merged
merged 20 commits into from Feb 18, 2022

Conversation

Panaetius
Copy link
Member

@Panaetius Panaetius commented Feb 7, 2022

Adds dataset entries to renku log.

Closes #2406
Closes #2576

For testing clone/fork https://dev.renku.ch/gitlab/ralf.grubenmann/test_renku_log

Then

$ renku log  # show all log entries
$ renku log -d  # show dataset log entries
$ renku log -w  # show workflow/activity log entries
$ renku log --format detailed  # default output
$ renku log --format tabular  # short/tabular output
$ renku log --format json  # JSON output

Output looks like
image

@Panaetius Panaetius force-pushed the feature/2406-datasets-renku-log branch 2 times, most recently from e37be04 to 0ea47a6 Compare February 8, 2022 07:29
@Panaetius Panaetius marked this pull request as ready for review February 8, 2022 07:29
@Panaetius Panaetius requested a review from a team as a code owner February 8, 2022 07:29
@Panaetius Panaetius requested a review from a team February 8, 2022 08:45
@Panaetius Panaetius force-pushed the feature/2406-datasets-renku-log branch from 0ea47a6 to 1b23ffb Compare February 8, 2022 12:12
@cmdoret
Copy link
Member

cmdoret commented Feb 8, 2022

Great ! I have played a bit with the different options and formats, it looks good :).

Two comments:

  • When I use renku dataset rm dataset-name, the dataset is correctly removed, but I don't see the action in renku log. Since it shows Dataset created events, wouldn't it make sense to also show Dataset removed events ?
  • This is a minor aesthetic issue, but the color of event descriptions in the default format is not easily readable with the default terminal colors on renkulab (see screenshot below).
    Screenshot from 2022-02-08 18-56-13

@Panaetius
Copy link
Member Author

Panaetius commented Feb 9, 2022

@cmdoret Right now it's set to yellow which on many systems is more of an orange (on mine it's the same as git's color)
This is the color palette I have:
image
But so yellow in the interactive session is something that's hard to see, we might want to change that? In any case, I think I can use RGB values directly, I'll give that a try.

Deleted datasets are a bit complicated. I have code in place that would show them. But once you delete a dataset, we remove all references to it(though the metadata file itself still exists on disk). So it's there, but not accessible from code at the moment. We could add a DB index that tracks deleted datasets and then use that in renku log, but I wasn't sure if that's actually wanted. There have been some unrelated discussions in the past that deleted things shouldn't show up anymore, at all. Which I'm personally not in favor of, especially for this command. But I'm still debating if it makes sense to add this here or in a separate issue/PR. Ok so we do have deleted datasets, but they're named in a bit misleading way. I'll add this and rename the respective part in code.

@Panaetius
Copy link
Member Author

Deleted entries now show up.

I've also added a new file, https://github.com/SwissDataScienceCenter/renku-python/blob/f2d55bcf7692cb891e5b80de4662639056d14994/renku/cli/utils/color.py that defines the terminal color palette we use. Except for yellow it just uses the default (user-defined) colors, but we're free to come up with our own color scheme and replace the colors with our RGB values (maybe ask our designer?), and changing it in this file will change it everywhere. But the yellow should now be readable on both light and dark backgrounds.

@cmdoret
Copy link
Member

cmdoret commented Feb 9, 2022

Thanks! deleted datasets show as expected and it looks much better on renkulab, perfectly readable. 👌

EDIT: the new renku log command seems to crash when attempting to show datasets after changing a dataset's title

Steps to reproduce:

renku dataset edit -t 'demodata' clopidogrel_therapy_foll
renku log

Traceback

Traceback (most recent call last):
  File "[...]/inject/__init__.py", line 342, in injection_wrapper
    return sync_func(*args, **kwargs)
  File "[...]/renku/core/commands/log.py", line 69, in _log
    log_entries.extend(LogViewModel.from_dataset(d) for d in datasets)
  File "[...]/renku/core/commands/log.py", line 69, in <genexpr>
    log_entries.extend(LogViewModel.from_dataset(d) for d in datasets)
  File "[...]/renku/core/commands/view_model/log.py", line 232, in from_dataset
    previous_dataset and sorted(dataset.keywords) != sorted(previous_dataset.keywords)
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[...]/renku/cli/exception_handler.py", line 132, in main
    result = super().main(*args, **kwargs)
  File "[...]/renku/cli/exception_handler.py", line 96, in main
    return super().main(*args, **kwargs)
  File "[...]/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "[...]/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "[...]/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "[...]/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "[...]/renku/cli/log.py", line 205, in log
    result = log_command().with_database().build().execute(workflows_only=workflows, datasets_only=datasets).output
  File "[...]/renku/core/management/command_builder/command.py", line 242, in execute
    output = context["click_context"].invoke(self._operation, *args, **kwargs)
  File "[...]/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "[...]/inject/__init__.py", line 344, in injection_wrapper
    raise ConstructorTypeError(func, previous_error)
inject.ConstructorTypeError: <function _log at 0x7f18f82ac940> raised an error: 'NoneType' object is not iterable

@Panaetius
Copy link
Member Author

the new renku log command seems to crash when attempting to show datasets after changing a dataset's title

It seems to be due to importing a dataset with no keywords setting them to None instead of an empty list. Should be fixed now.

@cmdoret
Copy link
Member

cmdoret commented Feb 9, 2022

Everything seems to work now ! Looks good to me :)

cmdoret
cmdoret previously approved these changes Feb 9, 2022
Copy link
Member

@cmdoret cmdoret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rokroskar
Copy link
Member

Thanks @Panaetius this is looking great!

A couple of comments:

  • --format detailed doesn't seem to do much? (ok I see now it's the default...)
  • the default output is not a table so the column flag doesn't really make sense
  • I guess all these modifications are still linked to a git sha - would it be useful to show it somewhere? Or asked differently, in the case of datasets, what can I do with this information? I see that it has been modified a number of times but I have no way to recover a specific state or to revert a change etc.

@Panaetius
Copy link
Member Author

Thanks @Panaetius this is looking great!

A couple of comments:

* `--format detailed` doesn't seem to do much? (ok I see now it's the default...)

Yes it's the default (I like it more than the tabular output)

* the default output is not a table so the column flag doesn't really make sense

Well it works with the tabular output like other methods we have.

* I guess all these modifications are still linked to a git sha - would it be useful to show it somewhere? Or asked differently, in the case of datasets, what can I do with this information? I see that it has been modified a number of times but I have no way to recover a specific state or to revert a change etc.

No none of it works with a git sha and it would even work if changes weren't committed, only written to disk. it purely uses our metadata and git is not involved.

@rokroskar
Copy link
Member

I know none of it is using git to generate this, but each of those modifications was made and committed using git in the end. And that might be useful information to have. If I then want to tag a version of the project, for example, I'll still need to go to git log and decipher what the renku commands there mean.

@rokroskar
Copy link
Member

Yes it's the default (I like it more than the tabular output)

Can't we say in the help text that it's the default at least? I assumed I wasn't looking at the detailed output and was surprised that the flag didn't do anything

@Panaetius
Copy link
Member Author

Panaetius commented Feb 9, 2022

I know none of it is using git to generate this, but each of those modifications was made and committed using git in the end. And that might be useful information to have. If I then want to tag a version of the project, for example, I'll still need to go to git log and decipher what the renku commands there mean.

Theoretically there can be more than one change in a commit, like a migration adding a dataset to the DB and then making a derivation from it in a single commit. And in the past we thought about allowing to run commands without a commit and then committing when you're happy with it.

We could just from the database id get the path to the file it maps to and check what the last commit was that if was changed in. Though I'm a bit scared of the performance of that. Maybe with a dedicated flag in a follow up issue?

Yes it's the default (I like it more than the tabular output)

Can't we say in the help text that it's the default at least? I assumed I wasn't looking at the detailed output and was surprised that the flag didn't do anything

Yes I can do that

vigsterkr
vigsterkr previously approved these changes Feb 10, 2022
Copy link
Contributor

@vigsterkr vigsterkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@Panaetius Panaetius dismissed stale reviews from vigsterkr and cmdoret via d4675f0 February 10, 2022 13:55
@Panaetius Panaetius enabled auto-merge (squash) February 11, 2022 13:32
@Panaetius Panaetius merged commit f92fbac into develop Feb 18, 2022
@Panaetius Panaetius deleted the feature/2406-datasets-renku-log branch February 18, 2022 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong description for log subcommand in renku --help Add dataset commands to renku log output
4 participants