Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP feat(datasets): added dataset ls-files, unlink and delete commands #436

Closed
wants to merge 1 commit into from
Closed

Conversation

jsam
Copy link
Contributor

@jsam jsam commented Mar 1, 2019

DO NOT MERGE

This PR adds new commands for dataset:

  • ls-files
  • unlink
  • delete

Fixes #418

@jsam jsam requested a review from a team as a code owner March 1, 2019 08:40
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/models/datasets.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/models/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
renku/api/datasets.py Outdated Show resolved Hide resolved
files_in_dataset = [(
key,
os.path.realpath(
self.renku_datasets_path / dataset.name / dataset_file.path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.renku_datasets_path / dataset.name / dataset_file.path
self.renku_datasets_path / dataset.identifier / dataset_file.path

renku/cli/dataset.py Outdated Show resolved Hide resolved
meta_file = dataset_dir_path / self.METADATA
meta_file.unlink()

def unlink_files(self, dataset, pattern, exclude=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def unlink_files(self, dataset, pattern, exclude=None):
def unlink_files(self, dataset, include=None, exclude=None):


def _match(file):
filename = Path(file[1]).name
result = fnmatch(filename, pattern)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = fnmatch(filename, pattern)
result = fnmatch(filename, include) if include is not None else result

renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/cli/dataset.py Outdated Show resolved Hide resolved
renku/errors.py Outdated Show resolved Hide resolved
renku/errors.py Outdated Show resolved Hide resolved
renku/errors.py Outdated Show resolved Hide resolved
@click.option('--verbose', '-v', is_flag=True)
@pass_local_client(clean=True, commit=True)
def delete(client, name, force, verbose):
"""Deletes a dataset."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Deletes a dataset."""
"""Delete a dataset metadata."""

@@ -46,11 +47,19 @@ class DatasetsApiMixin(object):
DATASETS = 'datasets'
"""Directory for storing dataset metadata in Renku."""

REFS_DATASETS = 'refs'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need it? the LinkReference object should provide you with enough abstraction.

if not force and dataset.files:
raise errors.DatasetNotEmpty()

dataset_dir_path = self.renku_datasets_path / dataset.identifier.hex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might need to iterate over all refs/datasets since there might be more aliases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might need a another renku doctor check that find broken refs

"""Find datasets which contain given dataset file.

:param client: LocalClient instance.
:param dataset_file: DatasetFile instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it would be nicer to filter by path.

@@ -54,3 +54,20 @@ def check_missing_files(client):
)

return False


def file_in_datasets(client, dataset_file, exclude_dataset=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider building a map: path -> [dataset, ...]

if exclude_dataset and \
exclude_dataset.identifier == dataset.identifier:
continue
if dataset_file in dataset.files.values():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if dataset_file in dataset.files.values():
if dataset_file.path in dataset.files:

meta_file.unlink()
dataset_dir_path.rmdir()

dataset_ref = self.renku_refs_path / Path(dataset.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dataset_ref = self.renku_refs_path / Path(dataset.name)

dataset_dir_path.rmdir()

dataset_ref = self.renku_refs_path / Path(dataset.name)
if dataset_ref.is_symlink():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if dataset_ref.is_symlink():


dataset_ref = self.renku_refs_path / Path(dataset.name)
if dataset_ref.is_symlink():
dataset_ref.unlink()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dataset_ref.unlink()

relative_path,
os.path.realpath(
str(
self.renku_datasets_path / dataset.name / dataset_file.path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.renku_datasets_path / dataset.name / dataset_file.path
self.renku_datasets_path / dataset.identifier.hex / dataset_file.path

return found_one

file_unlinked = []
for relative, absolute, file_ in filter(_match, files_in_dataset):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using filter is not very "Pythonic". the better way is to use a list comprehension.

Suggested change
for relative, absolute, file_ in filter(_match, files_in_dataset):
file_unlinked = [
file_
for file_ in files_in_dataset
if _match(file_)
]

:param include: Remove files matching the include pattern.
:param exclude: Keep files matching the exclude pattern.
"""
files_in_dataset = [(
Copy link
Contributor

@jirikuncar jirikuncar Mar 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files_in_dataset = [
    self.renku_datasets_path / dataset.identifier.hex / file_
    for file_ in dataset.files
]

if not files_in_dataset:
raise errors.ResourceNotFound(resource_type='DatasetFile')

def _match(file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _match(file):
def _match(file_):

filename = Path(file[1]).name
found_one = False
for pattern in include:
found_one = fnmatch(filename, pattern)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
found_one = fnmatch(filename, pattern)
found_one = file_.match(pattern)

@jsam jsam changed the title feat(datasets): added dataset ls-files, unlink and delete commands WIP feat(datasets): added dataset ls-files, unlink and delete commands Mar 9, 2019
@jsam jsam closed this Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants