Skip to content

Add filenames return for ddf.to_csv and bag.to_textfiles as they both…#2655

Merged
mrocklin merged 4 commits intodask:masterfrom
asettouf:csv_hdf_paths_#1622
Sep 6, 2017
Merged

Add filenames return for ddf.to_csv and bag.to_textfiles as they both…#2655
mrocklin merged 4 commits intodask:masterfrom
asettouf:csv_hdf_paths_#1622

Conversation

@asettouf
Copy link
Copy Markdown
Contributor

@asettouf asettouf commented Sep 4, 2017

Partial commit for #1622, to enable returning filenames I modified bytes.core.write_bytes, which broke bag.to_textfiles which was "fortunate" as it also allowed to have this method returning the filenames after creation.

If my PR is fine, I will work to integrate hdf filenames return.

Thanks in advance for the review

@martindurant
Copy link
Copy Markdown
Member

Seems like a reasonable thing to do, although extracting the filenames isn't at all hard.
Is it weird that you get filenames if compute=True, but only delayed values if not? Probably changing that behaviour would impact many places.
Is write_bytes not referenced in other places?

@mrocklin
Copy link
Copy Markdown
Member

mrocklin commented Sep 5, 2017

@martindurant it doesn't seem easy to include names in the outputs of write_block_to_file. Are you suggesting that we don't accept this PR in order to maintain consistency? Do you have a fix in mind? Or is this just a comment?

It looks like this fails the flake8 style check (see travis-ci tests). There is also a dangling print statement in there.

@martindurant
Copy link
Copy Markdown
Member

I was meant only as a comment, I think this can be useful, but we have to accept the slight inconsistency.

@asettouf
Copy link
Copy Markdown
Contributor Author

asettouf commented Sep 5, 2017

@martindurant

To address the issues you mention:

  • In my understanding, delayed are used as node in the acyclic graph and will be computed at a later time, which leads me to think that it would seem more appealing to return the filenames when actually computed

  • To my surprise, write_bytes is not referenced anywhere else I could find. Given the slight inconsistency I can modify the name of the function to something more adequate, I just thought it would be nicer to have those 2 parts of the logic go together, as it is somewhat the case (names are computed here, of course computing them directly in the to_csv body wouldn't be hard)

@mrocklin

Actually I did not have any idea about flake8, now everything should be in order. Tests and flake8 did not report any errors.

By the way, after the tests what does xfail means exactly?

@mrocklin
Copy link
Copy Markdown
Member

mrocklin commented Sep 5, 2017

The closest docs we have for write_bytes are here: http://dask.pydata.org/en/latest/bytes.html although I see that we don't include data writing there. This should probably be extended in the future. cc'ing @martindurant , although he may be busy.

Similarly we should extend the developer docs here http://dask.pydata.org/en/latest/develop.html to include the warning about flake8. I'll add a note for this soon.

@mrocklin
Copy link
Copy Markdown
Member

mrocklin commented Sep 5, 2017

By the way, after the tests what does xfail means exactly?

A test marked with xfail is allowed to fail. This is often because we don't fully support this functionality yet.

@mrocklin
Copy link
Copy Markdown
Member

mrocklin commented Sep 5, 2017

This PR looks good to me. Any further comments @martindurant ?

@martindurant
Copy link
Copy Markdown
Member

LGTM. Note to self to check into write_bytes.
Are there any other to_* functions that should be considered?

@mrocklin mrocklin merged commit adf37b9 into dask:master Sep 6, 2017
@mrocklin
Copy link
Copy Markdown
Member

mrocklin commented Sep 6, 2017

This is in . Thanks @asettouf !

@asettouf
Copy link
Copy Markdown
Contributor Author

asettouf commented Sep 6, 2017

@mrocklin My pleasure. By the way, in that case, should I close the #1622 issue or let someone else handle it?

fujiisoup pushed a commit to fujiisoup/dask that referenced this pull request Feb 3, 2018
dask#2655)

* Add filenames return for ddf.to_csv and bag.to_textfiles as they both use bytes/core.write_bytes function

* Fix flake8 warning, wrong os.remove in test_csv

* Fix flake8 no newline at end of file

* Add filenames return to to_hdf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants