Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistency of read() and write() functions. #14608

Closed
samoconnor opened this issue Jan 8, 2016 · 16 comments
Closed

Consistency of read() and write() functions. #14608

samoconnor opened this issue Jan 8, 2016 · 16 comments

Comments

@samoconnor
Copy link
Contributor

Below is review of current write an read variants in Base.

I propose:

  • deprecate readbytes and readall, replace with read and readstring respectively
  • add filename-as-1st-arg support everywhere

Rationale:

  • "bytes" is the default, write is not writebytes so readbytes should just be read.
  • readall is named as "read_[how much]" but behaves like "read[as type]_" so it should be readstring.
  • filename-as-1st-arg is generally handy, see examples in write(filename::AbstractString, data) #14546

Deprecation and addition indicated by strikethrough and bold below.
Filename = has read(filename::AbstractString...) method.

Function Filename
write
write_[format]_
writecsv yes
writedlm yes
writemime
Function Type Filename Whole File Non Blocking
read_[how much]_
read(io,T) T
readavailable Array{UInt8} yes
readuntil String
readline String
readall String yes yes
read_[as type]_
readbytes Array{UInt8} yes
read Array{UInt8} yes
readstring String yes yes
readlines Array{String} yes
readcsv Array{T} yes yes
readdlm Array{T} yes yes
readdir Array{String} yes yes
readlink String yes yes
read_[and then]_
readchomp(x) = chomp(readall(x)) yes yes

Related:

@nalimilan
Copy link
Member

Thanks. I wonder whether readall should even be changed to read(String, ...).

@JeffBezanson
Copy link
Sponsor Member

Thanks @samoconnor, this is a very good design review.

I still kind of like the idea of tofile("a.dat",write)(data), since it doesn't require multiple functions to have whole-file methods.

@samoconnor
Copy link
Contributor Author

@JeffBezanson, tofile is intriguing, but I'm not sure that I completely follow your intention. Are you suggesting that writecsv(filename, data) would be deprecated in favour of tofile(filename, writecsv)(data)? Also why not tofile(filename, write, data)? What is the advantage of the intermediate anonymous function?

@samoconnor
Copy link
Contributor Author

Following up on what I proposed above, the new methods would be...

      read(io::IO) = readbytes(io)
readstring(io::IO) = readall(io)

     write(filename::AbstractString, d...) = open(io->write(io, d...),     filename, "w")

     read!(filename::AbstractString, a)    = open(io->read!(io, a),        filename)
      read(filename::AbstractString, t...) = open(io->read(io, t...),      filename)
 readuntil(filename::AbstractString, c...) = open(io->readuntil(io, c...), filename)
  readline(filename::AbstractString)       = open(readline,                filename)
 readlines(filename::AbstractString)       = open(readlines,               filename)
readstring(filename::AbstractString)       = open(readstring,              filename)

Unless anyone objects, I'll close #14546 and open a new PR with these methods.

@bicycle1885
Copy link
Member

write(filename...) would be handy, but I bet people will accidentally make and/or overwrite files without noticing it.

@JeffBezanson
Copy link
Sponsor Member

Yes tofile(filename, write, data) would be just as good; for some reason my first instinct was that currying would be useful but perhaps it's not.

The "core" output functions seem to be those that write data to a stream at the current position. From there, I often want to either write a whole file in one go (like you), or get the output as a string or byte vector. I feel like there should be a standard way to add those behaviors, so that you don't need to rely on every I/O function adding a "whole file" method. open and sprint do this for files and strings, but my gripes with them are (1) calling them is too verbose, (2) the names aren't great, particularly sprint, (3) they don't generalize e.g. to getting a byte vector.

writecsv is weird because you almost always want to write a whole file. But ideally yes I would replace it with something more general. One might even want an abstraction that describes the format of the file, e.g. DelimFile(name, delim='\t', eol="\n") etc.

@samoconnor
Copy link
Contributor Author

I think the abstractions are great as a tool for keeping the API implementation simple and efficient, but I think the API should have concrete functions that let you tell the computer to do what you want it to. writecsv is built on top of writedlm, but I'm sure that for a bunch of people who want to get their array across into Excel, they don't need to know or care about that.

As it stands if I say "Computer, put my data in a .CSV file", it says "done!",
but if I say "... put my data in /tmp/foo" it says "Well, actually it would suit me better if you opened a handle that points to /tmp/foo first, then ask me to write your data to the handle".
Sometimes it is just the API designer's job to add the little convenience methods that make the behaviour consistent across functions. .... what's that? oh, the computer is interrupting me to say that with that thing it was saying before about the file handle, it wants to remind me to close it after I'm done. Damn computer!

Another thing, it occurs to me that part of the unnecisary verboseness of open(io->write(io, data), filename, "w") is the "w" part. Maybe: the default should be "a+" and there should be create([f], filename) = open([f], filename, "w+") for the common case where the truncate is wanted. Arrays and Dicts don't have a read/write permission scheme getting in the way of ease of use, why files?

@samoconnor
Copy link
Contributor Author

What about merging tofile and create...

create(filename::String) = open(filename, "w+")
create(f::Function, filename::String) = open(f, filename, "w+")
create(filename::String, f::Function, data...) = create(io->f(io, data...), filename)
write(filename::String, data...) = create(filename, write, data...)

create("/tmp/foo.csv", writecsv, results)
create("/tmp/foo.png", write, plot(results))

But I think I still prefer to do...

writecsv("/tmp/foo.csv", results)
write("/tmp/foo.png", plot(results))

@JeffBezanson
Copy link
Sponsor Member

All good points. I agree the "w" feels unnecessary, and to that I'd add the io-> and io, parts, since the first argument almost always has the form io->f(io, xxx).

I agree a certain set of convenience functions is unavoidable. I'm just a bit leery of expecting every I/O function f to also define

f(filename::AbstractString, x...) = open(io->f(io, x...),      filename)

@StefanKarpinski
Copy link
Sponsor Member

If we had some sort of scoped open/close construct this would be sufficiently concise that it might not even be necessary to define the f(filename::AbstractString, x...) = open(io->f(io, x...), filename) method for everything since you could just do f(open(filename), x...) and the close of the file would be automatically inserted upon exiting the scope. Of course, we'd need some kind of syntactic indication that this is happening.

samoconnor added a commit to samoconnor/julia that referenced this issue Jan 12, 2016
samoconnor added a commit to samoconnor/julia that referenced this issue Jan 12, 2016
replace readall() with readstring()
replace readbytes() with read()

readbytes! returns nb whereas read! returns array, so they can't
trivially be merged.

Fix test/replutil.jl test that matches output of method error message
that has changed due to introduction of readbytes(::AbstractString).
@nalimilan
Copy link
Member

Any opinions about whether readall(x) (readstring(x) in #14660) could simply be read(x, AbstractString) (and soon read(x, String) after Stefan's rework)?

This would give a much more consistent API. More generally, this would mean read(x) is just calling the method read(x, type=Vector{UInt8}) (replacement for readbytes as in #14660). We could also allow e.g. read(x, Vector{Int}) at some point if there's a need for it.

@tkelman
Copy link
Contributor

tkelman commented Jan 13, 2016

Many of the specify-an-output-type API's take the type first, or maybe second after the output array for in-place methods? We've got a few possibly-overlapping conventions for first arguments.

@nalimilan
Copy link
Member

Indeed, this is similar to the discussion at #14412. Given the conclusion there, I would say it's better to keep the output type as second argument here too.

samoconnor added a commit to samoconnor/julia that referenced this issue Jan 13, 2016
replace readall() with readstring()
replace readbytes() with read()

readbytes! returns nb whereas read! returns array, so they can't
trivially be merged.

Fix test/replutil.jl test that matches output of method error message
that has changed due to introduction of readbytes(::AbstractString).

add: +eachline(filename::AbstractString) = EachLine(open(stream), close)
@johnmyleswhite
Copy link
Member

I like the idea of specifying the output type as a type rather than encoding it in the function names.

samoconnor added a commit to samoconnor/julia that referenced this issue Jan 14, 2016
replace readall() with readstring()
replace readbytes() with read()

readbytes! returns nb whereas read! returns array, so they can't
trivially be merged.

Fix test/replutil.jl test that matches output of method error message
that has changed due to introduction of readbytes(::AbstractString).

add: +eachline(filename::AbstractString) = EachLine(open(stream), close)
samoconnor added a commit to samoconnor/julia that referenced this issue Jan 15, 2016
replace readall() with readstring()
replace readbytes() with read()

readbytes! returns nb whereas read! returns array, so they can't
trivially be merged.

Fix test/replutil.jl test that matches output of method error message
that has changed due to introduction of readbytes(::AbstractString).

add: +eachline(filename::AbstractString) = EachLine(open(stream), close)

revert mistaken docstring change -- read! vs readbytes!

update base/docs/helpdb/Base.jl

update rst docs, BBEditTextWrangler-julia.plist

replace missing chomp() in getmetabranch()

doc tweaks

make && make julia-genstdlib

more doc tweaks

whitespace tweak

doc: read!(stream or filename, ...)
JeffBezanson added a commit that referenced this issue Jan 20, 2016
Consistency of read() and write() functions (per #14608)
@samoconnor
Copy link
Contributor Author

Now that #14699 and #14660 are merged, most of what this issue addresses is resolved.

A quick scan of the comment history yields the following loose ends...

I'll leave it to others to create new issues for the other items as they see fit.

@nalimilan
Copy link
Member

@nalimilan suggested readstring could be read(x, String) "after Stefan's rework".

Actually, I think it can be changed immediately to read(x, UTF8String/ASCIIString). It's just that it will be even shorter once we have a default type named String.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants