Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple produce_or_load and savename #322

Closed
goerz opened this issue Jan 20, 2022 · 4 comments
Closed

Decouple produce_or_load and savename #322

goerz opened this issue Jan 20, 2022 · 4 comments

Comments

@goerz
Copy link
Contributor

goerz commented Jan 20, 2022

Rationale

There are two situations where it could be nice to circumvent the internal use of savename in produce_or_load and to directly pass it a full file name:

  1. If the filename is already used in the script before the call to produce_or_load with an explicit call to savename, it can make the code slightly cleaner to pass that existing filename to produce_or_load. It would also eliminate the possibility of bugs due to any inconsistency between the two calls.

  2. If for some reason a project does not want to interpolate the key-value pairs from config into the output filename, e.g., if there are two many parameters (see Workflow for managing multiple output files, or files with too many parameters, and metadata #316), it might be difficult to "fight" the internal savename to give the desired filename.

Implementation

In #315 (comment), I proposed implementing an alternative method for produce_or_load. However, on further reflection that was an overly complicated solution. The "problem" is only a single line of code in produce_or_load,

filename = joinpath(path, savename(prefix, c, suffix; kwargs...))

So it seems easiest to slightly tweak the routine, maybe add a parameter to modify just that line.

I would consider the following possibilities:

  1. Add a filename=nothing keyword argument to produce_or_load

    The above line would simply turn into

    if isnothing(filename)
        filename = joinpath(path, savename(prefix, c, suffix; kwargs...))
    end
    

    Thus, if filename is given as a string, it will serve as the output filename directly.

    Advantages:

    • Easy to understand and explicit
    • Matches current documentation on filename being the return value (including the full path)

    Problems:

    • Requires path=""
    • Possibly clashes with suffix: suffix should be set to the file extension of filename
    • Ignores prefix and kwargs.... If these are non-emtpy, a warning should be printed, maybe even an error.
  2. Allow path to include the filename directly. Whether path is a folder or a filename is determined by whether it has an extension.

    if splitext(path) == ""
        filename = joinpath(path, savename(prefix, c, suffix; kwargs...))
    else
        filename = path
    end
    

    Advantages:

    • Does not need any new parameter for produce_or_load
    • Very concise for the user

    Problems:

    • The behavior of path is rather subtle. Users might find this kind of implicit behavior confusing.
    • Possibly clashes with suffix: suffix should be set to the file extension of path
    • Ignores prefix and kwargs.... If these are non-emtpy, a warning should be printed.
  3. Add a new auto_filename=true (or some other name, maybe use_savename, not sure). Use path as the full filename if auto_filename=false

    if auto_filename
        filename = joinpath(path, savename(prefix, c, suffix; kwargs...))
    else
        filename = path
    end
    

    Advantages:

    • Slightly more explicit than option 2

    Problems:

    • Still a little confusing for path to change meaning depending on auto_filename
    • Possibly clashes with suffix: suffix should be set to the file extension of path
    • Ignores prefix and kwargs.... If these are non-emtpy, a warning should be printed.

The above are in order of my preference (number 1 being my most preferred solution).

@Datseris
Copy link
Member

Datseris commented Jan 20, 2022

TL;DR: Yes for adding a filename option, no for everything else.


If I understood everything, you want to turn produce_or_load to:

if !isfile(filename)
    data = f(config)
    save(filename)
end

This is a tremendously simple code snippet. It is only 4 lines of code. I am surprised that such a simple codesnippet would require you to present such a complicated modification to a function which I would say is already complicated. I'm trying to understand your approach: why do you want to make so fundamental modifications to functions when the solution is so simple? Change how the path works, change how the filename works, or having even more complicated ways to set things path like this auto_filename. All of this business is simply too complicated to perform what is 4 lines of code, one of which is just end.

Nevertheless, the argument that "I don't want to use savename because my config would contain too many parameters" is valid. Thankfully it can be addressed using your suggestion while keeping all of the remaining functionality of produce_or_load exactly the same. We add the keyword filename=nothing as you suggested. If it is a ::String, then we use that filename. Importantly, the remaining functionality of produce_or_load must remain the same. You can't just alter every single thing about a function at liberty like that. This means: suffix, prefix are used if given (and a suffix is always given because it has a default value). Path is also used if given. You don't want to give path but include it in the filename? Well, fine. The code will still work if prefix = nothing, path = nothing and filename includes path.

In the end of the day, we save a string generated by joinpath(path, filename). Julia doesn't care if path is in the filename or not, the result would be the same. The presence of the path argument is important for produce_or_load to do what it is supposed to do: integrate with savename and ensure a scientific workflow independent of operating system. That can't change.

@goerz
Copy link
Contributor Author

goerz commented Jan 20, 2022

TL;DR: Yes for adding a filename option, no for everything else.

So you're more or less okay with option 1, which is implemented in #323?

If I understood everything, you want to turn produce_or_load to:

if !isfile(filename)
    data = f(config)
    save(filename)
end

This is a tremendously simple code snippet. It is only 4 lines of code.

Well, no, I still want all the other stuff in produce_or_load:

  • tagged/non-tagged save
  • logging output
  • error handling

Everything except the automatic savename

I am surprised that such a simple codesnippet would require you to present such a complicated modification to a function which I would say is already complicated.

The modification wouldn't be that complicated: a4d3c63, and a lot of that is error handling/warnings, which could be handled differently, see below.

We add the keyword filename=nothing as you suggested. If it is a ::String, then we use that filename. Importantly, the remaining functionality of produce_or_load must remain the same. You can't just alter every single thing about a function at liberty like that.

I agree with that 100%!

This means: suffix, prefix are used if given (and a suffix is always given because it has a default value).

I took care of the default value of the suffix (by using the extension of filename if given): suffix = (isnothing(filename) ? "jld2" : replace(splitext(filename)[2], "."=>""))

But I wasn't sure

@produce_or_load(filename=datadir("out", "run001.jld2"), suffix="xml")

makes sense. It produces an error in my current patch. I'd be okay with the above producing a file run001.jld2.xml, if you prefer. Likewise,

@produce_or_load(filename=datadir("out", "run001.jld2"), prefix="sim_")

right now produces a warning that prefix will be ignored, but I can change that to produce sim_run001.jld2 instead, if you prefer.

How do you want to handle path?

@produce_or_load(path, filename=filename)

Should it ignore path, or use filename=joinpath(path, filename)?

So if you prefer, I can use the new filename option to only replace the savename call, but keep every other behavior the same (path, prefix, suffix). That would be the least intrusive change, and it seems like you might be on board with that.

@Datseris
Copy link
Member

Datseris commented Jan 20, 2022

Well, no, I still want all the other stuff in produce_or_load:

You need to make a fundamental distinction here. produce_and_load doesn't give you tagging, nor error handling, tagsave does. So, I need to replace my originally 4 lines with the next 5:

if !isfile(filename)
    @info "producing $filename # this is your logging. 1 extra line.
    data = f(config)
    @tagsave(filename, data)
end

and now I'm almost sure you get what you asked for.


How do you want to handle path?

You don't. produce_or_load can be called without path. path is an optional first positional argument. If in a documentation string you see something like f([x,] y, z), then x is optional.


So if you prefer, I can use the new filename option to only replace the savename call, but keep every other behavior the same (path, prefix, suffix). That would be the least intrusive change, and it seems like you might be on board with that

That was the original suggestion, yes. Just replace the call to savename with filename. If path is given, it is still merged with filename. Same with prefix, suffix and everything else. suffix is always given by default, so it doesn't make sense for you to give "file.jld2" as the file name. Just "file". By default, both path and prefix are "" by default. so your specific use case is totally working without any change.

Actually, good design can make it so you don't have to even replace the savename call. Define a new keyword argument, filename = savename(prefix, config, suffix) and merge filename with path. With the current defaults, you get exactly what you want, while all other functionality of produce_or_load stays exactly as it was. With this approach however, you would have to explicitly provide the suffix into filename. That is okay for me, if not even preferred.

I'm convinced that what I suggested here is a clear improvement for produce_or_load, that satisfactorily addresses the very real problem of "my config has too many parameters to be expanded into a name".

goerz added a commit to goerz-forks/DrWatson.jl that referenced this issue Jan 20, 2022
@goerz
Copy link
Contributor Author

goerz commented Jan 20, 2022

Actually, good design can make it so you don't have to even replace the savename call. Define a new keyword argument, filename = savename(prefix, config, suffix) and merge filename with path. With the current defaults, you get exactly what you want, while all other functionality of produce_or_load stays exactly as it was.

Yes, that works perfectly and does exactly what I want.

The one tiny caveat is the kwargs... should also be included (filename = savename(prefix, config, suffix; kwargs...)). Unfortunately, Julia doesn't allow that syntax in the function signature, so I had to do it in the function body, adding one extra line.

With this approach however, you would have to explicitly provide the suffix into filename. That is okay for me, if not even preferred.

Indeed, that would be my preference as well

I'm convinced that what I suggested here is a clear improvement for produce_or_load

I agree! :-)

goerz added a commit to goerz-forks/DrWatson.jl that referenced this issue Jan 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants