Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Artificact upload for remote instances #45

Open
freemin7 opened this issue Apr 10, 2024 · 2 comments
Open

Artificact upload for remote instances #45

freemin7 opened this issue Apr 10, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@freemin7
Copy link

freemin7 commented Apr 10, 2024

As per https://github.com/JuliaAI/MLFlowClient.jl/blob/a31b41a91dc32107fa7dcd6ad5a581c116a4ff9c/src/loggers.jl#L69-72 we currently can't upload artifacts to remote servers

    Assumes that artifact_uri is mapped to a local directory.
    At the moment, this only works if both MLFlow and the client are running on the same host or they map a directory that leads to the same location over NFS, for example.

Some (untested) code along the lines of:

function logartifact(mlf::GitLabMLFlow, run_id::AbstractString, basefilename::AbstractString, data)
    mlflowrun = getrun(mlf, run_id)
    artifact_uri = mlflowrun.info.artifact_uri
    filepath = joinpath(artifact_uri, basefilename)
    try
        HTTP.post(url, push!(copy(mlf.headers), "Content-Type" => "multipart/form-data" ),
            HTTP.Form(Dict(:file => data)))
    catch e
        error("Unable to upload $(filepath): $e")
    end
    filepath
end

Would probably work for this purpose and not introduce new dependencies.
The question remains how to differentiate between local and remote file access, as they happen through different interfaces (the headers are irrelevant for file access for example).

I see three options to approach this and would be interested in making a pull request implementing one of these and testing them against a local mlflow instance and the gitlab end points to confirm they are working (for me).

  1. Use a different subtype of AbstractMLFlow to denote remote instances. (Probably not preferred)
  2. Keep using a String and just have an if checking for prefixes of "https://" and use HTTP then.
  3. Switch to using URIs and use file:// to denote file access and check the prefix in an if statement.

This is simply a question of taste i would leave to (some of) the maintainers.

@pebeto
Copy link
Member

pebeto commented Apr 15, 2024

Hello,

I think the issue comes from the different forms of uploading to a mlflow instance. There is no dedicated REST endpoint to upload a run artifact (and that's awful).

We can take this page as a reference.

@pebeto pebeto added the enhancement New feature or request label Apr 15, 2024
@freemin7
Copy link
Author

If it is credible to assume to upload procedures might even depend on the service in question and not just on the protocol we would have to go the AbstractMLFlow route.

Is there buy-in for that?

Common functionality depending on URI can be part of the library. Although to prevent pulling in extra dependencies for every protocol under the sun having a separate package for uploads might be a better idea if such a package taking URIs doesn't exist yet. (I am not aware of that existing)

@freemin7 freemin7 reopened this Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants