[Feature] Better artifact preview + use well-known file suffixes for uploaded artifacts for a better preview #418
Comments
Hi @idantene, that's a nice idea; we'll add it to the list 🙂
Actually, it does, but it looks at the type of the object it serializes (e.g. numpy, Image, etc.) and then decides on the correct preview. We could probably extend it to uploading a single file (i.e. suffix-based).
Actually, this is exactly what you get if you upload the objects rather than the serialized files (e.g. pass the numpy object instead of the .npz file): `upload_artifact` will pick the best suffix for the object and will also create a preview based on its content. WDYT?
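The object-type dispatch described above can be sketched as a small lookup from the uploaded object's type to a file suffix. The mapping used here (numpy arrays to `.npz`, pandas DataFrames to `.csv.gz`, PIL images to `.png`, dicts to `.json`, anything else pickled to `.pkl`) is an assumption about typical behaviour, and `pick_suffix` is a hypothetical helper, not ClearML's implementation. It matches on module and class names, so it runs without those libraries installed.

```python
from pathlib import Path

# Hypothetical (top-level module, class name) -> suffix table. Matching on
# names avoids importing numpy, pandas, or Pillow just to run this sketch.
SUFFIX_BY_TYPE = {
    ("numpy", "ndarray"): ".npz",
    ("pandas", "DataFrame"): ".csv.gz",
    ("PIL", "Image"): ".png",
}


def pick_suffix(obj) -> str:
    """Pick a file suffix for serializing `obj` (illustrative only)."""
    if isinstance(obj, (str, Path)):
        # A path is uploaded as-is, so keep the suffix it already has.
        return "".join(Path(obj).suffixes)
    if isinstance(obj, dict):
        return ".json"
    # Walk the MRO so subclasses (e.g. PIL's PngImageFile) match too.
    for cls in type(obj).__mro__:
        key = (cls.__module__.split(".")[0], cls.__name__)
        if key in SUFFIX_BY_TYPE:
            return SUFFIX_BY_TYPE[key]
    return ".pkl"  # fall back to pickling
```

Walking the MRO rather than checking only the exact class is what lets a file-specific subclass such as Pillow's `PngImageFile` still resolve to `.png`.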
You are correct. I was pointing out that wherever in your code you create the object itself (anywhere inside the repo), you can always upload it (note that the upload itself happens in a background process).
I'm not sure I agree. When I pass an object, it means the function is responsible for serializing it; when I provide a file, I just want it uploaded (i.e. the content of the file is opaque to the function).
Yes, that's exactly why in most cases the artifact is uploaded to a server (e.g. the ClearML files server, an S3 bucket, etc.); this makes the URL link clickable.
Sorry, I meant passing a folder or a list of files; the `upload_artifact` function then zips them for you (and updates the preview).
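A rough sketch of the folder/list-of-files flow: zip the inputs into a single archive and derive a text preview that lists what went in. `zip_with_preview` and its `max_lines` parameter are hypothetical; this only illustrates the idea and is not how `upload_artifact` is implemented.

```python
import zipfile
from pathlib import Path
from typing import Iterable, Union


def zip_with_preview(paths: Iterable[Union[str, Path]], archive: Union[str, Path],
                     max_lines: int = 20) -> str:
    """Zip files and folders into `archive`; return a text listing as the preview."""
    entries = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in map(Path, paths):
            # Expand a folder into its files; a plain file is stored as-is.
            files = sorted(f for f in p.rglob("*") if f.is_file()) if p.is_dir() else [p]
            for f in files:
                # Keep the folder name in the archive path, e.g. "results/a.txt".
                arcname = f.relative_to(p.parent).as_posix()
                zf.write(f, arcname)
                entries.append(f"{arcname} ({f.stat().st_size} bytes)")
    preview = entries[:max_lines]
    if len(entries) > max_lines:
        preview.append(f"... and {len(entries) - max_lines} more file(s)")
    return "\n".join(preview)
```

The archive should be written outside the folders being zipped; otherwise `rglob` could pick up the partially written archive itself.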
Correct, the preview is limited to text; anything else is the artifact itself, stored on the remote server. This connects with my point on remote servers: it seems that in your case you are storing the artifacts on a shared folder, which means the data is not available in the browser (including serving the images). A more general note: the distinction I'm trying to make is that parsing the content of a file seems too invasive (and might slow things down). We were trying to provide two levels of access: files as a black-box upload, and objects as a more opinionated interface with visibility into the content.
I can see where that assumption is coming from, but we don't always create the object ourselves, and in my specific case we don't have an intermediate object that's compatible with ClearML. Specifically, we use
If the ClearML SDK can serialize an object, surely it can be extended to also attempt deserialization (for preview-specific purposes)? I meant that from the user's point of view, the preview should not be affected by whether you pass an object or a path to a file. Obviously, the serialization itself (and who is responsible for it) is a different question, and I very much agree with you on that. I understand the reasons for making things the way they are, and perhaps the decision to deserialize (for preview purposes) should be left to the user. In our use case, we have our own on-premise hardware and we'd like to offer minimal setup to new team members. Setting up e.g. MinIO to mimic S3 just means more redundant environment variables (secret keys, bucket names, etc.), whereas using our NVMe drives is much simpler. I think a bigger-picture question is: why limit the preview to text only? What if I want to control the serialization but still have a proper preview (e.g. low-level control over the compression level for zipped files, the DPI used for PNGs, etc.)? Why does ClearML limit me in those cases? Would it instead be possible to offer e.g.
But later you mention the debug samples pointing to the
Following on the previous point, if you have the
Let me try to explain: if we need to store an image, for example, someone needs to serve it, and by default that is the files server... I'm sure you see my point :)
Custom previews are fully supported, as long as they are text. You can provide one when uploading the artifact: `task.upload_artifact(name, artifact_object, metadata={'key': 'value'}, preview='lots of stuff\n on the content here')`
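Since the preview must be text, one workaround for already-serialized files is to render a short text preview yourself and pass it through the `preview` argument. The helper `csv_text_preview` below is hypothetical; it uses the standard-library `csv` and `gzip` modules instead of `pandas` so it has no dependencies, and the ClearML call is left as a comment because it needs an initialized task.

```python
import csv
import gzip
from pathlib import Path


def csv_text_preview(path, max_rows: int = 10) -> str:
    """Render the header and first `max_rows` rows of a .csv / .csv.gz file as text."""
    path = Path(path)
    # Decompress transparently when the file is gzipped.
    opener = gzip.open if path.suffix == ".gz" else open
    lines = []
    with opener(path, "rt", newline="") as fh:
        for i, row in enumerate(csv.reader(fh)):
            if i > max_rows:  # row 0 is the header, then max_rows data rows
                break
            lines.append(", ".join(row))
    return "\n".join(lines)


# Usage with ClearML (requires an initialized Task, so it is not run here):
# task.upload_artifact("results", artifact_object=Path("results.csv.gz"),
#                      preview=csv_text_preview("results.csv.gz"))
```

Reading only the first rows keeps the cost of building the preview independent of the file size, in line with the concern about parsing content slowing things down.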
Are you suggesting we add an additional flag to
Maybe I'm missing something in my setup. For the Debug Samples I'm simply using the Our setup (or rather, the setup I'm trying to build for my team) uses local file storage and requires minimal setup steps on their end. We like to use the terminal for simple tasks, and it's extremely convenient for us to have e.g. I still don't understand why the preview is limited to text only.
I've noticed that as well (and the documentation could use some cleaning in this respect, as it mentions
Personally, I'm not a fan of boolean arguments unless absolutely necessary, so I suggest a different method for that with a similar flow:
I don't mind providing a code example, but the question about constraining the preview to text still remains. Adding support for non-textual previews would be incredibly helpful.
Are you referring to `default_output_uri`? This is not caching; it is the upload destination of all artifacts and models. BTW if you set
I'm intrigued! Could you elaborate? What is the use case, and why is the UI (or the Python API) not a good interface?
Let's assume an image is stored as the preview: where is it stored?
Yes, I was referring to that. I may have misconfigured it in that case, but I do prefer to use the local storage over MinIO for the reasons mentioned earlier.
Sure. Most of us are quite tech-savvy and we're more than happy to use the terminal over any UI. It's faster and more efficient for us, especially when it comes to moving files around. It's not impossible, of course, to use the UI or the SDK; it's just slower for us.
I understand how storing and serving an image works, but this still doesn't answer my question. It goes without saying that for our use case this would indeed mean storing the image twice (once on the local storage, as we'd like, and once more on the files server for preview purposes). For this edge case, this can, of course, be made very clear to the user (e.g. by not allowing such duplication by default, while letting it be enabled via the configuration file).
Currently the artifact object itself only contains a text preview, because it is stored with all the rest of the artifact's properties inside the backend DB. If there are enough use cases, we could add a specific field for an additional preview link (for example, pointing to an external image). To summarize the feature request, let's add an additional flag to
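To illustrate the proposed additional preview-link field, here is a sketch of an artifact record that keeps the existing text preview and adds an optional link next to it. `ArtifactRecord`, its field names, and `preview_kind` are hypothetical and do not mirror ClearML's actual backend schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArtifactRecord:
    """Hypothetical subset of the artifact properties stored in the backend DB."""

    name: str
    uri: str
    size_bytes: int
    text_preview: str = ""             # what is stored today: text only
    preview_url: Optional[str] = None  # proposed: link to an externally served preview

    def preview_kind(self) -> str:
        # Prefer the external preview link when one was provided.
        return "link" if self.preview_url else "text"
```

Making the link optional means that existing records, which only carry a text preview, stay valid without migration.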
Yes, the image file extensions. At least
Perhaps this should then be a separate feature request (would you like me to open one?). In any case, the idea would be to avoid the link altogether and instead replace the text-only preview with a smarter text/image/media preview, depending on the artifact being displayed.
Looks like this is really something that bothers you 😄 Which brings me to your second point: the main reason this issue seems to resonate with you is that the artifacts are stored on a local file system. Again, if the artifact is stored on the ClearML files server, which is an HTTP file server, there is no need for a preview: the artifact link is the preview, and the UI just needs to present it (as opposed to showing a link and letting the user click it to actually see the artifact, e.g. the image, in the browser).
Actually, this only happens "naturally" for Debug Samples when they are in a location able to serve them (e.g. the ClearML fileserver). Would it, then, be fair to summarize your suggestion as follows: when an artifact is an image (obviously, this can later be extended to other types the UI can render) AND its location is accessible to the UI (e.g. an HTTP/S URI), the UI should simply show the resource instead of listing its metadata (i.e. location, size, hash, and textual preview)? I think we might want to make that a configurable choice, since for some use cases such default behaviour could be problematic...
Sure 👍🏻 The location of artifacts is not something I brought up, and I don't consider it part of the request 🤷🏻♂️ (IMO it is transparent wrt this feature request).
@idantene The artifacts' location is only relevant to the extent that, as @bmartinn pointed out earlier, the webapp running in a browser cannot automatically access locally stored files, for security reasons. This means that however we take this request forward, it will only have value when artifacts are stored in a file-serving network location (a locally running file server included).
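The rule summarized above (show the resource itself only when it is an image, its location is reachable by the browser over HTTP/S, and the behaviour is enabled in configuration) can be sketched as a small decision function. `render_mode`, the `inline_enabled` switch, and the suffix set are hypothetical names chosen for illustration; they are not part of the ClearML web UI.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Image types a browser can usually display inline; extendable to other media.
INLINE_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}


def render_mode(uri: str, inline_enabled: bool = True) -> str:
    """Return "inline" to show the resource itself, otherwise "metadata"."""
    parsed = urlparse(uri)
    # Only HTTP/S locations can be fetched by a webapp running in the browser.
    served_over_http = parsed.scheme in ("http", "https")
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if inline_enabled and served_over_http and suffix in INLINE_IMAGE_SUFFIXES:
        return "inline"
    return "metadata"
```

A `file://` URI or a bare path on a local drive always falls back to the metadata view, which matches the browser restriction on locally stored files described above.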
It would be great if in `task.upload_artifact`, the ClearML agent could look at the file suffix and identify whether it's one of the supported objects (e.g. suffixes `npz`, `png`, `jpg`, `json`, `csv`, `csv.gz`, etc.). If it is, it could automatically generate a preview. For example, for image files it could load the image with `Im.load(...)`, for `csv[.gz]` it could load it with `pandas` (and limit the number of rows for preview and efficiency), etc. Of course, if that fails, it could still show the same "preview" it has now.
EDIT: Similarly, it would be quite a nice feature if images in the artifacts section could also be presented as-is in the preview. Otherwise, one now has to go to the `Debug Samples` for that, even if those images are not necessarily data (nor are they for debugging purposes).
In other words, why is the preview limited only to text? It seems to me that this can be expanded to:
etc...
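The suffix-based preview proposed in this issue could look roughly like the sketch below: dispatch on the file name, try a format-specific preview, and fall back to a generic one (here, just the file name and size) on any failure. `build_preview` and its helpers are hypothetical. To stay dependency-free, the sketch swaps the libraries mentioned in the issue for standard-library equivalents: `csv`/`gzip` instead of `pandas`, and reading the PNG header with `struct` instead of loading the image. An `.npz` file is a ZIP archive of `.npy` entries, so its array names are listed with `zipfile`.

```python
import csv
import gzip
import json
import struct
import zipfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _preview_csv(path: Path, max_rows: int) -> str:
    # Header plus up to `max_rows` data rows; .csv.gz is decompressed on the fly.
    opener = gzip.open if path.name.lower().endswith(".gz") else open
    lines = []
    with opener(path, "rt", newline="") as fh:
        for i, row in enumerate(csv.reader(fh)):
            if i > max_rows:
                break
            lines.append(", ".join(row))
    return "\n".join(lines)


def _preview_json(path: Path, max_chars: int) -> str:
    # Pretty-print, then truncate so large documents stay cheap to display.
    return json.dumps(json.loads(path.read_text()), indent=2)[:max_chars]


def _preview_npz(path: Path) -> str:
    # An .npz file is a ZIP archive holding one "<name>.npy" entry per array.
    with zipfile.ZipFile(path) as zf:
        names = [n[:-4] if n.endswith(".npy") else n for n in zf.namelist()]
    return "NPZ archive with arrays: " + ", ".join(names)


def _preview_png(path: Path) -> str:
    with open(path, "rb") as fh:
        header = fh.read(24)
    if header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Width and height are big-endian uint32 values in the IHDR chunk (bytes 16..24).
    width, height = struct.unpack(">II", header[16:24])
    return f"PNG image, {width}x{height}"


def build_preview(path, max_rows: int = 10, max_chars: int = 2000) -> str:
    """Return a text preview chosen by file suffix (hypothetical helper)."""
    path = Path(path)
    name = path.name.lower()
    try:
        if name.endswith((".csv", ".csv.gz")):
            return _preview_csv(path, max_rows)
        if name.endswith(".json"):
            return _preview_json(path, max_chars)
        if name.endswith(".npz"):
            return _preview_npz(path)
        if name.endswith(".png"):
            return _preview_png(path)
    except (OSError, ValueError, struct.error, csv.Error, zipfile.BadZipFile):
        pass  # unreadable or mislabeled file: use the generic preview below
    return f"{path.name}: {path.stat().st_size} bytes"
```

Unsupported suffixes such as `.jpg` simply take the generic path in this sketch; catching errors per file keeps a mislabeled artifact from breaking the upload.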