Blob storage #236
Conversation
First round of comments, focusing on code quality
Additionally, as you've said:
We really prefer the minio client library here. Mostly because of consistency, as we use it everywhere else, but also because it's much lighter and works better for us in some use cases. It should be 100% compatible with other S3 implementations; at least we've never had any problems.
It's also quite important, especially for large files. OTOH mwdb usually works with small-to-medium files only, so this may not be a P1 issue right now. I can't find a working Flask example anywhere, but this may help (it uses the FastAPI library instead of Flask, though).
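For reference, a minimal sketch of a streamed read with the minio client; the endpoint, credentials, bucket and object names below are placeholders, not this PR's actual configuration:

```python
from minio import Minio

# Placeholder endpoint and credentials, not this PR's config
client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,
)

# get_object returns a urllib3 HTTPResponse; stream() yields chunks,
# so the whole blob never has to fit in memory at once
response = client.get_object("mwdb-files", "aabbcc...")  # placeholder names
try:
    size = 0
    for chunk in response.stream(32 * 1024):
        size += len(chunk)  # stand-in for real chunk processing
finally:
    response.close()
    response.release_conn()
print(f"streamed {size} bytes")
```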
I'll contest psrok1 here a bit. I think what you did here is OK. We should just add another function called
I'll switch over to MinIO; it looks like I can do streaming responses.
I think we can use Flask's stream_with_context for downloads. It might make sense to look at the upload functionality in the future too, if it doesn't use chunking, but as you said, it's not very often that you're going to be analyzing binaries bigger than a few MB.
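A sketch of what that could look like, assuming a minio `client` like the one above; the route and bucket name are placeholders:

```python
from flask import Flask, Response, stream_with_context

app = Flask(__name__)

@app.route("/download/<sha256>")
def download(sha256):
    # Bucket name is a placeholder; `client` is a minio.Minio instance
    obj = client.get_object("mwdb-files", sha256)

    def generate():
        try:
            # Forward fixed-size chunks from MinIO to the HTTP response
            yield from obj.stream(32 * 1024)
        finally:
            obj.close()
            obj.release_conn()

    # stream_with_context keeps the request context alive while the
    # response body is generated chunk by chunk
    return Response(stream_with_context(generate()),
                    mimetype="application/octet-stream")
```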
I honestly wasn't a fan of the presumption that everything was going to be on disk. I think it makes the most sense to expose an API that lets you get the binary in memory (avoiding writes to disk for operations that could be done in memory); if you need it on disk, you can read the file and write it to disk yourself. That said, if it makes more sense from a usage perspective to expose an API that writes it to disk, I'm happy to do that too. I don't want to spend too much time working on a fancy API for different file providers, but I'm happy to implement something basic so that plugins can continue to work.
I got upload working, but I seem to be having some issues with the download: I just seem to be getting empty files :/ Any ideas?
I got streaming to work! I've implemented some methods on the file model to allow reading of files regardless of blob provider. It's a little bit opinionated in that it doesn't expose a way to write a file to disk locally, but I figure if you need that, you can just read the file and write it to disk yourself. For the drakvuf plugin you'd be able to do this:

```diff
--- a	2020-11-18 12:14:34.000000000 +0000
+++ b	2020-11-18 12:14:50.000000000 +0000
@@ -1,8 +1,6 @@
-    # Get contents path from "uploads" directory
-    contents_path = file.get_path()
     # Send request to Drakvuf Sandbox
     req = requests.post(f"{config.drakvuf.drakvuf_url}/upload", files={
-        "file": (file.sha256 + ".exe", open(contents_path, "rb")),
+        "file": (file.sha256 + ".exe", file),
     }, data={
         "timeout": config.drakvuf.timeout
     })
```
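The file-model methods themselves aren't shown in this thread; a hypothetical sketch of what provider-agnostic reading could look like (the method names, `s3_client`, and the `bucket_name` setting are all assumptions, not the PR's actual API):

```python
# Illustrative only: method names, s3_client and bucket_name are
# assumptions, not the API actually merged in this PR.
class File:
    def open(self):
        """Returns a file-like object with the file contents,
        regardless of the configured blob provider."""
        if app_config.mwdb.storage_provider == StorageProviderType.DISK:
            return open(self._calculate_path(), "rb")
        # minio's get_object returns a file-like urllib3 response
        return s3_client.get_object(app_config.mwdb.bucket_name, self.sha256)

    def read(self):
        """Reads the whole file into memory."""
        fh = self.open()
        try:
            return fh.read()
        finally:
            fh.close()
```

An object with a `read()` method like this is also what lets the diff above pass `file` directly to `requests.post` instead of an open file handle.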
Thank you for all your work so far! Almost there! But we really need an alternative to `get_path`. Can you add a method that returns a local path if the file is stored locally, or downloads the file to a temporary location if the file is stored on S3? I've written some code that shows what I mean; maybe it'll be useful. It uses contextlib to create a simple context manager:

```python
import tempfile
from contextlib import contextmanager

@contextmanager
def get_local_path(self):
    """Gets a path to a file stored locally, or downloads it to a
    temporary file for remote resources"""
    if app_config.mwdb.storage_provider == StorageProviderType.DISK:
        yield self._calculate_path()
    else:
        # 1. create a temporary file
        # 2. download file from minio to this temporary file
        # 3. yield path to this file
        # 4. delete the file
        #
        # for example (using the tempfile library):
        with tempfile.NamedTemporaryFile() as tf:
            # TODO: download file from minio to temporary file `tf` here
            # (https://docs.python.org/3/library/tempfile.html#examples)
            yield tf.name


def main():
    # assuming `file` is a file-model instance with this method
    with file.get_local_path() as lp:
        print("ok, local path exists", lp)
    print("ok, temporary file was deleted")
```
LGTM
Your checklist for this pull request
What is the current behaviour?
Only local disk storage is supported.
What is the new behaviour?
The storage layer now supports S3-compatible blob storage (Amazon S3, MinIO, etc.)
Test plan
Set up MinIO or Amazon S3, and set:
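The exact configuration values were lost from this description; purely as an illustration, based on the `storage_provider` option referenced earlier in the thread (all key names here are hypothetical):

```ini
[mwdb]
; hypothetical key names, shown only to illustrate the shape of the config
storage_provider = s3        ; "disk" was the only previous option
s3_endpoint = localhost:9000
s3_access_key = minioadmin
s3_secret_key = minioadmin
s3_bucket = mwdb-files
```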
Closing issues
closes #235
So I wrote this before reading #235 (oops...), so there are a few changes I need to make: