Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make import/export uniform for all storage plugins #2640

Open
markus2330 opened this issue Apr 22, 2019 · 8 comments

Comments

Projects
None yet
4 participants
@markus2330
Copy link
Contributor

commented Apr 22, 2019

Problem: Currently some plugins implement workarounds (see #2419) others do not support all forms of serialization (e.g. those who use fseek with a pos != 0, e.g. csvstorage). Furthermore, operating systems not having /dev/stdin or /dev/stdout are not fully supported.

Solution: The import/export framework should provide a temporary file where storage plugins write to (as they would do normally). The content of this file is copied from/to stdin/stdout.

@kodebach

This comment has been minimized.

Copy link
Contributor

commented Apr 22, 2019

We should let the plugin decide whether it needs the temporary file or not (e.g. via a status flag in the README). Otherwise IPC the way it is done in specload for example will be slowed down significantly, because the processes can no longer read/write in parallel.

We should also avoid creating the temporary file as far as possible. If there is anyway to find the regular file behind stdin/stdout we should use that, as the time spent copying to/from a temporary file might be significant for larger keysets.

@mpranj

This comment has been minimized.

Copy link
Member

commented Apr 22, 2019

I don't think there is a way to find out where something was piped in from. What we can do, is simply use the file path instead of piping it in.

Also, you are definitely right about the slowdown with the temp file. I simply had to use a temp file, because mmap (POSIX) can only work with regular files.

@kodebach

This comment has been minimized.

Copy link
Contributor

commented Apr 22, 2019

I simply had to use a temp file

Yes, I know that. The issue description just made it sound like a temp file should always be used l, which would be a bad solution IMO.

@mpranj

This comment has been minimized.

Copy link
Member

commented Apr 22, 2019

sound like a temp file should always be used l, which would be a bad solution IMO.

I fully agree. Many other plugins work well with non-regular files. They must not be slowed down by this.

@markus2330

This comment has been minimized.

Copy link
Contributor Author

commented Apr 22, 2019

I was considering to write an implementation hint that the temporary file should be avoided (if possible via infos/status) or that sendfile should be used. But all of this would be a premature optimization.

Of course it is very valuable if you share your experience here. Do you already know the difference in run-time for running all tests cases if all the content of import/export is copied?

@mpranj

This comment has been minimized.

Copy link
Member

commented Apr 22, 2019

No I have neither implemented nor benchmarked this. Writing to disk will be slower by orders of magnitude by definition for some years to come, until persistent memory with DRAM speed is standard.

@markus2330

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2019

Writing to disk will be slower by orders of magnitude by definition for some years to come

This is not so clear as file systems can basically avoid accessing the disk at all as long as fsync is not called. Furthermore temporary files can also be in a tmpfs.

So I would prefer that we have a look where the actual bottleneck with import/export is before we start optimizing.

@kodebach

This comment has been minimized.

Copy link
Contributor

commented Apr 23, 2019

No file system optimization will come close to writing to and reading from a pipe in parallel, like it can be done now in specload. This is simply the case because the temporary file forces a new step (copy from temporary to actual and then to another temporary) in between write and read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.