Clean-up /tmp directory #15

Open
sl-solution opened this issue May 25, 2023 · 13 comments

@sl-solution

I noticed that on-disk solutions may create large temporary files during their runs but may not clean them up afterward (e.g. polars creates .ipc files). This may cause an undefined exception error in other solutions when they run within the same session.

@hkpeaks

hkpeaks commented May 30, 2023

Today I ran a benchmark for DuckDB: https://youtu.be/zVR77B2bDR0
The temp files should be cleaned up after the process completes.

Tmonster self-assigned this May 31, 2023
@Tmonster
Collaborator

Can you provide reproducible steps for when an undefined exception is caused by a temporary file from a different solution (in the same session)?

@sl-solution
Author

> Can you provide reproducible steps for when an undefined exception is caused by a temporary file from a different solution (in the same session)?

I found this when I was using the _utils/repro.sh script to reproduce results for smaller data sets on a computer with limited disk space. I noted that after some point all solutions failed to produce any result, and with a little investigation I figured out that the hard drive was full (due to temporary files created during the benchmark run). I would imagine that for large data sets the /tmp directory would be bloated by huge files.

@jangorecki

I can confirm that disk space was never a concern and scripts generally won't be handling this kind of exception.

@Tmonster
Collaborator

Tmonster commented Jun 7, 2023

I actually noticed this issue too when getting the benchmark back up and running. However, I never had the issue where another solution encountered an undefined exception.

@Tmonster
Collaborator

Tmonster commented Jun 7, 2023

@sl-solution If you still believe this would be a problem, feel free to open a PR to automatically clean the /tmp directory after every run.

@sl-solution
Author

> @sl-solution If you still believe this would be a problem, feel free to open a PR to automatically clean the /tmp directory after every run.

In Juliads I made sure it is done automatically. However, I am not sure deleting everything from /tmp is a good idea, since some of the files may be essential for other system processes.

@Tmonster
Collaborator

Tmonster commented Jun 7, 2023

I wouldn't delete everything from /tmp of course, but for R solutions it would be everything in tempdir(). Potentially all R solutions could use the same location for tempdir(), and then it could be cleaned up when the benchmarking ends.

@sl-solution
Author

I guess for polars it should be straightforward, since it uses an absolute path and a constant name for temporary files.

@hkpeaks

hkpeaks commented Jun 7, 2023

I think sorting a billion rows requires the use of temporary files. I have implemented billion-row jointable/filter/groupby using only 32GB of RAM, and I have verified that it needs no temp files.

@sl-solution
Author

I think a systematic way to solve the issue is to assign a directory for temporary files, and ask every solution to use solely the assigned directory for on-disk calculations. The launcher can clean the directory after each run.
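
A minimal sketch of what such a launcher step could look like, assuming each solution honors the TMPDIR environment variable (Python's tempfile and R's tempdir() do; other solutions may need their own setting). The name `run_with_scratch_dir` and the `bench-tmp-` prefix are hypothetical and not part of the benchmark's existing scripts:

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_with_scratch_dir(cmd, parent=None):
    """Run one solution with a private temp directory, then delete it.

    Assumption: the solution honors the TMPDIR environment variable.
    Returns the exit code and the (now removed) scratch path.
    """
    # Create a fresh directory under `parent` (or the system default).
    scratch = tempfile.mkdtemp(prefix="bench-tmp-", dir=parent)
    env = dict(os.environ, TMPDIR=scratch)
    try:
        returncode = subprocess.run(cmd, env=env).returncode
    finally:
        # Remove only the directory this launcher created, never /tmp itself.
        shutil.rmtree(scratch, ignore_errors=True)
    return returncode, scratch


if __name__ == "__main__":
    # Hypothetical stand-in for a solution: leaves one temp file behind.
    child = "import tempfile; tempfile.NamedTemporaryFile(delete=False).write(b'x')"
    code, scratch = run_with_scratch_dir([sys.executable, "-c", child])
    print(code, os.path.exists(scratch))
```

Because cleanup happens in a `finally` block, the scratch directory is removed even if the solution crashes, and files that other system processes keep in /tmp are left untouched.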

@Tmonster
Collaborator

Tmonster commented Nov 9, 2023

Since the new machine has more memory, and instance storage, this has become less of an issue. Can this therefore be closed?

@sl-solution
Author

> Since the new machine has more memory, and instance storage, this has become less of an issue. Can this therefore be closed?

I guess as long as solutions keep using temp files, this will be an issue.

4 participants