Temporary working folders are left behind on Middle Managers after tasks complete #12332

Open
sergioferragut opened this issue Mar 15, 2022 · 8 comments

@sergioferragut
Contributor

Affected Version

Apache Druid 0.22.1

Description

This problem was originally reported here: https://www.druidforum.org/t/temp-folder-size-was-increasing-due-to-that-peons-processing-taking-more-time-how-to-clear-temp-folder-automatically/7139

I was able to reproduce it on a small minikube deployment by running the vanilla wikipedia index_parallel ingestion a few times, each with a different target datasource name, and confirmed that the temporary folders for the tasks are not removed after the jobs complete. After 3 runs, the ~/var/tmp folder still contained three empty folders:

~/var/tmp $ ls -l
total 12
drwx------    2 druid    druid         4096 Mar 14 23:39 druid-realtime-persist1040350100896362009
drwx------    2 druid    druid         4096 Mar 14 23:32 druid-realtime-persist668375622911252079
drwx------    2 druid    druid         4096 Mar 14 23:34 druid-realtime-persist944793843865837077
~/var/tmp $ ls -l druid-realtime-persist944793843865837077
total 0
~/var/tmp $ ls -l druid-realtime-persist668375622911252079
total 0
~/var/tmp $ ls -l druid-realtime-persist1040350100896362009
total 0

The original report on Druid Forum spoke of thousands of such folders left behind.
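
For reference, reproducing this boils down to something like the following sketch. The endpoint, port, and spec path are the standard quickstart defaults, not taken from the original report, so adjust them to your deployment; the dataSource in the spec is changed between runs.

# submit the vanilla wikipedia (index_parallel) ingestion task to the task API
# (edit "dataSource" in the spec between runs to get distinct datasources)
curl -X POST -H 'Content-Type: application/json' \
  -d @quickstart/tutorial/wikipedia-index.json \
  http://localhost:8888/druid/indexer/v1/task

# once the task succeeds, the peon's persist dir is still present on the Middle Manager
ls -l ~/var/tmp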

@lejinghu

We saw this too in our clusters. Timed-out queries also leave tmp folders behind.
As a workaround we are cleaning them up manually with cron jobs.
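
For anyone else needing the same stopgap, a minimal sketch of such a cron job is below. The path matches the listing above; the hourly schedule and the 24-hour age threshold are assumptions to tune for your cluster.

# crontab entry for the druid user on each Middle Manager:
# every hour, remove stale persist dirs not modified in the last 24 hours
0 * * * * find ~/var/tmp -maxdepth 1 -type d -name 'druid-realtime-persist*' -mmin +1440 -exec rm -rf {} +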

github-actions bot commented Dec 11, 2023

This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 11, 2023

github-actions bot commented Jan 9, 2024

This issue has been closed due to lack of activity. If you think that
is incorrect, or the issue requires additional review, you can revive the issue at
any time.

@github-actions github-actions bot closed this as not planned Jan 9, 2024
@asdf2014
Member

I recommend reopening this issue, as I've also encountered this problem. It can lead to ingestion task failures if the disk fills up, and it is a significant concern since it amounts to a resource leak 😅

@asdf2014
Member

Hi @sergioferragut , have you had a chance to check the ~/var/druid/task/ dir? I found many outdated single_phase_sub_task_xxx directories containing druid-input-entity-xxx.tmp files, which is worse than the empty tmp folders..
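
For anyone measuring the impact, a quick check along these lines shows what is left behind. The var/druid/task path matches the comment above and is controlled by druid.indexer.task.baseTaskDir in the Middle Manager config, so adjust it for your setup.

# total size held by leftover sub-task working dirs on a Middle Manager
du -sh ~/var/druid/task/single_phase_sub_task_* 2>/dev/null

# count the orphaned temporary input files inside them
find ~/var/druid/task -name 'druid-input-entity-*.tmp' 2>/dev/null | wc -l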

@asdf2014
Member

@abhishekagarwal87 Do you have any ideas on this one? 😄

@abhishekagarwal87
Contributor

What version are you on? I don't see such folders on my local box. Can you post the ingestion spec that you are running?

@asdf2014
Member

Hi @abhishekagarwal87 , we were on the same version that @sergioferragut mentioned in this issue. Yes, this is indeed a very low probability event. Now that we are using the MoK mode with the latest version of Druid, this issue no longer affects us 😅
