MonetDB not cleaning intermediate results which leads to filling up disk space and ultimately server crash #3825

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2015-10-17 01:54:13 +0200
From: Alex <<abraun_75>>
To: SQL devs <>
Version: 11.21.5 (Jul2015)
CC: @njnes

Last updated: 2015-11-03 10:18:35 +0100

Comment 21347

Date: 2015-10-17 01:54:13 +0200
From: Alex <<abraun_75>>

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
Build Identifier:

MonetDB stores intermediate results in the bat directory and its subdirectories.

Every time a query is executed, a new .tail file is created, even if the same query is run again.

These .tail files are deleted only after the server (dbfarm) is restarted. Closing the connection, result set, and statement does not remove the files.

Eventually this leads to insufficient disk space and the server crashes.

I cannot share the data I was using, but I believe this behavior can be reproduced by generating a random dataset. The data I was using consists of two tables. Table A stores 10 million rows (10 columns) and Table B stores 1 million rows (3 columns). Column 1 of Table A references Column 1 of Table B.
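Since the original data cannot be shared, a comparable dataset could be generated along these lines. This is only a sketch: the column names `INT_ID`, `VALUE_DECIMAL` and `STRING_VALUE` are taken from the query in the steps to reproduce, while the function names, the value ranges, the number of distinct group labels, and the omission of most filler columns are assumptions made for illustration.

```python
import csv
import random


def write_table_b(path, n_rows, n_groups=100, seed=0):
    """Write n_rows rows of (INT_ID, STRING_VALUE, FILLER) to a CSV file."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for int_id in range(1, n_rows + 1):
            # STRING_VALUE is the GROUP BY key; the number of distinct
            # labels (n_groups) is an assumption, not taken from the report.
            label = f"group_{rng.randrange(n_groups)}"
            writer.writerow([int_id, label, rng.randrange(1000)])


def write_table_a(path, n_rows, n_b_rows, seed=1):
    """Write n_rows rows of (INT_ID, VALUE_DECIMAL); INT_ID references table B."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for _ in range(n_rows):
            int_id = rng.randint(1, n_b_rows)  # always an existing table B id
            value = rng.uniform(0, 10000)      # assumed value range
            writer.writerow([int_id, f"{value:.2f}"])
```

Calling `write_table_b("table_b.csv", 1_000_000)` and `write_table_a("table_a.csv", 10_000_000, 1_000_000)` would give the row counts mentioned above; the resulting files could then be bulk loaded, for example with `COPY INTO`.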

The two tables are joined on Column 1, and some statistical values are calculated (QUANTILE, AVG, SUM, using GROUP BY).

I tried to execute the same SQL query 1000 times sequentially, and after the 60th iteration MonetDB crashed because of insufficient disk space.

I sent a request to the user mailing list and got the response that other users faced the same issue and are forced to restart the server in a nightly maintenance window.

I think this is a bug and the intermediate files should be deleted after the query has been executed.

Reproducible: Always

Steps to Reproduce:

  1. Ingest a medium-sized data set with at least two tables (e.g. 10 million rows and 1 million rows)
  2. Create a SQL statement like this pseudocode:
    SELECT
    COUNT(*) AS TOTAL, SUM(VALUE_DECIMAL) AS VALUE, STRING_VALUE,
    MIN(VALUE_DECIMAL) as min,
    QUANTILE (VALUE_DECIMAL,0.25) AS Q25,
    QUANTILE (VALUE_DECIMAL,0.5) AS Q50,
    QUANTILE (VALUE_DECIMAL,0.75) AS Q75,
    MAX(VALUE_DECIMAL) as max
    FROM TABLE_A AS a
    JOIN TABLE_B as b on (b.INT_ID = a.INT_ID)
    GROUP BY STRING_VALUE
  3. Execute the SQL statement several times
  4. Monitor the disk usage after the execution

You should see that the used disk space increases with each iteration. Every iteration creates a new .tail file.

  1. Restart the dbFarm
  2. Check the disk usage
  3. Now you should see that the intermediate files were deleted
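
The monitoring part of the steps above could be scripted roughly as follows. This is a sketch under assumptions: `run_query` is a hypothetical callable that executes the statement once through whatever client is in use, and `dbfarm_dir` is the path to the database directory; neither name comes from the report. The helper only counts `.tail` files and sums their sizes after each iteration.

```python
import os


def tail_usage(dbfarm_dir):
    """Return (file_count, total_bytes) for all .tail files below dbfarm_dir."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(dbfarm_dir):
        for name in files:
            if name.endswith(".tail"):
                try:
                    total += os.path.getsize(os.path.join(root, name))
                    count += 1
                except OSError:
                    # The server may remove a file between listing and stat.
                    pass
    return count, total


def measure_growth(run_query, dbfarm_dir, iterations):
    """Run the query repeatedly and record .tail usage after each run.

    run_query is a hypothetical zero-argument callable supplied by the caller.
    """
    history = [tail_usage(dbfarm_dir)]  # baseline before the first run
    for _ in range(iterations):
        run_query()
        history.append(tail_usage(dbfarm_dir))
    return history
```

If the leak is present, both numbers in the returned history should keep growing from one iteration to the next and drop back only after the database has been restarted.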

Actual Results:

A new .tail file is created with each iteration, increasing disk usage until the server crashes

Expected Results:

.tail files are deleted after the query has been executed

Comment 21349

Date: 2015-10-17 11:16:32 +0200
From: MonetDB Mercurial Repository <>

Changeset a2d0aed144f5 made by Niels Nes niels@cwi.nl in the MonetDB repo, refers to this bug.

For complete details, see http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=a2d0aed144f5

Changeset description:

unfix quantile bat fixes Bug #3825

Comment 21350

Date: 2015-10-17 11:22:58 +0200
From: @njnes

the quantile function leaked bats.

Comment 21448

Date: 2015-11-03 10:18:35 +0100
From: @sjoerdmullender

Jul2015 SP1 has been released.
