large virtual memory spike on BLOB column select #6462

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

Date: 2017-11-09 16:40:36 +0100
From: Anton Kravchenko <<kravchenko.anton86>>
To: SQL devs <>
Version: 11.27.9 (Jul2017-SP2)
CC: kravchenko.anton86, @njnes

Last updated: 2017-12-14 14:46:07 +0100

Comment 25859

Date: 2017-11-09 16:40:36 +0100
From: Anton Kravchenko <<kravchenko.anton86>>

User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
Build Identifier:

  1. 'select v_blob from blob_table' produces a virtual memory spike of ~800GB
     [using a 'limit' reproduces the virtual memory spike too].
     From 'select * from storage()' it looks like the blob column size is ~25GB.

  2. 'create table new_blob_table as (select v_blob from blob_table) with data;'
     consumes only ~50GB of virtual memory.

  3. For a table with a non-blob column, both
     'select * from nonblob_table'
     and
     'create table new_nonblobtable as (select * from nonblob_table) with data;'
     consume roughly the same virtual memory.
Reproducible: Always

Steps to Reproduce:

  1. Generate the data using Python:

     hex_blob = '47' * 2000
     hex_blob_chunk = (hex_blob + '\n') * 1000

     f = open('blob_data.txt', 'w')
     for irow in range(10000):
         f.write(hex_blob_chunk)
     f.close()

  2. Load the data into the MonetDB blob_table:

     create table blob_table(v_blob blob);
     COPY 10000000 RECORDS INTO blob_table FROM '/home/blob_data.txt' using
     delimiters ',','\n' NULL AS '' LOCKED;

  3. Produce the virtual memory spike of ~800GB:

     select * from blob_table limit 50000;
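As a sanity check on the numbers above, the generator writes 10,000 chunks of 1,000 rows, each row a 4,000-character hex string that decodes to a 2,000-byte blob. A minimal sketch of the arithmetic (approximate; it ignores per-value storage overhead):

```python
# Back-of-the-envelope check of the data sizes in the reproduction steps.
hex_chars_per_blob = 2 * 2000              # '47' * 2000 -> 4000 hex characters
bytes_per_blob = hex_chars_per_blob // 2   # each hex pair decodes to one byte
rows_per_chunk = 1000
chunks = 10000
total_rows = rows_per_chunk * chunks       # matches COPY 10000000 RECORDS

text_file_bytes = total_rows * (hex_chars_per_blob + 1)  # +1 for the newline
stored_blob_bytes = total_rows * bytes_per_blob

print(f"rows:           {total_rows:,}")
print(f"text file size: ~{text_file_bytes / 1e9:.0f} GB")
print(f"raw blob bytes: ~{stored_blob_bytes / 1e9:.0f} GB")
```

The raw blob data comes to ~20GB, which is consistent with the ~25GB column size reported by 'select * from storage()' once storage overhead is included.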

Actual Results:

virtual memory spike of ~800GB

Expected Results:

virtual memory of ~25GB [MonetDB storage size of the blob column]

MonetDB 5 server v11.27.9 "Jul2017-SP2" (64-bit, 128-bit integers)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2017 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 503.6GiB available memory, 32 available cpu cores
Libraries:
libpcre: 8.39 2016-06-14 (compiled with 8.32)
openssl: OpenSSL 1.0.2k 26 Jan 2017 (compiled with OpenSSL 1.0.2l 25 May 2017)
libxml2: 2.9.1 (compiled with 2.9.1)
Compiled by: akravchenko@cent7-1 (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -D_FORTIFY_SOURCE=2
Linking : /usr/bin/ld -m elf_x86_64

Comment 25872

Date: 2017-11-12 16:25:57 +0100
From: @njnes

The spike is caused by concurrent worker threads. They all make a slice of the inputs; combined with aggressive heap-copying, this results in gdk_nr_threads * 25GB (on my test system 200GB, with 8 threads). You have 32 cores, leading to even more copies.
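The multiplication described above can be sketched numerically (a rough model, not MonetDB code; it assumes each worker materializes its own copy of the ~25GB heap):

```python
# Rough model of the VM spike: if each of N worker threads copies the
# column's ~25 GB heap while slicing, peak virtual memory scales with N.
column_gb = 25

for n_threads in (8, 32):
    peak_gb = n_threads * column_gb
    print(f"{n_threads} threads -> ~{peak_gb} GB virtual memory")
# 8 threads  -> ~200 GB (matches the commenter's test system)
# 32 threads -> ~800 GB (matches the reporter's observed spike)
```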

Comment 25886

Date: 2017-11-16 10:11:32 +0100
From: MonetDB Mercurial Repository <>

Changeset 230412789aec made by Sjoerd Mullender <sjoerd@acm.org> in the MonetDB repo, refers to this bug.

For complete details, see https://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=230412789aec

Changeset description:

Try much harder to share vheaps of transient bats.
This fixes bug #6462, or at least the part about the (virtual) memory
consumption.
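The idea of sharing a heap instead of copying it can be illustrated with a Python analogy (hypothetical, not MonetDB internals): slicing a bytearray materializes a copy proportional to the slice size, while a memoryview slice merely references the original buffer.

```python
import sys

buf = bytearray(10_000_000)     # stand-in for a large heap

# Copying slice: allocates new memory proportional to the slice size.
copied = buf[:5_000_000]

# Shared slice: a memoryview references the original buffer without copying.
shared = memoryview(buf)[:5_000_000]

print(sys.getsizeof(copied))    # ~5 MB: the slice was materialized
print(sys.getsizeof(shared))    # tiny: only a small view object was created
```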

Comment 25888

Date: 2017-11-16 10:35:39 +0100
From: @sjoerdmullender

The query now runs very fast and does not produce a spike in VM use.

For the slicing that also shouldn't happen, see bug #6470.
