
Long startup cost for simple session #3713

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments

Date: 2015-04-22 21:41:14 +0200
From: @mlkersten
To: SQL devs <>
Version: 11.17.13 (Jan2014-SP1)
CC: @njnes

Last updated: 2015-06-22 22:32:19 +0200

Comment 20831

Date: 2015-04-22 21:41:14 +0200
From: @mlkersten

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) Gecko/20100101 Firefox/37.0
Build Identifier:

Consider the TPCH sf1 script below, where sf1.load contains the relevant COPYs.

Consistently, we need to wait for 9 seconds on a trivial load:
COPY 5 RECORDS INTO region from '/export/scratch2/mk/tpch/SF-1/data/region.tbl' USING DELIMITERS '|', '|\n' LOCKED;
5 affected rows (9.5s)

compared to the rest of the load:
25 affected rows (695.376ms)
10000 affected rows (1.0s)
150000 affected rows (2.1s)
800000 affected rows (5.1s)
1500000 affected rows (5.4s)
6001215 affected rows (23.4s)

The subsequent alter script runs fine, including its first instruction:
operation successful (6.341ms)

If I place a sleep of 15 seconds after tpch_schema.sql, I get the same results.

If I explicitly stop the server and let it be restarted
by the load script, I don't see the delay.

It seems that the tpch_schema.sql transaction does not propagate its catalog updates immediately, causing this unexpected delay. (What triggers the wait?)

Reproducible: Always

Steps to Reproduce:

monetdb stop sf1
monetdb destroy sf1 -f
monetdb create sf1
monetdb release sf1
time mclient -d sf1 -ei tpch_schema.sql
time mclient -d sf1 -ei sf1.load
time mclient -d sf1 -ei tpch_alter.sql

Comment 20832

Date: 2015-04-22 22:17:38 +0200
From: @sjoerdmullender

I can reproduce the problem.

However, if I do the load UNLOCKED (i.e. not LOCKED), the problem disappears.

From looking at strace output I get the distinct impression that it has something to do with committing data. When using LOCKED, after the schema initialization I see that the first data file is loaded, but before that query returns, there are a lot of BATs that get written and fdatasync()ed. All in all that takes a long time. Without LOCKED I don't see the long list of BATs getting synchronized to disk.
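The effect described above can be illustrated with a standalone sketch (plain Python, not MonetDB code): forcing each of many small files to stable storage individually, as the long list of fdatasync()ed BATs suggests happens on the LOCKED path, is far more expensive than merely writing them and letting the OS flush later. The file count and sizes here are arbitrary illustration values.

```python
# Standalone sketch (not MonetDB code): why a burst of per-file
# fsync()/fdatasync() calls can dominate an otherwise tiny load.
import os
import tempfile
import time

def write_files(n, size, sync):
    """Write n small files; optionally fsync each one to disk."""
    with tempfile.TemporaryDirectory() as d:
        start = time.perf_counter()
        for i in range(n):
            path = os.path.join(d, f"bat_{i}")
            with open(path, "wb") as f:
                f.write(b"x" * size)
                f.flush()
                if sync:
                    # Force this file's data to stable storage before
                    # moving on, analogous to fdatasync() per BAT.
                    os.fsync(f.fileno())
        return time.perf_counter() - start

synced = write_files(200, 1024, sync=True)
unsynced = write_files(200, 1024, sync=False)
print(f"with fsync: {synced:.3f}s, without: {unsynced:.3f}s")
```

On a typical disk the synced run is dramatically slower, even though the amount of data written is identical, which matches the observation that the 5-row COPY itself is cheap and the time goes into synchronizing BATs.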

Here are the timings of my CREATE script immediately followed by loading with LOCKED:
create schema
operation successful (20.326ms)
operation successful (16.295ms)
operation successful (15.564ms)
operation successful (16.778ms)
operation successful (15.954ms)
operation successful (24.331ms)
operation successful (16.029ms)
operation successful (24.783ms)
load data
5 affected rows (13195.189ms)
25 affected rows (1165.221ms)
10000 affected rows (1542.335ms)
150000 affected rows (2803.931ms)
200000 affected rows (2359.778ms)
800000 affected rows (4972.089ms)
1500000 affected rows (5651.299ms)
6001215 affected rows (19125.609ms)

And here without LOCKED:
create schema
operation successful (19.183ms)
operation successful (16.115ms)
operation successful (15.683ms)
operation successful (16.322ms)
operation successful (16.175ms)
operation successful (16.586ms)
operation successful (16.048ms)
operation successful (24.764ms)
load data
5 affected rows (163.523ms)
25 affected rows (191.679ms)
10000 affected rows (421.709ms)
150000 affected rows (1959.259ms)
200000 affected rows (1585.701ms)
800000 affected rows (4225.754ms)
1500000 affected rows (5308.068ms)
6001215 affected rows (16584.857ms)
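Tallying the two runs above directly (numbers copied from the timings in this comment) shows that almost the entire LOCKED overhead is concentrated on the first, 5-row COPY:

```python
# Per-COPY overhead of the LOCKED run vs. the run without LOCKED,
# using the millisecond timings reported above, in the same order.
locked   = [13195.189, 1165.221, 1542.335, 2803.931,
            2359.778, 4972.089, 5651.299, 19125.609]
unlocked = [163.523, 191.679, 421.709, 1959.259,
            1585.701, 4225.754, 5308.068, 16584.857]

deltas = [l - u for l, u in zip(locked, unlocked)]
print(f"first COPY overhead: {deltas[0]:.1f} ms")
print(f"total overhead:      {sum(deltas):.1f} ms")
```

Roughly two thirds of the total extra time sits on the trivial region load, consistent with a one-off synchronization cost being paid on the first LOCKED COPY after the schema transaction.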

Comment 20844

Date: 2015-04-24 13:41:11 +0200
From: @njnes

Changes to the logger caused the extra delay. BUNfnd(.., neg-cand list) may help. Also, a vacuum of the catalog table is needed.

Comment 20845

Date: 2015-04-24 15:21:34 +0200
From: @njnes

We now use a hash (again) instead of a nested loop (the deleted list is still inefficient).
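The switch described in this fix can be sketched in isolation (illustrative semantics only, not the actual GDK code): filtering candidate rows against a "deleted" list with a nested-loop scan costs O(n*m), while building a hash over the deleted list first makes each membership test O(1). The row values here are made up for illustration.

```python
# Illustrative sketch of the fix: hash lookup vs. nested-loop scan
# for testing candidates against a deleted-rows list.
deleted = list(range(0, 100000, 7))  # hypothetical rows marked deleted

def live_nested(cands, dels):
    """Nested loop: scans the whole deleted list per candidate."""
    return [c for c in cands if c not in dels]

def live_hashed(cands, dels):
    """Hash: build a set once, then O(1) membership tests."""
    dset = set(dels)
    return [c for c in cands if c not in dset]

# Both produce the same live rows; only the cost differs.
assert live_nested(range(100), deleted) == live_hashed(range(100), deleted)
```

With a large deleted list the hashed variant is orders of magnitude faster, which is why reverting to a hash removes the startup delay.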
