Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monetdb fails on remote tables #6227

Closed
monetdb-team opened this issue Nov 30, 2020 · 0 comments
Closed

Monetdb fails on remote tables #6227

monetdb-team opened this issue Nov 30, 2020 · 0 comments

Comments

@monetdb-team
Copy link

@monetdb-team monetdb-team commented Nov 30, 2020

Date: 2017-02-23 17:12:12 +0100
From: Manuel <>
To: SQL devs <>
Version: 11.23.13 (Jun2016-SP2)
CC: @njnes

Last updated: 2017-03-31 13:28:34 +0200

Comment 25055

Date: 2017-02-23 17:12:12 +0100
From: Manuel <>

User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Build Identifier:

Good afternoon,

I have been trying using distributed query processing monetdb.

This is my setup:

  • two hosts 'A' and 'B' running monetdb. 'A' and 'B' have the same amount of resources (cpus, memory, storage ....)

On host 'A' I have a setup:
- table 'T1' containing 10 million records.
On host 'B' I have setup:
- table 'T2' containing another set of 10 million records
- remote table 'T1' mapped to T1 on host 'A'
- table 'T3' a table containing the records of 'T1' and 'T2' (as a local table: this contains 20 million records),
- merge table 'T3_remote' as the merge table of 'T2' (local table) and 'T1' (remote table).
'T3' and 'T3_remote' will effectively contain the same 20 million records, one with all records saved locally, the other as a merge table of a local table and a remote table.

All tables have the same schema:

(
"customer_gender" CHAR(1),
"customer_region" VARCHAR(15) NOT NULL,
"customer_country" VARCHAR(40) NOT NULL,
"customer_state" VARCHAR(40),
"customer_city" VARCHAR(50) NOT NULL,
"customer_date_of_birth" DATE,
"customer_id" INTEGER NOT NULL,
"product_family" VARCHAR(20) NOT NULL,
"product_category" VARCHAR(30) NOT NULL,
"product_subcategory" VARCHAR(30) NOT NULL,
"product_name" VARCHAR(50) NOT NULL,
"product_id" INTEGER NOT NULL,
"store_id" INTEGER NOT NULL,
"store_longitude" DECIMAL(12,8) NOT NULL,
"store_latitude" DECIMAL(12,8) NOT NULL,
"store_name" VARCHAR(50) NOT NULL,
"store_manager" VARCHAR(50) NOT NULL,
"store_phone_number" VARCHAR(20) NOT NULL,
"store_region" VARCHAR(15) NOT NULL,
"store_country" VARCHAR(40) NOT NULL,
"store_state" VARCHAR(40),
"store_city" VARCHAR(50) NOT NULL,
"order_date" DATE NOT NULL,
"year_begin_date" DATE,
"quarter_begin_date" DATE,
"month_begin_date" DATE,
"yyyymm" INTEGER,
"yyyymmdd" INTEGER,
"ddmonyyyy" VARCHAR(9),
"line_price" DECIMAL(12,2) NOT NULL,
"line_cost" DECIMAL(12,2) NOT NULL,
"line_margin" DECIMAL(13,2) NOT NULL,
"line_margin_percent" DECIMAL(12,8) NOT NULL
);

All queries will be performed against the Monetdb instance running on host 'B'.

If I try to execute a simple query like :

select * from  "T1" where product_id > 1757;

The query execution fails with the error (of course the query runs fine on the local table):

	TypeException:user.l4[6]:'algebra.thetasubselect' undefined in:     algebra.thetasubselect(X_40:bat[:int],X_41:any,">":str);
	SQLState:  22000
	ErrorCode: 0
	Error: (mapi:monetdb://monetdb@192.168.1.46/visokio) 'algebra.thetasubselect' undefined in:     algebra.thetasubselect(X_46:bat[:int],X_47:any,">":str);
	SQLState:  22000
	ErrorCode: 0

I receive similar errors on every query containing a where clause.

Queries with where conditions also fail on T3_remote.

If I try queries without a where condition they are able to run on the remote table, but it seems that same query executed distributedly runs much slower than the query executed on the local table.
For example the query:

select count(*), avg("line_cost"), "product_family" from "T3_remote" group by "product_family"

takes from 17 seconds up to 28 seconds to execute, while the same query executed against the local table (containing the same records)

select count(*), avg("line_cost"), "product_family" from "T3" group by "product_family"

takes 0.2 up to 4 seconds.

Am I doing something wrong? Is there a reason why I cannot execute a query containing a where clause ?

Also, why is the query involving the remote table so much slower than the same query executed on the local table ? Isn't the query executed in a distribute manner supposed to leverage the processing power of both machines ?

Thank you for your help, let me know if you need some additional information, log files, the actual data of the tables ....

Best regards,

Manuel

Reproducible: Always

Actual Results:

Being able to perform a query on a remote table with a where clause

Comment 25153

Date: 2017-03-16 08:41:53 +0100
From: @njnes

fixed in the Dec2016 branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant