35 changes: 34 additions & 1 deletion packages/cubejs-clickhouse-driver/driver/ClickHouseDriver.js
@@ -90,10 +90,43 @@ class ClickHouseDriver extends BaseDriver {

return this.withConnection((connection, queryId) => {
return connection.querying(formattedQuery, { dataObjects: true, queryOptions: { query_id: queryId, join_use_nulls: 1 } })
.then(res => res.data);
.then(res => this._normaliseResponse(res));
});
}

_normaliseResponse(res) {
//
// ClickHouse returns DateTime values as strings in the format "YYYY-MM-DD HH:MM:SS";
// cube.js expects "YYYY-MM-DDTHH:MM:SS.000", so translate them based on the metadata returned.
//
// ClickHouse returns some number types as js numbers and others as js strings; normalise them all to strings.
//
if (res.data) {
res.data.forEach(row=>{
for (let field in row) {
let value = row[field]
if (value !== null) {
let meta = res.meta.find(m=>m.name == field)
if (meta.type.includes("DateTime")) {
row[field] = value.substring(0, 10) + "T" + value.substring(11, 22) + ".000"
}
else if (meta.type.includes("Date")) {
row[field] = value + "T00:00:00.000"
}
else if (meta.type.includes("Int") || meta.type.includes("Float")) {
// convert all numbers into strings
row[field] = `${value}`
Review thread on the line above:

Contributor: Why do numbers have to be converted to strings?

Contributor Author (@cameronbraid), Apr 30, 2019: I believe this is because the range of values that the JavaScript number type supports can overflow. ClickHouse already caters for this by returning strings for UInt64 and Int64. I assumed Postgres does the same, and that this was the convention for cube.js.

Contributor: As I understand it, this is only an issue with the Int64 and UInt64 data types. How is it handled for Postgres, which also supports 64-bit numbers?

Contributor Author: @paveltiunov what should we be doing here with numbers vs strings?

Member: We probably have some variation from driver to driver, but as a rule of thumb all numeric types should be returned as strings, because JS can't correctly handle big integers or decimal numbers.

Also, regarding the timestamp format: the driver should always return only local dates in YYYY-MM-DDTHH:MM:SS.sss format, as all conversions are done in SQL. There's only one case where a driver can return UTC: a UTC timestamp representation in the cube.js server's local timezone. Otherwise there should be no 'Z' at the end.

Contributor: Seems like something to be added to the driver development guidelines :)

Member: @inikolaev Yep. Definitely worth adding it.

Contributor Author: Re timestamps, I am confused. For example, the following test uses the timezone America/Los_Angeles:

https://github.com/statsbotco/cube.js/blob/bde5ea1dc493f4af599c6fcdcaecbcd399dfcebd/packages/cubejs-schema-compiler/test/GraphBuilderTest.js#L302

However, the timestamp asserted is UTC (including a Z):

https://github.com/statsbotco/cube.js/blob/bde5ea1dc493f4af599c6fcdcaecbcd399dfcebd/packages/cubejs-schema-compiler/test/GraphBuilderTest.js#L315

Actually, every date in that test is formatted the same way. Is there an example?

Member: @cameronbraid Yep. It's confusing. The thing is, Postgres is the exception to the rule: it provides a UTC timestamp representation in the cube.js server's local timezone. It's actually causing problems, so we're considering fixing this in the Postgres driver itself.
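For context on the numbers-as-strings question earlier in this thread, a small Node.js sketch (illustrative only, not part of this diff) of the precision loss being described, using the Int64 maximum as the example value:

// Why Int64/UInt64 values can't round-trip through the JS number type (illustration only).
const int64Max = '9223372036854775807';        // 2^63 - 1, as ClickHouse returns it (a string)
const asNumber = Number(int64Max);             // rounds to the nearest double: 9223372036854775808
console.log(Number.isSafeInteger(asNumber));   // false, above 2^53 - 1
console.log(String(asNumber) === int64Max);    // false, precision already lost
// Keeping the value as a string, as this PR does for all numeric types, avoids the loss.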

}
}
}
})
}
return res.data
}

async release() {
await this.pool.drain();
await this.pool.clear();
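To make the normalisation above concrete, here is a standalone sketch (illustrative only, mirroring the logic in _normaliseResponse rather than reproducing the PR verbatim) of how a single value is reshaped. Note that DateTime must be checked before Date, since "DateTime".includes("Date") is also true.

// Illustrative mirror of the per-value normalisation; not part of the diff.
function normaliseValue(type, value) {
  if (value === null) return null;
  if (type.includes('DateTime')) {
    // "2019-04-30 11:55:00" -> "2019-04-30T11:55:00.000" (no trailing 'Z')
    return `${value.substring(0, 10)}T${value.substring(11)}.000`;
  }
  if (type.includes('Date')) {
    // "2019-04-30" -> "2019-04-30T00:00:00.000"
    return `${value}T00:00:00.000`;
  }
  if (type.includes('Int') || type.includes('Float')) {
    // all numeric types become strings so 64-bit values survive intact
    return `${value}`;
  }
  return value;
}

console.log(normaliseValue('DateTime', '2019-04-30 11:55:00')); // "2019-04-30T11:55:00.000"
console.log(normaliseValue('UInt64', '9223372036854775807'));   // "9223372036854775807"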
68 changes: 64 additions & 4 deletions packages/cubejs-clickhouse-driver/test/ClickHouseDriverTest.js
@@ -104,6 +104,64 @@ describe('ClickHouseDriver', () => {
})
});

// Int8
// Int16
// Int32
// Int64
// UInt8
// UInt16
// UInt32
// UInt64
// Float32
// Float64
it('should normalise all numbers as strings', async () => {
await doWithDriver(async (driver) => {
let name = `temp_${Date.now()}`
try {
await driver.createSchemaIfNotExists(name);
await driver.query(`CREATE TABLE ${name}.a (int8 Int8, int16 Int16, int32 Int32, int64 Int64, uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, float32 Float32, float64 Float64) ENGINE Log`);
await driver.query(`INSERT INTO ${name}.a VALUES (1,1,1,1,1,1,1,1,1,1)`);

const values = await driver.query(`SELECT * FROM ${name}.a`);
values.should.deepEqual([{
int8: '1',
int16: '1',
int32: '1',
int64: '1',
uint8: '1',
uint16: '1',
uint32: '1',
uint64: '1',
float32: '1',
float64: '1',
}])
}
finally {
await driver.query(`DROP DATABASE ${name}`)
}
})
});

it('should normalise all dates as ISO8601', async () => {
await doWithDriver(async (driver) => {
let name = `temp_${Date.now()}`
try {
await driver.createSchemaIfNotExists(name);
await driver.query(`CREATE TABLE ${name}.a (dateTime DateTime, date Date) ENGINE Log`);
await driver.query(`INSERT INTO ${name}.a VALUES ('2019-04-30 11:55:00', '2019-04-30')`);

const values = await driver.query(`SELECT * FROM ${name}.a`);
values.should.deepEqual([{
dateTime: '2019-04-30T11:55:00.000',
date: '2019-04-30T00:00:00.000',
}])
}
finally {
await driver.query(`DROP DATABASE ${name}`)
}
})
});

it('should substitute parameters', async () => {
await doWithDriver(async (driver) => {
let name = `temp_${Date.now()}`
@@ -112,7 +170,7 @@
await driver.query(`CREATE TABLE ${name}.test (x Int32, s String) ENGINE Log`);
await driver.query(`INSERT INTO ${name}.test VALUES (?, ?), (?, ?), (?, ?)`, [1, "str1", 2, "str2", 3, "str3"]);
const values = await driver.query(`SELECT * FROM ${name}.test WHERE x = ?`, 2);
values.should.deepEqual([{x: 2, s: "str2"}])
values.should.deepEqual([{x: '2', s: "str2"}])
}
finally {
await driver.query(`DROP DATABASE ${name}`)
@@ -133,14 +191,16 @@ describe('ClickHouseDriver', () => {

const values = await driver.query(`SELECT * FROM ${name}.a LEFT OUTER JOIN ${name}.b ON a.x = b.x`);
values.should.deepEqual([
{ x: 1, s: 'str1', 'b.x': 0, 'b.s': null },
{ x: 2, s: 'str2', 'b.x': 2, 'b.s': 'str2' },
{ x: 3, s: 'str3', 'b.x': 3, 'b.s': 'str3' }
{ x: '1', s: 'str1', 'b.x': '0', 'b.s': null },
{ x: '2', s: 'str2', 'b.x': '2', 'b.s': 'str2' },
{ x: '3', s: 'str3', 'b.x': '3', 'b.s': 'str3' }
])
}
finally {
await driver.query(`DROP DATABASE ${name}`)
}
})
});


});
209 changes: 209 additions & 0 deletions packages/cubejs-schema-compiler/adapter/ClickHouseQuery.js
@@ -0,0 +1,209 @@
const moment = require('moment-timezone');
const R = require('ramda');

const BaseQuery = require('./BaseQuery');
const BaseFilter = require('./BaseFilter');

const GRANULARITY_TO_INTERVAL = {
date: 'Day',
hour: 'Hour',
month: 'Month',
quarter: 'Quarter',
year: 'Year',
};

class ClickHouseFilter extends BaseFilter {
likeIgnoreCase(column, not) {
return `lower(${column}) ${not ? 'NOT':''} LIKE CONCAT('%', lower(?), '%')`;
}
}


class ClickHouseQuery extends BaseQuery {

newFilter(filter) {
return new ClickHouseFilter(this, filter);
}

escapeColumnName(name) {
return `\`${name}\``;
}


convertTz(field) {
//
// field yields a Date or a DateTime so add in the extra toDateTime to support the Date case
//
// https://clickhouse.yandex/docs/en/data_types/datetime/
// https://clickhouse.yandex/docs/en/query_language/functions/date_time_functions/
//
//
return `toTimeZone(toDateTime(${field}), '${this.timezone}')`;
}

timeGroupedColumn(granularity, dimension) {
if (granularity == 'week') {
return `toDateTime(toMonday(${dimension}, '${this.timezone}'), '${this.timezone}')`
}
else {
let interval = GRANULARITY_TO_INTERVAL[granularity]
return `toDateTime(toStartOf${interval}(${dimension}, '${this.timezone}'), '${this.timezone}')`;
}
}

_calcInterval(operation, date, interval) {
const [intervalValue, intervalUnit] = interval.split(" ");
let fn = operation + intervalUnit[0].toUpperCase() + intervalUnit.substring(1) + "s"
return `${fn}(${date}, ${intervalValue})`;
}

subtractInterval(date, interval) {
return this._calcInterval("subtract", date, interval)
}

addInterval(date, interval) {
return this._calcInterval("add", date, interval)
}

timeStampCast(value) {
// value yields a string formatted in ISO8601, so this function returns an expression that parses the string into a DateTime

//
// ClickHouse provides toDateTime which expects dates in UTC in format YYYY-MM-DD HH:MM:SS
//
// However parseDateTimeBestEffort works with ISO8601
//
return `parseDateTimeBestEffort(${value})`
}

dateTimeCast(value) {

// value yields a string formatted in ISO8601, so this function returns an expression that parses the string into a DateTime

//
// ClickHouse provides toDateTime which expects dates in UTC in format YYYY-MM-DD HH:MM:SS
//
// However parseDateTimeBestEffort works with ISO8601
//
return `parseDateTimeBestEffort(${value})`
}

getFieldAlias(id) {
const equalIgnoreCase = (a, b) => (
typeof a === 'string' && typeof b === 'string'
&& a.toUpperCase() === b.toUpperCase()
);

let field;

field = this.dimensionsForSelect().find(d =>
equalIgnoreCase(d.dimension, id)
);

if (field) {
return field.aliasName();
}

field = this.measures.find(d =>
equalIgnoreCase(d.measure, id) || equalIgnoreCase(d.expressionName, id)
);

if (field) {
return field.aliasName();
}

return null;
}

orderHashToString(hash) {

//
// ClickHouse doesn't support ordering by column index, so map these to the alias names
//

if (!hash || !hash.id) {
return null;
}

const fieldAlias = this.getFieldAlias(hash.id);

if (fieldAlias === null) {
return null;
}

const direction = hash.desc ? 'DESC' : 'ASC';
return `${fieldAlias} ${direction}`;
}

groupByClause() {

//
// ClickHouse doesn't support grouping by column index, so map these to the alias names
//

const names = this.dimensionAliasNames();
return names.length ? ` GROUP BY ${names.join(', ')}` : '';
}


primaryKeyCount(cubeName, distinct) {
const primaryKeySql = this.primaryKeySql(this.cubeEvaluator.primaryKeys[cubeName], cubeName);
if (distinct) {
return `uniqExact(${primaryKeySql})`
}
else {
return `count(${primaryKeySql})`
}
}


seriesSql(timeDimension) {
/*
postgres uses :

SELECT parseDateTimeBestEffort(date_from), parseDateTimeBestEffort(date_to) FROM
(
VALUES
('2017-01-01T00:00:00.000', '2017-01-01T23:59:59.999'),
('2017-01-02T00:00:00.000', '2017-01-02T23:59:59.999'),
('2017-01-03T00:00:00.000', '2017-01-03T23:59:59.999'),
('2017-01-04T00:00:00.000', '2017-01-04T23:59:59.999'),
('2017-01-05T00:00:00.000', '2017-01-05T23:59:59.999'),
('2017-01-06T00:00:00.000', '2017-01-06T23:59:59.999'),
('2017-01-07T00:00:00.000', '2017-01-07T23:59:59.999'),
('2017-01-08T00:00:00.000', '2017-01-08T23:59:59.999'),
('2017-01-09T00:00:00.000', '2017-01-09T23:59:59.999'),
('2017-01-10T00:00:00.000', '2017-01-10T23:59:59.999')
) AS dates (date_from, date_to)
) AS `visitors.created_at_series`

*/
/*

ClickHouse uses :

select
parseDateTimeBestEffort(arrayJoin(['2017-01-01T00:00:00.000','2017-01-02T00:00:00.000'])) as date_from,
parseDateTimeBestEffort(arrayJoin(['2017-01-01T23:59:59.999','2017-01-02T23:59:59.999'])) as date_to
...
)
*/

let dates_from = []
let dates_to = []
timeDimension.timeSeries().forEach(([from, to]) => {
dates_from.push(from)
dates_to.push(to)
});
return `SELECT parseDateTimeBestEffort(arrayJoin(['${dates_from.join("','")}'])) as date_from, parseDateTimeBestEffort(arrayJoin(['${dates_to.join("','")}'])) as date_to`;
}

concatStringsSql(strings) {
return "toString(" + strings.join(") || toString(") + ")";
}

}

module.exports = ClickHouseQuery;
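As a quick illustration (not part of this diff) of the SQL fragments the interval helpers above produce, here is a standalone mirror of _calcInterval; the generated names match ClickHouse's addDays/addWeeks/subtractDays family of functions.

// Standalone mirror of _calcInterval, for illustration only.
function calcInterval(operation, date, interval) {
  const [intervalValue, intervalUnit] = interval.split(' ');
  const fn = operation + intervalUnit[0].toUpperCase() + intervalUnit.substring(1) + 's';
  return `${fn}(${date}, ${intervalValue})`;
}

console.log(calcInterval('add', 'created_at', '1 week'));     // addWeeks(created_at, 1)
console.log(calcInterval('subtract', 'created_at', '1 day')); // subtractDays(created_at, 1)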
5 changes: 3 additions & 2 deletions packages/cubejs-schema-compiler/adapter/QueryBuilder.js
@@ -7,6 +7,7 @@ const redshift = require('./RedshiftQuery');
const prestodb = require('./PrestodbQuery');
const vertica = require('./VerticaQuery');
const snowflake = require('./SnowflakeQuery');
const clickhouse = require('./ClickHouseQuery');

const ADAPTERS = {
postgres,
@@ -19,9 +20,9 @@ const ADAPTERS = {
qubole_prestodb: prestodb,
athena: prestodb,
vertica,
snowflake
snowflake,
clickhouse,
};

exports.query = (compilers, adapter, queryOptions) => {
if (!ADAPTERS[adapter]) {
return null;
17 changes: 17 additions & 0 deletions packages/cubejs-schema-compiler/docker-compose.yml
@@ -0,0 +1,17 @@
version: "3"

services:

clickhouse-server:
image: yandex/clickhouse-server
ports:
- "18123:8123"

postgresql-server:
image: postgres:11.2
ports:
- "5432:5432"
environment:
POSTGRES_DB: model_test
POSTGRES_PASSWORD: test
POSTGRES_USER: test
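The Compose file above brings up local ClickHouse (HTTP interface mapped to host port 18123) and Postgres (port 5432) instances for running the schema-compiler tests locally; the test configuration that points at these ports is not shown in this diff.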