NODEJS-666: Extend driver vector support to arbitrary subtypes and fix handling of variable length types (OSS C* 5.0)#427
Conversation
absurdfarce
left a comment
There was a problem hiding this comment.
This is really good work @SiyaoIsHiding! I love the increased test coverage we're getting here; it's good to see the same robust coverage we have for the other drivers that support vectors coming to node.js as well. There are a few things here I'd really like to see changed, although fortunately most (all?) of them are internal things that shouldn't be super-visible to users.
| return 0xff >> extraBytesToRead; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
Note to self (and perhaps future reviewers): everything above this spot was copied over from lib/types/duration.js, everything below was added with this PR.
| uvintUnpack: uvintUnpack | ||
| }; | ||
| })(); | ||
|
|
There was a problem hiding this comment.
Near as I can tell there aren't any tests for this functionality in isolation. We wind up testing some of the new stuff via the encoder tests but we don't test these functions directly. I think that's okay for now since the older functions (the ones that were copied here from the duration impl) also aren't tested anywhere.... but it probably is something we should remedy at some point.
| // dimension: 3 | ||
| // }, | ||
| // value: [new types.Duration(1, 2, 3), new types.Duration(4, 5, 6), new types.Duration(7, 8, 9)] | ||
| // }, |
There was a problem hiding this comment.
@SiyaoIsHiding is there a reason we can't just test duration directly?
There was a problem hiding this comment.
I couldn't make it work when I submitted this PR but I just made it work with the following test case
{
subtypeString: 'duration',
typeInfo: {
code: types.dataTypes.custom,
info: [{
code: types.dataTypes.custom,
info: 'org.apache.cassandra.db.marshal.DurationType'
},3],
customTypeName: 'vector',
},
value: [new types.Duration(1, 2, 3), new types.Duration(4, 5, 6), new types.Duration(7, 8, 9)]
},Note that I have to put 'org.apache.cassandra.db.marshal.DurationType' instead of duration. Is it expected?
There was a problem hiding this comment.
Per our conversation, this does not raise any concern.
| }, | ||
| value: [{ f1: 'a' }, { f1: 'b' }, { f1: 'c' }] | ||
| } | ||
| ]); |
There was a problem hiding this comment.
We have a lot of duplication here between the data provider above and the one in encoder-vector-tests. Can we move these to a common spot in the testing framework so that both tests can leverage the same provider(s)?
| "link": true | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Is there a reason we're checking this in for the examples module now? This seems.... kinda strange.
| * @param {ColumnInfo | null} columnInfo | ||
| * @returns | ||
| */ | ||
| this.decodeCustom = function (bytes, typeName, columnInfo = null) { |
There was a problem hiding this comment.
I really dislike the idea of adding an extra param here.
The original impl of this function passes in "type.info" from the type info it gets as an arg. Your new impl passes in "type.info" as well but also passes in... "type" as well. One of these args seems redundant; why pass "type.info" when you can just as easily get the "info" property of the type information you're also passing.
Seems like you can do the exact same thing by just passing "columnInfo" here and adding "typeName = columnInfo.info" near the top of this function.
| if (offset + elemLength > buffer.length) { | ||
| throw new TypeError('Not enough bytes to decode the vector'); | ||
| } | ||
| rv[i] = this.decode(buffer.subarray(offset, offset + elemLength), subtype); |
There was a problem hiding this comment.
We already have the type code of the subtype here, don't we? So we can actually look up the decode function for the subtype using that code rather than having to go all the way back through decode() can't we?
There was a problem hiding this comment.
If we don't call decode(). Then we need to
const decoder = this.decoders[type.code];
return decoder.call(this, buffer, 'info' in type? type.info : null);Which is basically calling decode().
There was a problem hiding this comment.
To be clear: this is a pretty insignificant one to me. It's a relatively minor savings not having to go back through decode() but the difference is pretty small.
I'm perfectly fine with continuing to use decode() here.
| } | ||
| return -1; | ||
| default: | ||
| throw new TypeError('Cannot calculate size'); |
There was a problem hiding this comment.
Hmmmm... I'm wondering if it's better to make the default case return -1. You wouldn't have to specify all the data types that should have this behaviour (like you do in this impl)... you'd just specify size functions for the types that are of a fixed size and everybody else returns -1.
Best argument I can think of not to do it that way is that there may be some benefit in being explicit for every single type: is it fixed size or not? It does make the code a bit more cumbersome, though.
| if (!guessedType) { | ||
| throw new TypeError('Target data type could not be guessed, you should use prepared statements for accurate type mapping. Value: ' + util.inspect(value)); | ||
| }else{ | ||
| type = guessedType; |
There was a problem hiding this comment.
Why... do we have an else block on an if test which throws if it's successful? Seems like the original impl says exactly the same thing but in a more succinct way.
| if (customTypeName.startsWith(customTypeNames.vector)) { | ||
| return this.encodeVector(value, this.parseVectorTypeArgs(customTypeName, customTypeNames.vector, this.parseFqTypeName)); | ||
|
|
||
| //TODO: here we cannot tell whether it's a vector or not, because we don't have the customTypeName |
There was a problem hiding this comment.
My only concern is here: Assume the type of a vector is
{
code: types.dataTypes.custom,
customTypeName: 'vector',
info: [{ code: types.dataTypes.int }, 3]
}the customTypeName argument we get here is
[{ code: types.dataTypes.int }, 3]We basically lost the customTypeName: 'vector', and does not have a good way to tell it's a vector. Unless we say as long as we get an array and the second element is a number, then we treat it as a vector.
There was a problem hiding this comment.
Per our discussion, we decided to change the API of encoding and decoding functions to (value, type) instead of (value, type.info).
absurdfarce
left a comment
There was a problem hiding this comment.
This is looking really good after the last round of changes! I love how the API seems simpler and more consistent now!
One lingering question for me about how best to represent custom types generally in a way that makes sense for vectors and other non-vector custom types but that's really it.
| */ | ||
| this.decodeList = function (bytes, subtype) { | ||
| this.decodeList = function (bytes, columnInfo) { | ||
| const subtype = columnInfo.info; |
|
Pausing briefly; @SiyaoIsHiding pointed out that we ought to update the docs to reflect the behavioural changes contained in this PR. Will be merged once those edits are in place. |
Will add more tests