Skip to content

NODEJS-666: Extend driver vector support to arbitrary subtypes and fix handling of variable length types (OSS C* 5.0)#427

Merged
absurdfarce merged 107 commits intoapache:masterfrom
SiyaoIsHiding:vector
Feb 20, 2025
Merged

NODEJS-666: Extend driver vector support to arbitrary subtypes and fix handling of variable length types (OSS C* 5.0)#427
absurdfarce merged 107 commits intoapache:masterfrom
SiyaoIsHiding:vector

Conversation

@SiyaoIsHiding
Copy link
Copy Markdown
Contributor

Will add more tests

Copy link
Copy Markdown
Contributor

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good work @SiyaoIsHiding! I love the increased test coverage we're getting here; it's good to see the same robust coverage we have for the other drivers that support vectors coming to node.js as well. There are a few things here I'd really like to see changed, although fortunately most (all?) of them are internal things that shouldn't be super-visible to users.

Comment thread lib/utils.js
return 0xff >> extraBytesToRead;
}

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self (and perhaps future reviewers): everything above this spot was copied over from lib/types/duration.js, everything below was added with this PR.

Comment thread lib/utils.js
uvintUnpack: uvintUnpack
};
})();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Near as I can tell there aren't any tests for this functionality in isolation. We wind up testing some of the new stuff via the encoder tests but we don't test these functions directly. I think that's okay for now since the older functions (the ones that were copied here from the duration impl) also aren't tested anywhere.... but it probably is something we should remedy at some point.

Comment thread test/unit/encoder-vector-tests.js Outdated
// dimension: 3
// },
// value: [new types.Duration(1, 2, 3), new types.Duration(4, 5, 6), new types.Duration(7, 8, 9)]
// },
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SiyaoIsHiding is there a reason we can't just test duration directly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't make it work when I submitted this PR but I just made it work with the following test case

  {
    subtypeString: 'duration',
    typeInfo: {
      code: types.dataTypes.custom,
      info: [{
        code: types.dataTypes.custom,
        info: 'org.apache.cassandra.db.marshal.DurationType'
      },3],
      customTypeName: 'vector',
    },
    value: [new types.Duration(1, 2, 3), new types.Duration(4, 5, 6), new types.Duration(7, 8, 9)]
  },

Note that I have to put 'org.apache.cassandra.db.marshal.DurationType' instead of duration. Is it expected?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our conversation, this does not raise any concern.

Comment thread test/integration/short/vector-tests.js Outdated
},
value: [{ f1: 'a' }, { f1: 'b' }, { f1: 'c' }]
}
]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a lot of duplication here between the data provider above and the one in encoder-vector-tests. Can we move these to a common spot in the testing framework so that both tests can leverage the same provider(s)?

Comment thread examples/package-lock.json Outdated
"link": true
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're checking this in for the examples module now? This seems.... kinda strange.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

Comment thread lib/encoder.js Outdated
* @param {ColumnInfo | null} columnInfo
* @returns
*/
this.decodeCustom = function (bytes, typeName, columnInfo = null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really dislike the idea of adding an extra param here.

The original impl of this function passes in "type.info" from the type info it gets as an arg. Your new impl passes in "type.info" as well but also passes in... "type" as well. One of these args seems redundant; why pass "type.info" when you can just as easily get the "info" property of the type information you're also passing.

Seems like you can do the exact same thing by just passing "columnInfo" here and adding "typeName = columnInfo.info" near the top of this function.

Comment thread lib/encoder.js
Comment thread lib/encoder.js
if (offset + elemLength > buffer.length) {
throw new TypeError('Not enough bytes to decode the vector');
}
rv[i] = this.decode(buffer.subarray(offset, offset + elemLength), subtype);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the type code of the subtype here, don't we? So we can actually look up the decode function for the subtype using that code rather than having to go all the way back through decode() can't we?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't call decode(). Then we need to

  const decoder = this.decoders[type.code];
  return decoder.call(this, buffer, 'info' in type? type.info : null);

Which is basically calling decode().

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: this is a pretty insignificant one to me. It's a relatively minor savings not having to go back through decode() but the difference is pretty small.

I'm perfectly fine with continuing to use decode() here.

Comment thread lib/encoder.js Outdated
}
return -1;
default:
throw new TypeError('Cannot calculate size');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm... I'm wondering if it's better to make the default case return -1. You wouldn't have to specify all the data types that should have this behaviour (like you do in this impl)... you'd just specify size functions for the types that are of a fixed size and everybody else returns -1.

Best argument I can think of not to do it that way is that there may be some benefit in being explicit for every single type: is it fixed size or not? It does make the code a bit more cumbersome, though.

Comment thread lib/encoder.js Outdated
if (!guessedType) {
throw new TypeError('Target data type could not be guessed, you should use prepared statements for accurate type mapping. Value: ' + util.inspect(value));
}else{
type = guessedType;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why... do we have an else block on an if test which throws if it's successful? Seems like the original impl says exactly the same thing but in a more succinct way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

Comment thread lib/encoder.js Outdated
if (customTypeName.startsWith(customTypeNames.vector)) {
return this.encodeVector(value, this.parseVectorTypeArgs(customTypeName, customTypeNames.vector, this.parseFqTypeName));

//TODO: here we cannot tell whether it's a vector or not, because we don't have the customTypeName
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is here: Assume the type of a vector is

{
      code: types.dataTypes.custom,
      customTypeName: 'vector',
      info: [{ code: types.dataTypes.int }, 3]
    }

the customTypeName argument we get here is

[{ code: types.dataTypes.int }, 3]

We basically lost the customTypeName: 'vector', and does not have a good way to tell it's a vector. Unless we say as long as we get an array and the second element is a number, then we treat it as a vector.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussion, we decided to change the API of encoding and decoding functions to (value, type) instead of (value, type.info).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored!

Copy link
Copy Markdown
Contributor

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good after the last round of changes! I love how the API seems simpler and more consistent now!

One lingering question for me about how best to represent custom types generally in a way that makes sense for vectors and other non-vector custom types but that's really it.

Comment thread lib/types/index.js
Comment thread lib/encoder.js
*/
this.decodeList = function (bytes, subtype) {
this.decodeList = function (bytes, columnInfo) {
const subtype = columnInfo.info;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂

Comment thread lib/encoder.js
@absurdfarce
Copy link
Copy Markdown
Contributor

Pausing briefly; @SiyaoIsHiding pointed out that we ought to update the docs to reflect the behavioural changes contained in this PR. Will be merged once those edits are in place.

@absurdfarce absurdfarce merged commit ae4784e into apache:master Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants