lua: add varbinary type
Implementation notes:
 - The varbinary type is implemented as a variable-length struct (VLS)
   cdata, so we can't use the existing luaL_pushcdata and luaL_checkcdata
   helpers for pushing an object of this type to the Lua stack. Instead,
   we copied the implementation from the LuaJIT internals.
 - All built-in serializers already have code handling `MP_BIN`
   fields. We only needed to patch it to convert the data to/from
   a varbinary object instead of a plain string.
 - We updated the tuple.tostring method to set the NOWRAP base64
   encoder flag when dumping binary blobs. The flag was apparently
   omitted by mistake, since all other newline characters are masked
   when a tuple is converted to a string.
 - The box/varbinary_type test was rewritten with the luatest
   framework, and all the FFI code that was needed to insert binary
   data was replaced with the new varbinary object.
 - We had to update quite a few SQL tests involving the varbinary type
   because binary blobs are now returned as varbinary objects, not as
   plain strings as before.

Closes tarantool#1629

@TarantoolBot document
Title: Document the varbinary type

A new module, `varbinary`, was introduced. It implements the
following functions:
 - `varbinary.new` - constructs a varbinary object from a plain string
   or from a cdata pointer and a size (to be used with the `buffer` module).
 - `varbinary.is` - returns true if the argument is a varbinary object.

```Lua
local bin = varbinary.new('data')
assert(varbinary.is(bin))
assert(not varbinary.is('data'))
```

Like a plain string, a varbinary object stores arbitrary data. Unlike
a plain string, it's encoded as a binary blob by the built-in encoders
that support the binary type (MsgPack, YAML). In fact, encoding binary
blobs with the proper type is the main goal of the new type.

```
tarantool> '\xFF\xFE'
---
- "\xFF\xFE"
...

tarantool> varbinary.new('\xFF\xFE')
---
- !!binary //4=
...

tarantool> msgpack.encode('\xFF\xFE')
---
- "\xA2\xFF\xFE"
...

tarantool> msgpack.encode(varbinary.new('\xFF\xFE'))
---
- "\xC4\x02\xFF\xFE"
...
```
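The byte sequences above follow the MsgPack specification: a short string gets a one-byte fixstr header (`0xa0 | len`), while binary data always carries an explicit length (bin 8 `0xc4`, bin 16 `0xc5`, bin 32 `0xc6`). The C sketch below is written against the specification only, not taken from Tarantool's encoder, and `mp_str_header`, `mp_bin_header` and `mp_encode_blob` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores the low n bytes of v in big-endian order, as MsgPack requires. */
static void
put_be(uint8_t *p, uint32_t v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Writes a MsgPack str header for len bytes; returns the header size. */
static size_t
mp_str_header(uint8_t *p, uint32_t len)
{
	if (len <= 31) {
		p[0] = (uint8_t)(0xa0 | len); /* fixstr */
		return 1;
	} else if (len <= 0xff) {
		p[0] = 0xd9; /* str 8 */
		put_be(p + 1, len, 1);
		return 2;
	} else if (len <= 0xffff) {
		p[0] = 0xda; /* str 16 */
		put_be(p + 1, len, 2);
		return 3;
	}
	p[0] = 0xdb; /* str 32 */
	put_be(p + 1, len, 4);
	return 5;
}

/* Writes a MsgPack bin header for len bytes; there is no "fixbin" form. */
static size_t
mp_bin_header(uint8_t *p, uint32_t len)
{
	if (len <= 0xff) {
		p[0] = 0xc4; /* bin 8 */
		put_be(p + 1, len, 1);
		return 2;
	} else if (len <= 0xffff) {
		p[0] = 0xc5; /* bin 16 */
		put_be(p + 1, len, 2);
		return 3;
	}
	p[0] = 0xc6; /* bin 32 */
	put_be(p + 1, len, 4);
	return 5;
}

/* Encodes data as str (as_bin == 0) or bin; buf needs len + 5 bytes. */
static size_t
mp_encode_blob(uint8_t *buf, const void *data, uint32_t len, int as_bin)
{
	assert(buf != NULL);
	size_t hdr = as_bin ? mp_bin_header(buf, len) : mp_str_header(buf, len);
	memcpy(buf + hdr, data, len);
	return hdr + len;
}
```

For the payload `\xFF\xFE`, the str form yields `A2 FF FE` and the bin form yields `C4 02 FF FE`, which is why the varbinary output is one byte longer.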

Note that JSON has no binary type, so a varbinary object is still
encoded as a plain string:

```
tarantool> json.encode('\xFF\xFE')
---
- "\"\xFF\xFE\""
...

tarantool> json.encode(varbinary.new('\xFF\xFE'))
---
- "\"\xFF\xFE\""
...
```

The built-in decoders now decode binary data fields (fields with the
'binary' tag in YAML; the `MP_BIN` type in MsgPack) to a varbinary
object by default:

```
tarantool> varbinary.is(msgpack.decode('\xC4\x02\xFF\xFE'))
---
- true
...

tarantool> varbinary.is(yaml.decode('!!binary //4='))
---
- true
...
```
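A decoder can recognize binary fields from the MsgPack type byte alone: `0xc4`, `0xc5` and `0xc6` introduce bin 8, bin 16 and bin 32 values, whose lengths follow as 1, 2 or 4 big-endian bytes. These are the same bytes the `decoder_hint` table in `msgpackffi.lua` dispatches on. The C sketch below recognizes them; `mp_decode_bin_sketch` is a hypothetical helper, not Tarantool's `mp_decode_bin`, and it omits bounds checking for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads an n-byte big-endian unsigned integer. */
static uint32_t
get_be(const uint8_t *p, size_t n)
{
	uint32_t v = 0;
	for (size_t i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * If *data points at a MsgPack bin value, stores the payload length in
 * *len, advances *data past the whole value and returns a pointer to the
 * payload. Returns NULL (leaving *data untouched) for any other type.
 */
static const uint8_t *
mp_decode_bin_sketch(const uint8_t **data, uint32_t *len)
{
	assert(data != NULL && *data != NULL);
	const uint8_t *p = *data;
	size_t width;
	switch (p[0]) {
	case 0xc4: width = 1; break; /* bin 8 */
	case 0xc5: width = 2; break; /* bin 16 */
	case 0xc6: width = 4; break; /* bin 32 */
	default: return NULL;
	}
	*len = get_be(p + 1, width);
	const uint8_t *payload = p + 1 + width;
	*data = payload + *len;
	return payload;
}
```

A decoder built this way hands the payload to the varbinary constructor, while fixstr/str headers keep going to the plain string path.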

This also implies that data stored in the database in a field of the
'varbinary' type is now returned to Lua as a varbinary object rather
than as a plain string. Because this change may break backward
compatibility, the old behavior can be restored with the new compat
option `binary_data_decoding`:

```
tarantool> compat.binary_data_decoding = 'old'
---
...

tarantool> varbinary.is(msgpack.decode('\xC4\x02\xFF\xFE'))
---
- false
...

tarantool> varbinary.is(yaml.decode('!!binary //4='))
---
- false
...
```
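The compat option works by flipping per-decoder switches: its action sets `tweaks.msgpack_decode_binary_as_string` and `tweaks.yaml_decode_binary_as_string` to `not is_new`. The following C model of that logic is hypothetical (the variable, enum and function names are illustrative, not Tarantool's); it only shows what an `MP_BIN` field maps to under each setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the two decoder switches; the defaults correspond to 'new'. */
static bool msgpack_binary_as_string = false;
static bool yaml_binary_as_string = false;

/* Model of the option's action: 'new' (is_new) keeps varbinary decoding. */
static void
apply_binary_data_decoding(bool is_new)
{
	msgpack_binary_as_string = !is_new;
	yaml_binary_as_string = !is_new;
}

enum bin_result { BIN_AS_VARBINARY, BIN_AS_STRING };

/* What an MP_BIN field decodes to under the current setting. */
static enum bin_result
msgpack_bin_result(void)
{
	return msgpack_binary_as_string ? BIN_AS_STRING : BIN_AS_VARBINARY;
}
```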

Please create a documentation page for the new compat option:
https://tarantool.io/compat/binary_data_decoding

A varbinary object implements the following meta-methods:
- `__len` - returns the length of the binary data, in bytes.
- `__tostring` - returns the data as a plain string.
- `__eq` - returns true if the varbinary object contains
  the same data as another varbinary object or a string.

```Lua
local bin = varbinary.new('foo')
assert(#bin == 3)
assert(tostring(bin) == 'foo')
assert(bin == 'foo')
assert(bin ~= 'bar')
assert(bin == varbinary.new('foo'))
assert(bin ~= varbinary.new('bar'))
```
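The `varbinary_eq` meta-method in `varbinary.lua` compares lengths first and then calls `memcmp`, so embedded zero bytes compare like any other byte (unlike `strcmp`). Here is the same rule as a hypothetical standalone C helper, `blob_eq`; it sketches the semantics only and is not code from the commit. The Lua version additionally returns false when either operand is neither a string nor a varbinary object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Two byte sequences are equal when their lengths match and every byte
 * matches. Zero bytes are ordinary data here, not terminators.
 */
static bool
blob_eq(const void *a, size_t len_a, const void *b, size_t len_b)
{
	if (len_a != len_b)
		return false;
	/* Skip memcmp for empty inputs, which may be NULL pointers. */
	return len_a == 0 || memcmp(a, b, len_a) == 0;
}
```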

A varbinary object has no string manipulation methods such as
`string.sub` or `string.match`. To match a substring in a varbinary
object, convert it to a plain string with `tostring` first.

For more details, see the [design document][1].

[1]: https://www.notion.so/tarantool/varbinary-in-Lua-a0ce453dcf5a46e3bc421bf80d4cc276
locker committed Jun 28, 2023
1 parent 739be77 commit ba749e8
Showing 31 changed files with 638 additions and 413 deletions.
9 changes: 9 additions & 0 deletions changelogs/unreleased/gh-1629-varbinary.md
@@ -0,0 +1,9 @@
## feature/lua

* **[Breaking change]** Added the new `varbinary` type to Lua. An object of
this type is similar to a plain string but encoded in MsgPack as `MP_BIN` so
it can be used for storing binary blobs in the database. This also works the
other way round: data fields stored as `MP_BIN` are now decoded in Lua as
varbinary objects, not as plain strings, as they used to be. Since the latter
may cause compatibility issues, the new compat option `binary_data_decoding`
was introduced to revert the built-in decoder to the old behavior (gh-1629).
1 change: 1 addition & 0 deletions src/CMakeLists.txt
@@ -70,6 +70,7 @@ lua_source(lua_sources lua/timezones.lua timezones_lua)
lua_source(lua_sources lua/print.lua print_lua)
lua_source(lua_sources lua/pairs.lua pairs_lua)
lua_source(lua_sources lua/compat.lua compat_lua)
lua_source(lua_sources lua/varbinary.lua varbinary_lua)
if (ENABLE_COMPRESS_MODULE)
lua_source(lua_sources ${COMPRESS_MODULE_LUA_SOURCE} compress_lua)
endif()
4 changes: 2 additions & 2 deletions src/box/tuple_convert.c
@@ -176,14 +176,14 @@ encode_node(yaml_emitter_t *emitter, const char **data)
str = *data;
*data += len;
style = YAML_ANY_SCALAR_STYLE;
-binlen = base64_encode_bufsize(len, 0);
+binlen = base64_encode_bufsize(len, BASE64_NOWRAP);
bin = (char *) malloc(binlen);
if (bin == NULL) {
diag_set(OutOfMemory, binlen, "malloc",
"tuple_to_yaml");
return 0;
}
-binlen = base64_encode(str, len, bin, binlen, 0);
+binlen = base64_encode(str, len, bin, binlen, BASE64_NOWRAP);
str = bin;
len = binlen;
tag = (yaml_char_t *) LUAYAML_TAG_PREFIX "binary";
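The switch to `BASE64_NOWRAP` matters because a wrapping base64 encoder inserts newline characters into long outputs, while tuple-to-string conversion otherwise masks newlines. As a rough illustration, and not Tarantool's `base64_encode`, here is a minimal C encoder that never wraps; `b64_encode_nowrap` is a hypothetical name. For the input bytes `\xFF\xFE` it produces `//4=`, the same text that follows `!!binary` in the YAML example in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char b64_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Encodes len bytes of src as base64 into dst without inserting any line
 * breaks. dst must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, not counting the final NUL.
 */
static size_t
b64_encode_nowrap(const unsigned char *src, size_t len, char *dst)
{
	assert(dst != NULL);
	size_t out = 0;
	size_t i = 0;
	/* Every full 3-byte group becomes 4 output characters. */
	for (; i + 2 < len; i += 3) {
		unsigned v = ((unsigned)src[i] << 16) |
			     ((unsigned)src[i + 1] << 8) | src[i + 2];
		dst[out++] = b64_alphabet[(v >> 18) & 0x3f];
		dst[out++] = b64_alphabet[(v >> 12) & 0x3f];
		dst[out++] = b64_alphabet[(v >> 6) & 0x3f];
		dst[out++] = b64_alphabet[v & 0x3f];
	}
	/* A trailing group of 1 or 2 bytes is padded with '='. */
	size_t rest = len - i;
	if (rest > 0) {
		unsigned v = (unsigned)src[i] << 16;
		if (rest == 2)
			v |= (unsigned)src[i + 1] << 8;
		dst[out++] = b64_alphabet[(v >> 18) & 0x3f];
		dst[out++] = b64_alphabet[(v >> 12) & 0x3f];
		dst[out++] = rest == 2 ? b64_alphabet[(v >> 6) & 0x3f] : '=';
		dst[out++] = '=';
	}
	dst[out] = '\0';
	return out;
}
```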
16 changes: 16 additions & 0 deletions src/lua/compat.lua
@@ -58,6 +58,13 @@ with all its replicasets.
https://tarantool.io/compat/box_info_cluster_meaning
]]

local BINARY_DATA_DECODING_BRIEF = [[
Whether a binary data field should be stored in a varbinary object or a plain
string when decoded in Lua.
https://tarantool.io/compat/binary_data_decoding
]]

-- Returns an action callback that toggles a tweak.
local function tweak_action(tweak_name, old_tweak_value, new_tweak_value)
return function(is_new)
@@ -109,6 +116,15 @@ local options = {
brief = BOX_INFO_CLUSTER_MEANING_BRIEF,
action = tweak_action('box_info_cluster_new_meaning', false, true),
},
binary_data_decoding = {
default = 'new',
obsolete = nil,
brief = BINARY_DATA_DECODING_BRIEF,
action = function(is_new)
tweaks.yaml_decode_binary_as_string = not is_new
tweaks.msgpack_decode_binary_as_string = not is_new
end,
},
}

-- Array with option names in order of addition.
2 changes: 2 additions & 0 deletions src/lua/init.c
@@ -153,6 +153,7 @@ extern char minifio_lua[],
table_lua[],
trigger_lua[],
string_lua[],
varbinary_lua[],
swim_lua[],
jit_p_lua[], /* LuaJIT 2.1 profiler */
jit_zone_lua[], /* LuaJIT 2.1 profiler */
@@ -279,6 +280,7 @@ static const char *lua_modules[] = {
"env", env_lua,
"buffer", buffer_lua,
"string", string_lua,
"varbinary", varbinary_lua,
"table", table_lua,
"msgpackffi", msgpackffi_lua,
"crypto", crypto_lua,
16 changes: 14 additions & 2 deletions src/lua/msgpack.c
@@ -44,6 +44,7 @@

#include "core/assoc.h"
#include "core/decimal.h" /* decimal_unpack() */
#include "core/tweaks.h"
#include "lua/decimal.h" /* luaT_newdecimal() */
#include "mp_extension_types.h"
#include "mp_uuid.h" /* mp_decode_uuid() */
@@ -106,6 +107,13 @@ struct luamp_iterator {

static const char luamp_iterator_typename[] = "msgpack.iterator";

/**
* If this flag is set, a binary data field will be decoded to a plain Lua
* string, not a varbinary object.
*/
static bool msgpack_decode_binary_as_string = false;
TWEAK_BOOL(msgpack_decode_binary_as_string);

void
luamp_error(void *error_ctx)
{
@@ -223,7 +231,8 @@ luamp_encode_with_translation_r(struct lua_State *L,
type = MP_STR;
break;
case MP_BIN:
-mpstream_encode_strn(stream, field->sval.data, field->sval.len);
+mpstream_encode_binl(stream, field->sval.len);
+mpstream_memcpy(stream, field->sval.data, field->sval.len);
type = MP_BIN;
break;
case MP_INT:
@@ -430,7 +439,10 @@ luamp_decode(struct lua_State *L, struct luaL_serializer *cfg,
{
uint32_t len = 0;
const char *str = mp_decode_bin(data, &len);
-lua_pushlstring(L, str, len);
+if (msgpack_decode_binary_as_string)
+lua_pushlstring(L, str, len);
+else
+luaT_pushvarbinary(L, str, len);
return;
}
case MP_BOOL:
32 changes: 29 additions & 3 deletions src/lua/msgpackffi.lua
@@ -1,9 +1,11 @@
-- msgpackffi.lua (internal file)

local tweaks = require('internal.tweaks')
local ffi = require('ffi')
local buffer = require('buffer')
local builtin = ffi.C
local msgpack = require('msgpack') -- .NULL, .array_mt, .map_mt, .cfg
local varbinary = require('varbinary')
local int8_ptr_t = ffi.typeof('int8_t *')
local uint8_ptr_t = ffi.typeof('uint8_t *')
local uint16_ptr_t = ffi.typeof('uint16_t *')
@@ -216,6 +218,20 @@ local function encode_str(buf, str)
ffi.copy(p, str, len)
end

local function encode_bin(buf, bin)
local len = #bin
buf:reserve(5 + len)
if len <= 0xff then
encode_u8(buf, 0xc4, len)
elseif len <= 0xffff then
encode_u16(buf, 0xc5, len)
else
encode_u32(buf, 0xc6, len)
end
local p = buf:alloc(len)
ffi.copy(p, bin, len)
end

local function encode_array(buf, size)
if size <= 0xf then
encode_fix(buf, 0x90, size)
@@ -357,6 +373,7 @@ on_encode(ffi.typeof('const unsigned char'), encode_int)
on_encode(ffi.typeof('bool'), encode_bool_cdata)
on_encode(ffi.typeof('float'), encode_float)
on_encode(ffi.typeof('double'), encode_double)
on_encode(ffi.typeof('struct varbinary'), encode_bin)
on_encode(ffi.typeof('decimal_t'), encode_decimal)
on_encode(ffi.typeof('struct tt_uuid'), encode_uuid)
on_encode(ffi.typeof('const struct error &'), encode_error)
@@ -518,6 +535,15 @@ local function decode_str(data, size)
return ret
end

local function decode_bin(data, size)
if tweaks.msgpack_decode_binary_as_string then
return decode_str(data, size)
end
local ret = varbinary.new(data[0], size)
data[0] = data[0] + size
return ret
end

local function decode_array(data, size)
assert (type(size) == "number")
local arr = {}
@@ -599,9 +625,9 @@ end

local decoder_hint = {
--[[{{{ MP_BIN]]
-[0xc4] = function(data) return decode_str(data, decode_u8(data)) end;
-[0xc5] = function(data) return decode_str(data, decode_u16(data)) end;
-[0xc6] = function(data) return decode_str(data, decode_u32(data)) end;
+[0xc4] = function(data) return decode_bin(data, decode_u8(data)) end;
+[0xc5] = function(data) return decode_bin(data, decode_u16(data)) end;
+[0xc6] = function(data) return decode_bin(data, decode_u32(data)) end;

--[[MP_FLOAT, MP_DOUBLE]]
[0xca] = decode_float;
11 changes: 10 additions & 1 deletion src/lua/serializer.c
@@ -538,8 +538,17 @@ luaL_tofield(struct lua_State *L, struct luaL_serializer *cfg, int index,
field->type = MP_NIL;
return 0;
}
-/* Fall through */
+field->type = MP_EXT;
+field->ext_type = MP_UNKNOWN_EXTENSION;
+return 0;
default:
+if (ctypeid == CTID_VARBINARY) {
+field->type = MP_BIN;
+field->sval.data = luaT_tovarbinary(
+L, index, &field->sval.len);
+assert(field->sval.data != NULL);
+return 0;
+}
field->type = MP_EXT;
if (ctypeid == CTID_DECIMAL) {
field->ext_type = MP_DECIMAL;
47 changes: 47 additions & 0 deletions src/lua/utils.c
@@ -54,6 +54,7 @@ static uint32_t CTID_STRUCT_IBUF;
static uint32_t CTID_STRUCT_IBUF_PTR;
uint32_t CTID_CHAR_PTR;
uint32_t CTID_CONST_CHAR_PTR;
uint32_t CTID_VARBINARY;
uint32_t CTID_UUID;
uint32_t CTID_DATETIME = 0;
uint32_t CTID_INTERVAL = 0;
@@ -156,6 +157,48 @@ luaT_pushvclock(struct lua_State *L, const struct vclock *vclock)
luaL_setmaphint(L, -1); /* compact flow */
}

/*
* Note: varbinary is a VLS object so we can't use luaL_pushcdata and
* luaL_checkcdata helpers.
*/
void
luaT_pushvarbinary(struct lua_State *L, const char *data, uint32_t len)
{
assert(CTID_VARBINARY != 0);
/* Calculate the cdata size. */
CTState *cts = ctype_cts(L);
CType *ct = ctype_raw(cts, CTID_VARBINARY);
CTSize size;
CTInfo info = lj_ctype_info(cts, CTID_VARBINARY, &size);
size = lj_ctype_vlsize(cts, ct, (CTSize)len);
assert(size != CTSIZE_INVALID);
/* Allocate a new cdata. */
GCcdata *cd = lj_cdata_newx(cts, CTID_VARBINARY, size, info);
/* Anchor the uninitialized cdata with the stack. */
TValue *o = L->top;
setcdataV(L, o, cd);
incr_top(L);
/* Initialize the cdata. */
memcpy(cdataptr(cd), data, len);
lj_gc_check(L);
}

const char *
luaT_tovarbinary(struct lua_State *L, int index, uint32_t *len)
{
assert(CTID_VARBINARY != 0);
TValue *o = index2adr(L, index);
if (!tviscdata(o))
return NULL;
GCcdata *cd = cdataV(o);
if (cd->ctypeid != CTID_VARBINARY)
return NULL;
CTSize size = cdatavlen(cd);
assert(size != CTSIZE_INVALID);
*len = size;
return cdataptr(cd);
}

struct tt_uuid *
luaT_newuuid(struct lua_State *L)
{
@@ -909,6 +952,10 @@ tarantool_lua_utils_init(struct lua_State *L)
assert(CTID_CHAR_PTR != 0);
CTID_CONST_CHAR_PTR = luaL_ctypeid(L, "const char *");
assert(CTID_CONST_CHAR_PTR != 0);
rc = luaL_cdef(L, "struct varbinary { char data[?]; };");
assert(rc == 0);
CTID_VARBINARY = luaL_ctypeid(L, "struct varbinary");
assert(CTID_VARBINARY != 0);
rc = luaL_cdef(L, "struct tt_uuid {"
"uint32_t time_low;"
"uint16_t time_mid;"
15 changes: 15 additions & 0 deletions src/lua/utils.h
@@ -72,11 +72,26 @@ extern struct lua_State *tarantool_L;

extern uint32_t CTID_CHAR_PTR;
extern uint32_t CTID_CONST_CHAR_PTR;
/** Type ID of struct varbinary. */
extern uint32_t CTID_VARBINARY;
extern uint32_t CTID_UUID;
extern uint32_t CTID_DATETIME;
/** Type ID of struct interval. */
extern uint32_t CTID_INTERVAL;

/**
* Pushes a new varbinary object with the given content to the Lua stack.
*/
void
luaT_pushvarbinary(struct lua_State *L, const char *data, uint32_t len);

/**
* If the value stored in the Lua stack at the given index is a varbinary
* object, returns its content, otherwise returns NULL.
*/
const char *
luaT_tovarbinary(struct lua_State *L, int index, uint32_t *len);

/**
* Push vclock to the Lua stack as a plain Lua table.
*/
65 changes: 65 additions & 0 deletions src/lua/varbinary.lua
@@ -0,0 +1,65 @@
local ffi = require('ffi')

ffi.cdef([[
int memcmp(const char *s1, const char *s2, size_t n);
]])

local memcmp = ffi.C.memcmp

local const_char_ptr_t = ffi.typeof('const char *')
local varbinary_t = ffi.typeof('struct varbinary')

local function is_varbinary(obj)
return ffi.istype(varbinary_t, obj)
end

local function new_varbinary(data, size)
if data == nil then
size = 0
elseif type(data) == 'string' then
size = #data
elseif ffi.istype(varbinary_t, data) then
size = ffi.sizeof(data)
elseif not ffi.istype(const_char_ptr_t, data) or type(size) ~= 'number' then
error('Usage: varbinary.new(str) or varbinary.new(ptr, size)', 2)
end
local bin = ffi.new(varbinary_t, size)
ffi.copy(bin, data, size)
return bin
end

local function varbinary_len(bin)
assert(ffi.istype(varbinary_t, bin))
return ffi.sizeof(bin)
end

local function varbinary_tostring(bin)
assert(ffi.istype(varbinary_t, bin))
return ffi.string(bin, ffi.sizeof(bin))
end

local function varbinary_eq(a, b)
if not (type(a) == 'string' or ffi.istype(varbinary_t, a)) or
not (type(b) == 'string' or ffi.istype(varbinary_t, b)) then
return false
end
local size_a = #a
local size_b = #b
if size_a ~= size_b then
return false
end
local data_a = ffi.cast(const_char_ptr_t, a)
local data_b = ffi.cast(const_char_ptr_t, b)
return memcmp(data_a, data_b, size_a) == 0
end

ffi.metatype(varbinary_t, {
__len = varbinary_len,
__tostring = varbinary_tostring,
__eq = varbinary_eq,
})

return {
is = is_varbinary,
new = new_varbinary,
}
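`varbinary.lua` relies on LuaJIT's variable-length struct (VLS) support: `ffi.new(varbinary_t, size)` allocates an object with room for `size` bytes, and `ffi.sizeof(bin)` later recovers that size. The closest plain-C analogue is a struct with a flexible array member. The sketch below is illustrative only; `struct blob` and `blob_new` are hypothetical names. Unlike LuaJIT, which keeps the VLS length in the cdata header (read back with `cdatavlen()` in `luaT_tovarbinary`), it stores the length in an explicit field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of `struct varbinary { char data[?]; }`. */
struct blob {
	uint32_t len;  /* LuaJIT keeps this in the cdata header instead. */
	char data[];   /* C99 flexible array member holding the bytes. */
};

/* Allocates a blob holding a private copy of len bytes of data. */
static struct blob *
blob_new(const char *data, uint32_t len)
{
	assert(data != NULL || len == 0);
	/* One allocation covers both the header and the payload. */
	struct blob *b = malloc(sizeof(*b) + len);
	if (b == NULL)
		return NULL;
	b->len = len;
	if (len > 0)
		memcpy(b->data, data, len);
	return b;
}
```

As with `varbinary.new`, the constructor copies the input, so the caller's buffer can be reused or freed right after the call.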
