This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

twitter-archive/torch-thrift

Thrift

A codec-based Thrift library for Torch. It supports very fast deserialization of arbitrary Thrift binary data to Lua native types, and it can serialize Lua native types back into Thrift binary based on a provided schema.

Reading

Thrift binary data is self-describing. If you just want to read it quickly and convert it to Lua native types, no schema is required when creating the codec.

local thrift = require 'libthrift'
local codec = thrift.codec()
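-- Open in binary mode ('rb') so the bytes are read unmodified on every platform.
local file = assert(io.open('thrift_data.bin', 'rb'))
local binary = file:read('*all')
file:close()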
local result = codec:read(binary)
print(result)

You can also read Thrift using a schema, which gives the fields readable names.

local thrift = require 'libthrift'
local codec = thrift.codec({
   ttype = "struct",
   fields = {
      [1] = { ttype = "i32", name = "an_int" },
      [2] = { ttype = "bool", name = "someBoolean" },
      [3] = { ttype = "list", value = "double", name = "vector" },
   }
})
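-- Open in binary mode ('rb') so the bytes are read unmodified on every platform.
local file = assert(io.open('thrift_data.bin', 'rb'))
local binary = file:read('*all')
file:close()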
local result = codec:read(binary)
print(result)

It is possible to read directly from a ByteTensor instead of a string using the readTensor function.

Writing

Writing Thrift binary requires a schema because there is no 1:1 mapping between Lua and Thrift types. We support all Thrift types, and they can be nested to arbitrary depth.

local thrift = require 'libthrift'
local codec = thrift.codec({
   ttype = "struct",
   fields = {
      [1] = "i32",
      [2] = "bool",
      [3] = { ttype = "list", value = "double" },
   }
})
local binary = codec:write({
   42,
   true,
   { 3.14, 13.13, 543.21 },
})
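-- Open in binary mode ('wb') and close the file so the data is flushed to disk.
local file = assert(io.open('thrift_data.bin', 'wb'))
file:write(binary)
file:close()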

Just like reading, you can supply field names for better readability.

local thrift = require 'libthrift'
local codec = thrift.codec({
   ttype = "struct",
   fields = {
      [1] = { ttype = "i32", name = "an_int" },
      [2] = { ttype = "bool", name = "someBoolean" },
      [3] = { ttype = "list", value = "double", name = "vector" },
   }
})
local binary = codec:write({
   an_int = 42,
   someBoolean = true,
   vector = { 3.14, 13.13, 543.21 },
})
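-- Open in binary mode ('wb') and close the file so the data is flushed to disk.
local file = assert(io.open('thrift_data.bin', 'wb'))
file:write(binary)
file:close()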

It is possible to write directly to a ByteTensor instead of a string using the writeTensor function.
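
Reading and writing can be combined into a round trip, which is a quick way to check that a schema describes your data. The sketch below uses only the codec, write and read calls shown above. The roundtrip helper is hypothetical and not part of libthrift, and the final print assumes that read returns fields under the names given in the schema.

```lua
-- Hypothetical helper: encode `value` with a codec built from `schema`,
-- then decode the resulting bytes with the same codec.
-- `thrift` is the module returned by require 'libthrift'.
local function roundtrip(thrift, schema, value)
   local codec = thrift.codec(schema)
   local binary = codec:write(value)   -- Thrift binary as a Lua string
   return codec:read(binary)
end

local schema = {
   ttype = "struct",
   fields = {
      [1] = { ttype = "i32", name = "an_int" },
      [2] = { ttype = "bool", name = "someBoolean" },
      [3] = { ttype = "list", value = "double", name = "vector" },
   }
}

-- Only call into the real library when it is installed.
local ok, thrift = pcall(require, 'libthrift')
if ok then
   local result = roundtrip(thrift, schema, {
      an_int = 42,
      someBoolean = true,
      vector = { 3.14, 13.13, 543.21 },
   })
   -- Assumption: decoded fields are keyed by their schema names.
   print(result.an_int, result.someBoolean)
end
```

Passing the module in as an argument keeps the helper independent of how libthrift is loaded.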

Codec

The schema table passed into a codec during creation has a simple format. We support the following Thrift types.

  • void
  • bool
  • byte
  • double
  • i16
  • i32
  • i64
  • string
  • struct
  • map
  • set
  • list

A more complicated schema can be found below.

local descA = {
   ttype = "struct",
   fields = {
      [1] = "i32",
      [2] = "bool",
      [3] = { ttype = "list", value = "double" },
   }
}
local desc = {
   ttype = "struct",
   fields = {
      [1] = { ttype = "map", key = "i32", value = "i32" },
      [2] = { ttype = "map", key = "i64", value = { ttype = "set", value = "string" } },
      [3] = descA,
      [4] = { ttype = "list", value = descA },
      [5] = { ttype = "set", value = descA },
      [7] = { ttype = "map", key = "i16", value = descA },
   }
}

It corresponds to this Thrift file.

struct A {
   1: i32 x
   2: bool y
   3: list<double> z
}

struct B {
   1: map<i32, i32> a
   2: map<i64, set<string>> b
   3: A c
   4: list<A> d
   5: set<A> e
   7: map<i16, A> f
}
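
Because schemas are plain Lua tables, a typo in a type name is easy to make. The following is a minimal sketch of a checker that walks a schema table and verifies that every ttype is one of the types listed above. The checkSchema function is hypothetical and not part of libthrift. It treats a bare string such as "i32" as a type name, as in the examples above, and it does not handle tables that carry only codec options.

```lua
-- Thrift type names listed as supported in this README.
local SUPPORTED = {}
for _, name in ipairs({
   "void", "bool", "byte", "double", "i16", "i32", "i64",
   "string", "struct", "map", "set", "list",
}) do
   SUPPORTED[name] = true
end

-- Hypothetical helper (not part of libthrift).
-- Returns true, or false plus a message naming the offending path.
local function checkSchema(desc, path)
   path = path or "schema"
   -- Treat a bare string such as "i32" as a type name.
   if type(desc) == "string" then
      desc = { ttype = desc }
   end
   if type(desc) ~= "table" then
      return false, path .. ": expected a string or table"
   end
   if not SUPPORTED[desc.ttype] then
      return false, path .. ": unsupported ttype " .. tostring(desc.ttype)
   end
   if desc.ttype == "struct" then
      if type(desc.fields) ~= "table" then
         return false, path .. ": struct needs a fields table"
      end
      -- Field ids may be sparse (for example 1..5 and 7), so use pairs.
      for id, field in pairs(desc.fields) do
         local ok, err = checkSchema(field, path .. ".fields[" .. tostring(id) .. "]")
         if not ok then return false, err end
      end
   elseif desc.ttype == "map" then
      local ok, err = checkSchema(desc.key, path .. ".key")
      if not ok then return false, err end
      ok, err = checkSchema(desc.value, path .. ".value")
      if not ok then return false, err end
   elseif desc.ttype == "list" or desc.ttype == "set" then
      local ok, err = checkSchema(desc.value, path .. ".value")
      if not ok then return false, err end
   end
   return true
end

-- Usage: "float" is not a Thrift type, so the second check fails.
print(checkSchema({ ttype = "map", key = "i16", value = "string" }))
print(checkSchema({ ttype = "list", value = "float" }))
```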

Lua and 64 bit integers

Lua 5.1 and earlier use doubles as their internal number format, so the Lua VM cannot natively represent the full range of i64 values. By default, the codec throws an error when reading or writing any value that would be out of range for either Thrift or Lua. That works for most cases. If you need the full range of i64, you can tell the codec to turn i64 values into strings or LongTensors (of size 1) on read, and to accept them in place of numbers on write.

local codec1 = thrift.codec({ i64string = true })  -- i64 to strings
local codec2 = thrift.codec({ i64tensor = true })  -- i64 to LongTensors
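
The range limit comes from the 53-bit significand of a double: integers are exact only up to 2^53, well short of the i64 maximum of 2^63 - 1. The sketch below demonstrates the loss of precision in plain Lua, without libthrift. The fitsInDouble helper is hypothetical and only illustrates the kind of range check involved; it is not the check the library performs.

```lua
-- A double has a 53-bit significand, so integers are exact only up to 2^53.
local limit = 2^53

-- Adding 1 past the limit is lost to rounding, so this prints true.
print(limit + 1 == limit)

-- Hypothetical helper: is `n` a whole number inside the exactly representable range?
local function fitsInDouble(n)
   return n == math.floor(n) and n >= -limit and n <= limit
end

print(fitsInDouble(42))      -- small integers are safe
print(fitsInDouble(2^60))    -- valid as an i64, but not exact as a double
```

Values that fail a check like this are the ones for which the i64string or i64tensor options are useful.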

Torch Tensors

Most of the time you want to map Thrift lists and sets directly to Torch Tensors. Setting the tensors option to true makes this happen automatically, with the following mapping.

  • list<byte> or set<byte> converts to/from torch.ByteTensor
  • list<double> or set<double> converts to/from torch.DoubleTensor
  • list<i16> or set<i16> converts to/from torch.ShortTensor
  • list<i32> or set<i32> converts to/from torch.IntTensor
  • list<i64> or set<i64> converts to/from torch.LongTensor

local codec = thrift.codec({ tensors = true })
