Mu Li edited this page Jan 21, 2016 · 1 revision

Data Structures

Input data

Input data are represented as sparse example-by-feature matrices, stored in row-major order.
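
As an illustration of a sparse row-major layout, the sketch below flattens a few examples into three arrays, in the style of compressed sparse row (CSR) storage. The function name `to_row_major` and the array names are hypothetical and chosen only for this example; this is not Difacto's actual interface, and the source does not say which concrete layout Difacto uses.

```python
def to_row_major(examples):
    """Flatten examples into CSR-style arrays.

    `examples` is a list of rows; each row is a list of
    (feature_index, value) pairs for the non-zero entries.
    """
    row_offsets = [0]      # row i occupies [row_offsets[i], row_offsets[i + 1])
    feature_indices = []   # column (feature) index of each non-zero entry
    feature_values = []    # value of each non-zero entry
    for row in examples:
        for feature, value in row:
            feature_indices.append(feature)
            feature_values.append(value)
        row_offsets.append(len(feature_indices))
    return row_offsets, feature_indices, feature_values


# Three examples; the second one has no non-zero features.
examples = [[(0, 1.0), (3, 2.0)], [], [(1, 0.5)]]
row_offsets, feature_indices, feature_values = to_row_major(examples)
```

Storing rows contiguously keeps all features of one example next to each other, which suits algorithms that read one example at a time.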

Feature Index

Key value pairs

Difacto uses key-value pairs as its primary means of data communication. A key, often the feature index, is always an unsigned 64-bit integer, while a value can be a single number or a vector of numbers. To store a list of pairs, we concatenate the keys and the values into two separate vectors, and maintain an offset vector that records where the i-th value starts in the value vector (the offsets can be skipped if all values have the same length).

For example, assume we have three key-value pairs {1, 3}, {3, 6}, {9, 3}. We store them as

keys = [1, 3, 9]
values = [3, 6, 3]

Now consider pairs with vector values: {1, [3, 2, 4]}, {3, [6]}, {8, []}, {9, [3, 8]}. We store them as

keys = [1, 3, 8, 9]
values = [3, 2, 4, 6, 3, 8]
value_offsets = [0, 3, 4, 4, 6]
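
The packing above can be sketched in a few lines of Python. The helper names `pack_pairs` and `value_at` are hypothetical and used only for illustration; Difacto itself is written in C++ and its real interface may differ. The sketch assumes, as in the example, that the offset vector has one more entry than there are keys, so that the i-th value spans `value_offsets[i]` to `value_offsets[i + 1]`.

```python
def pack_pairs(pairs):
    """Pack (key, list_of_values) pairs into flat keys/values/offsets lists."""
    keys, values, value_offsets = [], [], [0]
    for key, vals in pairs:
        keys.append(key)
        values.extend(vals)
        # End of this value, which is also the start of the next one.
        value_offsets.append(len(values))
    return keys, values, value_offsets


def value_at(values, value_offsets, i):
    """Return the value vector that belongs to the i-th key."""
    return values[value_offsets[i]:value_offsets[i + 1]]


pairs = [(1, [3, 2, 4]), (3, [6]), (8, []), (9, [3, 8])]
keys, values, value_offsets = pack_pairs(pairs)
third_value = value_at(values, value_offsets, 2)  # value for key 8
```

Because an empty value contributes nothing to `values`, its start and end offsets coincide, which is why the offset 4 appears twice in the example.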
