Fintan Halpenny edited this page Dec 1, 2017 · 11 revisions

The native C++ codebase can ingest JSON when --json is passed on the command line.
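As a rough sketch of how a data file for this flag might be prepared, the snippet below writes examples to a file and builds the matching command line. It assumes one JSON object per line in the data file and uses the -d option to name the file. The helper names write_json_examples and vw_json_command are hypothetical and not part of VW.

```python
import json

def write_json_examples(examples, path):
    """Write each example as one compact JSON object per line.

    Assumption: vw reads one JSON example per input line when --json is set.
    """
    with open(path, "w", encoding="utf-8") as handle:
        for example in examples:
            handle.write(json.dumps(example, separators=(",", ":")) + "\n")

def vw_json_command(data_path):
    """Return the argument list for running vw on a JSON data file."""
    return ["vw", "--json", "-d", str(data_path)]
```

With vw available on the PATH, the returned list could be handed to subprocess.run; that step is left out here so the sketch stays runnable without VW installed.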

The following JSON format can be ingested into VW:

  • Top-level properties with scalar values (string, number, boolean) are considered features for the default namespace.
  • Top-level properties of type object or array are considered namespaces.
  • Features can be JSON strings, integers, floats, booleans, or arrays of integers and/or floats.
  • Top-level properties starting with _ are ignored, except if they match a special property (e.g. "_label", "_multi", "_text").
  • Labels can be passed using the top-level "_label" property. This is also supported for multiline examples, but the label needs to be part of one of the multiline examples.
  • If the "_label" value is a string, integer or float, it is converted to a string and passed directly to the VW label parser.
  • If the "_label" value is an object, the first property needs to match one of the JSON properties of SimpleLabel or ContextualBanditLabel.
  • Special text handling through "_text": properties named "_text" are processed using string splitting and not string escaping (see sample below).
  • Multiline examples as used by contextual bandits are specified by using the "_multi" property. Each entry itself is an example as described above and can optionally contain a label. The top-level properties are used for the optional shared example.
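The rules above can be sketched as a small converter from the JSON format to VW text. This is an illustration of the mapping only, not VW's parser: the function names are hypothetical, false booleans are assumed to emit no feature, object labels are assumed to be rendered as their values in order, numbers are printed as Python prints them (0.2 rather than .2), and arrays are left out.

```python
import json

def _feature(name, value):
    """Render one property as VW feature text, or None to emit nothing."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return name if value else None  # assumption: false emits no feature
    if isinstance(value, (int, float)):
        return f"{name}:{value}"
    if isinstance(value, str):
        return name + value.replace(" ", "_")
    raise TypeError(f"unsupported value for {name!r} (arrays are omitted here)")

def _label(value):
    """Scalar labels become strings; object labels become their values in order."""
    if isinstance(value, dict):
        return " ".join(str(v) for v in value.values())
    return str(value)

def _line(example, prefix=""):
    default, namespaces = [], []
    for key, value in example.items():
        if key in ("_label", "_multi"):
            continue  # handled separately
        if key == "_text":
            default.extend(value.split())  # plain splitting, no escaping
        elif key.startswith("_"):
            continue  # other underscore properties are ignored
        elif isinstance(value, dict):
            feats = [_feature(k, v) for k, v in value.items()]
            feats = [f for f in feats if f is not None]
            namespaces.append(f"|{key} " + " ".join(feats))
        else:
            feat = _feature(key, value)
            if feat is not None:
                default.append(feat)
    segments = list(namespaces)
    if default or not namespaces:
        segments.append("| " + " ".join(default))
    head = [prefix] if prefix else []
    if "_label" in example:
        head.append(_label(example["_label"]))
    return " ".join(head + segments)

def json_to_vw(text):
    """Convert one JSON example (single or multiline) to VW text."""
    example = json.loads(text)
    if "_multi" not in example:
        return _line(example)
    has_shared = any(key != "_multi" for key in example)
    lines = [_line(example, "shared")] if has_shared else []
    lines += [_line(entry) for entry in example["_multi"]]
    return "\n".join(lines)
```

For instance, json_to_vw('{"x":2,"_text":"elections US iowa"}') yields the string "| x:2 elections US iowa".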

The C# layer can ingest:

  • JSON strings
  • JSON.NET's JsonReader
  • C# objects serializable to the above JSON format using JSON.NET serialization rules, so JsonProperty annotations and similar attributes are respected. This is particularly useful if one needs to score a given object, then serialize it to JSON and train from the JSON serialization, as it avoids a deserialization step for the scoring part.

Examples

Each example shows the JSON input, followed by the equivalent VW string.
 
{
 "f1":25,"f2":true,
 "_aux":"some ignored info"
} 
 | f1:25 f2
 
{
 "ns1":{"location":"New York"},
 "f2":[1,0.2,3]
} 
 |ns1 locationNew_York | :1 :.2 :3
 
{
 "ns1":{"location":"New York"},
 "ns2":{"f2":3.4},"_label":1
} 
1 |ns1 locationNew_York |ns2 f2:3.4
 
{
 "ns1":{"location":"New York", "f2":3.4},
 "_label":{"Label":2,"Weight":0.3}
} 
2 0.3 |ns1 locationNew_York f2:3.4
 
{
 "x":2,
 "_text":"elections US iowa"
} 
| x:2 elections US iowa
 
{
 "UserAge":15,
 "_multi":[
   {"_text":"elections maine", "Source":"TV"},
   {"Source":"www", "topic":4, "_label":"2:3:.3"}
 ]
} 
shared | UserAge:15
| elections maine SourceTV
2:3:.3 | Sourcewww topic:4
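A multiline example like the last one can also be assembled programmatically. The helper below is a minimal sketch with a hypothetical name, make_multiline_example. It copies the shared top-level properties, places the per-action entries under "_multi", and attaches the label to exactly one entry, since the label has to be part of one of the multiline examples.

```python
import json

def make_multiline_example(shared, actions, labeled_index=None, label=None):
    """Build a "_multi" example from shared features and per-action features.

    shared        -- dict of top-level properties for the shared example
    actions       -- list of dicts, one per entry under "_multi"
    labeled_index -- position of the entry that receives the label, if any
    label         -- value stored under "_label" for that entry
    """
    entries = [dict(action) for action in actions]  # copy, leave inputs untouched
    if labeled_index is not None:
        entries[labeled_index]["_label"] = label
    example = dict(shared)
    example["_multi"] = entries
    return example
```

Passing the result to json.dumps gives a string in the format described on this page.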