JSON string to Map/List #20295

Closed
Ethan-DeBandi99 opened this issue Jul 25, 2022 · 11 comments

@Ethan-DeBandi99

Summary

Steps to Reproduce

Source Code:
Please note: single quotes are used here, but I have also tried escaped double quotes (\"). Is this correct? It works when reading into a record.

use IO, Map;

var json_str = "{'key1': 'val1', 'key2': 'val2'}";
var newmem = openmem();
newmem.writer().write(json_str);
var nreader = newmem.reader();
var m: map(keyType=string, valType=string);
m.readThis(nreader);

Produces This Error

uncaught BadFormatError: bad format: missing expected literal (while reading ioLiteral "}" with path "unknown" offset 1)
  test_mod.chpl:14: thrown here
  test_mod.chpl:8: uncaught here

I receive similar error messages to the above for reading in a list. I did not provide additional source code because it is essentially the same, but I can provide it if needed.

chpl --version

chpl version 1.27.0
  built with LLVM version 14.0.6
Copyright 2020-2022 Hewlett Packard Enterprise Development LP
Copyright 2004-2019 Cray Inc.
(See LICENSE file for more details)

printchplenv

machine info: Darwin ethans-mbp.lan 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000 arm64
CHPL_HOME: /opt/homebrew/Cellar/chapel/1.27.0/libexec
script location: /opt/homebrew/Cellar/chapel/1.27.0/libexec/util/chplenv
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: arm64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: fifo
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: none
CHPL_HWLOC: none
CHPL_RE2: bundled
CHPL_LLVM: system
CHPL_AUX_FILESYS: none
@lydia-duncan
Member

Hello and thanks for the report! I'm not surprised this doesn't behave as expected today, but I think it's reasonable to want. Note that we're currently in the process of revamping our IO implementation and will be implementing a new strategy for handling JSON more naturally. We'll try to keep this case in mind as we do, but this effort is fairly extensive, so it might be a while before you see it available.

@bradcray
Member

bradcray commented Aug 6, 2022

I'm no JSON expert, but I'm wondering whether, until the new JSON encoder that Lydia mentions is available, it might be possible to use the regex package on the string in question (or maybe just string operations) to manually parse the string and populate the map or list as you go. The answer probably depends on how simple and flat the JSON string is, versus how deeply hierarchical it is or whether it contains challenging special cases like escape characters. E.g., I'm imagining something like a stringToMap() or stringToList() function. Or maybe that's what you're doing already, and you're just wishing there were more of a built-in capability for doing this?
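A minimal sketch of the kind of stringToMap() helper described here, assuming the input is a flat JSON object whose keys and values are all double-quoted strings with no escapes, commas, or colons inside them. The helper name is hypothetical and this is not a general JSON parser; it only uses the standard string methods strip(), split(), and partition() plus the Map module.

```chapel
use Map;

// Hypothetical helper: parses a *flat* JSON object such as
//   {"key1": "val1", "key2": "val2"}
// into a map(string, string). Assumes keys and values are double-quoted
// strings with no escapes, commas, or colons inside them.
proc stringToMap(jsonStr: string): map(string, string) throws {
  var result: map(string, string);

  // Trim surrounding whitespace and check for the enclosing braces.
  const body = jsonStr.strip();
  if body.size < 2 || !body.startsWith("{") || !body.endsWith("}") then
    throw new IllegalArgumentError("expected a JSON object: " + jsonStr);

  // Everything between the braces (string indices are 0-based).
  const inner = body[1..body.size-2].strip();
  if inner.isEmpty() then return result;

  // Each comma-separated piece should look like "key": "value".
  for pair in inner.split(",") {
    // Split on the first ':'; sep is empty if no ':' was found.
    const (key, sep, value) = pair.partition(":");
    if sep.isEmpty() then
      throw new IllegalArgumentError("missing ':' in pair: " + pair);
    // Remove surrounding whitespace, then the double quotes.
    result.add(key.strip().strip("\""), value.strip().strip("\""));
  }
  return result;
}

// Usage with (part of) the string from this issue.
var m = try! stringToMap("""{"codes": "id_OACLOtw_12", "segments": "id_OACLOtw_7"}""");
writeln(m);
```

Splitting on ',' and ':' breaks as soon as a value contains either character or nests another object, which is why this only covers the simple, flat case; anything hierarchical needs a real tokenizer or built-in JSON support.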

@Ethan-DeBandi99
Author

Thank you both for the update. @bradcray - I think for the time being we have a workaround in place that fits our needs. We ended up needing more information for components, so a simple map would actually have ended up a bit more complicated overall. For now, I think we are good with the current functionality, but I am very excited to see the updated JSON handling when it is available.

@bradcray
Member

bradcray commented Aug 8, 2022

Great, thanks for the update Ethan.

@Ethan-DeBandi99
Author

@lydia-duncan, @bradcray - I just wanted to get an update on this because it is still causing a lot of heartburn for Arkouda. We are increasingly seeing instances where we need to take a JSON-formatted string and create a map from it. Some cases are quite simple, but others are more complex (nested). I understand that in most cases we will get a map(string, string), but that should be all we really need. This is listed as functional in the Formatted I/O docs, but it is not working. I just wanted to bring it up again because we are repeatedly having to work around this.

@bradcray
Member

Noting that using readf("%jt", myMap); gives the same error as in the OP:

use IO, Map;
var m: map(keyType=string, valType=string);
readf("%jt", m);  // read a JSON object from stdin into the map
writeln(m);


@benharsh
Member

benharsh commented May 10, 2023

This will be a bit of a long response, so I'll try to summarize the main points here:

  • I have a PR merged that fixes %jt for maps and lists: Add some basic JSON IO support to standard Map and List #22279
  • We have been working on a new serialization API for reading and writing types in a particular format, and I will include an example in this post
  • These new "serializers" are still going through our stabilization process, so minor aspects of my example may change.

Also, JSON requires double quotes for strings; single-quoted strings are not valid JSON, so all my examples will use double quotes.


Historically we have supported %jt as a format option in readf and writef. PR #22279 modifies the standard list and map types to work better with that formatter option. Translating your original example would look like this:

use IO;
use Map;

proc main() {
  var json_str = '{"key1": "val1", "key2": "val2"}';
  var newmem = openMemFile();
  newmem.writer().write(json_str);
  var nreader = newmem.reader();
  var m: map(keyType=string, valType=string);
  // note: 'readThis' is meant for the IO module to invoke, not users
  // m.readThis(nreader);
  // instead use...
  nreader.readf("%jt", m);
  writeln(m);
  // prints (in Chapel's default format, not JSON)
  // {key2: val2, key1: val1}
}

The list type works in much the same way.

I tested various types of data, which I will list here. You should be able to read these JSON objects into the noted Chapel types:

| JSON object | Chapel type |
|---|---|
| [1, 2, 3, 4] | list(int) |
| ["val1", "val2", "val3", "val4"] | list(string) |
| [[1, 2, 3], [4, 5, 6]] | list(list(int)) |
| {"key2": "val2", "key1": "val1"} | map(string, string) |
| {"y": {"S": 33, "R": 99}, "x": {"A": 5, "B": 42}} | map(string, map(string, int)) |
| [{"A": 1, "B": 2}, {"Y": 4, "X": 3}] | list(map(string, int)) |
| {"y": [4, 5, 6], "x": [1, 2, 3]} | map(string, list(int)) |

There are some other examples in the PR that demonstrate success working with records that contain simple types.


"Serializers" are a new feature we are working on to replace %jt and other baked-in formatter options. The general idea is that you will create a fileReader with a specified deserializer, and then use read calls as you normally would. Your example would then be translated to:

use IO;
use Map;
use Json;

proc main() {
  var json_str = '{"key1": "val1", "key2": "val2"}';
  var newmem = openMemFile();
  newmem.writer().write(json_str);
  var nreader = newmem.reader(deserializer=new JsonDeserializer());
  var m : map(keyType=string, valType=string);
  nreader.read(m);

  // Alternatively, no need to default-initialize the map
  //var m = nreader.read(map(string,string));

  writeln(m);
}

Writing deserializer=new JsonDeserializer() is obviously more verbose than %jt, but this feature provides us as developers, or anyone else, with the means to more easily add other kinds of formats (e.g. YAML). There are corresponding "serializer" types/arguments for fileWriter. Serializers/Deserializers also work with the table of types above.

A nice feature of the JsonSerializer is that it can convert non-string map keys into strings, and vice versa. For example, representing a map(int, string) might look something like this:

{
  "713": "Houston", 
  "212": "NYC", 
  "206": "Seattle"
}

This functionality can also support using a record as a key. I can speak more to that if anyone's interested.

Some advantages of serializers:

  • better memory usage when initializing a type from IO
  • not baked into impenetrable internal code, so easier to modify and add functionality
  • able to work with most types out of the box

This is a relatively recent feature that still needs to undergo some review, but I believe the example code using JsonDeserializer is representative of what we will end up calling "stable". Initially I expect these types to be available through a Json package module.

Serializers have been a focus of mine recently, so if there's some particular use-case you are concerned about, I'd be interested in hearing more! I hope this work will help to relieve the issues you are running into.

@Ethan-DeBandi99
Author

@benharsh - thanks for the info. Are the serializers available in Chapel 1.30? I tried adding the serializer code in for a case and it is not found. I wasn't able to find it in the Chapel docs either, but maybe I did not look hard enough. Maybe I am jumping ahead and these are not yet available and if so no problem, I look forward to the addition.

I have tried using readf previously and did not have much luck. I tried it again this morning for more clarity; here is the JSON string:

{"codes": "id_OACLOtw_12", "categories": "id_OACLOtw_18", "NA_codes": "id_OACLOtw_19", "permutation": "id_OACLOtw_8", "segments": "id_OACLOtw_7"}

This string is generated using json.dumps from our Python client and sent to the Arkouda Chapel code. I believe that the format here matches the format of the test case provided, but I am still getting this error:

RuntimeError: bad format: missing expected literal (while reading ioLiteral "}" with path "unknown" offset 1)

@benharsh
Member

Sorry for not being clear, the serializers are not available in 1.30 as the API needs more discussion before stabilizing, and because the implementation was also not ready for 1.30.

The readf fix was also just merged last week, so unless you are working off of the main branch then I don't believe you'll be able to benefit. I tried your string on a local build from the latest commit and it appears to work, for what it's worth:

use IO;
use Map;

proc main() {
  var f = openMemFile();
  {
    // syntax highlighting isn't great here, but """ indicates an uninterpreted string literal
    var str = """{"codes": "id_OACLOtw_12", "categories": "id_OACLOtw_18", "NA_codes": "id_OACLOtw_19", "permutation": "id_OACLOtw_8", "segments": "id_OACLOtw_7"}""";
    f.writer().write(str);
  }
  {
    var r = f.reader();
    var m : map(string, string);
    r.readf("%jt", m);
    writeln(m); // printing in the "default" format
  }
}

This program prints:

{NA_codes: id_OACLOtw_19, categories: id_OACLOtw_18, permutation: id_OACLOtw_8, segments: id_OACLOtw_7, codes: id_OACLOtw_12}

@bradcray
Member

@Ethan-DeBandi99 has confirmed that Ben's PR worked for him, so I'm closing this as resolved by #22279 and it'll be available in Chapel 1.31.

@bradcray
Member

(Thanks for the quick turnaround once this got marked as high priority, @benharsh!)
