Dot syntax for getting value within tables #174

Closed
Markus-included opened this issue Nov 1, 2021 · 9 comments

@Markus-included

I would like to have a convenient syntax like this: toml::find(file, "Game.Name") to pull this value:

[Game]
Name = "Name of a game"
@ToruNiina
Owner

In TOML, a key can include a dot (though it is not the best practice).

"Game.Name" = "Name of a game"

This is a valid TOML file and the structure is equivalent to the following JSON.

{"Game.Name": "Name of a game"}

In toml11, toml::find(file, "Game.Name") corresponds to this case ("Game.Name" = "Name of a game").

Of course I know a key including dots is not recommended, but this library is general-purpose and it sometimes makes sense to use dots in a key, for example an IP address or a URL. So I want to strictly distinguish those cases.

For your purpose, you can use toml::find(file, "Game", "Name") and, at least to me, it represents the recursive structure more clearly. The function can take as many keys as you want. Does this function work for your case?
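To make the distinction concrete, here is a minimal sketch that does not use toml11 at all. `Node` and `lookup` are hypothetical stand-ins invented for this illustration (they are not part of any library); the sketch only mirrors the idea that one key argument means one table level, so a dot inside a key is never split.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed value (NOT a toml11 type):
// a node has a key, an optional text payload, and child nodes.
struct Node {
    std::string key;
    std::string text;
    std::vector<Node> children;  // std::vector allows an incomplete element type since C++17
};

// Look up one direct child by exact key. Dots in `key` are not interpreted.
inline const Node& lookup(const Node& n, const std::string& key) {
    for (const Node& child : n.children) {
        if (child.key == key) return child;
    }
    throw std::out_of_range("key not found: " + key);
}

// Several keys descend one level per key, similar in spirit to
// calling a multi-key find with "Game", "Name".
template <typename... Keys>
const Node& lookup(const Node& n, const std::string& first, const Keys&... rest) {
    return lookup(lookup(n, first), rest...);
}
```

With a root that holds both a nested `Game` table and a separate quoted key `"Game.Name"`, `lookup(root, "Game", "Name")` and `lookup(root, "Game.Name")` reach two different entries, which is exactly the ambiguity a single dotted string could not express.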

@mchaptel

mchaptel commented Apr 23, 2023

I believe this:

This is a valid TOML file and the structure is equivalent to the following JSON.

{"Game.Name": "Name of a game"}

is wrong. A proper JSON equivalent would be:

{
  "Game" :
  {
    "Name" : "Name of a game"
  }
}

Otherwise this makes the use of the period character ambiguous. This is specified here: https://toml.io/en/v1.0.0#keys

The examples given of using IPs or URL are more for values than keys, as using IPs as keys would be problematic for the above reason.

I think it could be very useful to add an overload of this function that takes the full dotted string, splits it, and then calls the multi-argument find function under the hood to access the value. That would let the library access keys based on the inherent structure of the data in the file rather than on how it was written.

@marzer

marzer commented Apr 24, 2023

@mchaptel If you have a dot in a quoted key in TOML, then that dot is simply a part of the key, not any indication of structure, so:

"Game.Name" = "Shooty McBlastFace"

is exactly equivalent to

{
	"Game.Name" : "Shooty McBlastFace"
}

You can see an example of this in the TOML spec section you've linked, under Quoted Keys:

"127.0.0.1" = "value"

This would be an absolutely useless KVP if each dot expanded to a sub-table 😅

Whether allowing that ambiguity was actually wise on part of the TOML spec is a subject for debate I suppose, but that ship has sailed unfortunately. TOML++ gets around this in at_path() simply by ignoring this possibility altogether because periods in keys is a bad idea. The spec doesn't provide any guidance on how fully-qualified key lookups should work in this situation so libraries will handle it all sorts of different ways (including avoiding it altogether, as in toml11's case).

@mchaptel

mchaptel commented Apr 24, 2023

Hm, I didn't realize that. This seems like a very strange contradiction on the spec's part to me. I guess it makes sense because of the optional inclusion of the quote marks, but it poses a problem for compatibility with basically any other serialized format...

I have been using at_path extensively for that reason in fact. Thanks for letting me know, this is an interesting point to consider.

I did try to figure out a c++ split on the "." and a subsequent call to the toml::find function but calling a function with an arbitrarily large list of arguments determined at runtime seems to be another can of worms for c++...

@ToruNiina
Owner

Thanks @marzer, your explanation is (of course) perfect.

@mchaptel,
As marzer said, quoted keys are different from bare keys. Bare keys only contain 0-9a-zA-Z_-, but quoted keys may contain any Unicode characters, including dots, because the extent of the key is explicit (quotes must be escaped, though).
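As a small illustration of the bare-key rule, the following sketch checks whether a key could be written without quotes. The function name `is_bare_key` is made up for this example and is not part of toml11 or toml++; it only encodes the character set named above plus the TOML 1.0 requirement that a bare key be non-empty.

```cpp
#include <string>

// Returns true if `key` could be written as a bare (unquoted) TOML key:
// non-empty and made only of ASCII letters, ASCII digits, '_' and '-'.
// Any other character, including '.', means the key has to be quoted.
inline bool is_bare_key(const std::string& key) {
    if (key.empty()) return false;
    for (char c : key) {
        const bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                        (c >= '0' && c <= '9') || c == '_' || c == '-';
        if (!ok) return false;
    }
    return true;
}
```

Under this check `Game` and `Name` are bare keys, while `Game.Name` and `127.0.0.1` are not, so they can only exist as single keys when quoted.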

Generally a language spec is hard to read. If you find something that you think is weird, you can try other references. For toml, there is a language-agnostic test suite, toml-test, which contains many valid toml files with the corresponding json files. In this case, you can see many examples in tests/valid/key/quoted-dots.toml and .json. From the examples, you can see that the toml standard is well defined so there is no problem when converting a toml file to json (or any other config file format) and vice versa.
Both toml11 and toml++ (and many other well-known libraries) pass all the toml-tests. This means that these libraries conform to the standard and handle many edge cases correctly.

But I digress. The original comment is about a way to extract a value from a nested table. In the last comment, I pointed out that toml11 already has toml::find(table, key1, key2, ...) which does exactly this, and asked if this was sufficient or not.
I still think this function is sufficient because it works and there is no ambiguity.
The potential problem here is that toml::find only takes a fixed number of keys. In my experience, I have never found this to be a real problem, because the structure of the config file is mostly defined by the application itself that reads it. If there is a realistic and compelling situation where the nesting depth is unstable, I will consider implementing it.

Note that you can write a dot-syntax-based function by splitting a string on dots and then recursively calling toml::find on the subtables.
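The splitting half of that idea could look roughly like the sketch below. `split_dotted` is a hypothetical helper name, not a toml11 function, and it deliberately treats every dot as a separator, so it cannot address quoted keys that contain dots. It returns a std::deque so that the result could be handed to a recursive lookup that pops keys from the front.

```cpp
#include <deque>
#include <string>

// Splits "a.b.c" into {"a", "b", "c"}. Every '.' is treated as a separator,
// so a quoted key such as "127.0.0.1" cannot be expressed with this helper.
inline std::deque<std::string> split_dotted(const std::string& path) {
    std::deque<std::string> keys;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type dot = path.find('.', start);
        if (dot == std::string::npos) {
            keys.push_back(path.substr(start));  // last (or only) segment
            break;
        }
        keys.push_back(path.substr(start, dot - start));
        start = dot + 1;
    }
    return keys;
}
```

Note that empty segments are kept as empty strings (for example a trailing dot yields a final empty key), so a caller that wants to reject such paths has to check for them.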

@mchaptel

@ToruNiina
Thank you for your reply, and I understand better the quoted key aspect.
I have in fact tried to write a generic dot-syntax function based on split and recurse but couldn't get it to work, maybe because of the auto keyword? I am worried that it fixes a specific type at initialization, while that type may change during recursion.

Another thing I wanted to attempt was to split my string into a list of arguments I could then pass to the find function, but that seems to be unsupported by C++, which, as you may have guessed, I'm not the most familiar with yet.

@marzer

marzer commented Apr 25, 2023

Another thing I wanted to attempt was to split my string into a list of arguments I could then pass to the find function

@mchaptel indeed there's no way to do that sort of thing dynamically in C++ (like you might do it in e.g. python by unpacking a list), since C++ is all statically-compiled with no JIT component. Function arguments need to be known in advance, and can't be dynamically determined at runtime. Problems like this are typically solved with recursion (or by pre-populating a container like std::vector though that's not ideal for unbounded inputs).

The whole thing is made more complex if you also want to handle array indexing (e.g. toml::find(tbl, "a.b[2]")), since splitting on dots isn't sufficient in this case.

A 'best-of-both-worlds' approach is to parse through the key, querying for children whenever you encounter a subkey name or an array element, and early-exiting when you find something that doesn't exist or hit a parse error.
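A rough sketch of that parse-and-query idea is shown below. It is not toml++'s implementation: `walk_path` and its two callbacks are hypothetical names chosen for this example, and the grammar is an assumption (bare subkeys separated by dots, optional `[N]` indices, no quoted keys). Each callback returns false to stop early, for instance when a child does not exist.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Walks `path` left to right, calling on_key for each subkey and on_index
// for each "[N]" element. Returns false on a malformed path or as soon as
// a callback returns false. An empty path visits nothing and succeeds.
inline bool walk_path(const std::string& path,
                      const std::function<bool(const std::string&)>& on_key,
                      const std::function<bool(std::size_t)>& on_index) {
    std::size_t pos = 0;
    bool after_dot = false;  // true right after a '.', when a key must follow
    while (pos < path.size()) {
        const char c = path[pos];
        if (c == '[') {
            if (after_dot) return false;                      // "a.[0]"
            const std::size_t end = path.find(']', pos);
            if (end == std::string::npos || end == pos + 1) return false;
            std::size_t index = 0;
            for (std::size_t i = pos + 1; i < end; ++i) {
                if (path[i] < '0' || path[i] > '9') return false;
                index = index * 10 + static_cast<std::size_t>(path[i] - '0');
            }
            if (!on_index(index)) return false;               // early exit
            pos = end + 1;
        } else if (c == '.') {
            if (pos == 0 || after_dot) return false;          // ".a" or "a..b"
            ++pos;
            after_dot = true;
        } else {
            if (pos != 0 && !after_dot) return false;         // "a[0]b"
            std::size_t end = path.find_first_of(".[", pos);
            if (end == std::string::npos) end = path.size();
            if (!on_key(path.substr(pos, end - pos))) return false;  // early exit
            pos = end;
            after_dot = false;
        }
    }
    return !after_dot;  // a trailing '.' is an error
}
```

For the path `a.b[2]` the callbacks fire in the order key `a`, key `b`, index `2`. A caller would keep a pointer to the current node inside the callbacks and return false when a lookup fails.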

Quite a lot of work! Fortunately I have an implementation I'm happy for you to rip from: toml++'s at_path() is built on top of this implementation function: toml::impl::parse_path(). That gets called with some callbacks that take action when a subkey or index is encountered (here). That should be fully-featured enough that you could copy+paste it into whatever toy project you have with whatever C++ toml library you wish to use (e.g. if you wanted to pair toml11's comment preservation with toml++'s at_path()-style lookup and semantics).

@mchaptel

Haha I did start looking at va_args, and, I'll follow your advice on that!

I really appreciate the help, and will look into the source code of at_path, thanks a lot! Thankfully for the array indexing, the project I have remains relatively simple in terms of what will be stored in the toml.

I'm still wondering about the comments as I've started to consider writing the comments in a "defaults config" file, and then serialize a new file for the user settings, which wouldn't need the comments necessarily. Having a custom at_path might make it the best of both worlds though.

Again, really appreciate your guidance on this!

@ToruNiina
Owner

A function that uses recursion to find a value looks like the following.

#include "toml.hpp"

#include <deque>
#include <iostream>
#include <string>
#include <utility>

// Descends one table level per key and converts the final value to T.
// Precondition: keys must not be empty.
template<typename T>
T find_recursive(const toml::value& v, std::deque<std::string> keys)
{
    if(keys.size() == 1)
    {
        // Last key: fetch the value and convert it to the requested type.
        return toml::find<T>(v, keys.front());
    }
    else
    {
        // More keys remain: step into the subtable and recurse on the rest.
        const toml::value& sub = toml::find(v, keys.front());
        keys.pop_front();
        return find_recursive<T>(sub, std::move(keys));
    }
}

int main()
{
    using namespace toml::literals::toml_literals;

    const auto v = R"(
        [fruit.banana]
        color.ripe = "yellow"
    )"_toml;

    std::cout << find_recursive<std::string>(v, {"fruit", "banana", "color", "ripe"}) << std::endl;

    return 0;
}

This discussion is getting long and a bit off topic. Since the new problem seems to be solved, I'm closing this issue.
If anyone has an example of a realistic and compelling situation where this feature is needed, please open a new issue. I will discuss it there.

4 participants