Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad Parsing of Array of Tables #16

Closed
gerrymanoim opened this issue Apr 24, 2017 · 18 comments
Closed

Bad Parsing of Array of Tables #16

gerrymanoim opened this issue Apr 24, 2017 · 18 comments

Comments

@gerrymanoim
Copy link

Here's my toml file, taken straight from the spec page:

[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"

Using the following code:

library(RcppTOML)
cfg <- parseTOML("toml_test.toml")
str(cfg)

I get the following output:

Other: variety
Other: variety
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:List of 2
  .. .. ..$ color: chr "red"
  .. .. ..$ shape: chr "round"
  .. ..$         : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
 - attr(*, "class")= chr [1:2] "toml" "list"
 - attr(*, "file")= chr "toml_test.toml"

It seems to me that in both situations "variety" doesn't actually seem to be parsed as a list, but instead parsed an an empty character?

cfg$fruit
[[1]]
[[1]]$name
[1] "apple"

[[1]]$physical
[[1]]$physical$color
[1] "red"

[[1]]$physical$shape
[1] "round"


[[1]][[3]]
[1] "variety"


[[2]]
[[2]]$name
[1] "banana"

[[2]][[2]]
[1] "variety"

Session Info:

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RcppTOML_0.1.2

loaded via a namespace (and not attached):
 [1] R6_2.2.0        magrittr_1.5    IRdisplay_0.4.4 pbdZMQ_0.2-4   
 [5] tools_3.3.2     Rcpp_0.12.10    crayon_1.3.2    uuid_0.1-2     
 [9] stringi_1.1.5   IRkernel_0.7.1  jsonlite_1.4    stringr_1.2.0  
[13] digest_0.6.12   repr_0.12.0     evaluate_0.10  
@eddelbuettel
Copy link
Owner

Confirming -- I see the same result. Also try setting verbose=TRUE to see this:

<default print method>
[[fruit]]
        name = "apple"
        [fruit.physical]
                color = "red"
                shape = "round"
        [[fruit.variety]]
                name = "red delicious"
        [[fruit.variety]]
                name = "granny smith"
[[fruit]]
        name = "banana"
        [[fruit.variety]]
                name = "plantain"
</default print method>

@eddelbuettel
Copy link
Owner

But look closely at the spec page. I suspect there is a typo -- [[fruit.physical]] seems more logical for the TOML file. If we do that we get

R> parseToml("/tmp/issue16.toml")
Other: physical
Other: variety
Other: variety
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
R> 

@gerrymanoim
Copy link
Author

gerrymanoim commented Apr 24, 2017

The problem seems to be having an array of tables inside of another table.

The following works fine:

[[fruit]]
  name = "apple"

[[fruit]]
  name = "banana"

This breaks:

[[fruit]]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

As does this:

[fruit]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

The produced JSON on the spec page makes my thing [fruit.physical] was not a typo, i.e. it shows the output as an object

"physical": {
        "color": "red",
        "shape": "round"
      },

rather than an array of objects.

@eddelbuettel
Copy link
Owner

I was not talking about the produced JSON; that is precisely how I surmised that the shown input is not what was meant. You need double [[ .. ]] not single.

@gerrymanoim
Copy link
Author

Sorry - I'm not sure I follow? Though I think this is all separate from the issue here, let me see if I can articulate why I'm confused by what you're saying.

My understanding of that section of the TOML spec is that any [[table]] will produce a array named table with the elements being objects in that section.

For the fruit example, which seems to be demonstrating nested tables, I would expect an array named fruit, with fruit objects in it, the first one having a physical object and an array of objects named variety (i..e what you see in the JSON). I don't necessarily see why physical needs to be an array of tables. Sure it could be, but the output makes me think it is just a subtable of the first fruit table.

Apologies if there's something obvious I'm missing here.

@eddelbuettel
Copy link
Owner

After just committing one minor fix to quieten one print:

edd@max:~/git/rcpptoml(master)$ cat /tmp/issue16.toml 
[[fruit]]
  name = "apple"

  [[fruit.physical]]
      color = "red"
          shape = "round"

  [[fruit.variety]]
      name = "red delicious"

  [[fruit.variety]]
      name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
      name = "plantain"
edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
edd@max:~/git/rcpptoml(master)$ 

Note that I altered the TOML file.

@eddelbuettel
Copy link
Owner

But I hear you on the lack of proper nesting / recursion. That may or may not be easy. I'll take a look.

@gerrymanoim
Copy link
Author

But fruit[[1]][[2]] and fruit[[1]][[3]] are still just chrs with value physical and variety. There's no way to access "granny smith"?

@eddelbuettel
Copy link
Owner

As I just said:

But I hear you on the lack of proper nesting / recursion.

@gerrymanoim
Copy link
Author

Ah understood, thanks.

@eddelbuettel
Copy link
Owner

The recursion for the 'TableArray' was simply missing:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 

Looks better, right?

@eddelbuettel
Copy link
Owner

eddelbuettel commented Apr 24, 2017

And if I revert the input to just one [ ]:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:List of 2
  .. .. ..$ color: chr "red"
  .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 

@gerrymanoim
Copy link
Author

Yep - looks perfect!

@eddelbuettel
Copy link
Owner

Now comitted. Thanks for catching that!

@gerrymanoim
Copy link
Author

Just out of curiosity, why are the lists sometimes lists and sometimes pairlists? It doesn't seem to make any difference, I'm just curious in terms of the implementation as to why this happens.

In your example above it seems to sometimes come out one way, sometimes another way:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"

@eddelbuettel
Copy link
Owner

It may just be a side effect of mixing the Rcpp types StretchyList and List....

If you want to dig deeper, could I suggest that you chase that yourself in the code? It's just a few hundred lines, and reasonably good fun to follow. Enabling verbose parse gives you pointers.

@gerrymanoim
Copy link
Author

Thanks for the pointer, you were exactly right!

I don't know if this is the "right" fix (my C++ knowledge is extremely limited) but making the following two changes in parse.cpp changed everything to be lists:

diff --git a/src/parse.cpp b/src/parse.cpp
index dd4228d..b7729f4 100644
--- a/src/parse.cpp
+++ b/src/parse.cpp
@@ -252,7 +252,7 @@ SEXP getTable(const std::shared_ptr<cpptoml::table>& t, bool verbose=false) {
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));
         } else {
             if (verbose) Rcpp::Rcout << "Other: " << p.first << std::endl;
             sl.push_back(p.first);
@@ -300,7 +300,7 @@ Rcpp::List tomlparseImpl(const std::string input, bool verbose=false, bool fromf
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));

         } else if (p.second->is_table()) {
             auto ga = std::dynamic_pointer_cast<cpptoml::table>(p.second);
(END)

Appreciate all the help and the package work. Cheers!

@eddelbuettel
Copy link
Owner

eddelbuettel commented May 2, 2017

Thanks for working the details out. That is nice and straightforward and I committed this now as 7aae502

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants