Bad Parsing of Array of Tables #16

Closed
newtux opened this Issue Apr 24, 2017 · 18 comments

Comments

Projects
None yet
2 participants
@newtux

newtux commented Apr 24, 2017

Here's my toml file, taken straight from the spec page:

[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"

Using the following code:

library(RcppTOML)
cfg <- parseTOML("toml_test.toml")
str(cfg)

I get the following output:

Other: variety
Other: variety
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:List of 2
  .. .. ..$ color: chr "red"
  .. .. ..$ shape: chr "round"
  .. ..$         : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
 - attr(*, "class")= chr [1:2] "toml" "list"
 - attr(*, "file")= chr "toml_test.toml"

It seems to me that in both situations "variety" doesn't actually seem to be parsed as a list, but instead parsed an an empty character?

cfg$fruit
[[1]]
[[1]]$name
[1] "apple"

[[1]]$physical
[[1]]$physical$color
[1] "red"

[[1]]$physical$shape
[1] "round"


[[1]][[3]]
[1] "variety"


[[2]]
[[2]]$name
[1] "banana"

[[2]][[2]]
[1] "variety"

Session Info:

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RcppTOML_0.1.2

loaded via a namespace (and not attached):
 [1] R6_2.2.0        magrittr_1.5    IRdisplay_0.4.4 pbdZMQ_0.2-4   
 [5] tools_3.3.2     Rcpp_0.12.10    crayon_1.3.2    uuid_0.1-2     
 [9] stringi_1.1.5   IRkernel_0.7.1  jsonlite_1.4    stringr_1.2.0  
[13] digest_0.6.12   repr_0.12.0     evaluate_0.10  
@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

Confirming -- I see the same result. Also try setting verbose=TRUE to see this:

<default print method>
[[fruit]]
        name = "apple"
        [fruit.physical]
                color = "red"
                shape = "round"
        [[fruit.variety]]
                name = "red delicious"
        [[fruit.variety]]
                name = "granny smith"
[[fruit]]
        name = "banana"
        [[fruit.variety]]
                name = "plantain"
</default print method>
Owner

eddelbuettel commented Apr 24, 2017

Confirming -- I see the same result. Also try setting verbose=TRUE to see this:

<default print method>
[[fruit]]
        name = "apple"
        [fruit.physical]
                color = "red"
                shape = "round"
        [[fruit.variety]]
                name = "red delicious"
        [[fruit.variety]]
                name = "granny smith"
[[fruit]]
        name = "banana"
        [[fruit.variety]]
                name = "plantain"
</default print method>
@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

But look closely at the spec page. I suspect there is a typo -- [[fruit.physical]] seems more logical for the TOML file. If we do that we get

R> parseToml("/tmp/issue16.toml")
Other: physical
Other: variety
Other: variety
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
R> 
Owner

eddelbuettel commented Apr 24, 2017

But look closely at the spec page. I suspect there is a typo -- [[fruit.physical]] seems more logical for the TOML file. If we do that we get

R> parseToml("/tmp/issue16.toml")
Other: physical
Other: variety
Other: variety
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
R> 
@newtux

This comment has been minimized.

Show comment
Hide comment
@newtux

newtux Apr 24, 2017

The problem seems to be having an array of tables inside of another table.

The following works fine:

[[fruit]]
  name = "apple"

[[fruit]]
  name = "banana"

This breaks:

[[fruit]]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

As does this:

[fruit]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

The produced JSON on the spec page makes my thing [fruit.physical] was not a typo, i.e. it shows the output as an object

"physical": {
        "color": "red",
        "shape": "round"
      },

rather than an array of objects.

newtux commented Apr 24, 2017

The problem seems to be having an array of tables inside of another table.

The following works fine:

[[fruit]]
  name = "apple"

[[fruit]]
  name = "banana"

This breaks:

[[fruit]]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

As does this:

[fruit]
  name = "apple"
  [[fruit.variety]]
    name = "red delicious"

The produced JSON on the spec page makes my thing [fruit.physical] was not a typo, i.e. it shows the output as an object

"physical": {
        "color": "red",
        "shape": "round"
      },

rather than an array of objects.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

I was not talking about the produced JSON; that is precisely how I surmised that the shown input is not what was meant. You need double [[ .. ]] not single.

Owner

eddelbuettel commented Apr 24, 2017

I was not talking about the produced JSON; that is precisely how I surmised that the shown input is not what was meant. You need double [[ .. ]] not single.

@newtux

This comment has been minimized.

Show comment
Hide comment
@newtux

newtux Apr 24, 2017

Sorry - I'm not sure I follow? Though I think this is all separate from the issue here, let me see if I can articulate why I'm confused by what you're saying.

My understanding of that section of the TOML spec is that any [[table]] will produce a array named table with the elements being objects in that section.

For the fruit example, which seems to be demonstrating nested tables, I would expect an array named fruit, with fruit objects in it, the first one having a physical object and an array of objects named variety (i..e what you see in the JSON). I don't necessarily see why physical needs to be an array of tables. Sure it could be, but the output makes me think it is just a subtable of the first fruit table.

Apologies if there's something obvious I'm missing here.

newtux commented Apr 24, 2017

Sorry - I'm not sure I follow? Though I think this is all separate from the issue here, let me see if I can articulate why I'm confused by what you're saying.

My understanding of that section of the TOML spec is that any [[table]] will produce a array named table with the elements being objects in that section.

For the fruit example, which seems to be demonstrating nested tables, I would expect an array named fruit, with fruit objects in it, the first one having a physical object and an array of objects named variety (i..e what you see in the JSON). I don't necessarily see why physical needs to be an array of tables. Sure it could be, but the output makes me think it is just a subtable of the first fruit table.

Apologies if there's something obvious I'm missing here.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

After just committing one minor fix to quieten one print:

edd@max:~/git/rcpptoml(master)$ cat /tmp/issue16.toml 
[[fruit]]
  name = "apple"

  [[fruit.physical]]
      color = "red"
          shape = "round"

  [[fruit.variety]]
      name = "red delicious"

  [[fruit.variety]]
      name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
      name = "plantain"
edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
edd@max:~/git/rcpptoml(master)$ 

Note that I altered the TOML file.

Owner

eddelbuettel commented Apr 24, 2017

After just committing one minor fix to quieten one print:

edd@max:~/git/rcpptoml(master)$ cat /tmp/issue16.toml 
[[fruit]]
  name = "apple"

  [[fruit.physical]]
      color = "red"
          shape = "round"

  [[fruit.variety]]
      name = "red delicious"

  [[fruit.variety]]
      name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
      name = "plantain"
edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name: chr "apple"
  .. ..$     : chr "physical"
  .. ..$     : chr "variety"
  ..$ :List of 2
  .. ..$ name: chr "banana"
  .. ..$     : chr "variety"
edd@max:~/git/rcpptoml(master)$ 

Note that I altered the TOML file.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

But I hear you on the lack of proper nesting / recursion. That may or may not be easy. I'll take a look.

Owner

eddelbuettel commented Apr 24, 2017

But I hear you on the lack of proper nesting / recursion. That may or may not be easy. I'll take a look.

@newtux

This comment has been minimized.

Show comment
Hide comment
@newtux

newtux Apr 24, 2017

But fruit[[1]][[2]] and fruit[[1]][[3]] are still just chrs with value physical and variety. There's no way to access "granny smith"?

newtux commented Apr 24, 2017

But fruit[[1]][[2]] and fruit[[1]][[3]] are still just chrs with value physical and variety. There's no way to access "granny smith"?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

As I just said:

But I hear you on the lack of proper nesting / recursion.

Owner

eddelbuettel commented Apr 24, 2017

As I just said:

But I hear you on the lack of proper nesting / recursion.

@newtux

This comment has been minimized.

Show comment
Hide comment
@newtux

newtux Apr 24, 2017

Ah understood, thanks.

newtux commented Apr 24, 2017

Ah understood, thanks.

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

The recursion for the 'TableArray' was simply missing:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 

Looks better, right?

Owner

eddelbuettel commented Apr 24, 2017

The recursion for the 'TableArray' was simply missing:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 

Looks better, right?

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

And if I revert the input to just one [ ]:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:List of 2
  .. .. ..$ color: chr "red"
  .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 
Owner

eddelbuettel commented Apr 24, 2017

And if I revert the input to just one [ ]:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:List of 2
  .. .. ..$ color: chr "red"
  .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
edd@max:~/git/rcpptoml(master)$ 
@newtux

This comment has been minimized.

Show comment
Hide comment
@newtux

newtux Apr 24, 2017

Yep - looks perfect!

newtux commented Apr 24, 2017

Yep - looks perfect!

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 24, 2017

Owner

Now comitted. Thanks for catching that!

Owner

eddelbuettel commented Apr 24, 2017

Now comitted. Thanks for catching that!

@newtux

This comment has been minimized.

Show comment
Hide comment
@newtux

newtux Apr 28, 2017

Just out of curiosity, why are the lists sometimes lists and sometimes pairlists? It doesn't seem to make any difference, I'm just curious in terms of the implementation as to why this happens.

In your example above it seems to sometimes come out one way, sometimes another way:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"

newtux commented Apr 28, 2017

Just out of curiosity, why are the lists sometimes lists and sometimes pairlists? It doesn't seem to make any difference, I'm just curious in terms of the implementation as to why this happens.

In your example above it seems to sometimes come out one way, sometimes another way:

edd@max:~/git/rcpptoml(master)$ r -lRcppTOML -e 'print(parseToml("/tmp/issue16.toml"))'
List of 1
 $ fruit:Dotted pair list of 2
  ..$ :List of 3
  .. ..$ name    : chr "apple"
  .. ..$ physical:Dotted pair list of 1
  .. .. ..$ :List of 2
  .. .. .. ..$ color: chr "red"
  .. .. .. ..$ shape: chr "round"
  .. ..$ variety :Dotted pair list of 2
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "red delicious"
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "granny smith"
  ..$ :List of 2
  .. ..$ name   : chr "banana"
  .. ..$ variety:Dotted pair list of 1
  .. .. ..$ :List of 1
  .. .. .. ..$ name: chr "plantain"
@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel Apr 28, 2017

Owner

It may just be a side effect of mixing the Rcpp types StretchyList and List....

If you want to dig deeper, could I suggest that you chase that yourself in the code? It's just a few hundred lines, and reasonably good fun to follow. Enabling verbose parse gives you pointers.

Owner

eddelbuettel commented Apr 28, 2017

It may just be a side effect of mixing the Rcpp types StretchyList and List....

If you want to dig deeper, could I suggest that you chase that yourself in the code? It's just a few hundred lines, and reasonably good fun to follow. Enabling verbose parse gives you pointers.

@newtux

This comment has been minimized.

Show comment
Hide comment
@newtux

newtux May 2, 2017

Thanks for the pointer, you were exactly right!

I don't know if this is the "right" fix (my C++ knowledge is extremely limited) but making the following two changes in parse.cpp changed everything to be lists:

diff --git a/src/parse.cpp b/src/parse.cpp
index dd4228d..b7729f4 100644
--- a/src/parse.cpp
+++ b/src/parse.cpp
@@ -252,7 +252,7 @@ SEXP getTable(const std::shared_ptr<cpptoml::table>& t, bool verbose=false) {
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));
         } else {
             if (verbose) Rcpp::Rcout << "Other: " << p.first << std::endl;
             sl.push_back(p.first);
@@ -300,7 +300,7 @@ Rcpp::List tomlparseImpl(const std::string input, bool verbose=false, bool fromf
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));

         } else if (p.second->is_table()) {
             auto ga = std::dynamic_pointer_cast<cpptoml::table>(p.second);
(END)

Appreciate all the help and the package work. Cheers!

newtux commented May 2, 2017

Thanks for the pointer, you were exactly right!

I don't know if this is the "right" fix (my C++ knowledge is extremely limited) but making the following two changes in parse.cpp changed everything to be lists:

diff --git a/src/parse.cpp b/src/parse.cpp
index dd4228d..b7729f4 100644
--- a/src/parse.cpp
+++ b/src/parse.cpp
@@ -252,7 +252,7 @@ SEXP getTable(const std::shared_ptr<cpptoml::table>& t, bool verbose=false) {
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));
         } else {
             if (verbose) Rcpp::Rcout << "Other: " << p.first << std::endl;
             sl.push_back(p.first);
@@ -300,7 +300,7 @@ Rcpp::List tomlparseImpl(const std::string input, bool verbose=false, bool fromf
                 l.push_back (getTable(ta, verbose));
                 ++ait;
             }
-            sl.push_back(Rcpp::Named(p.first) = l);
+            sl.push_back(Rcpp::Named(p.first) = Rcpp::as<Rcpp::List>(l));

         } else if (p.second->is_table()) {
             auto ga = std::dynamic_pointer_cast<cpptoml::table>(p.second);
(END)

Appreciate all the help and the package work. Cheers!

@eddelbuettel

This comment has been minimized.

Show comment
Hide comment
@eddelbuettel

eddelbuettel May 2, 2017

Owner

Thanks for working the details out. That is nice and straightforward and I committed this now as 7aae502

Owner

eddelbuettel commented May 2, 2017

Thanks for working the details out. That is nice and straightforward and I committed this now as 7aae502

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment