Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow parsing of very long flow-style lines #289

Closed
captain-yoshi opened this issue Jul 31, 2022 · 1 comment · Fixed by #293
Closed

Slow parsing of very long flow-style lines #289

captain-yoshi opened this issue Jul 31, 2022 · 1 comment · Fixed by #293

Comments

@captain-yoshi
Copy link
Contributor

captain-yoshi commented Jul 31, 2022

When parsing a large 34MB file, it seems stuck in an infinite loop (waited 5 minutes). This file is almost the same as the one from #288 except that it has 500 flow style sequences of 10 000 floats each instead of a block style. The block style file takes less then a second to load... I'm using 17 precision floats (which are doubles) and used the ryml::_WIP_STYLE_FLOW_SL for generating the file.

Step to reproduce:

std::string loadFileToString(const std::string& path)
{
  std::ifstream ifs(path.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
  
  std::ifstream::pos_type size = ifs.tellg();
  ifs.seekg(0, std::ios::beg);
  
  std::vector<char> bytes(size);
  
  ifs.read(bytes.data(), size);
  std::cout << "Read bytes finished" << std::endl;
  
  return std::string(bytes.data(), size);
}

int main()
{
  std::string path = "your_path";
  std::string buf = loadFileToString(path);

  ryml::Tree t;
  ryml::NodeRef n = t.rootref();

  // not tested with parse_in_place
  ryml::parse_in_arena("flow.yaml", ryml::to_csubstr(buf), n);
}
@captain-yoshi captain-yoshi changed the title Infinite loop when parsing 34MB file with flow sequence of 10 000 floats Infinite loop when parsing 34MB file with flow sequences of 10 000 floats Jul 31, 2022
@biojppm
Copy link
Owner

biojppm commented Aug 1, 2022

Wow, this one reveals something that was overlooked. Thanks for reporting.

There is an implicit assumption throughout the code that a YAML line is smallish. Because of subtle token-scanning intrincacies (eg searching for # before searching for ,), in some places (not many, but some), there will be a scan to the end of the line, which will cause quadratic complexity for single-line flow-style containers.

For example, here. The check for , should be done before this, and is being done after.

I will have to spend some time on this.

In the meantime, until there is a fix addressing places where this assumption was instilled, you will improve your parse speed substantially if you stick to block style in these cases, or if you break the lines (which I know may be a problem if you're using ryml to create these files as IIRC it still has no facility to break the lines when emitting).

@biojppm biojppm changed the title Infinite loop when parsing 34MB file with flow sequences of 10 000 floats Slow parsing of very long flow-style lines Aug 1, 2022
captain-yoshi added a commit to captain-yoshi/moveit_benchmark_suite that referenced this issue Aug 2, 2022
Parsing is currently slow with very long flow-style lines. Enables fast
parsing using block style at the cost of a larger file size.
captain-yoshi added a commit to captain-yoshi/moveit_benchmark_suite that referenced this issue Aug 3, 2022
Parsing is currently slow with very long flow-style lines. Enables fast
parsing using block style at the cost of a larger file size.
captain-yoshi added a commit to captain-yoshi/moveit_benchmark_suite that referenced this issue Aug 3, 2022
Parsing is currently slow with very long flow-style lines. Enables fast
parsing using block style at the cost of a larger file size.
captain-yoshi added a commit to captain-yoshi/moveit_benchmark_suite that referenced this issue Aug 3, 2022
Parsing is currently slow with very long flow-style lines. Enables fast
parsing using block style at the cost of a larger file size.
biojppm added a commit that referenced this issue Aug 17, 2022
termination tokens before potential scans to the end of the line.
biojppm added a commit that referenced this issue Aug 18, 2022
termination tokens before potential scans to the end of the line.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants