Support XML #8

Open
bovee opened this issue Sep 25, 2020 · 1 comment

bovee commented Sep 25, 2020

This is mostly necessary to support XML-based file formats like some of the Agilent MassHunter formats, mzML, etc.

There are a couple of existing streaming XML parsers written in Rust that we could possibly wrap (linked below), but it may be "easy" enough to just write one on top of the existing ReadBuffer interface:
https://github.com/netvl/xml-rs
https://github.com/tafia/quick-xml

Passing a raw XML file into entab should probably result in a stream with fields like:

  • Materialized path to current node (Value::List([key, subkey, ...]))
  • Attributes for current node (Value::Record<string, string? value?>)
  • Text for current node (String?)

We may not want to actually do that, though: a node's text is only complete once its closing tag has been read, so this will probably require saving up the data and emitting the nodes post-traversal (which isn't the most natural order to view and requires more memory).
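To make the trade-off concrete, here is a minimal sketch of the post-traversal emission described above. It is not entab code: `XmlEvent`, `NodeRecord`, and `materialize` are hypothetical names, the event stream is hand-built rather than coming from xml-rs or quick-xml, and plain `Vec`/`BTreeMap` types stand in for `Value::List` and `Value::Record`.

```rust
use std::collections::BTreeMap;

/// Hypothetical parser events; a real streaming parser would yield something similar.
#[derive(Debug, Clone, PartialEq)]
enum XmlEvent {
    Start { name: String, attrs: Vec<(String, String)> },
    Text(String),
    End,
}

/// Hypothetical record shape mirroring the three fields listed above.
#[derive(Debug, Clone, PartialEq)]
struct NodeRecord {
    /// Materialized path from the root to this node.
    path: Vec<String>,
    /// Attributes on this node's opening tag.
    attributes: BTreeMap<String, String>,
    /// Text directly inside this node (children's text is not included).
    text: String,
}

/// Emits one record per element, at its closing tag (i.e. in post-order).
fn materialize(events: &[XmlEvent]) -> Vec<NodeRecord> {
    // One pending record per currently open element.
    let mut open: Vec<NodeRecord> = Vec::new();
    let mut out = Vec::new();
    for event in events {
        match event {
            XmlEvent::Start { name, attrs } => {
                // The child's path is the parent's path plus its own tag.
                let mut path = open.last().map(|n| n.path.clone()).unwrap_or_default();
                path.push(name.clone());
                open.push(NodeRecord {
                    path,
                    attributes: attrs.iter().cloned().collect(),
                    text: String::new(),
                });
            }
            XmlEvent::Text(t) => {
                // Text may arrive in pieces, before and after child elements.
                if let Some(node) = open.last_mut() {
                    node.text.push_str(t);
                }
            }
            XmlEvent::End => {
                // Only now is the node's text known to be complete.
                if let Some(node) = open.pop() {
                    out.push(node);
                }
            }
        }
    }
    out
}
```

In this sketch a child is always emitted before its parent, and every open ancestor keeps a pending record (with a cloned path) until its closing tag arrives, which is where the awkward ordering and the extra memory come from.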

This was referenced Sep 25, 2020
bovee commented Mar 22, 2022

Rather than storing the path as a Vec<String>, we could store it as a Vec<usize> of "delimiter" positions (e.g. a > for XML and maybe a , for JSON) and a Vec<u8> that we memcpy each new tag onto the end of (overwriting existing tags as we move up and down the stack). This would allow direct comparisons between the current state and a search, as long as we don't need to track tag attributes (so it would work better for JSON, but I think we could shoehorn it in here too).
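A rough sketch of that layout, under some assumptions: `PathStack` and its methods are hypothetical names, and instead of the delimiter positions themselves the `Vec<usize>` here stores the offset where each level ends, which serves the same purpose. Attributes are ignored, as noted above.

```rust
/// Hypothetical path stack: all tag bytes live in one buffer, joined by `>`,
/// and `ends[i]` is the buffer length after the tag at depth `i`.
#[derive(Debug, Default)]
struct PathStack {
    buf: Vec<u8>,
    ends: Vec<usize>,
}

impl PathStack {
    /// Enter a child element: append a delimiter (if not at the root) and the tag.
    fn push(&mut self, tag: &[u8]) {
        if !self.ends.is_empty() {
            self.buf.push(b'>');
        }
        self.buf.extend_from_slice(tag);
        self.ends.push(self.buf.len());
    }

    /// Leave the current element: cut the buffer back to the parent's end.
    /// The freed bytes are overwritten by the next `push`.
    fn pop(&mut self) {
        self.ends.pop();
        let keep = self.ends.last().copied().unwrap_or(0);
        self.buf.truncate(keep);
    }

    /// Compare the current path against a search with a single slice comparison.
    fn matches(&self, query: &[u8]) -> bool {
        self.buf.as_slice() == query
    }

    /// Number of currently open elements.
    fn depth(&self) -> usize {
        self.ends.len()
    }
}
```

For example, after pushing `mzML`, `run`, and `spectrum`, `matches(b"mzML>run>spectrum")` is true; popping once and pushing `chromatogram` reuses the same buffer, so moving between siblings needs no per-tag `String` allocation.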
