Proper indenting XMLStreamWriter; support for loading comments; support for storing location metadata #7

Closed
netvl wants to merge 11 commits

2 participants

@netvl

My pull request consists of three main parts:
1. proper XML indentation via a custom implementation of XMLStreamWriter;
2. support for loading comments;
3. support for storing location information.

The justification for each is as follows:
1. Currently, indenting an XML document created with clojure.data.xml is very hacky: the document is first printed to a string, then read back in through another API, and then emitted yet again. This is not acceptable for applications that need performance, that process large XML documents, or both.
With the StAX API, the preferred approach is a custom XMLStreamWriter wrapper that indents the document incrementally as it is written to the output. This pull request includes a pure Clojure implementation of such an indenting writer. It is heavily inspired by the stax-utils implementation, though it differs from it in several ways.
I don't think the importance of this feature needs explaining.
My implementation may be somewhat hacky and is certainly imperfect; suggestions for improvements and fixes are welcome.
2. Currently, clojure.data.xml discards comments when parsing a document. Sometimes this is unacceptable, for example when a program must read, transform, and write back XML configuration files, that is, files meant to be read by humans. Comments are an important part of such files, and the lack of support for them is a serious flaw.
This pull request adds support for loading comments, enabled via an optional keyword parameter to the parse and parse-str functions. Adding it was simple, especially because emitting comments is already supported and the corresponding data structures already exist.
3. The last feature is loading location information. Sometimes it is useful to know exactly where an element is located in the XML file. This can be used to process a heavily formatted XML document (e.g. one with unusual comment placement needed for human readability): first, the line:column position of an XML node in the file is found by locating the node with, say, an XPath selector and reading its metadata; then the file is edited directly with string-processing functions whose effect is strictly local and which therefore do not destroy the formatting.
This pull request adds support for storing the location information provided by the StAX parser in element metadata. Currently only the location of start tags is recorded. This feature is enabled via an optional parameter to the parse and parse-str functions.
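The incremental indentation described in item 1 can be sketched directly against the Java StAX API that clojure.data.xml builds on. This is a minimal Java sketch, not the Clojure code from this pull request: the names IndentSketch, IndentingWriter, startElement, text, comment, endElement and demo are hypothetical, and it wraps only a handful of write calls instead of implementing the full XMLStreamWriter interface. It keeps one flag per open element recording whether that element already has element or comment children; that flag decides whether the closing tag goes on its own line. It assumes Java 11 or later (for String.repeat).

```java
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class IndentSketch {

    /** Hypothetical wrapper: indents markup while streaming it to a StAX writer. */
    static final class IndentingWriter {
        private final XMLStreamWriter out;
        private final String unit;
        // One flag per open element: true once it contains an element or comment child.
        private final Deque<Boolean> hasChildMarkup = new ArrayDeque<>();
        private boolean wroteAnything = false;

        IndentingWriter(XMLStreamWriter out, String unit) {
            this.out = out;
            this.unit = unit;
        }

        // Newline plus one indent unit per open element (skipped at the very start).
        private void newlineAndIndent() throws XMLStreamException {
            if (wroteAnything) {
                out.writeCharacters("\n" + unit.repeat(hasChildMarkup.size()));
            }
            wroteAnything = true;
        }

        // Record that the innermost open element (if any) now has child markup.
        private void markParent() {
            if (!hasChildMarkup.isEmpty()) {
                hasChildMarkup.pop();
                hasChildMarkup.push(Boolean.TRUE);
            }
        }

        void startElement(String name) throws XMLStreamException {
            markParent();
            newlineAndIndent();
            out.writeStartElement(name);
            hasChildMarkup.push(Boolean.FALSE);
        }

        void comment(String text) throws XMLStreamException {
            markParent();
            newlineAndIndent();
            out.writeComment(text);
        }

        // Text is written as-is, so text-only elements stay on one line.
        void text(String text) throws XMLStreamException {
            out.writeCharacters(text);
            wroteAnything = true;
        }

        void endElement() throws XMLStreamException {
            boolean hadChildMarkup = hasChildMarkup.pop();
            if (hadChildMarkup) {
                newlineAndIndent(); // closing tag on its own line, at this element's depth
            }
            out.writeEndElement();
        }
    }

    /** Writes a small sample document and returns the indented result. */
    public static String demo() {
        try {
            StringWriter sink = new StringWriter();
            XMLStreamWriter raw = XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
            IndentingWriter w = new IndentingWriter(raw, "  ");
            w.startElement("root");
            w.comment(" config ");
            w.startElement("a");
            w.text("1");
            w.endElement();
            w.startElement("b");
            w.startElement("c");
            w.text("x");
            w.endElement();
            w.endElement();
            w.endElement();
            raw.flush();
            return sink.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the indentation is produced as each event is written, the document never has to be serialized, reparsed and re-emitted. Like most indenting writers, the sketch inserts whitespace text, so it only suits documents where that whitespace is insignificant; mixed content would be altered.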
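Items 2 and 3 both come down to keeping information the StAX reader already reports: COMMENT events, and the Location of the current event. The following is a minimal Java sketch of that idea under stated assumptions. It collects start elements and comments into a flat list rather than building clojure.data.xml element trees. The names ParseSketch, Node and scan are hypothetical, and scan's two boolean switches merely stand in for the optional parse/parse-str parameters described above. It assumes Java 16 or later (for the record), and exact column numbers are parser-dependent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ParseSketch {

    /** Hypothetical flat record: kind is "element" or "comment"; -1 means "not recorded". */
    public record Node(String kind, String text, int line, int column) {}

    /** Pulls StAX events, optionally keeping comments and start-tag locations. */
    public static List<Node> scan(String xml, boolean keepComments, boolean keepLocations) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<Node> nodes = new ArrayList<>();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // The reader reports the position of the current event; which character
                    // of the start tag it points at is implementation-dependent.
                    Location loc = r.getLocation();
                    int line = keepLocations ? loc.getLineNumber() : -1;
                    int column = keepLocations ? loc.getColumnNumber() : -1;
                    nodes.add(new Node("element", r.getLocalName(), line, column));
                } else if (event == XMLStreamConstants.COMMENT && keepComments) {
                    // The reader delivers COMMENT events; a consumer that skips them loses comments.
                    nodes.add(new Node("comment", r.getText(), -1, -1));
                }
                // All other events (text, end tags, ...) are ignored in this sketch.
            }
            r.close();
            return nodes;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <!-- note -->\n  <a>1</a>\n</root>";
        for (Node n : scan(xml, true, true)) {
            System.out.println(n);
        }
    }
}
```

In a library like clojure.data.xml, the line and column would be attached to the parsed element (for example as metadata) instead of being collected in a separate list.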

I hope this will be useful. For example, there are no pure Clojure XML libraries that support indented output, so it would be great if this pull request were merged into the main branch.

If anything is required from me before this can be merged, please tell me and I'll do what I can. I'm not very familiar with the more advanced Git/GitHub branching features, so I apologize in advance if I did (or do) something wrong.

Cheers,
Vladimir.

netvl added some commits Dec 17, 2012
07cd79b Reading comments and correct indentation writer
  Added support for reading comments from XML document and started implementing indenting writer.
8ee3930 Indenting XMLStreamWriter
  * Wrote supporting macros for indenting XMLStreamWriter implementation
4fb95b6 Indenting XMLStreamWriter
  * Some comments added
5546bdc Indenting XMLStreamWriter implementation
  * Started writing indentation code (somehow IntStack does not work; needs investigating)
6211fd0 Indenting XMLStreamWriter
  * Fixed stack protocol/type to compile
3f729f1 Indenting XMLStreamWriter implementation
  * Added integer bitwise operations
5852a88 Indenting XMLStreamWriter
  * Some initial implementation of indenting writer is done, not tested
704366b Merge remote-tracking branch 'origin/master'
  Conflicts:
    src/main/clojure/clojure/data/xml.clj
9c84034 Indenting XMLStreamWriter implementation
  * Implementation of indenting writer seems to be complete
484ed3c Indenting XMLStreamWriter implementation
  * Final tweaks and rearrangements of the code, removed some redundant parts
0446b27 Indenting XMLStreamWriter, location metadata loading
  * Removed and relocated even more code
  * Added support for loading location of starting tags into elements metadata
  * Slightly refactored support for loading comments
@senior
Clojure member

Vladimir,

Wow, lots of stuff here! I'm interested in the changes and would like to dig into them, but I can't accept pull requests (it's a Clojure contrib policy). Changes should be submitted as a patch file (made against master) attached to a Jira issue: http://dev.clojure.org/jira/browse/DXML. It looks like you've got 2 or 3 separate issues here. Also, patches can only be accepted from people who have a signed contributor agreement on file. I didn't see your name on the list: http://clojure.org/contributing. Would you be able to send one in?

Thanks for using the library and making changes!

-Ryan

senior closed this Dec 29, 2012
@netvl

Hello, Ryan,
Thank you very much for the explanation. It is news to me that I have to send a physical letter in order to contribute. I have read the page you linked to, and though I fully understand the reasons behind this, I can't say it is motivating.
In any case, unfortunately I won't be able to send the letter for at least the next half a month (I will be visiting another country), so this will have to wait.
Thanks again.
