Add xhtml inline png image support #35
Open
kippr wants to merge 14 commits into brendonh:master from kippr:master
Images are already present in the Document model when read from RTF. This change converts any PNG images found into <img> tags with inline base64-encoded data URIs. Other images (for example WMF alternatives and JPEGs) are ignored. The previous behaviour was to write out the hex-encoded image string.
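A minimal sketch of the conversion described above, using only the Python standard library. The function name `png_hex_to_img_tag` is hypothetical and is not taken from this pull request; it assumes the image arrives as the hex string that RTF stores, and it identifies PNGs by their 8-byte file signature.

```python
import base64
import binascii

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_hex_to_img_tag(hex_data):
    """Return an <img> tag with an inline data URI, or None for non-PNG data.

    hex_data is a hex-encoded image string; embedded whitespace
    (for example line breaks) is dropped before decoding.
    """
    raw = binascii.unhexlify("".join(hex_data.split()))
    if not raw.startswith(PNG_SIGNATURE):
        return None  # other formats (WMF, JPEG, ...) are ignored
    encoded = base64.b64encode(raw).decode("ascii")
    return '<img src="data:image/png;base64,%s" />' % encoded
```

Returning `None` for anything that is not a PNG mirrors the "other images are ignored" behaviour; a caller would simply skip those entries.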
The prior commit took images read from RTF and wrote them to XHTML as inline PNG images. This completes the reverse: inline PNG images in XHTML are read into the Document model, and writing those Documents to RTF now includes the inline image. Width and height attributes are also converted, assuming a standard conversion of 15 twips per pixel. Only PNG images are supported.
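The 15 twips per pixel figure can be illustrated as below. A twip is 1/1440 of an inch, so 15 twips per pixel corresponds to a 96 pixels-per-inch display; that derivation is an assumption about where the constant comes from, and the helper names are hypothetical rather than taken from the commit.

```python
# 1440 twips per inch / 96 pixels per inch = 15 twips per pixel (assumed rationale).
TWIPS_PER_PIXEL = 15


def pixels_to_twips(pixels):
    """Convert an XHTML width/height in pixels to RTF twips."""
    return int(round(pixels * TWIPS_PER_PIXEL))


def twips_to_pixels(twips):
    """Convert an RTF dimension in twips back to whole pixels."""
    return int(round(twips / TWIPS_PER_PIXEL))
```

For example, an image that is 100 pixels wide in XHTML would be written to RTF as 1500 twips.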
The comment regarding ~, - and _ in commit c72d457 suggests dropping them was intended behavior, but instead they are included as text output. Spec at http://www.biblioscape.com/rtf15_spec.htm: \~: Nonbreaking space. \-: Optional hyphen. \_: Nonbreaking hyphen. A future extension might be to extend the Document model to represent these, and then let writers decide whether they want to include them or not (e.g. as &nbsp; in XHTML).
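One way the suggested future extension could look, as a sketch only: map each RTF control symbol to the Unicode character it stands for, and let each writer choose how to emit it. The table and function names are hypothetical and nothing like this is part of the pull request.

```python
# RTF control symbol -> Unicode character with the same meaning.
RTF_SPECIAL_SYMBOLS = {
    "~": "\u00a0",  # \~ nonbreaking space
    "-": "\u00ad",  # \- optional (soft) hyphen
    "_": "\u2011",  # \_ nonbreaking hyphen
}


def render_symbol(symbol, as_xhtml=False):
    """Return the character for an RTF control symbol.

    With as_xhtml=True a numeric character reference is returned,
    which avoids relying on named entities being declared.
    """
    char = RTF_SPECIAL_SYMBOLS[symbol]
    if as_xhtml:
        return "&#%d;" % ord(char)
    return char
```

A plain-text writer could call `render_symbol("~")` and an XHTML writer `render_symbol("~", as_xhtml=True)`, or either could drop the symbol entirely.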
Previously, in RTF documents containing nested lists that 'ended' on a nested item, the outermost item would be added to the list above it, but that list would never be added to the list or document above it, so it would get dropped.
Also confirmed round trip from XHTML to RTF with tests:
- Checking that RTF reads underlining markup into Document
- Checking that RTF writes underline formatting
- Checking that XHTML reads u tags or css underline styling into Document
- Checking that XHTML writes u tags
As per http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-strict.dtd_sub this is the recommended way.
For now it's better to parse HTML ordered lists as unordered lists rather than creating invalid document structures that crash parsing (ListItems end up directly under Paras because ol is ignored).
Found plenty of examples of these in the wild. This fix adds a para up front but doesn't add it to the list stack, so we also hold off on the last pop of the list stack when unwinding lists, because there is no final holding paragraph.
Previously, sublists were always added to their own li element, but this renders as double bullets in HTML:
  * Top level
  *
    - Sub list item
Now we add the nested ul directly to the prior non-list flow item (Top level para in example above), which gives expected single-bullet nesting:
  * Top level
    - Sub list item
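The single-bullet structure can be sketched with a small standalone renderer. This is an illustration only: the `(text, children)` tuples are a hypothetical stand-in for the real Document model, and the text is not HTML-escaped here.

```python
def render_list(items, indent=0):
    """Render (text, children) tuples as nested <ul> markup lines.

    A sublist is emitted inside the <li> of the item it belongs to,
    rather than in a separate <li> of its own.
    """
    pad = "  " * indent
    lines = [pad + "<ul>"]
    for text, children in items:
        if children:
            lines.append(pad + "  <li>" + text)
            lines.extend(render_list(children, indent + 2))
            lines.append(pad + "  </li>")
        else:
            lines.append(pad + "  <li>" + text + "</li>")
    lines.append(pad + "</ul>")
    return lines
```

Calling `render_list([("Top level", [("Sub list item", [])])])` yields one li for "Top level" that contains the nested ul, so a browser draws one bullet per item.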
Currently these characters get written out verbatim to the RTF stream, rendering the result invalid. Instead they should be escaped with a leading backslash.
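A sketch of backslash escaping for RTF text. The commit message does not name the characters; this example assumes they are the three that RTF treats as syntax (backslash and the two braces), and `escape_rtf` is a hypothetical helper name.

```python
# Characters with syntactic meaning in RTF (assumed set: \ { }).
RTF_SPECIAL_CHARS = "\\{}"


def escape_rtf(text):
    """Prefix each RTF special character with a backslash."""
    out = []
    for ch in text:
        if ch in RTF_SPECIAL_CHARS:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)
```

Escaping character by character, rather than with chained `replace` calls, avoids escaping the backslashes that were just inserted.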
If HTML entities were escaped when converting from HTML to whatever other format, don't escape the ampersands in them again on the way out from whatever format back to HTML.
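One possible way to avoid the double escaping is to escape only ampersands that do not already start an entity or character reference. This is a sketch under that assumption, not necessarily how the commit does it; the pattern and function name are hypothetical.

```python
import re

# An ampersand NOT followed by a named entity (&name;), a decimal
# reference (&#123;) or a hexadecimal reference (&#x1F;).
_BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_ampersands(text):
    """Escape bare ampersands while leaving existing entities untouched."""
    return _BARE_AMPERSAND.sub("&amp;", text)
```

With this approach `Tom & Jerry` becomes `Tom &amp; Jerry`, while text that already contains `&amp;` or `&#169;` passes through unchanged. The trade-off is that literal text which happens to look like an entity is also left alone.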
Images are already present in the Document model when read from RTF.
This change converts any PNG images found into img tags with inline
base64 encoded data elements. Other images (for example WMF alternatives
and jpegs) are ignored.
Previous behaviour was to write out the hex-encoded image string.
I needed this behaviour, and hope it might be useful to others as well. Inline data images are widely supported now (http://caniuse.com/#feat=datauri), so I think this is a reasonable way to handle RTF to HTML conversion.
Thanks