Skip to content

Simple text clipping #114

Closed
wants to merge 6 commits into from

2 participants

@jahewson
jahewson commented Apr 5, 2013

Here's my attempt at text clipping, using GfxState::getClipBBox which returns a rectangle which bounds the clipping path. I simply transform the character positions from text space to device space and check if they fall completely outside of the clipping rectangle. If it does, then an offset is inserted in its place.

This is the simplest possible implementation I could think of. It could be enhanced by detecting partially clipped characters (easy) and somehow tagging them as clipped, and then use the CSS clip property on them. That'll work for rectangles only, but that's probably 95% of the real use cases.

@coolwanglu coolwanglu commented on the diff Apr 5, 2013
src/HTMLRenderer/text.cc
@@ -26,8 +26,6 @@ void HTMLRenderer::drawString(GfxState * state, GooString * s)
return;
auto font = state->getFont();
- double cur_letter_space = state->getCharSpace();
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

I assume caching these values would be (a tiny bit) faster?

@jahewson
jahewson added a note Apr 5, 2013

Nope... see GfxState.h

double getCharSpace() { return charSpace; }

That'll get inlined.

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

Oh. I see..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@coolwanglu coolwanglu and 1 other commented on an outdated diff Apr 5, 2013
src/HTMLRenderer/text.cc
bool is_space = false;
- if (n == 1 && *p == ' ')
+
+ dx *= state->getFontSize();
+ dy *= state->getFontSize();
+
+ if (font->getWMode())
+ {
+ dx += state->getCharSpace();
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

should be dy?

@jahewson
jahewson added a note Apr 5, 2013

Yes, WMode stuff should be dy, will fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@coolwanglu coolwanglu and 1 other commented on an outdated diff Apr 5, 2013
src/HTMLRenderer/text.cc
- if(is_space && (param->space_as_offset))
+ // transform from text space to device space
+ double dev_total_dx, dev_total_dy;
+ state->textTransformDelta(total_dx, total_dy, &dev_total_dx, &dev_total_dy);
+ state->transformDelta(dev_total_dx, dev_total_dy, &dev_total_dx, &dev_total_dy);
+ dev_total_dx *= text_zoom_factor();
+ dev_total_dy *= text_zoom_factor();
+
+ double dev_dx, dev_dy;
+ state->textTransformDelta(dx, dy, &dev_dx, &dev_dy);
+ state->transformDelta(dev_dx, dev_dy, &dev_dx, &dev_dy);
+ dev_dx *= text_zoom_factor();
+ dev_dy *= text_zoom_factor();
+
+ // check if char is entirely outside the clipping rect
+ if(x + dev_total_dx + dev_dx < xMin || x + dev_total_dx > xMax ||
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

Please use a<b-EPS and a>b+EPS

@jahewson
jahewson added a note Apr 5, 2013

Will do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@coolwanglu coolwanglu and 1 other commented on an outdated diff Apr 5, 2013
src/HTMLRenderer/text.cc
{
// ignore horiz_scaling, as it's merged in CTM
- text_line_buf->append_offset((dx1 * cur_font_size + cur_letter_space + cur_word_space) * draw_text_scale);
+ text_line_buf->append_offset((char_dx * state->getFontSize() + state->getCharSpace()) * draw_text_scale);
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

if is_space, word space should be included

@jahewson
jahewson added a note Apr 5, 2013

Ah ok, I'll refactor the if-then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@coolwanglu coolwanglu commented on the diff Apr 5, 2013
src/HTMLRenderer/text.cc
bool is_space = false;
- if (n == 1 && *p == ' ')
+
+ dx *= state->getFontSize();
+ dy *= state->getFontSize();
+
+ if (font->getWMode())
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

anyway writing mode fonts are not supported yet, there will be many other problems in the next few lines...

@jahewson
jahewson added a note Apr 5, 2013

Yes, I took that from PSOutputDev, it's just meant to be a starting point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@coolwanglu
Owner

Thank you very much for your effort!

Please check my comments.

I think I'll be able to reduce the number of transformations, I think 1-2 is enough.
Also I'm not sure if text_zoom_factor should be involved.

@coolwanglu coolwanglu commented on an outdated diff Apr 5, 2013
src/HTMLRenderer/text.cc
+ // transform from text space to device space
+ double dev_total_dx, dev_total_dy;
+ state->textTransformDelta(total_dx, total_dy, &dev_total_dx, &dev_total_dy);
+ state->transformDelta(dev_total_dx, dev_total_dy, &dev_total_dx, &dev_total_dy);
+ dev_total_dx *= text_zoom_factor();
+ dev_total_dy *= text_zoom_factor();
+
+ double dev_dx, dev_dy;
+ state->textTransformDelta(dx, dy, &dev_dx, &dev_dy);
+ state->transformDelta(dev_dx, dev_dy, &dev_dx, &dev_dy);
+ dev_dx *= text_zoom_factor();
+ dev_dy *= text_zoom_factor();
+
+ // check if char is entirely outside the clipping rect
+ if(x + dev_total_dx + dev_dx < xMin || x + dev_total_dx > xMax ||
+ y + dev_total_dy + dev_dy < yMin || y + dev_total_dy > yMax)
{
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

For horizontal text, dev_dx is the width of the current char, and dev_dy is 0.
To get the bounding box of the current char, ascent, descent and font size should be involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jahewson
jahewson commented Apr 5, 2013

For horizontal text, dev_dx is the width of the current char, and dev_dy is 0.
To get the bounding box of the current char, ascent, descent and font size should be involved.

Ah, I didn't realise dev_dy was 0. This needs fixing, any suggestions?

@coolwanglu
Owner

In the text coordinate system, current position (tx,ty) marks the origin of the next character.
Depending on wmode or not, one of char_dx and char_dy should be 0. (Maybe there are other cases? I'm not 100% sure)
For horizontal text, char_dy is 0, and char_dx is the width of the next char.

Let A=ascent * font_size and B=descent * font_size (B is usually negative)
The BBox of the next char should be (tx, ty+B) - (tx+dx, ty+A), so we need to test if this box is completely outside the clipping path.

@coolwanglu coolwanglu and 1 other commented on an outdated diff Apr 5, 2013
src/HTMLRenderer/text.cc
if(is_space && (param->space_as_offset))
{
// ignore horiz_scaling, as it's merged in CTM
- text_line_buf->append_offset((dx1 * cur_font_size + cur_letter_space + cur_word_space) * draw_text_scale);
+ text_line_buf->append_offset((char_dx * state->getFontSize() + state->getCharSpace() + state->getWordSpace()) * draw_text_scale);
+ }
+ else if(x + dev_total_dx + dev_dx < xMin - EPS || x + dev_total_dx > xMax + EPS ||
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

Note that dev_dx could be negative.

@jahewson
jahewson added a note Apr 5, 2013

That's ok, all of the values can.

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

Seems that you are testing if the interval (x+dev_total_dx, x+dev_total+dev_dx) intersects with the interval (xMin, xMax).
And you assume that dev_dx > 0, otherwise the first interval should be (x+dev_total_dx+dev_dx, x+dev_total).

Later if you transform the bbox from the text coordinate system into device system, you need to find min/max x/y from the 4 (transformed) corners.

@jahewson
jahewson added a note Apr 5, 2013

No, I'm testing if the character rectangle is entirely outside the clipping rectangle. This is not a test for intersection.

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013
@jahewson
jahewson added a note Apr 5, 2013

That would be 5 + (-100) < 0 which is correct - the character box lies entirely to the left of the clipping area.

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013
@jahewson
jahewson added a note Apr 5, 2013

I'm not sure how likely that is to happen? This should solve it...

x + max(dev_total_dx, dev_total_dx + dev_dx) < xMin 

etc.

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013
@jahewson
jahewson added a note Apr 5, 2013

min/max x/y should be determined (again) after any transformations.

What do you mean? Transformations can't occur partway through a string.

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013
@jahewson
jahewson added a note Apr 5, 2013

What is char_dx?

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013
@jahewson
jahewson added a note Apr 5, 2013

Sorry, yes, char_dx is the unmodified value, I forgot.

dev_dx is always -char_dx

Seems wrong to me...

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013
@jahewson
jahewson added a note Apr 5, 2013

Yeah that makes sense, you said "always" which confused me, really you mean "could be". Cheers!

@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

oops..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@coolwanglu coolwanglu commented on the diff Apr 5, 2013
src/HTMLRenderer/text.cc
+ else
+ {
+ dx += state->getCharSpace();
+ if (n == 1 && *p == ' ')
+ {
+ dx += state->getWordSpace();
+ is_space = true;
+ }
+ }
+ dx *= state->getHorizScaling();
+
+ // transform from text space to device space
+ double dev_total_dx, dev_total_dy;
+ state->textTransformDelta(total_dx, total_dy, &dev_total_dx, &dev_total_dy);
+ state->transformDelta(dev_total_dx, dev_total_dy, &dev_total_dx, &dev_total_dy);
+ dev_total_dx *= text_zoom_factor();
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

I think text_zoom_factor should not be involved, this function is something about pdf2htmlEX, but the clip-visibility test here should have nothing to do with (the behaviour of) pdf2htmlEX?

@jahewson
jahewson added a note Apr 5, 2013

Yes, you're right. I had some debugging code which required text_zoom_factor to work properly. I'll take it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jahewson
jahewson commented Apr 5, 2013

Ok, I've fixed it to use the font size instead of dy. Then I subtract fontSize * descent from y to account for the baseline.

@coolwanglu coolwanglu commented on the diff Apr 5, 2013
src/HTMLRenderer/text.cc
+ {
+ dx += state->getWordSpace();
+ is_space = true;
+ }
+ }
+ dx *= state->getHorizScaling();
+
+ // transform from text space to device space
+ double dev_total_dx, dev_total_dy;
+ state->textTransformDelta(total_dx, total_dy, &dev_total_dx, &dev_total_dy);
+ state->transformDelta(dev_total_dx, dev_total_dy, &dev_total_dx, &dev_total_dy);
+ dev_total_dx *= text_zoom_factor();
+ dev_total_dy *= text_zoom_factor();
+
+ double dummy, dev_dx;
+ state->textTransformDelta(dx, 0, &dev_dx, &dummy);
@coolwanglu
Owner
coolwanglu added a note Apr 5, 2013

Why dummy? So you do not consider any rotations?

@jahewson
jahewson added a note Apr 5, 2013

Because dy is always zero. I'm ignoring WMode for now.

@jahewson
jahewson added a note Apr 5, 2013

Hmmm... this isn't supposed to be ignoring rotations, I guess that dev_dy could be non-zero when dy is zero, so perhaps dev_dy does need to be taken into account...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jahewson
jahewson commented Apr 5, 2013

Right, looks like it all works, I'm ready to hand this over to you... :)

@coolwanglu
Owner

OK. Thank you very much!
I'll merge it manually

@jahewson
jahewson commented Apr 5, 2013

I'll merge it manually

Why?

@coolwanglu
Owner
@jahewson
jahewson commented Apr 5, 2013

Oh I see, new commits to master...

@jahewson
jahewson commented Apr 5, 2013

So it turns out that my visibility test does not work without text_zoom_factor, so I reverted that commit.

@coolwanglu
Owner
@jahewson
jahewson commented Apr 5, 2013

Cheers! I'll let you do your manual merge now.

@coolwanglu
Owner

While reading the code, I kept thinking, it might be easier to use clip CSS properties, than detecting inside-or-out ourselves. Further, if we will support all kinds of paths in the future, we can export the path into SVG and use clip-path, without doing the complicated job of handling the paths.

And to export the rectangles, we just need the values from getClipBBox and the current transform matrix, it would be much easier than the careful jobs done in this pull request.

What do you think?

@coolwanglu
Owner

Since getClipBBox is in device coordinate system, even current matrix is not relevant...

@jahewson
jahewson commented Apr 8, 2013

Using CSS clip instead should work, you'll need to transform the co-ordinates of the ClipBBox back into user-space. Give it a try.

@coolwanglu
Owner
  • GfxState already provides clip x/y min/max in user space
  • Translating into user space would bring more error in the estimation
  • Actually I don't need coordinates in user-space, the HTML file is in the device space, so probably the format will be
<div clip="...">
<div>text line 1</div>
<div>text line 2</div>
</div>
<!-- clip bbox changed -->
<div clip="...">
<div>text line 1</div>
<div>text line 2</div>
</div>
  • For each line, if I want to add the clip property, I will need to translate the clip-bbox into text-space, not user-space
@jahewson
@coolwanglu
Owner
@jahewson
@coolwanglu
Owner

Implemented.
Thanks!

@coolwanglu coolwanglu closed this May 4, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.