This repository has been archived by the owner on Dec 9, 2018. It is now read-only.

Linebreak space when space-as-offset is used. #41

Open
iapain opened this issue Oct 23, 2012 · 23 comments

Comments

@iapain
Collaborator

iapain commented Oct 23, 2012

At EOL a soft space should be added so that text selection works smoothly.

@coolwanglu
Owner

Right, I'll do that.

@iapain
Collaborator Author

iapain commented Oct 24, 2012

Thanks 👍

@coolwanglu
Owner

I cannot reproduce it; there should already be a newline character there.
Could you please provide an affected PDF file?

@iapain
Collaborator Author

iapain commented Dec 7, 2012

I can still reproduce it (only on mobile devices). The obvious fix is to introduce a space between the line divs. You may want to close this, as it might not be relevant for some use cases.

@coolwanglu
Owner

I guess it's a browser bug then; as I remember, the browser should append a 'newline' after a div.

@iapain
Collaborator Author

iapain commented Jul 12, 2013

I'd encourage reopening this. As of Firefox 22 it can now be reproduced in both Firefox and WebKit-based browsers. To reproduce it, select multi-line text and paste it: the last word of one line runs into the first word of the next line.

@coolwanglu
Owner

@iapain Let me try again.

@coolwanglu
Owner

@iapain I cannot reproduce it with, e.g., demo.pdf, which is one of the demo files of pdf2htmlEX. Can you please provide an example PDF?

@iapain
Collaborator Author

iapain commented Jul 18, 2013

@coolwanglu See the attached screenshot. I tried demo.pdf as well and was able to reproduce it.

Steps:

  • Select multi-line text (as shown in the image).
  • Paste the text. It will be missing the space that should separate the last character of the first line from the first character of the next line.

[Screenshot: case_without_space]

@coolwanglu
Owner

@iapain Which browser are you using?

@iapain
Collaborator Author

iapain commented Jul 18, 2013

@coolwanglu That screenshot is from Firefox 22, but I can also reproduce it in Google Chrome 27.

@coolwanglu
Owner

@iapain As we've discussed before, this has always been the behaviour of Chrome.

I've just tested Firefox 22 on Windows and Linux (Ubuntu). If you select the text and paste it into a multi-line text editor, you will see the line breaks, but if you paste it into the location bar, the Linux version drops all the line breaks and the Windows version converts them into whitespace.

So I think this comes down to how the location bar handles line breaks; the line breaks themselves are there.

Can you please verify this?

@iapain
Collaborator Author

iapain commented Jul 18, 2013

@coolwanglu You're right about Firefox, but it still fails in WebKit-based browsers and IE 9/10.

It looks like:

  • WebKit omits the newline char.
  • IE 9/10 preserves it as a newline (and when you paste the text, only the part before the newline char is kept).
  • Firefox converts it to a space.

In my opinion we should unify this behaviour.

@coolwanglu
Owner

@iapain This is indeed an issue, and I'll reopen it, but I don't have a good solution right now.

coolwanglu reopened this Jul 18, 2013

@iapain
Collaborator Author

iapain commented Jul 18, 2013

@coolwanglu I will try to patch this bug.

@coolwanglu
Owner

@iapain Thanks! Maybe we can discuss your solution before you implement it.

@iapain
Collaborator Author

iapain commented Jul 18, 2013

@coolwanglu A possible solution is to get rid of the newline char, since in HTML it has little or no influence, and substitute it with a proper HTML equivalent. Do you have a better idea?

@coolwanglu
Owner

@iapain I think you can add a <br> there, but I'm not sure how you can get rid of the newline there.

@iapain
Collaborator Author

iapain commented Jul 18, 2013

@coolwanglu <br> is not really required. I was thinking of scanning each line and, if it ends with a newline char, just replacing that char with a space.
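A minimal sketch of this proposal, assuming each line's text is available as a Python string before it is written into its line div; `replace_trailing_newlines` is a hypothetical name, not a pdf2htmlEX function:

```python
def replace_trailing_newlines(lines: list[str]) -> list[str]:
    """Replace a trailing newline at the end of each line with a single space.

    Hypothetical helper illustrating the proposal; lines that do not end
    with a newline are returned unchanged.
    """
    result = []
    for line in lines:
        if line.endswith("\n"):
            # Swap the end-of-line newline for a space so that the next
            # line's first word does not run into this line's last word.
            line = line[:-1] + " "
        result.append(line)
    return result


print(replace_trailing_newlines(["Hello world\n", "next line"]))
# ['Hello world ', 'next line']
```

Only the final character of each line is touched, so a newline elsewhere in the line would be left as-is.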

@coolwanglu
Owner

@iapain We should never modify the content unless necessary. A char 0xa that looks like a newline might actually be something else because of the evil encodings of the fonts, which is exactly why characters can sometimes be lost with --space-as-offset.
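To illustrate the encoding point: in a PDF, the bytes of a text string are character codes interpreted through the font's encoding, so code 0x0A is not necessarily a line feed. The sketch below uses a made-up encoding table (`CUSTOM_ENCODING`, hypothetical, standing in for the encoding of an embedded subset font) to show how blindly replacing 0x0A could corrupt the text:

```python
# Hypothetical encoding table of an embedded subset font: the font assigns
# glyphs to low character codes, so code 0x0A maps to the "fi" ligature
# (U+FB01) instead of a line feed.
CUSTOM_ENCODING: dict[int, str] = {0x04: " ", 0x05: "n", 0x06: "d", 0x0A: "\ufb01"}


def decode(codes: bytes, encoding: dict[int, str]) -> str:
    """Map each character code to text through the font's encoding table."""
    return "".join(encoding[code] for code in codes)


# Character codes that spell "find" (with the fi ligature) in this font.
raw = bytes([0x0A, 0x05, 0x06])
print(repr(decode(raw, CUSTOM_ENCODING)))

# Treating 0x0A as a newline and swapping it for this font's space code
# (0x04) silently turns the word into " nd".
patched = raw.replace(b"\x0a", b"\x04")
print(repr(decode(patched, CUSTOM_ENCODING)))  # ' nd'
```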

Also, as far as I know, PDF files rarely contain actual line break characters; instead, the text is simply repositioned with a PDF instruction.
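To make the point about PDF text concrete: a content stream typically positions each line with a text-positioning operator such as `Td` (move to the start of the next line, offset by tx, ty) and draws it with `Tj`, so the strings themselves carry no line break. The toy interpreter below is only a sketch for illustration, not how pdf2htmlEX parses PDFs; the content stream is made up, only `Td` and `Tj` are understood, and any vertical move is treated as a line break:

```python
import re

# A tiny, hand-written content stream (hypothetical example). The second
# string is placed 14 units lower with Td; neither string contains "\n".
CONTENT = "BT /F1 12 Tf 72 720 Td (First line) Tj 0 -14 Td (Second line) Tj ET"

# Tokens: literal strings in parentheses (nesting and escapes are not
# decoded), or any other whitespace-separated word.
TOKEN = re.compile(r"\((?:[^()\\]|\\.)*\)|[^\s()]+")
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def text_lines(content: str) -> list[str]:
    """Group Tj strings into lines, starting a new line whenever Td moves
    the text position vertically. Every other operator just clears the
    operand stack."""
    lines: list[str] = []
    operands: list[str] = []
    current = ""
    for tok in TOKEN.findall(content):
        if tok.startswith("("):
            operands.append(tok[1:-1])  # strip the parentheses
        elif NUMBER.fullmatch(tok) or tok.startswith("/"):
            operands.append(tok)  # numeric or name operand
        elif tok == "Td":
            ty = float(operands[-1])
            if ty != 0 and current:
                # A vertical move is what ends the line; no "\n" is involved.
                lines.append(current)
                current = ""
            operands.clear()
        elif tok == "Tj":
            current += operands[-1]
            operands.clear()
        else:
            operands.clear()
    if current:
        lines.append(current)
    return lines


print(text_lines(CONTENT))  # ['First line', 'Second line']
```

Because the line structure only exists as glyph positions, a converter has to infer line breaks from those moves rather than find them in the text.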

@iapain
Collaborator Author

iapain commented Jul 18, 2013

@coolwanglu You're correct about that, but I found a much simpler way. It'd be great if you could test it with IE. I have tested it with Gecko and WebKit.

@coolwanglu
Owner

@iapain No, it didn't work. I've tested it in Firefox and Chrome on Windows.
I remember that all whitespace between tags is ignored in HTML.

@ficolo

ficolo commented May 18, 2015

How about using &nbsp; before every </div>? (It is a workaround, but it worked for me.)
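A minimal sketch of this workaround as a post-processing step on the generated HTML, assuming the output is available as a Python string; `add_nbsp_before_div_close` is a hypothetical name and the sample markup is made up:

```python
def add_nbsp_before_div_close(html: str) -> str:
    """Insert &nbsp; before every closing </div> tag.

    Plain string replacement: it touches *every* </div>, including
    container divs that are not text lines.
    """
    return html.replace("</div>", "&nbsp;</div>")


sample = '<div class="t">first line</div><div class="t">second line</div>'
print(add_nbsp_before_div_close(sample))
# <div class="t">first line&nbsp;</div><div class="t">second line&nbsp;</div>
```

Since the replacement is indiscriminate, a more careful version would probably target only the divs that hold text lines.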
