Underlined text appears different in the HTML than in the PDF #797

Closed
iyuvalk opened this Issue Feb 10, 2019 · 6 comments

iyuvalk commented Feb 10, 2019

Hi,

First of all - I've tried many *-to-PDF converters so far and I have to say that WeasyPrint is by far the very best. Thanks for all the hard work.

I have Python code that generates an HTML file and then uses WeasyPrint to convert it to PDF. Almost everything works really well except for one strange thing: when I have underlined text in the HTML, the underline in the PDF version is sometimes too short.

Here's the text as it looks in the HTML:
[image: in_the_html]

and here it is from the PDF:
[image: in_the_pdf]

The HTML source code for this paragraph is this:

<h3>PIM/PAM <div class="anchor" id="CloudServices_PimPam"></div></h3>
  <ul>
  <li>Access with privileged accounts to AWS services should be managed by CyberArk. The following guidelines should be applied:<ul>
  <li>CPM, PVWA and PSM should be installed on AWS in dedicated network zones that are restricted by network access control lists.</li>
  <li>Management access to CPM and PVWA should be allowed only from the PSM.</li>
  <li>Use the on-premise CyberArk vault to store credentials. </li>
  <li>Configure a site-to-site VPN between the LAN and the AWS infrastructure. The CPM, PVWA and PSM should reside on dedicated network segments. The VPN access controls should be configured as follows:<ul>
  <li><em>Connection from AWS towards the LAN</em>: Allowed from the CPM, PVWA and PSM segments towards the Vault segment.</li>
  <li><em>Connection from LAN towards AWS</em>: Allow using the VPN from the user segment towards the PVWA and PSM segments.</li>
  </ul>
  </li>
  <li>Avoid using bastion infrastructure for management and administration of servers on the AWS infrastructure. These activities should be performed through the PSM.</li>
  </ul>
  </li>

As you can see, in the PDF version, the underline in the first bullet ends just under the letter A in "LAN:" (i.e. one letter from the end) and in the second bullet it ends just under the letter A in "AWS:" (i.e. two letters from the end).

I couldn't find any satisfactory explanation for this weird behaviour - can you?

Contributor

Tontyna commented Feb 10, 2019

In your HTML snippet there is no <u>nderline. When I wrap the sublist items in <u> and <strong> instead of <em>, the bold underline is rendered as expected:

[image: 797]

@iyuvalk What CSS are you applying? What WeasyPrint version are you using?

Author

iyuvalk commented Feb 11, 2019

Hi,

Thanks for the super-fast reply. You're right, I forgot to attach the CSS that I use. Here it is:

@charset "UTF-8";
@page {
  @top-left {
    content: "PROPRIETARY AND CONFIDENTIAL";
    font-family: "calibri";
    font-size: 12px;
    font-style: normal;
  }
  @top-right {
    content: url("sygnia-logo.png");
  }
  @bottom-left {
    content: "Application Security Framework";
    font-family: "calibri";
    font-size: 12px;
    font-style: normal;
  }
  @bottom-right {
    content: "© Sygnia  | " counter(page) "/" counter(pages);
    font-family: "calibri";
    font-size: 12px;
    font-style: normal;
  }
  size: A4;
  border-bottom: 1px solid #4472C4;
}
html body article#cover_page {
  page: no-chapter;
}
html body article#contents {
  break-before: right;
  break-after: left;
  page: no-chapter;
}
html body article#contents h2 {
  font-size: 20pt;
  font-weight: 400;
  margin-bottom: 3cm;
}
html body article#contents h3 {
  font-weight: 500;
  margin: 3em 0 1em;
}
html body article#contents h3::before {
  content: '';
  display: block;
  height: .08cm;
  margin-bottom: .25cm;
  width: 2cm;
}
html body article#contents ul {
  list-style: none;
  padding-left: 0;
}
html body article#contents ul li {
  margin: .25cm 0;
  padding-top: .25cm;
  border-bottom: 1px dotted grey;
}
html body article#contents ul li::before {
  font-size: 40pt;
  line-height: 16pt;
  vertical-align: bottom;
}
html body article#contents ul li a {
  color: #002960;
  text-decoration: none;
  content: target-text(attr(href));
}
html body article#contents ul li a::after {
  color: #002960;
  content: target-counter(attr(href), page);
  float: right;
  text-decoration: none;
}
body {
  font-family: "calibri";
  font-size: 15px;
  font-style: normal;
  counter-reset: l1counter;
  width: 1000;
  text-align: justify;
  counter-reset: l1counter;
}
hr {

}
div.anchor {
	display: hidden;
}
a[href^="http"] {
	display: inline;
}
a[href^="#"] {
	display: inline;
}
code {
  display: block;
  font-family: "Courier New";
  white-space: pre-wrap;
  margin: 1em 0;
  font-weight: normal;
  color: black;
  background-color: #dfdfdf;
  overflow-x: auto;
  border: solid 1px #c5c5c5;
  margin-left: 25px;
  padding: 3px;
  word-wrap: break-word;
  text-overflow: wrap;
}
li:last-child {
  padding-bottom: 5;
}
div#cover_page_body {
  left: 0;
  line-height: 200px;
  margin: auto;
  margin-top: -200px;
  position: absolute;
  top: 50%;
  width: 100%;
  font-weight: bold;
  color: #002960;
  font-size: 29px;
}
div#cover_page_date {
  left: 0;
  position: absolute;
  bottom: 50px;
  color: black;
  font-family: "calibri";
  font-size: 12px;
  font-style: normal;
}
div#TOC_Header {
  font-weight: bold;
  color: #002960;
  font-size: 19px;
}
h1:before {
  content: counter(l1counter) ".  ";
  counter-increment: l1counter;
  font-size: 19px;  
}
h1 {
  counter-reset: l2counter;
  font-weight: bold;
  color: #002960;  
  font-size: 19px;  
  text-transform: capitalize;
}
h1+p {
  display: block;
  padding-left: 30;
}
h2:before {
  content: counter(l1counter) "." counter(l2counter) ".  ";
  counter-increment: l2counter;
  padding-left: 1em;  
  font-size: 17px;
}
h2 {
  counter-reset: l3counter; 
  font-weight: bold;
  color: #002960;
  font-size: 17px;
  text-transform: capitalize;
}
h2+p {
  display: block;
  padding-left: 40;

}
h3:before {
  content: counter(l1counter) "." counter(l2counter) "." counter(l3counter) ".  ";
  counter-increment: l3counter;
  padding-left: 2em;  
  font-size: 15px;
}
h3 {
  counter-reset: l4counter; 
  font-weight: bold;
  color: #002960;  
  font-size: 15px;
}
h3+p {
  display: block;
  padding-left: 50;

}
h4:before {
  content: counter(l1counter) "." counter(l2counter) "." counter(l3counter) "." counter(l4counter) ".  ";
  counter-increment: l4counter;  
  font-size: 14px;
}
h4 {
  font-weight: bold;
  color: #002960;
  padding-left: 3em;  
  font-size: 14px;
}
h4+p {
  display: block;
  padding-left: 60;

}
h1~ul {
	display: block;
	padding-left: 40px;
}
h2~ul {
	display: block;
	padding-left: 50px;
}
h3~ul {
	display: block;
	padding-left: 60px;
}
em {
	text-decoration: underline;
	font-style: normal;
	font-weight: bold;
}
h1 {
  page-break-before: always;
}
li {
  page-break-inside: avoid;
}
div.pagebreak {
  page-break: always;
}

It is true that I use the "<em>" element instead of the "<u><strong>" elements. However, as you can see, I have configured the CSS of the "<em>" so that it should "behave" exactly the same. Or am I missing something here?

em {
	text-decoration: underline;
	font-style: normal;
	font-weight: bold;
}

As you saw in the previous example, it works in the HTML; only the PDF looks different. I also tried moving the closing "</em>" tag to include the colon ":" just to see how it would look in the PDF, and the result was even weirder: the underline ended under the middle of the W in AWS. To the best of my knowledge that cannot be done with CSS, since a letter is either underlined or not; it can't be half-underlined, right?
[image]

Author

iyuvalk commented Feb 11, 2019

Also, I use WeasyPrint version 44 (which, as far as I can see, is the latest version):
[image]

P.S. I'm using WeasyPrint on Windows 10 with Python 3.7 (just in case it matters...)

Author

iyuvalk commented Feb 11, 2019

OK, I've now tested it with the renderer (python -m weasyprint.tools.renderer), and the same thing happens even when I use <u><strong> instead of <em>. After a bit of tweaking I found the culprit: it's the text-align: justify; declaration in the CSS (set on the "body"). If I remove it from the CSS, the underline reaches the end of the text and looks just fine.
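
The with/without comparison described here could be scripted roughly as follows. This is only a sketch, not the reporter's actual code: `build_page` and `render_both` are hypothetical helper names, the sample markup is taken from the HTML posted earlier in this thread, and it assumes WeasyPrint is installed and that `HTML(string=...).write_pdf(path)` behaves as documented.

```python
"""Render the same snippet with and without `text-align: justify`."""

# The <em> styling from the CSS posted in this thread.
EM_RULE = "em { text-decoration: underline; font-style: normal; font-weight: bold; }"

# One list item from the HTML posted in this thread.
SAMPLE = (
    "<ul><li><em>Connection from AWS towards the LAN</em>: Allowed from the "
    "CPM, PVWA and PSM segments towards the Vault segment.</li></ul>"
)


def build_page(justify):
    """Return a full HTML page; only the body alignment rule differs."""
    body_rule = "body { text-align: justify; }" if justify else ""
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<style>{body_rule} {EM_RULE}</style></head>"
        f"<body>{SAMPLE}</body></html>"
    )


def render_both(prefix="underline"):
    """Write <prefix>_justified.pdf and <prefix>_plain.pdf; return both paths."""
    from weasyprint import HTML  # imported lazily; requires WeasyPrint

    paths = []
    for justify in (True, False):
        path = f"{prefix}_{'justified' if justify else 'plain'}.pdf"
        HTML(string=build_page(justify)).write_pdf(path)
        paths.append(path)
    return paths
```

Calling `render_both()` writes two PDFs that differ only in the alignment rule, so the underline lengths can be compared by eye.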

Is that a known issue?

Author

iyuvalk commented Feb 11, 2019

I found a way to work around it: if I keep the <em> and set it to display: inline-block in the CSS, it works correctly. Still, isn't it a bug that WeasyPrint doesn't draw the underline correctly when the text uses text-align: justify;?
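
For anyone wanting to apply this workaround from Python without editing the original stylesheet, something along these lines might do. It is a sketch under assumptions: `with_workaround` and `render_with_workaround` are hypothetical names, and it relies on WeasyPrint's `CSS(string=...)` class and the `stylesheets=` argument of `write_pdf`.

```python
"""Append the inline-block workaround to an existing stylesheet."""

# Workaround from this thread: make the underlined <em> an inline-block.
WORKAROUND_CSS = "em { display: inline-block; }"


def with_workaround(css):
    """Return `css` with the workaround rule appended last, so it wins ties."""
    return css.rstrip() + "\n" + WORKAROUND_CSS + "\n"


def render_with_workaround(html, pdf_path, base_css=""):
    """Render `html` to `pdf_path` using `base_css` plus the workaround rule."""
    from weasyprint import CSS, HTML  # imported lazily; requires WeasyPrint

    sheet = CSS(string=with_workaround(base_css))
    HTML(string=html).write_pdf(pdf_path, stylesheets=[sheet])
```

One trade-off to keep in mind: an inline-block is laid out as a single box, so a long underlined phrase will no longer wrap across lines.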

Contributor

Tontyna commented Feb 11, 2019

Yep, that's a bug. Snippet to reproduce it:

<style>
p {
  text-align: justify;
}
em {
  text-decoration: underline;
  font-style: normal;
  font-weight: bold;
}
</style>
<p>Lorem <em>ipsum dolor sit et</em> amet
  consetetur sadipscing elitr, sed diam nonumy eirmod tempor 
  invidunt ut labore et dolore magna aliquyam erat.
</p>
<p>Lorem <em style="display:inline-block">ipsum dolor sit et</em> amet 
  consetetur sadipscing elitr, sed diam nonumy eirmod tempor 
  invidunt ut labore et dolore magna aliquyam erat.
</p>

liZe added the bug label Feb 11, 2019

liZe added a commit that referenced this issue Apr 1, 2019

liZe closed this in 80f9a34 Apr 1, 2019

liZe added this to the 47 milestone Apr 1, 2019
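
Since the fix is attached to the 47 milestone, a script that has to run on several machines could check the installed version before deciding whether the inline-block workaround is still needed. The following is a rough sketch: `major_version` and `has_underline_fix` are hypothetical helpers, and it assumes the fix actually shipped in release 47 and that version strings start with the major number (as in "44" or "52.5").

```python
"""Check whether a WeasyPrint version is at or past the milestone with the fix."""

import re

# Milestone named in this thread; assumed here to match the release number.
FIX_MILESTONE = 47


def major_version(version):
    """Extract the leading integer from a version string such as '44' or '52.5'."""
    match = re.match(r"\d+", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group())


def has_underline_fix(version):
    """Return True if `version` is at least the milestone that closed this issue."""
    return major_version(version) >= FIX_MILESTONE
```

With WeasyPrint installed, this could be called as `has_underline_fix(weasyprint.__version__)`.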
