problem with hyphen #10
Comments
Hello Philipp, Well, to tell the truth, the initial version of my class did suppress [...] I finally suppressed it because during the following weeks, I did not have [...] However, now it seems that it makes sense to put it back. I think I will add [...] I will come back to you when the new version is available. Christian.
Oops, I completely forgot: do you have a sample to give me? or [...]
Hello Philipp, I'm glad to tell you that the PdfToText V1.2.36 class is now able to [...] I've noticed one unwanted side-effect in your sample [...]
Is displayed as: [...]
Maybe it will be better once I have implemented a more robust management [...] However, the rest of the text contents, which contain many hyphenated [...] Christian.
hello christian, the output starts with:
61111113111399111111111111391121111911111311137111111146111911113 43
with version [Version : 1.2.35] [Date : 2016/08/06] the output of the same file was very fine. philipp
hello christian, [...] philipp
Hello Philipp, It's too late! I implemented this feature in the early versions of my class [...] I added it again: it was nothing and took me an hour to complete. Sometimes [...] Christian.
Hello Philipp, I solved this problem late last night, before you ran your tests. It was due to my complete rework of how I handle Unicode to UTF-8 translations. One internal function, which used to accept a character as a parameter, now accepts an integer value. I just missed 2 calls in my code which were still supplying a character value as a parameter. The latest version, 1.2.38, solves that (I tried it on the sample you sent me). Christian. From: phisu [mailto:notifications@github.com] hello christian, the output starts with: 61111113111399111111111111391121111911111311137111111146111911113 43 with version [Version : 1.2.35] [Date : 2016/08/06] the output of the same file was very fine. philipp
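To illustrate the kind of mismatch described in this comment, here is a minimal sketch of a Unicode to UTF-8 helper that takes an integer code point and rejects a character outright. It is written in Python for brevity and is not the PdfToText code; the function name `codepoint_to_utf8` is hypothetical. In a loosely typed language, a character passed where an integer is expected may be silently coerced instead of rejected, which is one plausible way to end up with garbled output.

```python
def codepoint_to_utf8(codepoint: int) -> bytes:
    """Encode a single Unicode code point (an integer) as UTF-8 bytes.

    Hypothetical helper for illustration only; not part of PdfToText.
    """
    # Reject characters/strings explicitly instead of coercing them.
    if isinstance(codepoint, bool) or not isinstance(codepoint, int):
        raise TypeError("expected an integer code point, got %r" % (codepoint,))
    # Surrogates and values above U+10FFFF are not valid code points for UTF-8.
    if codepoint < 0 or codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("invalid code point: %r" % (codepoint,))

    if codepoint < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([codepoint])
    if codepoint < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (codepoint >> 6),
                      0x80 | (codepoint & 0x3F)])
    if codepoint < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (codepoint >> 12),
                      0x80 | ((codepoint >> 6) & 0x3F),
                      0x80 | (codepoint & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (codepoint >> 18),
                  0x80 | ((codepoint >> 12) & 0x3F),
                  0x80 | ((codepoint >> 6) & 0x3F),
                  0x80 | (codepoint & 0x3F)])
```

A caller that still holds a character would convert it first, for example `codepoint_to_utf8(ord("-"))`; passing `"-"` directly raises `TypeError` here rather than producing wrong bytes.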
I want to keep the hyphens in my PDF. Is there an option not to remove them with layout? As of now, all the hyphens are removed from the table in my PDF.
hello Christian,
in almost every pdf we find hyphens. when the hyphens are at the end of a line, i guess we are mostly not interested in them. the quality of the extracted text is probably better if they are eliminated. this could be done by an extra cleanup of the output of your class, or by your class itself. what do you think about that?
philipp
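The "extra cleanup of the output" idea could look roughly like the sketch below. It is a Python post-processing step applied to already extracted text, not something PdfToText provides; the function name `join_hyphenated_lines` is hypothetical. It assumes a hyphen is a line-break hyphen only when it sits at the very end of a line between two lowercase ASCII letters, so hyphens inside a line (dates, table cells, part numbers) are left alone.

```python
import re

# A hyphen at the end of a line, with a lowercase letter right before it
# and a lowercase letter starting the next line (after optional indentation).
_EOL_HYPHEN = re.compile(r"(?<=[a-z])-[ \t]*\r?\n[ \t]*(?=[a-z])")


def join_hyphenated_lines(text: str) -> str:
    """Remove end-of-line hyphens and join the two halves of the word.

    Hypothetical cleanup helper; a heuristic, not a dictionary-based check.
    """
    return _EOL_HYPHEN.sub("", text)


if __name__ == "__main__":
    sample = "the quality of the ex-\ntracted text\n| 2016-08-06 | A-1 |\n"
    print(join_hyphenated_lines(sample))
```

The heuristic has a known cost: a genuine compound split at its hyphen ("well-" / "known") is joined into "wellknown". Telling the two cases apart would need a word list, which is outside this sketch.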