Tail characters getting stripped off #15

Open
rohitgadia opened this issue Aug 28, 2017 · 6 comments
Comments

@rohitgadia

I am working with a host of PDF reports. Your class preserves the layout, but sometimes the tail characters of a line get stripped off, whereas the parent class, PDFTextStripper, extracts them fine.

Does this have anything to do with this.setCurrentPageWidth(pageRectangle.getWidth()); ?

By the way, great work with the class; it made extracting tables so easy.

@JonathanLink
Owner

Hi! Thank you, great to hear that the class helps you! Can you send me (here or through my email) a PDF file which doesn't work?

@rohitgadia
Author

rohitgadia commented Aug 29, 2017

` Independent Auditors Report
To members of Silverlake Axis Ltd.

          rePort  on  the finAnciAl   stAtements                                                                                                   
                                                                                                                                                   
          We have audited  the  accompanying  fnancial  statements  of  Silverlake Axis  Ltd.  and its  subsidiaries  (collectively,  the  Group), 
          50  to  159,  which  comprise  the  statements  of  fnancial position  of  the  Group  and  the  Company  as  at 30  June  2016,  the  co
          of  changes  in equity,  consolidated  income  statement, consolidated statement  of  comprehensive  income  and  consolidated  statement
          fows of  the  Group  for  the  year  then ended,  and a  summary  of signifcant  accounting  policies and  other  explanatory  informatio
                                                                                                                                                   
          Management’s   Responsibility for  the  Financial  Statements                                                                            
          Management  is  responsible  for  the  preparation  of  fnancial statements  that give  a  true  and  fair view  in accordance  with  Int
          Reporting  Standards,  and for  devising  and  maintaining  a  system  of internal  accounting  controls suffcient  to  provide  a  **reaso**
          that  assets  are safeguarded  against  loss  from  unauthorised  use  or  disposition; and transactions  are  properly  authorised  and 
          recorded  as  necessary  to  permit  the  preparation  of  true  and fair consolidated  income  statement and  statements  of  fnancial  
          maintain  accountability  of  assets.                                                                                                    
                                                                                                                                                   
          Auditors’  Responsibility                                                                                                                
          Our  responsibility  is  to express an  opinion  on  these  fnancial  statements  based  on  our  audit.  We  conducted our  audit in  ac
          International  Standards  on  Auditing.  Those  standards require  that we  comply  with  ethical  requirements  and  plan  and  perform 
          obtain  reasonable  assurance  about whether  the  consolidated  fnancial  statements  are free  from  material  misstatement.           
                                                                                                                                                   
          An  audit  involves  performing  procedures  to  obtain  audit evidence about  the  amounts  and  disclosures  in  the consolidated  **fnan**
          The  procedures  selected  depend  on  the auditor’s judgement, including  the  assessment  of  risks  of  material  misstatement of  the
          fnancial  statements,  whether  due to  fraud or  error.  In making  those  risk  assessments,  the auditor  considers  internal  control
          the  entity’s  preparation  of  the consolidated  fnancial  statements  that  give  a  true and  fair  view in order  to  design  audit  
          appropriate  in the circumstances,  but  not  for the  purpose  of expressing an opinion  on  the  effectiveness of  the entity’s  intern
          audit  also  includes  evaluating  the  appropriateness  of  accounting policies  used  and the  reasonableness  of accounting  estimates
          management, as well  as  evaluating  the  overall presentation  of  the  consolidated  fnancial  statements.                             
                                                                                                                                                   
          We believe  that  the audit evidence  we  have obtained  is  suffcient and appropriate to  provide  a  basis  for our  audit opinion.    
                                                                                                                                                   
          Opinion                                                                                                                                  
          In  our  opinion,  the  consolidated  fnancial  statements  of the  Group  and  the  statement of  fnancial position  of  the  Company  a
          up  in  accordance  with the  International  Financial  Reporting  Standards  so  as to  give  a  true  and fair  view  of  the  **fnancial**
          and  of  the Company  as  at  30 June 2016  and  the results,  changes  in  equity  and  cash  **fows**  of  the  Group  for the  year ended 
                                                                                                                                                   
          other  mAtters                                                                                                                           
                                                                                                                                                   
          This  report  is  made solely to  the  members  of the  Company, as  a body,  and for  no  other  purpose.  We do  not  assume  **responsib**
          person  for  the content  of  this report.                                                                                               
                                                                                                                                                   
                                                                                                                                                   
                                                                                                                                                   
          eRNSt & YouNG                                                                                                                            
          AF:  0039                                                                                                                                
          Chartered  Accountants                                                                                                                   
                                                                                                                                                   
          Kuala  Lumpur,  Malaysia                                                                                                                 
          28  September  2016  `

This is what the extracted text looks like. If you look closely, a few characters are missing from some of the words I have highlighted (e.g. "fnancial", "fows"), and I have also highlighted the places where the tail characters of a line are getting stripped.

I have attached the file as well; the extract above is from page 51. Thanks

AR 2016.pdf

@JonathanLink
Owner

JonathanLink commented Aug 31, 2017

Thanks, I am going to investigate that this weekend.

@rohitgadia
Author

Thanks a lot! I was wondering if you could explain why certain characters were getting stripped.

@JonathanLink
Owner

You were right, it has to do with this.setCurrentPageWidth(pageRectangle.getWidth());
I'll make an update, but meanwhile you can change that line to: this.setCurrentPageWidth(pageRectangle.getWidth() * 1.2);
I also noticed that the space between the columns was sometimes not big enough (for instance on page 6). I'll try to fix that too.
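
To illustrate why widening the page width can help, here is a minimal, self-contained sketch. It is a simplified model, not the class's actual code: it assumes the output line is a fixed-size character buffer whose length is derived from the page width, that every character occupies at least one column, and that anything landing past the end of the buffer is dropped. The names `LineBufferSketch`, `renderLine` and `COLUMN_WIDTH_PT`, and the 4 pt column width, are hypothetical.

```java
import java.util.Arrays;

final class LineBufferSketch {
    // Assumed width of one output column, in points (hypothetical value).
    static final double COLUMN_WIDTH_PT = 4.0;

    // Places each character of `text` into a buffer sized from the page width.
    // Glyph i starts at x = i * glyphAdvancePt; characters mapped past the
    // end of the buffer are silently dropped.
    static String renderLine(String text, double glyphAdvancePt, double pageWidthPt) {
        int lineLength = (int) (pageWidthPt / COLUMN_WIDTH_PT);
        char[] line = new char[lineLength];
        Arrays.fill(line, ' ');
        int nextFree = 0;
        for (int i = 0; i < text.length(); i++) {
            int geometric = (int) ((i * glyphAdvancePt) / COLUMN_WIDTH_PT);
            // Each character needs its own column, so glyphs narrower than
            // a column push the following characters further to the right.
            int index = Math.max(geometric, nextFree);
            if (index < lineLength) {
                line[index] = text.charAt(i);
            }
            nextFree = index + 1;
        }
        // Trim the unused padding at the end of the buffer.
        return new String(line).replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        String text = "provide a reasonable assurance";
        // 30 glyphs at 3.5 pt fit on a 112 pt page, but the buffer has only 28 columns.
        System.out.println(renderLine(text, 3.5, 112.0));
        // Scaling the width by 1.2 gives 33 columns, enough for all 30 characters.
        System.out.println(renderLine(text, 3.5, 112.0 * 1.2));
    }
}
```

Under these assumptions the first call loses the last two characters of the line and the second keeps them all. The factor 1.2 is only a margin, though: a line with enough glyphs narrower than one column could still overflow the wider buffer.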

JonathanLink added a commit that referenced this issue Jan 4, 2018
@zanonmark

@JonathanLink:
Thanks for this class, very useful.

About your last commit (88bfd8c): I see it's still in the 'dev' branch and hasn't been merged to 'master'. Is there any reason for that?

Also, do we have any way to set the page width externally (e.g. by calling pdflayouttextstripper.setPageWidth() or something like that)?
That would be very useful to decide case-by-case how to behave...

Thanks,
MZ
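
As a rough illustration of the kind of external setting asked about here, the sketch below keeps a caller-configurable scale factor and applies it to the page's width. Everything in it is hypothetical: `PageWidthSetting`, `setPageWidthFactor` and `effectiveWidth` are made-up names and are not part of the existing class.

```java
final class PageWidthSetting {
    // 1.0 means "use the page width as reported by the PDF".
    private double widthFactor = 1.0;

    // Lets the caller decide, case by case, how much extra width to allow.
    void setPageWidthFactor(double factor) {
        if (!(factor > 0.0)) {
            throw new IllegalArgumentException("factor must be positive");
        }
        this.widthFactor = factor;
    }

    // Width that would be passed on in place of the raw page width.
    double effectiveWidth(double pageWidthPt) {
        return pageWidthPt * widthFactor;
    }
}
```

With such a setting, a factor of 1.2 would reproduce the workaround suggested earlier in the thread, while the default of 1.0 would leave the current behaviour unchanged.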
