Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improved] PdfMerger: Allow to select the pages when merging documents #248

Merged
merged 2 commits into from Jan 5, 2021

Conversation

InusualZ
Copy link
Collaborator

@InusualZ InusualZ commented Dec 12, 2020

Supersede #245

This PR offers a much better option and flexible approach to allow a page selection when merging documents. Also, I address the drawback of the other PR which was related to copying pages out of order. As a by product we now allow a page to be copied multiple times (By repeating the page index multiple time in the selection)

Illustration of the problem (Copied)

This is my main document

  • Root (Page Tree)
    • Tree (Child Page Tree)
      • Resources
        • Image (F1) (Logo)
        • Font (F2) (Custom Font)
      • Page (Page No. 1)
        • Resources
          • Image (F3) (User Icon)
      • Page (Page No. 2)
        • Resources
          • Image (F3) (User Icon)
    • Tree (Child Page Tree)
      • Resources
        • Image (F1) (Another Logo)
        • Font (F2) ( Another Custom Font)
      • Page (Page No. 3)
        • Resources
          • Image (F3) (User Icon)

Now, let's try to merge the selection of pages (1, 3, 2) in that order:

  • Root (Page Tree)
    • Tree (Child Page Tree)
      • Resources
        • Image (F1) (Logo)
        • Font (F2) (Custom Font)
      • Page (Page No. 1)
        • Resources
          • Image (F3) (User Icon)
    • Tree (Child Page Tree)
      • Resources
        • Image (F1) (Another Logo)
        • Font (F2) ( Another Custom Font)
      • Page (Page No. 3)
        • Resources
          • Image (F3) (User Icon)
    • Tree (Child Page Tree)
      • Resources
        • Image (F1) (Logo)
        • Font (F2) (Custom Font)
      • Page (Page No. 2)
        • Resources
          • Image (F3) (User Icon)

I solve that by copying the page from bottom to top, instead of top to bottom like it was before. By doing so, we got rid of one recursive method (DocumentMerger::CopyPagesTree()). I think thanks to this last bit, we can introduce Parallelism, but that's a topic for another day.

I would like to comment, that I don't like the current state of public PdfMerger API

cc @Poltuu I think this would entirely resolve #244, any comments?

@codecov-io
Copy link

codecov-io commented Dec 12, 2020

Codecov Report

Merging #248 (1c9b86b) into master (eabe6b4) will increase coverage by 0.01%.
The diff coverage is 89.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #248      +/-   ##
==========================================
+ Coverage   64.39%   64.40%   +0.01%     
==========================================
  Files         600      600              
  Lines       39714    39758      +44     
==========================================
+ Hits        25572    25607      +35     
- Misses      14142    14151       +9     
Impacted Files Coverage Δ
src/UglyToad.PdfPig/Writer/PdfMerger.cs 82.00% <89.60%> (-0.30%) ⬇️
src/UglyToad.PdfPig.Tokens/DictionaryToken.cs 82.35% <100.00%> (+0.81%) ⬆️
src/UglyToad.PdfPig/Content/Catalog.cs 81.81% <0.00%> (-2.28%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eabe6b4...1c9b86b. Read the comment docs.

Copy link
Contributor

@Poltuu Poltuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as i'm concerned, this looks great.
I can't pretend I know too much on the subject, so someone else should be looking into this

}

/// <summary>
/// Merge the set of PDF documents.
/// </summary>
public static byte[] Merge(IReadOnlyList<byte[]> files)
public static byte[] Merge(IReadOnlyList<byte[]> files, IReadOnlyList<IReadOnlyCollection<int>> pagesBundle = null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth considering, a IReadOnlyList<(byte[] File, IReadOnlyCollection<int> Pages)> signature.

@plaisted
Copy link
Contributor

This looks really useful. PdfPig looks nice for text extraction but is missing some features such as splitting / merging documents, modify existing documents, etc. Looks like you've put a lot of work into these areas @InusualZ thanks for the effort and hope these get merged soon.

}

/// <summary>
/// Merge the set of PDF documents.
/// </summary>
public static byte[] Merge(IReadOnlyList<byte[]> files)
public static byte[] Merge(IReadOnlyList<byte[]> files, IReadOnlyList<IReadOnlyCollection<int>> pagesBundle = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IReadOnlyCollection or ICollection are probably the wrong types here, since they do not guarantee order iirc, I'd recommend IReadOnlyList since order is important here.

Copy link
Member

@EliotJones EliotJones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just the doubt on collection vs list types.

PdfMerger: Allow to select the pages when merging documents

Supersede UglyToad#245
Plus, Use IReadOnlyList instead of IReadOnlyCollection
@Poltuu Poltuu mentioned this pull request Jan 4, 2021
@InusualZ InusualZ merged commit 19ac38b into UglyToad:master Jan 5, 2021
@InusualZ InusualZ deleted the merger-selection-v2 branch January 8, 2021 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pdf splitter
5 participants