PdfSplitter, PdfRearranger #251
…ting the document in byte[]
Also, add api similar to PdfDocument regarding streams
in pdfmerger methods
to clear tokenReferences as we are merging the documents
(not on public api)
spotted on 68-1990-01_A.pdf
First of all, thanks for taking the time to put this code together. This comment is my opinion and shouldn't be considered final at all. But I don't like the three new classes. Why? Because they are a little bit verbose, they duplicate code and they do what Lastly, I think you should break these changes down into their own PRs. Maybe one for the
Thanks for your feedback. The reason that I went for
I guess they could all fall under the I would also like to underline that, thanks to the Similarly, this allows splitting documents in complex ways with a fairly simple API. Finally, the Concerning the stream aspect of this PR, I'll split it into different PRs, as you suggested. I made sure not to break any public API. Concerning
This is my take on the PdfSplitter/PdfMerger/general PDF page recombination problem. Feel free to put me back in my place if anything is wrong.
It mostly builds an API on top of the #248 logic to allow producing several new documents at once, which is necessary when splitting a PDF. I didn't know how to cherry-pick commits across forks, so this will conflict with #248, and the page selection logic comes from this branch.
The suggested new public APIs do conflict with the new ones suggested in #248, but since it's not merged yet, I thought I would give it a try.
I introduce the IPdfArrangement public API to give the user ultimate control over the page order. I believe proposing a method instead of a collection of collections opens up more dynamic possibilities.
I tried to keep the commit chain readable, so going through the commits is the easiest way to read this PR IMO.
Suggested new APIs:
class PdfMerger
  void Merge(string file1, string file2, Stream output)
  void Merge(Stream output, params string[] filePaths)
  void Merge(IReadOnlyList<Stream> streams, Stream output)
class PdfSplitter
  void SplitTwoParts(Stream file, int secondDocumentFirstPageIndex, Stream output1, Stream output2)
  void RemovePages(Stream file, IReadOnlyCollection<int> removedPages, Stream output, Stream removedPagesOutput = null)
  void SplitEveryPage(Stream file, IEnumerable<Stream> outputs, int pageCountPerFile = 1)
  void Split(Stream file, IReadOnlyCollection<(IReadOnlyCollection<int> Pages, Stream output)> pageBundles)
class PdfRearranger
  void Rearrange(IReadOnlyList<IInputBytes> files, IPdfArrangement arrangement, Stream output)
  void RearrangeMany(IReadOnlyList<IInputBytes> files, IEnumerable<(IPdfArrangement Arrangement, Stream Output)> rearrangements)
interface IPdfArrangement
  IEnumerable<(int FileIndex, IReadOnlyCollection<int> PageIndices)> GetArrangements(Dictionary<int, int> pagesCountPerFileIndex)
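To illustrate why a method can be more flexible than a fixed collection of collections, here is a minimal sketch of one possible arrangement that interleaves the pages of several files (for example, separately scanned front and back sides). The interface is redeclared from the signature listed above so the snippet compiles on its own; `InterleaveArrangement` is a hypothetical name that is not part of this PR, and the 1-based page numbering is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Redeclared from the signature in the PR description, so the sketch is self-contained.
public interface IPdfArrangement
{
    IEnumerable<(int FileIndex, IReadOnlyCollection<int> PageIndices)> GetArrangements(
        Dictionary<int, int> pagesCountPerFileIndex);
}

// Hypothetical arrangement (not part of the PR): page 1 of every file,
// then page 2 of every file, and so on, skipping files that have run out of pages.
public sealed class InterleaveArrangement : IPdfArrangement
{
    public IEnumerable<(int FileIndex, IReadOnlyCollection<int> PageIndices)> GetArrangements(
        Dictionary<int, int> pagesCountPerFileIndex)
    {
        if (pagesCountPerFileIndex.Count == 0)
        {
            yield break;
        }

        var fileIndices = pagesCountPerFileIndex.Keys.OrderBy(i => i).ToList();
        var maxPages = pagesCountPerFileIndex.Values.Max();

        // Assumption: page numbers are 1-based.
        for (var page = 1; page <= maxPages; page++)
        {
            foreach (var fileIndex in fileIndices)
            {
                if (page <= pagesCountPerFileIndex[fileIndex])
                {
                    IReadOnlyCollection<int> single = new[] { page };
                    yield return (fileIndex, single);
                }
            }
        }
    }
}
```

Because the page counts are only passed in when the method is called, the same arrangement object works for inputs of any length, which a precomputed list of page numbers could not do.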
Other changes:
PdfDocumentFactory.Open(string filename) uses File.ReadAllBytes, which allocates the whole document, when I believe opening a stream should be beneficial for large documents. This can be put in another PR.
PdfStreamWriter logic to allow flushing after every page tree. This releases references and should again help with large documents.
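The difference between the two ways of opening a file can be sketched with plain `System.IO` calls. `PdfInputSketch` and its two methods are hypothetical names used only for illustration; they are not part of the library or of this PR.

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
public static class PdfInputSketch
{
    // Eager: the whole file is copied into a single byte array before any parsing starts.
    public static byte[] ReadEagerly(string path) => File.ReadAllBytes(path);

    // Lazy: only a file handle is opened; bytes are read as the consumer requests them.
    // The caller is responsible for disposing the returned stream.
    public static Stream OpenLazily(string path) =>
        new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
}
```

With the eager version, memory use grows with the file size up front. With the stream, it depends on how much the reader decides to buffer.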