Releases: YM162/gulagcleaner

v0.14.1

01 Jun 14:28
  • Fixed a bug where three repeated iobjs caused the program to output corrupted PDFs. The fix may need revisiting if we find PDFs with more than three repeated iobjs.

Full Changelog: v0.13.0...v0.14.1

v0.13.0

03 Apr 23:48
  • Added a new function to the Python package that cleans a PDF file directly from its bytes.
  • @carlosiborra refactored the PDF tests for the Rust distribution and added a README for it.

What's Changed

  • Refactor PDF Cleaning Tests for Improved Modularity and Error Handling + Add Rust Distribution README by @carlosiborra in #18

Full Changelog: v0.12.2...v0.13.0

0.12.2

31 Jan 02:14

Re-enabled the Wuolah cleaning method.

Full Changelog: v0.12.1...v0.12.2

v0.12.1

16 Jan 15:12

WARNING: This release has the "Wuolah" method disabled. It will clean those PDFs using the "Naive" method instead.

This is meant to be a temporary fix to keep things working while we work on fixing the rest.

What's Changed

  • 0xCAB0/Rust-module-optimization by @0xCAB0 in #13

Full Changelog: v0.11.1...v0.12.1

0.11.1

28 Dec 14:15
  • Temporarily removed the method code in the gulagcleaner_wasm crate because serde serialization was corrupting the data.

Full Changelog: v0.11.0...v0.11.1

0.11.0

28 Dec 02:06

WARNING: The clean_pdf function now returns the numerical code of the method used to clean the PDF along with the cleaned PDF itself. The change is small, but it WILL break your code unless you update your calls.

Example for the gulagcleaner_wasm package:

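// Previous: clean_pdf returned the cleaned PDF directly.
var cleaned_pdf = await clean_pdf(data, 0);

// Current: clean_pdf returns an object holding the PDF and the method code.
var cleaning_result = await clean_pdf(data, 0);
var cleaned_pdf = cleaning_result.result;  // the cleaned PDF
var method_code = cleaning_result.method;  // numerical code of the method used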
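
If your code has to run against both the old and the new return shape during a migration, a small adapter can hide the difference. The sketch below is only an illustration: `unwrapCleanResult` is a hypothetical helper, not part of gulagcleaner_wasm, and it assumes the old return value is the bare PDF data and the new one exposes `result` and `method` as in the example above.

```javascript
// Hypothetical helper (not part of gulagcleaner_wasm): accepts either the
// pre-0.11.0 return value (the cleaned PDF itself) or the 0.11.0+ return
// value (an object exposing `result` and `method`).
function unwrapCleanResult(value) {
  // Treat the value as the new shape only if it is a non-binary object
  // that exposes both fields.
  const isNewShape =
    value !== null &&
    typeof value === "object" &&
    !ArrayBuffer.isView(value) &&
    !(value instanceof ArrayBuffer) &&
    "result" in value &&
    "method" in value;

  if (isNewShape) {
    return { pdf: value.result, method: value.method };
  }
  // Old shape: no method code is available, so report null.
  return { pdf: value, method: null };
}
```

It would be called as `const { pdf, method } = unwrapCleanResult(await clean_pdf(data, 0));`, with `method` being `null` on versions before 0.11.0.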

NEW:

  • Added a method to clean StuDocu PDFs.
  • The clean_pdf function now also returns the method used to clean the PDF.

Full Changelog: v0.5.2...v0.11.0

0.10.3

24 Dec 05:00

Full Changelog: v0.10.2...v0.10.3

0.10.2

24 Dec 04:42
199a0fc

What's Changed

  • Changed code to Rust and added bindings for Python and JS (via WASM). by @YM162 in #11

Full Changelog: v0.8.1...v0.10.2

v0.5.2

31 May 22:45

NEW: Fixes for newer files.

All PDFs downloaded after 18 May 2023 have a different internal structure, which makes the old method of extracting pages obsolete.

This version introduces a new method for extracting the original page via /Contents dictionary manipulation. The old method of PDF.Form extraction can still be used with the '-o' flag.

v0.8.2

21 Sep 16:51
3d72951

NEW:

  • Fixed an edge case for PDFs with unusual MediaBoxes.
  • Added support for cleaning multiple PDFs or full folders recursively.

What's Changed

  • Multiple files / folders feature added by @jseg380 in #6
  • Fixes for MediaBoxes not starting in (0,0) by @YM162 in #8
  • Fix for PDFs with unusual MediaBoxes by @YM162 in #9
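
For context on the MediaBox fixes: a PDF MediaBox is a rectangle given as four numbers that describe two diagonally opposite corners. Most files use `[0 0 width height]`, but the lower-left corner may sit somewhere else. The sketch below is only an illustration of that geometry, not the project's implementation. `normalizeMediaBox` is a hypothetical name.

```javascript
// Illustrative only: turns a MediaBox array into an origin offset plus size.
// The corners may be listed in either order, so min/max is used to find the
// lower-left and upper-right points.
function normalizeMediaBox(box) {
  const [x1, y1, x2, y2] = box;
  const llx = Math.min(x1, x2);
  const lly = Math.min(y1, y2);
  const urx = Math.max(x1, x2);
  const ury = Math.max(y1, y2);
  return {
    offsetX: llx,        // how far the page origin is shifted from (0,0)
    offsetY: lly,
    width: urx - llx,
    height: ury - lly,
  };
}
```

Code that assumes the box starts at (0,0) would read the third and fourth numbers as width and height, which gives the wrong page size whenever `offsetX` or `offsetY` is non-zero.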

Full Changelog: v0.7.0...v0.8.1