Skip to content

Commit

Permalink
check unicode errors on metadata printing in verbose mode
Browse files Browse the repository at this point in the history
  • Loading branch information
abarker committed Apr 9, 2017
1 parent df3106e commit ee9f722
Show file tree
Hide file tree
Showing 3 changed files with 37 additions and 27 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.rst
Expand Up @@ -3,8 +3,8 @@
History
=======

0.1.3 (no date yet)
-------------------
0.1.3 (2017-03-14)
------------------

New Features:

Expand Down
32 changes: 18 additions & 14 deletions README.rst
Expand Up @@ -18,7 +18,7 @@ Features
- Works with either the pdftoppm program or with Ghostscript.
- A version of pdftoppm for Windows is included.
- Renders and analyzes page images to find the bounding boxes, which allows it
to deal with noisy images.
to deal with noisy scanned PDFs.
- Ghostscript can optionally be used to find the bounding boxes directly.
- Can automatically apply a Ghostscript repair operation to attempt to fix
corrupt PDF files.
Expand All @@ -28,7 +28,7 @@ Features
their margins.
- Can uniformly crop even and odd pages separately.
- Can automatically run a document previewer on the output file.
- The format of any automatically-generated output filenames is easily
- The format of automatically-generated output-file names is easily
modifiable.
- Preserves document catalog information such as outlines if possible.
- Implements a rudimentary 'undo' capability by default.
Expand Down Expand Up @@ -69,33 +69,37 @@ The easiest way to install pdfCropMargins is to install using pip::

pip install pdfCropMargins

The ``pip`` program should be installed with Python. You can run ``python -m
ensurepip`` if it is not, or install from the repos in Linux.
The ``pip`` program should be automatically installed along with Python. You
can run ``python -m ensurepip`` if it is not, or install from the repos in
Linux.

The package can also be run directly from its source code directory. In that
case the PyPDF2 Python package must be installed. The Pillow imaging package
is also necessary if Ghostscript is unavailable as well as for certain advanced
features which use explicit image analysis.
The package can also be run directly from its source code directory tree. In
that case the PyPDF2 Python package must already be installed. The Pillow
imaging package is also necessary if Ghostscript is unavailable, as well as for
certain advanced features which use explicit image analysis. Run the script in
the top-level bin directory called ``pdfCropMargins.py``.

Running
=======

After installation the program can be run with a command such as::
After installation via pip the program can be run with a command such as::

pdf-crop-margins -v -s -u your-file.pdf

For help, run::

pdf-crop-margins -h | more

On Windows you may need to put the Python ``Scripts`` directory into your
environment ``PATH`` in order to avoid having to use the full pathname.
On Windows you may need to explicitly put the Python distribution's ``Scripts``
directory into your environment ``PATH`` in order to avoid having to use the
full pathname.

To diagnose unexpected crops, try running with the ``-v`` verbose argument.
Running without ``-u`` will crop each page separately, so you can see which
pages might be causing problems (such as with noise near the borders or margin
text). To get cropped pages all the same size be sure to use both ``-s`` to
make pages the same size and ``-u`` to crop each page by the same amount.
pages might be causing problems (such as pages with noise near the borders or
margin text). To get cropped pages all the same size be sure to use both
``-s`` to make pages the same size and ``-u`` to crop each page by the same
amount.

Documentation
=============
Expand Down
28 changes: 17 additions & 11 deletions src/pdfCropMargins/main_pdfCropMargins.py
Expand Up @@ -902,17 +902,23 @@ def main_crop():
if args.verbose and not metadata_info:
print("\nNo readable metadata in the document.")
elif args.verbose:
print("\nThe document's metadata, if set:\n")
print(" The Author attribute set in the input document is:\n %s"
% (metadata_info.author))
print(" The Creator attribute set in the input document is:\n %s"
% (metadata_info.creator))
print(" The Producer attribute set in the input document is:\n %s"
% (metadata_info.producer))
print(" The Subject attribute set in the input document is:\n %s"
% (metadata_info.subject))
print(" The Title attribute set in the input document is:\n %s"
% (metadata_info.title))
try:
print("\nThe document's metadata, if set:\n")
print(" The Author attribute set in the input document is:\n %s"
% (metadata_info.author))
print(" The Creator attribute set in the input document is:\n %s"
% (metadata_info.creator))
print(" The Producer attribute set in the input document is:\n %s"
% (metadata_info.producer))
print(" The Subject attribute set in the input document is:\n %s"
% (metadata_info.subject))
print(" The Title attribute set in the input document is:\n %s"
% (metadata_info.title))
# Some metadata cannot be decoded or encoded, at least on Windows. Could
# print from a function instead to write all the lines which can be written.
except (UnicodeDecodeError, UnicodeEncodeError):
print("\nWarning: Could not write all the document's metadata to the screen."
"\nGot a UnicodeEncodeError or a UnicodeDecodeError.")

##
## Now compute the set containing the pyPdf page number of all the pages
Expand Down

0 comments on commit ee9f722

Please sign in to comment.