PdfItDown

Convert Everything to PDF

Looking for the legacy Python package? See Python Legacy below.

PdfItDown is a Rust-based tool and library that converts text-based files, images, office documents, and markup files to PDF. It is built on top of the markdown2pdf, office2pdf, and image crates for fast, reliable conversions. Visit us on our documentation website!

Applicability

PdfItDown supports the following file formats:

  • Markdown (.md)
  • HTML (.html, .htm)
  • PowerPoint (.pptx)
  • Word (.docx)
  • Excel (.xlsx)
  • Text-based formats (.txt, .csv, .xml, .json, and more)
  • Image files (.png, .jpg, .jpeg, .webp, .tiff, .avif)
  • PDF (pass-through)

How does it work?

Each kind of input follows a short conversion pipeline:

  • From Markdown / HTML to PDF
graph LR
2(Input File) --> 3[Markdown content]
3[Markdown content] --> 4[markdown2pdf]
4[markdown2pdf] --> 5(PDF file)
  • From image to PDF
graph LR
2(Input File) --> 3[Bytes]
3[Bytes] --> 4[image crate]
4[image crate] --> 5(PDF file)
  • From Office documents to PDF
graph LR
2(Input File) --> 3[office2pdf]
3[office2pdf] --> 4(PDF file)
  • From other text-based file formats to PDF
graph LR
2(Input File) --> 3[Text content]
3[Text content] --> 4[markdown2pdf]
4[markdown2pdf] --> 5(PDF file)
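
The four pipelines above imply that each input is routed by its file type. The block below is a minimal sketch of that routing using only the standard library. The `Route` enum and the `route_for` function are hypothetical names invented for illustration and are not part of the PdfItDown API. Matching on a lowercased extension and treating every unknown extension as text are assumptions, not documented behavior.

```rust
use std::path::Path;

/// Hypothetical labels for the conversion routes described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    /// Markdown / HTML -> Markdown content -> markdown2pdf
    Markdown,
    /// Raw bytes -> image crate
    Image,
    /// Office documents -> office2pdf
    Office,
    /// Other text-based formats -> text content -> markdown2pdf
    Text,
    /// Input is already a PDF
    PassThrough,
}

/// Pick a route from the file extension (assumption: case-insensitive match).
fn route_for(path: &Path) -> Route {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();

    match ext.as_str() {
        "md" | "html" | "htm" => Route::Markdown,
        "png" | "jpg" | "jpeg" | "webp" | "tiff" | "avif" => Route::Image,
        "pptx" | "docx" | "xlsx" => Route::Office,
        "pdf" => Route::PassThrough,
        // Assumption: anything else is handled as plain text.
        _ => Route::Text,
    }
}
```

With this sketch, `route_for(Path::new("README.md"))` yields `Route::Markdown`, while a `.csv` or `.json` file falls through to `Route::Text`.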

Installation and Usage

PdfItDown is distributed as a Rust crate and a standalone CLI binary.

Install the CLI

# Install from crates.io
cargo install pdfitdown

# Or build from source
git clone https://github.com/AstraBert/PdfItDown.git
cd PdfItDown
cargo install --path crates/pdfitdown

You can now use the command line tool:

Usage: pdfitdown [OPTIONS]

  PdfItDown CLI: convert any file format to PDF

Options:
  -i, --inputfile <INPUTFILE>    Path to the input file(s) that need to be converted to PDF. Can be used multiple times.
  -o, --outputfile <OUTPUTFILE>  Path to the output PDF file(s). If more than one input file is provided, you should provide an equal number of output files.
  -d, --directory <DIRECTORY>    Directory whose files you want to bulk-convert to PDF. If `--inputfile` is also provided, this option will be ignored.
      --no-overwrite             Do not overwrite existing PDF files
      --recursive                Recursively go through a directory when converting files to PDFs
  -h, --help                     Print help
  -V, --version                  Print version

For example:

pdfitdown -i README.md -o README.pdf

Or you can use it inside your Rust projects:

use pdfitdown::{PdfItDownConverter, types::Converter};

let converter = PdfItDownConverter::new();
let pdf_bytes = converter.convert("business_growth.md")?;
std::fs::write("business_growth.pdf", pdf_bytes)?;

You can also convert multiple files at once:

  • In the CLI:
# with custom output paths
pdfitdown -i test0.png -i test1.md -o testoutput0.pdf -o testoutput1.pdf
  • In the Rust API:
use pdfitdown::{PdfItDownConverter, types::Converter};

let converter = PdfItDownConverter::new();
converter.convert_multiple_files(
    vec!["business_growth.md", "logo.png"],
    vec!["business_growth.pdf", "logo.pdf"],
    true, // overwrite
)?;
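
Both the CLI and the Rust call above pair each input with one output path. The following sketch shows one way such pairing could be validated before any conversion runs. `plan_conversions` is a hypothetical helper, not a PdfItDown function. Rejecting mismatched list lengths follows the CLI help text, but silently skipping an existing output when `overwrite` is `false` is an assumption about how that flag behaves.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: pair inputs with outputs and decide which to convert.
fn plan_conversions(
    inputs: &[&str],
    outputs: &[&str],
    overwrite: bool,
) -> Result<Vec<(PathBuf, PathBuf)>, String> {
    // The CLI help asks for an equal number of input and output paths.
    if inputs.len() != outputs.len() {
        return Err(format!(
            "expected {} output paths, got {}",
            inputs.len(),
            outputs.len()
        ));
    }

    let mut plan = Vec::new();
    for (input, output) in inputs.iter().zip(outputs.iter()) {
        // Assumption: with overwrite disabled, existing outputs are skipped.
        if !overwrite && Path::new(output).exists() {
            continue;
        }
        plan.push((PathBuf::from(*input), PathBuf::from(*output)));
    }
    Ok(plan)
}
```

Validating the whole batch up front means a mismatched argument list fails before any file is written, rather than partway through.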

You can bulk-convert all the files in a directory:

  • In the CLI:
# non-recursive
pdfitdown -d tests/data/testdir
# recursive
pdfitdown -d tests/data/testdir --recursive
  • In the Rust API:
use pdfitdown::{PdfItDownConverter, types::Converter};

let converter = PdfItDownConverter::new();
converter.convert_directory("tests/data/testdir", true, true)?; // overwrite, recursive
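
To make the difference between the recursive and non-recursive modes concrete, here is a small standard-library sketch of collecting the files in a directory. `collect_files` is a hypothetical function written for this illustration. It does not reflect how PdfItDown walks directories internally, and it does not filter by supported format.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: list regular entries in `dir`, descending into
/// subdirectories only when `recursive` is true.
fn collect_files(dir: &Path, recursive: bool) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            if recursive {
                files.extend(collect_files(&path, true)?);
            }
        } else {
            files.push(path);
        }
    }
    // read_dir gives no ordering guarantee, so sort for stable output.
    files.sort();
    Ok(files)
}
```

In non-recursive mode only the top-level files are returned, while recursive mode also returns files from nested directories.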

Parallelization with rayon

Enable the rayon feature for parallel conversion of multiple files:

# In Cargo.toml
[dependencies]
pdfitdown = { version = "4.0", features = ["rayon"] }

# Or, when installing the CLI
cargo install pdfitdown --features rayon

Python Legacy

Looking for the legacy Python package? It is available on the v3 branch and on PyPI as pdfitdown<4.0.

License

This project is open source and is provided under the MIT License.