GoPDF

A Go library for reading PDF files, with active CJK text extraction support.

Requires Go 1.25+ (go.mod directive).

Originally forked from ledongthuc/pdf; now an independent project. Original lineage: rsc/pdf.

Features

Plain text extraction with context/cancellation support
Styled text extraction (font name, size, position)
Text grouped by row
Pages() iter.Seq2[int, Page] and Texts() iter.Seq[Text] — lazy iterators for streaming access (Go 1.23+)
OpenBytes([]byte) — parse a PDF from an in-memory byte slice
Page.MediaBox() and Page.CropBox() — page dimensions with inheritance-chain resolution
Document metadata API (r.Info(): title, author, dates, …)
Outline (table of contents) with resolved page numbers
Encrypted PDF support — transparent decryption of Standard-security-handler files: RC4 (40/128-bit, V=1/2), AES-128 (V=4, AESV2), and AES-256 (V=5, R=5/R=6 — Acrobat 9+ and PDF 2.0 / ISO 32000-2). Open with the empty, user, or owner password via NewReaderEncrypted.
CJK predefined CMap decoders:
- Japanese Shift-JIS (90ms-RKSJ-H/V, 90pv-RKSJ-H)
- CJK UCS-2 BE (UniGB-UCS2-H/V, UniCNS-UCS2-H/V, UniJIS-UCS2-H/V, UniKS-UCS2-H/V)
- Simplified Chinese GBK / GB-EUC / GBKp-EUC (GBK-EUC-H/V, GB-EUC-H/V, GBKp-EUC-H/V)
- Traditional Chinese Big5-ETen / ETenms (ETen-B5-H/V, ETenms-B5-H/V)
- Korean UHC / KSC-EUC / UHC-HW (KSCms-UHC-H/V, KSC-EUC-H/V, KSCms-UHC-HW-H/V)
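
As an illustration of the simplest family in the list above, the Uni*-UCS2-H/V CMaps use two-byte big-endian UCS-2 character codes, so the raw string bytes can be read as 16-bit code units. The sketch below uses only the standard library; decodeUCS2BE is a hypothetical helper name, not part of this library's API, and the library's real decoder may differ.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUCS2BE is a hypothetical helper (an assumption for illustration).
// It reads raw string bytes as big-endian 16-bit code units and converts
// them to a Go string. A trailing odd byte is ignored.
func decodeUCS2BE(raw []byte) string {
	units := make([]uint16, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		units = append(units, uint16(raw[i])<<8|uint16(raw[i+1]))
	}
	// utf16.Decode turns code units into runes (and tolerates surrogate pairs).
	return string(utf16.Decode(units))
}

func main() {
	// 0x65E5 and 0x672C are the code points for the two characters of "日本".
	fmt.Println(decodeUCS2BE([]byte{0x65, 0xE5, 0x67, 0x2C}))
}
```

The multi-byte encodings in the list (Shift-JIS, GBK, Big5, UHC) need lookup tables and are not covered by this sketch.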

Install

go get github.com/Detective-XH/gopdf

Examples

See the examples/ folder for runnable programs.

Read plain text

package main

import (
	"bytes"
	"context"
	"fmt"

	// The examples refer to the package as pdf, so the alias is spelled out.
	pdf "github.com/Detective-XH/gopdf"
)

func main() {
	f, r, err := pdf.Open("./sample.pdf")
	if err != nil {
		panic(err)
	}
	defer f.Close()

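	// Extraction honors the context, so cancelling it aborts the work.
	b, err := r.GetPlainText(context.Background())
	if err != nil {
		panic(err)
	}

	// Drain the returned reader into a buffer and check the read error too.
	var buf bytes.Buffer
	if _, err := buf.ReadFrom(b); err != nil {
		panic(err)
	}
	fmt.Println(buf.String())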
}
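
Because extraction takes a context, it can be bounded with a deadline. The following is a minimal sketch using only the standard library. extractWithTimeout and fakeExtract are hypothetical names, and the callback signature assumes that GetPlainText returns an io.Reader and an error, which is inferred from the example above rather than confirmed.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
	"time"
)

// extractWithTimeout is a hypothetical helper. It runs extract under a
// context that expires after d and returns the text read from the result.
func extractWithTimeout(d time.Duration, extract func(context.Context) (io.Reader, error)) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	r, err := extract(ctx)
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// fakeExtract stands in for a real extraction call so the sketch runs
// without a PDF file. It fails if the context has already expired.
func fakeExtract(ctx context.Context) (io.Reader, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return strings.NewReader("hello"), nil
}

func main() {
	text, err := extractWithTimeout(time.Second, fakeExtract)
	fmt.Println(text, err)
}
```

If the signature assumption holds, the method value r.GetPlainText could be passed in place of fakeExtract.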

Read styled text

package main

import (
	"context"
	"fmt"

	// The examples refer to the package as pdf, so the alias is spelled out.
	pdf "github.com/Detective-XH/gopdf"
)

func main() {
	f, r, err := pdf.Open("./sample.pdf")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sentences, err := r.GetStyledTexts(context.Background())
	if err != nil {
		panic(err)
	}
	for _, s := range sentences {
		fmt.Printf("font=%s size=%.1f x=%.1f y=%.1f text=%s\n",
			s.Font, s.FontSize, s.X, s.Y, s.S)
	}
}
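
Styled output lends itself to simple heuristics, such as treating the largest font size on a page as a heading candidate. The sketch below is one such heuristic under stated assumptions: styledText is a local stand-in that mirrors only the fields printed in the example above, and largest is a hypothetical helper, not something the library provides.

```go
package main

import "fmt"

// styledText is a local stand-in (an assumption) with the same field names
// used in the example above; it is not the library's own type.
type styledText struct {
	Font     string
	FontSize float64
	X, Y     float64
	S        string
}

// largest returns the entries whose font size equals the maximum size seen.
// It returns nil for empty input.
func largest(texts []styledText) []styledText {
	var top float64
	for _, t := range texts {
		if t.FontSize > top {
			top = t.FontSize
		}
	}
	var out []styledText
	for _, t := range texts {
		if t.FontSize == top {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	texts := []styledText{
		{Font: "Helvetica-Bold", FontSize: 18, X: 72, Y: 720, S: "Title"},
		{Font: "Helvetica", FontSize: 11, X: 72, Y: 690, S: "Body text"},
	}
	for _, t := range largest(texts) {
		fmt.Printf("heading candidate: %s (%.1f pt)\n", t.S, t.FontSize)
	}
}
```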

Read text by row

package main

import (
	"fmt"
	"os"

	// The examples refer to the package as pdf, so the alias is spelled out.
	pdf "github.com/Detective-XH/gopdf"
)

func main() {
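	// Fail with a usage message instead of an index-out-of-range panic.
	if len(os.Args) < 2 {
		fmt.Fprintf(os.Stderr, "usage: %s <file.pdf>\n", os.Args[0])
		os.Exit(1)
	}
	f, r, err := pdf.Open(os.Args[1])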
	if err != nil {
		panic(err)
	}
	defer f.Close()

	for i := 1; i <= r.NumPage(); i++ {
		p := r.Page(i)
		if p.V.IsNull() {
			continue
		}
		rows, err := p.GetTextByRow()
		if err != nil {
			panic(err)
		}
		for _, row := range rows {
			fmt.Printf("row %d:", row.Position)
			for _, word := range row.Content {
				fmt.Printf(" %s", word.S)
			}
			fmt.Println()
		}
	}
}
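
For readers curious how row grouping can work in principle, here is a minimal sketch of one approach: bucket words by rounded Y coordinate, then order each bucket by X. This README does not describe the library's actual algorithm, so the rounding rule, the word type, and groupRows are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// word is a local stand-in (an assumption) for a positioned piece of text.
type word struct {
	X, Y float64
	S    string
}

// groupRows buckets words whose Y coordinates round to the same integer,
// orders rows from larger Y to smaller Y (top to bottom in PDF user space),
// and orders the words inside each row from left to right.
func groupRows(words []word) [][]word {
	byY := map[int64][]word{}
	for _, w := range words {
		key := int64(math.Round(w.Y))
		byY[key] = append(byY[key], w)
	}
	keys := make([]int64, 0, len(byY))
	for k := range byY {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] > keys[j] })

	rows := make([][]word, 0, len(keys))
	for _, k := range keys {
		row := byY[k]
		sort.SliceStable(row, func(i, j int) bool { return row[i].X < row[j].X })
		rows = append(rows, row)
	}
	return rows
}

func main() {
	rows := groupRows([]word{
		{X: 120, Y: 700.1, S: "world"},
		{X: 72, Y: 699.9, S: "Hello"},
		{X: 72, Y: 680, S: "second"},
	})
	for i, row := range rows {
		fmt.Printf("row %d:", i)
		for _, w := range row {
			fmt.Printf(" %s", w.S)
		}
		fmt.Println()
	}
}
```

Rounding is a crude tolerance; text with superscripts or mixed font sizes would need a smarter baseline comparison.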

Limitations

Text extraction only — no PDF creation, modification, or rendering.
Image content is not decoded (location metadata via Page.Images() is planned).
No AcroForms extraction yet (planned).
Pages() / Texts() iterators use range-over-func (Go 1.23+); the go.mod directive currently sets the minimum toolchain for the whole module to Go 1.25+.

Changelog

See CHANGELOG.md for the full history of fixes and additions.

