Skip to content

pdf-object: implement CCITTFaxDecode filter#121

Merged
Velli20 merged 8 commits intomainfrom
ccitt-fax-filter
Apr 5, 2026
Merged

pdf-object: implement CCITTFaxDecode filter#121
Velli20 merged 8 commits intomainfrom
ccitt-fax-filter

Conversation

@Velli20
Copy link
Copy Markdown
Owner

@Velli20 Velli20 commented Mar 28, 2026

Add pub mod ccitt with a pure-Rust CCITTFaxDecode decoder supporting all three encoding modes: Group 3 1D (K=0), Group 3 2D (K>0), and Group 4 / MMR (K<0). Wire it into StreamObject::data() with a ccitt_params_for_filter() helper that reads /DecodeParms for both single-filter and multi-filter streams.

The implementation embeds Huffman tables from ITU-T T.4 Appendix A (WHITE/BLACK terminating and make-up codes, common extended make-up codes) and a T.6 mode-code decoder (Pass, Horizontal, Vertical ±0–3). CCITTFaxParams is parsed directly from PDF dictionary entries without requiring an ObjectResolver.

Includes 14 unit tests covering params defaults, bit reader, Huffman run-length decoding, EOL handling, 1D rows (all-white, all-black, mixed), 2D Group 4 rows (V0 / Horizontal modes, all-white, row limit), and row packing with both polarities.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds CCITTFaxDecode support to pdf-object by introducing a pure-Rust CCITT (Group 3/4) decoder and wiring it into stream filter decoding so CCITT-compressed streams can be decompressed via StreamObject::data().

Changes:

  • Added ccitt module implementing CCITTFaxDecode decoding (Group 3 1D, Group 3 2D, Group 4/MMR) with unit tests.
  • Wired Filter::CCITTFaxDecode into StreamObject::data() and added a /DecodeParms extraction helper.
  • Implemented Filter::decode_ccitt_fax(.., params) as a thin wrapper over the new decoder.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
crates/pdf-object/src/stream.rs Adds CCITTFaxDecode handling in StreamObject::data() and parses /DecodeParms per filter index.
crates/pdf-object/src/lib.rs Exposes the new ccitt module publicly.
crates/pdf-object/src/filter.rs Implements CCITTFaxDecode decode entrypoint and documents its behavior.
crates/pdf-object/src/ccitt.rs New CCITT decoder implementation + parameter parsing + unit tests.
Comments suppressed due to low confidence (1)

crates/pdf-object/src/stream.rs:99

  • StreamObject::data() documents that it errors on unsupported filter chains, but the default match arm prints to stdout and returns partially decoded data. This can mask failures and produce incorrect output for callers. Consider returning an ObjectError for Filter::Unsupported(_) / unknown filters instead of println! + break.
                _ => {
                    println!(
                        "Unsupported filter in data_with_remaining_filter: {:?}",
                        filter
                    );
                    break;
                }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment thread crates/pdf-object/src/stream.rs Outdated
Add `pub mod ccitt` with a pure-Rust CCITTFaxDecode decoder supporting
all three encoding modes: Group 3 1D (K=0), Group 3 2D (K>0), and
Group 4 / MMR (K<0). Wire it into `StreamObject::data()` with a
`ccitt_params_for_filter()` helper that reads `/DecodeParms` for both
single-filter and multi-filter streams.

The implementation embeds Huffman tables from ITU-T T.4 Appendix A
(WHITE/BLACK terminating and make-up codes, common extended make-up
codes) and a T.6 mode-code decoder (Pass, Horizontal, Vertical ±0–3).
`CCITTFaxParams` is parsed directly from PDF dictionary entries without
requiring an `ObjectResolver`.

Includes 14 unit tests covering params defaults, bit reader, Huffman
run-length decoding, EOL handling, 1D rows (all-white, all-black,
mixed), 2D Group 4 rows (V0 / Horizontal modes, all-white, row limit),
and row packing with both polarities.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment on lines +723 to +733
loop {
if rows > 0 && decoded_rows >= rows {
break;
}

// Skip any EOL marker preceding the row (no-op when none present).
skip_eol(&mut reader);

if reader.exhausted() {
break;
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCITTFaxParams::end_of_block is never consulted during decoding. For Group 3/4 streams with rows = 0 (the default), this will treat the RTC/EOFB terminator as image data and can emit a spurious final row (often all-white) or stop mid-row depending on bit patterns. Please add explicit RTC (Group 3) / EOFB (Group 4) detection when end_of_block is true and stop decoding before appending another row; also add unit tests that cover termination behavior.

Copilot uses AI. Check for mistakes.
Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment on lines +728 to +730
// Skip any EOL marker preceding the row (no-op when none present).
skip_eol(&mut reader);

Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip_eol(&mut reader) is called unconditionally before every row, even when params.end_of_line is false (and for Group 4, which normally has no EOLs). This can mis-synchronize decoding if a data bit sequence happens to match the EOL pattern. Consider only consuming a leading EOL when the encoding mode/params require it (e.g., end_of_line for Group 3 1D, mandatory row EOL for Group 3 2D) and avoid scanning for EOLs in Group 4 unless you are explicitly checking for EOFB.

Copilot uses AI. Check for mistakes.
Comment thread crates/pdf-object/src/stream.rs Outdated
Comment on lines +106 to +115
/// Extract [`CCITTFaxParams`][crate::ccitt::CCITTFaxParams] for the filter at
/// `filter_idx` from the stream's `/DecodeParms` dictionary entry.
///
/// Per PDF spec §7.3.8.2, `/DecodeParms` is either a single dictionary (when
/// there is one filter) or an array of dictionaries (one per filter). Values
/// are always inline objects, so no object resolver is needed.
fn ccitt_params_for_filter(dict: &Dictionary, filter_idx: usize) -> crate::ccitt::CCITTFaxParams {
match dict.get("DecodeParms") {
Some(ObjectVariant::Dictionary(d)) => crate::ccitt::CCITTFaxParams::from_dictionary(d),
Some(ObjectVariant::Array(arr)) => arr
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The helper assumes /DecodeParms values are always inline objects (and ignores ObjectVariant::Reference). In PDF, /DecodeParms can legally be an indirect reference, so this will silently fall back to defaults and potentially decode CCITT data with the wrong parameters (K/Columns/BlackIs1/etc.). Either resolve references here (which likely requires threading an ObjectResolver into StreamObject::data) or detect references and return a clear DecompressionError/update the doc comment to state indirect /DecodeParms is unsupported.

Copilot uses AI. Check for mistakes.
Comment thread crates/pdf-object/src/filter.rs Outdated
Comment on lines +192 to +211
/// Decodes CCITTFaxDecode (Group 3 / Group 4 fax) compressed stream data.
///
/// # Parameters
///
/// - `stream_data`: The compressed byte stream to decode.
/// - `params`: Decode parameters parsed from the stream's `/DecodeParms` dictionary.
///
/// # Returns
///
/// The decompressed image data as a packed MSB-first 1-bit-per-pixel `Vec<u8>`.
///
/// # Errors
///
/// Returns [`ObjectError::DecompressionError`] if the stream is truncated or
/// contains an invalid bit pattern.
pub fn decode_ccitt_fax(
stream_data: &[u8],
params: &crate::ccitt::CCITTFaxParams,
) -> Result<Vec<u8>, ObjectError> {
crate::ccitt::decode(stream_data, params)
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decode_ccitt_fax docstring says it returns DecompressionError on truncated/invalid bit patterns, but crate::ccitt::decode() currently only errors for columns == 0 and otherwise decodes best-effort (including on truncation). Please align the documentation with the actual behavior (or change the decoder to return errors as documented).

Copilot uses AI. Check for mistakes.
Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment on lines +33 to +38
/// Whether a block terminator (EOFB / RTC) is present. Default: `true`.
pub end_of_block: bool,
/// If `true`, black = 1 and white = 0. Default: `false` (white = 1).
pub black_is1: bool,
/// Tolerated number of damaged rows before returning an error. Default: `0`.
pub damaged_rows_before_error: u32,
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damaged_rows_before_error is parsed from /DecodeParms but never used by the decoder. This is misleading (callers may assume it has an effect). Either implement damaged-row tracking/erroring per the PDF spec, or remove the field / mark it as currently ignored in the docs.

Copilot uses AI. Check for mistakes.
Comment thread crates/pdf-object/src/ccitt.rs Outdated
Comment on lines +787 to +817
// ─── Tests ───────────────────────────────────────────────────────────────────

#[cfg(test)]
mod tests {
use super::*;

fn params_1d(columns: u32, rows: u32) -> CCITTFaxParams {
CCITTFaxParams {
k: 0,
columns,
rows,
end_of_line: false,
encoded_byte_align: false,
end_of_block: true,
black_is1: false,
damaged_rows_before_error: 0,
}
}

fn params_g4(columns: u32, rows: u32) -> CCITTFaxParams {
CCITTFaxParams {
k: -1,
columns,
rows,
end_of_line: false,
encoded_byte_align: false,
end_of_block: true,
black_is1: false,
damaged_rows_before_error: 0,
}
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the unit tests cover EOL handling, mixed 1D rows, and Group 4 2D modes (V0/Horizontal, row limit), but the tests in this file currently only cover fill_bits/find_bit basics, a few simple 1D cases, and a Group 4 empty-stream case (no 2D mode coverage). Either expand the tests to match the stated coverage (especially around 2D decoding and end-of-block handling) or update the PR description to reflect the current test scope.

Copilot uses AI. Check for mistakes.
Velli20 and others added 7 commits April 2, 2026 22:20
…safety

- Introduce CcittDecoder struct encapsulating BitReader, row buffers,
  and decode configuration; eliminates parameter threading across
  decode_1d_row/decode_2d_row
- Extract clamp_to_usize() helper replacing 7 repeated
  usize::try_from(x.max(0)).unwrap_or(0) call sites
- Replace manual iterator loops with idiomatic slice::fill and for_each
- Remove dead EOL-recovery loop in decode_1d_row (always returned Err)
- Move misplaced fill_bits/find_bit/find_b1_b2 tests from bitreader.rs
  into ccitt.rs where those functions are accessible (fixes compilation)
- Add From<CcittDecodeError> for ObjectError; replace .unwrap() with ?
  in filter.rs decode_ccitt_fax to avoid panics on decode failure
- Fix empty-line-after-doc clippy warning
- Fix incorrect G3-2D test constant (0xC8 -> 0xCC)
- Add 12 new edge-case unit tests (clamp_to_usize, ref_pixel,
  one_lead_pos, fill_bits boundaries, find_bit skip path, multi-row,
  error conversion)
- Add doc comments to all public items

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace i32 sentinel (-1) with Option<usize> for the a0 position tracker
in decode_2d_row. This removes all usize↔i32 conversions from the method:

- Change a0 from i32 to Option<usize> (None = before first pixel)
- Change find_b1_b2 to accept Option<usize> instead of i32
- Change decode_run_seq return type from Result<i32> to Result<usize>
- Add apply_delta(usize, i32) helper for signed vertical offsets
- Remove columns_i32 field from CcittDecoder
- Remove clamp_to_usize function and its tests (zero remaining callers)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ticalLeft(usize)

Replace the signed Vertical(i32) variant with two unsigned variants:
- VerticalRight(usize) for non-negative deltas (a1 = b1 + delta)
- VerticalLeft(usize) for negative deltas (a1 = b1 - delta)

This eliminates the apply_delta helper and its usize::try_from
conversions, using checked_add/checked_sub directly at the call site.

Also fix pre-existing issues in ccitt_fax_params.rs:
- Rename DEFFAULT_IMAGE_WIDTH → DEFAULT_IMAGE_WIDTH (typo)
- Change columns field and test helpers from u32 to usize

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Handle monotonicity violations gracefully in decode_2d_row:
  VerticalRight overflow clamps to columns, VerticalLeft underflow
  uses saturating_sub, and position regressions end the row early
  instead of returning an error.
- Make decode_all do best-effort recovery: row-level errors
  (InvalidCode, UnknownExtensionCode, etc.) include the partial row
  and continue; UnexpectedEof includes partial row then stops.
- Wire up damaged_rows_before_error parameter (0 = unlimited per
  PDF spec §7.4.6).
- Add tests for regression leniency, extension tolerance, and the
  damaged-row threshold.

Fixes rendering of PDF32000_2008.pdf which contains CCITT-encoded
images that trigger position regressions during 2D decoding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace ONE_LEAD_POS[256] table with u8::leading_zeros() (single
  hardware instruction on all targets)
- Optimize find_bit to scan 64 bits at a time via u64::from_be_bytes +
  u64::leading_zeros instead of byte-by-byte with 8-byte skip loop
- Simplify fill_bits mask computation: replace branchy if/else with
  branchless shift expressions and clearer start_mask/end_mask naming

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move stream decompression filters (FlateDecode, DCTDecode, JPXDecode,
CCITTFaxDecode, ASCII85Decode) from pdf-object into a new pdf-filter
crate. Streams are now decoded at ObjectCollection insertion time,
simplifying StreamObject to hold only decoded data.

Key changes:
- Add pdf-filter crate with FilterError, Filter enum with Display impl
- Replace println! for unsupported filters with proper error return
- Remove unused pdf-filter dep from pdf-page
- Update StreamObject docs to reflect decoded-at-insertion semantics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the per-iteration ccitt_params_for_filter() helper with a
parse_decode_params() function that reads /DecodeParms once and returns
a Vec<DecodeParms> aligned 1:1 with the filter list.

- Add DecodeParms enum with None and CcittFax variants
- Add parse_decode_params() to parse all params before the loop
- Refactor decode() loop to zip filters with pre-parsed params
- Delete ccitt_params_for_filter() function

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +12 to +14
/// Stream data is decoded (decompressed) at insertion time by
/// [`ObjectCollection`], so the [`data`](Self::data) field always contains the
/// fully decoded bytes.
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type-level docs claim stream data is always decoded at insertion time by ObjectCollection, but StreamObject::new is public and can be constructed with raw (still-filtered) bytes outside of an ObjectCollection context. To avoid misleading API consumers, consider rewording this to describe the behavior when streams are stored in an ObjectCollection (or otherwise clarify that StreamObject itself does not enforce decoding).

Copilot uses AI. Check for mistakes.
Comment on lines +213 to +226
///
/// # Errors
///
/// Returns [`FilterError`] if any filter in the chain fails or is unsupported.
pub fn decode(stream: &StreamObject) -> Result<Cow<'_, [u8]>, FilterError> {
let mut data: Cow<'_, [u8]> = Cow::Borrowed(&stream.data);
let objects = PassthroughResolver;
let filters = Filter::from_dictionary(&stream.dictionary, &objects)?;

let Some(filters) = &filters else {
return Ok(data);
};

let decode_params = parse_decode_params(&stream.dictionary, filters, &objects);
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decode() currently hard-codes PassthroughResolver, so any /Filter value that is an indirect reference (or contains indirect references) will not be resolved and filter parsing will fail or silently degrade. Consider taking an &dyn ObjectResolver (or providing a decode_with_resolver) so callers with an ObjectCollection can correctly resolve referenced filter metadata.

Suggested change
///
/// # Errors
///
/// Returns [`FilterError`] if any filter in the chain fails or is unsupported.
pub fn decode(stream: &StreamObject) -> Result<Cow<'_, [u8]>, FilterError> {
let mut data: Cow<'_, [u8]> = Cow::Borrowed(&stream.data);
let objects = PassthroughResolver;
let filters = Filter::from_dictionary(&stream.dictionary, &objects)?;
let Some(filters) = &filters else {
return Ok(data);
};
let decode_params = parse_decode_params(&stream.dictionary, filters, &objects);
/// For streams whose filter metadata may contain indirect references, use
/// [`decode_with_resolver`] to provide an [`ObjectResolver`].
///
/// # Errors
///
/// Returns [`FilterError`] if any filter in the chain fails or is unsupported.
pub fn decode(stream: &StreamObject) -> Result<Cow<'_, [u8]>, FilterError> {
let objects = PassthroughResolver;
decode_with_resolver(stream, &objects)
}
/// Decodes a [`StreamObject`] by applying its full filter chain using the
/// provided [`ObjectResolver`] for `/Filter` and `/DecodeParms` metadata.
///
/// This should be used when the stream dictionary may contain indirect
/// references that need to be resolved before parsing the filter chain.
///
/// # Errors
///
/// Returns [`FilterError`] if any filter in the chain fails or is unsupported.
pub fn decode_with_resolver(
stream: &StreamObject,
objects: &dyn ObjectResolver,
) -> Result<Cow<'_, [u8]>, FilterError> {
let mut data: Cow<'_, [u8]> = Cow::Borrowed(&stream.data);
let filters = Filter::from_dictionary(&stream.dictionary, objects)?;
let Some(filters) = &filters else {
return Ok(data);
};
let decode_params = parse_decode_params(&stream.dictionary, filters, objects);

Copilot uses AI. Check for mistakes.
Comment on lines +272 to +305
let params_entry = dict.get("DecodeParms");

// Pre-extract per-index dictionaries from the `/DecodeParms` value.
let param_dicts: Vec<Option<&Dictionary>> = match params_entry {
Some(ObjectVariant::Dictionary(d)) => {
// Single dict applies to all filters (common single-filter case).
vec![Some(d); filters.len()]
}
Some(ObjectVariant::Array(arr)) => filters
.iter()
.enumerate()
.map(|(i, _)| {
arr.get(i).and_then(|v| {
if let ObjectVariant::Dictionary(d) = v {
Some(d.as_ref())
} else {
None
}
})
})
.collect(),
_ => vec![None; filters.len()],
};

filters
.iter()
.zip(param_dicts.iter())
.map(|(filter, param_dict)| match (filter, param_dict) {
(Filter::CCITTFaxDecode, Some(d)) => {
let p = CCITTFaxParams::from_dictionary(d, objects).unwrap_or_default();
DecodeParms::CcittFax(p)
}
(Filter::CCITTFaxDecode, None) => DecodeParms::CcittFax(CCITTFaxParams::default()),
_ => DecodeParms::None,
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_decode_params only handles /DecodeParms when it is embedded directly as a Dictionary/Array in the stream dictionary; it does not resolve references (e.g. /DecodeParms 12 0 R or array elements that are references/null). This can cause CCITT decoding to use default parameters even when the PDF provides explicit decode parms. Consider resolving the /DecodeParms object (and array elements) via objects.resolve_object(...) before pattern matching.

Suggested change
let params_entry = dict.get("DecodeParms");
// Pre-extract per-index dictionaries from the `/DecodeParms` value.
let param_dicts: Vec<Option<&Dictionary>> = match params_entry {
Some(ObjectVariant::Dictionary(d)) => {
// Single dict applies to all filters (common single-filter case).
vec![Some(d); filters.len()]
}
Some(ObjectVariant::Array(arr)) => filters
.iter()
.enumerate()
.map(|(i, _)| {
arr.get(i).and_then(|v| {
if let ObjectVariant::Dictionary(d) = v {
Some(d.as_ref())
} else {
None
}
})
})
.collect(),
_ => vec![None; filters.len()],
};
filters
.iter()
.zip(param_dicts.iter())
.map(|(filter, param_dict)| match (filter, param_dict) {
(Filter::CCITTFaxDecode, Some(d)) => {
let p = CCITTFaxParams::from_dictionary(d, objects).unwrap_or_default();
DecodeParms::CcittFax(p)
}
(Filter::CCITTFaxDecode, None) => DecodeParms::CcittFax(CCITTFaxParams::default()),
_ => DecodeParms::None,
let params_entry = dict
.get("DecodeParms")
.and_then(|value| objects.resolve_object(value).ok());
filters
.iter()
.enumerate()
.map(|(i, filter)| {
let param_dict = match params_entry.as_deref() {
Some(ObjectVariant::Dictionary(d)) => Some(d.as_ref()),
Some(ObjectVariant::Array(arr)) => arr
.get(i)
.and_then(|value| objects.resolve_object(value).ok())
.and_then(|resolved| match resolved.as_ref() {
ObjectVariant::Dictionary(d) => Some(d.as_ref()),
ObjectVariant::Null => None,
_ => None,
}),
_ => None,
};
match (filter, param_dict) {
(Filter::CCITTFaxDecode, Some(d)) => {
let p = CCITTFaxParams::from_dictionary(d, objects).unwrap_or_default();
DecodeParms::CcittFax(p)
}
(Filter::CCITTFaxDecode, None) => {
DecodeParms::CcittFax(CCITTFaxParams::default())
}
_ => DecodeParms::None,
}

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +86
ObjectVariant::Stream(stream) => {
let data = pdf_filter::filter::decode(&stream)?.to_vec();
let StreamObject {
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decoding streams during ObjectCollection::insert means the persisted stream data depends on pdf_filter::filter::decode, which currently cannot resolve indirect /Filter or /DecodeParms entries (it uses a passthrough resolver). This will store incorrectly decoded bytes for PDFs that use referenced filter metadata. Consider deferring decoding until the collection is populated (second pass) or otherwise providing decode with a resolver that can follow references in the collection.

Copilot uses AI. Check for mistakes.
Comment on lines +390 to +426
struct CcittDecoder<'a> {
reader: BitReader<'a>,
ref_row: Vec<u8>,
row_buf: Vec<u8>,
columns: usize,
k: i32,
end_of_line: bool,
byte_align: bool,
black_is1: bool,
/// Maximum number of damaged rows before aborting.
/// `0` means tolerate all damaged rows (PDF spec §7.4.6).
damaged_rows_before_error: u32,
}

impl<'a> CcittDecoder<'a> {
/// Construct a new decoder from raw stream data and CCITT parameters.
///
/// # Errors
///
/// Returns [`CcittDecodeError::ZeroColumns`] if `params.columns` is zero.
fn new(data: &'a [u8], params: &CCITTFaxParams) -> Result<Self, CcittDecodeError> {
if params.columns == 0 {
return Err(CcittDecodeError::ZeroColumns);
}
let row_bytes = params.columns.div_ceil(8);

Ok(Self {
reader: BitReader::new(data),
ref_row: vec![0xff; row_bytes],
row_buf: vec![0xff; row_bytes],
columns: params.columns,
k: params.k,
end_of_line: params.end_of_line,
byte_align: params.encoded_byte_align,
black_is1: params.black_is1,
damaged_rows_before_error: params.damaged_rows_before_error,
})
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCITTFaxParams includes end_of_block, but the decoder never stores or consults it, so the implementation cannot stop cleanly on RTC/EOFB markers when Rows = 0 (the default). This can lead to extra garbage rows or spurious row-level errors after the actual image data ends. Consider wiring end_of_block into CcittDecoder and terminating decoding when the appropriate end-of-block marker is encountered.

Copilot uses AI. Check for mistakes.
TwoDMode::Horizontal => {
let run_len1 = decode_run_seq(&mut self.reader, a0color)?;
let (a0_fill, a1) = match a0 {
None => (0, run_len1.saturating_add(1)),
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 2D Horizontal mode, the a0 == None case sets a1 = run_len1 + 1, which is inconsistent with the Some(pos) branch (a1 = pos + run_len1) and is very likely an off-by-one that shifts the first pair of runs by one pixel when Horizontal is the first opcode in a row. Consider making the None case follow the same coordinate convention as the Some case so a1 advances by exactly run_len1 pixels from the start of the line.

Suggested change
None => (0, run_len1.saturating_add(1)),
None => (0, run_len1),

Copilot uses AI. Check for mistakes.
}

TwoDMode::EndOfBlock => {
// First EOL of EOFB; bits already consumed in read_2d_mode.
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TwoDMode::EndOfBlock is currently a no-op, so the 2D row decoder loop continues without advancing a0/a0color, which can produce incorrect output instead of terminating at the EOFB marker. If end-of-block markers are supported, consider treating this mode as an explicit termination condition for the current row/stream (e.g. return Ok(()) and let the outer loop stop).

Suggested change
// First EOL of EOFB; bits already consumed in read_2d_mode.
// EOFB marker consumed by read_2d_mode; terminate decoding
// instead of continuing with stale a0/a0color state.
return Ok(());

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +92
#[cfg(test)]
mod tests {
use super::*;

#[test]
fn test_decode_ascii85_basic() {
// "Man " → known ASCII85 encoding "9jqo^"
let man_encoded = b"9jqo^~>";
let decoded = decode_ascii85(man_encoded).expect("decode failed");
assert_eq!(decoded, b"Man ");
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ASCII85 tests use expect(...) without a #[allow(clippy::expect_used)] / #[allow(clippy::unwrap_used)] guard. Since the workspace Clippy config denies expect_used/unwrap_used, cargo clippy --all-targets will likely fail. Consider adding the appropriate #[allow(...)] on the tests module (as done elsewhere in the repo) or rewriting tests to return Result and use ? instead of expect/unwrap.

Copilot uses AI. Check for mistakes.
Comment on lines +662 to +668
#[cfg(test)]
mod tests {
use super::*;

#[test]
fn fill_bits_full_byte() {
let mut row = [0xffu8; 1];
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CCITT test module uses expect(...) / unwrap_err() without a #[allow(clippy::expect_used)] / #[allow(clippy::unwrap_used)] guard. Given the workspace Clippy config denies these lints, this will likely fail cargo clippy --all-targets. Consider adding the allow attributes on the tests module or rewriting tests to avoid expect/unwrap.

Copilot uses AI. Check for mistakes.
@Velli20 Velli20 merged commit e15d813 into main Apr 5, 2026
8 checks passed
@Velli20 Velli20 deleted the ccitt-fax-filter branch April 5, 2026 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants