Add Markdown export and bulk export workflow #19
Merged
Adds a third export format alongside the existing TXT/ODF: the conclusion page now offers .md output through a unified Gtk.MenuButton (replacing the standalone ODF icon), and the CLI gains an `export-md` subcommand with a `--front-matter` flag.

Conversion reuses the existing pdftotext+LayoutAnalyzer pipeline: headings become ATX prefixes, tables render as GitHub-flavored pipe tables, key-value lines bold the key, and paragraphs are escaped only where Markdown actually changes meaning (inline emphasis/links/pipes, plus line-start block markers) so CPF/phone hyphens stay readable.

The MenuButton uses Gio.Menu + Gtk.PopoverMenu for native keyboard navigation and screen-reader semantics. Single-file MD export shows the same cancellable progress dialog the ODF flow already had, backed by a shared _build_progress_dialog helper. Front-matter dates are emitted in UTC so they stay deterministic across timezones.

20 new tests cover the conversion: escape rules, table cells with Markdown specials, YAML control-char stripping, front-matter, cancel_event propagation, _unique_path auto-suffix.
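The selective escaping rule described in this commit (escape inline specials everywhere, escape block markers only at line start) can be illustrated roughly as follows. This is a minimal sketch, not the PR's code: `escape_paragraph` and the three regexes are hypothetical, and the exact set of characters the real converter escapes is an assumption.

```python
import re

# Assumed set of inline characters that change meaning in Markdown:
# backslash, backtick, emphasis markers, link brackets, table pipes.
_INLINE_SPECIALS = re.compile(r"([\\`*_\[\]|])")
# Block markers that only matter at the start of a line: ATX headings,
# blockquotes, and "-" / "+" bullets ("*" is already covered above).
_BLOCK_MARKER = re.compile(r"^(\s*)(#{1,6}(?=\s|$)|>|[-+](?=\s|$))")
# Ordered-list markers such as "1. " or "1) " at the start of a line.
_ORDERED = re.compile(r"^(\s*\d+)([.)])(?=\s)")


def escape_paragraph(text: str) -> str:
    """Escape only what Markdown would reinterpret (hypothetical helper)."""
    escaped_lines = []
    for line in text.split("\n"):
        line = _INLINE_SPECIALS.sub(r"\\\1", line)   # inline specials anywhere
        line = _BLOCK_MARKER.sub(r"\1\\\2", line)    # heading/quote/bullet at line start
        line = _ORDERED.sub(r"\1\\\2", line)         # "1." becomes "1\."
        escaped_lines.append(line)
    return "\n".join(escaped_lines)
```

Under these assumptions a mid-line hyphen, as in `123.456.789-09` or `91234-5678`, passes through unchanged, while a line that starts with `- ` or `# ` gets a single backslash in front of the marker.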
Lets the user export several OCR'd files at once. A toggle in the "Generated Files" card header switches the row into a selection view: per-file actions are hidden, every row gains a checkbox, and a bottom bar appears with "Select all", "Clear" and an "Export selected ▾" menu mirroring the per-file format choices (ODT, MD).

The bulk pipeline picks a destination folder up front, then walks the selection in a background thread with a cancellable progress dialog (reusing the same _build_progress_dialog helper from the single-file flow). Name collisions are resolved by auto-suffixing "(1)", "(2)" so the run never silently overwrites a file the user didn't pick. The destination is pre-checked for existence and write permission so a read-only folder fails immediately with a clear toast instead of N per-file errors.

Cancellation mid-file is honored by both converters (convert_pdf_to_odf already supported it; convert_pdf_to_markdown gained cancel_event in the previous commit). The bulk worker distinguishes ExportCancelled from a real failure so the summary toast reports counts correctly. The bulk menu uses the same Gio.Menu + Gtk.PopoverMenu pattern as the per-row export button for consistent keyboard navigation.
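The destination pre-check and the collision suffixing could look something like the sketch below. Both function names and signatures are hypothetical (the PR's own helper is `_unique_path`, whose signature is not shown here), and the space before the "(1)" suffix is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def destination_error(dest_dir: Path) -> Optional[str]:
    """Return a message if the folder cannot receive exports, else None."""
    if not dest_dir.is_dir():
        return f"Destination folder does not exist: {dest_dir}"
    if not os.access(dest_dir, os.W_OK):
        return f"Destination folder is not writable: {dest_dir}"
    return None


def unique_path_sketch(dest_dir: Path, filename: str) -> Path:
    """Pick a non-existing path by appending " (1)", " (2)", ... to the stem."""
    candidate = dest_dir / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = dest_dir / f"{stem} ({counter}){suffix}"
        counter += 1
    return candidate
```

Running the folder check once before the loop is what turns N identical per-file errors into a single early failure. The existence check in `unique_path_sketch` is not atomic, so a sketch like this only protects against collisions that already exist when the path is chosen.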
Not up to standards ⛔🔴 Issues
| Category | Results |
|---|---|
| Security | 50 high |
🟢 Metrics 150 complexity · 8 duplication
Static-analysis cleanup driven by the bots that ran on the PR:
- Split create_markdown() into per-element emitters (_emit_heading,
_emit_table, _emit_kv, _emit_paragraph) plus a tiny dispatch helper,
bringing Cognitive Complexity from 37 down under the 15 threshold.
- Split _bulk_export_worker by extracting _bulk_convert_one (single-file
conversion) and _safe_remove (partial-output cleanup) plus a
_BULK_EXTENSIONS map for format → suffix, bringing the worker CC from
25 down under 15. The try/except/else flow also became clearer.
- Replaced bare logger.error in the bulk except path with
logger.exception("Bulk export failed for %s", pdf_path) so the
traceback ends up in the log.
- Module-level constants for duplicated literals:
* _EXPORT_FAILED_MSG = _("Export failed") (4 callers)
* _NOTIFY_ACTIVE = "notify::active" (4 GTK signal binds)
* input_pdf_with_text_help in build_parser (3 subcommands)
- Tests: replaced hardcoded "/tmp/*.pdf" mock paths with
os.path.join(tempfile.gettempdir(), …) via a _mock_pdf_path helper —
the path is never touched (parse_tsv_pages is mocked), but Sonar's
"publicly writable directory" rule is happy now. Generic
Exception(...) test mocks tightened to RuntimeError(...).
No behavior changes, no public API changes — 358/358 tests still pass,
ruff clean, MD output diff-identical against pre-refactor fixtures.
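The `logger.exception` change and the try/except/else shape mentioned above can be illustrated with a small sketch. `convert_one_logged` and its `convert` callback are hypothetical stand-ins, not the PR's `_bulk_convert_one`, and catching `Exception` here is an assumption about what the real worker catches.

```python
import logging


def convert_one_logged(logger: logging.Logger, pdf_path: str, convert) -> bool:
    """Run one conversion; log failures with their traceback."""
    try:
        convert(pdf_path)
    except Exception:
        # Unlike logger.error, logger.exception appends the active
        # traceback to the log record, so the failure can be diagnosed.
        logger.exception("Bulk export failed for %s", pdf_path)
        return False
    else:
        # Reached only when convert() raised nothing.
        return True
```

`logger.exception` logs at ERROR level, so existing handlers and filters treat the record the same way as before; only the traceback is added.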

Summary
Two related additions to the post-OCR conclusion page (and the CLI):
The menus use `Gio.Menu` + `Gtk.PopoverMenu` so keyboard navigation (Up/Down/Enter) and screen-reader semantics come for free. A shared `_build_progress_dialog` helper backs both single-file and bulk flows; `convert_pdf_to_markdown` accepts a `cancel_event` so MD batches respond to Cancel mid-file (matching the ODF contract).
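As a rough illustration of the `cancel_event` contract, the sketch below shows a converter that checks a `threading.Event` between pages and a batch loop that counts cancellation separately from failures. Only the names `cancel_event` and `ExportCancelled` come from the PR; `convert_pages`, `bulk_export_sketch`, and the page-level check granularity are assumptions.

```python
import threading


class ExportCancelled(Exception):
    """Raised when the user cancels mid-file (name from the PR, body assumed)."""


def convert_pages(pages, cancel_event=None):
    """Stand-in converter: checks for cancellation before each page."""
    out = []
    for page in pages:
        if cancel_event is not None and cancel_event.is_set():
            raise ExportCancelled()
        out.append(page.upper())  # placeholder for the real conversion work
    return "\n".join(out)


def bulk_export_sketch(files, cancel_event):
    """Return (done, failed, cancelled) so a summary can report counts."""
    done = failed = 0
    cancelled = False
    for pages in files:
        try:
            convert_pages(pages, cancel_event)
        except ExportCancelled:
            cancelled = True  # a cancel is not a failure; stop the batch
            break
        except Exception:
            failed += 1       # real failure: count it and keep going
        else:
            done += 1
    return done, failed, cancelled
```

In a GTK application the Cancel button handler would call `cancel_event.set()` from the main thread while the loop runs in a background thread.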
Two commits, separable for review:
Test plan
Notes