Add support for docling reading local files #417

Merged
ppinchuk merged 111 commits into main from pp/local_docling May 7, 2026

ppinchuk (Collaborator) commented May 6, 2026

Also bumps the elm dependency, which brings the c4ai dependency up to its latest version.
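The PR page does not show the loader itself. As a hedged illustration only, here is one way a local PDF could be converted to Markdown through docling's public `DocumentConverter` API, with OCR switched on only when the caller supplies a Tesseract executable path (the behavior described by the commit "OCR is now only enabled if user gives tesseract path"). `build_ocr_settings` and `load_local_pdf_as_markdown` are hypothetical names, not COMPASS functions, and the real loader may be structured differently.

```python
from pathlib import Path
from typing import Optional


def build_ocr_settings(tesseract_path: Optional[str]) -> dict:
    """Hypothetical helper: enable OCR only when a Tesseract executable path is given."""
    if tesseract_path:
        return {"do_ocr": True, "tesseract_cmd": tesseract_path}
    return {"do_ocr": False, "tesseract_cmd": None}


def load_local_pdf_as_markdown(fp, tesseract_path: Optional[str] = None) -> str:
    """Hypothetical loader: convert a local PDF to Markdown with docling.

    Requires the optional ``docling`` package. The import is delayed so that
    importing this module does not pull in docling's heavy dependencies.
    """
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        TesseractCliOcrOptions,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    settings = build_ocr_settings(tesseract_path)
    pipeline_options = PdfPipelineOptions(do_ocr=settings["do_ocr"])
    if settings["do_ocr"]:
        # Point docling's Tesseract CLI OCR backend at the user-supplied executable
        pipeline_options.ocr_options = TesseractCliOcrOptions(
            tesseract_cmd=settings["tesseract_cmd"]
        )

    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
    result = converter.convert(Path(fp))
    return result.document.export_to_markdown()
```

With this sketch, `load_local_pdf_as_markdown("ordinance.pdf")` converts without OCR, while passing `tesseract_path="/usr/bin/tesseract"` (a hypothetical path) turns it on. Keeping OCR opt-in avoids failures on machines where Tesseract is not installed.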

ppinchuk self-assigned this May 6, 2026
Copilot AI review requested due to automatic review settings May 6, 2026 17:12
ppinchuk requested a review from castelao as a code owner May 6, 2026 17:12
ppinchuk added the enhancement, new computation, topic-python-general, p-high, and dependencies labels May 6, 2026

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

codecov-commenter commented May 6, 2026

Codecov Report

❌ Patch coverage is 23.52941% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.99%. Comparing base (c577f03) to head (d91e685).

Files with missing lines      Patch %   Lines
compass/web/file_loader.py    26.08%    17 Missing ⚠️
compass/services/cpu.py       10.00%    9 Missing ⚠️

❌ Your patch status has failed because the patch coverage (23.52%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
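The patch figures above are consistent with one another. A small sketch of the arithmetic, assuming coverage = covered lines / coverable lines and that the "missing" count includes every patch line not fully covered; `coverable_lines` and `patch_status` are illustrative helpers, not part of Codecov's API, and the pass/fail rule is assumed to be a plain `>=` comparison against the 80% target.

```python
def coverable_lines(missing: int, coverage_pct: float) -> int:
    """Infer how many coverable lines a patch has from its miss count and coverage %.

    Assumes coverage = covered / coverable and missing = coverable - covered,
    so coverable = missing / (1 - coverage).
    """
    return round(missing / (1 - coverage_pct / 100))


def patch_status(coverage_pct: float, target_pct: float = 80.0) -> str:
    """Assumed status rule: the check passes only if coverage reaches the target."""
    return "success" if coverage_pct >= target_pct else "failure"


total = coverable_lines(26, 23.52941)
print(total, total - 26, patch_status(23.52941))  # prints: 34 8 failure
```

Inverting the ratio this way only works below 100% coverage; at full coverage there are no missing lines to scale from.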

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #417      +/-   ##
==========================================
- Coverage   55.24%   54.99%   -0.26%     
==========================================
  Files          62       62              
  Lines        5847     5870      +23     
  Branches      543      546       +3     
==========================================
- Hits         3230     3228       -2     
- Misses       2569     2593      +24     
- Partials       48       49       +1     
Flag        Coverage Δ
unittests   54.99% <23.52%> (-0.26%) ⬇️
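The diff table can be checked by hand: for each column, Hits + Misses + Partials equals Lines, and Hits / Lines reproduces the reported project coverage. A minimal sketch, assuming partially covered lines count as uncovered (which matches the reported percentages); `CoverageTotals` is a hypothetical name, not a Codecov type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageTotals:
    """Line counts from one column of the coverage diff table."""

    hits: int
    misses: int
    partials: int

    @property
    def lines(self) -> int:
        # Each tracked line is exactly one of: hit, missed, or partially covered
        return self.hits + self.misses + self.partials

    @property
    def percent(self) -> float:
        # Partially covered lines count against coverage in this sketch
        return 100 * self.hits / self.lines


base = CoverageTotals(hits=3230, misses=2569, partials=48)  # main
head = CoverageTotals(hits=3228, misses=2593, partials=49)  # #417

print(base.lines, f"{base.percent:.2f}%")  # 5847 55.24%
print(head.lines, f"{head.percent:.2f}%")  # 5870 54.99%
```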

Flags with carried forward coverage won't be shown.

ppinchuk merged commit e0575ac into main May 7, 2026
30 checks passed
ppinchuk deleted the pp/local_docling branch May 7, 2026 17:00
rajeee pushed a commit that referenced this pull request May 27, 2026
* Fix command

* Bump elm version

* Minor prompt update

* Update lockfile

* Fix linter

* Documentation updates

* First pass of GHP schema

* Add basic plugin config

* Wire up GHP plugin

* Have function return created class

* Clarification for noise

* Clarification for setbacks

* Clarification

* Use general guidance instead

* Add clarification to definitions

* Single row instruction

* Add clarification

* Allow nulls

* Add instruction

* update instructions

* Update instructions around null

* Tighten schema

* Updates to schema

* Add debug statements

* Update prompt

* Add logging

* More logging

* Update descriptions

* Update instructions

* Add clarification

* Add info

* Add more info to logger

* Add task ids

* Trimmed

* Update schema

* Update prompt

* Generalize implementation of `_get_model_config` and use it

* Update logging statement

* Change logging level

* Fix import

* Align playwright versions

* Fix bug in llm config retrieval

* Provide additional context even if user submits prompt

* Fix pandas link

* WIP

* Fix env

Co-authored-by: Copilot <copilot@github.com>

* ELM updates (WIP)

* Minor update

* Minor cleanup

* Minor cleanup

* Rename func

* Reduce redundancy

* Minor refactor

* Clarify argument

* Add missing docs

* Add docling support

* Link to docling docs

* Suppress numpy NaNmean warnings

Co-authored-by: Copilot <copilot@github.com>

* Add docling-based file loaders

* Add `to_md_kwargs`

* Include docling in logs

* Minor updates

* Use `COMPASSWebFileLoader`

* Remove bad func

* Change to FileLoader

Co-authored-by: Copilot <copilot@github.com>

* Add tests for env

Co-authored-by: Copilot <copilot@github.com>

* Fix tests

* Fix test

* Fix for clarity

* Minor fix

* Pull from env var

* Extra logging

* Write files with correct extension

* Minor cleanup

* Use multiprocessing queue

* Move subprocessing logging messages to `main.log`

* Fix tests

* Bump elm dep

* Default to elm backend for now

* Fix docs

* Bug fix

* Delay import

* Revert change

* Update env

Co-authored-by: Copilot <copilot@github.com>

* Minor change

Co-authored-by: Copilot <copilot@github.com>

* Try fix rust env

Co-authored-by: Copilot <copilot@github.com>

* Clean up deps slightly

Co-authored-by: Copilot <copilot@github.com>

* No frozen in CI

* Add mac intel to tests

* Adjust test

* update tox tests

* Update deps

* Update openai dep

* No docling on Python 3.13 MacOS intel

Co-authored-by: Copilot <copilot@github.com>

* Try fix

* Break out pixi toml

* Fix build

* Update package name

* Use pixi.toml file for trigger

* Bump elm dep

* Add local loader based on docling

* Fix tests

* Update lockfile

* OCR is now only enabled if user gives tesseract path

* Skip heavy tests in GHA

* GITHUB_ACTIONS = true when running tox tests
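The last two commits skip expensive tests on GitHub Actions. GitHub sets `GITHUB_ACTIONS=true` on its runners, so a test can key off that variable; because tox runs tests in an isolated environment, the variable presumably has to be made visible to tox explicitly. The sketch below uses the standard library's `unittest.skipIf` (pytest also honors unittest-style skips); `running_in_github_actions` and the test class are hypothetical and do not reflect the repository's actual test code.

```python
import os
import unittest
from typing import Mapping, Optional


def running_in_github_actions(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the GITHUB_ACTIONS variable is set to "true"."""
    env = os.environ if env is None else env
    return env.get("GITHUB_ACTIONS", "").lower() == "true"


class TestHeavyConversion(unittest.TestCase):
    # Hypothetical heavy test: skipped on GitHub Actions runners, run locally
    @unittest.skipIf(running_in_github_actions(), "heavy test skipped on GitHub Actions")
    def test_heavy_conversion(self):
        # Placeholder for an expensive check (e.g. a full document conversion)
        self.assertTrue(True)
```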

---------

Co-authored-by: Copilot <copilot@github.com>
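Two of the commits above route subprocess log messages into `main.log` through a multiprocessing queue. A common standard-library pattern for that is sketched here, assuming child processes attach a `logging.handlers.QueueHandler` and the parent drains the queue with a `QueueListener` that writes to a `FileHandler`. `worker` and `run` are hypothetical names; COMPASS's actual wiring may differ.

```python
import logging
import logging.handlers
import multiprocessing as mp


def worker(queue, task_id):
    """Runs in a child process and forwards its log records to the parent via `queue`."""
    logger = logging.getLogger("worker")
    logger.handlers.clear()  # drop any handlers inherited from the parent
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.info("processing task %s", task_id)


def run(n_tasks=3, log_file="main.log"):
    queue = mp.Queue()
    file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(
        logging.Formatter("%(processName)s %(levelname)s %(message)s")
    )
    # The listener drains the queue in a background thread of the parent process
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()
    try:
        procs = [mp.Process(target=worker, args=(queue, i)) for i in range(n_tasks)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
    finally:
        listener.stop()  # flushes any records still in the queue
        file_handler.close()


if __name__ == "__main__":
    run()
```

Only the parent process ever writes to `main.log`, which avoids interleaved writes from concurrent children.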

Labels

dependencies: Issues/pull requests related to a dependency
enhancement: Update to logic or general code improvements
new computation: Update that adds a new computation method
p-high: Priority: high
topic-python-general: Issues/pull requests related to python

Projects

None yet

3 participants