v0.1.60

Latest
github-actions released this 17 Mar 17:00
· 52 commits to main since this release

What's new

Commits

dd72563 Bump version to v0.1.60 for release
baa0082 Don't go down too low in temp
f2951f3 Lints
1e42e5e Faster and nicer equation cache
1f8cc59 Pipeline scales temperature automatically, increases performance ~2%
4768ac4 Merge branch 'main' of https://github.com/allenai/olmocr
0968bd1 Mine headers footers
1270ca3 lints
d7361c4 Basic convert script
142a9cb Convert script to support broader folder structures
98c4283 Cap max workers to hopefully improve stability
5f3ef51 Faster equation cache and checking, cleanup data script
79e2677 Hmm, these should be passing!
f5d92bd Trying to get new CI to work
1db1b34 Merge pull request #122 from allenai/gpu-ci
9f38a8a Lints
5009bb3 Lints
acb0df3 Fixes
3eec2a8 Mining math
95f03e1 More small tests
d30a070 Tests
2696502 Much faster and responsive math bench
980121f Loading tests much faster in parallel
7729e5a Graphical pdf test from github
154a07c Math miner looks decent
d0b9b5b Fixes for math mining
09fd299 Mining
3f92265 Math miner working decently
5387a79 More tests for olmocrbench
189104b Fixing escaped html bug in mathml parsing
770bc36 Fixes for multipage
0553443 Convert scripts and other fun
8b3a9e4 Fixes for multipage runners
743e48e More fixes
b2fe82d Working on math compares
bc3a945 Adding some tests
35cc6f1 A few fixes for text comparisons and normalized chars
4709156 Leaving with some more data, but still cases to investigate
07be9ea More math testing
e39c3e4 New method for comparing equations
fff4050 More test documents
0ba56c0 Adjusting repeat test to be the "baseline" test which also looks for disallowed characters
a2b5ca8 Better markdown table parsing
3fef3f9 Gemini support, some debugging stuff
fc857f9 Starting on math dataset
d006e8f Working on equation matching
7003e9c Working on a better compare function
e144200 Fix markdown parsing for mistral
bdc0d75 Adding mistral ocr to eval
4053ea5 Work on image matching
b03d840 Better error handling on eqn rendering
438e68e Some more math stuff
7f36ac8 First math tests
b62ccc2 Equation rendering code, first pass
9be696f Adding a trailing repetition test
07466e1 Stats tests
eeb2733 Marker rerun, stats changes
50e55f4 Conversion fixes
fb0a729 Better convert script
fa68c6b Better conversion script, run on more things
c9ecd8e Need those chat templates
5611d79 Model runners
5cb32c3 Convert script work with server backends
87875b3 Merge branch 'main' of https://github.com/allenai/olmocr into main
2982526 Convert scripts for benchmark
1545a6d Adding more work on diffs
004486f Nice tables support
3a0bcb6 Better table tests
748fd62 Adding basic table relative tests
76476f9 Synth rendering ideas
c4f6b11 Fixing the mine diffs script, but it still doesn't work great
fcb1eab Consistent ordering on convert, with data dir script
ecac384 Making a nicer warning message when waiting for sglang server
03ef353 One last lint fix
7d7e81e Internal version bump
7a7c878 double parentheses for proper escaping
dc7cb5c Ruff fixes to CI
1348a29 Merge branch 'main' of https://github.com/allenai/olmocr into main
ca0f911 Probably need at least 20GB GPU ram to have a good time with olmocr
2241853 Merge branch 'main' of https://github.com/allenai/olmocr into main
a701a37 Fix for calling --pdfs with an invalid pdf
622540e Fix so that the pipeline.py attempts to download the model weights first, before starting the loading timeout
010fdf8 Small fix
7dd44ed convert script
701abdb Some new entries
1148b47 Minor fixes
361ed2a Merge branch 'main' of https://github.com/allenai/olmocr into main
9f12917 Organizing things for data entry
af02c63 Working viewer
8061aac Working on viewer/editor for rules
ab13ac6 Mining diff script outputs candidate rules
99ab046 Autominer work
143769b Merge pull request #61 from allenai/kylel/elo
1b78ec9 More work on automining
3670219 commits
2d4c1a1 Merge branch 'main' of https://github.com/allenai/olmocr into main
a03673e Working on some progress for the autominer, fixing more options in convert script
11e89dc Script fixups
505e08c automine draft
ae7efd3 Refactoring
9e019f1 More factoring
bd08fdb fixes missing OSS code for Issue #36
d4b902c Olmocr runner implemented
aac0c15 chatgpt converter
8a6e8b9 Basic rule viewer
9081f7f Update README.md
0130a97 fixed style
c2b54d8 updated readme
d841216 Merge branch 'main' of https://github.com/allenai/olmocr into main
813a355 Fixing mineru runner, added a few sample docs
cc1f476 Bugfixes
9da1f92 Cleaner implementations of benchmark stuff
53494d9 Refactoring
ff465f7 Starting refactor
a348cd6 olmocr bench runner
c20e3c0 Pdf for dataset
16a3244 olmocr running
422d08f Adding more rules and seeing how they should work
f2f7619 Adding mineru script
e5a80c5 Fixing up benchmark a bit
c3d0ce9 Some readmes and instructions
4e0339f Runner for olmocr bench
a8f6921 Benchmark runners for other systems
318abf2 Adding runbench
1230aef Making progress
072bc1d Making some progress
823629d Sample code for olmocrbench
9e62003 Adding readme for olmocr bench
e4f9b19 Infinigram counting script for paper
6020122 Match script
b871e4b Small helper to measure overlap