OCR PDF to TXT

Colab script for converting PDF documents into .txt format.

The notebook requires a fresh Anaconda install inside Colab, plus several other packages. The setup essentially builds a new environment dedicated to running Tesseract. Once everything is installed, converting a PDF to text is generally quick. This might work better in a Docker container with more resources, or on a local machine with a GPU and plenty of RAM. I try to scale my projects to what Colab can handle.
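The core PDF-to-text step can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact code: it assumes the `pdf2image` and `pytesseract` Python packages, with the Tesseract binary and Poppler installed on the system. In Colab that would typically mean `!apt-get install -y tesseract-ocr poppler-utils` and `!pip install pytesseract pdf2image`. The function names `ocr_pdf_to_txt`, `join_pages` and `txt_path_for` are hypothetical helpers for illustration.

```python
"""Minimal PDF -> .txt OCR sketch (assumes pdf2image + pytesseract; hypothetical helper names)."""
from __future__ import annotations

from pathlib import Path


def join_pages(page_texts: list[str]) -> str:
    """Join per-page OCR text, separating pages with a form feed so page breaks survive in the .txt."""
    return "\f".join(text.rstrip() for text in page_texts) + "\n"


def txt_path_for(pdf_path: str) -> Path:
    """Place the text file next to the PDF, swapping the .pdf suffix for .txt."""
    return Path(pdf_path).with_suffix(".txt")


def ocr_pdf_to_txt(pdf_path: str, dpi: int = 300, lang: str = "eng") -> Path:
    """OCR every page of a PDF and write the combined text to a .txt file."""
    # Imported lazily so the pure helpers above still work where the OCR deps are missing.
    # Assumes the Tesseract binary and Poppler (used by pdf2image) are installed.
    from pdf2image import convert_from_path
    import pytesseract

    # Render each PDF page to a PIL image; a higher dpi usually improves OCR accuracy.
    pages = convert_from_path(pdf_path, dpi=dpi)
    # Run Tesseract on each rendered page.
    page_texts = [pytesseract.image_to_string(page, lang=lang) for page in pages]

    out_path = txt_path_for(pdf_path)
    out_path.write_text(join_pages(page_texts), encoding="utf-8")
    return out_path
```

Usage, with a hypothetical file name: `ocr_pdf_to_txt("my_document.pdf")` writes `my_document.txt` alongside the PDF. Rendering pages to images first is what lets Tesseract handle scanned PDFs that contain no embedded text layer.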

Repository files:
.gitattributes
LICENSE
OCR_pdf_text.ipynb
README.md

Repository: AlejandroBeltranA/OCR-PDF-to-TXT