Docker container to extract text from any document using textract

dmitriym09/textract

 
 

Extract text from a binary file/image/other text formats

Credits

  1. textract

How to use this docker image

The Docker image uses the /data folder as a volume: documents are read from it and output is written to it. You must therefore provide a host folder to be mapped to /data.

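For example, download BookReporter.pdf to the Downloads folder of your home directory (~/Downloads) and change into that folder, because the command below mounts the current directory as /data.

To extract text from a document and save it to a text file, run the following, replacing file.pdf with your document (here, BookReporter.pdf) and converted.txt with the output name you want (for example, BookReporter.txt):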

docker run \
--rm \
-v "`pwd`:/data" \
kunalshah/textract:latest \
-o converted.txt \
file.pdf

To view the converted text file, run

cat converted.txt
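
To convert several documents at once, the same command can be wrapped in a small loop. The following is a minimal sketch, not part of the image: the helper names out_name and convert_all are hypothetical, and it assumes that every PDF sits in the current directory and that each output file should reuse the input name with a .txt extension.

```shell
#!/bin/sh
# Hypothetical helper: derive the output file name for a document by
# dropping any directory part and the last extension, then adding .txt.
out_name() {
  base=$(basename "$1")
  printf '%s.txt\n' "${base%.*}"
}

# Hypothetical helper: run the container once per PDF in the current
# directory, which is mounted as /data exactly as in the command above.
convert_all() {
  for f in *.pdf; do
    # Skip the literal pattern when no PDF files are present.
    [ -e "$f" ] || continue
    docker run \
      --rm \
      -v "$(pwd):/data" \
      kunalshah/textract:latest \
      -o "$(out_name "$f")" \
      "$f"
  done
}
```

After sourcing the file, running convert_all from ~/Downloads would write BookReporter.txt next to BookReporter.pdf. The loop starts a fresh container for each file, which keeps the sketch simple at the cost of some startup overhead.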

About textract

CLI options

Read here

Supported file types

Read here
