At our company we use Doc2Data to transform structured machine-readable documents (Word, Excel, PDF) into data structures stored in a JSON file. The tool can be used as a Java library, a command-line tool, or a microservice exposing a REST API.
A free version of the service that you can try out is available at https://d2d.work
This repository contains meta-mapping files for a set of documents that can be found on the Internet. It also serves as a good learning base for creating meta-mapping files for your own structured machine-readable documents.
Enjoy! :)