
Fix every dead links in every markdown files

BinaryBrain committed Feb 5, 2020
1 parent efe9875 commit 750fb20488775e24b544d4015751d7249a14a253
@@ -10,7 +10,7 @@ Here's a quick guide to create a pull request:

$ git clone git@github.com:YOUR-GITHUB-USERNAME/Parsr.git

3. Create a new branch in your git repository (branched from `develop` - see [Notes about branching](#notes-about-branching) below).
3. Create a new branch in your git repository (branched from `develop` - see [Notes about branching](#branching) below).

$ cd Parsr/
$ git checkout develop
@@ -31,7 +31,7 @@ Parsr takes as input an image (.JPG, .PNG, .TIFF, ...) or a PDF generates the fo
- JSON
- Markdown
- Text
- CSV (for tables), or Pandas Dataframes (see [here](demo/jupyter-notebook/parsr_api.py))
- CSV (for tables), or Pandas Dataframes (see [here](demo/jupyter-notebook/parsr_client.py))
- PDF

## Table of Contents
@@ -85,7 +85,7 @@ Consult the documentation on the [usage of the API](docs/api-guide.md).

Refer to the [Configuration documentation](docs/configuration.md) to interpret the configurable options in the GUI viewer.

The [API based usage](docs/usage.md#13-api) and the [command line usage](docs/usage.md#123-command-line-usage) are documented in the [advanced usage](docs/usage.md) guide.
The [API based usage](docs/usage.md#3-api) and the [command line usage](docs/usage.md#23-command-line-usage) are documented in the [advanced usage](docs/usage.md) guide.

## Documentation

@@ -102,7 +102,7 @@ Third Party Libraries licenses for its [dependencies](docs/dependencies.md):
1. **QPDF**: Apache [http://qpdf.sourceforge.net](http://qpdf.sourceforge.net/)
2. **GraphicsMagick**: MIT [http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)
3. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pfminer.six/blob/master/LICENSE)
4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
5. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
6. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
7. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
@@ -32,7 +32,7 @@ Parsr prend en entrée une image (.jpg, .png, .tiff, ...) ou un pdf et génère
- JSON
- Markdown
- Texte
- CSV (pour les tableaux) ou Pandas Dataframes (voir [ici](demo/jupyter-notebook/parsr_api.py))
- CSV (pour les tableaux) ou Pandas Dataframes (voir [ici](demo/jupyter-notebook/parsr_client.py))
- PDF

## Table des matières
@@ -88,7 +88,7 @@ Consultez la documentation sur [l'utilisation de l'API](docs/api-guide.md).

Reportez-vous à la [Documentation de configuration](docs/configuration.md) pour interpréter les options configurables dans l'interface graphique.

[Utilisation basée sur l'API](docs/usage.md#13-api) et [utilisation en ligne de commande](docs/usage.md#123-command-line-usage) sont documentées dans [utilisation avancée](docs/usage.md).
[Utilisation basée sur l'API](docs/usage.md#3-api) et [utilisation en ligne de commande](docs/usage.md#23-command-line-usage) sont documentées dans [utilisation avancée](docs/usage.md).

## Documentation

@@ -105,7 +105,7 @@ Licences de bibliothèques tierces pour ces [dépendances](docs/dependencies.md)
1. **QPDF**: Apache [http://qpdf.sourceforge.net](http://qpdf.sourceforge.net/)
2. **GraphicsMagick**: MIT [http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)
3. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pfminer.six/blob/master/LICENSE)
4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
5. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
6. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
7. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
@@ -8,7 +8,7 @@

它为用户提供了结构化且标记完全的信息集,适用于包括数据输入和文档分析自动化,存档等即用型应用程序。

- [Parsr: 从文档到数据,一步到位!](#parsr-从文档到数据一步到位)
- [Parsr: 从文档到数据,一步到位!](#parsr-从文档到数据,一步到位)
- [1. 开始 / 安装](#1-开始--安装)
- [1.1. 通过 Docker 安装](#11-通过-docker-安装)
- [1.2. 直接安装](#12-直接安装)
@@ -210,7 +210,7 @@ npm install

该工具包含一系列模块,可逐步处理文档,并且具有高度可配置性。

要更改它的默认配置,请参阅 [配置文档](docs/configuration-file.md).
要更改它的默认配置,请参阅 [配置文档](docs/configuration.md).

#### 2.2.2. 演示: Web Viewer

@@ -328,7 +328,7 @@ Parsr 默认的 OCR 解决方案是 tesseract,这是 Parsr 的基本依赖。
1. **QPDF**: Apache [http://qpdf.sourceforge.net](http://qpdf.sourceforge.net/)
2. **GraphicsMagick**: MIT [http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)
3. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pfminer.six/blob/master/LICENSE)
4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
5. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
6. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
7. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
@@ -10,7 +10,7 @@ Run it with:
$ python3 echo-module.py
```

Then call it in your [configuration](../../docs/configuration-file.md):
Then call it in your [configuration](../../docs/configuration.md):

```json
[

This file was deleted.

@@ -42,7 +42,7 @@ The API has an endpoint prefix `/api` and then, optionally, the version number `
- `/api/v1`: will use the latest API version 1.x
- `/api`: will use the latest API version

## 1. Send Your Document: [POST /document](https://axatechlab.github.io/Parsr/docs/api.html#api-Input-postDocument)
## 1. Send Your Document: `POST /document`

First, send a POST request with the document to Parsr. Along with it, send the configuration that tells Parsr what kind of processing to perform on the file.

@@ -70,7 +70,7 @@ The document you sent has been accepted and is being processed. The body contain

This error means the file format you sent is not supported by the platform (it's probably not a PDF or an Image).

## 2. Get the queue status: [GET /queue/{id}](https://axatechlab.github.io/Parsr/docs/api.html#api-Processing-getQueueStatus)
## 2. Get the queue status: `GET /queue/{id}`

This request allows you to get the status of the queued document being processed. You need to give it the **queue ID** that was returned in the previous request.

@@ -124,10 +124,10 @@ This error means that something went terribly wrong on the backend, probably an

You can have results in different formats:

- JSON: [GET /json/{id}](https://axatechlab.github.io/Parsr/docs/api.html#api-Output-getJson)
- Markdown [GET /markdown/{id}](https://axatechlab.github.io/Parsr/docs/api.html#api-Output-getMarkdown)
- Raw text [GET /text/{id}](https://axatechlab.github.io/Parsr/docs/api.html#api-Output-getText)
- CSV [GET /csv/{id}](https://axatechlab.github.io/Parsr/docs/api.html#api-Output-getCsvList)
- JSON: `GET /json/{id}`
- Markdown: `GET /markdown/{id}`
- Raw text: `GET /text/{id}`
- CSV: `GET /csv/{id}`

These requests allow you to get the results of the processed document. You need to give them the **queue ID** that was returned in a previous request.
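
As a rough illustration of how the endpoints listed in this guide fit together, the sketch below only builds the request URLs; it does not send anything. The base address `http://localhost:3001` and the helper names (`api_url`, `queue_url`, `result_url`, `csv_table_url`) are assumptions made for this example and are not part of Parsr.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: address of a locally running Parsr API; adjust to your deployment.
BASE_URL = "http://localhost:3001"

# Output formats named in this guide.
RESULT_FORMATS = ("json", "markdown", "text", "csv")


def api_url(path: str, version: Optional[str] = "v1", base: str = BASE_URL) -> str:
    """Join the base address, the `/api` prefix, an optional version, and a path."""
    prefix = "/api" if version is None else f"/api/{version}"
    return f"{base.rstrip('/')}{prefix}/{path.lstrip('/')}"


def queue_url(queue_id: str) -> str:
    """URL for `GET /queue/{id}`."""
    return api_url(f"queue/{quote(queue_id, safe='')}")


def result_url(fmt: str, queue_id: str) -> str:
    """URL for `GET /json/{id}`, `/markdown/{id}`, `/text/{id}` or `/csv/{id}`."""
    if fmt not in RESULT_FORMATS:
        raise ValueError(f"unknown result format: {fmt!r}")
    return api_url(f"{fmt}/{quote(queue_id, safe='')}")


def csv_table_url(queue_id: str, page: int, table: int) -> str:
    """URL for `GET /csv/{id}/{page}/{table}`."""
    return api_url(f"csv/{quote(queue_id, safe='')}/{page}/{table}")
```

A client would send the document to `api_url("document")` with a POST request, poll `queue_url(queue_id)` until processing finishes, then fetch `result_url("json", queue_id)` or another format.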

@@ -158,7 +158,7 @@ For more information on the JSON format, please [refer to the specific guide](js

This error means that the result file doesn't exist. Maybe it wasn't asked to be outputted in the config you sent in the first request.

### 3.2. CSV List of Files: [GET /csv/{id}](https://axatechlab.github.io/Parsr/docs/api.html#api-Output-getCsvList)
### 3.2. CSV List of Files: `GET /csv/{id}`

Since you can have multiple tables per page, you need to query them in two steps:

@@ -186,7 +186,7 @@ curl -X GET \

This error means that the result file doesn't exist. Maybe it wasn't asked to be outputted in the config you sent in the first request.

### 3.3. CSV File: [GET /csv/{id}/{page}/{table}](https://axatechlab.github.io/Parsr/docs/api.html#api-Output-getCsv)
### 3.3. CSV File: `GET /csv/{id}/{page}/{table}`

Then, we can get the CSV files one by one with the following parameters:

@@ -79,7 +79,7 @@ Different extractors are available for each input file format.
- **Images:** five OCR extractors are supported for images:
- `tesseract` which is an Open Source OCR software,
- `abbyy`, that relies on ABBYY Finereader, a paid solution for OCR on documents and images,
- `google-vision`, which uses the [Google Vision](https://cloud.google.com/vision/) API to detect the contents of an image (see the [google vision documentation for more](google-vision.md)),
- `google-vision`, which uses the [Google Vision](https://cloud.google.com/vision/) API to detect the contents of an image (see the [google vision documentation for more](../server/src/input/google-vision/README.md)),
- `ms-cognitive-services`, that uses [Microsoft Cognitive Services](https://azure.microsoft.com/es-es/services/cognitive-services/) OCR to detect and process text inside an image.
- `amazon-textract`, that uses [Amazon Textract](https://us-east-2.console.aws.amazon.com/textract/home) service to detect and process text inside an image.
### 2.2. Language
@@ -10,7 +10,7 @@ Sets the `isHeader` or `isFooter` property flags for each one of the elements de

## Dependencies

[Lines To Paragraph Module](lines-to-paragraph-module.md)
[Lines To Paragraph Module](../LinesToParagraphModule/README.md)

## How it works

@@ -8,7 +8,7 @@ Detects the Hierarchy within a document, respecting indentation of element block

## Dependencies

[Lines To Paragraph Module](lines-to-paragraph-module.md)
[Lines To Paragraph Module](../LinesToParagraphModule/README.md)

## How it works

@@ -10,7 +10,7 @@ Generates new `metadata` instances for each match of a key-value pair which vali

## Dependencies

[Words to Line Module](words-to-line-module.md)
[Words to Line Module](../WordsToLineModule/README.md)

## How it works

@@ -48,11 +48,11 @@ You can copy the [template module file](TemplateModule/README.md) to help you ha

### 2.2. Add to Register

To add your newly created module to the register, simply open the [Cleaner file](../../server/src/Cleaner.ts) `/server/src/Cleaner.ts` and add your module class to the `Cleaner.cleaningToolRegister` attribute.
To add your newly created module to the register, simply open the [Cleaner file](../Cleaner.ts) `/server/src/Cleaner.ts` and add your module class to the `Cleaner.cleaningToolRegister` attribute.

### 2.3. Add it to the Configuration

If you want your module to run you need to enable it in your [configuration](../../docs/configuration.md#3-Cleaner-Config).
If you want your module to run you need to enable it in your [configuration](../../../docs/configuration.md#3-Cleaner-Config).

Simply add a line in the `cleaner` array with the name of your module, and potential options.
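
A minimal sketch of that step, written in Python for illustration. It assumes an entry in the `cleaner` array is either a module name or a `[name, options]` pair; the module names and the `threshold` option are hypothetical placeholders, and `enable_module` is a helper invented for this example.

```python
import json
from typing import Any, Dict, Optional


def enable_module(
    config: Dict[str, Any],
    name: str,
    options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Append a module entry to the `cleaner` array of a configuration dict."""
    # Assumption: a module with options is written as a [name, options] pair.
    entry = name if options is None else [name, options]
    config.setdefault("cleaner", []).append(entry)
    return config


# "existing-module", "my-module" and "threshold" are hypothetical names.
config = {"cleaner": ["existing-module"]}
enable_module(config, "my-module", {"threshold": 0.5})
print(json.dumps(config, indent=2))
```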

@@ -4,12 +4,12 @@

This module is a bit different than others, because it doesn't change the document by itself.

It exports the document as [JSON](../json-output.md), calls an API with it and expects a modified JSON back.
It exports the document as [JSON](../../../../docs/json-output.md), calls an API with it and expects a modified JSON back.

## How to use it

First of all, you need to have a small web server that will handle the API call.
You can use our [Python example](../../demo/python-module/README.md) as a start.
You can use our [Python example](../../../../demo/python-module/README.md) as a start.

Your server needs to handle an HTTP `POST` request on the given URL and respond with the modified JSON.
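
A minimal sketch of such a server, using only the Python standard library. It is not the project's Python example: the names `transform`, `ModuleHandler` and `make_server`, and the default host and port, are assumptions for illustration. The placeholder `transform` returns the document unchanged.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def transform(document: dict) -> dict:
    """Placeholder: return the document unchanged. Put your changes here."""
    return document


class ModuleHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        try:
            document = json.loads(self.rfile.read(length))
        except ValueError:
            self.send_error(400, "Request body is not valid JSON")
            return
        # Respond with the (possibly modified) JSON document.
        body = json.dumps(transform(document)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Keep the console quiet; remove this override to see request logs.
        pass


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Create the server without starting it."""
    return HTTPServer((host, port), ModuleHandler)
```

To run it, call `make_server().serve_forever()` and point the module's URL at the chosen host and port.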

@@ -10,7 +10,7 @@ It creates new line elements that contains arrays of word elements.

## Dependencies

[Reading Order Module](reading-order-module.md)
[Reading Order Module](../ReadingOrderDetectionModule/README.md)

## How it works
