[gm] - removed all references to GraphicsMagick

marianorodriguez committed Mar 3, 2020
1 parent 8487108 commit 6dc3ca5bab0a7df40a366b91398a2d371ff9b011
Showing with 42 additions and 53 deletions.
  1. +7 −8 README.md
  2. +7 −8 README_fr.md
  3. +10 −11 README_zh-cn.md
  4. +13 −19 demo/jupyter-notebook/parsr-jupyter-notebook.ipynb
  5. +1 −1 docker/parsr-base/Dockerfile
  6. +4 −6 docs/installation.md
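The commit message says that all references to GraphicsMagick were removed. One way to check a claim like that is to scan the working tree for the name. Below is a minimal sketch, assuming that UTF-8 text files are what matter. The skipped directory names and both helper names are hypothetical and not part of the Parsr repository.

```python
import os
import re

# Case-insensitive match on the product/package name ("GraphicsMagick", "graphicsmagick").
# The bare binary name "gm" is deliberately not matched: it occurs in too many unrelated words.
PATTERN = re.compile(r"graphicsmagick", re.IGNORECASE)


def find_references_in_text(text):
    """Return (line_number, line) pairs for every line of `text` that mentions GraphicsMagick."""
    return [(n, line) for n, line in enumerate(text.splitlines(), start=1) if PATTERN.search(line)]


def find_references(root, skip_dirs=(".git", "node_modules")):
    """Walk `root` and return (path, line_number, line) for every matching line in a UTF-8 text file."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata and vendored packages (assumption: these are not worth scanning).
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    text = fh.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            hits.extend((path, n, line) for n, line in find_references_in_text(text))
    return hits
```

Calling `find_references(".")` from the repository root returns an empty list when no mentions remain. The `.ipynb` notebook is JSON text, so the HTML embedded in it is scanned as well.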
--- a/README.md
+++ b/README.md
@@ -100,14 +100,13 @@ Please refer to the [contribution guidelines](CONTRIBUTING.md).
Third-party library licenses for its [dependencies](docs/dependencies.md):

1. **QPDF**: Apache [http://qpdf.sourceforge.net](http://qpdf.sourceforge.net/)
-2. **GraphicsMagick**: MIT [http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)
-3. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
-4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
-5. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
-6. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
-7. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
-8. **MuPDF** (Optional dependency): AGPL [https://mupdf.com/license.html](https://mupdf.com/license.html)
-9. **Pandoc** (Optional dependency): GPL [https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)
+2. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
+3. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
+4. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
+5. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
+6. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
+7. **MuPDF** (Optional dependency): AGPL [https://mupdf.com/license.html](https://mupdf.com/license.html)
+8. **Pandoc** (Optional dependency): GPL [https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)

## License

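Each license-list hunk makes the same edit. It drops the GraphicsMagick entry and moves every later item up by one, so items 3-9 become 2-8. Below is a minimal Python sketch of that edit, assuming the input is the lines of a single flat ordered list. `remove_and_renumber` is a hypothetical helper, not part of Parsr.

```python
import re

# An ordered-list item such as "3. **ImageMagick**: Apache 2.0 ...":
# group 1 = indentation, group 2 = old number, group 3 = item text.
ITEM = re.compile(r"^(\s*)(\d+)\.\s+(.*)$")


def remove_and_renumber(lines, name):
    """Drop list items whose text contains `name` and renumber the remaining items from 1.

    Assumption: `lines` holds one flat ordered list. Non-item lines (e.g. blank lines)
    are passed through unchanged and do not reset the counter.
    """
    out = []
    counter = 0
    for line in lines:
        m = ITEM.match(line)
        if m is None:
            out.append(line)
            continue
        indent, _old_number, text = m.groups()
        if name in text:
            continue  # remove this entry entirely
        counter += 1
        out.append(f"{indent}{counter}. {text}")
    return out
```

Under CommonMark, only the first item's number sets the start of an ordered list. A plain deletion would therefore already render as 1 to 8. Renumbering mainly keeps the raw Markdown consistent with what readers see.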
--- a/README_fr.md
+++ b/README_fr.md
@@ -103,14 +103,13 @@ Please refer to the [contribution guidelines](CONTRIBUTING.md).
Third-party library licenses for these [dependencies](docs/dependencies.md):

1. **QPDF**: Apache [http://qpdf.sourceforge.net](http://qpdf.sourceforge.net/)
-2. **GraphicsMagick**: MIT [http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)
-3. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
-4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
-5. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
-6. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
-7. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
-8. **MuPDF** (Optional dependency): AGPL [https://mupdf.com/license.html](https://mupdf.com/license.html)
-9. **Pandoc** (Optional dependency): GPL [https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)
+2. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
+3. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
+4. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
+5. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
+6. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
+7. **MuPDF** (Optional dependency): AGPL [https://mupdf.com/license.html](https://mupdf.com/license.html)
+8. **Pandoc** (Optional dependency): GPL [https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)

## License

--- a/README_zh-cn.md
+++ b/README_zh-cn.md
@@ -61,13 +61,13 @@ The Docker container has been uploaded to [Docker Hub](https://hub.docker.com/u/axarev).
```sh
sudo add-apt-repository ppa:ubuntuhandbook1/apps
sudo apt-get update
-sudo apt-get install nodejs npm qpdf imagemagick graphicsmagick tesseract-ocr libtesseract-dev
+sudo apt-get install nodejs npm qpdf imagemagick tesseract-ocr libtesseract-dev
```

Under the **Arch** operating system:

```sh
-pacman -S nodejs npm qpdf imagemagick graphicsmagick tesseract
+pacman -S nodejs npm qpdf imagemagick tesseract
```

#### 1.2.2. Installing the dependencies on MacOS
@@ -82,7 +82,7 @@ pacman -S nodejs npm qpdf imagemagick graphicsmagick tesseract
Then use the brew command to install the dependencies:

```sh
-brew install node qpdf imagemagick graphicsmagick tesseract tesseract-lang
+brew install node qpdf imagemagick tesseract tesseract-lang
```

#### 1.2.3. Installing the dependencies on Windows
@@ -326,14 +326,13 @@ Parsr's default OCR solution is tesseract, which is a core dependency of Parsr.
Third-party licenses:

1. **QPDF**: Apache [http://qpdf.sourceforge.net](http://qpdf.sourceforge.net/)
-2. **GraphicsMagick**: MIT [http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)
-3. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
-4. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
-5. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
-6. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
-7. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
-8. **MuPDF** (Optional dependency): AGPL [https://mupdf.com/license.html](https://mupdf.com/license.html)
-9. **Pandoc** (Optional dependency): GPL [https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)
+2. **ImageMagick**: Apache 2.0 [https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)
+3. **Pdfminer.six**: MIT [https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)
+4. **PDF.js**: Apache 2.0 [https://github.com/mozilla/pdf.js](https://github.com/mozilla/pdf.js)
+5. **Tesseract**: Apache 2.0 [https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)
+6. **Camelot**: MIT [https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)
+7. **MuPDF** (Optional dependency): AGPL [https://mupdf.com/license.html](https://mupdf.com/license.html)
+8. **Pandoc** (Optional dependency): GPL [https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)

## 7. License

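--- a/demo/jupyter-notebook/parsr-jupyter-notebook.ipynb
+++ b/demo/jupyter-notebook/parsr-jupyter-notebook.ipynb
@@ -1223,22 +1223,16 @@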
" <td class=\"blob-code blob-code-context base\">1.&nbsp;**QPDF:**&nbsp;Apache&nbsp;[http://qpdf.sourceforge.net](http://qpdf.sourceforge.net)</td>\n",
" </tr>\n",
-" <tr>\n",
-" <td id=\"L73\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"73\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">2.&nbsp;**GraphicsMagick:**&nbsp;MIT&nbsp;[http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)</td>\n",
-" <td id=\"R81\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"81\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">2.&nbsp;**GraphicsMagick:**&nbsp;MIT&nbsp;[http://www.graphicsmagick.org/index.html](http://www.graphicsmagick.org/index.html)</td>\n",
-" </tr>\n",
" <tr>\n",
" <td id=\"L74\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"74\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">3.&nbsp;**ImageMagick:**&nbsp;Apache&nbsp;2.0&nbsp;[https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)</td>\n",
+" <td class=\"blob-code blob-code-context base\">2.&nbsp;**ImageMagick:**&nbsp;Apache&nbsp;2.0&nbsp;[https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)</td>\n",
" <td id=\"R82\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"82\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">3.&nbsp;**ImageMagick:**&nbsp;Apache&nbsp;2.0&nbsp;[https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)</td>\n",
+" <td class=\"blob-code blob-code-context base\">2.&nbsp;**ImageMagick:**&nbsp;Apache&nbsp;2.0&nbsp;[https://imagemagick.org/script/license.php](https://imagemagick.org/script/license.php)</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"L75\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"75\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">4.&nbsp;**Pdfminer.six:**&nbsp;MIT&nbsp;[https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)</td>\n",
+" <td class=\"blob-code blob-code-context base\">3.&nbsp;**Pdfminer.six:**&nbsp;MIT&nbsp;[https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)</td>\n",
" <td id=\"R83\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"83\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">4.&nbsp;**Pdfminer.six:**&nbsp;MIT&nbsp;[https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)</td>\n",
+" <td class=\"blob-code blob-code-context base\">3.&nbsp;**Pdfminer.six:**&nbsp;MIT&nbsp;[https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE](https://github.com/pdfminer/pdfminer.six/blob/master/LICENSE)</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"L76\" class=\"blob-num blob-num-deletion base js-linkable-line-number\" data-line-number=\"76\"></td>\n",
@@ -1260,27 +1254,27 @@
" </tr>\n",
" <tr>\n",
" <td id=\"L79\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"79\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">5.&nbsp;**Tesseract:**&nbsp;Apache&nbsp;2.0&nbsp;[https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)</td>\n",
+" <td class=\"blob-code blob-code-context base\">4.&nbsp;**Tesseract:**&nbsp;Apache&nbsp;2.0&nbsp;[https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)</td>\n",
" <td id=\"R84\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"84\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">5.&nbsp;**Tesseract:**&nbsp;Apache&nbsp;2.0&nbsp;[https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)</td>\n",
+" <td class=\"blob-code blob-code-context base\">4.&nbsp;**Tesseract:**&nbsp;Apache&nbsp;2.0&nbsp;[https://github.com/tesseract-ocr/tesseract](https://github.com/tesseract-ocr/tesseract)</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"L80\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"80\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">6.&nbsp;**Camelot:**&nbsp;MIT&nbsp;[https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)</td>\n",
+" <td class=\"blob-code blob-code-context base\">5.&nbsp;**Camelot:**&nbsp;MIT&nbsp;[https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)</td>\n",
" <td id=\"R85\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"85\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">6.&nbsp;**Camelot:**&nbsp;MIT&nbsp;[https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)</td>\n",
+" <td class=\"blob-code blob-code-context base\">5.&nbsp;**Camelot:**&nbsp;MIT&nbsp;[https://github.com/camelot-dev/camelot](https://github.com/camelot-dev/camelot)</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"L81\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"81\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">7.&nbsp;**MuPDF**&nbsp;(Optional&nbsp;dependency):&nbsp;AGPL&nbsp;[https://mupdf.com/license.html](https://mupdf.com/license.html)</td>\n",
+" <td class=\"blob-code blob-code-context base\">6.&nbsp;**MuPDF**&nbsp;(Optional&nbsp;dependency):&nbsp;AGPL&nbsp;[https://mupdf.com/license.html](https://mupdf.com/license.html)</td>\n",
" <td id=\"R86\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"86\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">7.&nbsp;**MuPDF**&nbsp;(Optional&nbsp;dependency):&nbsp;AGPL&nbsp;[https://mupdf.com/license.html](https://mupdf.com/license.html)</td>\n",
+" <td class=\"blob-code blob-code-context base\">6.&nbsp;**MuPDF**&nbsp;(Optional&nbsp;dependency):&nbsp;AGPL&nbsp;[https://mupdf.com/license.html](https://mupdf.com/license.html)</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"L82\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"82\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">8.&nbsp;**Pandoc**&nbsp;(Optional&nbsp;dependency):&nbsp;GPL&nbsp;[https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)</td>\n",
+" <td class=\"blob-code blob-code-context base\">7.&nbsp;**Pandoc**&nbsp;(Optional&nbsp;dependency):&nbsp;GPL&nbsp;[https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)</td>\n",
" <td id=\"R87\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"87\"></td>\n",
-" <td class=\"blob-code blob-code-context base\">8.&nbsp;**Pandoc**&nbsp;(Optional&nbsp;dependency):&nbsp;GPL&nbsp;[https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)</td>\n",
+" <td class=\"blob-code blob-code-context base\">7.&nbsp;**Pandoc**&nbsp;(Optional&nbsp;dependency):&nbsp;GPL&nbsp;[https://github.com/jgm/pandoc](https://github.com/jgm/pandoc)</td>\n",
" </tr>\n",
" <tr>\n",
" <td id=\"L83\" class=\"blob-num blob-num-context base js-linkable-line-number\" data-line-number=\"83\"></td>\n",
@@ -1421,7 +1415,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.7.5"
+"version": "3.7.4"
}
},
"nbformat": 4,
--- a/docker/parsr-base/Dockerfile
+++ b/docker/parsr-base/Dockerfile
@@ -15,7 +15,7 @@ RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key
&& rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
-apt-get install -y imagemagick graphicsmagick mupdf mupdf-tools qpdf pandoc tesseract-ocr-all nodejs npm python-pdfminer python-pip python3-pip python-tk python3-pdfminer python3-opencv && \
+apt-get install -y imagemagick mupdf mupdf-tools qpdf pandoc tesseract-ocr-all nodejs npm python-pdfminer python-pip python3-pip python-tk python3-pdfminer python3-opencv && \
pip install ghostscript camelot-py[cv] scikit-image numpy pillow && \
pip3 install ghostscript camelot-py[cv] scikit-image numpy pillow

--- a/docs/installation.md
+++ b/docs/installation.md
@@ -31,7 +31,7 @@ Under a **Debian** based distribution:
```sh
sudo add-apt-repository ppa:ubuntuhandbook1/apps
sudo apt-get update
-sudo apt-get install nodejs npm qpdf imagemagick graphicsmagick tesseract-ocr libtesseract-dev python3-tk ghostscript python3-pip
+sudo apt-get install nodejs npm qpdf imagemagick tesseract-ocr libtesseract-dev python3-tk ghostscript python3-pip
pip install camelot-py[cv]
pip install numpy pillow scikit-image
pip install pdfminer.six
@@ -40,7 +40,7 @@ pip install pdfminer.six
Under **Arch** Linux :

```sh
-pacman -S nodejs npm qpdf imagemagick graphicsmagick pdfminer tesseract python-pip
+pacman -S nodejs npm qpdf imagemagick pdfminer tesseract python-pip
pip install camelot-py[cv]
pip install numpy pillow scikit-image
```
@@ -57,7 +57,7 @@ To install it, launch the following in a terminal
Next, install the required dependencies:

```sh
-brew install node qpdf imagemagick graphicsmagick tesseract tesseract-lang tcl-tk ghostscript
+brew install node qpdf imagemagick tesseract tesseract-lang tcl-tk ghostscript
```

To install the Python-based dependencies (pdfminer and camelot), first install `pip`:
@@ -95,9 +95,7 @@ Then,
choco install qpdf imagemagick
```

-5. Install [**graphicsmagick**](http://www.graphicsmagick.org/).
-
-6. For table detection, install [**camelot**](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#for-windows).
+5. For table detection, install [**camelot**](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#for-windows).

#### 2.3.1. Tesseract

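After this commit, the install steps no longer ask for GraphicsMagick. Below is a minimal sketch for checking that the remaining command-line tools from these instructions are on `PATH`. It assumes the usual executable names: `magick` for ImageMagick 7 and `convert` for ImageMagick 6. The dictionary and function names are hypothetical. pip-installed packages such as camelot and pdfminer.six are not covered. On Windows, `System32` contains an unrelated `convert.exe`, so the `convert` fallback can report a false positive there.

```python
import shutil

# Dependency name -> candidate executables; any one of them being on PATH counts as installed.
# Assumption: these are the standard binary names installed by the packages listed above.
REQUIRED = {
    "node": ("node",),
    "npm": ("npm",),
    "qpdf": ("qpdf",),
    "imagemagick": ("magick", "convert"),  # ImageMagick 7 ships `magick`, ImageMagick 6 `convert`
    "tesseract": ("tesseract",),
}


def missing_dependencies(required=REQUIRED, which=shutil.which):
    """Return, sorted, the dependency names for which no candidate executable is found.

    `which` is injectable so the check can be exercised without touching the real PATH.
    """
    return sorted(
        name
        for name, candidates in required.items()
        if not any(which(exe) for exe in candidates)
    )


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", ", ".join(missing) if missing else "none")
```

Passing a stub for `which` keeps the function testable. On a real machine the default `shutil.which` is used.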