
Commit

Revert "[Pipenv] Python package dependency handling: with bugfixes and stricter package control"
marianorodriguez committed Dec 19, 2019
1 parent 50c330b commit 58b7c49
Showing 16 changed files with 60 additions and 786 deletions.
17 changes: 0 additions & 17 deletions Pipfile

This file was deleted.

479 changes: 0 additions & 479 deletions Pipfile.lock

This file was deleted.

4 changes: 3 additions & 1 deletion docker/parsr-base/Dockerfile
@@ -15,7 +15,9 @@ RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key
&& rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
apt-get install -y nodejs npm pipenv imagemagick graphicsmagick mupdf mupdf-tools qpdf pandoc tesseract-ocr-all
apt-get install -y imagemagick graphicsmagick mupdf mupdf-tools qpdf pandoc tesseract-ocr-all nodejs npm python-pdfminer python-pip python3-pip python-tk python3-pdfminer python3-opencv && \
pip install ghostscript camelot-py scikit-image numpy pillow && \
pip3 install ghostscript camelot-py scikit-image numpy pillow

WORKDIR /opt/app-root/src
RUN chown 1001:0 /opt/app-root/src
7 changes: 1 addition & 6 deletions docker/parsr/build.sh
@@ -4,16 +4,11 @@ set -e

export PATH=$PATH:$PWD/node_modules/.bin

echo "Installing node packages : npm install"
echo "Installing packages : npm install"
npm install

echo

echo "Installing python packages : pipenv install"
pipenv install

echo

echo "Building typescript : npm run build:ts"
npm run build:ts

39 changes: 27 additions & 12 deletions docs/installation.md
@@ -29,17 +29,20 @@ The documentation to build and run Docker containers is [here](docker.md).
Under a **Debian** based distribution:

```sh
sudo apt install software-properties-common python-software-properties
sudo add-apt-repository ppa:ubuntuhandbook1/apps
sudo add-apt-repository ppa:pypa/ppa
sudo apt-get update
sudo apt-get install nodejs npm qpdf imagemagick graphicsmagick tesseract-ocr libtesseract-dev python3-tk pipenv
sudo apt-get install nodejs npm qpdf imagemagick graphicsmagick tesseract-ocr libtesseract-dev python3-tk ghostscript python3-pip
pip install camelot-py
pip install numpy pillow scikit-image
pip install pdfminer.six
```

Under **Arch** Linux :

```sh
pacman -S nodejs npm qpdf imagemagick graphicsmagick pdfminer tesseract python-pipenv
pacman -S nodejs npm qpdf imagemagick graphicsmagick pdfminer tesseract python-pip
pip install camelot-py
pip install numpy pillow scikit-image
```

### 2.2. Installing Dependencies under MacOS
@@ -54,7 +57,22 @@ To install it, launch the following in a terminal
Next, install the required dependencies:

```sh
brew install node qpdf imagemagick graphicsmagick tesseract tesseract-lang pipenv
brew install node qpdf imagemagick graphicsmagick tesseract tesseract-lang tcl-tk ghostscript
```

To install the python based dependencies (pdfminer and camelot), install, first install `pip`:

```sh
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
```

and then the dependencies:

```sh
pip install pdfminer.six
pip install camelot-py
pip install numpy pillow scikit-image
```

### 2.3. Installing Dependencies under Windows
@@ -70,19 +88,16 @@ Then,

1. We recommend using [Chocolatey](https://chocolatey.org) as the package manager for installing dependencies under Windows. To install Chocolatey, [follow these instructions](https://chocolatey.org/install#installing-chocolatey).
2. [Download and install **`node.js`**](https://nodejs.org/en/download)
3. Install **`qpdf`** and **`imagemagick`** using Powershell (Run as Administrator):
3. For the **pdfminer** extractor for pdfs, [follow these steps](https://github.com/pdfminer/pdfminer.six#how-to-install).
4. Install **`qpdf`** and **`imagemagick`** using Powershell (Run as Administrator):

```sh
choco install qpdf imagemagick
```

4. Install [**graphicsmagick**](http://www.graphicsmagick.org/).
5. Make sure that you have a working Python installation (with pip), then launch the following:

```sh
pip install --user pipenv
```
5. Install [**graphicsmagick**](http://www.graphicsmagick.org/).

6. For table detection, install [**camelot**](https://camelot-py.readthedocs.io/en/master/user/install-deps.html#for-windows).

#### 2.3.1. Tesseract

41 changes: 16 additions & 25 deletions docs/usage.md
@@ -2,15 +2,14 @@

- [Parsr Usage Guide](#parsr-usage-guide)
- [1. Install npm packages](#1-install-npm-packages)
- [2. Install python packages](#2-install-python-packages)
- [3. Run](#3-run)
- [3.1. Configuration](#31-configuration)
- [3.2. Demo: Web Viewer](#32-demo-web-viewer)
- [3.2.1. Under Linux/MacOS:](#321-under-linuxmacos)
- [3.2.2. Under Windows:](#322-under-windows)
- [3.3. Command Line Usage](#33-command-line-usage)
- [4. API](#4-api)
- [5. Test](#5-test)
- [2. Run](#2-run)
- [2.1. Configuration](#21-configuration)
- [2.2. Demo: Web Viewer](#22-demo-web-viewer)
- [2.2.1. Under Linux/MacOS:](#221-under-linuxmacos)
- [2.2.2. Under Windows:](#222-under-windows)
- [2.3. Command Line Usage](#23-command-line-usage)
- [3. API](#3-api)
- [4. Test](#4-test)

You can use Parsr in different ways:

@@ -26,31 +25,23 @@ Inside the Parsr folder (where it has been installed), launch:
npm install
```

## 2. Install python packages
## 2. Run

Inside the Parsr folder (where it has been installed), launch:

```sh
pipenv install
```

## 3. Run

### 3.1. Configuration
### 2.1. Configuration

The tool contains a pipeline of modules that process the document step by step and is highly configurable. To change it's default configuration, please refer to the [configuration file documentation](configuration.md).

### 3.2. Demo: Web Viewer
### 2.2. Demo: Web Viewer

To start the web viewer demo, simply run:

#### 3.2.1. Under Linux/MacOS:
#### 2.2.1. Under Linux/MacOS:

```sh
npm run start:web:vue
```

#### 3.2.2. Under Windows:
#### 2.2.2. Under Windows:

In two different terminals, first:

@@ -66,7 +57,7 @@ cd demo/vue-viewer && npm install && npm run serve

Open [localhost:8080](http://localhost:8080) with your favorite browser to use the GUI.

### 3.3. Command Line Usage
### 2.3. Command Line Usage

Under Mac OS X, Linux:

@@ -80,7 +71,7 @@ Under Windows:
cmd /C "npm run run:debug -- --input-file samples/t1.pdf --output-folder samples --document-name example --config server/defaultConfig.json --pretty-logs"
```

## 4. API
## 3. API

Install the API server with:

@@ -98,7 +89,7 @@ You can then call endpoints on [localhost:3001](http://localhost:3001).

The documentation for the API can be found [here](api-guide.md).

## 5. Test
## 4. Test

```sh
npm run test
2 changes: 1 addition & 1 deletion server/src/input/extractImagesFonts.ts
@@ -27,7 +27,7 @@ export function extractImagesAndFonts(pdfInputFile: string): Promise<void> {
return new Promise<void>((resolve, reject) => {
const folder = utils.getMutoolExtractionFolder();
logger.info(`Extracting images and fonts to ${folder}`);
utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.MUTOOL, ['extract', pdfInputFile], false, { cwd: folder })
utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.MUTOOL, ['extract', pdfInputFile], { cwd: folder })
.then(() => {
const ttfRegExp = /^[A-Z]{6}\+(.*)\-[0-9]+\.ttf$/;
fs.readdirSync(folder).forEach(file => {
2 changes: 1 addition & 1 deletion server/src/input/pdfminer/pdfminer.ts
@@ -61,7 +61,7 @@ export function execute(pdfInputFile: string): Promise<Document> {
fs.appendFileSync(xmlOutputFile, '');
}

utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.PDF2TXT, pdf2txtArguments, true)
utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.PDF2TXT, pdf2txtArguments)
.then(() => {
const xml: string = fs.readFileSync(xmlOutputFile, 'utf8');
try {
2 changes: 1 addition & 1 deletion server/src/input/tesseract/TesseractExtractor.ts
@@ -75,7 +75,6 @@ export class TesseractExtractor extends Extractor {
}
const outPutFilePath = folder + '/Sample_%03d.tiff';
utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.CONVERT, [
pdfPath,
'-colorspace',
'RGB',
'-density',
@@ -86,6 +85,7 @@ export class TesseractExtractor extends Extractor {
'remove',
'-background',
'white',
pdfPath,
outPutFilePath,
])
.then(() => {
1 change: 0 additions & 1 deletion server/src/output/pdf/PdfExporter.ts
@@ -45,7 +45,6 @@ export class PdfExporter extends Exporter {
'-o',
outputPath,
],
false,
{
cwd: process.cwd(),
env: process.env,
@@ -91,7 +91,7 @@ export class LinkDetectionModule extends Module {
if (!fs.existsSync(xmlOutputFile)) {
fs.appendFileSync(xmlOutputFile, '');
}
utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.DUMPPDF, dumppdfArguments, true)
utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.DUMPPDF, dumppdfArguments)
.then(() => {
const xml: string = fs.readFileSync(xmlOutputFile, 'utf8');
try {
@@ -60,7 +60,7 @@ const defaultExtractor: TableExtractor = {
scriptArgs.push(options.table_areas.join(';'));
}

return utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.PYTHON, scriptArgs, true)
return utils.CommandExecuter.run(utils.CommandExecuter.COMMANDS.PYTHON, scriptArgs)
.then((stdout) => ({
stdout,
stderr: '',
49 changes: 4 additions & 45 deletions server/src/utils.ts
@@ -57,17 +57,14 @@ export class CommandExecuter {
public static async run(
cmd: string | string[],
args: string[],
pythonCommand: boolean = false,
options?: any,
): Promise<string> {
return new Promise((resolve, reject) => {
let command = '';
if (Array.isArray(cmd)) {
command = pythonCommand ?
getPythonCommandLocationOnSystem(cmd[0], cmd[1] || '', cmd[2] || '') :
getCommandLocationOnSystem(cmd[0], cmd[1] || '', cmd[2] || '');
command = getCommandLocationOnSystem(cmd[0], cmd[1] || '', cmd[2] || '');
} else {
command = pythonCommand ? getPythonCommandLocationOnSystem(cmd) : getCommandLocationOnSystem(cmd);
command = getCommandLocationOnSystem(cmd);
}
if (!command) {
return reject({
@@ -233,7 +230,7 @@ export async function correctImageForRotation(srcImg: string): Promise<RotationC

const args: string[] = [path.join(__dirname, '../assets/ImageCorrection.py'), srcImg];
try {
const data = await CommandExecuter.run(CommandExecuter.COMMANDS.PYTHON, args, true);
const data = await CommandExecuter.run(CommandExecuter.COMMANDS.PYTHON, args);
const rotationData = JSON.parse(data);
correctionInfo.fileName = rotationData.filename;
correctionInfo.degrees = rotationData.degrees;
@@ -785,44 +782,6 @@ function getExecLocationCommandOnSystem(): string {
return os.platform() === 'win32' ? 'where' : 'which';
}

/**
* Returns the location of the python command on the system
* @param firstChoice the first choice name of the executable to be located
* @param secondChoice the second choice name of the executable to be located
* @param thirdChoice the third choice name of the executable to be located
*/
function getPythonCommandLocationOnSystem(
firstChoice: string,
secondChoice: string = '',
thirdChoice: string = '',
): string {
const cmdComponents: string[] = firstChoice.split(' ');
const pipenvSpawn = spawnSync(getCommandLocationOnSystem('pipenv'), ['--venv']);
const pipEnvParsrLocation: string =
pipenvSpawn.status === 0 ? pipenvSpawn.stdout.toString().split(os.EOL)[0] : "";
let result = pipEnvParsrLocation !== "" ? path.join(pipEnvParsrLocation, 'bin', cmdComponents[0]) : "";

if (result === null && secondChoice !== '') {
return getPythonCommandLocationOnSystem(secondChoice, thirdChoice);
}
if (result === null) {
return null;
}

// re-insert the other words into the command (if any)
result = [result , ...cmdComponents.slice(1, cmdComponents.length)].join(" ");

// append python to the beginning if the command looked for isnt python itself
if (CommandExecuter.COMMANDS.PYTHON.includes(firstChoice)) {
return result;
} else {
const pythonCommands: string[] = CommandExecuter.COMMANDS.PYTHON;
return getPythonCommandLocationOnSystem(pythonCommands[0], pythonCommands[1] || '', pythonCommands[2] || '')
+ ' '
+ result;
}
}

/**
* returns the location of a command on a system.
* @param firstChoice the first choice name of the executable to be located
@@ -844,7 +803,7 @@ export function getCommandLocationOnSystem(
return null;
}

return [result , ...cmdComponents.slice(1, cmdComponents.length)].join(" ");
return firstChoice;
}

/**
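Note on the lookup in `server/src/utils.ts`: the doc comments above describe trying a first, second and third choice for an executable name, and `getExecLocationCommandOnSystem` picks `where` or `which` by platform. The idea can be sketched roughly as below. This is a minimal illustration and not the repository's code: `Locate`, `resolveFirstAvailable` and the fake lookup table are hypothetical names, and the injected `locate` callback only stands in for a real `which`/`where` call so that the example runs without touching the system.

```typescript
// Hypothetical stand-in for running `which` / `where`: returns the resolved
// path of an executable, or null when it cannot be found.
type Locate = (executable: string) => string | null;

// Try each candidate in order and return the first one that resolves.
// Extra words after the executable name (sub-commands, flags) are kept.
function resolveFirstAvailable(candidates: string[], locate: Locate): string | null {
  for (const candidate of candidates) {
    if (candidate === '') {
      continue; // an unused second or third choice
    }
    const [executable, ...rest] = candidate.split(' ');
    const location = locate(executable);
    if (location !== null) {
      return [location, ...rest].join(' ');
    }
  }
  return null; // nothing resolved; the caller decides how to report it
}

// Usage with a fake lookup table instead of the real system (assumed paths).
const fakeSystem = new Map<string, string>([['python', '/usr/bin/python']]);
const locate: Locate = (name) => {
  const hit = fakeSystem.get(name);
  return hit === undefined ? null : hit;
};

const resolved = resolveFirstAvailable(['python3', 'python', ''], locate);
console.log(resolved);
```

Returning `null` when no candidate resolves matches the `if (!command)` rejection path shown in `CommandExecuter.run`. Whether the resolved path or the original name should be returned is exactly what the last hunk changes, so treat the return value here as one possible choice.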
15 changes: 0 additions & 15 deletions train/Pipfile

This file was deleted.
