This repository has been archived by the owner on May 7, 2024. It is now read-only.

Finalisation Version 0.9.7
walter-weinmann committed Sep 8, 2022
1 parent 71dfe81 commit 5d238a8
Showing 88 changed files with 30 additions and 444 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -3,7 +3,7 @@
![Coveralls GitHub](https://img.shields.io/coveralls/github/KonnexionsGmbH/dcr.svg)
![GitHub (Pre-)Release](https://img.shields.io/github/v/release/KonnexionsGmbH/dcr?include_prereleases)
![GitHub (Pre-)Release Date](https://img.shields.io/github/release-date-pre/KonnexionsGmbh/dcr)
-![GitHub commits since latest release](https://img.shields.io/github/commits-since/KonnexionsGmbH/dcr/0.9.6)
+![GitHub commits since latest release](https://img.shields.io/github/commits-since/KonnexionsGmbH/dcr/0.9.7)

Based on the paper "Unfolding the Structure of a Document using Deep Learning" (**[Rahman and Finin, 2019](https://arxiv.org/abs/1910.03678)**), this software project attempts to use various software techniques to automatically recognise the structure in any **`pdf`** documents and thus make them more searchable.

4 changes: 2 additions & 2 deletions docs/developing_development_environment.md
@@ -17,8 +17,8 @@ When selecting the Docker image, care must be taken to select the appropriate ve

Alternatively, for a **`Ubuntu 22.04 LTS`** environment that is as unspoiled as possible, the following two scripts are available in the **`scripts`** file directory:

-- **`scripts/0.9.6/run_install_4-vm_wsl2_1.sh`**
-- **`scripts/0.9.6/run_install_4-vm_wsl2_2.sh`**
+- **`scripts/0.9.7/run_install_4-vm_wsl2_1.sh`**
+- **`scripts/0.9.7/run_install_4-vm_wsl2_2.sh`**

After a **`cd scripts`** command in a terminal window, the script **`run_install_4-vm_wsl2_1.sh`** must first be executed.
Administration rights (**`sudo`**) are required for this.
75 changes: 0 additions & 75 deletions docs/developing_research_notes.md

This file was deleted.

3 changes: 2 additions & 1 deletion docs/developing_version_planning.md
@@ -9,12 +9,13 @@

| Version | Feature(s) |
|---------|------------|
-| 0.9.7 | TBD |
+| 0.9.8 | TBD |

### 1.2 Already implemented

| Version | Feature(s) |
|---------|----------------------------------------|
+| 0.9.7 | Documentation and test improvements |
| 0.9.6 | Extracting an API |
| 0.9.3 | Extending NLP capabilities |
| 0.9.2 | Refactoring database and code |
22 changes: 0 additions & 22 deletions docs/index.md
@@ -348,28 +348,6 @@ With the granularity document `line`, the recognised headers and footers are lef
spaCy provides a number of attributes for the token.
Details can be found [here](https://spacy.io/api/token#attributes){:target="_blank"} in the spaCy documentation.
The configuration parameters of the type `spacy_tkn_attr_...` control which of these attributes are stored to the database table `content_token`.
-By default, the following attributes are stored:
-
-- `spacy_tkn_attr_ent_iob_ `
-- `spacy_tkn_attr_ent_type_ `
-- `spacy_tkn_attr_i `
-- `spacy_tkn_attr_is_currency `
-- `spacy_tkn_attr_is_digit `
-- `spacy_tkn_attr_is_oov `
-- `spacy_tkn_attr_is_punct `
-- `spacy_tkn_attr_is_sent_end `
-- `spacy_tkn_attr_is_sent_start `
-- `spacy_tkn_attr_is_stop `
-- `spacy_tkn_attr_is_title `
-- `spacy_tkn_attr_lemma_ `
-- `spacy_tkn_attr_like_email `
-- `spacy_tkn_attr_like_num `
-- `spacy_tkn_attr_like_url `
-- `spacy_tkn_attr_norm_ `
-- `spacy_tkn_attr_pos_ `
-- `spacy_tkn_attr_tag_ `
-- `spacy_tkn_attr_text `
-- `spacy_tkn_attr_whitespace_ `

In the event of an error, the original document is marked as erroneous and an explanatory entry is also written in the **`document`** table.
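
The `spacy_tkn_attr_...` parameters described in this hunk can be pictured as boolean switches, one per token attribute. The following is a minimal sketch under that assumption, not DCR's actual implementation: the function `select_token_attributes`, the dictionary-based configuration, and the `SimpleNamespace` stand-in for a spaCy token are all hypothetical and serve only to illustrate the idea.

```python
from types import SimpleNamespace

# Prefix shared by the configuration parameters named in the documentation.
PREFIX = "spacy_tkn_attr_"


def select_token_attributes(token, config):
    """Return only the token attributes whose configuration flag is enabled.

    `config` maps parameter names such as "spacy_tkn_attr_lemma_" to booleans
    (an assumed representation, chosen for this sketch).
    """
    selected = {}
    for key, enabled in config.items():
        if not key.startswith(PREFIX) or not enabled:
            continue
        attr_name = key[len(PREFIX):]  # e.g. "lemma_"
        selected[attr_name] = getattr(token, attr_name)
    return selected


# Stand-in object carrying a few attribute names that spaCy tokens also expose.
token = SimpleNamespace(text="Apple", lemma_="apple", is_stop=False, pos_="PROPN")

config = {
    "spacy_tkn_attr_text": True,
    "spacy_tkn_attr_lemma_": True,
    "spacy_tkn_attr_is_stop": False,  # disabled, so it is not stored
    "spacy_tkn_attr_pos_": True,
}

row = select_token_attributes(token, config)
```

In this sketch, `row` contains only the enabled attributes (`text`, `lemma_` and `pos_`), which is the kind of record one would then write to a table such as `content_token`.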

24 changes: 11 additions & 13 deletions docs/release_notes.md
@@ -3,21 +3,19 @@
![Coveralls GitHub](https://img.shields.io/coveralls/github/KonnexionsGmbH/dcr.svg)
![GitHub (Pre-)Release](https://img.shields.io/github/v/release/KonnexionsGmbH/dcr?include_prereleases)
![GitHub (Pre-)Release Date](https://img.shields.io/github/release-date-pre/KonnexionsGmbh/dcr)
-![GitHub commits since latest release](https://img.shields.io/github/commits-since/KonnexionsGmbH/dcr/0.9.6)
+![GitHub commits since latest release](https://img.shields.io/github/commits-since/KonnexionsGmbH/dcr/0.9.7)

## 1. Version 0.9.7

-Release Date: dd.mm.2022
+Release Date: 08.03.2022

-### 1.1 New Features
+### 1.1 Modified Features

-- TODO
+- Delimitation of the documentation to the **`DCR`** application
+- Delimitation of the tests to the **`DCR`** application
+- Updating the third party software used

-### 1.2 Modified Features
-
-- TODO
-
-### 1.3 Applied Software
+### 1.2 Applied Software

| Software | Version | Remark | Status |
|:------------------------------------------------------------------------------|:----------------|:------------------------------------|---------|
@@ -31,13 +29,13 @@ Release Date: dd.mm.2022
| [Tesseract OCR](https://github.com/tesseract-ocr/tesseract){:target="_blank"} | 5.2.0-22-g0daf1 | base version | upgrade |
| [TeX Live](https://www.tug.org/texlive){:target="_blank"} | 2022 | base version | upgrade |

-#### 1.3.1 Unix-specific Software
+#### 1.2.1 Unix-specific Software

| Software | Version | Remark | Status |
|:----------------------------------------------------------------|:------------|:------------------------|---------|
| asdf | v0.10.2 | base version (optional) | |
| cURL | 7.81.0 | base version | upgrade |
-| dos2unix | 7.4.0 | base version | |
+| dos2unix | 7.4.2 | base version | upgrade |
| GCC & G++ | 11.2.0 | base version | upgrade |
| GNU Autoconf | 2.71 | base version | upgrade |
| GNU Automake | 1.16.5 | base version | upgrade |
@@ -50,15 +48,15 @@ Release Date: dd.mm.2022
| Vim | 8.2.3995 | base version (optional) | upgrade |
| Wget | 1.21.2 | | upgrade |

-#### 1.3.2 Windows-specific Software
+#### 1.2.2 Windows-specific Software

| Software | Version | Remark | Status |
|:----------------------------------------------------------------------------------------|:--------|:--------------|--------|
| [Grep for Windows](http://gnuwin32.sourceforge.net/packages/grep.htm){:target="_blank"} | 2.5.4 | base version | |
| [Make for Windows](http://gnuwin32.sourceforge.net/packages/make.htm){:target="_blank"} | 3.81 | base version | |
| [sed for Windows](http://gnuwin32.sourceforge.net/packages/sed.htm){:target="_blank"} | 4.2.1 | base version | |

-### 1.4 Open issues
+### 1.3 Open issues

1. Tesseract OCR: (see [here](#issues_tesseract_ocr){:target="_blank"})
