Commit
Fix whitespace issues
* Remove whitespace (blanks, tabs, cr) at line endings

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil committed Aug 1, 2018
1 parent 3af2773 commit 6a28cce
Showing 45 changed files with 239 additions and 239 deletions.
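
The cleanup described in the commit message (removing blanks, tabs, and CR at line endings) can be illustrated with a minimal sketch. This is not the tool the author used, which the commit does not state; the function name `strip_trailing_whitespace` is hypothetical, and the sketch assumes the file content is already loaded as a string. Note that stripping CR this way also turns CRLF line endings into LF.

```python
import re

# Matches a run of blanks, tabs, or carriage returns that sits
# immediately before a line break or at the very end of the text.
_TRAILING = re.compile(r"[ \t\r]+$", re.MULTILINE)


def strip_trailing_whitespace(text: str) -> str:
    """Return text with blanks, tabs, and CRs removed at line endings.

    Hypothetical helper for illustration only; leading indentation and
    the newline characters themselves are left untouched.
    """
    return _TRAILING.sub("", text)


# Example input modelled on a line from CONTRIBUTING.md in this diff.
sample = "Post example files to demonstrate the problem.  \r\nBUT don't post files\t\n"
cleaned = strip_trailing_whitespace(sample)
```

A whitespace-only line collapses to an empty line under this sketch, which matches the blank-line hunks visible in the diff below.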
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE.md
@@ -6,7 +6,7 @@ Note that it will be much easier for us to fix the issue if a test case that
reproduces the problem is provided. Ideally this test case should not have any
external dependencies. Provide a copy of the image or link to files for the test case.

Please delete this text and fill in the template below.
Please delete this text and fill in the template below.

------------------------

16 changes: 8 additions & 8 deletions CONTRIBUTING.md
@@ -9,9 +9,9 @@ If you think you found a bug in Tesseract, please create an issue.
Use the [users mailing-list](https://groups.google.com/d/forum/tesseract-ocr) instead of creating an Issue if ...
* You have problems using Tesseract and need some help.
* You have problems installing the software.
* You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the [ImproveQuality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) wiki page.
* You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the [ImproveQuality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) wiki page.
* You are trying to train Tesseract and you have a problem and/or want to ask a question about the training process. Note: You should first read the **official** guides [[1]](https://github.com/tesseract-ocr/tesseract/wiki) or [[2]](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) found in the project wiki.
* You have a general question.
* You have a general question.

An issue should only be reported if the platform you are using is one of these:
* Linux (but not a version that is more than 4 years old)
@@ -22,7 +22,7 @@ For older versions or other operating systems, use the Tesseract forum.

When creating an issue, please report your operating system, including its specific version: "Ubuntu 16.04", "Windows 10", "Mac OS X 10.11" etc.

Search through open and closed issues to see if similar issue has been reported already (and sometimes also has been solved).
Search through open and closed issues to see if similar issue has been reported already (and sometimes also has been solved).

Similarly, before you post your question in the forum, search through past threads to see if similar question has been asked already.

@@ -32,10 +32,10 @@ Only report an issue in the latest official release. Optionally, try to check if

Make sure you are able to replicate the problem with Tesseract command line program. For external programs that use Tesseract (including wrappers and your own program, if you are developer), report the issue to the developers of that software if it's possible. You can also try to find help in the Tesseract forum.

Each version of Tesseract has its own language data you need to obtain. You **must** obtain and install trained data for English (eng) and osd. Verify that Tesseract knows about these two files (and other trained data you installed) with this command:
Each version of Tesseract has its own language data you need to obtain. You **must** obtain and install trained data for English (eng) and osd. Verify that Tesseract knows about these two files (and other trained data you installed) with this command:
`tesseract --list-langs`.

Post example files to demonstrate the problem.
Post example files to demonstrate the problem.
BUT don't post files with private info (about yourself or others).

When attaching a file to the issue report / forum ...
@@ -46,23 +46,23 @@ Do not attach programs or libraries to your issues/posts.

For large files or for programs, add a link to a location where they can be downloaded (your site, Git repo, Google Drive, Dropbox etc.)

Attaching a multi-page TIFF image is useful only if you have problem with multi-page functionality, otherwise attach only one or a few single page images.
Attaching a multi-page TIFF image is useful only if you have problem with multi-page functionality, otherwise attach only one or a few single page images.

Copy the error message from the console instead of sending a screenshot of it.

Use the toolbar above the comment edit area to format your comment.

Add three backticks before and after a code sample or output of a command to format it (The `Insert code` button can help you doing it).

If your comment includes a code sample or output of a command that exceeds ~25 lines, post it as attached text file (`filename.txt`).
If your comment includes a code sample or output of a command that exceeds ~25 lines, post it as attached text file (`filename.txt`).

Use `Preview` before you send your issue. Read it again before sending.

Note that most of the people that respond to issues and answer questions are either other 'regular' users or **volunteers** developers. Please be nice to them :-)

The [tesseract developers](http://groups.google.com/group/tesseract-dev/) forum should be used to discuss Tesseract development: bug fixes, enhancements, add-ons for Tesseract.

Sometimes you will not get a respond to your issue or question. We apologize in advance! Please don't take it personally. There can be many reasons for this, including: time limits, no one knows the answer (at least not the ones that are available at that time) or just that
Sometimes you will not get a respond to your issue or question. We apologize in advance! Please don't take it personally. There can be many reasons for this, including: time limits, no one knows the answer (at least not the ones that are available at that time) or just that
your question has been asked (and has been answered) many times before...

## For Developers: Creating a Pull Request
12 changes: 6 additions & 6 deletions ChangeLog
@@ -1,7 +1,7 @@
2017-03-24 - V4.00.00-alpha
* Added new neural network system based on LSTMs, with major accuracy gains.
* Improvements to PDF rendering.
* Fixes to trainingdata rendering.
* Fixes to trainingdata rendering.
* Added LSTM models+lang models to 101 languages. (tessdata repository)
* Improved multi-page TIFF handling.
* Fixed damage to binary images when processing PDFs.
@@ -40,7 +40,7 @@
* Fixed some openCL issues.
* Added option to build Tesseract with CMake build system.
* Implemented CPPAN support for easy Windows building.

2016-02-17 - V3.04.01
* Added OSD renderer for psm 0. Works for single page and multi-page images.
* Improve tesstrain.sh script.
@@ -84,7 +84,7 @@
text and truetype fonts.
* Added support for PDF output with searchable text.
* Removed entire IMAGE class and all code in image directory.
* Tesseract executable: support for output to stdout; limited support for one
* Tesseract executable: support for output to stdout; limited support for one
page images from stdin (especially on Windows)
* Added Renderer to API to allow document-level processing and output
of document formats, like hOCR, PDF.
@@ -169,12 +169,12 @@
* Added TessdataManager to combine data files into a single file.
* Some dead code deleted.
* VC++6 no longer supported. It can't cope with the use of templates.
* Many more languages added.
* Many more languages added.
* Doxygenation of most of the function header comments.
* Added man pages.
* Added bash completion script (issue 247: thanks to neskiem)
* Fix integer overview in thresholding (issue 366: thanks to Cyanide.Drake)
* Add Danish Fraktur support (issues 300, 360: thanks to
* Add Danish Fraktur support (issues 300, 360: thanks to
dsl602230@vip.cybercity.dk)
* Fix file pointer leak (issue 359, thanks to yukihiro.nakadaira)
* Fix an error using user-words (Issue 345: thanks to max.markin)
@@ -183,7 +183,7 @@
* Fix an automake error (Issue 318, thanks to ichanjz)
* Fix a Win32 crash on fileFormatIsTiff() (Issues 304, 316, 317, 330, 347,
349, 352: thanks to nguyenq87, max.markin, zdenop)
* Fixed a number of errors in newer (stricter) versions of VC++ (Issues
* Fixed a number of errors in newer (stricter) versions of VC++ (Issues
301, among others)

2009-06-30 - V2.04
6 changes: 3 additions & 3 deletions INSTALL.GIT.md
@@ -26,14 +26,14 @@ So, the steps for making Tesseract are:
$ make training
$ sudo make training-install

You need to install at least English language and OSD traineddata files to
`TESSDATA_PREFIX` directory.
You need to install at least English language and OSD traineddata files to
`TESSDATA_PREFIX` directory.

You can retrieve single file with tools like [wget](https://www.gnu.org/software/wget/), [curl](https://curl.haxx.se/), [GithubDownloader](https://github.com/intezer/GithubDownloader) or browser.

All language data files can be retrieved from git repository (useful only for packagers!).
(Repository is huge - more that 1.2 GB. You do NOT need to download traineddata files for
all languages).
all languages).

$ git clone https://github.com/tesseract-ocr/tessdata.git tesseract-ocr.tessdata

8 changes: 4 additions & 4 deletions appveyor.yml
@@ -5,13 +5,13 @@ environment:
- APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2017
vs_ver: 15 2017
vs_platform: " Win64"

configuration:
- Release

cache:
- c:/Users/appveyor/.cppan/storage

# for curl
install:
- set PATH=C:\Program Files\Git\mingw64\bin;%PATH%
@@ -25,7 +25,7 @@ before_build:
- ps: 'Add-Content $env:USERPROFILE\.cppan\cppan.yml "`n`nbuild_warning_level: 0`n"'
- ps: 'Add-Content $env:USERPROFILE\.cppan\cppan.yml "`n`nbuild_system_verbose: false`n"'
- ps: 'Add-Content $env:USERPROFILE\.cppan\cppan.yml "`n`nvar_check_jobs: 1`n"'

build_script:
- mkdir build
- mkdir build\bin
12 changes: 6 additions & 6 deletions autogen.sh
@@ -46,10 +46,10 @@ if [ "$1" = "clean" ]; then
find . -iname "Makefile.in" -type f -exec rm '{}' +
fi

# Prevent any errors that might result from failing to properly invoke
# `libtoolize` or `glibtoolize,` whichever is present on your system,
# from occurring by testing for its existence and capturing the absolute path to
# its location for caching purposes prior to using it later on in 'Step 2:'
# Prevent any errors that might result from failing to properly invoke
# `libtoolize` or `glibtoolize,` whichever is present on your system,
# from occurring by testing for its existence and capturing the absolute path to
# its location for caching purposes prior to using it later on in 'Step 2:'
if command -v libtoolize >/dev/null 2>&1; then
LIBTOOLIZE="$(command -v libtoolize)"
elif command -v glibtoolize >/dev/null 2>&1; then
@@ -67,13 +67,13 @@ fi
bail_out()
{
echo
echo " Something went wrong, bailing out!"
echo " Something went wrong, bailing out!"
echo
exit 1
}

# --- Step 1: Generate aclocal.m4 from:
# . acinclude.m4
# . acinclude.m4
# . config/*.m4 (these files are referenced in acinclude.m4)

mkdir -p config
4 changes: 2 additions & 2 deletions contrib/genlangdata.pl
@@ -8,7 +8,7 @@

=pod
=head1 NAME
=head1 NAME
genwordlists.pl - generate word lists for Tesseract
@@ -33,7 +33,7 @@ =head1 DESCRIPTION
pfx=$(echo $i|tr '/' '_'); cat $i | \
perl genwordlists.pl -d OUTDIR -p $pfx; done
This will create a set of output files to match each of the files
This will create a set of output files to match each of the files
WikiExtractor created.
To combine these files:
8 changes: 4 additions & 4 deletions contrib/tesseract.completion
@@ -1,6 +1,6 @@
#-*- mode: shell-script;-*-
#
# bash completion support for tesseract
# bash completion support for tesseract
#
# Copyright (C) 2009 Neskie A. Manuel <neskiem@gmail.com>
# Distributed under the Apache License, Version 2.0.
@@ -20,19 +20,19 @@ _tesseract()
COMPREPLY=()
cur="$2"
prev="$3"

case "$prev" in
tesseract)
COMPREPLY=($(compgen -f -X "!*.+(tif)" -- "$cur") )
;;
*.tif)
COMPREPLY=($(compgen -W "$(basename $prev .tif)" ) )
COMPREPLY=($(compgen -W "$(basename $prev .tif)" ) )
;;
-l)
_tesseract_languages
;;
*)
COMPREPLY=($(compgen -W "-l" ) )
COMPREPLY=($(compgen -W "-l" ) )
;;
esac
}
2 changes: 1 addition & 1 deletion doc/Makefile.am
@@ -17,7 +17,7 @@ man_MANS = \
text2image.1 \
unicharambigs.5 \
unicharset_extractor.1 \
wordlist2dawg.1
wordlist2dawg.1

if !DISABLED_LEGACY_ENGINE
man_MANS += \
14 changes: 7 additions & 7 deletions doc/classifier_tester.1.asc
@@ -11,9 +11,9 @@ SYNOPSIS
DESCRIPTION
-----------
classifier_tester(1) runs Tesseract in a special mode.
It takes a list of .tr files and tests a character classifier
on data as formatted for training,
classifier_tester(1) runs Tesseract in a special mode.
It takes a list of .tr files and tests a character classifier
on data as formatted for training,
but it doesn't have to be the same as the training data.
IN/OUT ARGUMENTS
@@ -25,11 +25,11 @@ OPTIONS
-------
-l 'lang'::
(Input) three character language code; default value 'eng'.
-classifier 'x'::
(Input) One of "pruner", "full".
-U 'unicharset'::
(Input) The unicharset for the language.
@@ -42,7 +42,7 @@ OPTIONS
(Input) x heights file, each line is of the following form, where xheight is calculated as the pixel x height of a character drawn at 32pt on 300 dpi. [ That is, if base x height + ascenders + descenders = 133, how much is x height? ]

*font_name* *xheight*

-output_trainer 'trainer'::
(Output, Optional) Filename for output trainer.

40 changes: 20 additions & 20 deletions doc/combine_lang_model.1.asc
@@ -8,54 +8,54 @@ combine_lang_model - generate starter traineddata

SYNOPSIS
--------
*combine_lang_model* --input_unicharset 'filename' --script_dir 'dirname' --output_dir 'rootdir' --lang 'lang' [--lang_is_rtl] [pass_through_recoder] [--words file --puncs file --numbers file]
*combine_lang_model* --input_unicharset 'filename' --script_dir 'dirname' --output_dir 'rootdir' --lang 'lang' [--lang_is_rtl] [pass_through_recoder] [--words file --puncs file --numbers file]

DESCRIPTION
-----------
combine_lang_model(1) generates a starter traineddata file that can be used to train an LSTM-based neural network model. It takes as input a unicharset and an optional set of wordlists. It eliminates the need to run set_unicharset_properties(1), wordlist2dawg(1), some non-existent binary to generate the recoder (unicode compressor), and finally combine_tessdata(1).

OPTIONS
-------
'-l lang'::
The language to use.
The language to use.
Tesseract uses 3-character ISO 639-2 language codes. (See LANGUAGES)

'--script_dir PATH'::
'--script_dir PATH'::
Directory name for input script unicharsets. It should point to the location of langdata (github repo) directory. (type:string default:)
'--input_unicharset FILE'::

'--input_unicharset FILE'::
Unicharset to complete and use in encoding. It can be a hand-created file with incomplete fields. Its basic and script properties will be set before it is used. (type:string default:)

'--lang_is_rtl BOOL'::
True if language being processed is written right-to-left (eg Arabic/Hebrew). (type:bool default:false)

'--pass_through_recoder BOOL'::
If true, the recoder is a simple pass-through of the unicharset. Otherwise, potentially a compression of it by encoding Hangul in Jamos, decomposing multi-unicode symbols into sequences of unicodes, and encoding Han using the data in the radical_table_data, which must be the content of the file: langdata/radical-stroke.txt. (type:bool default:false)

'--version_str STRING'::
'--version_str STRING'::
An arbitrary version label to add to traineddata file (type:string default:)
'--words FILE'::

'--words FILE'::
(Optional) File listing words to use for the system dictionary (type:string default:)
'--numbers FILE'::

'--numbers FILE'::
(Optional) File listing number patterns (type:string default:)
'--puncs FILE'::

'--puncs FILE'::
(Optional) File listing punctuation patterns. The words/puncs/numbers lists may be all empty. If any are non-empty then puncs must be non-empty. (type:string default:)
'--output_dir PATH'::

'--output_dir PATH'::
Root directory for output files. Output files will be written to <output_dir>/<lang>/<lang>.* (type:string default:)

HISTORY
-------
combine_lang_model(1) was first made available for tesseract4.00.00alpha.
combine_lang_model(1) was first made available for tesseract4.00.00alpha.

RESOURCES
---------
Main web site: <https://github.com/tesseract-ocr> +
Information on training tesseract LSTM: <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>

SEE ALSO
--------
tesseract(1)