Fix whitespace issues

* Remove whitespace (blanks, tabs, cr) at line endings

Signed-off-by: Stefan Weil <sw@weilnetz.de>
tesseract-ocr · Aug 1, 2018 · commit 6a28cce
1 parent 3af2773

Showing 45 changed files with 239 additions and 239 deletions.
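
The commit message describes the cleanup as removing blanks, tabs, and carriage returns at line endings. A minimal sketch of that transformation follows. It is not the tool the author used (the commit does not say how the change was produced), and the function name `strip_trailing_whitespace` is hypothetical. Because each affected line is rewritten in place, such a cleanup yields matching addition and deletion counts, as in the summary above.

```python
import re

# Matches a run of blanks, tabs, or carriage returns at the end of a line.
# Assumption: these three characters are what the commit message means by
# "whitespace (blanks, tabs, cr)".
_TRAILING = re.compile(r"[ \t\r]+$")


def strip_trailing_whitespace(text: str) -> str:
    """Return text with trailing blanks, tabs, and CRs removed from each line.

    Leading indentation and the number of lines are left unchanged.
    """
    return "\n".join(_TRAILING.sub("", line) for line in text.split("\n"))


if __name__ == "__main__":
    sample = "Post example files to demonstrate the problem.  \n  \n\tcase \t\r\n"
    print(repr(strip_trailing_whitespace(sample)))
```

Lines that consist only of whitespace become empty lines rather than being deleted, which matches hunks in this diff where a `-` line containing only spaces is replaced by an empty `+` line.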
diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
@@ -6,7 +6,7 @@ Note that it will be much easier for us to fix the issue if a test case that
 reproduces the problem is provided. Ideally this test case should not have any
 external dependencies. Provide a copy of the image or link to files for the test case.
 
-Please delete this text and fill in the template below. 
+Please delete this text and fill in the template below.
 
 ------------------------
 

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -9,9 +9,9 @@ If you think you found a bug in Tesseract, please create an issue.
 Use the [users mailing-list](https://groups.google.com/d/forum/tesseract-ocr) instead of creating an Issue if ...
 * You have problems using Tesseract and need some help.
 * You have problems installing the software.
-* You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the [ImproveQuality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) wiki page. 
+* You are not satisfied with the accuracy of the OCR, and want to ask how you can improve it. Note: You should first read the [ImproveQuality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) wiki page.
 * You are trying to train Tesseract and you have a problem and/or want to ask a question about the training process. Note: You should first read the **official** guides [[1]](https://github.com/tesseract-ocr/tesseract/wiki) or [[2]](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract) found in the project wiki.
-* You have a general question. 
+* You have a general question.
 
 An issue should only be reported if the platform you are using is one of these:
   * Linux (but not a version that is more than 4 years old)
@@ -22,7 +22,7 @@ For older versions or other operating systems, use the Tesseract forum.
 
 When creating an issue, please report your operating system, including its specific version: "Ubuntu 16.04", "Windows 10", "Mac OS X 10.11" etc.
 
-Search through open and closed issues to see if similar issue has been reported already (and sometimes also has been solved). 
+Search through open and closed issues to see if similar issue has been reported already (and sometimes also has been solved).
 
 Similarly, before you post your question in the forum, search through past threads to see if similar question has been asked already.
 
@@ -32,10 +32,10 @@ Only report an issue in the latest official release. Optionally, try to check if
 
 Make sure you are able to replicate the problem with Tesseract command line program. For external programs that use Tesseract (including wrappers and your own program, if you are developer), report the issue to the developers of that software if it's possible. You can also try to find help in the Tesseract forum.
 
-Each version of Tesseract has its own language data you need to obtain. You **must** obtain and install trained data for English (eng) and osd. Verify that Tesseract knows about these two files (and other trained data you installed) with this command:  
+Each version of Tesseract has its own language data you need to obtain. You **must** obtain and install trained data for English (eng) and osd. Verify that Tesseract knows about these two files (and other trained data you installed) with this command:
 `tesseract --list-langs`.
 
-Post example files to demonstrate the problem.  
+Post example files to demonstrate the problem.
 BUT don't post files with private info (about yourself or others).
 
 When attaching a file to the issue report / forum ...
@@ -46,23 +46,23 @@ Do not attach programs or libraries to your issues/posts.
 
 For large files or for programs, add a link to a location where they can be downloaded (your site, Git repo, Google Drive, Dropbox etc.)
 
-Attaching a multi-page TIFF image is useful only if you have problem with multi-page functionality, otherwise attach only one or a few single page images. 
+Attaching a multi-page TIFF image is useful only if you have problem with multi-page functionality, otherwise attach only one or a few single page images.
 
 Copy the error message from the console instead of sending a screenshot of it.
 
 Use the toolbar above the comment edit area to format your comment.
 
 Add three backticks before and after a code sample or output of a command to format it (The `Insert code` button can help you doing it).
 
-If your comment includes a code sample or output of a command that exceeds ~25 lines, post it as attached text file (`filename.txt`). 
+If your comment includes a code sample or output of a command that exceeds ~25 lines, post it as attached text file (`filename.txt`).
 
 Use `Preview` before you send your issue. Read it again before sending.
 
 Note that most of the people that respond to issues and answer questions are either other 'regular' users or **volunteers** developers. Please be nice to them :-)
 
 The [tesseract developers](http://groups.google.com/group/tesseract-dev/) forum should be used to discuss Tesseract development: bug fixes, enhancements, add-ons for Tesseract.
 
-Sometimes you will not get a respond to your issue or question. We apologize in advance! Please don't take it personally. There can be many reasons for this, including: time limits, no one knows the answer (at least not the ones that are available at that time) or just that 
+Sometimes you will not get a respond to your issue or question. We apologize in advance! Please don't take it personally. There can be many reasons for this, including: time limits, no one knows the answer (at least not the ones that are available at that time) or just that
 your question has been asked (and has been answered) many times before...
 
 ## For Developers: Creating a Pull Request

diff --git a/ChangeLog b/ChangeLog
@@ -1,7 +1,7 @@
 2017-03-24 - V4.00.00-alpha
   * Added new neural network system based on LSTMs, with major accuracy gains.
   * Improvements to PDF rendering.
-  * Fixes to trainingdata rendering.  
+  * Fixes to trainingdata rendering.
   * Added LSTM models+lang models to 101 languages. (tessdata repository)
   * Improved multi-page TIFF handling.
   * Fixed damage to binary images when processing PDFs.
@@ -40,7 +40,7 @@
   * Fixed some openCL issues.
   * Added option to build Tesseract with CMake build system.
   * Implemented CPPAN support for easy Windows building.
-  
+
 2016-02-17 - V3.04.01
   * Added OSD renderer for psm 0. Works for single page and multi-page images.
   * Improve tesstrain.sh script.
@@ -84,7 +84,7 @@
     text and truetype fonts.
   * Added support for PDF output with searchable text.
   * Removed entire IMAGE class and all code in image directory.
-  * Tesseract executable: support for output to stdout; limited support for one 
+  * Tesseract executable: support for output to stdout; limited support for one
     page images from stdin  (especially on Windows)
   * Added Renderer to API to allow document-level processing and output
     of document formats, like hOCR, PDF.
@@ -169,12 +169,12 @@
   * Added TessdataManager to combine data files into a single file.
   * Some dead code deleted.
   * VC++6 no longer supported. It can't cope with the use of templates.
-  * Many more languages added. 
+  * Many more languages added.
   * Doxygenation of most of the function header comments.
   * Added man pages.
   * Added bash completion script (issue 247: thanks to neskiem)
   * Fix integer overview in thresholding (issue 366: thanks to Cyanide.Drake)
-  * Add Danish Fraktur support (issues 300, 360: thanks to 
+  * Add Danish Fraktur support (issues 300, 360: thanks to
     dsl602230@vip.cybercity.dk)
   * Fix file pointer leak (issue 359, thanks to yukihiro.nakadaira)
   * Fix an error using user-words (Issue 345: thanks to max.markin)
@@ -183,7 +183,7 @@
   * Fix an automake error (Issue 318, thanks to ichanjz)
   * Fix a Win32 crash on fileFormatIsTiff() (Issues 304, 316, 317, 330, 347,
     349, 352: thanks to nguyenq87, max.markin, zdenop)
-  * Fixed a number of errors in newer (stricter) versions of VC++ (Issues 
+  * Fixed a number of errors in newer (stricter) versions of VC++ (Issues
     301, among others)
 
 2009-06-30 - V2.04

diff --git a/INSTALL.GIT.md b/INSTALL.GIT.md
@@ -26,14 +26,14 @@ So, the steps for making Tesseract are:
     $ make training
     $ sudo make training-install
 
-You need to install at least English language and OSD traineddata files to 
-`TESSDATA_PREFIX` directory. 
+You need to install at least English language and OSD traineddata files to
+`TESSDATA_PREFIX` directory.
 
 You can retrieve single file with tools like [wget](https://www.gnu.org/software/wget/), [curl](https://curl.haxx.se/), [GithubDownloader](https://github.com/intezer/GithubDownloader) or browser.
 
 All language data files can be retrieved from git repository (useful only for packagers!).
 (Repository is huge - more that 1.2 GB. You do NOT need to download traineddata files for
-all languages). 
+all languages).
 
     $ git clone https://github.com/tesseract-ocr/tessdata.git tesseract-ocr.tessdata
 

diff --git a/appveyor.yml b/appveyor.yml
@@ -5,13 +5,13 @@ environment:
     - APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2017
       vs_ver: 15 2017
       vs_platform: " Win64"
-  
+
 configuration:
   - Release
-  
+
 cache:
   - c:/Users/appveyor/.cppan/storage
-  
+
 # for curl
 install:
   - set PATH=C:\Program Files\Git\mingw64\bin;%PATH%
@@ -25,7 +25,7 @@ before_build:
   - ps: 'Add-Content $env:USERPROFILE\.cppan\cppan.yml "`n`nbuild_warning_level: 0`n"'
   - ps: 'Add-Content $env:USERPROFILE\.cppan\cppan.yml "`n`nbuild_system_verbose: false`n"'
   - ps: 'Add-Content $env:USERPROFILE\.cppan\cppan.yml "`n`nvar_check_jobs: 1`n"'
-  
+
 build_script:
   - mkdir build
   - mkdir build\bin

diff --git a/autogen.sh b/autogen.sh
@@ -46,10 +46,10 @@ if [ "$1" = "clean" ]; then
     find . -iname "Makefile.in" -type f -exec rm '{}' +
 fi
 
-# Prevent any errors that might result from failing to properly invoke 
-# `libtoolize` or `glibtoolize,` whichever is present on your system, 
-# from occurring by testing for its existence and capturing the absolute path to 
-# its location for caching purposes prior to using it later on in 'Step 2:'  
+# Prevent any errors that might result from failing to properly invoke
+# `libtoolize` or `glibtoolize,` whichever is present on your system,
+# from occurring by testing for its existence and capturing the absolute path to
+# its location for caching purposes prior to using it later on in 'Step 2:'
 if command -v libtoolize >/dev/null 2>&1; then
   LIBTOOLIZE="$(command -v libtoolize)"
 elif command -v glibtoolize >/dev/null 2>&1; then
@@ -67,13 +67,13 @@ fi
 bail_out()
 {
     echo
-    echo "  Something went wrong, bailing out!" 
+    echo "  Something went wrong, bailing out!"
     echo
     exit 1
 }
 
 # --- Step 1: Generate aclocal.m4 from:
-#             . acinclude.m4 
+#             . acinclude.m4
 #             . config/*.m4 (these files are referenced in acinclude.m4)
 
 mkdir -p config

diff --git a/contrib/genlangdata.pl b/contrib/genlangdata.pl
@@ -8,7 +8,7 @@
 
 =pod
 
-=head1 NAME 
+=head1 NAME
 
 genwordlists.pl - generate word lists for Tesseract
 
@@ -33,7 +33,7 @@ =head1 DESCRIPTION
     pfx=$(echo $i|tr '/' '_'); cat $i | \
     perl genwordlists.pl -d OUTDIR -p $pfx; done
 
-This will create a set of output files to match each of the files 
+This will create a set of output files to match each of the files
 WikiExtractor created.
 
 To combine these files:

diff --git a/contrib/tesseract.completion b/contrib/tesseract.completion
@@ -1,6 +1,6 @@
 #-*- mode: shell-script;-*-
 #
-# bash completion support for tesseract 
+# bash completion support for tesseract
 #
 # Copyright (C) 2009 Neskie A. Manuel <neskiem@gmail.com>
 # Distributed under the Apache License, Version 2.0.
@@ -20,19 +20,19 @@ _tesseract()
         COMPREPLY=()
         cur="$2"
         prev="$3"
-	
+
 	case "$prev" in
 		tesseract)
 			COMPREPLY=($(compgen -f -X "!*.+(tif)" -- "$cur") )
 		;;
 		*.tif)
-			COMPREPLY=($(compgen -W "$(basename  $prev .tif)" ) ) 
+			COMPREPLY=($(compgen -W "$(basename  $prev .tif)" ) )
 		;;
 		-l)
 			_tesseract_languages
 		;;
 		*)
-			COMPREPLY=($(compgen -W "-l" ) ) 
+			COMPREPLY=($(compgen -W "-l" ) )
 		;;
     esac
 }

diff --git a/doc/Makefile.am b/doc/Makefile.am
@@ -17,7 +17,7 @@ man_MANS = \
   text2image.1 \
   unicharambigs.5 \
   unicharset_extractor.1 \
-  wordlist2dawg.1 
+  wordlist2dawg.1
 
 if !DISABLED_LEGACY_ENGINE
 man_MANS += \

diff --git a/doc/classifier_tester.1.asc b/doc/classifier_tester.1.asc
@@ -11,9 +11,9 @@ SYNOPSIS
 
 DESCRIPTION
 -----------
-classifier_tester(1) runs Tesseract in a special mode. 
-It takes a list of .tr files and tests a character classifier 
-on data as formatted for training, 
+classifier_tester(1) runs Tesseract in a special mode.
+It takes a list of .tr files and tests a character classifier
+on data as formatted for training,
 but it doesn't have to be the same as the training data.
 
 IN/OUT ARGUMENTS
@@ -25,11 +25,11 @@ OPTIONS
 -------
 -l 'lang'::
 	(Input) three character language code; default value 'eng'.
-  
+
 -classifier 'x'::
 	(Input) One of "pruner", "full".
-  
- 
+
+
 -U 'unicharset'::
 	(Input) The unicharset for the language.
 
@@ -42,7 +42,7 @@ OPTIONS
 	(Input) x heights file, each line is of the following form, where xheight is calculated as the pixel x height of a character drawn at 32pt on 300 dpi. [ That is, if base x height + ascenders + descenders = 133, how much is x height? ]
 
 	*font_name* *xheight*
-  
+
 -output_trainer 'trainer'::
 	(Output, Optional) Filename for output trainer.
 

diff --git a/doc/combine_lang_model.1.asc b/doc/combine_lang_model.1.asc
@@ -8,54 +8,54 @@ combine_lang_model - generate starter traineddata
 
 SYNOPSIS
 --------
-*combine_lang_model*  --input_unicharset 'filename' --script_dir 'dirname' --output_dir 'rootdir' --lang 'lang' [--lang_is_rtl] [pass_through_recoder] [--words file --puncs file --numbers file] 
+*combine_lang_model*  --input_unicharset 'filename' --script_dir 'dirname' --output_dir 'rootdir' --lang 'lang' [--lang_is_rtl] [pass_through_recoder] [--words file --puncs file --numbers file]
 
 DESCRIPTION
 -----------
 combine_lang_model(1) generates a starter traineddata file that can be used to train an LSTM-based neural network model. It takes as input a unicharset and an optional set of wordlists. It eliminates the need to run set_unicharset_properties(1), wordlist2dawg(1), some non-existent binary to generate the recoder (unicode compressor), and finally combine_tessdata(1).
- 
+
 OPTIONS
 -------
 '-l lang'::
-	The language to use. 
+	The language to use.
 	Tesseract uses 3-character ISO 639-2 language codes. (See LANGUAGES)
 
-'--script_dir  PATH'::   
+'--script_dir  PATH'::
   Directory name for input script unicharsets. It should point to the location of langdata (github repo) directory.  (type:string default:)
-  
-'--input_unicharset  FILE':: 
+
+'--input_unicharset  FILE'::
   Unicharset to complete and use in encoding. It can be a hand-created file with incomplete fields. Its basic and script properties will be set before it is used.  (type:string default:)
-  
+
 '--lang_is_rtl  BOOL'::
   True if language being processed is written right-to-left (eg Arabic/Hebrew). (type:bool default:false)
-  
+
 '--pass_through_recoder BOOL'::
   If true, the recoder is a simple pass-through of the unicharset. Otherwise, potentially a compression of it by encoding Hangul in Jamos, decomposing multi-unicode symbols into sequences of unicodes, and encoding Han using the data in the radical_table_data, which must be the content of the file: langdata/radical-stroke.txt. (type:bool default:false)
 
-'--version_str  STRING':: 
+'--version_str  STRING'::
   An arbitrary version label to add to traineddata file  (type:string default:)
-  
-'--words  FILE'::   
+
+'--words  FILE'::
   (Optional) File listing words to use for the system dictionary  (type:string default:)
-  
-'--numbers  FILE'::   
+
+'--numbers  FILE'::
   (Optional) File listing number patterns  (type:string default:)
-  
-'--puncs  FILE'::   
+
+'--puncs  FILE'::
   (Optional) File listing punctuation patterns. The words/puncs/numbers lists may be all empty. If any are non-empty then puncs must be non-empty.  (type:string default:)
-  
-'--output_dir   PATH'::   
+
+'--output_dir   PATH'::
   Root directory for output files. Output files will be written to <output_dir>/<lang>/<lang>.*  (type:string default:)
-  
+
 HISTORY
 -------
-combine_lang_model(1) was first made available for tesseract4.00.00alpha. 
+combine_lang_model(1) was first made available for tesseract4.00.00alpha.
 
 RESOURCES
 ---------
 Main web site: <https://github.com/tesseract-ocr> +
 Information on training tesseract LSTM: <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>
-  
+
 SEE ALSO
 --------
 tesseract(1)