
20151127 version

ashipunov committed Nov 28, 2015
1 parent 0cea05f commit 78b777e1c1262eb14b1e724f207dc04ffbf429fe
Showing with 22 additions and 17 deletions.
  1. +3 −0 NEWS
  2. +7 −7 README
  3. +1 −4 TODO
  4. +11 −6 img2djvu
3 NEWS
@@ -1,3 +1,6 @@
20151127 Now minidjvu coder part removes temporary pnm files right after encoding and therefore saves much more space
20151125 Pull requests' improvements: name of temp dir is 'pagesXXXXXX' now; djvu created in the temp folder; possible trailing slash removed from the folder name
1.14 Bug fixes: codjvu function did no work with names contain spaces; mini function not exits if name of temp dir contains spaces (thanks to Kyrill Detinov)
1.13 Option -V for version, minor optimizations (thanks to Kyrill Detinov)
14 README
@@ -18,7 +18,7 @@ ADVANCED USE
> img2djvu -d 600 out
Will change default resolution (300 dpi) to 600 dpi
Will set resolution to 600 dpi (default is 300 dpi)
> img2djvu -t 17 out
@@ -30,17 +30,17 @@ Will skip ImageMagick color count (this is much faster!) and send all color file
> img2djvu -m 10 out
For black and white pages, will employ minidjvu DjVu coder (with 10 pages per dictionary) instead of default cjb2 coder
For black and white pages, will use minidjvu DjVu coder (with 10 pages per dictionary) instead of default cjb2 coder
> img2djvu -l 1 out
For color pages, img2djvu will do layer separation*** (assuming that the file is the result of Scan Tailor* mixed mode output**), then blur color part and start forced segmentation****; it is slow but usually produce compact output.
For color pages, img2djvu will perform layer separation*** (assuming that the file is the result of Scan Tailor* mixed mode output**), then blur color part and start forced segmentation****; it is slow but usually produce compact output.
===
* Scan Tailor: http://scantailor.sourceforge.net/
** Mixed mode outputs image colors in [1...254] rank and text colors as pure black [0]
** Mixed mode outputs image colors in [1...254] rank whereas text is pure black [0] and page background is pure white [255]
*** Layer separation after Scan Tailor: http://forum.ru-board.com/topic.cgi?forum=5&topic=32945 (in Russian), http://alexrey036.narod.ru/LayerTailor/LayerTailor.zip
@@ -60,7 +60,7 @@ Will NOT use blur and contrast for processing color layers
> img2djvu -l 1 -r rus -e cuneiform -j 2 -a 1 out
After creation of final DjVu, will run two OCR jobs of cuneiform with "-rus" language option via ocrodjvu and insert text layer in place
After creation of the final DjVu, will run two OCR jobs of cuneiform with "-rus" language option via ocrodjvu and insert text layer in place
> img2djvu -c 1 out
@@ -74,6 +74,6 @@ Absolute paths are not allowed
It is expected that all images inside a folder have the same resolution
Image names are very important; they determine the page sequence in future DjVu file. It is strongly recommended to rename files sequentially before im2djvu run
Folder and image names are very important; they, for example, determine the page sequence in future DjVu file. It is strongly recommended to rename files sequentially before im2djvu run. Also, please avoid using anything except [0-9a-z_-] in the file and folder names. No spaces and non-ASCII letters, please!
It is NOT expected that layer separation and forced segmentation will harm non-Scan Tailor files
It is NOT recommended to use -l option with files which did not come from Scan Tailor output
5 TODO
@@ -1,5 +1,2 @@
Make an option to avoid layer separation for the selected pages?
Make minidjvu accept filenames with spaces, probablly sequentially rename files immediately after conversion
(?) MMR and JPEG (ald also 2k?) chunks (with djvumake) instead of JB2 and IW44 for B&W and color pages, respectively
(?) New {twopass} function for cases when minidjvu dictionary broken with color files: extract color chunks, keep them under name <page_num>.iw44, extract bw parts, remember their names in $bwpages, when <F> or end, start minidjvu, make multi-paged, then add color chunks to pages with <page_number> OR do it for ALL pages (and somehow insert <F> pages at the end)
17 img2djvu
@@ -31,7 +31,7 @@ tmpdefault="$tmp"
ocrjobsdefault="$ocrjobs"
function printversion() {
printf "img2djvu version 1.13\n"
printf "img2djvu version 20151127\n"
}
function usage() {
@@ -283,7 +283,8 @@ function nomini {
IFS=$SAVEIFS
) &&
cd "$tmpdir" && \
djvm -c "$djvu" *.djvu && \
djvm -c merged.djvu *.djvu && \
mv merged.djvu "$djvu" && \
printf "\nDone.\n" && \
if [ "$useocr" -gt 0 ] ; then
printf "Starting OCR...\n"
@@ -297,6 +298,7 @@ function nomini {
### Minidjvu-based coder and bundler
# Similar to previous, but instead of cjb2 minidjvu called every time when black and white sequence interrupted with color image, or when sequence ends on black and white file
# Works with sequences, therefore visually less verbose, minidjvu is also slower than cjb2
function mini {
( cd "$fld" &&
( bwcount=0 && \
@@ -316,6 +318,7 @@ function mini {
if [ `identify -format "%z" "$of"` -gt 1 ] ; then
if [ "$bwcount" -gt 0 ] ; then
minidjvucoder "$bwpages" "${of%pnm}1.djvu"
rm -f $bwpages
bwpages=""
bwcount=0
printf "$age"
@@ -331,13 +334,15 @@ function mini {
else
colorcoder "$of" "${of%pnm}2.djvu"
fi
rm "$of" && \
printf "$age"
else
bwpages="$bwpages $of"
bwcount=$((bwcount+1))
fi
if [[ "$pgcount" -eq 0 && "$bwcount" -gt 0 ]] ; then
if [[ "$pgcount" -eq 0 && "$bwcount" -gt 0 ]] ; then
minidjvucoder "$bwpages" "${of%pnm}1.djvu"
rm -f $bwpages
bwpages=""
bwcount=0
printf "$age"
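Note on the two `rm -f $bwpages` additions above: $bwpages is a space-separated string that is expanded unquoted, so the cleanup relies on page names without spaces, in line with the README advice. A minimal sketch of an alternative that keeps each name as a separate argument is given below. It is not the script's code: encode_batch is a hypothetical stand-in for minidjvucoder (whose body is not part of this diff), and deleting only after a successful encode is a design choice of the sketch, not something the commit does.

```shell
#!/bin/sh
# Sketch only: encode_batch is a hypothetical stand-in for the script's
# minidjvucoder function, whose body is not shown in this diff.
encode_batch() {
    # A real coder would write a DjVu file here; the stand-in only
    # checks that every input file exists.
    for f in "$@"; do
        [ -f "$f" ] || return 1
    done
}

# Encode the pending pages, then delete them only if encoding succeeded.
# Each file name is a separate argument, so names with spaces survive.
flush_batch() {
    [ "$#" -gt 0 ] || return 0
    encode_batch "$@" && rm -f -- "$@"
}
```

Usage would look like `flush_batch "page 1.pnm" "page2.pnm"`; calling it with no arguments does nothing and returns success.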
@@ -361,7 +366,7 @@ function mini {
# Absolute paths to input folder and output DjVu files
fld="`pwd`/$1"
djvu="`pwd`/$1.djvu"
djvu="`pwd`/`echo $1 | sed -e 's|/$||g'`.djvu"
if [ "$tmp" -eq 0 ]; then
tmpdirprefix="/tmp"
else
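Note on the trailing-slash change above: the new `djvu=` line strips a final slash from the folder name with `echo $1 | sed`. The same result can be had with plain parameter expansion, which avoids the extra processes and keeps the name quoted. The sketch below assumes nothing beyond a POSIX shell; strip_trailing_slash and djvu_path are hypothetical helper names, not functions from img2djvu.

```shell
#!/bin/sh
# Hypothetical helper, not part of img2djvu: removes one trailing slash
# from a folder name using the ${var%pattern} expansion.
strip_trailing_slash() {
    name="$1"
    printf '%s\n' "${name%/}"
}

# Builds the output path the same way the script does:
# <current directory>/<folder name without trailing slash>.djvu
djvu_path() {
    printf '%s/%s.djvu\n' "$(pwd)" "$(strip_trailing_slash "$1")"
}
```

With this, `djvu_path out/` and `djvu_path out` give the same path ending in `out.djvu`.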
@@ -372,7 +377,7 @@ fi
pgcount=`ls -1p "$fld" | egrep -v '/$' | wc -l`
printf "$pgcount files:\n"
if [ "$usemini" -lt 1 ] ; then
tmpdir=`mktemp -d "$tmpdirprefix"/pagesXXXXX`
tmpdir=`mktemp -d "$tmpdirprefix"/pagesXXXXXX`
nomini
else
if ! which minidjvu >/dev/null ; then
@@ -390,6 +395,6 @@ else
exit 1
fi
fi
tmpdir=`mktemp -d "$tmpdirprefix"/pagesXXXXX`
tmpdir=`mktemp -d "$tmpdirprefix"/pagesXXXXXX`
mini
fi
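Note on the mktemp template change: both branches now use `pagesXXXXXX` (six X characters) instead of five. The commit does not state the reason; six trailing X characters is the template form used by the C library's mkdtemp, so it is a reasonable assumption that this is the more portable spelling. A minimal sketch of the call wrapped in a function follows; make_pages_dir is a hypothetical name, and the default prefix of /tmp mirrors the script's `tmpdirprefix="/tmp"` branch.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of img2djvu: creates a uniquely named
# scratch directory "pagesXXXXXX" under the given prefix (default /tmp)
# and prints its path.
make_pages_dir() {
    prefix="${1:-/tmp}"
    mktemp -d "$prefix/pagesXXXXXX"
}
```

A caller would capture the result, for example `tmpdir="$(make_pages_dir "$tmpdirprefix")" || exit 1`, so that a failed mktemp stops the run instead of continuing with an empty path.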