# Pipeline

See [usage](https://among.github.io/fusus/use/).

See [example](https://github.com/among/fusus/blob/master/example/doExample.ipynb).

How to read this notebook:

1.  *best experience*
    get this repository on your computer and run `jupyter lab`.
    Also install the table of contents extension in Jupyter Lab, since this is a lengthy notebook.
    You can then run the code cells.
1.  *good reading experience*
    read it on [NbViewer](https://nbviewer.jupyter.org/github/among/fusus/blob/master/example/doExample.ipynb)
1.  *suboptimal*
    read it directly on [GitHub](https://github.com/among/fusus/blob/master/example/doExample.ipynb)
    (it takes a long time to load)

# The beginning

Enable autoloading of changed code.

Change the directory to where this notebook resides on disk, i.e. near the book's scanned pages directory.

In [1]:
%load_ext autoreload
%autoreload 2
!cd `pwd`

Import the fusus package.

See [install](https://among.github.io/fusus/about/install.html).

In [2]:
import fusus

In [3]:
from fusus.book import Book

Initialize the processing line.

In [12]:
B = Book()

  0.01s Loading for Kraken: ~/github/among/fusus/model/arabic_generalized.mlmodel
  0.98s model loaded


In [19]:
page = B.process(pages=132, doOcr=True, batch=False)

  0.00s Batch of 1 pages: 132
  0.00s Start batch processing images
   |     5.55s     1 132.jpg                                 
  5.55s all done


In [20]:
page.show(stage="proof")

<IPython.core.display.Image object>

In [17]:
page.show(stage="ocrw")

stripe	column	line	left	top	right	bottom	confidence	text
1	l	0	973	373	1028	449	90	أعم
1	l	0	911	373	946	449	100	من
1	l	0	733	373	884	449	100	المبادىء
1	l	0	623	373	678	449	98	وهو
1	l	0	561	373	589	449	99	ما
1	l	0	465	373	541	449	89	يتوقف
1	l	0	349	373	403	449	100	عليه
1	l	0	204	373	328	449	98	المسائل
1	l	0	129	373	163	449	100	بلا
1	l	1	923	458	1021	544	98	واسطة
1	l	1	712	458	859	544	99	والمقدمة
1	l	1	656	458	677	544	98	ما
1	l	1	557	458	635	544	94	يتوقف
1	l	1	431	458	487	544	98	عليه
1	l	1	291	458	410	544	99	المسائل
1	l	1	129	458	248	544	95	بواسطة
1	l	2	1012	553	1035	636	88	أو
1	l	2	952	553	975	636	88	لا
1	l	2	824	553	922	636	86	واسطة
1	l	2	793	553	801	636	100	.
1	l	2	695	553	763	636	92	فتأمل
1	l	2	665	553	665	636	100	!
1	r	0	2006	380	2078	456	100	أعم
1	r	0	1933	380	1982	456	98	من
1	r	0	1783	380	1903	456	98	مقدمة
1	r	0	1663	380	1765	456	87	العلم
1	r	0	1506	380	1614	456	95	بينهما
1	r	0	1380	380	1476	456	99	عموم
1	r	0	1211	380	1362	456	99	وخصوص
1	r	1	2008	471	2072	541	98	مطلق
1	r	1	1967	4
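
The word-level table above can be turned back into running text by grouping rows on stripe, column and line. The following is a minimal sketch, assuming the table is available as tab-separated text with the header shown above; `SAMPLE_WORDS` and `lines_from_words` are illustrative names and not part of fusus, and the rows are copied from the output above.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the word-level output above (tab-separated).
SAMPLE_WORDS = (
    "stripe\tcolumn\tline\tleft\ttop\tright\tbottom\tconfidence\ttext\n"
    "1\tl\t0\t973\t373\t1028\t449\t90\tأعم\n"
    "1\tl\t0\t911\t373\t946\t449\t100\tمن\n"
    "1\tl\t0\t733\t373\t884\t449\t100\tالمبادىء\n"
    "1\tl\t1\t923\t458\t1021\t544\t98\tواسطة\n"
    "1\tl\t1\t712\t458\t859\t544\t99\tوالمقدمة\n"
)


def lines_from_words(tsv_text):
    """Group word rows by (stripe, column, line) and join them in reading order."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    grouped = defaultdict(list)
    for row in reader:
        key = (int(row["stripe"]), row["column"], int(row["line"]))
        grouped[key].append((int(row["left"]), row["text"]))

    result = {}
    for key, words in grouped.items():
        # Arabic reads right to left, so the word with the largest
        # left coordinate comes first within a line.
        words.sort(key=lambda item: -item[0])
        result[key] = " ".join(text for _, text in words)
    return result


for key, text in sorted(lines_from_words(SAMPLE_WORDS).items()):
    print(key, text)
```

Sorting on the `left` coordinate rather than trusting the row order keeps the sketch independent of how the rows happen to be listed.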

In [18]:
page.show(stage="ocr")

stripe	column	line	left	top	right	bottom	confidence	text
1	l	0	1028	373	1028	449	100	ا
1	l	0	1021	373	1021	449	60	ٔ
1	l	0	1001	373	1007	449	100	ع
1	l	0	973	373	980	449	100	م
1	l	0	939	373	946	449	100	م
1	l	0	911	373	918	449	100	ن
1	l	0	884	373	884	449	100	ا
1	l	0	870	373	877	449	100	ل
1	l	0	850	373	850	449	100	م
1	l	0	829	373	829	449	100	ب
1	l	0	808	373	815	449	100	ا
1	l	0	795	373	795	449	100	د
1	l	0	767	373	767	449	100	ى
1	l	0	733	373	733	449	100	ء
1	l	0	671	373	678	449	100	و
1	l	0	644	373	644	449	95	ه
1	l	0	623	373	623	449	100	و
1	l	0	582	373	589	449	98	م
1	l	0	561	373	568	449	100	ا
1	l	0	541	373	541	449	100	ي
1	l	0	527	373	527	449	93	ت
1	l	0	513	373	513	449	100	و
1	l	0	486	373	486	449	53	ق
1	l	0	465	373	472	449	100	ف
1	l	0	397	373	403	449	100	ع
1	l	0	376	373	383	449	100	ل
1	l	0	362	373	362	449	100	ي
1	l	0	349	373	349	449	100	ه
1	l	0	321	373	328	449	100	ا
1	l	0	307	373	314	449	100	ل
1	l	0	287	373	287	449	99	م
1	l	0	259	373	266	449	100	س
1	l	0	232	373	239	449	100	ا
1	l	0	218	373	225	4
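
The character-level table carries a confidence value per character, which makes it possible to list the characters that the OCR engine was least sure about. The sketch below assumes the same tab-separated layout as above; the threshold of 80 and the names `SAMPLE_CHARS` and `low_confidence` are hypothetical choices for illustration, not fusus settings.

```python
import csv
import io

# Four rows copied from the character-level output above (tab-separated).
SAMPLE_CHARS = (
    "stripe\tcolumn\tline\tleft\ttop\tright\tbottom\tconfidence\ttext\n"
    "1\tl\t0\t1028\t373\t1028\t449\t100\tا\n"
    "1\tl\t0\t1001\t373\t1007\t449\t100\tع\n"
    "1\tl\t0\t527\t373\t527\t449\t93\tت\n"
    "1\tl\t0\t486\t373\t486\t449\t53\tق\n"
)


def low_confidence(tsv_text, threshold=80):
    """Return (left, confidence, text) for characters below the threshold."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        (int(row["left"]), int(row["confidence"]), row["text"])
        for row in reader
        if int(row["confidence"]) < threshold
    ]


for left, confidence, text in low_confidence(SAMPLE_CHARS):
    print(f"x={left} confidence={confidence} char={text}")
```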

In [9]:
page.show(stage="layout")

<IPython.core.display.Image object>

In [43]:
page.show(stage="clean")

{'normalizedC', 'gray', 'layout', 'normalized', 'histogram', 'orig', 'cleanh', 'boxed', 'markData', 'demarginedC', 'ocr', 'clean', 'demargined', 'rotated'}


<IPython.core.display.Image object>

In [30]:
binary = page.stages["binary"]
clean = page.stages["clean"]

In [31]:
cleanIm = PILFromArray(clean)

In [20]:
type(binary)

numpy.ndarray

In [21]:
binary

array([[255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       ...,
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255]], dtype=uint8)

In [7]:
page.show(stage="segments")

    33s Will skip illegal stages: segments


{'normalized', 'clean', 'rotated', 'gray', 'binary', 'ocrData', 'markData', 'demargined', 'orig'}


In [29]:
from kraken.pageseg import segment
from fusus.lib import PILFromArray
from kraken.binarization import nlbin

In [25]:
im = PILFromArray(binary)

In [28]:
im.mode

'L'

In [27]:
segment(im)

{'text_direction': 'horizontal-lr',
 'boxes': [[124, 367, 2081, 469],
  [125, 464, 1106, 557],
  [1845, 478, 2080, 564],
  [659, 557, 1959, 640],
  [1008, 719, 1200, 768]],
 'script_detection': False}
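
To get a feel for what `segment` returned, the box sizes can be computed from the dictionary above. This is a minimal sketch that assumes each box is `[x0, y0, x1, y1]` (left, top, right, bottom); that ordering is an assumption made here and should be checked against the Kraken documentation. `SEGMENTATION` is a copy of the output above and `box_sizes` is an illustrative helper.

```python
# Copy of the dictionary printed by segment(im) above.
SEGMENTATION = {
    "text_direction": "horizontal-lr",
    "boxes": [
        [124, 367, 2081, 469],
        [125, 464, 1106, 557],
        [1845, 478, 2080, 564],
        [659, 557, 1959, 640],
        [1008, 719, 1200, 768],
    ],
    "script_detection": False,
}


def box_sizes(boxes):
    """Return (width, height) per box, assuming [x0, y0, x1, y1] order."""
    return [(x1 - x0, y1 - y0) for x0, y0, x1, y1 in boxes]


for box, (width, height) in zip(SEGMENTATION["boxes"], box_sizes(SEGMENTATION["boxes"])):
    print(box, "width:", width, "height:", height)
```

Under that assumption the first box runs from x=124 to x=2081, which covers the horizontal range of both the `l` and the `r` column in the word-level table of this page.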

In [23]:
segment(binary)

AttributeError: 'numpy.ndarray' object has no attribute 'mode'
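
The error arises because `segment` is handed a NumPy array, while the call that succeeded above received a PIL image (with mode `'L'`). The sketch below shows one way to make that conversion explicit with Pillow's `Image.fromarray`; `to_pil` is a hypothetical helper and is not claimed to match what `fusus.lib.PILFromArray` does internally.

```python
import numpy as np
from PIL import Image


def to_pil(array):
    """Wrap a 2-D uint8 array as a grayscale PIL image; pass PIL images through."""
    if isinstance(array, Image.Image):
        return array
    return Image.fromarray(np.asarray(array, dtype=np.uint8))


# A small all-white stand-in for the `binary` stage shown above.
demo = np.full((4, 6), 255, dtype=np.uint8)
image = to_pil(demo)
print(image.mode, image.size)
```

An image prepared this way has the `mode` attribute that the failing call could not find on the bare array.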