forked from quran/ayah-detection
# TODO: sometimes, midpoint is not the best algorithm. can we do better?
shamerly
========
settings: (110 line height, 87 max pixels, 0 for mode)
- the header for the next sura takes up two lines
- the basmala takes up one line
- so for each sura split, skip 3 lines, except for sura tawba (no basmala),
  where you skip 2.
- care must be taken for pages like 519 and 520, which end in a header (the
  actual basmala is on the next page, so it needs to be skipped there).
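a rough sketch of the skip rule above, including the case where a header ends a page and the basmala spills onto the next one. the function names and parameters are made up for illustration and are not taken from the detection scripts; only the line counts come from these notes.

```python
# hypothetical helpers; only the numbers (2 header lines, 1 basmala line)
# come from the notes.
TAWBA = 9  # sura tawba has no basmala


def lines_to_skip(sura, header_lines=2, basmala_lines=1):
    """lines to skip at a sura split: header plus basmala (header only for tawba)."""
    if sura == TAWBA:
        return header_lines
    return header_lines + basmala_lines


def split_skip(total_skip, lines_left_on_page):
    """split a skip across a page break.

    returns (skip_on_this_page, skip_carried_to_next_page).
    """
    here = min(total_skip, lines_left_on_page)
    return here, total_skip - here


if __name__ == "__main__":
    print(lines_to_skip(2))                    # ordinary sura split
    print(lines_to_skip(TAWBA))                # tawba: header only
    print(split_skip(lines_to_skip(2), 2))     # header ends the page
```

with `header_lines=1` the same helper would also describe a layout where the header is a single line.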
# TODO: need to shift all the files down by 1 (sura fatiha is currently pg 2)
known issue - for the last line of page 467, change the pixels value from
87 to 50 in order to detect it properly.
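one way to avoid hand-editing the setting for this page would be a small override table. this is only a sketch: the table and function names are hypothetical, and the page number is whatever numbering the notes use at this point.

```python
# hypothetical override table; only 87, 50 and page 467 come from the notes.
DEFAULT_MAX_PIXELS = 87

# page number -> max pixels to use for the last line of that page
LAST_LINE_OVERRIDES = {467: 50}


def max_pixels_for(page, is_last_line):
    """return the max-pixels value for a line, honoring per-page overrides."""
    if is_last_line and page in LAST_LINE_OVERRIDES:
        return LAST_LINE_OVERRIDES[page]
    return DEFAULT_MAX_PIXELS


if __name__ == "__main__":
    print(max_pixels_for(467, is_last_line=True))
    print(max_pixels_for(467, is_last_line=False))
```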
to fix up the page numbering:
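#!/usr/bin/perl
use strict;
use warnings;

# rename page002.png..page522.png to page001.png..page521.png.
# ascending order, so each target name is already free when we reach it.
for my $i (2 .. 522) {
    my $src = sprintf("page%03d.png", $i);
    my $dst = sprintf("page%03d.png", $i - 1);
    rename($src, $dst) or warn "could not rename $src to $dst: $!\n";
}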
warsh
=====
settings: (100 line height, 35 max pixels, 0 for mode)
- each sura split header is one line, and each basmala is one line
- skip 2 lines per sura split (except sura tawba, skip 1)
previously problematic pages
- pages 27, 304, 320 and 538 are detecting 16 lines
- pages 294 and 385 don't work (crash) -- done
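a quick sanity check along these lines could flag such pages automatically. this is a sketch under assumptions: the expected count of 15 lines per page is assumed rather than stated in these notes, the function name is made up, and the input dict is an illustrative example, not real detection output.

```python
def unexpected_pages(line_counts, expected=15):
    """return the sorted page numbers whose detected line count differs from expected.

    line_counts maps page number -> number of lines detected on that page.
    expected=15 is an assumption, not a value taken from the notes.
    """
    return sorted(page for page, n in line_counts.items() if n != expected)


if __name__ == "__main__":
    # illustrative input only
    sample = {26: 15, 27: 16, 303: 15, 304: 16}
    print(unexpected_pages(sample))
```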
qaloon
======
settings: (175 line height, 75 max pixels, 1 for mode
(false for transparency, can start lines at the top of the page))
- each sura split header is 2 lines
note: removing the restrict_lines flag causes any page where a sura starts
at the top of the page to detect only 14 lines. in the future, it may be
an option to drop restrict_lines and handle the 14-line case by treating one
of the lines that must be skipped as already skipped.
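the idea in the note above could look roughly like this. it is a sketch, not the current behavior: the function name is hypothetical and the expected count of 15 lines per page is an assumption.

```python
def remaining_skip(required_skip, detected_lines, expected_lines=15):
    """lines still to skip after accounting for lines the detector never found.

    if a page detects fewer lines than expected, treat the missing ones as
    lines that were already skipped. expected_lines=15 is an assumption.
    """
    already_skipped = max(expected_lines - detected_lines, 0)
    return max(required_skip - already_skipped, 0)


if __name__ == "__main__":
    print(remaining_skip(2, 14))  # 2-line header, one line already missing
    print(remaining_skip(2, 15))  # full page detected, skip both lines
```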
* important - use the white background pages; otherwise the results aren't good