Shamela: .bok files structure
Shamela bok files can be downloaded as RAR archives from every book page on the Shamela website.
Bok files are in fact mdb database files. They can be opened in MS Access, or accessed through an ODBC (Open Database Connectivity) driver, e.g., from Python with the pypyodbc module.
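As a rough illustration of the ODBC route, the following sketch builds a connection string for a bok file and lists its tables with pypyodbc. The driver name is an assumption (it is the name usually registered by the Microsoft Access ODBC driver on Windows; other setups may need a different one), and `list_tables` is a hypothetical helper, not part of any Shamela tooling.

```python
from pathlib import Path

# Assumption: the name under which the Microsoft Access ODBC driver is
# registered on Windows; adjust it to whatever driver is installed locally.
ACCESS_DRIVER = "{Microsoft Access Driver (*.mdb, *.accdb)}"


def bok_connection_string(bok_path, driver=ACCESS_DRIVER):
    """Build an ODBC connection string pointing at a .bok (mdb) file."""
    return f"Driver={driver};DBQ={Path(bok_path)};"


def list_tables(bok_path):
    """Return the names of the user tables in a .bok file (hypothetical helper)."""
    import pypyodbc  # imported here so the helper above works without it

    connection = pypyodbc.connect(bok_connection_string(bok_path))
    try:
        cursor = connection.cursor()
        cursor.tables(tableType="TABLE")
        # Each catalog row is (catalog, schema, table name, table type, remarks).
        return [row[2] for row in cursor.fetchall()]
    finally:
        connection.close()
```

Calling `list_tables("0008.bok")` (a placeholder file name) should then return names such as `Main` and the per-book text and heading tables, provided a suitable ODBC driver is installed.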
The files contain the following tables and columns:
- tables present in all files:
  - Main: contains the metadata of the file. Almost all columns are present in all bok files (the overview table below lists shrtcs in only 2664 files), but some columns contain data in only a few files.
    - betaka : a long string containing the main metadata about the book. Categories are separated by linebreaks; the heading of each category is separated from its data by a colon and a space (“: ”).
      NB: not all categories are present in every file.
      Categories:
      - الكتاب : title
      - المؤلف : author
      - المحقق : editor
      - الناشر : publisher
      - الطبعة : date and number (in Arabic ordinal numbers, e.g., الثانية) of the edition
      - عدد الأجزاء : number of volumes
      - رسالة : info on the type of dissertation (if the work is a dissertation)
      - إعداد : author (if the work is a dissertation)
      - إشراف : supervisor (if the work is a dissertation)
      - العام الجامعي : academic year (if the work is a dissertation)
      - [ترقيم الكتاب موافق للمطبوع] : digitization checked; page numbers etc. follow the print version
      - [الكتاب مرقم آليا غير موافق للمطبوع] : automatically digitized; page numbers etc. do not follow a specific print version
    - bk : book title
    - auth : short version of the author’s name
    - lng : long version of the author’s name
    - inf : information about the book
    - authinf : information on the author
    - higrid : Hijri date of the author’s death (a date, or sometimes “معاصر” if the author is still alive)
    - ad : AD/CE date of the author’s death (e.g., 855; 99999 if “muʿāṣir”, contemporary)
    - cat : genre
    - bkid : Shamela book ID
    - aseal : ? (e.g., 08BF59FA)
    - blnk : ? (e.g., None)
    - bver : ? (e.g., 1)
    - islamshort : ? (e.g., 0)
    - oauth : Shamela author ID? (e.g., 71)
    - oauthver : ? (e.g., 5)
    - onum : Shamela book ID (onum = old number?)
    - over : ? (e.g., 1)
    - pdf : ? (possibly whether a pdf is available; e.g., 1, None)
    - pdfcs : ? (e.g., 1)
    - seal : ? (e.g., FFC9D645)
    - shrtcs : ? (e.g., 0)
    - tafseernam : ? (e.g., None)
    - vername : ? (only has data in 16/7509 files)
  - bxxxx (where xxxx is the Shamela id number of the book): contains the text
    - id : id number of the current (section of a) page (note that the same page may contain a number of sections with different ids!)
    - page : the page number (as in the printed copy) of the current page
    - part : the volume number of the current page
    - nass : the text of the current page
    - sora : the number of the first sūra discussed in the current page (in tafsīr works)
    - aya : the number of the first āya discussed in the current page (in tafsīr works)
    - na : the number of āyas discussed in the current page (in tafsīr works); i.e., aya + na of the current page usually equals aya of the next page (but sometimes a number of āyas are discussed together over more than one page)
    - hno : ḥadīth number (as printed in the text)
    - seal : ? (8-digit hexadecimal code)
    - b1 : ?
    - b2 : ?
    - b3 : ?
    - b4 : ?
  - txxxx (where xxxx is the Shamela id number of the book): contains the headings of each section
    - id : id of the page section (see also bxxxx)
    - tit : title (heading) of the page section
    - lvl : level of the section (1 = highest)
    - sub : 0 if there is only one title in a section; subsequent titles in the same page section are given numbers 1, 2, etc.
  - nBound
    - bcode
    - dver
    - d
    - b
    - bver
  - abc
    - a
    - b
    - c
  - men_b
    - name
    - id
    - manid
    - bk
  - men_h
    - name
    - id
    - upg
  - men_u
    - name
    - id
    - bk
  - com
    - id
    - com
    - bk
  - oShrooh
    - sharh
    - matn
    - sharhver
    - matnver
  - Shrooh
    - sharh
    - matn
    - matnid
    - sharhid
  - oShr
    - sharh
    - matn
    - matnid
    - sharhid
  - Shorts : abbreviations used in the book
    - bk : Shamela book ID
    - ramz : abbreviation (e.g., in Shamela0008: "Q" for "السؤال")
    - nass : the expression the abbreviation refers to (e.g., in Shamela0008: "السؤال" is referred to by ramz "Q")
  - avPdf
    - onum
    - def
    - cs
    - vername
    - pdfver
  - sPdf
    - onum
    - part
    - sfilename
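The betaka format described above (one category per line, heading and data separated by “: ”) lends itself to a simple parser. The following is a minimal sketch under that description only; `parse_betaka` is a hypothetical helper, the sample string is made up, and real betaka strings may contain irregularities that this does not handle (for example, a heading with no data after the colon ends up among the notes).

```python
def parse_betaka(betaka):
    """Split a betaka string into (fields, notes).

    fields maps each category heading to its data; notes collects lines
    that do not have a "heading: data" shape, such as the bracketed
    remarks about page numbering.
    """
    fields = {}
    notes = []
    for line in betaka.splitlines():
        line = line.strip()
        if not line:
            continue
        heading, separator, value = line.partition(": ")
        if separator and not line.startswith("["):
            fields[heading.strip()] = value.strip()
        else:
            notes.append(line)
    return fields, notes


# Made-up sample in the documented shape (not taken from a real bok file).
sample = "الكتاب: Example title\nالمؤلف: Example author\n[ترقيم الكتاب موافق للمطبوع]"
fields, notes = parse_betaka(sample)
print(fields.get("الكتاب"))
print(notes)
```

Keeping the bracketed remarks separate from the keyed fields makes it easy to check later whether the page numbers of a file follow a print edition.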
The following overview is the result of an analysis of all bok files scraped from Shamela in October 2019:
| table name | column name | present in number of files | data in number of files |
|---|---|---|---|
| bxxxx | | 7509 | 7509 |
| | nass | 7509 | 7509 |
| | seal | 7509 | 7509 |
| | id | 7509 | 7509 |
| | page | 7509 | 7451 |
| | part | 7509 | 7449 |
| | hno | 1748 | 1744 |
| | na | 1027 | 100 |
| | sora | 1027 | 100 |
| | aya | 1027 | 100 |
| | b1 | 17 | 17 |
| | b2 | 1 | 1 |
| | b3 | 1 | 1 |
| | b4 | 1 | 1 |
| | blnk | 16 | 16 |
| | ppart1 | 40 | 40 |
| | ppage1 | 40 | 40 |
| | ppart2 | 7 | 7 |
| | ppage2 | 7 | 7 |
| | ppart3 | 7 | 7 |
| | ppage3 | 7 | 7 |
| | ppart4 | 2 | 2 |
| | ppage4 | 2 | 2 |
| | done | 3 | 3 |
| | bhno | 1 | 1 |
| Main | | 7509 | 7509 |
| | oauth | 7509 | 7509 |
| | auth | 7509 | 7509 |
| | bver | 7509 | 7509 |
| | oauthver | 7509 | 7509 |
| | over | 7509 | 7509 |
| | lng | 7509 | 7509 |
| | betaka | 7509 | 7509 |
| | aseal | 7509 | 7509 |
| | seal | 7509 | 7509 |
| | bk | 7509 | 7509 |
| | onum | 7509 | 7509 |
| | bkid | 7509 | 7509 |
| | ad | 7509 | 7509 |
| | cat | 7509 | 7509 |
| | islamshort | 7509 | 7508 |
| | authinf | 7509 | 7265 |
| | higrid | 7509 | 7265 |
| | pdfcs | 7509 | 6711 |
| | pdf | 7509 | 6480 |
| | inf | 7509 | 5617 |
| | shrtcs | 2664 | 2124 |
| | tafseernam | 7509 | 183 |
| | vername | 7509 | 16 |
| | blnk | 7509 | 16 |
| txxxx | | 7509 | 7507 |
| | sub | 7509 | 7507 |
| | id | 7509 | 7507 |
| | tit | 7509 | 7507 |
| | lvl | 7509 | 7507 |
| abc | | 7509 | 397 |
| | a | 7509 | 397 |
| | b | 7509 | 397 |
| | c | 7509 | 397 |
| Shorts | | 7509 | 341 |
| | nass | 7509 | 341 |
| | ramz | 7509 | 341 |
| | bk | 7509 | 341 |
| sPdf | | 7509 | 236 |
| | part | 7509 | 236 |
| | sfilename | 7509 | 236 |
| | onum | 7509 | 236 |
| men_u | | 7509 | 148 |
| | id | 7509 | 148 |
| | name | 7509 | 148 |
| | bk | 7509 | 148 |
| avPdf | | 7509 | 28 |
| | cs | 7509 | 28 |
| | def | 7509 | 28 |
| | onum | 7509 | 28 |
| | vername | 7509 | 28 |
| | pdfver | 7509 | 28 |
| men_b | | 7509 | 17 |
| | id | 7509 | 17 |
| | manid | 7509 | 17 |
| | name | 7509 | 17 |
| | bk | 7509 | 17 |
| nBound | | 7509 | 13 |
| | d | 7509 | 13 |
| | b | 7509 | 13 |
| | dver | 7509 | 13 |
| | bver | 7509 | 13 |
| | bcode | 7509 | 13 |
| oShr | | 7509 | 9 |
| | matn | 7509 | 9 |
| | sharhid | 7509 | 9 |
| | matnid | 7509 | 9 |
| | sharh | 7509 | 9 |
| oShrooh | | 7509 | 9 |
| | matn | 7509 | 9 |
| | matnver | 7509 | 9 |
| | sharhver | 7509 | 9 |
| | sharh | 7509 | 9 |
| Shrooh | | 7509 | 0 |
| | matn | 7509 | 0 |
| | sharhid | 7509 | 0 |
| | matnid | 7509 | 0 |
| | sharh | 7509 | 0 |
| com | | 7509 | 0 |
| | id | 7509 | 0 |
| | com | 7509 | 0 |
| | bk | 7509 | 0 |
| men_h | | 7509 | 0 |
| | id | 7509 | 0 |
| | name | 7509 | 0 |
| | upg | 7509 | 0 |
| 10759 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10786 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10772 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10769 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10773 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
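How the counts in this overview were originally computed is not documented here. The sketch below shows one way such a tally could be produced, assuming that each file has already been read into a dict of the form `{table name: {column name: list of values}}`, that a column "has data" when at least one value is neither `None` nor an empty string, and that the per-book tables are named `b` or `t` followed by the book id. All function names are hypothetical.

```python
import re
from collections import Counter


def normalize_table_name(name):
    """Map per-book table names such as b8 or t8 to bxxxx / txxxx."""
    match = re.fullmatch(r"([bt])\d+", name)
    return f"{match.group(1)}xxxx" if match else name


def summarize(files):
    """Count, per (table, column), the files where it exists and where it holds data.

    files is an iterable of dicts: {table name: {column name: list of values}}.
    """
    present = Counter()
    with_data = Counter()
    for tables in files:
        for table, columns in tables.items():
            table = normalize_table_name(table)
            for column, values in columns.items():
                present[(table, column)] += 1
                # Assumption: None and "" both count as "no data".
                if any(value not in (None, "") for value in values):
                    with_data[(table, column)] += 1
    return present, with_data
```

The two counters correspond to the "present in number of files" and "data in number of files" columns; table-level rows could be derived in the same way by counting on the table name alone.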
