
Shamela: .bok files structure

pverkind edited this page Nov 1, 2019 · 6 revisions

Shamela bok files can be downloaded from every book page on the Shamela website as compressed archives in the RAR format.


Bok files are in fact mdb (Microsoft Access) database files and can be opened in MS Access, or accessed through an ODBC (Open Database Connectivity) driver (e.g., via pypyodbc for Python).
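As a rough illustration of the ODBC route, the sketch below lists the tables of an already extracted bok file with pypyodbc. The driver name is an assumption (it is the name commonly registered by the Microsoft Access ODBC driver on Windows; other systems will need a different driver), and the helper names `connection_string` and `list_tables` are hypothetical.

```python
# Assumption: driver name as registered by the Microsoft Access ODBC driver
# on Windows; adjust it to whatever ODBC driver is installed locally.
ACCESS_DRIVER = "{Microsoft Access Driver (*.mdb, *.accdb)}"


def connection_string(bok_path, driver=ACCESS_DRIVER):
    """Build an ODBC connection string pointing at a .bok (mdb) file."""
    return f"Driver={driver};DBQ={bok_path};"


def list_tables(bok_path):
    """Return the names of the user tables found in a .bok file."""
    import pypyodbc  # imported lazily so connection_string() works without it

    conn = pypyodbc.connect(connection_string(bok_path))
    try:
        cursor = conn.cursor()
        cursor.tables(tableType="TABLE")
        # ODBC table metadata rows: (catalog, schema, table_name, type, remarks)
        return [row[2] for row in cursor.fetchall()]
    finally:
        conn.close()
```

Calling `list_tables("book.bok")` on a machine with a suitable driver should return names such as `Main` plus the per-book text and heading tables described in the list of tables on this page.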

The files contain the following tables and columns:

  1. tables present in all files:
  • Main: contains the metadata of the file. Nearly all columns are present in every bok file (shrtcs is the exception: it was found in only 2664 of the 7509 files analysed), but some columns contain data in only a few files.

    • betaka :

      a long string containing the main metadata about the book. The different categories are separated by line breaks; within each category, the heading is separated from the data by a colon and a space (“: ”).

      NB: not all categories are present in every file.

      Categories:

      • الكتاب : title
      • المؤلف : author
      • المحقق : editor
      • الناشر : publisher
      • الطبعة : date and number (in Arabic ordinal numbers, e.g., الثانية) of the edition
      • عدد الأجزاء : number of volumes
      • رسالة : type of dissertation (if the work is a dissertation)
      • إعداد : author (if the work is a dissertation)
      • إشراف : supervisor (if the work is a dissertation)
      • العام الجامعي : academic year (if the work is a dissertation)
      • [ترقيم الكتاب موافق للمطبوع] : digitization checked; page numbers follow the print version etc.
      • [الكتاب مرقم آليا غير موافق للمطبوع] : Automatically digitized; page numbers etc. do not follow a specific print version
    • bk : book title

    • auth : short version of the author’s name

    • lng : long version of the author’s name

    • inf : information about the book

    • authinf : information on the author

    • higrid : Hijri date of the author’s death (a date, or sometimes “معاصر”, "contemporary", if the author is still alive)

    • ad : AD/CE date of the author’s death (e.g., 855; 99999 if “muʿāṣir”, contemporary)

    • cat : genre

    • bkid : Shamela book ID

    • aseal : ? (e.g., 08BF59FA)

    • blnk : ? (e.g., None)

    • bver : ? (e.g., 1)

    • islamshort : ? (e.g., 0)

    • oauth : Shamela author ID?? (e.g., 71)

    • oauthver : ? (e.g., 5)

    • onum : Shamela book ID (onum = old number?)

    • over : ? (e.g., 1)

    • pdf : ? pdf is available? (e.g., 1, None)

    • pdfcs : ? (e.g., 1)

    • seal : ? (e.g., FFC9D645)

    • shrtcs : ? (e.g., 0)

    • tafseernam : ? (e.g., None)

    • vername : ? (only has data in 16/7509 files)

  • bxxxx (where xxxx is the Shamela id number of the book): contains the text

    • id : id number of the current (section of) page (note that the same page may contain a number of sections with different ids!)
    • page : the page number (as in the printed copy) of the current page
    • part : the volume number of the current page
    • nass : the text of the current page
    • sora : the number of the first sūra discussed in the current page (in tafsīr works)
    • aya : the number of the first āya discussed in the current page (in tafsīr works)
    • na : the number of āyas discussed in the current page (in tafsīr works); i.e., aya + na of the current page = aya of the next page (usually; sometimes, a number of āyas are discussed together in the space of more than one page)
    • hno : ḥadīth number (as printed in the text)
    • seal : ? (8-digit hexadecimal code)
    • b1 : ?
    • b2 : ?
    • b3 : ?
    • b4 : ?
  • txxxx (where xxxx is the Shamela id number of the book): contains the headings of each section

    • id : id of the page section (see also bxxxx)
    • tit : title (heading) of the page section
    • lvl : level of the section (1 = highest)
    • sub : 0 if there is only one title in a section; subsequent titles in the same page section are given numbers 1, 2, etc.
  • nBound

    • bcode
    • dver
    • d
    • b
    • bver
  • abc

    • a
    • b
    • c
  • men_b

    • name
    • id
    • manid
    • bk
  • men_h

    • name
    • id
    • upg
  • men_u

    • name
    • id
    • bk
  • com

    • id
    • com
    • bk
  • oShrooh

    • sharh
    • matn
    • sharhver
    • matnver
  • Shrooh

    • sharh
    • matn
    • matnid
    • sharhid
  • oShr

    • sharh
    • matn
    • matnid
    • sharhid
  • Shorts : abbreviations used in the book

    • bk : Shamela BookID
    • ramz : abbreviation (e.g., in Shamela0008: "Q" for "السؤال")
    • nass : the expression the abbreviation refers to (e.g., in Shamela0008: "السؤال" is referred to by ramz "Q")
  • avPdf

    • onum
    • def
    • cs
    • vername
    • pdfver
  • sPdf

    • onum
    • part
    • sfilename
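Two of the conventions described above lend themselves to a short sketch: the betaka string (one category per line, heading and data separated by ": ") and the shared id that links headings in txxxx to page sections in bxxxx. The function names and the "notes" key are hypothetical, and the code assumes the rows have already been fetched as plain tuples.

```python
def parse_betaka(betaka):
    """Split a betaka string into a {category heading: value} dict.

    Lines without a ": " separator (for example the bracketed digitization
    notes) are collected under the hypothetical key "notes".
    """
    fields = {"notes": []}
    for line in betaka.splitlines():
        line = line.strip()
        if not line:
            continue
        head, sep, value = line.partition(": ")
        if sep:
            fields[head.strip()] = value.strip()
        else:
            fields["notes"].append(line)
    return fields


def headings_by_id(t_rows):
    """Group txxxx rows given as (id, tit, lvl, sub) tuples by page-section id.

    Within one id, headings are ordered by sub (0 first, then 1, 2, ...).
    """
    grouped = {}
    for id_, tit, lvl, sub in sorted(t_rows, key=lambda r: (r[0], r[3])):
        grouped.setdefault(id_, []).append((lvl, tit))
    return grouped
```

The result of `headings_by_id` can be looked up with the id of a bxxxx row to find the headings that open that page section. Since not every category appears in every betaka, callers should read the parsed dict with `.get()` rather than direct indexing.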

Overview of the frequency of these tables and columns in the Shamela bok files:

The following overview is the result of an analysis of all bok files scraped from Shamela in October 2019:

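| table name | column name | present in number of files | data in number of files |
| --- | --- | --- | --- |
| bxxxx | | 7509 | 7509 |
| | nass | 7509 | 7509 |
| | seal | 7509 | 7509 |
| | id | 7509 | 7509 |
| | page | 7509 | 7451 |
| | part | 7509 | 7449 |
| | hno | 1748 | 1744 |
| | na | 1027 | 100 |
| | sora | 1027 | 100 |
| | aya | 1027 | 100 |
| | b1 | 17 | 17 |
| | b2 | 1 | 1 |
| | b3 | 1 | 1 |
| | b4 | 1 | 1 |
| | blnk | 16 | 16 |
| | ppart1 | 40 | 40 |
| | ppage1 | 40 | 40 |
| | ppart2 | 7 | 7 |
| | ppage2 | 7 | 7 |
| | ppart3 | 7 | 7 |
| | ppage3 | 7 | 7 |
| | ppart4 | 2 | 2 |
| | ppage4 | 2 | 2 |
| | done | 3 | 3 |
| | bhno | 1 | 1 |
| Main | | 7509 | 7509 |
| | oauth | 7509 | 7509 |
| | auth | 7509 | 7509 |
| | bver | 7509 | 7509 |
| | oauthver | 7509 | 7509 |
| | over | 7509 | 7509 |
| | lng | 7509 | 7509 |
| | betaka | 7509 | 7509 |
| | aseal | 7509 | 7509 |
| | seal | 7509 | 7509 |
| | bk | 7509 | 7509 |
| | onum | 7509 | 7509 |
| | bkid | 7509 | 7509 |
| | ad | 7509 | 7509 |
| | cat | 7509 | 7509 |
| | islamshort | 7509 | 7508 |
| | authinf | 7509 | 7265 |
| | higrid | 7509 | 7265 |
| | pdfcs | 7509 | 6711 |
| | pdf | 7509 | 6480 |
| | inf | 7509 | 5617 |
| | shrtcs | 2664 | 2124 |
| | tafseernam | 7509 | 183 |
| | vername | 7509 | 16 |
| | blnk | 7509 | 16 |
| txxxx | | 7509 | 7507 |
| | sub | 7509 | 7507 |
| | id | 7509 | 7507 |
| | tit | 7509 | 7507 |
| | lvl | 7509 | 7507 |
| abc | | 7509 | 397 |
| | a | 7509 | 397 |
| | b | 7509 | 397 |
| | c | 7509 | 397 |
| Shorts | | 7509 | 341 |
| | nass | 7509 | 341 |
| | ramz | 7509 | 341 |
| | bk | 7509 | 341 |
| sPdf | | 7509 | 236 |
| | part | 7509 | 236 |
| | sfilename | 7509 | 236 |
| | onum | 7509 | 236 |
| men_u | | 7509 | 148 |
| | id | 7509 | 148 |
| | name | 7509 | 148 |
| | bk | 7509 | 148 |
| avPdf | | 7509 | 28 |
| | cs | 7509 | 28 |
| | def | 7509 | 28 |
| | onum | 7509 | 28 |
| | vername | 7509 | 28 |
| | pdfver | 7509 | 28 |
| men_b | | 7509 | 17 |
| | id | 7509 | 17 |
| | manid | 7509 | 17 |
| | name | 7509 | 17 |
| | bk | 7509 | 17 |
| nBound | | 7509 | 13 |
| | d | 7509 | 13 |
| | b | 7509 | 13 |
| | dver | 7509 | 13 |
| | bver | 7509 | 13 |
| | bcode | 7509 | 13 |
| oShr | | 7509 | 9 |
| | matn | 7509 | 9 |
| | sharhid | 7509 | 9 |
| | matnid | 7509 | 9 |
| | sharh | 7509 | 9 |
| oShrooh | | 7509 | 9 |
| | matn | 7509 | 9 |
| | matnver | 7509 | 9 |
| | sharhver | 7509 | 9 |
| | sharh | 7509 | 9 |
| Shrooh | | 7509 | 0 |
| | matn | 7509 | 0 |
| | sharhid | 7509 | 0 |
| | matnid | 7509 | 0 |
| | sharh | 7509 | 0 |
| com | | 7509 | 0 |
| | id | 7509 | 0 |
| | com | 7509 | 0 |
| | bk | 7509 | 0 |
| men_h | | 7509 | 0 |
| | id | 7509 | 0 |
| | name | 7509 | 0 |
| | upg | 7509 | 0 |
| 10759 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10786 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10772 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10769 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |
| 10773 | | 1 | 1 |
| | wrd | 1 | 1 |
| | pos | 1 | 1 |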
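Counts of this kind could be produced along the following lines. This is only a sketch of one possible tally, not the script actually used for the analysis; it assumes that each bok file has already been summarised as a dict mapping (table, column) to the number of non-empty values, and the names `normalise_table` and `tally` are hypothetical.

```python
import re
from collections import Counter


def normalise_table(name):
    """Map per-book table names such as b1234 or t1234 to bxxxx / txxxx."""
    match = re.fullmatch(r"([bt])\d+", name)
    return f"{match.group(1)}xxxx" if match else name


def tally(files):
    """Count in how many files each (table, column) is present and has data.

    files: iterable of {(table, column): number_of_non_empty_values} dicts,
    one dict per bok file (an assumed, pre-computed summary).
    """
    present, with_data = Counter(), Counter()
    for columns in files:
        seen, filled = set(), set()
        for (table, column), n_values in columns.items():
            key = (normalise_table(table), column)
            seen.add(key)
            if n_values > 0:
                filled.add(key)
        # Each file contributes at most one count per (table, column) pair.
        present.update(seen)
        with_data.update(filled)
    return present, with_data
```

Tables whose names consist only of digits (such as 10759) are left unchanged by `normalise_table`, which matches how they appear as separate entries in the overview.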
