
Conversion step 1: RAR to BOK

pverkind edited this page Oct 22, 2019 · 1 revision

Books are downloaded from the Shamela website in either epub or bok format. The bok files have the advantage that they contain a table of contents which can be used to automatically tag section headings in the book.

The bok files are distributed as RAR archives, which can be extracted with an archive utility such as WinRAR.

Each RAR archive is named after the Shamela book id, while the BOK file inside it is named after the Arabic title of the book. After extracting the archives, we rename each BOK file so that it has the same file name as its RAR archive (that is, the book id). This can be automated in different ways:

  1. using a shell script:
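#!/bin/bash
# Extract every RAR archive in 1_rar into its own folder under 2_bok_TEMP.
# Requires the unrar command-line tool.

srcFolder="1_rar"
trgFolder="2_bok_TEMP"

mkdir -p "$trgFolder"

cd "./$srcFolder" || exit 1
# do not run the loop at all if the folder contains no RAR archives:
shopt -s nullglob
for rarFile in *.rar
do
    echo "$rarFile"
    # each archive gets a folder named after it, e.g. 2_bok_TEMP/<book id>.rar/
    mkdir -p "../$trgFolder/$rarFile"
    # the trailing slash makes unrar treat the last argument as the destination folder
    unrar e "$rarFile" "../$trgFolder/$rarFile/"
done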
  2. using a Python script:
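#!/usr/bin/env python
import os
import shutil

import rarfile  # third-party package: pip install rarfile

# point Python to the location of the Unrar.exe program on your machine:
# (the program can be downloaded from https://www.win-rar.com/download.html)
rarfile.UNRAR_TOOL = r"C:\Program Files\WinRAR\UnRaR.exe"

def unRAR(source_fp, bok_dir):
    """Extract the bok file from a RAR archive and name it after the archive."""
    # the archive name (without extension) is the Shamela book id:
    bookid = os.path.splitext(os.path.basename(source_fp))[0]
    # extract the archive into a temporary directory next to it:
    temp_dir = os.path.splitext(source_fp)[0]
    with rarfile.RarFile(source_fp) as rf:
        rf.extractall(temp_dir)
    # move the bok file to the bok directory, renamed to the book id:
    for fn in os.listdir(temp_dir):
        if fn.lower().endswith(".bok"):
            shutil.move(os.path.join(temp_dir, fn), os.path.join(bok_dir, bookid + ".bok"))
    # remove the temp_dir:
    shutil.rmtree(temp_dir)

src_folder = r".\1_rar"
bok_folder = r".\2_bok"
# create the output folder if it does not exist yet:
os.makedirs(bok_folder, exist_ok=True)
for fn in os.listdir(src_folder):
    # skip anything that is not a RAR archive:
    if fn.lower().endswith(".rar"):
        unRAR(os.path.join(src_folder, fn), bok_folder)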
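
The shell script above only extracts the archives into `2_bok_TEMP`; it does not rename the BOK files. The renaming step could be sketched in Python as follows. This is a minimal sketch, not part of the original workflow: the function name `collect_bok_files` is hypothetical, and it assumes the folder layout the shell script produces (`2_bok_TEMP/<book id>.rar/<Arabic title>.bok`) with exactly one BOK file per archive.

```python
import shutil
from pathlib import Path


def collect_bok_files(temp_root, bok_dir):
    """Move each extracted BOK file to bok_dir/<book id>.bok.

    Assumed layout (as produced by the shell script):
    temp_root/<book id>.rar/<Arabic title>.bok
    """
    temp_root, bok_dir = Path(temp_root), Path(bok_dir)
    bok_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for sub in sorted(temp_root.iterdir()):
        if not sub.is_dir():
            continue
        # the folder is named after the archive; strip ".rar" to get the book id
        name = sub.name
        book_id = name[:-4] if name.lower().endswith(".rar") else name
        bok_files = [p for p in sub.iterdir() if p.suffix.lower() == ".bok"]
        if len(bok_files) != 1:
            # skip ambiguous folders instead of guessing which file to keep
            print(f"skipping {name}: found {len(bok_files)} BOK files")
            continue
        target = bok_dir / (book_id + ".bok")
        shutil.move(str(bok_files[0]), str(target))
        moved.append(target)
    return moved
```

Calling `collect_bok_files("2_bok_TEMP", "2_bok")` from the folder that contains `2_bok_TEMP` would then leave one `<book id>.bok` file per archive in `2_bok`. Folders without exactly one BOK file are reported and left untouched, so they can be checked by hand.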
