Conversion step 1: RAR to BOK
pverkind edited this page Oct 22, 2019 · 1 revision
Books are downloaded from the Shamela website in either epub or bok format. The bok files have the advantage that they contain a table of contents, which can be used to automatically tag section headings in the book.
The bok files are distributed as RAR archives.
These can be extracted with a utility such as WinRAR.
Each RAR archive is named after the Shamela book id, while the enclosed BOK file is named after the Arabic title of the book. After extracting the archives, we rename each BOK file to match its archive, so that it carries the book id. This can be automated in different ways:
- using a shell script:
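#!/bin/bash
# extract each RAR archive in 1_rar into its own folder under 2_bok_TEMP
# (the extracted BOK files keep their Arabic titles at this stage)
srcFolder="1_rar"
trgFolder="2_bok_TEMP"
mkdir -p "$trgFolder"
cd "./$srcFolder" || exit 1
for rarFile in *.rar
do
    echo "$rarFile"
    # one folder per archive, named after the archive file
    mkdir -p "../$trgFolder/$rarFile"
    unrar e "$rarFile" "../$trgFolder/$rarFile/"
done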
- using a python script:
#! /usr/bin/env python
import os
import shutil

import rarfile

# point Python to the location of the Unrar.exe program on your machine:
# (the program can be downloaded from https://www.win-rar.com/download.html)
rarfile.UNRAR_TOOL = r"C:\Program Files\WinRAR\UnRaR.exe"


def unRAR(source_fp, bok_dir):
    """Extract the bok file from a RAR archive and name it after the archive."""
    # extract the bok file into a temporary directory:
    bookid = os.path.splitext(os.path.basename(source_fp))[0]
    temp_dir = os.path.splitext(source_fp)[0]
    with rarfile.RarFile(source_fp) as rf:
        rf.extractall(temp_dir)
    # move the file to the bok directory, renamed to the book id
    # (os.replace overwrites a bok file left over from an earlier run):
    for fn in os.listdir(temp_dir):
        if fn.lower().endswith(".bok"):
            os.replace(os.path.join(temp_dir, fn),
                       os.path.join(bok_dir, bookid + ".bok"))
    # remove the temp_dir:
    shutil.rmtree(temp_dir)


src_folder = r".\1_rar"
bok_folder = r".\2_bok"
# make sure the target folder exists before moving files into it:
os.makedirs(bok_folder, exist_ok=True)
for fn in os.listdir(src_folder):
    # skip anything in the source folder that is not a RAR archive:
    if fn.lower().endswith(".rar"):
        unRAR(os.path.join(src_folder, fn), bok_folder)
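
The shell script only extracts the archives: each BOK file ends up, still under its Arabic title, in a folder such as 2_bok_TEMP/123.rar. The renaming step described above could then be done along the following lines. This is only a sketch that assumes that folder layout; the function name collect_boks and its return value are illustrative and not part of the original workflow.

```python
import os
import shutil


def collect_boks(temp_dir, bok_dir):
    """Move every extracted BOK file to bok_dir/<book id>.bok.

    Assumes temp_dir holds one sub-folder per archive, named after the
    archive file (e.g. "123.rar"), as produced by the shell script.
    Returns the names of the sub-folders in which no BOK file was found.
    """
    os.makedirs(bok_dir, exist_ok=True)
    missing = []
    for folder in sorted(os.listdir(temp_dir)):
        folder_fp = os.path.join(temp_dir, folder)
        if not os.path.isdir(folder_fp):
            continue
        # "123.rar" -> "123": the book id is the archive name without extension
        bookid = os.path.splitext(folder)[0]
        boks = sorted(fn for fn in os.listdir(folder_fp)
                      if fn.lower().endswith(".bok"))
        if not boks:
            missing.append(folder)
            continue
        # assumption: one BOK file per archive; if there are more, take the first
        shutil.move(os.path.join(folder_fp, boks[0]),
                    os.path.join(bok_dir, bookid + ".bok"))
    return missing
```

Calling collect_boks("2_bok_TEMP", "2_bok") from the folder that contains both directories moves the files and returns the list of archive folders that need a manual check, for instance because the archive did not contain a BOK file.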