Convert PDF to TXT.
- Install:
pip install aradf
- Install Tesseract, and include the Arabic training data during installation. Download it from: https://github.com/UB-Mannheim/tesseract/wiki
- Convert PDF to TXT:
from aradf import convertor
# Get the text; a .txt file is also saved in the same directory as the PDF.
txt = convertor.pdf_to_txt('path/to/pdf_file')
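
Because the conversion depends on Tesseract having the Arabic training data, it can help to confirm that before converting. The following is a minimal sketch, not part of aradf: the helper names parse_tesseract_langs and arabic_data_installed are hypothetical, and it assumes the tesseract executable is on PATH (how aradf itself locates Tesseract is not stated here). It relies only on the standard `tesseract --list-langs` command.

```python
import shutil
import subprocess


def parse_tesseract_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        codes.append(line)
    return codes


def arabic_data_installed() -> bool:
    """Return True if `tesseract` is on PATH and lists the `ara` language."""
    if shutil.which("tesseract") is None:
        return False
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Older Tesseract builds may print the list to stderr, so check both streams.
    combined = result.stdout + "\n" + result.stderr
    return "ara" in parse_tesseract_langs(combined)
```

If arabic_data_installed() returns False, either Tesseract is not on PATH or the Arabic (ara) data was not selected during installation; in that case, re-run the installer before calling convertor.pdf_to_txt.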