Persian-English QA System
- What This Project Does: AI Book Translator (Offline, Local)
This project is an AI-powered system that helps users read English books, ask questions in Dari, and get accurate answers and translations, all offline, with no internet connection or API keys required.
- How It Works (Step-by-Step):
1. Reads the Book (PDF): the system loads the book and splits the text into smaller chunks.
2. Embeds the Text (Vectorization): each chunk is converted into a numerical vector with an embedding model and stored in a vector database for fast retrieval.
3. User Asks a Question in Dari: the user types a question in Dari (e.g., "What is the main idea of chapter 2?").
4. Translates the Question to English: the question is automatically translated into English to match the book's content.
5. Finds the Most Relevant Parts of the Book: the system searches the vector database and retrieves the sections most relevant to the question.
6. Uses an LLM to Answer: a local language model (like Mistral) reads those sections and generates a response.
7. Translates the Answer Back to Dari: the English answer is translated back to Dari and shown to the user.
Key Features:
- 100% Offline: no internet connection or API keys needed
- Dari language input and output
- Private & Secure: all data stays on your local device
- Runs on mid-range laptops: no expensive hardware required
- Built on open-source models and libraries
Installation:
- Install requirements: pip install -r requirements.txt
- Download models: put Mistral-7B-Instruct.gguf in models/, and the MT5 translation models in models/mt5-fa-en and models/mt5-en-fa
- Create the vector store: python setup.py
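The README doesn't show setup.py itself, so here is a minimal sketch of the ingestion it describes (load the PDF, split it, embed the chunks with MiniLM, save a FAISS index), assuming the LangChain stack listed in the component sections below. The book path and index directory are placeholders, and import locations vary across LangChain versions.

```python
# Sketch of the vector-store build that setup.py is described as doing.
# Paths are placeholders; import paths differ between LangChain versions.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

pages = PyPDFLoader("books/a_brief_history_of_time.pdf").load()  # placeholder path

# Recursive splitting keeps paragraphs and sentences together where possible.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)

# MiniLM sentence embeddings, persisted in a local FAISS index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
FAISS.from_documents(chunks, embeddings).save_local("vector_store")  # placeholder dir
```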
Usage:
```python
from src.workflow import run_qa

persian_question = "نویسنده این کتاب کیست؟"  # "Who is the author of this book?"
answer = run_qa(persian_question)
print(answer)
```
- Message Examples (tested with "A Brief History of Time")

User (Persian): سلام! این سیستم چطور کار میکند؟ ("Hello! How does this system work?")
System: سلام! این سیستم میتواند به سوالات شما درباره کتابهای انگلیسی پاسخ دهد. لطفا سوال خود را بپرسید. ("Hello! This system can answer your questions about English books. Please ask your question.")
User (Persian): نویسنده کتاب "تاریخچه زمان" کیست؟
System (Processing Steps):
- Translating to English: "Who is the author of 'A Brief History of Time'?"
- Searching PDF content...
- Found relevant passage: "A Brief History of Time by Stephen Hawking..."
- Generating answer: "The author is Stephen Hawking."
- Translating to Persian: "نویسنده استیون هاوکینگ است."
User (Persian): موضوع اصلی کتاب چیست؟
System (Processing Steps):
- Translating to English: "What is the main subject of the book?"
- Searching PDF content...
- Found relevant passage: "This book explores the nature of time, black holes..."
- Generating answer: "It explores cosmology and theoretical physics."
- Translating to Persian: "کتاب به بررسی کیهانشناسی و فیزیک نظری میپردازد."
User (Persian): یک نقل قول معروف از کتاب بگویید
System (Processing Steps):
- Translating to English: "Tell me a famous quote from the book"
- Searching PDF content...
- Found passage: "The universe doesn't allow perfection..."
- Generating answer: "One famous quote is: 'The universe doesn't allow perfection.'"
- Translating to Persian: "یک نقل قول معروف: 'جهان به کمال اجازه وجود نمیدهد.'"
User (Persian): متشکرم ("Thank you")
System: خواهش میکنم! برای سوالات دیگر درباره کتاب در خدمتم. ("You're welcome! I'm happy to help with other questions about the book.")
Workflow Steps:
1. Translate the Persian question → English
2. Retrieve relevant context from the PDF
3. Generate an English answer using Mistral
4. Translate the English answer → Persian
PDF Processing:
- Uses PyPDFLoader for text extraction
- Recursive text splitting preserves context

Translation:
- Local MT5 models for Persian ↔ English
- Separate models for each translation direction
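As a rough illustration, direction-specific translation with the two local MT5 checkpoints could look like the sketch below; the generation settings are assumptions, and a real implementation would load each model once rather than per call.

```python
# Sketch of direction-specific translation with the local MT5 models.
# Generation settings are assumptions; production code should cache the models.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def translate(text: str, model_dir: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)

english_q = translate("نویسنده این کتاب کیست؟", "models/mt5-fa-en")          # fa → en
persian_a = translate("The author is Stephen Hawking.", "models/mt5-en-fa")  # en → fa
```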
Vector Search:
- FAISS for efficient similarity search
- MiniLM embeddings for document retrieval
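Query-time retrieval then loads the saved index and pulls the top-k most similar chunks. A hedged sketch, reusing the placeholder index directory from the setup sketch above (recent LangChain versions require the allow_dangerous_deserialization flag for pickle-backed FAISS indexes):

```python
# Sketch of query-time retrieval against the saved FAISS index.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.load_local("vector_store", embeddings,
                         allow_dangerous_deserialization=True)

docs = store.similarity_search("Who is the author of this book?", k=3)
context = "\n\n".join(doc.page_content for doc in docs)  # fed to the LLM prompt
```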
LLM Integration:
- Mistral-7B via llama.cpp for local inference
- Prompt engineering for QA tasks
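The inference call itself might look roughly like this llama-cpp-python sketch; the prompt template and sampling parameters are assumptions, not the project's exact ones, and n_gpu_layers is the knob mentioned in the note at the end of this README.

```python
# Sketch of local QA inference with llama-cpp-python. The prompt template and
# sampling settings are assumptions; context/question come from earlier steps.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Mistral-7B-Instruct.gguf",
    n_ctx=4096,      # context window
    n_gpu_layers=0,  # raise this on machines with a GPU
)

question = "Who is the author of this book?"
context = "A Brief History of Time by Stephen Hawking explores..."  # retrieved text

prompt = (
    "[INST] Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question} [/INST]"
)
result = llm(prompt, max_tokens=256, temperature=0.1)
print(result["choices"][0]["text"].strip())
```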
Workflow Orchestration:
- LangGraph state management
- Clear node definitions for each processing step
- Typed state transitions
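The graph wiring might look roughly like the LangGraph sketch below: a TypedDict state plus one node per workflow step. The node bodies here are placeholders standing in for the MT5, FAISS, and Mistral calls sketched above, not the project's actual node implementations.

```python
# Sketch of the LangGraph wiring: typed state, one node per workflow step.
# Node bodies are placeholders for the components sketched above.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class QAState(TypedDict):
    persian_question: str
    english_question: str
    context: str
    english_answer: str
    persian_answer: str

def translate_question(state: QAState) -> dict:
    return {"english_question": state["persian_question"]}  # placeholder: MT5 fa → en

def retrieve_context(state: QAState) -> dict:
    return {"context": ""}  # placeholder: FAISS similarity search

def generate_answer(state: QAState) -> dict:
    return {"english_answer": ""}  # placeholder: Mistral via llama.cpp

def translate_answer(state: QAState) -> dict:
    return {"persian_answer": state["english_answer"]}  # placeholder: MT5 en → fa

graph = StateGraph(QAState)
graph.add_node("translate_question", translate_question)
graph.add_node("retrieve_context", retrieve_context)
graph.add_node("generate_answer", generate_answer)
graph.add_node("translate_answer", translate_answer)
graph.add_edge(START, "translate_question")
graph.add_edge("translate_question", "retrieve_context")
graph.add_edge("retrieve_context", "generate_answer")
graph.add_edge("generate_answer", "translate_answer")
graph.add_edge("translate_answer", END)

app = graph.compile()  # app.invoke({"persian_question": "..."}) runs the pipeline
```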
To use the system:
- Run setup.py to create the vector store
- Call run_qa() with Persian questions
- Get Persian answers back
The implementation focuses on:
- Local execution only (no API dependencies)
- Modular components
- Efficient document retrieval
- Clear state transitions
- Minimal external dependencies
Note: Adjust n_gpu_layers in llm_query.py based on your GPU capabilities.