In this project, I present a Retrieval-Augmented Generation (RAG) system designed to assist translators and project managers working with style guides.
The current version allows users to query the European Commission’s English Style Guide and instantly receive precise, grounded answers, eliminating the need for manual searches.
The application integrates Weaviate, Amazon Bedrock (Claude 3 Sonnet), and Streamlit in a full end-to-end pipeline covering document parsing, embedding, retrieval, and LLM response generation.
It was containerized with Docker and deployed on AWS EC2 for demonstration purposes. Click the image above to watch the full demonstration. The live AWS deployment has been discontinued to manage costs.
- Answers complex style-related questions directly from the official guide.
- Leverages hybrid semantic–keyword retrieval for high precision.
- Uses prompt-engineered LLM generation to minimize hallucinations.
- Deploys consistently via its containerized Docker architecture.
- Allows easy integration of new sources: terminology databases, translation memories, or domain-specific style rules.
- Translator Efficiency: Get quick, reliable answers to style questions, eliminating manual search time and increasing translation speed.
- Project Consistency: Maintain consistent terminology and formatting across documents and languages.
- Onboarding & Training: Help new translators learn institutional style rules through direct, interactive access.
- Quality Assurance: Support editors and project managers in checking compliance with official style standards.
The system implements a RAG workflow to ensure accurate, grounded answers; illustrative code sketches follow the table:
| Stage | Description |
|---|---|
| 1. Document Parsing | Extracts text from the European Commission’s English Style Guide using PyMuPDF, cleaning layout noise such as headers, footers, and page numbers. |
| 2. Chunking | Splits the text into coherent chunks aligned with the document’s structure (parts, chapters, sections, subsections). |
| 3. Embedding Generation | Converts each chunk into dense vectors using BAAI/bge-large-en-v1.5, capturing semantic meaning for retrieval. |
| 4. Vector Storage & Indexing | Stores embeddings and metadata in Weaviate, supporting semantic, keyword, and hybrid searches. |
| 5. Query Processing & Retrieval | Through the Streamlit interface, the user submits a query, which is embedded and matched against stored vectors using hybrid search (α = 0.7) to retrieve the most relevant chunks. |
| 6. Prompt Construction | Retrieved context is combined into a structured prompt that defines task scope, enforces factual grounding, and prevents out-of-context answers. |
| 7. LLM Response Generation | The Streamlit app sends the prompt to Claude 3 Sonnet via Amazon Bedrock, generates the final grounded answer, and displays performance metrics (retrieval time, LLM latency, and total response time). |
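To make the pipeline concrete, here is a minimal sketch of stages 1–2 (parsing and section-based chunking). The cleanup rule and the heading regex are illustrative assumptions, not the exact logic in the preprocessing notebook:

```python
import re
import fitz  # PyMuPDF

def extract_text(pdf_path: str) -> str:
    """Extract raw text from the style guide, dropping obvious layout noise."""
    doc = fitz.open(pdf_path)
    pages = []
    for page in doc:
        # Illustrative cleanup: drop lines that are just page numbers.
        lines = [ln for ln in page.get_text().splitlines()
                 if not re.fullmatch(r"\d+", ln.strip())]
        pages.append("\n".join(lines))
    return "\n".join(pages)

def split_by_sections(text: str) -> list[dict]:
    """Section-based chunking: split on numbered headings such as '1.2 Punctuation'.
    The regex is a hypothetical stand-in for the real document-structure logic."""
    heading = re.compile(r"^\d+(?:\.\d+)*\s+\S", re.MULTILINE)
    starts = [m.start() for m in heading.finditer(text)] or [0]
    chunks = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        body = text[start:end].strip()
        chunks.append({"section": body.splitlines()[0], "text": body})
    return chunks
```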
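Stages 3–4 pair the BGE embedder with Weaviate. A minimal ingestion sketch, assuming the v4 Python client and a hypothetical `StyleGuideChunk` collection (schema creation omitted); the real version lives in `app/ingest_data.py`:

```python
import weaviate
from sentence_transformers import SentenceTransformer

# BAAI/bge-large-en-v1.5 produces 1024-dimensional embeddings; normalization
# keeps cosine similarity well behaved.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Reuses the helpers from the previous sketch; the PDF filename is illustrative.
chunks = split_by_sections(extract_text("data/raw/english_style_guide.pdf"))

client = weaviate.connect_to_local()  # the Weaviate instance started by docker-compose
collection = client.collections.get("StyleGuideChunk")  # hypothetical collection name

for chunk in chunks:
    vector = model.encode(chunk["text"], normalize_embeddings=True)
    collection.data.insert(
        properties={"section": chunk["section"], "text": chunk["text"]},
        vector=vector.tolist(),
    )
client.close()
```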
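And the query path (stages 5–7): embed the question, run a hybrid search with α = 0.7, assemble a grounded prompt, and call Claude 3 Sonnet through Bedrock. The prompt wording, collection name, and property names are assumptions; AWS credentials and region come from your environment (see `.env.example`):

```python
import json
import boto3
import weaviate
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
client = weaviate.connect_to_local()
collection = client.collections.get("StyleGuideChunk")  # hypothetical name

question = "Should I write 'e.g.' or 'for example' in formal documents?"
# BGE recommends an instruction prefix when embedding queries for retrieval.
query_vec = model.encode(
    "Represent this sentence for searching relevant passages: " + question,
    normalize_embeddings=True,
)

# Hybrid search: alpha = 0.7 weights vector similarity over BM25 keyword matching.
result = collection.query.hybrid(
    query=question, vector=query_vec.tolist(), alpha=0.7, limit=5
)
context = "\n\n".join(obj.properties["text"] for obj in result.objects)

prompt = (
    "You are an assistant for the EU English Style Guide. Answer ONLY from the "
    f"context below; if the answer is not there, say so.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

bedrock = boto3.client("bedrock-runtime")
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
answer = json.loads(response["body"].read())["content"][0]["text"]
print(answer)
client.close()
```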
```
knowledge-assistant/
│
├── app/                                  # Streamlit application logic
│   ├── app.py                            # Main UI and chat-based interaction with the RAG backend
│   ├── ingest_data.py                    # Weaviate ingestion and embedding initialization
│   └── .streamlit/config.toml            # Streamlit layout configuration
│
├── data/                                 # Datasets and source documents
│   ├── raw/                              # Source PDF (EU English Style Guide)
│   ├── processed/                        # Cleaned and chunked text data
│   └── evaluation/ground_truth.json      # Ground-truth Q&A pairs for benchmarking
│
├── figures/                              # Images and diagrams for the README
│   ├── knowledge_assistant.png           # App demo screenshot (linked to YouTube)
│   └── rag_workflow.svg                  # System architecture diagram
│
├── notebooks/                            # Full analysis and experiment documentation
│   ├── 01_data_exploration_and_preprocessing.ipynb
│   ├── 02_embedding_and_vectordb_setup.ipynb
│   ├── 03_chunking_and_retrieval_evaluation.ipynb
│   └── 04_rag_implementation_and_prompt_engineering.ipynb
│
├── results/                              # Evaluation outputs and model comparison results
│   ├── retrieval_evaluation_results.json # Results from chunking and search strategy evaluation
│   └── rag_evaluation_results.json       # Prompt engineering and hallucination evaluation
│
├── Dockerfile                            # Defines the Docker image for local deployment of the RAG app
├── docker-compose.yml                    # Launches Weaviate and Streamlit services
├── config.yaml                           # Specifies embedding model, Weaviate, and Bedrock LLM settings
├── requirements.txt                      # Python dependencies for the RAG pipeline
└── .env.example                          # Template for AWS and Weaviate connection settings
```
To build a high-performing RAG system, I tested key design choices (an evaluation-harness sketch follows this list):
- Chunking strategies: Compared section-based chunking (preserving document hierarchy) against fixed-size chunking (baseline).
- Retrieval methods: Evaluated pure keyword search (BM25), pure semantic (vector) search, and hybrid search at two weightings (α = 0.3 and α = 0.7).
- Prompt design: Tested a basic prompt against an engineered prompt that included role definition, strict grounding rules, and fallback handling for out-of-scope queries.
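In outline, the evaluation harness loops over the ground-truth set and scores retrieved sections against the expected ones. A minimal sketch, assuming hypothetical `question`/`relevant_sections` field names (the real schema is in `data/evaluation/ground_truth.json`) and plain rather than weighted averaging:

```python
import json
from typing import Callable, Iterable

def precision_recall_f1(retrieved: set[str], expected: set[str]) -> tuple[float, float, float]:
    """Set-based retrieval metrics for a single query."""
    if not retrieved or not expected:
        return 0.0, 0.0, 0.0
    hits = len(retrieved & expected)
    precision = hits / len(retrieved)
    recall = hits / len(expected)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def evaluate(run_search: Callable[[str], Iterable[str]],
             path: str = "data/evaluation/ground_truth.json") -> float:
    """Score a retrieval function against the ground-truth Q&A pairs."""
    with open(path) as f:
        ground_truth = json.load(f)
    scores = [
        precision_recall_f1(set(run_search(item["question"])),
                            set(item["relevant_sections"]))
        for item in ground_truth
    ]
    return sum(f1 for _, _, f1 in scores) / len(scores)

# Usage: pass a function mapping a question to retrieved section IDs,
# e.g. a thin wrapper around the hybrid query shown earlier:
# mean_f1 = evaluate(lambda q: hybrid_sections(q, alpha=0.7))
```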
- Section-based chunks outperformed fixed-size chunks by 33% in F1-score.
- Preserving the document’s hierarchical structure (Parts → Chapters → Sections) improved retrieval quality.
- Hybrid search (α = 0.7) achieved the best performance, with an F1-score of 0.733.
- It outperformed both BM25 and pure vector search, indicating that the right mix of semantic and keyword matching improves overall retrieval quality.
- The engineered prompt achieved 100% factual grounding (0% hallucination) across all 10 evaluated queries, including out-of-scope edge cases.
- Prompting strategies like role definition, explicit instructions, and fallback handling prevented unsupported answers (a template in this spirit is sketched after this list).
- 37.3% overall improvement from baseline to optimized system.
- Successfully handled edge cases, including queries outside the scope of the source document.
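For illustration, an engineered prompt in this spirit could look like the template below. The exact wording used in the project differs, so treat this as a sketch of the role definition, strict grounding rule, and out-of-scope fallback:

```python
# Hypothetical template illustrating role definition, strict grounding,
# and fallback handling; not the exact prompt from the notebooks.
ENGINEERED_PROMPT = """\
You are a style-guide assistant for translators working with the
European Commission's English Style Guide.

Rules:
1. Answer using ONLY the context provided below.
2. Cite the relevant section or rule where possible.
3. If the context does not contain the answer, reply exactly:
   "The English Style Guide does not cover this question."
4. Never invent rules, sections, or examples.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    return ENGINEERED_PROMPT.format(context=context, question=question)
```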
| Category | Tools |
|---|---|
| Data Handling | PyMuPDF, tiktoken |
| Embeddings Model | BAAI/bge-large-en-v1.5 (Hugging Face) |
| Vector Database | Weaviate |
| LLM | Claude 3 Sonnet via Amazon Bedrock |
| Web Framework | Streamlit |
| Containerization | Docker, Docker Compose |
| Cloud Deployment | AWS EC2 (temporary demo deployment) |
| Evaluation | Custom ground-truth dataset with weighted metrics (precision, recall, F1-score) |
```bash
git clone https://github.com/MathRC/knowledge-assistant.git
cd knowledge-assistant
```

Create a `.env` file based on `.env.example` to define your AWS credentials and Weaviate settings.

```bash
docker-compose up --build
```

This will:
- Start a Weaviate vector database instance locally
- Build the app container
- Launch the Streamlit interface
Open your browser at: http://localhost:8501
- Expand the knowledge base with terminology databases, translation memories, and project-specific resources
- Support multilingual style guides (e.g., French, German, Spanish)
- Implement advanced retrieval methods such as query expansion, recursive retrieval, and re-ranking
- Add caching, monitoring tools, and user feedback systems
Matheus Chaud
NLP Engineer & Senior Translator
Connect: LinkedIn • GitHub
My next project will focus on exploring Graph RAG, so stay tuned!
