DarijaAssistant is a Python library designed to assist in translating Moroccan Darija (a dialect of Arabic) into English. It integrates two main functionalities:
- Assisted Translation: The DarijaAssistant class supports the translation of words and sentences with a custom word-distance algorithm, improving accuracy on difficult or ambiguous phrases.
- LLM Client: A client for interacting with any language model (LLM) hosted at any URL. The library also has built-in support for OpenAI’s GPT models: provide an OpenAI API key and a model name, and it works out of the box.
This library allows users to perform both raw and assisted translations, improving the contextual understanding of Moroccan Darija sentences through caching, normalization, and additional linguistic analysis.
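To make the idea of word-distance assistance concrete, here is a minimal sketch of a nearest-match lookup based on plain Levenshtein distance. It is not the library's actual algorithm, which this README does not specify; the function names `edit_distance` and `closest_entry`, the `max_distance` threshold, and the toy lexicon are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def closest_entry(
    word: str, lexicon: Dict[str, str], max_distance: int = 2
) -> Optional[Tuple[str, str, int]]:
    """Return (entry, translation, distance) for the nearest key, or None."""
    normalized = word.strip().lower()  # simple normalization step
    best = None
    for entry, translation in lexicon.items():
        distance = edit_distance(normalized, entry)
        if distance <= max_distance and (best is None or distance < best[2]):
            best = (entry, translation, distance)
    return best


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"khona": "brother", "sahbi": "my friend"}
print(closest_entry("khouna", toy_lexicon))  # ('khona', 'brother', 1)
```

A lookup like this lets a misspelled or variant romanization ("khouna") resolve to a known entry ("khona"), which can then be passed to the LLM as a hint.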
To install the library, run:
pip install DarijaTranslatorAssistant

You can choose between a model hosted at any URL or OpenAI. Here's how to initialize the client:
from DarijaTranslatorAssistant.llm_client import LLMClient
# Example using OpenAI GPT model
llm_client = LLMClient(use_openai=True, openai_api_key="your_openai_api_key", openai_model="gpt-4o")
# Example using an LLM hosted at a specific URL
llm_client = LLMClient(llm_url="http://your-llm-url.com", use_openai=False)

You can perform a direct translation using the LLM client.
sentence = "law3lm asahbi"
# only uses OpenAI's gpt-4o
translation_without_assistance = llm_client.translate(sentence)
print(translation_without_assistance)
# [output]: The world, my friend.

For more context-aware translation, use the DarijaAssistant class. This will assist the translation process by leveraging a word-distance algorithm.
from DarijaTranslatorAssistant.darija_assistant import DarijaAssistant
# Initialize DarijaAssistant with the LLM client
assistant = DarijaAssistant(llm_client=llm_client)
# Use assisted translation: OpenAI's gpt-4o + DarijaAssistant
sentence = "law3lm asahbi"
result = assistant.assist_and_translate(sentence)
print(result)
# [output]: I do not know my friend.

The table below compares unassisted GPT-4o translations with assisted translations of the same Darija sentences.
| Darija Sentence | GPT-4o Translation Without Assistance | Assisted Translation |
|---|---|---|
| law3lm asahbi | The world, my friend. | I do not know my friend. |
| kbchlaba9ich | I feel thirsty. | Fill my cup. |
| 3rram dyal lbrahch | Brahch's pen. | Plenty of kids. |
| chof 3la tfrnisa | Check the outlet. | Look at the smile. |
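A comparison like the one in the table can be produced with a small helper. The sketch below relies only on the two methods shown in this README, `translate` and `assist_and_translate`, and assumes both return something printable. The helper names `compare_translations` and `to_markdown` are hypothetical, and the two stub classes are stand-ins that return canned strings so the example runs without an API key.

```python
from typing import Iterable, List, Tuple


def compare_translations(llm_client, assistant, sentences: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Translate each sentence with and without assistance."""
    rows = []
    for sentence in sentences:
        raw = str(llm_client.translate(sentence))
        assisted = str(assistant.assist_and_translate(sentence))
        rows.append((sentence, raw, assisted))
    return rows


def to_markdown(rows: List[Tuple[str, str, str]]) -> str:
    """Render the rows as a Markdown table."""
    lines = [
        "| Darija Sentence | Without Assistance | Assisted Translation |",
        "|---|---|---|",
    ]
    lines += [f"| {s} | {raw} | {assisted} |" for s, raw, assisted in rows]
    return "\n".join(lines)


# Stand-in objects (NOT the real classes): they return canned strings.
class StubClient:
    def translate(self, sentence):
        return "The world, my friend."


class StubAssistant:
    def assist_and_translate(self, sentence):
        return "I do not know my friend."


rows = compare_translations(StubClient(), StubAssistant(), ["law3lm asahbi"])
print(to_markdown(rows))
```

To run it against real models, pass the `llm_client` and `assistant` objects created earlier in place of the stubs.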
You can add new words and translations using the DarijaDataManager from the DarijaDistance package, which the DarijaAssistant library relies on.
from DarijaDistance.preprocess import DarijaDataManager
data_manager = DarijaDataManager()
data_manager.add_translations([('khona', 'brother')])

The word "khona" will now be recognized and translated as "brother". The addition is persistent: it is saved to the library's data rather than kept only for the current session, so future instances of DarijaAssistant apply it automatically without your needing to re-add it.
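Because additions persist, it can be worth cleaning a batch of pairs before adding them. The following is a minimal sketch, assuming your new words live in two-column CSV text (Darija, English). The helper `read_pairs` is hypothetical and not part of either library; it only builds the list of tuples that `add_translations` is shown to accept above. How the library itself handles duplicates is not documented here, so the sketch drops them up front.

```python
import csv
import io
from typing import List, Tuple


def read_pairs(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'darija,english' rows, skipping incomplete and duplicate entries."""
    pairs = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # blank or malformed line
        darija, english = row[0].strip(), row[1].strip()
        if not darija or not english or darija in seen:
            continue  # missing field or word already collected
        seen.add(darija)
        pairs.append((darija, english))
    return pairs


pairs = read_pairs("khona,brother\nkhona,brother\n,missing\n")
print(pairs)  # [('khona', 'brother')]

# With the real library installed, the cleaned list could then be passed on:
# data_manager.add_translations(pairs)
```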
As a user of the DarijaAssistant library, you have access to all the methods from the word-distance algorithm, such as checking translation confidence, retrieving exact matches, and more.
Contributions are welcome! If you have ideas or suggestions, or find a bug, please open an issue or submit a pull request on the GitHub repo.
This project is licensed under the MIT License. See the LICENSE file for more details.
If you have any questions or feedback, you can find me on LinkedIn: Aissam Outchakoucht or on X: @aissam_out.