
Persian Legal Assistant System based on RAG

An AI-based system designed to assist you with your legal questions.
An example of the input:

[Screenshot: an example user query]

Corresponding output generated by the model:

[Screenshot: the answer generated by the model]

Installation

First, install the Python packages used in the project:

```
pip install -r requirements.txt
```

If you are running the project on Google Colab, you can skip this step, since the installation commands are already included in the Jupyter notebook.
Second, an OpenAI API key is needed. You can create one from the API keys page of your OpenAI account.
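
Inside a notebook, one common pattern for supplying the key is to read it at runtime and expose it through the environment variable that the OpenAI client libraries pick up automatically; a minimal sketch:

```python
# Read the API key at runtime (not echoed to the screen) and expose it
# via the environment variable the OpenAI client reads by default.
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
```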

Usage

A web interface is provided for interacting with our pipeline. The simplest way to use it is to run the example file in the Source directory. The retrieval part of the system uses the default OpenAIEmbedding, and the language model is GPT-3.5. After you run the notebook, you are asked to enter your OpenAI API key; once the cells finish running, a query box is presented. Type your question in the box and click the button, and your answer is generated after a few seconds.
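
The notebook wires all of this up for you. For reference, here is a minimal sketch of the same retrieve-then-generate flow, built directly on chromadb and the openai client rather than the project's own wrappers; the database path, collection name, and prompt wording are illustrative assumptions:

```python
# Minimal RAG sketch: retrieve law passages from Chroma, answer with GPT-3.5.
import os

import chromadb
from chromadb.utils import embedding_functions
from openai import OpenAI

# OpenAI embeddings for retrieval, mirroring the default OpenAIEmbedding.
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-ada-002",
)
chroma = chromadb.PersistentClient(path="Chroma/Main")  # hypothetical path
collection = chroma.get_collection("laws", embedding_function=openai_ef)  # hypothetical name

def answer(question: str) -> str:
    # Retrieve the most similar law passages.
    hits = collection.query(query_texts=[question], n_results=4)
    context = "\n\n".join(hits["documents"][0])
    # Generate an answer grounded in the retrieved passages.
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer the legal question using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```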

Folder Structure

  • Chroma: Prebuilt ChromaDB databases. Main contains all laws, ArticleBased contains the documents split by individual article of law, and Naive is the one used in most RAG configurations, chunked without any semantics involved.
  • Datasets: Different domains of law taken from the official website of the laws of the Islamic Republic of Iran.
  • Evaluation/Business_law: A dataset of business-law questions and answers written by our team. Three files are provided, one for each difficulty level.
  • Results: Final results of the project.
    • 1_easy_labse: Evaluation of easy questions with LaBSE for document embeddings.
    • 2_easy_labse_chunking: Same as the previous file, but with smart chunking.
    • 3_business_all_questions_openai: The generated output for all types of questions, without LLM as the evaluator.
    • 4_medium_openai: Evaluation of medium questions with OpenAI embeddings.
    • 5_hard_openai_correctness: Evaluation of hard questions, scored by the LLM on answer correctness only.
  • Source: Source files of the project. For example, Chroma_Builder can be used to create a document DB for any law (a minimal sketch of this step follows the list).
  • Utils: Utilities used for the project.
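
The repository's Chroma_Builder handles the DB-building step. As a rough illustration, here is a hedged sketch that persists one document per article to a Chroma collection; the function name, paths, and sample articles are hypothetical, and a real build would likely pass a multilingual embedding function such as LaBSE:

```python
# Hypothetical sketch of building a per-article Chroma DB for one law.
import chromadb

def build_law_db(articles: list[str], db_path: str, name: str) -> None:
    client = chromadb.PersistentClient(path=db_path)
    # Uses Chroma's default embedding function unless one is passed explicitly.
    collection = client.get_or_create_collection(name)
    # One document per article, matching the ArticleBased layout.
    collection.add(
        documents=articles,
        ids=[f"article-{i}" for i in range(len(articles))],
    )

build_law_db(
    ["Article 1 ...", "Article 2 ..."],  # illustrative article texts
    "Chroma/ArticleBased",               # hypothetical path
    "business_law",                      # hypothetical collection name
)
```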

Results

Three document embedding models were implemented in this project: LaBSE, the default OpenAI embedding, and fastText as a baseline.
Two language models were also used, MaralGPT 7B and GPT-3.5; all outputs reported below were generated with GPT-3.5.
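
LaBSE is a multilingual sentence embedding model, which is what makes it usable for Persian legal text. As a hedged sketch (the exact loading code in this repository may differ), embedding documents with it via sentence-transformers looks like this:

```python
# Hedged sketch: embedding Persian documents with LaBSE.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")
docs = ["ماده ۱ ...", "ماده ۲ ..."]  # illustrative Persian law articles
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768): one 768-dimensional vector per document
```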
Our results for the different settings are shown below:

| Model / Metric | Faithfulness | Answer Relevancy | Context Recall | Context Precision | Answer Correctness |
| --- | --- | --- | --- | --- | --- |
| Easy questions, LaBSE | 0.887 | 0.735 | 0.84 | 0.675 | 0.606 |
| Easy questions, LaBSE with smart chunking | 0.737 | 0.804 | 0.811 | 0.546 | 0.619 |
| Medium questions, OpenAI | 0.895 | 0.799 | 0.917 | 0.85 | 0.677 |
| Hard questions, OpenAI | - | - | - | - | 0.645 |
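
The metric names above match those of the ragas evaluation library; the repository does not spell out its evaluation code, but a run over a QA set could look like the following sketch (ragas ~0.1 API assumed; column names vary across ragas versions, and the sample row is illustrative):

```python
# Hedged sketch of scoring a RAG pipeline with ragas.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
)

# One illustrative QA record: the question, the pipeline's answer,
# the retrieved contexts, and a reference answer.
ds = Dataset.from_dict({
    "question": ["What is the minimum capital of a private joint-stock company?"],
    "answer": ["One million rials, according to the Commercial Code."],
    "contexts": [["Article 5: The capital of a private joint-stock company ..."]],
    "ground_truth": ["One million rials."],
})

result = evaluate(
    ds,
    metrics=[faithfulness, answer_relevancy, context_recall,
             context_precision, answer_correctness],
)
print(result)  # one score per metric
```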
