
Persian Legal Assistant System based on RAG

An AI-based system designed to assist you with your legal questions.
An example of the input:

[Screenshot: an example user query]

Corresponding output generated by the model:

[Screenshot: the answer generated by the model]

Installation

First, install the Python packages used in the project:

```
pip install -r requirements.txt
```

If you are running the project on Google Colab, you can skip this step, since the installation commands are already included in the Jupyter notebook.
Second, an OpenAI API key is needed. You can create one from the API keys page of your OpenAI account.
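
Inside a notebook, one common pattern for supplying the key is to read it at runtime and expose it through the environment variable that the OpenAI client libraries pick up automatically; a minimal sketch:

```python
# Read the API key at runtime (not echoed to the screen) and expose it
# via the environment variable the OpenAI client reads by default.
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
```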

Usage

A web interface is provided for interacting with our pipeline. The simplest way to use it is to run the example file in the Source directory. The retrieval part of the system uses the default OpenAIEmbedding, and the language model is GPT-3.5. After you run the notebook, you are asked to enter your OpenAI API key; once the cells finish running, a query box is presented. Type your question in the box and click the button, and your answer is generated after a few seconds.
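
The notebook wires all of this up for you. For reference, here is a minimal sketch of the same retrieve-then-generate flow, built directly on chromadb and the openai client rather than the project's own wrappers; the database path, collection name, and prompt wording are illustrative assumptions:

```python
# Minimal RAG sketch: retrieve law passages from Chroma, answer with GPT-3.5.
import os

import chromadb
from chromadb.utils import embedding_functions
from openai import OpenAI

# OpenAI embeddings for retrieval, mirroring the default OpenAIEmbedding.
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-ada-002",
)
chroma = chromadb.PersistentClient(path="Chroma/Main")  # hypothetical path
collection = chroma.get_collection("laws", embedding_function=openai_ef)  # hypothetical name

def answer(question: str) -> str:
    # Retrieve the most similar law passages.
    hits = collection.query(query_texts=[question], n_results=4)
    context = "\n\n".join(hits["documents"][0])
    # Generate an answer grounded in the retrieved passages.
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer the legal question using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```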

Folder Structure

  • Chroma: Prebuilt ChromaDB databases. Main contains all laws, ArticleBased contains the documents split by individual article of law, and Naive is the one used in most RAG configurations, chunked without any semantics involved.
  • Datasets: Different domains of law taken from the official website of the laws of the Islamic Republic of Iran.
  • Evaluation/Business_law: A dataset of business-law questions and answers written by our team. Three files are provided, one for each difficulty level.
  • Results: Final results of the project.
    • 1_easy_labse: Evaluation of easy questions with LaBSE for document embeddings.
    • 2_easy_labse_chunking: Same as the previous file, but with smart chunking.
    • 3_business_all_questions_openai: The generated output for all types of questions, without LLM as the evaluator.
    • 4_medium_openai: Evaluation of medium questions with OpenAI embeddings.
    • 5_hard_openai_correctness: Evaluation of hard questions, scored by the LLM on answer correctness only.
  • Source: Source files of the project. For example, Chroma_Builder can be used to create a document DB for any law (a minimal sketch of this step follows the list).
  • Utils: Utilities used for the project.
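
The repository's Chroma_Builder handles the DB-building step. As a rough illustration, here is a hedged sketch that persists one document per article to a Chroma collection; the function name, paths, and sample articles are hypothetical, and a real build would likely pass a multilingual embedding function such as LaBSE:

```python
# Hypothetical sketch of building a per-article Chroma DB for one law.
import chromadb

def build_law_db(articles: list[str], db_path: str, name: str) -> None:
    client = chromadb.PersistentClient(path=db_path)
    # Uses Chroma's default embedding function unless one is passed explicitly.
    collection = client.get_or_create_collection(name)
    # One document per article, matching the ArticleBased layout.
    collection.add(
        documents=articles,
        ids=[f"article-{i}" for i in range(len(articles))],
    )

build_law_db(
    ["Article 1 ...", "Article 2 ..."],  # illustrative article texts
    "Chroma/ArticleBased",               # hypothetical path
    "business_law",                      # hypothetical collection name
)
```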

Results

Three document embedding models were implemented in this project: LaBSE, the default OpenAI embedding, and fastText as a baseline.
Two language models were also used, MaralGPT 7B and GPT-3.5; all outputs reported below were generated with GPT-3.5.
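
LaBSE is a multilingual sentence embedding model, which is what makes it usable for Persian legal text. As a hedged sketch (the exact loading code in this repository may differ), embedding documents with it via sentence-transformers looks like this:

```python
# Hedged sketch: embedding Persian documents with LaBSE.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")
docs = ["ماده ۱ ...", "ماده ۲ ..."]  # illustrative Persian law articles
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768): one 768-dimensional vector per document
```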
Our results for the different settings are shown below:

| Model / Metric | Faithfulness | Answer Relevancy | Context Recall | Context Precision | Answer Correctness |
| --- | --- | --- | --- | --- | --- |
| Easy questions, LaBSE | 0.887 | 0.735 | 0.84 | 0.675 | 0.606 |
| Easy questions, LaBSE with smart chunking | 0.737 | 0.804 | 0.811 | 0.546 | 0.619 |
| Medium questions, OpenAI | 0.895 | 0.799 | 0.917 | 0.85 | 0.677 |
| Hard questions, OpenAI | - | - | - | - | 0.645 |
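
The metric names above match those of the ragas evaluation library; the repository does not spell out its evaluation code, but a run over a QA set could look like the following sketch (ragas ~0.1 API assumed; column names vary across ragas versions, and the sample row is illustrative):

```python
# Hedged sketch of scoring a RAG pipeline with ragas.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
)

# One illustrative QA record: the question, the pipeline's answer,
# the retrieved contexts, and a reference answer.
ds = Dataset.from_dict({
    "question": ["What is the minimum capital of a private joint-stock company?"],
    "answer": ["One million rials, according to the Commercial Code."],
    "contexts": [["Article 5: The capital of a private joint-stock company ..."]],
    "ground_truth": ["One million rials."],
})

result = evaluate(
    ds,
    metrics=[faithfulness, answer_relevancy, context_recall,
             context_precision, answer_correctness],
)
print(result)  # one score per metric
```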
