<a href="https://colab.research.google.com/github/IRONMAN-AIcoder/genai/blob/main/creativeproject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**"KnowLaw: AI-Powered Legal Document Assistant"**

Project Context:

"KnowLaw" is an AI-driven legal document assistant for legal research and analysis. It loads legal documents in plain-text, Word (.docx), and PDF formats and uses natural language processing (NLP) to answer questions about their contents.

At its core, this project combines document retrieval with a generative model to form a Retrieval-Augmented Generation (RAG) system. It embeds document chunks with the `sentence-transformers/all-MiniLM-L6-v2` model, uses FAISS vector search to find the chunks most relevant to a query, and then passes those chunks as context to the `openai/gpt-3.5-turbo` model through the OpenRouter API, which generates the answer.
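
The retrieve-then-generate flow described above can be sketched with the standard library alone. This is a toy illustration, not the notebook's implementation: the word-count `embed` function stands in for a real sentence-embedding model, the linear scan in `retrieve` stands in for a FAISS index, and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (assumed stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Retrieve context, then wrap the context and question in one prompt string."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"


# Hypothetical example chunks, not taken from any real document.
chunks = [
    "The court held that the lease was terminated by notice.",
    "The tenant must pay rent on the first day of each month.",
    "Appeals must be filed within thirty days of the judgment.",
]
print(build_prompt("When must the tenant pay rent?", chunks, k=1))
```

In the real pipeline the prompt string would be sent to the language model; here it is only printed so the retrieval step can be inspected on its own.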

This tool is aimed at making legal research faster and more accurate by letting users query a large repository of legal documents directly. For lawyers, researchers, and students alike, "KnowLaw" simplifies the process of extracting useful legal knowledge from extensive document archives.

In [1]:
import os
import requests
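# These classes live in langchain_community; the older langchain.* paths are deprecated.
from langchain_community.document_loaders import TextLoader, UnstructuredFileLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS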
from langchain.text_splitter import CharacterTextSplitter
from google.colab import files


# Read the API key from the environment instead of hard-coding a secret in the notebook.
# Set OPENROUTER_API_KEY before running this cell.
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]
OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"
EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


def load_legal_docs(folder_path):
    all_texts = []
    for file in os.listdir(folder_path):
        full_path = os.path.join(folder_path, file)
        if file.endswith(".txt"):
            loader = TextLoader(full_path)
        elif file.endswith((".docx", ".pdf")):
            loader = UnstructuredFileLoader(full_path)
        else:
            print(f"⚠️ Skipping unsupported file: {file}")
            continue
        documents = loader.load()
        all_texts.extend(documents)
    return all_texts


def build_vector_store(documents):
    text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    split_docs = text_splitter.split_documents(documents)
    vector_store = FAISS.from_documents(split_docs, EMBEDDING_MODEL)
    return vector_store


def retrieve_context(query, vector_store, k=3):
    docs = vector_store.similarity_search(query, k=k)
    context = "\n\n".join([doc.page_content for doc in docs])
    return context


def query_openrouter_gpt(prompt):
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "openai/gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.5
    }

    # A timeout keeps the notebook from hanging indefinitely if the API does not respond.
    response = requests.post(OPENROUTER_API_URL, headers=headers, json=payload, timeout=60)
    if response.status_code == 200:
        return response.json()["choices"][0]["message"]["content"]
    else:
        raise Exception(f"OpenRouter Error {response.status_code}: {response.text}")


def legal_assistant(query, vector_store):
    context = retrieve_context(query, vector_store)
    prompt = f"""You are a helpful legal assistant. Use the following context to answer the user's question:

Context:
{context}

Question: {query}

Answer:"""
    return query_openrouter_gpt(prompt)


if __name__ == "__main__":
    print("📂 Upload your legal documents (.txt, .docx, .pdf)...")
    uploaded = files.upload()

    os.makedirs("legal_docs", exist_ok=True)
    for filename in uploaded.keys():
        new_path = os.path.join("legal_docs", filename)
        os.rename(filename, new_path)

    print("🔍 Loading legal documents...")
    legal_docs = load_legal_docs("legal_docs")
    vs = build_vector_store(legal_docs)

    print("⚖️ Ask your legal question!")
    while True:
        user_question = input("🗨️ Question (type 'exit' to quit): ")
        if user_question.lower() in ["exit", "quit"]:
            break
        try:
            answer = legal_assistant(user_question, vs)
            print(f"\n📜 Answer:\n{answer}\n")
        except Exception as e:
            print(f"❌ Error: {e}")




📂 Upload your legal documents (.txt, .docx, .pdf)...


Saving p17027coll5_38487.pdf to p17027coll5_38487.pdf
🔍 Loading legal documents...




⚖️ Ask your legal question!
🗨️ Question (type 'exit' to quit): explain the case

📜 Answer:
In this case, the juvenile court recognized that the appellant had an interest in developing a relationship with the child, but that this relationship was not parental in nature. The court also expressed concerns regarding the Department of Human Services' lack of efforts in fostering a parental relationship between the appellant and the child. Ultimately, the court ruled that it was not in the best interest of the child to continue with the appellant in place of another individual as a parent. Despite acknowledging the lack of reasonable efforts by DHS, the court did not abuse its discretion in making this decision.

🗨️ Question (type 'exit' to quit): what are the actions taken

📜 Answer:
The actions taken in this case included disestablishing the appellant as the child's legal parent, establishing the child's biological father as the legal parent, and setting aside the voluntary acknowledgment 
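
The quality of answers like these depends partly on how the documents are chunked before embedding. `build_vector_store` passes `chunk_size=500` and `chunk_overlap=50` to the splitter; the sketch below illustrates what overlapping chunks mean using plain fixed-size character windows. It is a simplified stand-in under that assumption: LangChain's `CharacterTextSplitter` splits on a separator and merges the pieces, so its actual chunk boundaries will differ. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows; adjacent windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Small sizes make the overlap easy to see: each chunk repeats the last character of the previous one.
print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.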

In [21]:
!pip install unstructured


Collecting unstructured
  Downloading unstructured-0.17.2-py3-none-any.whl.metadata (24 kB)
Collecting filetype (from unstructured)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting python-magic (from unstructured)
  Downloading python_magic-0.4.27-py2.py3-none-any.whl.metadata (5.8 kB)
Collecting emoji (from unstructured)
  Downloading emoji-2.14.1-py3-none-any.whl.metadata (5.7 kB)
Collecting python-iso639 (from unstructured)
  Downloading python_iso639-2025.2.18-py3-none-any.whl.metadata (14 kB)
Collecting langdetect (from unstructured)
  Downloading langdetect-1.0.9.tar.gz (981 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 981.5/981.5 kB 21.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting rapidfuzz (from unstructured)
  Downloading rapidfuzz-3.13.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting backoff (from unstructured)
  Downloadi

In [23]:
!pip install "unstructured[docx]"


Collecting python-docx>=1.1.2 (from unstructured[docx])
  Downloading python_docx-1.1.2-py3-none-any.whl.metadata (2.0 kB)
Downloading python_docx-1.1.2-py3-none-any.whl (244 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 244.3/244.3 kB 5.0 MB/s eta 0:00:00
Installing collected packages: python-docx
Successfully installed python-docx-1.1.2


In [26]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl (30.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 30.7/30.7 MB 32.4 MB/s eta 0:00:00
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.10.0


In [29]:
!pip install "unstructured[pdf]"


Collecting onnx>=1.17.0 (from unstructured[pdf])
  Downloading onnx-1.17.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting onnxruntime>=1.19.0 (from unstructured[pdf])
  Downloading onnxruntime-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting pdf2image (from unstructured[pdf])
  Downloading pdf2image-1.17.0-py3-none-any.whl.metadata (6.2 kB)
Collecting pdfminer.six (from unstructured[pdf])
  Downloading pdfminer_six-20250416-py3-none-any.whl.metadata (4.1 kB)
Collecting pikepdf (from unstructured[pdf])
  Downloading pikepdf-9.7.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.1 kB)
Collecting pi-heif (from unstructured[pdf])
  Downloading pi_heif-0.22.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.5 kB)
Collecting google-cloud-vision (from unstructured[pdf])
  Downloading google_cloud_vision-3.10.1-py3-none-any.whl.metadata (9.5 kB)
Collecting effdet (