<h3>LegalLoom</h3>
<h5> - Spinning Legal Knowledge Into Action</h5>
<p>
LegalLoom is an AI-powered legal document analyzer. It combines Retrieval-Augmented Generation (RAG) with a custom-trained legal language model to help you search, interpret, and extract insights from large legal corpora such as contracts, case law, and statutes.
By retrieving the relevant documents and generating clear, citation-backed responses, LegalLoom streamlines legal research and helps legal professionals make faster, better-informed decisions.
</p>

<h4>File Input</h4>

In [8]:
import os
import pdfplumber
from pdf2image import convert_from_path
import pytesseract
from docx import Document
from ebooklib import epub
from ebooklib import ITEM_DOCUMENT
from bs4 import BeautifulSoup
from PIL import Image

# Windows install path; adjust it, or remove this line if tesseract is already on PATH
pytesseract.pytesseract.tesseract_cmd = r"C:/Program Files/Tesseract-OCR/tesseract.exe"

def extract_text(file_path):
    ext = os.path.splitext(file_path)[1].lower()

    try:
        if ext == ".pdf":
            # Try pdfplumber first (text PDFs)
            with pdfplumber.open(file_path) as pdf:
                text = ""
                for page in pdf.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"
            if text.strip():
                return text
            # fallback to OCR for scanned PDFs
            pages = convert_from_path(file_path)
            text = ""
            for page in pages:
                text += pytesseract.image_to_string(page) + "\n"
            return text

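        elif ext == ".docx":
            # python-docx reads only .docx; legacy .doc files must be converted first
            # Note: doc.paragraphs does not include text inside tables
            doc = Document(file_path)
            return "\n".join(p.text for p in doc.paragraphs)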

        elif ext == ".epub":
            book = epub.read_epub(file_path)
            text = ""
            for item in book.get_items_of_type(ITEM_DOCUMENT):
                soup = BeautifulSoup(item.get_content(), 'html.parser')
                text += soup.get_text() + "\n"
            return text

        elif ext in [".jpg", ".jpeg", ".png", ".bmp", ".tiff"]:
            image = Image.open(file_path)
            return pytesseract.image_to_string(image)

        else:
            return f"Unsupported file extension: {ext}"

    except Exception as e:
        return f"Error extracting text: {str(e)}"
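
Because `extract_text` returns its failure messages as ordinary strings, a caller cannot tell an error from extracted text without inspecting the content. The following is a minimal sketch of one way to guard against that, assuming the two message prefixes used above stay unchanged. `ERROR_PREFIXES` and `require_text` are hypothetical names, not part of the notebook.

```python
# Prefixes copied from the messages extract_text returns on failure.
ERROR_PREFIXES = ("Unsupported file extension:", "Error extracting text:")


def require_text(result: str) -> str:
    """Return the extracted text, or raise if `result` is an error message.

    Hypothetical helper: it relies on the message prefixes staying in sync
    with extract_text.
    """
    if result.startswith(ERROR_PREFIXES):
        raise ValueError(result)
    if not result.strip():
        raise ValueError("No text could be extracted")
    return result
```

A call such as `require_text(extract_text("test_data/Land Lease Agreement.pdf"))` then either yields usable text or fails loudly. One caveat: a document whose text happens to begin with one of these prefixes would be misclassified, which raising an exception inside `extract_text` itself would avoid.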


In [20]:
extract_text("test_data/sample-1.epub")
extract_text("test_data/Land Lease Agreement.docx")
extract_text("test_data/Land Lease Agreement.pdf")
extract_text("test_data/land-lease-agreement-template.jpg")

'LAND LEASE AGREEMENT\n\n‘This Lease Agreement (hereinafter referred to as the "Agreement”) is made and effective\n(the "Effective Date),\n\nBY AND with an adress of HANBEORD\n\nBETWEEN: heteinafe refered toa the “Lessor”\n\nAND: {EENANTINAME, with an address o\nHereinafter refered to asthe “Lessee, collectively refered 0a\nthe “Pates™\n\nLEASED PREMISES\n\nLessor agrees to lease to Lessee, and Lessee agrees to lease fiom Lessor, for the term and\nupon the conditions set forth herein, the land described as follows:\n\n(Enter a detailed description, potentially with references to attached maps or surveys)\nTERM\n\n‘This Lease begins on and ends on. ‘The term is for\n‘years/inonths\n\nRENT\n\nLessee will pay Lessor dollars/euros/sterling monthly, due the fist day of\neach month.\n\nUSE OF LAND\n\nThe Lessee may only use the land for: Prohibited activities\ninclude:\n\n5. MAINTENANCE,\n\nLessee shall keep the land, including any improvements, in good condition. This includes, but\nis not 
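
A notebook cell displays only the value of its last expression, so the output above comes from the `.jpg` call alone and the results of the first three calls are discarded. The following is a small sketch of printing a short preview per file instead. `preview_all` is a hypothetical helper, and it takes the extractor as a parameter so that it can be tried without the OCR stack installed.

```python
from typing import Callable, Dict, Iterable


def preview_all(
    paths: Iterable[str],
    extractor: Callable[[str], str],
    width: int = 80,
) -> Dict[str, str]:
    """Run `extractor` on each path and print the first `width` characters."""
    previews = {}
    for path in paths:
        text = extractor(path)
        # Collapse whitespace so each preview fits on one line.
        preview = " ".join(text.split())[:width]
        previews[path] = preview
        print(f"{path}: {preview}")
    return previews
```

In the notebook this could be called as `preview_all(["test_data/sample-1.epub", "test_data/Land Lease Agreement.docx"], extract_text)`.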

<h4>Local LLM</h4>

In [None]:
import requests

def run_local_llm(prompt, model="mistral:7b-instruct-v0.3-q4_0"):
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }

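    # The timeout value is an arbitrary choice; raise it for slower hardware
    response = requests.post(url, json=payload, timeout=120)
    # Surface HTTP errors directly instead of a KeyError on a missing 'response' field
    response.raise_for_status()
    return response.json()['response']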

# Example usage
print(run_local_llm("What is the capital of France?"))


 The capital of France is Paris. It is one of the world's leading cultural and intellectual centers, known for its influential contributions to art, science, literature, fashion, politics, and cuisine. Paris is also home to some of the most famous landmarks in the world, such as the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and Champs-Élysées. The city has a population of approximately 2.14 million people within its administrative limits, but it is also part of the Paris metropolitan area which has over 12 million inhabitants.
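
For document analysis, the prompt passed to `run_local_llm` needs to carry the extracted text along with the question. The following is a minimal sketch of one way to assemble such a prompt. `build_prompt`, the instruction wording, and the `max_chars` cutoff are all assumptions for illustration, not part of LegalLoom.

```python
def build_prompt(document_text: str, question: str, max_chars: int = 4000) -> str:
    """Combine extracted document text and a question into a single prompt.

    The character cutoff is a crude stand-in for a real token budget.
    """
    excerpt = document_text.strip()[:max_chars]
    return (
        "Answer the question using only the document excerpt below. "
        "If the excerpt does not contain the answer, say so.\n\n"
        f"--- DOCUMENT ---\n{excerpt}\n--- END DOCUMENT ---\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

The result can be passed straight through, for example `run_local_llm(build_prompt(text, "When is rent due?"))`. Truncation simply discards everything past the cutoff, so long documents need to be split into smaller pieces and only the relevant ones included.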


<h4>Knowledge Chunking