# ⚖️ LawGPT-KSA | Falcon Noor LLM on Google Colab
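An Arabic legal assistant that answers questions over uploaded PDFs: PyMuPDF extracts the text, Sentence Transformers embeds it, FAISS retrieves the closest passages, and Falcon-7B-Instruct (Noor) generates the answer.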
**Note**: Upload your own Arabic legal PDFs when prompted.

In [1]:
# 📦 Install dependencies
!pip install transformers sentence-transformers faiss-cpu PyMuPDF torch


In [2]:
# 📁 Upload your PDF files
from google.colab import files
uploaded = files.upload()

Saving Criminal_procedure_law (Watheqa2_firstVersion).pdf to Criminal_procedure_law (Watheqa2_firstVersion) (1).pdf


In [3]:
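# 📚 Load PDFs and chunk text
import fitz  # PyMuPDF

def load_pdfs():
    """Return the extracted text of each uploaded PDF, one string per file."""
    docs = []
    for file in uploaded.keys():  # `uploaded` comes from files.upload() in the previous cell
        with fitz.open(file) as doc:
            # Concatenate the plain text of every page.
            text = "".join(page.get_text() for page in doc)
            docs.append(text)
    return docs

def split_chunks(text, chunk_size=500):
    """Split text into consecutive, non-overlapping chunks of `chunk_size` characters."""
    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

documents = load_pdfs()
chunks = []
for doc in documents:
    chunks += split_chunks(doc)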
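
Fixed 500-character windows can cut a sentence or an article heading in half, and `page.get_text()` often returns long runs of spaces and blank lines. The following is a minimal sketch of one alternative, not the notebook's method: collapse whitespace first, then use overlapping windows so that text near a boundary appears in two chunks. The names `normalize_whitespace` and `split_chunks_overlap` and the `overlap=100` default are illustrative assumptions.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()

def split_chunks_overlap(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Toy example (hypothetical text, small sizes so the windows are easy to inspect)
sample = "Article 1\n\n   The   court \t shall hear the case.   "
clean = normalize_whitespace(sample)
print(clean)
print(split_chunks_overlap(clean, chunk_size=20, overlap=5))
```

With `overlap=0` the function produces the same chunks as `split_chunks`; a larger overlap trades a bigger index for fewer passages split across chunk boundaries.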

In [4]:
# 🔢 Embed chunks and build the FAISS index
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Multilingual sentence-embedding model
embed_model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
embeddings = embed_model.encode(chunks, show_progress_bar=True)  # shape: (num_chunks, dim)

dim = embeddings.shape[1]
index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 search
index.add(np.asarray(embeddings, dtype=np.float32))  # FAISS expects float32 vectors
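
`IndexFlatL2` performs an exact search: it compares the query with every stored vector and returns the `k` smallest squared Euclidean distances together with the positions of the matching vectors. The pure-Python sketch below mirrors that behaviour on toy 2-D vectors, for illustration only. `flat_l2_search` is a hypothetical helper and the vectors are made up; FAISS does the same work in optimized native code.

```python
def flat_l2_search(vectors, query, k):
    """Return (distances, indices) of the k vectors closest to query.

    Distances are squared Euclidean distances, sorted in ascending order.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first; ties broken by index
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy data: three stored "embeddings" and one query
stored = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, indices = flat_l2_search(stored, query=[1.0, 0.0], k=2)
print(distances, indices)
```

The returned indices are positions in the order the vectors were added, which is why they can be used directly to look up entries in `chunks`.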


In [5]:
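# 🧠 Load Falcon-7B-Instruct Noor
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: 2 bytes per weight instead of 4
    trust_remote_code=True,     # executes model code downloaded from the Hub repository
    device_map="auto"           # place the weights on the available device(s) automatically
)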
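
Loading in `torch.float16` stores each weight in 2 bytes, versus 4 for float32. As a rough back-of-the-envelope sketch, assuming a nominal 7 billion parameters (the exact count for this checkpoint is not stated here) and ignoring activations and the key-value cache, the footprint of the weights alone can be estimated as follows. `weight_memory_gb` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# 7e9 is an assumed, nominal parameter count
for dtype in BYTES_PER_PARAM:
    print(dtype, weight_memory_gb(7e9, dtype), "GB")
```

Under that assumption, float16 needs about 14 GB for the weights, which is worth comparing against the memory of the GPU assigned to the Colab session before loading.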


In [7]:
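# ❓ Ask legal question
query = input("اكتب سؤالك القانوني بالعربية: ")  # "Write your legal question in Arabic: "
q_embedding = embed_model.encode([query])
# D holds the distances and I the chunk indices of the 3 nearest chunks.
D, I = index.search(np.asarray(q_embedding, dtype=np.float32), k=3)
top_chunks = [chunks[i] for i in I[0]]
context = "\n\n".join(top_chunks)

# Prompt layout: "Legal context: ... Question: ... Answer:"
prompt = f"السياق القانوني:\n{context}\n\nالسؤال: {query}\n\nالإجابة:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=250, do_sample=True, temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # set explicitly to avoid the pad_token_id notice
)
# outputs[0] holds the prompt tokens followed by the generated tokens,
# so the decoded text starts with the prompt.
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\n\n🤖 الإجابة:")  # "Answer:"
print(answer)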
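
Because `generate` returns the prompt tokens followed by the new tokens, decoding `outputs[0]` in full prints the retrieved context and the question before the model's continuation. A common remedy is to drop the first `prompt_length` token IDs before decoding. The sketch below shows only the slicing, on plain Python lists with made-up token IDs; `strip_prompt_tokens` is a hypothetical helper. In this notebook the equivalent would presumably be `outputs[0][inputs["input_ids"].shape[1]:]`, passed to `tokenizer.decode`.

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Return only the token IDs generated after the prompt."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt_length exceeds the number of output tokens")
    return output_ids[prompt_length:]

prompt_ids = [101, 7, 42, 9]            # made-up IDs standing in for the prompt
output_ids = prompt_ids + [55, 56, 11]  # shape of generate() output: prompt + continuation
new_ids = strip_prompt_tokens(output_ids, len(prompt_ids))
print(new_ids)
```

Decoding only the new IDs keeps the printed answer free of the prompt, at the cost of no longer showing which passages were retrieved; those can be printed separately from `top_chunks` if needed.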

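اكتب سؤالك القانوني بالعربية: هل يجوز تفتيش الهاتف المحمول للمتهم بدون أمر قضائي؟
(English: "Write your legal question in Arabic: Is it permissible to search the accused's mobile phone without a judicial warrant?")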



🤖 Answer:
Legal context:
… the investigation is being conducted, he shall be brought to the investigation department in the jurisdiction where he was arrested. That department shall verify all of his personal particulars, inform him of the act attributed to him, and record his statements about it. If circumstances require his transfer, he shall be told where he will be transferred.

Article One Hundred and Eleven

If the accused objects to his transfer, or his health does not permit the transfer, notice shall be given to …

… it to the investigator.

Article One Hundred and Twenty-Six

If, after the investigation is concluded, the Public Prosecution finds that the evidence against the accused is sufficient, it shall refer the case to the competent court and summon the accused to appear before it. The case shall be filed by an indictment that includes the following information:
1- Identification of the accused, stating his name, surname, nationality, age, place of residence, profession or occupation, identity number, and legal capacity.
2- Identification of the private-right claimant, if any, stating the name, capacity, address, and identity number.
3- Statement of the offence …