Skip to content

divyaddr/LegalNER

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 

Repository files navigation

LegalNER

Named Entity Recognition for Legal Texts

Here's a detailed README.md for your GitHub repository, incorporating technical aspects and structured formatting.


LegalNER: Named Entity Recognition for Legal Texts

📌 Overview

LegalNER is a Named Entity Recognition (NER) system designed for processing legal documents. This project leverages Natural Language Processing (NLP) techniques to extract key entities such as case names, statutes, dates, organizations, and other domain-specific terms. Built using Python and SpaCy, it provides a complete pipeline for data preprocessing, model training, and post-processing of extracted entities.


⚙️ Features

✔️ Preprocessing pipeline for legal text normalization
✔️ Custom NER model trained on annotated legal datasets
✔️ Integration with SpaCy for efficient entity extraction
✔️ Configurable training and hyperparameters
✔️ Post-processing utilities to refine extracted entities
✔️ Support for custom annotation formats


🏗 Project Structure

LegalNER-main/
│── data_preparation.py          # Data preprocessing and formatting
│── legal_ner.py                 # Main script for training & inference
│── postprocessing_utils.py      # Post-processing extracted entities
│── training/
│   ├── config.cfg               # Configuration file for model training
│   ├── Combined_Data/           # Preprocessed training data
│── pycache/                     # Cached Python files
│── README.md                    # Project documentation

🛠 Technical Details

1️⃣ Data Preprocessing

  • Tokenization and sentence segmentation
  • Cleaning and formatting legal text
  • Converting annotations into SpaCy training format

2️⃣ NER Model Training

  • Utilizes SpaCy’s transformer-based pipelines
  • Configurable hyperparameters in config.cfg
  • Supports transfer learning with pretrained embeddings

3️⃣ Post-processing

  • Rule-based corrections for extracted entities
  • Handling overlapping and misclassified entities

🚀 Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/your-username/LegalNER.git
cd LegalNER-main

2️⃣ Create a Virtual Environment

python -m venv venv
source venv/bin/activate   # On Windows use: venv\Scripts\activate

3️⃣ Install Dependencies

pip install -r requirements.txt

📊 Usage

Train the NER Model

python legal_ner.py --train

Run Inference on Legal Documents

python legal_ner.py --predict "path/to/legal_document.txt"

Evaluate Model Performance

python legal_ner.py --evaluate

📌 Future Improvements

🔹 Expand entity categories for more legal use cases
🔹 Improve post-processing with better disambiguation rules
🔹 Integrate deep learning models (e.g., BERT, RoBERTa)


💡 Contributing

Feel free to fork this repository, open issues, or submit pull requests. Contributions are welcome!


📜 License This project is licensed under the MIT License.

About

Named Entity Recognition for Legal Texts

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors