# 🧠 Legal Classification with Self-Consistency and Chain-of-Thought Reasoning

## 📘 Introduction

This project applies two prompt engineering techniques, **Chain-of-Thought (CoT) reasoning** and **Self-Consistency**, to classify legal scenarios into predefined violation types. The aim is to simulate how an AI legal assistant can infer the correct legal classification from textual complaints or case summaries, improving both the transparency and the reliability of its decisions.

## 🛠️ Methodology

### 🧩 Chain-of-Thought (CoT) Prompting

**Chain-of-Thought reasoning** improves model outputs by explicitly guiding the language model through intermediate reasoning steps. Instead of asking for a classification directly, the prompt asks the model to:

- Analyze the text
- Identify key legal issues
- Justify the classification
- Predict the final label

Spelling out the reasoning before the label makes each decision easier to inspect and tends to improve accuracy.
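
The four steps above can be encoded directly in a prompt template. The following is a minimal sketch, not the project's actual prompt: the wording, the `build_cot_prompt` helper, and the closing `Final label:` line are all assumptions made for illustration. Only the class names come from this README.

```python
# Class names as listed under "Experimental Setup" in this README.
LABELS = [
    "breach of employment contract",
    "defamation (libel)",
    "patent infringement",
    "premises liability due to negligence",
    "fraudulent misrepresentation",
]


def build_cot_prompt(case_text: str, labels: list[str] = LABELS) -> str:
    """Return a prompt that walks the model through the four CoT steps.

    Hypothetical helper: the exact wording is an assumption, not the
    prompt used in the experiment.
    """
    label_lines = "\n".join(f"- {label}" for label in labels)
    return (
        "You are a legal assistant. Classify the case below into exactly one "
        "of these violation types:\n"
        f"{label_lines}\n\n"
        f"Case:\n{case_text.strip()}\n\n"
        "Work through the following steps in order:\n"
        "1. Analyze the text.\n"
        "2. Identify the key legal issues.\n"
        "3. Justify the classification.\n"
        "4. Predict the final label.\n\n"
        # Asking for a fixed final line makes the label easy to parse later.
        "End with a single line in the form 'Final label: <label>'."
    )


if __name__ == "__main__":
    print(build_cot_prompt("An employee was dismissed without the agreed notice period."))
```

Requesting a fixed final line is a design choice that keeps the free-form reasoning separate from the machine-readable answer.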

### 🔁 Self-Consistency

The **Self-Consistency** method generates multiple completions (e.g., 5) for the same prompt and selects the most frequent answer by majority vote. This reduces the impact of sampling randomness and of occasional hallucinated answers. It works as follows:

1. Generate multiple responses from the model for each prompt.
2. Collect the predicted labels.
3. Choose the most frequent prediction as the final answer.
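
The three steps above can be sketched as follows. This is an illustrative outline under stated assumptions: `generate` stands for any function that sends a prompt to the model and returns its text, and the `Final label:` convention, `extract_label`, and `self_consistent_label` are hypothetical names, not part of the Gemini API. A canned stand-in replaces the real model call so the sketch runs offline.

```python
import re
from collections import Counter
from itertools import cycle
from typing import Callable, Optional

# Assumed output convention: each completion ends with "Final label: <label>".
FINAL_LABEL = re.compile(r"final label:\s*(.+)", re.IGNORECASE)


def extract_label(response: str) -> Optional[str]:
    """Return the normalized label from the last 'Final label:' line, if any."""
    matches = FINAL_LABEL.findall(response)
    if not matches:
        return None
    return matches[-1].strip().rstrip(".").lower()


def self_consistent_label(
    prompt: str,
    generate: Callable[[str], str],
    n_samples: int = 5,
) -> str:
    """Sample n_samples completions and return the majority label."""
    labels = []
    for _ in range(n_samples):
        label = extract_label(generate(prompt))  # steps 1 and 2
        if label is not None:
            labels.append(label)
    if not labels:
        raise ValueError("no completion contained a final label")
    # Step 3: most frequent label; ties go to the label seen first.
    return Counter(labels).most_common(1)[0][0]


# Stand-in for a real model call (hypothetical canned responses).
_canned = cycle([
    "The statement was published in print... Final label: defamation (libel)",
    "A false written claim harmed reputation. Final label: Defamation (libel).",
    "The claim was knowingly false... Final label: fraudulent misrepresentation",
])


def fake_generate(prompt: str) -> str:
    return next(_canned)


if __name__ == "__main__":
    print(self_consistent_label("case text", fake_generate, n_samples=5))
    # prints: defamation (libel)
```

Voting only makes sense if the samples differ, so the real model call would need a non-zero sampling temperature.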

Combining CoT and Self-Consistency results in both reasoned and reliable classifications.

## 🧪 Experimental Setup

- **Dataset**: 5 legal classification cases, prepared manually from the LegalLens NLI subset.
- **Model**: Gemini 1.5 Flash (via `generativelanguage` API)
- **Classes**:
  - breach of employment contract  
  - defamation (libel)  
  - patent infringement  
  - premises liability due to negligence  
  - fraudulent misrepresentation  

- **Evaluation Metrics**:
  - Accuracy
  - Precision, Recall, F1-score (per class)

## 📊 Results

### Final Predictions (CoT + Self-Consistency)

| Case ID | Expected                             | Predicted                            | Match |
|---------|--------------------------------------|--------------------------------------|-------|
| 001     | breach of employment contract        | breach of employment contract        | ✅    |
| 002     | defamation (libel)                   | defamation (libel)                   | ✅    |
| 003     | patent infringement                  | patent infringement                  | ✅    |
| 004     | premises liability due to negligence | premises liability due to negligence | ✅    |
| 005     | fraudulent misrepresentation         | fraudulent misrepresentation         | ✅    |

### 📈 Classification Report
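
The per-class metrics follow directly from the predictions table above. Below is a minimal sketch that recomputes them with the standard library only; the `per_class_report` and `accuracy` helpers are hypothetical names, and the `RESULTS` pairs are copied from the table rather than from the original evaluation script.

```python
from collections import Counter

# (expected, predicted) pairs copied from the predictions table above.
RESULTS = [
    ("breach of employment contract", "breach of employment contract"),
    ("defamation (libel)", "defamation (libel)"),
    ("patent infringement", "patent infringement"),
    ("premises liability due to negligence", "premises liability due to negligence"),
    ("fraudulent misrepresentation", "fraudulent misrepresentation"),
]


def per_class_report(pairs):
    """Compute precision, recall, F1, and support for every label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for expected, predicted in pairs:
        if expected == predicted:
            tp[expected] += 1
        else:
            fn[expected] += 1   # missed the true class
            fp[predicted] += 1  # wrongly predicted this class
    report = {}
    for label in sorted({name for pair in pairs for name in pair}):
        predicted_total = tp[label] + fp[label]
        support = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


def accuracy(pairs):
    """Fraction of cases where the prediction matches the expected label."""
    return sum(expected == predicted for expected, predicted in pairs) / len(pairs)


if __name__ == "__main__":
    for label, m in per_class_report(RESULTS).items():
        print(f"{label:<38} P={m['precision']:.2f} R={m['recall']:.2f} "
              f"F1={m['f1']:.2f} n={m['support']}")
    print(f"accuracy = {accuracy(RESULTS):.2f}")
```

For the 5 cases in the table, every class comes out with precision, recall, and F1 of 1.00 at a support of 1, and overall accuracy is 1.00.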


## ✅ Conclusion

This experiment shows that combining **Chain-of-Thought prompting** with **Self-Consistency** works well on nuanced legal classification tasks: the model achieved **100% accuracy** on the 5 test cases when both techniques were used together. With only 5 cases and no baseline reported, this is an encouraging result rather than a measured improvement.

These findings suggest that prompt engineering strategies can be as effective as fine-tuning for certain classification tasks, especially in settings where transparency and justification are critical (e.g., legal and healthcare domains).

## 📌 References

- Google AI Gemini API Documentation: [https://ai.google.dev](https://ai.google.dev)
- Wei et al., *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models*, arXiv:2201.11903
- Wang et al., *Self-Consistency Improves Chain of Thought Reasoning in Language Models*, arXiv:2203.11171
