## Model 1: T5 model fine-tuned from “unicamp-dl/ptt5-base-portuguese-vocab”

## -----------------------------------------------------------------------------------

In [None]:
# 1st try

import json
import torch
from transformers import pipeline

# Determine if GPU is available
device = 0 if torch.cuda.is_available() else -1

# Load the summarization model from Hugging Face
summarizer = pipeline("summarization", model="stjiris/t5-portuguese-legal-summarization", device=device)

# Load JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process only the first entry for testing
first_entry = data[0] if len(data) > 0 else None

if first_entry:
    # Combine fields and truncate if needed
    input_text = f"{first_entry['procedure']} {first_entry['facts']} {first_entry['law']}"
    input_text = input_text[:1000]  # Truncate text to 1000 characters if necessary

    # Generate summary
    try:
        summary = summarizer(input_text, max_length=256, min_length=50, do_sample=False)
        first_entry['summary'] = summary[0]['summary_text']
        print("Summary for the first entry:", first_entry['summary'])
    except Exception as e:
        print(f"Error summarizing entry {first_entry['filename']}: {e}")
else:
    print("No data available in the JSON file.")



Summary for the first entry: 1. The case originated in an application (no. 23022/13) against Romania lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Convention”) by a Romanian national, Mr D.M.D. (“the applicant”), on 22 March 2013. The Court acceded to the applicant’s request not to have his name disclosed (Rule 47 § 4 of the Rules of Court). 2. The applicant, who had been granted legal aid, was represented by Ms N.T. Popescu, a lawyer practising in Bucharest. The Romanian Government (“the Government”) were represented by their Agent, Ms C. Brumar, from the Ministry of Foreign Affairs. 3. The applicant alleged that the criminal investigations into his allegations of domestic abuse perpetrated by


In [None]:
# 2nd try

import json
import torch
from transformers import pipeline

# Determine if GPU is available
device = 0 if torch.cuda.is_available() else -1

# Load the summarization model
summarizer = pipeline("summarization", model="stjiris/t5-portuguese-legal-summarization", device=device)

# Load JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process only the first entry for testing
first_entry = data[0] if len(data) > 0 else None

if first_entry:
    # Combine and clean fields (removing extra characters and symbols)
    input_text = f"{first_entry['procedure']} {first_entry['facts']} {first_entry['law']}".replace("§", "").replace("\u00a0", " ")
    input_text = input_text[:1500]  # Truncate if necessary to 1500 characters

    # Generate summary with modified parameters
    try:
        summary = summarizer(input_text, max_length=600, min_length=200, do_sample=False)
        first_entry['summary'] = summary[0]['summary_text']
        print("Refined Summary for the first entry:", first_entry['summary'])
    except Exception as e:
        print(f"Error summarizing entry {first_entry['filename']}: {e}")
else:
    print("No data available in the JSON file.")


Your max_length is set to 600, but your input_length is only 535. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=267)


Refined Summary for the first entry: 1. The case originated in an application (no. 23022/13) against Romania lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Convention”) by a Romanian national, Mr D.M.D. (“the applicant”), on 22 March 2013. The Court acceded to the applicant’s request not to have his name disclosed (Rule 47 4 of the Rules of Court). 2. The applicant, who had been granted legal aid, was represented by Ms N.T. Popescu, a lawyer practising in Bucharest. The Romanian Government (“the Government”) were represented by their Agent, Ms C. Brumar, from the Ministry of Foreign Affairs. 3. The applicant alleged that the criminal investigations into his allegations of domestic abuse perpetrated by his father had been ineffective and that the ensuing proceedings had been unfair. 4. On 25 March 2014 the application was communicated to the Government. I. THE CIRCUMSTANCES OF THE CASE 5. The applicant was born 
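The warning above appears because `max_length` (600) exceeds the tokenized input length (535). Below is a minimal sketch of deriving both length bounds from the input's token count instead of hard-coding them. The helper name `choose_summary_lengths` and its defaults are assumptions made for illustration; they are not part of `transformers`. The token count itself would have to come from the model's tokenizer, for example `len(summarizer.tokenizer(input_text)["input_ids"])`.

```python
def choose_summary_lengths(input_tokens, ratio=0.5, max_cap=600):
    """Return (min_length, max_length) in tokens for a given input token count.

    Hypothetical helper: aims for about `ratio` of the input length,
    capped at `max_cap`, with min_length set to half of max_length.
    """
    if input_tokens < 1:
        raise ValueError("input_tokens must be at least 1")
    # Never exceed the cap, and keep at least 2 so min_length can stay below it.
    max_length = max(2, min(max_cap, int(input_tokens * ratio)))
    # min_length must stay strictly below max_length.
    min_length = max(1, max_length // 2)
    return min_length, max_length


# Token counts reported by the pipeline warnings in this notebook.
for n_tokens in (535, 193, 41):
    print(n_tokens, choose_summary_lengths(n_tokens))
```

With the default ratio of 0.5, the resulting `max_length` values (267, 96, 20) match the ones the pipeline warnings suggest for those inputs.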

In [None]:
# 3rd try
import json
import torch
from transformers import pipeline

# Determine if GPU is available
device = 0 if torch.cuda.is_available() else -1

# Load the summarization model
summarizer = pipeline("summarization", model="stjiris/t5-portuguese-legal-summarization", device=device)

# Load JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process only the first entry for testing
first_entry = data[0] if len(data) > 0 else None

if first_entry:
    # Combine and clean fields (removing extra characters and symbols)
    input_text = f"{first_entry['procedure']} {first_entry['facts']} {first_entry['law']}".replace("§", "").replace("\u00a0", " ")
    input_text = input_text[:1500]  # Truncate if necessary to 1500 characters

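    # Dynamically adjust max_length based on input length.
    # Note: this counts whitespace-separated words, not model tokens, so it is only an approximation.
    input_length = len(input_text.split())
    # Aim for about half the input length, capped at 600 and kept above min_length (100)
    max_summary_length = max(101, min(600, int(input_length * 0.5)))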

    # Generate summary with refined max_length
    try:
        summary = summarizer(input_text, max_length=max_summary_length, min_length=100, do_sample=False)
        first_entry['summary'] = summary[0]['summary_text']
        print("Refined Summary for the first entry:", first_entry['summary'])
    except Exception as e:
        print(f"Error summarizing entry {first_entry['filename']}: {e}")
else:
    print("No data available in the JSON file.")


Refined Summary for the first entry: 1. The case originated in an application (no. 23022/13) against Romania lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Convention”) by a Romanian national, Mr D.M.D. (“the applicant”), on 22 March 2013. The Court acceded to the applicant’s request not to have his name disclosed (Ru


In [None]:
import json
from transformers import pipeline

# Load the summarization model
summarizer = pipeline("summarization", model="stjiris/t5-portuguese-legal-summarization")

# Load your JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process the first entry only
first_entry = data[0]

# Define a function to summarize text, with reduced max_length
def summarize_text(text, max_length=200, min_length=100):
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
        no_repeat_ngram_size=3
    )
    return summary[0]['summary_text']

# Summarize each section separately, splitting larger sections if needed
section_summaries = {}
for section in ['procedure', 'facts', 'law']:
    section_text = first_entry.get(section, '').replace("§", "").replace("\u00a0", " ")

    if section_text.strip():
        print(f"Summarizing section: {section}")

        # If section is too long, split it
        if len(section_text) > 1000:
            split_texts = [section_text[i:i+1000] for i in range(0, len(section_text), 1000)]
            summary_parts = [summarize_text(text_part) for text_part in split_texts]
            summary = " ".join(summary_parts)
        else:
            summary = summarize_text(section_text, max_length=200, min_length=100)

        # Store the summary
        section_summaries[section] = summary
    else:
        section_summaries[section] = ""

# Combine the summaries
final_summary = "\n\n".join(
    [f"Summary of {section.capitalize()}:\n{summary}"
     for section, summary in section_summaries.items() if summary]
)

# Add the final summary to the entry
first_entry['summary'] = final_summary

# Display the final summary
print("Final Summary for the first entry:\n")
print(first_entry['summary'])


Summarizing section: procedure
Summarizing section: facts


Your max_length is set to 200, but your input_length is only 193. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=96)


Summarizing section: law


Your max_length is set to 200, but your input_length is only 41. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=20)


Final Summary for the first entry:

Summary of Procedure:
1. The case originated in an application (no. 23022/13) against Romania lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms (“the Conventional”) by a Romanian national, Mr D.M.D. (“the applint”), on 22 March 2013. The Court acceded to the aptuant’s request not to have his name disclosed (Rule 47 4 of the Rules of Court). 2. The ap aplicant, who had been granted legal aid, was represented by Ms N.T. Popescu, a lawyer practising in Bucharest. The Romanian Government (“the Governmento”) were representeed by their Agent

Summary of Facts:
I. THE CIRCUMSTANCES OF THES CASE 5. The applicant was born in 2001 and lives in Bucharest. His parents, C.I. and D.D., separated in April 2004 and divorced in September 2004, mainly because of D.C.’s abusive behaviour towards his wife and their son. The Application remained with his mother. On 27 February 2004 C. I. called the hotlin
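The cell above splits long sections into fixed 1000-character slices, which can cut a chunk in the middle of a word (the Facts output ends in "hotlin"). As a rough sketch of an alternative, the function below builds chunks from whole sentences and only falls back to a hard split when a single sentence is longer than the limit. `chunk_by_sentence` is a hypothetical helper, and the sentence splitter is deliberately naive: it also breaks after abbreviations such as "no." or initials, which only adds extra candidate boundaries.

```python
import re


def chunk_by_sentence(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Fallback: a single sentence longer than max_chars is hard-split.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


demo = "One two. Three four. Five six."
print(chunk_by_sentence(demo, max_chars=20))
# ['One two. Three four.', 'Five six.']
```

Each chunk could then be passed to `summarize_text` in place of the `section_text[i:i+1000]` slices.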

## Model 2:
PEGASUS for legal document summarization
legal-pegasus is a fine-tuned version of google/pegasus-cnn_dailymail for the legal domain, trained to perform abstractive summarization. The maximum input sequence length is 1024 tokens.
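The limit above is stated in tokens, while the cells in this notebook truncate by characters (`input_text[:1000]`). The following is a minimal sketch of truncating by token count instead. It assumes a tokenizer object that exposes `encode(text, truncation=True, max_length=...)` and `decode(ids, skip_special_tokens=True)`, as Hugging Face tokenizers do. `truncate_to_tokens` and `WhitespaceTokenizer` are hypothetical names; the latter is only a stand-in so the example runs without downloading a model.

```python
def truncate_to_tokens(text, tokenizer, max_tokens=1024):
    """Cut text to at most max_tokens tokens, then decode it back to a string."""
    ids = tokenizer.encode(text, truncation=True, max_length=max_tokens)
    return tokenizer.decode(ids, skip_special_tokens=True)


class WhitespaceTokenizer:
    """Demo stand-in (assumption): one token per whitespace-separated word."""

    def encode(self, text, truncation=False, max_length=None):
        tokens = text.split()
        if truncation and max_length is not None:
            tokens = tokens[:max_length]
        return tokens

    def decode(self, ids, skip_special_tokens=True):
        return " ".join(ids)


print(truncate_to_tokens("a b c d e", WhitespaceTokenizer(), max_tokens=3))
# a b c
```

With a real pipeline, `summarizer.tokenizer` would be passed instead of the stand-in. Note that decoding real token ids may not reproduce the original text character for character.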

In [None]:
#2.1
import json
from transformers import pipeline

# Load the summarization model from Hugging Face
summarizer = pipeline("summarization", model="nsi319/legal-pegasus")

# Load your JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process the first entry only to check the output
first_entry = data[0]

# Combine the fields you want to summarize (procedure, facts, and law) and limit length to fit model capacity
input_text = f"{first_entry['procedure']} {first_entry['facts']} {first_entry['law']}"
input_text = input_text[:1000]  # Truncate input to fit model limits

# Generate summary with adjusted settings to reduce repetition
summary = summarizer(
    input_text,
    max_length=300,    # Reduced max_length to prevent lengthy summaries
    min_length=50,     # Ensures some substance in summary
    do_sample=False,
    no_repeat_ngram_size=3  # Prevent repeating trigrams
)

# Add the summary to the entry for review
first_entry['summary'] = summary[0]['summary_text']

# Display the summary for review
print("Refined Summary for the first entry:", first_entry['summary'])


Your max_length is set to 300, but your input_length is only 212. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=106)


Refined Summary for the first entry: The case originated in an application lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms. The Court acceded to the applicant's request not to have his name disclosed. The Romanian Government represented by their Agent, Ms C. Brumar, from the Ministry of Foreign Affairs. The applicant alleged that the criminal investigations into his allegations of domestic abuse perpetrated by his father had been ineffective and that the ensuing proceedings had been unfair.


In [None]:
#2.2
import json
from transformers import pipeline

# Load the summarization model
summarizer = pipeline("summarization", model="nsi319/legal-pegasus")

# Load your JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process the first entry only to check the output
first_entry = data[0]

# Define a function to summarize text with settings to reduce repetition
def summarize_text(text, max_length=300):
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=50,
        do_sample=False,
        no_repeat_ngram_size=3
    )
    return summary[0]['summary_text']

# Initialize a dictionary to hold summaries of each section
section_summaries = {}

# Summarize each section separately
for section in ['procedure', 'facts', 'law']:
    # Get the text for the section and clean it
    section_text = first_entry.get(section, '').replace("§", "").replace("\u00a0", " ")

    # Truncate the section text to fit model limits
    section_text = section_text[:1000]  # Adjust as needed

    # Check if the section has content
    if section_text.strip():
        print(f"Summarizing section: {section}")
        # Summarize the section
        summary = summarize_text(section_text, max_length=250)
        # Store the summary
        section_summaries[section] = summary
    else:
        section_summaries[section] = ""

# Combine the summaries
final_summary = "\n\n".join(
    [f"Summary of {section.capitalize()}:\n{summary}"
     for section, summary in section_summaries.items() if summary]
)

# Add the final summary to the entry
first_entry['summary'] = final_summary

# Display the final summary
print("Final Summary for the first entry:\n")
print(first_entry['summary'])


Your max_length is set to 250, but your input_length is only 187. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=93)


Summarizing section: procedure


Your max_length is set to 250, but your input_length is only 220. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=110)


Summarizing section: facts


Your max_length is set to 250, but your input_length is only 198. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=99)


Summarizing section: law
Final Summary for the first entry:

Summary of Procedure:
The case originated in an application lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms. The Court acceded to the applicant's request not to have his name disclosed. The applicant alleged that the criminal investigations into his allegations of domestic abuse perpetrated by his father had been ineffective and that the ensuing proceedings had been unfair.

Summary of Facts:
The applicant was born in 2001 and lives in Bucharest. His parents separated in April 2004 and divorced in September 2004 because of D.D.'s abusive behaviour towards his wife and their son. The applicant remained with his mother. Since then, the case has been monitored by the Bucharest Child Protection Authority. On 7 October 2008, the Authority certified that since 2004 it had included the applicant in a psychological counselling programme.

Summary of Law:
The Court o

In [None]:
#2.3

import json
from transformers import pipeline

# Load the summarization model
summarizer = pipeline("summarization", model="nsi319/legal-pegasus")

# Load JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process the first entry only (defined here so the cell runs on its own)
first_entry = data[0]

# Define a function to summarize text with section-based tuning
def summarize_text(text, max_length=300, min_length=50):
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
        no_repeat_ngram_size=3
    )
    return summary[0]['summary_text']

# Initialize a dictionary to hold summaries of each section
section_summaries = {}

# Adjust the max_length dynamically based on section
section_length_settings = {
    'procedure': 150,   # Shorter for 'procedure' section
    'facts': 300,       # Increased for 'facts' section
    'law': 300          # Increased for 'law' section
}

# Summarize each section separately with tailored max_length
for section in ['procedure', 'facts', 'law']:
    # Get and clean the section text
    section_text = first_entry.get(section, '').replace("§", "").replace("\u00a0", " ")

    # Truncate the section text based on length
    section_text = section_text[:1500] if section in ['facts', 'law'] else section_text[:1000]

    # Check if section has content
    if section_text.strip():
        print(f"Summarizing section: {section}")
        # Summarize with dynamic max_length
        summary = summarize_text(section_text, max_length=section_length_settings[section])
        # Store the summary
        section_summaries[section] = summary
    else:
        section_summaries[section] = ""

# Combine the summaries for a final output
final_summary = "\n\n".join(
    [f"Summary of {section.capitalize()}:\n{summary}"
     for section, summary in section_summaries.items() if summary]
)

# Add the final summary to the entry
first_entry['summary'] = final_summary

# Display the final summary
print("Final Summary for the first entry:\n")
print(first_entry['summary'])


Summarizing section: procedure
Summarizing section: facts


Your max_length is set to 300, but your input_length is only 291. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=145)


Summarizing section: law
Final Summary for the first entry:

Summary of Procedure:
The case originated in an application lodged with the Court under Article 34 of the Convention for the Protection of Human Rights and Fundamental Freedoms. The Court acceded to the applicant's request not to have his name disclosed. The applicant alleged that the criminal investigations into his allegations of domestic abuse perpetrated by his father had been ineffective and that the ensuing proceedings had been unfair.

Summary of Facts:
The applicant was born in 2001 and lives in Bucharest. His parents separated in April 2004 and divorced in September 2004 because of D.D.'s abusive behaviour towards his wife and their son. The applicant remained with his mother. On 27 February 2004 C.I. called the hotline of the Bucharest Child Protection Authority to report the domestic abuse she and the applicant had been suffering at the hands of the applicant's father. Since then, the case has been monitored by the

## Model 3:
LegalBERT_BART_fixed_V1 is a fine-tuned version of BART. The underlying research uses a multi-step summarization approach for long legal documents.

In [None]:
# 3.1
import json
from transformers import pipeline

# Load the summarization model
summarizer = pipeline("summarization", model="MikaSie/LegalBERT_BART_fixed_V1")

# Load your JSON data
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Process the first entry only to check the output
first_entry = data[0]

# Define a function to summarize text with settings to reduce repetition
def summarize_text(text, max_length=300):
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=50,
        do_sample=False,
        no_repeat_ngram_size=3
    )
    return summary[0]['summary_text']

# Initialize a dictionary to hold summaries of each section
section_summaries = {}

# Summarize each section separately
for section in ['procedure', 'facts', 'law']:
    # Get the text for the section and clean it
    section_text = first_entry.get(section, '').replace("§", "").replace("\u00a0", " ")

    # Truncate the section text to fit model limits
    section_text = section_text[:1000]  # Adjust as needed

    # Check if the section has content
    if section_text.strip():
        print(f"Summarizing section: {section}")
        # Summarize the section
        summary = summarize_text(section_text, max_length=250)
        # Store the summary
        section_summaries[section] = summary
    else:
        section_summaries[section] = ""

# Combine the summaries
final_summary = "\n\n".join(
    [f"Summary of {section.capitalize()}:\n{summary}"
     for section, summary in section_summaries.items() if summary]
)

# Add the final summary to the entry
first_entry['summary'] = final_summary

# Display the final summary
print("Final Summary for the first entry:\n")
print(first_entry['summary'])


Your max_length is set to 250, but your input_length is only 218. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=109)


Summarizing section: procedure


Your max_length is set to 250, but your input_length is only 243. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=121)


Summarizing section: facts


Your max_length is set to 250, but your input_length is only 230. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=115)


Summarizing section: law
Final Summary for the first entry:

Summary of Procedure:
Protection of human rights and fundamental freedoms in the context of the Convention for the Protection of Human Rights and Fundamental Freedoms (CPHRF) against Romania (OJ L 183, 25.3.2014, pp. 1-8)
Protection in the framework of the CPHRF against Romania
SUMMARY OF:
Decision (EU) 2015/852/EU — application for protection under Article 34 of the convention for the protection of human Rights and fundamental Freedoms against Romania lodged with the European Court of Justice (ECJ)
WHAT IS THE AIM OF THE DECISION?
The decision seeks to protect the rights of a European Union (EU, EU) citizen who was the subject of a criminal investigation in Romania for alleged domestic abuse perpetrated by his father.
KEY POINTS
The case originated in an application (no. 23022/13) by the European Union lodged against Romania for protection against the EU country’s infringement of its obligations under the Convention on the p

In [1]:
import json
from transformers import pipeline
from rouge_score import rouge_scorer
import torch

# Use just the first entry to reduce memory usage
with open('extracted_legal_sections.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
first_entry = data[0]

# Initialize models (switch the model names here for different tests)
device = 0 if torch.cuda.is_available() else -1  # Use the GPU if available
models = {
    "LegalBERT_BART": pipeline("summarization", model="MikaSie/LegalBERT_BART_fixed_V1", device=device),
    "LegalPegasus": pipeline("summarization", model="nsi319/legal-pegasus", device=device),
    "T5_Portuguese": pipeline("summarization", model="stjiris/t5-portuguese-legal-summarization", device=device)
}

# Truncate text if necessary to fit model limits (adjust this as per model needs)
def summarize_text(text, summarizer, max_length=300):
    summary = summarizer(
        text[:1000],  # Truncate to 1000 characters if needed
        max_length=max_length,
        min_length=50,
        do_sample=False,
        no_repeat_ngram_size=3
    )
    return summary[0]['summary_text']

# Initialize ROUGE scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

# Store and compare summaries
section_summaries = {model_name: {} for model_name in models}

# Summarize each section and score
for model_name, summarizer in models.items():
    for section in ['procedure', 'facts', 'law']:
        text = first_entry.get(section, '').replace("§", "").replace("\u00a0", " ")
        if text.strip():
            print(f"Summarizing '{section}' section with model: {model_name}")
            summary = summarize_text(text, summarizer, max_length=250)
            section_summaries[model_name][section] = summary

# Display the summaries for review
for model_name, summaries in section_summaries.items():
    print(f"\nModel: {model_name}")
    for section, summary in summaries.items():
        print(f"Summary of {section.capitalize()}:\n{summary}\n")


Your max_length is set to 250, but your input_length is only 218. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=109)


Summarizing 'procedure' section with model: LegalBERT_BART


Your max_length is set to 250, but your input_length is only 243. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=121)


Summarizing 'facts' section with model: LegalBERT_BART


Your max_length is set to 250, but your input_length is only 230. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=115)


Summarizing 'law' section with model: LegalBERT_BART


Your max_length is set to 250, but your input_length is only 187. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=93)


Summarizing 'procedure' section with model: LegalPegasus


Your max_length is set to 250, but your input_length is only 220. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=110)


Summarizing 'facts' section with model: LegalPegasus


Your max_length is set to 250, but your input_length is only 198. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=99)


Summarizing 'law' section with model: LegalPegasus
Summarizing 'procedure' section with model: T5_Portuguese
Summarizing 'facts' section with model: T5_Portuguese
Summarizing 'law' section with model: T5_Portuguese

Model: LegalBERT_BART
Summary of Procedure:
Protection of human rights and fundamental freedoms in the context of the Convention for the Protection of Human Rights and Fundamental Freedoms (CPHRF) against Romania (OJ L 183, 25.3.2014, pp. 1-8)
Protection in the framework of the CPHRF against Romania
SUMMARY OF:
Decision (EU) 2015/852/EU — application for protection under Article 34 of the convention for the protection of human Rights and fundamental Freedoms against Romania lodged with the European Court of Justice (ECJ)
WHAT IS THE AIM OF THE DECISION?
The decision seeks to protect the rights of a European Union (EU, EU) citizen who was the subject of a criminal investigation in Romania for alleged domestic abuse perpetrated by his father.
KEY POINTS
The case originated in

In [None]:
## ROUGE score
## We don't have human-written legal summaries for this data, so we used reference summaries generated by the ChatGPT 4o model.



In [2]:
from rouge_score import rouge_scorer

# Initialize ROUGE scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

# Define reference summaries for evaluation (replace with correct references for your data)
reference_summaries = {
    'procedure': "The applicant, a Romanian national, lodged an application with the European Court of Human Rights on March 22, 2013, alleging ineffective investigations and unfair proceedings concerning domestic abuse inflicted by his father. Legal aid was granted, and the Romanian government was represented by Ms. C. Brumar. The application was officially communicated to the government on March 25, 2014.",
    'facts': "The applicant, born in 2001 and residing in Bucharest, experienced domestic abuse from his father, D.D., which led to his parents' separation and eventual divorce in 2004. His mother reported the abuse to the Bucharest Child Protection Authority, leading to ongoing monitoring and psychological counseling for the applicant. The police and prosecutors investigated and gathered evidence, resulting in D.D.'s indictment for abusive behavior. After a lengthy legal process, D.D. was initially acquitted, but subsequent appeals and hearings led to a conviction for ill-treatment, a suspended sentence, and a damages award for the applicant.",
    'law': "The applicant claimed that authorities failed to promptly and effectively investigate the domestic abuse allegations, a breach of Article 3 of the European Convention on Human Rights. The authorities argued that the investigation’s complexity justified its duration, while the applicant contended that delays and lack of action compromised the investigation’s effectiveness and fairness. The court found significant procedural flaws, ultimately concluding that the investigation into the applicant's allegations was ineffective, leading to a violation of the procedural guarantees under Article 3."
}

# Function to calculate ROUGE scores
def calculate_rouge_scores(generated, reference):
    scores = scorer.score(reference, generated)
    return {
        "ROUGE-1": scores['rouge1'].fmeasure,
        "ROUGE-2": scores['rouge2'].fmeasure,
        "ROUGE-L": scores['rougeL'].fmeasure
    }

# Compare model summaries with reference summaries
results = {}
for model_name, summaries in section_summaries.items():
    print(f"\nROUGE scores for {model_name}:")
    model_scores = {}
    for section, generated_summary in summaries.items():
        if section in reference_summaries:
            scores = calculate_rouge_scores(generated_summary, reference_summaries[section])
            model_scores[section] = scores
            print(f"{section.capitalize()} - ROUGE-1: {scores['ROUGE-1']:.4f}, ROUGE-2: {scores['ROUGE-2']:.4f}, ROUGE-L: {scores['ROUGE-L']:.4f}")
    results[model_name] = model_scores



ROUGE scores for LegalBERT_BART:
Procedure - ROUGE-1: 0.2656, ROUGE-2: 0.1004, ROUGE-L: 0.1743
Facts - ROUGE-1: 0.3443, ROUGE-2: 0.1107, ROUGE-L: 0.1758
Law - ROUGE-1: 0.2394, ROUGE-2: 0.0851, ROUGE-L: 0.1549

ROUGE scores for LegalPegasus:
Procedure - ROUGE-1: 0.4480, ROUGE-2: 0.1626, ROUGE-L: 0.2880
Facts - ROUGE-1: 0.4678, ROUGE-2: 0.2130, ROUGE-L: 0.3158
Law - ROUGE-1: 0.4767, ROUGE-2: 0.1885, ROUGE-L: 0.2798

ROUGE scores for T5_Portuguese:
Procedure - ROUGE-1: 0.4262, ROUGE-2: 0.1878, ROUGE-L: 0.2732
Facts - ROUGE-1: 0.4055, ROUGE-2: 0.1395, ROUGE-L: 0.2765
Law - ROUGE-1: 0.4271, ROUGE-2: 0.1474, ROUGE-L: 0.2500
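To make the scores above easier to compare, the sketch below averages each metric per model and finds the top-scoring model for every section/metric pair. The numbers are copied from the output above; the function names `mean_scores` and `best_model_per_cell` are hypothetical helpers introduced only for this illustration.

```python
from statistics import mean

METRICS = ("ROUGE-1", "ROUGE-2", "ROUGE-L")

# (ROUGE-1, ROUGE-2, ROUGE-L) per section, copied from the printed output.
rouge_results = {
    "LegalBERT_BART": {
        "procedure": (0.2656, 0.1004, 0.1743),
        "facts": (0.3443, 0.1107, 0.1758),
        "law": (0.2394, 0.0851, 0.1549),
    },
    "LegalPegasus": {
        "procedure": (0.4480, 0.1626, 0.2880),
        "facts": (0.4678, 0.2130, 0.3158),
        "law": (0.4767, 0.1885, 0.2798),
    },
    "T5_Portuguese": {
        "procedure": (0.4262, 0.1878, 0.2732),
        "facts": (0.4055, 0.1395, 0.2765),
        "law": (0.4271, 0.1474, 0.2500),
    },
}


def mean_scores(per_section):
    """Average each metric over the sections of one model."""
    columns = zip(*per_section.values())
    return {metric: mean(col) for metric, col in zip(METRICS, columns)}


def best_model_per_cell(results):
    """Map (section, metric) to the model with the highest score."""
    winners = {}
    sections = next(iter(results.values())).keys()
    for section in sections:
        for i, metric in enumerate(METRICS):
            winners[(section, metric)] = max(
                results, key=lambda name: results[name][section][i]
            )
    return winners


for model_name, per_section in rouge_results.items():
    averages = mean_scores(per_section)
    print(model_name, {m: round(v, 4) for m, v in averages.items()})

for cell, winner in best_model_per_cell(rouge_results).items():
    print(cell, "->", winner)
```

On these numbers, LegalPegasus is the top scorer in eight of the nine section/metric pairs, and T5_Portuguese is the top scorer for ROUGE-2 on the Procedure section.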



Conclusion:

LegalPegasus scored highest on eight of the nine section/metric pairs for the first entry, indicating the closest overlap with the reference summaries. The one exception is ROUGE-2 on the Procedure section, where T5_Portuguese scored higher (0.1878 vs. 0.1626).
T5_Portuguese came second overall, with ROUGE-1 scores between 0.40 and 0.43 and ROUGE-L scores slightly below those of LegalPegasus in every section.
LegalBERT_BART had the lowest scores on every section and metric, suggesting that its generated summaries share the least content with the reference summaries.

Overall, LegalPegasus aligns best with the ChatGPT-generated reference summaries on ROUGE: it captures individual words and phrases (ROUGE-1 and ROUGE-2) and sequence structure (ROUGE-L) better than the other two models in almost every case.
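To illustrate what ROUGE-1 and ROUGE-2 measure, here is a simplified sketch of the n-gram overlap F-measure. It is not the `rouge_score` implementation: it lowercases and splits on whitespace only, with no stemming, so its numbers will differ from the scores reported above. The function names and the two example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F-measure based on clipped n-gram overlap."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the court found a violation"
candidate = "the court found no violation"
print(rouge_n_f1(reference, candidate, n=1))  # 4 of 5 unigrams shared
print(rouge_n_f1(reference, candidate, n=2))  # 2 of 4 bigrams shared
```

The bigram score drops faster than the unigram score when a single word changes, which is why ROUGE-2 values in this notebook are consistently lower than ROUGE-1 values.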