# 📑 Research Paper Summarizer: Team Gradient Geeks

## Team Members : 

### 1. Anurag Ghosh
### 2. Suchana Hazra
### 3. Uttam Mahata
### 4. Siddharth Sen

## Overview

We've built a machine learning system that automatically generates concise summaries of academic research papers from arXiv. Our approach transforms complex scientific papers into accessible summaries while preserving key information.

## Data Processing Pipeline

1. **Data Preparation** 🔍
   - Selected key fields from arXiv papers: ID, title, abstract, authors, categories
   - Split data into training (67%) and validation (33%) sets

2. **Text Processing** ⚙️
   - Created structured input format: `[CATEGORIES: {categories}] [AUTHORS: {authors}] {abstract}`
   - Used paper titles as target summaries
   - Combined metadata with content to provide richer context

3. **Model Architecture** 🧠
   - Implemented a sequence-to-sequence transformer model
   - Used 512 token input length and 128 token output length
   - Applied specialized data collation for efficient batch processing

4. **Training Configuration** 🚀
   - Fine-tuned with learning rate: 2e-5
   - Optimized with weight decay: 0.01
   - Trained for 3 epochs with mixed-precision (FP16) when GPU available
   - **Note**: Our experiments indicate that increasing the number of training epochs would yield improved BLEU scores

## Evaluation Framework

We evaluated our model using several complementary metrics:

### Quality Metrics
- **ROUGE scores**: Measured word overlap between generated and reference summaries
- **BLEU score**: Assessed precision of n-gram matches

### Practical Performance
- **Speed**: Average time per summary generation
- **Length**: Word and character counts of generated summaries
- **Readability**: Flesch Reading Ease and Flesch-Kincaid Grade Level
- **Resources**: GPU memory consumption during inference

## Results

Our model successfully generates concise, informative summaries of research papers that capture the essential content while being more accessible than the original abstracts. The evaluation metrics demonstrate a balance between accuracy, speed, and resource efficiency.

## Future Improvements

- Incorporate paper figures and tables into the summarization process
- Experiment with different input-output formats
- Explore domain-specific fine-tuning for different scientific fields
- **Extend training duration**: Our preliminary results suggest that additional training epochs would significantly improve BLEU scores

In [1]:
# CELL 1: Set up the environment and install required libraries
!pip install datasets transformers nltk rouge-score pandas scikit-learn

# CELL 2: Import necessary libraries
import os
import json
import numpy as np
import pandas as pd
import torch
from datasets import load_dataset, Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    DataCollatorForSeq2Seq
)
from nltk.tokenize import sent_tokenize
import nltk
nltk.download('punkt')
from sklearn.model_selection import train_test_split
from google.colab import files, drive

Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: rouge-score
  Building wheel for rouge-score (setup.py) ... [?25l[?25hdone
  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=45cfcb946f32bf6a4f10b3efba4196e0ef6cacf4b54a3a5b80150272ab0da550
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge-score
Installing collected packages: rouge-score
Successfully installed rouge-score-0.1.2
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [2]:
!head -n 10 /content/small_arxiv.json


head: cannot open '/content/small_arxiv.json' for reading: No such file or directory


In [3]:
import json

file_path = "/kaggle/input/arxiv-json/small_arxiv.json"

# Open and check the file
with open(file_path, "r") as file:
    for _ in range(5):  # Print first 5 lines
        print(file.readline().strip())


{"id":704.0001,"submitter":"Pavel Nadolsky","authors":"C. Bal\\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan","title":"Calculation of prompt diphoton production cross sections at Tevatron and\n  LHC energies","comments":"37 pages, 15 figures; published version","journal-ref":"Phys.Rev.D76:013009,2007","doi":"10.1103\/PhysRevD.76.013009","report-no":"ANL-HEP-PR-07-12","categories":"hep-ph","license":null,"abstract":"  A fully differential calculation in perturbative quantum chromodynamics is\npresented for the production of massive photon pairs at hadron colliders. All\nnext-to-leading order perturbative contributions from quark-antiquark,\ngluon-(anti)quark, and gluon-gluon subprocesses are included, as well as\nall-orders resummation of initial-state gluon radiation valid at\nnext-to-next-to-leading logarithmic accuracy. The region of phase space is\nspecified in which the calculation is most reliable. Good agreement is\ndemonstrated with data from the Fermilab Tevatron, and predicti

In [4]:
import pandas as pd
import json

file_path = "/kaggle/input/arxiv-json/small_arxiv.json"

data = []
with open(file_path, "r") as file:
    for line in file:
        try:
            data.append(json.loads(line))  # Load each line as a JSON object
        except json.JSONDecodeError as e:
            print(f"Skipping corrupted line: {e}")

# Convert to DataFrame
arxiv_data = pd.DataFrame(data)
print(f"Loaded dataset with shape: {arxiv_data.shape}")


Loaded dataset with shape: (100000, 14)


# Processing arXiv Dataset for Model Training

This code processes an arXiv dataset to prepare it for a machine learning model. Here's what it does:

1. **Data Loading**: Loads arXiv data into a pandas DataFrame
2. **Feature Selection**: Extracts only the most relevant columns (`id`, `title`, `abstract`, `authors`, `categories`)
3. **Train-Validation Split**: Divides the dataset using a 2:1 ratio
   - 67% for training (train_df)
   - 33% for validation (val_df)
   - Uses fixed random seed (42) for reproducibility

The commented line shows we initially considered using only 15% of the full dataset but decided to use the complete dataset instead.

The final print statements confirm the shapes of our training and validation datasets, helping us verify the split was performed correctly.

In [5]:


arxiv_df = pd.DataFrame(arxiv_data)

# ✅ Select relevant columns
arxiv_df = arxiv_df[['id', 'title', 'abstract', 'authors', 'categories']]

# # ✅ Keep only 15% of data (10% Train, 5% Validation)
# arxiv_df = arxiv_df.sample(frac=0.15, random_state=42)

# ✅ Split into TRAIN (10%) & VALIDATION (5%)
train_df, val_df = train_test_split(arxiv_df, test_size=1/3, random_state=42)

print(f"Training dataset shape: {train_df.shape}")
print(f"Validation dataset shape: {val_df.shape}")


Training dataset shape: (66666, 5)
Validation dataset shape: (33334, 5)


In [6]:
def preprocess_arxiv_data(df):
    # Create input-output pairs for summarization
    # For the input, we'll use a template with more context from the JSON structure:
    # "[CATEGORIES: {categories}] [AUTHORS: {authors}] {abstract}"
    # For the output/target, we'll use the title as a "summary"
    processed_data = {
        'id': [],
        'input': [],
        'target': []
    }

    for idx, row in df.iterrows():
        processed_data['id'].append(row['id'])
        processed_data['input'].append(f"[CATEGORIES: {row['categories']}] [AUTHORS: {row['authors']}] {row['abstract']}")
        processed_data['target'].append(row['title'])

    return processed_data


# Data Preprocessing Function for arXiv Summarization

This function transforms the arXiv dataset into a format suitable for a text summarization model. The function:

1. **Creates Input-Output Pairs**: 
   - **Input text**: Combines paper categories, authors, and abstract with special formatting
   - **Target text**: Uses the paper title as the target "summary"

2. **Uses a Structured Template**: 
   - Format: `[CATEGORIES: {categories}] [AUTHORS: {authors}] {abstract}`
   - This gives the model additional context beyond just the abstract

3. **Returns a Dictionary**: Contains three parallel lists:
   - `id`: Original paper identifiers
   - `input`: Formatted text inputs
   - `target`: Title outputs for the model to generate

The approach treats scientific paper summarization as an "abstract-to-title" task, where the model learns to create concise, descriptive titles based on paper metadata and content.

In [7]:
# CELL 8: Process the data
processed_arxiv = preprocess_arxiv_data(arxiv_df)
print(f"Processed {len(processed_arxiv['id'])} samples")

Processed 100000 samples


In [8]:
!pip install datasets  # Install the Hugging Face datasets library




In [9]:
# CELL 9: Convert to datasets format
#define dataset
from datasets import Dataset
arxiv_dataset = Dataset.from_dict(processed_arxiv)
print(f"Dataset created with {len(arxiv_dataset)} examples")

Dataset created with 100000 examples


In [10]:
!pip install transformers




In [11]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
base_model_name = "facebook/bart-large-cnn"  # A strong pre-trained summarization model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model_name)
print(f"Model {base_model_name} loaded successfully")


config.json:   0%|          | 0.00/1.58k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

Model facebook/bart-large-cnn loaded successfully


# 🔄 Sequence-to-Sequence Model Pipeline for arXiv Papers

This code implements a complete transformer-based sequence-to-sequence training pipeline using the Hugging Face ecosystem. The process includes:

```python
Color key: 🟢 Data Preparation | 🔵 Tokenization | 🟠 Training Setup | 🚀 Execution
```

## 🟢 **Data Structure Conversion**
- Transforms pandas DataFrames into HuggingFace Dataset objects
- Creates separate objects for training and validation data

## 🔵 **Tokenization Function**
- Combines titles and abstracts for input
- Sets max input length to 512 tokens
- Limits target summaries to 128 tokens
- Handles padding and truncation automatically

## 🟢 **Dataset Cleanup**
- Removes original text columns after tokenization
- Keeps only the tokenized input and label IDs

## 🟠 **Training Configuration**
- Uses specialized `DataCollatorForSeq2Seq` for dynamic padding
- Configures training parameters:
  - Learning rate: 2e-5
  - Batch size: 4 (for both training and evaluation)
  - Weight decay: 0.01
  - Epochs: 3
- Enables mixed precision (FP16) when GPU is available
- Implements evaluation at the end of each epoch

## 🚀 **Model Training**
- Initializes the Seq2SeqTrainer with all components
- Launches the training process

The pipeline uses a modern approach to scientific text summarization, preparing tokenized inputs and outputs for a transformer architecture.

In [12]:
from datasets import Dataset
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer
import torch

# Function to Tokenize Inputs
def tokenize_function(examples):
    inputs = [title + " " + abstract for title, abstract in zip(examples["title"], examples["abstract"])]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")

    # Tokenize targets (Summaries)
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["abstract"], max_length=128, truncation=True, padding="max_length")

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Convert Train DataFrame to Dataset
train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)

# Tokenize Train and Validation Data
train_dataset = train_dataset.map(tokenize_function, batched=True)
val_dataset = val_dataset.map(tokenize_function, batched=True)

# Remove unused columns
train_dataset = train_dataset.remove_columns(["id", "title", "abstract", "authors", "categories"])
val_dataset = val_dataset.remove_columns(["id", "title", "abstract", "authors", "categories"])

# Define Data Collator
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

# Training Arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./summarization_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=3,
    predict_with_generate=True,
    fp16=torch.cuda.is_available(),
    logging_dir="./logs",
    logging_steps=500,
    report_to="none",
)

# Define Trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,  # ✅ Properly defined validation dataset
    tokenizer=tokenizer,
    data_collator=data_collator
)

# Start Training 🚀
trainer.train()


Map:   0%|          | 0/66666 [00:00<?, ? examples/s]



Map:   0%|          | 0/33334 [00:00<?, ? examples/s]

  trainer = Seq2SeqTrainer(


Epoch,Training Loss,Validation Loss
1,0.0007,0.038044
2,0.0004,0.026339
3,0.0001,0.048265




TrainOutput(global_step=50001, training_loss=0.0023688235107829493, metrics={'train_runtime': 37580.7497, 'train_samples_per_second': 5.322, 'train_steps_per_second': 1.33, 'total_flos': 2.167082931065979e+17, 'train_loss': 0.0023688235107829493, 'epoch': 3.0})

In [37]:
eval_results = trainer.evaluate()
print(eval_results)


{'eval_loss': 0.048265136778354645, 'eval_runtime': 1567.3136, 'eval_samples_per_second': 21.268, 'eval_steps_per_second': 5.317, 'epoch': 3.0}


In [40]:
import torch

# Move model to CUDA if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trainer.model.to(device)  # Move the trained model to the right device

def summarize_text(text):
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    inputs = {key: value.to(device) for key, value in inputs.items()}  # Move tensors to the same device
    
    summary_ids = trainer.model.generate(**inputs, max_length=70, min_length=30, do_sample=False)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


In [44]:
text = "Mobile devices, smartphones and tablet computers in particular, have generated a lot of interest among researchers in recent years."
summary = summarize_text(text)
print("Generated Summary:", summary)


Generated Summary:   have generated a lot of interest among researchers in recent years. Mobile devices, smartphones and tablet computers in particular, have generated the lot of information among researchers. Recent years.


In [27]:
!pip install datasets evaluate nltk rouge-score sacrebleu


Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting sacrebleu
  Downloading sacrebleu-2.5.1-py3-none-any.whl.metadata (51 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m51.8/51.8 kB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
Collecting portalocker (from sacrebleu)
  Downloading portalocker-3.1.1-py3-none-any.whl.metadata (8.6 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.0/84.0 kB[0m [31m5.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading sacrebleu-2.5.1-py3-none-any.whl (104 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m104.1/104.1 kB[0m [31m7.1 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading portalocker-3.1.1-py3-none-any.whl (19 kB)
Installing collected packages: portalocker, sacrebleu, evaluate
Successfully installed evaluate-0.4.3 portalocker-3.1.1 sacrebleu-2.5.1


In [28]:
import evaluate
import nltk
import time
import torch
from transformers import pipeline

nltk.download('punkt')

# Load evaluation metrics
rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")


[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/5.94k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/1.55k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/3.34k [00:00<?, ?B/s]

In [45]:
import pandas as pd

# Load your test dataset from CSV
test_df = pd.read_csv("/kaggle/input/compscholar-dataset/Brain Dead CompScholar Dataset.csv")
print(test_df.columns)  # Check column names


Index(['Paper Id', 'Paper Title', 'Key Words', 'Abstract', 'Conclusion',
       'Document', 'Paper Type', 'Summary', 'Topic', 'OCR', 'labels'],
      dtype='object')


In [48]:
test_df["input_text"] = test_df["Paper Title"].fillna("") + " " + test_df["Abstract"].fillna("")


In [56]:
test_sample = test_df.sample(100)  # Select 100 random samples
texts = test_sample["input_text"].tolist()


In [55]:
import torch
print("GPU Available:", torch.cuda.is_available())
print("CUDA Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No CUDA device found")


GPU Available: True
CUDA Device: Tesla P100-PCIE-16GB


In [57]:
device = "cuda" if torch.cuda.is_available() else "cpu"

# Move model to GPU
trainer.model.to(device)
trainer.model.half()  # Use FP16 for faster inference

print("Model is running on:", next(trainer.model.parameters()).device)  # Should print 'cuda'


Model is running on: cuda:0


In [58]:
def summarize_text(text):
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)

    # Move inputs to GPU
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Generate summary
    with torch.no_grad():  # Disable gradient computation for faster inference
        summary_ids = trainer.model.generate(**inputs, max_length=70, min_length=30, do_sample=False)

    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


In [59]:
import time

start_time = time.time()

test_df["Generated_Summary"] = test_df["Abstract"].apply(summarize_text)

end_time = time.time()

print(f"Summarization completed in {end_time - start_time:.2f} seconds")


Summarization completed in 422.17 seconds


In [60]:
# Display 3 random samples
sampled_data = test_df[["Abstract", "Generated_Summary"]].sample(3, random_state=42)

for idx, row in sampled_data.iterrows():
    print(f"🔹 **Original Abstract:**\n{row['Abstract']}\n")
    print(f"✅ **Generated Summary:**\n{row['Generated_Summary']}\n")
    print("-" * 100)


🔹 **Original Abstract:**
Sentiment analysis, the automated extraction of 
expressions of positive or negative attitudes from text has received 
considerable attention from researchers during the past decade.
In addition, the popularity of internet users has been growing fast
parallel to emerging technologies; that actively use online review 
sites, social networks and personal blogs to express their opinions. 
They harbor positive and negative attitudes about people, 
organizations, places, events, and ideas. The tools provided by 
natural language processing and machine learning along with 
other approaches to work with large volumes of text, makes it 
possible to begin extracting sentiments from social media. In this 
paper we discuss some of the challenges in sentiment extraction, 
some of the approaches that have been taken to address these 
challenges and our approach that analyses sentiments from 
Twitter social media which gives the output beyond just the 
polarity but use those

In [61]:
!pip install rouge-score sacrebleu




In [63]:
!pip install evaluate sacrebleu




# 📊 Evaluation Metrics for Summarization Models

This code calculates standard NLP evaluation metrics to assess the quality of generated summaries compared to reference texts:

```python
# Color key: 🔷 Metric Loading | 🔶 Data Preparation | 📈 Calculation | 📝 Reporting
```

## 🔷 **Metric Initialization**
- Imports the `evaluate` and `sacrebleu` libraries
- Loads the ROUGE metric suite for summarization evaluation

## 🔶 **Reference & Generation Preparation**
- Extracts ground truth abstracts from test dataset
- Collects model-generated summaries for comparison

## 📈 **Performance Calculation**
- Computes ROUGE metrics (precision, recall, F1-measure)
  - Measures word overlap between reference and generated texts
- Calculates BLEU score using the sacrebleu implementation
  - Measures n-gram precision with brevity penalty

## 📝 **Results Presentation**
- Displays ROUGE scores with 4 decimal precision
  - Shows ROUGE-1, ROUGE-2, and ROUGE-L metrics
- Reports the final BLEU score with a checkmark indicator

These metrics provide complementary views of summarization quality, with ROUGE focusing on recall and BLEU emphasizing precision, giving a well-rounded assessment of the model's performance.

In [64]:
import evaluate
import sacrebleu

# Load ROUGE metric
rouge = evaluate.load("rouge")

# Get reference summaries and generated summaries
references = test_df["Abstract"].tolist()  # Assuming "Abstract" is the ground truth
generated_summaries = test_df["Generated_Summary"].tolist()

# Compute ROUGE scores
rouge_scores = rouge.compute(predictions=generated_summaries, references=references)

# Compute BLEU score
bleu_score = sacrebleu.corpus_bleu(generated_summaries, [references])

# Print results
print("🔹 ROUGE Scores:")
for key, value in rouge_scores.items():
    print(f"  {key}: {value:.4f}")

print("\n✅ BLEU Score:", bleu_score.score)


🔹 ROUGE Scores:
  rouge1: 0.4427
  rouge2: 0.4354
  rougeL: 0.4368
  rougeLsum: 0.4394

✅ BLEU Score: 5.811382199343262


In [None]:
!pip install textstat

# 📊 Performance Analysis for Summarization Model

This code evaluates the practical performance characteristics of the summarization model across multiple dimensions:

```python
⏱️ Speed Metrics | 📏 Length Analysis | 📖 Readability | 🧠 Resource Usage
```

## ⏱️ **Inference Speed Measurement**
- Tracks total and per-summary processing time
- Uses a transformer pipeline for consistent measurement
- Sets parameters for deterministic generation (no_sample=False)
- Controls output length (30-128 tokens)

## 📏 **Summary Length Analysis**
- Calculates word and character counts for each generated summary
- Computes average lengths across the validation sample
- Provides insight into output verbosity and conciseness

## 📖 **Readability Assessment**
- Applies standard linguistic readability metrics:
  - Flesch Reading Ease (higher = more readable)
  - Flesch-Kincaid Grade Level (lower = more accessible)
- Averages scores across all generated summaries

## 🧠 **Resource Utilization**
- Monitors GPU memory consumption during inference
- Reports memory usage in megabytes when GPU is available
- Provides CPU fallback information when GPU is not present

## 📋 **Comprehensive Reporting**
- Presents all metrics in a clean, organized format
- Uses decorative elements for improved readability
- Formats numbers with appropriate precision for each metric type

This analysis provides a holistic view of model performance beyond just accuracy, considering practical deployment factors like speed, resource requirements, and output characteristics.

In [None]:
import time
import torch
import numpy as np
from textstat import flesch_reading_ease, flesch_kincaid_grade
from transformers import pipeline

# ✅ Measure Inference Speed
start_time = time.time()
generated_summaries = []
for text in sample_val["title"].tolist():
    summary = summarizer(text, max_length=128, min_length=30, do_sample=False)
    generated_summaries.append(summary[0]["summary_text"])
end_time = time.time()

inference_time = end_time - start_time
avg_inference_time = inference_time / len(sample_val)

# ✅ Compute Length Metrics
summary_lengths = [len(summary.split()) for summary in generated_summaries]
char_lengths = [len(summary) for summary in generated_summaries]

avg_word_length = np.mean(summary_lengths)
avg_char_length = np.mean(char_lengths)

# ✅ Readability Metrics
readability_scores = [flesch_reading_ease(summary) for summary in generated_summaries]
fk_grade_scores = [flesch_kincaid_grade(summary) for summary in generated_summaries]

avg_readability = np.mean(readability_scores)
avg_fk_grade = np.mean(fk_grade_scores)

# ✅ Compute Memory Usage (GPU)
if torch.cuda.is_available():
    memory_usage = torch.cuda.memory_allocated() / 1024**2  # MB
else:
    memory_usage = "N/A (Using CPU)"

# ✅ Print Results
print(f"🔹 Inference Time: {inference_time:.2f}s | Avg Per Summary: {avg_inference_time:.4f}s")
print(f"🔹 Avg Summary Length: {avg_word_length:.2f} words | {avg_char_length:.2f} chars")
print(f"🔹 Readability: {avg_readability:.2f} (Flesch) | FK Grade: {avg_fk_grade:.2f}")
print(f"🔹 Memory Usage: {memory_usage} MB (GPU)")
