# BERT Philosophical Text Classifier

## Continental vs Analytic Philosophy Style Detection

This notebook implements a BERT-based deep learning model to classify philosophical texts as either **Continental** or **Analytic** philosophy styles. The model analyzes the linguistic patterns, terminology, and stylistic features that distinguish these two major philosophical traditions.

### What You'll Learn:
- How to fine-tune BERT for text classification
- Distinguishing features of Continental vs Analytic philosophy
- Building a complete ML pipeline from data preparation to prediction
- Evaluating model performance and visualizing results

### Philosophical Traditions Overview:

**Continental Philosophy** is characterized by:
- Emphasis on lived experience and historical context
- Phenomenological and hermeneutical approaches  
- Focus on existential and dialectical themes
- Key figures: Heidegger, Sartre, Derrida, Foucault

**Analytic Philosophy** is characterized by:
- Logical rigor and formal analysis
- Emphasis on conceptual clarity and precision
- Problem-solving approach using formal methods
- Key figures: Russell, Quine, Davidson, Kripke

Let's build a classifier that can distinguish between these styles!

## 1. Import Required Libraries

First, let's import all the necessary libraries for our BERT-based philosophical text classifier.

In [None]:
# Core libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# Transformers and NLP
from transformers import AutoTokenizer, AutoModel, AutoConfig
import numpy as np
import pandas as pd

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import precision_recall_fscore_support

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Utilities
from typing import Dict, List, Tuple, Optional
import json
import os
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set style for plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Check device availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("GPU not available, using CPU")

## 2. Define the BERT Classifier Model

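Now let's create our BERT-based classifier. This model will:
- Use a pre-trained BERT model as the base, freezing the embeddings and the first 8 of its 12 encoder layers so that only the top 4 layers, the pooler, and the new head are fine-tuned
- Add a classification head with dropout for regularization
- Output both raw logits (used by the loss) and softmax probabilities for Continental vs Analytic classification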
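The following is a minimal plain-Python sketch (no PyTorch) of what the classification head computes for one pooled vector. A linear layer produces logits, softmax turns them into probabilities, and cross-entropy is computed from the logits. The 4-dimensional vector and the weights are made-up stand-ins for BERT's 768-dimensional `pooler_output` and the learned `nn.Linear` parameters. Dropout is omitted because it acts as the identity in eval mode.

```python
import math

def linear(x, weights, bias):
    """Dense layer on one vector: one weight row and one bias per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy computed from raw logits: -log(softmax(logits)[target])."""
    return -math.log(softmax(logits)[target])

# Hypothetical pooled representation (real BERT-base uses 768 dimensions)
pooled = [0.5, -1.0, 0.25, 2.0]
# Hypothetical weights: row 0 = Continental, row 1 = Analytic
W = [[0.1, 0.2, -0.3, 0.4],
     [-0.2, 0.1, 0.5, -0.1]]
b = [0.0, 0.1]

logits = linear(pooled, W, b)
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print("logits:", logits)
print("probabilities:", probs, "sum:", sum(probs))
print("predicted class index:", predicted)
print("loss if the true class is 0:", cross_entropy(logits, target=0))
```

Softmax is monotonic, so taking `argmax` over the probabilities gives the same class as `argmax` over the logits. `nn.CrossEntropyLoss` expects raw logits because it applies log-softmax internally. For that reason the training loop passes `logits` rather than `probabilities` to the loss.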

In [None]:
class PhilosophicalBERTClassifier(nn.Module):
    """
    BERT-based classifier for distinguishing between Continental and Analytic philosophy texts.
    
    Architecture:
    - Pre-trained BERT model for feature extraction
    - Dropout layer for regularization  
    - Linear classification layer
    - Softmax activation for probability outputs
    """
    
    def __init__(self, model_name: str = 'bert-base-uncased', num_classes: int = 2, dropout_rate: float = 0.3):
        super(PhilosophicalBERTClassifier, self).__init__()
        
        # Load pre-trained BERT model
        self.bert = AutoModel.from_pretrained(model_name)
        
        # Freeze the embeddings and the first 8 encoder layers to retain pre-trained knowledge;
        # the remaining top layers (4 of 12 in bert-base), the pooler, and the new head stay trainable
        for param in self.bert.embeddings.parameters():
            param.requires_grad = False
        for layer in self.bert.encoder.layer[:8]:  # Freeze first 8 layers
            for param in layer.parameters():
                param.requires_grad = False
        
        # Classification head
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)
        self.softmax = nn.Softmax(dim=1)
        
        print(f"Model initialized with {model_name}")
        print(f"Hidden size: {self.bert.config.hidden_size}")
        print(f"Number of classes: {num_classes}")
        print(f"Dropout rate: {dropout_rate}")
        
    def forward(self, input_ids, attention_mask):
        """
        Forward pass through the model.
        
        Args:
            input_ids: Token IDs from BERT tokenizer
            attention_mask: Attention mask to ignore padding tokens
            
        Returns:
            logits: Raw classification scores
            probabilities: Softmax probabilities for each class
        """
        # Get BERT outputs
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        
        # Use pooled output (CLS token representation)
        pooled_output = outputs.pooler_output
        
        # Apply dropout for regularization
        output = self.dropout(pooled_output)
        
        # Classification layer
        logits = self.classifier(output)
        
        # Get probabilities
        probabilities = self.softmax(logits)
        
        return logits, probabilities

# Test model instantiation
test_model = PhilosophicalBERTClassifier()
print(f"\nModel created successfully!")
print(f"Total parameters: {sum(p.numel() for p in test_model.parameters()):,}")
print(f"Trainable parameters: {sum(p.numel() for p in test_model.parameters() if p.requires_grad):,}")

# Clean up test model
del test_model
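As a rough cross-check of the parameter counts printed above, the arithmetic below recomputes them from the model's shape. It assumes the standard bert-base-uncased configuration: a 30,522-token vocabulary, hidden size 768, 12 encoder layers, feed-forward size 3,072, 512 positions, and 2 token types. It mirrors the freezing in `__init__`. The embeddings and the first 8 encoder layers are frozen, while the last 4 layers, the pooler, and the 2-class head remain trainable.

```python
# Assumed bert-base-uncased configuration values
VOCAB, HIDDEN, LAYERS, INTERMEDIATE, MAX_POS, TYPE_VOCAB = 30522, 768, 12, 3072, 512, 2
NUM_CLASSES, FROZEN_LAYERS = 2, 8

def dense(n_in, n_out):
    """Parameter count of nn.Linear(n_in, n_out) with bias."""
    return n_in * n_out + n_out

layer_norm = 2 * HIDDEN  # LayerNorm weight + bias

# Word, position, and token-type embeddings plus their LayerNorm
embeddings = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN + layer_norm
encoder_layer = (
    4 * dense(HIDDEN, HIDDEN)        # query, key, value, attention output projection
    + layer_norm                     # post-attention LayerNorm
    + dense(HIDDEN, INTERMEDIATE)    # feed-forward up-projection
    + dense(INTERMEDIATE, HIDDEN)    # feed-forward down-projection
    + layer_norm                     # post-feed-forward LayerNorm
)
pooler = dense(HIDDEN, HIDDEN)
head = dense(HIDDEN, NUM_CLASSES)

total = embeddings + LAYERS * encoder_layer + pooler + head
trainable = (LAYERS - FROZEN_LAYERS) * encoder_layer + pooler + head
print(f"Total: {total:,}  Trainable: {trainable:,}  ({trainable / total:.1%})")
```

Under these assumptions, only about a quarter of the weights receive gradient updates. If the printed counts differ, the loaded checkpoint's configuration differs from the one assumed here.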

## 3. Create the Philosophy Dataset Class

We need a custom Dataset class that tokenizes each philosophical text into BERT's input format (fixed-length token IDs plus an attention mask) and encodes its label as an integer.
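The sketch below shows, for one sequence, roughly what `truncation=True, padding='max_length'` produces. It reserves two slots for `[CLS]` and `[SEP]`, truncates the rest, pads with `[PAD]`, and builds an attention mask of 1s for real tokens and 0s for padding. The special-token IDs are assumed to be those of the bert-base-uncased vocabulary, and the word-piece IDs are hypothetical placeholders. In practice the tokenizer handles all of this.

```python
from typing import List, Tuple

# Assumed special-token IDs from the bert-base-uncased vocabulary
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def encode_fixed_length(token_ids: List[int], max_length: int) -> Tuple[List[int], List[int]]:
    """Mimic truncation + padding to max_length for a single sequence."""
    body = token_ids[: max_length - 2]           # truncate, keeping room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)        # 1 = real token
    pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * pad, attention_mask + [0] * pad  # 0 = padding

# Hypothetical word-piece IDs: a short text and one longer than max_length
short_ids, short_mask = encode_fixed_length([2108, 1999, 1996, 2088], max_length=8)
long_ids, long_mask = encode_fixed_length(list(range(1000, 1020)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

With `return_tensors='pt'`, the tokenizer returns tensors of shape `(1, max_length)`. That is why `__getitem__` calls `.flatten()` before the DataLoader stacks items into a batch.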

In [None]:
class PhilosophyDataset(Dataset):
    """
    Custom PyTorch Dataset for philosophical texts.
    
    This class handles:
    - Text tokenization using BERT tokenizer
    - Proper padding and truncation
    - Label encoding for classification
    """
    
    def __init__(self, texts: List[str], labels: List[str], tokenizer, max_length: int = 512):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Create label mapping
        self.label_to_idx = {'Continental': 0, 'Analytic': 1}
        self.idx_to_label = {0: 'Continental', 1: 'Analytic'}
        
        print(f"Dataset created with {len(texts)} samples")
        print(f"Max sequence length: {max_length}")
        print(f"Label mapping: {self.label_to_idx}")
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        """
        Get a single item from the dataset.
        
        Returns:
            Dictionary containing:
            - input_ids: Token IDs for BERT
            - attention_mask: Mask for padding tokens
            - labels: Encoded label (0=Continental, 1=Analytic)
        """
        text = str(self.texts[idx])
        label = self.labels[idx]
        
        # Encode the text
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        )
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(self.label_to_idx[label], dtype=torch.long)
        }
    
    def get_label_distribution(self):
        """Get the distribution of labels in the dataset."""
        from collections import Counter
        label_counts = Counter(self.labels)
        return label_counts

# Test the dataset class with sample data
sample_texts = [
    "Being-in-the-world reveals the fundamental structure of human existence.",
    "The principle of charity requires interpreting arguments in their strongest form."
]
sample_labels = ["Continental", "Analytic"]

# Initialize tokenizer for testing
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Create test dataset
test_dataset = PhilosophyDataset(sample_texts, sample_labels, tokenizer, max_length=128)

# Test getting an item
sample_item = test_dataset[0]
print(f"\nSample item structure:")
for key, value in sample_item.items():
    print(f"{key}: {value.shape if hasattr(value, 'shape') else type(value)}")

print(f"\nLabel distribution: {test_dataset.get_label_distribution()}")

# Clean up
del test_dataset, sample_item

## 4. Initialize the Main Classifier Class

Let's create the main class that orchestrates the entire pipeline for training and inference.
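`save_model` stores the class mappings inside the `torch.save` checkpoint. Because that format is pickle-based, the integer keys of `idx_to_class` survive intact. The notebook imports `json` but never uses it. If you choose to also write the configuration to a JSON file, which is a hypothetical extension, JSON turns integer keys into strings. The sketch below shows the pitfall and one way around it.

```python
import json

class_names = ['Continental', 'Analytic']
idx_to_class = {idx: name for idx, name in enumerate(class_names)}

# JSON object keys are always strings, so integer keys do not survive a round trip
restored = json.loads(json.dumps(idx_to_class))
print(restored)  # {'0': 'Continental', '1': 'Analytic'}

# Convert the keys back to integers after loading
fixed = {int(k): v for k, v in restored.items()}
print(fixed == idx_to_class)
```

Without the conversion, a lookup such as `idx_to_class[0]` (as in `predict_single`) would raise a `KeyError` on the JSON-loaded mapping.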

In [None]:
class PhilosophyClassifier:
    """
    Main classifier class that handles the complete pipeline for philosophical text classification.
    
    Features:
    - Model initialization and configuration
    - Data preparation and preprocessing
    - Training with validation
    - Prediction with probability outputs
    - Model saving and loading
    """
    
    def __init__(self, model_name: str = 'bert-base-uncased', max_length: int = 512):
        self.model_name = model_name
        self.max_length = max_length
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        
        # Initialize tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        
        # Model will be initialized during training
        self.model = None
        
        # Class mapping
        self.class_names = ['Continental', 'Analytic']
        self.class_to_idx = {name: idx for idx, name in enumerate(self.class_names)}
        self.idx_to_class = {idx: name for idx, name in enumerate(self.class_names)}
        
        print(f"PhilosophyClassifier initialized:")
        print(f"  Model: {model_name}")
        print(f"  Max length: {max_length}")
        print(f"  Device: {self.device}")
        print(f"  Classes: {self.class_names}")
        
    def prepare_data(self, texts: List[str], labels: List[str], test_size: float = 0.2, batch_size: int = 8):
        """
        Prepare data loaders for training and validation.
        
        Args:
            texts: List of philosophical texts
            labels: List of corresponding labels ('Continental' or 'Analytic')
            test_size: Fraction of data to use for validation
            batch_size: Batch size for training
            
        Returns:
            train_loader, val_loader: PyTorch DataLoaders
        """
        print(f"Preparing data with {len(texts)} samples...")
        
        # Split data
        train_texts, val_texts, train_labels, val_labels = train_test_split(
            texts, labels, test_size=test_size, random_state=42, stratify=labels
        )
        
        print(f"Train samples: {len(train_texts)}")
        print(f"Validation samples: {len(val_texts)}")
        
        # Create datasets
        train_dataset = PhilosophyDataset(train_texts, train_labels, self.tokenizer, self.max_length)
        val_dataset = PhilosophyDataset(val_texts, val_labels, self.tokenizer, self.max_length)
        
        # Create data loaders
        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
        
        print(f"Data loaders created with batch size: {batch_size}")
        
        return train_loader, val_loader
    
    def predict_single(self, text: str) -> Dict[str, float]:
        """
        Predict the philosophical style of a single text.
        
        Args:
            text: Input philosophical text
            
        Returns:
            Dictionary with probability scores for each class
        """
        if self.model is None:
            raise ValueError("Model not trained yet. Please train the model first.")
        
        self.model.eval()
        
        # Preprocess text
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        ).to(self.device)
        
        with torch.no_grad():
            logits, probabilities = self.model(
                encoding['input_ids'], 
                encoding['attention_mask']
            )
            probs = probabilities.cpu().numpy()[0]
        
        # Create result dictionary
        result = {}
        for idx, prob in enumerate(probs):
            class_name = self.idx_to_class[idx]
            result[class_name] = float(prob)
        
        return result
    
    def save_model(self, path: str):
        """Save the trained model and configuration."""
        if self.model is None:
            raise ValueError("No model to save. Train the model first.")
        
        checkpoint = {
            'model_state_dict': self.model.state_dict(),
            'model_name': self.model_name,
            'max_length': self.max_length,
            'class_names': self.class_names,
            'class_to_idx': self.class_to_idx,
            'idx_to_class': self.idx_to_class
        }
        
        torch.save(checkpoint, path)
        print(f"Model saved to {path}")
    
    def load_model(self, path: str):
        """Load a trained model from disk."""
        checkpoint = torch.load(path, map_location=self.device)
        
        # Restore the saved configuration first so the tokenizer and model match it
        self.model_name = checkpoint['model_name']
        self.max_length = checkpoint['max_length']
        self.class_names = checkpoint['class_names']
        self.class_to_idx = checkpoint['class_to_idx']
        self.idx_to_class = checkpoint['idx_to_class']
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        
        # Rebuild the architecture, then load the fine-tuned weights
        self.model = PhilosophicalBERTClassifier(
            self.model_name, num_classes=len(self.class_names)
        ).to(self.device)
        self.model.load_state_dict(checkpoint['model_state_dict'])
        
        print(f"Model loaded from {path}")

# Initialize the classifier
classifier = PhilosophyClassifier()
print("\nClassifier ready for data preparation and training!")

## 5. Generate Sample Philosophical Data

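Let's create a small illustrative dataset of 30 short passages, 15 written in the style of each tradition. The samples are meant to showcase the distinctive terminology and conceptual approaches of Continental and Analytic philosophy. A dataset this small can demonstrate the pipeline end to end, but it is far too small to produce a reliable classifier.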
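To get a feel for the lexical signal the classifier can pick up, here is a quick sketch that compares content words in two sentences from each class of the dataset defined below. The stopword list is an ad hoc assumption for this sketch, not a standard list. With so few sentences, the only point is that the two vocabularies barely overlap, not that these words are reliable markers.

```python
import re
from collections import Counter

# Hand-picked stopwords for this sketch only (an assumption, not a standard list)
STOPWORDS = {"the", "is", "and", "of", "in", "to", "be", "its", "that", "for",
             "about", "what", "we"}

def content_words(texts):
    """Lowercase, keep alphabetic runs, drop stopwords, and count occurrences."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)
    return counts

continental = [
    "Existence precedes essence, and man is condemned to be free.",
    "Language is the house of Being, and in its home dwells man.",
]
analytic = [
    "The Chinese Room argument demonstrates that syntax is insufficient for semantics.",
    "Externalism about mental content holds that the environment partially determines what we think.",
]

cont_counts, ana_counts = content_words(continental), content_words(analytic)
only_continental = sorted(set(cont_counts) - set(ana_counts))
only_analytic = sorted(set(ana_counts) - set(cont_counts))
print("Continental-only:", only_continental)
print("Analytic-only:", only_analytic)
print("Most common Continental word:", cont_counts.most_common(1))
```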

In [None]:
def create_philosophical_dataset():
    """
    Create a small illustrative dataset of philosophical texts from both traditions (15 per class).
    
    Continental Philosophy Features:
    - Phenomenological terminology (Dasein, Being-in-the-world, etc.)
    - Existential themes (anxiety, authenticity, freedom)
    - Dialectical thinking and historical consciousness
    - Emphasis on lived experience and interpretation
    
    Analytic Philosophy Features:
    - Logical precision and formal analysis
    - Epistemological problems (knowledge, justification)
    - Conceptual analysis and linguistic clarity
    - Problem-solving approach with rigorous arguments
    """
    
    # Continental Philosophy Texts
    continental_texts = [
        # Heidegger-inspired existential analysis
        "Being-in-the-world is a fundamental structure of Dasein that reveals the primordial unity of our existence. The phenomenon of anxiety discloses the nothingness that underlies all beings, showing us the groundlessness of our thrown existence.",
        
        # Hegelian dialectical philosophy
        "The dialectical movement of history unfolds through the negation of negation, where consciousness encounters its other and returns to itself transformed. This process of Bildung constitutes the very essence of Spirit's self-realization.",
        
        # Heidegger on language and Being
        "Language is the house of Being, and in its home dwells man. When we go to the well, when we go through the woods, we are always already going through the word 'well,' through the word 'woods,' even if we do not speak these words out loud or think of anything relating to language.",
        
        # Levinasian ethics
        "The Other precedes the Same, and responsibility for the Other is prior to freedom. The face-to-face encounter reveals the infinite responsibility that constitutes subjectivity itself.",
        
        # Sartrean existentialism
        "Existence precedes essence, and man is condemned to be free. In anguish, we confront the radical contingency of our situation and the weight of absolute responsibility for our choices.",
        
        # Husserlian phenomenology
        "The lifeworld provides the horizon of meaning within which all scientific abstractions find their ultimate foundation. The crisis of the European sciences stems from the forgetting of this fundamental stratum of experience.",
        
        # Derridean deconstruction
        "Difference differs from itself, creating the play of signification that undermines any metaphysics of presence. The trace structure of meaning reveals the impossibility of pure self-presence.",
        
        # Foucauldian analysis
        "Power produces knowledge, and knowledge reinforces power relations. The subject is not the sovereign author of discourse but is constituted through discursive practices and technologies of the self.",
        
        # Lacanian psychoanalysis
        "The Imaginary, Symbolic, and Real form the three registers that structure human subjectivity. The unconscious is structured like a language, and desire circulates through the signifying chain.",
        
        # Phenomenological analysis
        "Lived experience (Erlebnis) differs fundamentally from mere occurrence, as it carries within itself the intentional structure that connects consciousness to its objects through various modes of givenness.",
        
        # Gadamerian hermeneutics
        "Understanding is not a methodological procedure but the fundamental mode of human being-in-the-world. The fusion of horizons occurs when the interpreter's prejudices encounter the tradition's claims to truth.",
        
        # Merleau-Ponty on embodiment
        "The body is not merely an object among objects but the very condition of possibility for experiencing a world. Perception is not a mental act but a bodily engagement with the sensible world.",
        
        # Adorno on negative dialectics
        "The concept's totalizing tendency must be broken by its own contradictions. Negative dialectics preserves the non-identical by insisting on the primacy of the object against conceptual domination.",
        
        # Benjamin on historical materialism
        "The angel of history sees one single catastrophe which keeps piling wreckage upon wreckage. What we call progress is this storm blowing from Paradise, piling up the debris of the past.",
        
        # Bataille on excess and expenditure
        "The accursed share reveals the fundamental principle of loss that governs all economies. Human existence is characterized by the need to destroy or waste the excess energy that cannot be accumulated.",
    ]
    
    # Analytic Philosophy Texts
    analytic_texts = [
        # Gettier problem in epistemology
        "If knowledge is justified true belief, then the Gettier problem shows that justification alone is insufficient. We need an additional condition that prevents the kind of epistemic luck that makes justified true belief fall short of knowledge.",
        
        # Principle of charity in interpretation
        "The principle of charity requires that we interpret others' arguments in their strongest form. This methodological constraint ensures that philosophical dialogue proceeds through genuine engagement with opposing positions rather than straw man attacks.",
        
        # David Lewis on modal realism
        "Modal realism holds that all possible worlds exist as concrete physical realities. This thesis, while counterintuitive, provides elegant solutions to problems about the nature of properties, propositions, and possibility itself.",
        
        # Kripke on rigid designation
        "The causal theory of reference establishes that proper names are rigid designators whose reference is fixed through causal chains leading back to initial baptismal events. This explains how names can refer even when speakers lack definite descriptions.",
        
        # Functionalism in philosophy of mind
        "Functionalism defines mental states in terms of their causal roles rather than their physical realizations. Pain, for instance, is whatever state typically causes withdrawal behavior and is caused by tissue damage, regardless of its material substrate.",
        
        # Hume's is-ought problem
        "The is-ought problem demonstrates that normative conclusions cannot be validly derived from purely descriptive premises. This logical gap requires additional normative assumptions to bridge the transition from facts to values.",
        
        # Quine on indeterminacy of translation
        "Quine's indeterminacy thesis shows that translation between languages is underdetermined by all possible behavioral evidence. This result undermines the notion of determinate meaning and supports a holistic view of language.",
        
        # Jackson's knowledge argument
        "The knowledge argument purports to show that physicalism is false because Mary learns new facts upon experiencing color for the first time. However, the ability hypothesis suggests she gains abilities rather than propositional knowledge.",
        
        # Putnam's Twin Earth argument
        "Externalism about mental content holds that the environment partially determines what we think. Twin Earth scenarios demonstrate that molecularly identical beings can have different thoughts if their environments differ.",
        
        # Problem of induction
        "The problem of induction challenges the rational justification for inductive reasoning. Even if past instances of bread nourished us, logic alone cannot establish that future bread will continue to do so.",
        
        # Russell's theory of descriptions
        "Definite descriptions are not genuine singular terms but rather quantified expressions. The sentence 'The present King of France is bald' is false rather than lacking a truth value, because on Russell's analysis it asserts that exactly one present King of France exists, and that existential claim is false.",
        
        # Tarski on truth
        "The semantic conception of truth provides a formally adequate definition within a metalanguage. Such a definition must entail every instance of the schema 'X is true if and only if p', where p is a sentence of the object language and X is a name of that sentence, so the sentence itself is used on the right while only its name is mentioned on the left.",
        
        # Davidson on radical interpretation
        "Radical interpretation requires the simultaneous assignment of beliefs and meanings to maximize agreement with the interpreter's own beliefs. The principle of charity is not optional but constitutive of interpretation itself.",
        
        # Searle's Chinese Room argument
        "The Chinese Room argument demonstrates that syntax is insufficient for semantics. A system can manipulate symbols according to formal rules without understanding their meaning, showing that computation alone cannot account for intentionality.",
        
        # Lewis on laws of nature
        "Laws of nature are the regularities that appear as axioms or theorems of the true deductive system that best balances simplicity and strength in describing the Humean mosaic of particular facts. This analysis reduces laws to patterns in the distribution of local qualities.",
    ]
    
    # Create DataFrame
    all_texts = continental_texts + analytic_texts
    all_labels = ['Continental'] * len(continental_texts) + ['Analytic'] * len(analytic_texts)
    
    df = pd.DataFrame({
        'text': all_texts,
        'label': all_labels,
        'text_length': [len(text) for text in all_texts],
        'word_count': [len(text.split()) for text in all_texts]
    })
    
    return df

# Create the dataset
philosophy_df = create_philosophical_dataset()

print(f"Dataset created with {len(philosophy_df)} samples")
print(f"\nLabel distribution:")
print(philosophy_df['label'].value_counts())

print(f"\nText statistics:")
print(f"Average text length: {philosophy_df['text_length'].mean():.1f} characters")
print(f"Average word count: {philosophy_df['word_count'].mean():.1f} words")
print(f"Max text length: {philosophy_df['text_length'].max()} characters")
print(f"Min text length: {philosophy_df['text_length'].min()} characters")

# Display sample texts
print(f"\n=== Sample Continental Text ===")
continental_sample = philosophy_df[philosophy_df['label'] == 'Continental'].iloc[0]
print(f"Text: {continental_sample['text']}")
print(f"Length: {continental_sample['text_length']} chars, {continental_sample['word_count']} words")

print(f"\n=== Sample Analytic Text ===")
analytic_sample = philosophy_df[philosophy_df['label'] == 'Analytic'].iloc[0]
print(f"Text: {analytic_sample['text']}")
print(f"Length: {analytic_sample['text_length']} chars, {analytic_sample['word_count']} words")

## 6. Prepare Data for Training

Now let's prepare our data for training by creating train/validation splits and DataLoaders.
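The split and batch arithmetic for this dataset can be worked out by hand. Scikit-learn rounds a float `test_size` up to a whole number of samples. With `stratify=labels`, each class keeps its share of the validation split. The per-class formula below is a simplification that matches scikit-learn's allocation for this balanced 15/15 case, but it is not its general algorithm.

```python
import math
from collections import Counter

labels = ['Continental'] * 15 + ['Analytic'] * 15  # class balance of the dataset above
test_size, batch_size = 0.2, 8

n_total = len(labels)
n_val = math.ceil(test_size * n_total)  # a float test_size is rounded up
n_train = n_total - n_val

# Simplified proportional allocation per class (exact here because the classes are balanced)
val_per_class = {c: n_val * k // n_total for c, k in Counter(labels).items()}

# DataLoader keeps a final partial batch by default (drop_last=False)
train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
print(n_train, n_val, val_per_class, train_batches, val_batches)
```

This gives 24 training and 6 validation samples, 3 per class in validation, and 3 training batches plus 1 validation batch per epoch.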

In [None]:
# Prepare data for training
texts = philosophy_df['text'].tolist()
labels = philosophy_df['label'].tolist()

# Create train/validation data loaders
train_loader, val_loader = classifier.prepare_data(
    texts=texts,
    labels=labels,
    test_size=0.2,
    batch_size=8
)

print(f"Training batches: {len(train_loader)}")
print(f"Validation batches: {len(val_loader)}")

# Examine a batch
sample_batch = next(iter(train_loader))
print(f"\nSample batch structure:")
for key, value in sample_batch.items():
    print(f"  {key}: {value.shape}")

# Check tokenizer vocabulary size
print(f"\nTokenizer info:")
print(f"  Vocabulary size: {len(classifier.tokenizer)}")
print(f"  Model max length: {classifier.tokenizer.model_max_length}")
print(f"  Special tokens: {classifier.tokenizer.special_tokens_map}")

# Decode a sample to verify tokenization
sample_input_ids = sample_batch['input_ids'][0]
decoded_text = classifier.tokenizer.decode(sample_input_ids, skip_special_tokens=True)
print(f"\nSample decoded text (first 100 chars): {decoded_text[:100]}...")

## 7. Train the BERT Model

Now let's implement the training loop to fine-tune BERT for our philosophical text classification task.
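The training function uses `LinearLR(start_factor=1.0, end_factor=0.1, total_iters=epochs)` and calls `scheduler.step()` once per epoch. The learning-rate factor therefore decays linearly from 1.0 to 0.1 over the run. The sketch below evaluates the closed-form factor, assuming 3 epochs and a base rate of 2e-5 as in the call further down. It separates the rate used during each epoch from the value `get_last_lr()` reports in that epoch's summary, which is read after the scheduler has already stepped.

```python
def linear_lr_factor(step, start_factor=1.0, end_factor=0.1, total_iters=3):
    """Multiplicative LR factor of a LinearLR schedule after `step` scheduler steps."""
    progress = min(step, total_iters) / total_iters  # clamps once total_iters is reached
    return start_factor + (end_factor - start_factor) * progress

base_lr, epochs = 2e-5, 3
for epoch in range(epochs):
    used = base_lr * linear_lr_factor(epoch)         # rate applied during this epoch
    printed = base_lr * linear_lr_factor(epoch + 1)  # get_last_lr() after scheduler.step()
    print(f"epoch {epoch + 1}: trained at {used:.2e}, summary shows {printed:.2e}")
```

The rate shown in each epoch summary is the one the next epoch will use. The rate in the final summary is never used for training.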

In [None]:
def train_philosophy_classifier(classifier, train_loader, val_loader, epochs=3, learning_rate=2e-5):
    """
    Train the BERT classifier for philosophical text classification.
    
    Args:
        classifier: PhilosophyClassifier instance
        train_loader: Training data loader
        val_loader: Validation data loader
        epochs: Number of training epochs
        learning_rate: Learning rate for optimizer
        
    Returns:
        training_history: Dictionary with training metrics
    """
    
    # Initialize model
    classifier.model = PhilosophicalBERTClassifier(classifier.model_name).to(classifier.device)
    
    # Setup optimizer (only parameters left trainable after freezing) and loss function
    optimizer = optim.AdamW(
        [p for p in classifier.model.parameters() if p.requires_grad],
        lr=learning_rate, weight_decay=0.01
    )
    criterion = nn.CrossEntropyLoss()
    
    # Learning rate scheduler
    scheduler = optim.lr_scheduler.LinearLR(optimizer, start_factor=1.0, end_factor=0.1, total_iters=epochs)
    
    # Training history
    history = {
        'train_loss': [],
        'val_loss': [],
        'val_accuracy': [],
        'val_precision': [],
        'val_recall': [],
        'val_f1': []
    }
    
    print(f"Starting training for {epochs} epochs...")
    print(f"Learning rate: {learning_rate}")
    print(f"Device: {classifier.device}")
    print("-" * 60)
    
    for epoch in range(epochs):
        epoch_start_time = datetime.now()
        
        # Training phase
        classifier.model.train()
        total_train_loss = 0
        train_steps = 0
        
        for batch_idx, batch in enumerate(train_loader):
            # Move batch to device
            input_ids = batch['input_ids'].to(classifier.device)
            attention_mask = batch['attention_mask'].to(classifier.device)
            labels = batch['labels'].to(classifier.device)
            
            # Forward pass
            optimizer.zero_grad()
            logits, _ = classifier.model(input_ids, attention_mask)
            loss = criterion(logits, labels)
            
            # Backward pass
            loss.backward()
            
            # Gradient clipping to prevent exploding gradients
            torch.nn.utils.clip_grad_norm_(classifier.model.parameters(), max_norm=1.0)
            
            optimizer.step()
            
            total_train_loss += loss.item()
            train_steps += 1
            
            # Print progress every 5 batches and on the last batch of the epoch
            if (batch_idx + 1) % 5 == 0 or (batch_idx + 1) == len(train_loader):
                print(f"Epoch {epoch+1}/{epochs}, Batch {batch_idx+1}/{len(train_loader)}, Loss: {loss.item():.4f}")
        
        avg_train_loss = total_train_loss / train_steps
        history['train_loss'].append(avg_train_loss)
        
        # Validation phase
        classifier.model.eval()
        total_val_loss = 0
        all_predictions = []
        all_labels = []
        val_steps = 0
        
        with torch.no_grad():
            for batch in val_loader:
                input_ids = batch['input_ids'].to(classifier.device)
                attention_mask = batch['attention_mask'].to(classifier.device)
                labels = batch['labels'].to(classifier.device)
                
                logits, probabilities = classifier.model(input_ids, attention_mask)
                loss = criterion(logits, labels)
                
                total_val_loss += loss.item()
                val_steps += 1
                
                # Get predictions
                predictions = torch.argmax(probabilities, dim=1)
                all_predictions.extend(predictions.cpu().numpy())
                all_labels.extend(labels.cpu().numpy())
        
        # Calculate metrics
        avg_val_loss = total_val_loss / val_steps
        val_accuracy = accuracy_score(all_labels, all_predictions)
        val_precision, val_recall, val_f1, _ = precision_recall_fscore_support(
            all_labels, all_predictions, average='weighted'
        )
        
        # Store metrics
        history['val_loss'].append(avg_val_loss)
        history['val_accuracy'].append(val_accuracy)
        history['val_precision'].append(val_precision)
        history['val_recall'].append(val_recall)
        history['val_f1'].append(val_f1)
        
        # Update learning rate
        scheduler.step()
        
        # Print epoch summary
        epoch_time = datetime.now() - epoch_start_time
        print(f"\nEpoch {epoch+1}/{epochs} Summary:")
        print(f"  Train Loss: {avg_train_loss:.4f}")
        print(f"  Val Loss: {avg_val_loss:.4f}")
        print(f"  Val Accuracy: {val_accuracy:.4f}")
        print(f"  Val F1: {val_f1:.4f}")
        print(f"  Time: {epoch_time}")
        print(f"  Learning Rate: {scheduler.get_last_lr()[0]:.2e}")
        print("-" * 60)
    
    print("Training completed!")
    return history

# Train the model
training_history = train_philosophy_classifier(
    classifier=classifier,
    train_loader=train_loader,
    val_loader=val_loader,
    epochs=3,
    learning_rate=2e-5
)

## 8. Evaluate Model Performance

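Let's evaluate the trained model on the validation split and report overall and per-class metrics. With only 6 validation samples, a single misclassification shifts accuracy by about 17 percentage points, so treat these numbers as a smoke test rather than a reliable estimate.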
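To make the reported numbers easier to interpret, the sketch below computes the same quantities by hand in plain Python. These are the confusion matrix (rows are true labels, columns are predictions, as in scikit-learn), per-class precision, recall, and F1, and the support-weighted F1. The labels and predictions are hypothetical, chosen only to illustrate the arithmetic. They are not results from this model.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred, labels=(0, 1)):
    """Confusion matrix, per-class (precision, recall, F1, support), and weighted F1."""
    cm = [[sum(1 for t, p in zip(y_true, y_pred) if t == i and p == j) for j in labels]
          for i in labels]
    support = Counter(y_true)
    metrics = {}
    for k, c in enumerate(labels):
        tp = cm[k][k]
        predicted = sum(row[k] for row in cm)  # column sum: samples predicted as c
        actual = sum(cm[k])                    # row sum: samples truly in c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1, support[c])
    weighted_f1 = sum(f1 * s for _, _, f1, s in metrics.values()) / len(y_true)
    return cm, metrics, weighted_f1

# Hypothetical 6-sample validation labels (0 = Continental, 1 = Analytic)
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
cm, metrics, weighted_f1 = per_class_metrics(y_true, y_pred)
print("confusion matrix:", cm)
print("per-class (precision, recall, f1, support):", metrics)
print("weighted F1:", weighted_f1)
```

Here one Continental text predicted as Analytic lowers Continental recall to 2/3 and Analytic precision to 3/4, while accuracy drops to 5/6.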

In [None]:
def evaluate_model(classifier, val_loader):
    """
    Comprehensive evaluation of the trained model.
    
    Returns:
        Dictionary with detailed evaluation metrics
    """
    classifier.model.eval()
    
    all_predictions = []
    all_probabilities = []
    all_labels = []
    
    with torch.no_grad():
        for batch in val_loader:
            input_ids = batch['input_ids'].to(classifier.device)
            attention_mask = batch['attention_mask'].to(classifier.device)
            labels = batch['labels'].to(classifier.device)
            
            logits, probabilities = classifier.model(input_ids, attention_mask)
            
            predictions = torch.argmax(probabilities, dim=1)
            
            all_predictions.extend(predictions.cpu().numpy())
            all_probabilities.extend(probabilities.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    
    # Calculate metrics
    accuracy = accuracy_score(all_labels, all_predictions)
    
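    # Per-class metrics; labels=[0, 1] keeps a row for both classes even if
    # one class is absent from this validation split
    precision, recall, f1, support = precision_recall_fscore_support(
        all_labels, all_predictions, labels=[0, 1], average=None, zero_division=0
    )
    
    # Overall weighted metrics (weighted by each class's true-label support)
    precision_weighted, recall_weighted, f1_weighted, _ = precision_recall_fscore_support(
        all_labels, all_predictions, average='weighted', zero_division=0
    )
    
    # Confusion matrix: rows = actual class, columns = predicted class
    cm = confusion_matrix(all_labels, all_predictions, labels=[0, 1])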
    
    return {
        'accuracy': accuracy,
        'precision_per_class': precision,
        'recall_per_class': recall,
        'f1_per_class': f1,
        'support_per_class': support,
        'precision_weighted': precision_weighted,
        'recall_weighted': recall_weighted,
        'f1_weighted': f1_weighted,
        'confusion_matrix': cm,
        'predictions': all_predictions,
        'probabilities': all_probabilities,
        'true_labels': all_labels
    }

# Evaluate the model
evaluation_results = evaluate_model(classifier, val_loader)

print("=== Model Performance Evaluation ===\n")

print(f"Overall Accuracy: {evaluation_results['accuracy']:.4f}")
print(f"Weighted Precision: {evaluation_results['precision_weighted']:.4f}")
print(f"Weighted Recall: {evaluation_results['recall_weighted']:.4f}")
print(f"Weighted F1-Score: {evaluation_results['f1_weighted']:.4f}")

print("\n=== Per-Class Performance ===")
class_names = ['Continental', 'Analytic']
for i, class_name in enumerate(class_names):
    print(f"\n{class_name}:")
    print(f"  Precision: {evaluation_results['precision_per_class'][i]:.4f}")
    print(f"  Recall: {evaluation_results['recall_per_class'][i]:.4f}")
    print(f"  F1-Score: {evaluation_results['f1_per_class'][i]:.4f}")
    print(f"  Support: {evaluation_results['support_per_class'][i]}")

# Confusion Matrix
print("\n=== Confusion Matrix ===")
cm = evaluation_results['confusion_matrix']
print(f"\n{'':>12} {'Continental':>12} {'Analytic':>12}")
print(f"{'Continental':>12} {cm[0,0]:>12} {cm[0,1]:>12}")
print(f"{'Analytic':>12} {cm[1,0]:>12} {cm[1,1]:>12}")

# Classification Report
print("\n=== Detailed Classification Report ===")
print(classification_report(
    evaluation_results['true_labels'], 
    evaluation_results['predictions'], 
    labels=[0, 1],  # keep both classes even if one is missing from val data
    target_names=class_names,
    zero_division=0
))
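The per-class numbers (`average=None`) and the weighted averages (`average='weighted'`) are easy to conflate. This pure-Python sketch restates both definitions on toy labels so the arithmetic can be checked by hand; the function name is hypothetical, and in the notebook scikit-learn does this work. Empty denominators return 0.0, mirroring `zero_division=0`.

```python
from typing import Any, Dict, List, Sequence


def per_class_and_weighted(y_true: Sequence[int], y_pred: Sequence[int],
                           labels: Sequence[int] = (0, 1)) -> Dict[str, Any]:
    """Precision/recall/F1 per class, plus support-weighted averages."""
    per_class: Dict[str, List[float]] = {"precision": [], "recall": [], "f1": []}
    support: List[int] = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class["precision"].append(precision)
        per_class["recall"].append(recall)
        per_class["f1"].append(f1)
        support.append(tp + fn)  # number of samples whose true label is c
    total = sum(support)
    # Weighted average: each class's score counts in proportion to its support
    weighted = {
        name: sum(v * s for v, s in zip(values, support)) / total
        for name, values in per_class.items()
    }
    return {"per_class": per_class, "support": support, "weighted": weighted}


y_true = [0, 0, 0, 1, 1]  # 0 = Continental, 1 = Analytic
y_pred = [0, 0, 1, 1, 1]
result = per_class_and_weighted(y_true, y_pred)
print(result["support"])  # [3, 2]
print(result["weighted"])
```

Here class 0 has precision 2/2 = 1.0 and recall 2/3, class 1 has precision 2/3 and recall 2/2 = 1.0, and both F1 scores are 0.8. Weighting by support [3, 2] gives a weighted precision of (3·1.0 + 2·2/3)/5 = 13/15 ≈ 0.867 and a weighted recall of 0.8. Weighted recall always equals accuracy (the supports cancel the recall denominators), so "Weighted Recall" and "Overall Accuracy" in the printout above will match.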

## 9. Implement Text Prediction Function

Now let's create a user-friendly prediction function that takes any philosophical text and returns probability scores.
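`predict_single` is defined earlier in the notebook; the code below assumes it returns a dict of softmax probabilities keyed by class name. For two classes, softmax reduces to a logistic function of the logit difference, so the two probabilities always sum to 1 and the larger one is at least 0.5. A minimal sketch of that mapping, assuming index 0 = Continental and index 1 = Analytic (the same order as `class_names`); `logits_to_style_probs` is a hypothetical name:

```python
import math
from typing import Dict, Sequence

CLASS_NAMES = ("Continental", "Analytic")  # assumed label order: 0, 1


def logits_to_style_probs(logits: Sequence[float]) -> Dict[str, float]:
    """Numerically stable softmax over the class logits."""
    m = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(CLASS_NAMES, exps)}


probs = logits_to_style_probs([2.0, 0.0])
predicted = max(probs, key=probs.get)  # same argmax rule as predict_philosophical_style
print(predicted, round(probs[predicted], 4))  # Continental 0.8808
```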

In [None]:
def predict_philosophical_style(text, classifier, detailed=True):
    """
    Predict the philosophical style of input text.
    
    Args:
        text: Input philosophical text to classify
        classifier: Trained PhilosophyClassifier instance
        detailed: Whether to return detailed analysis
        
    Returns:
        Dictionary with prediction results and probabilities
    """
    if not text.strip():
        return {"error": "Please provide non-empty text"}
    
    try:
        # Get prediction
        result = classifier.predict_single(text)
        
        # Determine predicted class
        predicted_class = max(result, key=result.get)
        confidence = result[predicted_class]
        
        # Basic result
        prediction_result = {
            "input_text": text,
            "predicted_style": predicted_class,
            "confidence": confidence,
            "probabilities": {
                "Continental": result["Continental"],
                "Analytic": result["Analytic"]
            }
        }
        
        if detailed:
            # Add detailed analysis
            continental_prob = result["Continental"]
            analytic_prob = result["Analytic"]
            
            # Style characteristics
            if predicted_class == "Continental":
                characteristics = [
                    "🏛️ Emphasis on lived experience and historical context",
                    "🌊 Dialectical and phenomenological approach", 
                    "📖 Focus on interpretation and hermeneutics",
                    "🎭 Existential and ontological themes",
                    "🌀 Critique of traditional metaphysics"
                ]
            else:
                characteristics = [
                    "🔬 Logical rigor and formal analysis",
                    "🎯 Conceptual clarity and precision",
                    "⚖️ Problem-solving methodology",
                    "📊 Empirical and scientific approach",
                    "🔍 Language and meaning analysis"
                ]
            
            prediction_result.update({
                "detailed_analysis": {
                    "continental_percentage": f"{continental_prob*100:.1f}%",
                    "analytic_percentage": f"{analytic_prob*100:.1f}%",
                    "confidence_level": "High" if confidence > 0.8 else "Medium" if confidence > 0.6 else "Low",
                    "style_characteristics": characteristics,
                    "text_length": len(text),
                    "word_count": len(text.split())
                }
            })
        
        return prediction_result
        
    except Exception as e:
        return {"error": f"Prediction failed: {str(e)}"}

def display_prediction(result):
    """Display prediction results in a formatted way."""
    if "error" in result:
        print(f"❌ Error: {result['error']}")
        return
    
    print("🎓 PHILOSOPHICAL STYLE ANALYSIS")
    print("=" * 60)
    
    print(f"📝 Input Text:")
    text = result["input_text"]
    if len(text) > 150:
        print(f"   {text[:150]}...")
    else:
        print(f"   {text}")
    
    print(f"\n🎯 Prediction: {result['predicted_style']}")
    print(f"🔍 Confidence: {result['confidence']:.1%}")
    
    print(f"\n📊 Probability Breakdown:")
    for style, prob in result["probabilities"].items():
        bar_length = int(prob * 20)  # Scale to 20 characters
        bar = "█" * bar_length + "░" * (20 - bar_length)
        print(f"   {style:12} {bar} {prob:.1%}")
    
    if "detailed_analysis" in result:
        details = result["detailed_analysis"]
        print(f"\n🎨 Style Characteristics:")
        for char in details["style_characteristics"]:
            print(f"   {char}")
        
        print(f"\n📏 Text Statistics:")
        print(f"   Length: {details['text_length']} characters")
        print(f"   Words: {details['word_count']}")
        print(f"   Confidence Level: {details['confidence_level']}")

# Test the prediction function with a sample text
test_text = "Being-in-the-world is a fundamental structure of Dasein that reveals the primordial unity of our existence."

print("Testing prediction function...")
sample_prediction = predict_philosophical_style(test_text, classifier)
display_prediction(sample_prediction)
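Two presentation details in the cell above are worth checking in isolation: the `confidence_level` thresholds and the 20-character probability bar. The sketch restates both as standalone functions with the same logic (the function names are hypothetical). Because a binary classifier's top probability is never below 0.5, "Low" in practice means between 0.5 and 0.6; the comparisons are strict, so exactly 0.8 counts as "Medium"; and `int()` truncates, so 0.99 fills 19 of 20 cells.

```python
def confidence_level(confidence: float) -> str:
    """Same thresholds as predict_philosophical_style (strict '>' comparisons)."""
    if confidence > 0.8:
        return "High"
    if confidence > 0.6:
        return "Medium"
    return "Low"


def probability_bar(prob: float, width: int = 20) -> str:
    """Text bar as in display_prediction; int() truncates toward zero."""
    filled = int(prob * width)
    return "█" * filled + "░" * (width - filled)


for p in (0.55, 0.7, 0.8, 0.95):
    print(f"{p:.2f} {confidence_level(p):6} {probability_bar(p)}")
```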

## 10. Test the Classifier with Sample Texts

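Let's run the classifier on short passages by well-known philosophers as an informal sanity check. With only ten samples, the resulting accuracy is anecdotal; the validation metrics from Section 8 remain the more reliable measure of performance.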

In [None]:
# Test texts from famous philosophers
test_texts = {
    "Heidegger (Continental)": "The essence of Dasein lies in its existence. Dasein is that entity which is in each case mine, and which has, as its manner of Being, the possibility of existing authentically or inauthentically.",
    
    "Sartre (Continental)": "Man is condemned to be free; because once thrown into the world, he is responsible for everything he does. It is up to you to give life a meaning.",
    
    "Derrida (Continental)": "Every sign, linguistic or nonlinguistic, spoken or written, in a small or large unit, can be cited, grafted, iterated. This iterability alters, parasitically, the identity of the element.",
    
    "Russell (Analytic)": "The method of 'postulating' what we want has many advantages; they are the same as the advantages of theft over honest toil. Let us see what can be done with conscientious toil.",
    
    "Wittgenstein (Analytic)": "The limits of my language mean the limits of my world. Whereof one cannot speak, thereof one must be silent.",
    
    "Quine (Analytic)": "The totality of our so-called knowledge or beliefs, from the most casual matters of geography and history to the profoundest laws of atomic physics, is a man-made fabric which impinges on experience only along the edges.",
    
    "Foucault (Continental)": "Where there is power, there is resistance, and yet, or rather consequently, this resistance is never in a position of exteriority in relation to power.",
    
    "Davidson (Analytic)": "In giving up the dualism of scheme and world, we do not give up the world, but reestablish unmediated touch with the familiar objects whose antics make our sentences and opinions true or false.",
    
    "Merleau-Ponty (Continental)": "The body is our general medium for having a world. Sometimes it is restricted to the actions necessary for the conservation of life, and accordingly it posits around us a biological world.",
    
    "Kripke (Analytic)": "A rigid designator designates the same object in all possible worlds in which that object exists and never designates anything else."
}

print("🧪 TESTING CLASSIFIER WITH FAMOUS PHILOSOPHERS")
print("=" * 80)

# Test each text and store results
test_results = []

for philosopher, text in test_texts.items():
    print(f"\n📚 Testing: {philosopher}")
    print("-" * 40)
    
    result = predict_philosophical_style(text, classifier, detailed=False)
    
    if "error" not in result:
        predicted = result["predicted_style"]
        confidence = result["confidence"]
        continental_prob = result["probabilities"]["Continental"]
        analytic_prob = result["probabilities"]["Analytic"]
        
        # Extract expected style from label
        expected = "Continental" if "Continental" in philosopher else "Analytic"
        correct = predicted == expected
        
        print(f"Expected: {expected}")
        print(f"Predicted: {predicted} ({confidence:.1%} confidence)")
        print(f"Continental: {continental_prob:.1%} | Analytic: {analytic_prob:.1%}")
        print(f"✅ Correct" if correct else "❌ Incorrect")
        
        test_results.append({
            'philosopher': philosopher,
            'expected': expected,
            'predicted': predicted,
            'correct': correct,
            'confidence': confidence,
            'continental_prob': continental_prob,
            'analytic_prob': analytic_prob
        })
    else:
        print(f"❌ Error: {result['error']}")

# Calculate overall accuracy on test samples
if test_results:
    correct_predictions = sum(1 for r in test_results if r['correct'])
    total_predictions = len(test_results)
    accuracy = correct_predictions / total_predictions
    
    print(f"\n\n📊 OVERALL TEST RESULTS")
    print("=" * 40)
    print(f"Accuracy: {accuracy:.1%} ({correct_predictions}/{total_predictions})")
    
    # Breakdown by expected class
    continental_results = [r for r in test_results if r['expected'] == 'Continental']
    analytic_results = [r for r in test_results if r['expected'] == 'Analytic']
    
    if continental_results:
        continental_accuracy = sum(1 for r in continental_results if r['correct']) / len(continental_results)
        avg_continental_confidence = sum(r['confidence'] for r in continental_results if r['correct']) / max(1, sum(1 for r in continental_results if r['correct']))
        print(f"Continental accuracy: {continental_accuracy:.1%}")
        print(f"Avg confidence (correct): {avg_continental_confidence:.1%}")
    
    if analytic_results:
        analytic_accuracy = sum(1 for r in analytic_results if r['correct']) / len(analytic_results)
        avg_analytic_confidence = sum(r['confidence'] for r in analytic_results if r['correct']) / max(1, sum(1 for r in analytic_results if r['correct']))
        print(f"Analytic accuracy: {analytic_accuracy:.1%}")
        print(f"Avg confidence (correct): {avg_analytic_confidence:.1%}")
    
    # Show misclassified examples
    misclassified = [r for r in test_results if not r['correct']]
    if misclassified:
        print(f"\n❌ Misclassified examples:")
        for r in misclassified:
            print(f"  {r['philosopher']}: Expected {r['expected']}, got {r['predicted']} ({r['confidence']:.1%})")

print(f"\n\n🎯 Try your own text!")
print("Use: predict_philosophical_style('your text here', classifier)")
print("Or:  display_prediction(predict_philosophical_style('your text', classifier))")
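The per-class bookkeeping above (accuracy plus mean confidence on correct predictions, guarded with `max(1, ...)`) can be written more directly as one pass over `test_results`. This sketch assumes each record has the `'expected'`, `'correct'`, and `'confidence'` keys built in the loop above; `summarize_by_class` is a hypothetical helper and the records are toy values.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def summarize_by_class(results: List[dict]) -> Dict[str, Dict[str, Optional[float]]]:
    """Accuracy and mean confidence of correct predictions, per expected class."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for r in results:
        groups[r["expected"]].append(r)
    summary: Dict[str, Dict[str, Optional[float]]] = {}
    for style, rows in groups.items():
        correct = [r["confidence"] for r in rows if r["correct"]]
        summary[style] = {
            "accuracy": len(correct) / len(rows),
            # None (rather than 0.0) when nothing in this class was right
            "mean_conf_correct": sum(correct) / len(correct) if correct else None,
        }
    return summary


# Toy records for illustration only (not real classifier output)
toy_results = [
    {"expected": "Continental", "correct": True, "confidence": 0.9},
    {"expected": "Continental", "correct": False, "confidence": 0.6},
    {"expected": "Analytic", "correct": True, "confidence": 0.7},
]
print(summarize_by_class(toy_results))
```

Returning `None` distinguishes "no correct predictions" from a genuine average, whereas the `max(1, ...)` guard prints 0.0% in that case.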

## 11. Visualize Training Results

Let's create comprehensive visualizations of our training process and model performance.

In [None]:
# Create comprehensive visualizations
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('BERT Philosophical Text Classifier - Training & Performance Analysis', fontsize=16, fontweight='bold')

# 1. Training and Validation Loss
epochs = range(1, len(training_history['train_loss']) + 1)
axes[0, 0].plot(epochs, training_history['train_loss'], 'b-o', label='Training Loss', linewidth=2)
axes[0, 0].plot(epochs, training_history['val_loss'], 'r-s', label='Validation Loss', linewidth=2)
axes[0, 0].set_title('Training and Validation Loss', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Validation Accuracy
axes[0, 1].plot(epochs, training_history['val_accuracy'], 'g-^', label='Validation Accuracy', linewidth=2)
axes[0, 1].set_title('Validation Accuracy Progress', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Accuracy')
axes[0, 1].set_ylim(0, 1)
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Add accuracy percentage labels
for i, acc in enumerate(training_history['val_accuracy']):
    axes[0, 1].annotate(f'{acc:.1%}', (i+1, acc), textcoords="offset points", xytext=(0,10), ha='center')

# 3. F1 Score Progress
axes[0, 2].plot(epochs, training_history['val_f1'], 'm-d', label='Validation F1', linewidth=2)
axes[0, 2].set_title('F1-Score Progress', fontsize=14, fontweight='bold')
axes[0, 2].set_xlabel('Epoch')
axes[0, 2].set_ylabel('F1-Score')
axes[0, 2].set_ylim(0, 1)
axes[0, 2].legend()
axes[0, 2].grid(True, alpha=0.3)

# 4. Confusion Matrix Heatmap
cm = evaluation_results['confusion_matrix']
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
           xticklabels=['Continental', 'Analytic'], 
           yticklabels=['Continental', 'Analytic'],
           ax=axes[1, 0])
axes[1, 0].set_title('Confusion Matrix', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Predicted')
axes[1, 0].set_ylabel('Actual')

# 5. Per-Class Performance Metrics
class_names = ['Continental', 'Analytic']
metrics = ['Precision', 'Recall', 'F1-Score']
continental_scores = [evaluation_results['precision_per_class'][0], 
                     evaluation_results['recall_per_class'][0], 
                     evaluation_results['f1_per_class'][0]]
analytic_scores = [evaluation_results['precision_per_class'][1], 
                  evaluation_results['recall_per_class'][1], 
                  evaluation_results['f1_per_class'][1]]

x = np.arange(len(metrics))
width = 0.35

bars1 = axes[1, 1].bar(x - width/2, continental_scores, width, label='Continental', color='skyblue')
bars2 = axes[1, 1].bar(x + width/2, analytic_scores, width, label='Analytic', color='lightcoral')

axes[1, 1].set_title('Per-Class Performance Metrics', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Metrics')
axes[1, 1].set_ylabel('Score')
axes[1, 1].set_xticks(x)
axes[1, 1].set_xticklabels(metrics)
axes[1, 1].legend()
axes[1, 1].set_ylim(0, 1)
axes[1, 1].grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for bar in bars1:
    height = bar.get_height()
    axes[1, 1].annotate(f'{height:.3f}', xy=(bar.get_x() + bar.get_width() / 2, height),
                       xytext=(0, 3), textcoords="offset points", ha='center', va='bottom')
for bar in bars2:
    height = bar.get_height()
    axes[1, 1].annotate(f'{height:.3f}', xy=(bar.get_x() + bar.get_width() / 2, height),
                       xytext=(0, 3), textcoords="offset points", ha='center', va='bottom')

# 6. Probability Distribution Analysis
# With two classes, P(Analytic) = 1 - P(Continental), so plotting both columns
# just mirrors one histogram. Instead, split P(Continental) by the true label:
# true Continental texts should cluster near 1 and true Analytic texts near 0.
probabilities = np.array(evaluation_results['probabilities'])
true_labels = np.array(evaluation_results['true_labels'])
continental_probs = probabilities[:, 0]

axes[1, 2].hist(continental_probs[true_labels == 0], bins=20, range=(0, 1), alpha=0.7,
                label='True Continental', color='skyblue', density=True)
axes[1, 2].hist(continental_probs[true_labels == 1], bins=20, range=(0, 1), alpha=0.7,
                label='True Analytic', color='lightcoral', density=True)
axes[1, 2].set_title('P(Continental) by True Class', fontsize=14, fontweight='bold')
axes[1, 2].set_xlabel('Predicted P(Continental)')
axes[1, 2].set_ylabel('Density')
axes[1, 2].legend()
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Create a summary statistics table
print("\n📊 TRAINING SUMMARY STATISTICS")
print("=" * 50)
print(f"Final Training Loss: {training_history['train_loss'][-1]:.4f}")
print(f"Final Validation Loss: {training_history['val_loss'][-1]:.4f}")
print(f"Final Validation Accuracy: {training_history['val_accuracy'][-1]:.4f}")
print(f"Final F1-Score: {training_history['val_f1'][-1]:.4f}")
print(f"Best Validation Accuracy: {max(training_history['val_accuracy']):.4f}")
print(f"Best F1-Score: {max(training_history['val_f1']):.4f}")

# Model complexity info
total_params = sum(p.numel() for p in classifier.model.parameters())
trainable_params = sum(p.numel() for p in classifier.model.parameters() if p.requires_grad)

print(f"\n🔧 MODEL STATISTICS")
print("=" * 30)
print(f"Total Parameters: {total_params:,}")
print(f"Trainable Parameters: {trainable_params:,}")
print(f"Frozen Parameters: {total_params - trainable_params:,}")
# Assumes float32 weights (4 bytes each); excludes optimizer state and activations
print(f"Model Size (float32 weights): ~{total_params * 4 / 1024 / 1024:.1f} MiB")

# Training data statistics
print(f"\n📚 DATA STATISTICS")
print("=" * 25)
print(f"Training Samples: {len(train_loader.dataset)}")
print(f"Validation Samples: {len(val_loader.dataset)}")
print(f"Total Samples: {len(train_loader.dataset) + len(val_loader.dataset)}")
print(f"Batch Size: {train_loader.batch_size}")
print(f"Total Training Batches: {len(train_loader)}")
print(f"Total Validation Batches: {len(val_loader)}")
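The size line above multiplies the parameter count by 4 bytes, which assumes every parameter is stored as float32, and divides by 1024² to get MiB. The estimate scales linearly with bytes per parameter, so half precision (2 bytes) halves it. A small sketch of that arithmetic; `weights_size_mib` is a hypothetical helper, and it covers the weights only (optimizer state and activations add substantially more memory during training):

```python
def weights_size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights alone, in MiB (1 MiB = 1024 * 1024 bytes).

    bytes_per_param: 4 for float32, 2 for float16/bfloat16, 1 for int8.
    """
    return num_params * bytes_per_param / (1024 * 1024)


# Illustrative count of 110 million, roughly the published size of BERT-base
# (not read from this notebook's model)
n = 110_000_000
for name, nbytes in (("float32", 4), ("float16", 2), ("int8", 1)):
    print(f"{name:8} ~{weights_size_mib(n, nbytes):.1f} MiB")
```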

## 12. Save and Load Model Functions

Finally, let's implement functions to save our trained model and load it for future use.
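The save cell below converts NumPy arrays to lists by hand with `.tolist()` before calling `json.dump`. If `training_history` ever holds values `json` cannot encode (for example NumPy `float32` scalars or 0-d tensors, depending on how the training loop computed them), `json.dump` raises `TypeError`. A more general option is a `default=` hook. This sketch assumes such objects expose `.tolist()`, which NumPy arrays and scalars, PyTorch tensors, and the stdlib `array.array` all do; `to_jsonable` is a hypothetical name, and `array.array` stands in for NumPy so the example runs with the standard library alone.

```python
import array
import json
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """json `default=` hook: convert array-like objects via .tolist()."""
    if hasattr(obj, "tolist"):
        return obj.tolist()
    # Anything else is reported the way json normally would
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"val_f1": array.array("d", [0.5, 0.75]), "epochs": 2}
text = json.dumps(record, default=to_jsonable, indent=2)
print(json.loads(text))  # {'val_f1': [0.5, 0.75], 'epochs': 2}
```

With the hook in place, `json.dump(training_data, f, indent=2, default=to_jsonable)` could replace the manual conversions.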

In [None]:
# Save the trained model
model_save_path = "philosophy_bert_classifier.pth"

print("💾 Saving trained model...")
try:
    classifier.save_model(model_save_path)
    
    # Save training history and evaluation results
    training_data = {
        'training_history': training_history,
        'evaluation_results': {
            'accuracy': evaluation_results['accuracy'],
            'precision_per_class': evaluation_results['precision_per_class'].tolist(),
            'recall_per_class': evaluation_results['recall_per_class'].tolist(),
            'f1_per_class': evaluation_results['f1_per_class'].tolist(),
            'confusion_matrix': evaluation_results['confusion_matrix'].tolist(),
            'class_names': class_names
        },
        'model_info': {
            'total_parameters': total_params,
            'trainable_parameters': trainable_params,
            'training_samples': len(train_loader.dataset),
            'validation_samples': len(val_loader.dataset)
        }
    }
    
    with open("training_results.json", "w") as f:
        json.dump(training_data, f, indent=2)
    
    print("✅ Model and training results saved successfully!")
    print(f"   Model: {model_save_path}")
    print(f"   Training data: training_results.json")
    
except Exception as e:
    print(f"❌ Error saving model: {e}")

# Demonstrate model loading
print(f"\n🔄 Testing model loading...")

# Create a new classifier instance
new_classifier = PhilosophyClassifier()

try:
    # Load the saved model
    new_classifier.load_model(model_save_path)
    
    # Test that the loaded model works
    test_text = "The dialectical movement of consciousness reveals the self-negating nature of absolute knowledge."
    result = new_classifier.predict_single(test_text)
    
    print("✅ Model loaded successfully!")
    print(f"   Test prediction: {max(result, key=result.get)} ({max(result.values()):.1%} confidence)")
    
except Exception as e:
    print(f"❌ Error loading model: {e}")

# Create a simple function for end users
def load_philosophy_classifier(model_path="philosophy_bert_classifier.pth"):
    """
    Convenience function to load a trained philosophy classifier.
    
    Args:
        model_path: Path to the saved model file
        
    Returns:
        Loaded PhilosophyClassifier instance
    """
    classifier = PhilosophyClassifier()
    classifier.load_model(model_path)
    return classifier

def classify_text(text, model_path="philosophy_bert_classifier.pth"):
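    """
    Quick function to classify a single text.
    
    Note: this reloads the model from disk on every call. To classify many
    texts, load once with load_philosophy_classifier() and pass the result
    to predict_philosophical_style().
    
    Args:
        text: Philosophical text to classify
        model_path: Path to the saved model file
        
    Returns:
        Dictionary with classification results
    """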
    classifier = load_philosophy_classifier(model_path)
    return predict_philosophical_style(text, classifier)

print(f"\n🎯 USAGE EXAMPLES FOR END USERS")
print("=" * 50)
print("# Load a trained model:")
print("classifier = load_philosophy_classifier('philosophy_bert_classifier.pth')")
print()
print("# Classify a single text:")
print("result = classify_text('Your philosophical text here')")
print("display_prediction(result)")
print()
print("# Or use the loaded classifier directly:")
print("result = predict_philosophical_style('Your text', classifier)")

# Create a final summary
print(f"\n\n🎓 PROJECT SUMMARY")
print("=" * 60)
print("✅ Successfully created a BERT-based philosophical text classifier")
print("✅ Trained on Continental vs Analytic philosophy samples")
print(f"✅ Achieved {evaluation_results['accuracy']:.1%} accuracy on validation data")
print("✅ Model saved and ready for deployment")
print()
print("📚 The classifier can distinguish between:")
print("   🏛️  Continental Philosophy (Heidegger, Sartre, Derrida, etc.)")
print("   🔬 Analytic Philosophy (Russell, Quine, Kripke, etc.)")
print()
print("🚀 Ready to classify your own philosophical texts!")
print("   Use the prediction functions above to analyze any text.")
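`classify_text` calls `load_philosophy_classifier` each time, so every call rebuilds the model and reads the checkpoint from disk. For repeated use, loading once and reusing the object is much cheaper. One way, assuming the loaded classifier is safe to share across calls, is to memoize the loader per path with `functools.lru_cache`. `make_cached_loader` is a hypothetical helper, and the stand-in loader below only records its calls so the sketch runs without a checkpoint.

```python
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def make_cached_loader(load_fn: Callable[[str], T], maxsize: int = 2) -> Callable[[str], T]:
    """Wrap a path -> model loader so each path is loaded at most once
    (until it is evicted from a cache holding `maxsize` entries)."""
    return lru_cache(maxsize=maxsize)(load_fn)


# Stand-in loader for illustration; in the notebook this would be
# load_philosophy_classifier.
load_calls = []


def fake_loader(path: str) -> dict:
    load_calls.append(path)
    return {"path": path}


cached_load = make_cached_loader(fake_loader)
first = cached_load("philosophy_bert_classifier.pth")
second = cached_load("philosophy_bert_classifier.pth")
print(first is second, len(load_calls))  # True 1
```

With the real loader, `cached_load = make_cached_loader(load_philosophy_classifier)` followed by `predict_philosophical_style(text, cached_load(model_path))` would avoid reloading for every text.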