To prune characters effectively using the three metrics (**in-mentions**, **word count**, and **out-mentions**), we can adopt ranking strategies that prioritize **in-mentions** and **word count** while still accounting for **out-mentions**. Below are some approaches to solve this:

---

### **Option 1: Weighted Scoring**
Assign weights to each metric based on its importance. Compute a total score for each character, and filter characters based on their scores:
1. **Define Weights**:
   - Assign higher weights to **in-mentions** and **word count** compared to **out-mentions**.
   - Example: `in-mentions = 0.4`, `word count = 0.4`, `out-mentions = 0.2`.
2. **Score Formula**:
   ```python
   score = 0.4 * normalized_in_mentions + 0.4 * normalized_word_count + 0.2 * normalized_out_mentions
   ```
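   Normalize each metric to the range `[0, 1]` with min-max scaling before applying the weights, so that word counts (often in the hundreds or thousands) do not swamp mention counts. If every character has the same value for a metric, `max_value - min_value` is zero; treat the normalized value as `0` in that case: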
   ```python
   normalized_value = (value - min_value) / (max_value - min_value)
   ```
3. **Sort and Filter**:
   - Rank characters by their scores and prune based on a threshold or keep the top N characters.
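
The threshold variant of step 3 can be sketched as follows. It assumes each character's metrics are already available as a `(in_mentions, word_count, out_mentions)` tuple; the helper names, the sample values, and the `0.5` cutoff are illustrative assumptions, not recommended settings:

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def weighted_scores(metrics, weights=(0.4, 0.4, 0.2)):
    """metrics: {char: (in_mentions, word_count, out_mentions)}."""
    chars = list(metrics)
    # Normalize each metric column independently
    columns = [min_max([metrics[c][i] for c in chars]) for i in range(3)]
    return {
        c: sum(w * col[idx] for w, col in zip(weights, columns))
        for idx, c in enumerate(chars)
    }

def prune_by_threshold(scores, threshold=0.5):
    """Keep characters whose weighted score is at least `threshold`."""
    return {c: s for c, s in scores.items() if s >= threshold}

# Hypothetical sample metrics
metrics = {"Alice": (2, 1200, 1), "Bob": (1, 300, 2), "Carol": (1, 40, 1)}
scores = weighted_scores(metrics)
kept = prune_by_threshold(scores)
```

Unlike a top-N cut, a threshold lets the number of surviving characters vary with the data, which may or may not be what you want.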

---

### **Option 2: Multi-Metric Ranking**
Rank characters separately by each metric and combine their rankings to create a composite rank:
1. **Rank by Each Metric**:
   - Create three sorted lists: 
     - By **in-mentions** (descending).
     - By **word count** (descending).
     - By **out-mentions** (descending).
   - Assign a rank to each character for each metric.
2. **Weighted Rank Sum**:
   - Compute a combined rank for each character using weights (e.g., `0.4 * in_mentions_rank + 0.4 * word_count_rank + 0.2 * out_mentions_rank`). Rank 1 is the best position, so a lower sum is better.
3. **Filter**:
   - Keep characters with the lowest combined ranks.
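
Mention counts often tie, and a plain sort gives tied characters different ranks depending on input order. One way to avoid that, offered here as an assumption rather than as part of the approach above, is competition ranking, where equal values share the best available rank (1, 2, 2, 4, ...). The `competition_rank` helper and the sample counts are hypothetical:

```python
def competition_rank(values_by_char):
    """Rank descending; characters with equal values share the same rank."""
    ordered = sorted(values_by_char.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_value, prev_rank = None, 0
    for position, (char, value) in enumerate(ordered, start=1):
        if value != prev_value:
            # New value: its rank is its 1-based position in the sorted order
            prev_rank = position
            prev_value = value
        ranks[char] = prev_rank
    return ranks

# Hypothetical in-mention counts
in_counts = {"Alice": 2, "Bob": 1, "Carol": 1, "Dave": 0}
ranks = competition_rank(in_counts)
```

Here Bob and Carol both receive rank 2 and Dave receives rank 4, so the weighted rank sum no longer depends on dictionary order.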

---

### **Option 3: Pareto Optimal Filtering**
Use a **Pareto frontier approach** to retain characters that are optimal with respect to the three metrics:
1. **Pareto Dominance**:
   - A character is kept if no other character strictly outperforms it on all three metrics at once. A character that ties with each rival on at least one metric therefore survives.
2. **Optional Thresholds**:
   - Optionally, define a threshold for each metric (e.g., `in-mentions > 3`, `word count > 50`) and retain only frontier characters that also satisfy these thresholds.
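
A stricter and more common definition of dominance says that A dominates B when A is at least as good on every metric and strictly better on at least one. The sketch below implements that variant as an alternative to the rule in step 1; the function names and sample tuples are hypothetical:

```python
def dominates(a, b):
    """True if metric tuple `a` is >= `b` everywhere and > `b` somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(metrics):
    """metrics: {char: (in_mentions, word_count, out_mentions)}."""
    return [
        char for char, vals in metrics.items()
        if not any(dominates(other, vals) for other in metrics.values())
    ]

# Hypothetical sample metrics
metrics = {
    "Alice": (2, 1200, 1),
    "Bob": (1, 300, 2),
    "Carol": (1, 40, 1),  # dominated by both Alice and Bob
}
front = pareto_front(metrics)
```

With this definition Carol is pruned. Under the all-strict rule she would be kept, because she ties with Alice on out-mentions and with Bob on in-mentions.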

---

### **Option 4: Custom Rule-Based Filtering**
Define a set of rules to prune characters explicitly based on your priorities:
1. **Rules**:
   - Retain characters with **in-mentions > X** or **word count > Y** regardless of their out-mentions.
   - Optionally, remove characters with **out-mentions < Z**.
2. **Implementation**:
   - Iterate through the dictionary, applying the rules and removing characters that fail.
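
A minimal sketch of these rules, assuming each entry of `char_summary` has `mentioned_by`, `mentions`, and `word_count` keys as in the example code in this answer. The defaults stand in for X, Y, and Z and are placeholders. The handling of the optional rule is one possible reading: characters that fail both primary rules are dropped unless `min_out` is set and they reach it.

```python
def rule_based_filter(char_summary, min_in=3, min_words=50, min_out=None):
    """Keep characters that pass the in-mention or word-count rule."""
    kept = {}
    for char, data in char_summary.items():
        in_count = len(data["mentioned_by"])
        out_count = len(data["mentions"])
        if in_count > min_in or data["word_count"] > min_words:
            kept[char] = data  # retained regardless of out-mentions
        elif min_out is not None and out_count >= min_out:
            kept[char] = data  # kept only because of the optional out-mention rule
    return kept

# Hypothetical sample data
char_summary = {
    "Alice": {"mentioned_by": ["Bob", "Carol"], "mentions": ["Bob"], "word_count": 1200},
    "Bob": {"mentioned_by": ["Alice"], "mentions": ["Alice", "Carol"], "word_count": 300},
    "Carol": {"mentioned_by": ["Bob"], "mentions": ["Alice"], "word_count": 40},
}
strict = rule_based_filter(char_summary, min_in=1, min_words=50)
lenient = rule_based_filter(char_summary, min_in=1, min_words=50, min_out=1)
```

The function builds a new dictionary instead of deleting entries while iterating, which avoids mutating `char_summary` during the loop.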

---

### **How to Choose?**
- **Weighted Scoring**: Flexible and provides a continuous score. Good if you want to rank characters and pick the top ones.
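- **Multi-Metric Ranking**: Uses only each character's position per metric, not the raw magnitudes, so one extreme value cannot dominate the result. Better if the metrics are on very different scales or you want to see how each per-metric rank contributes.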
- **Pareto Filtering**: Useful for retaining the "best" characters without predefined thresholds. May require manual inspection of results.
- **Rule-Based Filtering**: Simple and fast if you know specific thresholds for pruning.

---

### Example Code: Weighted Scoring
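
The example functions in this answer read three keys from each entry of `char_summary`: `mentioned_by`, `mentions`, and `word_count`. The sketch below shows that assumed shape and how the three metrics derive from it; the character names, the values, and the `character_metrics` helper are made up for illustration:

```python
# Hypothetical sample data; only the key names come from the example functions.
char_summary = {
    "Alice": {"mentioned_by": ["Bob", "Carol"], "mentions": ["Bob"], "word_count": 1200},
    "Bob": {"mentioned_by": ["Alice"], "mentions": ["Alice", "Carol"], "word_count": 300},
    "Carol": {"mentioned_by": ["Bob"], "mentions": ["Alice"], "word_count": 40},
}

def character_metrics(char_summary):
    """Return {character: (in_mentions, word_count, out_mentions)}."""
    return {
        char: (len(data["mentioned_by"]), data["word_count"], len(data["mentions"]))
        for char, data in char_summary.items()
    }

metrics = character_metrics(char_summary)
```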

```python
def prune_characters(char_summary, weights={"in": 0.4, "words": 0.4, "out": 0.2}, keep_top=10):
    # Extract metrics
    in_counts = [len(data["mentioned_by"]) for data in char_summary.values()]
    word_counts = [data["word_count"] for data in char_summary.values()]
    out_counts = [len(data["mentions"]) for data in char_summary.values()]
    
    # Normalize metrics
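    def normalize(data):
        if not data:  # empty input: min()/max() would raise ValueError
            return []
        min_val, max_val = min(data), max(data)
        # A constant metric carries no information, so map it to 0
        return [(x - min_val) / (max_val - min_val) if max_val > min_val else 0 for x in data]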
    
    norm_in_counts = normalize(in_counts)
    norm_word_counts = normalize(word_counts)
    norm_out_counts = normalize(out_counts)
    
    # Compute weighted scores
    scores = {}
    for idx, char in enumerate(char_summary):
        scores[char] = (
            weights["in"] * norm_in_counts[idx] +
            weights["words"] * norm_word_counts[idx] +
            weights["out"] * norm_out_counts[idx]
        )
    
    # Sort characters by scores
    sorted_characters = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    
    # Keep top N characters
    pruned_summary = {char: char_summary[char] for char, score in sorted_characters[:keep_top]}
    return pruned_summary
```

### Example Code: Multi-Metric Ranking, Pareto Filtering, and Percentile Filtering

```python
import numpy as np  # required by percentile_filter

def prune_characters_by_rank(char_summary, weights={"in": 0.4, "words": 0.4, "out": 0.2}, keep_top=10):
    # Extract metrics
    in_counts = {char: len(data["mentioned_by"]) for char, data in char_summary.items()}
    word_counts = {char: data["word_count"] for char, data in char_summary.items()}
    out_counts = {char: len(data["mentions"]) for char, data in char_summary.items()}
    
    # Rank characters by each metric
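    def rank(data):
        # Rank 1 = highest value; tied values keep their dictionary order (stable sort)
        ordered = sorted(data.items(), key=lambda x: x[1], reverse=True)
        return {k: position for position, (k, _) in enumerate(ordered, 1)}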
    
    in_ranks = rank(in_counts)
    word_ranks = rank(word_counts)
    out_ranks = rank(out_counts)
    
    # Compute weighted rank sum
    scores = {
        char: (
            weights["in"] * in_ranks[char] +
            weights["words"] * word_ranks[char] +
            weights["out"] * out_ranks[char]
        )
        for char in char_summary
    }
    
    # Sort characters by scores
    sorted_characters = sorted(scores.items(), key=lambda x: x[1])
    
    # Keep top N characters
    pruned_summary = {char: char_summary[char] for char, score in sorted_characters[:keep_top]}
    return pruned_summary

def pareto_filter(char_summary):
    # Extract metrics
    metrics = [
        (char, len(data["mentioned_by"]), data["word_count"], len(data["mentions"]))
        for char, data in char_summary.items()
    ]
    
    # Pareto frontier: keep a character unless some other character
    # beats it strictly on all three metrics
    pareto_frontier = []
    for char, in_count, word_count, out_count in metrics:
        if not any(
            other_in > in_count and other_words > word_count and other_out > out_count
            for _, other_in, other_words, other_out in metrics
        ):
            pareto_frontier.append(char)
    
    # Return filtered dictionary
    return {char: char_summary[char] for char in pareto_frontier}


def percentile_filter(char_summary, percentile=90):
    # Extract metrics
    in_counts = [len(data["mentioned_by"]) for data in char_summary.values()]
    word_counts = [data["word_count"] for data in char_summary.values()]
    out_counts = [len(data["mentions"]) for data in char_summary.values()]
    
    # Compute percentile thresholds
    in_threshold = np.percentile(in_counts, percentile)
    word_threshold = np.percentile(word_counts, percentile)
    out_threshold = np.percentile(out_counts, percentile)
    
    # Filter characters
    filtered_summary = {
        char: data for char, data in char_summary.items()
        if len(data["mentioned_by"]) >= in_threshold
        or data["word_count"] >= word_threshold
        or len(data["mentions"]) >= out_threshold
    }
    
    return filtered_summary
```
---

### Key Notes
1. **Pareto Filtering**:
   - Retains characters that cannot be strictly outperformed on all metrics by others.
   - Tends to keep a diverse set of "strong" characters.

2. **Percentile-Based Filtering**:
   - Keeps a character if it reaches the chosen percentile (90th by default) on **any** metric; characters that fall below it on **all** metrics are removed.
   - Adjustable by changing the `percentile` parameter (e.g., to `85` or `95`).
