# Novel Characters' Personality Analyzer

This notebook analyzes character personalities from books using NLP techniques.
It will extract character mentions, group name variants, and map adjectives to personality traits.
The intended use is to supply the book's name and a list of characters to look for, and to receive a list of traits for each character based on how the book depicts them. If no list is provided, the code returns traits for all PERSON entities found by spaCy's NER that also pass a few relevance checks (e.g. at least 3 mentions).

Throughout the code, I use "Frankenstein" in the examples to better explain how the code works.

In [None]:
# Load all the libraries
import requests  # HTTP requests
import spacy  # NLP library
import re  # finding patterns in text and cleaning
from collections import defaultdict, Counter  # dict with default values, counting
from typing import Dict, List, Tuple, Set  # type hints, to document what the dicts etc. contain (for people with poor memory)
import pandas as pd  # may be used for some data analysis
from dataclasses import dataclass  # nicer classes
import json  # may be used for saving results to JSON files

Load the large spaCy model (English). The small one would probably work fine as well. The transformer-based one could possibly perform better, but I'll stick to this one, as seen in class.

In [3]:
nlp = spacy.load('en_core_web_lg') 

The following class represents a single character mention: where the mention starts and ends in the text, the sentence that contains it, and the surrounding context.

In [4]:
@dataclass
class CharacterMention:
    """Represents a character mention in text"""
    name: str
    start: int
    end: int
    sentence: str  # sentence containing the mention
    doc_context: str  # wider window of text around that sentence
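
As a quick illustration of how these fields relate, here is a minimal sketch. The sample text is made up, and the dataclass is repeated only so the snippet runs on its own. It assumes `start` and `end` are character offsets into the full text, which is how they are filled later from spaCy's `start_char` and `end_char`.

```python
from dataclasses import dataclass


@dataclass
class CharacterMention:
    """Same fields as the class above, repeated so the example is self-contained."""
    name: str
    start: int
    end: int
    sentence: str
    doc_context: str


# Hypothetical sample text, not a quote from the book.
text = "I wrote to Elizabeth that evening. She replied within the week."
name = "Elizabeth"
start = text.find(name)

mention = CharacterMention(
    name=name,
    start=start,
    end=start + len(name),
    sentence="I wrote to Elizabeth that evening.",
    doc_context=text,
)

# start/end are character offsets, so slicing the text gives back the name.
print(text[mention.start:mention.end])
```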

The following class is the core of this work. The design aims to make a call to the analysis process as streamlined as possible, to allow faster testing and use: with no true loss (or other numerical) function to optimize, performance can only be assessed by observing the model's behavior on repeated tasks. Because markdown comments are hard to place in the middle of a class, I use (''') to comment the most important methods of the code.

In [None]:

class CharacterPersonalityAnalyzer:
    """
    This is the main class for analyzing character personality.
    The pipeline:
    1 Fetch and preprocess books from Project Gutenberg
    2 Extract character mentions using NER
    3 Group character name variants
    4 Extract adjectives associated with characters using dependency parsing
    5 Map adjectives to personality traits and calculate scores
    """

    def __init__(self):
        self.books = {}  # dict book_name -> text
        self.character_mentions = defaultdict(list)  # dict book_name -> list [CharacterMention]
        self.character_variants = defaultdict(set)  # dict canonical_name -> {name_variants}
        self.personality_traits = self._load_personality_lexicon() #the trait adjective map (dict)
        
    def _load_personality_lexicon(self):
        """This acts as a function from adjectives to personality traits.
        Initially it was crafted using the Oxford dictionary and my limited knowledge of English; the result was disappointing.
        Then I printed the adjectives I was pulling and added some from there, but while proceeding it seemed daunting and still failed to generalize.
        Finally, defeated by the weakness of us flesh-beings, I had a GPT model do it for me, which seems to work better.
        It can be greatly expanded to improve performance, as several terms in the books are not modern English and are not caught by this mapping.
        Outputs a dictionary mapping trait names to lists of associated adjectives.
        """
        return {
            'brave': ['brave', 'courageous', 'bold', 'fearless', 'heroic', 'valiant', 'daring', 'gallant', 'intrepid', 'audacious', 'resolute', 'undaunted'],
            'kind': ['kind', 'gentle', 'compassionate', 'caring', 'tender', 'benevolent', 'merciful', 'sympathetic', 'altruistic', 'empathetic', 'gracious', 'humane', 'warmhearted'],
            'intelligent': ['intelligent', 'smart', 'clever', 'wise', 'brilliant', 'astute', 'shrewd', 'sharp', 'analytical', 'perceptive', 'knowledgeable', 'rational', 'lucid'],
            'cruel': ['cruel', 'harsh', 'brutal', 'merciless', 'ruthless', 'savage', 'vicious', 'heartless', 'malevolent', 'sadistic', 'inhumane', 'callous'],
            'proud': ['proud', 'arrogant', 'haughty', 'conceited', 'vain', 'pompous', 'boastful', 'overconfident', 'self-important', 'egotistical'],
            'humble': ['humble', 'modest', 'meek', 'unassuming', 'self-effacing', 'unpretentious', 'deferential', 'reserved', 'lowly'],
            'loyal': ['loyal', 'faithful', 'devoted', 'true', 'steadfast', 'reliable', 'trustworthy', 'constant', 'dedicated', 'dependable'],
            'treacherous': ['treacherous', 'deceitful', 'dishonest', 'false', 'traitorous', 'unfaithful', 'duplicitous', 'perfidious', 'scheming', 'two-faced'],
            'strong': ['strong', 'powerful', 'mighty', 'robust', 'sturdy', 'stalwart', 'resilient', 'tenacious', 'determined', 'unshakable'],
            'weak': ['weak', 'feeble', 'frail', 'fragile', 'delicate', 'vulnerable', 'infirm', 'impotent', 'timid', 'ineffectual'],
            'obsessive': ['obsessive', 'fixated', 'compulsive', 'single-minded', 'driven', 'monomaniacal', 'fanatical', 'relentless'],
            'remorseful': ['remorseful', 'repentant', 'contrite', 'penitent', 'rueful', 'guilt-ridden', 'ashamed', 'self-reproaching'],
            'isolated': ['isolated', 'lonely', 'solitary', 'alienated', 'estranged', 'detached', 'reclusive', 'abandoned'],
            'sublime': ['sublime', 'majestic', 'awe-inspiring', 'exalted', 'transcendent', 'solemn', 'grand', 'lofty'],
            'sensitive': ['sensitive', 'emotional', 'empathetic', 'tender-hearted', 'perceptive', 'touchy', 'vulnerable', 'expressive'],
            'vindictive': ['vindictive', 'vengeful', 'spiteful', 'wrathful', 'punitive', 'resentful', 'malicious', 'retaliatory'],
            'naive': ['naive', 'innocent', 'guileless', 'ingenuous', 'unworldly', 'trusting', 'credulous', 'childlike'],
            'pious': ['pious', 'devoted', 'reverent', 'spiritual', 'faithful', 'godly', 'righteous', 'churchgoing'],
            'passionate': ['passionate', 'fiery', 'ardent', 'fervent', 'zealous', 'impetuous', 'intense', 'emotional'],
            'cowardly': ['cowardly', 'timid', 'fearful', 'craven', 'faint-hearted', 'spineless', 'hesitant', 'weak-kneed'],
            'heroic': ['heroic', 'righteous', 'noble', 'honorable', 'just', 'virtuous', 'gallant', 'selfless'],
            'complex': ['complex', 'conflicted', 'morally ambiguous', 'multifaceted', 'layered', 'ambivalent', 'contradictory'],
            'malicious': ['malicious', 'malevolent', 'wicked', 'evil', 'sinister', 'malignant', 'villainous', 'nefarious'],
            'resilient': ['resilient', 'determined', 'persistent', 'tenacious', 'steadfast', 'enduring', 'tough', 'unbreakable'],
            'worldly-wise': ['worldly-wise', 'savvy', 'seasoned', 'sophisticated', 'experienced', 'astute', 'pragmatic'],
            'melancholic': ['melancholic', 'gloomy', 'mournful', 'sorrowful', 'despondent', 'dejected', 'forlorn', 'woeful'],
            'manipulative': ['manipulative', 'scheming', 'conniving', 'devious', 'calculating', 'controlling', 'coercive'],
            'nurturing': ['nurturing', 'caring', 'motherly', 'protective', 'supportive', 'comforting', 'kind-hearted'],
            'just': ['just', 'fair', 'equitable', 'righteous', 'impartial', 'upright', 'principled'],
            'charismatic': ['charismatic', 'charming', 'persuasive', 'magnetic', 'captivating', 'engaging', 'likeable'],
            'reckless': ['reckless', 'rash', 'impulsive', 'careless', 'foolhardy', 'hotheaded', 'brash'],
            'stoic': ['stoic', 'calm', 'composed', 'unflappable', 'detached', 'impassive', 'resigned'],
            'ambitious': ['ambitious', 'aspiring', 'driven', 'determined', 'goal-oriented', 'enterprising', 'striving'],
            'honest': ['honest', 'truthful', 'sincere', 'candid', 'forthright', 'genuine', 'open'],
            'greedy': ['greedy', 'avaricious', 'covetous', 'materialistic', 'grasping', 'self-indulgent'],
            'generous': ['generous', 'charitable', 'giving', 'selfless', 'open-handed', 'philanthropic', 'bountiful'],
            'jealous': ['jealous', 'envious', 'covetous', 'possessive', 'suspicious', 'resentful'],
            'dutiful': ['dutiful', 'obedient', 'responsible', 'respectful', 'conscientious', 'loyal', 'reliable'],
            'apathetic': ['apathetic', 'indifferent', 'unemotional', 'detached', 'unconcerned', 'uninvolved', 'listless']
        }
    
    
    #-------------------------------Step 1 : Getting the Text for the Books-----------------------------------
    '''
    The first two methods fetch the books I'll use. Ideally, this tool can be generalized to many other books.
    The books are fetched from Project Gutenberg (PG), a library of over 75,000 free eBooks of great classics.
    Link to their main page [here](https://www.gutenberg.org/).
    You are encouraged to try other books if any character is not known well enough to evaluate his/her personality.
    Removing PG's metadata is also necessary to prevent errors and imprecision.
    '''   
    def fetch_book(self, url: str, book_name: str):
        """
        url is PG's URL for the book
        """
        print(f"fetching {book_name}...")
        response = requests.get(url)
        response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
        # remove Project Gutenberg headers/footers (by calling the other method)
        text = self._clean_gutenberg_text(response.text)
        self.books[book_name] = text
        print(f"{book_name} loaded ({len(text)} characters)")
    
    def _clean_gutenberg_text(self, text: str) -> str:
        """ Remove PG metadata and clean text
         Removes everything before "START OF THE PROJECT GUTENBERG EBOOK" and everything after the matching "END OF" line.
         Visual inspection of a sample of ebooks showed that they all have this line. Sometimes this won't exclude the "front page" or the Preface, but this is a very minor problem.
         If a book differs, it should be easy to tweak the following lines on a case-by-case basis.
         """
        start_pattern = r'\*\*\* START OF (?:THE |THIS )?PROJECT GUTENBERG EBOOK.*?\*\*\*'
        end_pattern = r'\*\*\* END OF (?:THE |THIS )?PROJECT GUTENBERG EBOOK.*?\*\*\*'
        
        start_match = re.search(start_pattern, text) #using regex to find the start/end
        if start_match:
            text = text[start_match.end():]
        
        end_match = re.search(end_pattern, text)
        if end_match:
            text = text[:end_match.start()]
        
        return text.strip()
    
    # ------------------------------------Step 2: Named Entity Recognition-----------------------------------------------
    def extract_characters(self, book_name: str, target_characters: List[str] = None):
        """
        Extract character mentions from the book using NER and/or targeted search.
            
        This is the core of the character detection. Using spacy's NER to find
        PERSON entities, but it can also look for specific characters if provided.

        target_characters is the optional list of specific characters to look for. If None (or empty), all PERSON entities found by NER are extracted.
        """
        # Check
        if book_name not in self.books:
            raise ValueError(f"Book '{book_name}' not loaded")
        
        if target_characters:
            print(f"Looking for specific characters in {book_name}: {target_characters}")
        else:
            print(f"Extracting all characters from {book_name}...")
            
        text = self.books[book_name]
        # process the text with spaCy (this runs the whole NLP pipeline: tokenization, tagging, dependency parsing, NER)
        doc = nlp(text)
        
        character_mentions = []
        
        # Behaves differently depending on whether a character list was given: targeted search or plain NER
        if target_characters:
            # Look for specific characters through a method defined later
            character_mentions = self._find_target_characters(doc, target_characters)
        else:
            # Extract all PERSON entities through a method defined later
            for ent in doc.ents:
                if ent.label_ == "PERSON":
                    mention = self._create_character_mention(ent, doc)
                    character_mentions.append(mention)
        
        self.character_mentions[book_name] = character_mentions
        print(f"Found {len(character_mentions)} character mentions")
    
    def _find_target_characters(self, doc, target_characters: List[str]) -> List[CharacterMention]:
        """
        Keeps only the PERSON entities from NER that match a target character.
        Originally I also tried other methods to reinforce the results of plain NER, but they were useless.
        """
        mentions = []
        
        # Base approach: find them through NER
        for ent in doc.ents:
            if ent.label_ == "PERSON":
                if self._is_target_character(ent.text, target_characters):
                    mention = self._create_character_mention(ent, doc)
                    mentions.append(mention)
        # other approaches were removed as they were never beneficial
        return mentions
    
    def _is_target_character(self, found_name: str, target_characters: List[str]) -> bool:
        """Check if a found name matches any target character
        Input:
            A single found name string
            A list of target names
        Output:
            Bool (Belongs or does not)
        """
        found_lower = found_name.lower()
        for target in target_characters:
            target_lower = target.lower()
            # Check for an exact or partial match. I tried different implementations, but whichever I use, this sometimes fails.
            # This tries to address all the cases:
            # case 1: target is in found name
            # case 2: found name is in target
            # case 3: any word from target appears in found name
            if (target_lower in found_lower or 
                found_lower in target_lower or
                any(part in found_lower for part in target_lower.split())):
                return True
        return False
    
    def _find_character_by_name(self, doc, character_name: str) -> List[CharacterMention]:
        """
        Finds all occurrences of a specific name in the text
        Input:
            Spacy document
            Single character name to search for
        Output:
            List of CharacterMention
        """
        mentions = []
        char_lower = character_name.lower()
        char_words = char_lower.split()
        
        for sent in doc.sents:
            sent_lower = sent.text.lower()
            
            # Look for exact matches first
            if char_lower in sent_lower:
                start_idx = sent_lower.find(char_lower)
                if start_idx != -1:
                    mention = CharacterMention( #this part does basically the same both for exact and partial
                        name=character_name,
                        start=sent.start_char + start_idx,
                        end=sent.start_char + start_idx + len(character_name),
                        sentence=sent.text.strip(),
                        doc_context=self._get_context_around_sentence(doc, sent) #also get context around the sentence
                    )
                    mentions.append(mention)
            
            # Otherwise look for a partial match:
            # all the words of the name appear in the sentence, so check them token by token
            elif all(word in sent_lower for word in char_words):
                tokens = [token.text.lower() for token in sent]
                for i in range(len(tokens)):
                    if tokens[i] == char_words[0]:
                        if tokens[i:i+len(char_words)] == char_words:
                            mention = CharacterMention( #this part does basically the same both for exact and partial
                                name=character_name,
                                start=sent[i].idx,
                                end=sent[i+len(char_words)-1].idx + len(sent[i+len(char_words)-1].text),
                                sentence=sent.text.strip(),
                                doc_context=self._get_context_around_sentence(doc, sent) #also get context around the sentence
                            )
                            mentions.append(mention)
        
        return mentions
    
    def _create_character_mention(self, ent, doc) -> CharacterMention:
        """Create a CharacterMention object from a spaCy entity"""
        sentence = ent.sent.text.strip()
        context = self._get_context_around_sentence(doc, ent.sent)
        
        return CharacterMention(
            name=ent.text,
            start=ent.start_char,
            end=ent.end_char,
            sentence=sentence,
            doc_context=context
        )
    
    def _get_context_around_sentence(self, doc, sentence):
        """Get context around a sentence"""
        sent_start = sentence.start
        sent_end = sentence.end
        context_start = max(0, sent_start - 50)  # 50 tokens before
        context_end = min(len(doc), sent_end + 50)  # 50 tokens after
        return doc[context_start:context_end].text
    
    #----------------------------------------Step 3: Character Name Variant Grouping----------------------------------------
    
    def group_character_variants(self, book_name: str, min_mentions: int = 3):
        """
        Group character name variants (e.g., 'Victor', 'Victor Frankenstein', 'Dr. Frankenstein')
        Takes a book name and minimum mention threshold (def 3)
        """
        #Check
        if book_name not in self.character_mentions:
            raise ValueError(f"no character mentions found for '{book_name}'")
        
        print(f"Grouping characters variants for {book_name}...")
        mentions = self.character_mentions[book_name] #get all the mentions
        name_counts = Counter(mention.name for mention in mentions) # count all character names
        
        # check count of name vs threshold
        frequent_names = {name for name, count in name_counts.items() if count >= min_mentions}
        
        # My goal here is to create another mapping from partials to full name, so {Frankenstein, Victor, V.Frankenstein} -> Victor Frankenstein.
        
        grouped_chars = {}  # create dict that will store the grouped character names
        processed = set()   # keep track of already grouped names
        #sorting longer names first to try to pick more complete names as canonical
        for name in sorted(frequent_names, key=len, reverse=True):
            if name in processed:
                continue

            canonical = name
            variants = set([name])
            name_parts = set(name.lower().split())

            # look for any other names that share at least one part
            for other_name in frequent_names:
                if other_name == name or other_name in processed:
                    continue
                other_parts = set(other_name.lower().split())
                if name_parts & other_parts:  # if they share any part
                    variants.add(other_name)
                    processed.add(other_name)

            
            processed.add(name)
            grouped_chars[canonical] = variants

        #store 
        for canonical, variants in grouped_chars.items():
            self.character_variants[f"{book_name}:{canonical}"] = variants

        print(f"Grouped into {len(grouped_chars)} main characters:")
        for canonical, variants in sorted(grouped_chars.items()):
            print(f"  - {canonical}: {variants}")

        return grouped_chars

    #---------------------------------------------- Step 4: Dependency Parsing and Adjective Extraction-----------------------------------
    def extract_character_adjectives(self, book_name: str):
        """
        Extracts the adjectives associated with character mentions using spaCy dependency parsing, i.e. by analyzing the syntactic relationships between words.
        Output:
             Dictionary mapping character names to their associated adjectives
        """
        print(f"Extcracting adjectives for characters in {book_name}...")
        
        character_adjectives = defaultdict(list)
        mentions = self.character_mentions[book_name]
        
        for mention in mentions:
            # process the context around the character mention. Re-running the NLP pipeline is probably not the most efficient solution; storing the pre-processed Doc might be a way to avoid it
            doc = nlp(mention.doc_context)
            
            # finding the character entity in the processed context
            char_tokens = []
            for ent in doc.ents:
                if ent.label_ == "PERSON" and self._names_match(ent.text, mention.name):
                    char_tokens.extend(ent)
            
            # if we didn't find the entity, try to find tokens by text matching
            if not char_tokens:
                for token in doc:
                    if mention.name.lower() in token.text.lower():
                        char_tokens.append(token)
            
            # Extract adjectives related to these character tokens
            for char_token in char_tokens:
                adjectives = self._extract_adjectives_for_token(char_token)
                character_adjectives[mention.name].extend(adjectives)
        
        return character_adjectives
    
    def _names_match(self, name1: str, name2: str) -> bool:
        """
        Case-insensitive check if names share either:
        full containment or any common word part.
        """
        n1, n2 = name1.lower(), name2.lower()
        # bool() so the method returns True/False as annotated, not a set
        return (n1 in n2 or
                n2 in n1 or
                bool(set(n1.split()) & set(n2.split())))
    
    def _extract_adjectives_for_token(self, token) -> List[str]:
        """
            Extracts adjectives that are syntactically related to a given token.
    
            The function checks a list of ways adjectives are typically used in written language.
            Expanding the cases further would improve performance, as my knowledge of the subject is rather limited;
            there are possibly cases beyond these 6.
            1. Direct adj. (e.g. "the ferocious Creature")
            2. Subject complements ("the Creature is ferocious")
            3. Clause descriptions ("the Creature, who was ferocious, attacked")
            4. Appositional adj. ("the Creature, ferocious, attacked")
            5. Conjoined adj. ("brutal and cunning")
            6. Adj. in the same noun phrase ("the brutal and cunning Creature")
        """
        adjectives = []
        
        # 1. Direct modifiers 
        for child in token.children:
            if child.dep_ == "amod":
                adjectives.append(child.lemma_.lower())
        
        # 2. Subject complements
        if token.dep_ in ["nsubj", "nsubjpass"]:
            for child in token.head.children:
                if child.dep_ in ["acomp", "attr"] and child.pos_ == "ADJ":
                    adjectives.append(child.lemma_.lower())
        
        # 3. Clause descriptions
        for ancestor in token.ancestors:
            for child in ancestor.children:
                if child.dep_ == "relcl" and child.pos_ == "VERB":
                    for adj in child.subtree:
                        if adj.pos_ == "ADJ":
                            adjectives.append(adj.lemma_.lower())
        
        # 4. Appositional adj.
        if token.dep_ == "appos":
            for child in token.head.children:
                if child.pos_ == "ADJ":
                    adjectives.append(child.lemma_.lower())
        
        # 5. Conjoined adj.
        if token.dep_ == "conj":
            if token.head.pos_ == "ADJ":
                adjectives.append(token.head.lemma_.lower())
        
        # 6. Adj. in the same phrase
        for ancestor in token.ancestors:
            if ancestor.pos_ == "NOUN":
                for child in ancestor.children:
                    if child.pos_ == "ADJ" and child != token:
                        adjectives.append(child.lemma_.lower())
        
        return list(set(adjectives))  # Remove duplicates
    
    #--------------------------------------------- Step 5: Personality Trait Mapping----------------------------------
    def map_adjectives_to_traits(self, character_adjectives: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
        """
        Map adjectives to personality traits and calculate scores
        Input:
            dict of character to adj.
        Output:
            dict of character to dict of traits to score
        """
        character_personalities = {}
        
        for character, adjectives in character_adjectives.items():
            trait_scores = defaultdict(float)
            adj_counter = Counter(adjectives)
            total_adj = sum(adj_counter.values())
            
            # calculate the raw (non-normalized) score for each trait:
            # the number of extracted adjectives that belong to the trait's list
            if total_adj > 0:
                for trait, trait_adjectives in self.personality_traits.items():
                    trait_scores[trait] = sum(adj_counter[adj] for adj in trait_adjectives)
                    
            character_personalities[character] = dict(trait_scores)
        
        return character_personalities
    
    # -----------------------------------------Step 6: Complete Analysis pipeline---------------------------------------------
    def analyze_book(self, url: str, book_name: str, target_characters: List[str] = None):
        """
        This is the final function that executes the 5 steps of the character personality analysis pipeline, 
        from URL (and possibly character list) to final trait profiles.
        """
        print(f"\n=== Analyzing {book_name} ===")
        
        # Step 1: Fetch book
        self.fetch_book(url, book_name)
        
        # Step 2: Extract characters
        self.extract_characters(book_name, target_characters)
    
        # Step 3: Group character variants
        character_groups = self.group_character_variants(book_name)
        
        # Step 4: Extract adjectives
        character_adjectives = self.extract_character_adjectives(book_name)
        
        # Merge adjectives for character variants
        merged_adjectives = defaultdict(list)
        for canonical, variants in character_groups.items():
            for variant in variants:
                if variant in character_adjectives:
                    merged_adjectives[canonical].extend(character_adjectives[variant])
    
        # Step 5: Map to personality traits
        personalities = self.map_adjectives_to_traits(merged_adjectives)
        #Printing
        print(f"\n==== Character personalities in {book_name} ===")
        has_results = False
        for character, traits in personalities.items():
            if any(score > 0 for score in traits.values()):
                has_results = True
                print(f"\n{character}:")
                sorted_traits = sorted(traits.items(), key=lambda x: x[1], reverse=True)
                for trait, score in sorted_traits[:5]:  # Top 5 traits sorted by importance (score)
                    if score > 0:
                        label = self._get_trait_label(score)
                        print(f"  {trait}: ({label})")
        
        if not has_results:
            print("\nNo personality trait found for any characters")
        
        return personalities

    
    def _get_trait_label(self, score: float) -> str:
        """Get descriptive label for a trait score"""
        # Upper bounds instead of closed ranges, so every score gets a label
        if score <= 1:
            return "has been called"
        elif score <= 3:
            return "is often"
        elif score <= 5:
            return "is very"
        elif score <= 7:
            return "is incredibly"
        else:
            return "is a paragon of"
#end of the very long class
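
To see the Step 3 grouping rule in isolation, here is a minimal standalone sketch of the logic in `group_character_variants`. The function name `group_variants` and the mention counts are hypothetical, chosen only for illustration; the rule itself (names that share a word are merged, longest name first) is the one used in the class.

```python
from collections import Counter


def group_variants(names, min_mentions=3):
    """Standalone copy of the grouping rule: names that share a word are merged."""
    counts = Counter(names)
    frequent = {n for n, c in counts.items() if c >= min_mentions}
    grouped, processed = {}, set()
    # Longer names first, so the most complete form becomes the canonical one.
    for name in sorted(frequent, key=len, reverse=True):
        if name in processed:
            continue
        variants = {name}
        parts = set(name.lower().split())
        for other in frequent:
            if other == name or other in processed:
                continue
            if parts & set(other.lower().split()):  # at least one shared word
                variants.add(other)
                processed.add(other)
        processed.add(name)
        grouped[name] = variants
    return grouped


# Hypothetical mention counts, not taken from an actual run.
names = (["Elizabeth"] * 5 + ["Elizabeth Lavenza"] * 3
         + ["Victor"] * 4 + ["Frankenstein"] * 6)
groups = group_variants(names)
for canonical in sorted(groups):
    print(canonical, "->", sorted(groups[canonical]))

# Same counts, plus a frequent full name that links the two parts.
merged = group_variants(names + ["Victor Frankenstein"] * 3)
print(sorted(merged["Victor Frankenstein"]))
```

With the first set of counts, "Victor" and "Frankenstein" stay separate because no frequent name contains both words. Once "Victor Frankenstein" itself reaches `min_mentions`, both are merged under that longer name.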

Here's a collection of other URLs for books you may want to try (I didn't try them all; these are just for ease of use):

Pride and Prejudice (Jane Austen) https://www.gutenberg.org/files/1342/1342-0.txt

Moby Dick (Herman Melville) https://www.gutenberg.org/files/2701/2701-0.txt

Dracula (Bram Stoker) https://www.gutenberg.org/files/345/345-0.txt

Crime and Punishment (Dostoevsky) https://www.gutenberg.org/files/2554/2554-0.txt

The Strange Case of Dr. Jekyll and Mr. Hyde https://www.gutenberg.org/files/43/43-0.txt

The Picture of Dorian Gray (Oscar Wilde) https://www.gutenberg.org/files/174/174-0.txt
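
Before running the full pipeline on one of these URLs, it may be worth checking that the marker stripping behaves as expected for that edition. Below is a minimal offline sketch that reuses the two patterns from `_clean_gutenberg_text`. The function name `strip_gutenberg_markers` and the sample text are made up for illustration; no real book is downloaded.

```python
import re

# Same patterns as in _clean_gutenberg_text
START = r'\*\*\* START OF (?:THE |THIS )?PROJECT GUTENBERG EBOOK.*?\*\*\*'
END = r'\*\*\* END OF (?:THE |THIS )?PROJECT GUTENBERG EBOOK.*?\*\*\*'


def strip_gutenberg_markers(text):
    """Keep only the text between the START and END marker lines, if present."""
    start = re.search(START, text)
    if start:
        text = text[start.end():]
    end = re.search(END, text)
    if end:
        text = text[:end.start()]
    return text.strip()


# Hypothetical miniature "ebook" with the usual marker lines.
sample = (
    "Title: Example\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Chapter 1\nIt was a quiet morning.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text follows.\n"
)
cleaned = strip_gutenberg_markers(sample)
print(cleaned)
```

If a file lacks the markers, the function returns the text unchanged (apart from stripping surrounding whitespace), so a quick look at the first and last lines of the result shows whether the patterns need tweaking.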

In [None]:
# Usage Examples
def main():
    #----------------------------- Examples Analyzing with specific characters list--------------------------------------
    analyzer = CharacterPersonalityAnalyzer()
    
    # Analyze Frankenstein with specific characters.
    frankenstein_characters = [
        'Victor Frankenstein', 'Elizabeth', 'Creature', 'Clerval'
    ]
    
    frankenstein_personalities = analyzer.analyze_book(
        "https://www.gutenberg.org/files/84/84-0.txt", 
        "Frankenstein",
        target_characters=frankenstein_characters
    )
  
    # Analyze The Betrothed with specific characters
    betrothed_characters = [
        'Lorenzo Tramaglino', 
        'Lucia Mondella',
        'Don Rodrigo',
        'Fra Cristoforo',
        'Don Abbondio'
    ]
    
    betrothed_personalities = analyzer.analyze_book(
        "https://www.gutenberg.org/files/35155/35155-0.txt", 
        "The Betrothed",
        target_characters=betrothed_characters
    )

    
    
   
   
    
    #-------------------------------- Examples with pure NER (no target characters) (these are obviously longer, be patient)--------------------
    '''
     Discover relevant characters through NER
     analyzer.analyze_book(url, book_name)   (No target_characters parameter)
    '''
    analyzer = CharacterPersonalityAnalyzer()
    
    #print("\n=== Analyzing Pride and Prejudice with NER ===")
    #pride_prejudice_personalities = analyzer.analyze_book(
        #"https://www.gutenberg.org/files/1342/1342-0.txt",
        #"Pride and Prejudice"
    #)
    print("\n=== Analyzing Frankenstein with NER ===")
    frankenstein_ner_personalities = analyzer.analyze_book(
        "https://www.gutenberg.org/files/84/84-0.txt",
        "Frankenstein"
    )
    


if __name__ == "__main__":
    main()


=== Analyzing Frankenstein ===
fetching Frankenstein...
Frankenstein loaded (426692 characters)
Looking for specific characters in Frankenstein: ['Victor Frankenstein', 'Elizabeth', 'Creature', 'Clerval']
Found 191 character mentions
Grouping characters variants for Frankenstein...
Grouped into 4 main characters:
  - Clerval: {'Clerval'}
  - Elizabeth Lavenza: {'Elizabeth Lavenza', 'Elizabeth'}
  - Frankenstein: {'Frankenstein'}
  - Victor: {'Victor'}
Extcracting adjectives for characters in Frankenstein...

==== Character personalities in Frankenstein ===

Elizabeth Lavenza:
  brave: (has been called)
  heroic: (has been called)

Frankenstein:
  just: (has been called)

Clerval:
  heroic: (has been called)

=== Analyzing The Betrothed ===
fetching The Betrothed...
The Betrothed loaded (966279 characters)
Looking for specific characters in The Betrothed: ['Lorenzo Tramaglino', 'Lucia Mondella', 'Don Rodrigo', 'Fra Cristoforo', 'Don Abbondio']
Found 947 character mentions
Grouping char


# Interpreting results
The personality traits extracted for a character depend heavily on how often and how explicitly traits are attributed to them through adjectives.
Note also that I decided not to normalize the results. So while it is true that a character called "cool" 3 times is probably a cool guy, it is also true that longer novels have a better chance of describing a character in a way that this program captures.
Running this code on Arthur C. Clarke's "The Sentinel" is therefore likely to yield no results.
This also means that central characters receive, on average, more descriptive attention, so their personality profiles tend to be more complete. Less central characters often fall below the mention threshold or lack adjectives captured in my mapping altogether.
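
For comparison, here is a minimal sketch of what a normalized variant could look like. This is not what the notebook does: the function name `normalized_trait_scores`, the tiny lexicon and the adjective list are all hypothetical. Each score becomes the share of a character's extracted adjectives that fall under a trait, which makes characters from books of different lengths more comparable, at the cost of losing the raw counts that the labels ("is often", "is very", ...) rely on.

```python
from collections import Counter


def normalized_trait_scores(adjectives, lexicon):
    """Share of a character's adjectives that fall under each trait (0.0 to 1.0)."""
    counts = Counter(adjectives)
    total = sum(counts.values())
    if total == 0:
        return {}
    scores = {}
    for trait, trait_adjectives in lexicon.items():
        hits = sum(counts[adj] for adj in trait_adjectives)
        if hits:
            scores[trait] = hits / total
    return scores


# Hypothetical two-trait lexicon and adjective list, for illustration only.
lexicon = {"kind": ["kind", "gentle"], "brave": ["brave", "bold"]}
adjectives = ["gentle", "kind", "bold", "old"]
scores = normalized_trait_scores(adjectives, lexicon)
print(scores)
```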



# Limitations
1. The system struggles with common nicknames, like "Renzo" for "Lorenzo" or "Lizzy" for "Elizabeth". The same holds for different names of the same entity, like "The Creature" and "The Monster", which are treated as two different beings. This could be mitigated by adding a mapping of the most common nicknames.

2. Victor Frankenstein gave me a few problems: there is more than one Frankenstein in the book, and due to its style he is rarely addressed by name, and even more rarely in a way that attributes adjectives to him.

3. Depending on the writing style, characters are described through actions rather than adjectives. Frankenstein's obsession with death is never stated explicitly, as in "V.F. is obsessed by the idea of defeating death". This key component of his personality is instead easily understood from his vicissitudes. A good number of authors of the more famous books rarely describe their characters directly.
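
The nickname mapping suggested in limitation 1 could be sketched as below. The alias table and the helper `canonical_name` are hypothetical and hand-written; a real table would have to be curated per book, and aliases that NER does not tag as PERSON (such as "the monster") would probably still need a separate text search to be found in the first place.

```python
# Hypothetical, hand-written alias table (lowercase alias -> canonical name).
ALIASES = {
    "renzo": "Lorenzo",
    "lizzy": "Elizabeth",
    "the creature": "Creature",
    "the monster": "Creature",
}


def canonical_name(name, aliases=ALIASES):
    """Return the canonical name for a known alias, or the name itself otherwise."""
    cleaned = name.strip()
    return aliases.get(cleaned.lower(), cleaned)


for found in ["Renzo", "The Monster", "Clerval"]:
    print(found, "->", canonical_name(found))
```

Applying such a function to each mention's name before grouping would let "Renzo" and "Lorenzo" be counted together.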

# Possible improvements
1. Pronouns and descriptions ("he", "the scientist", etc.) split trait attribution, because references to the same character are treated as separate entities. From my understanding, coreference resolution tools could stitch these fragments into unified profiles, but I failed to implement this: in spaCy it is relegated to the experimental package, which I couldn't add to the pipeline because it only supported an older version of spaCy.

2. A few components (e.g. the trait-adjective map and the adjective extraction function) directly drive the results: the larger and more accurate they are, the better this code performs, so expanding them more carefully would improve the results.

3. Manual mappings for tasks like adjective extraction limit scalability. I had to hardcode several parts of this code because I was unable to find ready-made Python libraries; finding and using such libraries, provided they are well designed, would improve the results. Another route would be to rely more on neural networks (BERT-based systems specifically) for these tasks, since they excel at generalizing, but I left them out on purpose to avoid solving all the problems with the same tool.
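
When expanding the trait-adjective map (improvement 2), one thing that may be worth watching is adjectives listed under several traits, since a single occurrence then raises several scores at once. A minimal sketch for spotting them follows; the helper `invert_lexicon` is hypothetical, and the lexicon here is only a small excerpt of the one defined in the class.

```python
from collections import defaultdict


def invert_lexicon(lexicon):
    """Build an adjective -> set of traits index from a trait -> adjectives map."""
    index = defaultdict(set)
    for trait, adjectives in lexicon.items():
        for adj in adjectives:
            index[adj].add(trait)
    return index


# Small excerpt of the lexicon, shortened for illustration.
lexicon = {
    "loyal": ["loyal", "faithful", "devoted"],
    "pious": ["pious", "devoted", "faithful"],
    "brave": ["brave", "bold"],
}
index = invert_lexicon(lexicon)

# Adjectives that count towards more than one trait.
shared = {adj: sorted(traits) for adj, traits in index.items() if len(traits) > 1}
print(shared)
```

The same index also turns trait lookup for a given adjective into a single dictionary access, which could simplify the scoring loop if the lexicon grows.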