# Smart Parser

The goal is a parser that copes better with simple natural-language sentences. Inputs are ordinary text-adventure sentences; outputs are the game commands.

The parser needs several stages: analyze the syntax, match the result against the allowed verbs, and identify the objects.

Components used:
- spaCy: model for analyzing and annotating the input sentences. Probably "de_dep_news_trf".
- SentenceTransformer: for matching the commands, probably with "paraphrase-multilingual-MiniLM-L12-v2".
- Neo4j: for identifying the objects.
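
As a rough sketch of how these stages could fit together: all names below (`ParsedCommand`, `parse`, and the three stage callables) are hypothetical and not part of the notebook, and the toy stages only stand in for spaCy, the embedding match, and the planned Neo4j lookup.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ParsedCommand:
    """Hypothetical result type: a game command plus the objects it acts on."""
    command: Optional[str]
    objects: List[str] = field(default_factory=list)


def parse(
    sentence: str,
    analyze: Callable[[str], Tuple[Optional[str], List[str]]],
    match_verb: Callable[[str], Optional[str]],
    resolve: Callable[[str], str],
) -> Optional[ParsedCommand]:
    """Run the three stages in order; each stage is passed in as a plain callable."""
    if not sentence.strip():
        return None  # nothing to parse
    verb, nouns = analyze(sentence)               # stage 1: syntax analysis
    command = match_verb(verb) if verb else None  # stage 2: verb -> command
    objects = [resolve(n) for n in nouns]         # stage 3: object lookup
    return ParsedCommand(command, objects)


# Toy stages so the sketch runs without any model or database.
def toy_analyze(sentence: str) -> Tuple[Optional[str], List[str]]:
    words = sentence.split()
    return words[0].lower(), words[-1:]


toy_verbs = {"nimm": "take", "geh": "go"}
result = parse("Nimm den Schlüssel", toy_analyze, toy_verbs.get, str.strip)
print(result)
```

Passing the stages in as callables keeps each one replaceable, which may be convenient while the choice of models is still open.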

In [30]:
import spacy 
from sentence_transformers import SentenceTransformer, util

# parsing_model_lg = spacy.load("de_core_news_lg")
parsing_model_trf = spacy.load("de_dep_news_trf")
matching_model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

In [31]:
# doc_lg = parsing_model_lg("Nimm den goldenen Schlüssel und öffne die verzauberte Truhe")
doc_trf = parsing_model_trf("Nimm den goldenen Schlüssel und öffne die verzauberte Truhe")

# spacy.displacy.render(doc_lg, style='dep', jupyter=True)
spacy.displacy.render(doc_trf, style='dep', jupyter=True)

## Test sentences

AI-generated sentences in several categories, each with its expected outcome.

In [32]:
basic = [
    {
        'sentence': "Nimm den Schlüssel",
        'expected': {'command': 'take', 'objects': ['Schlüssel']}
    },
    {
        'sentence': "Lege das Schwert ab",
        'expected': {'command': 'drop', 'objects': ['Schwert']}
    },
    {
        'sentence': "Geh zur Taverne",
        'expected': {'command': 'go', 'objects': ['Taverne']}
    },
    {
        'sentence': "Untersuche die Truhe",
        'expected': {'command': 'examine', 'objects': ['Truhe']}
    },
    {
        'sentence': "Lies das Buch",
        'expected': {'command': 'read', 'objects': ['Buch']}
    },
    {
        'sentence': "Benutze den Schlüssel",
        'expected': {'command': 'use', 'objects': ['Schlüssel']}
    }
]

trennbar = [
    {
        'sentence': "Nimm den Schlüssel auf",
        'expected': {'command': 'take', 'objects': ['Schlüssel']}
    },
    {
        'sentence': "Wirf den Apfel weg",
        'expected': {'command': 'drop', 'objects': ['Apfel']}
    },
    {
        'sentence': "Lauf zur Taverne",
        'expected': {'command': 'go', 'objects': ['Taverne']}
    },
    {
        'sentence': "Sieh dir die Karte an",
        'expected': {'command': 'examine', 'objects': ['Karte']}
    },
    {
        'sentence': "Les das Pergament vor",
        'expected': {'command': 'read', 'objects': ['Pergament']}
    },
    {
        'sentence': "Wende den Trank an",
        'expected': {'command': 'use', 'objects': ['Trank']}
    }
]

komplex = [
    {
        'sentence': "Nimm den goldenen Schlüssel",
        'expected': {'command': 'take', 'objects': ['goldenen Schlüssel']}
    },
    {
        'sentence': "Lege das schwere eiserne Schwert ab",
        'expected': {'command': 'drop', 'objects': ['schwere eiserne Schwert']}
    },
    {
        'sentence': "Geh in die dunkle Taverne",
        'expected': {'command': 'go', 'objects': ['dunkle Taverne']}
    },
    {
        'sentence': "Untersuche das mysteriöse Buch",
        'expected': {'command': 'examine', 'objects': ['mysteriöse Buch']}
    },
    {
        'sentence': "Lies die alte Inschrift",
        'expected': {'command': 'read', 'objects': ['alte Inschrift']}
    },
    {
        'sentence': "Öffne die verzauberte Truhe",
        'expected': {'command': 'use', 'objects': ['verzauberte Truhe']}
    }
]

praepositionen = [
    {
        'sentence': "Hole den Schlüssel aus der Truhe",
        'expected': {'command': 'take', 'objects': ['Schlüssel']}
    },
    {
        'sentence': "Lege das Schwert auf den Tisch",
        'expected': {'command': 'drop', 'objects': ['Schwert']}
    },
    {
        'sentence': "Gehe in die Taverne",
        'expected': {'command': 'go', 'objects': ['Taverne']}
    },
    {
        'sentence': "Sieh dir die Inschrift an der Wand an",
        'expected': {'command': 'examine', 'objects': ['Inschrift']}
    },
    {
        'sentence': "Lies den Text auf dem Schild",
        'expected': {'command': 'read', 'objects': ['Text']}
    },
    {
        'sentence': "Öffne die Tür mit dem Schlüssel",
        'expected': {'command': 'use', 'objects': ['Tür', 'Schlüssel']}
    }
]

synonyme = [
    {
        'sentence': "Greif nach dem Schwert",
        'expected': {'command': 'take', 'objects': ['Schwert']}
    },
    {
        'sentence': "Lass das Schwert fallen",
        'expected': {'command': 'drop', 'objects': ['Schwert']}
    },
    {
        'sentence': "Besuche die Taverne",
        'expected': {'command': 'go', 'objects': ['Taverne']}
    },
    {
        'sentence': "Betrachte das Gemälde",
        'expected': {'command': 'examine', 'objects': ['Gemälde']}
    },
    {
        'sentence': "Durchlese das Dokument",
        'expected': {'command': 'read', 'objects': ['Dokument']}
    },
    {
        'sentence': "Verwende den Hebel",
        'expected': {'command': 'use', 'objects': ['Hebel']}
    }
]

schwierig = [
    {
        'sentence': "Schnapp dir den Schlüssel",
        'expected': {'command': 'take', 'objects': ['Schlüssel']}
    },
    {
        'sentence': "Schmeiß das Schwert weg",
        'expected': {'command': 'drop', 'objects': ['Schwert']}
    },
    {
        'sentence': "Mach dass du zur Taverne kommst",
        'expected': {'command': 'go', 'objects': ['Taverne']}
    },
    {
        'sentence': "Guck dir das mal genauer an",
        'expected': {'command': 'examine', 'objects': []}
    },
    {
        'sentence': "Was steht da drauf?",
        'expected': {'command': 'read', 'objects': []}
    },
    {
        'sentence': "Probier mal den Schalter aus",
        'expected': {'command': 'use', 'objects': ['Schalter']}
    }
]

edge_cases = [
    {
        'sentence': "Nimm Schlüssel",
        'expected': {'command': 'take', 'objects': ['Schlüssel']}
    },
    {
        'sentence': "Schwert ablegen",
        'expected': {'command': 'drop', 'objects': ['Schwert']}
    },
    {
        'sentence': "Ich möchte zur Taverne gehen",
        'expected': {'command': 'go', 'objects': ['Taverne']}
    },
    {
        'sentence': "Kannst du dir das ansehen?",
        'expected': {'command': 'examine', 'objects': []}
    },
    {
        'sentence': "",
        'expected': None
    }
]

all_tests = {
    'basic': basic,
    'trennbar': trennbar,
    'komplex': komplex,
    'praepositionen': praepositionen,
    'synonyme': synonyme,
    'schwierig': schwierig,
    'edge_cases': edge_cases
}

## NLP Processing

In [33]:
# Parser for a single item from the test data
def add_trf_parsing(items):

    doc = parsing_model_trf(items['sentence'])
    
    # Add the root verb and the objects to the test item
    items['rootverb'] = None
    items['objects'] = []
    
    for token in doc:
        # Find the main verb (lemma_)
        if token.dep_ == "ROOT":
            items['rootverb'] = token.lemma_
        # Find objects. spaCy's German models use TIGER-style labels, so 'oa'
        # (accusative object) is likely the only label here that ever matches.
        if token.dep_ in ['obj', 'dobj', 'oa', 'pobj']:
            items['objects'].append(token)
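
The parsing step above takes only the lemma of the root token, so a separable verb such as "Lege ... ab" arrives as "legen" rather than "ablegen". A minimal sketch of reattaching the particle follows. It assumes the parser marks the separated particle with the dependency label `svp` (as in the TIGER scheme); `Tok` and `root_verb_with_particle` are hypothetical helpers, and the annotated sentence is written by hand, not real model output.

```python
from typing import List, NamedTuple, Optional


class Tok(NamedTuple):
    """Hypothetical stand-in for the few spaCy token fields used here."""
    text: str
    lemma: str
    dep: str
    head: int  # index of the syntactic head within the sentence


def root_verb_with_particle(tokens: List[Tok]) -> Optional[str]:
    """Return the lowercased root lemma, prefixed with its separated particle if any."""
    root_idx = next((i for i, t in enumerate(tokens) if t.dep == "ROOT"), None)
    if root_idx is None:
        return None
    lemma = tokens[root_idx].lemma.lower()
    # Assumption: 'svp' marks the separated verb particle attached to the root.
    particles = [t.text.lower() for t in tokens if t.dep == "svp" and t.head == root_idx]
    # Skip the prefix if the lemmatizer already merged it into the lemma.
    if particles and not lemma.startswith(particles[0]):
        lemma = particles[0] + lemma
    return lemma


# Hand-written annotation for "Lege das Schwert ab" (not real model output).
sentence = [
    Tok("Lege", "legen", "ROOT", 0),
    Tok("das", "der", "nk", 2),
    Tok("Schwert", "Schwert", "oa", 0),
    Tok("ab", "ab", "svp", 0),
]
print(root_verb_with_particle(sentence))  # ablegen
```

Lowercasing the lemma as well would avoid feeding capitalized forms into the verb matching.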

## Command Embedding

In [34]:
# Command structure:
# Meta (hardcoded): inventory, quit, help, look
# In-world (parsed): the commands below

command_verbs = {
    # Observation
    'examine': [
        'untersuchen', 'betrachten', 'ansehen', 'anschauen', 'inspizieren', 'prüfen', 'mustern',
        'untersuch', 'betracht', 'sieh an', 'schau an', 'guck an'  # imperative
    ],
    'read': [
        'lesen', 'durchlesen', 'vorlesen',
        'lies', 'les'  # imperative
    ],
    
    # Movement
    'go': [
        'gehen', 'laufen', 'bewegen', 'besuchen', 'kommen',
        'geh', 'lauf', 'beweg', 'besuch', 'komm'  # imperative
    ],
    
    # Object interaction
    'take': [
        'nehmen', 'holen', 'packen', 'greifen', 'schnappen', 'aufheben', 'raffen',
        'nimm', 'hol', 'pack', 'greif', 'schnapp'  # imperative
    ],
    'drop': [
        'ablegen', 'werfen', 'lassen', 'fallenlassen', 'wegwerfen', 'schmeißen',
        'leg ab', 'wirf', 'lass', 'schmeiß'  # imperative
    ],
    'use': [
        'benutzen', 'verwenden', 'anwenden', 'öffnen', 'betätigen', 'probieren', 'aktivieren',
        'benutz', 'verwend', 'wend an', 'öffne', 'probier'  # imperative
    ],
}

# Embed the command verbs
command_verb_embeddings = {}

for cmd, verbs in command_verbs.items():
    command_verb_embeddings[cmd] = matching_model.encode(verbs)

In [35]:
# Embed the verb and compare it with the commands
def verb_to_command(items):

    # Bail out if no verb was found
    if items['rootverb'] is None:
        items['best_command'] = None        
        items['best_sim'] = 0.0
        return

    items['best_command'] = None
    items['best_sim'] = -1

    # Embed the verb
    verb_emb = matching_model.encode(items['rootverb'])   

    for cmd, embs in command_verb_embeddings.items():

        # Compare similarity against every verb of this command
        similarities = util.cos_sim(verb_emb, embs)
        max_sim = similarities.max().item()

        # Keep the best result so far (anything > -1 wins, so there is no real threshold) ;-)
        if max_sim > items['best_sim']:
            items['best_sim'] = max_sim
            items['best_command'] = cmd
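
Because any similarity above -1 is accepted, every verb is mapped to some command, even a very poor match. One possible refinement is a minimum similarity below which the parser answers "unknown". The sketch below uses plain-Python cosine similarity on toy 2-D vectors instead of real embeddings; `cosine`, `best_command`, and the cutoff `min_sim=0.6` are assumptions for illustration, and a usable cutoff would have to be tuned on the test sentences.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_command(
    verb_vec: Sequence[float],
    command_vecs: Dict[str, List[Sequence[float]]],
    min_sim: float = 0.6,  # arbitrary illustrative cutoff, not a tuned value
) -> Tuple[Optional[str], float]:
    """Return (command, similarity); command is None if the best match is below min_sim."""
    best_cmd, best_sim = None, -1.0
    for cmd, vecs in command_vecs.items():
        if not vecs:
            continue
        sim = max(cosine(verb_vec, v) for v in vecs)
        if sim > best_sim:
            best_cmd, best_sim = cmd, sim
    if best_sim < min_sim:
        return None, best_sim
    return best_cmd, best_sim


# Toy 2-D vectors standing in for sentence embeddings.
toy = {"take": [(1.0, 0.0)], "go": [(0.0, 1.0)]}
print(best_command((0.9, 0.1), toy)[0])    # take
print(best_command((-1.0, -1.0), toy)[0])  # None
```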

## Smart Parsing

"One with everything"

In [36]:
for category, tests in all_tests.items():

    for items in tests:

        add_trf_parsing(items)
        verb_to_command(items)
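
The run above fills in `objects` for every item, but the reports further down only score the command. A minimal sketch of an object check follows. `objects_match` is a hypothetical helper, and it makes the simplifying assumption that the last word of each expected phrase is the head noun, since the parsing step collects single tokens rather than whole noun phrases.

```python
from typing import Iterable, Optional


def objects_match(expected: Optional[dict], parsed_objects: Iterable) -> Optional[bool]:
    """Compare expected head nouns with the parsed objects, ignoring case and order."""
    if expected is None:
        return None  # nothing to compare against
    # str() accepts plain strings; spaCy tokens are assumed to stringify to their text.
    found = {str(obj).lower() for obj in parsed_objects}
    # Assumption: the last word of an expected phrase is its head noun.
    wanted = {phrase.split()[-1].lower() for phrase in expected["objects"] if phrase.split()}
    return wanted == found


print(objects_match({"command": "take", "objects": ["goldenen Schlüssel"]}, ["Schlüssel"]))  # True
print(objects_match({"command": "use", "objects": ["Tür", "Schlüssel"]}, ["Tür"]))           # False
```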

## AI analysis

Thanks, Claude :)

In [None]:
# Error analysis: expected vs. predicted
from collections import defaultdict

errors = defaultdict(lambda: defaultdict(int))

for category, tests in all_tests.items():
    for item in tests:
        if item['expected'] is None:
            continue
        
        expected = item['expected']['command']
        predicted = item.get('best_command')
        
        # Count every combination (including correct ones)
        errors[expected][predicted] += 1

# Show errors only
print("\n" + "="*80)
print("FEHLERANALYSE: Expected → Predicted (nur Fehler)")
print("="*80)

total_errors = 0
for expected in sorted(errors.keys()):
    has_errors = False
    error_list = []
    
    for predicted, count in sorted(errors[expected].items()):
        if expected != predicted:  # errors only
            error_list.append(f"{predicted}({count}x)")
            total_errors += count
            has_errors = True
    
    if has_errors:
        correct = errors[expected].get(expected, 0)
        total = sum(errors[expected].values())
        print(f"\n{expected.upper():12} ({correct}/{total} korrekt)")
        print(f"  Verwechselt mit: {', '.join(error_list)}")

print(f"\n{'='*80}")
print(f"Gesamt-Fehler: {total_errors}")

# Confusion matrix (optional, more detailed)
print("\n" + "="*80)
print("CONFUSION MATRIX")
print("="*80)

all_commands = sorted(set(errors.keys()) | {pred for preds in errors.values() for pred in preds.keys()})

# Header
print(f"{'Expected':12}", end='')
for cmd in all_commands:
    print(f"{cmd:>10}", end='')
print()
print("-"*80)

# Rows
for expected in all_commands:
    print(f"{expected:12}", end='')
    for predicted in all_commands:
        count = errors[expected].get(predicted, 0)
        if expected == predicted and count > 0:
            print(f"{count:>10}", end='')  # correct predictions
        elif count > 0:
            print(f"\033[91m{count:>10}\033[0m", end='')  # errors in red
        else:
            print(f"{'':>10}", end='')
    print()

print("\n" + "="*80)


FEHLERANALYSE: Expected → Predicted (nur Fehler)

DROP         (6/7 korrekt)
  Verwechselt mit: read(1x)

EXAMINE      (4/7 korrekt)
  Verwechselt mit: drop(1x), use(2x)

GO           (5/7 korrekt)
  Verwechselt mit: take(2x)

READ         (5/6 korrekt)
  Verwechselt mit: take(1x)

TAKE         (6/7 korrekt)
  Verwechselt mit: examine(1x)

USE          (5/6 korrekt)
  Verwechselt mit: drop(1x)

Gesamt-Fehler: 9

CONFUSION MATRIX
Expected          drop   examine        go      read      take       use
--------------------------------------------------------------------------------
drop                 6                             1
examine              1         4                                       2
go                                       5                   2
read                                               5         1
take                           1                             6
use                  1                                                 5
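
The analysis above groups results by command. Grouping by test category instead might show which kind of sentence (separable verbs, synonyms, colloquial phrasing) causes the most trouble. The sketch below assumes items shaped like the test data after the run; `accuracy_per_category` is a hypothetical helper and `sample` is hand-made, not real results.

```python
from typing import Dict, List


def accuracy_per_category(all_tests: Dict[str, List[dict]]) -> Dict[str, float]:
    """Share of correctly predicted commands per category, skipping items without 'expected'."""
    result = {}
    for category, tests in all_tests.items():
        scored = [t for t in tests if t.get("expected") is not None]
        if not scored:
            continue  # avoid division by zero for categories with nothing to score
        correct = sum(1 for t in scored if t.get("best_command") == t["expected"]["command"])
        result[category] = correct / len(scored)
    return result


# Tiny hand-made sample in the same shape as the test data (not real results).
sample = {
    "basic": [
        {"sentence": "Nimm den Schlüssel",
         "expected": {"command": "take", "objects": ["Schlüssel"]},
         "best_command": "take"},
        {"sentence": "Geh zur Taverne",
         "expected": {"command": "go", "objects": ["Taverne"]},
         "best_command": "take"},
    ],
    "edge_cases": [{"sentence": "", "expected": None, "best_command": None}],
}
print(accuracy_per_category(sample))  # {'basic': 0.5}
```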

In [38]:
print("\n" + "="*80)
print(" "*25 + "SMART PARSER TEST REPORT")
print("="*80)

# Per Command Accuracy
from collections import defaultdict

command_stats = defaultdict(lambda: {'total': 0, 'correct': 0})

for category, tests in all_tests.items():
    for item in tests:
        if item['expected'] is None:
            continue

        expected_cmd = item['expected']['command']
        predicted_cmd = item.get('best_command')

        command_stats[expected_cmd]['total'] += 1
        if expected_cmd == predicted_cmd:
            command_stats[expected_cmd]['correct'] += 1

print("\nACCURACY PRO COMMAND:")
print('-'*80)
for cmd, stats in sorted(command_stats.items()):
    acc = 100 * stats['correct'] / stats['total'] if stats['total'] > 0 else 0
    bar = '█' * int(acc / 5)  # Visual bar
    print(f"{cmd:<10} {stats['correct']:>2}/{stats['total']:<2} ({acc:>5.1f}%) {bar}")

print("\n" + "="*80)
print(" "*25 + "ACCURACY PRO SENTENCE")
print("="*80)

for category, tests in all_tests.items():

    print(f"\nKATEGORIE: {category.upper()}")
    print('-'*80)

    for i, item in enumerate(tests, 1):
        print(f"\n[{i}] {item['sentence']}")
        print(f"    Expected:  {item['expected']}")

        print(f"    Parsed:    verb='{item['rootverb']}', objects={item['objects']}")

        # Check whether an expected value exists
        if item['expected'] is not None:
            expected_cmd = item['expected']['command']
            match_icon = '✓' if expected_cmd == item['best_command'] else '✗'
            print(f"    Predicted: {item['best_command']} (score: {item['best_sim']:.3f}) {match_icon}")
        else:
            # No expected value: print only the prediction
            print(f"    Predicted: {item.get('best_command', 'N/A')} (score: {item.get('best_sim', 0):.3f}) [no expected]")




                         SMART PARSER TEST REPORT

ACCURACY PRO COMMAND:
--------------------------------------------------------------------------------
drop        6/7  ( 85.7%) █████████████████
examine     4/7  ( 57.1%) ███████████
go          5/7  ( 71.4%) ██████████████
read        5/6  ( 83.3%) ████████████████
take        6/7  ( 85.7%) █████████████████
use         5/6  ( 83.3%) ████████████████

                         ACCURACY PRO SENTENCE

KATEGORIE: BASIC
--------------------------------------------------------------------------------

[1] Nimm den Schlüssel
    Expected:  {'command': 'take', 'objects': ['Schlüssel']}
    Parsed:    verb='Nimm', objects=[Schlüssel]
    Predicted: take (score: 0.937) ✓

[2] Lege das Schwert ab
    Expected:  {'command': 'drop', 'objects': ['Schwert']}
    Parsed:    verb='Legen', objects=[Schwert]
    Predicted: drop (score: 0.914) ✓

[3] Geh zur Taverne
    Expected:  {'command': 'go', 'objects': ['Taverne']}
    Parsed:    verb='Geh', o