LLM_IE v1.4.2

Latest

daviden1013 released this 18 May 03:43

833144d

📐 Documentation Site

The user guide and API reference are available on the Documentation Page.

📣 Features and Changes

Fixed Warning.warn bug in LLMUnitChunker

LLMUnitChunker.chunk() previously called the non-existent Warning.warn(...) when a header was missing or had an invalid anchor_text, which raised AttributeError instead of emitting a warning. Now correctly calls warnings.warn(...).
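To illustrate the difference, here is a minimal sketch. The helper `check_anchor` is hypothetical and is not part of LLM_IE; it only shows that the built-in `Warning` class has no `warn` attribute, while `warnings.warn` from the standard library emits a warning without interrupting the caller.

```python
import warnings


def check_anchor(anchor_text):
    """Hypothetical helper: warn (rather than raise) when an anchor is unusable."""
    if not anchor_text:
        # Correct call: the module-level function from the standard library.
        warnings.warn("Header is missing or has an invalid anchor_text.", UserWarning)
        return False
    return True


# The built-in Warning exception class has no `warn` attribute, so
# `Warning.warn(...)` fails with AttributeError before any warning is emitted.
warning_class_has_warn = hasattr(Warning, "warn")  # False
```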

Fixed import bug in PromptEditor IPython chat

PromptEditor._IPython_chat() referenced importlib.util.find_spec(...) without importing importlib.util, causing AttributeError on first call in a Jupyter environment. The missing import has been added.
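A minimal sketch of the pattern, assuming a hypothetical helper named `has_package`. `import importlib` alone does not guarantee that the `importlib.util` submodule is loaded, so the submodule is imported explicitly before `find_spec` is used.

```python
import importlib.util  # explicit submodule import; `import importlib` alone is not enough


def has_package(name: str) -> bool:
    """Hypothetical helper: report whether a top-level package can be imported."""
    return importlib.util.find_spec(name) is not None
```

Whether the old code failed depended on whether some other module had already imported `importlib.util` in the same process, which is why the error could go unnoticed outside a fresh Jupyter kernel.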

Fixed assistant message rendering in PromptEditor IPython chat

The assistant response in _IPython_chat was rendered by interpolating the full response dictionary into the HTML output. It now correctly renders the response text (response["response"]).
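As a rough illustration, the sketch below renders only the text field of a response dictionary. The function name, the `<pre>` wrapper, the HTML escaping, and the extra `reasoning` key are assumptions made for the example and are not taken from LLM_IE; only the `response["response"]` lookup comes from the note above.

```python
import html


def render_assistant_html(response: dict) -> str:
    """Hypothetical renderer: show the response text, not the whole dictionary."""
    text = response["response"]
    # Escaping is an added precaution in this sketch, not a documented behavior.
    return f"<pre>{html.escape(text)}</pre>"


example = {"response": "Use <b> tags", "reasoning": "hidden"}
rendered = render_assistant_html(example)
```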

Fixed ReviewFrameExtractor default prompt MRO fallback

When a default review prompt for a subclass of ReviewFrameExtractor was not found, the MRO walk was supposed to fall back to an ancestor's prompt but always probed the same filename. The lookup now uses the current class in the MRO and emits a UserWarning when an ancestor's prompt is used.
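The fallback can be sketched as follows. This is not the library's code: the `PROMPTS` dictionary stands in for the prompt files on disk, and the class and function names are hypothetical. The point is that the lookup key is derived from the class currently visited in the MRO, not from the original subclass.

```python
import warnings

# Hypothetical stand-in for prompt files, keyed by class name.
PROMPTS = {"BaseExtractor": "base review prompt"}


class BaseExtractor:
    pass


class ChildExtractor(BaseExtractor):
    pass


def find_default_prompt(cls):
    """Walk the MRO and return the first prompt found, warning on fallback."""
    for ancestor in cls.__mro__:
        # Key the lookup by the class being visited, not by `cls` itself.
        prompt = PROMPTS.get(ancestor.__name__)
        if prompt is not None:
            if ancestor is not cls:
                warnings.warn(
                    f"No default prompt for {cls.__name__}; "
                    f"using the prompt of {ancestor.__name__}.",
                    UserWarning,
                )
            return prompt
    raise FileNotFoundError(f"No default prompt found for {cls.__name__}.")
```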

Widened PromptEditor extractor type

PromptEditor now accepts any Extractor (including StructExtractor) instead of only FrameExtractor. Internally it depends only on get_prompt_guide(), which is defined on the base Extractor.

MultiClassRelationExtractor now requires possible_relation_types_func

MultiClassRelationExtractor previously assigned self.possible_relation_types_func only when the argument was truthy, which let users construct an extractor that would later fail with AttributeError on use. The argument is now validated unconditionally and a missing/invalid value raises immediately at construction time. This matches the behavior of BinaryRelationExtractor.
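A minimal sketch of fail-fast validation, using a hypothetical class. The exact check and exception type used by LLM_IE are not stated above; `callable(...)` and `ValueError` are assumptions for the example.

```python
class RelationExtractorSketch:
    """Hypothetical class showing unconditional validation in __init__."""

    def __init__(self, possible_relation_types_func):
        # Validate unconditionally so a bad argument fails here,
        # not later with AttributeError when the attribute is first used.
        if not callable(possible_relation_types_func):
            raise ValueError("possible_relation_types_func must be a callable.")
        self.possible_relation_types_func = possible_relation_types_func
```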

Fixed TypeError when AttributeExtractor receives empty JSON

AttributeExtractor._extract_from_frame returned None when the LLM produced no JSON object, which caused frame.attr.update(None) to raise TypeError downstream. It now returns {} so attribute extraction is a safe no-op for empty responses.
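The effect of returning an empty dictionary can be sketched like this. The function `extract_attributes` and its regex-based JSON search are hypothetical and only approximate what an extractor might do; the relevant part is that every path returns a dict, so `dict.update` never receives `None`.

```python
import json
import re


def extract_attributes(llm_output: str) -> dict:
    """Hypothetical parser: return the first JSON object found, or {} if none."""
    match = re.search(r"\{.*\}", llm_output, flags=re.DOTALL)
    if match is None:
        return {}  # no JSON object at all
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}  # braces present but not valid JSON
    return parsed if isinstance(parsed, dict) else {}


attributes = {"status": "present"}
attributes.update(extract_attributes("The model returned no JSON."))  # safe no-op
```

By contrast, `attributes.update(None)` raises `TypeError` because `None` is not iterable.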

Removed obsolete entity_key parameter from FrameExtractor abstract API

The abstract FrameExtractor.extract_frames and extract_frames_async previously declared an entity_key:str parameter that the concrete implementations did not accept. The parameter has been removed from the abstract signatures and docstring; the entity key is fixed to "entity_text" internally.

⚡ Performance

Cached PunktSentenceTokenizer in SentenceUnitChunker

SentenceUnitChunker previously instantiated PunktSentenceTokenizer on every chunk() call. The tokenizer is now imported at module level and instantiated once in __init__.
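The caching pattern can be sketched without NLTK. `CountingTokenizer` below is a hypothetical stand-in for `PunktSentenceTokenizer` that counts how often it is constructed; the chunker class is likewise illustrative and is not the library's implementation.

```python
class CountingTokenizer:
    """Hypothetical stand-in for a tokenizer that is costly to construct."""

    instances = 0

    def __init__(self):
        CountingTokenizer.instances += 1

    def tokenize(self, text):
        # Naive period-based split, only for demonstration.
        return [s.strip() for s in text.split(".") if s.strip()]


class SentenceChunkerSketch:
    def __init__(self):
        # Build the tokenizer once and reuse it across chunk() calls.
        self._tokenizer = CountingTokenizer()

    def chunk(self, text):
        return self._tokenizer.tokenize(text)
```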

Fixed short-text fuzzy matching in FrameExtractor

_get_closest_substring (used for fuzzy entity span matching) computed an empty iteration range when the unit text was shorter than the fuzzy window, silently skipping fuzzy matches. The scan now covers every valid start position regardless of text length, and short-circuits before the inner window loop when the first token does not match.
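The sketch below shows the shape of the fix under stated assumptions: it works on token lists, scores candidates with `difflib.SequenceMatcher`, and uses a threshold of 0.7. None of these details are taken from LLM_IE, and the function name is hypothetical. What it does illustrate is that a range of the form `len(text) - window + 1` is empty when the text is shorter than the window, while iterating over every start position with a clipped slice is not.

```python
from difflib import SequenceMatcher


def closest_substring(text_tokens, pattern_tokens, threshold=0.7):
    """Hypothetical fuzzy span search over token lists; returns (start, end) or None."""
    if not text_tokens or not pattern_tokens:
        return None
    window = len(pattern_tokens)
    best_span, best_score = None, 0.0
    # Visit every start position, even when len(text_tokens) < window.
    for start in range(len(text_tokens)):
        # Short-circuit: skip the comparison when the first token differs.
        if text_tokens[start] != pattern_tokens[0]:
            continue
        candidate = text_tokens[start:start + window]  # slice clips at the end
        score = SequenceMatcher(None, candidate, pattern_tokens).ratio()
        if score > best_score:
            best_span, best_score = (start, start + len(candidate)), score
    return best_span if best_score >= threshold else None


short_text = ["chest", "pain"]
pattern = ["chest", "pain", "today"]
# The old-style range is empty here: range(2 - 3 + 1) == range(0).
old_range_is_empty = len(range(len(short_text) - len(pattern) + 1)) == 0
```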
