[4.1.0] Add global search filters (/filter command) #144

Merged
dam2452 merged 20 commits into main from filter on Mar 22, 2026

Conversation

dam2452 (Owner) commented Mar 18, 2026

Objective
Introduce a global search filter mechanism that allows users to narrow down results across all search-related commands (e.g., /sz, /sens, /clip, /characters, /object). Filters are persisted in the database per chat and automatically expire after 1 hour of inactivity.

New Commands & Features

  • /filter (aliases: /filtr, /f)
    • /filter <filters>: Sets or appends new search criteria. Supported filters include:
      • season:X (e.g., season:2, season:1-3)
      • episode:X (e.g., episode:S01E05)
      • title:X (fuzzy match against episode title)
      • character:X (e.g., character:Pawlak,Kusy)
      • emotion:X (e.g., emotion:happy)
      • object:X (supports operators, e.g., object:chair>3)
    • /filter info: Displays currently active filters for the chat.
    • /filter reset: Clears all active filters.
  • Seamless Integration: All major search handlers (text, semantic, characters, objects) have been updated to fetch and apply active filters automatically.
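
The filter syntax listed above can be illustrated with a small parsing sketch. This is not the PR's FilterParser: the function name `parse_filter_tokens`, the shape of the returned dict, the set of accepted operators, and the assumption that filters arrive as whitespace-separated `key:value` tokens are all hypothetical. Multi-word titles are not handled here.

```python
import re
from typing import Any, Dict

# Hypothetical patterns; the real FilterParser may accept other forms.
_EPISODE_RE = re.compile(r"^S(\d+)E(\d+)$", re.IGNORECASE)
_OBJECT_RE = re.compile(r"^([^<>=]+?)(>=|<=|>|<|=)(\d+)$")


def parse_filter_tokens(text: str) -> Dict[str, Any]:
    """Parse whitespace-separated key:value tokens into a plain dict."""
    parsed: Dict[str, Any] = {}
    for token in text.split():
        key, sep, value = token.partition(":")
        if not sep or not value:
            raise ValueError(f"Malformed filter token: {token!r}")
        key = key.lower()
        if key == "season":
            # "2" -> [2]; "1-3" -> [1, 2, 3]
            start, _, end = value.partition("-")
            parsed.setdefault("seasons", []).extend(
                range(int(start), int(end or start) + 1)
            )
        elif key == "episode":
            match = _EPISODE_RE.match(value)
            if match is None:
                raise ValueError(f"Expected SxxEyy, got: {value!r}")
            parsed.setdefault("episodes", []).append(
                {"season": int(match.group(1)), "episode": int(match.group(2))}
            )
        elif key in ("character", "emotion"):
            # Comma-separated values, e.g. character:Pawlak,Kusy
            parsed.setdefault(key + "s", []).extend(
                v for v in value.split(",") if v
            )
        elif key == "object":
            # Optional relational operator and count, e.g. chair>3
            match = _OBJECT_RE.match(value)
            name, op, count = match.groups() if match else (value, None, None)
            parsed.setdefault("objects", []).append(
                {
                    "name": name,
                    "operator": op,
                    "count": int(count) if count is not None else None,
                }
            )
        elif key == "title":
            parsed["title"] = value
        else:
            raise ValueError(f"Unknown filter key: {key!r}")
    return parsed
```

For example, `parse_filter_tokens("season:1-3 object:chair>3")` would yield seasons 1 through 3 and one object constraint with operator `>` and count 3.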

Technical Details / Under the Hood

  • Parser & Validator (FilterParser, FilterValidator): Robust parsing of user input (including ranges and relational operators) and canonical resolution of entity names against Elasticsearch indices.
  • Applicator (FilterApplicator): Dynamically injects clauses (FILTER, SHOULD, NESTED) into Elasticsearch queries and performs in-memory deduplication/filtering of results based on frame timestamps and episode metadata.
  • Database Persistence: Added the user_search_filters PostgreSQL table with a JSONB column to store filter structures and an expiration touch mechanism (last_used_at).
  • Tests: Added comprehensive test coverage in test_filter.py and updated expected_file_hashes.json.
  • Bumped app version to 4.1.0.
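
The inactivity-based expiry described under Database Persistence (a last_used_at timestamp that is touched on use, with a 1-hour timeout) could be checked roughly as follows. This is a sketch under assumptions: the helper name `is_filter_expired`, the use of timezone-aware UTC datetimes, and the strict `>` comparison at the boundary are not taken from the PR.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# From the PR description: filters expire after 1 hour of inactivity.
FILTER_TTL = timedelta(hours=1)


def is_filter_expired(last_used_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when more than FILTER_TTL has passed since the last use.

    Hypothetical helper: the PR may perform this check in SQL instead.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_used_at > FILTER_TTL
```

Each successful read would then update last_used_at to the current time, so that an actively used filter never reaches the timeout.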

dam2452 added 8 commits March 18, 2026 12:37
Introduce per-chat persistent search filters and apply them across text/semantic/video searches. Changes include:

- DB: create user_search_filters table and index; add DatabaseManager methods to get/upsert/reset/touch filters.
- API: add SearchFilterService and FilterParser (new service files) and a FilterHandler for commands (/filtr, /filter, /f) with reset/info/set subcommands.
- Applicator: new FilterApplicator to translate filters into ES clauses and to filter text/semantic segments using video-frame data (character/object/emotion matching, season/episode scoping).
- Integration: thread filters into existing flows (text search, semantic search, transcription, clip handlers, character/object commands) including expiration handling (filters expire after 1h of inactivity) and user-facing messages when filters expire.
- UX: add responses for filter commands and update COMMANDS.md/COMMANDSen.md docs to list the new filter commands and examples.

Files added/updated include DB SQL, DatabaseManager, handler registrations, many handlers to respect active filters, filter responses, filter_applicator, and small changes to search TextSegmentsFinder to incorporate ES-level filter clauses.
Change DatabaseManager to json.loads the stored "filters" column when returning user filters (previously attempted dict(...) which fails for JSON strings). Add a comprehensive test suite for the not_sending_videos filter handler (bot/tests/not_sending_videos/test_filter.py) covering commands, aliases, resets and error cases. Adjust example character names in filter_handler_responses and bump VERSION to 4.1.0.
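
The failure mode behind this fix can be shown in isolation: calling `dict(...)` on a JSON string raises `ValueError`, while `json.loads` decodes it. The helper name `load_filters` is hypothetical, and the branch for already-decoded values is an assumption about how a driver might return a JSONB column.

```python
import json
from typing import Any, Dict, Mapping, Union


def load_filters(raw: Union[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Decode a stored filters value into a dict.

    Hypothetical helper: handles both a JSON string and an
    already-decoded mapping.
    """
    if isinstance(raw, str):
        return json.loads(raw)
    return dict(raw)
```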
Add a FilterValidator to normalize and validate parsed search filters (characters, emotions, objects) using CharacterFinder/ObjectFinder and emotion mapping, returning any resolution notes. Integrate validator into FilterHandler: fetch the user's active series, resolve the filter before persisting, show resolution notes to the user if present, and store the resolved filter. Also add a response helper to format resolution notes and export FilterValidator from the search_filter package.
Consolidate filter resolution notes into the filter set response: remove the separate get_filter_resolution_notes_message helper and update get_filter_set_message to accept optional notes and prepend them to the body. Update FilterHandler to stop sending a separate notes reply and pass the resolved notes into get_filter_set_message. Also add user_search_filters to test DB cleanup in conftest and refresh expected_file_hashes.json to match updated test artifacts.
Refresh expected SHA256 hashes in bot/tests/expected_file_hashes.json to match updated test fixture files. Several video-related entries were updated (e.g. clip_geniusz*, klip*, sd_geniusz*, snap_*), reflecting changes to those assets so tests assert the new checksums.
Introduce support for deleting series indices and several related improvements:

- Reindex: add `delete` target handling in reindex handler, new user responses for delete start/complete, and updated usage text. ReindexService: add delete_series() to remove per-series indices and expose _INDEX_TYPES; keep legacy __delete_series_indices. Logs deleted indices and warnings on failures.
- Bot messaging: add a clip size guard (__extract_clip_with_size_guard) that shrinks clips exceeding FILE_SIZE_LIMIT_MB and updates _send_top_segment_as_clip to use it and correct duration sent to responder.
- Handlers & commands: register new search commands for characters/objects (szukajpostac/szp and szukajobiekt/szo), update validators to accept flexible arg counts, and update COMMANDS.md / COMMANDSen.md to document them.
- Search: expand Elasticsearch mappings (episode_metadata, scene_info, character_appearances, detected_objects, perceptual_hash, etc.) to store richer metadata and vectors.
- Character matching: improve name matching by trying query permutations (including reversed word order) and fuzzy matching.
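
As an illustration of the character matching bullet above, here is a sketch that tries word-order permutations of the query before falling back to fuzzy matching. It is not the PR's CharacterFinder: the function name `match_character_name`, the 0.6 cutoff, the four-word permutation limit, and the sample names in the usage note are assumptions.

```python
import difflib
from itertools import permutations
from typing import Iterable, Optional


def match_character_name(
    query: str, known_names: Iterable[str], cutoff: float = 0.6
) -> Optional[str]:
    """Resolve a query to a known name, ignoring case and word order."""
    lowered = {name.lower(): name for name in known_names}
    words = query.lower().split()
    # Limit permutations to short queries to keep the candidate list small.
    if 0 < len(words) <= 4:
        candidates = [" ".join(p) for p in permutations(words)]
    else:
        candidates = [" ".join(words)]
    # Pass 1: exact match on any word order (covers reversed names).
    for candidate in candidates:
        if candidate in lowered:
            return lowered[candidate]
    # Pass 2: fuzzy match on any word order.
    for candidate in candidates:
        close = difflib.get_close_matches(candidate, list(lowered), n=1, cutoff=cutoff)
        if close:
            return lowered[close[0]]
    return None
```

With hypothetical names such as "Jan Kowalski", a reversed query like "Kowalski Jan" resolves in the first pass, while a misspelling such as "jan kowalsky" is caught by the fuzzy pass.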

These changes enable index deletion, prevent oversized clip uploads, improve object/character search UX, and enrich stored metadata for better search results.
Try the full argument string as a character first, and only treat the last token as an emotion if the full query fails and the last token maps to a known emotion. This moves parsing logic into the CharacterBotHandler, removes the old parse_character_args helper and bot/utils/character_utils.py, and updates imports accordingly. Also increases the emotion fuzzy-match cutoff from 0.5 to 0.75 to reduce false positives. Additionally, add a missing-argument validation in ReindexHandler to handle short command inputs.
Prevent extremely large max_duration values when bitrate_bps is very low by clamping the computed duration to at most 30.0 seconds. The original formula (limit_bytes * 8 / bitrate_bps * 0.85) is preserved but wrapped with min(..., 30.0) to avoid unreasonably long allowed durations.
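
The clamped formula from this commit message can be written out as a small function. The formula and the 30.0-second cap come from the message; the function name, parameter names, and the guard against a non-positive bitrate are assumptions.

```python
def max_clip_duration_seconds(
    limit_bytes: int, bitrate_bps: float, cap_seconds: float = 30.0
) -> float:
    """Longest clip duration that should fit under the size limit.

    Uses limit_bytes * 8 / bitrate_bps * 0.85, clamped to cap_seconds so a
    very low bitrate cannot produce an unreasonably long duration.
    """
    if bitrate_bps <= 0:
        # Assumption: the PR does not say how a zero bitrate is handled.
        raise ValueError("bitrate_bps must be positive")
    return min(limit_bytes * 8 / bitrate_bps * 0.85, cap_seconds)
```

With a 10,000,000-byte limit at 8 Mbps the unclamped value is 8.5 seconds, so the cap does not apply; at 1,000 bps the unclamped value would be many hours, and the cap returns 30.0.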
Comment thread bot/search/video_frames/character_finder.py Outdated
Comment thread bot/handlers/administration/reindex_handler.py
Comment thread bot/handlers/not_sending_videos/filter_handler.py Outdated
Comment thread bot/handlers/not_sending_videos/filter_handler.py Outdated
Comment thread bot/handlers/not_sending_videos/objects_handler.py Outdated
Comment thread bot/services/search_filter/filter_validator.py Outdated
Comment thread bot/services/search_filter/filter_validator.py
Comment thread bot/services/search_filter/filter_validator.py Outdated
Comment thread bot/services/search_filter/filter_parser.py Outdated
Comment thread bot/search/filter_applicator.py Outdated
dam2452 added 9 commits March 19, 2026 13:23
Convert search filter parsing/validation to instance-style APIs and simplify handler logic. Key changes: make FilterParser stateful (private regex attrs, parse() instance method), change FilterValidator to return resolved values plus message lists and aggregate messages, remove SearchFilterService.get_active_filters_with_expiry and the "filters expired" code path (and associated response), and update callers to use get_active_filters. Add a default no-op _get_validator_functions implementation in BotMessageHandler and remove redundant overrides across many handlers. Other cleanup: add/adjust validator in FilterHandler, defer fetching seasons until needed in ObjectsHandler, simplify several type hints and list initializations, and remove unused imports. These changes streamline filter handling, clarify error/message flows, and reduce duplicated validator boilerplate.
Add a private FilterParser instance (self.__parser) to FilterHandler and use it in __handle_set instead of creating a new parser each time. This reduces repeated instantiation and keeps parser usage consistent across handler methods.
Update FilterApplicator to detect scene overlap by checking intervals between consecutive frame timestamps for matching season/episode, rather than requiring a single timestamp to fall inside the segment. This correctly handles scenes defined by timestamp pairs that span a segment. Also enhance format_seconds_to_mmss to emit hours when needed (H:MM:SS) and use it in format_segment for consistent time formatting.
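
The time formatting change mentioned here (emit hours only when needed) might look like the sketch below. The function name `format_seconds` and the zero-padded `MM:SS` form for short durations are assumptions; the PR's format_seconds_to_mmss may pad differently.

```python
def format_seconds(total_seconds: float) -> str:
    """Format a duration as MM:SS, or H:MM:SS once it reaches one hour."""
    seconds = int(total_seconds)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"
```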
Add _get_all_frame_timestamps to load and sort all frame timestamps per episode from Elasticsearch, and include it in the async gather tasks. Modify _segment_passes_all to accept the precomputed timestamps and use bisect to find the next frame timestamp (avoiding per-segment sorting). Import bisect and RequestError, handle ES RequestError by returning an empty map, and log the number of episodes loaded. This optimizes segment overlap checks and reduces repeated sorting work.
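
A rough sketch of the overlap check described in these two commits: each matching frame timestamp opens an interval that runs until the next frame timestamp in the episode, and `bisect` finds that next timestamp in a pre-sorted list. The function name, the parameter names, and the choice to treat the last frame's interval as open-ended are assumptions, not the PR's _segment_passes_all.

```python
import bisect
from typing import Sequence


def segment_overlaps_scene(
    seg_start: float,
    seg_end: float,
    matching_timestamps: Sequence[float],
    all_timestamps_sorted: Sequence[float],
) -> bool:
    """Return True if the segment overlaps any interval opened by a matching frame."""
    for ts in matching_timestamps:
        # Index of the first timestamp strictly greater than ts.
        idx = bisect.bisect_right(all_timestamps_sorted, ts)
        if idx < len(all_timestamps_sorted):
            scene_end = all_timestamps_sorted[idx]
        else:
            # Assumption: the final frame's scene runs to the end of the episode.
            scene_end = float("inf")
        # Standard half-open interval overlap test.
        if ts < seg_end and seg_start < scene_end:
            return True
    return False
```

Because the timestamp list is sorted once per episode, each lookup costs O(log n) rather than a fresh sort per segment.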
Build per-episode (season, episode) bool SHOULD clauses instead of a broad season-only TERMS filter. Ignore entries missing either season or episode and return early if none are valid. Wrap the SHOULD clauses in a FILTER with minimum_should_match=1. Also reduce the Elasticsearch result size from 100000 to 10000 to limit returned hits.
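
The clause structure described here could be built as a plain dict along these lines. The field names `episode_metadata.season` and `episode_metadata.episode_number` and the function name are hypothetical; only the overall shape (per-episode bool clauses under SHOULD, minimum_should_match=1, entries with a missing season or episode skipped) follows the commit message.

```python
from typing import Any, Dict, Iterable, Optional


def build_episode_filter(
    episodes: Iterable[Dict[str, Optional[int]]],
) -> Optional[Dict[str, Any]]:
    """Build a bool clause matching any of the given (season, episode) pairs."""
    should = [
        {
            "bool": {
                "filter": [
                    {"term": {"episode_metadata.season": ep["season"]}},
                    {"term": {"episode_metadata.episode_number": ep["episode"]}},
                ]
            }
        }
        for ep in episodes
        # Ignore entries missing either season or episode.
        if ep.get("season") is not None and ep.get("episode") is not None
    ]
    if not should:
        return None  # Nothing valid to filter on; caller can return early.
    return {"bool": {"should": should, "minimum_should_match": 1}}
```

The returned clause would be appended to the outer query's `bool.filter` list, so it restricts hits without affecting scoring.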
Avoid fetching all frame timestamps up-front. Collect character/emotion/object frame-key tasks, await them to produce frame_key_sets, compute hit_episode_keys from those results (ignoring None season/episode entries), and then call _get_all_frame_timestamps only for the matched episodes. Also rename tasks to char_tasks and reorder the calls to reduce unnecessary work.
Move the initial CharacterFinder.find_best_matching_name call to after the emotion-parsing block so the handler first checks if the last argument is an emotion and performs an emotion-aware lookup. Keeps the full-query lookup as a fallback and preserves the existing not-found error path.
Lower difflib.get_close_matches cutoff from 0.75 to 0.6 in map_emotion_to_en to broaden fuzzy matching of emotion labels. This makes the handler more tolerant of misspellings and variant inputs when mapping to canonical emotion keys.
Introduce _EMOTION_ALIASES (and add Dict to imports) to map common informal emotion labels (e.g. "happy" -> "happiness") to canonical keys. Update map_emotion_to_en to consult these aliases before falling back to fuzzy matching, and raise difflib.get_close_matches cutoff from 0.6 to 0.75 to reduce incorrect matches.
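
The alias-then-fuzzy lookup can be sketched as follows. Only the "happy" -> "happiness" alias and the 0.75 cutoff come from the commit message; the canonical label list, the other aliases, and the function name `map_emotion` are assumptions. The alias table matters because "happy" on its own scores below the 0.75 cutoff against "happiness" in `difflib`.

```python
import difflib
from typing import Dict, Optional, Tuple

# Hypothetical canonical keys; the PR's actual emotion list is not shown.
_CANONICAL: Tuple[str, ...] = ("happiness", "sadness", "anger", "fear", "surprise")
# Only "happy" -> "happiness" is taken from the PR; the rest are examples.
_ALIASES: Dict[str, str] = {"happy": "happiness", "sad": "sadness", "angry": "anger"}


def map_emotion(label: str) -> Optional[str]:
    """Map a user-supplied label to a canonical emotion key, or None."""
    key = label.strip().lower()
    if key in _CANONICAL:
        return key
    if key in _ALIASES:
        return _ALIASES[key]
    # Fall back to fuzzy matching for misspellings.
    matches = difflib.get_close_matches(key, _CANONICAL, n=1, cutoff=0.75)
    return matches[0] if matches else None
```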
Comment thread bot/services/search_filter/search_filter_service.py Outdated
Comment thread bot/responses/not_sending_videos/emotions_handler_responses.py Outdated
Comment thread bot/handlers/not_sending_videos/semantic_search_handler.py Outdated
Comment thread bot/handlers/not_sending_videos/objects_handler.py Outdated
Comment thread bot/handlers/administration/reindex_handler.py
Comment thread bot/responses/not_sending_videos/filter_handler_responses.py Outdated
Comment thread bot/services/search_filter/filter_parser.py Outdated
Comment thread bot/search/filter_applicator.py
Comment thread bot/search/filter_applicator.py Outdated
Comment thread bot/search/filter_applicator.py
Replace the SearchFilterService indirection and centralize search filter operations in DatabaseManager. Handlers now call DatabaseManager.get_and_touch_user_filters / get_user_filters / upsert_user_filters / reset_user_filters directly; upsert_user_filters now accepts a SearchFilter and JSON-serializes it internally. Deleted the old services/search_filter/search_filter_service.py module and updated imports across many handlers and clip/search/semantic handlers. Also adjusted return types and SQL in database_manager.get_user_filters to return a SearchFilter (or None) and updated callers accordingly. Additional small refactors: renamed internal parser methods in FilterParser to use private names, renamed _format_filter to __format_filter and updated usages, removed some explicit typing annotations and cleaned up list initializations. These changes centralize filter persistence logic and simplify handler code.
dam2452 enabled auto-merge (squash) March 21, 2026 13:51
Update bot/tests/expected_file_hashes.json: replace the stored hash for 'semantic_search_ucieczka.message' with f3b561cc9b530fb78cc027ba0257a8edb7e432987c18d7bd43404346f20850ed (previously c6b9534757da0150fa6c555f2eae047c1a2505d2037f45e624b2c04001f2068e). Keeps test fixtures in sync with the updated file contents.
dam2452 merged commit 5640f00 into main Mar 22, 2026
5 of 6 checks passed
dam2452 deleted the filter branch March 22, 2026 15:51

2 participants