Add query expansion feature for neuroscience search#38

Closed
zohaib-7035 wants to merge 1 commit into INCF:main from zohaib-7035:feature/query-expansion

Conversation

zohaib-7035 (Contributor) commented Jan 25, 2026

Summary:
This PR introduces a query expansion feature in the ks_search_tool.py module to improve search results for neuroscience-related terms. It automatically expands user queries with relevant synonyms or related concepts before performing searches.

Changes made:

  1. Added a QUERY_SYNONYMS dictionary containing neuroscience keywords and their synonyms.

  2. Implemented the expand_query() function, which:

    • Expands full phrases first, then individual words.
    • Prevents duplicate terms in expanded queries.

  3. Integrated query expansion into the smart_knowledge_search() function so that all searches now automatically use expanded queries.

  4. Added test code at the end of ks_search_tool.py to demonstrate query expansion.
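
For readers without the diff at hand, the behaviour described in the list above could look roughly like the following. This is a minimal sketch and not the code from the PR: the dictionary entries are illustrative placeholders, and the matching and deduplication rules (case-insensitive, whole-term) are assumptions.

```python
import re

# Illustrative placeholder entries; NOT the dictionary shipped in the PR.
# Keys are assumed to be lowercase.
QUERY_SYNONYMS = {
    "mouse brain": ["Mus musculus"],
    "mouse": ["Mus musculus", "murine"],
    "memory": ["hippocampus"],
}


def expand_query(query, synonyms=QUERY_SYNONYMS):
    """Append synonyms for matching phrases first, then single words, skipping duplicates."""
    original = query.strip()
    lowered = original.lower()
    terms = [original]
    seen = {lowered}  # case-insensitive record of terms already in the output

    def add(term):
        key = term.lower()
        if key not in seen:
            seen.add(key)
            terms.append(term)

    # Pass 1: multi-word phrases, longest first.
    phrases = sorted((k for k in synonyms if " " in k), key=len, reverse=True)
    for phrase in phrases:
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            for synonym in synonyms[phrase]:
                add(synonym)

    # Pass 2: individual words of the query.
    for word in re.findall(r"\w+", lowered):
        for synonym in synonyms.get(word, []):
            add(synonym)

    return " ".join(terms)


print(expand_query("memory"))       # memory hippocampus
print(expand_query("mouse brain"))  # mouse brain Mus musculus murine
print(expand_query("fMRI"))         # fMRI (no entry, query returned unchanged)
```

With these placeholder entries, "Mus musculus" is reachable through both the phrase and the word pass but appears only once in the output, which is the deduplication behaviour the description mentions.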

Benefits:

  • Increases search coverage and recall for domain-specific terms (e.g., "mouse brain" now searches for "Rattus norvegicus" and "somatosensory cortex").
  • Reduces the chances of missing relevant datasets due to synonym mismatches.
  • Fully backward compatible — if no expansion is needed, the original query is used.

Example:

  • Input query: mouse brain
  • Expanded query: mouse brain Rattus norvegicus somatosensory cortex
  • Input query: memory
  • Expanded query: memory hippocampus

Next steps / Future improvements:

  • Expand the QUERY_SYNONYMS dictionary with more neuroscience terms.
  • Consider loading synonyms from a JSON file for easier maintenance.
  • Integrate with async search to handle large batch queries more efficiently.
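
The JSON idea from the list above might be sketched as follows. The function name load_synonyms, the file layout (a single JSON object mapping each term to a list of strings), and the validation rules are assumptions for illustration; nothing like this exists in the PR.

```python
import json
from pathlib import Path


def load_synonyms(path):
    """Load a {term: [synonyms]} mapping from a JSON file, lowercasing the keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("synonym file must contain a JSON object")

    synonyms = {}
    for term, values in data.items():
        # Reject malformed entries early rather than failing at search time.
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"synonyms for {term!r} must be a list of strings")
        synonyms[term.lower()] = values
    return synonyms
```

Keeping the mapping in a data file would let domain experts review and correct entries without touching ks_search_tool.py.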

QuantumByte-01 (Collaborator) commented

Closing — the synonym dictionary has only 3 entries and contains a factual error: 'mouse brain' is mapped to 'Rattus norvegicus' which is the rat, not mouse (Mus musculus). Beyond that, concatenating all synonyms into the query string will confuse the search rather than improve it. Query expansion is a valid idea but needs a proper neuroscience ontology (e.g. NIFSTD, NeuroNames) and should expand at the embedding level, not by string concatenation.
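
One possible reading of "expand at the embedding level" is to blend the query vector with the vectors of related terms instead of appending words to the query string. The sketch below illustrates that idea only. toy_embed is a hashed bag-of-words stand-in for a real embedding model, the related terms would come from an ontology lookup that is not shown, and the function names and the weight parameter are all hypothetical.

```python
import hashlib
import math

DIM = 64  # toy dimensionality, chosen arbitrarily


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def toy_embed(text):
    """Stand-in embedding: hashed bag of words. Replace with a real model."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        index = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[index] += 1.0
    return normalize(vec)


def expanded_query_vector(query, related_terms, embed=toy_embed, weight=0.5):
    """Blend the query embedding with the mean embedding of related terms."""
    query_vec = embed(query)
    if not related_terms:
        return query_vec
    related = [embed(term) for term in related_terms]
    mean = [sum(column) / len(related) for column in zip(*related)]
    return normalize([q + weight * m for q, m in zip(query_vec, mean)])


def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


plain = toy_embed("mouse")
expanded = expanded_query_vector("mouse", ["Mus musculus"])
target = toy_embed("Mus musculus")
print(cosine(plain, target), cosine(expanded, target))
```

The search still receives a single vector, so related terms pull the query toward relevant documents without the original wording being diluted by extra tokens; the weight controls how far the vector moves.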
