Retrieval systems are fundamental to many AI applications, efficiently identifying relevant information from large datasets. These systems accommodate various data formats:

Unstructured text (e.g., documents) is often stored in vector stores or lexical search indexes.
Structured data is typically housed in relational or graph databases with defined schemas.
Despite this diversity in data formats, modern AI applications increasingly aim to make all types of data accessible through natural language interfaces. Models play a crucial role in this process by translating natural language queries into formats compatible with the underlying search index or database. This translation enables more intuitive and flexible interactions with complex data structures.

(1) Query analysis: A process where models transform or construct search queries to optimize retrieval.

(2) Information retrieval: Search queries are used to fetch information from various retrieval systems.

Query analysis
While users typically prefer to interact with retrieval systems using natural language, these systems may require specific query syntax or benefit from particular keywords. Query analysis serves as a bridge between raw user input and optimized search queries. Some common applications of query analysis include:

Query Re-writing: Queries can be re-written or expanded to improve semantic or lexical searches.
Query Construction: Search indexes may require structured queries (e.g., SQL for databases).
Query analysis employs models to transform or construct optimized search queries from raw user input.

Query re-writing
Retrieval systems should ideally handle a wide spectrum of user inputs, from simple and poorly worded queries to complex, multi-faceted questions. To achieve this versatility, a popular approach is to use models to transform raw user queries into more effective search queries. This transformation can range from simple keyword extraction to sophisticated query expansion and reformulation. Here are some key benefits of using models for query analysis in unstructured data retrieval:

Query Clarification: Models can rephrase ambiguous or poorly worded queries for clarity.
Semantic Understanding: They can capture the intent behind a query, going beyond literal keyword matching.
Query Expansion: Models can generate related terms or concepts to broaden the search scope.
Complex Query Handling: They can break down multi-part questions into simpler sub-queries.
Various techniques have been developed to leverage models for query re-writing, including:

Name	When to use	Description
Multi-query	When you want to ensure high recall in retrieval by providing multiple phrasings of a question.	Rewrite the user question with multiple phrasings, retrieve documents for each rewritten question, and return the unique documents across all queries.
Decomposition	When a question can be broken down into smaller subproblems.	Decompose a question into a set of subproblems or sub-questions, which can be solved either sequentially (use the answer to the first, plus retrieval, to answer the second) or in parallel (consolidate each answer into a final answer).
Step-back	When a higher-level conceptual understanding is required.	First prompt the LLM to ask a generic step-back question about higher-level concepts or principles, and retrieve relevant facts about them. Use this grounding to help answer the user question. Paper.
HyDE	If you have challenges retrieving relevant documents using the raw user inputs.	Use an LLM to convert questions into hypothetical documents that answer the question. Use the embedded hypothetical documents to retrieve real documents, on the premise that doc-doc similarity search can produce more relevant matches. Paper.
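
As an illustration of the multi-query technique, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the toy `CORPUS`, the word-overlap `keyword_retriever`, and `fake_rephraser`, which stands in for a model call that would generate the alternative phrasings. It only shows the control flow: rephrase, retrieve per phrasing, and keep the unique documents.

```python
from typing import Callable

# Toy corpus standing in for a real index; ids and texts are made up.
CORPUS = {
    "doc1": "Resetting your password from the account settings page",
    "doc2": "How to change the email address on your account",
    "doc3": "Recovering access when you forgot your login credentials",
}


def keyword_retriever(query: str, k: int = 2) -> list[str]:
    """Hypothetical lexical retriever: rank doc ids by shared lowercase words."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(text.lower().split())), doc_id)
        for doc_id, text in CORPUS.items()
    ]
    # Keep documents with at least one overlapping term, best score first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


def fake_rephraser(question: str) -> list[str]:
    """Placeholder for a model call that returns alternative phrasings."""
    return [question, "forgot login credentials", "resetting account password"]


def multi_query_retrieve(
    question: str,
    rephrase: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Retrieve for every phrasing and return unique doc ids in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for phrasing in rephrase(question):
        for doc_id in retrieve(phrasing):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


results = multi_query_retrieve("password reset help", fake_rephraser, keyword_retriever)
print(results)
```

In this toy setup the original wording alone matches only one document, while the extra phrasings surface the other two, which is the recall benefit the technique is aiming for. A real system would replace the placeholder functions with a model call and an actual retriever.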


Query construction
Query analysis can also focus on translating natural language queries into specialized query languages or filters. This translation is crucial for effectively interacting with various types of databases that house structured or semi-structured data.

Structured Data examples: For relational and graph databases, Domain-Specific Languages (DSLs) are used to query data.

Text-to-SQL: Converts natural language to SQL for relational databases.
Text-to-Cypher: Converts natural language to Cypher for graph databases.
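
To make the text-to-SQL idea concrete, the following is a minimal sketch using Python's standard `sqlite3` module. The `orders` schema, the sample rows, and `fake_text_to_sql` are all hypothetical; the latter stands in for a model call that would normally be prompted with the schema and the user's question. The `is_read_only` guard is a deliberately crude assumption, not a complete safety mechanism.

```python
import sqlite3

# Hypothetical schema; a real model would typically be shown this in its prompt.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    total REAL NOT NULL
);
"""


def fake_text_to_sql(question: str, schema: str) -> str:
    """Placeholder for a model call; it returns a fixed query for this example."""
    return (
        "SELECT customer, SUM(total) AS spent FROM orders "
        "GROUP BY customer ORDER BY spent DESC LIMIT 1"
    )


def is_read_only(sql: str) -> bool:
    """Crude guard: accept a single statement that starts with SELECT."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def answer_with_sql(conn: sqlite3.Connection, question: str) -> list[tuple]:
    """Translate the question to SQL, check it, then run it against the database."""
    sql = fake_text_to_sql(question, SCHEMA)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ada", 40.0), ("grace", 25.0), ("ada", 15.5)],
)
rows = answer_with_sql(conn, "Which customer spent the most?")
print(rows)
```

Checking generated SQL before executing it is a design choice worth keeping in any real version, since model output is untrusted input from the database's point of view.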
Semi-structured Data examples: For vector stores, queries can combine semantic search with metadata filtering.

Natural Language to Metadata Filters: Converts user queries into appropriate metadata filters.
These approaches leverage models to bridge the gap between user intent and the specific query requirements of different data storage systems. Here are some popular techniques:

Name	When to use	Description
Self Query	If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text.	This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because questions are often about the metadata of documents (not the content itself).
Text-to-SQL	If users are asking questions that require information housed in a relational database, accessible via SQL.	This uses an LLM to transform user input into a SQL query.
Text-to-Cypher	If users are asking questions that require information housed in a graph database, accessible via Cypher.	This uses an LLM to transform user input into a Cypher query.
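
The self-query idea of splitting user input into a semantic search string and a metadata filter can be sketched as follows. This is an illustration under stated assumptions, not any library's API: `StructuredQuery`, the `DOCS` list, and `fake_self_query` (a stand-in for a model producing structured output) are hypothetical, and word overlap is used in place of embedding similarity.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredQuery:
    """Hypothetical output shape: search text plus exact-match metadata filters."""
    search_text: str
    filters: dict[str, object] = field(default_factory=dict)


# Made-up documents with metadata, standing in for a vector store.
DOCS = [
    {"text": "A thief enters dreams to plant an idea",
     "metadata": {"year": 2010, "genre": "thriller"}},
    {"text": "A marshal investigates a remote island asylum",
     "metadata": {"year": 2010, "genre": "thriller"}},
    {"text": "Toys face an uncertain future as their owner grows up",
     "metadata": {"year": 2010, "genre": "animation"}},
    {"text": "A crew plans a dream heist in the year 1999",
     "metadata": {"year": 1999, "genre": "thriller"}},
]


def fake_self_query(user_input: str) -> StructuredQuery:
    """Placeholder for a model call that extracts search text and filters."""
    return StructuredQuery(
        search_text="thief dreams",
        filters={"genre": "thriller", "year": 2010},
    )


def matches(metadata: dict[str, object], filters: dict[str, object]) -> bool:
    """A document passes only if every filter key equals the requested value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def retrieve(query: StructuredQuery) -> list[str]:
    """Apply the metadata filter first, then rank the survivors by word overlap."""
    candidates = [d for d in DOCS if matches(d["metadata"], query.filters)]
    terms = set(query.search_text.lower().split())
    candidates.sort(key=lambda d: -len(terms & set(d["text"].lower().split())))
    return [d["text"] for d in candidates]


query = fake_self_query("thrillers from 2010 about a thief who enters dreams")
hits = retrieve(query)
print(hits)
```

Filtering on metadata before ranking means the year and genre constraints are enforced exactly, rather than being left to similarity scoring on the text.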


