[QA] Change MultiQuery Prompt, Add Hybrid Search (BM25 + Embedding), Cohere Reranker & LLM Chain Filter #253

Closed
vatsrahul1001 opened this issue Jan 9, 2024 · 1 comment
@vatsrahul1001
Collaborator

This issue tracks QA efforts on #247.

@vatsrahul1001 vatsrahul1001 self-assigned this Jan 9, 2024
@davidgxue
Collaborator

davidgxue commented Jan 10, 2024

Summary of Observations

This is just a quick summary from my side. Feel free to add more, @vatsrahul1001.

Improvements in Answer Quality and Source Attribution

  • Answer quality and the relevance of cited sources have noticeably improved across the board, both for questions that previously received satisfactory responses and for those that did not.

Expected Degradation of Astro SDK Read Doc Related Questions

  • With the Astro SDK Read Docs de-ingested, some related questions now receive less relevant responses. This is expected and is not related to the RAG pipeline changes.

Source Documents Ordering Incorrect in Slack

  • This is expected and already being tracked in a separate issue.

Noteworthy Examples of Answer Improvements (Compared to Previously Incorrect Answers)

  1. Question: "Do you know of Astronomer cosmos?"

    • Previous Inadequate Answer: "I'm sorry, but I don't have information on 'Astronomer Cosmos'..."
    • Revised Accurate Answer: "Yes, Astronomer Cosmos is a powerful tool for running dbt workflows in Apache Airflow. It provides an easy-to-use, flexible solution to running dbt projects in Airflow. Cosmos expands your dbt project into a Task Group or a full DAG (Directed Acyclic Graph), with each step of a model rendered as an Airflow task..."
  2. Question: "What is the default schedule of a dag and how do I turn it off altogether?"

    • Previous Inadequate Answer: "The default schedule of a DAG in Airflow is None, which means the DAG will not be scheduled to run automatically..."
    • Revised Accurate Answer: "The default schedule of a DAG in Airflow is timedelta(days=1), which means it will run once per day. If you want to turn off the scheduling of a DAG altogether, you can set the schedule_interval parameter to None when you're defining your DAG..."
  3. Question: "What is Astro SDK?"

    • Previous Inadequate Answer: "I'm sorry, but there's no such thing as the Astro SDK. Astronomer offers the Astro CLI..."
    • Revised Accurate Answer: "The Astro Python SDK is an open-source tool and Python package for DAG development built and maintained by Astronomer. Its purpose is to simplify the process of writing DAGs in Apache Airflow, particularly for Extract, Load, Transform (ELT) use cases. This allows pipeline authors to focus more on writing business logic in Python, and less on setting Airflow configurations..."
  4. Question: "How to install astronomer-providers"

    • Previous Inadequate Answer: "To install Astronomer's provider packages, you can use pip, the Python package manager. Here is the general command: pip install astronomer-providers-<provider-name>"
    • Revised Accurate Answer: "You can install the astronomer-providers package using pip, Python's package installer. Here are the steps:
      1. To install dependencies for the core provider, run the following command in your terminal:
        pip install astronomer-providers
        
        ...
  5. Question: "What is the latest version for Astronomer Providers?"

    • Previous Inadequate Answer: "I'm sorry, but I don't have the latest information on the version of Astronomer Providers..."
    • Revised Accurate Answer: "The latest version for Astronomer Providers is 1.18.4."

Difference in Source Documents Count

  • Document sourcing has changed. Previously, the user query was reworded three times and each rewording pulled 4 documents, which yielded between 4 and 12 source documents depending on duplicates.
  • The current system employs a reranker that consistently returns 10 documents. These are subsequently reviewed by a GPT-3.5 model to filter out irrelevant documents, resulting in a more focused set of 5 to 10 documents.
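
To make the difference in counts concrete, here is a minimal sketch of the two flows described above. It is not the project's actual code: `multi_query_retrieve`, `rerank_then_filter`, and the `retrieve`, `score`, and `is_relevant` callables are hypothetical stand-ins for the retriever, the reranker, and the GPT-3.5 relevance check. The 5 to 10 range is an observation from QA, so the sketch does not enforce a lower bound.

```python
from typing import Callable, Iterable, List, Sequence


def dedupe(docs: Iterable[str]) -> List[str]:
    """Drop duplicate documents while keeping first-seen order."""
    return list(dict.fromkeys(docs))


def multi_query_retrieve(
    queries: Sequence[str],
    retrieve: Callable[[str], Sequence[str]],  # hypothetical retriever
    k: int = 4,
) -> List[str]:
    """Previous flow: pull k documents per reworded query, then merge and dedupe.

    With 3 rewordings and k=4 this returns between 4 and 12 documents.
    """
    merged: List[str] = []
    for query in queries:
        merged.extend(retrieve(query)[:k])
    return dedupe(merged)


def rerank_then_filter(
    query: str,
    candidates: Iterable[str],
    score: Callable[[str, str], float],      # hypothetical reranker score
    is_relevant: Callable[[str, str], bool],  # hypothetical LLM relevance check
    top_n: int = 10,
) -> List[str]:
    """Current flow: keep the top_n reranked documents, then drop irrelevant ones.

    The result has at most top_n documents, in reranked order.
    """
    ranked = sorted(
        dedupe(candidates),
        key=lambda doc: score(query, doc),
        reverse=True,
    )[:top_n]
    return [doc for doc in ranked if is_relevant(query, doc)]
```

In the first function the final count depends on how much the three result sets overlap, while in the second it depends only on how many of the top 10 survive the relevance check.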
