# SoTA Q&A over MSMarco text (~9M chunks) in 4 minutes on a laptop

With ThirdAI, we can build a Q&A system on MSMarco, the largest BEIR benchmark, and reach SoTA accuracy in 4 minutes using just a laptop.

In [1]:
!pip3 install pip==20.0.2
!pip3 install "thirdai>=0.7.40"
!pip3 install beir

Defaulting to user installation because normal site-packages is not writeable
Collecting beir
  Downloading beir-2.0.0.tar.gz (53 kB)
Building wheels for collected packages: beir
  Building wheel for beir (setup.py) ... done
  Created wheel for beir: filename=beir-2.0.0-py3-none-any.whl size=63552 sha256=aab5b1e4b70b36305c1a5a2c58fe3cb9d411b18160e6885d16fc42a079cfa8bf
  Stored in directory: /home/pratik/.cache/pip/wheels/1c/14/96/c606ede3c10e9300ef771a6183af09d389459195ff5f854862
Successfully built beir
Installing collected packages: beir
Successfully installed beir-2.0.0


### 1. Getting Started

First, let's import the library, activate the license, and download the dataset.

In [2]:
from thirdai import neural_db as ndb
from thirdai import demos, licensing
import pandas as pd
import time

import os
if "THIRDAI_KEY" in os.environ:
    licensing.activate(os.environ["THIRDAI_KEY"])
else:
    licensing.activate("")  # Enter your ThirdAI key here

# Downloads msmarco and puts all of the documents into a csv file.
documents, train_queries, test_queries, _ = demos.download_beir_dataset("msmarco")

[nltk_data] Downloading package punkt to /home/pratik/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


RuntimeError: The license was found to be invalid: The licensing check failed with the response HTTP error code 400 and body {"meta":{"id":"c87b4d49-5b2e-497b-9ace-d1f930ab9d42"},"errors":[{"title":"Bad request","detail":"cannot be blank","source":{"pointer":"/meta/key"}}]}
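
The `RuntimeError` above is what a blank key produces: with `THIRDAI_KEY` unset, the cell falls through to `licensing.activate("")`. A minimal sketch of a guard that fails earlier with a clearer message follows. `resolve_license_key` is a hypothetical helper, not part of the ThirdAI library; only the `THIRDAI_KEY` variable name comes from the notebook.

```python
import os


def resolve_license_key(env=None, var_name="THIRDAI_KEY"):
    """Return a non-blank license key from the environment or raise early.

    Hypothetical helper (not part of thirdai): it only reads a mapping,
    so it can be checked without contacting the licensing server.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise ValueError(
            f"{var_name} is not set or is blank; set it before activating the license."
        )
    return key


# Example with a stand-in mapping instead of the real environment.
example_key = resolve_license_key({"THIRDAI_KEY": "  my-key  "})
print(example_key)
```

The returned value could then be passed to `licensing.activate(...)` in place of the blank string.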

### 2. Building the System

In [None]:
qna_model = ndb.NeuralDB(low_memory=True)

start = time.perf_counter()

qna_model.insert([
    ndb.CSV(
        documents,
        id_column="DOC_ID",        # Indicates which column stores the document ids.
        strong_columns=["TITLE"],  # Indicates which column contains the title.
        weak_columns=["TEXT"],     # Indicates which column contains the text.
    )
])

end = time.perf_counter()
print(f"Completed in {end-start:.3f} s")
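
The `ndb.CSV` call above reads a file with `DOC_ID`, `TITLE`, and `TEXT` columns. As a rough illustration of that layout, the sketch below writes and re-reads a tiny file of the same shape using Python's standard `csv` module. The two rows are made up for illustration, and the zero-based consecutive ids are an assumption rather than something the notebook states.

```python
import csv
import io

# Hypothetical documents; the real file is produced by demos.download_beir_dataset.
rows = [
    {"DOC_ID": 0, "TITLE": "Initial design document", "TEXT": "The IDD describes the planned architecture."},
    {"DOC_ID": 1, "TITLE": "Release checklist", "TEXT": "Steps to complete before each release."},
]

# Write the rows with the three column names used in the notebook.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["DOC_ID", "TITLE", "TEXT"])
writer.writeheader()
writer.writerows(rows)

# Read the CSV text back to confirm the column layout.
buffer.seek(0)
parsed = list(csv.DictReader(buffer))
header = list(parsed[0].keys())
print(header)
print(len(parsed), "documents")
```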

### 3. Evaluation

In [None]:
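def parse_labels(labels):
    # Labels are stored as colon-separated document ids, e.g. "3:17" -> [3, 17].
    return list(map(int, labels.split(":")))


def evaluate(qna_model, test_queries):
    # Load the test queries and parse each row's relevant document ids.
    test_df = pd.read_csv(test_queries)
    test_df["DOC_ID"] = test_df["DOC_ID"].map(parse_labels)

    true_positives = 0
    start = time.perf_counter()
    for _, row in test_df.iterrows():
        result = qna_model.search(row["QUERY"], top_k=5)
        # Count a hit when the top-ranked result is one of the labeled documents.
        if len(result) and result[0].id in row["DOC_ID"]:
            true_positives += 1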
    end = time.perf_counter()

    precision = true_positives / len(test_df)
    print(f"precision@1={precision:.3f}")
    avg_time = (end - start) / len(test_df) * 1000
    print(f"average query time: {avg_time:.3f} ms")
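
To make the metric concrete, here is a small self-contained sketch of the same precision@1 computation on toy data, with no model involved. The `precision_at_1` helper and the sample ids are hypothetical; only `parse_labels` and the colon-separated label format mirror the notebook.

```python
def parse_labels(labels):
    """Turn a colon-separated label string such as "3:17" into [3, 17]."""
    return [int(x) for x in labels.split(":")]


def precision_at_1(top_ids, label_strings):
    """Fraction of queries whose top-ranked id is among the labeled ids.

    top_ids holds the id of the first search result per query, or None
    when the search returned nothing.
    """
    hits = sum(
        1
        for top_id, labels in zip(top_ids, label_strings)
        if top_id is not None and top_id in parse_labels(labels)
    )
    return hits / len(label_strings)


# Hypothetical toy data: four queries, two of which are answered correctly.
top_ids = [3, 8, None, 5]
label_strings = ["3:17", "2", "4", "1:5"]
score = precision_at_1(top_ids, label_strings)
print(f"precision@1={score:.3f}")  # prints precision@1=0.500
```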

In [None]:
evaluate(qna_model=qna_model, test_queries=test_queries)

So how does this compare to other systems?

| Model/System | precision@1 |
| --- | --- |
| Elasticsearch | 0.65 |
| Google T5-Base | 0.56 |
| OpenAI Ada | 0.81 |
| ThirdAI | 0.79 |

### 4. Domain Specialization

This system already achieves accuracy comparable to the best embedding models, but what if we want to improve it further? One of the defining characteristics of successful production search systems is their ability to improve continually based on user interactions, which let the underlying system learn patterns and trends in user preferences that aren't present in the raw documents. For example, say a company uses the custom acronym IDD to mean “initial design document”. Since this acronym doesn't appear in the training data used to create the LLMs in the search system, a user query like “summarize the IDD for project xyz” will fail because the system doesn't understand the acronym. With domain specialization, the system can adapt to understand these relationships and answer such queries correctly.
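
As an illustration of what such user interactions might look like as training data, the sketch below turns logged (query, accepted document ids) pairs into rows with a `QUERY` column and a colon-separated `DOC_ID` column. That layout is inferred from how this notebook reads its query files with `parse_labels`; the interaction log, the document ids, and the `format_labels` helper are hypothetical.

```python
import csv
import io

# Hypothetical interaction log: each query paired with the document ids
# that users accepted as correct answers.
interactions = [
    ("summarize the IDD for project xyz", [42]),
    ("where is the initial design document template", [42, 77]),
]


def format_labels(doc_ids):
    """Join document ids with ':' so that parse_labels can split them again."""
    return ":".join(str(doc_id) for doc_id in doc_ids)


# Write the interactions as CSV text with QUERY and DOC_ID columns.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["QUERY", "DOC_ID"])
for query, doc_ids in interactions:
    writer.writerow([query, format_labels(doc_ids)])

csv_text = buffer.getvalue()
print(csv_text)
```

Saved to a file, rows of this shape could be loaded with `pd.read_csv` and parsed the same way as the benchmark's training queries.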

In [None]:
train_df = pd.read_csv(train_queries)
train_df["DOC_ID"] = train_df["DOC_ID"].map(parse_labels)

start = time.perf_counter()

qna_model.supervised_train_with_ref_ids(
    queries=train_df["QUERY"].to_list(), labels=train_df["DOC_ID"].to_list()
)

end = time.perf_counter()
print(f"finetuned in {end-start:.3f} s")

Now, if we rerun the evaluation, we see that precision@1 has improved to 0.814 just by feeding sample user interactions into the system.

In [None]:
evaluate(qna_model=qna_model, test_queries=test_queries)

### 5. Energy-Efficient AI

Another point of emphasis is environmental impact. While new advances in AI are undoubtedly impressive and revolutionary, their energy requirements are enormous. The next generation of Nvidia H100 GPUs in production is, on its own, projected to surpass the energy usage of a small nation. Finding energy-efficient alternatives is essential as this technology continues to develop. A system like ThirdAI, which uses only a fraction of the computing resources and requires no hardware accelerators, offers a path to significantly lower energy usage when deploying generative AI systems.