# Using `txtai` for retrieval

The [project repository](https://github.com/neuml/txtai) describes it as follows: "txtai is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows."

`txtai` is easy to use and a powerful framework for retrieval.
Compared to `langchain`, its API is very stable. Its dependencies,
however, do not always work well together: for this course, I had to
manually downgrade `faiss-cpu` to get it working.
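
When a dependency problem like this shows up, it helps to see which versions are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for this example, and no particular working version of `faiss-cpu` is implied.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as used with pip
versions = installed_versions(["txtai", "faiss-cpu"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

A missing package is reported as `not installed` rather than raising an error, so the cell can run before the environment is fixed.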

`txtai` is not as complete as `langchain`, so some operations have to be
performed manually. We therefore work with the sentences that already
exist from the previous notebook. `txtai` can compute the embeddings
itself, though.

## Load data (from previous notebook)

In [None]:
import json
with open("sentences.json") as f:
    sentences = json.load(f)

In [None]:
len(sentences)
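
`txtai` can index `(id, text, tags)` tuples, which gives explicit control over the ids that come back in search results. The sketch below assumes that `sentences.json` holds a plain list of strings, which this notebook does not show. The helper `to_documents` and the sample sentences are invented for illustration.

```python
def to_documents(sentences):
    """Turn a list of strings into (id, text, tags) tuples with running integer ids."""
    return [(uid, text, None) for uid, text in enumerate(sentences)]


# Invented sample data standing in for the real sentences
sample = ["Sea levels are rising.", "Droughts hit farmers hardest."]
documents = to_documents(sample)
print(documents[0])
```

If the ids from the previous notebook matter, they can be used in place of the running counter.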

## Index sentences

In [None]:
from txtai.embeddings import Embeddings

# "content": True stores the sentence text in the index, so search results include it
embeddings = Embeddings({"path": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1", "content": True})

In [None]:
embeddings.index(sentences)
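
Conceptually, indexing turns every sentence into a vector, and a query is later answered by ranking those vectors by their similarity to the query vector. The following toy sketch shows this idea with brute-force cosine similarity in plain Python. The two-dimensional vectors are made up, and real indexes (such as the `faiss` backend mentioned above) use far more efficient search structures.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_search(query_vec, doc_vecs, limit):
    """Return the `limit` best (index, score) pairs, highest cosine similarity first."""
    q = normalize(query_vec)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(d))))
        for i, d in enumerate(doc_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Made-up "embeddings" for three documents
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
hits = cosine_search([1.0, 0.2], doc_vecs, limit=2)
print(hits)
```

Here the query vector points mostly along the first axis, so document 0 ranks first and document 1 second.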

## Query

In [None]:
query = "Is the climate crisis worse for poorer countries?"
# Retrieve the 100 best matches and show the first 10
res = embeddings.search(query, 100)
res[:10]
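
With content storage enabled, each search result is a dictionary with `id`, `text` and `score` keys, so the result list can be post-processed with ordinary Python. As a small sketch, the hypothetical helper below keeps only results above a score threshold. The sample results and the threshold of 0.5 are invented and not taken from the actual search output.

```python
def filter_by_score(results, min_score):
    """Keep only results whose similarity score reaches min_score."""
    return [r for r in results if r["score"] >= min_score]


# Invented results in the same shape as the search output
sample_results = [
    {"id": "0", "text": "Poorer countries are hit harder.", "score": 0.71},
    {"id": "1", "text": "Emissions rose last year.", "score": 0.42},
]
strong = filter_by_score(sample_results, min_score=0.5)
print([r["id"] for r in strong])
```

A sensible threshold depends on the model and the data, so it is best chosen after looking at real results.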

In [None]:
import pandas as pd
# Show the full sentence text instead of truncating long cells
pd.set_option('display.max_colwidth', None)
df = pd.DataFrame(res)
df

The results are **identical** to those of our first notebook!

In [None]:
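from txtai.pipeline import Similarity
# Similarity pipeline that scores each text against the query with a zero-shot classification model
similarity = Similarity('valhalla/distilbart-mnli-12-3')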

In [None]:
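# similarity() returns (index, score) pairs sorted by score; index points into the list of texts
texts = [r["text"] for r in res]
rows = [(res[i]["id"], res[i]["text"], res[i]["score"], score)
        for i, score in similarity(query, texts)]
reranked = pd.DataFrame(rows, columns=["id", "text", "bi-score", "cross-score"]).set_index("id")
reranked.style.background_gradient(cmap='coolwarm')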
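
The cell above follows a retrieve-then-rerank pattern: a fast first stage fetches candidates, and a second scorer reorders them. The sketch below shows only that pattern in plain Python. The `word_overlap` scorer is a deliberately simple stand-in for the model-based similarity pipeline, and the candidates and scores are invented.

```python
def word_overlap(query, text):
    """Stand-in scorer: fraction of query words that also occur in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words)


def rerank(query, candidates, scorer):
    """Score every candidate text against the query and sort by the new score."""
    scored = [(i, scorer(query, c["text"])) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the first-stage fields and add the second-stage score next to them
    return [{**candidates[i], "rerank_score": s} for i, s in scored]


# Invented first-stage results
candidates = [
    {"id": "a", "text": "rainfall patterns are shifting", "score": 0.62},
    {"id": "b", "text": "poorer countries face higher climate risk", "score": 0.58},
]
reranked_sample = rerank("climate risk in poorer countries", candidates, word_overlap)
print([r["id"] for r in reranked_sample])
```

Keeping both scores side by side, as the table above does, makes it easy to see where the two stages disagree.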