# Extractive Question Answering

Language models are good at generating text, but their output is not always accurate. One way to improve accuracy is to include relevant context in the prompt alongside the question.

First, we'll load some context.

In [1]:
import languagemodels as lm

python_info = lm.get_wiki("Python")
print(python_info)

Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule.Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.Python consistently ranks as one of the most popular programming languages.


We can now prompt the model to answer the question using the context.

In [2]:
print(lm.do(f"Answer from the context: Who created Python? {python_info}"))

Guido van Rossum created Python.


# Embeddings

Language models can answer questions based on a context, but we still need a way to supply the right context for each question.

One solution is to keep a large collection of documents and retrieve only the relevant passages when answering a question. Embeddings are a tool for doing this.

Embeddings provide a way to map some input to a numeric vector that represents its meaning. In the case of language models, embeddings are derived from documents.
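As a rough illustration of what turning text into a vector looks like, the sketch below counts how often a few fixed words appear in a sentence. The `VOCAB` list and the `toy_embed` function are hypothetical and chosen only for this example. This is not how the languagemodels package or any real embedding model works: real models learn dense vectors that capture meaning rather than counting words.

```python
from collections import Counter

# Hypothetical fixed vocabulary, invented for this illustration.
# Real embedding models learn their dimensions from data.
VOCAB = ["python", "language", "database", "query", "web"]


def toy_embed(text):
    """Map text to a vector of word counts over VOCAB (illustration only)."""
    # Lowercase each word and strip common trailing punctuation
    words = [word.strip(".,?!").lower() for word in text.split()]
    counts = Counter(words)
    # One number per vocabulary term, always in the same order
    return [counts[term] for term in VOCAB]


print(toy_embed("Python is a programming language."))
print(toy_embed("SQL is a query language for a database."))
```

Both sentences become lists of the same length, which is what makes them comparable as points in a shared vector space.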

## Semantic Search

Once we have mapped our documents to vectors, we can search for similar documents by meaning. If we've constructed our embedding model appropriately, documents that answer questions will be near the questions themselves in vector space.

The math to achieve that is out of scope for this example, but the languagemodels package provides a few simple helper functions to facilitate a document store capable of semantic search.

In [3]:
# Load some programming language documents
for topic in ['Python', 'Javascript', 'C++', 'SQL', 'HTML']:
    doc = lm.get_wiki(topic)
    lm.store_doc(doc)

In [4]:
# Perform semantic search
lm.get_doc_context("Who created Python?")

'It is often described as a "batteries included" language due to its comprehensive standard library.Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.\n\nPython is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule.\n\n18, released in 2020, was the last release of Python 2.Python consistently ranks as one of the most popular programming languages.'
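For some intuition about what "near in vector space" can mean, here is a minimal sketch that ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document names are invented for illustration, and cosine similarity is a common choice assumed here. The source does not say which similarity measure or chunking strategy languagemodels uses internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query, docs):
    """Return document names ordered from most to least similar to query."""
    return sorted(
        docs,
        key=lambda name: cosine_similarity(query, docs[name]),
        reverse=True,
    )


# Hypothetical embeddings, made up for this example
doc_vectors = {
    "Python history": [0.9, 0.1, 0.0],
    "SQL queries": [0.1, 0.9, 0.2],
    "HTML markup": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for an embedded question

print(rank_documents(query_vector, doc_vectors))
```

With these made-up numbers, the query points in nearly the same direction as the "Python history" vector, so that document ranks first.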

In [5]:
# Put everything together to answer a general question about one of the languages
question = "What technologies are often associated with JS?"

context = lm.get_doc_context(question)

lm.do(f"Answer from the context: {question} {context}")

'JavaScript is often associated with HTML and CSS.'
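The retrieve-then-answer pattern from the last cell can be wrapped in a small helper. The sketch below is one possible way to do that. The name `answer_from_context` and the stand-in functions are hypothetical, and retrieval and generation are passed in as parameters so the example runs without any model installed.

```python
def answer_from_context(question, retrieve, generate):
    """Fetch context for a question, then ask the model to answer from it.

    retrieve: callable taking a question and returning context text
    generate: callable taking a prompt and returning the model's reply
    """
    context = retrieve(question)
    # Same prompt format as the notebook cells above
    prompt = f"Answer from the context: {question} {context}"
    return generate(prompt)


# Stand-in functions so the sketch runs on its own (not real model calls)
def fake_retrieve(question):
    return "JavaScript is commonly used alongside HTML and CSS."


def fake_generate(prompt):
    # Echo the prompt so the assembled text is visible
    return prompt


print(answer_from_context("What is used with JS?", fake_retrieve, fake_generate))
```

In this notebook, passing `lm.get_doc_context` as `retrieve` and `lm.do` as `generate` would reproduce the cell above.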