# Building a Knowledge Management System (KMS) using Contextual Answers

Contextual Answers is a plug-and-play engine that organizations can seamlessly embed into their digital assets to enhance the efficiency and accuracy of information queries. The engine provides businesses with LLM technology to instantaneously answer user questions about any organizational data. The answers will be based solely on this data, and will be backed by the proper context from the organizational knowledge base.

The Contextual Answers package is a complete solution: a library that stores your files, a retrieval mechanism that fetches the most relevant contexts from your organizational knowledge base, and a model that answers based on those contexts. The process follows the Retrieval Augmented Generation (RAG) pattern: for every question, the system retrieves the most relevant segments of text from the organizational data and then answers the question based on those segments. If the answer is not in the knowledge base, the system indicates this instead of providing a false answer.
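To make the retrieve-then-answer flow concrete, here is a toy sketch in plain Python. It is not how Contextual Answers works internally: the word-overlap scoring, the `min_overlap` threshold, and all function names are assumptions chosen for illustration. It only mirrors the two steps described above, plus the refusal when nothing relevant is found.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, segments, top_k=2, min_overlap=2):
    """Return up to top_k segments sharing at least min_overlap words with the question.

    Word overlap is a stand-in for a real retrieval model (illustrative assumption).
    """
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(seg)), seg) for seg in segments]
    relevant = [(score, seg) for score, seg in scored if score >= min_overlap]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in relevant[:top_k]]

def answer(question, segments):
    """Return the retrieved context, or None when nothing relevant was found."""
    context = retrieve(question, segments)
    if not context:
        return None  # mirrors "the answer is not in the knowledge base"
    # A real system would pass the context to a language model at this point.
    return {"question": question, "context": context}

segments = [
    "Employees may work from home two days a week.",
    "Laptops must be locked whenever they are not in use.",
]
print(answer("How many days can I work from home?", segments))
print(answer("What is the meal allowance?", segments))
```

The second call returns `None` because no segment shares enough words with the question, which is the toy equivalent of declining to answer.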

As an example, we will use these capabilities to create an efficient knowledge management system (KMS). Ready to get started?

## Imports and settings

In [1]:
import ai21
import json

### API key
In order to run this notebook, you will need an API key for AI21 Studio. How can you get it?

Create a free account at [AI21 Studio](https://studio.ai21.com). You can see your API key in the *Account* tab.

In [2]:
# TODO: fill in your API key from your AI21 Studio account
YOUR_API_KEY = ""
ai21.api_key = YOUR_API_KEY
assert ai21.api_key != "", "You must provide an API key!"
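If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `AI21_API_KEY`; that name and the `load_api_key` helper are our choices for this example, not something the SDK requires.

```python
import os

def load_api_key(env=os.environ, var_name="AI21_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing or blank.

    `AI21_API_KEY` is a hypothetical variable name chosen for this example.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key

# Usage in the notebook (left commented out so this cell runs without a key):
# ai21.api_key = load_api_key()
```

Passing the environment as a parameter keeps the helper easy to check with a plain dictionary.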

## The files

In this example, our knowledge base contains three policies of a company called ExampleTech: Working from abroad, Hybrid work guidelines, and IT security. Since this is a simple demo, we store the file paths in variables:

In [3]:
DIR_PREFIX = "docs/"
HYBRID_GUIDELINES_PATH = DIR_PREFIX + "Hybrid Work Guidelines.txt"
IT_SECURITY_PATH = DIR_PREFIX + "IT Organisational Security Policy.txt"
WORKING_ABROAD_PATH = DIR_PREFIX + "Working From Abroad - Guidelines.txt"
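Before uploading, it can help to confirm that the paths actually point to files on disk. The `missing_files` helper below is a small sketch of that check (the helper name is ours); it uses only the standard library.

```python
from pathlib import Path

def missing_files(paths):
    """Return the subset of `paths` that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]

# Usage with the variables defined above (commented out so the sketch runs anywhere):
# missing = missing_files([HYBRID_GUIDELINES_PATH, IT_SECURITY_PATH, WORKING_ABROAD_PATH])
# assert not missing, f"Missing files: {missing}"
```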

## Upload a document

The first step in building our KMS is to upload our company's documents to the Library. To get a feel for it, let's upload a single document:

In [4]:
hybrid_file_id = ai21.Library.Files.upload(file_path=HYBRID_GUIDELINES_PATH)

Let's take a look at all of the files we currently have in the Library:

In [5]:
def print_all_files():
    files = ai21.Library.Files.list()
    if not files:
        print("There are no files in this Library")
        return
    for file in files:
        print(json.dumps(file.__dict__['values'], indent=4))

In [6]:
print_all_files()

{
    "fileId": "9af8608c-ed71-4aba-b7e4-c65afffadb02",
    "name": "Hybrid Work Guidelines.txt",
    "path": null,
    "fileType": "text/plain",
    "sizeBytes": 1764,
    "labels": [
        ""
    ],
    "publicUrl": null,
    "createdBy": "3c04139b-ec3a-4102-a0dc-0f7f83724ca7",
    "creationDate": "2023-08-21",
    "lastUpdated": "2023-08-21",
    "status": "PROCESSED",
    "errorCode": null,
    "errorMessage": null
}


As you can see, every file has a unique ID and other associated fields. You can see the full documentation [here](https://docs.ai21.com/reference/manage-library-ref).
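Since each record carries a `status` field, you may want to list only the files in a given state. The sketch below works on plain dictionaries that copy a few fields from the listings in this notebook; the `files_with_status` helper is hypothetical and does not call the API.

```python
def files_with_status(records, status):
    """Return the names of records whose `status` field equals `status`."""
    return [r["name"] for r in records if r.get("status") == status]

# Hand-written records that copy a few fields from the listings in this notebook.
records = [
    {"name": "Hybrid Work Guidelines.txt", "status": "PROCESSED"},
    {"name": "IT Organisational Security Policy.txt", "status": "PROCESSING"},
]
print(files_with_status(records, "PROCESSED"))
print(files_with_status(records, "PROCESSING"))
```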

## Manage your Library

Simply uploading your files won't be enough to maintain a comprehensive library; there are several actions you still need to take.
For instance, a document may become outdated and need replacing. In that case, you must delete the outdated file and then upload the updated version. Since all operations in the document library are keyed by document ID, we can use a small function to retrieve the ID from the document's name.

In [7]:
def get_file_id(file_name):
    # Look up a file's ID by its name; all Library operations are keyed by ID
    for f in ai21.Library.Files.list():
        if f.name == file_name:
            return f.fileId
    raise ValueError(f"No file named {file_name!r} in the Library")

Now, imagine that your company has changed the number of days you can work from home. This means we need to update the Hybrid work guidelines. First we delete the old file, and later we will upload the updated version:

In [None]:
file_id = get_file_id(file_name = HYBRID_GUIDELINES_PATH.split("/")[-1])
ai21.Library.Files.delete(resource_id = file_id)

Looking at the files we currently have in our library, we can see that the deletion worked:

In [9]:
print_all_files()

There are no files in this Library
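The delete-then-upload steps can be wrapped in a single helper. To keep the sketch runnable without an API key, the three Library operations are passed in as functions and exercised against an in-memory stand-in; `replace_file`, its parameters, and the `fake_*` functions are all hypothetical names introduced here.

```python
from types import SimpleNamespace

def replace_file(file_name, new_path, list_files, delete_file, upload_file):
    """Delete every stored file called `file_name`, then upload `new_path`.

    list_files() returns objects with `.name` and `.fileId`;
    delete_file(file_id) and upload_file(path) perform the actual operations.
    Returns whatever upload_file returns.
    """
    for f in list_files():
        if f.name == file_name:
            delete_file(f.fileId)
    return upload_file(new_path)

# In-memory stand-in for the Library so the sketch runs without an API key.
store = {"old-id": "Hybrid Work Guidelines.txt"}

def fake_list():
    return [SimpleNamespace(fileId=i, name=n) for i, n in store.items()]

def fake_delete(file_id):
    del store[file_id]

def fake_upload(path):
    store["new-id"] = path.split("/")[-1]
    return "new-id"

new_id = replace_file("Hybrid Work Guidelines.txt", "docs/Hybrid Work Guidelines.txt",
                      fake_list, fake_delete, fake_upload)
print(new_id, store)
```

In the notebook, the stand-ins could be swapped for thin wrappers around the calls already used here, for example `ai21.Library.Files.list`, `lambda fid: ai21.Library.Files.delete(resource_id=fid)`, and `lambda p: ai21.Library.Files.upload(file_path=p)`.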


### Upload all the files

You can upload a file as it is, store it under a path (for those who like working with directories), or add labels. This helps you organize your files and lets you focus your questions on a subset of documents. In addition, every file can be associated with a public URL.
Let's upload our files with different options:

In [10]:
## Simple upload
ai21.Library.Files.upload(file_path=WORKING_ABROAD_PATH)

## Upload with labels
ai21.Library.Files.upload(file_path=HYBRID_GUIDELINES_PATH,
                          labels=['Hybrid', 'WFH'])

## Upload with public URL
ai21.Library.Files.upload(file_path=IT_SECURITY_PATH,
                          path="IT",
                          publicUrl="https://www.exampletech.com/it")

'de4eda11-c2f0-4582-8694-1637f45c6bc4'
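With more than a handful of files, it may be easier to describe the uploads as data and loop over them. The sketch below builds the keyword arguments for each upload from a small manifest, dropping options that are not set. The manifest format and the `build_upload_kwargs` helper are assumptions made for this example; the option names are the ones used in the cell above.

```python
def build_upload_kwargs(entry):
    """Turn one manifest entry into keyword arguments, dropping unset options."""
    allowed = ("file_path", "labels", "path", "publicUrl")
    return {k: entry[k] for k in allowed if entry.get(k) is not None}

manifest = [
    {"file_path": "docs/Working From Abroad - Guidelines.txt"},
    {"file_path": "docs/Hybrid Work Guidelines.txt", "labels": ["Hybrid", "WFH"]},
    {"file_path": "docs/IT Organisational Security Policy.txt",
     "path": "IT", "publicUrl": "https://www.exampletech.com/it"},
]

for entry in manifest:
    kwargs = build_upload_kwargs(entry)
    print(kwargs)
    # In the notebook: ai21.Library.Files.upload(**kwargs)
```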

Now, looking at our library, we can see the relevant properties as well:

In [11]:
print_all_files()

{
    "fileId": "de4eda11-c2f0-4582-8694-1637f45c6bc4",
    "name": "IT Organisational Security Policy.txt",
    "path": "IT",
    "fileType": "text/plain",
    "sizeBytes": 2873,
    "labels": [
        ""
    ],
    "publicUrl": "https://www.exampletech.com/it",
    "createdBy": "3c04139b-ec3a-4102-a0dc-0f7f83724ca7",
    "creationDate": "2023-08-21",
    "lastUpdated": "2023-08-21",
    "status": "PROCESSING",
    "errorCode": null,
    "errorMessage": null
}
{
    "fileId": "93ccf9cc-4d56-45f3-9326-eff8bf4d9cdf",
    "name": "Hybrid Work Guidelines.txt",
    "path": null,
    "fileType": "text/plain",
    "sizeBytes": 1764,
    "labels": [
        "Hybrid",
        "WFH"
    ],
    "publicUrl": null,
    "createdBy": "3c04139b-ec3a-4102-a0dc-0f7f83724ca7",
    "creationDate": "2023-08-21",
    "lastUpdated": "2023-08-21",
    "status": "PROCESSED",
    "errorCode": null,
    "errorMessage": null
}
{
    "fileId": "e7b62d19-f96c-48b6-a1a8-8a62225a04c3",
    "name": "Working From Abr

## Ask a question

Now that we have all our files in the knowledge base, it's time to ask some questions.
The question is used as a query for the retrieval mechanism, which searches the entire knowledge base and retrieves the most relevant contexts.

With rapid changes occurring in work environments lately, a common question from employees is about working remotely:

In [12]:
response = ai21.Library.Answer.execute(question="How many days can I work from home?")
print(response.answer)

Employees can choose between working from the office or from home two days a week.


However, if the answer to the question is not in any of the documents, the model indicates this by returning an empty answer (`None` in Python). For instance, if we ask the following question:

In [13]:
response = ai21.Library.Answer.execute(question="What's my meal allowance when working from home?")
print(response.answer)

None


You can use the following function to present the answer in a more readable way. It does two things:

1. Uses the `answerInContext` field to print a predetermined message when the question cannot be answered from the knowledge base.

2. Prints all the relevant sources that the system retrieved and used to build the context (optional).

In [14]:
def present_answer(full_response, presentSources=True):
    if not full_response.answerInContext:
        print("The answer is not in the documents")
    else:
        print("Answer: ")
        print(full_response.answer)
        if presentSources:
            print("\n\nSources:\n\n")
            for source in full_response.sources:
                print("===============================================\n")
                print("From the document: " + source['name'])
                for highlight in source['highlights']:
                    print("\n----------------------------------------------\n")
                    print(highlight)
                print("\n")

In [15]:
present_answer(response)

The answer is not in the documents


### Focus your search

If you have a large collection of documents and files, it can be helpful to refine the retrieval process. By using the labels or paths you assigned to each document during upload, you can narrow down the search and get more accurate results, ultimately saving time. We provide several options for that purpose:

1. Search within a specific path in your library: Focus your search on a particular location within your knowledge base.
2. Search only for documents with specific labels: Filter your search to include only documents that have been assigned certain labels.
3. Search within a designated group of documents: Specify the document IDs of a particular set of files, allowing the model to perform the search exclusively within that group.

Let's look at an example question:

In [16]:
response = ai21.Library.Answer.execute(question="My computer got stolen. What should I do?")

In [17]:
present_answer(response)

Answer: 
You must immediately report theft, loss, or unauthorized disclosure of Tech’s proprietary information.

Please follow the guidelines listed below.


Sources:



From the document: IT/IT Organisational Security Policy.txt

----------------------------------------------

1. You should never leave your laptop unattended outside of the office (for example, on the Wolt stand).

2. Keep your computer locked whenever it isn't in use.

3. Use the company’s chosen password manager to save all your passwords.

----------------------------------------------

General Use and Ownership ExampleTech’s proprietary information, stored on electronic and computing devices, remains its sole property.

Protecting this information is your responsibility, and you must immediately report theft, loss, or unauthorized disclosure of it.

Please follow the guidelines listed below.

----------------------------------------------

11. Do not store personal files on the company computers (except in folders 

Imagine that your knowledge base contains thousands or millions of documents. As this question is clearly IT related, it's a safe bet that the answer is in the _IT_ path:

In [18]:
response = ai21.Library.Answer.execute(question="My computer got stolen. What should I do?",
                                       path="IT")

present_answer(response)

Answer: 
You must immediately report theft, loss, or unauthorized disclosure of Tech’s proprietary information.

Please follow the guidelines listed below.


Sources:



From the document: IT/IT Organisational Security Policy.txt

----------------------------------------------

1. You should never leave your laptop unattended outside of the office (for example, on the Wolt stand).

2. Keep your computer locked whenever it isn't in use.

3. Use the company’s chosen password manager to save all your passwords.

----------------------------------------------

General Use and Ownership ExampleTech’s proprietary information, stored on electronic and computing devices, remains its sole property.

Protecting this information is your responsibility, and you must immediately report theft, loss, or unauthorized disclosure of it.

Please follow the guidelines listed below.

----------------------------------------------

11. Do not store personal files on the company computers (except in folders 

However, focusing the search on the wrong group of documents may lead to an answer not being found, as in this example:

In [19]:
response = ai21.Library.Answer.execute(question="My computer got stolen. What should I do?",
                                       labels=["WFH"])

present_answer(response)

The answer is not in the documents
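One way to soften this failure mode is to retry without the filter when the filtered search finds nothing. The sketch below shows that fallback logic against a stand-in function so it runs offline; `ask_with_fallback` and `fake_ask` are hypothetical names, and the stand-in's answer text is made up for the demo.

```python
from types import SimpleNamespace

def ask_with_fallback(question, ask, **filters):
    """Try a filtered search first; retry over the whole library if it finds nothing.

    `ask(question, **filters)` must return an object with `answerInContext`.
    """
    response = ask(question, **filters)
    if filters and not response.answerInContext:
        response = ask(question)  # second attempt without any filter
    return response

# Stand-in for the real call: it only "finds" an answer when no label filter is given.
def fake_ask(question, **filters):
    found = "labels" not in filters
    return SimpleNamespace(answerInContext=found,
                           answer="Report the theft immediately." if found else None)

result = ask_with_fallback("My computer got stolen. What should I do?",
                           fake_ask, labels=["WFH"])
print(result.answer)
```

In the notebook, `ask` could be a thin wrapper such as `lambda q, **kw: ai21.Library.Answer.execute(question=q, **kw)`. The trade-off is an extra request whenever the filtered search comes back empty.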


You can see the full API specifications [here](https://docs.ai21.com/reference/contextual-answers-api-ref).