<img src="https://imagedelivery.net/Dr98IMl5gQ9tPkFM5JRcng/3e5f6fbd-9bc6-4aa1-368e-e8bb1d6ca100/Ultra" alt="Image description" width="160" />

Introduction to the Contextual AI Datasets endpoint.

The Contextual APIs provide a simple interface to our state-of-the-art Contextual Language Models (CLMs). This guide covers the basics of creating your first agent programmatically and then uses the dataset API to create, append to, inspect, download, and delete an evaluation dataset.

To run this notebook interactively, you can open it in Google Colab:

<a target="_blank" href="https://colab.research.google.com/github/ContextualAI/ContextualAI-Examples/blob/main/python/dataset-api-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Client Setup
To begin, you will need an API key to securely access the API. Please contact Contextual's sales team to get your API key.

In [154]:
CONTEXTUAL_API_KEY="key-..."
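
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of one alternative, the key can be read from an environment variable. The helper name `load_api_key` and the variable name `CONTEXTUAL_API_KEY` are assumptions made for this illustration; they are not part of the Contextual client.

```python
import os


def load_api_key(env_var: str = "CONTEXTUAL_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Both the helper and the default variable name are hypothetical;
    use whatever name your environment already defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Example (uncomment once the variable is set in your shell or Colab secrets):
# CONTEXTUAL_API_KEY = load_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first API call.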

In [25]:
import json
import os
import random
import string
from pprint import pprint

from contextual import ContextualAI

# create a client
client = ContextualAI(
    api_key=CONTEXTUAL_API_KEY,
)

# test the API key by making a simple authenticated request
try:
    client.agents.list()
    print("Valid API Key.")
except Exception as e:
    print(f"Invalid API Key: {e}")

Valid API Key.


### Create an Agent

You first need to create an agent. An agent groups together a dataset and queries, and can also carry prompt guidelines. Everything you do after this step references this agent.

In [29]:
# Create an agent with name 'My First Agent'
try:
    create_agent_output = client.agents.create(
        name="My First Agent"
    )
    print(create_agent_output.model_dump_json())
    agent_id = create_agent_output.id
except Exception as e:
    print(f"Encountered error: {e}")

{"id":"938f5c23-43c7-46f6-8290-0d4d4f83801f","datastore_ids":["f7d4838d-afcd-408b-b796-70a7830aa284"]}


### Create an Evaluation Dataset

In [40]:
dataset = [
  {
      "prompt": "What is the concept of 'noumena' according to Kant?",
      "knowledge": [
        "Noumena are \"things-in-themselves\" - the true, fundamental nature of reality that exists independently of human perception and understanding. According to Kant, we can never directly experience or know noumena.",
        "Kant contrasts noumena with phenomena (things as they appear to us). While we can observe and understand phenomena through our senses and mental categories, the underlying noumena remain forever inaccessible to human cognition."
      ],
      "reference": "According to Immanuel Kant, the concept of \"noumena\" (singular: \"noumenon\") refers to things as they are in themselves, independent of human perception or the conditions under which humans experience them."
  },
  {
      "prompt": "How does photosynthesis work in plants?",
      "knowledge": [
          "Photosynthesis is the process by which plants convert light energy into chemical energy stored in glucose and other organic compounds.",
          "During photosynthesis, plants take in carbon dioxide from the air and water from the soil. Using sunlight, they transform these ingredients into glucose and oxygen.",
          "The process occurs in the chloroplasts, specifically using the green pigment chlorophyll, which gives plants their green color."
      ],
      "reference": "Photosynthesis is the process where plants convert sunlight into energy. Plants use chlorophyll in their chloroplasts to transform carbon dioxide and water into glucose and oxygen using solar energy. This process is essential for producing both food for the plant and oxygen as a byproduct."
  }
]
# write the dataset as JSONL: one JSON object per line
with open('dataset.jsonl', 'w') as f:
    for item in dataset:
        json_line = json.dumps(item)
        f.write(json_line + '\n')
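
Before uploading, it can help to confirm that the file really is valid JSONL. The sketch below re-reads a file and checks that every non-empty line is a JSON object containing a set of expected keys. The function name `validate_jsonl` is hypothetical, and the default `required_keys` only reflect the fields used in this example; they are an assumption, not a statement of which fields the API requires.

```python
import json


def validate_jsonl(path, required_keys=("prompt", "reference")):
    """Return the number of records in a JSONL file, raising on bad lines.

    required_keys is an assumption based on this notebook's sample data.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises json.JSONDecodeError if malformed
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = [key for key in required_keys if key not in record]
            if missing:
                raise ValueError(f"line {line_number}: missing keys {missing}")
            count += 1
    return count


# Example: validate_jsonl("dataset.jsonl") should return 2 for the file above.
```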


In [43]:
# generate a random name for our dataset so repeated runs do not collide
def generate_dataset_name():
    return f"dataset_{''.join(random.choices(string.ascii_lowercase, k=3))}"


# now call the create dataset API, uploading the JSONL file as a new evaluation set
try:
    with open('dataset.jsonl', 'rb') as file:
        create_dataset_output = client.agents.datasets.evaluate.create(
            agent_id=agent_id,
            file=file,
            dataset_name=generate_dataset_name(),
            dataset_type="evaluation_set"
        )

        print(create_dataset_output.model_dump_json())
        dataset_name = create_dataset_output.name
except Exception as e:
    print(f"Encountered error: {e}")

dataset_name

{"name":"dataset_tpq","type":"evaluation_set","version":"0000000001va3a0f644"}


'dataset_tpq'

### Append to the Evaluation Dataset

In [54]:
# now call the update dataset API to append the file's records to the existing dataset
try:
    with open('dataset.jsonl', 'rb') as file:
        update_dataset_output = client.agents.datasets.evaluate.update(
            agent_id=agent_id,
            file=file,
            dataset_name=dataset_name,
            dataset_type="evaluation_set"
        )

        pprint(json.loads(update_dataset_output.model_dump_json()))
        dataset_name = update_dataset_output.name
except Exception as e:
    print(f"Encountered error: {e}")

dataset_name

{'name': 'dataset_tpq',
 'type': 'evaluation_set',
 'version': '0000000003vc1e155d4'}


'dataset_tpq'

### Get Evaluation Dataset Metadata

In [53]:
# now call the dataset metadata API
try:
    dataset_metadata = client.agents.datasets.evaluate.metadata(
        agent_id=agent_id,
        dataset_name=dataset_name,
    )

    pprint(json.loads(dataset_metadata.model_dump_json()))
except Exception as e:
    print(f"Encountered error: {e}")

{'created_at': '2025-01-15T23:39:04.249944Z',
 'num_samples': 4,
 'schema': {'guideline': 'text',
            'knowledge': 'text',
            'prompt': 'text',
            'reference': 'text',
            'response': 'text'},
 'schema_': {'guideline': 'text',
             'knowledge': 'text',
             'prompt': 'text',
             'reference': 'text',
             'response': 'text'},
 'status': 'validated',
 'type': 'evaluation_set',
 'version': '0000000002ve2b250ad'}
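
The metadata above reports 4 samples, which is consistent with the two-record file having been uploaded once and then appended once. As a rough sketch, a small check like the one below can confirm that a dataset looks ready before it is used. It operates on a plain dictionary copied from the output above; the helper names are hypothetical, and treating `'validated'` as the "ready" status is an assumption drawn only from this output.

```python
from datetime import datetime

# Subset of the metadata printed above, copied as a plain dict.
metadata = {
    "created_at": "2025-01-15T23:39:04.249944Z",
    "num_samples": 4,
    "status": "validated",
    "type": "evaluation_set",
    "version": "0000000002ve2b250ad",
}


def is_ready(meta, expected_samples=None):
    """Return True if the status is 'validated' and the sample count matches."""
    if meta.get("status") != "validated":
        return False
    if expected_samples is not None and meta.get("num_samples") != expected_samples:
        return False
    return True


def created_at_utc(meta):
    """Parse the ISO 8601 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(meta["created_at"].replace("Z", "+00:00"))


print(is_ready(metadata, expected_samples=4))
print(created_at_utc(metadata).date())
```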


### Retrieve the Evaluation Dataset

In [149]:
# now call the retrieve dataset API and save the result to a local file
try:
    dataset = client.agents.datasets.evaluate.retrieve(
        agent_id=agent_id,
        dataset_name=dataset_name
    )
    dataset.write_to_file("downloaded_dataset.jsonl")

except Exception as e:
    print(f"Encountered error: {e}")

In [150]:
with open("downloaded_dataset.jsonl") as f:
    print(f.read())

{"prompt": "What is the concept of 'noumena' according to Kant?", "reference": "According to Immanuel Kant, the concept of \"noumena\" (singular: \"noumenon\") refers to things as they are in themselves, independent of human perception or the conditions under which humans experience them.", "response": "", "guideline": "", "knowledge": "[\"Noumena are \\\"things-in-themselves\\\" - the true, fundamental nature of reality that exists independently of human perception and understanding. According to Kant, we can never directly experience or know noumena.\", \"Kant contrasts noumena with phenomena (things as they appear to us). While we can observe and understand phenomena through our senses and mental categories, the underlying noumena remain forever inaccessible to human cognition.\"]"}
{"prompt": "How does photosynthesis work in plants?", "reference": "Photosynthesis is the process where plants convert sunlight into energy. Plants use chlorophyll in their chloroplasts to transform carb
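
Note that in the downloaded file the `knowledge` field comes back as a JSON-encoded string rather than the list that was uploaded. The sketch below shows one way to restore the list when reading a record. The helper name `decode_record` is hypothetical, and the sample line is built locally to mimic the shape seen in the output above rather than taken from the API.

```python
import json


def decode_record(line):
    """Parse one JSONL line and decode 'knowledge' if it is a JSON string."""
    record = json.loads(line)
    knowledge = record.get("knowledge")
    if isinstance(knowledge, str) and knowledge:
        record["knowledge"] = json.loads(knowledge)
    return record


# A locally built line that mimics the downloaded format (knowledge is a string).
sample_line = json.dumps({
    "prompt": "How does photosynthesis work in plants?",
    "response": "",
    "guideline": "",
    "knowledge": json.dumps([
        "Photosynthesis is the process by which plants convert light energy "
        "into chemical energy stored in glucose and other organic compounds.",
        "The process occurs in the chloroplasts, specifically using the green "
        "pigment chlorophyll, which gives plants their green color.",
    ]),
})

record = decode_record(sample_line)
print(type(record["knowledge"]).__name__, len(record["knowledge"]))
```

Empty fields such as `response` and `guideline` are left untouched, since an empty string is not valid JSON.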

### Delete Evaluation Dataset

In [153]:
# now call the delete dataset API
try:
    dataset = client.agents.datasets.evaluate.delete(
        agent_id=agent_id,
        dataset_name=dataset_name
    )
    print(f"Deleted dataset {dataset_name}...")
except Exception as e:
    print(f"Encountered error: {e}")

Encountered error: Error code: 500 - {'detail': 'Failed to delete dataset'}
