<img src="https://imagedelivery.net/Dr98IMl5gQ9tPkFM5JRcng/3e5f6fbd-9bc6-4aa1-368e-e8bb1d6ca100/Ultra" alt="Image description" width="160" />

# Introduction to the Contextual AI Platform

The Contextual APIs provide a simple interface to our state-of-the-art Contextual Language Models (CLMs). Use this guide to learn the basics of creating your first application programmatically. In this demo, we will create a RAG application for financial documents.

To run this notebook interactively, you can open it in Google Colab:

<a target="_blank" href="https://colab.research.google.com/github/ContextualAI/ContextualAI-Examples/blob/main/python/dataset-api-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Datasets
To begin, you need an API key to access the API securely. Contact Contextual's sales team to get one.

In [None]:


import requests
import json
import string
from typing import Dict, Optional
from pathlib import Path
import random

# Configuration
API_TOKEN = 'YOUR_API_TOKEN_HERE'  # Replace with your actual API token
BASE_URL = 'https://api.contextual.ai/v1'


def get_headers(content_type: str = "application/json") -> Dict[str, str]:
    """
    Generate headers for API requests

    Args:
        content_type: Content type for the request

    Returns:
        Dictionary containing request headers
    """
    return {
        "accept": "application/json",
        "Content-Type": content_type,
        "Authorization": f"Bearer {API_TOKEN}"
    }
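
Hardcoding a token in a notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, which reads the token from an environment variable. The variable name `CONTEXTUAL_API_KEY` and the helper names `load_api_token` and `build_headers` are assumptions made for illustration; they are not part of the Contextual API.

```python
import os
from typing import Dict, Mapping


def load_api_token(env: Mapping[str, str] = os.environ,
                   var_name: str = "CONTEXTUAL_API_KEY") -> str:
    """Return the API token from the environment, or raise if it is missing.

    The variable name is a hypothetical choice for this sketch.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your API token")
    return token


def build_headers(token: str, content_type: str = "application/json") -> Dict[str, str]:
    """Build the same headers as get_headers(), but from an explicit token."""
    return {
        "accept": "application/json",
        "Content-Type": content_type,
        "Authorization": f"Bearer {token}",
    }
```

With this in place, `API_TOKEN = load_api_token()` could replace the hardcoded placeholder, so the token never appears in the notebook itself.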



### Create an Application

First, create an application. The `application_id` it returns is required by every dataset call that follows.

In [None]:
def create_application() -> Dict:
    """Create an application and return the API response as a dictionary."""
    url = f"{BASE_URL}/applications"

    # Placeholder values; replace them with your application's details
    payload = {
        "name": "string",
        "description": "string",
        "system_prompt": "string"
    }

    response = requests.post(url, json=payload, headers=get_headers())
    response.raise_for_status()  # Fail early on an HTTP error
    return response.json()


response = create_application()
APPLICATION_ID = response['application_id']
APPLICATION_ID

<Response [200]>


'053329f7-276c-4021-91c0-b6c6893f8458'

### Create a Dataset

In [None]:
dataset = [
  {
  "prompt": "What is the concept of 'noumena' according to Kant?",
  "knowledge": [
    "Noumena are \"things-in-themselves\" - the true, fundamental nature of reality that exists independently of human perception and understanding. According to Kant, we can never directly experience or know noumena.",
    "Kant contrasts noumena with phenomena (things as they appear to us). While we can observe and understand phenomena through our senses and mental categories, the underlying noumena remain forever inaccessible to human cognition."
  ],
  "response": "According to Immanuel Kant, the concept of \"noumena\" (singular: \"noumenon\") refers to things as they are in themselves, independent of human perception or the conditions under which humans experience them."
  },
  {
      "prompt": "How does photosynthesis work in plants?",
      "knowledge": [
          "Photosynthesis is the process by which plants convert light energy into chemical energy stored in glucose and other organic compounds.",
          "During photosynthesis, plants take in carbon dioxide from the air and water from the soil. Using sunlight, they transform these ingredients into glucose and oxygen.",
          "The process occurs in the chloroplasts, specifically using the green pigment chlorophyll, which gives plants their green color."
      ],
      "response": "Photosynthesis is the process where plants convert sunlight into energy. Plants use chlorophyll in their chloroplasts to transform carbon dioxide and water into glucose and oxygen using solar energy. This process is essential for producing both food for the plant and oxygen as a byproduct."
  }
]
with open('dataset.jsonl', 'w') as f:
    for item in dataset:
        json_line = json.dumps(item)
        f.write(json_line + '\n')
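
Before uploading, it can help to check locally that every line of the file is valid JSON with the expected fields. The sketch below assumes the three fields used in the example above (`prompt`, `knowledge`, `response`) are all required and that `knowledge` is a list of strings. The API's actual validation rules are not shown in this guide, so treat these checks as illustrative only; the function names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Field names taken from the example dataset above
REQUIRED_FIELDS = ("prompt", "knowledge", "response")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing field: {field}"
                for field in REQUIRED_FIELDS if field not in record]
    knowledge = record.get("knowledge")
    if knowledge is not None and not (
        isinstance(knowledge, list) and all(isinstance(k, str) for k in knowledge)
    ):
        problems.append("knowledge must be a list of strings")
    return problems


def validate_jsonl(text: str) -> Dict[int, List[str]]:
    """Map 1-based line numbers to their problems; an empty dict means no issues."""
    report: Dict[int, List[str]] = {}
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # Ignore blank lines, such as the trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[line_no] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            report[line_no] = ["line is not a JSON object"]
            continue
        problems = validate_record(record)
        if problems:
            report[line_no] = problems
    return report
```

Calling `validate_jsonl(open('dataset.jsonl').read())` on the file written above should return an empty dictionary under these assumptions.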


In [None]:
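def create_dataset(application_id: str, file_path: str, dataset_name: str, dataset_type: str) -> Dict:
    """
    Create a dataset by uploading a file

    Args:
        application_id: ID of the application
        file_path: Path to the file to upload
        dataset_name: Name of the new dataset
        dataset_type: Type of dataset

    Returns:
        API response as dictionary
    """
    url = f"{BASE_URL}/applications/{application_id}/datasets"
    # No Content-Type here: requests sets the multipart boundary itself
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {API_TOKEN}"
    }

    with open(file_path, 'rb') as f:
        files = {
            'file': f,
            'dataset_name': (None, dataset_name),
            'dataset_type': (None, dataset_type)
        }
        response = requests.post(url, headers=headers, files=files)
    response.raise_for_status()  # Fail early on an HTTP error
    return response.json()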


def generate_dataset_name():
  return f"dataset_{''.join(random.choices(string.ascii_lowercase, k=3))}"

dataset_name = generate_dataset_name()

# Example usage:
result = create_dataset(
    application_id=APPLICATION_ID,
    file_path="dataset.jsonl",
    dataset_name=dataset_name,
    dataset_type="grounded_generation_train"
)
result

{'version': '0000000001v0a6d6e5',
 'name': 'dataset_mam',
 'type': 'grounded_generation_train'}

### Append to the Dataset

In [None]:
def update_dataset(application_id: str, file_path: str, dataset_name: str, dataset_type: str):
   """
   Append the samples in a file to an existing dataset

   Args:
       application_id: ID of the application
       file_path: Path to the file to upload
       dataset_name: Name of the dataset to update
       dataset_type: Type of dataset

   Returns:
       API response as dictionary
   """
   url = f"{BASE_URL}/applications/{application_id}/datasets/{dataset_name}"
   headers = {
       "accept": "application/json",
       "Authorization": f"Bearer {API_TOKEN}"
   }

   with open(file_path, 'rb') as f:
       files = {
           'file': f,
           'dataset_name': (None, dataset_name),
           'dataset_type': (None, dataset_type)
       }
       response = requests.put(url, headers=headers, files=files)
       return response.json()


# Example usage:
result = update_dataset(
    application_id=APPLICATION_ID,
    file_path="dataset.jsonl",
    dataset_name=dataset_name,
    dataset_type="grounded_generation_train"
)
result


{'version': '0000000002v8c756f21',
 'name': 'dataset_mam',
 'type': 'grounded_generation_train'}

### Get Dataset Metadata

In [None]:
def get_dataset_metadata(application_id: str, dataset_name: str) -> Dict:
    """
    Get metadata for a specific dataset

    Args:
        application_id: ID of the application
        dataset_name: Name of the dataset

    Returns:
        API response as dictionary
    """
    url = f"{BASE_URL}/applications/{application_id}/datasets/{dataset_name}/metadata"

    try:
        response = requests.get(url, headers=get_headers())
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        # Report network and HTTP errors, then re-raise for the caller
        print(f"Error retrieving dataset metadata: {e}")
        raise

result = get_dataset_metadata(
    application_id=APPLICATION_ID,
    dataset_name=dataset_name
)
result

{'version': '0000000002v8c756f21',
 'type': 'grounded_generation_train',
 'created_at': '2024-12-04T22:44:11.584291Z',
 'status': 'validated',
 'schema': {'prompt': 'text', 'response': 'text', 'knowledge': 'text'},
 'num_samples': 4}
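
The metadata above reports `'status': 'validated'`. If validation is not immediate, a caller may want to poll the metadata until that status appears. This is an assumption: the output above shows only the `validated` value, and no other status values are documented here. The sketch below takes the metadata fetch as a callable so that it stays independent of the network; `wait_for_status` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Dict


def wait_for_status(fetch_metadata: Callable[[], Dict],
                    target: str = "validated",
                    attempts: int = 10,
                    delay_seconds: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch_metadata until its 'status' equals target, then return the metadata.

    Raises TimeoutError if the target status is not seen within `attempts` calls.
    """
    last: Dict = {}
    for attempt in range(attempts):
        last = fetch_metadata()
        if last.get("status") == target:
            return last
        if attempt < attempts - 1:
            sleep(delay_seconds)  # Wait before the next check
    raise TimeoutError(
        f"status was {last.get('status')!r} after {attempts} attempts"
    )
```

In this notebook it could be called as `wait_for_status(lambda: get_dataset_metadata(APPLICATION_ID, dataset_name))`.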

### Download the Dataset

In [None]:
def get_dataset(application_id: str, dataset_name: str, output_path: Optional[str] = None):
   """
   Download a dataset

   Args:
       application_id: ID of the application
       dataset_name: Name of the dataset to retrieve
       output_path: Optional path to save the downloaded dataset
   """
   url = f"{BASE_URL}/applications/{application_id}/datasets/{dataset_name}"
   headers = {
       "accept": "application/json",
       "Authorization": f"Bearer {API_TOKEN}"
   }

   response = requests.get(url, headers=headers)
   response.raise_for_status()  # Avoid saving an error body as the dataset

   if output_path:
       with open(output_path, 'wb') as f:
           f.write(response.content)
   return response


result = get_dataset(
    application_id=APPLICATION_ID,
    dataset_name=dataset_name,
    output_path="downloaded_dataset.jsonl"  # The download is JSON Lines, one sample per line
)

In [None]:
with open("downloaded_dataset.jsonl") as f:
    print(f.read())

{"prompt": "What is the concept of 'noumena' according to Kant?", "response": "According to Immanuel Kant, the concept of \"noumena\" (singular: \"noumenon\") refers to things as they are in themselves, independent of human perception or the conditions under which humans experience them.", "knowledge": "Noumena are \"things-in-themselves\" - the true, fundamental nature of reality that exists independently of human perception and understanding. According to Kant, we can never directly experience or know noumena. Kant contrasts noumena with phenomena (things as they appear to us). While we can observe and understand phenomena through our senses and mental categories, the underlying noumena remain forever inaccessible to human cognition."}
{"prompt": "How does photosynthesis work in plants?", "response": "Photosynthesis is the process where plants convert sunlight into energy. Plants use chlorophyll in their chloroplasts to transform carbon dioxide and water into glucose and oxygen using
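
The downloaded file holds one JSON object per line. In the output above, `knowledge` comes back as a single string rather than the list that was uploaded, which matches the `'knowledge': 'text'` entry in the metadata schema. A minimal sketch of loading such output into Python dictionaries follows; `parse_jsonl` is a hypothetical helper, and the two sample records are short stand-ins rather than real API output.

```python
import json
from typing import Dict, List


def parse_jsonl(text: str) -> List[Dict]:
    """Parse JSON Lines text into a list of dictionaries, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Two short stand-in records shaped like the download output
sample_text = (
    '{"prompt": "q1", "response": "r1", "knowledge": "k1 k2"}\n'
    '{"prompt": "q2", "response": "r2", "knowledge": "k3"}\n'
)

records = parse_jsonl(sample_text)
prompts = [record["prompt"] for record in records]
print(len(records), prompts)
```

The same function can be applied to the text read from the downloaded file, for example to compare `len(records)` against `num_samples` from the metadata.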

### Delete Dataset

In [None]:
def delete_dataset(application_id: str, dataset_name: str) -> Dict:
    """
    Delete a dataset

    Args:
        application_id: ID of the application
        dataset_name: Name of the dataset to delete

    Returns:
        API response as dictionary
    """
    url = f"{BASE_URL}/applications/{application_id}/datasets/{dataset_name}"

    try:
        response = requests.delete(url, headers=get_headers())
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        # Report network and HTTP errors, then re-raise for the caller
        print(f"Error deleting dataset: {e}")
        raise

result = delete_dataset(
    application_id=APPLICATION_ID,
    dataset_name=dataset_name
)


In [None]:
result

{}