The example below calls OpenAI's Chat Completions API and retrieves both the response body and the debug and usage information that is useful for our AI Assistant application. Besides the core response data (the chat message, model, and usage statistics), it captures the debug headers (processing time, API version, and request ID) that you can log for troubleshooting and performance monitoring.

### Example Code to Call the Chat Completions API

```python
import os
import json
import requests
from dotenv import load_dotenv

# Load environment variables from .env
load_dotenv()

def get_env_var(var: str):
    value = os.getenv(var)
    if value is None:
        raise ValueError(f"{var} not found in environment variables.")
    return value

# Retrieve our OpenAI API key from the environment
openai_api_key = get_env_var("OPENAI_API_COURSE_KEY")

# Define the endpoint for chat completions
chat_completion_url = "https://api.openai.com/v1/chat/completions"

# Create a sample payload for a chat completion request
payload = {
    "model": "gpt-4o",  # Swap in another model as needed (e.g., "gpt-4o-mini")
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hello, how can you help me today?"}
    ],
    # Optional parameters:
    "temperature": 0.7,
    "top_p": 1.0,
    "n": 1,  # Number of choices
    "max_completion_tokens": 150  # Upper bound for generated tokens
}

headers = {
    "Authorization": f"Bearer {openai_api_key}",
    "Content-Type": "application/json"
}

# Make the POST request to the Chat Completions API
response = requests.post(chat_completion_url, headers=headers, json=payload, timeout=30)

if response.ok:
    chat_completion = response.json()
    print("Chat Completion Response:")
    print(json.dumps(chat_completion, indent=4))
    
    # Extract and display useful debug and rate limiting headers
    headers_to_log = [
        "openai-organization",
        "openai-processing-ms",
        "openai-version",
        "x-request-id",
        "x-ratelimit-limit-requests",
        "x-ratelimit-limit-tokens",
        "x-ratelimit-remaining-requests",
        "x-ratelimit-remaining-tokens",
        "x-ratelimit-reset-requests",
        "x-ratelimit-reset-tokens"
    ]
    
    debug_info = {header: response.headers.get(header) for header in headers_to_log}
    print("\nDebug/Rate Limiting Headers:")
    print(json.dumps(debug_info, indent=4))
else:
    print(f"Failed to get chat completion: {response.status_code} {response.text}")
```

### What Information Is Retrieved?

When you call the chat completions endpoint, you get a response that includes:

- **Chat Completion Object Details:**
  - **id:** A unique identifier for the chat completion (e.g., `"chatcmpl-123"`).
  - **object:** The object type (always `"chat.completion"`).
  - **created:** Unix timestamp for when the chat completion was created.
  - **model:** The model used to generate the response.
  - **choices:** An array containing the generated message(s) along with metadata such as the `finish_reason`.
  - **usage:** Detailed token usage, including prompt tokens, completion tokens, and total tokens.
  - **service_tier:** The service tier used (e.g., `"default"`).

- **Debug/Rate Limiting Information (from HTTP Headers):**
  - **openai-organization:** The organization associated with the request.
  - **openai-processing-ms:** Time taken to process the request, in milliseconds.
  - **openai-version:** The API version used.
  - **x-request-id:** A unique ID for the request (useful for debugging).
  - **x-ratelimit-\* headers:** Request and token limits, remaining quota, and reset times. These may be missing for certain endpoints, in which case `response.headers.get` returns `None`.
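
As a rough illustration of reading these fields, the sketch below pulls the reply text, finish reason, and token counts out of a parsed response. The helper name `summarize_completion` is hypothetical, the sample dictionary is made up for the demonstration, and the code assumes its argument has the shape returned by `response.json()`.

```python
from typing import Any


def summarize_completion(chat_completion: dict[str, Any]) -> dict[str, Any]:
    """Collect the fields an application would typically log (hypothetical helper)."""
    first_choice = chat_completion["choices"][0]
    usage = chat_completion.get("usage", {})
    return {
        "id": chat_completion.get("id"),
        "model": chat_completion.get("model"),
        "text": first_choice["message"]["content"],
        "finish_reason": first_choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Made-up sample with the same shape as a real response body.
sample_completion = {
    "id": "chatcmpl-123",
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11},
}

summary = summarize_completion(sample_completion)
print(summary["text"], summary["total_tokens"])
```

Only the first choice is read here, which matches a request with `n` set to 1; with more choices you would loop over `choices` instead.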

### Dynamically Pulling in a Special OpenAI Configuration

For our AI Assistant, you might want to dynamically configure parameters based on the model selected. For example, if a particular model supports multi-modal outputs or has a different context window, you could maintain a configuration dictionary (or fetch additional details from a secondary source) and merge that into our request payload. Here’s a simple example:

```python
# Example: Additional configuration per model
MODEL_CONFIG = {
    "gpt-4o": {
        "max_completion_tokens": 150,
        "temperature": 0.7,
        "top_p": 1.0,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
        "stop": None
    },
    "gpt-4o-mini": {
        "max_completion_tokens": 100,
        "temperature": 0.5,
        "top_p": 0.9,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
        "stop": None
    }
}

selected_model = "gpt-4o"  # This can be set dynamically
model_specific_config = MODEL_CONFIG.get(selected_model, {})

# Merge the base payload with the model-specific configuration
payload.update(model_specific_config)
payload["model"] = selected_model
```

This approach allows our AI Assistant to dynamically adapt to different OpenAI models based on their capabilities and our application’s needs.
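
One variation worth considering: `payload.update(...)` changes the base payload in place and also copies `None` values such as `"stop": None` into the request. The sketch below builds a new payload instead and skips unset options. The function name `build_payload` and the trimmed-down configuration are assumptions made for illustration.

```python
from typing import Any


def build_payload(
    base: dict[str, Any],
    model: str,
    model_config: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Return a new payload with the model's overrides applied (hypothetical helper)."""
    overrides = {
        key: value
        for key, value in model_config.get(model, {}).items()
        if value is not None  # skip unset options such as "stop": None
    }
    # Shallow copy: nested objects such as the messages list are shared with base.
    return {**base, **overrides, "model": model}


base_payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
}
example_config = {
    "gpt-4o-mini": {"max_completion_tokens": 100, "temperature": 0.5, "stop": None},
}

mini_payload = build_payload(base_payload, "gpt-4o-mini", example_config)
print(mini_payload["model"], mini_payload["temperature"])
```

Because the base payload is left untouched, the same defaults can be reused for several models in one process.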

### Summary

- **Automatic Data Retrieval:**  
  You get the basic chat completion response (including the generated messages and usage stats) directly from the API.
  
- **Debug and Rate Limiting Data:**  
  HTTP headers provide useful information like processing time and request IDs, which help with debugging and monitoring.
  
- **Dynamic Configuration:**  
  By merging a per-model configuration (stored in a dictionary) into the request payload, you can dynamically tailor requests to different OpenAI models to suit our assistant’s requirements.

This setup should give you a robust foundation for integrating and dynamically configuring OpenAI's chat completions in our AI Assistant application.

Running the example produced the following output (truncated in the source):

```text
Chat Completion Response:
{
    "id": "chatcmpl-BDXMEvOKQnSYFATQSDa0BbxRSDUrN",
    "object": "chat.completion",
    "created": 1742566190,
    "model": "gpt-4o-2024-08-06",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm here to assist you with a variety of tasks, such as answering questions, providing information, helping with problem-solving, or offering suggestions on a wide range of topics. How can I assist you today?",
                "refusal": null,
                "annotations": []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 27,
        "completion_tokens": 44,
        "total_tokens": 71,
        "prompt_tokens_details": {
            "cached_tokens": 0,
            "audio_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0,
            "au
```

The output shows that OpenAI’s Chat Completions endpoint returns a structured JSON object containing the generated message and detailed usage statistics, while the debug and rate-limit information arrives in the HTTP response headers. Here’s a breakdown of what you received:

### Chat Completion Response Details
- **id:** `"chatcmpl-BDXMEvOKQnSYFATQSDa0BbxRSDUrN"`  
  A unique identifier for this chat completion.

- **object:** `"chat.completion"`  
  Indicates that the response is a chat completion object.

- **created:** `1742566190`  
  A Unix timestamp for when the completion was created.

- **model:** `"gpt-4o-2024-08-06"`  
  The specific model used to generate this response.

- **choices:**  
  An array containing one or more response choices. In this example, it has one choice:
  - **index:** `0`  
    The position of the choice in the list.
  - **message:**  
    The generated message:
    - **role:** `"assistant"`  
      Indicates the speaker in the conversation.
    - **content:**  
      The actual text response from the assistant.
    - **refusal:** `null`  
      No refusal data is present.
    - **annotations:** `[]`  
      No annotations included.
  - **logprobs:** `null`  
    No log probabilities returned.
  - **finish_reason:** `"stop"`  
    Indicates that the model reached a natural stopping point (or a supplied stop sequence) rather than hitting the token limit.

- **usage:**  
  Detailed token usage for the request:
  - **prompt_tokens:** `27`  
    Tokens used for the prompt.
  - **completion_tokens:** `44`  
    Tokens generated for the response.
  - **total_tokens:** `71`  
    Total tokens consumed.
  - **prompt_tokens_details:** and **completion_tokens_details:**  
    Provide further breakdown, though here they are mostly zeros.

- **service_tier:** `"default"`  
  Indicates the service tier used for processing the request.

- **system_fingerprint:** `"fp_6ec83003ad"`  
  A fingerprint for the backend configuration that generated the response.

### Debug/Rate Limiting Headers
- **openai-organization:** `"user-ue08pmd83ul7gjg1su5tgani"`  
  The organization associated with the API request.

- **openai-processing-ms:** `"861"`  
  The time (in milliseconds) it took to process the request.

- **openai-version:** `"2020-10-01"`  
  The API version used.

- **x-request-id:** `"req_4d55e0b8ecd8e46447c0701e1f8b6051"`  
  A unique request ID, useful for debugging.

- **Rate Limiting Headers:**  
  These headers provide information on our rate limits:
  - **x-ratelimit-limit-requests:** `"500"`
  - **x-ratelimit-limit-tokens:** `"30000"`
  - **x-ratelimit-remaining-requests:** `"499"`
  - **x-ratelimit-remaining-tokens:** `"29981"`
  - **x-ratelimit-reset-requests:** `"120ms"`
  - **x-ratelimit-reset-tokens:** `"38ms"`
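
The reset values arrive as duration strings, so they need parsing before an application can act on them. The following is a minimal sketch, assuming the values look like the ones above (`"120ms"`, `"38ms"`) or like compound forms such as `"6m0s"`; anything else is treated as unknown. The names `parse_reset_seconds` and `seconds_to_wait` are hypothetical.

```python
import re
from typing import Mapping, Optional

# "ms" is listed before "m" so that "120ms" is not read as 120 minutes.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0}


def parse_reset_seconds(value: Optional[str]) -> Optional[float]:
    """Convert a duration string such as "120ms" or "6m0s" to seconds."""
    if not value:
        return None
    parts = _DURATION_PART.findall(value)
    # Reject strings that contain anything besides number+unit pairs.
    if not parts or "".join(number + unit for number, unit in parts) != value:
        return None
    total = 0.0
    for number, unit in parts:
        amount = float(number)
        total += amount / 1000 if unit == "ms" else amount * _UNIT_SECONDS[unit]
    return total


def seconds_to_wait(headers: Mapping[str, str]) -> float:
    """Return how long to pause when no requests remain, otherwise 0.0."""
    remaining = headers.get("x-ratelimit-remaining-requests")
    if remaining is not None and remaining.isdigit() and int(remaining) == 0:
        return parse_reset_seconds(headers.get("x-ratelimit-reset-requests")) or 0.0
    return 0.0


print(parse_reset_seconds("120ms"))
print(seconds_to_wait({"x-ratelimit-remaining-requests": "499",
                       "x-ratelimit-reset-requests": "120ms"}))
```

`response.headers` from `requests` is a case-insensitive mapping, so it can be passed to `seconds_to_wait` directly.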

### How This Information Can Be Used in Our AI Assistant Application

1. **Dynamic Configuration:**  
   You can adjust parameters like `temperature`, `max_completion_tokens`, and others dynamically based on the model’s capabilities and our application's needs. For example, you might use different configurations for a model that is more expensive or has a larger context window.

2. **Usage Monitoring and Cost Management:**  
   The `usage` section provides token counts that can help you monitor costs, since billing is often based on token usage. You can log these stats and even set up alerts if usage spikes.

3. **Debugging and Support:**  
   The debug headers (especially `x-request-id`) are invaluable when you need to troubleshoot issues with OpenAI support or log performance metrics for our application.

4. **Rate Limit Management:**  
   The rate limit headers let you know how many requests or tokens you have remaining in the current window, allowing you to manage our application's load and avoid hitting limits.

5. **Backend Consistency:**  
   The `system_fingerprint` can be used to track backend configuration changes. If you need determinism in our responses (for instance, when using a seed), this fingerprint helps you verify that the backend hasn’t changed unexpectedly.
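
To make the cost-monitoring point concrete, here is a small sketch that turns a `usage` block into an estimated cost. The per-million-token prices are placeholders chosen for the arithmetic, not real rates, and `estimate_cost_usd` is a hypothetical helper; substitute the current prices for the model you use.

```python
from typing import Any


def estimate_cost_usd(
    usage: dict[str, Any],
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate request cost from token counts (prices are caller-supplied)."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)
    total = (
        prompt_tokens * input_price_per_million
        + completion_tokens * output_price_per_million
    )
    return total / 1_000_000


# Token counts from the response above; the prices are PLACEHOLDERS, not real rates.
example_usage = {"prompt_tokens": 27, "completion_tokens": 44, "total_tokens": 71}
example_cost = estimate_cost_usd(
    example_usage,
    input_price_per_million=1.0,
    output_price_per_million=2.0,
)
print(f"{example_cost:.6f}")
```

Logging this value next to `x-request-id` makes it possible to trace an unexpectedly expensive call back to a specific request.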

### Next Steps

- **Enrich Metadata:**  
  Consider combining this API response data with additional model metadata from our internal configurations (e.g., context window sizes, pricing tiers) to have a complete picture of each model's capabilities.

- **Logging and Analytics:**  
  Integrate the logging of usage and debug headers into our application’s monitoring system to track performance over time.

- **Dynamic Parameter Tuning:**  
  Use the token usage statistics to dynamically adjust parameters such as `max_completion_tokens` for future requests, ensuring cost efficiency while maintaining response quality.
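
One possible way to approach the parameter-tuning idea is to size `max_completion_tokens` from recently observed `completion_tokens` values. The heuristic below (largest recent value plus headroom, clamped to a range) and its default numbers are assumptions made for illustration, not a recommended policy.

```python
import math
from typing import Sequence


def suggest_max_completion_tokens(
    observed: Sequence[int],
    headroom: float = 1.5,
    floor: int = 64,
    ceiling: int = 1024,
) -> int:
    """Suggest a token cap from recent completion_tokens values (hypothetical heuristic)."""
    if not observed:
        return ceiling  # no history yet, so stay permissive
    target = math.ceil(max(observed) * headroom)
    # Clamp so one unusual response cannot push the cap too low or too high.
    return max(floor, min(ceiling, target))


recent_completion_tokens = [44, 60, 52]
print(suggest_max_completion_tokens(recent_completion_tokens))
```

If responses start coming back with a `finish_reason` of `"length"`, the cap is cutting replies short and should be raised rather than lowered.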

Together, the chat completion response and its headers give our AI Assistant what it needs to track cost, debug failed requests, and monitor performance.