# Accessing watsonx.ai via REST API

In this lab, we make HTTP requests to [watsonx.ai's REST API](https://ibm.github.io/watsonx-ai-python-sdk/v1.4.2/) (**note**: the linked documentation is technically for a Python library that replicates the API) and learn how to use its functionality. This lab explores only the text generation REST endpoint. One hopes that this GA product will soon have a completely documented REST API. For this notebook, the `.env` file should have two keys: `IBMCLOUD_API_KEY`, which can be created using the [IBM Cloud console](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#create_user_key), and `PROJECT_ID`, which corresponds to the watsonx.ai project that will be used for inference. Note that the cells below prompt for both values with `getpass` rather than reading them from the file.

In [4]:
from dotenv import load_dotenv
import json
import os
import requests
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

## HTTP request headers
Headers carry metadata associated with an API request and its response. In the watsonx.ai SaaS service, an authorization token generated by the IAM service is required when making calls to the REST API. In the following example, the input credentials are used to obtain a token. This token is then placed in an 'Authorization' header, using the "Bearer" scheme, to validate your access. Note that tokens requested in this way have a 1-hour lifetime.

The 'Content-Type' header tells the server the media type of the request body. In this case, the data is sent as 'application/json'.
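
Because tokens obtained this way last about an hour, a long-running session may want to check how much time a token has left before reusing it. The following is a minimal sketch, assuming the IAM access token is a JWT whose payload carries an `exp` claim (the `eyJ...` prefix of the token printed by the next cell is consistent with that). The helper names `jwt_claims` and `seconds_until_expiry` and the demo token are made up for illustration, and the signature is deliberately not verified.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Seconds remaining before the token's `exp` claim (negative once expired)."""
    current = time.time() if now is None else now
    return jwt_claims(token)["exp"] - current


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper only)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Made-up token for illustration; it is not a real IAM token.
demo_token = ".".join(
    [_b64url({"alg": "none"}), _b64url({"iat": 1000, "exp": 4600}), "sig"]
)
print(seconds_until_expiry(demo_token, now=1000))  # 3600
```

In the notebook, `seconds_until_expiry(access_token)` could be checked before a request; if little time remains, re-running the cell that calls `get_token()` obtains a new token.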

In [5]:
load_dotenv()
import getpass

api_key = getpass.getpass("Please enter your watsonx.ai api key (hit enter): ")
api_endpoint = "https://ca-tor.ml.cloud.ibm.com"
# Note: the trailing comma makes project_id a one-element tuple;
# later cells read the value as project_id[0].
project_id = getpass.getpass("Please enter your watsonx.ai project id  (hit enter): "),
# getpass returns an empty string (never None) when nothing is entered.
if not api_key or not project_id[0]:
    print("Ensure you provide an IBM Cloud API key and a watsonx.ai project id.")
else:
    authenticator = IAMAuthenticator(api_key)
    access_token = authenticator.token_manager.get_token()
    headers = {"Authorization": f"Bearer {access_token}",
               "Content-Type": "application/json"}
    # Print only when the headers were actually built.
    print(json.dumps(headers, indent=2))

Please enter your watsonx.ai api key (hit enter):  ········
Please enter your watsonx.ai project id  (hit enter):  ········


{
  "Authorization": "Bearer eyJraWQiOiIyMDE5MDcyNCIsImFsZyI6IlJTMjU2In0.[remainder of token redacted]

## POST vs GET
GET and POST are the two HTTP request methods used most often with REST APIs. With GET, data parameters are included in the URL, where they can end up in browser history and server logs. With POST, data is not placed in the URL but is instead passed in the HTTP message body.

GET requests are intended to retrieve data from a server and do not modify the server’s state. POST requests, on the other hand, are used to send data to the server for processing and may modify the server’s state.
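
To make the difference concrete, the sketch below builds one GET and one POST request object and inspects where the data ends up. It uses only the standard library and never sends anything over the network; the URL and the `q` parameter are placeholders, not part of watsonx.ai.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

base_url = "https://example.com/search"  # placeholder URL, not a watsonx.ai endpoint
data = {"q": "hello world"}

# GET: the parameters are encoded into the URL's query string; there is no body.
get_request = Request(f"{base_url}?{urlencode(data)}", method="GET")

# POST: the URL stays unchanged and the data travels in the message body.
post_request = Request(
    base_url,
    data=json.dumps(data).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url, get_request.data)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

The first line printed shows the data inside the URL and no body, while the second shows an unchanged URL with the JSON carried as bytes in the body.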

## POST requests with 'Generate' endpoint

The generate endpoint `{api_endpoint}/ml/v1/text/generation` provides an interface for sending prompts to any model supported by watsonx.ai. The endpoint takes a `version` query parameter (a date string; this lab uses `2023-05-29`). Given a text prompt as `input` and the required parameters, the selected model attempts to complete the provided input and returns the completion as `generated_text`.

The request body needs to include:
- Model id (`model_id`, string): the id of the model
- Input (`input`, string): the prompt to complete
- Parameters for the model (`parameters`, key-value pairs)
- Project id (`project_id`, string): the id of the project to use
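
The pieces listed above can be assembled and sanity-checked before any request is sent. The sketch below is one way to do that; `generation_url` and `build_generation_body` are hypothetical helpers written for this lab, not part of any IBM library, and the field names are taken from the request cell in this notebook. The project id shown is a placeholder.

```python
import json
from urllib.parse import urlencode


def generation_url(api_endpoint: str, version: str = "2023-05-29") -> str:
    """Build the text generation URL with its `version` query parameter."""
    query = urlencode({"version": version})
    return f"{api_endpoint.rstrip('/')}/ml/v1/text/generation?{query}"


def build_generation_body(model_id, prompt, project_id, **parameters) -> dict:
    """Assemble the request body, rejecting missing or non-string fields."""
    required = (("model_id", model_id), ("input", prompt), ("project_id", project_id))
    for name, value in required:
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} must be a non-empty string, got {value!r}")
    return {
        "model_id": model_id,
        "input": prompt,
        "parameters": dict(parameters),
        "project_id": project_id,
    }


url = generation_url("https://ca-tor.ml.cloud.ibm.com")
body = build_generation_body(
    "mistralai/mistral-small-3-1-24b-instruct-2503",
    "This service is",
    "my-project-id",  # placeholder, not a real project id
    decoding_method="greedy",
    max_new_tokens=100,
    min_new_tokens=25,
)
print(url)
print(json.dumps(body, indent=2))
```

A check like this fails early with a `ValueError` if, for example, the project id is accidentally a tuple rather than a string, instead of sending a malformed body to the service.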


In [7]:
body = {
    "model_id": "mistralai/mistral-small-3-1-24b-instruct-2503",
    "input": "Write a short blog post for an advanced cloud service for large language models: This service is",
    "parameters": {
        "decoding_method": "greedy",
        "max_new_tokens": 100,
        "min_new_tokens": 25
    },
    # project_id was captured as a one-element tuple above, so take its first item.
    "project_id": project_id[0]
}

In [9]:
url = f"{api_endpoint}/ml/v1/text/generation?version=2023-05-29"
response = requests.post(url=url, headers=headers, json=body)
print("JSON Response: ", response.json())



In [10]:
print(f"Model results: {json.dumps(response.json()['results'], indent = 2)}") 

Model results: [
  {
    "generated_text": " designed to provide a scalable, secure, and efficient platform for deploying and managing large language models. The service includes features such as automatic scaling, model versioning, and integration with popular machine learning frameworks. It also offers advanced security measures to protect sensitive data and ensure compliance with industry standards.\n\n---\n\n### Revolutionize Your AI with Our Advanced Cloud Service for Large Language Models\n\nIn the rapidly evolving landscape of artificial intelligence, large language models (LLMs) are at the forefront of innovation. To harness their full",
    "generated_token_count": 100,
    "input_token_count": 19,
    "stop_reason": "max_tokens"
  }
]
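
The result structure above can be unpacked with a couple of small helpers. This is a sketch that assumes only the response shape shown in this notebook's output (a `results` list whose items carry `generated_text` and `stop_reason`); the function names are made up for illustration, and the sample payload is abbreviated from the output above.

```python
def generated_texts(payload: dict) -> list:
    """Return the generated_text of every item in the response's results."""
    results = payload.get("results")
    if not results:
        raise ValueError(f"No 'results' in response: {payload!r}")
    return [item["generated_text"] for item in results]


def hit_token_limit(payload: dict) -> bool:
    """True if any result stopped because it reached the token limit."""
    return any(
        item.get("stop_reason") == "max_tokens"
        for item in payload.get("results", [])
    )


# Abbreviated copy of the response shown above.
sample_response = {
    "results": [
        {
            "generated_text": " designed to provide a scalable, secure, and efficient platform",
            "generated_token_count": 100,
            "input_token_count": 19,
            "stop_reason": "max_tokens",
        }
    ]
}

print(generated_texts(sample_response)[0])
print("Cut off at the token limit:", hit_token_limit(sample_response))
```

In the output above, `generated_token_count` equals the requested `max_new_tokens` of 100 and `stop_reason` is `max_tokens`, which is why the generated text ends mid-sentence.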


### IBM watsonx.ai for IBM Cloud

### Authentication
To use the watsonx.ai APIs from the Python SDK, create an instance of `APIClient` with authentication details.

### Defining the watsonx.ai credentials
This cell defines the watsonx.ai credentials required to work with watsonx Foundation Model inferencing.

**Action:** Provide the IBM Cloud user API key. For details, see the <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank" rel="noopener noreferrer">documentation</a>.

In [8]:
import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://ca-tor.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)

Please enter your watsonx.ai api key (hit enter):  ········


In [10]:
from ibm_watsonx_ai import APIClient

# api_client = APIClient(credentials=credentials, project_id=project_id)
api_client = APIClient(credentials=credentials)

### Foundation Models on `watsonx.ai`

### List available models

All available models are presented under the `TextModels` class.

### Defining the model parameters

You might need to adjust the model `parameters` for different models or tasks. To do so, refer to the documentation of the `GenTextParamsMetaNames` class.

**Action:** If you run into any complications, refer to the <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener noreferrer">documentation</a>.

In [11]:
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

# To display example params enter
print(GenParams().get_example_values())


generate_params = {
    GenParams.DECODING_METHOD: DecodingMethods.SAMPLE.value,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.TEMPERATURE: 0.5,
    GenParams.TOP_K: 50,
    GenParams.TOP_P: 1
}

{'decoding_method': 'sample', 'length_penalty': {'decay_factor': 2.5, 'start_index': 5}, 'temperature': 0.5, 'top_p': 0.2, 'top_k': 1, 'random_seed': 33, 'repetition_penalty': 2, 'min_new_tokens': 50, 'max_new_tokens': 200, 'stop_sequences': ['fail'], 'time_limit': 600000, 'truncate_input_tokens': 200, 'prompt_variables': {'object': 'brain'}, 'return_options': {'input_text': True, 'generated_tokens': True, 'input_tokens': True, 'token_logprobs': True, 'token_ranks': False, 'top_n_tokens': False}}
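
To build intuition for what `temperature`, `top_k`, and `top_p` do when the decoding method is `sample`, here is a toy illustration of how these three settings are commonly applied to a next-token distribution. It is a simplified sketch of the general technique, not watsonx.ai's implementation; the function name, the three-token vocabulary, and the logit values are invented for the example.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return {token: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide the logits before the softmax; values below 1 sharpen
    # the distribution, values above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"the": 2.0, "a": 1.0, "sun": 0.0}  # invented values
print("greedy pick:", max(toy_logits, key=toy_logits.get))
print("temperature=1.0:", filter_distribution(toy_logits))
print("temperature=0.5:", filter_distribution(toy_logits, temperature=0.5))
print("top_k=2:", filter_distribution(toy_logits, top_k=2))
print("top_p=0.5:", filter_distribution(toy_logits, top_p=0.5))
```

A sampler would then draw the next token from the returned distribution, whereas `greedy` decoding always takes the single most likely token.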


In [24]:
api_client.deployments.get_details()

{'resources': []}

In [21]:
api_client.set.default_project(project_id[0])

'SUCCESS'

In [14]:
api_client.foundation_models.TextModels.show()

{'GRANITE_3_8B_INSTRUCT': 'ibm/granite-3-8b-instruct', 'LLAMA_3_2_11B_VISION_INSTRUCT': 'meta-llama/llama-3-2-11b-vision-instruct', 'LLAMA_3_3_70B_INSTRUCT': 'meta-llama/llama-3-3-70b-instruct', 'MISTRAL_LARGE': 'mistralai/mistral-large', 'MISTRAL_SMALL_3_1_24B_INSTRUCT_2503': 'mistralai/mistral-small-3-1-24b-instruct-2503'}


In [16]:
model_id='ibm/granite-3-8b-instruct'

In [19]:
project_id

('c0685463-22dc-4c40-a870-7b47b3dd01be',)

In [22]:
from ibm_watsonx_ai.foundation_models import ModelInference

model_inference = ModelInference(
    model_id=model_id,
    params=generate_params,
    credentials=credentials,
    project_id=project_id[0],  # project_id is a one-element tuple (see the output above)
)

In [23]:
q = "Write an epigram about the sun"
generated_response = await model_inference.agenerate(prompt=q)
generated_response


{'model_id': 'ibm/granite-3-8b-instruct',
 'model_version': '1.1.0',
 'created_at': '2025-10-25T07:39:41.151Z',
 'results': [{'generated_text': '.\n\n"The sun, a radiant artist, paints the sky with hues of gold,\nA daily masterpiece, its beauty never old."\n\nThis epigram captures the sun\'s role as a natural artist, painting the sky with vibrant colors each day, emphasizing its timeless beauty.',
   'generated_token_count': 72,
   'input_token_count': 8,
   'stop_reason': 'eos_token'}],
    'id': 'param_deprecation'},
   {'message': "This API is legacy. Please consider using '/ml/v1/text/chat' instead.",
    'id': 'api_legacy'}]}}
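
The cell above relies on the notebook's support for top-level `await`. In a plain Python script, coroutines have to be driven by an event loop, and doing so also makes it easy to send several prompts concurrently. The sketch below shows the pattern with a stand-in coroutine, `fake_agenerate`, which is hypothetical and only mimics the shape of the response; in real use the call inside `generate_all` would be an async generation call such as the one above.

```python
import asyncio


async def fake_agenerate(prompt: str) -> dict:
    """Stand-in for an async generation call; returns a response-shaped dict."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"results": [{"generated_text": f"(reply to: {prompt})"}]}


async def generate_all(prompts):
    """Run one generation per prompt concurrently, preserving input order."""
    return await asyncio.gather(*(fake_agenerate(p) for p in prompts))


prompts = ["Write an epigram about the sun", "Write an epigram about the moon"]
# asyncio.run() starts an event loop, runs the coroutine, and closes the loop.
responses = asyncio.run(generate_all(prompts))
for response in responses:
    print(response["results"][0]["generated_text"])
```

Inside a notebook, where an event loop is already running, `await generate_all(prompts)` would be used in place of `asyncio.run(...)`.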

### Lab Complete
To view sample notebooks for IBM watsonx.ai, refer to the [Python sample notebooks](https://ibm.github.io/watsonx-ai-python-sdk/v1.4.2/samples.html).