# DAY 1 : LLMs

## Calling OCI models

### Supported models (https://docs.oracle.com/en-us/iaas/Content/generative-ai/chat-models.htm) 

### Cohere
- cohere.command-latest
- cohere.command-plus-latest
- cohere.command-a-03-2025

### Meta Llama
- meta.llama-3.1-405b-instruct
- meta.llama-3.3-70b-instruct
- meta.llama-4-maverick-17b-128e-instruct-fp8
- meta.llama-4-scout-17b-16e-instruct

### OpenAI
- openai.gpt-4.1
- openai.gpt-4o
- openai.gpt-5
- openai.gpt-4o-search-preview

### xAI Grok
- xai.grok-3
- xai.grok-4


Questions: use the #generative-ai-users or #igiu-innovation-lab Slack channels.

If you have errors running the sample code, reach out for help in #igiu-ai-learning.

In [2]:
# set up the variables

import json
import os
from langchain_oci.chat_models import ChatOCIGenAI

#####
# Make sure your sandbox.json file is set up for your environment. You might have to specify the full path depending on your `cwd`.
# You can also make the Jupyter cwd match your workspace Python code:
# VS Code menu -> Settings > Extensions > Jupyter > Notebook File Root
# change from ${fileDirname} to ${workspaceFolder}
#####

SANDBOX_CONFIG_FILE = "sandbox.json"

LLM_MODEL = "cohere.command-a-03-2025" 
PREAMBLE = """
         You always answer in a one stanza poem.
"""
MESSAGE = """
         why is the sky blue?
"""

llm_service_endpoint= "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
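
The client setup in the next cell reads three values from `sandbox.json`: `oci.compartment`, `oci.configFile` and `oci.profile`. As a rough sketch, the file is assumed to look like the example below. Every value is a placeholder, and `missing_keys` is a hypothetical helper (not part of any library) for spotting an incomplete file early.

```python
import json

# Keys that the client setup reads from the "oci" section of sandbox.json
REQUIRED_OCI_KEYS = ("compartment", "configFile", "profile")

# Placeholder content; replace every value with the details of your own environment
EXAMPLE_SANDBOX_JSON = """
{
  "oci": {
    "compartment": "<your-compartment-ocid>",
    "configFile": "~/.oci/config",
    "profile": "DEFAULT"
  }
}
"""

def missing_keys(cfg):
    """Return the required keys that are absent from cfg["oci"]."""
    oci_section = cfg.get("oci", {})
    return [key for key in REQUIRED_OCI_KEYS if key not in oci_section]

example_cfg = json.loads(EXAMPLE_SANDBOX_JSON)
print(missing_keys(example_cfg))                      # prints []
print(missing_keys({"oci": {"profile": "DEFAULT"}}))  # prints ['compartment', 'configFile']
```
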

In [None]:
scfg = None
# read the sandbox config
with open(os.path.expanduser(SANDBOX_CONFIG_FILE), 'r') as f:
    scfg = json.load(f)

llm_client = ChatOCIGenAI(
    model_id=LLM_MODEL,
    service_endpoint=llm_service_endpoint,
    compartment_id=scfg['oci']['compartment'],
    auth_file_location=scfg["oci"]["configFile"],
    auth_profile=scfg["oci"]["profile"],
    model_kwargs={
        "temperature": 0.7,  # higher value means more random, default = 0.3
        "max_tokens": 500,  # max tokens to generate; a low value can lead to incomplete responses
        "preamble_override": PREAMBLE,  # not supported by OpenAI / Grok / Meta models
        "is_stream": False,
        "seed": 7555,  # best effort to make the answer deterministic, not guaranteed
        "top_p": 0.7,  # only the most likely tokens with a total probability of p are considered; max = 0.99, min = 0.01, default = 0.75
        "top_k": 1,  # only the top k tokens are considered; 0 turns it off, max = 500. Must be different from 0 for Meta models
        "frequency_penalty": 0.0  # reduces the repetition of tokens; max = 1.0, min = 0.0. Not supported by OpenAI / Grok models
    }
)
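
To build intuition for `temperature`, `top_k` and `top_p`, here is a toy sketch of how these settings narrow down the candidate tokens before one is sampled. It is not how the OCI service is implemented: the token scores are made up, `candidate_tokens` is a hypothetical helper, and applying `top_k` before `top_p` is an assumption.

```python
import math

def candidate_tokens(scores, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalized probabilities of the tokens that survive the filters."""
    # Temperature: divide the scores before the softmax; lower values sharpen the distribution
    scaled = {token: score / temperature for token, score in scores.items()}
    highest = max(scaled.values())
    weights = {token: math.exp(value - highest) for token, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((token, weight / total) for token, weight in weights.items()),
                    key=lambda pair: pair[1], reverse=True)
    # top_k: keep only the k most likely tokens (0 disables the filter)
    if top_k > 0:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches p
    kept, cumulative = [], 0.0
    for token, probability in ranked:
        kept.append((token, probability))
        cumulative += probability
        if cumulative >= top_p:
            break
    kept_total = sum(probability for _, probability in kept)
    return {token: probability / kept_total for token, probability in kept}

# Made-up scores for four candidate next tokens
toy_scores = {"blue": 2.0, "azure": 1.0, "green": 0.0, "red": -1.0}
print(candidate_tokens(toy_scores, top_k=1))                      # only "blue" remains
print(candidate_tokens(toy_scores, top_p=0.7))                    # "blue" and "azure" remain
print(candidate_tokens(toy_scores, temperature=0.5, top_p=0.7))   # sharper distribution, only "blue"
```

With `top_k` set to 1, as in the cell above, only the single most likely token is ever a candidate, so the other sampling settings have little room to change the output.
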

### Response predictability

Change the seed and see whether the same seed leads to the same response.

In [4]:
response = llm_client.invoke(MESSAGE)
print(response.content)

The sky wears blue, a cloak of light,  
Scattered sunbeams, a celestial sight.  
Short waves of blue, they dance and play,  
While longer hues take leave by day.  
Rayleigh's whisper, a truth untold,  
Paints the heavens in azure and gold.  
So gaze above, let wonder unfold,  
The sky is blue, a story to hold.


In [None]:
llm_client.model_kwargs['seed'] = 6000
response = llm_client.invoke(MESSAGE)
print(response.content)

The sky wears blue, a cloak of light,  
Scattered sunbeams, a celestial sight.  
Short waves of blue, they dance and play,  
While longer hues slip away.  
Rayleigh's whisper, a gentle hand,  
Paints the heavens, a vast expanse.  
So look above, and wonder why,  
The sky is blue, as it meets the eye.


In [7]:
llm_client.model_kwargs['seed'] = 7555
response = llm_client.invoke(MESSAGE)
print(response.content)

The sky wears blue, a gentle hue,  
Scattered light, a sunlit view.  
Short waves dance, in air’s embrace,  
Painting heavens, a boundless space.  
Rayleigh’s whisper, soft and true,  
Tells the tale of skies in blue.
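
The effect of a seed can be reproduced locally with Python's own random number generator. In this sketch the `sample_tokens` helper and the token probabilities are made up for illustration. Locally, the same seed always yields the same sample. The hosted model only makes a best effort, which is why the two runs with seed 7555 above still produced different poems.

```python
import random

def sample_tokens(probabilities, count, seed):
    """Draw `count` tokens from a probability table using a seeded generator."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    tokens = list(probabilities)
    weights = [probabilities[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=count)

# Made-up probabilities for three candidate tokens
toy_probabilities = {"blue": 0.6, "azure": 0.3, "green": 0.1}

first = sample_tokens(toy_probabilities, 8, seed=7555)
second = sample_tokens(toy_probabilities, 8, seed=7555)
other = sample_tokens(toy_probabilities, 8, seed=6000)

print(first)
print(first == second)  # same seed, same sample
print(other)            # a different seed may give a different sample
```
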


### Controlling the output length
Change the length and see the finish reason.

In [None]:
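# llm_client is the LangChain client created above, so the limit is changed through model_kwargs
llm_client.model_kwargs['max_tokens'] = 60
response = llm_client.invoke(MESSAGE)
print(response.content)
# the finish reason, when the provider reports one, is expected in the response metadata
print(response.response_metadata.get('finish_reason'))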
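
As a toy illustration of why a low token limit leads to an incomplete answer, the sketch below cuts a text after a fixed number of words and reports why it stopped. Treating whitespace-separated words as tokens is a simplification, `truncate` is a hypothetical helper, and the labels `COMPLETE` and `MAX_TOKENS` are illustrative values; check the actual response for what the service returns.

```python
def truncate(text, max_tokens):
    """Cut text to at most max_tokens whitespace-separated words and report why it stopped."""
    words = text.split()
    if len(words) <= max_tokens:
        return " ".join(words), "COMPLETE"
    return " ".join(words[:max_tokens]), "MAX_TOKENS"

full_answer = "The sky wears blue, a cloak of light, scattered sunbeams, a celestial sight."
for limit in (5, 60):
    text, reason = truncate(full_answer, limit)
    print(f"max_tokens={limit}: {text!r} ({reason})")
```
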

### Exercise 1
* Try different models
* Play with messages and preamble
    * Zero shot
    * N shot
    * Output format (freeform, bullet, JSON, CSV, SQL, etc.)
    * Input format (text, CSV, JSON)
    * Different languages
* Try language analysis
    * Proofreading
    * Entity extraction
    * Summarization
    * Sentiment analysis
* Data generation
    * SQL insert
    * JSON
    * CSV
* Try guard rails
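
As a starting point for the zero-shot and N-shot items, here is a minimal sketch of how a prompt with worked examples could be assembled. The `build_prompt` helper, the `Review:` / `Sentiment:` labels and the sample reviews are all made up for illustration; the resulting string would be passed as the message.

```python
def build_prompt(question, examples=()):
    """Build a prompt with zero or more worked examples placed before the question."""
    lines = []
    for example_input, example_output in examples:
        lines.append(f"Review: {example_input}")
        lines.append(f"Sentiment: {example_output}")
        lines.append("")  # blank line between examples
    lines.append(f"Review: {question}")
    lines.append("Sentiment:")
    return "\n".join(lines)

# Zero shot: the question only
zero_shot = build_prompt("The battery died after a week.")

# Two shot: two worked examples, then the question
two_shot = build_prompt(
    "The battery died after a week.",
    examples=[
        ("Great screen and fast delivery.", "positive"),
        ("The case cracked on day one.", "negative"),
    ],
)
print(two_shot)
```
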



## History support
For conversational bots, it's important for the LLM to remember the conversation so far. This is typically done via history. The history comprises the messages and the role of the entity associated with each message.

In this example we are hard-coding the history. You will typically build it from the actual conversation. Note that history is typically added in pairs: one message for the question asked by the user, the other for the LLM's answer to that question.
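
The pairing can be sketched with plain Python dictionaries. This is only an illustration of the data shape: the `add_exchange` helper and the `USER` / `CHATBOT` role labels are assumptions, not SDK objects.

```python
def add_exchange(history, question, answer):
    """Append one question/answer pair to the history and return it."""
    history.append({"role": "USER", "message": question})
    history.append({"role": "CHATBOT", "message": answer})
    return history

history = []
add_exchange(
    history,
    "Tell me something about Oracle.",
    "Oracle is one of the largest vendors in the enterprise IT market.",
)

# The next request carries the whole history plus the new question
next_request = {"chat_history": history, "message": "what is its flagship product?"}
for entry in next_request["chat_history"]:
    print(f'{entry["role"]}: {entry["message"]}')
```

After each real answer comes back, the question and the answer would be appended as a new pair, so the history grows by two entries per turn.
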

In [8]:
# optionally update the preamble
# llm_client.model_kwargs['preamble_override'] = "Answer the questions in a professional tone, based on the conversation history"
from langchain_core.messages import AIMessage, HumanMessage

previous_chat_message = HumanMessage(content="Tell me something about Oracle.")
previous_chat_reply = AIMessage(content="Oracle is one of the largest vendors in the enterprise IT market and the shorthand name of its flagship product. The database software sits at the center of many corporate IT")
chat_history = [previous_chat_message, previous_chat_reply]

In [9]:
# ask the question
llm_client.model_kwargs.pop('seed', None)  # no fixed seed for this call
# question = "Where is Oracle's HQ?"
question = "what is its flagship product?"  # the LLM knows what "its" means because we added the history records

response = llm_client.invoke(chat_history + [HumanMessage(content=question)])


In [None]:
# Print result
print("**************************Chat Result**************************")
llm_text = response.content
print(llm_text)

### Exercise 2
Create a conversational bot

![image.png](attachment:image.png)

## JSON format

Note: this is only available for the 08-2024 models. It is also a good habit to ask for JSON output in the preamble.

Read the restrictions here: https://docs.cohere.com/docs/structured-outputs-json#schema-constraints


In [None]:
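# The cells from here on call the OCI Python SDK directly, so llm_client is rebound to an SDK client
import oci
from oci.generative_ai_inference import GenerativeAiInferenceClient
from oci.generative_ai_inference.models import (
    ChatDetails, CohereChatRequest, CohereResponseJsonFormat, OnDemandServingMode)

config = oci.config.from_file(scfg["oci"]["configFile"], scfg["oci"]["profile"])
llm_client = GenerativeAiInferenceClient(config=config, service_endpoint=llm_service_endpoint)
llm_chat_request = CohereChatRequest()
chat_detail = ChatDetails(compartment_id=scfg["oci"]["compartment"], chat_request=llm_chat_request)

# change to a supported model
chat_detail.serving_mode = OnDemandServingMode(model_id="cohere.command-r-08-2024")
llm_chat_request.response_format = CohereResponseJsonFormat()
llm_chat_request.preamble_override = "Answer in JSON format"
llm_chat_request.message = "List the 3 latest novels from Indian authors"
llm_chat_request.max_tokens = 500
llm_response = llm_client.chat(chat_detail)
llm_text = llm_response.data.chat_response.text
print(llm_text)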

  


Here we specify a simple JSON schema. Note that the schema only allows for one entry,
so even though we ask for multiple novels, the answer will contain just one.

In [None]:
response_schema = {
    "type": "object",
    "required": ["title", "author", "publication_year"],
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "publication_year": {"type": "integer"},
    },
}

llm_chat_request.response_format = CohereResponseJsonFormat(schema=response_schema)
llm_response = llm_client.chat(chat_detail)
llm_text = llm_response.data.chat_response.text
print(llm_text)
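
Once the text comes back, it is worth checking that it really parses and matches the schema before using it. The sketch below is a minimal check for a flat schema like the one above, using only the standard library. `check_object` is a hypothetical helper that covers just the `required` list and the `string` / `integer` types, and the sample text is made up rather than real model output.

```python
import json

book_schema = {
    "type": "object",
    "required": ["title", "author", "publication_year"],
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "publication_year": {"type": "integer"},
    },
}

def check_object(obj, schema):
    """Return a list of problems found when comparing obj with a flat object schema."""
    python_types = {"string": str, "integer": int}
    problems = []
    for field in schema.get("required", []):
        if field not in obj:
            problems.append(f"missing field: {field}")
    for field, rule in schema.get("properties", {}).items():
        if field in obj and not isinstance(obj[field], python_types[rule["type"]]):
            problems.append(f"wrong type for {field}")
    return problems

# Made-up response text; placeholder values only
sample_text = '{"title": "Example Title", "author": "Example Author", "publication_year": 2020}'
print(check_object(json.loads(sample_text), book_schema))       # prints []
print(check_object({"title": "Example Title"}, book_schema))    # two missing fields
```
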


### Nested schema with array

Here we allow for an array, so the answer will contain multiple entries.
Also see how we were able to break the author's name into first and last name.

In [None]:
response_schema = {
    "type": "object",
    "required": ["authors"],
    "properties": {
        "authors": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["title", "author", "publication_year"],
                "properties": {
                    "title": {"type": "string"},
                    "author": {
                        "type": "object",
                        "required": ["fname", "lname"],
                        "properties": {
                            "fname": {"type": "string"},
                            "lname": {"type": "string"}
                        }
                    },
                    "publication_year": {"type": "integer"}
                }
            }
        }
    }
}

llm_chat_request.response_format = CohereResponseJsonFormat(schema=response_schema)
llm_chat_request.message = "List the three most popular novels from Indian authors in 2022"
llm_response = llm_client.chat(chat_detail)
llm_text = llm_response.data.chat_response.text
print(llm_text)
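
A response that follows the nested schema can be walked like any other parsed JSON document. The sketch below shows one way to do that. The sample text is made up to match the schema's shape (it is not real model output), and `summarize_books` is a hypothetical helper.

```python
import json

# Made-up response text in the shape of the nested schema; placeholder values only
sample_text = """
{"authors": [
  {"title": "Example Novel A", "author": {"fname": "Asha", "lname": "Example"}, "publication_year": 2022},
  {"title": "Example Novel B", "author": {"fname": "Ravi", "lname": "Sample"}, "publication_year": 2022}
]}
"""

def summarize_books(text):
    """Parse the JSON text and return one 'Last, First: Title (Year)' line per entry."""
    data = json.loads(text)
    lines = []
    for entry in data["authors"]:
        author = entry["author"]
        lines.append(f'{author["lname"]}, {author["fname"]}: {entry["title"]} ({entry["publication_year"]})')
    return lines

for line in summarize_books(sample_text):
    print(line)
```
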

### Exercise 3
* Create a program to
    * Create a 2-stanza poem about a famous Indian personality
    * Ask the LLM to extract a set of entities
    * Specify the output schema with fixed fields

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)

## Streaming response

OCI supports SSE (server-sent events) for streaming the response, token by token, as it is generated.

In [None]:

import json
from oci.generative_ai_inference.models import CohereResponseTextFormat

# ask the question
llm_chat_request.preamble_override = "Answer as a poem"
llm_chat_request.message = "why is the sky blue?"
llm_chat_request.is_stream = True
llm_chat_request.response_format = CohereResponseTextFormat()

# generate the response
llm_response = llm_client.chat(chat_detail)

# stream the output
for event in llm_response.data.events():
    res = json.loads(event.data)
    if 'finishReason' in res:
        print(f"\nFinish reason: {res['finishReason']}")
        break
    if 'text' in res:
        print(res['text'], end="", flush=True)
print("\n")
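
The loop above can be tried without a live connection by feeding it simulated events. In this sketch the events are stand-in objects with a `data` attribute holding a JSON string. Their payloads are made up, and the shape of the final event (a `finishReason` alongside the full text) is an assumption based on how the loop above treats it.

```python
import json
from types import SimpleNamespace

def collect_stream(events):
    """Join the text chunks of a stream and return (text, finish_reason)."""
    chunks = []
    finish_reason = None
    for event in events:
        payload = json.loads(event.data)
        if "finishReason" in payload:
            # The closing event ends the stream; its text is not appended again
            finish_reason = payload["finishReason"]
            break
        if "text" in payload:
            chunks.append(payload["text"])
    return "".join(chunks), finish_reason

# Simulated events; real ones come from llm_response.data.events()
fake_payloads = [
    {"text": "The sky "},
    {"text": "is blue"},
    {"finishReason": "COMPLETE", "text": "The sky is blue"},
]
fake_events = [SimpleNamespace(data=json.dumps(payload)) for payload in fake_payloads]

text, reason = collect_stream(fake_events)
print(text)
print(reason)
```
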

