
# Databricks LLM Test
This test notebook:
- Creates an OpenAI-compatible client for Databricks, authenticated with a user token
- Prompts an LLM of the user's choice (`SERVING_MODEL`)
- Returns the content of the model's reply

Notes: 
- [How to get your Databricks token](https://docs.databricks.com/en/dev-tools/auth/pat.html) 
- Compute cluster: `CDSI ML Cluster`
- `SERVING_MODEL` must be the name of a serving endpoint in Databricks. Endpoints are set up in the [Databricks UI](https://msk-mode-test.cloud.databricks.com/ml/endpoints/)
- When testing is complete, MAKE SURE THE `CDSI ML Cluster` IS NO LONGER RUNNING. A running cluster incurs costs even when it is idle!

![Compute cluster](https://github.com/clinical-data-mining/msk_cdm/blob/main/docs/reference/images/compute_cluster.png?raw=true)
    

In [1]:
from openai import OpenAI


In [7]:
DATABRICKS_TOKEN = '<YOUR-TOKEN-HERE>'
SERVING_MODEL = 'meta_llama_3_8b_instruct_cdm'
MAX_TOKENS = 256
ENDPOINT_URL = 'https://msk-mode-test.cloud.databricks.com/serving-endpoints'
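
A token pasted into a notebook is easy to commit or share by accident. One alternative is sketched below, assuming the token has already been exported in an environment variable. The helper name `get_databricks_token` and the default variable name `DATABRICKS_TOKEN` are illustrative choices, not part of the original notebook.

```python
import os

# Placeholder string used in this notebook; treated as "no token provided"
PLACEHOLDER = "<YOUR-TOKEN-HERE>"


def get_databricks_token(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Return the token stored in `env_var`, or raise if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Databricks token."
        )
    return token
```

With this in place, `DATABRICKS_TOKEN = get_databricks_token()` could stand in for the hard-coded string.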


In [12]:
USER_PROMPT = "What are the sites of disease based on this piece of text: Since CT scan of DATE, Predominantly upper lobe and superior segment lower lobe ground glass nodules are not seen on this chest radiograph"
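
The prompt above combines a fixed question with one piece of report text. To ask the same question about several snippets, the two parts can be kept separate. This is a small sketch; `QUESTION_TEMPLATE` and `build_prompt` are hypothetical names introduced here for illustration.

```python
# Fixed question, with a slot for the report text
QUESTION_TEMPLATE = (
    "What are the sites of disease based on this piece of text: {text}"
)


def build_prompt(text: str, template: str = QUESTION_TEMPLATE) -> str:
    """Insert one report snippet into the question template."""
    return template.format(text=text.strip())


report = (
    "Predominantly upper lobe and superior segment lower lobe "
    "ground glass nodules are not seen on this chest radiograph"
)
print(build_prompt(report))
```

The resulting string can be passed wherever `USER_PROMPT` is used.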

In [9]:
# Create an OpenAI-compatible client that points at the Databricks serving endpoints
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url=ENDPOINT_URL
)


In [10]:
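def llm_prompt(
        prompt: str,
        serving_model: str = SERVING_MODEL,
        max_tokens: int = MAX_TOKENS
) -> str:
    """Send `prompt` to a Databricks serving endpoint and return the reply text."""
    # Uses the `client` object created in the cell above
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant"
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        model=serving_model,
        max_tokens=max_tokens
    )

    # Return only the text of the first choice in the response
    return chat_completion.choices[0].message.content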
    

In [13]:
print(llm_prompt(prompt=USER_PROMPT))

The sites of disease mentioned in the text are:

* Upper lobe
* Superior segment of the lower lobe
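
The reply above is short, but with `MAX_TOKENS = 256` a longer answer can be cut off mid-sentence. In the OpenAI chat completions response format, each choice carries a `finish_reason`, and the value `"length"` means the token limit was reached. The sketch below shows a check that could be applied to the `chat_completion` object inside `llm_prompt`. The helper `reply_with_truncation_flag` is hypothetical, and it assumes the Databricks endpoint fills in `finish_reason` the same way. The stand-in object only mimics the response shape so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Tuple


def reply_with_truncation_flag(chat_completion) -> Tuple[str, bool]:
    """Return the reply text and whether it stopped at the token limit."""
    choice = chat_completion.choices[0]
    truncated = choice.finish_reason == "length"
    return choice.message.content, truncated


# Stand-in for a real response (assumption: same attribute layout)
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="The sites of disease mentioned"),
        )
    ]
)

text, truncated = reply_with_truncation_flag(fake_completion)
if truncated:
    print("Reply hit the max_tokens limit; consider raising MAX_TOKENS.")
```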
