- Model specifications: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-1.5-pro
- How to use the API: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#python-openai_2
    - Examples: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#sample-requests
- Pricing: https://cloud.google.com/vertex-ai/generative-ai/pricing

In [2]:
import numpy as np
import pandas as pd
from TabuLLM.embed import TextColumnTransformer
from TabuLLM.cluster import SphericalKMeans
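df = pd.read_csv('../../data/raw.csv')

# Embed the free-text 'diagnoses' column with a sentence-transformers model
embeddings = TextColumnTransformer(
    type = 'st'
    , embedding_model_st = 'sentence-transformers/all-MiniLM-L6-v2'
).fit_transform(df.loc[:, ['diagnoses']])

# Cluster the embeddings with spherical k-means
n_clusters = 10
cluster_labels = SphericalKMeans(n_clusters=n_clusters).fit_predict(embeddings)

# Optional sanity check: every cluster ID in 0..n_clusters-1 is assigned at least once
# assert np.array_equal(np.unique(cluster_labels), np.arange(n_clusters))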
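
The commented-out assertion above checks that every cluster ID is actually used. A slightly more informative variant, sketched below on synthetic labels, reports the size of each cluster and lists any that are empty. The helper name `cluster_sizes` is hypothetical and not part of TabuLLM, and the sketch assumes labels are integers in `0..n_clusters-1`.

```python
import numpy as np

def cluster_sizes(labels, n_clusters):
    """Return an array whose k-th entry is the number of points in cluster k."""
    labels = np.asarray(labels)
    if labels.min() < 0 or labels.max() >= n_clusters:
        raise ValueError("labels must lie in 0..n_clusters-1")
    # minlength keeps trailing empty clusters in the result
    return np.bincount(labels, minlength=n_clusters)

# Synthetic labels for illustration only (not the notebook's data)
demo_labels = np.array([0, 2, 2, 1, 0, 2])
sizes = cluster_sizes(demo_labels, n_clusters=4)
empty = np.flatnonzero(sizes == 0)
print("sizes:", sizes)
print("empty clusters:", empty)
```

In the notebook, the same call would presumably be `cluster_sizes(cluster_labels, n_clusters)`; a non-empty `empty` array would explain why the strict assertion fails.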



In [6]:
from TabuLLM.explain import generate_prompt
preamble = '''
The following is a list of 830 pediatric cardiopulmonary bypass (CPB) surgeries. Text lines represent procedures performed on each patient. 
These CPB surgeries have been grouped into 10 groups, according to their planned procedures. 
Please suggest group labels that are representative of their members, and also distinct from each other:
'''
prompt = generate_prompt(
    text_list = list(df['diagnoses'])
    , cluster_labels = cluster_labels
    , preamble = preamble
)
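
To make the shape of such a prompt concrete, here is a minimal sketch that lists texts grouped by cluster label under a preamble. It is not TabuLLM's `generate_prompt` implementation, whose exact output format is not shown in this notebook; the function name `build_grouped_prompt` and the `Group k:` layout are assumptions made for illustration.

```python
from collections import defaultdict

def build_grouped_prompt(text_list, cluster_labels, preamble):
    """Assemble a prompt: the preamble, then each cluster's texts under a header."""
    if len(text_list) != len(cluster_labels):
        raise ValueError("text_list and cluster_labels must have the same length")
    groups = defaultdict(list)
    for text, label in zip(text_list, cluster_labels):
        groups[int(label)].append(str(text))
    parts = [preamble.strip()]
    for label in sorted(groups):
        # Hypothetical layout: 1-based group headers separated by a blank line
        parts.append(f"\nGroup {label + 1}:")
        parts.extend(groups[label])
    return "\n".join(parts)

# Toy inputs for illustration only (not the notebook's data)
demo_prompt = build_grouped_prompt(
    ["VSD repair", "ASD repair", "Arterial switch"],
    [0, 0, 1],
    "Please label these groups:",
)
print(demo_prompt)
```

Printing the real `prompt` (or its first few hundred characters) before sending it is a cheap way to confirm that the grouping looks as intended.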

In [7]:
import vertexai
from vertexai.generative_models import GenerativeModel

# Read the Google Cloud project ID and region from environment variables (loaded from a .env file)
import os
from dotenv import load_dotenv
load_dotenv()
google_project_id = os.getenv('VERTEXAI_PROJECT')
google_location = os.getenv('VERTEXAI_LOCATION')

vertexai.init(project=google_project_id, location=google_location)

model = GenerativeModel("gemini-1.5-flash-001")

# Send the cluster-labeling prompt built in the previous cell
response = model.generate_content(prompt)

print(response.text)

## Suggested Group Labels for Pediatric CPB Surgeries:

Here are some potential labels for the 10 groups, aiming for clarity and distinction:

**Group 1: Complex Right Ventricular Outflow Tract Obstruction**
* This group features a variety of conditions primarily affecting the right ventricle's outflow, including pulmonary atresia, double outlet right ventricle, and conduit failures. It also includes some associated anomalies like TGA and MAPCAs.

**Group 2: Transposition of the Great Arteries with Concordant Atrioventricular Connections**
* This group is well-defined and focuses on a specific type of transposition with intact ventricular septum, occasionally featuring additional anomalies like pulmonary stenosis and tricuspid atresia.

**Group 3: Valve Defects and Obstructions**
* This group is broad, encompassing congenital and acquired valve abnormalities, primarily affecting the aortic and mitral valves, with frequent occurrences of regurgitation, stenosis, and prolapse.

**Group 4