## RAG

In [19]:
from langchain_dartmouth.llms import DartmouthLLM

In [None]:
# List all models available through DartmouthLLM
DartmouthLLM.list()

[{'name': 'llama-3-8b-instruct',
  'provider': 'meta',
  'display_name': 'Llama 3 8B Instruct',
  'tokenizer': 'meta-llama/Meta-Llama-3-8B-Instruct',
  'type': 'llm',
  'capabilities': ['chat'],
  'server': 'text-generation-inference',
  'parameters': {'max_input_tokens': 8192}},
 {'name': 'llama-3-1-8b-instruct',
  'provider': 'meta',
  'display_name': 'Llama 3.1 8B Instruct',
  'tokenizer': 'meta-llama/Llama-3.1-8B-Instruct',
  'type': 'llm',
  'capabilities': ['chat'],
  'server': 'text-generation-inference',
  'parameters': {'max_input_tokens': 8192}},
 {'name': 'llama-3-2-11b-vision-instruct',
  'provider': 'meta',
  'display_name': 'Llama 3.2 11B Vision Instruct',
  'tokenizer': 'meta-llama/Llama-3.2-11B-Vision-Instruct',
  'type': 'llm',
  'capabilities': ['chat', 'vision'],
  'server': 'text-generation-inference',
  'parameters': {'max_input_tokens': 127999}},
 {'name': 'codellama-13b-instruct-hf',
  'provider': 'meta',
  'display_name': 'CodeLlama 13B Instruct HF',
  'tokenize

In [22]:
llm = DartmouthLLM(model_name="codellama-13b-python-hf", return_full_text=True)

In [23]:
response = llm.invoke("def remove_digits(s: str) -> str:")
print(response)

def remove_digits(s: str) -> str:
    last_digit = None
    while len(s) > 0:
        last_digit = s[-1]
        if last_digit.isdigit():
            s = s[:-1]
        elif last_digit.isalpha():
            break
    return s


assert remove_digits("abc12345678") == "abc"
assert remove_digits("abcd12345678") == "abcd"
assert remove_digits("abcd12345678ef") == "abcd"



In [24]:
response = llm.invoke("How can I define a class in Python?")
print(response)

How can I define a class in Python?
This is a question about the syntax of Python.
I am trying to create a class. This class has an __init__ method that takes one parameter. This parameter will be a string.
Then, I want to create an instance of this class.
How do I do that?

Comment: By defining a class?

Comment: I think the question is about the constructor.

Comment: possible duplicate of [python class constructor and method invocation](http://stackoverflow.com/questions/68282/python-class-constructor-and-method-invocation)

Comment: @KarolyHorvath : I did define a class. However, I am not sure how to create an object from it. I am not sure how to set the parameter (I tried, but I failed)

Comment: @Tirath : I already saw that post, but I didn't understand the answer. I am a beginner in Python.

Comment: @DanielSanchez: I think the problem is that you don't understand the term "constructor"...

Answer: You need to define the class using the `class` keyword, and then create an instan

## Instruction-tuned Chat Models

In [25]:
llm = DartmouthLLM(model_name="codellama-13b-instruct-hf")
response = llm.invoke("How can I define a class in Python?")
print(response)



\begin{code}
class SomeClass:
    def __init__(self, someString):
        self.someString = someString

    def someMethod(self):
        print(self.someString)
\end{code}

Is this the proper way to define a class? I have seen other people write classes this way:

\begin{code}
class SomeClass:
    def __init__(self, someString):
        self.someString = someString

    def someMethod(self):
        print(self.someString)
\end{code}

Or even:

\begin{code}
class SomeClass:
    def __init__(self, someString):
        self.someString = someString
    def someMethod(self):
        print(self.someString)
\end{code}

Answer: Your first example is the most common, although there is no real difference between them.

Python is very strict in its coding style and it's recommended to use the official [Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/).

Answer: Your first example is how I've seen it most often.  However, your second example is actually the same thing.  The

In [26]:
response = llm.invoke("<s>[INST] How can I define a class in Python? [/INST] ")

print(response)

 In Python, you can define a class using the `class` keyword followed by the name of the class and a colon. For example:
```
class MyClass:
    pass
```
This defines a class named `MyClass` with no methods or attributes.

To add methods and attributes to a class, you can use the following syntax:
```
class MyClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, z):
        return self.x + self.y + z
```
This defines a class named `MyClass` with an `__init__` method that initializes the class with two parameters `x` and `y`, and a `add` method that adds the values of `x`, `y`, and `z` together.

You can also define classes with inheritance using the `extends` keyword. For example:
```
class MyClass(ParentClass):
    def __init__(self, x, y):
        super().__init__(x, y)
        self.z = z
```
This defines a class named `MyClass` that inherits from the class `ParentClass`, with an `__init__` method that initializes the class with two parameters `
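
The cell above wraps the question in `<s>[INST] ... [/INST]` by hand, and the instruct model then answers directly instead of continuing forum-style text. A minimal sketch of a helper that does this wrapping follows. The function name `to_inst_prompt` is hypothetical, and the template is copied from the cell above rather than from any model documentation, so treat it as an assumption about what `codellama-13b-instruct-hf` expects.

```python
def to_inst_prompt(question: str) -> str:
    """Wrap a plain question in the [INST] template used in the cell above."""
    # Strip stray whitespace so the markers are always separated by single spaces.
    return f"<s>[INST] {question.strip()} [/INST] "


prompt = to_inst_prompt("How can I define a class in Python?")
print(prompt)
```

The result could be passed to `llm.invoke(prompt)` in place of the hand-written string.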

In [27]:
from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")
response = llm.invoke("How can I define a class in Python?")

print(response.content)

**Defining a Class in Python**

In Python, you can define a class using the `class` keyword followed by the name of the class. The basic syntax for defining a class is as follows:

```python
class ClassName:
    # class body
```

Here's a simple example of a class definition:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        print(f"Hello, my name is {self.name} and I am {self.age} years old.")

# Creating an instance of the class
person = Person("John", 30)
person.greet()
```

In this example:

*   We define a class `Person` with an `__init__` method, which is a special method that gets called when an instance of the class is created.
*   We define another method `greet` that prints out a greeting message.
*   We create an instance of the class `Person` called `person` and call the `greet` method on it.

**Class Properties and Methods**
-------------------------------

You can define properties and

In [28]:
ChatDartmouth.list()

[{'name': 'llama-3-8b-instruct',
  'provider': 'meta',
  'display_name': 'Llama 3 8B Instruct',
  'tokenizer': 'meta-llama/Meta-Llama-3-8B-Instruct',
  'type': 'llm',
  'capabilities': ['chat'],
  'server': 'text-generation-inference',
  'parameters': {'max_input_tokens': 8192}},
 {'name': 'llama-3-1-8b-instruct',
  'provider': 'meta',
  'display_name': 'Llama 3.1 8B Instruct',
  'tokenizer': 'meta-llama/Llama-3.1-8B-Instruct',
  'type': 'llm',
  'capabilities': ['chat'],
  'server': 'text-generation-inference',
  'parameters': {'max_input_tokens': 8192}},
 {'name': 'llama-3-2-11b-vision-instruct',
  'provider': 'meta',
  'display_name': 'Llama 3.2 11B Vision Instruct',
  'tokenizer': 'meta-llama/Llama-3.2-11B-Vision-Instruct',
  'type': 'llm',
  'capabilities': ['chat', 'vision'],
  'server': 'text-generation-inference',
  'parameters': {'max_input_tokens': 127999}},
 {'name': 'codellama-13b-instruct-hf',
  'provider': 'meta',
  'display_name': 'CodeLlama 13B Instruct HF',
  'tokenize

In [7]:
import requests

paras = {
  "inputs": "My name is Olivier and I",
  "parameters": {
    "best_of": 1,
    "decoder_input_details": True,
    "details": True,
    "do_sample": True,
    "frequency_penalty": 0.1,
    "grammar": {
      "type": "json",
      "value": "string"
    },
    "max_new_tokens": 20,
    "repetition_penalty": 1.03,
    "return_full_text": False,
    "seed": None,
    "stop": [
      "photographer"
    ],
    "temperature": 0.5,
    "top_k": 10,
    "top_n_tokens": 5,
    "top_p": 0.95,
    "truncate": None,
    "typical_p": 0.95,
    "watermark": True
  }
}

import os  # used to read the API key from the environment

# Send the payload as a JSON body (params= would put it in the query string), and
# read the key from an environment variable instead of hard-coding it in the notebook.
# DARTMOUTH_API_KEY is an assumed variable name.
resp = requests.post("https://api.dartmouth.edu/api/ai/tgi/llama-3-8b-instruct/generate", json=paras,
                     headers={"Authorization": os.environ["DARTMOUTH_API_KEY"]})
print(resp.json())

{'message': 'Unauthorized'}


In [None]:
import os

import faiss
import numpy as np
from openai import OpenAI

# ─── Configuration ─────────────────────────────────────────────────────────────
# 1) Instantiate the client (the key is read from the OPENAI_API_KEY env var)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# 2) Define or load your medical document chunks
chunk_texts = [
    "Metoprolol is a beta-blocker used to treat high blood pressure and heart failure. Common side effects include fatigue, dizziness, and slow heart rate.",
    "Lisinopril is an ACE inhibitor often prescribed for hypertension and diabetic kidney protection. Side effects include dry cough and hyperkalemia.",
    "In elderly patients, start antihypertensive therapy at a lower dose. Monitor for orthostatic hypotension and renal function."
]

# ─── Build Retrieval Index ─────────────────────────────────────────────────────
chunk_vectors = []
for text in chunk_texts:
    resp = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    # resp.data[0].embedding holds the vector
    chunk_vectors.append(resp.data[0].embedding)

chunk_vectors = np.array(chunk_vectors, dtype="float32")
dimension     = chunk_vectors.shape[1]  # e.g. 1536
index         = faiss.IndexFlatL2(dimension)
index.add(chunk_vectors)

# ─── RAG Function ──────────────────────────────────────────────────────────────
def rag_medical_chatbot(query: str, top_k: int = 2) -> str:
    # 1. Embed the query
    q_resp    = client.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    )
    query_vec = np.array([q_resp.data[0].embedding], dtype="float32")

    # 2. Retrieve top_k chunks
    _, indices = index.search(query_vec, top_k)
    retrieved  = [chunk_texts[i] for i in indices[0]]

    # 3. Assemble prompt
    context = "\n\n".join(retrieved)
    prompt  = (
        "You are a knowledgeable medical assistant.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{query}\n\nAnswer:"
    )

    # 4. Generate with GPT-4
    chat_resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a knowledgeable medical assistant."},
            {"role": "user",   "content": prompt}
        ],
        temperature=0.2,
        max_tokens=200
    )
    return chat_resp.choices[0].message.content.strip()

# ─── Example Usage ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
    question = "What are the side effects of Metoprolol?"
    print("Q:", question)
    print("A:", rag_medical_chatbot(question, top_k=2))


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
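
The retrieval step of the cell above can be tried without any API quota. The sketch below replaces the OpenAI embeddings with a toy bag-of-words embedding (an assumption made purely for illustration, not what the cell uses) and replaces FAISS with a brute-force NumPy search over squared L2 distances, which is the same ranking an exhaustive `IndexFlatL2` search performs. The names `toy_embed` and `retrieve` are hypothetical, and the chunk texts are shortened versions of those above.

```python
import numpy as np

chunks = [
    "Metoprolol is a beta-blocker; side effects include fatigue and dizziness.",
    "Lisinopril is an ACE inhibitor; side effects include dry cough.",
    "In elderly patients, start antihypertensive therapy at a lower dose.",
]


def tokenize(text):
    # Lowercase and strip basic punctuation; good enough for a toy example.
    return [w.strip(".,;?!").lower() for w in text.split()]


# Vocabulary built from the chunks only; unknown query words are ignored.
vocab = sorted({w for c in chunks for w in tokenize(c)})
word_to_col = {w: i for i, w in enumerate(vocab)}


def toy_embed(text):
    """Unit-length bag-of-words vector (a stand-in for a real embedding model)."""
    vec = np.zeros(len(vocab), dtype="float32")
    for w in tokenize(text):
        if w in word_to_col:
            vec[word_to_col[w]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


chunk_vectors = np.stack([toy_embed(c) for c in chunks])


def retrieve(query, top_k=2):
    """Return the top_k chunks with the smallest squared L2 distance to the query."""
    q = toy_embed(query)
    dists = ((chunk_vectors - q) ** 2).sum(axis=1)
    order = np.argsort(dists)[:top_k]
    return [chunks[i] for i in order]


top = retrieve("What are the side effects of Metoprolol?", top_k=2)
for text in top:
    print(text)
```

The retrieved chunks would then be joined into the `Context:` part of the prompt, as in step 3 of `rag_medical_chatbot`.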

In [9]:
pip show openai


Name: openai
Version: 1.76.0
Summary: The official Python library for the openai API
Home-page: https://github.com/openai/openai-python
Author: 
Author-email: OpenAI <support@openai.com>
License: Apache-2.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: anyio, distro, httpx, jiter, pydantic, sniffio, tqdm, typing-extensions
Required-by: 


In [None]:
# 1. Install dependencies (if you haven’t already):
#    pip install transformers datasets

from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset,ClassLabel
import pandas as pd


# Define your furniture descriptions and labels
furniture_data_train = {
    'text': [
        'Modern velvet sofa with gold legs',
        'Oak dining table seating six',
        'Ergonomic office chair with lumbar support',
        'Queen size platform bed frame',
        'Solid wood coffee table',
        'Mid-century modern accent chair',
        'Glass top bedside table',
        'Leather recliner sofa',
        'Adjustable height standing desk',
        'Upholstered dining room chair'
    ],
    'label': [
        'sofa',
        'table',
        'chair',
        'bed',
        'table',
        'chair',
        'table',
        'sofa',
        'desk',
        'chair'
    ]
}

# NOTE: both test rows also appear in the training set, so this split is only a smoke test
furniture_data_test = {
    'text': [
        'Glass top bedside table',
        'Upholstered dining room chair'
    ],
    'label': [
        'table',
        'chair'
    ]
}


# Create a DataFrame
df_furniture = pd.DataFrame(furniture_data_train)
df_furniture_test = pd.DataFrame(furniture_data_test)

# (Optional) Save to CSV for later use
df_furniture.to_csv('furniture_dataset.csv', index=False)
df_furniture_test.to_csv('furniture_dataset_test.csv', index=False)

# 2. Load the dataset from the two CSV files written above
#    (each has the columns "text" and "label")
dataset = load_dataset('csv', data_files={'train':'furniture_dataset.csv','test':'furniture_dataset_test.csv'})

# 3. Convert the 'label' column from strings → ClassLabel (ints)
dataset = dataset.class_encode_column('label')
label_feature = dataset['train'].features['label']
print(label_feature.names)

num_labels = len(dataset['train'].unique('label'))

# 4. Tokenize
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
def tokenize_fn(examples):
    return tokenizer(examples['text'],
                     padding='max_length',
                     truncation=True,
                     max_length=128)
tokenized = dataset.map(tokenize_fn, batched=True)

# 5. Prepare for PyTorch
tokenized = tokenized.rename_column('label','labels')
tokenized.set_format(type='torch',
                     columns=['input_ids','attention_mask','labels'])

# 6. Load a pre‑trained BERT with a classification head
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=num_labels
)

# 7. Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    logging_dir='./logs',
    report_to='wandb'     # log to Weights & Biases; use 'none' to disable logging integrations
)

# 8. Create the Trainer and train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized['train'],
    eval_dataset=tokenized['test'],
    tokenizer=tokenizer
)
trainer.train()

# 9. Inference on new descriptions
def predict_label(text: str):
    model.eval()  # switch off dropout, which is still active after training
    inputs = tokenizer(text,
                       return_tensors='pt',
                       truncation=True,
                       padding=True).to(model.device)  # keep inputs on the model's device
    outputs = model(**inputs)
    pred_id = outputs.logits.argmax(dim=-1).item()
    # label_feature is the ClassLabel from step 3, so int2str maps the id back to its name
    return label_feature.int2str(pred_id)

print(predict_label("Modern velvet sofa with gold legs"))


['bed', 'chair', 'desk', 'sofa', 'table']


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Step,Training Loss


desk


In [None]:
label_feature.names

['bed', 'chair', 'desk', 'sofa', 'table']
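
The recorded run above printed `desk` for a sofa description. `predict_label` returns only the argmax, so it can help to look at the class probabilities as well. The sketch below applies a softmax to a logits vector and maps the winning index back to the label names listed above. The logits are made-up values for illustration, not output from the trained model, and `logits_to_label` is a hypothetical helper.

```python
import numpy as np

# Label order as printed by label_feature.names above.
label_names = ['bed', 'chair', 'desk', 'sofa', 'table']


def logits_to_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    logits = np.asarray(logits, dtype="float64")
    shifted = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    pred_id = int(probs.argmax())
    return label_names[pred_id], float(probs[pred_id])


# Hypothetical logits: index 2 ('desk') wins, but only by a small margin.
label, confidence = logits_to_label([0.1, 0.2, 0.9, 0.3, 0.0])
print(label, round(confidence, 3))
```

A winning probability not far above 1/5 would suggest the classifier is close to guessing, which seems plausible for a head trained on ten examples for three epochs.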