In [2]:
import torch
import wandb
from transformers import AutoTokenizer, AutoModelForCausalLM

# Local checkpoints to compare, keyed by the W&B run name used for each
model_paths = {
    "rate01-50": "D:/Projects/ByteMentor/models/rate01-50",
    "rate2e-4-500": "D:/Projects/ByteMentor/models/rate2e-4-500",
    "rate2e-2-10000_checkpoint-5000": "D:/Projects/ByteMentor/models/rate2e-2-10000_checkpoint-5000"
}

# Evaluation prompts
prompts = [
    "What is the core concept of Explorative_Datenanalyse, focusing on data discovery?",
    "What is the principle of distance calculation in Informatik_Programieren?",
    "What is the fundamental principle behind line integral in Lineare_Algebra?",
    "What is the concept of overfitting in machine learning?"
]

# Evaluation function with W&B logging
def evaluate_and_log(model_dir, run_name, prompts):
    """Generate up to 100 new tokens per prompt and log all prompt-output pairs to one W&B run."""
    print(f"Running evaluation for: {run_name}")
    wandb.init(project="gemma-comparison", name=run_name)

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir).cuda()
    model.eval()

    # Create a W&B Table to log all prompt-output pairs
    table = wandb.Table(columns=["Prompt", "Output"])

    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=100)
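        # generate() returns the prompt tokens followed by the new tokens,
        # so the decoded text starts with the prompt itself
        decoded = tokenizer.decode(output[0], skip_special_tokens=True)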

        print(f"\nPrompt: {prompt}\nOutput: {decoded}\n")

        table.add_data(prompt, decoded)

    # Log the full table once at the end
    wandb.log({"Prompt-Output Table": table})
    wandb.finish()

# Run evaluation for each model
for run_name, path in model_paths.items():
    evaluate_and_log(path, run_name, prompts)


Running evaluation for: rate01-50


Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.21s/it]



Prompt: What is the core concept of Explorative_Datenanalyse, focusing on data discovery?
Output: What is the core concept of Explorative_Datenanalyse, focusing on data discovery?

**Explorative_Datenanalyse** is a research area that focuses on the **discovery of patterns and relationships** in **unstructured and semi-structured data**. This involves a **multi-step process** that includes data cleaning, data transformation, data exploration, and data mining.

**Key concepts in data discovery include:**

* **Data exploration:** The process of **discovering and understanding the data** through visual analysis, data mining, and statistical techniques.
* **Data mining:**


Prompt: What is the principle of distance calculation in Informatik_Programieren?
Output: What is the principle of distance calculation in Informatik_Programieren?

The principle of distance calculation in Informatik_Programieren is based on the idea that the distance between two points can be calculated based on the di

[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


Running evaluation for: rate2e-4-500


Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.06s/it]



Prompt: What is the core concept of Explorative_Datenanalyse, focusing on data discovery?
Output: What is the core concept of Explorative_Datenanalyse, focusing on data discovery?

**Explorative_Datenanalyse** is a data science approach that focuses on **uncovering and understanding data** through a **creative and iterative process**. It emphasizes the **discovery of patterns and relationships** in data, rather than just analyzing pre-defined variables or hypotheses.

At the core of this approach is the concept of **data discovery**, which involves the following steps:

* **Data identification:** Identifying the data sources and types you want to explore.
* **Data exploration:** Brows


Prompt: What is the principle of distance calculation in Informatik_Programieren?
Output: What is the principle of distance calculation in Informatik_Programieren?

The principle of distance calculation in Informatik_Programieren states that the distance between two points is the number of steps it wou

[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


Running evaluation for: rate2e-2-10000_checkpoint-5000


Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.09s/it]



Prompt: What is the core concept of Explorative_Datenanalyse, focusing on data discovery?
Output: What is the core concept of Explorative_Datenanalyse, focusing on data discovery?


Prompt: What is the principle of distance calculation in Informatik_Programieren?
Output: What is the principle of distance calculation in Informatik_Programieren?


Prompt: What is the fundamental principle behind line integral in Lineare_Algebra?
Output: What is the fundamental principle behind line integral in Lineare_Algebra?


Prompt: What is the concept of overfitting in machine learning?
Output: What is the concept of overfitting in machine learning?



[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


In [3]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the model you want to test
model_path = "D:/Projects/ByteMentor/models/rate2e-4-500"  # Change to any of your model folders

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).cuda()
model.eval()

# Define your prompt
prompt = "What is the concept of overfitting in machine learning?"

# Tokenize and generate output
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)

# Decode and print result (the decoded text includes the prompt)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(f"Prompt: {prompt}\n\nOutput: {response}")


Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.64s/it]


Prompt: What is the concept of overfitting in machine learning?

Output: What is the concept of overfitting in machine learning?

Overfitting is when a machine learning model becomes too closely fit to the training data, rather than generalizing well to new data. This can lead to poor performance on unseen data, as the model has not learned the underlying patterns in the data.

**Key concepts:**

* **Statistical inference:** Machine learning models are trained to make predictions or decisions based on data.
* **Generalization:** The goal of machine learning is to build models that can perform well on unseen data.
*
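
The output above begins with the prompt because `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of dropping the prompt tokens before decoding follows. `new_token_ids` is a hypothetical helper, the ids are made up, and plain lists stand in for tensors so the example runs without a GPU or a model.

```python
from typing import Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> Sequence[int]:
    """Return only the ids generated after the prompt (hypothetical helper)."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt_length exceeds the length of output_ids")
    return output_ids[prompt_length:]


# Made-up ids: the first three stand for the prompt, the rest for new tokens.
prompt_ids = [101, 7592, 2088]
output_ids = prompt_ids + [2023, 2003, 102]

print(new_token_ids(output_ids, len(prompt_ids)))  # [2023, 2003, 102]
```

With the variables from the cell above, this would presumably correspond to decoding `output[0][inputs["input_ids"].shape[1]:]` instead of `output[0]`. That wiring is an assumption and has not been run against these checkpoints.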
