# Deep Learning - GPT-like Physics assistant

Authors: Giacomo Zuccarino, Jhon Sebastián Moreno Triana.

Developed as an assignment for the course P2.11_advanced_DL_24_25.

Professors: Alberto Cazzaniga, Cristiano de Nobili

Program: Master in High Performance Computing.

Institution: SISSA/ICTP, Trieste.

---

## 0. Installation, Imports and initial setup

First we install the packages we are going to use. The main ones are `unsloth`, a library that speeds up fine-tuning on a single GPU; `huggingface_hub` and related packages, used to push, pull, and set up models; and `vllm`, a library for LLM inference and serving.

> ⚠️ <font color="GoldenRod"><b>CAUTION</b> </font>
>
> <font color="GoldenRod">Before running these cells, you need a Hugging Face account and an access token saved as a Colab secret named `hf_key`. See the [Hugging Face documentation on access tokens](https://huggingface.co/docs/hub/security-tokens).</font>

> ⚡ <font color="Tomato"><b>IMPORTANT</b> </font>
>
> <font color="Tomato" >You need to run this section, or at least the installation part, for the rest of the notebook to work.</font>


In [1]:
%%capture
# Install the unsloth and vllm packages (cell magics must be on the first line of the cell)
!pip install --no-deps unsloth vllm

In [2]:
%%capture
# Install all the dependencies for CPT and fine-tuning
import os, sys, re, requests; modules = list(sys.modules.keys())
for x in modules: sys.modules.pop(x) if "PIL" in x or "google" in x else None
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft "trl==0.15.2" triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer

# vLLM requirements - vLLM breaks Colab due to reinstalling numpy
f = requests.get("https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt").content
with open("vllm_requirements.txt", "wb") as file:
    file.write(re.sub(rb"(transformers|numpy|xformers)[^\n]{1,}\n", b"", f))
!pip install -r vllm_requirements.txt

Next, we read the Hugging Face authentication key so that we can log in and download models from the Hub.

In [3]:
# Loading Hugging face's auth key
from google.colab import userdata
hf_api_key=userdata.get('hf_key')

In [4]:
# Login in to hugging face account
from huggingface_hub import login
from unsloth import FastModel
import torch

login(hf_api_key)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO 05-07 12:30:44 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 05-07 12:30:44 [__init__.py:239] Automatically detected platform cuda.


We define the prompt template here so that it is available even when we skip training and only load the trained model to use it.

In [5]:
# Alpaca-style prompt template, shared by fine-tuning and inference
prompt = """Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
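
The same template serves both stages: during fine-tuning all three slots are filled and an end-of-sequence token is appended so the model learns where to stop, while at inference the response slot is left empty so the model continues the text after `### Response:`. A minimal plain-Python sketch of that distinction; the helper `build_prompt` and the `"<eos>"` placeholder are hypothetical (in the notebook the real value comes from `tokenizer.eos_token`).

```python
# Alpaca-style template, copied from the cell above
PROMPT = """Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS = "<eos>"  # hypothetical placeholder; the notebook uses tokenizer.eos_token


def build_prompt(instruction: str, context: str, response: str = "", training: bool = False) -> str:
    """Fill the template; append EOS only to training examples."""
    text = PROMPT.format(instruction, context, response)
    # Training targets end with EOS so the model learns to stop;
    # inference prompts leave the response slot open for generation.
    return text + EOS if training else text


train_text = build_prompt("You are a helpful physics assistant", "What is spin?",
                          "Spin is intrinsic angular momentum.", training=True)
infer_text = build_prompt("You are a helpful physics assistant", "What is spin?")
print(infer_text.endswith("### Response:\n"))           # True
print(train_text.endswith("angular momentum." + EOS))  # True
```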

## Train and save the model

This section trains the model and then pushes it to the Hugging Face Hub.

> ⚡ <font color="Tomato"><b>IMPORTANT</b> </font>
>
> <font color="Tomato">Only run this section if you want to (re-)train the model or change the training datasets!
>
> If you just want to load the model and use it, go directly to the section [Using the model](#using-the-model).</font>

### 1.0. General Setup

As the starting point we use the model `unsloth/gemma-3-4b-it-unsloth-bnb-4bit`. In this subsection we load the model and add the LoRA adapters.

So first we load the base model.

In [6]:
# Model parameters definitions
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
load_in_8bit = False # Set to True for 8-bit quantization instead. A bit more accurate, uses 2x memory

# Load the model from the Hugging Face Hub
model, tokenizer = FastModel.from_pretrained(
    # Gemma 3 with 4B parameters, pre-quantized to 4 bits
    model_name = "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    load_in_8bit = load_in_8bit,
)

==((====))==  Unsloth 2025.4.7: Fast Gemma3 patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



We now add LoRA adapters, so that only a small fraction of the parameters needs to be updated.
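
LoRA freezes each target weight matrix W (shape d_out × d_in) and learns a low-rank update B·A, with B of shape d_out × r and A of shape r × d_in, so each adapted matrix only adds r·(d_in + d_out) trainable parameters. A quick count for an illustrative square 2560 × 2560 projection with the rank r = 32 used below; the dimensions are an assumption chosen for the example, not read from the model config.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters of a dense weight matrix W of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of the LoRA factors A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


d = 2560  # illustrative hidden size (assumption)
r = 32    # rank passed to get_peft_model below
full = full_params(d, d)
lora = lora_params(d, d, r)
print(full, lora, f"{lora / full:.1%}")  # 6553600 163840 2.5%
```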

In [7]:
model = FastModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj", "lm_head"], # Add for continual pretraining
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 172,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)



Unsloth: Making `base_model.model.vision_tower.vision_model` require gradients
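
With `use_rslora = True`, PEFT scales the low-rank update by `lora_alpha / sqrt(r)` instead of the standard `lora_alpha / r`, which keeps the update from shrinking as the rank grows. For the values above (r = 32, lora_alpha = 16) the two choices differ by a factor of sqrt(32) ≈ 5.66. The helper names below are hypothetical.

```python
import math


def lora_scale(alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to B @ A."""
    return alpha / r


def rslora_scale(alpha: float, r: int) -> float:
    """Rank-stabilized LoRA scaling (use_rslora=True in PEFT)."""
    return alpha / math.sqrt(r)


alpha, r = 16, 32  # values passed to FastModel.get_peft_model above
print(lora_scale(alpha, r))              # 0.5
print(round(rslora_scale(alpha, r), 4))  # 2.8284
```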


### 2.0. Continued Pre-Training (CPT)

For CPT we use the `gallen881/arxiv-physics` dataset to give the model scientific vocabulary and context. In this section we load the dataset, define a trainer, and train the base model on it.

We load the dataset in the following cell.

In [8]:
from datasets import load_dataset

# Load the first 5,000 rows of the dataset
dataset = load_dataset("gallen881/arxiv-physics", split = "train[:5000]")
EOS_TOKEN = tokenizer.eos_token
# Format the dataset: CPT uses only the raw answer text, terminated by EOS
def formatting_prompts_func(examples):
    return { "text" : [example + EOS_TOKEN for example in examples["answer"]] }
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Split the dataset into training and testing sets
dataset_dict = dataset.train_test_split(test_size=0.05)

train_dataset = dataset_dict['train']
eval_dataset = dataset_dict['test']

In [9]:
# Check the dataset output
dataset[:1]

{'question': ['What is the energy motion problem in a stationary einstein universe?'],
 'answer': ['The energy motion problem in a stationary Einstein universe arises due to the fact that, in this type of universe, matter is uniformly distributed throughout an infinite space, leading to the absence of any gravitational field. However, it is well-known that a stationary particle in Newtonian mechanics conserves both energy and momentum, which is not the case in the general theory of relativity. The challenge is to reconcile the concept of conservation of energy and momentum in this scenario. One possible solution is the use of a non-static reference frame that includes the total mass of the universe, which leads to the appearance of "pseudo-forces" that balance out the energy and momentum.'],
 'text': ['The energy motion problem in a stationary Einstein universe arises due to the fact that, in this type of universe, matter is uniformly distributed throughout an infinite space, leading t
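
With `batched = True`, `Dataset.map` passes the formatting function a dictionary that maps each column name to a list of values, and the function returns new columns in the same shape. For CPT only the `answer` column is kept, as raw text followed by the EOS token. A plain-Python sketch on a toy batch (the rows, the `format_cpt_batch` name, and the `"<eos>"` placeholder are hypothetical), plus the split sizes implied by `test_size = 0.05` on 5,000 rows:

```python
EOS_PLACEHOLDER = "<eos>"  # hypothetical; the notebook uses tokenizer.eos_token


def format_cpt_batch(examples: dict) -> dict:
    """Batched formatter: column->list dict in, new 'text' column out."""
    return {"text": [answer + EOS_PLACEHOLDER for answer in examples["answer"]]}


# A toy batch shaped like the dict that Dataset.map(batched=True) passes in
batch = {
    "question": ["What is spin?", "What is a black hole?"],
    "answer": ["Intrinsic angular momentum.", "A region light cannot escape."],
}
out = format_cpt_batch(batch)
print(out["text"][0])  # Intrinsic angular momentum.<eos>

# Split sizes for 5,000 rows with test_size=0.05
n_rows, test_size = 5000, 0.05
n_test = round(n_rows * test_size)
print(n_rows - n_test, n_test)  # 4750 250 (the trainer reports 4,750 examples)
```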

Next, we define the trainer for the CPT stage.

In [10]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

# Define the trainer using unsloth.
trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,

    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,

        # num_train_epochs = 1,
        max_steps = 20,
        # eval_strategy = "steps",  # uncomment (together with eval_steps) to evaluate on eval_dataset
        # eval_steps = 5,
        warmup_ratio=0.1,

        learning_rate = 2e-5,
        embedding_learning_rate = 1e-5, #set embedding_learning_rate to be a learning rate at least 2x or 10x smaller
                                        #than learning_rate to make continual pretraining work!

        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine", # We can experiment with "cosine" and "constant"
        seed = 172,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
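
Both trainers combine a short linear warmup (`warmup_ratio = 0.1`) with a decaying schedule: cosine for CPT and linear for fine-tuning (section 3.0). The sketch below mirrors the multiplier formulas of `get_cosine_schedule_with_warmup` (default half cycle) and `get_linear_schedule_with_warmup` from `transformers`; the helper names are hypothetical, and the 2 warmup steps are simply 10% of `max_steps = 20`.

```python
import math


def cosine_multiplier(step: int, warmup: int, total: int) -> float:
    """LR multiplier: linear warmup, then half-cosine decay to 0."""
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def linear_multiplier(step: int, warmup: int, total: int) -> float:
    """LR multiplier: linear warmup, then linear decay to 0."""
    if step < warmup:
        return step / max(1, warmup)
    return max(0.0, (total - step) / max(1, total - warmup))


base_lr, warmup, total = 2e-5, 2, 20  # CPT learning_rate and max_steps
for step in (0, 1, 2, 11, 20):
    print(step,
          base_lr * cosine_multiplier(step, warmup, total),
          base_lr * linear_multiplier(step, warmup, total))
```

The cosine schedule stays close to the peak rate for longer at the start of the decay and flattens out near the end, while the linear one drops at a constant rate.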

**Show the current memory status**

In [11]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.557 GB.
4.809 GB of memory reserved.


Then, we can finally train the model on the new dataset.

In [12]:
# Continued pre-training
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 4,750 | Num Epochs = 1 | Total steps = 20
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 74,049,536/4,000,000,000 (1.85% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|-----:|--------------:|
| 1 | 2.0102 |
| 2 | 1.8437 |
| 3 | 2.0113 |
| 4 | 1.8231 |
| 5 | 1.9738 |
| 6 | 1.949 |
| 7 | 2.0778 |
| 8 | 2.006 |
| 9 | 1.89 |
| 10 | 1.9337 |




**Show final memory and time stats after training**

In [13]:
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

188.2798 seconds used for training.
3.14 minutes used for training.
Peak reserved memory = 5.23 GB.
Peak reserved memory for training = 0.421 GB.
Peak reserved memory % of max memory = 13.221 %.
Peak reserved memory for training % of max memory = 1.064 %.
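
The two memory cells above use the same arithmetic: bytes are converted to GiB, and the training overhead is the peak reserved memory minus what was already reserved before `trainer.train()`. Folding it into one helper makes it easy to reuse after each stage; `to_gib` and `memory_report` are hypothetical names, and the check below plugs in the numbers printed above instead of querying the GPU.

```python
def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB, rounded like the notebook cells."""
    return round(n_bytes / 1024 / 1024 / 1024, 3)


def memory_report(start_gb: float, peak_gb: float, total_gb: float) -> dict:
    """Peak memory and training overhead, absolute and as % of the GPU."""
    used_for_training = round(peak_gb - start_gb, 3)
    return {
        "peak_gb": peak_gb,
        "training_gb": used_for_training,
        "peak_pct": round(peak_gb / total_gb * 100, 3),
        "training_pct": round(used_for_training / total_gb * 100, 3),
    }


# Values printed by the CPT run above (GiB)
report = memory_report(start_gb=4.809, peak_gb=5.23, total_gb=39.557)
print(report)
# In the notebook: memory_report(start_gpu_memory, to_gib(torch.cuda.max_memory_reserved()), max_memory)
```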


### 3.0. Fine-Tuning

For fine-tuning we use the `Akul/alpaca_physics_dataset` dataset from the Hugging Face Hub. This section follows a similar path to the CPT step, but each example is formatted with the instruction/input/response prompt defined in section 0.

We load the dataset and use the first 5,000 examples.

In [14]:
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
# Format each example with the Alpaca-style prompt defined in section 0
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

from datasets import load_dataset
# Load the first 5,000 rows and apply the formatting function
dataset = load_dataset("Akul/alpaca_physics_dataset", split = "train[:5000]")
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Split the dataset into training and testing sets
dataset_dict = dataset.train_test_split(test_size=0.005)

train_dataset = dataset_dict['train']
eval_dataset = dataset_dict['test']

Next, we define the trainer for the fine-tuning stage.

In [15]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = True, # Can make training 5x faster for short sequences (Unsloth disabled it in this run, see the log below).
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        # eval_strategy = "steps",  # eval_steps below only takes effect when this is enabled
        eval_steps=5,
        warmup_ratio=0.1,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 30,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 172,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
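
Unsloth reports the total batch size as per-device batch × gradient accumulation steps × GPUs. Multiplying it by `max_steps` shows how little of each training split these short runs actually see. The helper names are hypothetical; the split sizes are the `Num examples` values printed by the two trainers (4,750 for CPT, 4,975 for fine-tuning).

```python
def effective_batch(per_device: int, grad_accum: int, n_gpus: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * n_gpus


def fraction_seen(max_steps: int, batch: int, n_train: int) -> float:
    """Fraction of the training split covered by max_steps optimizer steps."""
    return min(1.0, max_steps * batch / n_train)


cpt_batch = effective_batch(2, 8)  # CPT trainer settings
sft_batch = effective_batch(2, 4)  # fine-tuning trainer settings
print(cpt_batch, f"{fraction_seen(20, cpt_batch, 4750):.1%}")  # 16 6.7%
print(sft_batch, f"{fraction_seen(30, sft_batch, 4975):.1%}")  # 8 4.8%
```

Setting `num_train_epochs = 1` instead of `max_steps` (the commented-out option in both trainers) would cover each split once.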

Unsloth: Hugging Face's packing is currently buggy - we're disabling it for now!

Finally, we run the fine-tuning.

In [16]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 4,975 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 74,049,536/4,000,000,000 (1.85% trained)


| Step | Training Loss |
|-----:|--------------:|
| 1 | 1.3231 |
| 2 | 1.5534 |
| 3 | 1.322 |
| 4 | 1.2458 |
| 5 | 1.0974 |
| 6 | 0.9803 |
| 7 | 0.9794 |
| 8 | 0.9509 |
| 9 | 1.037 |
| 10 | 0.8818 |




### 4.0. Inference and push

Now we are ready to use the model 😈. Let's give it some prompts.


> ⚠️ <font color="GoldenRod"><b>CAUTION</b> </font>
>
> <font color="GoldenRod">This section can alter the current state of your model on Hugging Face.</font>

In the first example we ask a question unrelated to physics, passing only an instruction.

In [17]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "Give me a bulleted list of the past 10 Masters Tournament Champions.",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Give me a bulleted list of the past 10 Masters Tournament Champions.

### Input:


### Response:
* 1934 - Bobby Jones
* 1935 - Bobby Jones
* 1936 - Ralph Giffen
* 1937 - Vernon McAlister
* 1938 - Cary Grant
* 1939 - Robert McQuiston
* 1941 - Gene Sarazen
* 1942 - Hank Hanahan
* 1943 - Sam Snead
* 1944 - Ed Opinski
<end_of_turn>
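
`tokenizer.decode(outputs[0])` returns the whole sequence: the `<bos>` token, the echoed prompt, the answer, and the closing `<end_of_turn>` marker. A small post-processing helper (hypothetical, pure string handling) keeps only the answer; it also skips the duplicated `### Response:` header that the reloaded model emits in section 5.

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    """Return only the generated answer from a decoded prompt+completion string."""
    _, found, tail = decoded.partition(marker)
    if not found:
        return decoded.strip()
    tail = tail.lstrip()
    # The model sometimes repeats the header before answering.
    while tail.startswith(marker):
        tail = tail[len(marker):].lstrip()
    # Drop everything from the first end-of-turn / end-of-sequence marker on.
    for stop in ("<end_of_turn>", "<eos>"):
        tail = tail.split(stop, 1)[0]
    return tail.strip()


sample = (
    "<bos>Below is an instruction...\n\n### Instruction:\n\n\n### Input:\n"
    "What is spin?\n\n### Response:\n### Response:\n"
    "Spin is intrinsic angular momentum.\n<end_of_turn>"
)
print(extract_response(sample))  # Spin is intrinsic angular momentum.
```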


Next, we ask a more complex physics question, passing only an input and no instruction.


In [18]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "",  # instruction
            "How can I calculate the Hydrogen atom energy using the Schrodinger equation?",  # input
            "",  # output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:


### Input:
How can I calculate the Hydrogen atom energy using the Schrodinger equation?

### Response:
The Schrödinger equation is a fundamental equation in quantum mechanics that describes the behavior of particles in a system.  It can be written as:

  HΨ = EΨ

where:
- H is the Hamiltonian operator, which represents the total energy of the system
- Ψ is the wavefunction of the system, which describes the probability distribution of the particle's state
- E is the energy eigenvalue, which is the energy of the system in a particular state

To find the energy eigenvalues, one must solve the Schrödinger equation for a given potential energy V(r, t) that describes the forces acting on the system. In the case of a single hydrogen atom, the potential energy is the Coulomb potential between the electron and the

Next, we ask the same question, this time also giving the model an instruction.

In [19]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "You are a helpful physics assistant",  # instruction
            "How can I calculate the Hydrogen atom energy using the Schrodinger equation?",  # input
            "",  # output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
You are a helpful physics assistant

### Input:
How can I calculate the Hydrogen atom energy using the Schrodinger equation?

### Response:
To calculate the energy of a Hydrogen atom using the Schrödinger equation, we can solve the time-independent Schrödinger equation for the non-relativistic case.
The potential energy for an electron is given by the potential caused by the nucleus:
$$
V(r) = -e/r
$$
where e is the electronic charge and r is the distance between the electron and the proton.
For the 1s state, we solve the time-independent Schrödinger equation with the potential:
$$
\hat{H} \psi_{1s}(r) = E_{1s} \psi_{1s}(r)
$$
where $\hat{H}$ is the Hamiltonian operator and $E_{1s}$ is the energy of the 1s state.
The 1s wave function is a spherical harmonic function:
$$
\psi_{1s}(r) = \frac{1}{\sqrt{2 \pi}} 
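
The generated answer stops before reaching a result, and it writes the potential energy as -e/r, which is not the Coulomb potential energy in either Gaussian (-e²/r) or SI units. For reference: in SI units V(r) = -e²/(4πε₀ r), and the time-independent Schrödinger equation then yields the Bohr levels E_n = -m_e e⁴/(8 ε₀² h² n²), about -13.6 eV for the 1s state. A short numerical check with rounded CODATA constants, useful as a reference when judging the model's answer:

```python
# CODATA values, rounded (SI units)
M_E = 9.1093837e-31         # electron mass [kg]
E_CHARGE = 1.602176634e-19  # elementary charge [C]
EPS0 = 8.8541878128e-12     # vacuum permittivity [F/m]
H = 6.62607015e-34          # Planck constant [J s]


def hydrogen_level_ev(n: int) -> float:
    """Bohr energy level E_n = -m_e e^4 / (8 eps0^2 h^2 n^2), in eV."""
    energy_joule = -M_E * E_CHARGE**4 / (8 * EPS0**2 * H**2 * n**2)
    return energy_joule / E_CHARGE


print(f"{hydrogen_level_ev(1):.2f} eV")  # -13.61 eV
print(f"{hydrogen_level_ev(2):.2f} eV")  # -3.40 eV
```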

We then test a different topic: black holes.

In [20]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "You are a helpful physics assistant",  # instruction
            "Can you explain to me what is the no hair theorem?",  # input
            "",  # output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
You are a helpful physics assistant

### Input:
Can you explain to me what is the no hair theorem?

### Response:
The no-hair theorem, in physics, is the assertion that a macroscopic black hole can only be described by a black hole that has no angular momentum and no electrical charge. To put it simply, a black hole can only have "smooth" or "naked" spacetime around it, meaning it has no angular momentum or electrical charge. It can only come in the form of an uncharged, non-rotating black hole. The theorem is a consequence of the fact that the event horizon of a black hole cannot be a closed surface. In other words, a black hole's event horizon must have a "proper" boundary, such that any particles or information that enter the event horizon cannot come back out, and it must be a one-way surface. In the con

#### 4.1. Push the model to the Hugging Face Hub

We now save the model locally and/or push it to the Hugging Face Hub.

> ⚠️ <font color="GoldenRod"><b>CAUTION</b> </font>
>
> <font color="GoldenRod">Be careful with the following cells: they save the model locally and push it to Hugging Face, which can overwrite the current state of your model.</font>

First, we save it to a local folder on disk.

In [21]:
# Save the LoRA adapters and tokenizer to a local folder
if False: # Change to True to save the model to disk
    model.save_pretrained("gemma-3b-physics-instruct-alpaca-v2")
    tokenizer.save_pretrained("gemma-3b-physics-instruct-alpaca-v2")

Then we push the saved folder to the Hugging Face Hub.

In [22]:
from huggingface_hub import HfApi

# Upload the locally saved folder to the Hub
if False: # Change to True if you want to push the model to hugging face, the model has to be saved first
    HfApi().upload_folder(
        folder_path="gemma-3b-physics-instruct-alpaca-v2",
        repo_id="Jh0mpis/gemma-3b-physics-instruct-alpaca-v2",
        commit_message="Second version of the model changing the geophysics model for fine-tuning"
    )

<a name="using-the-model"></a>
## Using the model

Here we load the trained model from the Hugging Face Hub and try some question-and-answer interactions.

> 📝 <font color="DodgerBlue"><b>NOTE</b></font>
>
> <font color="DodgerBlue">This section assumes that the model is available on the Hugging Face Hub. If you saved it locally instead, you can load it by replacing the Hugging Face repository id with the local folder path.</font>
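
Only the `model_name` argument changes between the two cases. A small hypothetical helper that picks the local folder written in section 4.1 when it exists and falls back to the Hub repository otherwise:

```python
import os


def resolve_model_path(local_dir: str, repo_id: str) -> str:
    """Prefer a locally saved model folder; otherwise fall back to the Hub repo id."""
    # save_pretrained writes a folder; if it exists, load from disk instead of the Hub
    return local_dir if os.path.isdir(local_dir) else repo_id


model_name = resolve_model_path(
    "gemma-3b-physics-instruct-alpaca-v2",          # folder used by save_pretrained above
    "Jh0mpis/gemma-3b-physics-instruct-alpaca-v2",  # repo used by upload_folder above
)
print(model_name)
# Then pass it on: FastModel.from_pretrained(model_name = model_name, ...)
```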

### 5.0. Loading and using the model again

We load the model and ask it a few questions to check its answers.

In [23]:
from unsloth import FastModel

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
load_in_8bit = False # Set to True for 8-bit quantization. A bit more accurate, uses 2x memory

model, tokenizer = FastModel.from_pretrained(
    model_name = "Jh0mpis/gemma-3b-physics-instruct-alpaca-v2",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    load_in_8bit = load_in_8bit
)

==((====))==  Unsloth 2025.4.7: Fast Gemma3 patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!





In [24]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "",  # instruction
            "How can I calculate the event horizon in Schwarszchild black hole?",  # input
            "",  # output
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:


### Input:
How can I calculate the event horizon in Schwarszchild black hole?

### Response:
### Response:
To calculate the event horizon of a Schwarschild black hole, we can use the following formula:

r_H = (GM^2)/(c^4)

where:

* r_H is the radius of the event horizon
* G is the gravitational constant (6.6743 × 10^-11 m^3 kg^-1 s^-2)
* M is the mass of the black hole
* c is the speed of light (2.998 × 10^8 m s^-1)

To calculate the radius of the event horizon, simply plug in the values for the mass of the black hole and the speed of light into the formula:

In this case, we can use the equation for the radius of the event horizon in a Schwarschild black hole, which is:

r = (2*GM)/c^2

Where:

* r is the radius of the event horizon
* G is the gravitational constant (6.6743 × 10^-11 m^3 kg^-1 s^-2)
* M i

In [25]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "You are a helpful physics assistant",  # instruction
            "How can I calculate the event horizon in Schwarszchild black hole?",  # input
            "",  # output
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
You are a helpful physics assistant

### Input:
How can I calculate the event horizon in Schwarszchild black hole?

### Response:
### Response:
The event horizon of a Schwarzschild black hole is a theoretical boundary in spacetime beyond which events cannot affect an object due to the black hole's intense gravity. It's defined by the radius of the event horizon, also known as the Schwarzschild radius, which is determined by the black hole's mass.

The formula for calculating the event horizon of a Schwarzschild black hole is:

R_H = (2 G M^2 / c^2)^(1/3)

Where:
G is the gravitational constant (6.674 × 10^-11 m^3 kg^-1 s^-2)
M is the mass of the black hole (in kilograms)
c is the speed of light (299,792,458 m/s)

Let's derive the Schwarzschild radius:

1. Consider a point with mass M at the center of a spher
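
The recorded answers disagree on the horizon formula. For a non-rotating, uncharged black hole the Schwarzschild radius is r_s = 2GM/c², the expression the first answer eventually reaches (`r = (2*GM)/c^2`); the other two expressions, (GM²)/(c⁴) and (2GM²/c²)^(1/3), do not even have units of length. A quick numerical check for one solar mass, with rounded constants:

```python
G = 6.674e-11     # gravitational constant [m^3 kg^-1 s^-2]
C = 2.998e8       # speed of light [m/s]
M_SUN = 1.989e30  # solar mass [kg]


def schwarzschild_radius(mass_kg: float) -> float:
    """Event-horizon radius r_s = 2GM/c^2 of a non-rotating, uncharged black hole, in metres."""
    return 2 * G * mass_kg / C**2


r_sun = schwarzschild_radius(M_SUN)
print(f"{r_sun / 1000:.2f} km")  # 2.95 km
```

Since r_s grows linearly with M, a ten-solar-mass black hole has a horizon of roughly 30 km.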

In [26]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "You are a helpful physics assistant",  # instruction
            "¿Como puedo calcular el horizonte de eventos en un agujero negro de tipo Schwarszchild?",  # input
            "",  # output
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
You are a helpful physics assistant

### Input:
¿Como puedo calcular el horizonte de eventos en un agujero negro de tipo Schwarszchild?

### Response:
### Response:
El horizonte de eventos en un agujero negro de tipo Schwarszchild es la región alrededor de la singularidad donde la gravedad es tan fuerte que nada, ni siquiera la luz, puede escapar. La distancia a este horizonte es conocida como el radio de Schwarszchild, y puede calcularse utilizando la siguiente fórmula:

r_s = (2GM)/(c^2)

donde:
r_s es el radio de Schwarszchild
G es la constante gravitacional universal (6.674 x 10^-11 m^3 kg^-1 s^-2)
M es la masa del agujero negro
c es la velocidad de la luz (2.998 x 10^8 m/s)

Para calcular el horizonte de eventos en un agujero negro de tipo Schwarszchild, primero debes conocer la masa del agujero negro (

In [27]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "",  # instruction
            "What is spin?",  # input
            "",  # output
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:


### Input:
What is spin?

### Response:
Spin is a fundamental concept in quantum mechanics, describing the intrinsic angular momentum of a particle. It's a purely quantum property, meaning that it doesn't have a corresponding classical counterpart. It's described in terms of an intrinsic angular momentum and its magnetic moment. It plays a critical role in many quantum phenomena, such as the behavior of electrons in materials, the quantum computing paradigm, and the development of new technologies. In essence, spin is an intrinsic form of angular momentum associated with elementary particles, which has a fundamental impact on many aspects of physics and technology.
<end_of_turn>


In [28]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "You are a helpful physics assistant",  # instruction
            "What is spin?",  # input
            "",  # output
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
You are a helpful physics assistant

### Input:
What is spin?

### Response:
Spin is an intrinsic form of angular momentum possessed by elementary particles. It is a fundamental property of a particle, like mass and charge, which is not easily measurable but is intrinsic to the particle. In quantum mechanics, spin is represented by an angular momentum that is quantized, meaning it can only take on specific values. For example, an electron has a spin of 1/2, meaning its spin angular momentum can only have values of +1/2 or -1/2.  This intrinsic angular momentum is what makes it a quantum property. Spin is also essential for other fundamental properties of the universe, such as its interactions with magnetic and electric fields.
<end_of_turn>


In [29]:
FastModel.for_inference(model)  # Enable faster inference
inputs = tokenizer(
    [
        prompt.format(
            "You are a helpful physics assistant",  # instruction
            "Che cos'e lo spin?",  # input #Some tokens like accents can give errors
            "",  # output
        )
    ], return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)
outputs = tokenizer.decode(outputs[0])
print(outputs)

<bos>Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
You are a helpful physics assistant

### Input:
Che cos'e lo spin?

### Response:
The spin is a quantum mechanical property that can be associated with particles or other fundamental physical objects, such as nuclei and electrons. It's an intrinsic angular momentum that has no classical analog. 

The spin angular momentum is characterized by a quantization of an intrinsic angular momentum. The spin of a particle is a quantum mechanical property that is not related to the particle's actual rotation in space. Instead, it represents an intrinsic angular momentum possessed by the particle. It's a fundamental property, like mass or charge, and it has no classical counterpart. 

The spin can be described by the "spin quantum number" which is a property that can take on any integer value or half-integer value. Spin