## Importing Dependencies

In [None]:
%%capture
!pip install unsloth "xformers==0.0.28.post2"
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# !pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git
# Install Flash Attention 2 for softcapping support
import torch
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install --no-deps packaging ninja einops "flash-attn>=2.6.3"

In [None]:
torch.cuda.get_device_capability()  # (major, minor) compute capability; major >= 8 means Ampere or newer

(8, 9)
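
The tuple above is the CUDA compute capability `(major, minor)`. As a rough sketch of how that value relates to the precision and attention choices made in this notebook (the helper name `describe_gpu` is hypothetical and not part of any library):

```python
def describe_gpu(capability):
    """Map a CUDA compute capability tuple to the settings used in this notebook."""
    major, _minor = capability
    ampere_or_newer = major >= 8  # same threshold as the install cell above
    return {
        "dtype": "bfloat16" if ampere_or_newer else "float16",
        "install_flash_attn": ampere_or_newer,
    }

print(describe_gpu((8, 9)))  # the capability reported above
print(describe_gpu((7, 5)))  # e.g. a Tesla T4
```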

In [None]:
import warnings
warnings.filterwarnings('ignore')  # suppress warnings
from transformers import TextStreamer  # streams the generated response in real time

In [None]:
from unsloth import FastLanguageModel
from IPython.display import Markdown, display
import torch
max_seq_length = 2048
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage.

# 4bit pre quantized models  supported for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-9b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: If you want to finetune Gemma 2, install flash-attn to make it faster!
To install flash-attn, do the below:

pip install --no-deps --upgrade "flash-attn>=2.6.3"
==((====))==  Unsloth 2024.11.5: Fast Gemma2 patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/6.13G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/46.4k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

#### Inference Before Finetuning

In [None]:
prompt_template='''
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: {problem} \n
Response: {response}
'''
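
One of the problems used in this notebook contains literal braces (`{h: 5, c: 12}`). The sketch below, which uses a shortened stand-in for the template above, shows that braces inside an argument passed to `str.format` are inserted verbatim; only braces in the template itself act as placeholders.

```python
# Shortened stand-in for the notebook's prompt_template (same placeholders).
template = "Instruction: {problem} \n\nResponse: {response}"

# Braces inside the argument are copied as-is; they are not parsed as fields.
problem = "Pick four letters without replacement from {h: 5, c: 12}"
prompt = template.format(problem=problem, response="")
print(prompt)
```

Leaving `response` empty, as the inference cells do, makes the prompt end at `Response: ` so that the model continues from there.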

In [None]:

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Find the ordered pair  (𝑥,𝑦)that satisfies the equations  7𝑥−50𝑦=3 and  3𝑦−𝑥=5",
        response="",
    )
], return_tensors = "pt").to("cuda")
streamer = TextStreamer(tokenizer)
outputs = model.generate(**inputs, streamer=streamer,max_new_tokens = 1024, use_cache = True)
display(Markdown(tokenizer.batch_decode(outputs)[0]))

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Find the ordered pair  (𝑥,𝑦)that satisfies the equations  7𝑥−50𝑦=3 and  3𝑦−𝑥=5 

Response: 
The ordered pair that satisfies the equations  7𝑥−50𝑦=3 and  3𝑦−𝑥=5 is  (𝑥,𝑦)=(1,2).

Explanation:
To find the ordered pair that satisfies the equations  7𝑥−50𝑦=3 and  3𝑦−𝑥=5, we can use the method of substitution.

First, we can solve the second equation for  𝑥:
3𝑦−𝑥=5
𝑥=3𝑦−5

Next, we can substitute this expression for  𝑥 into the first equation:
7𝑥−50𝑦=3
7(3𝑦−5)−50𝑦=3
21𝑦−35−50𝑦=3
−29𝑦=38
𝑦=−38/29

Now that we have found the value of  𝑦, we can substitute it back into the expression for  𝑥:
𝑥=3𝑦−5
𝑥=3(−38/29)−5
𝑥=−119/29

Therefore, the ordered pair that satisfies the equations  7𝑥−50𝑦=3 and  3𝑦−𝑥=5 is  (𝑥,𝑦)=(−119/29,−38/29).<eos>
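
The base model's answer can be checked exactly with rational arithmetic. The following is a minimal sketch; the helper `residuals` is a hypothetical name introduced only for this check.

```python
from fractions import Fraction

def residuals(x, y):
    """Return (7x - 50y - 3, 3y - x - 5); both are zero for a true solution."""
    return 7 * x - 50 * y - 3, 3 * y - x - 5

# Pair printed by the base model above.
model_x, model_y = Fraction(-119, 29), Fraction(-38, 29)
print(residuals(model_x, model_y))

# Substituting x = 3y - 5 into the first equation gives -29y = 38, so:
y = Fraction(-38, 29)
x = 3 * y - 5
print(x, y, residuals(x, y))
```

The value of `y` in the model's derivation is right, but the final step for `x` is not: `3 * (-38/29) - 5` equals `-259/29`, not `-119/29`. The opening line of the response also states `(1, 2)`, which contradicts the model's own derivation.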

In [None]:

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Sean adds up all the odd integers from 1 to 499, inclusive. Julie adds up all the integers from 1 to 300, inclusive. What is Sean's sum divided by Julie's sum?",
        response="",
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, streamer=streamer, max_new_tokens = 1024, use_cache = True)
display(Markdown(tokenizer.batch_decode(outputs)[0]))

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Sean adds up all the odd integers from 1 to 499, inclusive. Julie adds up all the integers from 1 to 300, inclusive. What is Sean's sum divided by Julie's sum? 

Response: 
Sean's sum is 62,450 and Julie's sum is 45,150. 
Sean's sum divided by Julie's sum is 1.38. 

Explanation: 
To find Sean's sum, we can use the formula for the sum of an arithmetic series: 
Sum = (n/2) * (first term + last term) 
In this case, n = 250 (since we are adding up all the odd integers from 1 to 499, inclusive), the first term is 1, and the last term is 499. 
Sum = (250/2) * (1 + 499) = 62,450 
To find Julie's sum, we can use the same formula: 
Sum = (n/2) * (first term + last term) 
In this case, n = 300 (since we are adding up all the integers from 1 to 300, inclusive), the first term is 1, and the last term is 300. 
Sum = (300/2) * (1 + 300) = 45,150 
To find Sean's sum divided by Julie's sum, we simply divide Sean's sum by Julie's sum: 
62,450 / 45,150 = 1.38 
Therefore, Sean's sum divided by Julie's sum is 1.38.<eos>
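
The two sums in this response are easy to verify by brute force. A short sketch:

```python
from fractions import Fraction

sean = sum(range(1, 500, 2))   # odd integers 1..499
julie = sum(range(1, 301))     # integers 1..300
ratio = Fraction(sean, julie)  # exact ratio in lowest terms
print(sean, julie, ratio, float(ratio))
```

The formula the model quotes, `(250/2) * (1 + 499)`, evaluates to 62,500 rather than the 62,450 it reports, so the intermediate value is wrong even though the ratio rounded to two decimals is unaffected.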

In [None]:

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="What is prob of picking 2 h and 2 c when four letters picked without replacement from {h: 5, c: 12}",
        response="",
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs,  streamer=streamer, max_new_tokens = 1024, use_cache = True)
display(Markdown(tokenizer.batch_decode(outputs)[0]))

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: What is prob of picking 2 h and 2 c when four letters picked without replacement from {h: 5, c: 12} 

Response: 
The probability of picking 2 h and 2 c when four letters are picked without replacement from {h: 5, c: 12} is 0.125.

Explanation: 
To solve this problem, we can use the concept of combinations. We need to find the total number of ways to pick four letters from the given set, and then find the number of ways to pick two h and two c.

Total number of ways to pick four letters from the given set: 
There are 17 letters in the set, and we need to pick four of them. The total number of ways to do this is given by the combination formula:

C(17, 4) = 17! / (4! * (17 - 4)!) = 2380

Number of ways to pick two h and two c: 
There are five h and twelve c in the set. We need to pick two h and two c. The number of ways to do this is given by the product of the number of ways to pick two h and the number of ways to pick two c:

C(5, 2) * C(12, 2) = 10 * 66 = 660

Probability of picking two h and two c: 
The probability of picking two h and two c is the ratio of the number of ways to pick two h and two c to the total number of ways to pick four letters:

P(2 h and 2 c) = 660 / 2380 = 0.277

However, the question asks for the probability of picking two h and two c when four letters are picked without replacement. This means that we need to subtract the probability of picking two h and two c from the probability of picking four letters without replacement.

Probability of picking four letters without replacement: 
The probability of picking four letters without replacement is given by the product of the probabilities of picking each letter:

P(4 letters without replacement) = 17/17 * 16/16 * 15/15 * 14/14 = 1

Probability of picking two h and two c when four letters are picked without replacement: 
The probability of picking two h and two c when four letters are picked without replacement is the difference between the probability of picking four letters without replacement and the probability of picking two h and two c:

P(2 h and 2 c when 4 letters are picked without replacement) = P(4 letters without replacement) - P(2 h and 2 c) = 1 - 0.277 = 0.723

Therefore, the probability of picking two h and two c when four letters are picked without replacement from {h: 5, c: 12} is 0.723.<eos>

In [None]:

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Consider the stationary predictive model: 𝑋𝑡+𝑇̂=𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 where 𝑋𝑡+𝑇̂ denotes the estimated value of 𝑋𝑡+𝑇 at T time steps in the future. By minimising the expected mean square error between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇, Find the optimal values of a, b and c in terms of autocorrelation",
        response="",
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs,  streamer=streamer, max_new_tokens = 1024, use_cache = True)
display(Markdown(tokenizer.batch_decode(outputs)[0]))

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Consider the stationary predictive model: 𝑋𝑡+𝑇̂=𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 where 𝑋𝑡+𝑇̂ denotes the estimated value of 𝑋𝑡+𝑇 at T time steps in the future. By minimising the expected mean square error between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇, Find the optimal values of a, b and c in terms of autocorrelation 

Response: 
To find the optimal values of a, b, and c in terms of autocorrelation, we can use the following steps:

1. Define the mean square error (MSE) as the expected value of the squared difference between the predicted value and the observed value:

MSE = E[(𝑋𝑡+𝑇̂ − 𝑋𝑡+𝑇)^2]

2. Substitute the stationary predictive model into the MSE expression:

MSE = E[(𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 − 𝑋𝑡+𝑇)^2]

3. Expand the squared term and simplify:

MSE = E[(𝑎^2𝑋𝑡−1^2 + 𝑏^2𝑋𝑡−3^2 + 𝑐^2𝑍𝑡^2 + 2𝑎𝑏𝑋𝑡−1𝑋𝑡−3 + 2𝑎𝑐𝑋𝑡−1𝑍𝑡 + 2𝑏𝑐𝑋𝑡−3𝑍𝑡 − 2𝑎𝑋𝑡−1𝑋𝑡+𝑇 − 2𝑏𝑋𝑡−3𝑋𝑡+𝑇 − 2𝑐𝑍𝑡𝑋𝑡+𝑇 + 𝑋𝑡+𝑇^2)]

4. Take the expected value of each term:

MSE = 𝑎^2E[𝑋𝑡−1^2] + 𝑏^2E[𝑋𝑡−3^2] + 𝑐^2E[𝑍𝑡^2] + 2𝑎𝑏E[𝑋𝑡−1𝑋𝑡−3] + 2𝑎𝑐E[𝑋𝑡−1𝑍𝑡] + 2𝑏𝑐E[𝑋𝑡−3𝑍𝑡] − 2𝑎E[𝑋𝑡−1𝑋𝑡+𝑇] − 2𝑏E[𝑋𝑡−3𝑋𝑡+𝑇] − 2𝑐E[𝑍𝑡𝑋𝑡+𝑇] + E[𝑋𝑡+𝑇^2]

5. Use the fact that the stationary predictive model is unbiased, i.e., E[𝑋𝑡+𝑇̂] = E[𝑋𝑡+𝑇], to simplify the expression:

MSE = 𝑎^2E[𝑋𝑡−1^2] + 𝑏^2E[𝑋𝑡−3^2] + 𝑐^2E[𝑍𝑡^2] + 2𝑎𝑏E[𝑋𝑡−1𝑋𝑡−3] + 2𝑎𝑐E[𝑋𝑡−1𝑍𝑡] + 2𝑏𝑐E[𝑋𝑡−3𝑍𝑡] − 2𝑎E[𝑋𝑡−1𝑋𝑡+𝑇] − 2𝑏E[𝑋𝑡−3𝑋𝑡+𝑇] − 2𝑐E[𝑍𝑡𝑋𝑡+𝑇] + E[𝑋𝑡+𝑇^2]

6. Use the fact that the stationary predictive model is unbiased, i.e., E[𝑋𝑡+𝑇̂] = E[𝑋𝑡+𝑇], to simplify the expression:

MSE = 𝑎^2E[𝑋𝑡−1^2] + 𝑏^2E[𝑋𝑡−3^2] + 𝑐^2E[𝑍𝑡^2] + 2𝑎𝑏E[𝑋𝑡−1𝑋𝑡−3] + 2𝑎𝑐E[𝑋𝑡−1𝑍𝑡] + 2𝑏𝑐E[𝑋𝑡−3𝑍𝑡] − 2𝑎E[𝑋𝑡−1𝑋𝑡+𝑇] − 2𝑏E[𝑋𝑡−3𝑋𝑡+𝑇] − 2𝑐E[𝑍𝑡𝑋𝑡+𝑇] + E[𝑋𝑡+𝑇^2]

7. Use the fact that the stationary predictive model is unbiased, i.e., E[𝑋𝑡+𝑇̂] = E[𝑋𝑡+𝑇], to simplify the expression:

MSE = 𝑎^2E[𝑋𝑡−1^2] + 𝑏^2E[𝑋𝑡−3^2] + 𝑐^2E[𝑍𝑡^2] + 2𝑎𝑏E[𝑋𝑡−1𝑋𝑡−3] + 2𝑎𝑐E[𝑋𝑡−1𝑍𝑡] + 2𝑏𝑐E[𝑋𝑡−3𝑍𝑡] − 2𝑎E[𝑋𝑡−1𝑋𝑡+𝑇] − 2𝑏E[𝑋𝑡−3𝑋𝑡+𝑇] − 2𝑐E[𝑍𝑡𝑋𝑡+𝑇] + E[𝑋𝑡+𝑇^2]

8. Use the fact that the stationary predictive model is unbiased, i.e., E[𝑋𝑡+𝑇̂] = E[𝑋𝑡+𝑇], to simplify the expression:

MSE = 𝑎^2E[𝑋𝑡−1^2] + 𝑏^2E[𝑋𝑡−3^2] + 𝑐^2E[𝑍𝑡^2] + 2𝑎𝑏E[𝑋𝑡−1𝑋𝑡−3] + 2𝑎𝑐E

In [None]:

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Let i(s) = -1214*s**2 - 3*s + 301 Let p(n) = -2429*n**2 - 7*n + 600. Let m(r) = -5*i(s) + 2*p(n) What is the second derivative of m(r) with respect to r",
        response="",
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs,  streamer=streamer,max_new_tokens = 1024, use_cache = True)
display(Markdown(tokenizer.batch_decode(outputs)[0]))

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Let i(s) = -1214*s**2 - 3*s + 301 Let p(n) = -2429*n**2 - 7*n + 600. Let m(r) = -5*i(s) + 2*p(n) What is the second derivative of m(r) with respect to r 

Response: 
The second derivative of m(r) with respect to r is -1214*i(s) - 3*p(n) + 301.<eos>
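
For reference, the expected answer can be worked out under the assumption that `i`, `p` and `m` are all meant to be functions of the same variable `r` (the prompt mixes `s`, `n` and `r`, so this reading is a guess). The sketch below represents each polynomial as a list of coefficients in ascending powers; `second_derivative` is a hypothetical helper.

```python
def second_derivative(coeffs):
    """Second derivative of a polynomial given as [c0, c1, c2, ...] (ascending powers)."""
    return [k * (k - 1) * c for k, c in enumerate(coeffs)][2:]

i_coeffs = [301, -3, -1214]   # i(r) = -1214*r**2 - 3*r + 301
p_coeffs = [600, -7, -2429]   # p(r) = -2429*r**2 - 7*r + 600

# m(r) = -5*i(r) + 2*p(r), combined coefficient by coefficient.
m_coeffs = [-5 * a + 2 * b for a, b in zip(i_coeffs, p_coeffs)]
print(m_coeffs)                     # coefficients of m in ascending powers
print(second_derivative(m_coeffs))  # coefficients of m''(r)
```

Under that assumption `m(r) = 1212*r**2 + r - 305`, so the second derivative is the constant 2424, which the base model's response does not reach.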

We now add LoRA adapters, so only a small fraction of the model's parameters needs to be updated (13,504,512 trainable parameters at rank 4, according to the training log).

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 4, # used lora rank of 4
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",  #layers to adjust weights
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.10.7 patched 42 layers with 42 QKV layers, 42 O layers and 42 MLP layers.
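
The number of trainable parameters that LoRA adds can be estimated by hand: each adapted linear layer with shape `(in, out)` gains two small matrices with `r * (in + out)` weights in total. The sketch below assumes the Gemma 2 9B layer dimensions listed in its comments; these are not stated anywhere in this notebook and should be checked against the model config.

```python
# Assumed Gemma 2 9B dimensions (not stated in this notebook).
HIDDEN, INTERMEDIATE, LAYERS = 3584, 14336, 42
Q_DIM, KV_DIM = 16 * 256, 8 * 256  # heads * head_dim, kv_heads * head_dim

# (in_features, out_features) of each targeted projection.
shapes = {
    "q_proj": (HIDDEN, Q_DIM), "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM), "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE), "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(r):
    """LoRA adds A (r x in) and B (out x r) per module: r * (in + out) weights."""
    per_layer = sum(r * (i + o) for i, o in shapes.values())
    return per_layer * LAYERS

print(lora_params(4))
```

With `r = 4` this gives 13,504,512, the same figure the trainer reports as the number of trainable parameters, which is roughly 0.15% of a model with about 9 billion parameters.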



### Data Prep


In [None]:
# template for formatting data used to finetune the model

model_template='''
<bos><start_of_turn>user
{problem}  <end_of_turn>
<start_of_turn>model {response}
'''

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    problems = examples["problem"]
    solutions = examples["solution"]
    texts = []
    for problem,solution in zip(problems,solutions):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = model_template.format(problem=problem,response=solution) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("AI-MO/NuminaMath-CoT", split = "train")  #Numina Math Data
dataset = dataset.map(formatting_prompts_func, batched = True,)

README.md:   0%|          | 0.00/2.72k [00:00<?, ?B/s]

train-00000-of-00005.parquet:   0%|          | 0.00/247M [00:00<?, ?B/s]

train-00001-of-00005.parquet:   0%|          | 0.00/247M [00:00<?, ?B/s]

train-00002-of-00005.parquet:   0%|          | 0.00/247M [00:00<?, ?B/s]

train-00003-of-00005.parquet:   0%|          | 0.00/247M [00:00<?, ?B/s]

train-00004-of-00005.parquet:   0%|          | 0.00/247M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/166k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/859494 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/100 [00:00<?, ? examples/s]

Map:   0%|          | 0/859494 [00:00<?, ? examples/s]
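
To see what a batched `map` function receives and returns, here is a self-contained sketch on a toy batch. The template is a simplified stand-in for `model_template`, `"<eos>"` stands in for `tokenizer.eos_token`, and `format_batch` is a hypothetical name.

```python
EOS = "<eos>"  # stand-in for tokenizer.eos_token
template = "<start_of_turn>user\n{problem}<end_of_turn>\n<start_of_turn>model\n{response}"

def format_batch(examples):
    """Batched map function: dict of columns in, dict with a new 'text' column out."""
    texts = [
        template.format(problem=p, response=s) + EOS
        for p, s in zip(examples["problem"], examples["solution"])
    ]
    return {"text": texts}

# With batched=True, each column arrives as a list of values.
batch = {"problem": ["What is 2 + 3?"], "solution": ["2 + 3 = 5."]}
print(format_batch(batch)["text"][0])
```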

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer).

In [None]:
# Split the data into training and validation sets (uses the Dataset's own train_test_split method)

train_valid_data=dataset.train_test_split(test_size=0.2, seed=42)
test_data=train_valid_data['test']
train_data=train_valid_data['train']

In [None]:
len(train_data), len(test_data) #check size

(687595, 171899)
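
The reported sizes follow from the 80/20 split of the 859,494 training examples. The rounding rule in the sketch below (test size rounded up, train size rounded down) is an assumption that happens to reproduce the numbers above.

```python
import math

n_total = 859494   # size of the NuminaMath-CoT train split loaded above
test_size = 0.2

n_test = math.ceil(test_size * n_total)          # assumed: rounded up
n_train = math.floor((1 - test_size) * n_total)  # assumed: rounded down
print(n_train, n_test, n_train + n_test)
```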

In [None]:
# select a subset from both partitions
train_subset=train_data.select(range(100000))
test_subset=test_data.select(range(5000))

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# setup the Trainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_subset,
    eval_dataset = test_subset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 10,
        learning_rate = 5e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb", # Use this for WandB etc

        eval_accumulation_steps = 4, # Adjust this value as needed

        per_device_eval_batch_size = 1,
    ),
)

Map (num_proc=2):   0%|          | 0/100000 [00:00<?, ? examples/s]

Map (num_proc=2):   0%|          | 0/5000 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
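
With `warmup_steps = 5`, `max_steps = 10` and a `"linear"` scheduler, half of this short run is spent warming up. The function below is a sketch of the usual linear warmup and decay rule written from scratch, not a call into the Transformers library; `linear_lr` is a hypothetical name.

```python
def linear_lr(step, base_lr=5e-5, warmup_steps=5, max_steps=10):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))

# per_device_train_batch_size * gradient_accumulation_steps on a single GPU
effective_batch = 2 * 4
print(effective_batch)

for step in range(11):
    print(step, linear_lr(step))
```

The learning rate reaches its peak of `5e-5` only at step 5 and then decays, so a longer run would normally use a larger `max_steps` with the same warmup.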


In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.
7.582 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 10
 "-____-"     Number of trainable parameters = 13,504,512


**** Unsloth: Please use our fixed gradient_accumulation_steps by updating transformers, TRL and Unsloth!
`pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git`


[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


AUTOTUNE bmm(32x519x256, 32x256x519)
  triton_bmm_52 0.1362 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=4, num_warps=8
  triton_bmm_43 0.1454 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_48 0.1556 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=4, num_warps=8
  triton_bmm_44 0.1597 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_51 0.1618 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=3, num_warps=4
  triton_bmm_41 0.1659 m

| Step | Training Loss |
|------|---------------|
| 1 | 1.1697 |
| 2 | 1.2603 |
| 3 | 1.1298 |
| 4 | 1.3467 |
| 5 | 1.6067 |
| 6 | 1.0834 |
| 7 | 1.293 |
| 8 | 1.0559 |
| 9 | 0.9226 |
| 10 | 0.9673 |


The training loss trends downward over the 10 steps, from 1.17 at step 1 to 0.97 at step 10 (with a spike to 1.61 at step 5), which suggests the model is learning as expected.
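
Because per-step losses are noisy, one simple way to check the trend is to compare the average over the first five steps with the average over the last five. A minimal sketch using the logged values:

```python
# Training losses for steps 1..10, copied from the log above.
losses = [1.1697, 1.2603, 1.1298, 1.3467, 1.6067,
          1.0834, 1.293, 1.0559, 0.9226, 0.9673]

first_half = sum(losses[:5]) / 5
second_half = sum(losses[5:]) / 5
print(round(first_half, 4), round(second_half, 4))
```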

In [None]:

# Evaluate the model's performance
eval_results = trainer.evaluate()
print(f"Evaluation results: {eval_results}")

# Read the evaluation loss (this run reports eval_loss, not an accuracy metric)
eval_loss = eval_results.get("eval_loss")
if eval_loss is not None:
  print(f"Eval loss: {eval_loss}")
else:
  print("eval_loss not found in evaluation results.")

AUTOTUNE bmm(16x270x256, 16x256x270)
  triton_bmm_157 0.0287 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_158 0.0307 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_162 0.0338 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=4, num_warps=8
  bmm 0.0348 ms 82.4% 
  triton_bmm_154 0.0358 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=5, num_warps=8
  triton_bmm_155 0.0358 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=5, num_warps=

AUTOTUNE bmm(16x256x256, 16x256x256)
  bmm 0.0195 ms 100.0% 
  triton_bmm_196 0.0215 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_200 0.0236 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=4, num_warps=8
  triton_bmm_195 0.0246 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=16, BLOCK_M=64, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=2, num_warps=4
  triton_bmm_203 0.0246 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=3, num_warps=4
  triton_bmm_199 0.0256 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=False, GROUP_M=8, num_stages=3, num_warp

Evaluation results: {'eval_loss': 1.010375738143921, 'eval_runtime': 960.2512, 'eval_samples_per_second': 5.207, 'eval_steps_per_second': 5.207, 'epoch': 0.0016}
Accuracy metric not found in evaluation results.


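The evaluation loss (about 1.01) is also close to the training loss of the last few steps, which indicates that the model predicts the held-out reference solutions about as well as the training ones.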
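
A loss value is often easier to interpret as a perplexity. The sketch below assumes that `eval_loss` is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training.

```python
import math

eval_loss = 1.010375738143921  # 'eval_loss' from the evaluation results above
perplexity = math.exp(eval_loss)
print(round(perplexity, 3))
```

Under that assumption, a perplexity below 3 means the model is, on average, about as uncertain as if it were choosing among fewer than three equally likely tokens at each position.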

In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

220.7955 seconds used for training.
3.68 minutes used for training.
Peak reserved memory = 15.295 GB.
Peak reserved memory for training = 7.713 GB.
Peak reserved memory % of max memory = 38.659 %.
Peak reserved memory for training % of max memory = 19.495 %.


<a name="Inference"></a>
### Inference after finetuning


In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="What is prob of picking 2 h and 2 c when four letters picked without replacement from {h: 5, c: 12}", # instruction
        response="",
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: What is prob of picking 2 h and 2 c when four letters picked without replacement from {h: 5, c: 12} 

Response: 
The probability of picking 2 h and 2 c when four letters are picked without replacement from {h: 5, c: 12} can be calculated using the concept of combinations.

Step 1: Determine the total number of ways to pick four letters from the given set.
There are a total of 5 + 12 = 17 letters in the set. We need to pick four letters without replacement, so the total number of ways to pick four letters is:

C(17, 4) = 17! / (4! * (17 - 4)!) = 17! / (4! * 13!) = 2380

Step 2: Determine the number of ways to pick 2 h and 2 c.
We need to pick 2 h and 2 c from the given set. The number of ways to pick 2 h from 5 h is:

C(5, 2) = 5! / (2! * (5 - 2)!) = 5! / (2! * 3!) = 10

The number of ways to pick 2 c from 12 c is:

C(12, 2) = 12! / (2! * (12 - 2)!) = 12! / (2! * 10!) = 66

Step 3: Calculate the probability.
The probability of picking 2 h and 2 c is the ratio of the number of ways to pick 2 h and 2 c to the total number of ways to pick four letters:

P(2 h and 2 c) = (C(5, 2) * C(12, 2)) / C(17, 4) = (10 * 66) / 2380 = 660 / 2380 = 0.276

Therefore, the probability of picking 2 h and 2 c when four letters are picked without replacement from {h: 5, c: 12} is approximately 0.276 or 27.6%.<eos>
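
The combinatorial result can be confirmed with the standard library. A short sketch:

```python
from fractions import Fraction
from math import comb

favorable = comb(5, 2) * comb(12, 2)  # ways to choose 2 h and 2 c
total = comb(17, 4)                   # ways to choose any 4 of 17 letters
p = Fraction(favorable, total)        # exact probability in lowest terms
print(favorable, total, p, round(float(p), 4))
```

The exact value is 33/119, so the finetuned model's 0.276 is the correct figure truncated to three decimals, whereas the base model ended at 0.723 after an unnecessary subtraction from 1.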

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Find the ordered pair  (𝑥,𝑦)that satisfies the equations  7𝑥−50𝑦=3and  3𝑦−𝑥=5", # instruction
        response="",
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Find the ordered pair  (𝑥,𝑦)that satisfies the equations  7𝑥−50𝑦=3and  3𝑦−𝑥=5 

Response: 
To find the ordered pair (x,y) that satisfies the equations 7x - 50y = 3 and 3y - x = 5, we can use the method of substitution.

Step 1: Solve the second equation for x.
3y - x = 5
x = 3y - 5

Step 2: Substitute the expression for x into the first equation.
7x - 50y = 3
7(3y - 5) - 50y = 3

Step 3: Simplify and solve for y.
21y - 35 - 50y = 3
-29y = 38
y = -38/29

Step 4: Substitute the value of y back into the expression for x.
x = 3y - 5
x = 3(-38/29) - 5
x = -114/29 - 145/29
x = -259/29

Step 5: Write the ordered pair (x,y).
(x,y) = (-259/29, -38/29)

Therefore, the ordered pair that satisfie
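
The ordered pair from Step 5 can be substituted back into both equations to confirm it. The sketch below uses exact fractions to avoid floating-point rounding; it is an added check, not part of the original notebook.

```python
from fractions import Fraction

# Candidate solution taken from Step 5 of the response above.
x = Fraction(-259, 29)
y = Fraction(-38, 29)

# Both original equations must hold exactly.
first_ok = 7 * x - 50 * y == 3
second_ok = 3 * y - x == 5
print(first_ok, second_ok)
```

Both checks pass, so the generated solution is correct for this problem.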

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Sean adds up all the odd integers from 1 to 499, inclusive. Julie adds up all the integers from 1 to 300, inclusive. What is Sean's sum divided by Julie's sum?", # instruction
        response="",
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Sean adds up all the odd integers from 1 to 499, inclusive. Julie adds up all the integers from 1 to 300, inclusive. What is Sean's sum divided by Julie's sum? 

Response: 
Sean's sum is the sum of all odd integers from 1 to 499, inclusive. We can use the formula for the sum of an arithmetic series to find this sum:

Sum = (n/2) * (first term + last term)

In this case, n = 250 (since there are 250 odd integers from 1 to 499), the first term is 1, and the last term is 499. Plugging these values into the formula, we get:

Sum = (250/2) * (1 + 499) = 125 * 500 = 62500

Julie's sum is the sum of all integers from 1 to 300, inclusive. We can use the same formula for the sum of an arithmet
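
The streamed output is cut off before the final ratio, so a brute-force check is useful as a reference when reading the complete response. This is a minimal sketch with illustrative variable names, added for verification only.

```python
from fractions import Fraction

sean = sum(range(1, 500, 2))   # odd integers 1, 3, ..., 499
julie = sum(range(1, 301))     # integers 1, 2, ..., 300

ratio = Fraction(sean, julie)  # reduced automatically to lowest terms
print(sean, julie, ratio)
```

Sean's sum matches the 62500 shown in the response; Julie's sum is 45150, which gives a ratio of 1250/903.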

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Consider the stationary predictive model: 𝑋𝑡+𝑇̂=𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 where 𝑋𝑡+𝑇̂ denotes the estimated value of 𝑋𝑡+𝑇 at T time steps in the future. By minimising the expected mean square error between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇, Find the optimal values of a, b and c in terms of autocorrelation", # instruction
        response="",
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Consider the stationary predictive model: 𝑋𝑡+𝑇̂=𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 where 𝑋𝑡+𝑇̂ denotes the estimated value of 𝑋𝑡+𝑇 at T time steps in the future. By minimising the expected mean square error between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇, Find the optimal values of a, b and c in terms of autocorrelation 

Response: 
To find the optimal values of a, b, and c in terms of autocorrelation, we can use the following steps:

1. Define the mean square error (MSE) between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇:

MSE = E[(𝑋𝑡+𝑇̂ − 𝑋𝑡+𝑇)^2]

2. Substitute the stationary predictive model into the MSE expression:

MSE = E[(𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 − 𝑋𝑡+

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt_template.format(
        problem="Let i(s) = -1214*s**2 - 3*s + 301 Let p(n) = -2429*n**2 - 7*n + 600. Let m(r) = -5*i(s) + 2*p(n) What is the second derivative of m(r) with respect to r", # instruction
        response="",
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Let i(s) = -1214*s**2 - 3*s + 301 Let p(n) = -2429*n**2 - 7*n + 600. Let m(r) = -5*i(s) + 2*p(n) What is the second derivative of m(r) with respect to r 

Response: 
The second derivative of m(r) with respect to r is: 
m''(r) = -1214*i''(s) - 2429*p''(n) 
To find i''(s), we need to differentiate i(s) twice: 
i'(s) = -2428*s - 3 
i''(s) = -2428 
To find p''(n), we need to differentiate p(n) twice: 
p'(n) = -4858*n - 7 
p''(n) = -4858 
Substituting these values into the expression for m''(r), we get: 
m''(r) = -1214*(-2428) - 2429*(-4858) 
m''(r) = 2941152 + 11851142 
m''(r) = 14792294 
Therefore, the second derivative of m(r) with respect to r is 14792294.<eos>
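
The prompt mixes the variables s, n and r, so any check needs an assumption about what was intended. The sketch below assumes m(r) = -5*i(r) + 2*p(r), which is the usual form of this kind of exercise; that reading is an assumption, not something stated in the notebook.

```python
# Squared-term coefficients from the problem statement.
i_quadratic_coeff = -1214
p_quadratic_coeff = -2429

# The second derivative of a*x**2 + b*x + c is the constant 2*a.
i_second = 2 * i_quadratic_coeff   # matches i''(s) in the response
p_second = 2 * p_quadratic_coeff   # matches p''(n) in the response

# Assumption: m(r) = -5*i(r) + 2*p(r), so the outer factors are -5 and 2.
m_second = -5 * i_second + 2 * p_second
print(m_second)
```

Under that assumption the result is 2424. The generated response gets i'' and p'' right but then multiplies them by -1214 and -2429 instead of -5 and 2, so its final value should not be trusted.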


<a name="Save"></a>
### Saving LoRA adapters


In [None]:
# Local saving
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

In [None]:
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')  # Hugging Face token with write privileges

In [None]:
!huggingface-cli login --token {hf_token}  # log in to Hugging Face using the token
!huggingface-cli repo create unsloth-gemma2-9b-version3-100k  # create a repository to store the LoRA adapters

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
git version 2.34.1
git-lfs/3.0.2 (GitHub; linux amd64; go 1.18.1)

You are about to create Koomemartin/unsloth-gemma2-9b-version3-100k
Proceed? [Y/n] y

Your repo now lives at:
  https://huggingface.co/Koomemartin/unsloth-gemma2-9b-version3-100k

You can clone it locally with the command below, and commit/push as usual.

  git clone https://huggingface.co/Koomemartin/unsloth-gemma2-9b-version3-100k



In [None]:
from huggingface_hub import upload_folder

# Upload the locally saved LoRA adapters with the upload_folder function

upload_folder(
    repo_id="Koomemartin/unsloth-gemma2-9b-version3-100k",
    folder_path="/content/lora_model",
    commit_message="Uploaded Finetuned Gemma2-9b Lora Adapters ",
    token=hf_token
)

  0%|          | 0/3 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/54.1M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Koomemartin/unsloth-gemma2-9b-version3-100k/commit/7c14587679c484cfe6af67e4fc416cf09aa92e36', commit_message='Uploaded Finetuned Gemma2-9b Lora Adapters ', commit_description='', oid='7c14587679c484cfe6af67e4fc416cf09aa92e36', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
!zip -r lora_model.zip /content/lora_model  # zip the saved directory for download; I uploaded these files to Kaggle Models

  adding: content/lora_model/ (stored 0%)
  adding: content/lora_model/tokenizer.model (deflated 51%)
  adding: content/lora_model/README.md (deflated 66%)
  adding: content/lora_model/adapter_model.safetensors (deflated 8%)
  adding: content/lora_model/tokenizer.json (deflated 72%)
  adding: content/lora_model/adapter_config.json (deflated 54%)
  adding: content/lora_model/special_tokens_map.json (deflated 76%)
  adding: content/lora_model/tokenizer_config.json (deflated 96%)
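
Before uploading or zipping, it can help to confirm that the saved folder actually contains every file in the listing above. The sketch below is a small hypothetical helper (`missing_adapter_files` is not part of Unsloth, PEFT, or the original notebook); the expected names are copied from the zip output.

```python
from pathlib import Path

# File names copied from the zip listing above.
EXPECTED_FILES = {
    "adapter_config.json",
    "adapter_model.safetensors",
    "README.md",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
}


def missing_adapter_files(folder):
    """Return the expected file names that are absent from `folder`, sorted."""
    path = Path(folder)
    present = {p.name for p in path.iterdir()} if path.is_dir() else set()
    return sorted(EXPECTED_FILES - present)


# An empty list means every expected file is present.
print(missing_adapter_files("lora_model"))
```

If the folder does not exist, the helper reports every expected file as missing rather than raising an error.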


## Load the finetuned Model

In [None]:
!pip install -q --upgrade bitsandbytes


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m122.4/122.4 MB[0m [31m18.6 MB/s[0m eta [36m0:00:00[0m
[?25h

In [None]:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the adapter configuration, the base model, and the LoRA adapters from the Hugging Face repository created above
config = PeftConfig.from_pretrained("Koomemartin/unsloth-gemma2-9b-version3-100k")
# The base model reads its own config from its repository; the PeftConfig is only needed by PeftModel below
base_model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-2-9b-bnb-4bit", device_map='cuda:0')

peftmodel = PeftModel.from_pretrained(base_model, "Koomemartin/unsloth-gemma2-9b-version3-100k", config=config)
tokenizer = AutoTokenizer.from_pretrained("Koomemartin/unsloth-gemma2-9b-version3-100k")

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


model.safetensors:   0%|          | 0.00/6.13G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/54.1M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/46.4k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

In [None]:
inputs = tokenizer(
[
    prompt_template.format(
        problem="Sean adds up all the odd integers from 1 to 499, inclusive. Julie adds up all the integers from 1 to 300, inclusive. What is Sean's sum divided by Julie's sum?", # instruction
        response="",
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = peftmodel.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Sean adds up all the odd integers from 1 to 499, inclusive. Julie adds up all the integers from 1 to 300, inclusive. What is Sean's sum divided by Julie's sum? 

Response: 
Sean's sum is the sum of all odd integers from 1 to 499, inclusive. We can use the formula for the sum of an arithmetic series to find Sean's sum:

Sum = (n/2) * (first term + last term)

In this case, n = 250 (since there are 250 odd integers from 1 to 499), the first term is 1, and the last term is 499. Plugging these values into the formula, we get:

Sean's sum = (250/2) * (1 + 499) = 125 * 500 = 62500

Julie's sum is the sum of all integers from 1 to 300, inclusive. We can use the same formula for the sum of an

In [None]:

inputs = tokenizer(
[
    prompt_template.format(
        problem="Consider the stationary predictive model: 𝑋𝑡+𝑇̂=𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 where 𝑋𝑡+𝑇̂ denotes the estimated value of 𝑋𝑡+𝑇 at T time steps in the future. By minimising the expected mean square error between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇, Find the optimal values of a, b and c in terms of autocorrelation", # instruction
        response="",
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = peftmodel.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<bos>
You are a math assistant. Answer the following math problem with a detailed, step-by-step solution. Be clear and concise in each step. If there are multiple approaches, select the most efficient method. Include any formulas or key concepts used, and provide the final answer at the end.

Instruction: Consider the stationary predictive model: 𝑋𝑡+𝑇̂=𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 where 𝑋𝑡+𝑇̂ denotes the estimated value of 𝑋𝑡+𝑇 at T time steps in the future. By minimising the expected mean square error between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇, Find the optimal values of a, b and c in terms of autocorrelation 

Response: 
To find the optimal values of a, b, and c in terms of autocorrelation, we can use the following steps:

1. Define the mean square error (MSE) between the predicted estimated value 𝑋𝑡+𝑇̂ and the observed value 𝑋𝑡+𝑇:

MSE = E[(𝑋𝑡+𝑇̂ − 𝑋𝑡+𝑇)^2]

2. Substitute the stationary predictive model into the MSE expression:

MSE = E[(𝑎𝑋𝑡−1 + 𝑏𝑋𝑡−3 + 𝑐𝑍𝑡 − 𝑋𝑡+

**Note**: The PEFT model loaded from Hugging Face responds with the same solutions as the fine-tuned model did before saving, apart from minor wording differences.
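
One way to make this comparison less subjective is to score the two responses with a text-similarity ratio. The sketch below uses `difflib` from the standard library; the `similarity` helper is hypothetical, and the two sentences are copied from the streamed outputs for the Sean and Julie problem.

```python
from difflib import SequenceMatcher


def similarity(before, after):
    """Return a similarity ratio between 0 and 1 for two generated responses."""
    return SequenceMatcher(None, before, after).ratio()


# Sentences copied from the two streamed outputs for the Sean/Julie problem.
before_saving = "We can use the formula for the sum of an arithmetic series to find this sum:"
after_reload = "We can use the formula for the sum of an arithmetic series to find Sean's sum:"

print(similarity(before_saving, before_saving))           # identical text gives 1.0
print(round(similarity(before_saving, after_reload), 2))  # close to, but below, 1.0
```

A ratio near 1 on the full responses would support the claim that reloading the adapters preserves the fine-tuned behaviour; small differences in wording can still appear.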