## Tech Challenge - Phase 3

This LLM answers questions about Amazon products. It was fine-tuned on product titles and descriptions taken from the AmazonTitles-1.3MM dataset.

The model is built on DeepSeek-R1-Distill-Llama-8B, a pre-trained language model with 8 billion parameters. The dataset was preprocessed so that the model can understand the questions and answer them coherently.

The training data has the following format:

```
{
    "Question": "User question",
    "Complex_CoT": "Model reasoning",
    "Response": "Model answer"
}
```
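As a quick sanity check, a record in this format can be validated before it is used for training. The sketch below is illustrative only: `REQUIRED_FIELDS` and `is_valid_record` are hypothetical names that are not part of the original notebook.

```python
import json

# Field names taken from the training-data format shown above.
REQUIRED_FIELDS = ("Question", "Complex_CoT", "Response")


def is_valid_record(record: dict) -> bool:
    """Return True if every required field is present and is a non-empty string."""
    return all(
        isinstance(record.get(field), str) and bool(record[field].strip())
        for field in REQUIRED_FIELDS
    )


sample = json.loads(
    '{"Question": "User question", '
    '"Complex_CoT": "Model reasoning", '
    '"Response": "Model answer"}'
)
print(is_valid_record(sample))
```

A record that is missing any of the three fields, or that has an empty string in one of them, is rejected.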

After training, when asked about a product, the model should answer clearly and coherently, reasoning step by step to reach its answer.


## Libraries used in the project

- `unsloth`: Tool for fine-tuning the LLM. What was used:
    - `FastLanguageModel`: module that optimizes inference and fine-tuning
    - `get_peft_model`: enables LoRA (Low-Rank Adaptation) fine-tuning
- `peft`: LoRA-based fine-tuning support for LLMs.
- Hugging Face modules:
    - `transformers`: used for handling the fine-tuning data and other model tasks
    - `trl`: enables Supervised Fine-Tuning
    - `datasets`: downloads datasets available on the Hugging Face Hub; here, a dataset created specifically for this model was used: https://huggingface.co/datasets/rickwalking/amazon-titles-reasoning
- `torch`: Deep learning framework used for training
- `wandb`: Tracks the fine-tuning run

## Installing the project dependencies

In [None]:
%%capture

!pip install unsloth # install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git # install the latest version of Unsloth

## Importing the modules needed for fine-tuning

In [None]:
# Modules for fine-tuning
from unsloth import FastLanguageModel
import torch # Import PyTorch
from trl import SFTTrainer # Trainer for supervised fine-tuning (SFT)
from unsloth import is_bfloat16_supported # Checks if the hardware supports bfloat16 precision
# Hugging Face modules
from huggingface_hub import login # Lets you login to API
from transformers import TrainingArguments # Defines training hyperparameters
from datasets import load_dataset # Lets you load fine-tuning datasets
# Import weights and biases
import wandb

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Checking CUDA availability
After installing the dependencies, check that CUDA is available.


In [None]:
print(torch.__version__)
print(torch.cuda.is_available())

2.6.0+cu124
True


## Loading DeepSeek R1 and the tokenizer

In this step, DeepSeek R1 and its tokenizer are downloaded using `FastLanguageModel.from_pretrained()`.

**Key parameters**
```py
max_seq_length = 2048  # Maximum number of tokens per input
dtype = None  # Default data type (auto-detect)
load_in_4bit = True  # Enables 4-bit quantization, a memory optimization
```

**About 4-bit quantization**

It is like compressing a high-resolution image to a smaller file: the result takes up much less space while looking almost the same. In the same way, 4-bit quantization shrinks the model's memory footprint, usually at the cost of only a small loss in quality and accuracy.
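To make the saving concrete, the following back-of-the-envelope sketch estimates the memory needed for the weights alone at 16-bit and 4-bit precision. It assumes a round 8 billion parameters and ignores activations, quantization constants, and any layers kept at higher precision, so real checkpoints are somewhat larger; `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # "8 billion parameters", as stated in the introduction

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{int4_gb:.0f} GB")
```

Under these assumptions the 16-bit weights alone would need roughly 16 GB, while the 4-bit weights need roughly 4 GB.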


In [None]:
# Model parameters
max_seq_length = 2048
dtype = None
load_in_4bit = True

# Load DeepSeek R1 and the tokenizer using unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

==((====))==  Unsloth 2025.3.18: Fast Llama patching. Transformers: 4.49.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/53.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

## Testing DeepSeek R1 with a customer question before fine-tuning


### System prompt
The system prompt is defined with placeholders for the question and for the model's response. It guides the model to "think" step by step and provide a correct answer.

In [None]:
# Define a system prompt under prompt_style
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a Amazon products expert with advanced knowledge in books, cellphones, eletronics, media, and more.
Please answer the customer question.

### Question:
{}

### Response:
<think>{}"""

### Running the model

In this step, we test the DeepSeek R1 model with a question from an Amazon customer and print the response.
The process has the following steps:

1. **Define the question**
2. **Format the question using the prompt template (`prompt_style`)**
3. **Tokenize the prompt and send it to the GPU (`cuda`)**
4. **Generate a response with the LLM**
5. **Decode the output**

In [None]:
# Ask a product-related question
question = """Is 'Worship with Don Moen' available on VHS??"""

FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!

# Format the question using the structured prompt (`prompt_style`), then tokenize it
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate the response
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

# Decode the output tokens into natural language
response = tokenizer.batch_decode(outputs)

# Print the model's response (the text after "### Response:")
print(response[0].split("### Response:")[1])


<think>
Okay, so I need to figure out if "Worship with Don Moen" is available on VHS. First, I should consider what VHS is. VHS, or Video Home System, was a popular format for video cassettes before DVD and Blu-ray. Many older movies and videos were released on VHS, but with the rise of digital formats, VHS has become less common.

Now, the question is about a specific product: "Worship with Don Moen." I don't recognize this title immediately, so I should think about what it might be. "Worship" suggests it's related to religious or spiritual content, possibly a video or a series of videos featuring Don Moen, who might be a musician, speaker, or religious leader.

I need to determine if such a product exists on VHS. Since VHS is a physical medium, I should check sources where VHS tapes are sold. Online marketplaces like eBay often have listings for VHS tapes. I can search there using the title "Worship with Don Moen" to see if any listings come up.

Alternatively, I can check if the pr

## Fine-tuning step by step

### Step 1: Update the system prompt
A small change to the prompt style is needed to process the dataset: the reasoning is now closed with `</think>`, and a third placeholder is added for the final response.

In [None]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a Amazon products expert with advanced knowledge in books, cellphones, eletronics, media, and more.
Please answer the customer question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""


### Step 2: Loading the dataset

This dataset is loaded to fine-tune the model.
It was built from the product titles and descriptions in AmazonTitles-1.3MM and adapted to the training format of this LLM, DeepSeek-R1-Distill-Llama-8B.

The questions, reasoning, and answers were generated with a script and a DeepSeek-specific prompt, producing a dataset of 2000 examples (the train split loaded below reports 1,107 rows).
The dataset is hosted on Hugging Face: https://huggingface.co/datasets/rickwalking/amazon-titles-reasoning

The dataset is loaded and inspected below:

In [None]:
dataset = load_dataset("rickwalking/amazon-titles-reasoning", "default", split="train[0:2000]", trust_remote_code=True)
dataset

Repo card metadata block was not found. Setting CardData to empty.


amazon_titles_reasoning.json:   0%|          | 0.00/785k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1107 [00:00<?, ? examples/s]

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 1107
})

In [None]:
# Show one record from the dataset
dataset[1]

{'Question': "What should I type into Amazon to find 'Rightly Dividing the Word' paperback edition?",
 'Complex_CoT': "To locate a specific book on Amazon, users typically enter the exact title. Since 'Rightly Dividing the Word' is a distinct title and refers to a paperback edition, typing this title into the search bar will effectively find the desired product. This method works because Amazon's search function prioritizes exact matches when searching by title.",
 'Response': "Type 'Rightly Dividing the Word' into the Amazon search bar."}

In [None]:
EOS_TOKEN = tokenizer.eos_token  # Define the EOS_TOKEN to append to the end of the formatted text
EOS_TOKEN

'<｜end▁of▁sentence｜>'

In [None]:
# Define the prompt-formatting function
def formatting_prompts_func(batch):  # Receives batches from the dataset
    questions = batch["Question"]    # Extract the questions
    cots = batch["Complex_CoT"]      # Extract the chain-of-thought reasoning
    responses = batch["Response"]    # Extract the model answers

    texts = []

    for question, cot, response in zip(questions, cots, responses):
        # Fill the three placeholders and mark the end of the example with EOS
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)

    return {
        "text": texts,
    }

In [None]:
# Update dataset formatting
dataset_finetune = dataset.map(formatting_prompts_func, batched = True)
dataset_finetune["text"][0]

Map:   0%|          | 0/1107 [00:00<?, ? examples/s]

"Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a Amazon products expert with advanced knowledge in books, cellphones, eletronics, media, and more.\nPlease answer the customer question.\n\n### Question:\nWhat type of story is Autumn Story Brambly Hedge?\n\n### Response:\n<think>\nThe product title 'Autumn Story Brambly Hedge' suggests it is likely a storybook, given the mention of 'Brambly Hedge,' which refers to a children's book series featuring hedgehogs. The context provided mentions it being a research-rich fantasy aimed at small children. Combining these elements, the reasoning leads to the conclusion that it is a fantasy storybook for children.\n</think>\nAutumn Story Brambly Hedge is a research-rich f

### Step 3: Configuring the model with LoRA

**An intuitive explanation of LoRA**

Large language models (LLMs) have millions or even billions of weights that determine how they process and generate text. Full fine-tuning updates all of these weights, which requires enormous compute and memory.

LoRA (Low-Rank Adaptation) makes fine-tuning efficient:

- Instead of modifying all the weights, LoRA adds small trainable adapters to specific layers.
- These adapters capture task-specific knowledge while the original model stays unchanged.
- This cuts the number of trainable parameters by more than 90% (the training log in this notebook reports 0.52% of parameters trained), making fine-tuning faster and more memory-efficient.
- Think of an LLM as a complex factory. Instead of rebuilding the whole factory to make a new product, LoRA adds small specialized tools to the existing machines, so the factory adapts quickly without disturbing its core structure.
- Below, we use the `get_peft_model()` function; PEFT stands for Parameter-Efficient Fine-Tuning. The function wraps the base model (`model`) with LoRA modifications, ensuring that only specific parameters are trained.
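The adapter idea can be illustrated with a toy forward pass in plain Python. This is a minimal sketch of the usual LoRA formulation, `W x + (alpha / r) * B (A x)`, and not the code that `unsloth` or `peft` actually runs; `matvec`, `lora_forward`, and the tiny matrices are hypothetical and chosen only for readability.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)               # frozen pretrained projection
    update = matvec(B, matvec(A, x))  # low-rank correction through rank r
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes: 3 inputs, 2 outputs, rank 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # frozen weights (2 x 3)
A = [[0.5, 0.5, 0.5]]                   # trainable down-projection (1 x 3)
B = [[0.0], [0.0]]                      # trainable up-projection (2 x 1), zero at the start
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B, x, alpha=1, r=1))
```

Because `B` starts at zero, the adapted layer initially returns exactly the same output as the frozen layer; training then moves only `A` and `B`, which is why the original weights stay untouched.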

In [None]:
# Apply LoRA
deep_seek_model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: sets the size of the trainable adapters (higher = more parameters, lower = more efficient)
    target_modules=[  # Transformer layers where the LoRA adapters are applied
        "q_proj",   # Query projection in the self-attention mechanism
        "k_proj",   # Key projection in the self-attention mechanism
        "v_proj",   # Value projection in the self-attention mechanism
        "o_proj",   # Output projection of the attention layer
        "gate_proj",  # Gate projection in the feed-forward (MLP) layers
        "up_proj",    # Up projection of the transformer's feed-forward network (FFN)
        "down_proj",  # Down projection of the transformer's feed-forward network (FFN)
    ],
    lora_alpha=16,  # Scaling factor for LoRA updates (higher values give the LoRA layers more influence)
    lora_dropout=0,  # Dropout rate for the LoRA layers (0 means no dropout)
    bias="none",  # Whether the LoRA layers learn bias terms ("none" saves memory)
    use_gradient_checkpointing="unsloth",  # Saves memory by recomputing activations instead of storing them (recommended for long-context fine-tuning)
    random_state=3407,  # Seed for reproducibility across runs
    use_rslora=False,  # Whether to use rank-stabilized LoRA (disabled here, so standard LoRA scaling is used)
    loftq_config=None,  # LoftQ (LoRA-fine-tuning-aware quantization) is disabled in this configuration
)

Unsloth 2025.3.18 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
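The number of trainable parameters that this configuration adds can be estimated by hand. The sketch below assumes the standard Llama-3.1-8B layer shapes (hidden size 4096, MLP size 14336, key/value width 1024), which the notebook itself does not state; `lora_params` is a hypothetical helper. Each adapted projection contributes `rank * (in_features + out_features)` parameters.

```python
# Assumed Llama-3.1-8B shapes (not stated in the notebook).
HIDDEN = 4096         # hidden size
INTERMEDIATE = 14336  # feed-forward (MLP) size
KV_DIM = 1024         # key/value projection width (8 KV heads x 128)
NUM_LAYERS = 32       # matches "patched 32 layers" in the log above
RANK = 16             # r=16 from the get_peft_model call

# (input features, output features) of every targeted projection in one layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A is (rank x in_features) and B is (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the total comes to 41,943,040, which agrees with the "Trainable parameters" figure printed in the training log in Step 4.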


Next, initialize `SFTTrainer`, the supervised fine-tuning trainer.

In [None]:
# Initialize the fine-tuning trainer (imported with `from trl import SFTTrainer`)
trainer = SFTTrainer(
    model=deep_seek_model,  # The model to fine-tune
    tokenizer=tokenizer,  # Tokenizer used to process the text inputs
    train_dataset=dataset_finetune,  # Dataset used for training
    dataset_text_field="text",  # Dataset field that contains the training text
    max_seq_length=max_seq_length,  # Maximum sequence length for inputs
    dataset_num_proc=2,  # Use 2 CPU processes to speed up data preprocessing

    # Training arguments
    args=TrainingArguments(
        per_device_train_batch_size=2,  # Number of examples processed per device (GPU) at a time
        gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before updating the weights
        num_train_epochs=2,  # Number of full passes over the training data
        warmup_steps=5,  # Gradually increase the learning rate over the first 5 steps
        # max_steps=60,  # Limit training to 60 steps (useful for debugging; increase for a full fine-tuning run)
        learning_rate=2e-4,  # Learning rate for weight updates (tuned for LoRA fine-tuning)
        fp16=not is_bfloat16_supported(),  # Use FP16 when BF16 is not supported
        bf16=is_bfloat16_supported(),  # Use BF16 when supported (better numerical stability on newer GPUs)
        logging_steps=10,  # Log training progress every 10 steps
        optim="adamw_8bit",  # Memory-efficient 8-bit AdamW optimizer
        weight_decay=0.01,  # Regularization to help prevent overfitting
        lr_scheduler_type="linear",  # Linear learning-rate schedule
        seed=3407,  # Fixed seed for reproducibility
        output_dir="outputs",  # Directory where the fine-tuned model checkpoints are saved
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/1107 [00:00<?, ? examples/s]
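The batch settings above determine how many optimizer steps the run takes. The following sketch works through the arithmetic; it assumes that a trailing, incomplete accumulation group is not counted as a step (floor division), which reproduces the 276 total steps reported in the training log in Step 4. Other library versions may round differently.

```python
import math

num_examples = 1107        # rows reported when the dataset was loaded
per_device_batch_size = 2  # per_device_train_batch_size
grad_accum_steps = 4       # gradient_accumulation_steps
num_epochs = 2             # num_train_epochs
num_gpus = 1

# Examples that contribute to a single weight update.
effective_batch = per_device_batch_size * grad_accum_steps * num_gpus

# Micro-batches seen in one pass over the data.
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_gpus))

# Assumption: a trailing partial accumulation group is not counted as a step.
updates_per_epoch = batches_per_epoch // grad_accum_steps
total_steps = updates_per_epoch * num_epochs

print(effective_batch, updates_per_epoch, total_steps)
```

With these values the effective batch size is 8, each epoch contributes 138 updates, and the two epochs together give 276 steps.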

### Step 4: Training the model

The fine-tuning run starts here.

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,107 | Num Epochs = 2 | Total steps = 276
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: rickwalking1272 (rickwalking1272-n-a) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|------|---------------|
| 10 | 2.1588 |
| 20 | 0.9654 |
| 30 | 0.9199 |
| 40 | 0.8459 |
| 50 | 0.8564 |
| 60 | 0.8605 |
| 70 | 0.8723 |
| 80 | 0.8372 |
| 90 | 0.8366 |
| 100 | 0.8175 |


In [None]:
# Finish the wandb run
wandb.finish()

train/epoch,▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▇▇▇▇███
train/global_step,▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▇▇▇▇███
train/grad_norm,█▄▂▂▃▂▂▁▁▂▂▂▁▁▂▂▂▂▃▃▃▃▃▃▃▄▄
train/learning_rate,██▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▁▁
train/loss,█▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁

total_flos,2.439230660063232e+16
train/epoch,1.98917
train/global_step,276.0
train/grad_norm,0.61395
train/learning_rate,0.0
train/loss,0.6845
train_loss,0.81008
train_runtime,2111.9384
train_samples_per_second,1.048
train_steps_per_second,0.131
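The throughput figures in this summary can be cross-checked from the other reported values. The sketch below assumes that samples per second is computed as (examples x epochs) / runtime and steps per second as global steps / runtime; the variable names are illustrative.

```python
train_runtime_s = 2111.9384  # train_runtime from the summary above
global_steps = 276           # train/global_step
num_examples = 1107          # rows in the training split
num_epochs = 2               # num_train_epochs

samples_per_second = num_examples * num_epochs / train_runtime_s
steps_per_second = global_steps / train_runtime_s

print(f"{samples_per_second:.3f} samples/s, {steps_per_second:.3f} steps/s")
print(f"{train_runtime_s / 60:.1f} minutes of training")
```

Both rates round to the values in the summary (1.048 and 0.131), and the runtime corresponds to roughly 35 minutes on the Tesla T4 reported in the load log.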


### Step 5: Testing the model after fine-tuning

The model's answers should now be more specific to the dataset used in training.

In [None]:
question = """Is 'Worship with Don Moen' available on VHS?"""

FastLanguageModel.for_inference(deep_seek_model)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = deep_seek_model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

# Response
print(response[0].split("### Response:")[1])


<think>
The product title is 'Worship with Don Moen,' and the context mentions that it is a VHS tape. Therefore, the answer would be yes.
</think>
Yes<｜end▁of▁sentence｜>


In [None]:
question = """What type of story is Autumn Story Brambly Hedge?"""

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = deep_seek_model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

# Response
print(response[0].split("### Response:")[1])


<think>
The product title is 'Autumn Story Brambly Hedge,' which suggests it is a narrative related to the Brambly Hedge series. The context mentions that it is a charming tale with illustrations by Polly Elkin and a foreword by Judith Kerr, who is known for her work on 'Tattykitly' and other children's books. Therefore, the question seeks to understand the nature of this story, likely inquiring if it is part of a series or if it follows a similar style to other works by Judith Kerr.
</think>
Autumn Story Brambly Hedge is a charming tale with illustrations by Polly Elkin and a foreword by Judith Kerr.<｜end▁of▁sentence｜>


In [None]:
question = """Is 'Mermaids: Nymphs of the Sea' available as a physical book with special features like vellum pages and metallic ink?"""

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = deep_seek_model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

# Response
print(response[0].split("### Response:")[1])


<think>
The product title suggests it is a book about mermaids, which are mythical creatures. The context mentions that the book is out of print or unavailable, indicating it might be a rare or special edition. The description highlights features such as vellum pages and metallic ink, which are often associated with high-quality or limited editions. Therefore, someone looking for this book would likely want to know if it's still available in physical form with those special features.
</think>
Yes, 'Mermaids: Nymphs of the Sea' is a physical book with special features including vellum pages and metallic ink, but it may be out of print or unavailable.<｜end▁of▁sentence｜>
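
The printed outputs above still contain the `<think>` block and the end-of-sentence token. A small helper can separate the reasoning from the final answer; this is a sketch only, `split_reasoning` is a hypothetical name, and the sample text is a shortened stand-in for a real generation.

```python
def split_reasoning(generated: str, eos_token: str = "") -> tuple[str, str]:
    """Split the text after '### Response:' into (reasoning, answer)."""
    body = generated.split("### Response:", 1)[-1]
    if eos_token:
        body = body.replace(eos_token, "")
    reasoning, separator, answer = body.partition("</think>")
    reasoning = reasoning.replace("<think>", "").strip()
    if not separator:
        # Generation stopped before the reasoning was closed: no final answer yet.
        return reasoning, ""
    return reasoning, answer.strip()


EOS = "<｜end▁of▁sentence｜>"  # tokenizer.eos_token, as printed earlier in the notebook
sample = (
    "### Response:\n<think>\nThe title mentions a VHS tape.\n</think>\n"
    "Yes" + EOS
)
reasoning, answer = split_reasoning(sample, eos_token=EOS)
print(reasoning)
print(answer)
```

In the notebook, the same helper could be applied to `response[0]` with `tokenizer.eos_token` passed as `eos_token`. A truncated generation that never reaches `</think>` yields an empty answer.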


### Step 6: Saving the model to Hugging Face

The model is pushed to the Hugging Face Hub.

In [None]:
new_model_online = "rickwalking/DeepSeek-R1-Products"
deep_seek_model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)
deep_seek_model.push_to_hub_merged(new_model_online, tokenizer, save_method="merged_16bit")

README.md:   0%|          | 0.00/5.19k [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Saved model to https://huggingface.co/rickwalking/DeepSeek-R1-Products


Unsloth: You are pushing to hub, but you passed your HF username = rickwalking.
We shall truncate rickwalking/DeepSeek-R1-Products to DeepSeek-R1-Products
Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 6.0G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 4.08 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 34%|███▍      | 11/32 [00:00<00:01, 13.18it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [02:25<00:00,  4.55s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving DeepSeek-R1-Products/pytorch_model-00001-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Products/pytorch_model-00002-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Products/pytorch_model-00003-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Products/pytorch_model-00004-of-00004.bin...


  0%|          | 0/4 [00:00<?, ?it/s]

pytorch_model-00004-of-00004.bin:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

pytorch_model-00003-of-00004.bin:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

pytorch_model-00001-of-00004.bin:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

pytorch_model-00002-of-00004.bin:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/rickwalking/DeepSeek-R1-Products
