# Intro
In this notebook, we showcase how to fine-tune the Qwen3-1.7B model on AWS Trainium using the Hugging Face Optimum Neuron library.
The goal of this task is Text-to-SQL generation: training the model to translate natural language questions into executable SQL queries.

We will fine-tune the model using `optimum.neuron`, save the trained checkpoint, and then deploy it for inference with Optimum-Neuron[vllm], enabling high-performance, low-latency Text-to-SQL execution.

By the end of this notebook, you'll have a fine-tuned, Trainium-optimized Qwen3 model ready for deployment and real-time inference. This workflow demonstrates how to leverage the Optimum Neuron toolchain to efficiently train and serve large language models on AWS Neuron devices.

For this module, you will use the [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) dataset, which consists of thousands of examples, each containing a SQL schema, a question about that schema, and a SQL query intended to answer the question.

*Dataset example 1:*
* *SQL schema/context:* `CREATE TABLE management (department_id VARCHAR); CREATE TABLE department (department_id VARCHAR)`
* *Question:* `How many departments are led by heads who are not mentioned?`
* *SQL query/answer:* `SELECT COUNT(*) FROM department WHERE NOT department_id IN (SELECT department_id FROM management)`

*Dataset example 2:*
* *SQL schema/context:* `CREATE TABLE courses (course_name VARCHAR, course_id VARCHAR); CREATE TABLE student_course_registrations (student_id VARCHAR, course_id VARCHAR)`
* *Question:* `What are the ids of all students for courses and what are the names of those courses?`
* *SQL query/answer:* `SELECT T1.student_id, T2.course_name FROM student_course_registrations AS T1 JOIN courses AS T2 ON T1.course_id = T2.course_id`

After fine-tuning on several thousand of these text-to-SQL examples, the model learns to generate an appropriate SQL query when presented with a SQL context and a free-form question.
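
To make the record structure concrete, here is a minimal sketch of how one dataset row could be turned into a prompt/completion pair. The field names `context`, `question`, and `answer` mirror the labels used in the examples above. The `format_example` helper and the prompt wording are illustrative assumptions, not necessarily the format used by the training script.

```python
def format_example(record):
    """Turn one dataset row into a (prompt, completion) pair."""
    # Prompt wording is an assumption made for illustration only.
    prompt = (
        "Given the SQL schema below, write a query that answers the question.\n"
        f"Schema: {record['context']}\n"
        f"Question: {record['question']}\n"
        "SQL:"
    )
    # The leading space separates the completion from the "SQL:" marker.
    completion = " " + record["answer"]
    return prompt, completion


# Dataset example 1 from above, written as a plain dictionary.
record = {
    "context": "CREATE TABLE management (department_id VARCHAR); "
               "CREATE TABLE department (department_id VARCHAR)",
    "question": "How many departments are led by heads who are not mentioned?",
    "answer": "SELECT COUNT(*) FROM department WHERE NOT department_id IN "
              "(SELECT department_id FROM management)",
}

prompt, completion = format_example(record)
print(prompt + completion)
```

During fine-tuning, the model sees the prompt and is trained to produce the completion; at inference time only the prompt is supplied.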

This text-to-SQL use case was selected so you can successfully fine-tune your model in a reasonably short amount of time (~25 minutes) which is appropriate for this workshop. Although this is a relatively simple use case, please keep in mind that the same techniques and components used in this module can also be applied to fine-tune LLMs for more advanced use cases such as writing code, summarizing documents, creating blog posts - the possibilities are endless!

# Install requirements
This notebook uses [Hugging Face Optimum Neuron](https://github.com/huggingface/optimum-neuron), which acts as an interface between the Hugging Face Transformers library and AWS accelerators, including AWS Trainium and AWS Inferentia. We will also install supporting libraries such as `peft` and `trl`.
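
Once the install cell has finished, a quick check like the following can confirm which versions ended up in the environment. This is a minimal sketch: the contents of `requirements.txt` are not shown here, so the package list is an assumption based on the libraries named above, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed package list; adjust it to match requirements.txt.
for name, ver in installed_versions(["optimum-neuron", "peft", "trl"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```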


In [36]:
%cd /home/ubuntu/environment/FineTuning/HuggingFaceExample/01_finetuning/assets
%pip install -r requirements.txt

/home/ubuntu/environment/FineTuning/HuggingFaceExample/01_finetuning/assets


Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com

[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


# Fine-tuning

In this section, we fine-tune the Qwen3-1.7B model on the Text-to-SQL task using Hugging Face Optimum Neuron. The training command takes the following parameters:

1. `--nnodes`: Number of nodes (1 = single node).
2. `--nproc_per_node`: Processes per node (usually equals the number of devices).
3. `--model_id`, `--tokenizer_id`: Model and tokenizer identifiers (Hugging Face Hub ID or local path).
4. `--output_dir`: Directory for saving checkpoints and logs.
5. `--bf16`: Enables bfloat16 precision for faster, more memory-efficient training.
6. `--gradient_checkpointing`: Saves memory by recomputing activations during backpropagation.
7. `--gradient_accumulation_steps`: Number of steps over which gradients are accumulated before each optimizer update.
8. `--learning_rate`: Initial learning rate.
9. `--max_steps`: Total number of training steps.
10. `--per_device_train_batch_size`: Batch size per device.
11. `--tensor_parallel_size`: Number of devices used for tensor parallelism.
12. `--lora_r`, `--lora_alpha`, `--lora_dropout`: LoRA hyperparameters (rank, scaling factor, and dropout rate).
13. `--dataloader_drop_last`: Drops the last incomplete batch.
14. `--disable_tqdm`: Disables the progress bar.
15. `--logging_steps`: Logging interval, in steps.
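
To give a feel for what the LoRA and batch settings mean numerically, here is a small back-of-the-envelope sketch. It uses the standard LoRA formulation, in which a frozen weight receives a low-rank update scaled by `lora_alpha / lora_r`. The helper names and the 2048-wide example matrix are illustrative assumptions, not values read from the training script.

```python
def lora_scaling(lora_alpha, lora_r):
    """Scaling applied to the low-rank update: delta_W = (alpha / r) * B @ A."""
    return lora_alpha / lora_r


def lora_trainable_params(d_in, d_out, lora_r):
    """Parameters added for one adapted weight: A is (r x d_in), B is (d_out x r)."""
    return lora_r * (d_in + d_out)


def samples_per_replica(max_steps, per_device_batch, grad_accum_steps):
    """Training samples consumed by one data-parallel replica over the whole run."""
    return max_steps * per_device_batch * grad_accum_steps


# Values taken from the training command in this section.
r, alpha = 16, 32
# Hypothetical square projection of width 2048, chosen only for illustration.
d_in = d_out = 2048

print("LoRA scaling:", lora_scaling(alpha, r))
print("adapter params for one matrix:", lora_trainable_params(d_in, d_out, r))
print("full params for the same matrix:", d_in * d_out)
print("samples per replica:", samples_per_replica(1000, 2, 1))
```

The adapter adds only a small fraction of the parameters of the matrix it modifies, which is why LoRA fine-tuning fits in a short workshop session.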

In [37]:
!torchrun \
  --nnodes 1 \
  --nproc_per_node 1 \
  finetune_model.py \
  --model_id Qwen/Qwen3-1.7B \
  --tokenizer_id Qwen/Qwen3-1.7B \
  --output_dir ~/environment/ml/qwen \
  --bf16 True \
  --gradient_checkpointing True \
  --gradient_accumulation_steps 1 \
  --learning_rate 5e-5 \
  --max_steps 1000 \
  --per_device_train_batch_size 2 \
  --tensor_parallel_size 2 \
  --lora_r 16 \
  --lora_alpha 32 \
  --lora_dropout 0.05 \
  --dataloader_drop_last True \
  --disable_tqdm True \
  --logging_steps 10

  from .mappings import (
  component, error = import_nki(config)
2025-11-09 00:48:47.708260: W neuron/nrt_adaptor.cc:53] nrt_tensor_write_hugepage() is not available, will fall back to nrt_tensor_write().
2025-11-09 00:48:47.708293: W neuron/nrt_adaptor.cc:62] nrt_tensor_read_hugepage() is not available, will fall back to nrt_tensor_read().
2025-Nov-09 00:48:47.0710 121345:121389 [0] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):213 CCOM WARN NET/OFI Failed to initialize sendrecv protocol
2025-Nov-09 00:48:47.0720 121345:121389 [0] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):354 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
2025-Nov-09 00:48:47.0730 121345:121389 [0] ncclResult_t nccl_net_ofi_init_no_atexit_f

# Compilation

After completing the fine-tuning process, the next step is to compile the trained model for AWS Trainium inference using the Hugging Face Optimum Neuron toolchain.
Neuron compilation optimizes the model graph and converts it into a Neuron Executable File Format (NEFF), enabling efficient execution on NeuronCores.
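
Because the model is compiled for fixed shapes, the `--sequence_length` value passed to the export command in this section acts as a budget shared by the prompt and the generated tokens. The sketch below illustrates that arithmetic. `max_new_tokens` is a hypothetical helper, and the prompt length of 120 tokens is an assumed figure.

```python
def max_new_tokens(sequence_length, prompt_tokens):
    """Tokens left for generation once the prompt occupies part of a fixed window."""
    if prompt_tokens >= sequence_length:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens but the window is {sequence_length}"
        )
    return sequence_length - prompt_tokens


SEQUENCE_LENGTH = 512  # --sequence_length from the export command in this section

# Hypothetical prompt length, for illustration only.
print(max_new_tokens(SEQUENCE_LENGTH, prompt_tokens=120))
```

Text-to-SQL prompts and answers in this dataset are short, so a modest sequence length is usually enough; long schemas would need a larger value at export time.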

In [38]:
!optimum-cli export neuron \
  --model /home/ubuntu/environment/ml/qwen/merged_model \
  --task text-generation \
  --sequence_length 512 \
  --batch_size 1 \
  /home/ubuntu/environment/ml/qwen/compiled_model

  from pkg_resources import get_distribution


  from .mappings import (
  component, error = import_nki(config)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/aws_neuronx_venv_pytorch_latest/lib/python3.10/site-packages/op

# Inference

First, install Optimum Neuron with the vLLM extra. Then, run inference using the compiled model.

In [39]:
%pip install optimum-neuron[vllm]


Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com

[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


In [40]:
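from vllm import LLM, SamplingParams

# Load the locally compiled model on the Neuron device.
llm = LLM(
    model="/home/ubuntu/environment/ml/qwen/compiled_model",  # local compiled model
    max_num_seqs=1,  # one sequence at a time, matching --batch_size 1 from the export step
    max_model_len=2048,  # note: the export step used --sequence_length 512; these values may need to agree
    device="neuron",
    tensor_parallel_size=2,
    override_neuron_config={},
)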
# Build ChatML-style prompts for the Text-to-SQL task.
# The system prompt wording is an assumption; it should match the format used in finetune_model.py.
SYSTEM_PROMPT = (
    "You are a text-to-SQL assistant. Given a SQL schema and a question, "
    "output a SQL query that answers the question."
)

def build_prompt(context, question):
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\nSchema: {context}\nQuestion: {question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

# The two dataset examples shown in the introduction.
example1 = build_prompt(
    "CREATE TABLE management (department_id VARCHAR); CREATE TABLE department (department_id VARCHAR)",
    "How many departments are led by heads who are not mentioned?",
)
example2 = build_prompt(
    "CREATE TABLE courses (course_name VARCHAR, course_id VARCHAR); CREATE TABLE student_course_registrations (student_id VARCHAR, course_id VARCHAR)",
    "What are the ids of all students for courses and what are the names of those courses?",
)

prompts = [example1, example2]

sampling_params = SamplingParams(max_tokens=2048, temperature=0.8)
outputs = llm.generate(prompts, sampling_params)

print("#########################################################")

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, \n\n Generated text: {generated_text!r} \n")

ValidationError: 1 validation error for ModelConfig
  Value error, Invalid repository ID or local directory specified: '/home/ubuntu/environment/ml/qwen/compiled_model'.
Please verify the following requirements:
1. Provide a valid Hugging Face repository ID.
2. Specify a local directory that contains a recognized configuration file.
   - For Hugging Face models: ensure the presence of a 'config.json'.
   - For Mistral models: ensure the presence of a 'params.json'.
3. For GGUF: pass the local path of the GGUF checkpoint.
   Loading GGUF from a remote repo directly is not yet supported.
 [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error
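
The error above lists a `config.json` in the model directory as a requirement. Since the export cell ended in a traceback, the compiled model directory was probably never written. A pre-flight check along the following lines can surface that before vLLM is started. This is a minimal sketch: `check_model_dir` is a hypothetical helper, and it tests only for the one file the error message names.

```python
from pathlib import Path


def check_model_dir(path):
    """Return a list of problems that would stop a loader from accepting `path`."""
    model_dir = Path(path).expanduser()
    if not model_dir.is_dir():
        return [f"{model_dir} does not exist or is not a directory"]
    problems = []
    if not (model_dir / "config.json").is_file():
        problems.append(f"{model_dir} has no config.json")
    return problems


problems = check_model_dir("/home/ubuntu/environment/ml/qwen/compiled_model")
if problems:
    print("Model directory is not ready:")
    for problem in problems:
        print(" -", problem)
else:
    print("Model directory looks loadable.")
```

If the check reports a problem, rerun the export cell and read its full traceback before retrying inference.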