<a href="https://colab.research.google.com/github/denis-shema/aiclass/blob/main/trying.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [11]:
def livestock_prompt_template(data: str, num_records: int = 5):
    return f"""You are an expert agricultural assistant helping a machine learning engineer build a high-quality instruction tuning dataset for livestock farming.
Your task is to transform the provided data chunk into diverse question and answer (Q&A) pairs that will be used to fine-tune a language model serving smallholder farmers,
extension officers, and students.

The model will be deployed in rural settings with limited connectivity and must deliver clear, actionable, and locally relevant advice. Your Q&A pairs will help the model
understand how to respond to real-world farming challenges across various livestock domains including cattle, poultry, goats, and pigs.

For each of the {num_records} entries, generate one or two well-structured questions that reflect different aspects of the information in the chunk—such as:

- Disease diagnosis (e.g., symptoms, early signs, transmission)
- Treatment advice (e.g., medications, isolation, sanitation)
- Feeding schedules (e.g., frequency, age-specific diets, supplements)
- Best practices (e.g., housing, hygiene, breeding, record-keeping)

Ensure a mix of longer and shorter questions:
- Short questions: 1–2 sentences, focused and direct
- Long questions: 3–4 sentences, providing context or asking for step-by-step guidance

Each Q&A pair should be concise yet informative, capturing key insights from the data. Avoid repetition and ensure diversity across questions.

Structure your output in JSON format, where each object contains 'instruction', 'input', and 'output' fields. The JSON structure should look like this:

    {{
        "instruction": "Your question here...",
        "input": "",
        "output": "Your answer here..."
    }}

Guidelines:
- Keep the tone clear, neutral, and supportive
- Avoid sensitive, biased, or speculative content
- Use terminology familiar to farmers and extension workers
- Answers should be practical, accurate, and locally applicable
- If the data includes technical terms, explain them simply

Example:

    {{
        "instruction": "What are the symptoms of foot-and-mouth disease in cattle?",
        "input": "",
        "output": "Symptoms include fever, blisters in the mouth and feet, excessive salivation, and lameness."
    }}

    {{
        "instruction": "How do I treat coccidiosis in goats?",
        "input": "Goats are 3 months old and kept in a semi-intensive system.",
        "output": "Administer sulfa-based drugs like sulfadimidine or amprolium as prescribed. Isolate infected goats, clean housing thoroughly, and reduce moisture to prevent reinfection."
    }}

    {{
        "instruction": "What is the best feeding schedule for broiler chickens?",
        "input": "",
        "output": "Feed broiler chickens 3–4 times daily during the first 3 weeks. Afterward, provide continuous access to feed and clean water. Adjust portions based on growth and age."
    }}

By following these guidelines, you'll contribute to a robust and effective dataset that enhances the model's performance in supporting livestock farmers and agricultural advisors.

---

Data
{data}
"""

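The doubled braces in the template above are f-string escapes: `{{` and `}}` render as literal `{` and `}`, while `{num_records}` and `{data}` are substituted. The sketch below uses a shortened, hypothetical stand-in (`tiny_template` is not part of this notebook) to show what the rendered prompt looks like.

```python
def tiny_template(data: str, num_records: int = 5) -> str:
    # Hypothetical, shortened stand-in for livestock_prompt_template.
    return f"""Generate {num_records} records shaped like this:

    {{
        "instruction": "Your question here...",
        "input": "",
        "output": "Your answer here..."
    }}

Data
{data}
"""


rendered = tiny_template("Sample chunk text about cattle housing.", num_records=2)
print(rendered)
```

The printed prompt contains single braces around the JSON example, which is what the model should see.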

In [12]:
!pip install docling -q

In [14]:
!pip install colorama -q

In [16]:
!pip install langchain_openai -q

In [17]:
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker
from colorama import Fore

import json
from typing import List
from pydantic import BaseModel

class Record(BaseModel):
    instruction: str
    input: str
    output: str

class Response(BaseModel):
    generated: List[Record]

In [18]:
from langchain_openai import ChatOpenAI
from google.colab import userdata

llm = ChatOpenAI(
    api_key = userdata.get("API_key"),
    model ="glm-4.5",
    base_url ="https://open.bigmodel.cn/api/paas/v4/",
    temperature=0.7
)

In [19]:
!pip install prompt-template -q

In [20]:
import json
import re
from typing import List

from pydantic import BaseModel, ValidationError

class Record(BaseModel):
    instruction: str
    input: str
    output: str

class Response(BaseModel):
    generated: List[Record]

def llm_call(data: str, num_records: int = 5) -> dict:
    """Ask the LLM for Q&A records about `data` and return them as a validated dict."""
    prompt = livestock_prompt_template(data, num_records=num_records)
    response = llm.invoke(prompt)
    raw_output = response.content

    parsed = None
    try:
        # Attempt to parse the raw output directly
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # If direct parsing fails, try cleaning the output
        cleaned_output = raw_output.strip()
        # Find the start and end of the JSON array
        start_index = cleaned_output.find('[')
        end_index = cleaned_output.rfind(']')

        if start_index != -1 and end_index > start_index:
            cleaned_output = cleaned_output[start_index : end_index + 1]
        else:
            # No array found: strip a Markdown code fence, if present
            if cleaned_output.startswith("```json"):
                cleaned_output = cleaned_output[len("```json"):]
            elif cleaned_output.startswith("```"):
                cleaned_output = cleaned_output[len("```"):]
            if cleaned_output.endswith("```"):
                cleaned_output = cleaned_output[:-len("```")]
            cleaned_output = cleaned_output.strip()

            # Join back-to-back JSON objects with commas, tolerating whitespace between them
            cleaned_output = re.sub(r'\}\s*\{', '},{', cleaned_output)

            # Wrap the cleaned output in square brackets to form a JSON array
            cleaned_output = f"[{cleaned_output}]"

        # Try parsing the cleaned output as a JSON array
        try:
            parsed = json.loads(cleaned_output)
        except json.JSONDecodeError as e:
            print(f"Failed to parse JSON after cleaning and wrapping: {e}")
            print(f"Raw output: {raw_output}")
            print(f"Cleaned output: {cleaned_output}")
            # Do not raise here; fall through and return an empty result

    # A single JSON object counts as a one-record list
    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list):
        parsed = []

    # Keep only dictionaries that validate as a Record
    records = []
    for item in parsed:
        if not isinstance(item, dict):
            continue
        try:
            records.append(Record(**item))
        except ValidationError as e:
            print(f"Skipping invalid record: {e}")
    return Response(generated=records).model_dump()
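The parsing fallback in `llm_call` can be exercised without calling the model. The sketch below is a simplified, hypothetical helper (`extract_records` is not part of this notebook) that covers the same three cases: a fenced array, back-to-back objects, and a single object. It assumes no string value contains a `}` followed by a `{`, since the regex join would corrupt such text.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker, built here to avoid a literal fence


def extract_records(raw: str) -> list:
    """Best-effort extraction of a list of dicts from model text (illustrative)."""
    text = raw.strip()
    # Drop a leading fence (optionally tagged json) and a trailing fence.
    text = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", text)
    text = re.sub(r"\s*" + FENCE + r"$", "", text)
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Objects emitted back to back: join them and wrap in an array.
        joined = re.sub(r"\}\s*\{", "},{", text)
        try:
            parsed = json.loads(f"[{joined}]")
        except json.JSONDecodeError:
            return []
    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list):
        return []
    return [item for item in parsed if isinstance(item, dict)]


fenced = FENCE + 'json\n[{"instruction": "q", "input": "", "output": "a"}]\n' + FENCE
back_to_back = (
    '{"instruction": "q1", "input": "", "output": "a1"}\n'
    '{"instruction": "q2", "input": "", "output": "a2"}'
)
print(len(extract_records(fenced)), len(extract_records(back_to_back)))
```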

In [11]:
# converter = DocumentConverter()
# doc = converter.convert("Cattle-Management-Manual.pdf").document

# chunker = HybridChunker()
# chunks = chunker.chunk(dl_doc=doc)

# dataset = {}
# for i, chunk in enumerate(chunks):
#         enriched_text = chunker.contextualize(chunk=chunk)
#         print(Fore.LIGHTMAGENTA_EX + f"Contextualized Text:\n{enriched_text[:300]}…" + Fore.RESET)
#         data = llm_call(
#             enriched_text
#         )
#         print("Output of llm_call:", data)
#         if data["generated"]: # Check if the 'generated' list is not empty
#             dataset[i] = {"generated":data["generated"], "context":enriched_text}
#         else:
#             print(Fore.YELLOW + f"Skipping chunk {i} as no records were generated." + Fore.RESET)


# with open('dataset3.json','w') as f:
#     json.dump(dataset, f, indent=3)

In [24]:
import json

instructions = []

# Load data from dataset.json
with open('dataset.json', 'r') as f:
    data = json.load(f)

# Extract and reformat instructions
for chunk in data.values():
    for pair in chunk['generated']:
        instructions.append({
            'instruction': pair['instruction'],
            'input': pair['input'],
            'output': pair['output'],
        })

# Save to training.json
with open('training.json', 'w') as f:
    json.dump(instructions, f, indent=3)
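The generation prompt asks for diverse questions, but nothing in the cell above enforces that. A filtering pass before writing training.json is one option. The helper below is a hypothetical sketch, not part of this notebook; it assumes every field is a string, drops records with an empty instruction or output, and keeps only the first occurrence of each instruction (compared case-insensitively).

```python
def clean_records(records):
    """Drop incomplete records and repeated instructions (illustrative helper)."""
    seen = set()
    cleaned = []
    for rec in records:
        instruction = rec.get("instruction", "").strip()
        output = rec.get("output", "").strip()
        if not instruction or not output:
            continue  # incomplete record
        key = instruction.lower()
        if key in seen:
            continue  # instruction already kept once
        seen.add(key)
        cleaned.append({
            "instruction": instruction,
            "input": rec.get("input", "").strip(),
            "output": output,
        })
    return cleaned


sample = [
    {"instruction": "How do I treat coccidiosis in goats?", "input": "", "output": "Isolate infected goats."},
    {"instruction": "how do i treat coccidiosis in goats?", "input": "", "output": "Duplicate answer."},
    {"instruction": "What should broilers eat?", "input": "", "output": "   "},
]
print(clean_records(sample))
```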

In [22]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.55.4
!pip install --no-deps trl==0.22.2
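The install cell above picks an xformers pin from the installed torch version. The same selection logic is isolated below so it can be checked without installing anything; `pick_xformers` is a hypothetical name, and the explicit `None` check is an addition that the original one-liner does not have.

```python
import re


def pick_xformers(torch_version: str) -> str:
    # Same regex as the install cell: a leading run of digits and dots, 3+ characters.
    match = re.match(r"[0-9\.]{3,}", torch_version)
    if match is None:
        raise ValueError(f"Unrecognized torch version: {torch_version!r}")
    v = match.group(0)
    return "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")


print(pick_xformers("2.8.0+cu126"))
print(pick_xformers("2.6.0+cu124"))
```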

In [2]:
!pip install -U bitsandbytes --upgrade -q

In [3]:
!pip install transformers -q

In [4]:
!pip install unsloth -q

In [5]:
!pip install --upgrade unsloth -q


In [6]:
!pip uninstall unsloth unsloth_zoo -y && pip install --no-deps git+https://github.com/unslothai/unsloth_zoo.git && pip install --no-deps git+https://github.com/unslothai/unsloth.git

Found existing installation: unsloth 2025.10.4
Uninstalling unsloth-2025.10.4:
  Successfully uninstalled unsloth-2025.10.4
Found existing installation: unsloth_zoo 2025.10.4
Uninstalling unsloth_zoo-2025.10.4:
  Successfully uninstalled unsloth_zoo-2025.10.4
Collecting git+https://github.com/unslothai/unsloth_zoo.git
  Cloning https://github.com/unslothai/unsloth_zoo.git to /tmp/pip-req-build-gd1apme6
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth_zoo.git /tmp/pip-req-build-gd1apme6
  Resolved https://github.com/unslothai/unsloth_zoo.git to commit 12299b678150054ea790d91c51bf0960a3174dbe
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth_zoo
  Building wheel for unsloth_zoo (pyproject.toml) ... done
  Created wheel for unsloth_zoo: filename=unsloth_zoo-2025.10.5-p

In [8]:
!pip show unsloth

Name: unsloth
Version: 2025.10.5
Summary: 2-5X faster training, reinforcement learning & finetuning
Home-page: http://www.unsloth.ai
Author: Unsloth AI team
Author-email: info@unsloth.ai
License: 
Location: /usr/local/lib/python3.12/dist-packages
Requires: 
Required-by: 


In [None]:

!pip install transformers accelerate bitsandbytes trl

In [9]:
import unsloth
from unsloth import FastLanguageModel


Please restructure your imports with 'import unsloth' at the top of your file.
  import unsloth


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [7]:

from transformers import AutoModelForCausalLM, AutoTokenizer


In [7]:
# !pip uninstall -y unsloth unsloth-zoo transformers
# !pip install unsloth transformers

Found existing installation: unsloth 2025.10.4
Uninstalling unsloth-2025.10.4:
  Successfully uninstalled unsloth-2025.10.4
Found existing installation: unsloth_zoo 2025.10.4
Uninstalling unsloth_zoo-2025.10.4:
  Successfully uninstalled unsloth_zoo-2025.10.4
Found existing installation: transformers 4.57.1
Uninstalling transformers-4.57.1:
  Successfully uninstalled transformers-4.57.1
Traceback (most recent call last):
  File "/usr/local/bin/pip3", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pip/_internal/cli/main.py", line 78, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pip/_internal/commands/__init__.py", line 114, in create_command
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:],

KeyboardInterrupt: 

In [11]:
# !pip install --upgrade unsloth transformers accelerate -q

In [17]:
# # Load model directly
# from transformers import AutoTokenizer, AutoModelForCausalLM

# tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
# model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
# messages = [
#     {"role": "user", "content": "Who are you?"},
# ]
# inputs = tokenizer.apply_chat_template(
# 	messages,
# 	add_generation_prompt=True,
# 	tokenize=True,
# 	return_dict=True,
# 	return_tensors="pt",
# ).to(model.device)

# outputs = model.generate(**inputs, max_new_tokens=40)
# print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

AttributeError: 'Qwen2Attention' object has no attribute 'apply_qkv'

In [16]:
from unsloth import FastLanguageModel
import torch

max_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen2.5-7B",
    max_seq_length=max_length,
    load_in_4bit=True
)

==((====))==  Unsloth 2025.10.4: Fast Qwen2 patching. Transformers: 4.57.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

AttributeError: 'NoneType' object has no attribute 'span'

In [10]:
from unsloth import FastLanguageModel
import torch

# Set the maximum sequence length for the model
max_length = 2048

# Load the model and tokenizer with safe defaults
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2.5-7B",     # Optimized Qwen2.5-7B model
    max_seq_length = max_length,           # Context window size
    dtype = torch.float16,                 # Explicit precision (avoid None)
    load_in_4bit = True                    # Use 4-bit quantization for efficiency
)

# Optional: verify model is loaded
print("Model and tokenizer loaded successfully.")

==((====))==  Unsloth 2025.10.5: Fast Qwen2 patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Model and tokenizer loaded successfully.


In [25]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Appended so the model learns where a response ends
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    # input_text avoids shadowing the built-in input()
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

from datasets import load_dataset
dataset = load_dataset("json", data_files="training.json", split="train")
train_dataset = dataset.map(formatting_prompts_func, batched=True)

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/687 [00:00<?, ? examples/s]
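To see what one formatted training example looks like without loading a tokenizer, the formatting step can be run on a small hand-made batch. In this sketch the `"<eos>"` string is a placeholder assumption; the real value comes from `tokenizer.eos_token`. The two records are taken from the examples in the generation prompt.

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<eos>"  # Placeholder; the notebook uses tokenizer.eos_token


def formatting_prompts_func(examples):
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        texts.append(alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN)
    return {"text": texts}


batch = {
    "instruction": [
        "What are the symptoms of foot-and-mouth disease in cattle?",
        "How do I treat coccidiosis in goats?",
    ],
    "input": ["", "Goats are 3 months old and kept in a semi-intensive system."],
    "output": [
        "Symptoms include fever, blisters in the mouth and feet, excessive salivation, and lameness.",
        "Isolate infected goats, clean housing thoroughly, and reduce moisture to prevent reinfection.",
    ],
}
texts = formatting_prompts_func(batch)["text"]
print(texts[1])
```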

In [26]:
examples = dataset.select(range(3))
for i, ex in enumerate(examples):
    prompt = alpaca_prompt.format(ex["instruction"], ex["input"], "")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=200)
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print("Instruction:", ex["instruction"])
    print("Model Response:", response.split("### Response:")[-1].strip())


Instruction: What makes Uttarakhand a promising region for dairy farming entrepreneurs?
Model Response: Uttarakhand is a promising region for dairy farming entrepreneurs due to several factors, including:

1. **Abundant Natural Resources**: Uttarakhand is known for its lush green forests, rolling hills, and vast grasslands, which provide ample grazing land for cattle. The state's climate is also conducive to dairy farming, with moderate temperatures and abundant rainfall throughout the year.

2. **Government Support**: The state government has been actively promoting dairy farming through various initiatives such as providing subsidies on fodder, veterinary services, and infrastructure development. This support helps reduce the financial burden on dairy farmers and encourages them to invest in their businesses.

3. **Growing Demand for Milk Products**: With a growing population and increasing awareness about the nutritional benefits of milk products, there is a rising demand for dairy 

In [27]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,

)


Unsloth 2025.10.5 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
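The number of trainable LoRA parameters follows from the rank and the shapes of the targeted layers: each adapted linear layer adds `r * (in_features + out_features)` weights. The sketch below assumes the Qwen2.5-7B dimensions from its public model config (hidden size 3584, key/value projection width 512, MLP width 18944, 28 decoder layers); none of these are stated in this notebook. With those assumptions, the total agrees with the "Trainable parameters = 40,370,176" line in the trainer banner further down.

```python
# Assumed Qwen2.5-7B shapes (from the public config, not from this notebook).
HIDDEN = 3584
KV_DIM = 512           # 4 key/value heads x 128 head dimension
INTERMEDIATE = 18944
NUM_LAYERS = 28
R = 16                 # LoRA rank used in get_peft_model

# (in_features, out_features) of each target module in one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapter is two matrices: A (r x in_features) and B (out_features x r).
per_layer = sum(R * (n_in + n_out) for n_in, n_out in shapes.values())
total = per_layer * NUM_LAYERS
print(per_layer, total)
```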


In [28]:
from trl import SFTConfig, SFTTrainer
training_args = SFTConfig(
    output_dir="custom_finetuned_model_qwen",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    # num_train_epochs=10,
    max_steps =100,
    learning_rate=2e-4,
    logging_steps =1,
    weight_decay = 0.001,
    lr_scheduler_type = "linear",
    seed = 3407,
    optim = "adamw_8bit",
    packing = False,
)

In [29]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=max_length,
    args=training_args,
)

Unsloth: Tokenizing ["text"] (num_proc=6):   0%|          | 0/687 [00:00<?, ? examples/s]

In [30]:
!pip install mlflow -q

In [35]:
import mlflow
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = max_length,
    packing = False,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
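        num_train_epochs = 20,   # Ignored: a positive max_steps overrides num_train_epochs
        max_steps = 100,         # 100 steps at an effective batch of 8, about 1.16 epochs of 687 examples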
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0001,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "custom_finetuned_model_qwen_3",
        report_to = "none",
    ),
)

In [36]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 687 | Num Epochs = 2 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 40,370,176 of 7,655,986,688 (0.53% trained)


Step,Training Loss
1,0.6332
2,0.7557
3,0.7324
4,0.7359
5,0.6812
6,0.5979
7,0.699
8,0.658
9,0.51
10,0.6376


TrainOutput(global_step=100, training_loss=0.7576585537195206, metrics={'train_runtime': 433.4973, 'train_samples_per_second': 1.845, 'train_steps_per_second': 0.231, 'total_flos': 3933500625847296.0, 'train_loss': 0.7576585537195206, 'epoch': 1.1627906976744187})
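The figures in the training banner and in `TrainOutput` can be reproduced with a few lines of arithmetic. This sketch uses only numbers from the log above; the rounding-up of the epoch count is an assumption about how the banner's "Num Epochs = 2" is derived.

```python
import math

num_examples = 687
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 100
train_runtime = 433.4973  # seconds, from TrainOutput

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
epochs_covered = max_steps / steps_per_epoch
epochs_started = math.ceil(epochs_covered)  # assumed source of "Num Epochs = 2"
steps_per_second = max_steps / train_runtime

print(effective_batch, steps_per_epoch, epochs_covered, epochs_started)
print(round(steps_per_second, 3))
```

Training therefore stops a little after one full pass over the data, regardless of the `num_train_epochs` value passed to `SFTConfig`.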

In [48]:
trainer.save_model("custom_finetuned_model_qwen12")
tokenizer.save_pretrained("custom_finetuned_model_qwen12")

('custom_finetuned_model_qwen12/tokenizer_config.json',
 'custom_finetuned_model_qwen12/special_tokens_map.json',
 'custom_finetuned_model_qwen12/vocab.json',
 'custom_finetuned_model_qwen12/merges.txt',
 'custom_finetuned_model_qwen12/added_tokens.json',
 'custom_finetuned_model_qwen12/tokenizer.json')

In [44]:
model, tokenizer = FastLanguageModel.from_pretrained(
    "custom_finetuned_model_qwen",
    max_seq_length=max_length,
    dtype=torch.float16,
    load_in_4bit=True,  # Keep this

)

==((====))==  Unsloth 2025.10.5: Fast Qwen2 patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 

In [42]:
from unsloth import FastLanguageModel
import torch

max_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    "custom_finetuned_model_qwen",
    max_seq_length=max_length,
    dtype = torch.float16,
    load_in_4bit=True,
)

model.eval()


==((====))==  Unsloth 2025.10.5: Fast Qwen2 patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 
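One plausible cause of this error is that the model used for training is still resident on the T4 when a second copy is loaded, leaving too little GPU memory. The notebook does not confirm this. Under that assumption, a hypothetical helper like the one below could be called with `globals()` before the reload cell. It only frees memory if no other references to the objects remain.

```python
import gc


def release_gpu_memory(namespace: dict, names=("trainer", "model")) -> None:
    """Drop the named objects from `namespace`, then ask Python and CUDA to free memory."""
    for name in names:
        namespace.pop(name, None)
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
    except ImportError:
        pass  # torch is not installed; nothing CUDA-related to clear


# Demonstration on a plain dict standing in for the notebook's globals()
ns = {"trainer": object(), "model": object(), "tokenizer": "keep"}
release_gpu_memory(ns)
print(sorted(ns))
```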

In [45]:

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def generate_response(instruction, input_text=""):
    prompt = alpaca_prompt.format(instruction, input_text, "")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=300,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("### Response:")[-1].strip()
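`generate_response` recovers the answer by splitting the decoded text on the `### Response:` marker and keeping the last piece. The sketch below isolates that step on a hand-written string (`extract_response` is a hypothetical name). If the model emits the marker again inside its answer, only the text after the last marker survives; slicing the output by the prompt's token length, as in the commented-out cell earlier in the notebook, avoids that.

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    # Keep whatever follows the last occurrence of the marker.
    return decoded.split(marker)[-1].strip()


decoded = (
    "### Instruction:\nHow do I treat coccidiosis in goats?\n\n"
    "### Input:\n\n\n"
    "### Response:\nIsolate infected goats and clean housing thoroughly.\n"
)
print(extract_response(decoded))
```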

In [46]:
!pip install huggingface_hub -q

from huggingface_hub import HfApi, notebook_login
notebook_login()



In [49]:
from huggingface_hub import HfApi

hf_username = "Denis-Shema"
repo_name = "Qwen3-finetuned"
model_dir = "custom_finetuned_model_qwen/"

api = HfApi()


api.create_repo(repo_id=f"{hf_username}/{repo_name}", exist_ok=True)

api.upload_folder(
    folder_path=model_dir,
    repo_id=f"{hf_username}/{repo_name}",
    repo_type="model",
)
print(f"Model uploaded to https://huggingface.co/{hf_username}/{repo_name}")


Model uploaded to https://huggingface.co/Denis-Shema/Qwen3-finetuned


In [None]:
!pip uninstall -y pyarrow


In [None]:
!pip install pyarrow==19.0.0


In [None]:
!pip uninstall -y pyarrow transformers
!pip install pyarrow==19.0.0 transformers==4.35.2 unsloth


In [None]:
!sudo apt-get update -y
!sudo apt-get install python3.10 python3.10-distutils -y


In [None]:
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
!sudo update-alternatives --config python3


In [None]:
!pip install --upgrade pip
!pip install transformers==4.35.2 unsloth pyarrow==19.0.0


In [None]:
!pip uninstall -y transformers pyarrow
!pip install transformers==4.35.2 pyarrow==19.0.0 unsloth