To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our docs [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).


### News

Read our **[Qwen3 Guide](https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [1]:
!pip install unsloth
%pip install sacremoses sacrebleu --quiet
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Collecting unsloth
  Downloading unsloth-2025.5.1-py3-none-any.whl.metadata (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.2/48.2 kB 2.5 MB/s eta 0:00:00
Collecting unsloth_zoo>=2025.5.1 (from unsloth)
  Downloading unsloth_zoo-2025.5.1-py3-none-any.whl.metadata (8.0 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.30-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.20-py3-none-any.whl.metadata (10 kB)
Collecting transformers!=4.47.0,==4.51.3 (from unsloth)
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting trl!=0.15.0,!=0.9.0,!=0.9.1,!=0.9.2,!=0.9.3,<=0.15.2,>=0.7.9 (from unsloth)
  Downloading trl-0.15.2-py3-none-any.whl.metadata (11 kB)
Collecting fsspec<=2024.12.0,>=2023.1.0 (from fssp

In [None]:
import pandas as pd
from unsloth import FastLanguageModel
import torch
from datasets import Dataset
from trl import SFTTrainer, SFTConfig
import sacrebleu
from transformers import TextStreamer
from tqdm import tqdm
import requests

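def make_tg_report(text) -> None:
    """Send a progress message to a Telegram chat via the Bot API."""
    token = ''          # Telegram bot token (blank here; fill in your own)
    method = 'sendMessage'
    chat_id = 11111111  # ID of the chat that receives the reports

    _ = requests.post(
            url='https://api.telegram.org/bot{0}/{1}'.format(token, method),
            data={'chat_id': chat_id, 'text': text}
        ).json()

make_tg_report('imports ready')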

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-05-13 15:09:53.126545: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747148993.351502      31 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747148993.416261      31 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
df_corpus_labeled = pd.read_csv('/kaggle/input/karelian-data/parallel_df_merged2.csv')
df_train = df_corpus_labeled[df_corpus_labeled.split=='train'].copy() # 22692 items
df_dev = df_corpus_labeled[df_corpus_labeled.split=='dev'].copy()     # 500 items
df_test = df_corpus_labeled[df_corpus_labeled.split=='test'].copy()

# Fold the dev split into the training data and shuffle the rows
df_train = pd.concat([df_train, df_dev]).sample(frac=1).reset_index(drop=True)
# Drop test rows whose Karelian sentence also appears in training (avoids leakage)
int_vals = set(df_train['kar']).intersection(set(df_test['kar']))
df_test = df_test[~df_test.kar.isin(int_vals)].reset_index(drop=True)
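The overlap filter in the cell above can be illustrated without pandas. This is a minimal sketch on made-up rows; the helper name `drop_overlap` and the toy sentences are hypothetical and not part of the corpus.

```python
# Toy stand-ins for the train/test splits (hypothetical values, not corpus data).
train_rows = [{"kar": "a", "rus": "A"}, {"kar": "b", "rus": "B"}]
test_rows = [{"kar": "b", "rus": "B2"}, {"kar": "c", "rus": "C"}]

def drop_overlap(train, test, key="kar"):
    """Return the test rows whose `key` value never occurs in train."""
    seen = {row[key] for row in train}
    return [row for row in test if row[key] not in seen]

clean_test = drop_overlap(train_rows, test_rows)
print(clean_test)
```

Only the row with `kar == "c"` survives, because `"b"` is already present in the training rows.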

In [None]:
# %%capture
# import os
# if "COLAB_" not in "".join(os.environ.keys()):
#     !pip install unsloth
# else:
#     # Do this only in Colab notebooks! Otherwise use pip install unsloth
#     !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
#     !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
#     !pip install --no-deps unsloth

### Unsloth

In [None]:
fourbit_models = [
    "unsloth/Qwen3-1.7B-unsloth-bnb-4bit", # Qwen3 models, 2x faster
    "unsloth/Qwen3-4B-unsloth-bnb-4bit",
    "unsloth/Qwen3-8B-unsloth-bnb-4bit",
    "unsloth/Qwen3-14B-unsloth-bnb-4bit",
    "unsloth/Qwen3-32B-unsloth-bnb-4bit",

    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/Phi-4",
    "unsloth/Llama-3.1-8B",
    "unsloth/Llama-3.2-3B",
    "unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit" # [NEW] We support TTS models!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 2048,   # Context length - can be longer, but uses more memory
    load_in_4bit = True,     # 4bit uses much less memory
    load_in_8bit = False,    # A bit more accurate, uses 2x memory
    full_finetuning = False, # We have full finetuning now!
    # token = "hf_...",      # use one if using gated models
)

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,           # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,  # Best to choose alpha = rank or rank*2
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)
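To get a feel for why LoRA trains only a small share of the weights: for a linear layer of shape `d_out x d_in`, a rank-`r` adapter adds two matrices of sizes `r x d_in` and `d_out x r`. The sketch below is only an illustration; the layer size is a made-up value, not Qwen3's actual dimensions.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one rank-r LoRA adapter (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)

# Hypothetical square projection, chosen only for illustration.
d_in, d_out, r = 4096, 4096, 32
full = d_in * d_out
added = lora_param_count(d_in, d_out, r)
print(f"full layer: {full:,} params, LoRA adds: {added:,} ({added / full:.2%})")
```

Doubling `r` doubles the adapter size, which is why the rank is the main knob for trading capacity against memory.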

<a name="Data"></a>
### Data Prep
Qwen3 has both a reasoning and a non-reasoning mode. The default recipe therefore uses 2 datasets:

1. The Open Math Reasoning dataset, which was used to win the [AIMO](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/leaderboard) (AI Mathematical Olympiad - Progress Prize 2) challenge! It samples 10% of verifiable reasoning traces that used DeepSeek R1 and got > 95% accuracy.

2. [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style, which needs to be converted to Hugging Face's normal multi-turn format as well.

Here we instead fine-tune on the Karelian-Russian parallel corpus loaded above, so those loaders are left commented out.

In [None]:
# train_dataset = load_from_disk("/content/drive/MyDrive/parallel_dataset/train_dataset")
# test_dataset = load_from_disk("/content/drive/MyDrive/parallel_dataset/test_dataset")
# dev_dataset = load_from_disk("/content/drive/MyDrive/parallel_dataset/dev_dataset")
train_dataset = Dataset.from_pandas(df_train.drop(columns=['split']).reset_index(drop=True))
# valid_dataset = load_dataset("/content/drive/MyDrive/parallel_dataset", split = "valid")
# non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split = "train")

Let's see the structure of the training dataset:

In [None]:
train_dataset

We now convert the parallel corpus into conversational format:

In [None]:
def generate_translate_sentence(examples):
    """Build one user/assistant conversation per Karelian-Russian sentence pair."""
    rus = examples["rus"]
    kar = examples["kar"]
    conversations = list()
    for rus_sent, kar_sent in zip(rus, kar):
        conversations.append([
            # Prompt reads "Karelian: <sentence>. Russian:"
            {"role" : "user",      "content" : f"Карельский: {kar_sent}. Русский:"},
            {"role" : "assistant", "content" : rus_sent},
        ])
    return { "conversations": conversations, }

# Render every conversation to a single training string with the chat template
translate_conversations = tokenizer.apply_chat_template(
    train_dataset.map(generate_translate_sentence, batched = True)["conversations"],
    tokenize = False,
)
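With `batched = True`, the mapping function receives a dict of columns rather than a single row. The following stand-alone sketch shows the shape of the result on placeholder strings; the function name `build_conversations` and the toy batch are hypothetical, and no tokenizer is involved.

```python
def build_conversations(batch):
    """Turn a batch of parallel sentences into two-turn chat conversations."""
    conversations = []
    for kar_sent, rus_sent in zip(batch["kar"], batch["rus"]):
        conversations.append([
            # Prompt labels mean "Karelian: ... Russian:", as in the notebook's template.
            {"role": "user", "content": f"Карельский: {kar_sent}. Русский:"},
            {"role": "assistant", "content": rus_sent},
        ])
    return {"conversations": conversations}

# Placeholder strings, not real corpus sentences.
toy_batch = {"kar": ["<kar 1>", "<kar 2>"], "rus": ["<rus 1>", "<rus 2>"]}
out = build_conversations(toy_batch)
print(out["conversations"][0])
```

Each conversation is a list with exactly one user turn (the Karelian source) and one assistant turn (the Russian target).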

In [None]:
# def generate_conversation(examples):
#     problems  = examples["problem"]
#     solutions = examples["generated_solution"]
#     conversations = []
#     for problem, solution in zip(problems, solutions):
#         conversations.append([
#             {"role" : "user",      "content" : problem},
#             {"role" : "assistant", "content" : solution},
#         ])
#     return { "conversations": conversations, }

In [None]:
# reasoning_conversations = tokenizer.apply_chat_template(
#     reasoning_dataset.map(generate_conversation, batched = True)["conversations"],
#     tokenize = False,
# )

Let's see the first transformed row:

In [None]:
translate_conversations[0]

Now let's see how long the dataset is:

In [None]:
print(len(translate_conversations))

In [None]:
data = pd.Series(translate_conversations)
data.name = "text"

combined_dataset = Dataset.from_pandas(pd.DataFrame(data))
combined_dataset = combined_dataset.shuffle(seed = 3407)

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We train for one full epoch (`num_train_epochs = 1`) with 500 warmup steps. For a quick test run, set `max_steps` to a small value such as 60 instead.

In [None]:
WARM = 500 
# STEPS = 500

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = combined_dataset,
    eval_dataset = None, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4, # Use GA to mimic batch size!
        warmup_steps = WARM,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = STEPS,
        learning_rate = 2e-4, # Reduce to 2e-5 for long training runs
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        report_to = "none", # Use this for WandB etc
    ),
)
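As a rough sanity check on the schedule, the number of optimizer steps in one epoch follows from the dataset size and the effective batch size. This sketch assumes a single GPU and takes the example counts from the comments in the data-loading cell (22692 train + 500 dev); the exact step count reported by the trainer may differ slightly.

```python
import math

n_examples = 22692 + 500      # train + dev sizes, as noted in the data-loading comments
per_device_batch = 2          # per_device_train_batch_size
grad_accum = 4                # gradient_accumulation_steps
warmup_steps = 500            # WARM

effective_batch = per_device_batch * grad_accum           # single GPU assumed
steps_per_epoch = math.ceil(n_examples / effective_batch)
warmup_share = warmup_steps / steps_per_epoch

print(effective_batch, steps_per_epoch, f"{warmup_share:.1%}")
```

Under these assumptions the learning rate ramps up over roughly the first sixth of the epoch before the linear decay starts.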

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

Let's train the model! To resume a training run, call `trainer.train(resume_from_checkpoint = True)`.

In [None]:
make_tg_report('training started')

In [None]:
trainer_stats = trainer.train()

In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

<a name="Inference"></a>
### Inference
Let's run the model via Unsloth native inference! According to the Qwen3 team, the recommended settings for reasoning inference are `temperature = 0.6, top_p = 0.95, top_k = 20`.

For normal chat-based inference, they recommend `temperature = 0.7, top_p = 0.8, top_k = 20`.

In [7]:
def get_llm_translate(k_text):
    """Translate one Karelian sentence into Russian with the fine-tuned model."""
    # Same prompt template as in training ("Karelian: ... Russian:")
    messages = [
        {"role" : "user", "content" : f"Карельский: {k_text}. Русский:"},
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize = False,
        add_generation_prompt = True, # Must add for generation
        enable_thinking = False, # Disable thinking
    )

    inputs = tokenizer(text, return_tensors = "pt")
    resp = model.generate(
        **inputs.to("cuda"),
        max_new_tokens = 256, # Increase for longer outputs!
        # These are the reasoning-mode values; the non-thinking recommendation is 0.7 / 0.8 / 20
        temperature = 0.6, top_p = 0.95, top_k = 20,
    )
    # Drop the prompt tokens and decode only the newly generated ones
    response_ids = resp[0][len(inputs.input_ids[0]):].tolist()
    res = tokenizer.decode(response_ids, skip_special_tokens=True)
    return res
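The slicing step in `get_llm_translate` relies on `generate` returning the prompt tokens followed by the continuation. A toy sketch with plain lists shows the idea; the token IDs and the helper name `strip_prompt` are hypothetical.

```python
# Hypothetical token IDs standing in for a tokenized prompt and the model's output.
prompt_ids = [101, 7, 8, 9]
generated_ids = prompt_ids + [42, 43, 44]   # prompt followed by the continuation

def strip_prompt(full_ids, prompt_len):
    """Keep only the tokens produced after the prompt."""
    return full_ids[prompt_len:]

response_ids = strip_prompt(generated_ids, len(prompt_ids))
print(response_ids)
```

Without this step the decoded string would repeat the whole chat-formatted prompt in front of the translation.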

In [8]:
KAGGLE_INPUT = '/kaggle/input/'
MODEL_PATH = 'nllb-rus-kar'
KRL = KAGGLE_INPUT + 'karelian-data/'
# df_test_skazki = pd.read_csv(KRL+'df_test_skazki2.csv')
# df_test = df_test_skazki.copy()
# df_test = pd.read_csv('/kaggle/input/karelian-data/parallel_df_merged2.csv')
# df_test = df_test[df_test['split'] == 'test'].copy()
truth = df_test['rus'].values

In [9]:
llm_translates = list()
for _, row in tqdm(df_test.iterrows(), total=len(df_test)):
    llm_translates.append(get_llm_translate(row.kar))
llm_translates = pd.DataFrame(llm_translates)
llm_translates.columns = ['rus']

bleu_calc = sacrebleu.BLEU()
chrf_calc = sacrebleu.CHRF(word_order=2) 
print(bleu_calc.corpus_score(llm_translates['rus'].tolist(), [truth.tolist()]))
print(chrf_calc.corpus_score(llm_translates['rus'].tolist(), [truth.tolist()]))
# llm_translates.to_csv('llm_translates.csv', index=False)

100%|██████████| 1754/1754 [49:57<00:00,  1.71s/it] 


BLEU = 17.69 47.5/22.8/12.6/7.4 (BP = 0.994 ratio = 0.994 hyp_len = 13633 ref_len = 13720)
chrF2++ = 39.70
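To read the BLEU line above: the four slash-separated numbers are the 1- to 4-gram precisions in percent, and `BP` is the brevity penalty. The sketch below recombines the printed figures with the standard BLEU formula; the function name `bleu_from_stats` is hypothetical, and because the printed precisions are rounded the result only approximates the reported score.

```python
import math

def bleu_from_stats(precisions, hyp_len, ref_len):
    """Recombine n-gram precisions (in %) and lengths into (BP, BLEU)."""
    # Brevity penalty: 1 if the hypothesis is at least as long as the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    # Geometric mean of the n-gram precisions.
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    return bp, bp * geo_mean

bp, bleu = bleu_from_stats([47.5, 22.8, 12.6, 7.4], hyp_len=13633, ref_len=13720)
print(f"BP = {bp:.3f}, BLEU ~ {bleu:.2f}")
```

The brevity penalty rounds to the reported 0.994, and the recombined score lands within rounding distance of the reported 17.69.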


In [12]:
llm_translates.to_csv('llm_translates_1900.csv', index=False)

In [11]:
print(bleu_calc.corpus_score(llm_translates['rus'][500:].tolist(), [truth[500:].tolist()]))
print(chrf_calc.corpus_score(llm_translates['rus'][500:].tolist(), [truth[500:].tolist()]))

BLEU = 19.33 42.0/24.2/14.6/9.4 (BP = 1.000 ratio = 1.002 hyp_len = 3852 ref_len = 3844)
chrF2++ = 41.98


In [10]:
make_tg_report('LLM translation done')

```
BLEU = 9.90 43.0/15.2/6.8/3.0 (BP = 0.925 ratio = 0.928 hyp_len = 4139 ref_len = 4460)
chrF2++ = 29.01      # 500 warmup steps, 3000 steps
BLEU = 6.88 37.7/10.6/4.2/1.8 (BP = 0.925 ratio = 0.928 hyp_len = 4139 ref_len = 4460)
chrF2++ = 24.03      # early stop, 50 warmup steps, 500 steps
```

```
BLEU = 17.69 47.5/22.8/12.6/7.4 (BP = 0.994 ratio = 0.994 hyp_len = 13633 ref_len = 13720)
chrF2++ = 39.70
```
Dictionary only
```
BLEU = 19.33 42.0/24.2/14.6/9.4 (BP = 1.000 ratio = 1.002 hyp_len = 3852 ref_len = 3844)
chrF2++ = 41.98
```

In [None]:
# llm_translates = pd.DataFrame(llm_translates)
# llm_translates.columns = ['rus']
# llm_translates.to_csv('llm_translates.csv', index=False)

In [None]:
k_text = 'Myöhembäh Afriekakse ruvettih kuččuo tämän manderen kaikkii alovehii, a sit iččiegi mannerdu'
messages = [
    {"role" : "user", "content" : f"Карельский: {k_text}. Русский:"},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
    enable_thinking = False, # Disable thinking
)

inputs = tokenizer(text, return_tensors = "pt")
resp = model.generate(
    **inputs.to("cuda"),
    max_new_tokens = 256, # Increase for longer outputs!
    temperature = 0.6, top_p = 0.95, top_k = 20, # Reasoning-mode values, used here with thinking disabled
    # streamer = TextStreamer(tokenizer, skip_prompt = True),
)
response_ids = resp[0][len(inputs.input_ids[0]):].tolist()
res = tokenizer.decode(response_ids, skip_special_tokens=True)
print(res)

In [None]:
text

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model_fullepoch")  # Local saving
tokenizer.save_pretrained("lora_model_fullepoch")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

Now if you want to load the LoRA adapters we just saved for inference, set the condition below to `True` (as it is here):

In [5]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/kaggle/input/qwen_ft/other/default/1", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 2048,
        load_in_4bit = True,
    )

==((====))==  Unsloth 2025.5.1: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    Tesla P100-PCIE-16GB. Num GPUs = 1. Max memory: 15.888 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 6.0. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json:   0%|          | 0.00/168k [00:00<?, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.59G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/1.56G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

Unsloth 2025.5.1 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.


In [6]:
make_tg_report('model loaded')

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and save to `q8_0` by default. All methods such as `q4_k_m` are allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)

In [None]:
# Save to 8bit Q8_0
if False:
    model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False:
    model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: # Pushing to HF Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: # Pushing to HF Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model.gguf` file or `model-Q4_K_M.gguf` file in llama.cpp or a UI-based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui).

And we're done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/unsloth)!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

