To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth?tab=readme-ov-file#-installation-instructions).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (eg for Llama.cpp).

[NEW] Llama-3.1 8b, 70b & 405b are trained on a crazy 15 trillion tokens with 128K long context lengths!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

In [1]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers==0.0.28" trl peft accelerate bitsandbytes triton #xformers==0.0.28

In [None]:
!pip uninstall xformers
!pip install xformers

Found existing installation: xformers 0.0.27
Uninstalling xformers-0.0.27:
  Would remove:
    /usr/local/lib/python3.10/dist-packages/xformers-0.0.27.dist-info/*
    /usr/local/lib/python3.10/dist-packages/xformers/*
Proceed (Y/n)? y
  Successfully uninstalled xformers-0.0.27
Collecting xformers
  Downloading xformers-0.0.28.post1-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting torch==2.4.1 (from xformers)
  Downloading torch-2.4.1-cp310-cp310-manylinux1_x86_64.whl.metadata (26 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.4.1->xformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.4.1->xformers)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.4.1->xformers)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.

* We support Llama, Mistral, Phi-3, Gemma, Yi, DeepSeek, Qwen, TinyLlama, Vicuna, Open Hermes etc
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
* [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
* [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! [Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/345 [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [8]:
import torch
import gc

# Empty the cache
torch.cuda.empty_cache()

# Run garbage collection
gc.collect()

30

<a name="Data"></a>
### Data Prep
We now use the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), which is a filtered version of 52K of the original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [5]:
from datasets import load_dataset

# Load the dataset
dataset = load_dataset('json', data_files={'train': 'data.json'})

# Check the dataset
print(dataset['train'][0])


Generating train split: 0 examples [00:00, ? examples/s]

{'question': 'What is the purpose of the Department of Technical Education, Government of Rajasthan website?', 'answer': 'The purpose of the website is to provide information about technical education in Rajasthan, including admissions, colleges, policies, and various administrative matters related to technical education in the state.'}


In [10]:
qa_prompt_template = """
You are an expert virtual assistant for the Department of Technical Education, Government of Rajasthan. Answer the following question related to colleges, admissions, fees, scholarships, and other academic details. If the question is unrelated to these topics, respond with "I am unable to assist with this question."

### Instruction:
{}

### Response:
{}
"""


EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["question"]
    outputs      = examples["answer"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = qa_prompt_template.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

# from datasets import load_dataset

# ds = load_dataset("kshitij230/Indian-Law", split="train")
# ds = load_dataset("Ghost222/Indian_Law_9Brainz", split="train")
# dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset['train'].map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/184 [00:00<?, ? examples/s]

<a name="Train"></a>
### Train the model
Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!

In [11]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Map (num_proc=2):   0%|          | 0/184 [00:00<?, ? examples/s]

In [12]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
5.984 GB of memory reserved.


In [13]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 184 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 23
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.5639
2,2.4902
3,2.532
4,2.3532
5,2.1153
6,1.7386
7,1.6237
8,1.2888
9,0.9733
10,0.9339


In [14]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

135.7269 seconds used for training.
2.26 minutes used for training.
Peak reserved memory = 6.875 GB.
Peak reserved memory for training = 0.891 GB.
Peak reserved memory % of max memory = 46.616 %.
Peak reserved memory for training % of max memory = 6.041 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

In [17]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    qa_prompt_template.format(
        "Name the government colleges in rajasthan?", # instruction
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 512, use_cache = True)
tokenizer.batch_decode(outputs)

['<|begin_of_text|>\nYou are an expert virtual assistant for the Department of Technical Education, Government of Rajasthan. Answer the following question related to colleges, admissions, fees, scholarships, and other academic details. If the question is unrelated to these topics, respond with "I am unable to assist with this question."\n\n### Instruction:\nName the government colleges in rajasthan?\n\n### Response:\n\nGovernment Polytechnic College, Churu\nGovernment Polytechnic College, Jhunjhunu\nGovernment Polytechnic College, Alwar\nGovernment Polytechnic College, Karauli\nGovernment Polytechnic College, Dungarpur\nGovernment Polytechnic College, Rajsamand\nGovernment Polytechnic College, Dausa\nGovernment Polytechnic College, Jalore\nGovernment Polytechnic College, Barmer\nGovernment Polytechnic College, Bhilwara\nGovernment Polytechnic College, Banswara\nGovernment Polytechnic College, Pali\nGovernment Polytechnic College, Chittorgarh\nGovernment Polytechnic College, Jodhpur\nGo

 You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "What is fibonacci number", # instruction
        # "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<|begin_of_text|>You are a legal expert specializing in Indian Law. Answer the following question only if it is related to Indian Law. Otherwise, respond with "I am not able to answer this question. Below is an instruction that asks a question about Indian Law. Write a response that appropriately completes the request.

### Instruction:
What is fibonacci number

### Response:
Fibonacci number is a number that is a part of the Fibonacci sequence. The Fibonacci sequence is a sequence of numbers where each number is the sum of the two preceding numbers, starting with 0 and 1. The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "What is a famous tall tower in Paris?", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is a famous tall tower in Paris?

### Input:


### Response:
One of the most famous and iconic tall towers in Paris is the Eiffel Tower. Standing at 324 meters (1,063 feet) tall, this wrought iron tower is a symbol of the city and a must-see attraction for tourists from all over the world.<|end_of_text|>


You can also use Hugging Face's `AutoModelForPeftCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for VLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [20]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
model.push_to_hub_merged("nishant-verma-7/sih_v1", tokenizer, save_method = "merged_16bit", token = "token")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 5.0 out of 12.67 RAM for saving.


100%|██████████| 32/32 [01:59<00:00,  3.72s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Saving sih_v1/pytorch_model-00001-of-00004.bin...
Unsloth: Saving sih_v1/pytorch_model-00002-of-00004.bin...
Unsloth: Saving sih_v1/pytorch_model-00003-of-00004.bin...
Unsloth: Saving sih_v1/pytorch_model-00004-of-00004.bin...


README.md:   0%|          | 0.00/596 [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?it/s]

pytorch_model-00003-of-00004.bin:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

pytorch_model-00001-of-00004.bin:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

pytorch_model-00004-of-00004.bin:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

pytorch_model-00002-of-00004.bin:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/nishant-verma-7/sih_v1


### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI based system like `GPT4All`. You can install GPT4All by going [here](https://gpt4all.io/index.html).

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/drive/1T-YBVfnphoVc8E2E854qF3jdia2Ll2W2?usp=sharing)**

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/u54VK8m8tk) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. [**NEW**] We make Phi-3 Medium / Mini **2x faster**! See our [Phi-3 Medium notebook](https://colab.research.google.com/drive/1hhdhBa1j_hsymiW9m-WzxQtgqTH_NHqi?usp=sharing)
10. [**NEW**] We make Gemma-2 9b / 27b **2x faster**! See our [Gemma-2 9b notebook](https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing)
11. [**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/drive/1WZDi7APtQ9VsvOrQSSC5DDtxq159j8iZ?usp=sharing)
12. [**NEW**] We make Mistral NeMo 12B 2x faster and fit in under 12GB of VRAM! [Mistral NeMo notebook](https://colab.research.google.com/drive/17d3U-CAIwzmbDRqbZ9NnpHxCkmXB6LZ0?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a></a> Support our work if you can! Thanks!
</div>

# Preparint the dataset

In [None]:
import requests
from bs4 import BeautifulSoup
import re
import urllib.parse

# List of file extensions to search for (you can add more if needed)
FILE_TYPES = ['pdf', 'xls', 'xlsx', 'ppt', 'pptx']

def get_all_links(url, visited):
    # Make the request and get the page content
    try:
        response = requests.get(url)
        response.raise_for_status()  # Check for HTTP errors
    except requests.exceptions.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
        return []

    # Parse the content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Get all links on the page
    links = []
    for a_tag in soup.find_all('a', href=True):
        href = a_tag['href']
        # Resolve relative URLs
        full_url = urllib.parse.urljoin(url, href)

        # Avoid revisiting the same links
        if full_url not in visited:
            visited.add(full_url)
            links.append(full_url)

    return links

def find_resources(url, visited):
    # Fetch all links on the current page
    links = get_all_links(url, visited)

    resource_links = []
    for link in links:
        # Check if the link points to a file with a required extension
        if any(link.lower().endswith(ext) for ext in FILE_TYPES):
            resource_links.append(link)
        # Recursively visit internal links
        elif link.startswith(url):  # Ensure the link is part of the main domain
            resource_links.extend(find_resources(link, visited))

    return resource_links

if __name__ == "__main__":
    # The main URL to start with
    main_url = 'https://dte.rajasthan.gov.in'

    # Set of visited links to avoid revisiting the same URLs
    visited_links = set()

    print(f"Scraping {main_url} for resources...")

    # Find all resource links
    resources = find_resources(main_url, visited_links)

    print("\nFound the following resources:")
    for resource in resources:
        print(resource)


Scraping https://dte.rajasthan.gov.in for resources...





Found the following resources:
https://dte.rajasthan.gov.in/assets/docs/About_us/Contact%20Persons%20DTE.pdf
https://dte.rajasthan.gov.in/assets/docs/About_us/Complete%20Prativedan%202023-24.pdf
https://dte.rajasthan.gov.in/assets/docs/Emp_Crnr/Emp%20Transfer%20Policy.pdf
https://dte.rajasthan.gov.in/assets/docs/Admission/FirstYearEngg/7832.pdf
https://dte.rajasthan.gov.in/assets/docs/Admission/Lateral/Date Extend.pdf
https://dte.rajasthan.gov.in/assets/docs/Admission/Lateral/Notification.pdf
https://dte.rajasthan.gov.in/assets/docs/Admission/FirstYearEngg/VIGAPTI%202024-25.pdf
https://dte.rajasthan.gov.in/assets/docs/Admission/FirstYearNonEngg/Vigyapti%20Non-Engg2024-25.pdf
https://dte.rajasthan.gov.in/assets/docs/Circular/NOC.pdf
https://dte.rajasthan.gov.in/assets/docs/Circular/NOCPolicy.pdf
https://dte.rajasthan.gov.in/assets/docs/Circular/MBC_July2018.pdf
https://dte.rajasthan.gov.in/assets/docs/Circular/BeanchMarkDisability.pdf
https://dte.rajasthan.gov.in/assets/docs/Circular/F

In [None]:
import requests
from bs4 import BeautifulSoup
import re
import urllib.parse
import pandas as pd

# List of file extensions to search for
FILE_TYPES = ['pdf', 'xls', 'xlsx', 'ppt', 'pptx']

def get_all_links_and_descriptions(url, visited):
    # Make the request and get the page content
    try:
        response = requests.get(url)
        response.raise_for_status()  # Check for HTTP errors
    except requests.exceptions.RequestException as e:
        print(f"Failed to fetch {url}: {e}")
        return []

    # Parse the content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Get all links with their descriptions
    links = []
    for a_tag in soup.find_all('a', href=True):
        href = a_tag['href']
        description = a_tag.get_text(strip=True)  # Get the text description

        # Resolve relative URLs
        full_url = urllib.parse.urljoin(url, href)

        # Avoid revisiting the same links
        if full_url not in visited:
            visited.add(full_url)
            links.append((full_url, description))

    return links

def find_resources_and_descriptions(url, visited):
    # Fetch all links and descriptions on the current page
    links_with_descriptions = get_all_links_and_descriptions(url, visited)

    resource_links = []
    for link, description in links_with_descriptions:
        # Check if the link points to a file with a required extension
        if any(link.lower().endswith(ext) for ext in FILE_TYPES):
            resource_links.append((link, description))
            if len(resource_links)>500:
                break
        # Recursively visit internal links
        elif 'rajasthan.gov.in' in link:  # Ensure the link is part of the main domain
            resource_links.extend(find_resources_and_descriptions(link, visited))

    return resource_links

if __name__ == "__main__":
    # The main URL to start with
    main_url = 'https://dte.rajasthan.gov.in/'

    # Set of visited links to avoid revisiting the same URLs
    visited_links = set()

    print(f"Scraping {main_url} for resources...")

    # Find all resource links and their descriptions
    resources_with_descriptions = find_resources_and_descriptions(main_url, visited_links)

    # Create a DataFrame with the links and descriptions
    df = pd.DataFrame(resources_with_descriptions, columns=["Link", "Description"])

    # Print or save the DataFrame as required
    print("\nDataFrame of Resources:")
    print(df)

    # You can save the DataFrame to a CSV file
    df.to_csv("all_resources.csv", index=False)
    df.head()


Scraping https://dte.rajasthan.gov.in/ for resources...
Failed to fetch https://egyan.rajasthan.gov.in/: 502 Server Error: Bad Gateway for url: https://egyan.rajasthan.gov.in/




KeyboardInterrupt: 

In [None]:
# get the gov colleges details
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

# Function to scrape the table with id 'webtable1' from the provided URL
def scrape_table(url):
    # Send a GET request to fetch the page content
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Failed to fetch the URL: {url}")
        return

    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find the table with id 'webtable1'
    table = soup.find('table', {'id': 'webtable1'})

    # Check if the table exists
    if not table:
        print("Table with ID 'webtable1' not found.")
        return

    # Extract table headers
    headers = []
    for th in table.find_all('th'):
        headers.append(th.get_text(strip=True))

    # Extract table rows
    rows = []
    for tr in table.find_all('tr')[1:]:  # Skip the header row
        cells = tr.find_all('td')
        row = []
        for i, cell in enumerate(cells):
            # Check if it's the "Link" column and contains an <a> tag
            if i == len(cells) - 1:  # Assuming the link is the last column
                a_tag = cell.find('a')
                if a_tag and 'href' in a_tag.attrs:
                    row.append(a_tag['href'])  # Extract the href link
                else:
                    row.append('')  # No link found, append empty string
            else:
                row.append(cell.get_text(strip=True))  # Regular cell text
        rows.append(row)
    rows.pop(32)
    # Create a pandas DataFrame
    df = pd.DataFrame(rows, columns=headers)

    # Save the DataFrame to a CSV file
    df.to_csv('colleges_info_with_links.csv', index=False)
    print("Table data with links saved to 'colleges_info_with_links.csv'")

if __name__ == "__main__":
    # The URL of the webpage containing the table
    url = 'https://dte.rajasthan.gov.in'

    # Call the function to scrape the table and save it as CSV
    scrape_table(url)


Table data with links saved to 'colleges_info_with_links.csv'


In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_placement_data(url):
  """Scrapes placement data from a college link and returns a Pandas DataFrame.

  Args:
    url: The URL of the college website.

  Returns:
    A Pandas DataFrame containing the placement data, or None if not found.
  """
  try:
    # Go to the placement data page
    placement_url = url + "/placement-data"
    response = requests.get(placement_url)
    response.raise_for_status()

    # Parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find the table in the main section
    main_section = soup.find('section', {'id': 'main'})
    if main_section:
      table = main_section.find('table')
      if table:
        return pd.read_html(str(table))[0]
  except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
  return None

# Example usage
df = pd.read_csv('colleges_info_with_links.csv')
for index, row in df.iterrows():
    college_url = row['Link']
    # college_url = "https://hte.rajasthan.gov.in/college/gpcalwar"
    placement_df = scrape_placement_data(college_url)
    if placement_df is not None:
      # Execute the code yourself to see the output
      print(placement_df.head())
      placement_df.to_csv(f"{row['College Name']}_placement_data.csv", index=False)
    else:
      print("Placement data not found.")

  return pd.read_html(str(table))[0]


                                0                                 1   \
0  Govt. Polytechnic College,Ajmer   Govt. Polytechnic College,Ajmer   
1                Placement 2022-23                 Placement 2022-23   
2                           S. No.      Name of Institute (GPC/GWPC)   
3                                1  Govt. Polytechnic College,Ajmer.   
4                                2   Govt. Polytechnic College,Ajmer   

                                2                                3   \
0  Govt. Polytechnic College,Ajmer  Govt. Polytechnic College,Ajmer   
1                Placement 2022-23                Placement 2022-23   
2         Date of Campus Interview                  Name of Company   
3                          15.7.22                 Tata Power Solar   
4                          19.7.22      Secure Meters Ltd., Udaipur   

                                4                                5   \
0  Govt. Polytechnic College,Ajmer  Govt. Polytechnic College,Ajmer  

  return pd.read_html(str(table))[0]


                                                   0  \
0  Name of Industry/Contact Person (HR)/Address/M...   
1              Krishna Maruty Ltd., Manesar, Gurgaon   
2  RAMKISHORE S VP & HEAD - HUMAN RESOURCES DIVIS...   
3  Sarika Bhardwaj H.R Specialist Hollister Medic...   
4                                     Addverb, Noida   

                                        1  \
0  Venue of Campus/Pool Drive/Date & Time   
1           Gpc Alwar 19/04/2022 10:30 AM   
2                             online mode   
3        GPC, Alwar 07/04/2022 (10:30 AM)   
4                             GPC Bikaner   

                                            2  \
0    Name of Branches Appearing in the Campus   
1                                  Mechanical   
2                                  mechanical   
3  ALL BRANCHES (Pass out+current 2022 batch)   
4                                         NaN   

                                         3  \
0  No. of Students Appearing – Branch wise   
1    

  return pd.read_html(str(table))[0]


     0       1        2        3        4        5        6         7   \
0   NaN     NaN      NaN      NaN      NaN      NaN      NaN       NaN   
1  S.no  Branch   INTAKE   I year   I year  II year  II year  III year   
2   NaN     NaN  Regular  Regular  Regular  Regular  Regular   Regular   
3     1   CIVIL       40       35       35       38       38        30   
4     2   MECH.       40       23       23       32       32        33   

         8      9      10         11         12  13  
0       NaN    NaN    NaN        NaN        NaN NaN  
1  III year  TOTAL  TOTAL  PLACEMENT  PLACEMENT NaN  
2   Regular  TOTAL  TOTAL        NaN        NaN NaN  
3        30    103    103         12         12 NaN  
4        33     88     88          7          7 NaN  
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not foun

  return pd.read_html(str(table))[0]


      0                           1                          2  \
0  S No                Company name  No of students and branch   
1     1  Adani Mundra Solar Pvt Ltd           09(EE-06, ME-03)   
2     2     Mahindra Tractor Jaipur                     03(ME)   
3     3                    BKT Bhuj                    01 (ME)   

                    3          4  
0             Package    Session  
1     15079 per month  2021-2022  
2     12000 per month  2021-2022  
3  2.58 lac per annum  2021-2022  
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.
Placement data not found.


  return pd.read_html(str(table))[0]


       0                   1                   2                     3  \
0  S.No.     NAME OF STUDENT           POST HELD               ADDRESS   
1     1.  Shifa Kosar Ansari       Int. Designer  Sudarshan Shilp,Kota   
2      2         Shagun Jain  Designer Associate     Asian Paints,Kota   
3      3  Prerna Vishwakarma       Int. Designer  Sudarshan Shilp,Kota   
4      4      Deeksha Raghav       Int. Designer  Deal to Dealers,Kota   

                  4  
0  SALARY PER MONTH  
1          15,000/-  
2          20,000/-  
3          11,000/-  
4            8000/-  
Placement data not found.


  return pd.read_html(str(table))[0]


    2020-21 2020-21.1 2020-21.2   2019-20 2019-20.1 2019-20.2   2018-19  \
0  Pass-Out    Placed  Placed %  Pass-Out    Placed  Placed %  Pass-Out   
1        10        05        50        12        08        66        08   

  2018-19.1 2018-19.2  
0    Placed  Placed %  
1        02        25  
Placement data not found.


# Custom model using

In [4]:
!pip install -U bitsandbytes --upgrade

Collecting bitsandbytes
  Downloading bitsandbytes-0.43.3-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Downloading bitsandbytes-0.43.3-py3-none-manylinux_2_24_x86_64.whl (137.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m137.5/137.5 MB[0m [31m7.7 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: bitsandbytes
Successfully installed bitsandbytes-0.43.3


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
# import bitsandbytes as bnb
import torch

model_id = "nishant-verma-7/sih_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,load_in_4bit=True,
                                            device_map="auto",)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "What is the capital of France?"
result = pipe(prompt)
print(result[0]['generated_text'])

In [9]:
ans = pipe("Who are you?")

KeyboardInterrupt: 

In [1]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nishant-verma-7/sih_v1")
model = AutoModelForCausalLM.from_pretrained("nishant-verma-7/sih_v1", load_in_8bit=True)

token = tokenizer("What is the capital of France?", return_tensors='pt')
input_ids = token.to('cuda')
output = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))


The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]



What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of France. What is the capital of France? Paris is the capital of 

In [2]:
token = tokenizer("how many gov colleges are there in rajasthan?", return_tensors='pt')
input_ids = token.to('cuda')
output = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

how many gov colleges are there in rajasthan??
There are 19 government colleges in Rajasthan. The colleges are affiliated to Rajasthan Technical University, Kota. The colleges offer courses in Engineering, Pharmacy, Architecture, Polytechnic and Management.
The colleges are:
Government Polytechnic College, Alwar
Government Polytechnic College, Banswara
Government Polytechnic College, Bharatpur
Government Polytechnic College, Bhilwara
Government Polytechnic College, Dungarpur
Government Polytechnic College, Dausa
Government Polytechnic College, Dholpur
Government Polytechnic College, Jalore
Government Polytechnic College, Jodhpur
Government Polytechnic College, Jhalawar
Government Polytechnic College, Karauli
Government Polytechnic College, Kota
Government Polytechnic College, Pali
Government Polytechnic College, Rajsamand
Government Polytechnic College, Sirohi
Government Polytechnic College, Sawai Madhopur
Government Polytechnic College, Tonk
Government Polytechnic College, Udaipur
Gov