In this notebook we run a local LLM: the Mistral-based checkpoint "Shaagun/mistral".

We will use Unsloth for this: https://github.com/unslothai/

In [15]:
%%capture
!pip install unsloth "xformers==0.0.28.post2"

!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

In [16]:
import os

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Maximum sequence length in tokens
dtype = None  # None lets Unsloth auto-detect the dtype (float16 on a Tesla T4)
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage. Can be False.


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Shaagun/mistral",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # Hugging Face access token, needed for gated models such as Llama-3 from Meta.
    # Assumes the token is stored in the HF_TOKEN environment variable; never hard-code it.
    token = os.environ.get("HF_TOKEN"),
)

==((====))==  Unsloth 2024.12.3: Fast Mistral patching. Transformers:4.46.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

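This notebook only runs inference, so no LoRA adapters are added here. (When fine-tuning, LoRA adapters mean that only 1 to 10% of all parameters need to be updated.) First we check which device the model was loaded on.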

In [17]:
# Where is our model?
device = model.device
device

device(type='cuda', index=0)

In [18]:
FastLanguageModel.for_inference(model)  # First switch the model to inference mode
#  model.eval()

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32768, 4096, padding_idx=770)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): M
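
The layer shapes printed above give a rough sense of why `load_in_4bit = True` matters on a Tesla T4 with about 14.7 GB of memory. The sketch below is a back-of-the-envelope estimate, not a measurement. It counts only the `Linear4bit` and embedding weights visible in the truncated printout, assumes exactly 4 bits per quantized weight, and ignores quantization constants, the output head, and activation memory.

```python
# Shapes copied from the printed module tree above.
hidden = 4096     # model width
kv_dim = 1024     # k_proj / v_proj output width
mlp_dim = 14336   # gate_proj / up_proj output width
n_layers = 32
vocab = 32768

# q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
attn_params = 2 * hidden * hidden + 2 * hidden * kv_dim
# gate_proj, up_proj and down_proj each hold hidden x mlp_dim weights.
mlp_params = 3 * hidden * mlp_dim

linear_params = n_layers * (attn_params + mlp_params)
embed_params = vocab * hidden

GIB = 2**30
fp16_gib = linear_params * 2 / GIB        # 2 bytes per weight
four_bit_gib = linear_params * 0.5 / GIB  # assumption: 0.5 bytes per weight

print(f"Linear weights:         {linear_params:,}")
print(f"Embedding weights:      {embed_params:,}")
print(f"Linear layers at fp16:  {fp16_gib:.2f} GiB")
print(f"Linear layers at 4-bit: {four_bit_gib:.2f} GiB")
```

Under these assumptions the decoder's linear layers alone would take roughly four times as much memory at 16-bit precision as at 4-bit.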

In [19]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
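
The template has three positional slots: instruction, input, and response. For generation the response slot is left empty so the model continues from `### Response:`. A minimal sketch of a helper that does this is shown below; the name `build_prompt` is hypothetical and not part of Unsloth or the original notebook.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the template for generation, leaving the response slot empty."""
    return ALPACA_PROMPT.format(instruction, input_text, "")


prompt = build_prompt(
    "Write amazing thing about University of New Haven's students",
    "We are called Chargers! and our mascot is Horse",
)
print(prompt)
```

Keeping the formatting in one place avoids repeating the empty third argument in every generation cell.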

In [20]:
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Write amazing thing about University of New Haven's students", # instruction
        "We are called Chargers! and our mascot is Horse", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to(device)

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
output = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
response_1 = tokenizer.decode(output[0], skip_special_tokens=True)

<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Write amazing thing about University of New Haven's students

### Input:
We are called Chargers! and our mascot is Horse

### Response:
The University of New Haven's students are not only known for their spirit, as represented by their mascot, the Chargers, but they also embody a unique sense of determination and resilience that sets them apart. Just like their fearless mascot, these students charge forward with passion, curiosity, and a relentless pursuit of knowledge. They are a diverse and vibrant community, constantly pushing the boundaries of what is possible and making a significant impact in their respective fields. It's truly amazing to witness the potential and dedication of these Chargers!</s>
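
Note that `model.generate` returns the prompt tokens followed by the newly generated tokens, so `response_1` holds the whole prompt plus the answer. If only the answer is wanted, one option is to split the decoded text on the response marker. The helper below is a sketch; `extract_response` is a hypothetical name, and the sample string is an abbreviated stand-in for the decoded output.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, or the whole text if absent."""
    _, found, tail = decoded.rpartition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


# Abbreviated stand-in for a decoded generation (prompt followed by answer).
decoded = (
    "### Instruction:\nWrite amazing thing about University of New Haven's students\n\n"
    "### Input:\nWe are called Chargers! and our mascot is Horse\n\n"
    "### Response:\nThe University of New Haven's students are not only known for their spirit"
)
answer = extract_response(decoded)
print(answer)
```

An alternative that avoids string matching is to decode only the new tokens, for example `tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`, which relies on the generated sequence starting with the prompt tokens.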


In [21]:
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Santrauka", # instruction (Lithuanian: "Summary")
        "Prašome paaiškinti apie \"Shaagun Suresh\"", # input (Lithuanian: "Please explain about 'Shaagun Suresh'")
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to(device)

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 555)

<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Santrauka

### Input:
Prašome paaiškinti apie "Shaagun Suresh"

### Response:
"Shaagun Suresh yra indijus, jis pradėjo karierą kaip programatorius. Jis turi bachelor's diplomą iš Fakulteto mokslų ir technologijų, o dabar yra vice prezidentas kodėlis 'X'."</s>


In [27]:
inputs = tokenizer(
[
    alpaca_prompt.format(
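        "Kiek planetų yra Saulės sistemoje?", # instruction (Lithuanian: "How many planets are in the Solar System?")
        "rumpai atsakyk skaičiais.", # input (Lithuanian, presumably "Trumpai atsakyk skaičiais.": "Answer briefly, in numbers.")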
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to(device)

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
output = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
response_2 = tokenizer.decode(output[0], skip_special_tokens=True)

<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Kiek planetų yra Saulės sistemoje?

### Input:
rumpai atsakyk skaičiais.

### Response:
Saulės sistemoje yra 8 planetų.</s>


# **EVALUATION:**

In [11]:
from sentence_transformers import SentenceTransformer, util


In [30]:

# Reference text from ChatGPT
reference = "University of New Haven's students are proudly known as Chargers, embodying the spirit of drive and determination in everything they do. With a strong sense of community and pride, Chargers are known for their academic excellence, innovative thinking, and commitment to making a positive impact. Represented by their spirited mascot, the horse, they charge forward with unparalleled energy, whether it's in academics, athletics, or community service. At the University of New Haven, students not only strive for success but also inspire those around them to achieve greatness, making the Charger legacy truly remarkable!"

hypothesis = response_1  # Decoded model output (prompt text plus generated response)

# Load a pre-trained sentence-embedding model for semantic similarity.
# Named `embedder` so it does not overwrite the LLM stored in `model`.
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Compute embeddings
embeddings = embedder.encode([reference, hypothesis])

# Calculate cosine similarity
similarity = util.cos_sim(embeddings[0], embeddings[1])

print(f"Semantic Similarity: {similarity.item():.2f}")


Semantic Similarity: 0.78
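
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on the angle between the embeddings and not on their magnitude. The pure-Python sketch below illustrates the formula on toy vectors; it is not the sentence-transformers implementation, and `cosine_similarity` is a name chosen here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
scaled = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # same direction, different length

print(same_direction, orthogonal, scaled)
```

Vectors pointing the same way score close to 1 regardless of length, while orthogonal vectors score 0.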


In [29]:

# Reference text from ChatGPT (Lithuanian: "There are eight in the Solar System.")
reference = "Saulės sistemoje yra aštuonios."

hypothesis = response_2  # Decoded model output (prompt text plus generated response)

# Load a pre-trained sentence-embedding model for semantic similarity.
# Named `embedder` so it does not overwrite the LLM stored in `model`.
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Compute embeddings
embeddings = embedder.encode([reference, hypothesis])

# Calculate cosine similarity
similarity = util.cos_sim(embeddings[0], embeddings[1])

print(f"Semantic Similarity: {similarity.item():.2f}")

Semantic Similarity: 0.34
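
This score is computed on `response_2`, which contains the full English prompt as well as the short Lithuanian answer, and compares it against a one-sentence Lithuanian reference. For a question whose answer is a number, a direct check can complement the embedding score. The sketch below assumes the answer text has already been separated from the prompt; `first_integer` is a hypothetical helper.

```python
import re


def first_integer(text: str):
    """Return the first run of digits in the text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


response = "Saulės sistemoje yra 8 planetų."  # answer text from the generation above
expected = 8

is_correct = first_integer(response) == expected
print(is_correct)
```

A check like this only applies when the answer is written in digits; a reference such as "aštuonios" (the word "eight") would need a separate mapping from number words to values.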

