## Tuning on Train Data

In [6]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes

In [7]:
!pip install datasets



In [8]:
from datasets import load_dataset


In [9]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [None]:
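import os

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length = 512,
    device_map = "auto",
    dtype = torch.float16,  # the T4 used for this run reports Bfloat16 = FALSE
    load_in_4bit = True,
    # Read the Hugging Face token from the environment; never hardcode it in a notebook.
    token = os.environ.get("HF_TOKEN"),
)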

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/345 [00:00<?, ?B/s]

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    lora_dropout = 0.05,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    bias = "none",
    use_gradient_checkpointing = 'unsloth',
    random_state = 42,
    use_rslora = False,
    loftq_config = None
)

Unsloth 2024.8 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
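
The size of the LoRA adapter configured above can be worked out by hand. A rank-r adapter on a linear layer with d_in inputs and d_out outputs adds r x (d_in + d_out) weights. The sketch below assumes the publicly documented Llama 3.1 8B shapes (hidden size 4096, key/value projection width 1024, MLP width 14336, 32 layers). None of these dimensions are printed in this notebook, so treat them as assumptions.

```python
# Assumed Llama 3.1 8B dimensions (from the public model config, not from this notebook).
HIDDEN = 4096
KV_DIM = 1024       # 8 key/value heads x 128 dims per head
MLP_DIM = 14336
NUM_LAYERS = 32
RANK = 16           # the r value passed to get_peft_model above

# (in_features, out_features) of each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

def lora_params(d_in, d_out, rank):
    """Weights in the A (rank x d_in) and B (d_out x rank) matrices."""
    return rank * (d_in + d_out)

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Under these assumptions the total is 41,943,040, which agrees with the "Number of trainable parameters" line Unsloth prints when training starts.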


In [None]:
rag_dataset_prompt = """Below is a context that you answer questions based on, paired with an question that is asked and need to be answered. Write an answer to the question based on the context.

### context:
{}

### question:
{}

### answer:
{}"""

EOS_TOKEN = tokenizer.eos_token  # appended to every example so the model learns where to stop

def formatting_prompts_func(examples):
    """Build one training string per (context, question, answer) row of a batch."""
    contexts = examples["context"]
    questions = examples["question"]
    answers = examples["answer"]
    texts = []
    for context, question, answer in zip(contexts, questions, answers):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = rag_dataset_prompt.format(context, question, answer) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = load_dataset("neural-bridge/rag-dataset-12000", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

Downloading readme:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/23.1M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/5.79M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/9600 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2400 [00:00<?, ? examples/s]

Map:   0%|          | 0/9600 [00:00<?, ? examples/s]
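
To see what the mapped "text" column contains, here is a minimal sketch of the same formatting step on a one-row toy batch. The `<eos>` string, the shortened `TEMPLATE`, the function name `format_batch`, and the sample row are all hypothetical stand-ins; the notebook itself uses `tokenizer.eos_token` and the full `rag_dataset_prompt`.

```python
# Toy stand-ins: the notebook uses tokenizer.eos_token and the full prompt text.
EOS_TOKEN = "<eos>"  # hypothetical placeholder
TEMPLATE = "### context:\n{}\n\n### question:\n{}\n\n### answer:\n{}"

def format_batch(examples):
    """Mirror of formatting_prompts_func: takes a dict of columns, returns a 'text' column."""
    rows = zip(examples["context"], examples["question"], examples["answer"])
    return {"text": [TEMPLATE.format(c, q, a) + EOS_TOKEN for c, q, a in rows]}

batch = {
    "context": ["Paris is the capital of France."],
    "question": ["What is the capital of France?"],
    "answer": ["Paris."],
}
out = format_batch(batch)
print(out["text"][0])
```

With `batched = True`, `dataset.map` passes each column as a list, which is why the function zips the three lists instead of reading a single row.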

In [None]:
training_args = TrainingArguments(
    per_device_train_batch_size=4,  # Increase if memory allows
    gradient_accumulation_steps=8,  # Accumulate gradients to simulate a larger batch size
    warmup_steps=10,  # Slightly more warmup to stabilize training
    max_steps=100,  # Increase steps for better convergence, adjust as necessary
    learning_rate=2e-4,
    fp16=True,  # Use fp16 for mixed precision
    bf16=False,  # Ensure bf16 is off if not supported
    logging_steps=10,  # Reduce logging to avoid overhead
    optim="adamw_8bit",  # Use 8-bit Adam optimizer for efficiency
    weight_decay=0.01,
    lr_scheduler_type="cosine",  # Use cosine decay for smoother learning rate adjustment
    save_total_limit=2,  # Limit the number of saved checkpoints to save disk space
    output_dir="outputs",
    dataloader_num_workers=4,  # Increase if your system has more CPU cores
)

In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,  # Same limit as passed to from_pretrained above
    dataset_num_proc=4,  # Increase number of processes if more CPU cores available
    packing=False,
    args=training_args,
)

Map (num_proc=4):   0%|          | 0/9600 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 9,600 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 8
\        /    Total batch size = 32 | Total steps = 100
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
10,2.3797
20,2.2375
30,2.1984
40,2.1625
50,2.1375
60,2.1281
70,2.1406
80,2.1594
90,2.2172
100,2.1406
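
The figures in the training banner follow directly from the training arguments. The short calculation below reproduces the total batch size and estimates how much of the 9,600-example training split the 100 steps actually cover. The variable names are illustrative only.

```python
per_device_batch = 4      # per_device_train_batch_size
grad_accum_steps = 8      # gradient_accumulation_steps
num_gpus = 1
max_steps = 100
num_examples = 9600       # size of the train split

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
examples_seen = effective_batch * max_steps
fraction_of_epoch = examples_seen / num_examples

print(effective_batch)
print(examples_seen)
print(f"{fraction_of_epoch:.2f}")
```

So the run sees 3,200 examples, roughly a third of one pass over the training data, even though the banner rounds this up to "Num Epochs = 1".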


In [None]:
trainer.save_state()

In [None]:
import os

hf_token = os.environ["HF_TOKEN"]  # write-scoped token supplied via the environment, never hardcoded
model.push_to_hub("ManaSaleh/unsloth_llama_3.1_lora_model", token = hf_token)
tokenizer.push_to_hub("ManaSaleh/unsloth_llama_3.1_lora_model", token = hf_token)

README.md:   0%|          | 0.00/590 [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Saved model to https://huggingface.co/ManaSaleh/unsloth_llama_3.1_lora_model


In [None]:
test_context = """CHI 2010 Workshop May 7 or 8, 2011 (final date to be announced)
Call for Participation
Large interactive displays are now common in public urban life. Museums, libraries, public plazas, and architectural facades already take advantage of interactive technologies for visual and interactive information presentation. Researchers and practitioners from such varied disciplines as art, architecture, design, HCI, and media theory have started to explore the potential and impact of large display installations in public urban settings.
This workshop aims to provide a platform for researchers and practitioners from different disciplines such as art, architecture, design, HCI, social sciences, and media theory to exchange insights on current research questions in the area. The workshop will focus on to the following topics: how to design large interactive display installations that promote engaging experiences and go beyond playful interaction, how different interaction models shape people’s experience in urban spaces, and how to evaluate their impact.
Workshop Goals & Topics
The goal of this one-day CHI 2011 workshop is to cross-fertilize insights from different disciplines, to establish a more general understanding of large interactive displays in public urban contexts, and to develop an agenda for future research directions in this area. Rather than focusing on paper presentations, this workshop aims to trigger active and dynamic group discussions around the following topics:
Beyond Playful Interaction
A number of studies found that large display installations invite for playful interaction but often fail to convey meaningful experiences related to content. This raises the following questions:
- How can we design installations that endure people’s attention past the initial novelty effect and direct the interest toward the content?
- What design strategies can be applied to promote an active individual and social exploration and discussion of the presented information?
A number of interaction techniques have been explored for large displays in public spaces ranging from interaction via cell phones, to direct-touch or full body interaction. We would like to discuss:
- How do different interaction methods shape people’s experience of large display installations in urban spaces?
- How do interaction methods differ from each other in terms of triggering interaction and engagement with the presented content?
Different quantitative and qualitative methods have been applied to evaluate people’s experience and use of large display installations in public spaces. During the workshop we would like to discuss:
- How can we evaluate the "success" of large display installations in urban spaces?
- How can particular aspects of public large display installations such as engagement be evaluated?
- What kind of evaluation methods are most effective in different progress stages (design phase/installment phase)?
For more details on the workshop please refer to our extended abstract and workshop proposal.
Submission Details
Submit a position paper (maximum 4 pages) to largedisplaysinurbanlife@gmail.com by January 14, 2011 using the CHI extended abstract format. The paper should describe experiences, works in progress, or theories around designing and/or evaluating large interactive displays in public urban settings. We plan to explore approaches and insights from different disciplines to this topic so submissions from art, architecture, design, HCI, media theory, and social science are highly encouraged. We welcome all methodological approaches and techniques centered around the topic of large interactive displays in urban life.
At least one author of each accepted position paper needs to register for the workshop and for one or more days of the CHI conference itself.
Important Dates
Submission Deadline: January 14, 2011
Notification of acceptance: February 11, 2011
Workshop: May 7 or 8, 2011 (final date to be announced)
WORKSHOP ORGANIZERS
Uta Hinrichs is a PhD candidate in computational media design at the Innovations in Visualization (InnoVis) research group of the University of Calgary, Canada, under the supervision of Sheelagh Carpendale. Her research focuses on the design and study of large display interfaces to support lightweight information exploration in walk-up-and-use scenarios
Nina Valkanova is doing her PhD at the interaction group of the Universitat Pompeu Fabra (UPF) in Barcelona, Spain under the supervision of Ernesto Arroyo. Her research interest focuses on the design of urban media facades exploring the intersections between scientific and artistic design knowledge.
Kai Kuikkaniemi is a project manager in Helsinki Institute for Information Technology. He is currently leading a national research project focusing on public displays. His earlier research has focused on exploring novel multiplayer game designs ranging from pervasive gaming to biosignal adaptive gaming.
Giulio Jacucci is a professor at the University of Helsinki at the Dept. of Computer Science and director of the Network Society Programme at the Helsinki Institute for Information Technology. He leads several interactional projects on interaction design and ubiquitous computing, and is co-founder of MultiTouch Ltd. a company commercializing products for multi-touch screens.
Sheelagh Carpendale.
Ernesto Arroyo holds an associate teaching position at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra (UPF) in Barcelona, Spain. He earned his PhD at MIT Media Lab in 2007. His research at the Interactive Technologies Group focuses on interaction design, visualization, and user-centered interfaces, enabling and preserving the fluency of user engagement.
Thanks to Uta Hinrich for sending this my way!"""

test_question = 'What is the main goal of the CHI 2011 workshop on large interactive displays in public urban contexts?'

In [None]:
# rag_dataset_prompt is the template defined above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    rag_dataset_prompt.format(
        test_context, # context
        test_question, # question
        "", # answer - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<|begin_of_text|>Below is a context that you answer questions based on, paired with an question that is asked and need to be answered. Write an answer to the question based on the context.

### context:
CHI 2010 Workshop May 7 or 8, 2011 (final date to be announced)
Call for Participation
Large interactive displays are now common in public urban life. Museums, libraries, public plazas, and architectural facades already take advantage of interactive technologies for visual and interactive information presentation. Researchers and practitioners from such varied disciplines as art, architecture, design, HCI, and media theory have started to explore the potential and impact of large display installations in public urban settings.
This workshop aims to provide a platform for researchers and practitioners from different disciplines such as art, architecture, design, HCI, social sciences, and media theory to exchange insights on current research questions in the area. The workshop will focus 

## Load the Tuned Model

In [10]:
import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(model_name="ManaSaleh/unsloth_llama_3.1_lora_model")

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.1.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/345 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Unsloth 2024.8 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


## Test Data

In [6]:
testdata = load_dataset("neural-bridge/rag-dataset-12000", split = "test")

Downloading readme:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/23.1M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/5.79M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/9600 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2400 [00:00<?, ? examples/s]

## Embeddings

In [11]:
!pip install sentence_transformers

Collecting sentence_transformers
  Downloading sentence_transformers-3.0.1-py3-none-any.whl.metadata (10 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.11.0->sentence_transformers)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.11.0->sentence_transformers)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.11.0->sentence_transformers)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=1.11.0->sentence_transformers)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch>=1.11.0->sentence_transformers)
  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.met

In [None]:
from sentence_transformers import SentenceTransformer
embedding = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)



In [None]:
testdata

Dataset({
    features: ['context', 'question', 'answer'],
    num_rows: 2400
})

In [None]:
testdata[0]['context']

'HOUSTON (Jan. 23, 2018) – Fabien Gabel, music director of the Quebec Symphony Orchestra, returns to Houston to lead the Houston Symphony in Ravel’s Daphnis and Chloé on Feb. 2 and 3 at 8 p.m. and Feb. 4 at 2:30 p.m. in Jones Hall.\nRecognized internationally as one of the stars of the new generation, Fabien Gabel is a regular guest of the Houston Symphony and an audience favorite. Known for conducting music with French influences, Gabel leads the Symphony in a program of French and American classics, including the breathtaking musical sunrise from Ravel’s Daphnis and Chloé and Bernstein’s comic operetta Overture to Candide as the Symphony joins other orchestras around the world for Leonard Bernstein at 100, a worldwide celebration of the composer’s 100th birthday. Also on the program is Habanera, a piece by French composer Louis Aubert.\nThe evening’s featured soloist, Colin Currie, is hailed as “the world’s finest and most daring percussionist” (Spectator). He performs regularly with

In [None]:
def embed_contexts(contexts):
    """Encode a list of context strings into a single tensor of sentence embeddings."""
    with torch.no_grad():
        embeddings_contexts = embedding.encode(contexts, convert_to_tensor=True)

    return embeddings_contexts

In [None]:
%%time
context_embeddings = embed_contexts(testdata['context'])

CPU times: user 4min 5s, sys: 7.19 s, total: 4min 12s
Wall time: 4min 50s


In [None]:
testdata = testdata.add_column("context_embeddings", context_embeddings.cpu().numpy().tolist())

In [None]:
testdata

Dataset({
    features: ['context', 'question', 'answer', 'context_embeddings'],
    num_rows: 2400
})

In [None]:
print(f"Length of context_embeddings list: {len(testdata['context_embeddings'])}")
print(f"Length of the first embedding (dimensionality): {len(testdata[0]['context_embeddings'])}")

Length of context_embeddings list: 2400
Length of the first embedding (dimensionality): 768


In [12]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
testdata.save_to_disk('/content/drive/MyDrive/test_with_embeddings')

Saving the dataset (0/1 shards):   0%|          | 0/2400 [00:00<?, ? examples/s]

In [13]:
from datasets import load_from_disk
testdata = load_from_disk('/content/drive/MyDrive/test_with_embeddings')

## Qdrant

In [14]:
!pip install qdrant-client

Collecting qdrant-client
  Downloading qdrant_client-1.11.0-py3-none-any.whl.metadata (10 kB)

In [15]:
from qdrant_client import QdrantClient
from qdrant_client.http import models

In [16]:
client = QdrantClient(path="/content/drive/MyDrive/qdrant")

In [None]:
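# recreate_collection is deprecated in recent qdrant-client releases:
# drop any existing collection explicitly, then create it.
if client.collection_exists("my_collection"):
    client.delete_collection("my_collection")

client.create_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(
        size=embedding.get_sentence_embedding_dimension(),
        distance=models.Distance.COSINE
    )
)

True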

In [None]:
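# Read each column once; indexing testdata[...] inside the loop reloads the whole column per point.
contexts = testdata["context"]
questions = testdata["question"]
vectors = testdata["context_embeddings"]

client.upsert(
    collection_name="my_collection",
    points=[
        models.PointStruct(
            id=idx,
            vector=vector,
            payload={"context": context, "question": question},
        )
        for idx, (context, question, vector) in enumerate(zip(contexts, questions, vectors))
    ],
)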

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
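
All 2,400 points fit comfortably in a single upsert here. For much larger collections, one common pattern is to send the points in fixed-size chunks instead. The helper below is a hypothetical sketch of that chunking step; `chunked` and the batch size are not part of qdrant-client.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Toy data standing in for a list of PointStruct objects.
points = list(range(10))
batches = list(chunked(points, 4))
print(batches)

# In the notebook this could be used roughly as follows (untested sketch):
#   for batch in chunked(all_points, 256):
#       client.upsert(collection_name="my_collection", points=batch)
```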

## RAG

In [12]:
!pip install sentence_transformers


In [22]:
!pip install einops

Collecting einops
  Downloading einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Downloading einops-0.8.0-py3-none-any.whl (43 kB)
Installing collected packages: einops
Successfully installed einops-0.8.0


In [23]:
from sentence_transformers import SentenceTransformer
embedding = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

model.safetensors:   0%|          | 0.00/547M [00:00<?, ?B/s]



tokenizer_config.json:   0%|          | 0.00/1.19k [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/695 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/286 [00:00<?, ?B/s]

In [17]:
client = QdrantClient(path="/content/drive/MyDrive/qdrant")

In [17]:
def search(query):
  """
  Searches for the most relevant context based on the given query.

  Args:
    query: The query string.

  Returns:
    A list of (context, question) tuples, best match first. With limit=1 it holds at most one entry.
  """
  with torch.no_grad():
      query_embedding = embedding.encode([query], convert_to_tensor=True)

  results = client.search(
      collection_name="my_collection",
      query_vector=query_embedding[0].tolist(),
      limit=1
  )

  relevant_contexts = [(result.payload["context"], result.payload["question"]) for result in results]

  return relevant_contexts
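
The collection was created with `Distance.COSINE`, so the search ranks stored vectors by cosine similarity to the query embedding. The sketch below shows that scoring with a brute-force scan over three toy 3-dimensional vectors. It only illustrates the ranking rule and is not how Qdrant is implemented; the function names and the sample payloads are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=1):
    """stored is a list of (payload, vector); returns the k best (score, payload) pairs."""
    scored = [(cosine_similarity(query_vec, vec), payload) for payload, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical payloads and vectors; real embeddings here have 768 dimensions.
stored = [
    ("doc about holidays", [1.0, 0.0, 0.0]),
    ("doc about music", [0.0, 1.0, 0.0]),
    ("doc about displays", [0.0, 0.0, 1.0]),
]
best = top_k([0.9, 0.1, 0.0], stored, k=1)
print(best[0][1])
```

The `limit=1` argument in `search` plays the role of `k=1` here: only the single highest-scoring context is returned.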

In [18]:
testdata[54]['question']

'What are some of the benefits of investing in a holiday-let property according to Mark Baker and Lee from OnPoint Mortgages?'

In [24]:
query = "What are some of the benefits of investing in a holiday-let property according to Mark Baker and Lee from OnPoint Mortgages?"
top_results = search(query)

for context, question in top_results:
    print(f"Context: {context}")
    print(f"Question: {question}")

Context: Holiday Lets. Peering through the misty windows on the top deck of the bus from Shanklin to Ventnor on the Isle of Wight, trying to make out the sea between the wind-swept trees and the pretty thatched cottages on a cold damp May day.
This is how Mark Baker (53) recalls the moment when he first began to seriously think about investing in a holiday let.
“We’d had a terrible holiday weather-wise earlier on in the year and we were running out of indoor activities to do with the kids and that’s when we decided to catch the bus and do a tour of the Island,“ says Mark, an IT manager from Chobham in Surrey. “My mind had begun to wander about retirement as I was looking at the houses – I know it’s still a few years off but I do need to start planning for the future. I can’t see myself living on the Island permanently when I retire but yet I am pulled to this beautiful place.
“That’s when I thought about holiday homes and Brexit – my feeling is that more people may start holidaying in 

In [31]:
from unsloth import FastLanguageModel

def rag(query):
    """
    Retrieves the most relevant context and generates an answer to the query using Unsloth.

    Args:
        query: The query string.

    Prints:
        The decoded generation, which is the prompt followed by the model's continuation.
    """
    # Note: this template differs from the "### context:" template used during tuning.
    rag_dataset_prompt = ("Context: {0}\n\nQuestion: {1}\n\nAnswer:")

    top_results = search(query)

    if not top_results:
        print("No relevant contexts found.")
        return

    context, _ = top_results[0]  # the stored question is not needed for generation

    inputs = tokenizer(
        [
            rag_dataset_prompt.format(
                context,
                query
            )
        ], return_tensors="pt").to("cuda")

    # generate() returns the prompt tokens followed by the newly generated tokens.
    generated_answer = model.generate(**inputs, max_new_tokens=2048)

    # Decoding the full sequence therefore echoes the prompt before the answer.
    answer = tokenizer.decode(generated_answer[0], skip_special_tokens=True)

    print(f"Generated Answer: {answer}")

In [32]:
rag(query)

Generated Answer: Context: Holiday Lets. Peering through the misty windows on the top deck of the bus from Shanklin to Ventnor on the Isle of Wight, trying to make out the sea between the wind-swept trees and the pretty thatched cottages on a cold damp May day.
This is how Mark Baker (53) recalls the moment when he first began to seriously think about investing in a holiday let.
“We’d had a terrible holiday weather-wise earlier on in the year and we were running out of indoor activities to do with the kids and that’s when we decided to catch the bus and do a tour of the Island,“ says Mark, an IT manager from Chobham in Surrey. “My mind had begun to wander about retirement as I was looking at the houses – I know it’s still a few years off but I do need to start planning for the future. I can’t see myself living on the Island permanently when I retire but yet I am pulled to this beautiful place.
“That’s when I thought about holiday homes and Brexit – my feeling is that more people may st
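
The printed text begins with the retrieved context because, for a decoder-only model, `generate()` returns the prompt tokens followed by the new tokens, and the whole sequence is decoded. One way to show only the answer is to drop the first `prompt_length` token ids before decoding. The sketch below demonstrates the slicing on plain lists of made-up token ids; `strip_prompt` is a hypothetical helper, and the commented lines are an untested suggestion for how it might map onto the notebook's variables.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the token ids that were generated after the prompt."""
    return output_ids[prompt_length:]

# Toy token ids standing in for inputs["input_ids"][0] and model.generate(...)[0].
prompt_ids = [101, 7, 8, 9]
generated = prompt_ids + [42, 43, 2]

new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)

# In the notebook this might look roughly like:
#   prompt_length = inputs["input_ids"].shape[1]
#   answer = tokenizer.decode(generated_answer[0][prompt_length:], skip_special_tokens=True)
```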

In [None]:
testdata[54]['answer']

'Some of the benefits of investing in a holiday-let property include the ability to offset full mortgage interest repayments against tax, deducting the cost of doing up a property from pre-tax profits, and avoiding council tax and local business rates. Additionally, the property can be used for personal holidays for up to 20 weeks a year. The potential increase in domestic tourism due to Brexit and the success of service accommodation and Airbnb are also seen as advantages.'

## Class

In [1]:
from datasets import load_from_disk
from unsloth import FastLanguageModel
import torch
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

class Search:
    def __init__(self):
        # Initialize the model and tokenizer
        self.model, self.tokenizer = FastLanguageModel.from_pretrained(model_name="ManaSaleh/unsloth_llama_3.1_lora_model")

        # Prepare the model for inference
        self.model = FastLanguageModel.for_inference(self.model)

        # Move model to GPU
        self.model.to("cuda")

        # Load other required components
        self.testdata = load_from_disk('/content/drive/MyDrive/test_with_embeddings')
        self.client = QdrantClient(path="/content/drive/MyDrive/qdrant")
        self.embedding = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

    def search(self, query):
        """
        Searches for the most relevant context based on the given query.

        Args:
            query: The query string.

        Returns:
            A list of (context, question) tuples, best match first. With limit=1 it holds at most one entry.
        """
        with torch.no_grad():
            query_embedding = self.embedding.encode([query], convert_to_tensor=True)

        results = self.client.search(
            collection_name="my_collection",
            query_vector=query_embedding[0].tolist(),
            limit=1
        )

        relevant_contexts = [(result.payload["context"], result.payload["question"]) for result in results]

        return relevant_contexts

    def rag(self, query):
        """
        Retrieves the most relevant contexts and generates an answer to the query using Unsloth.

        Args:
            query: The query string.

        Returns:
            The decoded generation as a string: the prompt followed by the model's continuation.
        """
        rag_dataset_prompt = "Context: {0}\n\nQuestion: {1}\n\nAnswer:"

        top_results = self.search(query)

        if not top_results:
            print("No relevant contexts found.")
            return ""

        context, _ = top_results[0]  # the stored question is not needed for generation

        inputs = self.tokenizer(
            [rag_dataset_prompt.format(context, query)],
            return_tensors="pt"
        ).to("cuda")

        generated_answer = self.model.generate(**inputs, max_new_tokens=4096)

        answer = self.tokenizer.decode(generated_answer[0], skip_special_tokens=True)

        return answer


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [2]:
searcher = Search()
query = "What are some of the benefits of investing in a holiday-let property according to Mark Baker and Lee from OnPoint Mortgages?"

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.1.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2024.8 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


In [3]:
answer = searcher.rag(query)
print(f"Generated Answer: {answer}")

Generated Answer: Context: Holiday Lets. Peering through the misty windows on the top deck of the bus from Shanklin to Ventnor on the Isle of Wight, trying to make out the sea between the wind-swept trees and the pretty thatched cottages on a cold damp May day.
This is how Mark Baker (53) recalls the moment when he first began to seriously think about investing in a holiday let.
“We’d had a terrible holiday weather-wise earlier on in the year and we were running out of indoor activities to do with the kids and that’s when we decided to catch the bus and do a tour of the Island,“ says Mark, an IT manager from Chobham in Surrey. “My mind had begun to wander about retirement as I was looking at the houses – I know it’s still a few years off but I do need to start planning for the future. I can’t see myself living on the Island permanently when I retire but yet I am pulled to this beautiful place.
“That’s when I thought about holiday homes and Brexit – my feeling is that more people may st