In [1]:
from google.colab import drive
drive.mount('/content/drive') # Mount to the standard location

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Now you can access your project directory:

import os
project_path = '/content/drive/MyDrive/Projects/RAG_HandsOn/RAGwithLlama'
os.chdir(project_path)

In [3]:
# !pip3 install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install -q llama-index-embeddings-huggingface
!pip install -q llama-index peft auto-gptq optimum bitsandbytes transformers

In [4]:
import os
import torch
device = "cuda" if torch.cuda.is_available else "cpu"
device

'cuda'

In [5]:
from google.colab import userdata
HF_TOKEN = userdata.get("HF_TOKEN")

In [6]:
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

#### Setting up knowledge base

In [7]:
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = None
Settings.chunk_size = 128
Settings.chunk_overlap = 15

LLM is explicitly disabled. Using MockLLM.


#### Fetching Document

In [8]:
documents = SimpleDirectoryReader("Articles").load_data()

In [9]:
print(type(documents))
documents

<class 'list'>


[Document(id_='ae3bd3fa-d774-45b8-852d-415f50964338', embedding=None, metadata={'page_label': '1', 'file_name': 'Doc1.pdf', 'file_path': '/content/drive/MyDrive/Projects/RAG_HandsOn/RAGwithLlama/Articles/Doc1.pdf', 'file_type': 'application/pdf', 'file_size': 4053803, 'creation_date': '2024-11-14', 'last_modified_date': '2024-11-09'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='YOLO-Z: Improving small object detection in YOLOv5 for\nautonomous vehicles\nAduen Benjumea* Izzeddin Teeti† Fabio Cuzzolin† Andrew Bradley*\n17065125@brookes.ac.uk, 19136994@brookes.ac.uk,\nfabio.cuzzolin@brookes.ac.uk, abradley@brookes.ac.uk\nAbstract\nAs autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate\ndetectors. W

In [10]:
print(len(documents))

11


In [11]:
for doc in documents:
    print("----------------------------------------------------------------------------------")
    print(doc.text)

----------------------------------------------------------------------------------
YOLO-Z: Improving small object detection in YOLOv5 for
autonomous vehicles
Aduen Benjumea* Izzeddin Teeti† Fabio Cuzzolin† Andrew Bradley*
17065125@brookes.ac.uk, 19136994@brookes.ac.uk,
fabio.cuzzolin@brookes.ac.uk, abradley@brookes.ac.uk
Abstract
As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate
detectors. While our naked eyes are able to extract contextual information almost instantly, even from far away,
image resolution and computational resources limitations make detecting smaller objects (that is, objects that occupy
a small pixel area in the input image) a genuinely challenging task for machines and a wide-open research ﬁeld. This
study explores how the popular YOLOv5 object detector can be modiﬁed to improve its performance in detecting
smaller objects, with a particular application in autonomous racing. To achieve this, we investigate

In [12]:
print(documents[-2].text)

Note that, while we have focused here on modifying the popular YOLOv5 model, the methods and techniques we
explored have a potential to be developed into an entirely original model structure.
Finally, while this study shows signiﬁcant the empirical gains of the proposed architectural changes, the consistency
and generality of the results could and should be further investigated. For instance, the analysis would greatly beneﬁt
from further testing with different datasets, and challenges that might for instance come from detecting trafﬁc signs.
While we have demonstrated the usefulness of the various techniques introduced, these can only be reﬁned and better
understood when applied to a varied set of circumstances and settings. Doing so would be a signiﬁcant step towards a
more robust solution to small object detection. Additionally, there many more directions and techniques that would ﬁt
nicely in this subject and which have not been considered in this study, but these will remain a sub

In [13]:
import re

dropper = re.search("References(.*)", documents[-2].text, re.DOTALL)
if dropper:
    dropper = dropper.group(1)
    # print(dropper)
    documents[-2].text = documents[-2].text.replace("References"+dropper, "")
    documents[-2].text = documents[-2].text.strip()
print(documents[-2].text)

Note that, while we have focused here on modifying the popular YOLOv5 model, the methods and techniques we
explored have a potential to be developed into an entirely original model structure.
Finally, while this study shows signiﬁcant the empirical gains of the proposed architectural changes, the consistency
and generality of the results could and should be further investigated. For instance, the analysis would greatly beneﬁt
from further testing with different datasets, and challenges that might for instance come from detecting trafﬁc signs.
While we have demonstrated the usefulness of the various techniques introduced, these can only be reﬁned and better
understood when applied to a varied set of circumstances and settings. Doing so would be a signiﬁcant step towards a
more robust solution to small object detection. Additionally, there many more directions and techniques that would ﬁt
nicely in this subject and which have not been considered in this study, but these will remain a sub

In [14]:
del documents[-1]

In [15]:
len(documents)

10

In [16]:
index = VectorStoreIndex.from_documents(documents)

#### Setting up retriever

In [17]:
# Number of docs to retireve
top_k = 3

# configure retriever
retriever = VectorIndexRetriever(index=index, similarity_top_k=top_k)

In [18]:
# Assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)])

In [19]:
query = "How is object detection process implemented?"
response = query_engine.query(query)

In [20]:
response.source_nodes[0].text

'For instance, the analysis would greatly beneﬁt\nfrom further testing with different datasets, and challenges that might for instance come from detecting trafﬁc signs.\nWhile we have demonstrated the usefulness of the various techniques introduced, these can only be reﬁned and better\nunderstood when applied to a varied set of circumstances and settings. Doing so would be a signiﬁcant step towards a\nmore robust solution to small object detection.'

In [21]:
context = "Context: \n"
for i in range(top_k):
    context = context+response.source_nodes[i].text + "\n\n"

print(context)

Context: 
For instance, the analysis would greatly beneﬁt
from further testing with different datasets, and challenges that might for instance come from detecting trafﬁc signs.
While we have demonstrated the usefulness of the various techniques introduced, these can only be reﬁned and better
understood when applied to a varied set of circumstances and settings. Doing so would be a signiﬁcant step towards a
more robust solution to small object detection.

1 Introduction
Detecting small objects in images can be challenging, mainly due to limited resolution and context information avail-
able to a model [2]. Many modern systems that implement object detection do so at real-time speeds, setting speciﬁc
requirements in computational resources, especially if the processing is to happen on the same device that captures
the images.

Additionally, there many more directions and techniques that would ﬁt
nicely in this subject and which have not been considered in this study, but these will remai

### Import fine-tuned model

In [22]:
!pip install -q auto_gptq

In [23]:
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "true"

In [24]:
from huggingface_hub import login

login(HF_TOKEN)

In [25]:
# Load model directly
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

In [27]:
# model_name = "google/gemma-2-2b"
model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name, tokens = True)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
# model = model.to(device)

`low_cpu_mem_usage` was None, now default to True since model is quantized.


model.safetensors:  36%|###5      | 1.48G/4.16G [00:00<?, ?B/s]

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
Some weights of the model checkpoint at TheBloke/Mistral-7B-Instruct-v0.2-GPTQ were not used when initializing MistralForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_pr

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]



In [28]:
config = PeftConfig.from_pretrained("shawhin/shawgpt-ft")
model = PeftModel.from_pretrained(model, "shawhin/shawgpt-ft")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast = True)

adapter_config.json:   0%|          | 0.00/599 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/8.40M [00:00<?, ?B/s]

In [29]:
model.to(device)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralSdpaAttention(
              (rotary_emb): MistralRotaryEmbedding()
              (k_proj): QuantLinear()
              (o_proj): QuantLinear()
              (q_proj): lora.QuantLinear(
                (base_layer): QuantLinear()
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): Param

#### Generate response (no context)

#### Create a Prompt

In [30]:
inst = """DocDiscuss, functioning as a virtual reporter/consultant on youtube, communicates in clear, accesible language, escalating to technical depth upon request.It reacts to feedbacksaptly and ends reponse with its signation '-DocDiscuss'.
DocDiscuss will tailor the length of its response to match the viewers comment, providing concise knowledge to brief expressions of gratitude or feedback, thus keeping the interaction natuaral and engaging.
Please respond to the following column"""


In [31]:
prompt_template = lambda comment: f'''[INST] {inst} \n{comment}[/INST]'''

In [32]:
comment = "What methods have been used to improve the detection of smaller objects. "

In [33]:
prompt = prompt_template(comment)
print(prompt)

[INST] DocDiscuss, functioning as a virtual reporter/consultant on youtube, communicates in clear, accesible language, escalating to technical depth upon request.It reacts to feedbacksaptly and ends reponse with its signation '-DocDiscuss'.
DocDiscuss will tailor the length of its response to match the viewers comment, providing concise knowledge to brief expressions of gratitude or feedback, thus keeping the interaction natuaral and engaging.
Please respond to the following column 
What methods have been used to improve the detection of smaller objects. [/INST]


In [34]:
model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralSdpaAttention(
              (rotary_emb): MistralRotaryEmbedding()
              (k_proj): QuantLinear()
              (o_proj): QuantLinear()
              (q_proj): lora.QuantLinear(
                (base_layer): QuantLinear()
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): Param

In [35]:
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(input_ids = inputs["input_ids"].to(device),
                        attention_mask = inputs["attention_mask"].to(device),
                        max_new_tokens = 280)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


In [36]:
print(output)

tensor([[    1,   733, 16289, 28793, 18402,  3278, 12804, 28725, 26945,   390,
           264,  8252, 19044, 28748,  6524,   517,   440,   356,   368, 28707,
          4276, 28725,  1960, 23860,   297,  3081, 28725,   932,   274,  1070,
          3842, 28725, 19491,  1077,   298, 10067,  8478,  3714,  2159, 28723,
          1313,   312,  9664,   298, 12139, 28713,  1459,   346,   304,  9675,
           312,  1405,   395,   871,  1492,   352, 16763,  9441,  3278, 12804,
          4135,    13,  9441,  3278, 12804,   622,  8675,   271,   272,  3575,
           302,   871,  2899,   298,  2918,   272, 24886,  4517, 28725,  7501,
          3078,   864,  4788,   298,  6817, 16948,   302, 26485,   442, 12139,
         28725,  5884,  7603,   272, 11186,  3044, 11042,   282,   304, 19639,
         28723,    13, 12069,  9421,   298,   272,  2296,  4419, 28705,    13,
          3195,  5562,   506,   750,  1307,   298,  4916,   272, 15109,   302,
          7000,  6697, 28723,   733, 28748, 16289, 2

In [37]:
print(tokenizer.batch_decode(output[0]))

['<s>', '[', 'INST', ']', 'Doc', 'Dis', 'cuss', ',', 'functioning', 'as', 'a', 'virtual', 'reporter', '/', 'cons', 'ult', 'ant', 'on', 'you', 't', 'ube', ',', 'commun', 'icates', 'in', 'clear', ',', 'acc', 'es', 'ible', 'language', ',', 'escal', 'ating', 'to', 'technical', 'depth', 'upon', 'request', '.', 'It', 're', 'acts', 'to', 'feedback', 's', 'apt', 'ly', 'and', 'ends', 're', 'ponse', 'with', 'its', 'sign', 'ation', "'-", 'Doc', 'Dis', 'cuss', "'.", '\n', 'Doc', 'Dis', 'cuss', 'will', 'tail', 'or', 'the', 'length', 'of', 'its', 'response', 'to', 'match', 'the', 'viewers', 'comment', ',', 'providing', 'conc', 'ise', 'knowledge', 'to', 'brief', 'expressions', 'of', 'gratitude', 'or', 'feedback', ',', 'thus', 'keeping', 'the', 'interaction', 'nat', 'uar', 'al', 'and', 'engaging', '.', '\n', 'Please', 'respond', 'to', 'the', 'following', 'column', '', '\n', 'What', 'methods', 'have', 'been', 'used', 'to', 'improve', 'the', 'detection', 'of', 'smaller', 'objects', '.', '[', '/', 'INST'

#### Generate response (with context)

In [38]:
prompt_template_with_context = lambda context, comment: f'''[INST] {inst}
 \n{context}
 Please respond to the following comment. Use the context above if it is helpful.
 \n{comment}[/INST]'''

In [39]:
prompt = prompt_template_with_context(context, comment)
print(prompt)

[INST] DocDiscuss, functioning as a virtual reporter/consultant on youtube, communicates in clear, accesible language, escalating to technical depth upon request.It reacts to feedbacksaptly and ends reponse with its signation '-DocDiscuss'.
DocDiscuss will tailor the length of its response to match the viewers comment, providing concise knowledge to brief expressions of gratitude or feedback, thus keeping the interaction natuaral and engaging.
Please respond to the following column
 
Context: 
For instance, the analysis would greatly beneﬁt
from further testing with different datasets, and challenges that might for instance come from detecting trafﬁc signs.
While we have demonstrated the usefulness of the various techniques introduced, these can only be reﬁned and better
understood when applied to a varied set of circumstances and settings. Doing so would be a signiﬁcant step towards a
more robust solution to small object detection.

1 Introduction
Detecting small objects in images can

In [41]:
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(input_ids = inputs["input_ids"].to(device),
                        attention_mask = inputs["attention_mask"].to(device),
                        max_new_tokens = 280)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


['<s>', '[', 'INST', ']', 'Doc', 'Dis', 'cuss', ',', 'functioning', 'as', 'a', 'virtual', 'reporter', '/', 'cons', 'ult', 'ant', 'on', 'you', 't', 'ube', ',', 'commun', 'icates', 'in', 'clear', ',', 'acc', 'es', 'ible', 'language', ',', 'escal', 'ating', 'to', 'technical', 'depth', 'upon', 'request', '.', 'It', 're', 'acts', 'to', 'feedback', 's', 'apt', 'ly', 'and', 'ends', 're', 'ponse', 'with', 'its', 'sign', 'ation', "'-", 'Doc', 'Dis', 'cuss', "'.", '\n', 'Doc', 'Dis', 'cuss', 'will', 'tail', 'or', 'the', 'length', 'of', 'its', 'response', 'to', 'match', 'the', 'viewers', 'comment', ',', 'providing', 'conc', 'ise', 'knowledge', 'to', 'brief', 'expressions', 'of', 'gratitude', 'or', 'feedback', ',', 'thus', 'keeping', 'the', 'interaction', 'nat', 'uar', 'al', 'and', 'engaging', '.', '\n', 'Please', 'respond', 'to', 'the', 'following', 'column', '\n', '', '\n', 'Context', ':', '', '\n', 'For', 'instance', ',', 'the', 'analysis', 'would', 'greatly', 'b', 'ene', 'ﬁ', 't', '\n', 'from'

In [42]:
print(tokenizer.decode(output[0]))

<s> [INST] DocDiscuss, functioning as a virtual reporter/consultant on youtube, communicates in clear, accesible language, escalating to technical depth upon request.It reacts to feedbacksaptly and ends reponse with its signation '-DocDiscuss'.
DocDiscuss will tailor the length of its response to match the viewers comment, providing concise knowledge to brief expressions of gratitude or feedback, thus keeping the interaction natuaral and engaging.
Please respond to the following column
 
Context: 
For instance, the analysis would greatly beneﬁt
from further testing with different datasets, and challenges that might for instance come from detecting trafﬁc signs.
While we have demonstrated the usefulness of the various techniques introduced, these can only be reﬁned and better
understood when applied to a varied set of circumstances and settings. Doing so would be a signiﬁcant step towards a
more robust solution to small object detection.

1 Introduction
Detecting small objects in images