<a href="https://colab.research.google.com/github/KaifAhmad1/Agri-Llama/blob/main/RAG_Improvement.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -qU chromadb
!pip install -qU langchain
!pip install -qU transformers
!pip install -qU sentence_transformers
!pip install -qU huggingface_hub
!pip install -qU tiktoken
!pip install -qU accelerate
!pip install -qU bitsandbytes
!pip install -qU "unstructured[docx]"

In [2]:
!pip show unstructured

Name: unstructured
Version: 0.12.3
Summary: A library that prepares raw documents for downstream ML tasks.
Home-page: https://github.com/Unstructured-IO/unstructured
Author: Unstructured Technologies
Author-email: devops@unstructuredai.io
License: Apache-2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: backoff, beautifulsoup4, chardet, dataclasses-json, emoji, filetype, langdetect, lxml, nltk, numpy, python-iso639, python-magic, rapidfuzz, requests, tabulate, typing-extensions, unstructured-client, wrapt
Required-by: 


In [3]:
import re
import os
from google.colab import drive
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    pipeline
)

In [4]:
hf_auth = os.environ.get('HF_TOKEN')  # read the token from the environment instead of hard-coding a secret
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
from torch import cuda, bfloat16
import transformers
model_id = 'mistralai/Mistral-7B-v0.1'
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

In [6]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
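As a rough sanity check on what 4-bit loading saves, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch, not a measurement: the parameter count (about 7.2 billion) is an assumption, and the estimate ignores quantization constants, layers kept in higher precision, activations, and the KV cache.

```python
# Rough weight-memory estimate for different storage precisions.
# ASSUMPTION: ~7.2e9 parameters. The real footprint is higher because some
# layers and the quantization constants are stored in higher precision.
N_PARAMS = 7.2e9


def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Return the size of the weights in GiB for a given bit width."""
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_gib(N_PARAMS, 16)
nf4_gib = weight_gib(N_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the 16-bit size, which is what leaves headroom for activations on a single Colab GPU.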

In [7]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map='auto',
    quantization_config=bnb_config,
    token=hf_auth,  # `use_auth_token` is deprecated in recent transformers releases
    low_cpu_mem_usage=True
)




In [8]:
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


**Stopping Criteria:**

In [9]:
stop_list = ['\nHuman:', '\n```\n']
stop_token_ids = [tokenizer(x)['input_ids'] for x in stop_list]
stop_token_ids

[[1, 28705, 13, 28769, 6366, 28747], [1, 28705, 13, 13940, 28832, 13]]

In [10]:
import torch
stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]
stop_token_ids

[tensor([    1, 28705,    13, 28769,  6366, 28747], device='cuda:0'),
 tensor([    1, 28705,    13, 13940, 28832,    13], device='cuda:0')]

In [11]:
from transformers import StoppingCriteria, StoppingCriteriaList

# Define a custom stopping criteria class
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Check if the end of input_ids matches any stop_token_ids
        for stop_ids in stop_token_ids:
            if torch.equal(input_ids[0][-len(stop_ids):], stop_ids):
                return True
        return False

# Create a StoppingCriteriaList with the custom stopping criteria
stopping_criteria = StoppingCriteriaList([StopOnTokens()])
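One thing worth checking here: the stop sequences printed in Out[9] start with ids 1 and 28705. Assuming these are the beginning-of-sequence token and the SentencePiece prefix-space token (which normally appear only at the start of a string), a suffix comparison against generated text may never match. The sketch below uses plain Python lists and a hypothetical generated sequence to illustrate the effect. `strip_prefix` is an illustrative helper, not part of `transformers`.

```python
# Stop sequences as printed in Out[9], including the leading ids 1 and 28705.
raw_stop_ids = [
    [1, 28705, 13, 28769, 6366, 28747],  # first stop string from In [9]
    [1, 28705, 13, 13940, 28832, 13],    # second stop string from In [9]
]

BOS_ID = 1         # ASSUMPTION: beginning-of-sequence token id
PREFIX_ID = 28705  # ASSUMPTION: SentencePiece prefix-space token id


def ends_with(sequence, stop):
    """Return True if `sequence` ends with the ids in `stop`."""
    if not stop or len(stop) > len(sequence):
        return False
    return sequence[-len(stop):] == stop


def strip_prefix(stop):
    """Drop leading BOS / prefix-space ids from a stop sequence."""
    i = 0
    while i < len(stop) and stop[i] in (BOS_ID, PREFIX_ID):
        i += 1
    return stop[i:]


# HYPOTHETICAL generated ids: arbitrary content followed by the first stop string.
generated = [1, 100, 200, 300, 13, 28769, 6366, 28747]

matches_raw = any(ends_with(generated, s) for s in raw_stop_ids)
matches_trimmed = any(ends_with(generated, strip_prefix(s)) for s in raw_stop_ids)
print(matches_raw, matches_trimmed)
```

If the assumption holds for this tokenizer, tokenizing the stop strings with `add_special_tokens=False` and trimming the prefix id would be one way to make `StopOnTokens` fire.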

In [12]:
# Set up text generation pipeline
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task='text-generation',
    stopping_criteria=stopping_criteria,
    temperature=0.1,
    max_new_tokens=512,
    repetition_penalty=1.1
)
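To see what `temperature=0.1` does when sampling is active, here is a small standalone sketch of temperature-scaled softmax. The logits are hypothetical. Note that in `transformers` the temperature only takes effect when sampling is enabled (`do_sample=True`), and the pipeline above does not set that flag, so unless the model's generation config enables sampling the setting is likely ignored.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by `temperature` (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # HYPOTHETICAL next-token logits

p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.1)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_sharp])
```

A low temperature concentrates almost all probability on the top token, so sampled output approaches greedy decoding.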

In [13]:
result = generate_text("What are the advantages and disadvantages of selective breeding in fish farming?")
print('''
{}
'''.format(result))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[{'generated_text': 'What are the advantages and disadvantages of selective breeding in fish farming?\n\nAdvantages: Selective breeding is a process that can be used to improve the quality of fish. It can also help to increase the number of fish that are produced. Disadvantages: Selective breeding can lead to the loss of genetic diversity, which can make it difficult for fish to adapt to changing conditions.\n\n## How do you breed fish?\n\nThere are many ways to breed fish. One way is to put two fish together in a tank and hope they will mate. Another way is to use a hormone called gonadotropin-releasing hormone (GnRH) to stimulate the fish’s reproductive system. GnRH is available as an injection or implant.\n\n## Can you breed fish with different species?\n\nYes, you can breed fish with different species. However, there are some things to keep in mind when doing so. First, make sure that the fish are compatible. Second, be aware of the potential risks involved. Finally, be prepared t

In [14]:
from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)
llm.invoke("What are the advantages and disadvantages of selective breeding in fish farming?")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'\n\nAdvantages: Selective breeding is a process that can be used to improve the quality of fish. It can also help to increase the number of fish that are produced. Disadvantages: Selective breeding can lead to the loss of genetic diversity, which can make it difficult for fish to adapt to changing conditions.\n\n## How do you breed fish?\n\nThere are many ways to breed fish. One way is to put two fish together in a tank and hope they will mate. Another way is to use a hormone called gonadotropin-releasing hormone (GnRH) to stimulate the fish’s reproductive system. GnRH is available as an injection or implant.\n\n## Can you breed fish with different species?\n\nYes, you can breed fish with different species. However, there are some things to keep in mind when doing so. First, make sure that the fish are compatible. Second, be aware of the potential risks involved. Finally, be prepared to take care of the offspring.\n\n## Is it possible to breed fish with different colors?\n\nIt is poss

In [15]:
from langchain_community.document_loaders import DirectoryLoader

In [16]:
loader = DirectoryLoader('/content/drive/MyDrive/QnA Pair Documents', glob="**/*.docx")

In [17]:
documents = loader.load()

In [18]:
print(len(documents))

8


In [19]:
documents[0]

Document(page_content='3GPP TS 23.216 V18.0.0 (2023-06) Technical Specification 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Single Radio Voice Call Continuity (SRVCC); Stage 2 (Release 18) The present document has been developed within the 3rd Generation Partnership Project (3GPP TM) and may be further elaborated for the purposes of 3GPP.\nThe present document has not been subject to any approval process by the 3GPP Organizational Partners and shall not be implemented.\nThis Specification is provided for future development work within 3GPP only. The Organizational Partners accept no liability for any use of this Specification.\nSpecifications and Reports for implementation of the 3GPP TM system should be obtained via the 3GPP Organizational Partners\' Publications Offices.\n\n3GPP TS 23.216 V18.0.0 (2023-06)\n14\nRelease 18\n\n3GPP Postal address 3GPP support office address 650 Route des Lucioles - Sophia Antipolis Valbonne - FRANCE Te

In [20]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)
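To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker in plain Python. It is only an illustration of the two parameters: `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraph and line breaks, which this sketch does not do. `chunk_text` and the sample text are hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=20):
    """Split `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# HYPOTHETICAL 2,500-character input.
sample_text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample_text)

print(len(chunks), [len(c) for c in chunks])
# The last 20 characters of one chunk reappear at the start of the next.
print(chunks[0][-20:] == chunks[1][:20])
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks.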

In [23]:
all_splits[50]

Document(page_content='UE initiates the IMS emergency session as specified in TS\xa023.167\xa0[28] and TS\xa023.501\xa0[48]. For facilitating session transfer (SRVCC) of the IMS emergency session to the CS domain, the IMS emergency session needs to be anchored in the serving IMS (i.e., in visited PLMN when roaming) as specified in TS\xa023.237\xa0[14].\n\nThe NG-RAN initiates the SRVCC procedure as specified for regular Voice over IMS session.\n\nIf the UE has an Emergency PDU Session established, the AMF shall send the Emergency Indication and Equipment Identifier to the MSC Server enhanced for SRVCC via the MME_SRVCC. MSC Server then initiates the IMS service continuity procedure with the locally configured E-STN-SR to the serving IMS. When handover of the emergency session has been completed, the MSC Server may initiate location continuity procedures for the UE as defined in TS\xa023.271\xa0[29].', metadata={'source': '/content/drive/MyDrive/QnA Pair Documents/23216-i00.docx'})

In [24]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)
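The `normalize_embeddings=True` setting scales every embedding to unit length. A small sketch with hypothetical 2-D vectors shows why that is convenient: for unit-length vectors the plain dot product equals the cosine similarity.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale `vec` to unit Euclidean length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# HYPOTHETICAL embeddings (real ones have hundreds of dimensions).
a = [3.0, 4.0]
b = [1.0, 2.0]

cos_raw = cosine(a, b)
dot_normalized = dot(l2_normalize(a), l2_normalize(b))
print(cos_raw, dot_normalized)
```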

In [26]:
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents, embeddings)

In [27]:
# Vector Store Backed Retriever:
retriever = vectorstore.as_retriever()
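Conceptually, the retriever embeds the query and returns the `k` stored items closest to it. The sketch below ranks hypothetical unit-length vectors by dot product. The value `k=4` and the dot-product scoring are assumptions for illustration: the actual default and distance metric depend on the vector store configuration, although for unit-length vectors ranking by Euclidean distance and by dot product give the same order.

```python
def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the `k` vectors with the highest dot product to the query."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(reverse=True)  # highest score first
    return [idx for _, idx in scored[:k]]


# HYPOTHETICAL unit-length embeddings.
query_vec = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [0.8, 0.6],
    [-1.0, 0.0],
]

best_two = top_k(query_vec, doc_vecs, k=2)
best_four = top_k(query_vec, doc_vecs)
print(best_two, best_four)
```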

In [28]:
from langchain.chains import ConversationalRetrievalChain
chain = ConversationalRetrievalChain.from_llm(llm, retriever, return_source_documents=True)
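Because no memory object is passed to the chain here, earlier turns have to be supplied by the caller, conventionally as a list of `(question, answer)` tuples. The loop below sketches that bookkeeping with a hypothetical stand-in function, `fake_chain`, so it runs without a model. It is not the LangChain implementation.

```python
def fake_chain(inputs):
    """HYPOTHETICAL stand-in for the chain: reports how many prior turns it received."""
    prior_turns = len(inputs["chat_history"])
    return {"answer": f"answer to {inputs['question']!r} ({prior_turns} prior turns)"}


demo_history = []  # list of (question, answer) tuples, oldest first
for question in ["first question", "follow-up question"]:
    result = fake_chain({"question": question, "chat_history": demo_history})
    # Record the finished turn so the next call can see it.
    demo_history.append((question, result["answer"]))

for turn in demo_history:
    print(turn)
```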

In [29]:
chat_history = []
query = "What considerations should the HSS follow during emergency registrations?"
result = chain.invoke({"question": query, "chat_history": chat_history})
print(result['answer'])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


OutOfMemoryError: CUDA out of memory. Tried to allocate 170.88 GiB. GPU 0 has a total capacty of 14.75 GiB of which 8.48 GiB is free. Process 127431 has 6.27 GiB memory in use. Of the allocated memory 5.99 GiB is allocated by PyTorch, and 153.14 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
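
One plausible cause of the out-of-memory error above: the vector store in In [26] was built from `documents` (8 whole files) rather than `all_splits`, so each retrieved item may be an entire specification and the prompt becomes very long. The sketch below estimates prompt length under loudly labeled assumptions: about 4 characters per token, 4 retrieved items, a hypothetical 300,000-character document, and a 4,096-token budget. None of these numbers are measured from this notebook.

```python
CHARS_PER_TOKEN = 4   # ASSUMPTION: rough average for English text
RETRIEVED_ITEMS = 4   # ASSUMPTION: number of items the retriever returns
TOKEN_BUDGET = 4096   # ASSUMPTION: prompt length the GPU can comfortably handle


def estimated_prompt_tokens(chars_per_item, n_items=RETRIEVED_ITEMS):
    """Estimate the tokens contributed by the retrieved context alone."""
    return n_items * chars_per_item // CHARS_PER_TOKEN


whole_doc_tokens = estimated_prompt_tokens(300_000)  # HYPOTHETICAL full-document size
chunk_tokens = estimated_prompt_tokens(1_000)        # chunk_size used in In [20]

print(f"whole documents: ~{whole_doc_tokens} tokens (budget {TOKEN_BUDGET})")
print(f"1000-char chunks: ~{chunk_tokens} tokens (budget {TOKEN_BUDGET})")
```

If this is indeed the cause, indexing the chunks instead, for example `Chroma.from_documents(all_splits, embeddings)`, should keep the retrieved context close to the chunk size.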