# **Connect the Colab instance to Google Drive**

In [1]:
# Mount Google Drive in the Colab notebook instance
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Workaround for the following error in the notebook:
# NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968
import locale
locale.getpreferredencoding = lambda: "UTF-8"

# **Install all required libraries**

1. Hugging Face libraries to run the Mistral 7B LLM (an open-source LLM)

2. LangChain library to call the LLM and generate a response based on the prompt

In [3]:
# Hugging Face libraries to run the LLM
!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U bitsandbytes

# LangChain related libraries
!pip install -q -U langchain

# Open-source pure-python PDF library capable of splitting, merging, cropping,
# and transforming the pages of PDF files
!pip install -q -U pypdf

# Python framework for state-of-the-art sentence, text and image embeddings.
!pip install -q -U sentence-transformers

# FAISS vector database library
!pip install -q -U faiss-gpu

### Import the required libraries and check for GPU access. This notebook uses the T4 GPU provided by Colab.

After running the two cells below, you should see the following output:

Device: cuda

Tesla T4

In [4]:
# Importing all the libraries
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, BitsAndBytesConfig

import torch

from langchain.llms import HuggingFacePipeline

from langchain.chains import ConversationalRetrievalChain

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings

from langchain.vectorstores import FAISS

In [5]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

print("Device:", device)
if device == 'cuda':
    print(torch.cuda.get_device_name(0))

Device: cuda
Tesla T4



# **Load the open-source Mistral 7B LLM from Hugging Face**

In [6]:
# Load the Mistral 7B model and tokenizer from Hugging Face.
# The weights come from a sharded copy of the checkpoint; the tokenizer comes from the original repository.

origin_model_path = "mistralai/Mistral-7B-Instruct-v0.1"
model_path = "filipealmeida/Mistral-7B-Instruct-v0.1-sharded"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(origin_model_path)

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
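The 4-bit setting above matters because of GPU memory. The following back-of-the-envelope sketch estimates the size of the weights alone at 16 bits and at 4 bits per parameter. The parameter count of 7e9 is a round-number assumption, and the helper `weight_memory_gib` is a hypothetical name used only for illustration. The estimate ignores quantization constants, activations, and the KV cache, so actual usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7e9  # assumption: round figure for a "7B" model

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # half-precision weights
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the half-precision footprint, which leaves headroom on a 16 GB T4 for activations and the embedding model.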

# **Create a pipeline to run the LLM in the Colab notebook**


In [7]:
# Creating a pipeline object for the model
text_generation_pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=300,
    temperature = 0.3,
    do_sample=True,
)
mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
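To give a feel for what `temperature = 0.3` does when `do_sample=True`, here is a minimal standalone sketch of temperature-scaled softmax. It is an illustration of the general idea, not the code path inside `transformers`; the function name and the toy logits are made up for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_sharp = softmax_with_temperature(toy_logits, 0.3)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_sharp])
```

A temperature below 1 concentrates probability on the highest-scoring token, so sampling stays close to greedy decoding while still allowing some variation.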

In [8]:
# Let's try running the Mistral 7B model once
text = "What is the future of AI?"
response = mistral_llm.invoke(text)
print(response)


A: The future of AI is difficult to predict, but it is likely that AI will continue to play an increasingly important role in our lives. Some possible developments include more advanced machine learning algorithms, greater integration of AI into everyday devices and systems, and the creation of truly autonomous machines capable of making decisions on their own. However, there are also concerns about the potential risks and ethical implications of these developments, and it remains to be seen how society will respond to them.


# **Load the PDF and split it into text chunks**


In [9]:
# Load the PDF file from Google Drive
loader = PyPDFLoader('/content/drive/MyDrive/Colab Notebooks/Attention-is-all-you-need-Paper.pdf')
documents = loader.load()

# Split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunked_docs  = text_splitter.split_documents(documents)
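To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified sketch that cuts a string into fixed-width windows. This is not how `CharacterTextSplitter` works internally (it first splits on a separator and then merges the pieces up to the size limit); `split_fixed` is a hypothetical helper that only shows the effect of the two parameters.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = split_fixed(sample, chunk_size=4)
with_overlap = split_fixed(sample, chunk_size=4, chunk_overlap=2)

print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a chunk boundary is cut in two; a small overlap is a common way to keep such context together, at the cost of storing some text twice.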

In [10]:
# Hugging Face embedding model used to embed the chunked documents
# before storing them in the vector database

embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2')

In [11]:
# Embed the PDF chunks and store them in a FAISS vector database

faiss_db = FAISS.from_documents(chunked_docs, embeddings)

In [12]:
# Expose the FAISS index as a retriever that returns the 4 most similar chunks for a query
retriever = faiss_db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 4}
)
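Conceptually, a similarity search with `k` results ranks the stored vectors by their distance to the query vector and keeps the closest `k`. The sketch below does this by brute force on toy 2-D vectors, assuming Euclidean distance as the metric. The function `top_k` and the vectors are hypothetical; FAISS performs the same kind of search far more efficiently on real embeddings.

```python
import math


def top_k(query, vectors, k=4):
    """Return the indices of the k vectors closest to query (Euclidean distance)."""
    distances = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(distances)[:k]]


# Toy "embeddings"; real ones have hundreds of dimensions.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.9, 0.1]]

nearest = top_k([1.0, 0.0], toy_vectors, k=2)
print(nearest)
```

The returned indices play the role of the retrieved chunks: the chain passes the corresponding text to the LLM as context for the answer.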

In [13]:
# Create the Conversational Retrieval Chain
qa_chain = ConversationalRetrievalChain.from_llm(
    mistral_llm,
    retriever,
    return_source_documents=True,
)

In [16]:
# Ask a question and answer it using the chunks retrieved from the vector database

def get_user_input():
    return input('Prompt: ').lower()

def main():
    chat_history = []

    query = get_user_input()

    result = qa_chain.invoke({'question': query, 'chat_history': chat_history})
    print(f'Answer: {result["answer"]}\n')

    chat_history.append((query, result['answer']))

if __name__ == "__main__":
    main()

Prompt: What do you mean by transformers and the attention mechanism?
Answer:  Transformers are a type of deep learning architecture that are based on attention mechanisms. Attention mechanisms are a way of computing a weighted combination of inputs, where the weights are determined by a compatibility function. In the context of transformers, attention is used to compute a weighted combination of inputs in order to extract relevant information from the input sequence. The attention mechanism is a fundamental component of transformers and is what sets them apart from other deep learning architectures.
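The cell above builds `chat_history` but handles only one prompt, so the history is never used by a later turn. A minimal sketch of a multi-turn loop follows. To keep it runnable without a GPU, it takes the chain call as a parameter: `chat_loop` and `fake_chain` are hypothetical names, and `fake_chain` is a stand-in that only mimics the input and output keys used above.

```python
from typing import Callable


def chat_loop(ask: Callable[[dict], dict], prompts: list[str]) -> list[tuple[str, str]]:
    """Send each prompt in turn, passing along the history of earlier turns."""
    chat_history: list[tuple[str, str]] = []
    for query in prompts:
        # Pass a copy so the callee sees the history as it was before this turn.
        result = ask({"question": query, "chat_history": list(chat_history)})
        chat_history.append((query, result["answer"]))
    return chat_history


def fake_chain(inputs: dict) -> dict:
    """Stand-in for the real chain: reports which turn it is answering."""
    turn = len(inputs["chat_history"]) + 1
    return {"answer": f"answer {turn} to: {inputs['question']}"}


history = chat_loop(fake_chain, ["first question", "follow-up question"])
for question, answer in history:
    print(f"Prompt: {question}\nAnswer: {answer}\n")
```

In the notebook, passing `qa_chain.invoke` in place of `fake_chain` should let follow-up questions refer back to earlier answers.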

