<a href="https://colab.research.google.com/github/SunLite9/Harry-Potter-LLM-Chatbot/blob/main/SunLite9/Harry-Potter-LLM-Chatbot/%20Harry_Potter_LLM_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
! nvidia-smi -L

GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-2e03bc7d-b580-64a0-1083-1b0a325fd4dc)


# Install/Imports


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
%%time

! pip install --no-deps langchain
! pip install --no-deps tiktoken
! pip install --no-deps pypdf
! pip install --no-deps faiss-gpu
! pip install --no-deps InstructorEmbedding
! pip install --no-deps transformers
! pip install --no-deps accelerate
! pip install --no-deps bitsandbytes
! pip install --no-deps langchain-huggingface
! pip install --no-deps langchain-community
! pip install sentence_transformers==2.2.2
! pip install -qq -U langchain tiktoken pypdf chromadb faiss-gpu
! pip install -qq -U transformers InstructorEmbedding
# ! pip install -qq -U transformers InstructorEmbedding sentence_transformers==2.2.2
! pip install -qq -U accelerate bitsandbytes xformers einops
! pip install gradio


CPU times: user 136 ms, sys: 27.4 ms, total: 164 ms
Wall time: 20.7 s


In [4]:
! pip show sentence-transformers

Name: sentence-transformers
Version: 2.2.2
Summary: Multilingual text embeddings
Home-page: https://github.com/UKPLab/sentence-transformers
Author: Nils Reimers
Author-email: info@nils-reimers.de
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: huggingface-hub, nltk, numpy, scikit-learn, scipy, sentencepiece, torch, torchvision, tqdm, transformers
Required-by: langchain-huggingface


In [5]:
import warnings
warnings.filterwarnings("ignore")

import os
import textwrap

import langchain

from langchain_huggingface import HuggingFacePipeline

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import LlamaTokenizer, LlamaForCausalLM, pipeline

import gradio as gr

print(gr.__version__)

print(langchain.__version__)

5.1.0
0.3.3


In [6]:
from langchain_community.vectorstores import Chroma, FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.chains import RetrievalQA, VectorDBQA
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import DirectoryLoader


from InstructorEmbedding import INSTRUCTOR
from langchain_community.embeddings import HuggingFaceInstructEmbeddings

# Defining the Model Configuration and Loading Function

In [7]:
class CFG:
    model_name = 'falcon' # wizardlm, llama, bloom, falcon

In [8]:
def get_model(model_name=CFG.model_name):
    """Load the tokenizer and 8-bit model for `model_name`, plus generation settings."""

    print('\nDownloading model: ', model_name, '\n\n')

    if model_name == 'wizardlm':
        tokenizer = AutoTokenizer.from_pretrained('TheBloke/wizardLM-7B-HF')

        model = AutoModelForCausalLM.from_pretrained('TheBloke/wizardLM-7B-HF',
                                                     load_in_8bit=True,
                                                     device_map='auto',
                                                     torch_dtype=torch.float16,
                                                     low_cpu_mem_usage=True
                                                    )
        max_len = 1024
        task = "text-generation"
        T = 0

    elif model_name == 'llama':
        tokenizer = AutoTokenizer.from_pretrained("aleksickx/llama-7b-hf")

        model = AutoModelForCausalLM.from_pretrained("aleksickx/llama-7b-hf",
                                                     load_in_8bit=True,
                                                     device_map='auto',
                                                     torch_dtype=torch.float16,
                                                     low_cpu_mem_usage=True,
                                                    )
        max_len = 1024
        task = "text-generation"
        T = 0.1

    elif model_name == 'bloom':
        tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")

        model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1",
                                                     load_in_8bit=True,
                                                     device_map='auto',
                                                     torch_dtype=torch.float16,
                                                     low_cpu_mem_usage=True,
                                                    )
        max_len = 1024
        task = "text-generation"
        T = 0

    elif model_name == 'falcon':

        # Falcon-7B fine-tune from h2oai; its repo ships custom model code,
        # hence trust_remote_code=True below
        tokenizer = AutoTokenizer.from_pretrained("h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2")

        model = AutoModelForCausalLM.from_pretrained("h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2",
                                                     load_in_8bit=True,
                                                     device_map='auto',
                                                     torch_dtype=torch.float16,
                                                     low_cpu_mem_usage=True,
                                                     trust_remote_code=True
                                                    )
        max_len = 1024
        task = "text-generation"
        T = 0

    else:
        # Fail early: falling through to the return would raise UnboundLocalError
        raise ValueError(f"Model not implemented (tokenizer and backbone): {model_name!r}")

    return tokenizer, model, max_len, task, T

# Hugging Face

In [9]:
%%time

tokenizer, model, max_len, task, T = get_model(CFG.model_name)


Downloading model:  falcon 




The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

CPU times: user 4.76 s, sys: 1.95 s, total: 6.71 s
Wall time: 9.44 s
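The deprecation warning above asks for a `BitsAndBytesConfig` passed through `quantization_config`. Below is a minimal sketch of the same 8-bit load written that way, assuming a recent `transformers` with `bitsandbytes` installed. The helper name `load_8bit_model` is hypothetical and not part of this notebook. The heavy imports sit inside the function so that defining it does not load `torch`.

```python
def load_8bit_model(model_id, trust_remote_code=False):
    """Hypothetical helper: load `model_id` in 8-bit via quantization_config."""
    # Imported lazily so the function can be defined without loading torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Replaces the deprecated load_in_8bit=True keyword argument.
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
        trust_remote_code=trust_remote_code,
    )
    return tokenizer, model
```

For the Falcon checkpoint used here, the call would presumably be `load_8bit_model("h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2", trust_remote_code=True)`.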


In [10]:
pipe = pipeline(
    task=task,
    model=model,
    tokenizer=tokenizer,
    max_length=max_len,
    temperature=T,
    top_p=0.95,
    repetition_penalty=1.15
)

llm = HuggingFacePipeline(pipeline=pipe)

In [11]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7e43c40ca1d0>)

# LangChain

In [12]:
CFG.model_name

'falcon'

Load Harry Potter PDFs

In [13]:
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader


loader = DirectoryLoader('/content/drive/My Drive/HarryPotter/',
                         glob="*.pdf",
                         loader_cls=PyPDFLoader,
                         show_progress=True,
                         use_multithreading=True)


documents = loader.load()


100%|██████████| 7/7 [00:54<00:00,  7.79s/it]


Document Preprocessing

In [14]:
%%time

for i in range(len(documents)):
    documents[i].page_content = documents[i].page_content.replace('\t', ' ')\
                                                         .replace('\n', ' ')\
                                                         .replace('       ', ' ')\
                                                         .replace('      ', ' ')\
                                                         .replace('     ', ' ')\
                                                         .replace('    ', ' ')\
                                                         .replace('   ', ' ')\
                                                         .replace('  ', ' ')

CPU times: user 54.7 ms, sys: 11 ms, total: 65.7 ms
Wall time: 65.5 ms
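The chain of `.replace()` calls above collapses tabs, newlines, and runs of spaces into single spaces. A regular expression can do the same thing in one pass, as in the sketch below. The function name `normalize_whitespace` is made up for illustration. Note that `\s` also matches other whitespace characters, such as carriage returns, which the original chain leaves untouched.

```python
import re

# One or more consecutive whitespace characters (spaces, tabs, newlines, ...).
_WHITESPACE_RUN = re.compile(r"\s+")

def normalize_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return _WHITESPACE_RUN.sub(" ", text)

print(normalize_whitespace("Harry\tPotter\n   and  the      Stone"))
# prints: Harry Potter and the Stone
```

Applied to the loaded pages, this would read `doc.page_content = normalize_whitespace(doc.page_content)` inside a `for doc in documents:` loop.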


In [15]:
len(documents)

2578

In [16]:
documents[99].page_content

'But as all they knew for sure about the mysterious object was that it was about two inches long, they didn’t have much chance of guessing what it was without further clues. Neither Neville nor Hermione showed the slightest interest in what lay under- neath the dog and the trapdoor. All Neville cared about was never going near the dog again. Hermione was now refusing to speak to Harry and Ron, but she was such a bossy know-it-all that they saw this as an added bonus. All they really wanted now was a way of getting back at Malfoy, and to their great delight, just such a thing arrived in the mail about a week later. As the owls flooded into the Great Hall as usual, everyone’s attention was caught at once by a long, thin package carried by six large screech owls. Harry was just as interested as everyone else to see what was in this large parcel, and was amazed when the owls soared down and dropped it right in front of him, knocking his bacon to the floor. They had hardly fluttered out of 

Split texts into smaller chunks

In [17]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
len(texts)

7659
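The sketch below shows what `chunk_size` and `chunk_overlap` control, using a deliberately simplified fixed-width splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses: that class tries to break on separators such as paragraph breaks, newlines, and spaces before falling back to raw characters, so its chunks are usually shorter than `chunk_size`. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap=0):
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

print(chunk_text("abcdefghij", 4))      # ['abcd', 'efgh', 'ij']
print(chunk_text("abcdefghij", 4, 1))   # ['abcd', 'defg', 'ghij']
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a boundary is cut in two. A small overlap is a common way to keep some of that context in both chunks, at the cost of more chunks to embed.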

Vector Database

In [18]:
%%time

from langchain_community.embeddings import HuggingFaceInstructEmbeddings


persist_directory = 'harry-potter-vectordb-chroma'

instructor_embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-xl",
    # model_name="hkunlp/instructor-base",
    model_kwargs={"device": "cuda"}
    )

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=instructor_embeddings,
                                 persist_directory=persist_directory,
                                 collection_name='hp_books')



vectordb.persist()

load INSTRUCTOR_Transformer
max_seq_length  512
CPU times: user 6min 7s, sys: 3.31 s, total: 6min 10s
Wall time: 5min 53s
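Embedding all chunks took close to six minutes of wall time, so a later session may want to reopen the persisted collection instead of rebuilding it. A minimal sketch follows, assuming the `harry-potter-vectordb-chroma` directory still exists (Colab session storage is temporary, so it would need to be copied somewhere durable such as Drive) and that the same embedding model is passed in. The helper name `load_vectordb` is hypothetical.

```python
def load_vectordb(embedding_function,
                  persist_directory="harry-potter-vectordb-chroma",
                  collection_name="hp_books"):
    """Hypothetical helper: reopen a Chroma collection persisted earlier."""
    # Imported lazily so the function can be defined without loading Chroma.
    from langchain_community.vectorstores import Chroma

    # No documents are passed, so nothing is re-embedded; the embedding
    # function is only used later to embed incoming queries.
    return Chroma(
        collection_name=collection_name,
        embedding_function=embedding_function,
        persist_directory=persist_directory,
    )
```

Usage would be `vectordb = load_vectordb(instructor_embeddings)`, after which `len(vectordb.get()["ids"])` should match the number of chunks embedded originally.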




# Retriever and QA Chain

In [19]:
# retriever = vectordb.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retriever = vectordb.as_retriever(search_kwargs={"k": 3})


qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type="stuff",
                                       retriever=retriever,
                                       return_source_documents=True,
                                       verbose=False)
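With `k=3`, the retriever returns the three chunks whose embeddings are closest to the query embedding. The toy sketch below shows that idea with plain Python lists and cosine similarity. Which distance metric Chroma actually applies depends on how the collection is configured, so treat the metric here as an assumption. The names `cosine_similarity` and `top_k` are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunk vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [idx for _, idx in scored[:k]]

chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], chunk_vecs, k=3))  # [0, 2, 1]
```

A real vector store avoids this full scan by using an approximate nearest-neighbour index, but the ranking it approximates has the same shape.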

In [20]:
qa_chain

RetrievalQA(verbose=False, combine_documents_chain=StuffDocumentsChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7e43c40ca1d0>), output_parser=StrOutputParser(), llm_kwargs={}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_variable_name='context'), return_source_documents=True, retriever=VectorStoreRetriever(tags=['Chroma', 'HuggingFaceInstructEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7e43b173b490>, search_kwarg
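The repr above shows the prompt template used by the `"stuff"` chain type: the retrieved chunks are joined together and substituted for `{context}`, and the user's query is substituted for `{question}`. The sketch below rebuilds that prompt by hand. The template text is copied from the repr. The function name `build_stuff_prompt` and the blank-line separator between chunks are assumptions made for illustration.

```python
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\nQuestion: {question}\nHelpful Answer:"
)

def build_stuff_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks and fill them into the QA template."""
    context = separator.join(chunks)
    return STUFF_TEMPLATE.format(context=context, question=question)

prompt = build_stuff_prompt(["first chunk", "second chunk"], "Who is Harry Potter?")
print(prompt)
```

Because every retrieved chunk is pasted into a single prompt, three chunks of up to 1000 characters each take up a large share of the 1024-token `max_length` set on the pipeline, which in `transformers` counts prompt tokens as well as generated ones.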

In [36]:
def wrap_text_preserve_newlines(text, width=110):
    lines = text.split('\n')

    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text

def process_llm_response(llm_response):
    print(wrap_text_preserve_newlines(llm_response['result']))
    print('\n\nSources:')
    for source in llm_response["source_documents"]:
        print(source.metadata['source'])

# Function to Query the Chatbot

In [44]:


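def llm_ans(query):
    """Run the QA chain on `query`; return the answer text, or None on failure."""
    try:
        # invoke() replaces the deprecated direct call qa_chain(query)
        llm_response = qa_chain.invoke({"query": query})
        print(f"LLM Raw Response: {llm_response}")
        return llm_response['result']
    except Exception as e:
        print(f"Error in LLM processing: {e}")
        return None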


# Questions and Answers

In [52]:
query = "Who is Harry Potter?"
llm_ans(query)

LLM Raw Response: {'query': 'Who is Harry Potter?', 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nmessages to anyone in the wizarding world. Harry looked nothing like the rest of the family . Uncle V ernon was large and neckless, with an enormous black mustache; Aunt Petunia was horse-faced and bony; Dudley was blond, pink, and porky . Harry , on the other hand, was small and skinny , with brilliant green eyes and jet-black hair that was always untidy . He wore round glasses, and on his forehead was a thin, lightning-shaped scar. It was this scar that made Harry so particularly unusual, even for a wizard. This scar was the only hint of Harry’s very mysterious past, of the reason he had been left on the Dursleys’ doorstep eleven years before. At the age of one year old, Harry had somehow survived a curse from the greatest Dark sorcerer of all time, Lord V ol

"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nmessages to anyone in the wizarding world. Harry looked nothing like the rest of the family . Uncle V ernon was large and neckless, with an enormous black mustache; Aunt Petunia was horse-faced and bony; Dudley was blond, pink, and porky . Harry , on the other hand, was small and skinny , with brilliant green eyes and jet-black hair that was always untidy . He wore round glasses, and on his forehead was a thin, lightning-shaped scar. It was this scar that made Harry so particularly unusual, even for a wizard. This scar was the only hint of Harry’s very mysterious past, of the reason he had been left on the Dursleys’ doorstep eleven years before. At the age of one year old, Harry had somehow survived a curse from the greatest Dark sorcerer of all time, Lord V oldemort, whose name most witches and wizards still feared to sp

In [45]:
query = "What is the name of the three-headed dog that guards the Sorcerer’s Stone?"
llm_ans(query)

LLM Raw Response: {'query': 'What is the name of the three-headed dog that guards the Sorcerer’s Stone?', 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nhappened so far. They weren’t in a room, as he had supposed. They were in a corridor. The forbidden corridor on the third floor. And now they knew why it was forbidden. They were looking straight into the eyes of a monstrous dog, a dog that filled the whole space between ceiling and floor. It had three heads. Three pairs of rolling, mad eyes; three noses, twitching and quivering in their direction; three drooling mouths, saliva hanging in slippery ropes from yellowish fangs. It was standing quite still, all six eyes staring at them, and Harry knew that the only reason they weren’t already dead was that their sudden appearance had taken it by surprise, but it was quickly getting over that, there was no mistak

"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nhappened so far. They weren’t in a room, as he had supposed. They were in a corridor. The forbidden corridor on the third floor. And now they knew why it was forbidden. They were looking straight into the eyes of a monstrous dog, a dog that filled the whole space between ceiling and floor. It had three heads. Three pairs of rolling, mad eyes; three noses, twitching and quivering in their direction; three drooling mouths, saliva hanging in slippery ropes from yellowish fangs. It was standing quite still, all six eyes staring at them, and Harry knew that the only reason they weren’t already dead was that their sudden appearance had taken it by surprise, but it was quickly getting over that, there was no mistaking what those thunderous growls meant. Harry groped for the doorknob — between Filch and death, he’d take Filch. Th

In [46]:
query = "Who is the head of Gryffindor house during Harry Potter's time at Hogwarts?"
llm_ans(query)





In [48]:
query = "What is the prophecy about Harry Potter and Voldemort?"
llm_ans(query)

LLM Raw Response: {'query': 'What is the prophecy about Harry Potter and Voldemort?', 'result': 'Use the following pieces of context to answer the question at the end. If you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\n\nDumbledore surveyed him for a moment through his glasses. ‘The odd thing, Harry,’ he said softly, ‘is that it may not have meant you at all. Sybill’s prophecy could have applied to two wizard boys, both born at the end of July that year, both of whom had parents in the Order of the Phoenix, both sets of parents having narrowly escaped Voldemort three times. One, of course, was you. The other was Neville Longbottom.’ ‘But then … but then, why was it my name on the prophecy and not Neville’s?’ ‘The oﬀicial record was re-labelled after Voldemort’s attack on you as a child,’ said Dumbledore. ‘It seemed plain to the keeper of the Hall of Prophecy that Voldemort could only have tried to kill you because he knew you to be the one t

'Use the following pieces of context to answer the question at the end. If you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\n\nDumbledore surveyed him for a moment through his glasses. ‘The odd thing, Harry,’ he said softly, ‘is that it may not have meant you at all. Sybill’s prophecy could have applied to two wizard boys, both born at the end of July that year, both of whom had parents in the Order of the Phoenix, both sets of parents having narrowly escaped Voldemort three times. One, of course, was you. The other was Neville Longbottom.’ ‘But then … but then, why was it my name on the prophecy and not Neville’s?’ ‘The oﬀicial record was re-labelled after Voldemort’s attack on you as a child,’ said Dumbledore. ‘It seemed plain to the keeper of the Hall of Prophecy that Voldemort could only have tried to kill you because he knew you to be the one to whom Sybill was referring.’ ‘Then - it might not be me?’ said Harry ‘I am afraid,’ said Dumble
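In the results above, `result` starts with the prompt itself (instructions, retrieved context, and question), so the model's answer only appears after the `Helpful Answer:` marker. The sketch below is one way to keep only that tail. The function name `extract_answer` is hypothetical and the sample string is made up, not notebook output.

```python
ANSWER_MARKER = "Helpful Answer:"

def extract_answer(full_text, marker=ANSWER_MARKER):
    """Return the text after the first `marker`, or all of it if the marker is absent."""
    _, found, tail = full_text.partition(marker)
    if not found:
        return full_text.strip()
    return tail.strip()

sample = "Use the following pieces of context...\n\nQuestion: q\nHelpful Answer: example answer text"
print(extract_answer(sample))  # example answer text
```

Another option, not verified against this notebook, is to pass `return_full_text=False` when building the `transformers` pipeline so that the prompt is never echoed back in the first place.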

In [49]:
query = "What are the Deathly Hallows?"
llm_ans(query)

LLM Raw Response: {'query': 'What are the Deathly Hallows?', 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nBoth Harry’s and the Death Eater’s wands flew out of their hands and soared back towards the entrance to the Hall of Prophecy; both scrambled to their feet and charged after them, the Death Eater in front, Harry hot on his heels, and Neville bringing up the rear, plainly horrorstruck by what he had done. ’Get out of the way, Harry!’ yelled Neville, clearly determined to repair the damage. Harry flung himself sideways as Neville took aim again and shouted: ’STUPEFY!’ The jet of red light flew right over the Death Eater’s shoulder and hit a glass- fronted cabinet on the wall full of variously shaped hour-glasses; the cabinet fell to the floor and burst apart, glass flying everywhere, sprang back up on to the wall, fully mended, then fell down again, and 

"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nBoth Harry’s and the Death Eater’s wands flew out of their hands and soared back towards the entrance to the Hall of Prophecy; both scrambled to their feet and charged after them, the Death Eater in front, Harry hot on his heels, and Neville bringing up the rear, plainly horrorstruck by what he had done. ’Get out of the way, Harry!’ yelled Neville, clearly determined to repair the damage. Harry flung himself sideways as Neville took aim again and shouted: ’STUPEFY!’ The jet of red light flew right over the Death Eater’s shoulder and hit a glass- fronted cabinet on the wall full of variously shaped hour-glasses; the cabinet fell to the floor and burst apart, glass flying everywhere, sprang back up on to the wall, fully mended, then fell down again, and shattered - The Death Eater had snatched up his wand, which lay on the 

# Webapp


In [51]:
import gradio as gr

def interface(query):
    # Gradio callback: forward the user's question to the QA chain
    return llm_ans(query)


demo = gr.Interface(fn=interface, inputs="text", outputs="text", title="Harry Potter LLM Chatbot", description="Ask questions about Harry Potter")
demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://fa64eccda7eadb8247.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
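`llm_ans` returns `None` when the chain raises, which would leave the Gradio output box empty. One possible guard is a small wrapper that swaps in a readable message, sketched below. The name `with_fallback` and the message text are made up for illustration.

```python
def with_fallback(answer_fn, fallback="Sorry, something went wrong while answering."):
    """Wrap `answer_fn` so that a None result is replaced by `fallback`."""
    def wrapped(query):
        answer = answer_fn(query)
        return fallback if answer is None else answer
    return wrapped

# Stand-in functions, used here only to show the behaviour:
print(with_fallback(lambda q: None)("Who is Harry Potter?"))
print(with_fallback(lambda q: q.upper())("fluffy"))  # FLUFFY
```

In the app, this would be wired in as `gr.Interface(fn=with_fallback(llm_ans), inputs="text", outputs="text", ...)`.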


