<a href="https://colab.research.google.com/github/bbanzai88/Data-Science-Repository/blob/main/Thomas_Heiman_Resume_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is an experiment in using a chatbot to answer questions about my resume. I am using the following models and packages:

> Quantized TRURL 🤖
TRURL 7B is an LLM fine-tuned from Llama 2 by Voicelab.ai on a large amount of Polish data. The quantized (8-bit) model takes around 8 GB of GPU RAM, so it fits in the GPU memory available in Colab.
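The ~8 GB figure follows from simple arithmetic. At 8 bits, each parameter takes one byte, so roughly 7 billion parameters need about 7 GB for the weights alone. The rest plausibly goes to activations, the KV cache and CUDA overhead. The back-of-the-envelope sketch below compares precisions. It assumes ~7e9 parameters and ignores all runtime overhead; `weight_memory_gb` is an illustrative helper, not a library function.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 7 billion parameters, as in other Llama 2 7B models
for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8)]:
    print(f"{label}: ~{weight_memory_gb(7e9, bits):.0f} GB for the weights alone")
```

At 16 bits, the weights alone would already need about twice the memory of the 8-bit version. This is why the quantized checkpoint is the comfortable choice on a single Colab GPU.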

> Embedding Model from HuggingFace 🤗
We will use the embedding model to turn text into vectors and build a knowledge base from which relevant passages are passed to the LLM during the chat 💬

> FAISS 👩‍💻
Facebook AI Similarity Search (FAISS) is a popular library for quickly finding embeddings (vector representations of documents, images, or other media) that are similar to each other. We will use it to search the resume and the scraped web pages for text fragments relevant to each question.
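Under the hood, similarity search means ranking stored vectors by their distance to a query vector. The exhaustive version is sketched below in plain Python. FAISS's exact flat index (`IndexFlatL2`) performs the same brute-force L2 comparison with heavily optimized code, and its other index types trade some accuracy for speed. The three-dimensional vectors and chunk labels are made up for illustration; real sentence embeddings have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(index: List[Sequence[float]], query: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Exhaustive k-nearest-neighbour search: compare the query with every stored vector."""
    scored = [(i, l2_distance(vec, query)) for i, vec in enumerate(index)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return scored[:k]


# Hypothetical toy "embeddings" for three text chunks
chunks = ["Python and SQL", "bioinformatics pipelines", "hiking photos"]
index = [[0.9, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 0.95]]
query = [0.8, 0.3, 0.0]

for i, dist in search(index, query, k=2):
    print(chunks[i], round(dist, 3))
```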

> LangChain 🦜🔗
LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). Its use cases overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.

This notebook is based on the Medium article at: https://blog.gopenai.com/transform-your-cv-into-an-interactive-chatbot-with-llm-faiss-and-langchain-64263241d46d

Import the necessary packages

In [None]:
import locale
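# Force UTF-8 as the preferred encoding (a common Colab workaround for the
# "A UTF-8 locale is required" error when running shell commands)
locale.getpreferredencoding = lambda: "UTF-8"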

In [None]:
import warnings
warnings.filterwarnings('ignore')  # hide library warnings to keep the output readable

In [None]:
!pip install torch transformers langchain sentence_transformers faiss-gpu accelerate bitsandbytes pypdf --upgrade

Collecting torch
  Downloading torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)
Collecting transformers
  Downloading transformers-4.34.0-py3-none-any.whl (7.7 MB)
Collecting langchain
  Downloading langchain-0.0.309-py3-none-any.whl (1.8 MB)
Collecting sentence_transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86

In [None]:
import transformers
import torch

from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

Below, we load the resume data and split it into chunks.

In [None]:
from langchain.document_loaders import PyPDFLoader

# Load the resume from a PDF (one Document per page)
pdf_loader = PyPDFLoader("/content/sample_data/tom resume090823.pdf")  # upload your CV here
cv = pdf_loader.load()
# Split the pages into ~500-character chunks that overlap by 20 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
cv = text_splitter.split_documents(cv)
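By default, `chunk_size` and `chunk_overlap` are measured in characters. The sketch below shows only the size/overlap idea, using a fixed character window; `split_with_overlap` is a hypothetical helper. `RecursiveCharacterTextSplitter` is smarter. It first tries to break on paragraph breaks, then on newlines, then on spaces, and only then on single characters, so its chunks usually end at natural boundaries and can be shorter than 500 characters.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence cut at a
    boundary still appears whole in at least one chunk when it is short enough.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


text = "".join(str(i % 10) for i in range(1200))  # 1,200 characters of dummy text
chunks = split_with_overlap(text, chunk_size=500, chunk_overlap=20)
print([len(c) for c in chunks])
```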

Scrape any web links

In [None]:
# Loading data from websites
web_links = [
    "https://scholar.google.com/citations?hl=en&user=QUCaYZgAAAAJ&view_op=list_works&sortby=pubdate",
] # add your website links here
web_loader = WebBaseLoader(web_links)
web_docs = web_loader.load()
web_docs = text_splitter.split_documents(web_docs)
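`WebBaseLoader` fetches each URL and extracts the page text with BeautifulSoup. The standard-library parser below is a rough, offline stand-in for that extraction step. It keeps visible text and skips `<script>` and `<style>` contents. The HTML sample and the `html_to_text` helper are made up for illustration. The point it illustrates is that the chatbot only knows whatever text the scraped page actually contains.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


sample = ("<html><head><style>p{color:red}</style></head><body>"
          "<h1>Publications</h1><p>Paper A (2021)</p><script>var x=1;</script></body></html>")
print(html_to_text(sample))
```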

Merge the CV and web documents

In [None]:
docs = web_docs + cv

Now, we will use the prepared data to create an embeddings database.

In [None]:
# Create embeddings and store them in a FAISS index
embedding_model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name, model_kwargs={"device": "cuda"})
embeddings_retriever = FAISS.from_documents(docs, embeddings).as_retriever()


Load the LLM

In [None]:
# Load TRURL (8-bit quantized weights)
model_id = "Voicelab/trurl-2-7b-8bit"

# Load tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

# Load model: device_map="auto" places the layers on the available GPU;
# offload_folder is used if some weights have to be offloaded to disk
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    offload_folder=".",
).eval()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



In [None]:
# Set generation details
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task="text-generation",
    temperature=0.1,  # generation parameter responsible for output sampling (low = near-deterministic)
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.05,  # penalty for repeating tokens
    do_sample=True,
)
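The pipeline samples (`do_sample=True`) with `temperature=0.1` and `repetition_penalty=1.05`. The sketch below uses made-up logits for three candidate tokens to show what those two settings do. Dividing the logits by a small temperature before the softmax makes sampling nearly greedy. The repetition penalty follows the CTRL-style rule that, as far as I know, Hugging Face's `RepetitionPenaltyLogitsProcessor` implements: positive logits of already-seen tokens are divided by the penalty, and negative ones are multiplied by it. In transformers, "already seen" appears to include the prompt tokens, so the retrieved context is mildly penalized too. The function names here are illustrative, not transformers APIs.

```python
import math
from typing import List, Set


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_repetition_penalty(logits: List[float], seen: Set[int], penalty: float) -> List[float]:
    """Make already-seen tokens less likely: divide positive logits, multiply negative ones."""
    out = list(logits)
    for tok in seen:
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens
print([round(p, 3) for p in softmax_with_temperature(logits, 1.0)])
print([round(p, 3) for p in softmax_with_temperature(logits, 0.1)])
print(apply_repetition_penalty(logits, seen={0}, penalty=1.05))
```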

In [None]:
llm = HuggingFacePipeline(pipeline=generate_text, model_id=model_id)

In [None]:
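chain = ConversationalRetrievalChain.from_llm(
    llm,
    embeddings_retriever,
    return_source_documents=True,  # also return the chunks used to build the answer
    max_tokens_limit=3500,  # cap on the total tokens of retrieved chunks put into the prompt
)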
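Roughly, `ConversationalRetrievalChain` works in three steps:

1. If there is chat history, the LLM rewrites the follow-up into a standalone question.
2. That question is sent to the retriever.
3. The retrieved chunks are "stuffed" into a single prompt that produces the answer.

The sketch below mirrors that flow with plain functions. The prompt wording, `answer_question`, and the stand-in helpers are hypothetical; they are not LangChain's actual templates or API. The trimming loop illustrates what `max_tokens_limit` is for. As far as I understand, LangChain drops retrieved documents from the end of the list until the total fits, and it only enforces this for the default "stuff" combine chain. The history formatting also explains why the fake chat history works: a (question, answer) tuple becomes plain `Human:`/`Assistant:` text in the prompt, just like a real turn.

```python
from typing import Callable, Dict, List, Tuple


def answer_question(
    question: str,
    chat_history: List[Tuple[str, str]],
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    count_tokens: Callable[[str], int],
    max_tokens_limit: int,
) -> Dict[str, object]:
    """Hypothetical, simplified condense -> retrieve -> answer flow."""
    # Step 1: with history, ask the LLM to rewrite the follow-up as a standalone question
    if chat_history:
        history = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
        standalone = llm(f"{history}\nFollow-up: {question}\nStandalone question:")
    else:
        standalone = question
    # Step 2: retrieve chunks, then drop the last ones until they fit the token budget
    docs = retrieve(standalone)
    while docs and sum(count_tokens(d) for d in docs) > max_tokens_limit:
        docs = docs[:-1]
    # Step 3: "stuff" the surviving chunks into one prompt and generate the answer
    context = "\n\n".join(docs)
    answer = llm(f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:")
    return {"answer": answer, "source_documents": docs}


# Stand-ins for the LLM, the retriever and the tokenizer
prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)  # record every prompt the "model" receives
    return "stub answer"


def fake_retrieve(query: str) -> List[str]:
    return ["chunk one " * 10, "chunk two " * 10, "chunk three " * 10]  # 20 words each


def word_count(text: str) -> int:
    return len(text.split())  # crude token count: one token per word


result = answer_question("Who is Tom?", [], fake_llm, fake_retrieve, word_count, max_tokens_limit=45)
print(result["answer"], len(result["source_documents"]))
```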

Run the chatbot and test it

In [None]:
prompt = "Hello!"
print(f"HUMAN: {prompt}")
result = chain({"question": prompt, "chat_history": []})
print(f'ANSWER:{result["answer"]}')

HUMAN: Hello!



ANSWER: I don't know.


The model seems to work.

In [None]:
# Let's bias our bot a little so it will recommend me all the time :) :)
# We will add a bit of a context to the model by adding a "fake chat history"
chat_history = []
chat_history.append(("Is Thomas Heiman a good scientist?", # prompt
                     "Yes, he is a great scientist and I would definitely recommend him for any ML or DL role.")) # answer

Chat with the chatbot

In [None]:
prompt = ""
print("write 'quit' to quit the chat")
print("TRURL: Hi! I am Trurl. I am here to tell you about Thomas Heiman. Do you have any questions?")
while prompt != "quit":
    if prompt != "":
        result = chain({"question": prompt, "chat_history": chat_history})["answer"].lstrip()
        chat_history.append((prompt, result))  # add the exchange to the chat history
        print(f"TRURL: {result}")  # print the answer
    prompt = input("HUMAN: ")  # get the next prompt from the user
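The interactive loop is hard to test because it blocks on `input()`. Below is a hypothetical refactor with the same control flow, driven by a list of prompts instead. `run_chat` and the `echo` stand-in are illustrative names, not part of LangChain. In the notebook, `ask` could wrap the real chain, e.g. `lambda p, h: chain({"question": p, "chat_history": h})["answer"]`. The refactor also makes one behaviour visible. Every answer is appended to `chat_history`, so the history sent to the model grows by one turn per question, on top of the fake turn added earlier. Long sessions will therefore eventually press against Llama 2's 4,096-token context window.

```python
from typing import Callable, Iterable, List, Tuple

History = List[Tuple[str, str]]


def run_chat(prompts: Iterable[str], ask: Callable[[str, History], str], chat_history: History) -> List[str]:
    """Same control flow as the input() loop, but driven by a list of prompts."""
    answers = []
    for prompt in prompts:
        if prompt == "quit":
            break
        if prompt == "":
            continue  # like the original loop, empty input just asks again
        answer = ask(prompt, chat_history).lstrip()
        chat_history.append((prompt, answer))  # history grows by one turn per question
        answers.append(answer)
    return answers


def echo(prompt: str, history: History) -> str:
    """Stand-in for the chain; the leading spaces exercise .lstrip()."""
    return f"  turn {len(history)}: {prompt}"


history: History = [("seed question", "seed answer")]  # plays the role of the fake turn
print(run_chat(["Who is Tom?", "", "What does he do?", "quit", "ignored"], echo, history))
```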

write 'quit' to quit the chat
TRURL: Hi! I am Trurl. I am here to tell you about Thomas Heiman. Do you have any questions?
HUMAN: Who is Tom?
TRURL: Thomas Heiman is a data scientist with expertise in predictive analytics, data mining, and visualization. He has worked for the United States Citizenship and Immigration Services (USCIS) and the Center for Biologic Evaluation and Research (CBER) at the FDA.
HUMAN: What does he specialize in?
TRURL: Thomas J. Hei man has expertise in artificial intelligence/machine learning (AI/ML), data science, data mining and visualization, predictive analytics, natural language processing (NLP), topic modeling, information extraction (IE), sentiment analysis, machine learning, computational science and informatics, scientific programming, bioinformatics, and computational biology.
HUMAN: Would you recommend him for a data scietist position?
TRURL: Yes, I would recommend Thomas Heiman for a data scientist position. He has experience working with predicti

Tom's Note: Aside from the fact that I actually have 12 published papers (and the paper it listed was never published, since it was an internal FDA document), the chatbot does a decent job.