<a href="https://colab.research.google.com/github/bbanzai88/Data-Science-Repository/blob/main/Thomas_Heiman_Resume_Chatbot_Updated.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is an experiment in using a chatbot to query my resume. It uses the following packages: 💬

FAISS 👩‍💻 Facebook AI Similarity Search is a popular library for fast similarity search over dense vectors, such as text embeddings. Here it indexes the embedded chunks of the resume so that the text fragments most relevant to a question can be retrieved quickly.
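
To make the idea of similarity search concrete, here is a minimal pure-Python sketch of an exact nearest-neighbour lookup by Euclidean (L2) distance. It only illustrates the concept and is not the FAISS API; the three-dimensional vectors and the names `l2_distance` and `nearest` are made up for this example (real sentence embeddings have hundreds of dimensions).

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k=2):
    """Return the indices of the k vectors closest to the query (brute force)."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Hypothetical toy "embeddings" of three resume chunks (assumed values).
chunk_vectors = [
    [0.9, 0.1, 0.0],  # e.g. an education chunk
    [0.1, 0.8, 0.1],  # e.g. a skills chunk
    [0.0, 0.2, 0.9],  # e.g. a contact-details chunk
]
query_vector = [0.8, 0.2, 0.1]  # hypothetical embedding of the user's question

top_chunks = nearest(query_vector, chunk_vectors, k=2)
print(top_chunks)
```

FAISS performs the same kind of lookup with optimized index structures, which is what keeps retrieval fast as the number of chunks grows.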

LangChain 🦜🔗 LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). Its use cases overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.

This notebook is based on the Medium article at: https://blog.gopenai.com/transform-your-cv-into-an-interactive-chatbot-with-llm-faiss-and-langchain-64263241d46d

I have updated it to use the CPU only, because the requirements for faiss-gpu changed and made the whole notebook unusable.

In [None]:
# STEP 1: Install dependencies
!pip install --quiet faiss-cpu langchain langchain-community transformers sentence-transformers pypdf

# STEP 2: Import libraries
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline  # community package replaces the deprecated langchain.llms path
from transformers import pipeline
from google.colab import files
import os

# STEP 3: Upload your resume (PDF)
print("📄 Upload your resume PDF")
uploaded = files.upload()
resume_path = list(uploaded.keys())[0]

# STEP 4: Load and split the resume into chunks
loader = PyPDFLoader(resume_path)
pages = loader.load()

splitter = CharacterTextSplitter(separator="\n", chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(pages)

# STEP 5: Create embeddings and FAISS index
print("🔍 Creating vector index from resume...")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(docs, embeddings)

# STEP 6: Load LLM (lightweight and fast)
print("⚙️ Loading LLM...")
generator = pipeline(
    "text-generation",
    model="distilgpt2",
    do_sample=True,
    temperature=0.7,
    max_new_tokens=150,
    pad_token_id=50256  # GPT-2's end-of-text token id, reused as the padding token
)
llm = HuggingFacePipeline(pipeline=generator)

# STEP 7: Set up Retrieval QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever()
)

# STEP 8: Start interactive Q&A
def chat():
    print("\n💬 Ask me anything about your resume! (type 'exit' to quit)")
    while True:
        try:
            query = input("You: ")
        except EOFError:
            break
        if query.lower() in ['exit', 'quit']:
            print("👋 Goodbye!")
            break
        # invoke() returns a dict with "query" and "result" keys; the whole dict is printed
        result = qa_chain.invoke(query)
        print("🤖 Bot:", result)
chat()
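
In the recorded session, the bot prints the entire result dict, and the `result` string begins with the prompt and retrieved context rather than just an answer. A minimal sketch of one way to tidy this up follows. It assumes the chain's prompt ends with the marker `Helpful Answer:` (my understanding of the default "stuff" prompt, not something shown in the output), and `extract_answer` is a hypothetical helper, not part of LangChain.

```python
def extract_answer(result, marker="Helpful Answer:"):
    """Return only the text after the last marker, or the full text if the marker is absent.

    `result` is assumed to be the dict returned by qa_chain.invoke(), with a "result" key.
    """
    text = result.get("result", "")
    _, found, tail = text.rpartition(marker)
    return tail.strip() if found else text.strip()


# Stand-in for a chain result (made-up text, not real model output).
sample = {
    "query": "Which languages are listed?",
    "result": "Use the following pieces of context...\n\nQuestion: Which languages are listed?\n"
              "Helpful Answer: R, Python, SQL, and MATLAB.",
}
print(extract_answer(sample))
```

Inside `chat()`, this could replace the final print with `print("🤖 Bot:", extract_answer(result))`. Note that a small model such as distilgpt2 may still produce weak answers; this helper only trims the echoed prompt.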


📄 Upload your resume PDF


Saving TomHeiman_Resume_New.pdf to TomHeiman_Resume_New (2).pdf
🔍 Creating vector index from resume...
⚙️ Loading LLM...


Device set to use cpu



💬 Ask me anything about your resume! (type 'exit' to quit)
You: Is Thomas Heiman a good scientist?
🤖 Bot: {'query': 'Is Thomas Heiman a good scientist?', 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\ninitiatives across federal agencies including the FDA, DHS, USCIS, and DoD. \nHolds a Ph.D. in Computational Biology and Bioinformatics and a proven track \nrecord in machine learning, data engineering, scientific programming, and \nregulatory data science. Skilled in R, Python, SQL, and MATLAB, with deep \nexpertise in entity resolution, topic modeling, and time -series forecasting. \nPublished contributor to Science and recognized AI reviewer and educator.  \nPROFESSIONAL EXPERIENCE\n\n703-303-3517 theiman@verizon.net Reston, Virginia \n \n \n \n \n \nEDUCATION \nGeorge Mason University \nPh.D. in Computational Biology \nand Bioinformatics \nM.S. in Computat