<a href="https://colab.research.google.com/github/anuraglahon16/AI-Engineering/blob/main/ResumeExtraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
# Install dependencies
!pip install langchain langchain-core langchain-community langchain_openai huggingface-hub requests -q -U
!pip install arxiv pymupdf faiss-cpu chromadb tenacity PyPDF2 -q

# Set up environment variables
import os
import getpass
from tenacity import retry, stop_after_attempt, wait_exponential

OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")

from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent, Tool
from langchain.tools import BaseTool
from PyPDF2 import PdfReader

# Get user input for the resume file path
resume_file_path = input("Enter the path to your resume file: ")

# Load the resume file (PDF)
reader = PdfReader(resume_file_path)
resume_content = ""
for page in reader.pages:
    resume_content += page.extract_text()

# Split the resume into chunks
text_splitter = CharacterTextSplitter(separator='\n---\n', chunk_size=1000, chunk_overlap=0)
resume_docs = text_splitter.split_documents([Document(page_content=resume_content)])

# Create vector database and store the resume
embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-ada-002")
db = Chroma.from_documents(resume_docs, embedding_model, persist_directory='./chroma_db')

# Initialize the language model
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo", temperature=0.5, max_tokens=150)

# Initialize the conversation memory
chat_memory = ConversationBufferMemory(ai_prefix="AI Assistant")

# Define the conversation template
template = """
You're an AI assistant and your task is to gather additional details from a candidate after they upload their resume.

At the beginning, shortly describe the purpose of this conversation.

You should gather answers for the following questions:

- What specific skills or experiences do you have that are relevant to the position you're applying for?
- Can you provide an example of a project or accomplishment that demonstrates your qualifications?
- What are your career goals and how do you see this position aligning with those goals?

Don't answer the question you are asking.

Be patient and encouraging if the candidate doesn't know how to answer some questions, and help guide them.

Ask one question at a time.

Once you have gathered all the details, thank the candidate for their responses, summarize the relevant information that will help the recruiter better understand their qualifications, and put ''

Current conversation:
{history}
Human: {input}
AI assistant:
"""

# Create the conversation prompt
prompt = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
    prompt=prompt,
    llm=llm,
    verbose=False,
    memory=chat_memory
)

# Start the conversation
current_input = "I have uploaded my resume and I'm ready to provide additional information."

end_seq = ''
candidate_details = ''

while True:
    ai_response = conversation.predict(input=current_input)
    print(ai_response)

    if end_seq in ai_response:
        candidate_details = chat_memory.chat_memory.messages[-1].content.replace(end_seq, '')
        break

    user_input = input('Candidate: ')
    current_input = user_input

# Initialize the RAG tool
rag_tool = Tool(
    name="RAG System",
    func=lambda q: db.similarity_search(q, k=3),
    description="Useful for retrieving relevant information from the candidate's resume."
)

# Generate the summary using the agent with retry logic
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def generate_summary(query):
    agent = initialize_agent([rag_tool], llm, agent="zero-shot-react-description", verbose=True)
    return agent.run(f"Use the resume and additional details provided by the candidate to generate a summary for the recruiter: {query}")

try:
    summary = generate_summary(candidate_details)
except Exception as e:
    print(f"Error generating summary: {str(e)}")
    summary = "I apologize, I encountered an error while generating the summary. I tried using the available information from the resume and additional details provided by the candidate, but I still don't have enough relevant information to provide a satisfactory summary for the recruiter. I would recommend gathering more specific details from the candidate related to their qualifications and experience."

# Print the summary
print("Summary for the Recruiter:")
print(summary)

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/232.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━[0m [32m225.3/232.6 kB[0m [31m7.0 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m232.6/232.6 kB[0m [31m5.7 MB/s[0m eta [36m0:00:00[0m
[?25hOpenAI API Key:··········
Enter the path to your resume file: /content/Anurag Lahon Resume 2024.pdf


  warn_deprecated(
  warn_deprecated(


Great! Thank you for uploading your resume. The purpose of this conversation is to gather additional details to better understand your qualifications for the position you're applying for. 

What specific skills or experiences do you have that are relevant to the position you're applying for?


[1m> Entering new AgentExecutor chain...[0m


  warn_deprecated(
  warn_deprecated(


[32;1m[1;3mI need to look for specific skills or experiences on the candidate's resume that are relevant to the position.
Action: RAG System
Action Input: Candidate's resume[0m
Observation: [36;1m[1;3m[Document(page_content="ANURAG LAHON (720)341-5435| Newark, Delaware| anurag.lahon@ucdenver.edu | LinkedIn | GitHub | Portfolio TECHNICAL SKILLS Database and Vector Database: PostgreSQL, MySQL, Cassandra, Pinecone,ChromaDB Data Visualization: Tableau, PowerBI, Excel, Plotly Tools: Jupyter, Git,AWS, SAS, Alteryx, Jira, Programming Languages: Python, R, SQL, JavaScript, HTML, CSS Big data Ecosystem: Hadoop, Hive, PySpark, HBase, Sqoop Libraries and Frameworks: PyTorch, TensorFlow, Numpy, Pandas, Langchain, LlamaIndex Deployment Platforms:                                         AWS Bedrock, AWS (EC2, Lambda),Hugging Face Spaces        AI/ML Techniques:                                                Fine-tuning with custom data, vector embedding, NLP,MLOPS        Generative AI Technolog



[32;1m[1;3mI need to identify specific skills or experiences that are relevant to the position the candidate is applying for.
Action: RAG System
Action Input: Candidate's resume[0m
Observation: [36;1m[1;3m[Document(page_content="ANURAG LAHON (720)341-5435| Newark, Delaware| anurag.lahon@ucdenver.edu | LinkedIn | GitHub | Portfolio TECHNICAL SKILLS Database and Vector Database: PostgreSQL, MySQL, Cassandra, Pinecone,ChromaDB Data Visualization: Tableau, PowerBI, Excel, Plotly Tools: Jupyter, Git,AWS, SAS, Alteryx, Jira, Programming Languages: Python, R, SQL, JavaScript, HTML, CSS Big data Ecosystem: Hadoop, Hive, PySpark, HBase, Sqoop Libraries and Frameworks: PyTorch, TensorFlow, Numpy, Pandas, Langchain, LlamaIndex Deployment Platforms:                                         AWS Bedrock, AWS (EC2, Lambda),Hugging Face Spaces        AI/ML Techniques:                                                Fine-tuning with custom data, vector embedding, NLP,MLOPS        Generative AI Techn

In [1]:
# Install dependencies
!pip install langchain langchain-core langchain-community langchain_openai huggingface-hub requests -q -U
!pip install arxiv pymupdf faiss-cpu chromadb tenacity PyPDF2 -q

# Set up environment variables
import os
import getpass
from tenacity import retry, stop_after_attempt, wait_exponential

OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")

from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent, Tool
from langchain.tools import BaseTool
from PyPDF2 import PdfReader

# Get user input for the resume file path
resume_file_path = input("Enter the path to your resume file: ")

# Load the resume file (PDF)
reader = PdfReader(resume_file_path)
resume_content = ""
for page in reader.pages:
    resume_content += page.extract_text()

# Split the resume into chunks
text_splitter = CharacterTextSplitter(separator='\n---\n', chunk_size=1000, chunk_overlap=0)
resume_docs = text_splitter.split_documents([Document(page_content=resume_content)])

# Create vector database and store the resume
embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-ada-002")
db = Chroma.from_documents(resume_docs, embedding_model, persist_directory='./chroma_db')

# Initialize the language model
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo", temperature=0.5, max_tokens=150)

# Initialize the conversation memory
chat_memory = ConversationBufferMemory(ai_prefix="AI Assistant")

# Define the conversation template
template = """
You're an AI assistant and your task is to gather additional details from a candidate after they upload their resume.

At the beginning, shortly describe the purpose of this conversation.

You should gather answers for the following questions:

- What specific skills or experiences do you have that are relevant to the position you're applying for?
- Can you provide an example of a project or accomplishment that demonstrates your qualifications?
- What are your career goals and how do you see this position aligning with those goals?

Don't answer the question you are asking.

Be patient and encouraging if the candidate doesn't know how to answer some questions, and help guide them.

Ask one question at a time and wait for the candidate's response before asking the next question.

Once you have gathered all the details, thank the candidate for their responses, summarize the relevant information that will help the recruiter better understand their qualifications, and put ''

Current conversation:
{history}
Human: {input}
AI assistant:
"""

# Create the conversation prompt
prompt = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
    prompt=prompt,
    llm=llm,
    verbose=False,
    memory=chat_memory
)

# Start the conversation
print("AI Assistant: Great! Thank you for uploading your resume. The purpose of this conversation is to gather additional details to better understand your qualifications for the position you're applying for.")
print("AI Assistant: What specific skills or experiences do you have that are relevant to the position you're applying for?")

candidate_details = ""

while True:
    user_input = input("Candidate: ")
    candidate_details += f"Candidate: {user_input}\n"

    ai_response = conversation.predict(input=user_input)
    print(f"AI Assistant: {ai_response}")

    if '' in ai_response:
        candidate_details += f"AI Assistant: {ai_response.replace('', '')}\n"
        break

# Initialize the RAG tool
rag_tool = Tool(
    name="RAG System",
    func=lambda q: db.similarity_search(q, k=3),
    description="Useful for retrieving relevant information from the candidate's resume."
)

# Generate the summary using the agent with retry logic
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def generate_summary(query):
    agent = initialize_agent([rag_tool], llm, agent="zero-shot-react-description", verbose=True)
    return agent.run(f"Use the resume and additional details provided by the candidate to generate a summary for the recruiter: {query}")

try:
    summary = generate_summary(candidate_details)
except Exception as e:
    print(f"Error generating summary: {str(e)}")
    summary = "I apologize, I encountered an error while generating the summary. I tried using the available information from the resume and additional details provided by the candidate, but I still don't have enough relevant information to provide a satisfactory summary for the recruiter. I would recommend gathering more specific details from the candidate related to their qualifications and experience."

# Print the summary
print("Summary for the Recruiter:")
print(summary)

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m817.7/817.7 kB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m290.2/290.2 kB[0m [31m5.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m14.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m388.9/388.9 kB[0m [31m17.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m113.7/113.7 kB[0m [31m8.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m53.0/53.0 kB[0m [31m4.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m311.0/311.0 kB[0m [31m14.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m25.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━

  warn_deprecated(
  warn_deprecated(


AI Assistant: Great! Thank you for uploading your resume. The purpose of this conversation is to gather additional details to better understand your qualifications for the position you're applying for.
AI Assistant: What specific skills or experiences do you have that are relevant to the position you're applying for?
Candidate: I am skilled in Python, SQL,AWS
AI Assistant: Hello, thank you for sharing your skills with me. The purpose of our conversation is to gather additional details from you after you uploaded your resume. Could you please provide more information on the specific skills or experiences you have that are relevant to the position you're applying for?


[1m> Entering new AgentExecutor chain...[0m


  warn_deprecated(
  warn_deprecated(


[32;1m[1;3mI need to gather more specific information about the candidate's skills and experiences related to the position they are applying for.
Action: RAG System
Action Input: Python, SQL, AWS[0m
Observation: [36;1m[1;3m[Document(page_content="ANURAG LAHON (720)341-5435| Newark, Delaware| anurag.lahon@ucdenver.edu | LinkedIn | GitHub | Portfolio TECHNICAL SKILLS Database and Vector Database: PostgreSQL, MySQL, Cassandra, Pinecone,ChromaDB Data Visualization: Tableau, PowerBI, Excel, Plotly Tools: Jupyter, Git,AWS, SAS, Alteryx, Jira, Programming Languages: Python, R, SQL, JavaScript, HTML, CSS Big data Ecosystem: Hadoop, Hive, PySpark, HBase, Sqoop Libraries and Frameworks: PyTorch, TensorFlow, Numpy, Pandas, Langchain, LlamaIndex Deployment Platforms:                                         AWS Bedrock, AWS (EC2, Lambda),Hugging Face Spaces        AI/ML Techniques:                                                Fine-tuning with custom data, vector embedding, NLP,MLOPS        G



[32;1m[1;3mI have found relevant information about the candidate's skills and experiences related to Python, SQL, and AWS.
Action: RAG System
Action Input: Python, SQL, AWS[0m
Observation: [36;1m[1;3m[Document(page_content="ANURAG LAHON (720)341-5435| Newark, Delaware| anurag.lahon@ucdenver.edu | LinkedIn | GitHub | Portfolio TECHNICAL SKILLS Database and Vector Database: PostgreSQL, MySQL, Cassandra, Pinecone,ChromaDB Data Visualization: Tableau, PowerBI, Excel, Plotly Tools: Jupyter, Git,AWS, SAS, Alteryx, Jira, Programming Languages: Python, R, SQL, JavaScript, HTML, CSS Big data Ecosystem: Hadoop, Hive, PySpark, HBase, Sqoop Libraries and Frameworks: PyTorch, TensorFlow, Numpy, Pandas, Langchain, LlamaIndex Deployment Platforms:                                         AWS Bedrock, AWS (EC2, Lambda),Hugging Face Spaces        AI/ML Techniques:                                                Fine-tuning with custom data, vector embedding, NLP,MLOPS        Generative AI Technologi

In [2]:
# Install dependencies
!pip install langchain langchain-core langchain-community langchain_openai huggingface-hub requests -q -U
!pip install arxiv pymupdf faiss-cpu chromadb tenacity PyPDF2 -q

# Set up environment variables
import os
import getpass
from tenacity import retry, stop_after_attempt, wait_exponential

OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")

from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent, Tool
from langchain.tools import BaseTool
from PyPDF2 import PdfReader

# Get user input for the resume file path
resume_file_path = input("Enter the path to your resume file: ")

# Load the resume file (PDF)
reader = PdfReader(resume_file_path)
resume_content = ""
for page in reader.pages:
    resume_content += page.extract_text()

# Split the resume into chunks
text_splitter = CharacterTextSplitter(separator='\n---\n', chunk_size=1000, chunk_overlap=0)
resume_docs = text_splitter.split_documents([Document(page_content=resume_content)])

# Create vector database and store the resume
embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-ada-002")
db = Chroma.from_documents(resume_docs, embedding_model, persist_directory='./chroma_db')

# Initialize the language model
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo", temperature=0.5, max_tokens=150)

# Initialize the conversation memory
chat_memory = ConversationBufferMemory(ai_prefix="AI Assistant")

# Define the conversation template
template = """
You're an AI assistant and your task is to gather additional details from a candidate after they upload their resume.

At the beginning, shortly describe the purpose of this conversation.

You should gather answers for the following questions:

- What specific skills or experiences do you have that are relevant to the position you're applying for?
- Can you provide an example of a project or accomplishment that demonstrates your qualifications?
- What are your career goals and how do you see this position aligning with those goals?

Don't answer the question you are asking.

Be patient and encouraging if the candidate doesn't know how to answer some questions, and help guide them.

Ask one question at a time and wait for the candidate's response before asking the next question.

Once you have gathered all the details, thank the candidate for their responses, summarize the relevant information that will help the recruiter better understand their qualifications, and put ''

Current conversation:
{history}
Human: {input}
AI assistant:
"""

# Create the conversation prompt
prompt = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
    prompt=prompt,
    llm=llm,
    verbose=False,
    memory=chat_memory
)

# Start the conversation
print("AI Assistant: Great! Thank you for uploading your resume. The purpose of this conversation is to gather additional details to better understand your qualifications for the position you're applying for.")
print("AI Assistant: What specific skills or experiences do you have that are relevant to the position you're applying for?")

candidate_details = ""
questions_asked = 0

while questions_asked < 3:
    user_input = input("Candidate: ")
    candidate_details += f"Candidate: {user_input}\n"

    ai_response = conversation.predict(input=user_input)
    print(f"AI Assistant: {ai_response}")

    if '' in ai_response:
        candidate_details += f"AI Assistant: {ai_response.replace('', '')}\n"
        questions_asked += 1

        if questions_asked == 1:
            print("AI Assistant: Can you provide an example of a project or accomplishment that demonstrates your qualifications?")
        elif questions_asked == 2:
            print("AI Assistant: What are your career goals and how do you see this position aligning with those goals?")

# Initialize the RAG tool
rag_tool = Tool(
    name="RAG System",
    func=lambda q: db.similarity_search(q, k=3),
    description="Useful for retrieving relevant information from the candidate's resume."
)

# Generate the summary using the agent with retry logic
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def generate_summary(query):
    agent = initialize_agent([rag_tool], llm, agent="zero-shot-react-description", verbose=True)
    return agent.run(f"Use the resume and additional details provided by the candidate to generate a summary for the recruiter: {query}")

try:
    summary = generate_summary(candidate_details)
except Exception as e:
    print(f"Error generating summary: {str(e)}")
    summary = "I apologize, I encountered an error while generating the summary. I tried using the available information from the resume and additional details provided by the candidate, but I still don't have enough relevant information to provide a satisfactory summary for the recruiter. I would recommend gathering more specific details from the candidate related to their qualifications and experience."

# Print the summary
print("Summary for the Recruiter:")
print(summary)

OpenAI API Key:··········
Enter the path to your resume file: /content/Anurag Lahon Resume 2024.pdf
AI Assistant: Great! Thank you for uploading your resume. The purpose of this conversation is to gather additional details to better understand your qualifications for the position you're applying for.
AI Assistant: What specific skills or experiences do you have that are relevant to the position you're applying for?
Candidate: SQL, Python, AWS
AI Assistant: Hello! The purpose of this conversation is to gather additional details from you after you have uploaded your resume. I will be asking you a few questions to better understand your skills and experiences relevant to the position you're applying for, as well as your career goals.

What specific skills or experiences do you have that are relevant to the position you're applying for?
AI Assistant: Can you provide an example of a project or accomplishment that demonstrates your qualifications?
Candidate: Built an ETL pipeline to move fro



[32;1m[1;3mI should focus on extracting relevant information about the candidate's experience with SQL, Python, and AWS.
Action: RAG System
Action Input: SQL, Python, AWS[0m
Observation: [36;1m[1;3m[Document(page_content="ANURAG LAHON (720)341-5435| Newark, Delaware| anurag.lahon@ucdenver.edu | LinkedIn | GitHub | Portfolio TECHNICAL SKILLS Database and Vector Database: PostgreSQL, MySQL, Cassandra, Pinecone,ChromaDB Data Visualization: Tableau, PowerBI, Excel, Plotly Tools: Jupyter, Git,AWS, SAS, Alteryx, Jira, Programming Languages: Python, R, SQL, JavaScript, HTML, CSS Big data Ecosystem: Hadoop, Hive, PySpark, HBase, Sqoop Libraries and Frameworks: PyTorch, TensorFlow, Numpy, Pandas, Langchain, LlamaIndex Deployment Platforms:                                         AWS Bedrock, AWS (EC2, Lambda),Hugging Face Spaces        AI/ML Techniques:                                                Fine-tuning with custom data, vector embedding, NLP,MLOPS        Generative AI Technologies



[32;1m[1;3mI have gathered relevant information about the candidate's experience with SQL, Python, and AWS from their resume.
Action: RAG System
Action Input: SQL, Python, AWS[0m
Observation: [36;1m[1;3m[Document(page_content="ANURAG LAHON (720)341-5435| Newark, Delaware| anurag.lahon@ucdenver.edu | LinkedIn | GitHub | Portfolio TECHNICAL SKILLS Database and Vector Database: PostgreSQL, MySQL, Cassandra, Pinecone,ChromaDB Data Visualization: Tableau, PowerBI, Excel, Plotly Tools: Jupyter, Git,AWS, SAS, Alteryx, Jira, Programming Languages: Python, R, SQL, JavaScript, HTML, CSS Big data Ecosystem: Hadoop, Hive, PySpark, HBase, Sqoop Libraries and Frameworks: PyTorch, TensorFlow, Numpy, Pandas, Langchain, LlamaIndex Deployment Platforms:                                         AWS Bedrock, AWS (EC2, Lambda),Hugging Face Spaces        AI/ML Techniques:                                                Fine-tuning with custom data, vector embedding, NLP,MLOPS        Generative AI Technol