# Step 1: load the CV (.pdf)

In [1]:
from langchain_community.document_loaders.llmsherpa import LLMSherpaFileLoader

cv_loader = LLMSherpaFileLoader(file_path = "data/AbhishekVidhate_Resume.pdf",
                             new_indent_parser=True,
                             apply_ocr=True,
                             strategy="chunks",
                             )
cv_content = cv_loader.load()

In [2]:
cv_content

[Document(page_content='\n | Abhishek Vidhate | abhishekvidhate.dev@gmail.com\n | +91 7385194243\n | Pune\n | EDUCATION | B.Tech, Computer Science & Engineering VIT Bhopal University 2021 - 2025 Senior Secondary (XII) Sarosh Junior College (MAHARASHTRA STATE BOARD OF SECONDARY AND HIGHER SECONDARY EDUCATION board) Year of completion: 2021 Percentage: 93.42% Secondary (X) Stepping Stones High School, Aurangabad (CBSE board) Year of completion: 2019 Percentage: 93.44%\n | TRAININGS | Machine Learning Engineering For Production (MLOps) Specialization\n', metadata={'source': 'data/AbhishekVidhate_Resume.pdf', 'chunk_number': 0, 'chunk_type': 'table'}),
 Document(page_content='Deeplearning.ai (Coursera), Online Jan 2024 - Feb 2024\nhttps://coursera.org/share/03f3a8576e983222b93fe5c5f4 09b405', metadata={'source': 'data/AbhishekVidhate_Resume.pdf', 'chunk_number': 1, 'chunk_type': 'para'}),
 Document(page_content='Deeplearning.ai (Coursera), Online Jan 2024 - Feb 2024\nIn this course, I prio

In [3]:
from langchain_community.document_loaders import WebBaseLoader

web_loader = WebBaseLoader("https://wellfound.com/jobs/2872063-ai-research-internship")
job_content = web_loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.
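
The warning above is harmless, but it can be silenced by setting `USER_AGENT` before `langchain_community` is imported. A minimal sketch follows. It assumes the variable is read at import time, and the identifier string is a hypothetical placeholder.

```python
import os

# Hypothetical identifier (an assumption): any descriptive string should work.
# setdefault keeps a value that is already present in the environment.
os.environ.setdefault("USER_AGENT", "cv-tailoring-notebook/0.1")

print(os.environ["USER_AGENT"])
```

Run this in a cell placed before the `WebBaseLoader` import, otherwise the warning has most likely already been emitted.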


In [4]:
job_content

[Document(page_content='AI Research Internship at AI Planet • India • Remote (Work from Home) | Wellfound (formerly AngelList Talent)DiscoverFind JobsFor RecruitersLog InSign UpAI Research Internship\xa0(1+ years exp)₹1.8L – ₹2.4L • No equityPublished: 1 week agoSaveApply NowAI PlanetMaking secure, private and safe AI accessible for all to solve meaningful problems!Hyderabad11-50StartupPrivate CompanySaaSMachine LearningMarketplacesSee all jobs at AI Planet\xa0Job LocationRemote\xa0•\xa0IndiaJob TypeInternshipVisa SponsorshipNot AvailableRemote Work PolicyRemote onlyHires remotely EverywherePreferred TimezonesEastern Time, Central European Time, Indochina TimeRelocationAllowedSkillsPythonResearch and DevelopmentDeep Learning with TensorFlowPyTorchTransformers (BERT, GPT)LLMsLarge Language Models (LLMs)The RoleAbout AI Planet:\nAI Planet is an inclusive ecosystem with a mission to educate and build AI for all. With a thriving community of over 350,000 members, we offer a community-drive

In [5]:
print(type(job_content))

<class 'list'>
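
Both loaders return a list of `Document` objects, so formatting them straight into a prompt inserts their Python `repr`, metadata included. One possible way to pass only the text is sketched below. `docs_to_text` is a hypothetical helper, not part of LangChain, and it assumes each item exposes a `page_content` attribute as in the outputs above. The `SimpleNamespace` objects only stand in for real documents so the cell runs on its own.

```python
from types import SimpleNamespace


def docs_to_text(docs):
    """Join the page_content of each document into one plain-text string."""
    parts = [doc.page_content.strip() for doc in docs]
    # Skip chunks that are empty after stripping whitespace.
    return "\n\n".join(part for part in parts if part)


# Stand-in objects (assumption): real loaders return langchain Document instances.
sample_docs = [
    SimpleNamespace(page_content="\n | EDUCATION | B.Tech, Computer Science\n"),
    SimpleNamespace(page_content="   "),
    SimpleNamespace(page_content="Deeplearning.ai (Coursera), Online"),
]

print(docs_to_text(sample_docs))
```

With the real data, `docs_to_text(cv_content)` and `docs_to_text(job_content)` could be used wherever the prompt variables are filled in, which also shortens the prompt.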


In [6]:
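import os
from dotenv import load_dotenv

# load_dotenv() copies the variables from .env into os.environ
load_dotenv()
# Fail early with a clear message instead of a TypeError on a missing key
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")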

In [7]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
llm = ChatGroq(model_name="llama3-8b-8192",temperature=0)

# Define the prompt template
similarities_prompt_template = PromptTemplate(
    input_variables=["cv_content", "job_content"],
    template="""
    I have two documents: one is a user's CV and the other is a job posting. I want to identify the similarities between them in terms of relevant skills, experiences, and keywords that match the job requirements. Here are the documents:

    User's CV:
    {cv_content}

    Job Posting:
    {job_content}

    Analyze both documents and provide a detailed comparison highlighting the similarities. Specifically, identify the key skills and experiences from the user's CV that match the job requirements. Also, suggest any modifications to the CV to better align it with the job posting.
    """
)

prompt = similarities_prompt_template.format(cv_content=cv_content, job_content=job_content)

print(prompt)


    I have two documents: one is a user's CV and the other is a job posting. I want to identify the similarities between them in terms of relevant skills, experiences, and keywords that match the job requirements. Here are the documents:

    User's CV:
    [Document(page_content='\n | Abhishek Vidhate | abhishekvidhate.dev@gmail.com\n | +91 7385194243\n | Pune\n | EDUCATION | B.Tech, Computer Science & Engineering VIT Bhopal University 2021 - 2025 Senior Secondary (XII) Sarosh Junior College (MAHARASHTRA STATE BOARD OF SECONDARY AND HIGHER SECONDARY EDUCATION board) Year of completion: 2021 Percentage: 93.42% Secondary (X) Stepping Stones High School, Aurangabad (CBSE board) Year of completion: 2019 Percentage: 93.44%\n | TRAININGS | Machine Learning Engineering For Production (MLOps) Specialization\n', metadata={'source': 'data/AbhishekVidhate_Resume.pdf', 'chunk_number': 0, 'chunk_type': 'table'}), Document(page_content='Deeplearning.ai (Coursera), Online Jan 2024 - Feb 2024\nhttps

In [8]:
similarity_chain = similarities_prompt_template | llm

In [9]:
# Invoke the chain with the correct input format
similarities = similarity_chain.invoke({"cv_content": cv_content, "job_content": job_content})
print(similarities)

content='After analyzing both documents, I\'ve identified the following similarities:\n\n**Similarities:**\n\n1. **Machine Learning**: Both the user\'s CV and the job posting highlight machine learning as a key area of expertise. The user has mentioned machine learning as a skill, and the job posting requires proficiency in AI, machine learning, and particularly LLMs.\n2. **Python**: The user\'s CV mentions Python as a skill, and the job posting requires strong programming skills in Python.\n3. **Research and Development**: The user\'s CV mentions research and development as a skill, and the job posting requires conducting research and developing state-of-the-art Large Language Models.\n4. **Large Language Models (LLMs)**: The user\'s CV mentions LLMs as a skill, and the job posting requires hands-on experience working with large language models like LLAMA, BERT, or Transformer-based architectures.\n5. **Data Analysis**: The user\'s CV mentions data analysis as a skill, and the job pos
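
The output starts with `content='...'` because the chain returns a message object rather than a plain string, and `print` shows its `repr`. A small sketch of pulling out just the text before it is reused in a later prompt is given below. `message_text` is a hypothetical helper, and it assumes the reply exposes its text as a `content` attribute, as the printed output suggests.

```python
from types import SimpleNamespace


def message_text(message):
    """Return the text of a chat-model reply, or the value itself if it is a string."""
    if isinstance(message, str):
        return message
    # Fall back to str() for objects without a content attribute.
    return getattr(message, "content", str(message))


# Stand-in reply (assumption): the real object comes from similarity_chain.invoke(...).
fake_reply = SimpleNamespace(content="1. **Python**: both documents mention Python.")

print(message_text(fake_reply))
print(message_text("already plain text"))
```

Passing `message_text(similarities)` into the next prompt keeps response metadata and escaped newlines out of it.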

# Step: generate new CV content

In [10]:
new_cv_prompt_template = PromptTemplate(
    input_variables=["cv_content", "job_content", "similarities"],
    template="""
    System: You are an AI assistant that helps users by updating their CVs based on job postings and identified similarities. Your task is to create new CV content that incorporates relevant skills and experiences from the job posting into the user's CV. Here are the documents and the identified similarities:

    User's CV:
    {cv_content}

    Job Posting:
    {job_content}

    Similarities:
    {similarities}

    Please create a new version of the user's CV that highlights the relevant skills and experiences from the job posting. Ensure that the new CV content is tailored to match the job requirements.
    """
)

In [11]:
new_cv_chain = new_cv_prompt_template | llm

# Invoke the chain
new_cv_content = new_cv_chain.invoke({
    "cv_content": cv_content,
    "job_content": job_content,
    "similarities": similarities
})

In [12]:
print(new_cv_content)

content='Based on the similarities identified, I\'ve created a new version of the user\'s CV that highlights the relevant skills and experiences from the job posting. Here is the updated CV content:\n\n**Abhishek Vidhate**\n**Contact Information:**\n* Email: [abhishekvidhate.dev@gmail.com](mailto:abhishekvidhate.dev@gmail.com)\n* Phone: +91 7385194243\n* Location: Pune\n\n**EDUCATION:**\n* B.Tech, Computer Science & Engineering, VIT Bhopal University (2021-2025)\n* Senior Secondary (XII), Sarosh Junior College, Maharashtra State Board of Secondary and Higher Secondary Education (2021)\n* Secondary (X), Stepping Stones High School, Aurangabad, CBSE Board (2019)\n\n**TRAININGS:**\n* Machine Learning Engineering for Production (MLOps) Specialization, Coursera (2024)\n\n**PROJECTS:**\n* Data Analysis AI Assistant, Streamlit App (2024)\n\t+ Built a large language model-powered AI application that automates data cleaning, preprocessing, and complex operations like identifying target objects,

# Step: ATS optimization of new CV content

In [13]:
# Define the ATS optimization prompt template
ats_prompt_template = PromptTemplate(
    input_variables=["new_cv_content"],
    template="""
    System: You are an AI assistant that helps users format their CVs to be ATS-friendly. Your task is to convert the new CV content into a format that is easily readable by Applicant Tracking Systems. Here is the new CV content:

    {new_cv_content}

    Please format this CV content to ensure it meets ATS requirements, including simple layout, keyword optimization, and clear section headers.
    """
)

### StrOutputParser

In [14]:
from langchain_core.output_parsers.string import StrOutputParser


In [15]:
ats_chain = ats_prompt_template | llm | StrOutputParser()

ats_friendly_cv_response = ats_chain.invoke({
    "new_cv_content": new_cv_content
})


In [19]:
# StrOutputParser already returns a plain string, so this text can be passed directly to PDF generation
print(ats_friendly_cv_response) 

Here is the formatted CV content that meets ATS requirements:

**Abhishek Vidhate**

**Contact Information:**

* Email: [abhishekvidhate.dev@gmail.com](mailto:abhishekvidhate.dev@gmail.com)
* Phone: +91 7385194243
* Location: Pune

**Education:**

* B.Tech, Computer Science & Engineering, VIT Bhopal University (2021-2025)
* Senior Secondary (XII), Sarosh Junior College, Maharashtra State Board of Secondary and Higher Secondary Education (2021)
* Secondary (X), Stepping Stones High School, Aurangabad, CBSE Board (2019)

**Trainings:**

* Machine Learning Engineering for Production (MLOps) Specialization, Coursera (2024)

**Projects:**

* **Data Analysis AI Assistant**, Streamlit App (2024)
	+ Built a large language model-powered AI application that automates data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on user data.
	+ Results visualization and evaluation become seamless with this ap
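
The response still contains Markdown markers (`**bold**`, `[label](url)`, `*` and `+` bullets), and these would appear literally in a plain-text PDF. A rough sketch of stripping them with regular expressions is shown below. `strip_markdown` is a hypothetical helper covering only the patterns visible in the output above, not a full Markdown parser.

```python
import re


def strip_markdown(text):
    """Remove the simple Markdown markers seen in the model output."""
    # [label](url) -> label
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # **bold** -> bold
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    # "* item" or "+ item" at the start of a line -> "- item", keeping indentation
    text = re.sub(r"^([ \t]*)[*+][ \t]+", r"\1- ", text, flags=re.MULTILINE)
    # Tabs become four spaces so nested bullets keep a visible indent
    return text.replace("\t", "    ")


sample = "**Contact Information:**\n* Email: [a@b.c](mailto:a@b.c)\n\t+ Built an app"
print(strip_markdown(sample))
```

Applying this to `ats_friendly_cv_response` before rendering should give cleaner text for both the PDF and an ATS parser.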

## StructuredOutput Parser 

# PDF generation of new CV content

In [17]:
import textwrap

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Function to create a PDF from the ATS-friendly CV content
def create_pdf(cv_content, file_path):
    c = canvas.Canvas(file_path, pagesize=letter)
    c.drawString(100, 750, "Resume / CV")
    text = c.beginText(50, 730)
    text.setFont("Helvetica", 12)
    for line in cv_content.split('\n'):
        # Wrap long lines so they do not run past the right margin
        # (85 characters is a rough fit for 12 pt Helvetica on letter paper)
        for piece in textwrap.wrap(line, width=85) or [""]:
            text.textLine(piece)
    c.drawText(text)
    c.save()

# Path to save the PDF
pdf_path = "data/new_cv.pdf"

# Create the PDF
create_pdf(ats_friendly_cv_response, pdf_path)
print(f"PDF saved at {pdf_path}")


PDF saved at data/new_cv.pdf
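
`create_pdf` writes every line into a single text object, so a CV longer than one page would run off the bottom edge. One way to handle this is to split the lines into pages first, as sketched below. `paginate` is a hypothetical helper, and the figure of 45 lines per page is an assumption for 12 pt text on letter paper.

```python
def paginate(lines, lines_per_page=45):
    """Split a list of text lines into consecutive pages."""
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    return [
        lines[start:start + lines_per_page]
        for start in range(0, len(lines), lines_per_page)
    ]


# 100 dummy lines end up on three pages.
pages = paginate([f"line {n}" for n in range(100)], lines_per_page=45)
print([len(page) for page in pages])  # → [45, 45, 10]
```

Each page could then be drawn with its own text object, followed by a call to `c.showPage()` to start the next page before the final `c.save()`.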
