# Cover Letter Generator
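In this notebook we build a RAG-style cover letter generator using LangChain. It extracts structured keywords from a job description, parses a resume into a structured profile, compares the two, and uses the result to draft the letter.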

## Workflow of Cover Letter Generator
1. Input Collection
* User uploads Resume (PDF/DOCX → parsed into text).
* User pastes Job Description (JD).

2. Keyword Extraction (LangChain pipeline #1)
* Use an LLM + a schema-driven output parser (the code in this notebook uses JsonOutputParser with a Pydantic schema) to extract:
    - role (e.g., "Data Scientist")
    - seniority (e.g., "Mid-level")
    - must_have (e.g., Python, SQL, Machine Learning)
    - nice_to_have (e.g., Cloud, Docker, NLP)
    - tools (e.g., TensorFlow, PyTorch, Tableau)
✅ Output = structured JDKeywords object.

3. Resume Analysis

* Parse the resume into sections: Experience, Skills, Projects, Education.
* Extract skills & experiences → again with a structured schema (e.g., ResumeProfile).

4. Keyword Matching

* Compare JD keywords vs Resume keywords.
* Mark:
    - ✅ Matches (strengths to emphasize).
    - ❌ Gaps (skills missing → handled carefully, not overclaimed).

5. Cover Letter Generation (LangChain pipeline #2)

* Prompt template uses:
    - JDKeywords (so letter aligns with employer’s needs).
    - ResumeProfile (so letter emphasizes relevant experience).
* LLM writes a personalized cover letter, structured into:
    - Greeting
    - Hook (why the candidate is interested in this role/company)
    - Body (match candidate’s experience → JD requirements)
    - Closing (enthusiasm + call to action).
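
Step 4 (keyword matching) can also be approximated without an LLM. The sketch below is a minimal illustration that assumes both sides are already flat lists of skill strings; `normalize` and `match_keywords` are hypothetical helper names, not part of the notebook.

```python
import re
from typing import Dict, List


def normalize(term: str) -> str:
    """Lowercase a term and turn punctuation into spaces so 'CI/CD' equals 'ci cd'."""
    return re.sub(r"[^a-z0-9+#]+", " ", term.lower()).strip()


def match_keywords(jd_terms: List[str], resume_terms: List[str]) -> Dict[str, List[str]]:
    """Split JD terms into matches (present in the resume) and gaps (absent)."""
    resume_set = {normalize(t) for t in resume_terms}
    matches: List[str] = []
    gaps: List[str] = []
    for term in jd_terms:
        # Exact match on the normalized form; no fuzzy or synonym matching.
        (matches if normalize(term) in resume_set else gaps).append(term)
    return {"matches": matches, "gaps": gaps}


report = match_keywords(
    jd_terms=["Python", "PostgreSQL", "Kubernetes", "CI/CD"],
    resume_terms=["python", "postgresql", "Docker", "ci cd"],
)
print(report)
# {'matches': ['Python', 'PostgreSQL', 'CI/CD'], 'gaps': ['Kubernetes']}
```

Exact matching on normalized strings is deliberately strict: it will not equate "Postgres" with "PostgreSQL", so it works best as a sanity check alongside the LLM-based comparison rather than as a replacement for it.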

## Configure the LLM

In [1]:
from dotenv import load_dotenv
import os
import sys
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults

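class Settings:
    def __init__(self) -> None:
        # Load GOOGLE_API_KEY / TAVILY_API_KEY from a local .env file into os.environ.
        load_dotenv()
        # Make modules in the parent directory importable from this notebook.
        sys.path.append(os.path.abspath(".."))
        self.gemini_api = os.environ.get("GOOGLE_API_KEY")
        self.tavily_api_key = os.environ.get("TAVILY_API_KEY")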
        
    def load_gemini(self, temp: float = 0.5) -> ChatGoogleGenerativeAI:
        llm = ChatGoogleGenerativeAI(
            model = "gemini-1.5-flash",
            api_key = self.gemini_api,
            temperature = temp
        )
        print("LLM ready:", type(llm).__name__)
        return llm

    def load_gemma(self, temp: float = 0.5) -> ChatOpenAI:
        """
        This method returns the local gemma3 model hosted by LM Studio.
        """
        llm = ChatOpenAI(
            model="google/gemma-3-4b",
            openai_api_key = 'lm-studio', # type: ignore
            openai_api_base="http://localhost:1234/v1", # type: ignore
            temperature=temp
        )
        return llm
    
    def load_tavily_search(self, max_results: int = 2) -> TavilySearchResults:
        return TavilySearchResults(max_results = max_results)
        

In [2]:
config = Settings()
llm = config.load_gemini()

LLM ready: ChatGoogleGenerativeAI


## Define the schemas
We will use these schemas to structure the outputs of the LLM.

1. schema for extracting keywords

In [3]:
from pydantic import BaseModel, Field
from typing import List, Optional

class JDKeyWords(BaseModel):
    role: Optional[str] = Field(None, description="Role inferred from job description")
    seniority: Optional[str] = Field(None, description="Seniority level like junior, senior, associate etc")
    must_have: List[str] = Field(..., description="Critical must-have skills needed for the job")
    nice_to_have: List[str] = Field(default_factory=list, description="Optional skills")
    tools: List[str] = Field(default_factory=list, description="Software/tools needed for this job")

# ... (Ellipsis) marks the field as required
# Optional[str] means the value can be a string or None
# default_factory=list makes a missing field (e.g. tools) default to []

In [4]:
# cover_letter schema
class CoverLetterOut(BaseModel):
    cover_letter: str = Field(... , description="Generated cover letter text")
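
The `CoverLetterOut` schema holds the final letter text. As a rough sketch of how the inputs to that generation step might be assembled into a single prompt string, the helper below combines a JD-keywords dict, a resume-profile dict, and a list of gaps. The function name `build_cover_letter_prompt`, its parameters, and the sample data are assumptions for illustration; they are not defined elsewhere in this notebook.

```python
from typing import Dict, List


def build_cover_letter_prompt(jd: Dict, profile: Dict, gaps: List[str]) -> str:
    """Assemble a plain-text prompt asking for a four-part cover letter."""
    lines = [
        "Write a cover letter with four parts: Greeting, Hook, Body, Closing.",
        f"Candidate: {profile.get('name', 'the candidate')}",
        f"Target role: {jd.get('role') or 'unspecified'}",
        "Must-have requirements: " + ", ".join(jd.get("must_have", [])),
        "Candidate skills: " + ", ".join(profile.get("skills", [])),
    ]
    if gaps:
        # Gaps are listed explicitly so the model does not overclaim them.
        lines.append("Do NOT claim experience with: " + ", ".join(gaps))
    lines.append('Return JSON with a single key "cover_letter".')
    return "\n".join(lines)


prompt_text = build_cover_letter_prompt(
    jd={"role": "Data Scientist", "must_have": ["Python", "SQL"]},
    profile={"name": "Jane Doe", "skills": ["Python", "Tableau"]},
    gaps=["SQL"],
)
print(prompt_text)
```

The resulting string could be sent as the human message of a chat prompt, with the model's reply parsed against `CoverLetterOut`.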

## Creating Pipelines

* pipeline to extract keywords from job description

In [5]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser(pydantic_object=JDKeyWords)

# Adjacent string literals are concatenated, so each sentence ends with a space.
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are an AI tool that is responsible for extracting hiring signals from a job description. "
     "Return ONLY valid JSON that matches this schema:\n{format_instructions}"),
    ("human",
     "Job description:\n{job_description}\n"
     "Be precise. Keep lists concise and deduplicated.")
]).partial(format_instructions=parser.get_format_instructions())

jd = prompt | llm | parser

In [6]:
job_desc = """
Job Description: Senior Full-Stack Developer
Company: InnovateTech Solutions
Location: San Francisco, CA (Hybrid Remote)
Job Type: Full-Time

About Us
At InnovateTech Solutions, we're building the next generation of SaaS tools that empower businesses to thrive. Our platform leverages cutting-edge AI and data analytics to provide actionable insights. Join our passionate team of engineers and play a key role in shaping our product's future.

The Role
We are seeking a highly skilled and motivated Senior Full-Stack Developer to design, develop, and implement robust software solutions. You will be involved in all stages of the product lifecycle, from concept to deployment, and will mentor junior developers on the team. This is a fantastic opportunity to make a significant impact on a product used by thousands.

Key Responsibilities
Design, code, test, and manage full-stack applications from the database to the UI.

Collaborate with product managers, designers, and other engineers to define, design, and ship new features.

Lead technical architecture discussions and make recommendations on system improvements.

Write clean, maintainable, and efficient code while following best practices.

Conduct code reviews and provide constructive feedback to team members.

Identify and troubleshoot complex performance and scalability issues.

Must-Have Qualifications
5+ years of professional experience in software development.

Frontend: Proven expertise with modern JavaScript frameworks, specifically React and its ecosystem (Redux, Webpack, Hooks).

Backend: Strong proficiency in Python and experience with web frameworks, specifically Django or FastAPI.

Database: Experience with both PostgreSQL and Redis.

Cloud & DevOps: Hands-on experience with AWS (EC2, S3, RDS, Lambda) and familiarity with Docker and CI/CD pipelines.

Solid understanding of RESTful API design principles.

Experience with version control using Git.

Nice-to-Have Qualifications
Experience with TypeScript.

Knowledge of GraphQL.

Familiarity with testing frameworks (e.g., Jest, Pytest, Cypress).

Understanding of agile/scrum development methodologies.

Previous experience in a startup or SaaS environment.

Experience with Kubernetes.

Tools You'll Use
Frontend: React, Redux Toolkit, TypeScript, Vite, Jest

Backend: Python, Django REST Framework, FastAPI, Celery

Database: PostgreSQL, Redis

Infrastructure: AWS, Docker, GitHub Actions, Terraform

Collaboration: Jira, Slack, Figma, Confluence

What We Offer
Competitive salary and equity package.

Comprehensive health, dental, and vision insurance.

401(k) with company matching.

Flexible work schedule and generous PTO.

Professional development budget.

A collaborative, inclusive, and innovative culture.

How to Apply

If you are excited about this opportunity, please apply with your resume and a link to your GitHub profile or portfolio.
"""

In [7]:
def generate_jd_keywords(job_desc: str, llm) -> dict:
    # JsonOutputParser returns a plain dict shaped like JDKeyWords,
    # not a JDKeyWords instance, so the return type is dict.
    parser = JsonOutputParser(pydantic_object=JDKeyWords)
    prompt = ChatPromptTemplate.from_messages([
        ("system",
         "You are an AI tool that is responsible for extracting hiring signals from a job description. "
         "Return ONLY valid JSON that matches this schema:\n{format_instructions}"),
        ("human",
         "Job description:\n{job_description}\n"
         "Be precise. Keep lists concise and deduplicated.")
    ]).partial(format_instructions=parser.get_format_instructions())
    jd = prompt | llm | parser
    result = jd.invoke({"job_description": job_desc})

    return result


In [29]:
import json

#llm2 = config.load_gemma()

result1 = generate_jd_keywords(job_desc , llm)
print(json.dumps(result1 , indent=2))

{
  "role": "Senior Full-Stack Developer",
  "seniority": "Senior",
  "must_have": [
    "5+ years of professional experience in software development",
    "Proven expertise with modern JavaScript frameworks, specifically React and its ecosystem (Redux, Webpack, Hooks)",
    "Strong proficiency in Python and experience with web frameworks, specifically Django or FastAPI",
    "Experience with both PostgreSQL and Redis",
    "Hands-on experience with AWS (EC2, S3, RDS, Lambda) and familiarity with Docker and CI/CD pipelines",
    "Solid understanding of RESTful API design principles",
    "Experience with version control using Git"
  ],
  "nice_to_have": [
    "Experience with TypeScript",
    "Knowledge of GraphQL",
    "Familiarity with testing frameworks (e.g., Jest, Pytest, Cypress)",
    "Understanding of agile/scrum development methodologies",
    "Previous experience in a startup or SaaS environment",
    "Experience with Kubernetes"
  ],
  "tools": [
    "React",
    "Redux Tool

## Pipeline to extract the contents of the resume

In [24]:
class ResumeProfile(BaseModel):
    name: str = Field(..., description="Name of the candidate applying for the job")
    contact: List[str] = Field(default_factory=list, description="Contact information such as email, phone, etc.")
    education: List[str] = Field(default_factory=list, description="Education information of the candidate.")
    experience: List[str] = Field(default_factory=list, description="Work experience of the candidate mentioned in the resume, with a short description")
    skills: List[str] = Field(..., description="Skills of the candidate mentioned in the resume.")
    projects: List[str] = Field(default_factory=list, description="Projects done by the candidate, with a short description.")

In [14]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("resume.pdf")

pages = loader.load()
resume_text = "\n".join([p.page_content for p in pages])
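
Text extracted from PDFs often carries irregular spacing and stacked blank lines. The sketch below shows one possible cleanup pass that could be applied to `resume_text` before it is sent to the LLM; `clean_resume_text` is a hypothetical helper and the sample string is made up for illustration.

```python
import re


def clean_resume_text(text: str) -> str:
    """Collapse runs of spaces/tabs and squeeze 3+ newlines down to one blank line."""
    text = text.replace("\u00a0", " ")       # non-breaking spaces become plain spaces
    text = re.sub(r"[ \t]+", " ", text)      # collapse horizontal whitespace
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()


sample = "Skills:   Python,\tSQL \n\n\n\nExperience:\u00a0 2 years"
print(repr(clean_resume_text(sample)))
# 'Skills: Python, SQL\n\nExperience: 2 years'
```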

In [16]:
class ResumeReport(BaseModel):
    matched_skills: List[str] = Field(... , description="Skills that are needed in job description and present in resume")
    missed_skills: List[str] = Field(default_factory=list , description="Skills needed in job description but not present in resume.")
    phrasing_suggestions: List[str] = Field(
        default_factory=list, description="Concrete bullet suggestions to add"
    )
    relevance_score: int = Field(..., ge=0, le=100)
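
The `relevance_score` field is constrained to the range 0 to 100. For comparison with the LLM's score, a simple deterministic baseline can be computed as the share of must-have terms found among the resume skills. This is only a sketch under the assumption that both inputs are flat lists of short skill names; `baseline_relevance` is a hypothetical name.

```python
from typing import List


def baseline_relevance(must_have: List[str], resume_skills: List[str]) -> int:
    """Percentage of must-have terms found (case-insensitively) among resume skills."""
    if not must_have:
        return 0
    have = {s.strip().lower() for s in resume_skills}
    hits = sum(1 for term in must_have if term.strip().lower() in have)
    score = round(100 * hits / len(must_have))
    return max(0, min(100, score))  # stay inside the schema's 0-100 bounds


print(baseline_relevance(["Python", "React", "AWS", "Redis"], ["python", "react", "Docker"]))
# 50
```

A large difference between this baseline and the LLM's score is a hint that one of the two is matching on phrasing rather than on actual skills.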

In [26]:
def get_resume_profile(resume: str, llm) -> dict:
    # JsonOutputParser returns a plain dict shaped like ResumeProfile.
    parser = JsonOutputParser(pydantic_object=ResumeProfile)
    prompt = ChatPromptTemplate.from_messages([
        ("system",
         "You are an AI tool that analyzes a resume given as a string and parses its information into a structured format. "
         "Include a brief description of each experience (if any) and project (if any) of the candidate. "
         "Return ONLY valid JSON that matches this schema:\n{format_instructions}"),
        ("human",
         "Below is the resume:\n\n{resume_text}")
    ]).partial(format_instructions=parser.get_format_instructions())

    chain = prompt | llm | parser
    result = chain.invoke({"resume_text": resume})

    return result



In [27]:
result = get_resume_profile(resume_text , llm)
print(json.dumps(result , indent=2))

{
  "name": "Amal Varghese",
  "contact": [
    "Kerala, India",
    "LinkedIn",
    "+919207506741",
    "officialamalv2004@gmail.com"
  ],
  "education": [
    "Model Engineering College, Trikkakara Kerala, India Bachelor of Technology (GPA: 9.33) September 2023 \u2013 Present"
  ],
  "experience": [
    "IEDC MEC Kerala, India Tech Team Member September 2020 \u2013 Present \u25cf Collaborated in the development and maintenance of EventSync, a comprehensive event management platform designed to manage multiple events with over 1000 users. \u25cf Contributed to the development of the user interface and helped scale the website's architecture.",
    "CSRBOX & IBM SkillsBuild Remote Web Development Intern June 2024 - August 2024 \u25cf Selected for and completed a competitive 6-week web development internship program. \u25cf Developed BookBridge, a full-featured, serverless book donation platform, as the primary internship project.",
    "Wrench Solutions Kochi, Kerala Machine Learning 

In [32]:
dummy_resume = """
Johnathan Chen
San Francisco, CA | (555) 123-4567 | johnathan.chen@email.com | linkedin.com/in/johnathanchen | github.com/jchen-dev

Summary
Senior Full-Stack Developer with 6 years of experience building scalable web applications in fast-paced startup environments. Proven expertise in modern JavaScript (React) and Python (Django, FastAPI) stacks. Passionate about clean architecture, mentorship, and leveraging AWS to build efficient, cloud-native solutions.

Technical Skills
Languages: JavaScript (ES6+), TypeScript, Python, SQL, HTML5, CSS3

Frontend: React, Redux, Redux Toolkit, Context API, Vite, Webpack, Jest, Cypress

Backend: Django, Django REST Framework, FastAPI, Flask, Celery

Databases: PostgreSQL, Redis, SQLite

Cloud & DevOps: AWS (EC2, S3, RDS, Lambda, IAM), Docker, GitHub Actions, CI/CD, Terraform

Tools & Methods: Git, Jira, Agile/Scrum, Figma, Confluence, Slack

Professional Experience
Senior Software Engineer | TechNova Inc., San Francisco, CA | June 2020 – Present

Led the end-to-end development of a new B2B SaaS analytics dashboard using React, TypeScript, and FastAPI, resulting in a 30% increase in user engagement.

Migrated legacy monolithic Django application to a microservices architecture, improving system scalability and reducing API response time by 40%.

Designed and implemented CI/CD pipelines with GitHub Actions and Docker, automating testing and deployment processes.

Mentored 3 junior developers on best practices for React state management, Python code quality, and AWS security.

Key Technologies: Python, FastAPI, React, TypeScript, PostgreSQL, Redis, AWS (EC2, S3, Lambda), Docker, Jest

Full-Stack Developer | StartUpGrid, Austin, TX | July 2018 – May 2020

Developed and maintained core features for a project management platform using Django REST Framework and React.

Built real-time notification features using WebSockets and Redis for pub/sub.

Optimized database queries on PostgreSQL, reducing page load times by over 25%.

Collaborated in an Agile team, participating in sprint planning, code reviews, and daily standups.

Key Technologies: Python, Django, Django REST Framework, React, JavaScript, PostgreSQL, Redis, Jira

Projects
CI/CD Pipeline Automator | github.com/jchen-dev/cicd-automator

A custom Docker and GitHub Actions pipeline for automating testing, building, and deployment of Python/JS applications to AWS.

Reduced deployment time by 70% and eliminated manual deployment errors.

Education
Bachelor of Science in Computer Science | University of Texas at Austin | *2014 – 2018*
"""
result2 = get_resume_profile(dummy_resume , llm)
print(json.dumps(result2 , indent=2))

{
  "name": "Johnathan Chen",
  "contact": [
    "San Francisco, CA",
    "(555) 123-4567",
    "johnathan.chen@email.com",
    "linkedin.com/in/johnathanchen",
    "github.com/jchen-dev"
  ],
  "education": [
    "Bachelor of Science in Computer Science | University of Texas at Austin | *2014 \u2013 2018*"
  ],
  "experience": [
    "Senior Software Engineer | TechNova Inc., San Francisco, CA | June 2020 \u2013 Present\n\nLed the end-to-end development of a new B2B SaaS analytics dashboard using React, TypeScript, and FastAPI, resulting in a 30% increase in user engagement.\n\nMigrated legacy monolithic Django application to a microservices architecture, improving system scalability and reducing API response time by 40%.\n\nDesigned and implemented CI/CD pipelines with GitHub Actions and Docker, automating testing and deployment processes.\n\nMentored 3 junior developers on best practices for React state management, Python code quality, and AWS security.",
    "Full-Stack Developer | 

In [30]:
def generate_resume_report(job_desc: dict, resume: dict, llm) -> dict:
    # Inputs are the dicts returned by generate_jd_keywords and get_resume_profile;
    # the result is a plain dict shaped like ResumeReport.
    parser = JsonOutputParser(pydantic_object=ResumeReport)
    prompt = ChatPromptTemplate.from_messages([
        ("system",
         "You are an AI tool that takes a structured JSON describing a job description and a resume. "
         "Your task is to analyze both to find matching skills and missing skills and make phrasing suggestions. "
         "You must make the suggestions in concrete bullet points. "
         "You must give a relevance_score for the resume against the job description, with a value between 0 and 100. "
         "Return ONLY valid JSON that matches this schema:\n{format_instructions}"),
        ("human",
         "Keywords of Job Description:\n\n{job_description}\n\n"
         "Structured information of the resume:\n\n{resume}\n\n"
         "Be precise. Keep lists concise and deduplicated.")
    ]).partial(format_instructions=parser.get_format_instructions())

    chain = prompt | llm | parser
    result = chain.invoke({"job_description": job_desc, "resume": resume})

    return result
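
Because the gaps in the report are produced by the LLM, a skill can be listed as missing even though it appears in the resume. A literal cross-check against the resume text is one way to catch this. The sketch below assumes the raw resume text is available as a string; `recheck_missed_skills` and the sample data are hypothetical.

```python
from typing import Dict, List


def recheck_missed_skills(missed: List[str], resume_text: str) -> Dict[str, List[str]]:
    """Separate reported gaps that literally appear in the resume from true gaps."""
    haystack = resume_text.lower()
    found: List[str] = []
    still_missing: List[str] = []
    for skill in missed:
        # Plain substring test: simple, but 'Git' would also match 'GitHub'.
        (found if skill.lower() in haystack else still_missing).append(skill)
    return {"found_in_resume": found, "still_missing": still_missing}


check = recheck_missed_skills(
    missed=["Webpack", "Kubernetes", "Terraform"],
    resume_text="Frontend: React, Vite, Webpack. DevOps: Docker, Terraform.",
)
print(check)
# {'found_in_resume': ['Webpack', 'Terraform'], 'still_missing': ['Kubernetes']}
```

Skills that land in `found_in_resume` could be moved back into `matched_skills` before the report is used to write the cover letter.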

In [33]:
result = generate_resume_report(result1 , result2 , llm)
print(json.dumps(result , indent=2))

{
  "matched_skills": [
    "5+ years of professional experience in software development",
    "React",
    "Redux",
    "TypeScript",
    "Python",
    "Django",
    "FastAPI",
    "PostgreSQL",
    "Redis",
    "AWS (EC2, S3, RDS, Lambda)",
    "Docker",
    "CI/CD pipelines",
    "Git",
    "Agile/Scrum"
  ],
  "missed_skills": [
    "Webpack",
    "Redux Toolkit",
    "RESTful API design principles",
    "GitHub Actions",
    "Testing frameworks (Jest, Pytest, Cypress)",
    "Kubernetes",
    "GraphQL",
    "Terraform"
  ],
  "phrasing_suggestions": [
    "\u2022  Proficient in using Webpack for module bundling and optimization in React projects.",
    "\u2022  Experienced with Redux Toolkit for efficient state management in React applications.",
    "\u2022  Deep understanding of RESTful API design principles and best practices.",
    "\u2022  Successfully implemented and maintained CI/CD pipelines using GitHub Actions.",
    "\u2022  Proficient in various testing frameworks such 