In [2]:
job_desc = """
Job Title: Software Engineer

Job Description:  
We are seeking a passionate and talented Software Engineer to join our dynamic team. In this role, you will design, develop, and maintain scalable software solutions to solve complex problems. You will collaborate with cross-functional teams to deliver high-quality applications and services.

Responsibilities:  
- Develop, test, and deploy software applications using modern programming languages and frameworks.  
- Work closely with product managers and designers to gather and refine requirements.  
- Write clean, efficient, and maintainable code, adhering to best practices.  
- Debug, troubleshoot, and optimize applications for performance and scalability.  
- Collaborate with team members in an Agile environment.

Qualifications:  
- Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent experience.  
- Proficiency in one or more programming languages (e.g., Python, Java, C++).  
- Experience with software development tools, frameworks, and methodologies.  
- Strong problem-solving skills and attention to detail.  
- Excellent communication and teamwork abilities.

Join us and be part of a team that values innovation, growth, and collaboration!
"""

resumes = [
    "John Doe\nSoftware Engineer\njohn.doe@example.com | (123) 456-7890 | linkedin.com/in/johndoe\n\nSummary:\nExperienced software engineer with expertise in developing scalable web applications, strong knowledge of Python and JavaScript, and a passion for solving complex problems.\n\nSkills:\n- Programming Languages: Python, JavaScript, Java\n- Frameworks: Django, React, Spring Boot\n- Tools: Git, Docker, Kubernetes\n- Databases: PostgreSQL, MongoDB\n\nExperience:\nSoftware Engineer | ABC Tech | June 2020 - Present\n- Built and maintained scalable APIs to support high-traffic e-commerce platforms.\n- Led migration of a monolithic application to a microservices architecture, reducing downtime by 30%.\n\nEducation:\nB.S. in Computer Science | University of XYZ | May 2020",
    
    "Jane Smith\nData Scientist\njane.smith@example.com | (987) 654-3210 | github.com/janesmith\n\nSummary:\nData scientist with a strong background in machine learning, statistical modeling, and data visualization. Skilled in Python, R, and SQL with experience in predictive analytics.\n\nSkills:\n- Machine Learning: Scikit-learn, TensorFlow, PyTorch\n- Data Visualization: Tableau, Matplotlib, Seaborn\n- Databases: MySQL, PostgreSQL\n- Tools: Jupyter, Excel, Git\n\nExperience:\nData Scientist | DataCorp | March 2018 - Present\n- Developed machine learning models to predict customer churn, improving retention by 20%.\n- Automated ETL pipelines to streamline data processing, saving 15 hours of manual work weekly.\n\nEducation:\nM.S. in Data Science | University of ABC | December 2017",
    
    "Michael Brown\nFull-Stack Developer\nmichael.brown@example.com | (555) 123-4567 | michaelbrown.dev\n\nSummary:\nFull-stack developer with 5+ years of experience building responsive web applications and services. Proficient in JavaScript, TypeScript, and modern frameworks like React and Node.js.\n\nSkills:\n- Frontend: HTML, CSS, JavaScript, React\n- Backend: Node.js, Express, Python\n- Databases: MongoDB, PostgreSQL\n- Tools: Docker, AWS, Webpack\n\nExperience:\nFull-Stack Developer | XYZ Solutions | August 2019 - Present\n- Designed and implemented a customer management system used by over 10,000 users.\n- Improved application load times by 40% through optimized code and caching strategies.\n\nEducation:\nB.S. in Software Engineering | State University | May 2017",
    
    "Emily Johnson\nDevOps Engineer\nemily.johnson@example.com | (444) 789-0123 | emilyjohnson.dev\n\nSummary:\nDevOps engineer with 4+ years of experience in cloud infrastructure, CI/CD pipelines, and container orchestration. Skilled in AWS, Kubernetes, and Terraform.\n\nSkills:\n- Cloud Platforms: AWS, Azure\n- Tools: Docker, Kubernetes, Terraform, Jenkins\n- Scripting: Bash, Python\n- Monitoring: Prometheus, Grafana\n\nExperience:\nDevOps Engineer | CloudTech | July 2020 - Present\n- Automated infrastructure deployment using Terraform, reducing setup time by 50%.\n- Implemented CI/CD pipelines for microservices, accelerating deployments by 70%.\n\nEducation:\nB.S. in Information Technology | Tech University | May 2016",
    
    "Sophia Williams\nUX Designer\nsophia.williams@example.com | (333) 456-7890 | behance.net/sophiawilliams\n\nSummary:\nUX designer with a passion for creating user-centered designs and improving user experiences. Proficient in Figma, Adobe XD, and usability testing.\n\nSkills:\n- Design Tools: Figma, Adobe XD, Sketch\n- Research: Usability Testing, A/B Testing\n- Prototyping: InVision, Axure\n- Frontend: HTML, CSS, JavaScript\n\nExperience:\nUX Designer | Creative Studio | April 2019 - Present\n- Redesigned mobile app interfaces, resulting in a 25% increase in user engagement.\n- Conducted user research sessions to identify pain points and improve workflows.\n\nEducation:\nB.A. in Graphic Design | Design Institute | May 2018"
]

### TF-IDF for keyword extraction

In [10]:
from sklearn.feature_extraction.text import TfidfVectorizer
from pprint import pprint
import re

In [11]:
corpus = [resumes[0], job_desc]

# Remove numbers before extracting keywords
corpus = [re.sub(r"\d+", "", text) for text in corpus]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(corpus)

feature_names = vectorizer.get_feature_names_out()
tfidf_scores = tfidf_matrix.toarray()

feature_names

array(['abc', 'abilities', 'adhering', 'agile', 'apis', 'application',
       'applications', 'architecture', 'attention', 'bachelor', 'best',
       'boot', 'built', 'clean', 'closely', 'code', 'collaborate',
       'collaboration', 'com', 'commerce', 'communication', 'complex',
       'computer', 'cross', 'databases', 'debug', 'degree', 'deliver',
       'deploy', 'description', 'design', 'designers', 'develop',
       'developing', 'development', 'django', 'docker', 'doe', 'downtime',
       'dynamic', 'education', 'efficient', 'engineer', 'engineering',
       'environment', 'equivalent', 'example', 'excellent', 'experience',
       'experienced', 'expertise', 'field', 'frameworks', 'functional',
       'gather', 'git', 'growth', 'high', 'innovation', 'java',
       'javascript', 'job', 'john', 'johndoe', 'join', 'june',
       'knowledge', 'kubernetes', 'languages', 'led', 'linkedin',
       'maintain', 'maintainable', 'maintained', 'managers', 'members',
       'methodologies', '

In [12]:
resume_keywords = [
    (feature_names[i], tfidf_scores[0][i])
    for i in tfidf_scores[0].argsort()[::-1]
]
print("Top Resume Keywords:")
pprint(resume_keywords)

Top Resume Keywords:
[('software', 0.2405245183143361),
 ('engineer', 0.2405245183143361),
 ('javascript', 0.22536587875688172),
 ('com', 0.22536587875688172),
 ('doe', 0.22536587875688172),
 ('john', 0.22536587875688172),
 ('python', 0.1603496788762241),
 ('scalable', 0.1603496788762241),
 ('migration', 0.11268293937844086),
 ('microservices', 0.11268293937844086),
 ('maintained', 0.11268293937844086),
 ('mongodb', 0.11268293937844086),
 ('linkedin', 0.11268293937844086),
 ('monolithic', 0.11268293937844086),
 ('led', 0.11268293937844086),
 ('knowledge', 0.11268293937844086),
 ('passion', 0.11268293937844086),
 ('june', 0.11268293937844086),
 ('platforms', 0.11268293937844086),
 ('postgresql', 0.11268293937844086),
 ('johndoe', 0.11268293937844086),
 ('xyz', 0.11268293937844086),
 ('expertise', 0.11268293937844086),
 ('git', 0.11268293937844086),
 ('experienced', 0.11268293937844086),
 ('example', 0.11268293937844086),
 ('education', 0.11268293937844086),
 ('downtime', 0.1126829393784

TF-IDF is a poor fit for matching keywords between a resume and a job description. Its IDF component gives higher weight to terms that are *unique* to one document, so a word that appears in both the resume and the job description, which is exactly the kind of word a matcher should surface, has its score reduced.
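To make the penalty concrete, here is a minimal sketch that computes the IDF weight by hand for the two-document corpus used above. It assumes the smoothed formula that scikit-learn documents as the `TfidfVectorizer` default (`smooth_idf=True`), namely `ln((1 + n) / (1 + df)) + 1`. The helper name `smoothed_idf` is hypothetical and used only for illustration.

```python
import math


def smoothed_idf(doc_freq: int, n_docs: int) -> float:
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1 (assumed scikit-learn default)."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1


n_docs = 2  # one resume + one job description

idf_shared = smoothed_idf(doc_freq=2, n_docs=n_docs)  # term appears in both documents
idf_unique = smoothed_idf(doc_freq=1, n_docs=n_docs)  # term appears in only one document

print(f"shared term IDF: {idf_shared:.3f}")  # 1.000
print(f"unique term IDF: {idf_unique:.3f}")  # 1.405
```

Under these assumptions a term found in only one document is weighted about 1.4 times higher than an equally frequent shared term. That factor is consistent with the gap between `javascript` (0.225, resume only) and `python` (0.160, also in the job description) in the output above, given that both words occur twice in the resume.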

### Using NLP libraries

Part-of-speech tagging

In [16]:
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp(resumes[0])

In [17]:
doc

John Doe
Software Engineer
john.doe@example.com | (123) 456-7890 | linkedin.com/in/johndoe

Summary:
Experienced software engineer with expertise in developing scalable web applications, strong knowledge of Python and JavaScript, and a passion for solving complex problems.

Skills:
- Programming Languages: Python, JavaScript, Java
- Frameworks: Django, React, Spring Boot
- Tools: Git, Docker, Kubernetes
- Databases: PostgreSQL, MongoDB

Experience:
Software Engineer | ABC Tech | June 2020 - Present
- Built and maintained scalable APIs to support high-traffic e-commerce platforms.
- Led migration of a monolithic application to a microservices architecture, reducing downtime by 30%.

Education:
B.S. in Computer Science | University of XYZ | May 2020

In [20]:
# Keep only nouns and verbs, stripping non-letter characters from each token
keywords = [
    re.sub(r"[^a-zA-Z]", "", token.text)
    for token in doc
    if token.pos_ in ("NOUN", "VERB")
]
keywords

['|',
 '|',
 'Experienced',
 'software',
 'engineer',
 'expertise',
 'developing',
 'web',
 'applications',
 'knowledge',
 'passion',
 'solving',
 'problems',
 'Skills',
 'Languages',
 'MongoDB',
 'Experience',
 '|',
 '|',
 'Built',
 'maintained',
 'APIs',
 'support',
 'traffic',
 'e',
 '-',
 'commerce',
 'platforms',
 'Led',
 'migration',
 'application',
 'microservices',
 'architecture',
 'reducing',
 'downtime',
 '%',
 '|',
 '|']
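The list above still contains separators such as `|`, `-`, and `%`, plus fragments like `e` from "e-commerce". A possible cleanup step is sketched below in plain Python. The function name `clean_keywords`, the `min_length` threshold, and the lowercasing and de-duplication are assumptions made for illustration, and `sample` is a hand-copied subset of the output above rather than real spaCy output.

```python
import re


def clean_keywords(tokens, min_length=2):
    """Strip non-letters, lowercase, and drop short or duplicate tokens."""
    cleaned = []
    for text in tokens:
        letters_only = re.sub(r"[^a-zA-Z]", "", text).lower()
        # Skip empty strings, single letters, and repeats
        if len(letters_only) >= min_length and letters_only not in cleaned:
            cleaned.append(letters_only)
    return cleaned


# Hand-copied subset of the tagged tokens shown above
sample = ["|", "Experienced", "software", "e", "-", "commerce", "%", "software"]
print(clean_keywords(sample))
# ['experienced', 'software', 'commerce']
```

With spaCy itself, a similar effect could likely be had by adding a `token.is_alpha` condition to the list comprehension, which skips tokens that are not purely alphabetic before they reach the keyword list.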