# Resume Recommendation System
This notebook uses Google's Gemini model to analyze resumes against a job description through a detailed prompt. Candidates are then sorted by their match percentage, with the matched and missing keywords highlighted for each resume.

**Setup and Installation**
- pdf2image
- docx2txt
- PyPDF2
- google-generativeai

In [None]:
!pip install pdf2image docx2txt PyPDF2

Collecting pdf2image
  Downloading pdf2image-1.17.0-py3-none-any.whl.metadata (6.2 kB)
Collecting docx2txt
  Downloading docx2txt-0.8.tar.gz (2.8 kB)
  Preparing metadata (setup.py) ... done
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pdf2image-1.17.0-py3-none-any.whl (11 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 232.6/232.6 kB 5.8 MB/s eta 0:00:00
Building wheels for collected packages: docx2txt
  Building wheel for docx2txt (setup.py) ... done
  Created wheel for docx2txt: filename=docx2txt-0.8-py3-none-any.whl size=3959 sha256=52b54c2307036bc0375ca094aededb26e24a319ec1e69c76296a0f70b907d0b0
  Stored in directory: /root/.cache/pip/wheels/22/58/cf/093d0a6c3ecfdfc5f6ddd5524043b88e59a9a199cb02352966
Successfully built docx2txt
Installing collected packages: docx2txt, PyPDF2, pdf2image
Successfully installed PyPDF2-

In [None]:
!pip install google-generativeai



In [None]:
# google-generativeai
# python-dotenv
# pdf2image
# docx2txt
# PyPDF2


In [None]:
# import streamlit as st
import google.generativeai as genai
import os
import json
import requests
import docx2txt
import PyPDF2 as pdf
# from dotenv import load_dotenv
# from streamlit_lottie import st_lottie
import time

In [None]:
from google.colab import userdata
API_KEY=userdata.get('GEMINI_API')

API_KEY is the Gemini API key
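
Outside Colab, google.colab cannot be imported, so the cell above fails. A minimal sketch of a fallback, assuming the key is also exported as an environment variable named GEMINI_API (the helper name load_api_key is hypothetical, not part of any library):

```python
import os

def load_api_key(name="GEMINI_API"):
    """Return the key from Colab secrets when available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        # Assumption: outside Colab the key lives in an environment variable
        key = os.environ.get(name)
    else:
        key = userdata.get(name)
    if not key:
        raise RuntimeError(f"No API key found under the name {name!r}")
    return key
```

With this helper, API_KEY = load_api_key() would work both in Colab and in a local Jupyter session.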

The following cells configure a Gemini generative model and use it to analyze resumes against a job description.

**Configure the Generative AI Model:**

In [None]:

# Configure the generative AI model with the Google API key
genai.configure(api_key=API_KEY)


**Set Up Model Configuration for Text Generation:**

- temperature: Controls the randomness of the output (lower values make the output more focused and deterministic).
- top_p and top_k: Restrict sampling to the most likely tokens. top_k keeps only the k highest-probability tokens; top_p keeps the smallest set of tokens whose cumulative probability reaches p.
- max_output_tokens: Limits the length of the generated text.

In [None]:
# Set up the model configuration for text generation
generation_config = {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 4096,
}
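
To make the effect of temperature, top_k and top_p concrete, here is a toy sketch over a made-up three-token distribution. It illustrates the general sampling technique only; it is not Gemini's actual implementation, and the function name shape_distribution is hypothetical.

```python
import math

def shape_distribution(probs, temperature=1.0, top_k=None, top_p=1.0):
    """Return the renormalized token distribution after temperature, top-k and top-p."""
    # Temperature: divide log-probabilities by T, then renormalize.
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items() if p > 0}
    total = sum(scaled.values())
    ranked = sorted(((tok, p / total) for tok, p in scaled.items()),
                    key=lambda item: item[1], reverse=True)
    # top-k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}

toy = {"python": 0.5, "java": 0.3, "cobol": 0.2}
print(shape_distribution(toy, top_k=2))          # "cobol" is dropped
print(shape_distribution(toy, temperature=0.4))  # mass concentrates on "python"
```

Lower temperatures sharpen the distribution before the top_k and top_p filters are applied, which is why a setting such as 0.4 gives more repeatable analyses than 1.0.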

**Define Safety Settings for Content Generation:**

Specifies categories of harmful content to block based on predefined safety thresholds.

In [None]:
# Define safety settings for content generation
safety_settings = [
    {"category": f"HARM_CATEGORY_{category}", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
    for category in ["HARASSMENT", "HATE_SPEECH", "SEXUALLY_EXPLICIT", "DANGEROUS_CONTENT"]
]


**Generate Response Function:**

generate_response_from_gemini: Uses the Gemini model to generate text based on the provided input. Configured with generation parameters and safety settings.

In [None]:


def generate_response_from_gemini(input_text):
    # Create a GenerativeModel instance with 'gemini-pro' as the model type
    llm = genai.GenerativeModel(
        model_name="gemini-pro",
        generation_config=generation_config,
        safety_settings=safety_settings,
    )
    # Generate content based on the input text
    output = llm.generate_content(input_text)
    # Return the generated text
    return output.text
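
Calls to a hosted model can fail transiently, and, as far as we understand the google-generativeai client, reading output.text raises an error when the response carries no text (for example when it was blocked by the safety settings). A minimal retry sketch follows; the wrapper generate_with_retry and its parameters are hypothetical, and it takes the generation function as an argument so it can be tried without network access.

```python
import time

def generate_with_retry(generate_fn, prompt, retries=3, delay_seconds=2.0):
    """Call generate_fn(prompt), retrying on any exception up to `retries` attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return generate_fn(prompt)
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < retries:
                # Wait a little longer after each failed attempt
                time.sleep(delay_seconds * attempt)
    raise RuntimeError(f"generation failed after {retries} attempts") from last_error
```

In this notebook it would be called as generate_with_retry(generate_response_from_gemini, prompt).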



---



**Extract Text from PDF and DOCX Files:**

This function extracts text from a PDF file, page by page, using the PdfReader class from PyPDF2.

In [None]:
def extract_text_from_pdf_file(uploaded_file):
    # Use PdfReader to read the text content from a PDF file
    pdf_reader = pdf.PdfReader(uploaded_file)
    text_content = ""
    for page in pdf_reader.pages:
        text_content += str(page.extract_text())
    return text_content

This function extracts text from a DOCX file using the docx2txt library.

In [None]:
def extract_text_from_docx_file(uploaded_file):
    # Use docx2txt to extract text from a DOCX file
    return docx2txt.process(uploaded_file)
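
The two extractors can be combined so that each resume is routed by its file extension. A minimal sketch, assuming the extension reliably reflects the file type; the helper extract_resume_text and the extractors mapping are hypothetical names introduced for illustration.

```python
from pathlib import Path

def extract_resume_text(path, extractors):
    """Pick an extractor from `extractors` based on the file extension of `path`."""
    suffix = Path(path).suffix.lower()
    try:
        extractor = extractors[suffix]
    except KeyError:
        raise ValueError(f"Unsupported resume format: {suffix or path!r}")
    return extractor(path)

# In this notebook the mapping would be:
# extractors = {".pdf": extract_text_from_pdf_file, ".docx": extract_text_from_docx_file}
```

Passing the mapping in as an argument keeps the dispatcher independent of PyPDF2 and docx2txt, so further formats can be added without changing it.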


# Job Description

In [None]:
job_description = '''About the job
EdgeVerve is a global leader in AI, Automation, and Analytics. It is a wholly-owned subsidiary of Infosys. EdgeVerve develops innovative software products and offers them on-premise or as cloud-hosted business platforms. Its technology empowers enterprises globally to bring life to their digital transformation initiatives. EdgeVerve aims to create a world where its technology augments human intelligence and creates possibilities for enterprises to thrive

 Product :

EDGE Platform, part of Infosys Topaz, bridges silos in people, processes, data, and technology for enterprises, amplifying the value of their existing digital core investments.
EDGE Platform platform acts as a transformative overlay, seamlessly integrating with enterprises' existing systems. It enables businesses to adopt cutting-edge capabilities, enhance operational efficiency, and unlock new opportunities for growth and innovation, all without disrupting their foundational systems.

Role : Product Manager

Responsibilities :

 - Partner with Internal and External stakeholders and contribute in building overall product strategy
 - Helps to define the product vision
 - Communicate with all stakeholders, prepare project plans, release notes, etc., and conduct timely business-technology governance meetings
 - Work with engineering /UX teams to provide comprehensive business solutions
 - B.tech/MBA


 Experience Required :


 - Prior Experience in AI enabled / using products - NLP or Computer vision or ML or AI
 - Familiarity with various generative AI models, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and transformer models, is important to understand their strengths, limitations, and use cases.
 - Experience in transformer models like BERT, ContractBERT
 - Machine Learning, AI, Language models experience, or experience in AI builder products.
 - Limited experience in Data science and data engineering is a plus
 - Scaled agile product management principles
Qualifications'''

# Prompt Template for Resume Analysis:

input_prompt_template: A detailed prompt that asks the generative AI model to compare a resume against the job description. The response should include:
- "Job Description Match": Percentage match between resume and job description.
- "Missing Keywords": Keywords missing from the resume.
- "Matched Keywords": Keywords present in both resume and job description.
- "Candidate Summary": Summary of the candidate’s qualifications.
- "Experience": Relevant experience mentioned in the resume.
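
The template is filled in with str.format, so the {text} and {job_description} placeholders are replaced while the doubled braces around the expected response structure come out as single literal braces. A short sketch with a cut-down template and placeholder values (both made up for illustration):

```python
# Cut-down version of the prompt template, for illustration only
template = """resume:{text}
description:{job_description}
{{"Job Description Match":"%", "Missing Keywords":""}}"""

filled = template.format(text="<resume text>", job_description="<job description>")
print(filled)
```

Without the doubled braces, str.format would try to treat the JSON-like structure as a placeholder and raise an error.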

In [None]:

# Prompt Template
input_prompt_template = """
As an experienced Applicant Tracking System (ATS) analyst, with profound knowledge in technology,
software engineering, data analysis, data science, ML engineering, Python engineering, AI engineering,
full stack web development, cloud engineering, cloud development, DevOps engineering, and big data
engineering, your role involves evaluating a resume against a job description. Recognizing the
competitive job market, provide top-notch assistance for resume improvement. Your goal is to analyze
each resume against the given job description, assign a percentage match based on key criteria,
pinpoint missing keywords, list matched keywords, and accurately report any experience mentioned
in the resume.

resume:{text}
description:{job_description}
I want the response in one single string having the structure
{{"Job Description Match":"%", "Missing Keywords":"", "Matched Keywords": "", "Candidate Summary":"", "Experience":""}}

Order the resumes by their match percentage in descending order.
"""


**Extract Text from Resumes:**

In [None]:
resume_list=["Vihan_IIT Kanpur_Resume.pdf.pdf","210867_riya_silotiya_bt_ee.pdf"]

In [None]:
resume_list

['Vihan_IIT Kanpur_Resume.pdf.pdf', '210867_riya_silotiya_bt_ee.pdf']

In [None]:
all_resume_text=[]
for resume in resume_list:
    resume_text = extract_text_from_pdf_file(resume)
    all_resume_text.append(resume_text)
print(len(all_resume_text))

2


- resume_list: Contains the filenames of the resumes.
- extract_text_from_pdf_file: Function to extract text from each PDF file.
- all_resume_text: List to store the text content of all resumes.
- print(len(all_resume_text)): Prints the number of resumes processed.

In [None]:
all_resume_text[1]

'Riya Silotiya riyas21@iitk.ac.in \n3rd Year Undergraduate Riya Silotiya \nBachelor of Technology(B.Tech) RiyaSilotiya /gtb\nDepartment of Electrical Engineering +91-8306957149 /ne\nAcademic Qualification\nYear Degree/Certificate Institute CPI/%\n2021 - Present B.Tech Indian Institute of Technology Kanpur 6.9/10\n2020 CBSE - AISSCE(XII) Railway Senior Secondary School, Bandikui 92.4%\n2018 CBSE - AISSE(X) Bandikui Public School, Bandikui 96.8%\nWork Experience\n➤Battery Health Modelling: A Data-Driven Approach | SURGE’23 Intern |Prof.Swathi Battula /gtb (May’23-July’23)\nObjective •To predict the State of Health (SoH) of a battery using Recurrent Neural Networks (RNNs).\nApproach•Extracted data from MATLAB files containing battery performance data from the NASA Data Repository.\n•Performed feature selection anddata visualization to identify the crucial features for predicting SOH.\n•Developed a parallel RNNarchitecture using three different RNNs: Vanilla RNN (SimpleRNN), Long\nShort-Te

# Generate Responses Using the Generative AI Model:

- generate_response_from_gemini: Function to generate a response from the AI model based on the resume text and job description.
- input_prompt_template: The prompt template used to guide the AI model.
- all_response: List to store responses generated for each resume.

In [None]:
all_response = []
for resume_text in all_resume_text:
    # Fill the template with this resume and the job description, then query the model
    prompt = input_prompt_template.format(text=resume_text, job_description=job_description)
    all_response.append(generate_response_from_gemini(prompt))

In [None]:
all_response[0]

'{"Job Description Match":"70%","Missing Keywords":"Data science, Data engineering, Scaled agile product management principles","Matched Keywords":"AI, NLP, Computer vision, ML, Generative AI models, GANs, VAEs, Transformer models, BERT, ContractBERT, Machine Learning, Language models, AI builder products","Candidate Summary":"Vihan Kochatta is a Senior Undergraduate at the Indian Institute of Technology Kanpur, pursuing a Bachelor of Science in Chemistry. With a strong academic record and a passion for technology, software engineering, and data science, Vihan has gained valuable experience through internships at DeepInsights and Redopact. As a Software Engineer at DeepInsights, Vihan developed a program to acquire clients\' Zoom access tokens, enabling access to call recordings for storage. At Redopact, he worked as a Web Developer Intern, contributing to the development of responsive web pages for a product-based site. Vihan\'s technical skills include C/C++, JavaScript, Node.js, Rea

In [None]:
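# resume_text still holds the last resume processed above, so this repeats the analysis for that resume only
response_text = generate_response_from_gemini(input_prompt_template.format(text=resume_text, job_description=job_description))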

In [None]:
response_text

'{"Job Description Match":"70%","Missing Keywords":"Data science, Data engineering, Scaled agile product management principles","Matched Keywords":"AI, NLP, Computer vision, ML, Generative AI models, GANs, VAEs, Transformer models, BERT, ContractBERT, Machine Learning, Language models, AI builder products","Candidate Summary":"Riya Silotiya is a 3rd-year undergraduate student pursuing a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology Kanpur. She has a strong academic record, with a CPI of 6.9/10. Riya has a keen interest in AI, ML, and data science, as evidenced by her work experience and projects. She has experience in developing AI models for battery health prediction, brain tumor segmentation, and security price prediction. Riya is also proficient in programming languages such as C, C++, Python, Matlab, HTML, SQL, and DSA, as well as utilities and frameworks such as Numpy, Pandas, Tensorflow, Keras, Matplotlib, Scikit-Learn, PyTorch, LATEX, a

The function extract_info processes the AI-generated responses to extract specific information like matched keywords, missing keywords, match percentage, and candidate summary.

This data is then stored in a list of dictionaries.

In [None]:
def extract_info(responses):
    """
    Extracts information from AI-generated responses and appends it to the
    module-level list_of_dict, which must be defined before this function is called.

    Args:
    - responses (list of str): List of response strings from the AI model.

    Returns:
    - list_of_dict (list of dict): The same list, with one dictionary added per response.
    """

    for response in responses:
        # Each field is the text between '"<key>":"' and the next double quote
        match_percentage_str = response.split('"Job Description Match":"')[1].split('"')[0]
        missing_keywords = response.split('"Missing Keywords":"')[1].split('"')[0]
        matched_keywords = response.split('"Matched Keywords":"')[1].split('"')[0]
        candidate_summary = response.split('"Candidate Summary":"')[1].split('"')[0]
        list_of_dict.append({
            "matched_keywords": matched_keywords,
            "Missing_Keywords": missing_keywords,
            "match_percentage": match_percentage_str,
            "Candidate_Summary": candidate_summary,
        })
    return list_of_dict


Key Components:

responses: A list of response strings generated by the AI.

extract_info: Extracts key information from each response string.

match_percentage_str: The percentage match between the resume and job description.

missing_keywords: Keywords that were expected but missing in the resume.

matched_keywords: Keywords that were found in both the resume and job description.

candidate_summary: A summary of the candidate's qualifications.

list_of_dict: A list where each element is a dictionary containing the extracted information for one resume.
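
Splitting on double quotes cuts a field short as soon as its value contains an escaped quote. Since the prompt asks for a JSON-shaped string, an alternative is to parse the response with the json module. A minimal sketch, assuming the model returns exactly one well-formed JSON object, possibly wrapped in extra text; the helper name parse_response is hypothetical.

```python
import json
import re

def parse_response(response):
    """Parse one model response into the dictionary layout used in this notebook."""
    # Take everything from the first '{' to the last '}' to skip any wrapper text
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    data = json.loads(match.group(0))
    return {
        "matched_keywords": data.get("Matched Keywords", ""),
        "Missing_Keywords": data.get("Missing Keywords", ""),
        "match_percentage": data.get("Job Description Match", ""),
        "Candidate_Summary": data.get("Candidate Summary", ""),
    }
```

json.loads raises an error on malformed output, which makes a bad response visible instead of silently producing a truncated field.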

In [None]:
list_of_dict=[]
extract_info(all_response)

In [None]:
list_of_dict

[{'matched_keywords': 'AI, NLP, Computer vision, ML, Generative AI models, GANs, VAEs, Transformer models, BERT, ContractBERT, Machine Learning, Language models, AI builder products',
  'Missing_Keywords': 'Data science, Data engineering, Scaled agile product management principles',
  'match_percentage': '70%',
  'Candidate_Summary': "Vihan Kochatta is a Senior Undergraduate at the Indian Institute of Technology Kanpur, pursuing a Bachelor of Science in Chemistry. With a strong academic record and a passion for technology, software engineering, and data science, Vihan has gained valuable experience through internships at DeepInsights and Redopact. As a Software Engineer at DeepInsights, Vihan developed a program to acquire clients' Zoom access tokens, enabling access to call recordings for storage. At Redopact, he worked as a Web Developer Intern, contributing to the development of responsive web pages for a product-based site. Vihan's technical skills include C/C++, JavaScript, Node.j

This code sorts the list of dictionaries (list_of_dict) by the match_percentage in descending order and then prints the sorted information.

In [None]:

# Sort by the numeric match percentage (e.g. '70%' -> 70.0) in descending order;
# comparing the raw strings would rank '9%' above '70%' and '100%'
sorted_list_of_dict = sorted(
    list_of_dict,
    key=lambda x: float(x['match_percentage'].strip().rstrip('%') or 0),
    reverse=True,
)

# Print the sorted results
for idx, resume in enumerate(sorted_list_of_dict):
    print(f"Rank {idx + 1}:")
    print(f"Match Percentage: {resume['match_percentage']}")
    print(f"Candidate Summary: {resume['Candidate_Summary']}")
    print(f"Matched Keywords: {resume['matched_keywords']}")
    print(f"Missing Keywords: {resume['Missing_Keywords']}")
    print("\n")


Rank 1:
Match Percentage: 70%
Candidate Summary: Vihan Kochatta is a Senior Undergraduate at the Indian Institute of Technology Kanpur, pursuing a Bachelor of Science in Chemistry. With a strong academic record and a passion for technology, software engineering, and data science, Vihan has gained valuable experience through internships at DeepInsights and Redopact. As a Software Engineer at DeepInsights, Vihan developed a program to acquire clients' Zoom access tokens, enabling access to call recordings for storage. At Redopact, he worked as a Web Developer Intern, contributing to the development of responsive web pages for a product-based site. Vihan's technical skills include C/C++, JavaScript, Node.js, React.js, Express.js, Mongoose.js, Django, MongoDB, PostgreSQL, Git, Postman, Socket.IO, Markdown, Linux, LATEX, and Bcrypt.js. He has also held leadership positions as Festival Coordinator and Head Web and App for Techkriti, Asia's Largest Technical & Entrepreneurial festival, where 
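
One caveat when ranking on match_percentage: the values are strings such as '70%', and strings are compared character by character rather than by numeric value. The sketch below uses made-up candidates to contrast a text-based sort key with a numeric one; percent_to_float is a hypothetical helper.

```python
# Made-up results, for illustration only
results = [
    {"name": "A", "match_percentage": "70%"},
    {"name": "B", "match_percentage": "100%"},
    {"name": "C", "match_percentage": "9%"},
]

def percent_to_float(value):
    """Convert a string such as '70%' to the number 70.0."""
    return float(value.strip().rstrip("%"))

by_string = sorted(results, key=lambda r: r["match_percentage"], reverse=True)
by_number = sorted(results, key=lambda r: percent_to_float(r["match_percentage"]), reverse=True)

print([r["name"] for r in by_string])  # ['C', 'A', 'B']
print([r["name"] for r in by_number])  # ['B', 'A', 'C']
```

With only two resumes that both score 70%, the difference does not show, but it matters as soon as the scores have different numbers of digits.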

In [None]:
matched_keywords = response_text.split('"Matched Keywords":"')[1].split('"')[0]
matched_keywords

'AI, NLP, Computer vision, ML, Generative AI models, GANs, VAEs, Transformer models, BERT, ContractBERT, Machine Learning, Language models, AI builder products'

In [None]:
Missing_Keywords = response_text.split('"Missing Keywords":"')[1].split('"')[0]
Missing_Keywords

'Data science, Data engineering, Scaled agile product management principles'

In [None]:
match_percentage_str = response_text.split('"Job Description Match":"')[1].split('"')[0]
match_percentage_str

'70%'