In [1]:
job_description='''
Job Description: Senior Backend Engineer

Responsibilities:

Design, develop, and maintain robust and scalable backend systems.
Collaborate with frontend and mobile teams to build seamless user experiences.
Optimize database performance and write efficient SQL queries.
Implement robust security measures to protect sensitive data.
Mentor junior engineers and foster a culture of continuous learning.
Required Skills:

Strong proficiency in backend programming languages (e.g., Python, Node.js, Ruby on Rails, Java).
Experience with database technologies (e.g., PostgreSQL, MySQL, MongoDB).
Solid understanding of RESTful API design and development.
Knowledge of cloud platforms (e.g., AWS, GCP, Azure).
Experience with containerization technologies (e.g., Docker, Kubernetes).
'''

In [2]:
interviewee_responce='''
I've been passionate about backend development for 3 years, and I'm excited to apply my skills to challenging projects.
At my previous role at egy_tech, I was responsible for building a scalable API that handled 100 requests per second.
I utilized [Specific technologies, e.g., Python, Flask, PostgreSQL] to optimize performance and ensure reliability.

I'm particularly interested in your company's focus on [database, data privacy, machine learning, Azuru].
I've been exploring Node.js and believe it could be a valuable asset to your team.
I'm eager to contribute to innovative projects and learn from experienced engineers.
'''

## **Preprocessing**

In [3]:
import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet

from nltk.stem import WordNetLemmatizer
import re


nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package punkt to C:\Users\Mohamed
[nltk_data]     Walid\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\Mohamed
[nltk_data]     Walid\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Mohamed
[nltk_data]     Walid\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to C:\Users\Mohamed
[nltk_data]     Walid\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\Mohamed Walid\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


True

In [4]:
stop_words = set(stopwords.words('english'))
translated_table = str.maketrans('', '', string.punctuation)
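`str.maketrans('', '', string.punctuation)` maps every ASCII punctuation character to `None`, so `str.translate` deletes it. A quick standalone check on made-up strings shows why `e.g.` and `Node.js` appear as `eg` and `nodejs` in the preprocessed text. It also shows that hyphenated terms would be fused into a single token:

```python
import string

# The third argument of str.maketrans lists characters that translate() deletes.
translated_table = str.maketrans('', '', string.punctuation)

for sample in ["e.g., node.js", "I've built a RESTful API!", "ruby-on-rails"]:
    print(repr(sample), "->", repr(sample.translate(translated_table)))
```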

In [5]:
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ  # Adjective
    elif tag.startswith('V'):
        return wordnet.VERB  # Verb
    elif tag.startswith('N'):
        return wordnet.NOUN  # Noun
    elif tag.startswith('R'):
        return wordnet.ADV  # Adverb
    else:
        return wordnet.NOUN  # Default to Noun
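The `if`/`elif` chain is equivalent to a lookup on the first letter of the Penn Treebank tag. Here is a table-driven sketch that makes the mapping explicit. It uses the single-letter codes that NLTK's `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV` stand for. The function name `wordnet_pos_from_tag` is hypothetical, chosen so it does not shadow `get_wordnet_pos`:

```python
# WordNet POS codes as plain strings; in NLTK these are the values of
# wordnet.ADJ ('a'), wordnet.VERB ('v'), wordnet.NOUN ('n') and wordnet.ADV ('r').
_PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}

def wordnet_pos_from_tag(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD', 'JJR') to a WordNet POS code."""
    # Only the first letter matters; any other tag falls back to noun.
    return _PENN_PREFIX_TO_WORDNET.get(tag[:1], "n")

for tag in ["JJR", "VBD", "NNS", "RB", "IN"]:
    print(tag, "->", wordnet_pos_from_tag(tag))
```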

In [6]:
def preprocess_text(text):
    """Lowercase, strip digits and punctuation, drop stopwords, and lemmatize."""
    text = text.lower()

    text = re.sub(r'\d+', '', text)          # Remove numbers
    text = text.translate(translated_table)  # Remove punctuation (e.g. "node.js" -> "nodejs")

    text_tokens = word_tokenize(text)

    filtered_words = [word for word in text_tokens if word not in stop_words]

    # Lemmatization: transform words to their base or dictionary form
    lemmatizer = WordNetLemmatizer()

    lemma_words = []
    for word in filtered_words:
        # Tagging one word at a time ignores sentence context, which can mis-tag words
        # (the output below reduces "continuous learning" to "continuous learn").
        pos_tag = nltk.pos_tag([word])[0][1]        # Get POS tag for each word
        wordnet_pos = get_wordnet_pos(pos_tag)      # Map POS to WordNet POS
        lemma_word = lemmatizer.lemmatize(word, pos=wordnet_pos)  # Lemmatize using WordNet POS
        lemma_words.append(lemma_word)

    processed_text = ' '.join(lemma_words)
    return processed_text

In [7]:
preprocessed_job_description = preprocess_text(job_description)
print(f"Preprocessed job description : {preprocessed_job_description}")

Preprocessed job description : job description senior backend engineer responsibility design develop maintain robust scalable backend system collaborate frontend mobile team build seamless user experience optimize database performance write efficient sql query implement robust security measure protect sensitive data mentor junior engineer foster culture continuous learn require skill strong proficiency backend program language eg python nodejs ruby rail java experience database technology eg postgresql mysql mongodb solid understand restful api design development knowledge cloud platform eg aws gcp azure experience containerization technology eg docker kubernetes


In [8]:
preprocessed_interviewee_responce = preprocess_text(interviewee_responce)
print(f"Preprocessed interviewee response : {preprocessed_interviewee_responce}")

Preprocessed interviewee response : ive passionate backend development year im excite apply skill challenge project previous role egytech responsible building scalable api handle request per second utilized specific technology eg python flask postgresql optimize performance ensure reliability im particularly interested company focus database data privacy machine learn azuru ive explore nodejs believe could valuable asset team im eager contribute innovative project learn experienced engineer
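The output above still contains `ive` and `im`. This happens because `preprocess_text` deletes punctuation before tokenizing, which turns `I've` and `I'm` into single tokens that are not in the stopword list. The stdlib-only sketch below contrasts the two orderings. The regex tokenizer and the small stopword set are stand-ins (assumptions) for `word_tokenize` and NLTK's English list:

```python
import re
import string

# Hypothetical stopword subset for illustration; the notebook uses NLTK's full English list.
STOP_WORDS = {"i", "m", "ve", "been", "about", "and"}
STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def strip_then_split(text):
    """Order used by preprocess_text: delete punctuation first, then tokenize."""
    tokens = text.lower().translate(STRIP_PUNCT).split()
    return [t for t in tokens if t not in STOP_WORDS]

def split_then_strip(text):
    """Alternative order: split off contractions first, then clean each token."""
    # Crude stand-in for word_tokenize: words, plus clitics such as 've and 'm.
    tokens = re.findall(r"[a-z]+|'[a-z]+", text.lower())
    cleaned = (t.translate(STRIP_PUNCT) for t in tokens)
    return [t for t in cleaned if t and t not in STOP_WORDS]

sentence = "I've been passionate about backend development, and I'm excited."
print(strip_then_split(sentence))  # ['ive', 'passionate', 'backend', 'development', 'im', 'excited']
print(split_then_strip(sentence))  # ['passionate', 'backend', 'development', 'excited']
```

In the NLTK pipeline, the equivalent change would be to call `word_tokenize` before removing punctuation, since it splits `i've` into `i` and `'ve`. Each piece can then be stripped and checked against the stopword list.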


### **Extract important keywords from the job description and interviewee response**

In [9]:
from keybert import KeyBERT
import spacy

In [10]:
def extract_relevant_keywords(text, nlp=None, top_n=25):
    """
    Extract relevant keywords with KeyBERT, keeping only phrases made of nouns,
    proper nouns, and verbs.

    Args:
        text (str): Input text for keyword extraction.
        nlp (spacy.Language): spaCy language model for linguistic analysis.
            Loaded from "en_core_web_md" if not provided.
        top_n (int): Number of candidate keywords requested from KeyBERT before
            filtering, i.e. the maximum number of keywords returned.

    Returns:
        List[str]: Refined list of keywords.
        List[float]: Corresponding scores for the keywords.
    """
    # Load spaCy model if not provided (slow; pass a preloaded model for repeated calls)
    if nlp is None:
        nlp = spacy.load("en_core_web_md")

    # Initialize KeyBERT with its default embedding model
    kw_model = KeyBERT()

    # Extract 1- to 3-word keyphrases with KeyBERT
    raw_keywords = kw_model.extract_keywords(
        text, keyphrase_ngram_range=(1, 3), stop_words="english", top_n=top_n
    )

    # Filter keywords
    filtered_keywords = []
    filtered_scores = []
    valid_pos = {"NOUN", "PROPN", "VERB"}  # Keep nouns, proper nouns, and verbs

    for keyword, score in raw_keywords:
        doc = nlp(keyword)  # Process the keyword with spaCy

        # Keep the keyword only if every token is a NOUN, PROPN, or VERB
        if all(token.pos_ in valid_pos for token in doc):
            filtered_keywords.append(keyword)
            filtered_scores.append(score)

    return filtered_keywords, filtered_scores
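KeyBERT ranks candidate n-grams by the cosine similarity between a document embedding and each candidate's embedding. By default it builds those candidates with scikit-learn's `CountVectorizer` using `keyphrase_ngram_range`. To make the scoring step concrete without downloading a model, the sketch below replaces the transformer embeddings with plain word-count vectors. This is a deliberate simplification and not what KeyBERT computes. The sample text is made up:

```python
from collections import Counter
from math import sqrt

def ngram_candidates(tokens, ngram_range=(1, 3)):
    """All contiguous n-grams whose length lies in ngram_range (inclusive)."""
    low, high = ngram_range
    return {
        " ".join(tokens[i:i + n])
        for n in range(low, high + 1)
        for i in range(len(tokens) - n + 1)
    }

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_keyphrases(text, ngram_range=(1, 3), top_n=5):
    """KeyBERT-style ranking, but with word-count vectors instead of embeddings."""
    tokens = text.split()
    doc_vec = Counter(tokens)
    scored = [
        (cand, cosine(doc_vec, Counter(cand.split())))
        for cand in ngram_candidates(tokens, ngram_range)
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties alphabetical
    return scored[:top_n]

text = "backend engineer design backend api optimize database query backend"
for phrase, score in rank_keyphrases(text):
    print(f"{phrase}: {score:.3f}")
```

Count vectors reward phrases that repeat frequent words. Embeddings also reward phrases that are close in meaning, which is why KeyBERT can rank `backend engineer` above a phrase that merely shares surface words.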

In [11]:
# Example usage
key_words_JobD, key_words_JobD_scores = extract_relevant_keywords(preprocessed_job_description)
print("Keywords:", key_words_JobD)
print("Scores:", key_words_JobD_scores)
print(f"The length of keywords in the job description is: {len(key_words_JobD)}")

Keywords: ['backend engineer', 'backend engineer responsibility', 'backend', 'proficiency backend', 'experience database technology', 'data mentor junior', 'data mentor', 'proficiency backend program', 'job description', 'develop maintain', 'development knowledge cloud', 'knowledge cloud platform', 'backend program', 'technology postgresql', 'backend program language']
Scores: [0.651, 0.5717, 0.4895, 0.4771, 0.4766, 0.4688, 0.4636, 0.458, 0.4554, 0.4471, 0.4435, 0.4418, 0.4369, 0.4319, 0.4316]
The length of keywords in the job description is: 15
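The job-description keywords overlap heavily (`backend engineer`, `backend engineer responsibility`, `backend`, ...). KeyBERT's `extract_keywords` provides `use_mmr=True` with a `diversity` value for this situation. The sketch below illustrates the Maximal Marginal Relevance idea on the first six keywords and scores printed above. It uses word-overlap (Jaccard) similarity between phrases as a stand-in (assumption) for the embedding similarity that KeyBERT uses:

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two phrases (stand-in for embedding cosine)."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def mmr_select(keywords, doc_scores, top_n=3, diversity=0.5):
    """Maximal Marginal Relevance: balance relevance to the document
    against redundancy with the keywords already chosen."""
    if not keywords:
        return []
    remaining = list(range(len(keywords)))
    # Start with the keyword most similar to the document.
    best = max(remaining, key=lambda i: doc_scores[i])
    selected = [best]
    remaining.remove(best)

    def mmr_score(i):
        redundancy = max(jaccard(keywords[i], keywords[j]) for j in selected)
        return (1 - diversity) * doc_scores[i] - diversity * redundancy

    while remaining and len(selected) < top_n:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [keywords[i] for i in selected]

# First six job-description keywords and scores from the output above.
keywords = ["backend engineer", "backend engineer responsibility", "backend",
            "proficiency backend", "experience database technology", "data mentor junior"]
scores = [0.651, 0.5717, 0.4895, 0.4771, 0.4766, 0.4688]

print(mmr_select(keywords, scores, diversity=0.0))
# ['backend engineer', 'backend engineer responsibility', 'backend']
print(mmr_select(keywords, scores, diversity=0.5))
# ['backend engineer', 'experience database technology', 'data mentor junior']
```

With `diversity=0.0` the selection reduces to plain ranking by score. Raising it penalizes phrases that repeat words already chosen.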


In [12]:
key_words_interviewee, key_words_interviewee_scores = extract_relevant_keywords(preprocessed_interviewee_responce)

print("Keywords:", key_words_interviewee)
print("Scores:", key_words_interviewee_scores)
print(f"The length of keywords in the interviewee response is: {len(key_words_interviewee)}")

Keywords: ['passionate backend development', 'backend development', 'backend development year', 'project learn experienced', 'experienced engineer', 'backend', 'asset team', 'role egytech', 'learn experienced engineer', 'development', 'engineer', 'project learn', 'nodejs', 'learn experienced']
Scores: [0.6362, 0.5504, 0.5419, 0.5132, 0.4885, 0.4837, 0.4411, 0.4311, 0.4145, 0.4111, 0.409, 0.4047, 0.3986, 0.3943]
The length of keywords in the interviewee response is: 14


### **Synonyms for each word in the keywords**

In [31]:
from nltk.corpus import wordnet
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

# Function to fetch synonyms for a word using WordNet
def get_synonyms(word):
    """Fetch the lemma names from every WordNet synset of a word."""
    synonyms = set()
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())  # Multi-word lemmas use underscores, e.g. 'line_of_work'
    return synonyms

def get_similarity(word1, word2):
    """Calculate the similarity between two words using WordNet's Wu-Palmer similarity."""
    syn1 = wordnet.synsets(word1)
    syn2 = wordnet.synsets(word2)

    if syn1 and syn2:
        # Compare only the first (most common) synset of each word
        return syn1[0].wup_similarity(syn2[0])  # Wu-Palmer similarity (range: 0 to 1)
    return 0  # Return 0 if either word has no synsets

# Function to generate n-grams from a list of tokens
def generate_ngrams(tokens, n=2):
    """Generate n-grams from the list of tokens."""
    n_grams = ngrams(tokens, n)
    return [' '.join(gram) for gram in n_grams]

# Function to pair each keyphrase with its synonyms, filtered by a similarity threshold
def combine_with_synonyms_and_similarity(doc, n=1, threshold=0.5):
    """
    Map each n-gram of `doc` to the synonyms of its words and their similarity scores.

    `doc` is a list of keyphrases, so with the default n=1 each gram is one whole
    keyphrase (e.g. 'backend engineer'). Only synonyms whose similarity score is
    at least `threshold` are kept.
    """
    combined_dict = {}
    tokens = [token.lower() for token in doc]  # Lowercase each keyphrase
    n_grams = generate_ngrams(tokens, n)  # With n=1 this returns the keyphrases unchanged

    for gram in n_grams:
        synonyms_with_scores = {}
        words_in_gram = gram.split()  # Split the keyphrase into individual words

        for word in words_in_gram:
            synonyms = get_synonyms(word)  # Get synonyms for the word

            for synonym in synonyms:
                # Skip the word itself (case-sensitive, so 'Job' still passes for 'job')
                if word != synonym:
                    similarity_score = get_similarity(word, synonym)

                    # Include only if the similarity score is at least the threshold
                    if similarity_score and similarity_score >= threshold:
                        synonyms_with_scores[synonym] = similarity_score

        combined_dict[gram] = synonyms_with_scores  # Store the keyphrase with its synonyms and scores

    return combined_dict
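`get_similarity` scores each synonym with Wu-Palmer similarity, 2·depth(LCS) / (depth(a) + depth(b)), where the LCS is the lowest common subsumer of the two synsets in the hypernym hierarchy. It compares only the *first* synset of each word, whereas `get_synonyms` collects lemmas from *all* synsets. When a synonym's first synset is the same as the word's, the LCS is that synset itself and the ratio is exactly 1. This is consistent with every score printed below being 1.0. The toy taxonomy here is hypothetical. NLTK's `wup_similarity` handles depth counting and multiple hypernym paths in more detail than this textbook version:

```python
# Hypothetical toy hypernym tree (child -> parent); WordNet's real hierarchy is much deeper.
PARENT = {
    "entity": None,
    "act": "entity",
    "work": "act",
    "duty": "work",
    "project": "work",
    "communication": "entity",
    "language": "communication",
}

def path_to_root(node):
    """List of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    return len(path_to_root(node))  # the root has depth 1

def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # Lowest common subsumer: first node on b's path to the root that is also an ancestor of a.
    lcs = next(n for n in path_to_root(b) if n in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup("duty", "duty"))      # 1.0  (same node: the LCS is the node itself)
print(wup("duty", "project"))   # 0.75 (LCS 'work', depth 3)
print(wup("duty", "language"))  # 0.2857... (LCS 'entity', depth 1)
```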

In [32]:
# Combine words and bigrams with synonyms
job_synonyms = combine_with_synonyms_and_similarity(key_words_JobD,threshold=0.9)

print(f"Job Description Synonyms: {job_synonyms}")
print('#'*50)
# Print the results
print("Job Description Synonyms:")
for gram, syn_dict in job_synonyms.items():
    print(f"\n{gram}:")
    for synonym, score in syn_dict.items():
        print(f"  - {synonym}: {score}")


Job Description Synonyms: {'backend engineer': {}, 'backend engineer responsibility': {'obligation': 1.0, 'duty': 1.0}, 'backend': {}, 'proficiency backend': {}, 'experience database technology': {'engineering': 1.0}, 'data mentor junior': {'wise_man': 1.0, 'Junior': 1.0}, 'data mentor': {'wise_man': 1.0}, 'proficiency backend program': {'plan': 1.0}, 'job description': {'line_of_work': 1.0, 'Job': 1.0, 'occupation': 1.0, 'verbal_description': 1.0}, 'develop maintain': {}, 'development knowledge cloud': {'noesis': 1.0, 'cognition': 1.0}, 'knowledge cloud platform': {'noesis': 1.0, 'cognition': 1.0}, 'backend program': {'plan': 1.0}, 'technology postgresql': {'engineering': 1.0}, 'backend program language': {'plan': 1.0, 'linguistic_communication': 1.0}}
##################################################
Job Description Synonyms:

backend engineer:

backend engineer responsibility:
  - obligation: 1.0
  - duty: 1.0

backend:

proficiency backend:

experience database technology:
  - eng
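The dictionary printed above contains two artifacts that come from raw WordNet lemma names. First, multi-word lemmas use underscores (`wise_man`, `line_of_work`). Second, the check `word != synonym` is case-sensitive, so `Job` and `Junior` are kept as "synonyms" of `job` and `junior`. A minimal normalization step would fix both. It is shown here on lemma names copied from that output. The helper names are hypothetical:

```python
def normalize_lemma(name: str) -> str:
    """WordNet lemma names use underscores for spaces and may be capitalized."""
    return name.replace("_", " ").lower()

def clean_synonyms(word: str, lemma_names) -> set:
    """Normalize lemma names and drop those that only restate the word itself."""
    word = word.lower()
    return {normalize_lemma(n) for n in lemma_names if normalize_lemma(n) != word}

# Lemma names as they appear in the job-description output above.
print(sorted(clean_synonyms("job", {"Job", "line_of_work", "occupation"})))  # ['line of work', 'occupation']
print(clean_synonyms("junior", {"Junior"}))  # set()
```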

In [33]:
# Combine words and bigrams with synonyms
response_synonyms = combine_with_synonyms_and_similarity(key_words_interviewee,threshold=0.9)

print(f"Response Synonyms: {response_synonyms}")

print('#'*50)
# Print the results
print("Interviewee Response Synonyms:")
for gram, syn_dict in response_synonyms.items():
    print(f"\n{gram}:")
    for synonym, score in syn_dict.items():
        print(f"  - {synonym}: {score}")


Response Synonyms: {'passionate backend development': {}, 'backend development': {}, 'backend development year': {'twelvemonth': 1.0, 'yr': 1.0}, 'project learn experienced': {'task': 1.0, 'undertaking': 1.0, 'larn': 1.0, 'go_through': 1.0}, 'experienced engineer': {'go_through': 1.0}, 'backend': {}, 'asset team': {'plus': 1.0}, 'role egytech': {}, 'learn experienced engineer': {'larn': 1.0, 'go_through': 1.0}, 'development': {}, 'engineer': {}, 'project learn': {'task': 1.0, 'undertaking': 1.0, 'larn': 1.0}, 'nodejs': {}, 'learn experienced': {'larn': 1.0, 'go_through': 1.0}}
##################################################
Interviewee Response Synonyms:

passionate backend development:

backend development:

backend development year:
  - twelvemonth: 1.0
  - yr: 1.0

project learn experienced:
  - task: 1.0
  - undertaking: 1.0
  - larn: 1.0
  - go_through: 1.0

experienced engineer:
  - go_through: 1.0

backend:

asset team:
  - plus: 1.0

role egytech:

learn experienced engineer

## **Calculate the similarity percentage between the job description and interviewee response**

In [41]:
def calculate_similarity(job_synonyms, response_synonyms):
    """
    Calculate the percentage of job-description keyphrases that also appear in the interviewee response.

    A job keyphrase counts as matched if it, or one of its synonyms, is a key of
    `response_synonyms`. Matching is exact and phrase-level.

    Parameters:
        job_synonyms (dict): Dictionary of job description phrases and their synonyms with similarity scores.
        response_synonyms (dict): Dictionary of response phrases and their synonyms with similarity scores.

    Returns:
        float: Percentage of similarity between job description and interviewee response.
    """
    count = 0  # Job keyphrases matched in the response
    total_keywords = len(job_synonyms)  # Number of job description keyphrases (from the argument, not a global)

    for job_key, job_values in job_synonyms.items():
        # Check if the job key or its synonyms exist in the response synonyms
        if job_key in response_synonyms:
            count += 1
        else:
            for synonym in job_values.keys():
                if synonym in response_synonyms:
                    count += 1
                    break  # Avoid double counting for the same job key

    # Calculate similarity percentage
    similarity_percentage = (count / total_keywords) * 100 if total_keywords else 0

    return similarity_percentage

similarity_percentage = calculate_similarity(job_synonyms, response_synonyms)
print(f"Similarity Percentage: {similarity_percentage:.2f}%")


Similarity Percentage: 6.67%
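The 6.67% equals 1/15. Of the 15 job-description keyphrases, only `backend` appears verbatim among the response keyphrases, and none of the job synonyms exactly matches a response keyphrase. Because matching happens on whole keyphrases, `backend program` and `backend development` count as unrelated. One possible alternative is word-level coverage; this is an assumption and not part of the original method. The sketch below recomputes both numbers from the keyword lists printed earlier. The `synonyms` argument is a hypothetical optional mapping:

```python
# Keyphrases copied from the outputs of In [11] and In [12] above.
job_keywords = [
    "backend engineer", "backend engineer responsibility", "backend", "proficiency backend",
    "experience database technology", "data mentor junior", "data mentor",
    "proficiency backend program", "job description", "develop maintain",
    "development knowledge cloud", "knowledge cloud platform", "backend program",
    "technology postgresql", "backend program language",
]
response_keywords = [
    "passionate backend development", "backend development", "backend development year",
    "project learn experienced", "experienced engineer", "backend", "asset team",
    "role egytech", "learn experienced engineer", "development", "engineer",
    "project learn", "nodejs", "learn experienced",
]

def phrase_match_percentage(job_phrases, response_phrases):
    """Exact keyphrase matching, as in calculate_similarity (synonyms omitted)."""
    matched = set(job_phrases) & set(response_phrases)
    return 100 * len(matched) / len(job_phrases) if job_phrases else 0.0

def words_of(phrases):
    """Split keyphrases into a set of individual lowercase words."""
    return {word.lower() for phrase in phrases for word in phrase.split()}

def word_coverage(job_phrases, response_phrases, synonyms=None):
    """Percentage of distinct job-description words that also occur in the response.
    A job word also counts if one of its synonyms (optional mapping) occurs there."""
    synonyms = synonyms or {}
    job_words = words_of(job_phrases)
    response_words = words_of(response_phrases)
    if not job_words:
        return 0.0
    hits = sum(
        1 for w in job_words
        if w in response_words or response_words & set(synonyms.get(w, ()))
    )
    return 100 * hits / len(job_words)

print(f"Phrase-level: {phrase_match_percentage(job_keywords, response_keywords):.2f}%")  # 6.67%
print(f"Word-level:   {word_coverage(job_keywords, response_keywords):.2f}%")            # 14.29%
```

At the word level, `backend`, `engineer` and `development` match (3 of 21 distinct job words). The metric is more forgiving, but it ignores word order and still treats `experience`/`experienced` and `develop`/`development` as different words.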
