# Code to check the similarity between a Candidate and a Job Profile

## Input 

### LinkedIn Username
### LinkedIn Password
### Candidate Profile Link
### Job Description Link

In [40]:
username = "Username"  # Your LinkedIn username (email)

password = "Password"  # Your LinkedIn password

candidate_link = "Candidate_Link"  # Candidate profile URL on LinkedIn

job_link = "Job_Link"  # Job posting URL on LinkedIn
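Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you export the credentials as environment variables before starting Jupyter; the variable names `LINKEDIN_USERNAME` and `LINKEDIN_PASSWORD` and the helper `load_linkedin_credentials` are hypothetical, not part of the original notebook:

```python
import os


def load_linkedin_credentials(user_var="LINKEDIN_USERNAME", pass_var="LINKEDIN_PASSWORD"):
    """Read LinkedIn credentials from (hypothetically named) environment variables."""
    username = os.environ.get(user_var)
    password = os.environ.get(pass_var)
    # Fail early with a clear message instead of sending empty strings to the login form.
    if not username or not password:
        raise RuntimeError(f"Set {user_var} and {pass_var} before running the notebook.")
    return username, password
```

Usage: `username, password = load_linkedin_credentials()` can replace the two literal assignments above.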


## Output

### A similarity score between the candidate's skills and the job's required skills, which can be used to compare candidates

## Import Libraries and Download Prerequisites

In [41]:
import re
import time

import nltk
import numpy as np
import spacy
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from sentence_transformers import SentenceTransformer, util

In [42]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)

In [43]:
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\kanka\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\kanka\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\kanka\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

## Run the Selenium driver

### The major versions of Chrome and ChromeDriver must match; update one of them if the driver fails to start

In [44]:
driver = webdriver.Chrome(service=Service(r"chromedriver.exe"))

## Automatically log in to LinkedIn as the given user

In [45]:
# Reuse the driver started above instead of opening a second browser window.
driver.get("https://www.linkedin.com/login")

# Give the login page time to load.
time.sleep(3)

# Fill in the credentials and submit the form.
username_field = driver.find_element(By.ID, "username")
username_field.send_keys(username)
password_field = driver.find_element(By.ID, "password")
password_field.send_keys(password)
password_field.send_keys(Keys.RETURN)

# Wait for the post-login redirect to finish.
time.sleep(5)

In [46]:
driver.get(job_link)

time.sleep(5)

## Extract Job Description from the Job Profile

### The class name "jobs-description-content__text" may need to be changed to match the class LinkedIn currently uses for the job description

In [47]:
job_details_element = driver.find_element(By.CLASS_NAME, "jobs-description-content__text")

job_details_text = job_details_element.text

print(job_details_text)

About the job
About Ascendion
Ascendion is a full-service digital engineering solutions company. We make and manage software platforms and products that power growth and deliver captivating experiences to consumers and employees. Our engineering, cloud, data, experience design, and talent solution capabilities accelerate transformation and impact for enterprise clients. Headquartered in New Jersey, our workforce of 6,000+ Ascenders delivers solutions from around the globe. Ascendion is built differently to engineer the next.

Ascendion | Engineering to elevate life | www.ascendion.com

We have a culture built on opportunity, inclusion, and a spirit of partnership. Come, change the world with us:
Build the coolest tech for the world’s leading brands
Solve complex problems - and learn new skills
Experience the power of transforming digital engineering for Fortune 500 clients
Master your craft with leading training programs and hands-on experience

Experience a community of change-makers!

In [48]:
def extract_skills(job_description):
    skills = []

    description_index = job_description.find("Description:")

    if description_index != -1:
        # Search for "Required Skills" only after "Description:", so both
        # indices refer to the same string when slicing.
        skills_index = job_description.find("Required Skills", description_index)
        if skills_index != -1:
            skills_section = job_description[description_index:skills_index]
        else:
            skills_section = job_description[description_index:]

        # Capture each requirement phrase together with the text that follows it.
        skills.extend(re.findall(r'(?:Hands-on with|Experience with|Good understanding and experience using|Ability to use|Solid understanding of|Hands-on experience in|Familiarity with)[\sA-Za-z&/\-()]+', skills_section))

    # Drop empty matches and duplicates.
    skills = list(set(filter(None, skills)))

    return skills

job_skills = extract_skills(job_details_text)

print("Extracted Skills:", job_skills)

Extracted Skills: ['Ability to use advanced techniques like HyDE', 'Good understanding and experience using Transformer/Neural Network model\nHands-on experience in creating Embedding using MPNET', 'Solid understanding of frameworks like Langchain', 'Hands-on with LLMs (In previous projects) and worked on product development using LLMs\nHands-on experience with Vector Database & Graph DB (Pinecone', 'Experience with Speech-to-Text using Whisper/Google TTS etc', 'Experience with Chat']
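The character class in the pattern above includes `\s`, which also matches newlines, so a single match can run onto the next line; that is why one extracted item contains `\n` and merges two requirements. A minimal sketch that keeps each match on one line by allowing only spaces and tabs, assuming each requirement sits on its own line in the description; `extract_skills_single_line` and `SKILL_PATTERN` are hypothetical names, and the lead-in phrases are copied from the notebook's pattern:

```python
import re

# Lead-in phrases copied from the notebook's extract_skills pattern.
LEAD_INS = (
    "Hands-on with", "Experience with", "Good understanding and experience using",
    "Ability to use", "Solid understanding of", "Hands-on experience in", "Familiarity with",
)
# Only spaces and tabs (no newlines) may follow a lead-in, so a match never spans two lines.
SKILL_PATTERN = re.compile(
    r"(?:" + "|".join(map(re.escape, LEAD_INS)) + r")[ \tA-Za-z&/\-()]+"
)


def extract_skills_single_line(job_description, start="Description:", end="Required Skills"):
    start_idx = job_description.find(start)
    if start_idx == -1:
        return []
    # Look for the end marker only after the start marker.
    end_idx = job_description.find(end, start_idx)
    section = job_description[start_idx:end_idx if end_idx != -1 else None]
    # Strip trailing whitespace and remove duplicates while keeping first-seen order.
    matches = (m.strip() for m in SKILL_PATTERN.findall(section))
    return list(dict.fromkeys(m for m in matches if m))
```

Keeping order with `dict.fromkeys` instead of `set` also makes the extracted list reproducible between runs.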


## Automatically browse the candidate's profile and extract their unique skills

In [49]:
# Strip any trailing slash so the URL does not contain "//details".
cand_link = candidate_link.rstrip("/") + "/details/skills/"
driver.get(cand_link)

time.sleep(5)

In [50]:
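html_content = driver.page_source

soup = BeautifulSoup(html_content, "html.parser")

# These class names reflect LinkedIn's markup when the notebook was written and may need updating.
section = soup.find("section", class_="artdeco-card pb3")

# Default to an empty list so the code below still runs if the section is missing.
skills = []
if section:
    skill_containers = section.find_all("div", class_="display-flex align-items-center mr1 hoverable-link-text t-bold")
    skills = [container.text.strip() for container in skill_containers]
else:
    print("Section not found")

# The halving below assumes each container's text repeats the skill name twice.
unique_skills = list(set(skill[:len(skill)//2 + len(skill)%2] for skill in skills))

print("Unique Skills:", unique_skills)

# quit() ends the WebDriver session; close() only closes the current window.
driver.quit()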

Unique Skills: ['Data Modeling', 'React.js', 'C++', 'HTML', 'Deep Learning', 'Problem Solving', 'Machine Learning', 'Python (Programming Language)', 'Artificial Intelligence (AI)', 'MySQL', 'ChatGPT', 'Cascading Style Sheets (CSS)', 'Research Skills', 'Teamwork', 'Generative AI', 'Android Development', 'Team Leadership', 'Java']
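The slice `skill[:len(skill)//2 + len(skill)%2]` assumes every container's text is the skill name written twice (for example `JavaJava`); that assumption is inferred from the code, not from LinkedIn documentation. If it ever fails, the slice silently truncates a name. A minimal sketch that halves a string only when it really is an exact repetition, and keeps first-seen order; `undouble` and `unique_in_order` are hypothetical helpers:

```python
def undouble(text):
    """Return the first copy if `text` is an exact repetition such as 'JavaJava'; else return it unchanged."""
    text = text.strip()
    half, odd = divmod(len(text), 2)
    if not odd and text[:half] == text[half:]:
        return text[:half]
    return text


def unique_in_order(names):
    """Undouble each non-blank name and drop duplicates, preserving first-seen order."""
    return list(dict.fromkeys(undouble(n) for n in names if n.strip()))
```

Usage: `unique_skills = unique_in_order(skills)` would replace the set comprehension above.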


## Basic Preprocessing of the data and computation of cosine similarities between job and candidate skills

In [51]:
print("Loading BERT model...")
# all-MiniLM-L6-v2 is a compact sentence-embedding model from sentence-transformers.
model = SentenceTransformer('all-MiniLM-L6-v2')
print("Model loaded.")

# Build the stop-word set once instead of on every call.
STOP_WORDS = set(stopwords.words('english'))

def preprocess(text):
    # Lowercase, tokenize, and drop stop words and non-alphabetic tokens.
    tokens = word_tokenize(text.lower())
    tokens = [token for token in tokens if token not in STOP_WORDS and token.isalpha()]
    return ' '.join(tokens)

def extract_keywords(text):
    # Keep only nouns (Penn Treebank tags NN, NNS, NNP, NNPS all start with 'N').
    tokens = word_tokenize(text.lower())
    pos_tags = pos_tag(tokens)
    keywords = [word for word, tag in pos_tags if tag.startswith('N')]
    return ' '.join(keywords)

# Reduce each skill phrase to its nouns before embedding.
candidate_skills_processed = [extract_keywords(skill) for skill in unique_skills]
job_skills_processed = [extract_keywords(skill) for skill in job_skills]

candidate_embeddings = model.encode(candidate_skills_processed, convert_to_tensor=True)
job_embeddings = model.encode(job_skills_processed, convert_to_tensor=True)

# Entry [i][j] is the cosine similarity between job skill i and candidate skill j.
cosine_similarities = util.pytorch_cos_sim(job_embeddings, candidate_embeddings)

for i, job_skill in enumerate(job_skills):
    for j, candidate_skill in enumerate(unique_skills):
        similarity_score = cosine_similarities[i][j].item()
        print(f"Similarity between '{job_skill}' and '{candidate_skill}': {similarity_score:.2f}")

Loading BERT model...




Model loaded.
Similarity between 'Ability to use advanced techniques like HyDE' and 'Data Modeling': 0.07
Similarity between 'Ability to use advanced techniques like HyDE' and 'React.js': -0.02
Similarity between 'Ability to use advanced techniques like HyDE' and 'C++': 0.06
Similarity between 'Ability to use advanced techniques like HyDE' and 'HTML': 0.06
Similarity between 'Ability to use advanced techniques like HyDE' and 'Deep Learning': 0.26
Similarity between 'Ability to use advanced techniques like HyDE' and 'Problem Solving': 0.10
Similarity between 'Ability to use advanced techniques like HyDE' and 'Machine Learning': 0.24
Similarity between 'Ability to use advanced techniques like HyDE' and 'Python (Programming Language)': 0.05
Similarity between 'Ability to use advanced techniques like HyDE' and 'Artificial Intelligence (AI)': 0.36
Similarity between 'Ability to use advanced techniques like HyDE' and 'MySQL': 0.05
Similarity between 'Ability to use advanced techniques like H
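`util.pytorch_cos_sim(job_embeddings, candidate_embeddings)` returns a matrix whose entry `[i][j]` is the cosine similarity between job skill `i` and candidate skill `j` (newer sentence-transformers releases also expose this as `util.cos_sim`). A NumPy sketch of the same computation, for checking values by hand; `cosine_similarity_matrix` is a hypothetical helper and the toy vectors stand in for real embeddings:

```python
import numpy as np


def cosine_similarity_matrix(a, b, eps=1e-12):
    """Entry [i, j] is the cosine similarity between row i of `a` and row j of `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalize each row to unit length; eps guards against division by zero.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    # Dot products of unit vectors are cosine similarities.
    return a_unit @ b_unit.T


job_vecs = np.array([[1.0, 0.0], [1.0, 1.0]])
cand_vecs = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])
sims = cosine_similarity_matrix(job_vecs, cand_vecs)
```

Scores range from -1 to 1, which is why some pairs above print negative values.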

## Calculate a similarity score between the candidate skills and the skills required by the job

In [52]:
max_scores = []

for j, candidate_skill in enumerate(unique_skills):
    max_score = 0
    for i, job_skill in enumerate(job_skills):
        similarity_score = cosine_similarities[i][j].item()
        if similarity_score > max_score:
            max_score = similarity_score
    max_scores.append(max_score)

overall_score = np.mean(max_scores)

print(f"Overall similarity score for the candidate: {overall_score:.2f}")

Overall similarity score for the candidate: 0.24
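The loop above can be written as a formula. Let $C$ be the set of candidate skills, $J$ the set of job skills, and $s_{ij}$ the cosine similarity between job skill $i$ and candidate skill $j$. Because `max_score` starts at 0, a candidate skill whose best match is negative contributes 0:

$$
\text{score} = \frac{1}{|C|} \sum_{j \in C} \max\Bigl(0,\ \max_{i \in J} s_{ij}\Bigr)
$$

The average runs over the candidate's skills, so the score measures how relevant the candidate's skills are to the job rather than how many job requirements are covered; listing many unrelated skills lowers it.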


## This score can be used to compare and rank various candidates for the job profile.
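A minimal sketch of ranking several candidates, assuming a similarity matrix (rows = job skills, columns = candidate skills) has already been computed for each candidate as above. `relevance_score` is a vectorized form of the notebook's loop; `coverage_score` is a hypothetical alternative, not the notebook's method, that averages over job skills instead, so it rewards candidates who match more of the requirements. The candidate names and matrices are illustrative toy values, not real results:

```python
import numpy as np


def relevance_score(sims):
    """Notebook's score: mean over candidate skills (columns) of the best job-skill match, clipped at 0."""
    sims = np.asarray(sims, dtype=float)
    return float(np.clip(sims.max(axis=0), 0.0, None).mean())


def coverage_score(sims):
    """Hypothetical variant: mean over job skills (rows) of the best candidate-skill match, clipped at 0."""
    sims = np.asarray(sims, dtype=float)
    return float(np.clip(sims.max(axis=1), 0.0, None).mean())


def rank_candidates(sim_matrices, score_fn=relevance_score):
    """Return (candidate, score) pairs sorted from best to worst."""
    scored = {name: score_fn(m) for name, m in sim_matrices.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Toy similarity matrices for two illustrative candidates.
toy = {
    "A": [[0.9, 0.1], [0.2, -0.3]],
    "B": [[0.4, 0.6, -0.2], [0.5, 0.1, -0.1]],
}
ranking = rank_candidates(toy)
```

Both functions assume at least one job skill and one candidate skill; `max` over an empty axis raises an error in NumPy.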