# BERT-Based Relevancy Scoring and Matching of Jobs with Relevant Candidates

#### 1. **Import Libraries**
The code starts by importing the necessary libraries:
- `pandas` for data manipulation.
- `cosine_similarity` from `sklearn.metrics.pairwise` for computing the similarity between vectors.
- `BertTokenizer` and `BertModel` from `transformers` to handle the BERT model for text embeddings.
- `torch` for tensor operations.
- `json` to parse JSON strings.

#### 2. **Load Data**
The code loads two CSV files:
- `all_resumes_data.csv`: Contains data about job candidates.
- `job_details_with_predictions.csv`: Contains job role details.

This data is loaded into pandas DataFrames for easy manipulation.

#### 3. **Parse JSON Columns**
Certain columns in the candidates' data contain JSON strings (e.g., lists of experiences, skills). The code:
- Specifies which columns are JSON-encoded.
- Parses these JSON strings into Python dictionaries or lists.
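The parsing step can be sketched as follows. `json.loads` raises an error on empty or malformed cells, so this sketch wraps it in a hypothetical helper, `safe_json_loads` (not part of the original code), that falls back to an empty list. The sample strings are invented for illustration.

```python
import json

def safe_json_loads(value):
    """Parse a JSON string; return [] for missing or malformed cells (assumed fallback)."""
    if not isinstance(value, str):
        return []  # e.g. NaN produced by an empty CSV cell
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return []

# Invented sample cells standing in for a column such as 'HardSkills'
cells = ['[{"skill": "Python"}, {"skill": "SQL"}]', float("nan"), "not json"]
parsed = [safe_json_loads(cell) for cell in cells]
print(parsed)
```

With pandas, the same helper could be applied per column via `candidates_data[column].apply(safe_json_loads)`.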

#### 4. **Extract and Combine Text**
To prepare the data for embedding generation:
- The code defines a function to extract text from the JSON fields by combining relevant information (e.g., descriptions of experiences, names of skills) into a single string.
- It combines the relevant text fields for each candidate into a single text string.
- Similarly, it combines the relevant fields (e.g., role description, requirements) for each job into a single text string.
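To illustrate the extraction step, the sketch below runs the same kind of key-based extraction on an invented candidate record. The record contents are made up; the field names (`description`, `skill`) follow the code later in this document.

```python
def extract_text_from_json(json_field, key):
    """Join the values stored under `key` across a list of dicts."""
    if isinstance(json_field, list):
        return ' '.join(str(item[key]) for item in json_field
                        if isinstance(item, dict) and key in item)
    return ''

# Invented record: one experience lacks a 'description' and is skipped
candidate = {
    'Experiences': [{'description': 'Built ETL pipelines'}, {'title': 'Intern'}],
    'HardSkills': [{'skill': 'Python'}, {'skill': 'SQL'}],
}
combined = ' '.join([
    extract_text_from_json(candidate['Experiences'], 'description'),
    extract_text_from_json(candidate['HardSkills'], 'skill'),
]).strip()
print(combined)  # Built ETL pipelines Python SQL
```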

#### 5. **Initialize BERT Tokenizer and Model**
The code initializes the pre-trained `bert-base-uncased` tokenizer and model. These are used to convert text into embeddings:
- The tokenizer converts text into tokens that the BERT model can process.
- The model generates embeddings (numerical representations) for the text.

#### 6. **Generate BERT Embeddings**
For both candidates and job roles:
- The combined text strings are tokenized (truncated to at most 512 tokens) and passed through the BERT model.
- The token vectors in the model's last hidden state are averaged to give one embedding per text, a vector that represents the text in a high-dimensional space.
- These embeddings are stored for further comparison.
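The code in this document embeds one text at a time, so a plain average over tokens is enough. If several texts were batched and padded to a common length, the padding positions would dilute that average; a common remedy is to weight by the attention mask. The sketch below shows that arithmetic in NumPy on invented numbers. `masked_mean_pool` is a hypothetical helper, and the tiny arrays stand in for real model output (which is 768-dimensional for `bert-base-uncased`).

```python
import numpy as np

def masked_mean_pool(hidden, mask):
    """Average token vectors, ignoring padding.

    hidden: (batch, seq_len, dim) token vectors
    mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = mask[:, :, None].astype(hidden.dtype)    # broadcast over dim
    summed = (hidden * mask).sum(axis=1)            # sum of real token vectors
    counts = np.clip(mask.sum(axis=1), 1.0, None)   # avoid division by zero
    return summed / counts

# One invented sequence of three tokens; the third is padding
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
pooled = masked_mean_pool(hidden, mask)
print(pooled)  # [[2. 3.]]
```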

#### 7. **Compute Cosine Similarity**
To match candidates with job roles:
- The embeddings for candidates and job roles are compared using cosine similarity. This metric is the cosine of the angle between two vectors; values closer to 1 indicate that the two texts are more similar.
- The code computes the similarity between each candidate's embedding and each job role's embedding.
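For two vectors, cosine similarity is their dot product divided by the product of their norms. The NumPy sketch below computes a candidates-by-jobs score matrix on invented vectors; the function name `cosine_matrix` is hypothetical, and the code in this document uses `sklearn`'s `cosine_similarity` instead.

```python
import numpy as np

def cosine_matrix(a, b):
    """Return a (len(a), len(b)) matrix of cosine similarities.

    Assumes no row is the zero vector (that would divide by zero).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Invented 2-D "embeddings": three candidates, one job
candidates = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
jobs = [[1.0, 0.0]]
scores = cosine_matrix(candidates, jobs)
print(scores.round(3))
```

The first candidate points in the same direction as the job vector (score 1), the second sits at 45 degrees (about 0.707), and the third is orthogonal (score 0).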

#### 8. **Determine Relevance Scores**
For each job role, the code:
- Identifies the top five candidates based on their similarity scores (the number is adjustable).
- Creates a list of these top candidates, including their names, email addresses, and relevancy scores.
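The selection step amounts to a top-k lookup over one job's column of scores. A minimal standard-library sketch, with invented scores and a hypothetical helper name `top_k` (the code in this document uses pandas' `nlargest` for the same purpose):

```python
import heapq

def top_k(scores, k=5):
    """Return (index, score) pairs for the k largest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])

# Invented similarity scores of six candidates for a single job
scores_for_job = [0.62, 0.91, 0.40, 0.88, 0.75, 0.10]
best = top_k(scores_for_job, k=3)
print(best)  # [(1, 0.91), (3, 0.88), (4, 0.75)]
```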

#### 9. **Store Results**
The results are stored in a new DataFrame:
- Each row corresponds to a job role.
- A column contains a JSON string with the details of the top relevant candidates for that job role.

Finally, the results are saved to a new CSV file, which includes the job roles and the relevant candidates with their relevancy scores. This CSV file provides a ranked list of candidates for each job role based on the relevance of their skills and experiences.
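Because the candidate list is stored as a JSON string inside a CSV cell, anyone reading the file back has to decode that column again. The sketch below shows the round trip with the standard library only; the job title, name, email, and score are invented, and an in-memory buffer stands in for the file on disk.

```python
import csv
import io
import json

# One invented result row; 'RelevantCandidates' holds a JSON string
rows = [{
    'JobRole': 'Data Engineer',
    'RelevantCandidates': json.dumps([
        {'CandidateName': 'A. Example', 'CandidateEmail': 'a@example.com', 'RelevancyScore': 0.91},
    ]),
}]

# Write the rows as CSV (the csv module quotes the embedded commas and quotes)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['JobRole', 'RelevantCandidates'])
writer.writeheader()
writer.writerows(rows)

# Read the CSV back and decode the JSON column into Python objects
buffer.seek(0)
restored = [
    {**row, 'RelevantCandidates': json.loads(row['RelevantCandidates'])}
    for row in csv.DictReader(buffer)
]
print(restored[0]['RelevantCandidates'][0]['RelevancyScore'])  # 0.91
```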

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from transformers import BertTokenizer, BertModel
import torch
import json

candidates_data = pd.read_csv("C:/Users/hsahn/OneDrive/Desktop/all_resumes_data.csv")
job_details_data = pd.read_csv("C:/Users/hsahn/Downloads/job_details_with_predictions.csv")

json_columns = ['Experiences', 'Projects', 'Achievements', 'Certifications', 'HardSkills', 'SoftSkills']

for column in json_columns:
    candidates_data[column] = candidates_data[column].apply(json.loads)

def extract_text_from_json(json_field, key):
    if isinstance(json_field, list):
        return ' '.join([str(item[key]) for item in json_field if key in item])
    return ''

def combine_skills(row):
    text = ''
    for column in json_columns:
        if column == 'HardSkills' or column == 'SoftSkills':
            if isinstance(row[column], list):
                text += ' '.join([skill['skill'] for skill in row[column] if 'skill' in skill]) + ' '
        else:
            text += extract_text_from_json(row[column], 'description') + ' '
    return text.strip()

job_details_data['combined_job_text'] = job_details_data.apply(lambda row: f"{row['role_description']} {row['requirement']}", axis=1)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

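def get_bert_embedding(text):
    # Tokenize, truncating to BERT's 512-token input limit
    inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
    # Mean-pool the token vectors into a single embedding for the text
    return outputs.last_hidden_state.mean(dim=1).numpy().flatten()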

candidates_data['combined_skills'] = candidates_data.apply(combine_skills, axis=1)
candidates_data['embedding'] = candidates_data['combined_skills'].apply(get_bert_embedding)
job_details_data['embedding'] = job_details_data['combined_job_text'].apply(get_bert_embedding)

candidate_embeddings = candidates_data['embedding'].tolist()
job_embeddings = job_details_data['embedding'].tolist()

# Similarity matrix: one row per candidate, one column per job
relevancy_scores = pd.DataFrame(
    cosine_similarity(candidate_embeddings, job_embeddings),
    index=candidates_data.index,
    columns=job_details_data.index,
)

# Collect one row per job in a list; DataFrame.append is deprecated and was removed in pandas 2.0
job_rows = []

for job_index in relevancy_scores.columns:
    top_candidates = relevancy_scores[job_index].nlargest(5)  # Adjust the number as needed
    relevant_candidates = []

    for candidate_index, score in top_candidates.items():
        candidate_details = {
            'CandidateName': candidates_data.loc[candidate_index, 'Name'],
            'CandidateEmail': candidates_data.loc[candidate_index, 'Email'],
            'RelevancyScore': float(score)  # plain float for JSON serialization
        }
        relevant_candidates.append(candidate_details)

    job_rows.append({
        'JobRole': job_details_data.loc[job_index, 'role_title'],
        'RelevantCandidates': json.dumps(relevant_candidates)
    })

job_candidates_df = pd.DataFrame(job_rows, columns=['JobRole', 'RelevantCandidates'])

job_candidates_df.to_csv('C:/Users/hsahn/Downloads/job_relevancy_scores_with_candidates.csv', index=False)

print("Job relevancy scores with relevant candidates saved to 'job_relevancy_scores_with_candidates.csv'")



Job relevancy scores with relevant candidates saved to 'job_relevancy_scores_with_candidates.csv'

