# Function for Job_Candidate_RelevancyMatching_OpenAI 

1. **Library Imports**: The code imports pandas for data manipulation, json for parsing and serialising JSON fields, scikit-learn (`sklearn`) for computing cosine similarity, transformers and torch for generating BERT embeddings, and openai for calling OpenAI's API.


2. **Function Definitions**:
   - **`extract_job_details`**: Takes a job description and an OpenAI API key. It sends the description to an OpenAI completion model and returns the extracted requirements as plain text: eligibility criteria (college, year of passing, branch), skills, and experience.
   - **`parse_resume`**: Takes a resume text and an OpenAI API key. It asks an OpenAI completion model for the skills, experiences, projects, and certifications in the resume and parses the reply as JSON. It is not called by `find_relevant_candidates`, which reads already-structured resume fields from the student database.
   - **`get_bert_embedding`**: Takes a text, a BERT tokenizer, and a BERT model. It truncates the text to 512 tokens and averages the model's last-layer token embeddings (mean pooling) into a single vector (768 dimensions for `bert-base-uncased`), which is then used to compare job descriptions with candidate profiles.
   - **`combine_skills`**: Concatenates a candidate's skill names from `HardSkills` and `SoftSkills` with the `description` fields of the other JSON columns (experiences, projects, achievements, certifications) into one string, so each candidate can be embedded as a single text.
   - **`extract_text_from_json`**: Helper that joins the values of a given key (for example `description`) from a list of JSON objects; any non-list value yields an empty string.
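To make the combined candidate text concrete, here is a lightly adapted, pandas-free copy of `combine_skills` and `extract_text_from_json` run on a hypothetical candidate record. It assumes, as the code does, that each JSON column holds a list of objects, with skill names under a `skill` key and other entries under a `description` key. The record is invented for illustration, and unlike the original this version skips empty pieces instead of leaving extra spaces.

```python
# Lightly adapted copies of the two helpers, applied to a plain dict instead of
# a pandas row (both only need `row[column]`-style lookups).

def extract_text_from_json(json_field, key):
    """Join the `key` values of a list of JSON objects; non-lists give ''."""
    if isinstance(json_field, list):
        return " ".join(str(item[key]) for item in json_field if isinstance(item, dict) and key in item)
    return ""


def combine_skills(row, json_columns):
    """Flatten a candidate's JSON fields into one space-separated string."""
    parts = []
    for column in json_columns:
        if column in ("HardSkills", "SoftSkills"):
            skills = row.get(column)
            if isinstance(skills, list):
                parts.append(" ".join(s["skill"] for s in skills if isinstance(s, dict) and "skill" in s))
        else:
            parts.append(extract_text_from_json(row.get(column), "description"))
    # Drop empty pieces so missing or empty columns do not leave double spaces.
    return " ".join(p for p in parts if p)


# Hypothetical candidate record: column names follow the code, values are invented.
candidate = {
    "Experiences": [{"title": "Intern", "description": "Built ETL pipelines in Python"}],
    "Projects": [{"description": "Resume ranking with BERT"}],
    "Achievements": [],
    "Certifications": [{"name": "AWS CCP"}],  # no 'description' key, so it is skipped
    "HardSkills": [{"skill": "Python"}, {"skill": "SQL"}],
    "SoftSkills": [{"skill": "Teamwork"}],
}
json_columns = ["Experiences", "Projects", "Achievements", "Certifications", "HardSkills", "SoftSkills"]
print(combine_skills(candidate, json_columns))
# Built ETL pipelines in Python Resume ranking with BERT Python SQL Teamwork
```

Note that only `description` fields are used for non-skill columns, so anything stored under other keys (such as a certification's `name` here) never reaches the embedding.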
   
   
3. **Main Function - `find_relevant_candidates`**:
   - **Loading Data**: The function reads the student database CSV and the job description CSV into pandas DataFrames.
   - **Processing Candidate Data**: It decodes the JSON-encoded columns of each candidate record (`Experiences`, `Projects`, `Achievements`, `Certifications`, `HardSkills`, `SoftSkills`), combines them into a single text string with `combine_skills`, and generates a BERT embedding for each string.
   - **Processing Job Descriptions**: For each job description, it uses OpenAI to extract detailed requirements, appends them to the original description, and generates a BERT embedding for the combined text.
   - **Calculating Relevancy Scores**: The function calculates cosine similarity scores between the BERT embeddings of the job descriptions and candidate profiles. These scores represent how relevant each candidate is to each job.
   - **Selecting Top Candidates**: For each job, the function selects the top 15 candidates with the highest relevancy scores.
   - **Saving Results**: It writes one row per job to the specified output CSV file: the job name (`JobRole`) and a JSON-encoded list of the top 15 candidates' names, emails, and relevancy scores (`RelevantCandidates`).
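The scoring and selection steps can be illustrated without BERT or scikit-learn. The sketch below is a plain-Python stand-in, assuming embeddings are equal-length lists of floats: `cosine` computes the same quantity that `sklearn.metrics.pairwise.cosine_similarity` returns for one pair, and `heapq.nlargest` plays the role of `Series.nlargest(15)`. The function names and toy vectors are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_candidates(candidate_vectors, job_vector, k=15):
    """Return (candidate_index, score) pairs for the k most similar candidates."""
    scores = ((i, cosine(vec, job_vector)) for i, vec in enumerate(candidate_vectors))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# Toy 3-dimensional "embeddings"; real bert-base vectors have 768 dimensions.
candidates = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
job = [1.0, 0.0, 0.0]
print(top_candidates(candidates, job, k=2))
```

Because the scores only rank candidates relative to one job, they are best read as an ordering rather than as an absolute measure of fit.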




```python
import json

import pandas as pd
import torch
from openai import OpenAI  # requires the openai>=1.0 Python SDK
from sklearn.metrics.pairwise import cosine_similarity
from transformers import BertModel, BertTokenizer

# text-davinci-003 has been retired, and the openai>=1.0 SDK removed openai.Completion.
# gpt-3.5-turbo-instruct was listed by OpenAI as its replacement on the completions
# endpoint; change this constant if you use a different model.
COMPLETION_MODEL = "gpt-3.5-turbo-instruct"


def extract_job_details(job_description, openai_api_key):
    """Use the LLM to pull eligibility criteria, skills and experience out of a job description."""
    client = OpenAI(api_key=openai_api_key)
    response = client.completions.create(
        model=COMPLETION_MODEL,
        prompt=(
            "Extract the job description, eligibility criteria including college, year of passing, "
            "branch, skills, and experiences required from the following job description: "
            f"{job_description}"
        ),
        max_tokens=500,
    )
    return response.choices[0].text.strip()


def parse_resume(resume_text, openai_api_key):
    """Use the LLM to extract structured data from a raw resume (not called by find_relevant_candidates)."""
    client = OpenAI(api_key=openai_api_key)
    response = client.completions.create(
        model=COMPLETION_MODEL,
        # json.loads below fails on free text, so the prompt asks for JSON explicitly.
        # The key names are an illustrative choice, not part of the original prompt.
        prompt=(
            "Extract skills, experiences, projects, and certifications from the following resume. "
            'Respond only with a JSON object with the keys "skills", "experiences", "projects" '
            f'and "certifications".\n\nResume: {resume_text}'
        ),
        max_tokens=500,
    )
    return json.loads(response.choices[0].text.strip())


def get_bert_embedding(text, tokenizer, model):
    """Return one vector per text by mean-pooling BERT's last hidden layer."""
    inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).numpy().flatten()


def combine_skills(row, json_columns):
    """Join skill names and JSON 'description' fields into one string per candidate."""
    text = ''
    for column in json_columns:
        if column in ['HardSkills', 'SoftSkills']:
            if isinstance(row[column], list):
                text += ' '.join(skill['skill'] for skill in row[column]
                                 if isinstance(skill, dict) and 'skill' in skill) + ' '
        else:
            text += extract_text_from_json(row[column], 'description') + ' '
    return text.strip()


def extract_text_from_json(json_field, key):
    """Join the values of `key` from a list of JSON objects; non-lists yield ''."""
    if isinstance(json_field, list):
        return ' '.join(str(item[key]) for item in json_field if isinstance(item, dict) and key in item)
    return ''


def find_relevant_candidates(job_description_csv, student_database_csv, openai_api_key, output_csv_path, top_k=15):
    candidates_data = pd.read_csv(student_database_csv)
    job_details_data = pd.read_csv(job_description_csv)

    # Columns that store JSON-encoded lists in the student database.
    json_columns = ['Experiences', 'Projects', 'Achievements', 'Certifications', 'HardSkills', 'SoftSkills']
    for column in json_columns:
        # Empty cells are read as NaN, which json.loads cannot parse; treat them as empty lists.
        candidates_data[column] = candidates_data[column].apply(
            lambda value: json.loads(value) if isinstance(value, str) else [])

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')

    # One combined text and one embedding per candidate.
    candidates_data['combined_skills'] = candidates_data.apply(lambda row: combine_skills(row, json_columns), axis=1)
    candidates_data['embedding'] = candidates_data['combined_skills'].apply(lambda x: get_bert_embedding(x, tokenizer, model))

    # One combined text per job: the raw description followed by the LLM-extracted requirements.
    job_details_data['combined_job_text'] = job_details_data['job_description'].apply(
        lambda description: f"{description} {extract_job_details(description, openai_api_key)}")
    job_details_data['embedding'] = job_details_data['combined_job_text'].apply(lambda x: get_bert_embedding(x, tokenizer, model))

    # Similarity matrix with one row per candidate and one column per job.
    scores = cosine_similarity(candidates_data['embedding'].tolist(), job_details_data['embedding'].tolist())
    relevancy_scores = pd.DataFrame(scores, index=candidates_data.index, columns=job_details_data.index)

    rows = []
    for job_index in relevancy_scores.columns:
        top_candidates = relevancy_scores[job_index].nlargest(top_k)
        relevant_candidates = [
            {
                'CandidateName': candidates_data.loc[candidate_index, 'Name'],
                'CandidateEmail': candidates_data.loc[candidate_index, 'Email'],
                # BERT embeddings are float32, so scores may be numpy float32,
                # which json.dumps cannot serialise; convert to a Python float.
                'RelevancyScore': float(score),
            }
            for candidate_index, score in top_candidates.items()
        ]
        rows.append({
            'JobRole': job_details_data.loc[job_index, 'job_name'],
            'RelevantCandidates': json.dumps(relevant_candidates),
        })

    # DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts.
    job_candidates_df = pd.DataFrame(rows, columns=['JobRole', 'RelevantCandidates'])
    job_candidates_df.to_csv(output_csv_path, index=False)
    print(f"Job relevancy scores with relevant candidates saved to '{output_csv_path}'")


# Usage example:
job_description_csv = "C:/Users/hsahn/Downloads/job_details_with_predictions.csv"
student_database_csv = "C:/Users/hsahn/OneDrive/Desktop/all_resumes_data.csv"
openai_api_key = "your_openai_api_key"  # replace with a real key; avoid committing it to source control
output_csv_path = "C:/Users/hsahn/Downloads/job_relevancy_scores_with_candidates.csv"

find_relevant_candidates(job_description_csv, student_database_csv, openai_api_key, output_csv_path)
```
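Because the `RelevantCandidates` column holds a JSON string, the output file needs one decoding step before the candidate lists can be used. Below is a minimal standard-library sketch, assuming the column names written by `find_relevant_candidates`; `load_relevant_candidates` is a hypothetical helper and the sample row is invented. With pandas, `pd.read_csv(output_csv_path)["RelevantCandidates"].apply(json.loads)` does the same job.

```python
import csv
import io
import json


def load_relevant_candidates(csv_file):
    """Map each JobRole to its decoded list of candidate dicts."""
    reader = csv.DictReader(csv_file)
    return {row["JobRole"]: json.loads(row["RelevantCandidates"]) for row in reader}


# In practice: with open(output_csv_path, newline="", encoding="utf-8") as f: ...
# Here an in-memory file with one hypothetical row stands in for the real output.
sample = io.StringIO(
    'JobRole,RelevantCandidates\n'
    '"Data Analyst","[{""CandidateName"": ""A. Kumar"", ""CandidateEmail"": ""a@example.com"", ""RelevancyScore"": 0.91}]"\n'
)
jobs = load_relevant_candidates(sample)
for role, candidates in jobs.items():
    for candidate in candidates:
        print(role, candidate["CandidateName"], candidate["RelevancyScore"])
# Data Analyst A. Kumar 0.91
```

If two jobs share the same `job_name`, the dictionary keeps only the last one; iterate over the reader directly if duplicates are possible.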


## I will test the code once the OpenAI API key is available