# Potential Talents

## Background:

As a talent sourcing and management company, we find talented individuals and source them to technology companies. Finding talented candidates is not easy, for several reasons. First, filling a role requires understanding it very well, which means understanding the client’s needs and what they are looking for in a potential candidate. Second, one needs to understand what makes a candidate shine for the role we are searching for. Third, knowing where to find talented individuals is a challenge of its own.

The nature of our job requires a lot of human labor and is full of manual operations. To automate this process, we want to build a better approach that saves us time and helps us spot potential candidates who could fit the roles we are searching for. Beyond that, for a specific role we want to fill, we are interested in developing a machine-learning-powered pipeline that can spot talented individuals and rank them based on their fitness.

We are currently sourcing a few candidates semi-automatically, so the sourcing part is not a concern at this time. Instead, we first want to determine the best-matching candidates based on how fit they are for a given role. We generally search using keywords such as “full-stack software engineer”, “engineering manager” or “aspiring human resources”, depending on the role we are trying to fill. These keywords might change, and you can expect that specific keywords will be provided to you.

Assuming that we were able to list and rank fitting candidates, we then employ a review procedure in which each candidate is inspected manually to determine how good a fit they are. At the end of this manual review, we might choose not the first candidate in the list but, say, the 7th. If that happens, we want to re-rank the previous list based on this information. This supervisory signal is supplied by starring the 7th candidate in the list. Starring a candidate sets that candidate as an ideal candidate for the given role. We then expect the list to be re-ranked each time a candidate is starred.

## Data Description:

The data comes from our sourcing efforts. We removed any field that could directly reveal personal details and gave a unique identifier for each candidate.

Attributes:
id : unique identifier for candidate (numeric)

job_title : job title for candidate (text)

location : geographical location for candidate (text)

connections: number of connections candidate has, 500+ means over 500 (text)

Output (desired target):
fit - how fit the candidate is for the role? (numeric, probability between 0-1)

Keywords: “Aspiring human resources” or “seeking human resources”
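
The `connection` field is stored as text because of the capped value "500+". The ranking in this notebook uses only `job_title`, but if the field were ever needed as a numeric feature, one possible conversion is sketched below. `parse_connections` is a hypothetical helper that is not part of the original notebook, and mapping "500+" to 500 is an assumption (a lower bound, since the true count is not in the data).

```python
def parse_connections(value):
    """Convert a connections string such as '85' or '500+' to an int.

    Hypothetical helper: '500+' is mapped to 500, i.e. a lower bound,
    because the true count is not available in the data.
    """
    text = str(value).strip()
    if text.endswith('+'):
        text = text[:-1]
    return int(text)


for raw in ['85', '500+', ' 44 ']:
    print(repr(raw), '->', parse_connections(raw))
```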

## Goal:

Predict how fit the candidate is based on their available information (variable fit)

## Success Metric(s):

Rank candidates based on a fitness score.

Re-rank candidates when a candidate is starred.

## Imports and Preprocessing
Let's start by importing necessary libraries and packages for our project.

In [1]:
import pandas as pd
import numpy as np
import re
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from transformers import BertTokenizer, BertModel
from sentence_transformers import SentenceTransformer 
import torch
import warnings
warnings.filterwarnings('ignore')



In [2]:
# Download required NLTK data
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')
try:
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('stopwords')
try:
    nltk.data.find('corpora/wordnet')
except LookupError:
    nltk.download('wordnet')
try:
    nltk.data.find('tokenizers/punkt_tab')
except LookupError:
    nltk.download('punkt_tab')    

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\sirak\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [3]:
# Now let's load the data
data = pd.read_csv('data/potential_talents_data.csv')
print(f"Loaded {len(data)} candidates")
data.head()

Loaded 104 candidates


Unnamed: 0,id,job_title,location,connection,fit
0,1,2019 C.T. Bauer College of Business Graduate (...,"Houston, Texas",85,
1,2,Native English Teacher at EPIK (English Progra...,Kanada,500+,
2,3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,
3,4,People Development Coordinator at Ryan,"Denton, Texas",500+,
4,5,Advisory Board Member at Celal Bayar University,"İzmir, Türkiye",500+,


Now let's start the preprocessing by creating a custom function that converts the text to lowercase, removes punctuation and extra whitespace, tokenizes it, removes stopwords, and finally lemmatizes the remaining tokens:

In [4]:
# Build these once instead of on every call (or for every token)
STOP_WORDS = set(stopwords.words('english'))
LEMMATIZER = WordNetLemmatizer()

def preprocess_text(text):
    if pd.isna(text):
        return ""

    # Convert to lowercase
    text = text.lower()

    # Remove punctuation
    text = re.sub(r'[^\w\s]', '', text)

    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()

    # Tokenize
    tokens = word_tokenize(text)

    # Remove stopwords and lemmatize
    tokens = [LEMMATIZER.lemmatize(token) for token in tokens
              if token not in STOP_WORDS]

    return ' '.join(tokens)

In [5]:
# Now let's add a new column 'job_title_preprocessed' to our dataset by applying the preprocess_text function
data['job_title_preprocessed']=data['job_title'].apply(preprocess_text)

In [6]:
data[['job_title', 'job_title_preprocessed']].head()

Unnamed: 0,job_title,job_title_preprocessed
0,2019 C.T. Bauer College of Business Graduate (...,2019 ct bauer college business graduate magna ...
1,Native English Teacher at EPIK (English Progra...,native english teacher epik english program korea
2,Aspiring Human Resources Professional,aspiring human resource professional
3,People Development Coordinator at Ryan,people development coordinator ryan
4,Advisory Board Member at Celal Bayar University,advisory board member celal bayar university
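
To make the individual cleaning steps visible, here is a dependency-free sketch of the same sequence applied to single titles. It is an illustration only: `TINY_STOPWORDS` is a made-up stand-in for NLTK's English stopword list, whitespace splitting replaces `word_tokenize`, and lemmatization is skipped, so the output can differ from `preprocess_text` (for example, "resources" is not reduced to "resource").

```python
import re

# Made-up, tiny stand-in for NLTK's English stopword list (assumption).
TINY_STOPWORDS = {'at', 'of', 'the', 'and', 'for', 'in', 'a', 'an'}


def simple_clean(text):
    """Lowercase, strip punctuation, collapse whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)        # remove punctuation
    text = re.sub(r'\s+', ' ', text).strip()   # collapse whitespace
    tokens = [t for t in text.split(' ') if t and t not in TINY_STOPWORDS]
    return ' '.join(tokens)


print(simple_clean('People Development Coordinator at Ryan'))
print(simple_clean('  Aspiring Human Resources Manager, seeking   internship! '))
```

For the first title this reproduces the preprocessed value shown in the table above, because that title contains no word that lemmatization would change.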


## Embedding

To turn the job titles into vectors we'll try a few methods and see which one works best: Bag of Words, TF-IDF, BERT, and SBERT. We'll create a custom function for each of them.

In [7]:
# Starting with the Bag of Words method

def bag_of_words_similarity(data, target_string):
    #create BoW vectorizer
    vectorizer=CountVectorizer(max_features=1000, ngram_range=(1,2))

    # combine job titles with target string
    all_texts=list(data['job_title_preprocessed'])+[target_string]
    bow_matrix=vectorizer.fit_transform(all_texts)

    #Calculate similarity between each job title and target
    job_title_matrix=bow_matrix[:-1] # All except last (target)
    target_vector = bow_matrix[-1:] # Last row (target)

    similarities=cosine_similarity(job_title_matrix, target_vector).flatten()
    return similarities
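
As a small worked example of what `bag_of_words_similarity` computes, the sketch below builds unigram count vectors by hand and takes their cosine similarity. It is a simplification: it ignores the bigram features added by `ngram_range=(1,2)` and the `max_features` cap, so the numbers will not match `CountVectorizer` exactly. The helper name `cosine_counts` is hypothetical.

```python
import math
from collections import Counter


def cosine_counts(text_a, text_b):
    """Cosine similarity between unigram count vectors of two strings."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


target = 'seeking human resource'
for title in ['seeking human resource opportunity',
              'aspiring human resource professional',
              'native english teacher']:
    print(f'{title!r}: {cosine_counts(title, target):.4f}')
```

A title that shares all three target words scores highest, one that shares two scores lower, and one with no overlap scores zero.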

In [8]:
# Next on the list is TF-IDF

def tfidf_similarity(data, target_string):
    #create TF-IDF vectorizer
    vectorizer=TfidfVectorizer(max_features=1000, ngram_range=(1,2))

    #Combine job titles with target string
    all_texts=list(data['job_title_preprocessed'])+[target_string]
    tfidf_matrix=vectorizer.fit_transform(all_texts)

    # Calculate similarity between each job title and target
    job_titles_matrix = tfidf_matrix[:-1]  # All except last (target)
    target_vector = tfidf_matrix[-1:]  # Last row (target)
        
    similarities = cosine_similarity(job_titles_matrix, target_vector).flatten()
    return similarities
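
To show how TF-IDF changes the picture compared with raw counts, here is a pure-Python sketch of the weighting on four toy titles. It assumes the smoothed idf formula and L2 normalization that scikit-learn documents as the `TfidfVectorizer` defaults, uses unigrams only, and the helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalised.

    Uses idf(t) = ln((1 + n) / (1 + df(t))) + 1, the smoothed form that
    scikit-learn documents for TfidfVectorizer's defaults.
    """
    n = len(docs)
    tokenised = [doc.split() for doc in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two already L2-normalised sparse vectors."""
    return sum(u[t] * v[t] for t in u.keys() & v.keys())


docs = ['seeking human resource position',
        'human resource manager',
        'human resource generalist',
        'seeking human resource']          # last entry plays the target
vecs = tfidf_vectors(docs)
for doc, vec in zip(docs[:-1], vecs[:-1]):
    print(f'{doc!r}: {cosine(vec, vecs[-1]):.4f}')
```

Because "human" and "resource" appear in every toy title, they carry less weight than the rarer "seeking", so the title that shares "seeking" with the target is separated more clearly from the other two.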

In [9]:
# BERT similarity

def bert_similarity(data, target_string):
    #Load BERT model and tokenizer
    tokenizer=BertTokenizer.from_pretrained('bert-base-uncased')
    model=BertModel.from_pretrained('bert-base-uncased')
    model.eval()

    #Calculate BERT embeddings for job titles
    doc_embeddings=[]
    for text in data['job_title']:
        if pd.isna(text):
            doc_embeddings.append(np.zeros(768))
            continue

        inputs=tokenizer(text, return_tensors='pt', max_length=512, truncation=True, padding=True)

        with torch.no_grad():
            outputs=model(**inputs)
            embedding=outputs.last_hidden_state[:,0,:].numpy().flatten()
            doc_embeddings.append(embedding)

    #Calculate BERT embedding for target
    target_inputs=tokenizer(target_string, return_tensors='pt', max_length=512, truncation=True, padding=True)

    with torch.no_grad():
        target_outputs=model(**target_inputs)
        target_embedding=target_outputs.last_hidden_state[:,0,:].numpy().flatten()

    #Calculate similarities
    similarities=[]
    for doc_embedding in doc_embeddings:
        similarity=np.dot(doc_embedding, target_embedding) / (np.linalg.norm(doc_embedding) * np.linalg.norm(target_embedding) + 1e-8)
        similarities.append(similarity)

    return np.array(similarities)        
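
`bert_similarity` represents each title by the vector at the first token position (`last_hidden_state[:, 0, :]`, the `[CLS]` token). Mean pooling over the non-padding tokens is a common alternative for sentence-level similarity; it is not what this notebook uses. The sketch below shows the pooling rule on toy lists so that it runs without a model; `mean_pool` is a hypothetical helper, and with a real model the inputs would be the token hidden states and the tokenizer's attention mask.

```python
def mean_pool(token_vectors, attention_mask):
    """Average token vectors where the mask is 1, ignoring padding."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            total = [t + x for t, x in zip(total, vector)]
            count += 1
    if count == 0:
        return total
    return [t / count for t in total]


# Toy 3-dimensional "hidden states" for 4 token positions; the last is padding.
hidden = [[1.0, 0.0, 2.0],
          [3.0, 2.0, 0.0],
          [2.0, 4.0, 1.0],
          [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]

print(mean_pool(hidden, mask))
```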

In [10]:
# And last, but not least SBERT

def sbert_similarity(data, target_string):

    #Load SBERT model
    sbert_model=SentenceTransformer('all-MiniLM-L6-v2')

    #Get embeddings for job titles
    job_titles=data['job_title'].fillna('').tolist()
    job_embeddings=sbert_model.encode(job_titles)

    #get embedding for target
    target_embedding = sbert_model.encode([target_string])

    #Calculate similarities
    similarities=cosine_similarity(job_embeddings, target_embedding).flatten()
    return similarities

## Ranking and Reranking functions

Before comparing our embedding methods, let's build functions for ranking and re-ranking our candidates based on the chosen embedding method.

In [19]:
# ranking candidates first

def rank_candidates(data, target_string, method='tfidf', starred_candidates=None):
    """
    method (str): Embedding method ('bow', 'tfidf', 'bert', 'sbert')
    starred_candidates (list): List of candidate IDs that have been starred
    """
    #Preprocessing the target string
    target_processed=preprocess_text(target_string)

    # Calculate similarities based on method
    if method == 'bow':
        similarities=bag_of_words_similarity(data, target_processed)
    elif method == 'tfidf':
        similarities = tfidf_similarity(data, target_processed)
    elif method == 'bert':
        similarities = bert_similarity(data, target_string)  
    elif method == 'sbert':
        similarities = sbert_similarity(data, target_string)  
    else:
        raise ValueError(f"Unknown method: {method}. Available methods: 'bow', 'tfidf', 'bert', 'sbert'")

    # Normalize similarities to 0-1 range
    scaler=MinMaxScaler()
    similarities_norm=scaler.fit_transform(similarities.reshape(-1,1)).flatten()

    # Create ranking dataframe
    ranking_df=data.copy()
    ranking_df['similarity_score'] = similarities_norm
    ranking_df['rank'] = ranking_df['similarity_score'].rank(ascending=False, method='min').astype(int)

    # Apply re-ranking if starred candidates provided
    if starred_candidates:
        ranking_df=rerank_with_starred(ranking_df, starred_candidates)

    # Sort by rank
    ranking_df=ranking_df.sort_values('rank').reset_index(drop=True)

    return ranking_df

# Now let's implement the rerank with starred function

def rerank_with_starred(ranking_df, starred_candidates):

    # Get scores of starred candidates
    starred_mask=ranking_df['id'].isin(starred_candidates)
    starred_features=ranking_df[starred_mask]['similarity_score']

    # Nothing to do if none of the starred IDs are present in the data
    if len(starred_features) == 0:
        return ranking_df

    # Calculate average similarity of starred candidates
    starred_avg = starred_features.mean()

    # Boost scores for candidates similar to starred ones
    for idx, row in ranking_df.iterrows():
        candidate_score=row['similarity_score']

        # Calculate similarity to starred candidates' average
        similarity_to_starred=1-abs(candidate_score - starred_avg)

        # Apply boost
        boost_factor= 1 + 0.3 * similarity_to_starred
        ranking_df.loc[idx, 'similarity_score'] = candidate_score * boost_factor

    # Re-rank
    ranking_df['rank']=ranking_df['similarity_score'].rank(ascending=False, method='min').astype(int)

    return ranking_df
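
The boost applied in `rerank_with_starred` can be followed by hand. Each score is multiplied by `1 + 0.3 * (1 - |score - starred_avg|)`, so the factor lies between 1 and 1.3 and is largest for candidates whose score is closest to the starred average. The sketch below applies that formula to made-up scores; the candidate ids and values are hypothetical and the standalone `boost` function is only for illustration.

```python
def boost(score, starred_avg, weight=0.3):
    """Boosted score: score * (1 + weight * (1 - |score - starred_avg|))."""
    closeness = 1 - abs(score - starred_avg)
    return score * (1 + weight * closeness)


# Hypothetical normalised scores keyed by candidate id (not from the dataset).
scores = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.1}
starred = [2, 3]
starred_avg = sum(scores[i] for i in starred) / len(starred)

boosted = {cid: boost(s, starred_avg) for cid, s in scores.items()}
for cid in sorted(boosted, key=boosted.get, reverse=True):
    print(cid, round(boosted[cid], 4))
```

In this toy case the order does not change: the starred candidates 2 and 3 receive the largest factor, but it multiplies a smaller starting score, so candidate 1 stays on top. The scheme therefore only reorders candidates whose original scores are already close together.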

## Comparing methods and choosing the best one

Now we're ready to create functions for comparing our methods:

In [15]:
# First, comparing the methods
def compare_methods(data, target_string, starred_candidates=None):
    methods = ['bow', 'tfidf', 'bert', 'sbert']
    results = {}

    # Comparing methods for target
    for method in methods:
        try:
            ranking=rank_candidates(data, target_string, method, starred_candidates)

            # Store top 10 results
            top10= ranking.head(10)[['rank', 'id', 'job_title', 'similarity_score']]
            results[method]={
                'ranking': ranking,
                'top_10': top10,
                'avg_score': ranking['similarity_score'].mean(),
                'max_score': ranking['similarity_score'].max()
            }

            # Print top 5 candidates
            print(f"Top 5 candidates using {method.upper()}:")
            for _, row in top10.head(5).iterrows():
                starred_mark = " ⭐" if row['id'] in (starred_candidates or []) else ""
                print(f"  {row['rank']:2d}. ID {row['id']:3d} - {row['job_title'][:50]:<50} (Score: {row['similarity_score']:.4f}){starred_mark}")

        except Exception as e:
            print(f"Error with {method}: {e}")
            continue
    return results

# Now let's get the best method
def get_best_method(data, target_string, starred_candidates=None):
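    results=compare_methods(data, target_string, starred_candidates=starred_candidates)

    # compare_methods skips methods that raised, so results may be empty
    if not results:
        print("No methods worked successfully")
        return None

    #Find method with highest average score
    best_method = max(results.keys(), key=lambda x:results[x]['avg_score'])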

    print(f"Best Method : {best_method}")
    print(f"Average similarity score : {results[best_method]['avg_score']:.4f}")

    return best_method
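
`get_best_method` picks the method with the highest mean of the min-max-normalized scores. That mean depends on how the raw similarities are distributed, not on whether the ordering is good, so it is best read as a rough heuristic. The sketch below illustrates this with two made-up score lists that rank the same five candidates in the same order; `min_max` and `mean` are hypothetical stand-ins for `MinMaxScaler` and the pandas mean.

```python
def min_max(values):
    """Scale values linearly to the 0-1 range (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical raw cosine similarities for the same five candidates.
clustered = [0.95, 0.94, 0.93, 0.92, 0.50]   # top four nearly indistinguishable
spread = [0.90, 0.60, 0.40, 0.20, 0.05]      # clearly separated

print(round(mean(min_max(clustered)), 4))
print(round(mean(min_max(spread)), 4))
```

The clustered list gets the higher average even though it separates the candidates less clearly, which is why a high `avg_score` alone does not show that a method ranks better.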


## Putting it all together

In [13]:
def Talent_ranker(data, target_string, starred_candidates=None):
    # data preprocessing
    data['job_title_preprocessed']=data['job_title'].apply(preprocess_text)


    # Finding the best method
    best_method=get_best_method(data, target_string, starred_candidates=starred_candidates)

    return best_method

Now let's test our function, first without starred candidates:

In [17]:
target_string = "seeking human resources"
Talent_ranker(data, target_string)

Top 5 candidates using BOW:
   1. ID  28 - Seeking Human Resources Opportunities              (Score: 1.0000)
   1. ID  30 - Seeking Human Resources Opportunities              (Score: 1.0000)
   1. ID  99 - Seeking Human Resources Position                   (Score: 1.0000)
   4. ID  73 - Aspiring Human Resources Manager, seeking internsh (Score: 0.8083)
   5. ID  53 - Seeking Human Resources HRIS and Generalist Positi (Score: 0.7977)
Top 5 candidates using TFIDF:
   1. ID  99 - Seeking Human Resources Position                   (Score: 1.0000)
   2. ID  28 - Seeking Human Resources Opportunities              (Score: 0.9913)
   2. ID  30 - Seeking Human Resources Opportunities              (Score: 0.9913)
   4. ID  10 - Seeking Human Resources HRIS and Generalist Positi (Score: 0.7246)
   4. ID  53 - Seeking Human Resources HRIS and Generalist Positi (Score: 0.7246)
Top 5 candidates using BERT:
   1. ID  28 - Seeking Human Resources Opportunities              (Score: 1.0000)
   1. ID  3

'bert'
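
The repeated rank 1 followed by rank 4 in the BOW output comes from `rank(ascending=False, method='min')`: tied scores share the lowest rank of their group, and the next distinct score skips ahead by the size of the tie. A pure-Python sketch of that rule, applied to the five BOW scores printed above (`min_rank_desc` is a hypothetical helper):

```python
def min_rank_desc(scores):
    """Rank scores in descending order; ties share the smallest rank."""
    ordered = sorted(scores, reverse=True)
    # index() returns the first position of each value in the sorted list
    return [ordered.index(s) + 1 for s in scores]


print(min_rank_desc([1.0, 1.0, 1.0, 0.8083, 0.7977]))
```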

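BERT had the highest average similarity score, although judging by the top-5 lists the other methods also performed well. Keep in mind that the average normalized score reflects how the scores are distributed rather than how accurate the ranking is, so this selection rule is only a rough heuristic. Now let's add starred candidates to the mix: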

In [21]:
starred_candidates=[53, 28, 68]
Talent_ranker(data, target_string, starred_candidates=starred_candidates)

Top 5 candidates using BOW:
   1. ID  28 - Seeking Human Resources Opportunities              (Score: 1.2398) ⭐
   1. ID  30 - Seeking Human Resources Opportunities              (Score: 1.2398)
   1. ID  99 - Seeking Human Resources Position                   (Score: 1.2398)
   4. ID  73 - Aspiring Human Resources Manager, seeking internsh (Score: 1.0486)
   5. ID  53 - Seeking Human Resources HRIS and Generalist Positi (Score: 1.0367) ⭐
Top 5 candidates using TFIDF:
   1. ID  99 - Seeking Human Resources Position                   (Score: 1.1949)
   2. ID  28 - Seeking Human Resources Opportunities              (Score: 1.1871) ⭐
   2. ID  30 - Seeking Human Resources Opportunities              (Score: 1.1871)
   4. ID  10 - Seeking Human Resources HRIS and Generalist Positi (Score: 0.9257)
   4. ID  53 - Seeking Human Resources HRIS and Generalist Positi (Score: 0.9257) ⭐
Top 5 candidates using BERT:
   1. ID  28 - Seeking Human Resources Opportunities              (Score: 1.2522) ⭐
 

'bert'

Again, BERT had the highest average score. As a final step for the project, we'll put everything together in a single class containing all the functions, so that our client can use the model without running all these cells and loading the data manually.

In [25]:
class ComprehensiveTalentRanker:
    def __init__(self, data_path):
        
        self.data_path = data_path
        self.df = None
        
        # Initialize NLP components
        self.stop_words = set(stopwords.words('english'))
        self.lemmatizer = WordNetLemmatizer()
        

        
        # Load and preprocess data
        self.load_data()
        self.preprocess_data()
        
    def load_data(self):

        self.df = pd.read_csv(self.data_path)
        print(f"Loaded {len(self.df)} candidates")
        
    def preprocess_text(self, text):

        if pd.isna(text):
            return ""
        
        # Convert to lowercase
        text = text.lower()
        
        # Remove punctuation
        text = re.sub(r'[^\w\s]', '', text)
        
        # Remove extra whitespace
        text = re.sub(r'\s+', ' ', text).strip()
        
        # Tokenize
        tokens = word_tokenize(text)
        
        # Remove stopwords and lemmatize
        tokens = [self.lemmatizer.lemmatize(token) for token in tokens 
                 if token not in self.stop_words]
        
        return ' '.join(tokens)
    
    def preprocess_data(self):

        print("Preprocessing data...")
        self.df['job_title_processed'] = self.df['job_title'].apply(self.preprocess_text)
        print("Data preprocessing completed")

    def bag_of_words_similarity(self, target_string):
 
        print("Using Bag of Words...")
        
        # Create BoW vectorizer
        vectorizer = CountVectorizer(max_features=1000, ngram_range=(1, 2))
        
        # Combine job titles with target string
        all_texts = list(self.df['job_title_processed']) + [target_string]
        bow_matrix = vectorizer.fit_transform(all_texts)
        
        # Calculate similarity between each job title and target
        job_titles_matrix = bow_matrix[:-1]  # All except last (target)
        target_vector = bow_matrix[-1:]      # Last row (target)
        
        similarities = cosine_similarity(job_titles_matrix, target_vector).flatten()
        return similarities
    
    def tfidf_similarity(self, target_string):

        print("Using TF-IDF...")
        
        # Create TF-IDF vectorizer
        vectorizer = TfidfVectorizer(max_features=1000, ngram_range=(1, 2))
        
        # Combine job titles with target string
        all_texts = list(self.df['job_title_processed']) + [target_string]
        tfidf_matrix = vectorizer.fit_transform(all_texts)
        
        # Calculate similarity between each job title and target
        job_titles_matrix = tfidf_matrix[:-1]  # All except last (target)
        target_vector = tfidf_matrix[-1:]      # Last row (target)
        
        similarities = cosine_similarity(job_titles_matrix, target_vector).flatten()
        return similarities
    
    def bert_similarity(self, target_string):

        print("Using BERT...")
        
        # Load BERT model and tokenizer
        tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        model = BertModel.from_pretrained('bert-base-uncased')
        model.eval()
        
        # Calculate BERT embeddings for job titles
        doc_embeddings = []
        for text in self.df['job_title']:
            if pd.isna(text):
                doc_embeddings.append(np.zeros(768))
                continue
                
            inputs = tokenizer(text, return_tensors='pt', max_length=512, truncation=True, padding=True)
            
            with torch.no_grad():
                outputs = model(**inputs)
                embedding = outputs.last_hidden_state[:, 0, :].numpy().flatten()
                doc_embeddings.append(embedding)
        
        # Calculate BERT embedding for target
        target_inputs = tokenizer(target_string, return_tensors='pt', max_length=512, truncation=True, padding=True)
        
        with torch.no_grad():
            target_outputs = model(**target_inputs)
            target_embedding = target_outputs.last_hidden_state[:, 0, :].numpy().flatten()
        
        # Calculate similarities
        similarities = []
        for doc_embedding in doc_embeddings:
            similarity = np.dot(doc_embedding, target_embedding) / (np.linalg.norm(doc_embedding) * np.linalg.norm(target_embedding) + 1e-8)
            similarities.append(similarity)
        
        return np.array(similarities)
    

    
    def sbert_similarity(self, target_string):

        print("Using Sentence-BERT...")
        
        sbert_model = SentenceTransformer('all-MiniLM-L6-v2')
        
        # Get embeddings for job titles
        job_titles = self.df['job_title'].fillna('').tolist()
        job_embeddings = sbert_model.encode(job_titles)
        
        # Get embedding for target
        target_embedding = sbert_model.encode([target_string])
        
        # Calculate similarities
        similarities = cosine_similarity(job_embeddings, target_embedding).flatten()
        
        return similarities

    def rank_candidates(self, target_string, method='bert', starred_candidates=None):
        
        # Preprocess target string
        target_processed = self.preprocess_text(target_string)
        
        # Calculate similarities based on the method
        if method == 'bow':
            similarities = self.bag_of_words_similarity(target_processed)
        elif method == 'tfidf':
            similarities = self.tfidf_similarity(target_processed)
        elif method == 'bert':
            similarities = self.bert_similarity(target_string)  
        elif method == 'sbert':
            similarities = self.sbert_similarity(target_string)  
        else:
            raise ValueError(f"Unknown method: {method}. Available methods: 'bow', 'tfidf', 'bert', 'sbert'")
        
        # Normalize similarities to 0-1 range
        scaler = MinMaxScaler()
        similarities_norm = scaler.fit_transform(similarities.reshape(-1, 1)).flatten()
        
        # Create ranking dataframe
        ranking_df = self.df.copy()
        ranking_df['similarity_score'] = similarities_norm
        ranking_df['rank'] = ranking_df['similarity_score'].rank(ascending=False, method='min').astype(int)
        
        # Apply re-ranking if starred candidates provided
        if starred_candidates:
            ranking_df = self.rerank_with_starred(ranking_df, starred_candidates)
        
        # Sort by rank
        ranking_df = ranking_df.sort_values('rank').reset_index(drop=True)
        
        return ranking_df
    
    def rerank_with_starred(self, ranking_df, starred_candidates):

        print(f"Re-ranking with {len(starred_candidates)} starred candidates...")
        
        # Get features of starred candidates
        starred_mask = ranking_df['id'].isin(starred_candidates)
        starred_features = ranking_df[starred_mask]['similarity_score']
        
        if len(starred_features) == 0:
            return ranking_df
        
        # Calculate average similarity of starred candidates
        starred_avg = starred_features.mean()
        
        # Boost scores for candidates similar to starred ones
        for idx, row in ranking_df.iterrows():
            candidate_score = row['similarity_score']
            
            # Calculate similarity to starred candidates' average
            similarity_to_starred = 1 - abs(candidate_score - starred_avg)
            
            # Apply boost
            boost_factor = 1 + 0.3 * similarity_to_starred
            ranking_df.loc[idx, 'similarity_score'] = candidate_score * boost_factor
        
        # Re-rank
        ranking_df['rank'] = ranking_df['similarity_score'].rank(ascending=False, method='min').astype(int)
        
        return ranking_df
    
    def compare_methods(self, target_string, starred_candidates=None):

        methods = ['bow', 'tfidf', 'bert', 'sbert']
        results = {}
        
        print(f"Comparing methods for target: '{target_string}'")
        print("="*60)
        
        for method in methods:
            print(f"\nTesting {method.upper()}...")
            try:
                ranking = self.rank_candidates(target_string, method, starred_candidates)
                
                # Store top 10 results
                top_10 = ranking.head(10)[['rank', 'id', 'job_title', 'similarity_score']]
                results[method] = {
                    'ranking': ranking,
                    'top_10': top_10,
                    'avg_score': ranking['similarity_score'].mean(),
                    'max_score': ranking['similarity_score'].max()
                }
                
                print(f"Top 5 candidates using {method.upper()}:")
                for _, row in top_10.head(5).iterrows():
                    starred_mark = " ⭐" if row['id'] in (starred_candidates or []) else ""
                    print(f"  {row['rank']:2d}. ID {row['id']:3d} - {row['job_title'][:50]:<50} (Score: {row['similarity_score']:.4f}){starred_mark}")
                    
            except Exception as e:
                print(f"Error with {method}: {e}")
                continue
        
        return results
    
    def get_best_method(self, target_string, starred_candidates=None):
 
        results = self.compare_methods(target_string, starred_candidates)
        
        if not results:
            print("No methods worked successfully")
            return None
        
        # Find method with highest average score
        best_method = max(results.keys(), key=lambda x: results[x]['avg_score'])
        
        print("\n" + "="*60)
        print(f"BEST METHOD: {best_method.upper()}")
        print(f"Average similarity score: {results[best_method]['avg_score']:.4f}")
        print("="*60)
        
        return best_method
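
One possible refinement, not part of the class above: `bert_similarity` and `sbert_similarity` load their pretrained models on every call, so repeated calls to `rank_candidates` or `compare_methods` pay the loading cost each time. A small cache that loads each model at most once could avoid that. The sketch below uses a hypothetical `ModelCache` class and a placeholder loader so that it runs without downloading anything; in the ranker the loader would be the existing `from_pretrained(...)` or `SentenceTransformer(...)` call.

```python
class ModelCache:
    """Load each model at most once and reuse it on later requests."""

    def __init__(self):
        self._models = {}

    def get(self, name, loader):
        """Return the cached model for `name`, calling `loader()` on first use."""
        if name not in self._models:
            self._models[name] = loader()
        return self._models[name]


load_calls = []


def fake_loader():
    # Placeholder for an expensive call such as loading pretrained weights.
    load_calls.append('loaded')
    return object()


cache = ModelCache()
first = cache.get('sbert', fake_loader)
second = cache.get('sbert', fake_loader)
print(first is second, len(load_calls))
```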
 

In [27]:
ranker=ComprehensiveTalentRanker("data/potential_talents_data.csv")
target = "seeking human resources"
starred_candidates=[53, 28, 68]
best_method = ranker.get_best_method(target, starred_candidates=starred_candidates)

Loaded 104 candidates
Preprocessing data...
Data preprocessing completed
Comparing methods for target: 'seeking human resources'

Testing BOW...
Using Bag of Words...
Re-ranking with 3 starred candidates...
Top 5 candidates using BOW:
   1. ID  28 - Seeking Human Resources Opportunities              (Score: 1.2398) ⭐
   1. ID  30 - Seeking Human Resources Opportunities              (Score: 1.2398)
   1. ID  99 - Seeking Human Resources Position                   (Score: 1.2398)
   4. ID  73 - Aspiring Human Resources Manager, seeking internsh (Score: 1.0486)
   5. ID  53 - Seeking Human Resources HRIS and Generalist Positi (Score: 1.0367) ⭐

Testing TFIDF...
Using TF-IDF...
Re-ranking with 3 starred candidates...
Top 5 candidates using TFIDF:
   1. ID  99 - Seeking Human Resources Position                   (Score: 1.1949)
   2. ID  28 - Seeking Human Resources Opportunities              (Score: 1.1871) ⭐
   2. ID  30 - Seeking Human Resources Opportunities              (Score: 1.1871)