### Elevating Talent Matching with Word2Vec Word Embeddings Using Gensim in Python.
In hiring, companies face a big challenge: sorting through heaps of resumes to find the right candidates. It's a time-consuming process, and sometimes they end up with the wrong fit for the job. Our project aims to change that by using word embeddings to support resume screening.

We use Word2Vec word embeddings to represent resumes and job descriptions as numeric vectors. Once both are expressed this way, a computer can help find out which candidates are best suited for a particular job. This means less time wasted and a better chance of finding a good match for the role.

The goal is to make the whole process faster and easier. With automated matching, companies can focus on talking to the most relevant candidates instead of spending hours going through resumes, which makes hiring smoother and more efficient for everyone involved.

#### Step 1: Resume Data Loading.
Before starting the analysis, we load the dataset and inspect it to check which columns it contains. Let's take a look.

In [6]:
import pandas as pd
#--- Read in dataset ----
df = pd.read_csv("Resume.csv")

#--- Inspect data ---
df

Unnamed: 0,ID,Resume_str
0,16852973,HR ADMINISTRATOR/MARKETING ASSOCIATE\...
1,22323967,"HR SPECIALIST, US HR OPERATIONS ..."
2,33176873,HR DIRECTOR Summary Over 2...
3,27018550,HR SPECIALIST Summary Dedica...
4,17812897,HR MANAGER Skill Highlights ...
...,...,...
995,33578873,SALES Summary I am looking f...
996,27607632,SALES Summary Self-motivated...
997,23760084,SALES Summary General Sales...
998,30083943,SALES Professional Summary ...
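
Before tokenizing, it can help to confirm that 'Resume_str' has no missing values, because calling .lower() on a missing entry would raise an error. The following is a minimal sketch on a made-up three-row frame; the sample rows and the names sample, missing, and clean are hypothetical and stand in for the real Resume.csv data.

```python
import pandas as pd

# Hypothetical stand-in for Resume.csv; the real file is not reproduced here.
sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "Resume_str": ["HR MANAGER Summary", None, "SALES Summary"],
})

# Count missing resumes, then drop them so .lower() never sees a missing value.
missing = int(sample["Resume_str"].isna().sum())
clean = sample.dropna(subset=["Resume_str"]).reset_index(drop=True)

print("Missing resumes:", missing)
print("Rows kept:", len(clean))
```

The same two calls could be applied to df itself if the check reports any missing rows.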


#### Step 2: Empowering Resumes with Word2Vec Word Embedding.

Building and Saving Word Embeddings for Resumes using Word2Vec.

The task is to tokenize the text in the 'Resume_str' column of DataFrame df into lowercase words, then train a Word2Vec model on the tokenized resumes. The parameters are: vector size 100, window size 5, minimum word count 1, and 4 worker threads (one per CPU core). Finally, the trained model is saved as "resume_word2vec.model".

In [10]:
import nltk
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

# Download 'punkt' resource for tokenization
nltk.download('punkt')
nltk.download('punkt_tab')

# Tokenize the text in the 'Resume_str' column of DataFrame df into lowercase words
tokenized_resumes = df['Resume_str'].apply(lambda x: word_tokenize(x.lower()))

# Train a Word2Vec model on the tokenized resume data
word2vec_model = Word2Vec(
    sentences=tokenized_resumes,
    vector_size=100,
    window=5,
    min_count=1,
    workers=4
)

# Save the model
word2vec_model.save("resume_word2vec.model")
print("Model trained and saved successfully!")


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\nandi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\nandi\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt_tab.zip.


Model trained and saved successfully!
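
The min_count=1 setting used above keeps every token in the vocabulary, including words that appear only once. To illustrate what a higher threshold would change, here is a small pure-Python sketch that mimics the filtering rule (tokens whose total frequency is below min_count are ignored). It is not Gensim's implementation, and the toy resumes and the vocabulary helper are hypothetical.

```python
from collections import Counter

# Hypothetical tokenized resumes (already lowercased).
toy_resumes = [
    ["sales", "manager", "sql"],
    ["sales", "analyst", "sql", "excel"],
    ["hr", "manager"],
]

def vocabulary(sentences, min_count):
    """Return the sorted tokens that occur at least min_count times in total."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return sorted(token for token, n in counts.items() if n >= min_count)

vocab_all = vocabulary(toy_resumes, min_count=1)
vocab_frequent = vocabulary(toy_resumes, min_count=2)

print(vocab_all)
print(vocab_frequent)
```

With a threshold of 1, rare skills stay in the vocabulary, which matters on a small resume corpus; the trade-off is that vectors for words seen only once are learned from very little context.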


#### Step 3: Job Description Precision: Navigating Talent Waters with Word2Vec.

Creating a Vector Representation for a User-Provided Job Description using Pre-trained Word2Vec Model.

- Your task is to load the pre-trained Word2Vec model named "resume_word2vec.model" using Gensim.

- Load the user-provided job description as a string. Tokenize the job description into lowercase words. Check if the token is present in the Word2Vec model's vocabulary (model.wv). If present, retrieve the word vector. Calculate the mean of all the word vectors to generate a vector representation for the entire job description.

- Convert the generated vector representation to a NumPy array. The variable 'user_provided_vector' then contains a numerical representation of the user-provided job description based on the pre-trained Word2Vec model.
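
The averaging step described in the list can be sketched on toy data to show how out-of-vocabulary tokens are skipped. This is only an illustration under stated assumptions: the 3-dimensional vectors in toy_vectors and the mean_pool helper are made up, whereas the real model supplies 100-dimensional vectors through model.wv.

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings"; a trained model would supply real ones.
toy_vectors = {
    "data": np.array([1.0, 0.0, 2.0]),
    "analyst": np.array([3.0, 2.0, 0.0]),
}

def mean_pool(tokens, vectors, size):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(size)
    return np.mean(known, axis=0)

# "delhi" is not in the toy vocabulary, so it does not affect the average.
pooled = mean_pool(["data", "analyst", "delhi"], toy_vectors, size=3)
# With no known tokens at all, the fallback zero vector is returned.
empty = mean_pool(["delhi"], toy_vectors, size=3)

print(pooled)
print(empty)
```

Averaging gives every known word equal weight, so frequent filler words influence the result as much as skill terms do; that is a property of this simple pooling scheme rather than of Word2Vec itself.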

In [11]:
import numpy as np

# Load the pre-trained Word2Vec model
my_model = Word2Vec.load("resume_word2vec.model")

# Load the user-provided job description
job_description = """We are seeking a highly motivated and detail-oriented Data Analyst to join our team in Delhi,
India. As a Data Analyst, you will be responsible for analyzing large datasets, identifying trends, 
and generating insights to drive business decisions. You should have strong skills in data analysis, SQL (MySQL), 
and data management. Proficiency in office management and basic knowledge of data science concepts is also required. 
This is an entry-level position, and we offer an annual compensation of 3-6 LPA. 
A bachelor's degree is the minimum qualification required for this role. 
Join us and contribute to our data-driven decision-making process."""

# Tokenize the user-provided job description
tokenized_job_description = word_tokenize(job_description.lower())

# Retrieve word vectors for tokens present in the loaded model's vocabulary
word_vectors = [my_model.wv[token] for token in tokenized_job_description if token in my_model.wv]

# Calculate the mean of all the word vectors to generate a vector representation for the entire job description
if word_vectors:
    mean_vector = np.mean(word_vectors, axis=0)
else:
    # No token was found in the vocabulary: fall back to a zero vector
    mean_vector = np.zeros(my_model.vector_size)

# Convert the generated vector representation to a NumPy array
user_provided_vector = np.array(mean_vector)

print("Vector representation for the job description:", user_provided_vector)

Vector representation for the job description: [-0.54823947  0.3156618  -0.15656994  0.00200185  0.16903327 -0.78497076
  0.04194819  0.86971027 -0.20658605 -0.31736094  0.07435112 -0.29552442
 -0.36488298  0.39208832  0.40816435 -0.3338771   0.05262439 -0.11582508
 -0.06887101 -0.8429761  -0.5104074  -0.3097133  -0.06228101  0.50397205
  0.0930154   0.00915964 -0.28422952 -0.2644952  -0.41796187  0.05258517
  0.4790165   0.18670195 -0.02552393 -0.42283142 -0.22452849  0.26726878
 -0.14724356 -0.556589    0.20661734 -0.37638184  0.15845901 -0.556226
 -0.19911318 -0.3231706   0.1520997  -0.22602019 -0.09159579 -0.1312308
  0.05002306  0.36709243  0.2376847  -0.12050234 -0.58711296  0.36037737
  0.14033702  0.48992628  0.2245814  -0.24120715 -0.18965845  0.35196248
  0.17626645 -0.20186383  0.18303522  0.13773578 -0.36581966  0.40484056
  0.32192183  0.06897336 -0.01661446  0.48803094 -0.12171568  0.30195642
  0.28555056 -0.24468781  0.46122393  0.6706288  -0.35504895 -0.05604719
 -0.183