### Elevating Talent Matching with Word2Vec Word Embeddings Using Gensim in Python
Hiring teams face a recurring problem: sorting through large volumes of resumes to find the right candidates. Manual screening is time-consuming, and it sometimes ends with the wrong fit for the job. This project applies natural language processing to speed up that screening step.

We use word embeddings to process resumes quickly. By training a model to represent the text of resumes and job descriptions, we can estimate which candidates are best suited to a particular job. This means less time spent on screening and a better chance of finding a good match.

The aim is a faster, simpler hiring process. Recruiters can focus on talking to the strongest candidates instead of spending hours reading through resumes, which benefits both companies and applicants.

#### Step 1: Loading the Resume Data
Before starting the analysis, we load the dataset and inspect it to check which columns it contains.

In [6]:
import pandas as pd
#--- Read in dataset ----
df = pd.read_csv("Resume.csv")

#--- Inspect data ---
df

| | ID | Resume_str |
|---|---|---|
| 0 | 16852973 | HR ADMINISTRATOR/MARKETING ASSOCIATE\... |
| 1 | 22323967 | HR SPECIALIST, US HR OPERATIONS ... |
| 2 | 33176873 | HR DIRECTOR Summary Over 2... |
| 3 | 27018550 | HR SPECIALIST Summary Dedica... |
| 4 | 17812897 | HR MANAGER Skill Highlights ... |
| ... | ... | ... |
| 995 | 33578873 | SALES Summary I am looking f... |
| 996 | 27607632 | SALES Summary Self-motivated... |
| 997 | 23760084 | SALES Summary General Sales... |
| 998 | 30083943 | SALES Professional Summary ... |
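
Beyond viewing the table, a few quick checks can confirm that the data is ready for tokenization. The sketch below is one possible way to do this, not part of the original notebook: the helper name `summarize_resumes` and the three sample rows are made up for illustration, and only the column names `ID` and `Resume_str` come from the dataset shown above.

```python
import pandas as pd


def summarize_resumes(df: pd.DataFrame, text_col: str = "Resume_str") -> dict:
    """Return a few basic facts about a resume DataFrame (hypothetical helper)."""
    return {
        "n_rows": len(df),
        "columns": list(df.columns),
        # Rows with no resume text would break x.lower() during tokenization
        "missing_text": int(df[text_col].isna().sum()),
        # Repeated IDs may indicate duplicated resumes
        "duplicate_ids": int(df["ID"].duplicated().sum()),
    }


# Tiny stand-in for Resume.csv (made-up rows, for illustration only)
sample = pd.DataFrame({
    "ID": [101, 102, 102],
    "Resume_str": ["HR MANAGER Summary ...", None, "SALES Summary ..."],
})

summary = summarize_resumes(sample)
print(summary)
```

On the real data, the same helper could be called as `summarize_resumes(df)` right after `pd.read_csv("Resume.csv")`. If it reports missing text, those rows would need to be dropped or filled before the tokenization step.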


#### Step 2: Empowering Resumes with Word2Vec Word Embeddings

Building and Saving Word Embeddings for Resumes using Word2Vec.

The task is to tokenize the text in the 'Resume_str' column of the DataFrame df into lowercase words, then train a Word2Vec model on the tokenized resumes. The model uses a vector size of 100, a window size of 5, a minimum word count of 1, and 4 worker threads. The trained model is saved as "resume_word2vec.model".

In [10]:
import nltk
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize
import pandas as pd

# Download 'punkt' resource for tokenization
nltk.download('punkt')
nltk.download('punkt_tab')

# Tokenize the text in the 'Resume_str' column of DataFrame df into lowercase words
# (tolist() hands Word2Vec a plain list of token lists rather than a pandas Series)
tokenized_resumes = df['Resume_str'].apply(lambda x: word_tokenize(x.lower())).tolist()

# Train a Word2Vec model on the tokenized resume data
word2vec_model = Word2Vec(
    sentences=tokenized_resumes,
    vector_size=100,
    window=5,
    min_count=1,
    workers=4
)

# Save the model
word2vec_model.save("resume_word2vec.model")
print("Model trained and saved successfully!")


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\nandi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\nandi\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt_tab.zip.


Model trained and saved successfully!
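
Once the embeddings exist, one common way to use them for matching is to average the word vectors in each document and compare documents by cosine similarity. The notebook does not show this step, so the following is only a minimal sketch under stated assumptions: the function names (`tokenize`, `document_vector`, `cosine_similarity`, `rank_resumes`) are hypothetical, the regex tokenizer is a simple stand-in for `word_tokenize`, and the two-dimensional `toy_vectors` are made up so the example runs without a trained model.

```python
import math
import re


def tokenize(text):
    """Lowercase and split into word-like tokens (simple stand-in for word_tokenize)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def document_vector(tokens, vectors):
    """Average the vectors of the tokens present in `vectors`; None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(float(v[i]) for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_resumes(job_text, resumes, vectors):
    """Return (resume_id, score) pairs sorted from best to worst match."""
    job_vec = document_vector(tokenize(job_text), vectors)
    scores = []
    for resume_id, text in resumes.items():
        vec = document_vector(tokenize(text), vectors)
        if job_vec is None or vec is None:
            score = 0.0  # no known words, so no evidence of a match
        else:
            score = cosine_similarity(job_vec, vec)
        scores.append((resume_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up 2-d vectors standing in for trained embeddings (illustration only)
toy_vectors = {
    "recruiting": [1.0, 0.0],
    "payroll": [0.9, 0.1],
    "sales": [0.0, 1.0],
    "retail": [0.1, 0.9],
}
toy_resumes = {
    "hr_resume": "Recruiting and payroll",
    "sales_resume": "Retail sales",
}

ranking = rank_resumes("Payroll and recruiting specialist", toy_resumes, toy_vectors)
print(ranking)
```

The functions only need `vectors` to support `token in vectors` and `vectors[token]`, which gensim's `KeyedVectors` does, so with the saved model the toy dictionary could be replaced by `Word2Vec.load("resume_word2vec.model").wv`. In that case it would be more consistent to tokenize with the same `word_tokenize(x.lower())` call used for training. Averaging ignores word order and weights every word equally, so the scores are best treated as a rough first-pass ranking.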
