# 1) Study of Embeddings for JD - Resume Matching (Code Explanation)

* *Author - Jithin J Kumar*
* *Date - 25-01-2019*
* *Project - JD-Resume Matching Algorithm*
* *Project Members - Rashmi, Jagadish, Jithin*

## ReadMe Note

Word embeddings are a core concept in natural language processing (NLP) tasks. Choosing a suitable embedding is expected to improve the performance of the model. In this notebook we therefore explore two of the most common word embeddings currently in use: Word2Vec and FastText.

**Library Dependencies**
* Gensim Version :  3.6.0

**Data Dependencies**
* `outfile` - tokenized data (a list of sentences, each a list of tokens) stored as a pickle file, used to train the models
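
The expected shape of `outfile` can be sketched as follows. This is only an illustration, assuming the file holds a list of sentences where each sentence is a list of string tokens (as the output in section 2 suggests). The sample sentences and the temporary path are hypothetical and are not the project's real data.

```python
import os
import pickle
import tempfile

# Hypothetical sample in the same shape as the real corpus:
# a list of sentences, each sentence a list of string tokens.
sample = [
    ["Excel", "Expert", "."],
    ["VBA", "Developer", "&", "Reporting", "Analyst", "."],
]

# Write to a temporary location so the real 'outfile' is never touched.
path = os.path.join(tempfile.mkdtemp(), "outfile_sample")
with open(path, "wb") as fp:
    pickle.dump(sample, fp)

# Read it back the same way the notebook reads 'outfile'.
with open(path, "rb") as fp:
    restored = pickle.load(fp)

print(restored == sample)
```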

#### 1) Importing necessary Libraries 

In [1]:
import numpy as np
import pickle
import gensim
from gensim.models import FastText
import warnings
warnings.filterwarnings("ignore")

print("Gensim Version : ",gensim.__version__)

Gensim Version :  3.6.0


#### 2) Loading data into memory from pickle file

In [2]:
# Load the tokenized corpus: a list of sentences, each a list of string tokens.
with open('outfile', 'rb') as fp:
    data = pickle.load(fp)
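
Before training, it can help to confirm that the loaded object really is a list of token lists, since a flat list of strings would be iterated character by character. The helper below is a minimal sketch; `summarize_corpus` and the `toy` corpus are hypothetical names used for illustration, not part of the original notebook.

```python
from collections import Counter


def summarize_corpus(sentences):
    """Return (sentence count, token count, vocabulary size) for a tokenized corpus."""
    for sent in sentences:
        if not isinstance(sent, list) or not all(isinstance(t, str) for t in sent):
            raise TypeError("expected a list of lists of strings")
    counts = Counter(tok for sent in sentences for tok in sent)
    return len(sentences), sum(counts.values()), len(counts)


# Hypothetical stand-in for `data`, kept tiny for illustration.
toy = [["RESUME", "Y.Sivaram", "."], ["Excel", "Expert", "."]]
print(summarize_corpus(toy))
```

Calling `summarize_corpus(data)` on the real corpus would give a quick size check before the models are built.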

In [3]:
# data looks like this...
data

[['RESUME', 'Y.Sivaram', '.'],
 ['Email',
  ':',
  'ysivaram.developer',
  '@',
  'gmail.com',
  'VBA',
  'Developer',
  '&',
  'Reporting',
  'Analyst',
  '.'],
 ['Mobile', 'No', ':', '+91-', '9742915208', 'Excel', 'Expert', '.'],
 ['Career',
  'objective',
  ':',
  'To',
  'secure',
  'a',
  'challenging',
  'position',
  'where',
  'I',
  'can',
  'effectively',
  'contribute',
  'my',
  'skills',
  'as',
  'Software',
  'Professional',
  'possessing',
  'competent',
  'Technical',
  'skills',
  '.'],
 ['PROFESSIONAL',
  'SUMMARY',
  ':',
  'Having',
  '4',
  'Years',
  'of',
  'professional',
  'IT',
  'experience',
  'in',
  'Visual',
  'Basic',
  'for',
  'Applications',
  'such',
  'as',
  'Excel',
  'Advanced',
  'Excel',
  'VBA',
  'Macros',
  'and',
  'MIS',
  'Analyst',
  '.'],
 ['Customized',
  'and',
  'developed',
  'reusable',
  'objects',
  'and',
  'reports',
  'highlighting',
  'process',
  'status',
  'for',
  'each',
  'record',
  '.'],
 ['Generated',
  'Email',
  '

#### 3) Building Fasttext Model using Gensim Library

In [4]:
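# Skip-gram (sg=1) FastText model: 100-dimensional vectors, context window of 5,
# tokens seen fewer than 5 times are dropped, 4 worker threads.
modelFT = FastText(data, size=100, window=5, min_count=5, workers=4, sg=1)
modelFT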

<gensim.models.fasttext.FastText at 0x7f6a48df2940>

In [5]:
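# Ten most similar words to 'java' (by cosine similarity) using the FastText model.
# Calling most_similar on the model itself is deprecated in gensim 3.x; use .wv instead.
modelFT.wv.most_similar('java')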

[('JSPs', 0.9023535251617432),
 ('J2EE/Java', 0.9003547430038452),
 ('servlet', 0.8958097696304321),
 ('JSP', 0.8879826068878174),
 ('JSF', 0.8825526237487793),
 ('Java/J2ee', 0.8789649605751038),
 ('front-end', 0.8755855560302734),
 ('Java/J2EE', 0.8739306330680847),
 ('JSTL', 0.8704612255096436),
 ('CoreJava', 0.8688318729400635)]
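
The scores in the list above are cosine similarities between word vectors. As a rough sketch of what such a ranking does, the pure-Python version below scores every other word against the query and sorts the results. The functions and the 3-dimensional `toy_vectors` are hypothetical illustrations and are not gensim's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, topn=10):
    """Rank every other word in `vectors` by cosine similarity to `query`."""
    q = vectors[query]
    scored = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical 3-dimensional vectors; real models here use 100 dimensions.
toy_vectors = {
    "java": [1.0, 0.0, 0.0],
    "JSP": [0.9, 0.1, 0.0],
    "Excel": [0.0, 1.0, 0.0],
}
ranking = most_similar("java", toy_vectors)
print(ranking)
```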

In [6]:
# Ten most similar words to 'python' (by cosine similarity) using the FastText model.
modelFT.wv.most_similar('python')

[('Python', 0.9732093214988708),
 ('JSon', 0.9431465268135071),
 ('hub', 0.9393289685249329),
 ('XSL', 0.9371820688247681),
 ('GitHub', 0.9356774091720581),
 ('Bitbucket', 0.9342490434646606),
 ('MAVEN', 0.9332001209259033),
 ('CXF', 0.9326076507568359),
 ('Json', 0.9325049519538879),
 ('Knockout', 0.9320213794708252)]
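
FastText builds a word's vector from its character n-grams, which is one plausible reason `Python` is the closest neighbour of `python` above. The sketch below lists the n-grams two spellings share, assuming the gensim defaults `min_n=3` and `max_n=6` (the call in section 3 does not override them) and the usual `<` and `>` word-boundary markers. It is a simplified illustration: the real implementation also hashes n-grams into buckets and keeps a vector for the whole word.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word` wrapped in '<' and '>' boundary markers."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


lower = set(char_ngrams("python"))
upper = set(char_ngrams("Python"))
shared = lower & upper

# N-grams that do not touch the first letter are identical for both spellings.
print(sorted(shared))
print(len(shared), "of", len(lower), "n-grams are shared")
```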

#### 4) Building Word2Vec Model Using Gensim

In [7]:
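from gensim.models import Word2Vec
# Word2Vec with gensim 3.6.0 defaults (size=100, window=5, sg=0, i.e. CBOW), but with
# min_count=1 so every token is kept. The FastText model above uses sg=1 and
# min_count=5, so the two models differ in more than architecture.
modelWV = Word2Vec(data, min_count=1)
print(modelWV)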

Word2Vec(vocab=31662, size=100, alpha=0.025)
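
The vocabulary size reported above depends directly on `min_count`: tokens that occur fewer than `min_count` times are dropped before training. The sketch below mimics that filtering step on a hypothetical toy corpus; `vocab_after_min_count` is an illustrative helper, not a gensim function.

```python
from collections import Counter


def vocab_after_min_count(sentences, min_count):
    """Tokens that survive a frequency threshold of `min_count`."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}


# Hypothetical toy corpus.
toy = [
    ["java", "spring", "java"],
    ["java", "python"],
    ["spring", "hibernate"],
]

kept_all = vocab_after_min_count(toy, 1)
kept_frequent = vocab_after_min_count(toy, 2)
print(len(kept_all), len(kept_frequent))
```

Applying the same helper to `data` with thresholds 1 and 5 would show how much smaller the FastText vocabulary is than the Word2Vec one in this notebook.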


In [8]:
# Ten most similar words to 'java' (by cosine similarity) using the Word2Vec model.
modelWV.wv.most_similar('java')

[('spring', 0.9844119548797607),
 ('servlet', 0.9804044961929321),
 ('JSTL', 0.9750860333442688),
 ('framework', 0.9721617698669434),
 ('JSP', 0.971493124961853),
 ('Restful', 0.9673343300819397),
 ('Spring', 0.9652677178382874),
 ('persistence', 0.9639869928359985),
 ('API', 0.963525652885437),
 ('JUnit', 0.9631530046463013)]
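
Both `spring` and `Spring` appear in the list above because the corpus is case-sensitive, so each spelling gets its own vector. One possible preprocessing step, which the original notebook does not apply, is to lowercase tokens before training. The sketch below shows the effect on vocabulary size for a hypothetical toy corpus; note that lowercasing can also discard useful signal, for example the difference between `IT` and `it`.

```python
from collections import Counter


def lowercase_corpus(sentences):
    """Return a copy of the corpus with every token lowercased."""
    return [[tok.lower() for tok in sent] for sent in sentences]


# Hypothetical toy corpus built from spellings seen in the outputs.
toy = [
    ["Spring", "framework"],
    ["spring", "JSP"],
    ["Python", "python", "Json", "JSon"],
]

before = Counter(tok for sent in toy for tok in sent)
after = Counter(tok for sent in lowercase_corpus(toy) for tok in sent)
print(len(before), len(after))
```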

In [9]:
# Ten most similar words to 'python' (by cosine similarity) using the Word2Vec model.
modelWV.wv.most_similar('python')

[('Perl', 0.9912320375442505),
 ('JSR', 0.991031289100647),
 ('RESTFUL', 0.9910008311271667),
 ('Redux', 0.9896095991134644),
 ('Storm', 0.988836407661438),
 ('Pentaho', 0.9886062741279602),
 ('Highcharts', 0.9885555505752563),
 ('MapReduce', 0.9882293939590454),
 ('HBase', 0.9881883263587952),
 ('D3', 0.9880128502845764)]
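
One simple way to compare the two models is to measure how much their neighbour lists overlap. The sketch below copies the top-10 words from the outputs above and computes the Jaccard overlap for each query. The `jaccard` helper is a hypothetical illustration, and since the two models were trained with different `sg` and `min_count` settings, any difference cannot be attributed to the architecture alone.

```python
# Top-10 neighbours copied from the notebook outputs above.
ft_java = ["JSPs", "J2EE/Java", "servlet", "JSP", "JSF",
           "Java/J2ee", "front-end", "Java/J2EE", "JSTL", "CoreJava"]
wv_java = ["spring", "servlet", "JSTL", "framework", "JSP",
           "Restful", "Spring", "persistence", "API", "JUnit"]
ft_python = ["Python", "JSon", "hub", "XSL", "GitHub",
             "Bitbucket", "MAVEN", "CXF", "Json", "Knockout"]
wv_python = ["Perl", "JSR", "RESTFUL", "Redux", "Storm",
             "Pentaho", "Highcharts", "MapReduce", "HBase", "D3"]


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


shared_java = sorted(set(ft_java) & set(wv_java))
print("java:", shared_java, round(jaccard(ft_java, wv_java), 3))
print("python:", sorted(set(ft_python) & set(wv_python)), jaccard(ft_python, wv_python))
```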

### ---End Of Explanation---