Improved-Word-Embeddings

This is a paper implementation of https://arxiv.org/pdf/1711.08609.pdf. This paper increases the accuracy of word embeddings by enriching the word embedding with its associated POS (Part of Speech) tag. (See Below):

However, this excludes the Lexicon2Vec conversion and follows the following steps:

   Inputs:
   S = {W1, W2,……., Wn} , Input sentence S contains n words
   PT = {T1, T2,……, Tm}, All POS tags
   Corpus = Imdb/Your corpus
   
  Output:
   IMV: Improved word vectors of corpus

The steps follow the following algorithm:

for j=1 to m do
    VTj GenerateVector ( Tj )
    Tj < Tj , VTj >
end for

for each Wi in S do
  If Wi exist in "your choice of word vector" then extract VecWi
    MVi VecWi
  endif
  
  # POS ExtractPOS ( Wi )
for k=1 to m do
 If POS=Tk then ADD VTk into MVi
 end if
end for

Usage

This gives you the option of testing it on the Imdb data and also to run it on your corpus. To run it on the imdb data:

In config.json assign use_imdb : 1

To run it on your corpus:

  In config.json assign use_imdb : 0

The model used is a very generic one. Feel free to make changes to the model as per your requirements. To change model:

 In pos_function.py change model_build

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
ImprovedWordEmbeddings.png		ImprovedWordEmbeddings.png
README.md		README.md
config.json		config.json
main.py		main.py
model_params.json		model_params.json
pos_function.py		pos_function.py
word_embeddings.py		word_embeddings.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly