AISangam/Finding-Most-Similar-Words-NLP
Finding-Most-Similar-Words-NLP

Finding Most Important Words using word2vec Model of Google

[Image: similar_words]

Finding the most similar words to each word of a sentence

This code finds the sentences that contain a keyword entered by the user and then preprocesses them. Preprocessing includes removing stop words and non-word characters from the text. The resulting sentence is tokenized and passed to word2vec, which converts each word into a word embedding of dimension 300, a value I chose. The dimension is one of several parameters that can be set; see the word2vec documentation for the options. Finally, the most similar words for each word of the vocabulary are displayed in the terminal, as in the image above. If this is hard to follow, here are the steps:

  1. The user is asked to enter a keyword from the text; the sentences are filtered by this keyword.
  2. The sentence or sentences that pass the filter are stripped of stop words and non-word characters.
  3. The remaining words are converted into a list and passed to the word2vec model provided by gensim.
  4. The model is created, and each word of the vocabulary is checked for similar words.
  5. The result is displayed in the terminal.
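
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in similar_words_word2vec.py: the function names, the tiny stop-word list, and the regular expressions are assumptions made for the example, and the Word2Vec call assumes gensim 4.0 or later (older releases name the dimension parameter `size` instead of `vector_size`).

```python
import re

# Tiny illustrative stop-word list (an assumption; a real script would
# likely use a fuller list, for example the one shipped with NLTK).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to",
              "in", "on", "for", "with", "by", "it", "this", "that"}


def filter_sentences(text, keyword):
    """Step 1: keep only the sentences that contain the keyword (case-insensitive)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if keyword.lower() in s.lower()]


def preprocess(sentence):
    """Steps 2-3: lowercase, blank out non-word characters, drop stop words, tokenize."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", sentence.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def similar_words(tokenized_sentences, topn=3):
    """Steps 4-5: train word2vec and map each vocabulary word to its nearest neighbours."""
    # Imported here so the preprocessing helpers work without gensim installed.
    from gensim.models import Word2Vec  # assumes gensim >= 4.0

    model = Word2Vec(tokenized_sentences, vector_size=300, min_count=1, workers=1)
    return {word: model.wv.most_similar(word, topn=topn)
            for word in model.wv.index_to_key}
```

With these helpers, `similar_words([preprocess(s) for s in filter_sentences(text, keyword)])` would return a dictionary from each vocabulary word to a list of (word, similarity) pairs. On a handful of sentences the similarities are essentially noise; meaningful neighbours need a much larger corpus.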

How to run the code

To run the code, first install the dependencies with the command below:

pip3 install -r requirements.txt  

Then run the script:

python3 similar_words_word2vec.py

Thanks

Happy Coding
