This repository contains additional features extending the traditional Word2Vec library, introduced in 2013. It was built as part of a larger project involving sentiment analysis and sarcasm detection; the project details have been added in the repo-description section.
Directly clone the repository and start using it. The details of how the files are named and stored are given below:
-> The model can be built using 3 options: the Skip-gram model, Continuous Bag of Words (CBOW), and Negative Sampling.
-> You can change the following model parameters:
- Window size
- Vector size
- Learning rate
- Sub-sampling
- Number of training epochs
-> The optimizer used is SGD; a minimal sketch of the parameters and a single update step is given below.
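As a rough illustration of these knobs, here is a minimal sketch of one plain-SGD update for a Skip-gram-style model with a full softmax. All names and values (`window_size`, `vocab_size`, the dummy word indices, etc.) are hypothetical placeholders, not the actual variables used in the notebook:

```python
import numpy as np

# Hypothetical hyperparameters mirroring the tunable options above.
window_size = 2       # context words taken on each side of the centre word
vector_size = 100     # dimensionality of the learned embeddings
learning_rate = 0.01  # SGD step size
subsample_t = 1e-3    # sub-sampling threshold for frequent words
epochs = 5            # number of passes over the training corpus

vocab_size = 10_000   # assumed vocabulary size for this sketch
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(vocab_size, vector_size))   # input weights
W_out = rng.normal(scale=0.01, size=(vector_size, vocab_size))  # output weights

# One SGD update for a single (centre word, context word) pair.
center_idx, context_idx = 42, 7        # dummy word indices
h = W_in[center_idx]                   # hidden layer = centre-word embedding
scores = h @ W_out                     # unnormalised scores over the vocabulary
probs = np.exp(scores - scores.max())
probs /= probs.sum()                   # softmax
probs[context_idx] -= 1.0              # d(cross-entropy loss)/d(scores)
grad_h = W_out @ probs                 # gradient w.r.t. the hidden layer
W_out -= learning_rate * np.outer(h, probs)   # SGD step on output weights
W_in[center_idx] -= learning_rate * grad_h    # SGD step on input weights
```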
-> Other than the main code ipynb and the entire training and testing corpora (numpy arrays) of the Reuters dataset (which can be downloaded directly from here), I have included an example directory where I have applied my code base.
For more information, please refer to the uploaded project report.
-> The folder contains numpy arrays of the training and testing datasets.
-> The folder also contains a pickle file holding the dictionary: key - word (string); value - one-hot encoding (numpy array).
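A minimal loading sketch (the file names `train.npy`, `test.npy`, and `word2onehot.pkl` are assumptions for illustration; substitute the actual names in the folder):

```python
import pickle
import numpy as np

# Assumed file names; replace with the actual ones in the folder.
train = np.load("train.npy")   # training corpus as a numpy array
test = np.load("test.npy")     # testing corpus as a numpy array

with open("word2onehot.pkl", "rb") as f:
    word2onehot = pickle.load(f)   # dict: word (str) -> one-hot numpy array

x = word2onehot["market"]   # one-hot vector for an example word
print(x.shape, x.sum())     # vocabulary-sized vector containing a single 1
```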
-> The naming convention of the weight1 files is:
../Window_{window size}/{choice of model}/{learning rate}_{vector size}weight_numpy.npy
These weight matrices must be multiplied with the one-hot vectors to obtain the actual embeddings, in the form W'X (W - weight matrix; X - one-hot vector).
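As a hedged sketch of this lookup, assuming a window size of 2, the Skipgram model, a learning rate of 0.01, a vector size of 100, and a weight matrix stored as vocabulary size x vector size (the dictionary file name `word2onehot.pkl` is also an assumption):

```python
import pickle
import numpy as np

# Assumed configuration values; plug in whichever setup you trained.
window_size, model, lr, vec_size = 2, "Skipgram", 0.01, 100
path = f"../Window_{window_size}/{model}/{lr}_{vec_size}weight_numpy.npy"

W = np.load(path)   # weight matrix, assumed shape (vocab_size, vec_size)

with open("word2onehot.pkl", "rb") as f:   # assumed dictionary file name
    word2onehot = pickle.load(f)           # word (str) -> one-hot numpy array

x = word2onehot["market"]   # one-hot vector for an example word
embedding = W.T @ x         # W'X: the dot product picks out that word's embedding
print(embedding.shape)      # -> (vec_size,)
```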