Small NumPy-only trainer: builds a vocabulary from plain text, samples Skip-gram (center, context) pairs with a random window, and learns separate input/output embedding matrices with negative sampling.
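The pair sampling and negative-sampling distribution described above can be sketched as follows. This is a minimal illustration of the technique, not the repository's actual code; the function names `skipgram_pairs` and `noise_distribution` are assumptions:

```python
import numpy as np

def skipgram_pairs(tokens, max_window=5, rng=None):
    """Collect (center, context) token pairs using a per-center random window."""
    rng = rng if rng is not None else np.random.default_rng(0)
    pairs = []
    for i, center in enumerate(tokens):
        # Shrink the window at random per center word, as in the original word2vec
        w = int(rng.integers(1, max_window + 1))
        for j in range(max(0, i - w), min(len(tokens), i + w + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def noise_distribution(counts, power=0.75):
    """Unigram counts raised to 0.75 and normalized: the usual negative-sampling table."""
    p = np.asarray(counts, dtype=float) ** power
    return p / p.sum()
```

Raising counts to the 0.75 power flattens the unigram distribution so rare words are drawn as negatives more often than their raw frequency would allow.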
- Python 3.10+
- numpy
Point training at a UTF-8 text file. The script defaults to `text8.txt` (e.g. the text8 corpus). Place that file in the project directory or pass another path via `run_training(text_path=...)`.
```
python skipgram_neg_sampleing.py
```

`main()` calls `run_training()` with a small `max_chars` and more epochs for a quick demo. For full control (window size, batch size, negative samples, etc.), import `run_training` from `skipgram_neg_sampleing` or extend the script.
| File | Role |
|---|---|
| `dataset.py` | Tokenize text, vocab + counts, negative-sampling distribution, Skip-gram pairs |
| `skipgram_neg_sampleing.py` | Loss, gradients, SGD training loop |
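The loss/gradient/SGD step listed in the table can be illustrated with a minimal NumPy sketch. This is an assumption about the standard skip-gram negative-sampling update, not the file's exact code: for a center vector, one true context vector, and k negative vectors, the loss is -log σ(u_pos·v_c) - Σ log σ(-u_neg·v_c).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(v_c, U_pos, U_neg, lr=0.025):
    """One SGD step of skip-gram negative sampling for a single training pair.

    v_c:   (d,)   input embedding of the center word
    U_pos: (d,)   output embedding of the true context word
    U_neg: (k, d) output embeddings of k sampled negatives
    Returns updated copies of all three plus the loss value.
    """
    pos_score = sigmoid(U_pos @ v_c)   # should be pushed toward 1
    neg_score = sigmoid(U_neg @ v_c)   # should be pushed toward 0
    loss = -np.log(pos_score) - np.log(1.0 - neg_score).sum()

    # Gradients of the loss w.r.t. each parameter block
    g_pos = (pos_score - 1.0) * v_c                       # dL/dU_pos
    g_neg = neg_score[:, None] * v_c                      # dL/dU_neg
    g_c = (pos_score - 1.0) * U_pos + neg_score @ U_neg   # dL/dv_c

    return v_c - lr * g_c, U_pos - lr * g_pos, U_neg - lr * g_neg, loss
```

Note that the input matrix (holding `v_c`) and the output matrix (holding `U_pos`/`U_neg`) are separate, matching the two embedding matrices mentioned above.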
Trained input embeddings are the usual word vectors (`input_emb` returned from `run_training`).
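Once you have the input embedding matrix, nearest neighbors can be queried by cosine similarity. A minimal sketch; the `nearest` helper and the `word2id`/`id2word` lookup dicts are hypothetical, not part of the script:

```python
import numpy as np

def nearest(input_emb, word2id, id2word, query, topn=5):
    """Return the topn words closest to `query` by cosine similarity."""
    # L2-normalize rows so dot products equal cosine similarities
    E = input_emb / np.linalg.norm(input_emb, axis=1, keepdims=True)
    sims = E @ E[word2id[query]]
    best = np.argsort(-sims)
    return [id2word[i] for i in best if i != word2id[query]][:topn]
```

Because only the input matrix is used as the final word vectors, the output matrix can be discarded after training.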