Word2Vec

A small NumPy-based implementation of Word2Vec using skip-gram with negative sampling (SGNS).
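For reference, the standard SGNS formulation from the original Word2Vec work minimises the per-pair loss below. Here c is the centre word, o a context word, v_c the input (centre) vector, u_o the output (context) vector, K the number of negatives n_k drawn from a noise distribution P_n, and σ the sigmoid. The 3/4 power on the unigram frequency f(w) is the common default; whether this project uses exactly this loss and exponent is an assumption, since the README does not spell it out.

```latex
\mathcal{L}(c, o) = -\log \sigma(\mathbf{u}_o^\top \mathbf{v}_c)
  - \sum_{k=1}^{K} \log \sigma(-\mathbf{u}_{n_k}^\top \mathbf{v}_c),
  \qquad n_k \sim P_n(w) \propto f(w)^{3/4}

\frac{\partial \mathcal{L}}{\partial \mathbf{v}_c}
  = \big(\sigma(\mathbf{u}_o^\top \mathbf{v}_c) - 1\big)\,\mathbf{u}_o
  + \sum_{k=1}^{K} \sigma(\mathbf{u}_{n_k}^\top \mathbf{v}_c)\,\mathbf{u}_{n_k}

\frac{\partial \mathcal{L}}{\partial \mathbf{u}_o}
  = \big(\sigma(\mathbf{u}_o^\top \mathbf{v}_c) - 1\big)\,\mathbf{v}_c,
\qquad
\frac{\partial \mathcal{L}}{\partial \mathbf{u}_{n_k}}
  = \sigma(\mathbf{u}_{n_k}^\top \mathbf{v}_c)\,\mathbf{v}_c
```

Each SGD step therefore touches one row of the input matrix and K+1 rows of the output matrix, which is what makes negative sampling cheap compared with a full softmax.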

Project Structure

word2vec/
|-- main.py
|-- requirements.txt
|-- README.md
|-- word2vec/
|   |-- __init__.py
|   |-- corpus.py
|   |-- vocab.py
|   |-- data.py
|   |-- model.py
|   |-- trainer.py
|   `-- evaluate.py
`-- tests/
    `-- test_word2vec.py

Setup

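Create and activate a virtual environment, then install dependencies. The activation command below is for Windows PowerShell; on macOS or Linux, run source .venv/bin/activate instead: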

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Run

Train the demo model from the project root:

python main.py

The script will:

Build a vocabulary from the sample corpus.
Generate skip-gram pairs.
Train embeddings with negative sampling.
Print nearest neighbours for a few probe words.
Save weights to word2vec_weights.npz.
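The steps above can be sketched in a few dozen lines of NumPy. This is a minimal illustration, not the project's code: the function names (skipgram_pairs, sgns_step, train), the hyperparameter defaults, the zero-initialised output matrix, the unigram^0.75 noise distribution and the toy sentence are all assumptions.

```python
import numpy as np


def skipgram_pairs(ids, window):
    """Return (center, context) id pairs for every context within `window` positions."""
    pairs = []
    for i, center in enumerate(ids):
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, ids[j]))
    return pairs


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_step(W_in, W_out, center, context, negatives, lr):
    """One SGD update on -log s(u_o.v_c) - sum log s(-u_n.v_c); returns the loss."""
    v_c = W_in[center].copy()                        # keep pre-update centre vector
    targets = np.concatenate(([context], negatives))  # positive first, then K negatives
    labels = np.zeros(len(targets))
    labels[0] = 1.0
    u = W_out[targets]                               # (K+1, dim), fancy index = copy
    scores = sigmoid(u @ v_c)                        # (K+1,)
    grad = scores - labels                           # dLoss / d(dot product)
    loss = -np.log(scores[0] + 1e-10) - np.sum(np.log(1.0 - scores[1:] + 1e-10))
    W_in[center] -= lr * (grad @ u)
    # np.add.at accumulates correctly when the same id is sampled more than once
    np.add.at(W_out, targets, -lr * np.outer(grad, v_c))
    return loss


def train(ids, vocab_size, dim=16, window=2, k=5, lr=0.05, epochs=50, seed=0):
    """Plain SGD over shuffled skip-gram pairs; returns weights and mean loss per epoch."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(0.0, 0.1, (vocab_size, dim))
    W_out = np.zeros((vocab_size, dim))
    counts = np.bincount(ids, minlength=vocab_size).astype(float)
    noise = counts ** 0.75                           # assumed unigram^0.75 noise
    noise /= noise.sum()
    pairs = skipgram_pairs(ids, window)
    losses = []
    for _ in range(epochs):
        total = 0.0
        for idx in rng.permutation(len(pairs)):
            c, o = pairs[idx]
            negs = rng.choice(vocab_size, size=k, p=noise)
            total += sgns_step(W_in, W_out, c, o, negs, lr)
        losses.append(total / len(pairs))
    return W_in, W_out, losses


# Toy corpus, made up for illustration only.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(tokens))
word2id = {w: i for i, w in enumerate(vocab)}
ids = np.array([word2id[w] for w in tokens])
W_in, W_out, losses = train(ids, len(vocab))
print(f"mean loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

Saving could then be a single call such as np.savez("word2vec_weights.npz", W_in=W_in, W_out=W_out); the array names inside the archive are hypothetical and may differ from what main.py writes.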

Tests

Run the full test suite with:

python -m pytest -q

Current status: 39 tests passing.

Modules

word2vec/corpus.py: sample corpus text and tokeniser.
word2vec/vocab.py: vocabulary lookup tables and noise distribution.
word2vec/data.py: skip-gram pair generation helpers.
word2vec/model.py: sigmoid, Word2Vec parameters, SGNS update step.
word2vec/trainer.py: training loop and hyperparameter config.
word2vec/evaluate.py: cosine similarity and nearest-neighbour utilities.
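As a rough sketch of what the vocabulary and evaluation helpers listed above typically do, the code below shows a tokeniser, lookup tables, a noise distribution and cosine nearest neighbours. All names (tokenize, build_vocab, noise_distribution, nearest_neighbours), the regex tokeniser, the frequency-sorted id order and the 0.75 exponent are assumptions rather than the project's API; the toy 2-D vectors exist only to exercise the ranking.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lower-case and keep runs of ASCII letters (a simple assumed tokeniser)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(tokens, min_count=1):
    """Return word->id, id->word and per-id counts, most frequent word first."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    words = [w for w, c in ranked if c >= min_count]
    word2id = {w: i for i, w in enumerate(words)}
    freqs = np.array([counts[w] for w in words], dtype=float)
    return word2id, words, freqs


def noise_distribution(freqs, power=0.75):
    """Unigram counts raised to `power`, normalised to sum to 1."""
    weights = freqs ** power
    return weights / weights.sum()


def nearest_neighbours(word, embeddings, word2id, id2word, top_k=3):
    """Rank all other words by cosine similarity to `word`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)     # avoid division by zero
    query = word2id[word]
    sims = unit @ unit[query]
    order = [i for i in np.argsort(-sims) if i != query]
    return [(id2word[i], float(sims[i])) for i in order[:top_k]]


tokens = tokenize("The cat sat on the mat. The dog sat on the rug.")
word2id, id2word, freqs = build_vocab(tokens)
print(id2word)
print(noise_distribution(freqs).round(3))

# Toy 2-D vectors, made up purely to exercise the cosine ranking.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
w2i = {"king": 0, "queen": 1, "apple": 2}
print(nearest_neighbours("king", emb, w2i, ["king", "queen", "apple"], top_k=2))
```

Raising counts to a power below 1 flattens the distribution, so frequent words such as "the" are still sampled as negatives more often than rare ones, but less overwhelmingly than their raw frequency would imply.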
