SyedSameerFaisall/word2vec

Word2Vec

A small NumPy implementation of Word2Vec using skip-gram with negative sampling (SGNS).

Project Structure

word2vec/
|-- main.py
|-- requirements.txt
|-- README.md
|-- word2vec/
|   |-- __init__.py
|   |-- corpus.py
|   |-- vocab.py
|   |-- data.py
|   |-- model.py
|   |-- trainer.py
|   `-- evaluate.py
`-- tests/
    `-- test_word2vec.py

Setup

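Create and activate a virtual environment, then install dependencies. The activation command shown is for Windows PowerShell; on macOS or Linux, use source .venv/bin/activate instead: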

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Run

Train the demo model from the project root:

python main.py

The script will:

  1. Build a vocabulary from the sample corpus.
  2. Generate skip-gram pairs.
  3. Train embeddings with negative sampling.
  4. Print nearest neighbours for a few probe words.
  5. Save weights to word2vec_weights.npz.
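Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration and not the code in word2vec/data.py or word2vec/model.py: the function names skipgram_pairs and sgns_step, the window and lr parameters, and the W_in / W_out matrix names are all assumptions made for the example.

```python
import numpy as np

def skipgram_pairs(token_ids, window=2):
    """Return (centre, context) index pairs within a symmetric window."""
    pairs = []
    for i, centre in enumerate(token_ids):
        lo = max(0, i - window)
        hi = min(len(token_ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, token_ids[j]))
    return pairs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(W_in, W_out, centre, context, negatives, lr=0.05):
    """Apply one SGD update for a single pair; return the loss before the update."""
    v = W_in[centre].copy()
    # The true context word comes first, followed by the sampled negatives.
    targets = np.concatenate(([context], negatives)).astype(np.int64)
    labels = np.zeros(len(targets))
    labels[0] = 1.0
    U = W_out[targets]                 # (k+1, dim); fancy indexing copies
    scores = sigmoid(U @ v)
    loss = -np.log(scores[0] + 1e-10) - np.sum(np.log(1.0 - scores[1:] + 1e-10))
    grad = scores - labels             # d(loss)/d(logit) for each target
    W_in[centre] -= lr * (grad @ U)
    # subtract.at accumulates correctly even if a target index repeats.
    np.subtract.at(W_out, targets, lr * np.outer(grad, v))
    return loss
```

Repeating sgns_step on the same pair with a small learning rate should push the loss down, which is a quick sanity check for a hand-written gradient.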

Tests

Run the full test suite with:

python -m pytest -q

Current status: 39 tests passing.

Modules

  • word2vec/corpus.py: sample corpus text and tokeniser.
  • word2vec/vocab.py: vocabulary lookup tables and noise distribution.
  • word2vec/data.py: skip-gram pair generation helpers.
  • word2vec/model.py: sigmoid, Word2Vec parameters, SGNS update step.
  • word2vec/trainer.py: training loop and hyperparameter config.
  • word2vec/evaluate.py: cosine similarity and nearest-neighbour utilities.
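To give a feel for what the vocabulary and evaluation helpers might do, here is a hedged sketch of a noise distribution and a cosine nearest-neighbour lookup. The names noise_distribution and nearest_neighbours are hypothetical, and the 0.75 exponent is the value commonly used for negative sampling, assumed here and not confirmed for this repository.

```python
import numpy as np

def noise_distribution(counts, power=0.75):
    """Raise unigram counts to `power` and normalise so they sum to 1."""
    weights = np.asarray(counts, dtype=np.float64) ** power
    return weights / weights.sum()

def nearest_neighbours(embeddings, query_idx, k=3):
    """Return indices of the k rows most cosine-similar to row query_idx."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)   # guard against zero rows
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf                      # exclude the query itself
    return np.argsort(-sims)[:k]
```

An exponent below 1 flattens the distribution, so rare words are drawn as negatives more often than their raw frequency would suggest.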
