Tokenizer Token Viewer

This repository houses an implementation of the Byte Pair Encoding (BPE) algorithm for tokenization, with several levels of optimization. BPE originated as a data compression technique and is now widely used in natural language processing to break text into smaller subword units. It starts from individual characters or bytes and repeatedly merges the most frequent adjacent pair into a new token. The resulting vocabulary represents common words compactly while still covering rare words as sequences of smaller pieces, which helps NLP tasks such as machine translation, language modeling, and text generation.
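The merge loop described above can be sketched in a few lines of TypeScript. This is a minimal, unoptimized illustration and not the code in `src`: the names `countPairs`, `mergePair`, and `trainBpe` are hypothetical, new token ids are assumed to start at 256 (one past the largest byte value), and ties between equally frequent pairs are broken by first occurrence.

```typescript
// Hypothetical helper names for illustration; not taken from this repository's source.
type Merge = { pair: [number, number]; id: number };

// Count how often each adjacent pair of token ids occurs.
function countPairs(ids: number[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (let i = 0; i + 1 < ids.length; i++) {
    const key = `${ids[i]},${ids[i + 1]}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Replace every non-overlapping occurrence of `pair`, scanning left to right, with `id`.
function mergePair(ids: number[], pair: [number, number], id: number): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < ids.length) {
    if (i + 1 < ids.length && ids[i] === pair[0] && ids[i + 1] === pair[1]) {
      out.push(id);
      i += 2;
    } else {
      out.push(ids[i]);
      i += 1;
    }
  }
  return out;
}

// Learn up to `numMerges` merges over a sequence of byte values (0-255).
// Assumption: new token ids start at 256; training stops early once no pair repeats.
function trainBpe(bytes: number[], numMerges: number): { ids: number[]; merges: Merge[] } {
  let ids = bytes.slice();
  const merges: Merge[] = [];
  for (let step = 0; step < numMerges; step++) {
    const counts = countPairs(ids);
    let bestKey: string | null = null;
    let bestCount = 0;
    for (const key of Object.keys(counts)) {
      if (counts[key] > bestCount) {
        bestKey = key;
        bestCount = counts[key];
      }
    }
    if (bestKey === null || bestCount < 2) break; // nothing left worth merging
    const parts = bestKey.split(",").map(Number);
    const pair: [number, number] = [parts[0], parts[1]];
    const id = 256 + merges.length;
    ids = mergePair(ids, pair, id);
    merges.push({ pair, id });
  }
  return { ids, merges };
}
```

To try it on real text, the input bytes can be obtained with `Array.from(new TextEncoder().encode(text))`. Recounting every pair on each step keeps the sketch short but costs a full pass per merge; optimized variants typically avoid that by updating counts incrementally.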

Basic BPE Implementation.

❤️ This work is mostly inspired by the YouTube video on tokenization by the talented Andrej Karpathy (Twitter).

Deployed version

bpetokensviewer


LahiaOmar/tokens_viewer

Folders and files

Latest commit

History

Repository files navigation

Tokenizer Token Viewer

Deployed version

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages