# tokenization

Here are 813 public repositories matching this topic...

CompLin / nheengatu

Tools and resources for the computational processing of Nheengatu (Modern Tupi)

natural-language-processing dictionary tokenizer computational-linguistics corpus-linguistics pos-tagger tokenization nheengatu modern-tupi

Updated Jun 8, 2024
Python


jshuadvd / LongRoPE

Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"

nlp machine-learning natural-language-processing ai deep-learning transformers artificial-intelligence gpt language-model natural-language-inference natural tokenization natural-language-understanding attention-is-all-you-need attention-mechanisms transformer-architecture natural-language-procressing tokenizers llm

Updated Jun 8, 2024
Python
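LongRoPE builds on rotary position embeddings (RoPE), which rotate each pair of query/key dimensions by an angle proportional to the token position. Below is a minimal Python sketch of that rotation. It adds an optional per-pair rescale factor to stand in for the per-dimension rescaling that LongRoPE searches for. The factor values, the interleaved pair layout, and the function names are assumptions for illustration. This is not the repository's code, and the paper's factor search is not reproduced.

```python
import math


def rope_inv_freqs(dim, base=10000.0, rescale=None):
    """Inverse frequencies theta_i = base^(-2i/dim) for i in [0, dim/2).

    `rescale` is an optional list of per-pair factors lambda_i; dividing
    theta_i by lambda_i stretches that pair's wavelength. This is a simplified
    stand-in for LongRoPE's searched, non-uniform factors (assumption).
    """
    if dim % 2:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [base ** (-2 * i / dim) for i in range(half)]
    if rescale is not None:
        if len(rescale) != half:
            raise ValueError("need one rescale factor per dimension pair")
        freqs = [f / lam for f, lam in zip(freqs, rescale)]
    return freqs


def apply_rope(x, pos, inv_freqs):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * inv_freqs[i].

    The interleaved pair layout is an assumption; some implementations
    rotate the first half of the vector against the second half instead.
    """
    if len(x) != 2 * len(inv_freqs):
        raise ValueError("vector length must be 2 * number of frequencies")
    out = []
    for i, f in enumerate(inv_freqs):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * f), math.sin(pos * f)
        out.extend([a * c - b * s, a * s + b * c])
    return out
```

With every factor set to 2, position 14 under the rescaled frequencies receives exactly the rotation that position 7 receives under the original ones. That is the uniform position-interpolation special case. Non-uniform factors stretch some dimensions more than others.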

AgentOps-AI / tokencost

Easy token price estimates for LLMs

analytics price openai token price-tracker observability tokenization claude large-language-models llm

Updated Jun 8, 2024
Python
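A price estimate of this kind comes down to multiplying token counts by per-token rates. Prompt (input) and completion (output) tokens usually have separate rates. The minimal Python sketch below assumes the token counts are already known. It uses a hypothetical model name and placeholder prices, not tokencost's API or any provider's real rates.

```python
from decimal import Decimal

# Hypothetical price table in USD per 1M tokens; placeholder values, not real rates.
PRICES_PER_MILLION = {
    "example-model": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Estimated USD cost = sum of tokens * (price per 1M tokens) / 1,000,000."""
    rates = PRICES_PER_MILLION[model]
    million = Decimal(1_000_000)
    total = prompt_tokens * rates["input"] + completion_tokens * rates["output"]
    return total / million
```

`Decimal` is used instead of `float` so that sub-cent amounts add up exactly. A real estimator would also need the model's own tokenizer to produce the counts.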

sytelus / nanuGPT

Simple, reliable and well-tested training code for quick experiments with transformer-based models

machine-learning ai ml pytorch transformer llama gpt tokenization llm llama2

Updated Jun 7, 2024
Jupyter Notebook

Hk669 / bpetokenizer

(Python package) Train your own BPE-based tokenizer for LLMs (supports regex split patterns and special tokens)

tokenizer regex-pattern tokenization vocab bpe gpt-4 llms

Updated Jun 7, 2024
Jupyter Notebook
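At its core, BPE training is a loop that repeatedly finds the most frequent adjacent pair of token ids and merges that pair into a new id. Below is a minimal byte-level sketch in Python. It omits the regex pre-splitting and special-token handling that bpetokenizer advertises, and none of the function names come from that package.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count adjacent token-id pairs in a sequence."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` (left to right) with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges from the UTF-8 bytes of `text`.

    Ids 0-255 are raw bytes; each merge gets the next id from 256 upward.
    Returns (merge table {(left, right): new_id}, final id sequence).
    """
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        # Most frequent pair; max() keeps the first-seen pair on ties.
        pair = max(counts, key=counts.get)
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges, ids


def encode(text, merges):
    """Tokenize new text by replaying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids
```

For example, `train_bpe("aaabdaaabac", 3)` learns the merges `(97, 97) -> 256`, `(256, 97) -> 257` and `(257, 98) -> 258`, and the training text becomes `[258, 100, 258, 97, 99]`.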

rosette-api / curl-examples

cURL examples for the Rosette API

nlp natural-language-processing text-mining curl morphology sentiment text-analytics tokenization categorization lemmatization relation-extraction entity-extraction text-embedding

Updated Jun 7, 2024
Shell

rosette-api / shell

Shell scripts for accessing the Rosette API endpoints

nlp machine-learning natural-language-processing text-mining sentiment text-analysis shell-script bash-script text-analytics tokenization shell-scripting categorization linked-entities entity-linking entity-extraction name-translation

Updated Jun 7, 2024

Basis-Theory / developers.basistheory.com

Basis Theory Developer Documentation

security documentation docs encryption tokenization

Updated Jun 7, 2024
JavaScript

DhairyaC / News-Article-Classification

Use Hugging Face Transformers to classify news articles

tensorflow keras tokenization huggingface-transformers albert-transformer

Updated Jun 7, 2024
Jupyter Notebook

zouharvi / tokenization-scorer

Simple-to-use scoring function for arbitrarily tokenized texts.

segmentation tokenization subword bpe

Updated Jun 6, 2024
Python
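A scorer of this kind reduces a tokenized corpus to a single number. One plausible choice, assumed here for illustration, is the Rényi entropy of the unigram token distribution divided by the log of the vocabulary size. This score rewards tokenizations that use their vocabulary evenly. The Python sketch below implements that formula directly. The package's actual metrics, defaults, and API may differ, and `alpha=2.5` is an assumed value.

```python
import math
from collections import Counter


def renyi_efficiency(tokenized_sentences, alpha=2.5):
    """Rényi entropy of the unigram token distribution, normalized by log |V|.

    H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha); the result lies in [0, 1].
    alpha=2.5 is an assumed default for illustration.
    """
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    total = sum(counts.values())
    vocab = len(counts)
    if vocab < 2:
        return 0.0  # log |V| is zero, so the ratio is undefined; report 0
    probs = [c / total for c in counts.values()]
    if alpha == 1.0:
        entropy = -sum(p * math.log(p) for p in probs)  # Shannon limit of Rényi
    else:
        entropy = math.log(sum(p ** alpha for p in probs)) / (1 - alpha)
    return entropy / math.log(vocab)
```

A perfectly uniform token distribution scores 1.0, and skewed distributions score lower. Setting `alpha=1` falls back to Shannon entropy.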

adbar / simplemma

Simple multilingual lemmatizer for Python, designed for speed and efficiency

nlp tokenizer language-detection wordlist lemmatizer morphological-analysis lemmatiser tokenization lemmatization corpus-tools language-identification low-resource-nlp

Updated Jun 6, 2024
Python

bminixhofer / zett

Code for Zero-Shot Tokenizer Transfer

multilingual transfer-learning language-model tokenization llm llms

Updated Jun 8, 2024
Python

MayankTamakuwala / Vision_Search_Engine

Vision Search Engine is a search engine that combines multiple retrieval techniques (boolean, ranked, and probabilistic retrieval, plus phrase and wildcard queries) to return accurate, relevant results.

tokenization normalization boolean-retrieval variable-byte-encoding api-integration k-means-clustering probabilistic-retrieval ranked-retrieval positional-indexing real-time-indexing stopword-removal on-disk-indexing stemming-and-lemmatization synonym-handling phrase-searching wildcard-queries multi-threaded-query-processing

Updated Jun 6, 2024
Python

brave / tokenizer

A modular resource tokenization service.

go tokenization anonymization

Updated Jun 5, 2024
Go
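In this entry, "tokenization" has its data-security meaning: replacing sensitive values with opaque substitutes. A keyed hash is one common way to do that deterministically. The Python sketch below uses HMAC-SHA256 to illustrate that approach. It is not the design of brave/tokenizer, and the class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class KeyedTokenizer:
    """Deterministically replace sensitive values with opaque tokens.

    Illustrative sketch: each token is HMAC-SHA256(key, value), so one value
    always maps to the same token under a given key, and without the key a
    token cannot feasibly be linked back to its value.
    """

    def __init__(self, key: Optional[bytes] = None):
        # Generate a random 256-bit key when none is supplied.
        self._key = key if key is not None else secrets.token_bytes(32)

    def tokenize(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()  # 64 hex characters
```

Because the mapping is one-way, this approach suits anonymization where equal inputs must remain linkable. Vault-based tokenization, which stores a token-to-value table so that values can be recovered later, is a different design.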

possible-worlds-research / wikiloader

A package to download and preprocess a Wikipedia dump, in any language.

nlp wikipedia distributional-semantics tokenization

Updated Jun 5, 2024
Python

johnexzy / tokenpass-contract

Create and manage token-gated content, earn from it, and track subscriptions to it.

blockchain tokenization token-gating

Updated Jun 5, 2024
Solidity

explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

python nlp data-science machine-learning natural-language-processing ai deep-learning neural-network text-classification cython artificial-intelligence spacy named-entity-recognition neural-networks nlp-library tokenization entity-linking

Updated Jun 5, 2024
Python

TheKidPadra / DeepLearning.AI-TensorFlow_Developer-specialization

This repo contains my work and the code base for the TensorFlow Developer specialization offered by DeepLearning.AI.

machine-learning natural-language-processing computer-vision time-series tensorflow coursera prediction forecasting convolutional-neural-network tokenization augmentation specialization assignments rnns naturallanguageprocessing inductive-transfer dropouts

Updated Jun 5, 2024
Jupyter Notebook

SmartTokenLabs / TokenScript

TokenScript schema, specs and paper

security cryptography mobile xml blockchain tokens web3 tokenization tokenisation

Updated Jun 5, 2024
JavaScript

TI-Toolkit / tivars_lib_py

A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files

python pil file-format z80 calculators ez80 texas-instruments texas-instruments-calculators tokenize tokenization

Updated Jun 4, 2024
Python
