Tools and resources for the computational processing of Nheengatu (Modern Tupi)
Updated Jun 8, 2024 - Python
Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"
Easy token price estimates for LLMs
Simple, reliable, and well-tested training code for quick experiments with transformer-based models
Python package for training your own BPE-based tokenizer for LLMs (supports regex patterns and special tokens)
cURL examples for the Rosette API
Shell scripts for accessing the Rosette API endpoints
Basis Theory Developer Documentation
Use Hugging Face Transformers to classify news articles
Simple-to-use scoring function for arbitrarily tokenized texts.
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Code for Zero-Shot Tokenizer Transfer
Vision Search Engine is a search engine project that combines several algorithms and techniques to support a wide range of search functions and return precise, relevant results.
A package to download and preprocess a Wikipedia dump, in any language.
Create, manage, and earn from gated content, and track subscriptions to that content.
💫 Industrial-strength Natural Language Processing (NLP) in Python
This repo contains my work and the code base for the TensorFlow Developer specialization offered by DeepLearning.AI
TokenScript schema, specs and paper
A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files