text-tokenization

Here are 13 public repositories matching this topic...

alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript

tokenizer vocabulary vocabulary-builder tokenize tokenization tokenisation tokenizing text-tokenization vocabulary-generator

Updated Jan 28, 2024
Go

twardoch / split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

Updated May 16, 2024
Python
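
To illustrate the idea of splitting Markdown by a token limit, here is a minimal sketch. It is not the split-markdown4gpt implementation: the function names `split_markdown` and `count_tokens_approx` are hypothetical, and it assumes that whitespace-separated words are an acceptable stand-in for model tokens. A real tokenizer could be passed in through `count_tokens`.

```python
import re
from typing import Callable, List


def count_tokens_approx(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def split_markdown(
    text: str,
    limit: int,
    count_tokens: Callable[[str], int] = count_tokens_approx,
) -> List[str]:
    # Split just before each ATX heading so every section keeps its heading.
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for section in sections:
        candidate = current + section
        if current and count_tokens(candidate) > limit:
            # Adding this section would exceed the limit: close the chunk.
            chunks.append(current)
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single section larger than the limit is kept whole, not cut.
    return chunks
```

Sections are packed greedily, so a chunk only ends at a heading boundary. Cutting inside an oversized section would need an extra rule that this sketch leaves out.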

SayamAlt / Resume-Classification-using-fine-tuned-BERT

A resume classification model that assigns a person's resume to its corresponding job category with an accuracy of more than 99%.

nlp exploratory-data-analysis word-embeddings model-evaluation text-preprocessing bert-model text-tokenization fine-tuning-bert

Updated Jan 13, 2023
Jupyter Notebook

victoryosiobe / kingchop

Kingchop ⚔️ is a JavaScript library for tokenizing (chopping) English text. It applies an extensive set of tokenization rules, which you can adjust easily.

nodejs javascript natural-language-processing text-processing sentence-tokenizer text-tokenization word-tokenizer tokenizers paragraph-tokenizer

Updated Jan 22, 2024
JavaScript

Software-Research-Lab / dropsuit-tok

The tok function is a JavaScript and Node.js function that processes object instances and tokenizes text arrays. It returns the number of tokenized words, an array of the tokenized words, and the tokenized words as a concatenated string. It is part of the open-source DropSuit NLP library, released under the Apache License 2.0.

text-analysis text-processing language-understanding text-tokenization

Updated May 1, 2023
JavaScript

SayamAlt / News-Category-Classification

A news category classification model built on fine-tuned BERT that classifies a news text into one of four categories: Politics, Business, Technology, or Entertainment.

nlp text-classification exploratory-data-analysis feature-engineering model-evaluation text-cleaning text-preprocessing bert-embeddings text-tokenization fine-tuning-bert

Updated Jan 17, 2023
Jupyter Notebook

SayamAlt / Customer-Support-Chatbot-using-NLTK

Successfully developed a chatbot model which can provide accurate and concise responses to a wide variety of customer queries regarding the services offered by a particular company as well as general topics.

nlp deep-neural-networks deep-learning nltk chatbots text-tokenization

Updated Mar 29, 2023
Python

SayamAlt / Global-News-Headlines-Text-Summarization

Successfully established a text summarization model using Seq2Seq modeling with Luong Attention, which can give a short and concise summary of the global news headlines.

natural-language-processing text-generation text-summarization attention-mechanism seq2seq-model luong-attention text-tokenization model-inference model-architecture-and-implementation data-exploration-and-preprocessing

Updated May 6, 2024
Jupyter Notebook

SayamAlt / Financial-News-Sentiment-Analysis

A fine-tuned DistilBERT transformer model that predicts the overall sentiment of a piece of financial news with an accuracy of nearly 81.5%.

natural-language-processing sentiment-analysis multiclass-classification text-preprocessing text-tokenization distilbert-model hugging-face-transformers fine-tune-bert-tensorflow model-inference model-architecture-and-implementation model-training-and-evaluation data-exploration-and-preprocessing

Updated May 6, 2024
Jupyter Notebook

SayamAlt / Symptoms-Disease-Text-Classification

A fine-tuned BERT transformer model that classifies symptoms into their corresponding diseases with an accuracy of 89%.

natural-language-processing text-classification exploratory-data-analysis multiclass-classification text-preprocessing text-tokenization bert-fine-tuning hugging-face-transformers fine-tune-bert-tensorflow model-inference model-architecture-and-implementation model-training-and-evaluation data-exploration-and-preprocessing

Updated May 6, 2024
Jupyter Notebook

cedrickchee / tokenizers

💥Fast State-of-the-Art Tokenizers optimized for Research and Production

nlp natural-language-processing transformers gpt language-model bert natural-language-understanding text-tokenization

Updated Jan 15, 2020
Rust

SayamAlt / English-to-Spanish-Language-Translation-using-Seq2Seq-and-Attention

A Seq2Seq model with attention that performs English-to-Spanish translation with an accuracy of almost 97%.

natural-language-processing language-translation exploratory-data-analysis text-generation neural-machine-translation attention-model attention-is-all-you-need text-preprocessing luong-attention text-tokenization seq2seq-modeling fine-tuning-bert bert-transformer hugging-face-transformers model-inference model-architecture-and-implementation model-training-and-evaluation

Updated May 6, 2024
Jupyter Notebook

Aalaa4444 / Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.

tokenizer text-extraction requests data-extraction beautifulsoup text-processing tokenization stemming lemmatization stopwords-removal text-cleaning text-normalization extract-html text-tokenization text-lemmatization

Updated Apr 5, 2024
Jupyter Notebook
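
The pipeline described for this notebook (extract text, clean, normalize, tokenize, remove stop words, collect unique words) can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: it uses the standard-library `html.parser` instead of requests and BeautifulSoup, the `TextExtractor` class, `unique_words` function, and tiny `STOP_WORDS` set are hypothetical, and lemmatization or stemming is left out.

```python
import re
from html.parser import HTMLParser
from typing import List, Set

# Assumption: a tiny illustrative stop-word list, not a real one.
STOP_WORDS: Set[str] = {"a", "an", "and", "is", "of", "the", "to"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def unique_words(html: str) -> List[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Normalize: join the text fragments and lowercase them.
    text = " ".join(parser.parts).lower()
    # Tokenize: keep runs of ASCII letters only (a simplification).
    tokens = re.findall(r"[a-z]+", text)
    # Remove stop words, deduplicate, and sort for a stable result.
    return sorted({t for t in tokens if t not in STOP_WORDS})
```

Sorting the result is a convenience for stable output; a set would do if order does not matter.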
