Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Updated May 8, 2024 - Python
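Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods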
🧹 Python package for text cleaning
Tools for cleaning and normalizing text data
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.
NLP pre-/post-processing tools.
Text preprocessing tools in Python.
A Python toolkit for file processing, text cleaning and data splitting.
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.
Korean text data preprocessing toolkit for NLP
A Python package to get useful information from documents using the TopicRank algorithm.
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
JS / Python3 / PHP library for working with UTF-8 polytonic Greek and Latin
4th place (top 1%) solution for Shopee Code League 2020 - Product Detection
Remove extra whitespace from text.
Common Text Pre-Processing for Portuguese
Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim, textblob, unidecode, autocorrect, pyspellchecker
Corpora and scripts for cleaning political science texts. Scripts are translated into transformations that support SAGE Texti.