datawords

datawords is a library for common and uncommon NLP tasks.

Datawords emerged after two years of working on projects that required NLP techniques such as training and saving Word2Vec models (Gensim), finding entities in text (spaCy), ranking texts (scikit-network), indexing them (Spotify Annoy), and translating them (Hugging Face).

Using those libraries also required pre-processing, post-processing, and other transformations. For these reasons, datawords exists.

Sometimes it is very opinionated (indexing happens over text and not over vectors, even though Annoy allows it), and sometimes it gives you freedom by providing helper classes and functions to use as you see fit.

Another way to see this library is as an aggregator of the excellent libraries mentioned above.

In a nutshell, Datawords lets you:

Train Word2Vec models (Gensim)
Build Indexes for texts (Annoy, SQLite)
Translate texts (Transformers)
Rank texts (PageRank)
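
To make the "Rank texts (PageRank)" item more concrete, here is a minimal standard-library sketch of the general idea: texts become nodes, word overlap becomes edge weight, and PageRank scores the nodes. It is not the datawords API (which relies on scikit-network); the names `tokenize`, `cosine`, and `rank_texts` are hypothetical and exist only for this illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_texts(texts, damping=0.85, iterations=50):
    """Return one PageRank score per text (hypothetical helper, not datawords)."""
    bags = [Counter(tokenize(t)) for t in texts]
    n = len(bags)
    if n == 0:
        return []
    # Weighted similarity graph without self-loops.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Texts with no similar neighbours spread their score evenly.
        dangling = sum(scores[i] for i in range(n) if out_sum[i] == 0) / n
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * (incoming + dangling))
        scores = new_scores
    return scores


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat ate the fish", "stock markets fell today"]
    for score, doc in sorted(zip(rank_texts(docs), docs), reverse=True):
        print(round(score, 3), doc)
```

Texts that share vocabulary with many other texts end up with higher scores, while isolated texts stay close to the baseline.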

Table of Contents

Installation
Quickstart
License

Installation

pip install datawords

To use transformers from Hugging Face, install the optional extra:

pip install datawords[transformers]

Quickstart

deepnlp:

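from datawords.deepnlp import translators
# Build the model name for the Spanish-to-English pair.
mn = translators.build_model_name("es", "en")
# Assumes transform_mp lives in translators and fp is the path to a local model (define it first).
rsp = translators.transform_mp("es", "en", model_path=fp, texts=["hola mundo", "adios mundo", "notias eran las de antes", "Messi es un dios para muchas personas"])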
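
For readers who want to see what a translation step of this kind can look like underneath, here is a hedged sketch that calls Hugging Face Transformers directly. It is not datawords' own implementation. It assumes the Helsinki-NLP OPUS-MT checkpoint naming scheme, and both `build_model_name` and `translate` below are hypothetical stand-ins defined only for this example. Running `translate` requires the `transformers`, `torch`, and `sentencepiece` packages and downloads the model on first use.

```python
def build_model_name(source, target):
    """Hypothetical helper: assumes the Helsinki-NLP OPUS-MT naming scheme."""
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


def translate(texts, source="es", target="en"):
    """Translate a batch of texts with a MarianMT checkpoint."""
    # Imported lazily so this module loads even without the optional extra.
    from transformers import MarianMTModel, MarianTokenizer

    name = build_model_name(source, target)
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)
    # Pad to the longest text so the batch forms a single tensor.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


if __name__ == "__main__":
    print(translate(["hola mundo", "adios mundo"]))
```

Batching the texts in one call, as the Quickstart snippet also does, avoids reloading the model for every sentence.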

License

datawords is distributed under the terms of the MPL-2.0 license.
