Text Clustering

This repository contains tools for embedding and clustering Tibetan-language texts, labeling the resulting clusters, and visualizing the labeled clusters.

Install

Install the library to get started:

pip install --upgrade bocluster

Usage

The code block below runs the full pipeline, from loading a dataset to visualizing the clusters.

from datasets import load_dataset
from bocluster.cluster import BoClusterClassifier

# load a Tibetan language text dataset
ds = load_dataset('billingsmoore/LotsawaHouse-bo-en', split='train')

# initialize a BoClusterClassifier object
bcc = BoClusterClassifier()

# fit the classifier on a set of texts
bcc.fit(ds['bo'][:1000])

# optional: assign every data point to a cluster so that none are treated as outliers
bcc.classify_outliers()

# show a visualization of results
bcc.show()
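
The usage example does not say how classify_outliers decides which cluster an outlier joins. As an illustration of the general idea only, and not of bocluster's actual implementation, the sketch below assigns each outlier to the cluster with the nearest centroid. The function name reassign_outliers, the toy 2-D points, and the convention that outliers carry the label -1 are all assumptions made for this example.

```python
import math


def reassign_outliers(points, labels, outlier_label=-1):
    """Return a new label list with each outlier moved to the nearest cluster centroid.

    Hypothetical helper for illustration; not part of the bocluster API.
    """
    # Group the non-outlier points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        if label != outlier_label:
            clusters.setdefault(label, []).append(point)

    # With no clusters at all there is nothing to assign outliers to.
    if not clusters:
        return list(labels)

    # Centroid of each cluster: the per-dimension mean of its members.
    centroids = {
        label: [sum(dim) / len(members) for dim in zip(*members)]
        for label, members in clusters.items()
    }

    new_labels = []
    for point, label in zip(points, labels):
        if label == outlier_label:
            # Pick the cluster whose centroid is closest in Euclidean distance.
            label = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        new_labels.append(label)
    return new_labels


# Toy data: two clusters plus two outliers (label -1).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (1.0, 1.0), (9.0, 9.0)]
labels = [0, 0, 1, 1, -1, -1]
new_labels = reassign_outliers(points, labels)
print(new_labels)
```

In a real pipeline the points would be high-dimensional text embeddings rather than 2-D tuples, and a different distance measure or assignment rule may be more appropriate.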
