limit1 and limit2.
limit1 to 1 and limit2 to sample size.
fix_outliers allows labeling outliers with their closest clusters via mstree edges.
max_ranking controls the balance between precision and performance; beyond a certain value, the precision and the result no longer change.
algorithm can be set to 'slow' to further enhance the precision.
import sklearn.datasets as datasets
import druhg
iris = datasets.load_iris()
XX = iris['data']
clusterer = druhg.DRUHG(max_ranking=50)
labels = clusterer.fit(XX).labels_
It will build the tree and label the points. Now you can manipulate clusters by relabeling.
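from sklearn.metrics import adjusted_rand_score
# relabel the clusterer fitted above, restricting the cluster sizes
labels = clusterer.relabel(limit1=1, limit2=len(XX)/2, fix_outliers=1)
ari = adjusted_rand_score(iris['target'], labels)
print('iris ari', ari)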
It will relabel the clusters by restricting their size.
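To make the idea of restricting clusters by size concrete, here is a minimal sketch on a flat list of labels. It is not how druhg does it: druhg relabels from its stored tree, which this sketch does not attempt to reproduce. The helper name restrict_by_size, the -1 outlier label and the inclusive bounds are all assumptions made for illustration.

```python
from collections import Counter

OUTLIER = -1  # assumed outlier label, not confirmed by this README


def restrict_by_size(labels, limit1, limit2):
    """Hypothetical helper: keep clusters with limit1 <= size <= limit2.

    Points in clusters outside that range are marked as outliers.
    """
    # count the points in every cluster, ignoring existing outliers
    sizes = Counter(label for label in labels if label != OUTLIER)
    return [
        label if label != OUTLIER and limit1 <= sizes[label] <= limit2 else OUTLIER
        for label in labels
    ]


labels = [0, 0, 0, 0, 1, 1, 2, -1]
# cluster 0 has 4 points (too big), cluster 2 has 1 point (too small)
restricted = restrict_by_size(labels, limit1=2, limit2=3)
print(restricted)
```

With limit1 set to 1 and limit2 set to the sample size, this sketch leaves every label unchanged.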
from druhg import DRUHG
import matplotlib.pyplot as plt
import pandas as pd, numpy as np
XX = pd.read_csv('chameleon.csv', sep='\t', header=None)
XX = np.array(XX)
clusterer = DRUHG(max_ranking=200)
clusterer.fit(XX)
plt.figure(figsize=(30,16))
clusterer.minimum_spanning_tree_.plot(node_size=200)
It will draw mstree with druhg-edges.
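For readers unfamiliar with the term, the sketch below shows what a minimum spanning tree over a set of points is. It uses Prim's algorithm with plain Euclidean distances, which is an assumption made for illustration only; the edges druhg builds and plots are weighted by its own measure, and the function name here is hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm over Euclidean distances; returns (i, j, weight) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [float("inf")] * n  # cheapest known edge from the tree to each point
    best_from = [-1] * n            # tree endpoint of that cheapest edge
    best_dist[0] = 0.0              # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # pick the point outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]), key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] != -1:
            edges.append((best_from[u], u, best_dist[u]))
        # update the cheapest connection for every point still outside the tree
        for v in range(n):
            if not in_tree[v]:
                d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[u], points[v])))
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
tree = minimum_spanning_tree(points)
for i, j, w in tree:
    print(i, j, round(w, 3))
```

A tree over n points always has n - 1 edges. In this toy data the single long edge is the one joining the two visually separate groups.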
max_ranking can be decreased for better performance.
PyPI install, presuming you have an up-to-date pip:
pip install druhg
The package tests can be run after installation using the command:
pytest -s druhg
or
python -m pytest -s druhg
The tests may fail :-D
The druhg library supports both Python 2 and Python 3.
We welcome contributions in any form! Assistance with documentation, particularly expanding tutorials, is always welcome. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.
The druhg package is 3-clause BSD licensed.