
Merge pull request #13 from dayyass/add_pip
add pip install / release v0.1.0
dayyass committed Sep 20, 2021
2 parents b38b67d + 3bee084 commit b5ac873
Showing 12 changed files with 42 additions and 14 deletions.
2 changes: 1 addition & 1 deletion .coveragerc
@@ -1,6 +1,6 @@
[run]
branch = True
-source = graph_clustering
+source = graph_based_clustering

[report]
exclude_lines =
2 changes: 2 additions & 0 deletions .gitignore
@@ -15,3 +15,5 @@
venv
.idea
dist
+
+*.egg-info/
14 changes: 6 additions & 8 deletions README.md
@@ -27,18 +27,14 @@ pip install graph-based-clustering

### Usage

-**graph-based-clustering** has two clustering methods:
-- ConnectedComponentsClustering
-- SpanTreeConnectedComponentsClustering
-
-Both of these methods has sklearn-like `fit/fit_predict` interface.
+The library has sklearn-like `fit/fit_predict` interface.

#### ConnectedComponentsClustering

-This method makes pairwise distances matrix on the input data, uses *threshold* (parameter given by the user) to binarize pairwise distances matrix and make undirected graph, and then finds connected components to perform the clustering.
+This method computes pairwise distances matrix on the input data, and using *threshold* (parameter provided by the user) to binarize pairwise distances matrix makes an undirected graph in order to find connected components to perform the clustering.

Required arguments:
-- **threshold** - threshold to binarize pairwise distances matrix and make undirected graph
+- **threshold** - paremeter to binarize pairwise distances matrix and make undirected graph

Optional arguments:
- **metric** - sklearn.metrics.[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html) parameter (default: *"euclidean"*)
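The threshold-based method described in this hunk can be sketched directly with scikit-learn's `pairwise_distances` and SciPy's `connected_components`. This is a minimal illustration, not the library's own implementation: the helper name `threshold_connected_components` is hypothetical, and treating a pair as connected when its distance is `<= threshold` is an assumption, since the README does not say whether the comparison is strict.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.metrics import pairwise_distances


def threshold_connected_components(X, threshold, metric="euclidean"):
    """Hypothetical helper, not the graph_based_clustering API."""
    # Pairwise distance matrix of shape (n_samples, n_samples).
    distances = pairwise_distances(X, metric=metric)
    # Binarize it into an adjacency matrix (assumption: "<=" connects a pair).
    adjacency = (distances <= threshold).astype(int)
    # Label the connected components of the resulting undirected graph.
    _, labels = connected_components(adjacency, directed=False)
    return labels


X = np.array([[0, 1], [1, 0], [1, 1]])
print(threshold_connected_components(X, threshold=1.2))  # prints [0 0 0]
```

With this three-point `X`, both `[0, 1]` and `[1, 0]` lie within distance 1 of `[1, 1]`, so the graph is connected and every sample lands in one cluster; lowering the threshold below 1 leaves three singleton clusters.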
@@ -47,6 +43,7 @@ Optional arguments:
Example:

```python3
+import numpy as np
from graph_based_clustering import ConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])
@@ -66,7 +63,7 @@ labels_pred = clustering.fit_predict(X)

#### SpanTreeConnectedComponentsClustering

-This method makes pairwise distances matrix on the input data, consider this matrix as a graph, finds minimum spanning trees, and finaly, to perform the clustering, makes graph with *n_clusters* (parameter given by the user) connected components by removing *n_clusters - 1* edges with highest weights.
+This method computes pairwise distances matrix on the input data, builds a graph on the obtained matrix, finds minimum spanning tree, and finaly, performs the clustering through dividing the graph into *n_clusters* (parameter given by the user) by removing *n-1* edges with the highest weights.

Required arguments:
- **n_clusters** - the number of clusters to find
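The spanning-tree method described in this hunk can be sketched the same way with SciPy's `minimum_spanning_tree`. Again this is an illustration under stated assumptions, not the library's code: the helper name `span_tree_connected_components` is hypothetical, ties between equal-weight edges are broken arbitrarily by `np.argsort`, and because SciPy treats zero entries of a dense matrix as missing edges, exact duplicate points are not joined directly in this sketch.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from sklearn.metrics import pairwise_distances


def span_tree_connected_components(X, n_clusters, metric="euclidean"):
    """Hypothetical helper, not the graph_based_clustering API."""
    distances = pairwise_distances(X, metric=metric)
    # Minimum spanning tree as a sparse matrix; each tree edge is stored once.
    # Zero distances count as "no edge" in SciPy, so duplicates are not linked directly.
    mst = minimum_spanning_tree(distances).tocoo()
    if not 1 <= n_clusters <= len(mst.data) + 1:
        raise ValueError("n_clusters must be between 1 and the number of tree edges + 1")
    # Sort tree edges by weight and drop the n_clusters - 1 heaviest ones.
    order = np.argsort(mst.data)
    keep = order[: len(order) - (n_clusters - 1)]
    # Rebuild the remaining forest as a dense weighted adjacency matrix.
    n_samples = distances.shape[0]
    forest = np.zeros((n_samples, n_samples))
    forest[mst.row[keep], mst.col[keep]] = mst.data[keep]
    # Each tree of the forest becomes one cluster.
    _, labels = connected_components(forest, directed=False)
    return labels


X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
print(span_tree_connected_components(X, n_clusters=2))  # prints [0 0 1 1]
```

On these two well-separated pairs the spanning tree has two short edges and one long edge between the pairs; cutting that single heaviest edge leaves exactly two clusters.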
@@ -78,6 +75,7 @@ Optional arguments:
Example:

```python3
+import numpy as np
from graph_based_clustering import SpanTreeConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion notebooks/plot_cluster_comparison.ipynb
@@ -55,7 +55,7 @@
"\n",
"import sys\n",
"sys.path.append(\"..\")\n",
-"from graph_clustering import ConnectedComponentsClustering, SpanTreeConnectedComponentsClustering\n",
+"from graph_based_clustering import ConnectedComponentsClustering, SpanTreeConnectedComponentsClustering\n",
"\n",
"np.random.seed(0)"
],
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -0,0 +1,3 @@
+[build-system]
+requires = ["setuptools", "wheel"]
+build-backend = "setuptools.build_meta"
1 change: 0 additions & 1 deletion requirements.txt
@@ -2,7 +2,6 @@ coverage==5.5
jupyter==1.0.0
matplotlib==3.4.3
numpy==1.21.2
-pandas==1.3.3
parameterized==0.8.1
pre-commit==2.15.0
scikit-learn==0.24.2
23 changes: 23 additions & 0 deletions setup.cfg
@@ -0,0 +1,23 @@
+[metadata]
+name = graph-based-clustering
+version = 0.1.0
+author = Dani El-Ayyass
+author_email = dayyass@yandex.ru
+description = Graph-Based Clustering using connected components and spanning trees
+long_description = file: README.md
+long_description_content_type = text/markdown
+url = https://github.com/dayyass/graph-based-clustering
+project_urls =
+    Bug Tracker = https://github.com/dayyass/graph-based-clustering/issues
+classifiers =
+    Programming Language :: Python :: 3
+    License :: OSI Approved :: MIT License
+    Operating System :: OS Independent
+
+[options]
+packages = find:
+python_requires = >=3.7
+install_requires =
+    numpy >= 1.21.2
+    scikit-learn >= 0.24.2
+    scipy >= 1.7.1
9 changes: 6 additions & 3 deletions tests/test_graph_clustering.py
@@ -5,18 +5,21 @@
from sklearn.metrics import rand_score
from sklearn.preprocessing import StandardScaler

-from graph_clustering.check import (
+from graph_based_clustering.check import (
_check_matrix,
_check_matrix_is_square,
_check_square_matrix_is_symmetric,
check_adjacency_matrix,
check_symmetric,
)
-from graph_clustering.main import (
+from graph_based_clustering.main import (
ConnectedComponentsClustering,
SpanTreeConnectedComponentsClustering,
)
-from graph_clustering.utils import _pairwise_distances, distances_to_adjacency_matrix
+from graph_based_clustering.utils import (
+    _pairwise_distances,
+    distances_to_adjacency_matrix,
+)

from .utils import prepare_sklearn_clustering_datasets

