## Balanced Iterative Reducing and Clustering using Hierarchies

**BIRCH and CHAMELEON can both be categorized as hierarchical clustering algorithms.**

BIRCH is designed for large datasets that cannot fit into main memory.<br>
**BIRCH scales linearly: it finds a good clustering with a single scan of the data and improves the quality with a few additional scans.**

**The Idea:** build a tree structure called the **CLUSTERING FEATURE TREE** (CF-Tree)
* Incrementally construct a CF-Tree that holds the information needed for rough hierarchical clustering and fine clustering


**Clustering Feature (CF)**
BIRCH attempts to minimize the memory requirements of large datasets by summarizing the information contained in dense regions as Clustering Feature (CF) entries.
* In simple terms, a CF is a set of summary statistics from which the CF tree is built


**Phases in BIRCH**
* Phase - 1
    * Scan the DB to build an initial in-memory CF tree  [ *`hierarchical clustering`* ]
    * The leaf nodes of the CF-tree hold many small, tight clusters  [ *`Data reduction & clustering`* ]
* Phase - 2
    * Apply another clustering algorithm to these small, tight leaf clusters
    * Merge dense clusters
    * and/or remove outliers
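The two phases can be made explicit with scikit-learn. The sketch below assumes the same synthetic `make_blobs` data used later in this notebook: `Birch(n_clusters=None)` performs only Phase 1 and exposes the leaf subclusters as `subcluster_centers_`, and Phase 2 is then run by hand with `AgglomerativeClustering` on those centroids. This mirrors what scikit-learn does internally when `n_clusters` is an integer; it is an illustration, not the only way to run Phase 2.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, n_features=2, centers=6, cluster_std=0.7, random_state=0)

# Phase 1: build the CF tree only. n_clusters=None skips the global step,
# so the leaf subclusters are returned as they are.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=None).fit(X)
centers = birch.subcluster_centers_          # one row per leaf subcluster
print(len(centers), "leaf subclusters")

# Phase 2: cluster the (few) subcluster centroids instead of all raw points.
global_model = AgglomerativeClustering(n_clusters=6).fit(centers)

# Map each point to its nearest subcluster, then to that subcluster's global label.
sub_labels = birch.predict(X)                # indices into subcluster_centers_
final_labels = global_model.labels_[sub_labels]
```

Phase 2 works on far fewer points than the raw dataset, which is where the savings come from.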




## Clustering Feature vector in BIRCH

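$CF = (N, LS, SS)$
* N
    * Number of data points in the subcluster
* LS
    * Linear sum of the `N` points, $LS = \sum_{i=1}^{N} \vec{x}_i$
* SS
    * Square sum of the `N` points, $SS = \sum_{i=1}^{N} \lVert \vec{x}_i \rVert^2$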
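A minimal NumPy sketch of why these three numbers are enough. It treats SS as a scalar (the sum of squared norms) and uses illustrative helper names (`cf_of`, `merge_cf`, `centroid`, `radius`, `diameter`) that are not part of any library. Two CFs merge by component-wise addition (the CF additivity property), and the centroid, radius and diameter of a subcluster can all be computed from (N, LS, SS) without the raw points.

```python
import numpy as np

def cf_of(points):
    """Clustering feature (N, LS, SS) of an (n, d) array of points."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    ls = points.sum(axis=0)             # linear sum: vector of length d
    ss = float((points ** 2).sum())     # square sum: scalar sum of squared norms
    return n, ls, ss

def merge_cf(cf_a, cf_b):
    """CF additivity: the CF of two disjoint subclusters is the component-wise sum."""
    return cf_a[0] + cf_b[0], cf_a[1] + cf_b[1], cf_a[2] + cf_b[2]

def centroid(cf):
    n, ls, _ = cf
    return ls / n

def radius(cf):
    """RMS distance from the members to the centroid, from (N, LS, SS) alone."""
    n, ls, ss = cf
    return float(np.sqrt(max(ss / n - np.dot(ls, ls) / n ** 2, 0.0)))

def diameter(cf):
    """RMS pairwise distance between members (requires N >= 2)."""
    n, ls, ss = cf
    return float(np.sqrt(max((2 * n * ss - 2 * np.dot(ls, ls)) / (n * (n - 1)), 0.0)))

# Two subclusters forming the corners of a 2 x 2 square
a = np.array([[0.0, 0.0], [2.0, 0.0]])
b = np.array([[0.0, 2.0], [2.0, 2.0]])
merged = merge_cf(cf_of(a), cf_of(b))
direct = cf_of(np.vstack([a, b]))
print(centroid(merged))   # [1. 1.]
print(radius(merged))     # ~1.414, each corner is sqrt(2) from the centre
```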

<img src='./notes/notes-1.jpg'>

<img src='./notes/notes-2.jpg'>

## CF TREE

The CF-tree is a very compact representation of the dataset because each entry in a leaf node is not a single data point but a subcluster. Each nonleaf node contains at most B entries. 
* The CF-tree holds CF vectors but no raw data points

**Hyper-parameters**
* `T`: threshold on the diameter (or radius) of the subcluster stored in each leaf entry
* `B`: branching factor, the maximum number of entries in a nonleaf (internal) node
* `L`: maximum number of entries in a leaf node
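To see how `T` controls what a leaf entry may absorb, here is a deliberately simplified sketch: a flat list of leaf entries with no tree, so `B` and `L` (node splitting) are ignored. The function `insert_point` is hypothetical. A new point joins the closest entry only if the merged subcluster's radius stays within `T` (scikit-learn's `Birch` uses the radius; the original BIRCH formulation allows diameter or radius); otherwise it starts a new entry.

```python
import numpy as np

def insert_point(entries, x, T):
    """Hypothetical, tree-less insertion step. entries is a list of [N, LS, SS].
    Simplification: no B/L limits and no node splits."""
    x = np.asarray(x, dtype=float)
    if entries:
        # Closest entry by distance from its centroid (LS / N) to x
        dists = [np.linalg.norm(ls / n - x) for n, ls, _ in entries]
        i = int(np.argmin(dists))
        n, ls, ss = entries[i]
        # Tentatively merge x into that entry using CF additivity
        n2, ls2, ss2 = n + 1, ls + x, ss + float(x @ x)
        r2 = ss2 / n2 - float(ls2 @ ls2) / n2 ** 2
        if np.sqrt(max(r2, 0.0)) <= T:
            entries[i] = [n2, ls2, ss2]   # absorb: radius stays within T
            return entries
    entries.append([1, x.copy(), float(x @ x)])   # start a new leaf entry
    return entries

entries = []
for p in [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [0.1, 0.1]]:
    insert_point(entries, p, T=0.5)
print([e[0] for e in entries])   # [3, 1]
```

In the real CF-tree the same test happens at the leaf reached by descending from the root, and a leaf that exceeds `L` entries (or a nonleaf exceeding `B`) is split.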

<img src='./notes/notes-3.jpg'>

```python
from itertools import cycle

import numpy as np
from matplotlib import pyplot as plt
import matplotlib.colors as colors

from sklearn.datasets import make_blobs
from sklearn.cluster import Birch
```

#### Generate some data

```python
X, labels = make_blobs(n_samples=500, n_features=2, centers=6, cluster_std=0.7, random_state=0)
plt.scatter(X[:, 0], X[:, 1]);
```

<img src='./plots/sample-data.png'>

**BIRCH parameters (scikit-learn `Birch`)**


* threshold
    * The radius of the subcluster obtained by merging a new sample with the closest subcluster should be less than the threshold; otherwise, a new subcluster is started.
    * `default=0.5`
* branching_factor
    * Maximum number of CF subclusters in each node.
    * If a new sample enters such that the number of subclusters exceeds the branching_factor, that node is split into two nodes and the subclusters are redistributed between them.
    * `default=50`
* n_clusters
    * Number of clusters after the final clustering step, which treats the subclusters from the leaves as new samples.
    * If `None` is given, the final clustering step is not performed and the subclusters are returned as they are.
    * If a model (a clustering estimator) is provided, it is fit treating the subclusters as new samples, and the initial data is mapped to the label of the closest subcluster.
    * If an integer is given, the model fit is `AgglomerativeClustering` with `n_clusters` set equal to that integer.
    * `default=3`: `AgglomerativeClustering` with `n_clusters=3`
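A short sketch of how these parameters behave on the same synthetic data as this notebook. The threshold values and the choice of `KMeans` as the estimator passed to `n_clusters` are arbitrary illustrations. A smaller `threshold` forces tighter leaf subclusters and therefore more of them; the three `n_clusters` modes correspond to the bullets above.

```python
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, n_features=2, centers=6, cluster_std=0.7, random_state=0)

# Smaller threshold -> tighter leaf subclusters -> more of them
counts = {}
for t in (0.25, 0.5, 1.0):
    counts[t] = Birch(threshold=t, n_clusters=None).fit(X).subcluster_centers_.shape[0]
    print(f"threshold={t}: {counts[t]} leaf subclusters")

# n_clusters=None: labels_ are indices of the leaf subclusters
raw = Birch(threshold=0.5, n_clusters=None).fit(X)
# n_clusters=<int>: AgglomerativeClustering on the subcluster centroids
agg = Birch(threshold=0.5, n_clusters=6).fit(X)
# n_clusters=<estimator>: any clusterer fitted on the subcluster centroids (KMeans here)
km = Birch(threshold=0.5, n_clusters=KMeans(n_clusters=6, n_init=10, random_state=0)).fit(X)
```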


```python
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=6)
birch.fit(X)
plt.scatter(X[:, 0], X[:, 1], c=birch.labels_);
```

<img src='./plots/sample-data-clustering.png'>

* BIRCH provides a clustering method for very large datasets. 
* It makes a large clustering problem tractable by concentrating on densely occupied regions and creating a compact summary.
* BIRCH can work with any given amount of memory, and its I/O cost is only slightly more than one scan of the data.
* Other clustering algorithms can be applied to the subclusters produced by BIRCH.
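The memory and single-scan claims are what scikit-learn's `partial_fit` exposes. The sketch below assumes the data arrives in chunks (five slices of the same synthetic array stand in for reading from disk). The CF tree is updated chunk by chunk with the global step disabled, and then `partial_fit()` with no data runs only the global clustering step over the accumulated subclusters.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, n_features=2, centers=6, cluster_std=0.7, random_state=0)
chunks = np.array_split(X, 5)   # stand-in for chunks streamed from disk

birch = Birch(threshold=0.5, n_clusters=None)
for chunk in chunks:
    birch.partial_fit(chunk)    # updates the CF tree in place; one pass over the data

# Run the global clustering step once, on the accumulated leaf subclusters.
birch.set_params(n_clusters=6)
birch.partial_fit()             # X=None: only the global clustering step is performed

# Label the data chunk by chunk as well, so the full dataset is never needed at once.
labels = np.concatenate([birch.predict(chunk) for chunk in chunks])
```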

<img src='./notes/notes-7.jpg'>