# **Introduction**

BIRCH is a clustering algorithm in machine learning. The name stands for Balanced Iterative Reducing and Clustering using Hierarchies. In this article, I will take you through the concept of BIRCH clustering in machine learning and its implementation using Python.

BIRCH was designed specifically for clustering very large datasets. It is often faster than other clustering algorithms such as batch K-Means, and it gives very similar results to batch K-Means as long as the dataset has no more than about 20 features.
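
To make the comparison with batch K-Means concrete, here is a minimal sketch that runs both algorithms on the same data and measures how closely their cluster assignments agree. The data is synthetic (three well-separated blobs generated with `make_blobs`), which is my assumption for illustration and not the dataset used later in this article, so treat the result as indicative only.

```python
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic, well-separated 2-D blobs (an assumption; not the article's dataset).
centers = [(-10, -10), (0, 0), (10, 10)]
X, _ = make_blobs(n_samples=3000, centers=centers, cluster_std=1.0, random_state=42)

# Cluster the same points with batch K-Means and with BIRCH.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
birch_labels = Birch(n_clusters=3).fit_predict(X)

# Adjusted Rand index: 1.0 means the two partitions are identical
# (up to a renaming of the cluster labels).
agreement = adjusted_rand_score(kmeans_labels, birch_labels)
print(f"Agreement between K-Means and BIRCH (ARI): {agreement:.3f}")
```

On data with less clearly separated groups, or with many more features, the agreement can be noticeably lower.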

During training, BIRCH builds a tree structure that holds just enough information to quickly assign each data point to a cluster. Because it does not have to store every data point in the tree, the algorithm can work within limited memory even on a very large dataset. In the section below, I will take you through its implementation using the Python programming language.
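
The idea of summarizing points instead of storing them can be sketched in a few lines. In BIRCH, each node of the tree keeps a "clustering feature": the number of points, their linear sum, and the sum of their squared norms. The class below is a simplified illustration of that idea only. The name `ClusteringFeature` and its methods are hypothetical, and this is not scikit-learn's internal implementation.

```python
import numpy as np


class ClusteringFeature:
    """Hypothetical summary of a group of points: count, linear sum, squared sum."""

    def __init__(self, n_features):
        self.n = 0                              # number of points absorbed
        self.linear_sum = np.zeros(n_features)  # element-wise sum of the points
        self.squared_sum = 0.0                  # sum of squared norms of the points

    def add(self, point):
        # Absorb a point by updating the three statistics; the point itself is not kept.
        point = np.asarray(point, dtype=float)
        self.n += 1
        self.linear_sum += point
        self.squared_sum += float(point @ point)

    def centroid(self):
        return self.linear_sum / self.n

    def radius(self):
        # Root-mean-square distance of the absorbed points from the centroid.
        c = self.centroid()
        variance = self.squared_sum / self.n - float(c @ c)
        return float(np.sqrt(max(variance, 0.0)))


points = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]])
cf = ClusteringFeature(n_features=2)
for p in points:
    cf.add(p)

print("centroid:", cf.centroid())
print("radius:", cf.radius())
```

Whether the summary covers three points or three million, it takes the same amount of memory, and the centroid and radius can still be computed from it.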

# **BIRCH Clustering using Python**

The BIRCH algorithm starts with a threshold value and then inserts data points into the tree one at a time as it learns from the data. In the original algorithm, if it runs out of memory while building the tree, it increases the threshold value and rebuilds a smaller tree. In scikit-learn, the threshold stays fixed at the value you pass in. Now let’s see how to implement BIRCH clustering using Python. I’ll start this task by importing the necessary Python libraries and the dataset:

# **Preparing the Data**

In [1]:
# Mount Google Drive so the dataset stored there can be read (Google Colab only)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive



# **Building the Model**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # apply seaborn's default plot styling

# Load the customer segmentation dataset from the mounted drive
data = pd.read_csv("/content/drive/MyDrive/Datasets/Customer Segmentation /customers.csv")
print(data.head())

The dataset that I am using here is a customer segmentation dataset. Now let’s prepare the data for the clustering algorithm. I will rename the columns for simplicity and then keep only two of them, annual income and spending score, as the input to BIRCH:

In [None]:
# Rename the two columns of interest, then keep only those two
data = data.rename(columns={
    "Annual Income (k$)": "Income",
    "Spending Score (1-100)": "Spending",
})
data = data[["Income", "Spending"]]
print(data.head())

The data is now ready. Next, let’s import the Birch class from the sklearn library, fit it on the data, and look at the results by visualizing the clusters:

In [None]:
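from sklearn.cluster import Birch
# branching_factor: maximum number of subclusters in each node of the tree
# threshold: maximum radius a subcluster may have after absorbing a new point
model = Birch(branching_factor=30, n_clusters=5, threshold=2.5)
model.fit(data)
pred = model.predict(data)  # cluster label for each customer
# Colour each customer by its assigned cluster
plt.scatter(data["Income"], data["Spending"], c=pred, cmap='rainbow', alpha=0.5, edgecolors='b')
plt.xlabel("Income")
plt.ylabel("Spending")
plt.show()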
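
To get a feel for the `threshold` parameter used above, here is a small sketch that fits BIRCH with a few different threshold values and counts the subclusters stored in the tree. The data is a synthetic stand-in generated with `make_blobs` (my assumption, so the block runs on its own), and the threshold values 0.5 and 5.0 are arbitrary choices for comparison with the 2.5 used in this article.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic stand-in data (an assumption; swap in the Income/Spending columns).
X, _ = make_blobs(n_samples=2000, centers=5, cluster_std=1.5, random_state=0)

subcluster_counts = {}
for threshold in (0.5, 2.5, 5.0):
    # n_clusters=None skips the final global clustering step, so the
    # subclusters stored in the leaves of the tree are used as the clusters.
    model = Birch(branching_factor=30, n_clusters=None, threshold=threshold).fit(X)
    subcluster_counts[threshold] = len(model.subcluster_centers_)
    print(f"threshold={threshold}: {subcluster_counts[threshold]} subclusters")
```

A smaller threshold keeps more, tighter subclusters in the tree, which costs more memory. A larger threshold merges points more aggressively into fewer, coarser subclusters. Since the threshold is a radius in the units of the features, a sensible value depends on the scale of your data.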

# **References**

[BIRCH Clustering in Machine Learning](https://thecleverprogrammer.com/2021/03/15/birch-clustering-in-machine-learning/)