<a href="https://colab.research.google.com/github/BumaranChe/GRADIO_Introduction_DBSCAN_HIRARCHIALClustering_K-Mean_Clustering/blob/main/Gradio_Introduction_DBSCAN_HIERARCHIAL_K_Mean_Clustering(GitHub)_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is Gradio?
Gradio is an open-source Python library that allows you to quickly create customizable UI components for your machine learning models or any Python function. It automatically generates an intuitive web interface, making it easy for anyone to interact with your code without needing to write any frontend code.

# Core Concepts
* gradio.Interface: This is the main class you'll use. It takes a function, input components, and output components to build the UI.
* Input Components: These define how users will provide input to your function (e.g., text boxes, image uploaders, sliders).
* Output Components: These define how the results from your function will be displayed (e.g., text, images, plots).
* launch(): This method starts the Gradio app, making it accessible via a local URL and optionally a public shareable link.
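
As a minimal sketch of how these pieces fit together, the snippet below wraps a plain function in an interface. The names `to_fahrenheit` and `build_demo` are hypothetical and chosen only for illustration. Gradio is imported inside `build_demo` so that the function itself can be tried without starting the app.

```python
def to_fahrenheit(celsius):
    """Plain Python function that the interface will call."""
    return celsius * 9 / 5 + 32


def build_demo():
    # Imported here so to_fahrenheit can be used even where Gradio is not installed
    import gradio as gr

    return gr.Interface(
        fn=to_fahrenheit,                       # the function to wrap
        inputs=gr.Number(label="Celsius"),      # input component
        outputs=gr.Number(label="Fahrenheit"),  # output component
    )


print(to_fahrenheit(100))  # 212.0
# build_demo().launch()    # uncomment to start the app
```

Keeping the logic in an ordinary function means it can be checked on its own before any UI is attached.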

If `import gradio` fails, install the library first with `pip install gradio`.

# Step 1: Import Gradio

In [None]:
import gradio as gr

# Step 2: Write your Python function

In [None]:
def greet(name):
    return f"Hello, {name}."

In [None]:
greet("Bumaran")

'Hello, Bumaran.'

# Step 3: Create a Gradio Interface

In [None]:
# fn: the Python function to wrap
# inputs: the type of input the function receives
# outputs: the type of output the function returns

iface = gr.Interface(fn=greet,inputs='text',outputs='text')

#Launch the interface

iface.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://d200010f1dbe2b3477.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




# Customize the interface

In [None]:
# gr.Textbox options:
#   lines: number of lines in the input box (here lines=1)
#   placeholder: hint text shown while the box is empty
#   label: caption shown above the text box
# title: the title of the app
# description: a short description of the app


iface = gr.Interface(fn=greet,
                     inputs=gr.Textbox(lines=1,placeholder="Enter your name here.....",label="Your Name"),
                     outputs=gr.Textbox(label='Greeting',placeholder="Output"),
                     title = 'Simple Greeter App',
                     description='This app takes your name and returns a friendly greeting.')

#Launch the interface

iface.launch()


# Example 2: Multiple inputs and outputs (Calculator App)

In [None]:
def calculator(num1, operation, num2):
    if operation == "add":
        return num1 + num2
    elif operation == "subtract":
        return num1 - num2
    elif operation == "multiply":
        return num1 * num2
    elif operation == "divide":
        if num2 == 0:
            # The output component is gr.Number, so report the problem
            # as a Gradio error instead of returning a string
            raise gr.Error("Cannot divide by zero!")
        return num1 / num2

In [None]:
calculator(2,"add",2)

4

In [None]:
# With multiple inputs, pass the components as a list -> inputs=[...]
# gr.Number -> numerical input
# gr.Radio -> radio buttons
# Each branch of the function's if/elif chain maps to one radio button


iface = gr.Interface(
    fn = calculator,
    inputs=[gr.Number(label = 'First Number'),
            gr.Radio(['add','subtract','multiply','divide'],label = "Operation"),
            gr.Number(label='Second Number')],
    outputs = gr.Number(label='Output'),
    title = 'Simple Calculator',
    description='Perform basic arithmetic operations.'
)

iface.launch()


# Example 3: Gradio app that performs K-Means Clustering on the Iris dataset

In [None]:
from sklearn.datasets import load_iris  # Import the Iris dataset
from sklearn.cluster import KMeans  # Import the K-Means clustering algorithm
import matplotlib.pyplot as plt
import pandas as pd





In [None]:
# Load Iris dataset

iris = load_iris()
df = pd.DataFrame(iris.data,columns = iris.feature_names)

In [None]:
# Create a function that performs K-Means clustering and plots the clusters

def kmean_clustering(n_clusters, x_axis, y_axis):
    n_clusters = int(n_clusters)  # the slider may deliver a float

    # Fit only on the original feature columns, so that a 'Cluster' column
    # left over from a previous call is never used as a feature
    features = df[iris.feature_names]
    kmeans = KMeans(n_clusters=n_clusters, random_state=0)
    kmeans.fit(features)

    clustered = features.copy()
    clustered['Cluster'] = kmeans.labels_

    # Plotting: one scatter call per cluster
    plt.figure(figsize=(8, 5))
    for cluster in range(n_clusters):
        clustered_data = clustered[clustered['Cluster'] == cluster]
        plt.scatter(clustered_data[x_axis], clustered_data[y_axis], label=f'Cluster {cluster}')

    # Centroid coordinates follow the column order of iris.feature_names
    plt.scatter(kmeans.cluster_centers_[:, iris.feature_names.index(x_axis)],
                kmeans.cluster_centers_[:, iris.feature_names.index(y_axis)],
                color='black', marker='x', s=200, label='Centroids')

    plt.xlabel(x_axis)
    plt.ylabel(y_axis)
    plt.title(f'K-Means Clustering with {n_clusters} Clusters')
    plt.legend()
    plt.grid(True)

    # Save plot
    plt.savefig('cluster_plot.png')
    plt.close()

    return 'cluster_plot.png'

* `gr.Slider` -> Creates a slider widget.
* `1, 10` -> Minimum and maximum values of the slider.
* `value=3` -> The default value of the slider.
* `step=1` -> The increment or decrement applied when the user moves the slider.
* `label` -> The label shown for the slider.
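
To make the positional arguments easier to read, the same slider can be written with keyword arguments. This is only a sketch: `slider_settings` and `build_slider` are hypothetical helpers introduced here, and Gradio is imported inside `build_slider` so the settings can be inspected without it.

```python
def slider_settings():
    # Keyword form of gr.Slider(1, 10, value=3, step=1, label="Number of Clusters")
    return dict(minimum=1, maximum=10, value=3, step=1, label="Number of Clusters")


def build_slider():
    import gradio as gr  # imported lazily; only needed when the widget is created
    return gr.Slider(**slider_settings())


settings = slider_settings()
# Values the slider can take with these settings: 1, 2, ..., 10
allowed = list(range(settings["minimum"], settings["maximum"] + 1, settings["step"]))
print(allowed)
```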

In [None]:
# Gradio Interface
feature_list = iris.feature_names#Get all the feature names

interface = gr.Interface(
    fn = kmean_clustering,
    inputs = [
        gr.Slider(1,10,value=3,step=1,label = "Number of Clusters"),
        gr.Dropdown(feature_list,value = feature_list[0],label='X-axis Feature'),
        gr.Dropdown(feature_list, value=feature_list[1],label = 'Y-axis Feature')
    ],
    outputs=gr.Image(type='filepath'),
    title='K-Means Clustering on Iris Dataset',
    description="Select the number of clusters and features to visualize K-Means clustering on the Iris dataset."
)

interface.launch(share=True)


# Assignment -> Modify the code to implement dimension reduction with another unsupervised learning example.

# **Using DBSCAN, PCA and GRADIO**

# 1.0 Import necessary libraries

In [None]:
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np
import gradio as gr
from PIL import Image
import io
from mpl_toolkits.mplot3d import Axes3D

# 2.0 Load dataset

In [None]:
iris = load_iris()
df = pd.DataFrame(iris.data,columns = iris.feature_names)

In [None]:
df

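| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |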


In [None]:
X=df.values

# 3.0 Create a function with the parameters eps, min_samples and pca_components

In [None]:
def dbscan_iris(eps, min_samples, pca_components):
    # Sliders may deliver floats, so cast the integer parameters
    min_samples = int(min_samples)
    pca_components = int(pca_components)

    # Step 1: Scale the data
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Step 2: Apply PCA
    pca = PCA(n_components=pca_components)
    X_pca = pca.fit_transform(X_scaled)  # NumPy array, one column per principal component

    # Step 3: Apply DBSCAN
    dbscan = DBSCAN(eps=eps, min_samples=min_samples)
    labels = dbscan.fit_predict(X_pca)

    # Step 4: Plot
    fig = plt.figure(figsize=(6, 4))

    if pca_components == 3:
        ax = fig.add_subplot(111, projection='3d')
        for label in np.unique(labels):  # loop over each cluster label produced by DBSCAN
            cluster = X_pca[labels == label]  # rows of X_pca assigned to this label
            if label == -1:
                # DBSCAN labels noise points as -1
                ax.scatter(cluster[:, 0], cluster[:, 1], cluster[:, 2], color='grey', marker='x', label='Noise')
            else:
                ax.scatter(cluster[:, 0], cluster[:, 1], cluster[:, 2], label=f'Cluster {label}')
        ax.set_xlabel("PCA 1")
        ax.set_ylabel("PCA 2")
        ax.set_zlabel("PCA 3")
    else:
        ax = fig.add_subplot(111)
        for label in np.unique(labels):
            cluster = X_pca[labels == label]
            # With a single component there is no second axis, so plot against zero
            y = cluster[:, 1] if pca_components > 1 else np.zeros(len(cluster))
            if label == -1:
                ax.scatter(cluster[:, 0], y, color='grey', marker='x', label='Noise')
            else:
                ax.scatter(cluster[:, 0], y, label=f'Cluster {label}')
        ax.set_xlabel("PCA 1")
        ax.set_ylabel("PCA 2" if pca_components > 1 else "")

    ax.set_title("DBSCAN Clustering with PCA")
    ax.legend()

    # Step 5: Save plot to an in-memory buffer
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    plt.close(fig)  # Close the figure to free memory
    img = Image.open(buf)

    # Step 6: Return result (noise, labeled -1, is not counted as a cluster)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return f"Number of clusters: {n_clusters}", img

This function draws a 3D plot when pca_components is set to 3 in the Gradio interface. It has no x-axis or y-axis parameters, so the plotted features cannot be chosen: with 3 components it plots PCA 1, PCA 2 and PCA 3 on the x-, y- and z-axes, and with 2 or 4 components it plots PCA 1 (x-axis) against PCA 2 (y-axis). The code is written this way because a single plot can show at most two or three dimensions after PCA (dimension reduction), so a fourth component (PCA 4) cannot be displayed together with the others.
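
One way to see why plotting only the first two or three components is usually enough is to look at how much variance each component explains. The following is a small sketch on the scaled Iris data, not part of the app itself; for this dataset the first two components carry most of the variance.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Scale the four Iris features, then fit PCA with all four components
X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=4).fit(X_scaled)

ratios = pca.explained_variance_ratio_  # share of the variance per component
cumulative = np.cumsum(ratios)          # running total

for i, (ratio, total) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {ratio:.3f} of the variance, {total:.3f} cumulative")
```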

`eps` (epsilon) is the neighborhood radius. It defines how close points must be to each other to count as neighbors, and so to end up in the same cluster.

`min_samples` is the minimum number of data points required within the `eps` radius for a point to be considered a core point.

`pca_components` is the number of principal components kept when reducing the data.

The number of clusters is not an input: DBSCAN determines it from the data.
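
To get a feel for how `eps` changes the result, the sketch below runs DBSCAN on the scaled Iris data for a few radii and reports the number of clusters and noise points. The helper `summarize` and the chosen `eps` values are illustrative assumptions, not part of the app.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)


def summarize(eps, min_samples=5):
    """Return (number of clusters, number of noise points) for one DBSCAN run."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


for eps in (0.3, 0.5, 0.8, 1.2):
    n_clusters, n_noise = summarize(eps)
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

With `min_samples` fixed, a larger radius can only turn noise points into cluster members, never the reverse, so the noise count does not increase as `eps` grows.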

# 4.0 Launch GRADIO

In [None]:
# Gradio Interface
demo = gr.Interface(
    fn=dbscan_iris,
    inputs=[
        gr.Slider(0.1, 2.0, value=0.5, label="DBSCAN eps"),
        gr.Slider(2, 10, value=5, label="DBSCAN min_samples", step=1),
        gr.Slider(1, 4, value=2, label="PCA components", step=1),
    ],
    outputs=[
        gr.Text(label="Message"),
        gr.Image(label="Cluster Plot")
    ],
    title="DBSCAN + PCA Clustering (Iris Dataset)",
    description="Adjust DBSCAN and PCA parameters to explore how clustering works on Iris dataset."
)

demo.launch()


# **Another method DBSCAN + PCA + GRADIO**

# 1.0 Import necessary libraries and load data

In [None]:
import gradio as gr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# 2.0 Function for DBSCAN clustering with PCA

In [None]:
# DBSCAN + PCA + plotting function
PC_COLUMNS = ['PC1', 'PC2', 'PC3', 'PC4']  # one name per principal component

def dbscan_clustering_with_pca(eps, min_samples, x_axis, y_axis):
    # Step 1: Scale
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df)

    # Step 2: PCA (keeping all 4 components rotates the data without discarding variance)
    pca = PCA(n_components=4)
    X_pca = pca.fit_transform(X_scaled)
    # The columns are principal components, not the original flower measurements
    pca_df = pd.DataFrame(X_pca, columns=PC_COLUMNS)

    # Step 3: DBSCAN
    dbscan = DBSCAN(eps=eps, min_samples=int(min_samples))
    labels = dbscan.fit_predict(pca_df)
    pca_df['Cluster'] = labels  # add the cluster label of each row to the data frame

    # Step 4: Plot
    plt.figure(figsize=(8, 5))
    for label in sorted(set(labels)):  # set() keeps each cluster label once
        clustered_data = pca_df[pca_df['Cluster'] == label]  # rows assigned to this label
        if label == -1:
            # DBSCAN labels noise points as -1
            plt.scatter(clustered_data[x_axis], clustered_data[y_axis], color='grey', marker='x', label='Noise')
        else:
            # x_axis and y_axis select which principal components are plotted
            plt.scatter(clustered_data[x_axis], clustered_data[y_axis], label=f'Cluster {label}')

    plt.xlabel(x_axis)
    plt.ylabel(y_axis)
    plt.title("DBSCAN Clustering with PCA")
    plt.legend()
    plt.grid(True)

    # Step 5: Save plot
    plot_path = "dbscan_plot.png"
    plt.savefig(plot_path)
    plt.close()

    # Step 6: Cluster info (noise is not counted as a cluster)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = list(labels).count(-1)
    info = f"Clusters: {n_clusters} | Noise points: {n_noise}"

    return info, plot_path

This function takes the parameters eps, min_samples, x_axis and y_axis. The x_axis and y_axis parameters select which principal components (PC1, PC2, PC3, PC4) are plotted against each other. Normally the least important principal components (for example PC4) are not chosen, but the code above allows any component to be plotted.

# 3.0 Launch GRADIO

In [None]:
# Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("## DBSCAN Clustering with PCA (Iris Dataset)")

    with gr.Row():
        eps = gr.Slider(0.1, 2.0, value=0.5, step=0.1, label="Epsilon (eps)")
        min_samples = gr.Slider(1, 10, value=5, step=1, label="Min Samples")

    with gr.Row():
        x_axis = gr.Dropdown(PC_COLUMNS, value='PC1', label="X Axis (PCA)")
        y_axis = gr.Dropdown(PC_COLUMNS, value='PC2', label="Y Axis (PCA)")

    btn = gr.Button("Run DBSCAN")
    output_text = gr.Text(label="Clustering Info")
    output_img = gr.Image(type='filepath', label="Cluster Plot")

    btn.click(fn=dbscan_clustering_with_pca,
              inputs=[eps, min_samples, x_axis, y_axis],
              outputs=[output_text, output_img])

# To launch the app
demo.launch()


# **Using Hierarchical Clustering, PCA and GRADIO**

# 1.0 Import necessary libraries

In [None]:
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import gradio as gr
from PIL import Image

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

# 2.0 Load dataset

In [None]:
iris = load_iris()
df = pd.DataFrame(iris.data,columns = iris.feature_names)

In [None]:
X=df.values

# 3.0 Create function

In [None]:
def hierarchical_clustering(n_clusters, pca_components):
    # Sliders may deliver floats, so cast the integer parameters
    n_clusters = int(n_clusters)
    pca_components = int(pca_components)

    # 1. Scale the data
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # 2. Apply PCA to reduce the number of features
    pca = PCA(n_components=pca_components)
    X_pca = pca.fit_transform(X_scaled)  # NumPy array, one column per principal component
    pca_columns = [f"PC{i+1}" for i in range(pca_components)]  # column names PC1, PC2, ...
    df_pca = pd.DataFrame(X_pca, columns=pca_columns)

    # 3. Hierarchical clustering with Ward linkage
    Z = linkage(X_pca, method='ward')
    selected_labels = fcluster(Z, t=n_clusters, criterion='maxclust')
    df_pca["Cluster"] = selected_labels

    # 4. Estimate the best number of clusters with the silhouette score
    best_k = 0
    best_score = -1
    for k in range(2, 10):
        temp_labels = fcluster(Z, t=k, criterion='maxclust')
        score = silhouette_score(X_pca, temp_labels)
        if score > best_score:
            best_score = score
            best_k = k

    # 5. Plot
    fig = plt.figure(figsize=(8, 5))
    if pca_components == 3:
        ax = fig.add_subplot(111, projection='3d')
        for label in np.unique(selected_labels):  # loop over each cluster label
            cluster = X_pca[selected_labels == label]  # rows of X_pca assigned to this label
            ax.scatter(cluster[:, 0], cluster[:, 1], cluster[:, 2], label=f'Cluster {label}')
        ax.set_xlabel("PCA 1")
        ax.set_ylabel("PCA 2")
        ax.set_zlabel("PCA 3")
    else:
        ax = fig.add_subplot(111)
        for label in np.unique(selected_labels):
            cluster = X_pca[selected_labels == label]
            # With a single component there is no second axis, so plot against zero
            y = cluster[:, 1] if pca_components > 1 else np.zeros(len(cluster))
            ax.scatter(cluster[:, 0], y, label=f'Cluster {label}')
        ax.set_xlabel("PCA 1")
        ax.set_ylabel("PCA 2" if pca_components > 1 else "")

    ax.set_title("Hierarchical Clustering with PCA")
    ax.legend()

    # 6. Save to an in-memory buffer
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    buf.seek(0)
    plt.close(fig)
    img = Image.open(buf).convert("RGB")  # Ensure a PIL Image in RGB mode

    # 7. Return result
    n_found = len(set(selected_labels))
    return (f'Selected number of clusters: {n_found}', f'Best number of clusters: {best_k}', img)

This function takes two parameters, n_clusters and pca_components. Hierarchical clustering needs the number of clusters to be chosen up front, so the function also uses silhouette_score to estimate the best number of clusters for the PCA-reduced data. pca_components can be set from 1 to 4, but a 3D plot is drawn only when 3 is selected. For 2 or 4 components the plot shows PCA 1 against PCA 2, and for 1 component the single component is plotted along the x-axis against zero.
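
The silhouette search can also be run on its own to see the score for every candidate number of clusters. The sketch below assumes 2 PCA components and the same range of k as the function above; the `scores` dictionary is introduced here for illustration.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Scale, reduce to 2 principal components, and build the Ward linkage tree once
X_scaled = StandardScaler().fit_transform(load_iris().data)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
Z = linkage(X_pca, method="ward")

# Cut the same tree at k = 2..9 clusters and score each cut
scores = {}
for k in range(2, 10):
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = silhouette_score(X_pca, labels)

best_k = max(scores, key=scores.get)  # k with the highest silhouette score
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Best k:", best_k)
```

Because the linkage tree is computed once and only cut at different heights, trying several values of k is cheap.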

# 4.0 Launch GRADIO

In [None]:
with gr.Blocks() as demo:
    gr.Markdown("### Hierarchical Clustering with PCA and 2D/3D Plot")

    with gr.Row():
        n_clusters = gr.Slider(2, 10, value=3, step=1, label="Number of Clusters")
        pca_components = gr.Slider(1, 4, value=2, step=1, label="PCA Components")

    btn = gr.Button("Run Hierarchical Clustering")

   # output_text = gr.Textbox(label="Cluster Info")
    output_text1 = gr.Text(label="Selected number of clusters")
    output_text2 = gr.Text(label="Best number of clusters")
    output_img = gr.Image(type="pil", label="Cluster Plot")

    btn.click(fn=hierarchical_clustering,
              inputs=[n_clusters, pca_components],
              outputs=[output_text1, output_text2, output_img])

demo.launch()


With PCA applied, the silhouette method reports 2 as the best number of clusters, even though the Iris dataset contains 3 species. Ward linkage merges the pair of clusters that gives the smallest increase in within-cluster variance, but after dimension reduction (PCA) the two closest groups appear to sit close enough together that treating them as a single cluster yields a higher silhouette score.
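
One way to inspect this is to cross-tabulate the Ward clusters against the known species for k = 2 and k = 3. This is a sketch under the same assumptions as above (scaled data, 2 PCA components); `species_by_cluster` is a hypothetical helper, and the species labels are used only for checking, never for clustering.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(iris.data))
Z = linkage(X_pca, method="ward")

# Species name of every sample, used only to interpret the clusters
species = pd.Series(iris.target_names[iris.target], name="species")


def species_by_cluster(k):
    """Count how many samples of each species fall into each of k clusters."""
    clusters = pd.Series(fcluster(Z, t=k, criterion="maxclust"), name="cluster")
    return pd.crosstab(species, clusters)


for k in (2, 3):
    print(f"k = {k}")
    print(species_by_cluster(k), "\n")
```

A species whose row is spread over several clusters, or a cluster that holds most of two species, shows where the clustering and the true labels disagree.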

# **Another method using Hierarchical clustering + PCA + GRADIO**

# 1.0 Import necessary libraries and load data

In [16]:
import gradio as gr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# 2.0 Function

In [19]:
# Function: hierarchical clustering + PCA + plotting
PC_COLUMNS = ['PC1', 'PC2', 'PC3', 'PC4']  # one name per principal component

def hierarchical_clustering(n_clusters, x_axis, y_axis):
    n_clusters = int(n_clusters)  # the slider may deliver a float

    # Step 1: Standardize
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df)

    # Step 2: PCA (keeping all 4 components rotates the data without discarding variance)
    pca = PCA(n_components=4)
    X_pca = pca.fit_transform(X_scaled)
    # The columns are principal components, not the original flower measurements
    pca_df = pd.DataFrame(X_pca, columns=PC_COLUMNS)

    # Step 3: Hierarchical clustering
    model = AgglomerativeClustering(n_clusters=n_clusters)
    labels = model.fit_predict(pca_df)  # labels run from 0 to n_clusters - 1
    pca_df['Cluster'] = labels

    # Step 4: Plot
    plt.figure(figsize=(8, 5))
    for cluster in range(n_clusters):  # loop over cluster labels 0 .. n_clusters - 1
        clustered_data = pca_df[pca_df['Cluster'] == cluster]  # rows assigned to this cluster
        plt.scatter(clustered_data[x_axis], clustered_data[y_axis], label=f'Cluster {cluster}')

    plt.xlabel(x_axis)
    plt.ylabel(y_axis)
    plt.title(f'Hierarchical Clustering with {n_clusters} Clusters')
    plt.legend()
    plt.grid(True)

    plot_path = "hierarchical_plot.png"
    plt.savefig(plot_path)
    plt.close()

    return plot_path

# 3.0 Launch GRADIO

In [20]:
# Gradio Interface
with gr.Blocks() as demo:
    gr.Markdown("## Hierarchical Clustering + PCA (Iris Dataset)")

    with gr.Row():
        n_clusters = gr.Slider(2, 10, value=3, step=1, label="Number of Clusters")

    with gr.Row():
        x_axis = gr.Dropdown(PC_COLUMNS, value='PC1', label="X Axis (PCA)")
        y_axis = gr.Dropdown(PC_COLUMNS, value='PC2', label="Y Axis (PCA)")

    btn = gr.Button("Run Clustering")
    output_img = gr.Image(type='filepath', label="Cluster Plot")

    btn.click(fn=hierarchical_clustering,
              inputs=[n_clusters, x_axis, y_axis],
              outputs=output_img)

demo.launch()


For hierarchical clustering, the number of clusters must be chosen by the user. Hierarchical (agglomerative) clustering computes the distances between all clusters (typically using Euclidean distance), then repeatedly merges the two closest clusters according to the linkage criterion. This function takes 3 parameters: n_clusters, x_axis and y_axis. Once the number of clusters is set and the principal components for the x-axis and y-axis are selected, the data points are plotted automatically.
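
The merging process can be seen on a tiny example. The sketch below uses five made-up one-dimensional points (an assumption for illustration, not Iris data) and SciPy's `linkage` with the Ward method, then cuts the tree into 3 clusters with `fcluster`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Five points on a line: two tight pairs and one isolated point
points = np.array([[0.0], [0.5], [5.0], [5.4], [10.0]])

# Each row of Z records one merge: the two cluster ids that were joined,
# the distance between them, and the size of the newly formed cluster
Z = linkage(points, method="ward")
for left, right, dist, size in Z:
    print(f"merge {int(left)} + {int(right)} at distance {dist:.2f} -> size {int(size)}")

# Cut the tree so that at most 3 clusters remain
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

The closest pair (5.0 and 5.4) is merged first, and cutting at 3 clusters leaves the two pairs and the isolated point as separate groups.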

# **Another method using hierarchical clustering (linkage)**

# **1.0 Import necessary libraries**

In [13]:
import gradio as gr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# **2.0 Create function**

In [14]:
PC_COLUMNS = ['PC1', 'PC2', 'PC3', 'PC4']  # one name per principal component

def hierarchical_clustering(n_clusters, x_axis, y_axis):
    n_clusters = int(n_clusters)  # the slider may deliver a float

    # Step 1: Standardize
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df)

    # Step 2: PCA (keeping all 4 components rotates the data without discarding variance)
    pca = PCA(n_components=4)
    X_pca = pca.fit_transform(X_scaled)
    # The columns are principal components, not the original flower measurements
    pca_df = pd.DataFrame(X_pca, columns=PC_COLUMNS)

    # Step 3: Hierarchical clustering
    Z = linkage(X_pca, method='ward')
    labels = fcluster(Z, t=n_clusters, criterion='maxclust')  # fcluster labels start at 1
    pca_df['Cluster'] = labels

    # Step 4: Plot
    plt.figure(figsize=(8, 5))
    for cluster in sorted(pca_df['Cluster'].unique()):
        clustered_data = pca_df[pca_df['Cluster'] == cluster]  # rows assigned to this cluster
        plt.scatter(clustered_data[x_axis], clustered_data[y_axis], label=f'Cluster {cluster}')

    plt.xlabel(x_axis)
    plt.ylabel(y_axis)
    plt.title(f'Hierarchical Clustering with {n_clusters} Clusters')
    plt.legend()
    plt.grid(True)

    plot_path = "hierarchical_plot.png"
    plt.savefig(plot_path)
    plt.close()

    return plot_path

# **3.0 Launch GRADIO**

In [15]:
# Gradio Interface
with gr.Blocks() as demo:
    gr.Markdown("## Hierarchical Clustering + PCA (Iris Dataset)")

    with gr.Row():
        n_clusters = gr.Slider(2, 10, value=3, step=1, label="Number of Clusters")

    with gr.Row():
        x_axis = gr.Dropdown(PC_COLUMNS, value='PC1', label="X Axis (PCA)")
        y_axis = gr.Dropdown(PC_COLUMNS, value='PC2', label="Y Axis (PCA)")

    btn = gr.Button("Run Clustering")
    output_img = gr.Image(type='filepath', label="Cluster Plot")

    btn.click(fn=hierarchical_clustering,
              inputs=[n_clusters, x_axis, y_axis],
              outputs=output_img)

demo.launch()

