# The Iris dataset
This notebook uses the Iris dataset bundled with the scikit-learn package:
https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor).
Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. 

In [1]:
from sklearn.datasets import load_iris
import numpy as np
from floatview import GlueManagerWidget
import pandas as pd
iris = load_iris()
data = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['target'])
data

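|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|-----|-------------------|------------------|-------------------|------------------|--------|
| 0   | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1   | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2   | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3   | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4   | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2.0 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2.0 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2.0 |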
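The `np.c_[...]` expression in the first cell stacks the four feature columns and the target labels column-wise into a single float array, which is why `target` appears as `0.0` and `2.0` rather than as integers. The sketch below reproduces that on two rows copied from the table. It also shows one possible way to attach species names; the `species` column is an illustration only and is not part of this notebook. The name mapping assumes the class order used by `iris['target_names']` (setosa, versicolor, virginica).

```python
import numpy as np
import pandas as pd

# Toy stand-ins for iris['data'] (2 samples x 4 features) and iris['target']
features = np.array([[5.1, 3.5, 1.4, 0.2],
                     [6.7, 3.0, 5.2, 2.3]])
target = np.array([0, 2])
feature_names = ["sepal length (cm)", "sepal width (cm)",
                 "petal length (cm)", "petal width (cm)"]

# np.c_ stacks its arguments as columns; the 1-D target becomes a 5th column.
# The result is one float array, so the integer labels become 0.0 and 2.0.
table = np.c_[features, target]
df = pd.DataFrame(table, columns=feature_names + ["target"])

# Illustrative only: map numeric codes to names (order of iris['target_names'])
species = ["setosa", "versicolor", "virginica"]
df["species"] = pd.Categorical.from_codes(df["target"].astype(int), species)
print(df)
```

Converting back with `astype(int)` is needed because `Categorical.from_codes` expects integer codes.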


`GlueManagerWidget` has a modal mode: with `modal=True`, new plots are not included in the main cell's output. Passing `display_console=False` hides the main GUI for creating new plots, so the views below are created from code instead.

In [2]:
gmw = GlueManagerWidget(data, modal=True, label="IrisDataSet", display_console=False)

# Views

## Scatter plots

First, two scatter plots are created, petal length against petal width and sepal length against sepal width, so that the petal and sepal measurements can be compared.

In [3]:
view = gmw.gluemanager.newView(
    "scatter",
    ["petal length (cm)", "petal width (cm)"],
    "Scatter"
)

In [4]:
view = gmw.gluemanager.newView(
    "scatter",
    ["sepal length (cm)", "sepal width (cm)"],
    "Scatter"
)

## Table

A table view is added to display every column of the dataset, including `target`.

In [5]:
view = gmw.gluemanager.newView(
    "table",
    ["petal length (cm)", "petal width (cm)", "sepal length (cm)", "sepal width (cm)", "target"],
    "Table"
)

## Sunburst
The sunburst visualization allows interactive exploration of several columns of the dataset at once; here all five are used.

In [6]:
view = gmw.gluemanager.newView(
    "sunburst",
    ["petal length (cm)", "petal width (cm)", "sepal length (cm)", "sepal width (cm)", "target"],
    "Sunburst"
);

## Parallel

Parallel views also make it possible to visualize several columns of the dataset together.
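Assuming the `parallels` view is a standard parallel-coordinates plot (this notebook does not say how floatview draws it), each row of the dataset becomes one polyline crossing a vertical axis per column. The sketch below computes those polyline vertices with per-column min-max scaling, a common choice so that lengths in cm and integer class labels share one vertical scale. The helper `parallel_coordinates_layout` is hypothetical and is not part of floatview.

```python
import numpy as np


def parallel_coordinates_layout(X):
    """Vertex coordinates of one polyline per row of X (hypothetical helper).

    Column j becomes a vertical axis at x = j. Each column is min-max scaled
    to [0, 1] independently so columns with different ranges share a scale.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0, no division by zero
    y = (X - lo) / span                     # shape (n_rows, n_cols)
    x = np.broadcast_to(np.arange(X.shape[1]), y.shape)
    return x, y


# Three rows from the table above, reordered to the view's column order:
# petal length, petal width, sepal length, sepal width, target
rows = [[1.4, 0.2, 5.1, 3.5, 0.0],
        [5.2, 2.3, 6.7, 3.0, 2.0],
        [5.0, 1.9, 6.3, 2.5, 2.0]]
x, y = parallel_coordinates_layout(rows)
for xs, ys in zip(x, y):
    # Each (xs, ys) pair is one polyline, e.g. suitable for matplotlib's plt.plot(xs, ys)
    print(list(zip(xs.tolist(), np.round(ys, 2).tolist())))
```

Because each axis is scaled on its own, vertical positions are only comparable within a column, not across columns.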

In [7]:
view = gmw.gluemanager.newView(
    "parallels",
    ["petal length (cm)", "petal width (cm)", "sepal length (cm)", "sepal width (cm)", "target"],
    "Parallel"
);

# Creating subsets

Let's create three clusters with the agglomerative clustering algorithm and turn each one into a subset.

In [8]:
from sklearn.cluster import AgglomerativeClustering

features = ["petal length (cm)", "petal width (cm)", "sepal length (cm)", "sepal width (cm)"]
n_clusters = 3
# Ward linkage always uses Euclidean distances. The old `affinity` argument was
# deprecated in scikit-learn 1.2 (renamed `metric`) and removed in 1.4.
cluster = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward')
df = gmw.gluemanager.data.to_dataframe()
# Cluster on the four measurements only, so 'target' does not leak into the fit
labels = cluster.fit_predict(df[features])
for i in range(n_clusters):
    # Row indices assigned to cluster i (np.nonzero returns a tuple of arrays)
    state = np.nonzero(labels == i)
    gmw.gluemanager.updateSelection(state)
    gmw.gluemanager.createSubsetFromSelection(label='cluster' + str(i))
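Ward linkage starts with every sample in its own cluster and, at each step, merges the two clusters $A$ and $B$ whose union increases the total within-cluster sum of squares the least; that increase is $\frac{|A|\,|B|}{|A|+|B|}\,\lVert \mu_A - \mu_B \rVert^2$, where $\mu$ is a cluster centroid. The naive sketch below implements that rule directly in NumPy for illustration. It is not scikit-learn's implementation (which is far more efficient), and `ward_agglomerative` is a hypothetical helper name.

```python
import numpy as np


def ward_agglomerative(X, n_clusters):
    """Naive O(n^3) agglomerative clustering with Ward linkage (illustration only)."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # one singleton cluster per sample
    while len(clusters) > n_clusters:
        best = None  # (cost, a, b) of the cheapest merge seen so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Ward cost: increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so position a is unaffected
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Three well-separated pairs of 2-D points
X = np.array([[0.0, 0.0], [0.1, 0.0],
              [5.0, 5.0], [5.1, 5.0],
              [10.0, 0.0], [10.0, 0.1]])
labels = ward_agglomerative(X, n_clusters=3)
for i in range(3):
    # Same pattern as the notebook cell: np.nonzero returns a tuple of index arrays
    state = np.nonzero(labels == i)
    print("cluster" + str(i), state[0])
```

Each pair of nearby points ends up in its own cluster. Note that cluster numbers are arbitrary, both here and in scikit-learn, so `cluster0` need not correspond to the species with `target` 0.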

