# Automatically organizing a set of images into groups

The KNN classifier makes it easy to organize a previously unknown set of images into semantically consistent groups. This section walks through a workflow that does just that.
We will use images from the USDA Pomological Watercolors dataset, a set of almost 7500 watercolor paintings of fruit, painted between 1886 and 1942.

## Data Preparation

In [None]:
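import json

import boonsdk
from boonsdk import app_from_env, FileImport

# Point the SDK at the API key file, then create the client
%env BOONAI_APIKEY_FILE = apikey.json
app = app_from_env()

# Load the dataset records; each record carries a 'url' field
with open('fruits.json', 'r') as file:
    fruits = json.load(file)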

files = []
for f in fruits:
    name = f['url'].split('id=')[1]
    imageURL = 'http://naldc-legacy.nal.usda.gov/pom/' + name + '/screen.jpg'
    files.append(FileImport(imageURL))

# Import all the files, 100 at a time. For a partial import, slice `files` first (for example, files = files[:500])
while files:
    app.assets.batch_import_files(files[:100])
    files = files[100:]
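
The URL construction and batching steps above can be sketched as two small helpers that run without Boon AI credentials. The names `screen_image_url` and `chunked` are hypothetical and not part of boonsdk; the record shape and URL pattern are taken from the cell above, and the sample record is made up.

```python
def screen_image_url(record):
    """Build the screen-size image URL from a record's 'url' field."""
    # The identifier is whatever follows 'id=' in the record URL
    name = record['url'].split('id=')[1]
    return 'http://naldc-legacy.nal.usda.gov/pom/' + name + '/screen.jpg'


def chunked(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Illustrative record; the id value here is made up
sample = {'url': 'https://example.org/pom/catalog.xhtml?id=POM00000001'}
print(screen_image_url(sample))

# 250 items split into batches of 100, 100 and 50
print([len(batch) for batch in chunked(list(range(250)))])
```

With helpers like these, the import loop could iterate over `chunked(files)` and pass each batch to `app.assets.batch_import_files`, which avoids re-slicing the list on every pass.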

## Preliminary Analysis

We'll use Boonlab to plot a t-SNE graph of the assets:

In [None]:
from boonlab.plot import plot_tsne

x = plot_tsne(thumbs=True, nClusters=1, verbose=False)

The plot suggests roughly six clusters, so let's draw it again, this time asking it to cluster with nClusters=6:

In [None]:
x = plot_tsne(thumbs=True, nClusters=6, verbose=False)

## Creating and Training the Model

Here we create an empty dataset, create a model linked to that dataset, train the model, and then apply it to the assets.

Since we are training a KNN model without labels, the trainer will cluster the assets and then create labels automatically. We set n_clusters to 6 because that was the number of clusters we saw in our analysis above.

In [None]:
# Create an empty dataset
dataset = app.datasets.create_dataset(
    "fruits", 
    boonsdk.DatasetType.Classification
)

# Create a model and attach our empty dataset
model = app.models.create_model(
    "fruit-groups",
    boonsdk.ModelType.KNN_CLASSIFIER,
    dataset=dataset
)

# Launch the training job
training_job = app.models.train_model(model, train_args={'n_clusters': 6})

# Wait for training to complete
app.jobs.wait_on_job(training_job)

# Apply the model to our assets
apply_job = app.models.apply_model(model)

# Wait for apply to complete
app.jobs.wait_on_job(apply_job)
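
To give some intuition for what "cluster the assets, then create labels automatically" could look like, here is a minimal, self-contained sketch using plain k-means on toy 2-D points. It is an illustration only, not Boon AI's trainer: the clustering algorithm the KNN trainer actually uses, the feature vectors it works on, and the `Group_<n>` label format shown here are all assumptions.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment


def auto_labels(assignment):
    """Turn cluster indices into label strings (hypothetical label format)."""
    return ["Group_" + str(a) for a in assignment]


# Two well-separated toy blobs standing in for image feature vectors
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(auto_labels(kmeans(points, k=2)))
```

Points from the same blob end up sharing a label, which is the property the workflow relies on: once every asset has an automatically generated label, a nearest-neighbor classifier can be applied as if the labels had been assigned by hand.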


## Visualize Results

Once the apply_model job above has finished, we can view the results here. The Visualizer in the Boon AI console is another convenient way to browse them.

In [None]:
import ipyplot
from PIL import Image

images = []
labels = []

for group in range(6):
    # Search for up to 6 assets matching the query 'auto group <n>'
    search = {
        "size": 6,
        "query": {
            "bool": {
                "must": [
                    {"simple_query_string": {"query": "auto group " + str(group)}}
                ]
            }
        }
    }

    for asset in app.assets.search(search):
        thumbnail = asset.get_files(category="web-proxy")[0]
        images.append(Image.open(app.assets.download_file(thumbnail)))
        labels.append(asset.get_attr("analysis.fruit-groups.predictions")[0]['label'])

ipyplot.plot_images(images, labels, img_width=150)
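
A quick sanity check on the result is to count how many of the retrieved assets ended up under each predicted label. The sketch below uses only the standard library; `group_sizes` is a hypothetical helper, and the label strings are made-up placeholders standing in for the `labels` list built above.

```python
from collections import Counter


def group_sizes(labels):
    """Return (label, count) pairs, largest group first."""
    return Counter(labels).most_common()


# Placeholder labels; in the notebook, pass the `labels` list instead
example_labels = ["Group_0", "Group_1", "Group_0", "Group_2", "Group_0", "Group_1"]
for label, count in group_sizes(example_labels):
    print(label, count)
```

A group that is much larger or smaller than the others may be worth inspecting in the thumbnail grid before relying on the labels.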