This notebook is adapted from the "how-to-use-scivision" notebook in the repository https://github.com/scivision-gallery/scivision_examples, for use in the Turing REG/RDS Connections Workshop.
Its purpose is to demonstrate the honeybee dataset and the pretrained model that have been added to the Scivision catalog.

In [None]:
from scivision import default_catalog, load_pretrained_model

Construct a dataframe containing the contents of the models catalog:

In [None]:
models_catalog = default_catalog.models.to_dataframe()
models_catalog

Our newly added "bee-species" model is the last entry. The "url" column gives the GitHub repo that contains the model, so we can use it to load the model itself:

In [None]:
# Look up the repo URL of the "bee-species" entry in the default catalog, then load the model from it
model_repo = models_catalog[models_catalog.name == "bee-species"].url.item()
model = load_pretrained_model(model_repo, model='efficientNetB3', allow_install=True)

How did we know to use "efficientNetB3" as the model name in the command above? It is the name of the class defined in `model.py` in the model repo. We are not sure whether that name is also available through the scivision interface.

Now we will search the catalog again, this time for data sources that are compatible with our model. By "compatible", we mean that the dataset is suitable for the "tasks" that the model can perform.

In [None]:
compatible_datasources = default_catalog.compatible_datasources("bee-species").to_dataframe()
compatible_datasources
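
To make "compatible" more concrete, here is a minimal sketch of task-based matching on toy records. The entry names, the layout of the `tasks` field, and the `compatible` helper are all illustrative assumptions; this is not Scivision's actual implementation, which may also take other catalog fields into account.

```python
# Toy catalog entries; names and task lists are made up for illustration.
toy_models = {"bee-species": {"classification"}}
toy_datasources = [
    {"name": "toy-bees", "tasks": {"classification"}},
    {"name": "toy-cells", "tasks": {"segmentation"}},
    {"name": "toy-mixed", "tasks": {"classification", "object-detection"}},
]


def compatible(model_tasks, datasources):
    """Return names of datasources that share at least one task with the model."""
    return [d["name"] for d in datasources if model_tasks & d["tasks"]]


matches = compatible(toy_models["bee-species"], toy_datasources)
print(matches)
```

A datasource is kept as soon as its task set overlaps the model's, so "toy-cells" is the only entry dropped here.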

Happily, our newly-added bee dataset is there.  We can use the "url" column to load the dataset:

In [None]:
target_datasource = compatible_datasources.loc[compatible_datasources['name'] == 'data-007']
data_url = target_datasource['url'].item()

In [None]:
from scivision import load_dataset
data_config = load_dataset(data_url)
data_config

Now, using Intake and Dask, we get the images themselves in a form that we can iterate over.

In [None]:
images = data_config.honeybee().to_dask()

We can take a look at an example image using matplotlib's imshow function:

In [None]:
import matplotlib.pyplot as plt
plt.imshow(images[44])

And finally, we can run our model on this image, to see what category it predicts for it:

In [None]:
# This step seems flaky: if the first image we predict on is bigger than (70, 70), the model complains.
# If we first predict on one that is smaller than (70, 70), then images of any size seem to work. :(
model.predict(images[0])
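
Because prediction can fail on some inputs, it may help to wrap the call so that one failure does not stop a run over many images. The sketch below is one possible approach, not part of scivision: `predict_all` and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in that merely imitates a size-dependent failure.

```python
def predict_all(predict, items):
    """Call predict on each item; return results by index plus the failures."""
    results, failed = {}, []
    for i, item in enumerate(items):
        try:
            results[i] = predict(item)
        except Exception as err:  # record the error and keep going
            failed.append((i, repr(err)))
    return results, failed


# Hypothetical stand-in for model.predict that rejects "large" inputs.
def toy_predict(size):
    if size > 70:
        raise ValueError("input too large")
    return "ok"


results, failed = predict_all(toy_predict, [50, 90, 60])
print(results)
print([index for index, _ in failed])
```

With the real model, one would pass `model.predict` and an iterable of images instead; whether that sidesteps the size issue described above is untested.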

For this dataset, the true labels are also available, in a CSV file stored alongside the image zip file on Zenodo.
We can download it, and could potentially use it to see how often our model predicts the correct label.


In [None]:
import pandas as pd
df = pd.read_csv("https://zenodo.org/record/7101934/files/bee_data.csv?download=1")
true_labels = df.subspecies.values
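
As a rough sketch of how the comparison could work, the helper below computes the fraction of matching labels. It rests on two assumptions not confirmed here: that the rows of the CSV are in the same order as the images, and that the output of `model.predict` can be mapped to the same label strings as the "subspecies" column. The `accuracy` function and the toy labels are illustrative only.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("no labels to compare")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels, for illustration only.
toy_true = ["A", "B", "A", "C"]
toy_pred = ["A", "B", "C", "C"]
score = accuracy(toy_pred, toy_true)
print(score)
```

With real predictions collected into a list, `accuracy(predicted_labels, true_labels)` would give the overall hit rate, where `predicted_labels` is a name assumed here for that list.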