# TreeOfLifeClassifier Subsetting
The `TreeOfLifeClassifier` creates predictions from embeddings of the Tree of Life (TOL) taxon labels that BioCLIP was trained on.

The `get_label_data()` method can be used to see the entire list of taxon label data.

The `apply_filter()` method limits the embeddings used for prediction. Pass it a __filter__: a boolean list with one entry per label, `True` for each embedding to keep.

The `create_taxa_filter()` method can be used to create filters for taxa values within TOL. Unknown taxa values will result in an error.
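
To make the idea of a boolean filter concrete, the sketch below builds one by hand on a small toy table. The rows, the helper name `make_family_filter`, and the `ValueError` are hypothetical stand-ins for illustration only; this is not pybioclip's implementation, and the real label data comes from `get_label_data()`.

```python
# Toy stand-in for the label table (hypothetical rows); the real table
# comes from classifier.get_label_data().
labels = [
    {"family": "Ursidae", "species": "Ursus arctos"},
    {"family": "Felidae", "species": "Felis catus"},
    {"family": "Cervidae", "species": "Cervus canadensis"},
    {"family": "Ursidae", "species": "Ursus americanus"},
]


def make_family_filter(rows, families):
    """Hypothetical helper: one boolean per row, True if its family is wanted."""
    known = {row["family"] for row in rows}
    unknown = sorted(set(families) - known)
    if unknown:
        # Mirrors the documented behaviour that unknown taxa values are an error.
        raise ValueError(f"Unknown family values: {unknown}")
    wanted = set(families)
    return [row["family"] in wanted for row in rows]


family_filter = make_family_filter(labels, ["Ursidae", "Felidae"])
print(family_filter)  # one boolean per label row
```

The resulting list lines up with the label rows one to one, which is the shape `apply_filter()` is described as expecting.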

In [None]:
!pip install pybioclip --quiet

In [1]:
from bioclip import TreeOfLifeClassifier, Rank
import pandas as pd


## Run simple prediction

In [2]:
classifier = TreeOfLifeClassifier()

In [3]:
preds = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
pd.DataFrame(preds)

| | file_name | kingdom | phylum | class | order | family | genus | species_epithet | species | common_name | score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos | Ursus arctos | Kodiak bear | 0.935603 |
| 1 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos syriacus | Ursus arctos syriacus | syrian brown bear | 0.05617 |
| 2 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos bruinosus | Ursus arctos bruinosus | | 0.004126 |
| 3 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctus | Ursus arctus | | 0.002496 |
| 4 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | americanus | Ursus americanus | Louisiana black bear | 0.000501 |


## Predict for specific families
Use `create_taxa_filter()` to restrict the embeddings to two families, Ursidae and Felidae.

In [4]:
taxa_filter = classifier.create_taxa_filter(
    Rank.FAMILY,
    ['Ursidae','Felidae']
)
classifier.apply_filter(taxa_filter)
preds = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
pd.DataFrame(preds)

| | file_name | kingdom | phylum | class | order | family | genus | species_epithet | species | common_name | score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos | Ursus arctos | Kodiak bear | 0.936124 |
| 1 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos syriacus | Ursus arctos syriacus | syrian brown bear | 0.056203 |
| 2 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos bruinosus | Ursus arctos bruinosus | | 0.004129 |
| 3 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctus | Ursus arctus | | 0.002497 |
| 4 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | americanus | Ursus americanus | Louisiana black bear | 0.000501 |


## Display TOL label data

In [5]:
label_data = classifier.get_label_data()
label_data

| | kingdom | phylum | class | order | family | genus | species_epithet | species | common_name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Animalia | Arthropoda | Insecta | Lepidoptera | Lycaenidae | Orthomiella | rantaizana | Orthomiella rantaizana | Chinese Straight-wing Blue |
| 1 | Animalia | Arthropoda | Insecta | Lepidoptera | Lycaenidae | Orthomiella | sinensis | Orthomiella sinensis | |
| 2 | Animalia | Arthropoda | Insecta | Lepidoptera | Lycaenidae | Orthomiella | pontis | Orthomiella pontis | Straightwing Blue |
| 3 | Animalia | Arthropoda | Insecta | Lepidoptera | Lycaenidae | Orthomiella | lucida | Orthomiella lucida | |
| 4 | Animalia | Arthropoda | Insecta | Lepidoptera | Lycaenidae | Hypolycaena | erylus | Hypolycaena erylus | Common Tit |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 384485 | Archaea | Crenarchaeota | Thermoprotei | Desulfurococcales | Desulfurococcaceae | Ignicoccus | hospitalis | Ignicoccus hospitalis | |
| 384486 | Archaea | Euryarchaeota | Halobacteria | Halobacteriales | Halobacteriaceae | Halobacterium | salinarum | Halobacterium salinarum | |
| 384487 | Archaea | Euryarchaeota | Halobacteria | Haloferacales | Haloferacaceae | Haloquadratum | walsbyi | Haloquadratum walsbyi | |
| 384488 | Bamfordvirae | Nucleocytoviricota | Megaviricetes | Pimascovirales | Iridoviridae | Iridovirus | invertebrate iridescent virus 31 | Iridovirus invertebrate iridescent virus 31 | |


## Create custom filter
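This example builds a custom filter that removes two species from the label data. Applying it replaces the family filter set earlier, which is why a Cervidae label appears in the results.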

In [6]:
label_data = classifier.get_label_data()
taxa_filter = ~label_data.species.isin(["Ursus arctos", "Ursus arctos syriacus"])
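
Because a filter built this way is an ordinary pandas boolean Series, masks can be combined with `&`, `|`, and `~` before being applied. The sketch below uses a toy table (hypothetical rows standing in for the output of `get_label_data()`) to keep one family while still excluding two species.

```python
import pandas as pd

# Toy stand-in for the label table; the real one has one row per TOL label.
toy_labels = pd.DataFrame(
    {
        "family": ["Ursidae", "Ursidae", "Ursidae", "Cervidae"],
        "species": [
            "Ursus arctos",
            "Ursus arctos syriacus",
            "Ursus americanus",
            "Cervus canadensis sibericus",
        ],
    }
)

# True for rows in the wanted family.
in_family = toy_labels["family"].isin(["Ursidae"])
# True for rows that are NOT one of the excluded species.
not_excluded = ~toy_labels["species"].isin(["Ursus arctos", "Ursus arctos syriacus"])

# Element-wise AND keeps only rows that satisfy both conditions.
combined = in_family & not_excluded
print(toy_labels[combined])
```

A mask like `combined`, built from the real label data instead of the toy table, would be passed to `apply_filter()` in the same way as `taxa_filter`.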

In [7]:
classifier.apply_filter(taxa_filter)
preds = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
pd.DataFrame(preds)

| | file_name | kingdom | phylum | class | order | family | genus | species_epithet | species | common_name | score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos bruinosus | Ursus arctos bruinosus | | 0.501536 |
| 1 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctus | Ursus arctus | | 0.303386 |
| 2 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | americanus | Ursus americanus | Louisiana black bear | 0.060895 |
| 3 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Artiodactyla | Cervidae | Cervus | canadensis sibericus | Cervus canadensis sibericus | | 0.038811 |
| 4 | Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | arctos nelsoni | Ursus arctos nelsoni | | 0.022875 |
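
The scores in this last result are consistent with the unfiltered scores being renormalized over the labels that remain. Dividing each surviving score from the first prediction by the probability mass left after removing the two species reproduces the filtered values to about three decimal places. The check below is plain arithmetic on numbers printed in this notebook; it assumes the unfiltered scores sum to 1 across all labels, and that the classifier renormalizes in exactly this way is an inference from those numbers rather than something the notebook states.

```python
# Scores from the unfiltered prediction (In [3]) in this notebook.
unfiltered = {
    "Ursus arctos": 0.935603,
    "Ursus arctos syriacus": 0.05617,
    "Ursus arctos bruinosus": 0.004126,
    "Ursus arctus": 0.002496,
    "Ursus americanus": 0.000501,
}
removed = ["Ursus arctos", "Ursus arctos syriacus"]

# Probability mass left once the removed species are excluded,
# assuming the scores over all labels sum to 1.
remaining_mass = 1.0 - sum(unfiltered[name] for name in removed)

# Rescale each surviving score so the remaining labels sum to 1 again.
renormalized = {
    name: score / remaining_mass
    for name, score in unfiltered.items()
    if name not in removed
}

for name, score in renormalized.items():
    print(f"{name}: {score:.3f}")
```

The same reasoning fits the family filter: nearly all of the score mass already sits within Ursidae, so restricting to Ursidae and Felidae changes the top scores only slightly.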
