# Find Label Mistakes With Embeddings

FiftyOne provides a powerful [embeddings visualization](https://docs.voxel51.com/user_guide/brain.html#visualizing-embeddings) capability that you can use to generate low-dimensional representations of the samples and objects in your datasets. By leveraging embeddings, you can identify anomalous or incorrect image labels that are hiding in your dataset. Let's find out how!

## Setup

If you haven’t already, install FiftyOne:

In [None]:
!pip install fiftyone

In this tutorial, we’ll use the Hugging Face Hub integration to load our dataset, a PyTorch model to generate embeddings, and the (default) [UMAP method](https://github.com/lmcinnes/umap) to reduce those embeddings to 2D for visualization, so we’ll need to install the corresponding packages:

In [None]:
!pip install torch torchvision umap-learn "huggingface_hub>=0.20.0"

## Loading BDD100k

We will be using the [Berkeley Deep Drive (BDD)](https://huggingface.co/datasets/dgural/bdd100k) dataset as our example for this recipe. It is a high-quality driving dataset that includes several label types, including classification labels! We can load it in FiftyOne format directly from the [Hugging Face Hub](https://docs.voxel51.com/integrations/huggingface.html#loading-datasets-from-the-hub)!

In [None]:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Download the dataset from the Hugging Face Hub and load it into FiftyOne
dataset = fouh.load_from_hub("dgural/bdd100k")

# Launch the FiftyOne App to browse the samples
session = fo.launch_app(dataset)

![bdd100k](../assets/bdd100k.png)

Our next step is to add our embedding visualization using [`fob.compute_visualization`](https://docs.voxel51.com/api/fiftyone.brain.html?highlight=compute_visualization#fiftyone.brain.compute_visualization)! We will use OpenAI's CLIP model to generate our embeddings and UMAP to project them down to a 2D visualization. If interested, learn more about visualizing embeddings [here](https://docs.voxel51.com/user_guide/brain.html#visualizing-embeddings)!

## Using Embeddings

In [None]:
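import fiftyone.brain as fob

# Embed every image with CLIP, then reduce the embeddings to 2D
# (UMAP is the default method). Results are stored under `brain_key`.
results = fob.compute_visualization(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_viz",
)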

We can then open up our App again and view our embeddings in the Embeddings panel! We can also color the points by `timeofday.label`.

In [None]:
session.show()

![open-embeddings](../assets/open-bdd-embeddings.gif)

You can even split the view to begin lasso selecting groups of samples!

![lasso-embeddings](../assets/lasso-select.gif)

## Finding Mistakes

To find mistakes, we need to dive a little deeper into our embeddings. Luckily, finding classification mistakes with embeddings and FiftyOne is easy! We start by looking for outliers: points whose label color differs from the color of the cluster around them. You can even toggle classes on and off to make them easier to spot!

![find-mistakes](../assets/find-class-mistakes-embeddings.gif)

When we turn off every class but the `night` class above, we can find many `night` samples that were hiding among the samples labeled `day`! Closer inspection shows that many of these are actually `day` samples that were mislabeled as `night`! In FiftyOne, we can tag these samples and export them for an annotation job with one of the labeling integrations: [CVAT](https://docs.voxel51.com/integrations/cvat.html), [Label Studio](https://docs.voxel51.com/integrations/labelstudio.html), [V7](https://docs.voxel51.com/integrations/v7.html), or [Labelbox](https://docs.voxel51.com/integrations/labelbox.html)!
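
If you'd like a rough programmatic starting point for the visual search above, here is a minimal sketch of the same idea: flag any point whose nearest neighbors in the 2D plot mostly carry a different label. This is not part of FiftyOne; `flag_label_outliers`, `k`, and `min_disagreement` are hypothetical names and thresholds chosen for illustration, and the toy coordinates below simply stand in for real UMAP points.

```python
import math


def flag_label_outliers(points, labels, k=5, min_disagreement=0.8):
    """Return indices of points whose k nearest neighbors mostly have a different label.

    Hypothetical helper for illustration; brute-force search, fine for small datasets.
    """
    flagged = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distance from point i to every other point, closest first
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        if not neighbor_labels:
            continue  # a single point has no neighbors to disagree with
        # Fraction of neighbors whose label differs from this point's label
        disagreement = sum(nl != label for nl in neighbor_labels) / len(neighbor_labels)
        if disagreement >= min_disagreement:
            flagged.append(i)
    return flagged


# Toy 2D coordinates standing in for UMAP points (assumed data, not BDD100k)
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # "day" cluster
    (0.05, 0.05),                                    # sits inside the "day" cluster
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # "night" cluster
]
labels = ["day", "day", "day", "day", "night", "night", "night", "night", "night"]

print(flag_label_outliers(points, labels, k=3))  # → [4]
```

On a real dataset, the coordinates would likely come from `results.points` and the labels from `dataset.values("timeofday.label")`, both in dataset order; treat that wiring as an assumption to verify against your own run. The flagged indices are only candidates for review, so you would still want to inspect them in the App before tagging them for re-annotation.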