# Run CleanVision on a Hugging Face dataset

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cleanlab/cleanvision/blob/main/docs/source/tutorials/huggingface_dataset.ipynb) 

In [None]:
!pip install -U pip
!pip install "cleanvision[huggingface] @ git+https://github.com/cleanlab/cleanvision.git"

**After you install these packages, you may need to restart your notebook runtime before running the rest of this notebook.**

In [None]:
from datasets import load_dataset
from cleanvision import Imagelab

### 1. Download dataset

[cats_vs_dogs](https://huggingface.co/datasets/cats_vs_dogs) is a subset of the Asirra dataset, which contains millions of images of pets classified as cats or dogs.

Note that although this is a classification dataset, CleanVision can audit images from any type of dataset, whether it is used for supervised or unsupervised learning.

Load the train split of the dataset.

In [None]:
dataset = load_dataset("cats_vs_dogs", split="train")

Inspect the dataset to see more information, such as its features and number of examples.

In [None]:
dataset

`dataset.features` is a `dict[column_name, column_type]` that contains information about the different columns in the dataset and the type of each column. Use `dataset.features` to find the key that contains the Image feature.

In [None]:
dataset.features
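
When the name of the image column is not known in advance, the lookup described above can be automated. The following is a minimal sketch, not part of CleanVision or `datasets`: `find_image_keys` is a hypothetical helper, and it assumes that image columns use a feature class named `Image` (as `datasets.Image` is). It matches on the class name so that the snippet runs without importing `datasets`.

```python
def find_image_keys(features):
    """Return the column names whose feature type is an image.

    `features` is any mapping of column name -> feature object, such as
    `dataset.features`. Assumption: image columns use a class named `Image`.
    """
    return [
        name
        for name, feature in features.items()
        if type(feature).__name__ == "Image"
    ]
```

Calling `find_image_keys(dataset.features)` on the dataset loaded above should return a list containing the key that is passed to Imagelab as `image_key`. A stricter variant could check `isinstance(feature, datasets.Image)` instead of comparing class names.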

### 2. View sample images in the dataset

Initialize Imagelab with the dataset and the key of the column that holds the images.

In [None]:
imagelab = Imagelab(hf_dataset=dataset, image_key="image")

In [None]:
imagelab.visualize()

### 3. Run CleanVision

In [None]:
imagelab.find_issues()

### 4. View Results
Get a report of all the issues found.

In [None]:
imagelab.report()

View more information about each image, such as what types of issues it exhibits and its quality score with respect to each type of issue.

In [None]:
imagelab.issues

Get indices of all **blurry** images in the dataset sorted by their blurry score.

In [None]:
indices = imagelab.issues.query('is_blurry_issue').sort_values(by='blurry_score').index.tolist()
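
The same query-and-sort pattern can be wrapped in a small function so that it works for other issue types as well. This is only a sketch under stated assumptions: `ranked_issue_indices` is a hypothetical helper, the toy DataFrame stands in for `imagelab.issues` with invented values, and the `dark` columns are assumed to follow the same `is_<type>_issue` / `<type>_score` naming as the `blurry` columns used above. It also assumes that lower scores mark more severe cases, which is why an ascending sort puts the worst images first.

```python
import pandas as pd


def ranked_issue_indices(issues, issue_type):
    """Return indices of rows flagged with `issue_type`, lowest score first."""
    flag_col = f"is_{issue_type}_issue"
    score_col = f"{issue_type}_score"
    # Keep only the rows where the boolean flag column is True.
    flagged = issues[issues[flag_col]]
    # Ascending sort: the lowest score comes first.
    return flagged.sort_values(by=score_col).index.tolist()


# Toy stand-in for `imagelab.issues`; the values are made up for illustration.
toy_issues = pd.DataFrame(
    {
        "is_blurry_issue": [False, True, True, False],
        "blurry_score": [0.90, 0.30, 0.10, 0.80],
        "is_dark_issue": [True, False, False, False],
        "dark_score": [0.20, 0.95, 0.85, 0.70],
    }
)

blurry = ranked_issue_indices(toy_issues, "blurry")
dark = ranked_issue_indices(toy_issues, "dark")
```

In the notebook, `ranked_issue_indices(imagelab.issues, "blurry")` would be expected to produce the same list as the one-line query above.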

View the 9th blurriest image in the dataset (`indices` is zero-indexed, so position 8 is the ninth entry).

In [None]:
dataset[indices[8]]['image']

View global information about each issue, such as how many images in the dataset suffer from this issue.

In [None]:
imagelab.issue_summary
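
The per-issue counts can also be expressed as a share of the whole dataset. The sketch below uses a toy DataFrame in place of `imagelab.issue_summary`. The column names `issue_type` and `num_images`, the counts, and the total of 200 images are all assumptions made for illustration, so check them against the actual output of the cell above before reusing this.

```python
import pandas as pd

# Toy stand-in for `imagelab.issue_summary`; column names and counts are assumed.
toy_summary = pd.DataFrame(
    {
        "issue_type": ["blurry", "dark", "near_duplicates"],
        "num_images": [40, 10, 0],
    }
)
num_images_total = 200  # hypothetical; in the notebook this would be len(dataset)

# Keep only issue types that were actually found, then add the affected share.
found = toy_summary[toy_summary["num_images"] > 0].copy()
found["fraction"] = found["num_images"] / num_images_total
```

Here `found` keeps one row per issue type with at least one affected image, plus a `fraction` column giving the share of the dataset that exhibits it.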

**For a more detailed guide on how to use CleanVision, see the [tutorial notebook](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/tutorial.ipynb).**