# Run CleanVision on a Torchvision dataset

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cleanlab/cleanvision/blob/main/examples/torchvision_dataset.ipynb) 

In [None]:
!pip install -U pip
!pip install cleanvision[pytorch]

**After you install these packages, you may need to restart your notebook runtime before running the rest of this notebook.**

In [None]:
from torchvision.datasets import CIFAR10
from torch.utils.data import ConcatDataset
from cleanvision.imagelab import Imagelab

### Download dataset and concatenate all splits

Because we want a general picture of the issues affecting our data, we merge the training and test sets into one larger dataset before running CleanVision. Alternatively, you could run CleanVision on each split separately to get two separate reports.

[CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) is a classification dataset, but CleanVision can audit images from any type of dataset (whether used for supervised or unsupervised learning).

Load all splits of the CIFAR10 dataset

In [None]:
train_set = CIFAR10(root="./", download=True)
test_set = CIFAR10(root="./", train=False, download=True)

Concatenate train and test splits

In [None]:
dataset = ConcatDataset([train_set, test_set])
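`ConcatDataset` places the splits back to back. CIFAR10 has 50,000 training and 10,000 test images, so indices 0 to 49999 come from `train_set` and 50000 to 59999 come from `test_set`. When CleanVision flags an index, you may want to know which split it belongs to. Below is a minimal stdlib sketch of that lookup, mirroring the cumulative-size bisection that `ConcatDataset` performs internally. `locate_split` is a hypothetical helper, not part of torch or CleanVision.

```python
import bisect
from itertools import accumulate


def locate_split(idx, split_sizes, split_names):
    """Hypothetical helper: map an index into the concatenated dataset
    back to (split name, index within that split)."""
    cumulative = list(accumulate(split_sizes))  # e.g. [50000, 60000]
    if not 0 <= idx < cumulative[-1]:
        raise IndexError(f"index {idx} out of range for {cumulative[-1]} items")
    # First split whose cumulative size exceeds idx
    split = bisect.bisect_right(cumulative, idx)
    offset = cumulative[split - 1] if split > 0 else 0
    return split_names[split], idx - offset


# CIFAR10: 50,000 training images followed by 10,000 test images
sizes = [50000, 10000]
names = ["train", "test"]
print(locate_split(49999, sizes, names))  # ('train', 49999)
print(locate_split(50000, sizes, names))  # ('test', 0)
```

With the real objects, you would pass `[len(train_set), len(test_set)]` as `split_sizes`. `ConcatDataset` also exposes the running totals directly as `dataset.cumulative_sizes`.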

A sample from the dataset

In [None]:
dataset[0]

Let's look at the first image in this dataset

In [None]:
dataset[0][0]
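Each item in `dataset` is an `(image, label)` tuple: a PIL image and an integer class index. To read the label as a class name, you can index into the list of CIFAR10 class names, which torchvision exposes as `train_set.classes`. The sketch below hardcodes the ten CIFAR10 class names and uses a stand-in object instead of a real PIL image so it runs without downloading anything. `describe_sample` and `FakeImage` are hypothetical, for illustration only.

```python
# The ten CIFAR10 classes in label order (torchvision exposes the same list as train_set.classes)
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def describe_sample(sample, class_names=CIFAR10_CLASSES):
    """Hypothetical helper: turn an (image, label) tuple into a readable string."""
    image, label = sample
    size = getattr(image, "size", None)  # PIL images report (width, height) via .size
    return f"label={label} ({class_names[label]}), image size={size}"


class FakeImage:
    """Stand-in for a 32x32 PIL image so the sketch runs without torchvision."""
    size = (32, 32)


print(describe_sample((FakeImage(), 6)))  # label=6 (frog), image size=(32, 32)
```

With the real data, `describe_sample(dataset[0], train_set.classes)` would describe the first sample.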

### Run CleanVision

In [None]:
imagelab = Imagelab(torchvision_dataset=dataset)

We set `n_jobs=1` because CleanVision's parallelization may interact with torch dataloaders in unexpected ways.

In [None]:
imagelab.find_issues(n_jobs=1)

Get a report of all the issues found

In [None]:
imagelab.report()

View more information about each image, such as what types of issues it exhibits and its quality score with respect to each type of issue.

In [None]:
imagelab.issues

Get indices of all **dark** images in the dataset sorted by their dark score.

In [None]:
indices = imagelab.issues.query('is_dark_issue').sort_values(by='dark_score').index.tolist()
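`imagelab.issues` is a pandas DataFrame with one row per image. The query keeps rows where the boolean `is_dark_issue` column is `True`, and `sort_values` orders them by `dark_score` in ascending order. CleanVision's quality scores lie between 0 and 1, with lower scores indicating a more severe issue, so the darkest images come first. The stdlib sketch below reproduces the same filter-and-sort on a few made-up rows. The values are illustrative, not real CleanVision output.

```python
# Made-up rows standing in for imagelab.issues (row index -> column values)
issues = {
    0: {"is_dark_issue": False, "dark_score": 0.91},
    1: {"is_dark_issue": True, "dark_score": 0.12},
    2: {"is_dark_issue": True, "dark_score": 0.03},
    3: {"is_dark_issue": False, "dark_score": 0.64},
    4: {"is_dark_issue": True, "dark_score": 0.27},
}

# Same idea as .query('is_dark_issue').sort_values(by='dark_score').index.tolist():
# keep flagged rows, then order them from lowest (darkest) to highest score
dark_indices = sorted(
    (idx for idx, row in issues.items() if row["is_dark_issue"]),
    key=lambda idx: issues[idx]["dark_score"],
)
print(dark_indices)  # [2, 1, 4]
```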

View the sixth darkest image in the dataset (position 5 in the zero-indexed list)

In [None]:
dataset[indices[5]][0]

View global information about each issue, such as how many images in the dataset suffer from this issue.

In [None]:
imagelab.issue_summary
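Conceptually, the per-issue image counts in this summary can be understood as column sums over the boolean `is_<issue>_issue` columns of `imagelab.issues`. The stdlib sketch below illustrates that aggregation on made-up rows. It is an illustration of the idea, not CleanVision's actual implementation.

```python
from collections import Counter

# Made-up rows standing in for imagelab.issues; only the boolean flag columns are shown
rows = [
    {"is_dark_issue": True, "is_blurry_issue": False, "is_low_information_issue": False},
    {"is_dark_issue": False, "is_blurry_issue": True, "is_low_information_issue": False},
    {"is_dark_issue": True, "is_blurry_issue": True, "is_low_information_issue": False},
]

counts = Counter()
for row in rows:
    for column, flagged in row.items():
        if column.startswith("is_") and column.endswith("_issue"):
            # "is_low_information_issue" -> "low_information"
            issue_type = column[len("is_"):-len("_issue")]
            counts[issue_type] += int(flagged)  # adds 0 for unflagged rows so every type appears

# Most frequent issue types first
for issue_type, num_images in counts.most_common():
    print(issue_type, num_images)
```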

**For a more detailed guide on how to use CleanVision, check the [tutorial notebook](https://github.com/cleanlab/cleanvision/blob/main/examples/tutorial.ipynb).**