# Exploring Machine Learning Datasets

In this class, we'll take a closer look at the idea of a dataset and explore how datasets construct an image of the world for machine learning models. To do so, we'll use [FiftyOne](https://github.com/voxel51/fiftyone), an application for viewing datasets, their annotations, and more.

In [1]:
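# install fiftyone
!pip3 install fiftyone
# !pip3 install torch torchvision
# download the plugins from the voxel51/fiftyone-plugins repository
# (the `!!` prefix captures the command's output as a list of lines)
!!fiftyone plugins download https://github.com/voxel51/fiftyone-plugins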



['Downloading voxel51/fiftyone-plugins...',
 ...]

In [2]:
# here we import fiftyone and fiftyone zoo, a quick way to access ML datasets via fiftyone.
import fiftyone as fo
import fiftyone.zoo as foz

In [3]:
# this function allows us to list all the datasets available in fiftyone zoo
foz.list_zoo_datasets()

['activitynet-100',
 'activitynet-200',
 'bdd100k',
 'caltech101',
 'caltech256',
 'cifar10',
 'cifar100',
 'cityscapes',
 'coco-2014',
 'coco-2017',
 'fashion-mnist',
 'fiw',
 'hmdb51',
 'imagenet-2012',
 'imagenet-sample',
 'kinetics-400',
 'kinetics-600',
 'kinetics-700',
 'kinetics-700-2020',
 'kitti',
 'kitti-multiview',
 'lfw',
 'mnist',
 'open-images-v6',
 'open-images-v7',
 'places',
 'quickstart',
 'quickstart-3d',
 'quickstart-geo',
 'quickstart-groups',
 'quickstart-video',
 'sama-coco',
 'ucf101',
 'voc-2007',
 'voc-2012']
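
With this many names, it can help to narrow the list down before choosing a dataset. The following is a minimal sketch, not part of FiftyOne: `filter_names` is a hypothetical helper, and `zoo_names` is a hard-coded subset of the output above so that the cell runs even without FiftyOne installed. In the notebook you could pass the result of `foz.list_zoo_datasets()` instead.

```python
# A subset of the names returned by foz.list_zoo_datasets() above,
# hard-coded so this sketch runs on its own.
zoo_names = [
    "cifar10",
    "cifar100",
    "coco-2014",
    "coco-2017",
    "mnist",
    "quickstart",
    "quickstart-3d",
    "quickstart-geo",
    "quickstart-groups",
    "quickstart-video",
]


def filter_names(names, keyword):
    """Return the names that contain `keyword`, ignoring case (hypothetical helper)."""
    keyword = keyword.lower()
    return [name for name in names if keyword in name.lower()]


print(filter_names(zoo_names, "quickstart"))
print(filter_names(zoo_names, "coco"))
```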

In [4]:
# this function lists all currently downloaded datasets
foz.list_downloaded_zoo_datasets()

{'coco-2017': ('/Users/muradkhan/fiftyone/coco-2017',
  <fiftyone.zoo.datasets.ZooDatasetInfo at 0x11905bfe0>),
 'quickstart-video': ('/Users/muradkhan/fiftyone/quickstart-video',
  <fiftyone.zoo.datasets.ZooDatasetInfo at 0x108203f50>),
 'quickstart-groups': ('/Users/muradkhan/fiftyone/quickstart-groups',
  <fiftyone.zoo.datasets.ZooDatasetInfo at 0x119eef980>),
 'imagenet-sample': ('/Users/muradkhan/fiftyone/imagenet-sample',
  <fiftyone.zoo.datasets.ZooDatasetInfo at 0x11a5c3ad0>),
 'quickstart': ('/Users/muradkhan/fiftyone/quickstart',
  <fiftyone.zoo.datasets.ZooDatasetInfo at 0x11a5c3b00>),
 'quickstart-3d': ('/Users/muradkhan/fiftyone/quickstart-3d',
  <fiftyone.zoo.datasets.ZooDatasetInfo at 0x11a5e3590>)}
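
The output above is a dictionary that maps each dataset name to a `(directory, info)` tuple. As a rough sketch of how to read it, the hypothetical helper below pulls out just the names and directories. The `example` dictionary is a stand-in with placeholder paths and `None` in place of the `ZooDatasetInfo` objects; in the notebook you would pass the return value of `foz.list_downloaded_zoo_datasets()`.

```python
def summarize_downloads(downloaded):
    """Return sorted (name, directory) pairs from a {name: (dir, info)} mapping."""
    return sorted((name, entry[0]) for name, entry in downloaded.items())


# Hypothetical stand-in for the real return value: placeholder paths,
# and None where the ZooDatasetInfo objects would be.
example = {
    "quickstart": ("/path/to/fiftyone/quickstart", None),
    "coco-2017": ("/path/to/fiftyone/coco-2017", None),
}

for name, directory in summarize_downloads(example):
    print(f"{name}: {directory}")
```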

In [None]:
# this function loads a dataset from the fiftyone zoo; pass the dataset's name as a string
foz.load_zoo_dataset("quickstart")
# foz.list_zoo_datasets("detection")

Dataset already downloaded
Loading existing dataset 'quickstart'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use


Name:        quickstart
Media type:  image
Num samples: 200
Persistent:  True
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    ground_truth:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    uniqueness:       fiftyone.core.fields.FloatField
    predictions:      fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)

# Explore the App

In [None]:
# select your initial dataset
dataset = foz.load_zoo_dataset("quickstart-groups")

Dataset already downloaded
Loading existing dataset 'quickstart-groups'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use


In [8]:
# launch a local session of the fiftyone app
# auto=False defers showing the App until session.show() is called
session = fo.launch_app(dataset, auto=False)


Session launched. Run `session.show()` to open the App in a cell output.


In [9]:
# call show() with parentheses to open the App in the cell output;
# without them, the cell only prints the bound method.
# The App is also served at the session URL (http://localhost:5151/ in this run).
session.show()