# Loading Classification Data

FiftyOne has many awesome features to leverage in your AI workflows, but we all need to start somewhere! In this recipe, we will briefly show how to load classification datasets from two common layouts: a directory tree with one folder per class, and a CSV file of labels.

## Setup

If you haven't already, install FiftyOne with the following:

In [None]:
!pip install fiftyone

We will also download some data from [Kaggle](https://www.kaggle.com/) for our example. Feel free to follow along by downloading it with the Kaggle API command below, downloading it from the [link](https://www.kaggle.com/datasets/sshikamaru/fruit-recognition), or using your own data!

In [None]:
!kaggle datasets download -d sshikamaru/fruit-recognition

In [None]:
!unzip fruit-recognition.zip -d fruit-recognition

## Image Classification Directory Trees

As shown below, our images are stored in subdirectories named after the class each image belongs to. In FiftyOne, we call this layout an `ImageClassificationDirectoryTree`.

In [4]:
!ls fruit-recognition/train/train

'Apple Braeburn'       Clementine       Orange		 Pomegranate
'Apple Granny Smith'   Corn	        Papaya		'Potato Red'
 Apricot	      'Cucumber Ripe'  'Passion Fruit'	 Raspberry
 Avocado	      'Grape Blue'      Peach		 Strawberry
 Banana		       Kiwi	        Pear		 Tomato
 Blueberry	       Lemon	       'Pepper Green'	 Watermelon
'Cactus fruit'	       Limes	       'Pepper Red'
 Cantaloupe	       Mango	        Pineapple
 Cherry		      'Onion White'     Plum
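In this layout, an image's label is simply the name of the directory it sits in. To make that convention concrete, the stdlib sketch below (an illustration, not FiftyOne's actual implementation) walks such a tree and maps every image file to its class. The helper `labels_from_tree` and the `IMAGE_EXTS` set are hypothetical names chosen for this example.

```python
import os
import tempfile
from collections import Counter

# Extensions treated as images in this sketch (an assumption, not FiftyOne's list)
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def labels_from_tree(root):
    """Map each image path under `root` to the name of its parent class directory.

    Assumes the layout <root>/<class name>/<image file> shown in the listing above.
    """
    labels = {}
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for filename in sorted(os.listdir(class_dir)):
            if os.path.splitext(filename)[1].lower() in IMAGE_EXTS:
                labels[os.path.join(class_dir, filename)] = class_name
    return labels


# Usage on a tiny throwaway tree with two classes, one containing a space
with tempfile.TemporaryDirectory() as root:
    layout = {"Banana": ["a.png", "b.png", "notes.txt"], "Apple Braeburn": ["c.jpg"]}
    for class_name, files in layout.items():
        os.makedirs(os.path.join(root, class_name))
        for filename in files:
            open(os.path.join(root, class_name, filename), "wb").close()
    open(os.path.join(root, "README.txt"), "w").close()  # top-level file, ignored
    tree_labels = labels_from_tree(root)

# Count how many images each class has
class_counts = Counter(tree_labels.values())
print(class_counts)
```

Pointed at `fruit-recognition/train/train`, the same per-class counts could serve as a rough cross-check against the number of samples FiftyOne reports after the import.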


FiftyOne has a built-in importer for [`ImageClassificationDirectoryTree`](https://docs.voxel51.com/user_guide/dataset_creation/datasets.html#imageclassificationdirectorytree) datasets that we can leverage here! With the code below, we can quickly load our fruit dataset with all of its labels attached; by default, the labels are stored in the `ground_truth` field.

In [None]:
import fiftyone as fo

name = "Fruit Recognition"
dataset_dir = "fruit-recognition/train/train"

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.ImageClassificationDirectoryTree,
    name=name,
)

session = fo.launch_app(dataset.shuffle())

![fruits](../assets/fruits.png)

We can also verify that the dataset was imported successfully by printing it:

In [10]:
dataset

Name:        Fruit Recognition
Media type:  image
Num samples: 16854
Persistent:  False
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)

## Loading Classification Data from CSV/Pandas

Sometimes classification data is stored in a format other than a directory tree. One of the most common alternatives is a CSV file or pandas table that maps each image to its label. To handle cases like this, we can write a short custom import loop that takes in our images and labels and creates a classification dataset. To get started, we will once again download a dataset from Kaggle: the [english-handwritten-characters-dataset](https://www.kaggle.com/datasets/dhruvildave/english-handwritten-characters-dataset).

In [None]:
!kaggle datasets download -d dhruvildave/english-handwritten-characters-dataset

In [None]:
!unzip english-handwritten-characters-dataset.zip -d english-handwritten-characters-dataset

Unlike the previous dataset, this one consists of a CSV file and a single directory of images. The CSV specifies the label of each image.
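Because each CSV row pairs an image path (relative to the dataset root) with a label, the file is effectively a lookup table. A minimal stdlib sketch of that idea, assuming the two-column `image,label` header used by `english.csv` and a hypothetical helper named `read_label_csv`, reads the rows into a dictionary and rejects images listed more than once. The in-memory rows are illustrative, not taken from the real file.

```python
import csv
import io


def read_label_csv(f):
    """Read an `image,label` CSV from a file object into an {image: label} dict.

    Labels are kept as strings. Raises ValueError if an image appears twice.
    """
    labels = {}
    for row in csv.DictReader(f):
        image = row["image"]
        if image in labels:
            raise ValueError(f"duplicate entry for {image!r}")
        labels[image] = row["label"]
    return labels


# Usage with an in-memory CSV in the same format (illustrative rows)
sample_csv = io.StringIO(
    "image,label\n"
    "Img/img001-001.png,0\n"
    "Img/img001-002.png,0\n"
    "Img/example.png,A\n"
)
csv_labels = read_label_csv(sample_csv)
print(csv_labels["Img/example.png"])
```

For a real file, open it with `open(path, newline="")` and pass the file object to the same function.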

In [9]:
!ls english-handwritten-characters-dataset

english.csv  Img


To begin, we load the CSV into a pandas DataFrame so we can look up labels later. Printing the first few rows gives an idea of the table's structure.

In [11]:
import pandas as pd

# Load the labels CSV file into a DataFrame
df = pd.read_csv("english-handwritten-characters-dataset/english.csv")

# Print the first few rows (head) of the DataFrame
print(df.head())

                image label
0  Img/img001-001.png     0
1  Img/img001-002.png     0
2  Img/img001-003.png     0
3  Img/img001-004.png     0
4  Img/img001-005.png     0
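Before attaching labels, it can be worth checking that the CSV and the `Img` directory agree. The sketch below uses a hypothetical helper, `check_csv_against_dir`, that assumes the CSV paths look like the `Img/<file name>` entries printed above. It compares them with the files on disk and returns the images with no CSV row and the CSV rows whose file is missing.

```python
import os
import tempfile


def check_csv_against_dir(csv_images, dataset_root, images_subdir="Img"):
    """Compare CSV image paths with the files on disk.

    `csv_images` holds paths relative to `dataset_root`, e.g. "Img/img001-001.png".
    Returns (unlabeled, missing): files on disk with no CSV row, and CSV rows
    whose file does not exist.
    """
    # Build keys with "/" so they match the CSV regardless of the OS separator
    on_disk = {
        f"{images_subdir}/{name}"
        for name in os.listdir(os.path.join(dataset_root, images_subdir))
    }
    csv_images = set(csv_images)
    unlabeled = sorted(on_disk - csv_images)
    missing = sorted(csv_images - on_disk)
    return unlabeled, missing


# Usage on a throwaway directory: two images on disk, one of them listed in
# the CSV, plus one CSV row whose file does not exist
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Img"))
    for name in ["a.png", "b.png"]:
        open(os.path.join(root, "Img", name), "wb").close()
    unlabeled, missing = check_csv_against_dir({"Img/a.png", "Img/c.png"}, root)

print("unlabeled:", unlabeled)
print("missing:", missing)
```

In this notebook, a call might look like `check_csv_against_dir(df["image"], "english-handwritten-characters-dataset")`; ideally both returned lists are empty.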


Next, we create the base of our new classification dataset. We use `fo.Dataset.from_dir` with `fo.types.ImageDirectory` to load the `Img` directory into FiftyOne _without_ labels; we will add them in the next step. When we launch the App, we can see that the samples have no labels yet.

In [None]:
name = "English Handwriting"
dataset_dir = "english-handwritten-characters-dataset/Img"

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.ImageDirectory,
    name=name,
)

session = fo.launch_app(dataset)

![class-no-labels](../assets/class-no-labels.png)

Now it is time to add our classification labels to the dataset! To do this, we iterate through each sample and look up its label from our pandas DataFrame. We can add any field to a sample, even [classification labels](https://docs.voxel51.com/user_guide/using_datasets.html#classification), with `sample["field_name"] = value`, followed by `sample.save()` to write the change to the database. Let's go ahead and load those labels in and visualize our results!

In [None]:
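import os

# Build a lookup table from image path (as written in the CSV) to label,
# which avoids scanning the whole DataFrame once per sample
label_lookup = dict(zip(df["image"], df["label"]))

for sample in dataset:

    # Build the key used in the CSV, e.g. "Img/img001-001.png"
    image_name = "Img/" + os.path.basename(sample.filepath)

    label = label_lookup.get(image_name)

    # Only add a label if the image appears in the CSV
    if label is not None:

        # Classification labels must be strings, so convert in case pandas
        # parsed the label column as numbers
        sample["ground_truth"] = fo.Classification(label=str(label))
        sample.save()

session.show()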

![class-labels](../assets/class-labels.png)
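Saving each sample individually issues one database write per sample, which can be slow on large datasets. FiftyOne also offers `dataset.values()` and `dataset.set_values()` for reading and writing a field across all samples in one call. The stdlib sketch below covers only the pure-Python part of that approach: given filepaths in dataset order and a `{CSV path: label}` dict, it returns a label list aligned with the samples, using `None` for images without a CSV row. The helper name `align_labels` and the `Img/` key prefix are assumptions matching this dataset.

```python
import os


def align_labels(filepaths, labels_by_key, prefix="Img"):
    """Return one label (as str) or None per filepath, in the same order.

    Each filepath is converted to the CSV key format "<prefix>/<file name>".
    """
    aligned = []
    for filepath in filepaths:
        key = f"{prefix}/{os.path.basename(filepath)}"
        label = labels_by_key.get(key)
        # Convert to str because classification labels are strings
        aligned.append(None if label is None else str(label))
    return aligned


# Usage with hypothetical absolute paths like those FiftyOne stores in `filepath`
filepaths = [
    "/data/english-handwritten-characters-dataset/Img/img001-001.png",
    "/data/english-handwritten-characters-dataset/Img/unknown.png",
]
labels_by_key = {"Img/img001-001.png": 0}  # an int label, as pandas might parse it
print(align_labels(filepaths, labels_by_key))
```

With FiftyOne, the aligned list could be built from `dataset.values("filepath")`, wrapped in `fo.Classification` objects (skipping the `None` entries), and written with `dataset.set_values("ground_truth", ...)`.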