<td>
   <a target="_blank" href="https://www.clarifai.com/" ><img src="https://upload.wikimedia.org/wikipedia/commons/b/bc/Clarifai_Logo_FC_Web.png" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Clarifai/examples/blob/main/datasets/export/dataset_export.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Colab"></a>
</td>

# Dataset Export

A dataset is a collection of data examples that you can annotate and use to train, test, and evaluate your machine learning models. With Clarifai datasets, you can manage the data you want to use for visual search, training, and evaluation.

This notebook contains:
- Examples of how to export your datasets from the Clarifai app using features of `Dataset`, and how to convert the exported dataset to other annotation formats.

# Table of Contents
* [Requirements](#requirements)

* [Upload Dataset](#upload_dataset)

* [Dataset Export](#dataset_export)

* [Export Clarifai to Other Formats](#export_annotation)
    * [COCO](#coco)
    * [YOLO](#yolo)

* [Supported Formats](#formats)
 

## Requirements <a id="requirements"></a>

In [None]:
!pip install -U clarifai

In [None]:
import os
# Replace "PAT" with your own personal access token
os.environ['CLARIFAI_PAT'] = "PAT"

*Note: Guide to get your [PAT](https://docs.clarifai.com/clarifai-basics/authentication/personal-access-tokens)*
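
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming you are willing to type the token at a prompt. The helper name `ensure_pat` and its `prompt` and `env` parameters are illustrative and are not part of the Clarifai SDK; only the `CLARIFAI_PAT` variable name comes from the cell above.

```python
import os
from getpass import getpass


def ensure_pat(prompt=getpass, env=os.environ):
    """Return the PAT stored in CLARIFAI_PAT, asking for it only if it is unset.

    `ensure_pat` is a hypothetical helper, not a Clarifai SDK function.
    """
    pat = env.get("CLARIFAI_PAT", "").strip()
    if not pat:
        # getpass hides the typed value, so it never shows up in cell output.
        pat = prompt("Clarifai PAT: ").strip()
        if not pat:
            raise ValueError("A non-empty PAT is required")
        env["CLARIFAI_PAT"] = pat
    return pat
```

Calling `ensure_pat()` once, before creating the `Dataset` object, leaves the environment variable set for the rest of the session without the token appearing in the saved notebook.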

### For Colab

Note: Clone the examples repository to use the sample data it contains.

In [None]:
!git clone https://github.com/Clarifai/examples.git
%cd examples/datasets/upload/

## Dataset Interface

In [None]:
from clarifai.client.dataset import Dataset
# Replace "user_id", "app_id", and "dataset_id" with your own values.
dataset = Dataset(user_id="user_id", app_id="app_id", dataset_id="dataset_id")

## Upload Dataset  <a id="upload_dataset"></a>

- Examples of how to upload datasets from a local directory into the Clarifai app using the `module_dir` feature of `Dataset`.

### Object Detection - VOC - 2012

In [None]:
# load_module_dataloader loads the dataloader object defined in dataset.py in the local data folder
from clarifai.datasets.upload.utils import load_module_dataloader

In [8]:
voc_dataloader = load_module_dataloader('../upload/image_detection/voc')
dataset.upload_dataset(dataloader=voc_dataloader)

Uploading Dataset: 100%|██████████████████████████████████████████████| 1/1 [00:08<00:00,  8.24s/it]


### Creating a dataset version

In [None]:
dataset_demo_version = dataset.create_version()

## Dataset Export    <a id="dataset_export"></a>

Export the dataset version to a local path

In [None]:
dataset_demo_version.export(save_path='output_demo.zip')

Extract the zip file

In [14]:
import zipfile

def extract_zip_file(file_path, extract_path='.'):
    with zipfile.ZipFile(file_path, 'r') as zip_ref:
        zip_ref.extractall(extract_path)

# Usage
extract_zip_file('output_demo.zip')
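
Before extracting, it can help to confirm that the export produced a valid archive and to see what it contains. The following is a small sketch using only the standard library; `summarize_zip` is an illustrative helper name, and `output_demo.zip` is the path used in the export cell above.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip(file_path):
    """Count archive members per parent folder without extracting anything."""
    if not zipfile.is_zipfile(file_path):
        raise ValueError(f"{file_path} is not a valid zip archive")
    counts = Counter()
    with zipfile.ZipFile(file_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            # Member names inside a zip archive use forward slashes.
            counts[str(PurePosixPath(info.filename).parent)] += 1
    return dict(counts)
```

`summarize_zip('output_demo.zip')` returns a mapping from folder name to file count, which gives a quick way to spot an empty or truncated export.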

Preview of the exported folder

In [15]:
import os

def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * (level)
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))

list_files('./all')

all/
    annotations/
        example-notebook-2009_004382.json
        example-notebook-2008_003182.json
        example-notebook-2008_000853.json
        example-notebook-2008_008526.json
        example-notebook-2012_000690.json
        example-notebook-2007_000464.json
        example-notebook-2011_006412.json
        example-notebook-2011_000430.json
        example-notebook-2011_001610.json
        example-notebook-2009_004315.json
    inputs/
        example-notebook-2009_004315.png
        example-notebook-2011_001610.png
        example-notebook-2011_000430.png
        example-notebook-2008_008526.png
        example-notebook-2009_004382.png
        example-notebook-2007_000464.png
        example-notebook-2012_000690.png
        example-notebook-2008_000853.png
        example-notebook-2011_006412.png
        example-notebook-2008_003182.png
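
The listing suggests that each file in `inputs/` has an annotation file with the same stem in `annotations/`. The following is a minimal sketch for checking that pairing, assuming this layout holds for your export. `pair_export_files` is a hypothetical helper, and it does not look inside the JSON files.

```python
from pathlib import Path


def pair_export_files(export_dir):
    """Match files in inputs/ to files in annotations/ by file stem.

    Returns (pairs, inputs_without_annotation, annotations_without_input).
    Assumes the inputs/ and annotations/ layout shown in the listing above.
    """
    root = Path(export_dir)
    inputs = {p.stem: p for p in (root / "inputs").iterdir() if p.is_file()}
    annotations = {p.stem: p for p in (root / "annotations").glob("*.json")}
    pairs = {
        stem: (inputs[stem], annotations[stem])
        for stem in sorted(inputs.keys() & annotations.keys())
    }
    missing_annotations = sorted(inputs.keys() - annotations.keys())
    missing_inputs = sorted(annotations.keys() - inputs.keys())
    return pairs, missing_annotations, missing_inputs
```

Calling `pair_export_files('./all')` and checking that both "missing" lists are empty is a cheap test to run before converting the export to another format.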


# Export Clarifai to Other Formats  <a id="export_annotation"></a>

## Requirements

### Installing clarifai-datautils  

In [None]:
!pip install -U clarifai-datautils

Refer to the clarifai-datautils repo for more info: https://github.com/Clarifai/clarifai-python-datautils

## Import Clarifai Dataset

In [17]:
from clarifai_datautils import ImageAnnotations
clarifai_dataset = ImageAnnotations.import_from(path='./all', format='clarifai')
print(clarifai_dataset)


Dataset
	size=10
	source_path=None
	annotated_items_count=10
	annotations_count=18
subsets
	default: # of items=10, # of annotated items=10, # of annotations=18, annotation types=['bbox']
infos
	categories
	label: ['bird', 'person', 'dog', 'sofa', 'bottle', 'cat', 'cow', 'horse']



## Export to COCO Format    <a id="coco"></a>

Setting the **save_images** parameter to `True` saves the images alongside the annotations.

In [18]:
clarifai_dataset.export_to(path='./clarifai_to_coco', format='coco_detection', save_images=True)

Preview of the converted COCO Format

In [19]:
list_files('./clarifai_to_coco')

clarifai_to_coco/
    images/
        default/
            example-notebook-2009_004315.png
            example-notebook-2011_001610.png
            example-notebook-2011_000430.png
            example-notebook-2008_008526.png
            example-notebook-2009_004382.png
            example-notebook-2007_000464.png
            example-notebook-2012_000690.png
            example-notebook-2008_000853.png
            example-notebook-2011_006412.png
            example-notebook-2008_003182.png
    annotations/
        instances_default.json
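
To sanity-check the converted file, you can count annotations per category. The sketch below relies on the standard COCO detection layout: top-level `images`, `annotations`, and `categories` lists, with each annotation carrying a `category_id`. The helper name `coco_category_counts` is illustrative and not part of clarifai-datautils.

```python
import json
from collections import Counter


def coco_category_counts(annotation_file):
    """Summarize a COCO detection file: image count and boxes per category."""
    with open(annotation_file, "r", encoding="utf-8") as f:
        coco = json.load(f)
    # Map numeric category ids to their human-readable names.
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    counts = Counter(
        id_to_name.get(a["category_id"], f"unknown:{a['category_id']}")
        for a in coco.get("annotations", [])
    )
    return {
        "images": len(coco.get("images", [])),
        "annotations": sum(counts.values()),
        "per_category": dict(counts),
    }
```

For the export above, the call would be `coco_category_counts('./clarifai_to_coco/annotations/instances_default.json')`. A total that differs from the `annotations_count` printed at import time would be worth investigating.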


## Export to YOLO Format    <a id="yolo"></a>

Setting the **save_images** parameter to `True` saves the images alongside the annotations.

In [21]:
clarifai_dataset.export_to(path='./clarifai_to_yolo', format='yolo', save_images=True)

Preview of the converted YOLO Format

In [22]:
list_files('./clarifai_to_yolo')

clarifai_to_yolo/
    obj.data
    train.txt
    obj.names
    obj_train_data/
        example-notebook-2009_004315.png
        example-notebook-2011_000430.txt
        example-notebook-2011_001610.png
        example-notebook-2008_008526.txt
        example-notebook-2009_004315.txt
        example-notebook-2011_000430.png
        example-notebook-2011_001610.txt
        example-notebook-2008_008526.png
        example-notebook-2009_004382.png
        example-notebook-2007_000464.png
        example-notebook-2012_000690.txt
        example-notebook-2011_006412.txt
        example-notebook-2008_000853.txt
        example-notebook-2008_003182.txt
        example-notebook-2009_004382.txt
        example-notebook-2007_000464.txt
        example-notebook-2012_000690.png
        example-notebook-2008_000853.png
        example-notebook-2011_006412.png
        example-notebook-2008_003182.png
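
Each `.txt` file in `obj_train_data/` holds the boxes for the image with the same name. In the usual YOLO (Darknet) convention, which this sketch assumes, every line is `class_index x_center y_center width height` with coordinates normalized to the image size, and `obj.names` lists one class name per line. The helper `read_yolo_boxes` is illustrative, and the caller must supply the image width and height.

```python
from pathlib import Path


def read_yolo_boxes(label_file, names_file, image_width, image_height):
    """Convert normalized YOLO rows to (name, x_min, y_min, x_max, y_max) in pixels."""
    names = [n.strip() for n in Path(names_file).read_text().splitlines() if n.strip()]
    boxes = []
    for line in Path(label_file).read_text().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        class_index = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:])
        # YOLO stores the box center and size; convert to corner coordinates.
        x_min = (xc - w / 2) * image_width
        y_min = (yc - h / 2) * image_height
        x_max = (xc + w / 2) * image_width
        y_max = (yc + h / 2) * image_height
        boxes.append((names[class_index], x_min, y_min, x_max, y_max))
    return boxes
```

Comparing a few converted boxes against the original image is a quick way to confirm that the class order in `obj.names` matches the labels reported when the dataset was imported.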


### Note: Here is the list of [Supported Formats](https://github.com/Clarifai/examples/tree/main/Data_Utils#supported-formats)    <a id="formats"></a>

## Clarifai Resources

**Website**: [https://www.clarifai.com](https://www.clarifai.com/)

**Demo**: [https://clarifai.com/demo](https://clarifai.com/demo)

**Sign up for a free Account**: [https://clarifai.com/signup](https://clarifai.com/signup)

**Developer Guide**: [https://docs.clarifai.com](https://docs.clarifai.com/)

**Clarifai Community**: [https://clarifai.com/explore](https://clarifai.com/explore)

**Python SDK Docs**: [https://docs.clarifai.com/python-sdk/api-reference](https://docs.clarifai.com/python-sdk/api-reference)

---