<td>
   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/export_data.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
</td>

<td>
<a href="https://github.com/Labelbox/labelbox-python/tree/master/examples/basics/export_data.ipynb" target="_blank"><img
src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a>
</td>

# Export data
How to export data rows from projects, datasets, slices, model runs, and individual data rows, with examples of each export method (`export_v2` and the streaming `export`) and details on optional parameters and filters.

In [None]:
!pip install -q "labelbox[data]"
!pip install -q urllib3 

In [None]:
import labelbox as lb
import urllib.request
from PIL import Image
import time

# API Key and Client
See the developer guide for [creating an API key](https://docs.labelbox.com/reference/create-api-key).

In [None]:
API_KEY = ""  # Insert your Labelbox API key
client = lb.Client(api_key=API_KEY)
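Hardcoding an API key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch that reads the key from an environment variable instead; the variable name `LABELBOX_API_KEY` and the helper `load_api_key` are choices made here for illustration, not part of the Labelbox SDK.

```python
import os


def load_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing or empty."""
    # LABELBOX_API_KEY is a hypothetical variable name chosen for this example
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Labelbox API key.")
    return api_key


# Usage (requires the labelbox package and a valid key):
# client = lb.Client(api_key=load_api_key())
```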

## Export data rows from a project
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) developer guide.

### Parameters
When you export data rows from a project, you may choose to include or exclude certain attributes, including:
- `attachments`
- `metadata_fields`
- `data_row_details`
- `project_details`
- `label_details`
- `performance_details`
- `interpolated_frames`
    - Only applicable for video data rows.

### Filters
When you export data rows from a project, you can specify the included data rows with the following filters:
- `last_activity_at`
- `label_created_at`
- `data_row_ids`
- `batch_ids`

#### Filter details
You can set the range for `last_activity_at` and `label_created_at` in the following formats: 
- `YYYY-MM-DD`
- `YYYY-MM-DD hh:mm:ss`
- `YYYY-MM-DDThh:mm:ss±hhmm` (ISO 8601)

The ISO 8601 format lets you specify the timezone explicitly; the other two formats use the timezone from your workspace settings.
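To build these strings from Python `datetime` objects rather than typing them by hand, a small helper such as the hypothetical `to_filter_string` below can produce either the workspace-timezone format or the ISO 8601 format with an explicit offset. It only formats strings; whether a given range is accepted is still decided by the export API.

```python
from datetime import datetime, timedelta, timezone


def to_filter_string(moment: datetime, include_timezone: bool = False) -> str:
    """Format a datetime in one of the formats accepted by the date filters."""
    if include_timezone:
        if moment.tzinfo is None:
            raise ValueError("include_timezone=True requires a timezone-aware datetime")
        # YYYY-MM-DDThh:mm:ss±hhmm (ISO 8601); %z renders the UTC offset as ±hhmm
        return moment.strftime("%Y-%m-%dT%H:%M:%S%z")
    # YYYY-MM-DD hh:mm:ss, interpreted in the workspace's timezone
    return moment.strftime("%Y-%m-%d %H:%M:%S")


tz_minus_5 = timezone(timedelta(hours=-5))
filters = {
    "last_activity_at": [
        to_filter_string(datetime(2024, 1, 1, 0, 0, 0, tzinfo=tz_minus_5), include_timezone=True),
        to_filter_string(datetime(2024, 1, 31, 23, 59, 59, tzinfo=tz_minus_5), include_timezone=True),
    ],
}
print(filters)  # {'last_activity_at': ['2024-01-01T00:00:00-0500', '2024-01-31T23:59:59-0500']}
```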

The `last_activity_at` filter captures the creation and modification of labels, metadata, workflow status, comments, and reviews.

If you wish to specify data rows to export, uncomment the `data_row_ids` filter and provide a list of applicable IDs. The data rows must be part of a batch attached to the project in question. You can provide up to 2,000 data row IDs.

The `batch_ids` filter selects data rows for export by the batch they belong to. This is useful when the 2,000-ID limit of `data_row_ids` is too restrictive.
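If `batch_ids` does not fit your case and you need more than 2,000 specific data rows, one workaround is to split the IDs and run one export per chunk. A minimal sketch; `data_row_id_filters` is a hypothetical helper, and running several exports this way is an assumption about your workflow rather than a documented Labelbox pattern.

```python
from typing import Dict, Iterator, List

MAX_DATA_ROW_IDS = 2000  # per-export limit for the data_row_ids filter, as stated above


def data_row_id_filters(
    data_row_ids: List[str], chunk_size: int = MAX_DATA_ROW_IDS
) -> Iterator[Dict[str, List[str]]]:
    """Yield one filters dict per chunk of at most `chunk_size` data row IDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data_row_ids), chunk_size):
        yield {"data_row_ids": data_row_ids[start:start + chunk_size]}


# Usage sketch (one export per chunk; requires a live `project`):
# for filters in data_row_id_filters(all_ids):
#     export_task = project.export_v2(params=export_params, filters=filters)
#     export_task.wait_till_done()
ids = [f"id-{i}" for i in range(4500)]
chunks = list(data_row_id_filters(ids))
print([len(c["data_row_ids"]) for c in chunks])  # [2000, 2000, 500]
```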

In [None]:
# Insert the project ID of the project from which you wish to export data rows.
PROJECT_ID = ""
project = client.get_project(PROJECT_ID)

#### Export V2 Method

In [None]:
# Set the export params to include/exclude certain fields. 
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  # "data_row_ids": ["<data_row_id>", "<data_row_id>"],
  # "batch_ids": ["<batch_id>", "<batch_id>"],
}

export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)
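`export_task.result` holds one entry per exported data row. A minimal sketch that counts labels per data row for one project, assuming the export v2 row layout `{"data_row": {"id": ...}, "projects": {project_id: {"labels": [...]}}}`; that layout is an assumption to check against your own output, and `labels_per_data_row` is a hypothetical helper.

```python
from typing import Any, Dict, List


def labels_per_data_row(rows: List[Dict[str, Any]], project_id: str) -> Dict[str, int]:
    """Map each data row ID to the number of labels it has in `project_id`.

    Assumes the row layout
    {"data_row": {"id": ...}, "projects": {project_id: {"labels": [...]}}}.
    """
    counts = {}
    for row in rows:
        project_entry = row.get("projects", {}).get(project_id, {})
        counts[row["data_row"]["id"]] = len(project_entry.get("labels", []))
    return counts


# Usage with a live export: labels_per_data_row(export_task.result, PROJECT_ID)
sample_rows = [
    {"data_row": {"id": "dr-1"}, "projects": {"proj-1": {"labels": [{}, {}]}}},
    {"data_row": {"id": "dr-2"}, "projects": {"proj-1": {"labels": []}}},
    {"data_row": {"id": "dr-3"}, "projects": {}},
]
print(labels_per_data_row(sample_rows, "proj-1"))  # {'dr-1': 2, 'dr-2': 0, 'dr-3': 0}
```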

#### Stream Task Export Method
`project.export()` returns an `ExportTask` instead of a `Task`. `ExportTask` wraps `Task`, exposes most of the same features, and adds the ability to stream task results and errors.

In [None]:
# Set the export params to include/exclude certain fields. 
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  # "data_row_ids": ["<data_row_id>", "<data_row_id>"],
  # "batch_ids": ["<batch_id>", "<batch_id>"],
}

client.enable_experimental = True

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)


if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  # start() passes each streamed result to json_stream_handler
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))
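The stream handler receives results one at a time, so keeping the data requires the handler itself to accumulate it. A minimal sketch that parses each `output.json_str` with `json.loads` and collects the rows in a list; `collecting_stream_handler` and `collected_rows` are hypothetical names, and the offline demo uses `SimpleNamespace` stand-ins rather than real `lb.JsonConverterOutput` objects.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List

collected_rows: List[Dict[str, Any]] = []


def collecting_stream_handler(output) -> None:
    """Parse each streamed JSON string and keep it, instead of only printing it."""
    # `output.json_str` is the attribute the JSON converter callback above reads
    collected_rows.append(json.loads(output.json_str))


# With a live export task:
# export_task.get_stream(
#   converter=lb.JsonConverter(),
#   stream_type=lb.StreamType.RESULT
# ).start(stream_handler=collecting_stream_handler)

# Offline check with stand-in objects (SimpleNamespace mimics only the json_str attribute)
for fake in ('{"data_row": {"id": "dr-1"}}', '{"data_row": {"id": "dr-2"}}'):
    collecting_stream_handler(SimpleNamespace(json_str=fake))
print([row["data_row"]["id"] for row in collected_rows])  # ['dr-1', 'dr-2']
```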

In [None]:
# Uncomment to get stream results as a written file

# Provide results with file converter

# if export_task.has_errors():
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./errors.txt"),
#         stream_type=lb.StreamType.ERRORS
#     ).start()

# if export_task.has_result(): 
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./result.txt"),
#         stream_type=lb.StreamType.RESULT
#     ).start()
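The file converter writes the streamed output to disk. Assuming `result.txt` contains one JSON-encoded data row per line (the same strings the JSON converter passes to its callback), it can be loaded back with the standard library. `read_json_lines` is a hypothetical helper.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def read_json_lines(path: str) -> List[Dict[str, Any]]:
    """Parse a file that holds one JSON object per line, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


# With a real export: rows = read_json_lines("./result.txt")
# Offline check against a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"data_row": {"id": "dr-1"}}\n\n{"data_row": {"id": "dr-2"}}\n')
rows = read_json_lines(tmp.name)
os.remove(tmp.name)
print(len(rows))  # 2
```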

## Export data rows from a dataset
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) developer guide.

### Parameters
When you export data rows from a dataset, you may choose to include or exclude certain attributes, including:
- `attachments`
- `metadata_fields`
- `data_row_details`
- `project_details`
- `label_details`
- `performance_details`
- `interpolated_frames`
    - Only applicable for video data rows.
- `project_ids`
    - Accepts a list of project IDs. If provided, the labels created _in these projects_ on the exported data rows will be included. 
- `model_run_ids`
    - Accepts a list of model run IDs. If provided, the labels and predictions created _in these model runs_ will be included.
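Rather than uncommenting lines, you can build the params dict in code so that `project_ids` and `model_run_ids` are only sent when you actually have IDs to scope labels and predictions to. A minimal sketch; `dataset_export_params` is a hypothetical helper, and its defaults simply mirror the cells in this section.

```python
from typing import Dict, List, Optional, Union

ParamValue = Union[bool, List[str]]


def dataset_export_params(
    project_ids: Optional[List[str]] = None,
    model_run_ids: Optional[List[str]] = None,
    **flags: bool,
) -> Dict[str, ParamValue]:
    """Build export params, adding project_ids/model_run_ids only when given."""
    params: Dict[str, ParamValue] = {
        "attachments": True,
        "metadata_fields": True,
        "data_row_details": True,
        "project_details": True,
        "label_details": True,
        "performance_details": True,
        "interpolated_frames": True,
    }
    params.update(flags)  # e.g. attachments=False to exclude a field
    if project_ids:
        params["project_ids"] = list(project_ids)
    if model_run_ids:
        params["model_run_ids"] = list(model_run_ids)
    return params


params = dataset_export_params(project_ids=["<project_id>"], attachments=False)
print(params["attachments"], params["project_ids"], "model_run_ids" in params)  # False ['<project_id>'] False
```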

### Filters
When you export data rows from a dataset, you can specify the included data rows with the following filters:
- `last_activity_at`
- `label_created_at`
- `data_row_ids`

See the _Export data rows from a project_ section above for additional details on each filter. 
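Since the accepted filters differ between export sources, a client-side check can catch a misplaced filter before an export task is created. A minimal sketch; `SUPPORTED_FILTERS` is transcribed from the lists in this notebook rather than queried from Labelbox, so it may lag behind the current API, and `check_filters` is a hypothetical helper.

```python
from typing import Dict, FrozenSet

# Filter names per export source, transcribed from the lists in this notebook
SUPPORTED_FILTERS: Dict[str, FrozenSet[str]] = {
    "project": frozenset({"last_activity_at", "label_created_at", "data_row_ids", "batch_ids"}),
    "dataset": frozenset({"last_activity_at", "label_created_at", "data_row_ids"}),
    "slice": frozenset(),
    "model_run": frozenset(),
    "data_row": frozenset(),
}


def check_filters(source: str, filters: dict) -> None:
    """Raise ValueError if `filters` uses a name not listed for `source`."""
    unsupported = set(filters) - SUPPORTED_FILTERS[source]
    if unsupported:
        raise ValueError(f"Filters not supported for {source} exports: {sorted(unsupported)}")


check_filters("dataset", {"last_activity_at": ["2000-01-01", "2050-01-01"]})  # passes silently
try:
    check_filters("dataset", {"batch_ids": ["<batch_id>"]})
except ValueError as err:
    print(err)  # Filters not supported for dataset exports: ['batch_ids']
```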

In [None]:
# Insert the dataset ID of the dataset from which you wish to export data rows.
DATASET_ID = ""
dataset = client.get_dataset(DATASET_ID)

#### Export V2 Method

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  # "project_ids": ["<project_id>", "<project_id>"],
  # "model_run_ids": ["<model_run_id>", "<model_run_id>"]  
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}

export_task = dataset.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)

#### Stream Task Export Method
`dataset.export()` returns an `ExportTask` instead of a `Task`. `ExportTask` wraps `Task`, exposes most of the same features, and adds the ability to stream task results and errors.

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  # "project_ids": ["<project_id>", "<project_id>"],
  # "model_run_ids": ["<model_run_id>", "<model_run_id>"]  
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}

client.enable_experimental = True

export_task = dataset.export(params=export_params, filters=filters)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)


if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  # start() passes each streamed result to json_stream_handler
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

In [None]:
# Uncomment to get stream results as a written file

# Provide results with file converter

# if export_task.has_errors():
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./errors.txt"),
#         stream_type=lb.StreamType.ERRORS
#     ).start()

# if export_task.has_result(): 
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./result.txt"),
#         stream_type=lb.StreamType.RESULT
#     ).start()

## Export data rows from a slice
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) developer guide.

### Parameters
When exporting from a slice, you can apply the same parameters as exporting from a dataset.

### Filters
Filters are not supported for slice exports; all data rows in the slice are exported.
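Because slice exports take no filters, any narrowing has to happen after the export completes. A minimal sketch of client-side filtering on exported rows; the `row["data_row"]["global_key"]` field path is an assumption about the export layout, and both helpers are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def filter_exported_rows(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Return only the exported rows for which `keep(row)` is true."""
    return [row for row in rows if keep(row)]


def has_global_key_prefix(prefix: str) -> Callable[[Row], bool]:
    """Build a predicate on the assumed `row["data_row"]["global_key"]` field."""
    return lambda row: (row.get("data_row", {}).get("global_key") or "").startswith(prefix)


sample_rows = [
    {"data_row": {"id": "dr-1", "global_key": "train/img-1.jpg"}},
    {"data_row": {"id": "dr-2", "global_key": "val/img-2.jpg"}},
    {"data_row": {"id": "dr-3"}},
]
kept = filter_exported_rows(sample_rows, has_global_key_prefix("train/"))
print([row["data_row"]["id"] for row in kept])  # ['dr-1']
```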

In [None]:
# Insert the Catalog slice ID of the slice from which you wish to export data rows.
CATALOG_SLICE_ID = ""
catalog_slice = client.get_catalog_slice(CATALOG_SLICE_ID)

#### Export V2 Method

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  # "project_ids": ["<project_id>", "<project_id>"],
  # "model_run_ids": ["<model_run_id>", "<model_run_id>"]
}

export_task = catalog_slice.export_v2(params=export_params)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)

#### Stream Task Export Method
`catalog_slice.export()` returns an `ExportTask` instead of a `Task`. `ExportTask` wraps `Task`, exposes most of the same features, and adds the ability to stream task results and errors.

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  # "project_ids": ["<project_id>", "<project_id>"],
  # "model_run_ids": ["<model_run_id>", "<model_run_id>"]
}


client.enable_experimental = True

export_task = catalog_slice.export(params=export_params)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)

if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  # start() passes each streamed result to json_stream_handler
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

In [None]:
# Uncomment to get stream results as a written file

# Provide results with file converter

# if export_task.has_errors():
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./errors.txt"),
#         stream_type=lb.StreamType.ERRORS
#     ).start()

# if export_task.has_result(): 
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./result.txt"),
#         stream_type=lb.StreamType.RESULT
#     ).start()

## Export data rows from a model run
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) developer guide.

### Parameters
- `attachments`
- `metadata_fields`
- `data_row_details`
- `interpolated_frames`
    - Only applicable for video data rows.
- `predictions`
    - If true, all predictions made in the model run will be included for each data row in the export.
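The model run parameters only partly overlap with the project and dataset ones: `predictions` is specific to model runs, while `project_details`, `label_details`, and `performance_details` are not listed. A minimal sketch that flags parameter names not listed for a given source; the table is transcribed from this notebook and `unsupported_params` is a hypothetical helper, not part of the SDK.

```python
from typing import Dict, FrozenSet

_PROJECT_PARAMS = frozenset({
    "attachments", "metadata_fields", "data_row_details", "project_details",
    "label_details", "performance_details", "interpolated_frames",
})
_DATASET_PARAMS = _PROJECT_PARAMS | {"project_ids", "model_run_ids"}

# Parameter names per export source, transcribed from this notebook
SUPPORTED_PARAMS: Dict[str, FrozenSet[str]] = {
    "project": _PROJECT_PARAMS,
    "dataset": _DATASET_PARAMS,
    "slice": _DATASET_PARAMS,
    "model_run": frozenset({"attachments", "metadata_fields", "data_row_details",
                            "interpolated_frames", "predictions"}),
    "data_row": _PROJECT_PARAMS,
}


def unsupported_params(source: str, params: dict) -> list:
    """Return the sorted parameter names in `params` not listed for `source`."""
    return sorted(set(params) - SUPPORTED_PARAMS[source])


print(unsupported_params("model_run", {"predictions": True, "label_details": True}))  # ['label_details']
```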

### Filters
Filters are not supported for model run exports; all data rows in the model run are exported.


In [None]:
# Insert the model run ID of the model run from which you wish to export data rows.
MODEL_RUN_ID = ""
model_run = client.get_model_run(MODEL_RUN_ID)

#### Export V2 Method

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "interpolated_frames": True,
  "predictions": True
}

export_task = model_run.export_v2(params=export_params)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)

#### Stream Task Export Method
`model_run.export()` returns an `ExportTask` instead of a `Task`. `ExportTask` wraps `Task`, exposes most of the same features, and adds the ability to stream task results and errors.

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "interpolated_frames": True,
  "predictions": True
}

client.enable_experimental = True

export_task = model_run.export(params=export_params)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)

if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  # start() passes each streamed result to json_stream_handler
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

In [None]:
# Uncomment to get stream results as a written file

# Provide results with file converter
# if export_task.has_errors():
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./errors.txt"),
#         stream_type=lb.StreamType.ERRORS
#     ).start()

# if export_task.has_result(): 
#     export_task.get_stream(
#         converter=lb.FileConverter(file_path="./result.txt"),
#         stream_type=lb.StreamType.RESULT
#     ).start()

## Export data rows
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) developer guide.

### Parameters
When exporting data rows, you can apply the same parameters as exporting from a project.

### Filters
Filters are not supported for data row exports; the export includes exactly the data rows you specify by global key.
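Since `global_keys` takes a list, several data rows can be exported in one task. A minimal sketch that drops empty entries and duplicates before the list is passed; `clean_global_keys` is a hypothetical helper, and deduplicating client-side is a precaution, not a documented requirement.

```python
from typing import Iterable, List


def clean_global_keys(keys: Iterable[str]) -> List[str]:
    """Drop empty keys and duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for key in keys:
        if key and key not in seen:
            seen.add(key)
            cleaned.append(key)
    if not cleaned:
        raise ValueError("Provide at least one non-empty global key.")
    return cleaned


# Usage with a live client:
# export_task = lb.DataRow.export_v2(client=client, global_keys=clean_global_keys(my_keys), params=export_params)
print(clean_global_keys(["img-1", "img-2", "", "img-1"]))  # ['img-1', 'img-2']
```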

In [None]:
# Insert the global key of the data row you wish to export
DATA_ROW_GLOBAL_KEY = ""

#### Export V2 Method

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Provide a list of data row global keys
export_task = lb.DataRow.export_v2(client=client, global_keys=[DATA_ROW_GLOBAL_KEY], params=export_params)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)

#### Stream Task Export Method
`lb.DataRow.export()` returns an `ExportTask` instead of a `Task`. `ExportTask` wraps `Task`, exposes most of the same features, and adds the ability to stream task results and errors.

In [None]:
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

client.enable_experimental = True

# Provide a list of data row global keys
export_task = lb.DataRow.export(client=client, global_keys=[DATA_ROW_GLOBAL_KEY], params=export_params)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)

if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  # start() passes each streamed result to json_stream_handler
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

## How to access a `mask` URL 

`ImageSegmentationMask` annotations appear only in labels on image data rows, and `VideoSegmentationMask` annotations only in labels on video data rows. To download the mask data, include your Labelbox API key, which is stored in `client.headers`, in the headers of the API request.

A mask URL taken from an export already contains the `project_id` and `feature_id`. The cell below shows how the URL is structured, so you can also build it from any project ID and feature ID.
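If you are working outside the `client` object, for example in a separate script, the same request can be built from the key directly. A minimal sketch, assuming the API key is sent as a bearer token in the `Authorization` header, which is what `client.headers` is expected to contain (inspect `client.headers` in your session to confirm). `build_mask_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import urllib.request


def build_mask_request(mask_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request for a mask URL."""
    # Assumes bearer-token auth, matching what client.headers is expected to carry
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(mask_url, headers=headers)


req = build_mask_request(
    "https://api.labelbox.com/api/v1/projects/<project_id>/annotations/<feature_id>/index/1/mask",
    api_key="<api_key>",
)
print(req.get_header("Authorization"))  # Bearer <api_key>
```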

In [None]:
# Provide a project ID and feature ID. Alternatively, replace the entire mask_url with a URL grabbed from your export.
project_id = ""
feature_id = ""

mask_url = f"https://api.labelbox.com/api/v1/projects/{project_id}/annotations/{feature_id}/index/1/mask"

In [None]:
# Make the API request 
req = urllib.request.Request(mask_url, headers=client.headers)

In [None]:
# Print the image of the mask
image = Image.open(urllib.request.urlopen(req))
image
