<td>   <a target="_blank" href="https://labelbox.com" ><img src="https://labelbox.com/blog/content/images/2021/02/logo-v4.svg" width="256"/></a></td>

<td><a href="https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/basics/export_v1_migration_support.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td>
<td><a href="https://github.com/Labelbox/labelbox-python/tree/develop/examples/basics/export_v1_migration_support.ipynb" target="_blank"><img src="https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white" alt="GitHub"></a></td>

# Export V1 to V2 migration 

**Export V1 will no longer be available in any version of the SDK starting in April 2024.** Plan your migration accordingly.

This notebook is designed to help users identify alternative V2 export methods that can serve as replacements for V1 methods.

### Key changes included in export V2 methods (``export()`` and ``export_v2()``):
1. Added flexibility to only export the data that is needed. The new methods include parameters and filters to give you more granular control over your exports.
2. Added functionality to stream an **unlimited** number of data rows using ``export()`` (available on SDK >=3.56). Upgrading to `export()` is recommended as it is a more scalable solution.

For complete details on how to use export V2 methods please see the [Export V2 methods](https://docs.labelbox.com/reference/label-export#export-v2-methods) documentation.

### Export V1 deprecated methods:
Project methods:
1. ```project.export_labels()```
2. ```project.label_generator()```
3. ```project.export_queued_data_rows()```
4. ```project.video_label_generator()```

Dataset methods: 
1. ```dataset.export_data_rows()```

Batch methods: 
1. ```batch.export_data_rows()```

Model methods:
1. ```model_run.export_labels()```
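As a quick reference, the deprecated calls above map onto the V2 calls used later in this notebook roughly as follows. This is a hypothetical lookup table, not an SDK object; the hints simply restate the parameters and filters shown in the sections below.

```python
# Hypothetical migration table: deprecated V1 call -> (V2 call, key params/filters).
# The hints restate what the sections of this notebook use; they are not exhaustive.
V1_TO_V2 = {
    "project.export_labels()": ("project.export()", 'params: "label_details": True'),
    "project.label_generator()": ("project.export()", 'params: "label_details": True'),
    "project.video_label_generator()": (
        "project.export()", 'params: "label_details": True, "interpolated_frames": True'),
    "project.export_queued_data_rows()": (
        "project.export()", 'filters: "workflow_status": "ToLabel"'),
    "dataset.export_data_rows()": ("dataset.export()", 'params: "data_row_details": True'),
    "batch.export_data_rows()": ("project.export()", 'params: "batch_ids": [<batch_id>]'),
    "model_run.export_labels()": ("model_run.export()", "filters: not supported"),
}


def v2_replacement(v1_call: str) -> str:
    """Return a human-readable V2 replacement for a deprecated V1 call."""
    if v1_call not in V1_TO_V2:
        raise KeyError(f"No V2 mapping recorded for {v1_call!r}")
    method, hint = V1_TO_V2[v1_call]
    return f"{method}  # {hint}"


for call in V1_TO_V2:
    print(f"{call:36} -> {v2_replacement(call)}")
```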



# Imports

In [None]:
%pip install -q "labelbox[data]"

In [None]:
import labelbox as lb
import pprint

pp = pprint.PrettyPrinter(width=30, compact=True)

## API Key and Client
See the developer guide for [creating an API key](https://docs.labelbox.com/reference/create-api-key).

In [None]:
API_KEY = ""
client = lb.Client(api_key=API_KEY)
client.enable_experimental = (
    True  ## This is required if using the export() streamable method
)
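Rather than pasting the key into `API_KEY`, you may prefer to read it from an environment variable so it does not end up in a shared notebook. The helper below is a hypothetical sketch; the variable name `LABELBOX_API_KEY` is an assumption, so use whatever name your environment defines.

```python
import os


def get_api_key(env_var: str = "LABELBOX_API_KEY") -> str:
    """Read the API key from an environment variable instead of hardcoding it."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client.")
    return api_key


# Usage (requires the variable to be set):
# client = lb.Client(api_key=get_api_key())
```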

# Export V1 to V2 guidelines
The following sections demonstrate how to use the export V2 methods to fetch data from your projects, datasets, batches, and model runs.

## Export labels from a project


In [None]:
PROJECT_ID = ""
project = client.get_project(PROJECT_ID)

##### Export V1 (deprecated)  
1. ```project.export_labels()``` 
    - Parameters:  
        - ```download: bool = False```
        - ```timeout_seconds: int = 1800```
    - Output: (str | List[Dict[Any, Any]] | None)

For a comprehensive example of Export V1 ``project.export_labels()`` output, please refer to our documentation: [Export V1 sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export)

2. ```project.label_generator()```
    - Parameters: 
        - ```timeout_seconds: int = 600```
    - Output: LabelGenerator

In [None]:
# Single entry from the output of project.label_generator() (deprecated)
# Label objects will not be deprecated.
single_output_from_generator = """

Label(
    uid='clrf5csho2ihx07ilffgp2fzj',
    data=ImageData(
        im_bytes=None,
        file_path=None,
        url='https://storage.googleapis.com/labelbox-datasets/image_sample_data/2560px-Kitano_Street_Kobe01s5s4110.jpeg',
        arr=None
    ),
    annotations=[
        ObjectAnnotation(
            confidence=None,
            name='bounding_box',
            feature_schema_id='clrf5ck4a0b9b071paa9ncu15',
            extra={
                'instanceURI': 'https://api.labelbox.com/masks/feature/clrf5csvi6ofm07lsf9pygwvi?token=<token>',
                'color': '#ff0000',
                'feature_id': 'clrf5csvi6ofm07lsf9pygwvi',
                'value': 'bounding_box',
                'page': None,
                'unit': None
            },
            value=Rectangle(
                extra={},
                start=Point(extra={}, x=2096.0, y=1264.0),
                end=Point(extra={}, x=2240.0, y=1689.0)
            ),
            classifications=[]
        ),
        # Add more annotations as needed
        # ...
    ],
    extra={
        'Created By': 'aovalle@labelbox.com',
        'Project Name': 'Image Annotation Import Demo',
        'Created At': '2024-01-15T16:35:59.000Z',
        'Updated At': '2024-01-15T16:51:56.000Z',
        'Seconds to Label': 66.0,
        'Agreement': -1.0,
        'Benchmark Agreement': -1.0,
        'Benchmark ID': None,
        'Dataset Name': 'image-demo-dataset',
        'Reviews': [],
        'View Label': 'https://editor.labelbox.com?project=clrf5ckex09m9070x1te223u5&label=clrf5csho2ihx07ilffgp2fzj',
        'Has Open Issues': 0.0,
        'Skipped': False,
        'media_type': 'image',
        'Data Split': None,
        'Global Key': '2560px-Kitano_Street_Kobe01s5s41102.jpeg'
    }
)

"""
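Code that relied on the `Rectangle(start=..., end=...)` corners from `label_generator()` can rebuild the same points from a V2 export. This sketch assumes the V2 JSON describes a bounding box with `top`, `left`, `height`, and `width` keys; verify those names against your own export. `bbox_to_corners` is a hypothetical helper.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def bbox_to_corners(bbox: Dict[str, float]) -> Tuple[Point, Point]:
    """Convert a top/left/height/width box into (start, end) corner points.

    start is the top-left corner and end the bottom-right corner, matching the
    Rectangle(start=Point(...), end=Point(...)) layout in the V1 output above.
    """
    start = (bbox["left"], bbox["top"])
    end = (bbox["left"] + bbox["width"], bbox["top"] + bbox["height"])
    return start, end


# The box from the V1 example above, expressed as top/left/height/width
print(bbox_to_corners({"top": 1264.0, "left": 2096.0, "height": 425.0, "width": 144.0}))
# -> ((2096.0, 1264.0), (2240.0, 1689.0))
```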

##### Export V2 

For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) documentation.

1. ```project.export()```: A streamable method, available starting with SDK version 3.56, that lets you stream an unlimited number of data rows. On earlier SDK versions, use ```export_v2()``` with identical parameters instead; note that it returns a different task type and does not include the streaming methods.

    - Parameters:  
        - ```"attachments": True```
        - ```"metadata_fields": True```
        - ```"data_row_details": True```
        - ```"project_details": True```
        - ```"label_details": True```
        - ```"performance_details": True```
    - Output: 
       - ```ExportTask```
            - `ExportTask.has_result()` return type:  bool 
            - `ExportTask.has_errors()` return type: bool
            - `ExportTask.get_stream()` return type: Stream[JsonConverterOutput]
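The version note above can be expressed as a small fallback: call the streamable `export()` when the object provides it, otherwise fall back to `export_v2()`. `start_export` is a hypothetical helper, not part of the SDK; it relies only on the `export()`/`export_v2()` methods and `wait_till_done()` used in this notebook. Because the two methods return different task types, code after this call still has to handle each one accordingly.

```python
from typing import Any, Dict, Optional


def start_export(target: Any, params: Dict[str, Any],
                 filters: Optional[Dict[str, Any]] = None) -> Any:
    """Start an export on a project, dataset, or model run and wait for it.

    Prefers the streamable export() (SDK >= 3.56) and falls back to export_v2().
    Model runs do not support filters, so leave filters as None for them.
    """
    kwargs: Dict[str, Any] = {"params": params}
    if filters is not None:
        kwargs["filters"] = filters
    export_fn = getattr(target, "export", None) or getattr(target, "export_v2")
    task = export_fn(**kwargs)
    task.wait_till_done()
    return task


# Usage with the objects defined in this notebook:
# export_task = start_export(project, export_params, filters)
# export_task = start_export(model_run, export_params)  # no filters for model runs
```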

In [None]:
## Set the export parameters to include labels along with data row, project, and performance details
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
}
# You also have the option to include additional filtering to narrow down the list of labels
filters = {}

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one


# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
    print(output.json_str)


if export_task.has_errors():
    export_task.get_stream(converter=lb.JsonConverter(),
                           stream_type=lb.StreamType.ERRORS).start(
                               stream_handler=lambda error: print(error))

if export_task.has_result():
    # start() iterates over the stream and calls the handler for each row
    export_task.get_stream(
        converter=lb.JsonConverter(), stream_type=lb.StreamType.RESULT).start(
            stream_handler=json_stream_handler)

print(
    "file size: ",
    export_task.get_total_file_size(stream_type=lb.StreamType.RESULT),
)
print(
    "line count: ",
    export_task.get_total_lines(stream_type=lb.StreamType.RESULT),
)
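The callback above only prints each line. If you need the rows in memory (for example, to replace code that consumed the V1 list output), a handler can parse and collect them instead. This sketch assumes, as the callback above does, that each streamed item exposes the raw JSON line as `json_str`; `CollectingHandler` is a hypothetical name. For very large exports, prefer processing rows one at a time, since collecting everything gives up the memory benefit of streaming.

```python
import json
from typing import Any, Dict, List


class CollectingHandler:
    """Stream handler that parses each JSON line and keeps it in a list."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def __call__(self, output: Any) -> None:
        # output.json_str holds one exported data row as a JSON string
        self.rows.append(json.loads(output.json_str))


# Usage with the export task above:
# collector = CollectingHandler()
# export_task.get_stream(converter=lb.JsonConverter(),
#                        stream_type=lb.StreamType.RESULT).start(stream_handler=collector)
# print(len(collector.rows), "rows collected")
```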

## Export queued ("To Label") data rows from a project

##### Export V1 (deprecated):  
1. ``project.export_queued_data_rows()``:
    - Parameters: 
        - ``timeout_seconds: int = 120``
        - ``include_metadata: bool = False``
    - Output: List[Dict[str, str]]

In [None]:
# Single entry from the output of project.export_queued_data_rows() (deprecated)
single_output_example = """
[
  {'id': 'clpouak6nap2g0783ajd1d6pf',
 'createdAt': '2023-12-03T02:04:34.062Z',
 'updatedAt': '2023-12-03T02:05:33.797Z',
 'externalId': None,
 'globalKey': 'b57c9ab2-304f-4c17-ba5f-c536f39a6a46',
 'metadataFields': [],
 'customMetadata': [],
 'rowData': 'https://storage.googleapis.com/labelbox-developer-testing-assets/image/data_files/santa.jpeg',
 'mediaAttributes': {'assetType': 'image',
  'contentLength': 305973,
  'height': 1333,
  'mimeType': 'image/jpeg',
  'subType': 'jpeg',
  'superType': 'image',
  'width': 2000}}
]

"""

##### Export V2
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) documentation.

1. ```project.export()```: A streamable method, available starting with SDK version 3.56, that lets you stream an unlimited number of data rows. On earlier SDK versions, use ```export_v2()``` with identical parameters instead; note that it returns a different task type and does not include the streaming methods.

    - Parameters (minimum required):
      - ```"data_row_details": True```
      - ```"project_details": True```
    - Required filters: 
      - ``` "workflow_status": "ToLabel"```
    - Output: 
       - ```ExportTask```
            - `ExportTask.has_result()` return type:  bool 
            - `ExportTask.has_errors()` return type: bool
            - `ExportTask.get_stream()` return type: Stream[JsonConverterOutput]

In [None]:
export_params = {
    "attachments": True,  # Set to True if you want to export attachments
    "metadata_fields": True,  # Set to True if you want to export metadata
    "data_row_details": True,
    "project_details": True,
}
filters = {
    "workflow_status":
        "ToLabel"  ## Using this filter will only export queued data rows
}

# An ExportTask is returned. It provides information about the status of your task, such as
# any errors encountered, and includes methods to stream your data

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one


# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
    print(output.json_str)


if export_task.has_errors():
    export_task.get_stream(converter=lb.JsonConverter(),
                           stream_type=lb.StreamType.ERRORS).start(
                               stream_handler=lambda error: print(error))

if export_task.has_result():
    # start() iterates over the stream and calls the handler for each row
    export_task.get_stream(
        converter=lb.JsonConverter(), stream_type=lb.StreamType.RESULT).start(
            stream_handler=json_stream_handler)

print(
    "file size: ",
    export_task.get_total_file_size(stream_type=lb.StreamType.RESULT),
)
print(
    "line count: ",
    export_task.get_total_lines(stream_type=lb.StreamType.RESULT),
)
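If downstream code still expects the flat dictionaries returned by `export_queued_data_rows()`, an adapter can rebuild a subset of that shape from each V2 line. This is a sketch: the V2 key names it reads (`data_row`, `details`, `media_attributes`, `metadata_fields`) are assumptions, so check them against your own export. `to_v1_queued_row` is a hypothetical helper, and it does not reproduce `customMetadata`.

```python
import json
from typing import Any, Dict


def to_v1_queued_row(v2_line: str) -> Dict[str, Any]:
    """Map one V2 export line to (most of) the V1 export_queued_data_rows() shape.

    All V2 key names below are assumptions; missing keys come back as None or empty.
    """
    row = json.loads(v2_line)
    data_row = row.get("data_row", {})
    details = data_row.get("details", {})
    return {
        "id": data_row.get("id"),
        "createdAt": details.get("created_at"),
        "updatedAt": details.get("updated_at"),
        "externalId": data_row.get("external_id"),
        "globalKey": data_row.get("global_key"),
        "metadataFields": row.get("metadata_fields", []),
        "rowData": data_row.get("row_data"),
        "mediaAttributes": row.get("media_attributes", {}),
    }


# Usage with the queued-rows export task above:
# queued_rows = [to_v1_queued_row(output.json_str) for output in export_task.get_stream()]
```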

## Export data rows from a Dataset 

In [None]:
DATASET_ID = ""
dataset = client.get_dataset(DATASET_ID)

#### Export V1 (deprecated): 

1. ```dataset.export_data_rows()```
    - Parameters:
        - ``timeout_seconds: int = 120``
        - ``include_metadata: bool = True``
    - Output:
        - Data row object generator


In [None]:
# Single entry from the output of dataset.export_data_rows() (deprecated)
# Data row objects will not be deprecated.

single_output_from_data_row_generator = """
<DataRow {
    "created_at": "2023-12-03 02:04:34.062000+00:00",
    "external_id": null,
    "global_key": "b57c9ab2-304f-4c17-ba5f-c536f39a6a46",
    "media_attributes": {
        "assetType": "image",
        "contentLength": 305973,
        "height": 1333,
        "mimeType": "image/jpeg",
        "subType": "jpeg",
        "superType": "image",
        "width": 2000
    },
    "metadata": [],
    "metadata_fields": [],
    "row_data": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/data_files/santa.jpeg",
    "uid": "clpouak6nap2g0783ajd1d6pf",
    "updated_at": "2023-12-03 02:05:33.797000+00:00"
}>
"""

#### Export V2
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) documentation.

1. ```dataset.export()```: A streamable method, available starting with SDK version 3.56, that lets you stream an unlimited number of data rows. On earlier SDK versions, use ```export_v2()``` with identical parameters instead; note that it returns a different task type and does not include the streaming methods.

    - Parameters (minimum required parameters):  
      - ``"data_row_details": True``
    - Output: 
       - ```ExportTask```
            - `ExportTask.has_result()` return type:  bool 
            - `ExportTask.has_errors()` return type: bool
            - `ExportTask.get_stream()` return type: Stream[JsonConverterOutput]

In [None]:
export_params = {
    "attachments": True,  # Set to True if you want to export attachments
    "metadata_fields": True,  # Set to True if you want to export metadata
    "data_row_details": True,
}
filters = {}

# An ExportTask is returned. It provides information about the status of your task, such as
# any errors encountered, and includes methods to stream your data
export_task = dataset.export(params=export_params, filters=filters)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one


# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
    print(output.json_str)


if export_task.has_errors():
    export_task.get_stream(converter=lb.JsonConverter(),
                           stream_type=lb.StreamType.ERRORS).start(
                               stream_handler=lambda error: print(error))

if export_task.has_result():
    # start() iterates over the stream and calls the handler for each row
    export_task.get_stream(
        converter=lb.JsonConverter(), stream_type=lb.StreamType.RESULT).start(
            stream_handler=json_stream_handler)

print(
    "file size: ",
    export_task.get_total_file_size(stream_type=lb.StreamType.RESULT),
)
print(
    "line count: ",
    export_task.get_total_lines(stream_type=lb.StreamType.RESULT),
)
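The V1 dataset export yielded `DataRow` objects; with V2 you receive JSON lines, which are convenient to persist as NDJSON (one row per line) for later processing. This sketch assumes each streamed item exposes one complete JSON document as `json_str` with no embedded newlines; `NdjsonWriter` and the file name are hypothetical.

```python
from typing import Any, TextIO


class NdjsonWriter:
    """Stream handler that appends each exported row to an NDJSON file handle."""

    def __init__(self, fh: TextIO) -> None:
        self.fh = fh
        self.count = 0

    def __call__(self, output: Any) -> None:
        # json_str is already one serialized row, so write it as a single line
        self.fh.write(output.json_str.rstrip("\n") + "\n")
        self.count += 1


# Usage with the dataset export task above (the file name is arbitrary):
# with open("dataset_export.ndjson", "w", encoding="utf-8") as fh:
#     writer = NdjsonWriter(fh)
#     export_task.get_stream(converter=lb.JsonConverter(),
#                            stream_type=lb.StreamType.RESULT).start(stream_handler=writer)
# print(writer.count, "rows written")
```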

## Export data rows from a batch

#### Export V1 (deprecated): 
1. ```batch.export_data_rows()```
    - Parameters:
        - ``timeout_seconds: int = 120``
        - ``include_metadata: bool = True``
    - Output:
        - Data row object generator

In [None]:
# Single output from batch.export_data_rows() method (deprecated)
# Data row objects will not be deprecated

single_output_from_data_row_generator = """
<DataRow {
    "created_at": "2023-12-03 02:04:34.062000+00:00",
    "external_id": null,
    "global_key": "b57c9ab2-304f-4c17-ba5f-c536f39a6a46",
    "media_attributes": {
        "assetType": "image",
        "contentLength": 305973,
        "height": 1333,
        "mimeType": "image/jpeg",
        "subType": "jpeg",
        "superType": "image",
        "width": 2000
    },
    "metadata": [],
    "metadata_fields": [],
    "row_data": "https://storage.googleapis.com/labelbox-developer-testing-assets/image/data_files/santa.jpeg",
    "uid": "clpouak6nap2g0783ajd1d6pf",
    "updated_at": "2023-12-03 02:05:33.797000+00:00"
}>
"""

#### Export V2
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) documentation.

1. ```project.export()```: A streamable method, available starting with SDK version 3.56, that lets you stream an unlimited number of data rows. On earlier SDK versions, use ```export_v2()``` with identical parameters instead; note that it returns a different task type and does not include the streaming methods.

    - Required parameters:
      - ```"data_row_details": True```
      - ```"batch_ids": [<batch_id>]```
    - Output: 
       - ```ExportTask```
            - `ExportTask.has_result()` return type:  bool 
            - `ExportTask.has_errors()` return type: bool
            - `ExportTask.get_stream()` return type: Stream[JsonConverterOutput]

In [None]:
# Find the batch ID by navigating to "Batches" -->  "Manage batches" --> "Copy Batch ID"
BATCH_ID = ""

In [None]:
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "performance_details": True,
    # Include batch IDs if you only want to export specific batches;
    # otherwise, you can export all the data without using this parameter
    "batch_ids": [BATCH_ID],
}
filters = {}

# An ExportTask is returned. It provides information about the status of your task, such as
# any errors encountered, and includes methods to stream your data
export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one


# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
    print(output.json_str)


if export_task.has_errors():
    export_task.get_stream(converter=lb.JsonConverter(),
                           stream_type=lb.StreamType.ERRORS).start(
                               stream_handler=lambda error: print(error))

if export_task.has_result():
    # start() iterates over the stream and calls the handler for each row
    export_task.get_stream(
        converter=lb.JsonConverter(), stream_type=lb.StreamType.RESULT).start(
            stream_handler=json_stream_handler)

print(
    "file size: ",
    export_task.get_total_file_size(stream_type=lb.StreamType.RESULT),
)
print(
    "line count: ",
    export_task.get_total_lines(stream_type=lb.StreamType.RESULT),
)

## Export data rows from a model run

#### Export V1 (deprecated): 
1. ```model_run.export_labels(download=True)```
    - Parameters:
        - ```download: bool = False```
        - ```timeout_seconds: int = 1800```
    - Output: (str | List[Dict[Any, Any]] | None)

In [None]:
# Single output from model_run.export_labels() (deprecated)
single_output_example = """
[
   {'ID': '1c48a7a0-3016-48e0-b0e3-47430f974869',
   'Data Split': 'training',
   'DataRow ID': 'clpqdyf650xd40712pycshy6a',
   'External ID': './resume/BANKING/99124477.pdf',
   'Labeled Data': 'https://storage.labelbox.com/cl5bn8qvq1av907xtb3bp8q60%2F8c6afc38-42a4-b2e1-a2e3-1e3b0c2998fc-99124477.pdf?Expires=1706637969726&KeyName=labelbox-assets-key-3&Signature=2nVt3sJ21CbjGS9I64yFquUELRw',
   'Media Attributes': {'assetType': 'pdf',
      'contentLength': 42535,
      'mimeType': 'application/pdf',
      'pageCount': 3,
      'subType': 'pdf',
      'superType': 'application'},
   'Label': {'objects': [{'featureId': 'b9f3b584-0f45-050a-88d4-39c2a169c8e1',
      'schemaId': 'clq1ckwbd08jp07z91q9mch5j',
      'title': 'Test',
      'value': 'test',
      'color': '#1CE6FF',
      'data': {'location': [{'text-bbox': {'page': 1,
            'top': 158.44,
            'left': 58.765,
            'height': 13.691,
            'width': 78.261}}],
         'unit': 'POINTS'}}],
      'classifications': [],
      'relationships': []}}
   ]
   """

#### Export V2
For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) documentation.

1. ```model_run.export()```: A streamable method, available starting with SDK version 3.56, that lets you stream an unlimited number of data rows. On earlier SDK versions, use ```export_v2()``` with identical parameters instead; note that it returns a different task type and does not include the streaming methods.

    - Required parameters:  
      - ```"data_row_details": True```
      - ```"project_details": True```
      - ```"label_details": True```
    - Filters:
      - Not supported for model run exports
    - Output: 
       - ```ExportTask```
            - `ExportTask.has_result()` return type:  bool 
            - `ExportTask.has_errors()` return type: bool
            - `ExportTask.get_stream()` return type: Stream[JsonConverterOutput]

In [None]:
MODEL_RUN_ID = ""
model_run = client.get_model_run(MODEL_RUN_ID)

In [None]:
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "performance_details": True,
}

export_task = model_run.export(params=export_params)
export_task.wait_till_done()

In [None]:
# Provide results with JSON converter
# Returns streamed JSON output strings from export task results/errors, one by one


# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
    print(output.json_str)


if export_task.has_errors():
    export_task.get_stream(converter=lb.JsonConverter(),
                           stream_type=lb.StreamType.ERRORS).start(
                               stream_handler=lambda error: print(error))

if export_task.has_result():
    # start() iterates over the stream and calls the handler for each row
    export_task.get_stream(
        converter=lb.JsonConverter(), stream_type=lb.StreamType.RESULT).start(
            stream_handler=json_stream_handler)

print(
    "file size: ",
    export_task.get_total_file_size(stream_type=lb.StreamType.RESULT),
)
print(
    "line count: ",
    export_task.get_total_lines(stream_type=lb.StreamType.RESULT),
)

## Export data rows from a video project
Video projects include additional fields. The examples below show how to extract specific fields from video exports.


##### Export V1 (deprecated)  
1. ```project.export_labels()``` 
    - Parameters:  
        - ```download: bool = False```
        - ```timeout_seconds: int = 1800```
    - Output: (str | List[Dict[Any, Any]] | None)

For a comprehensive example of Export V1 ``project.export_labels()`` output, please refer to our documentation: [Export V1 sample export](https://docs.labelbox.com/reference/export-image-annotations#sample-project-export)

2. ```project.video_label_generator()```
    - Parameters: 
        - ```timeout_seconds: int = 600```
    - Output: LabelGenerator

##### Export V2

For complete details on the supported filters and parameters, including how they are used and what information is included, please see the [Export overview](https://docs.labelbox.com/reference/label-export#optional-parameters-and-filters) documentation.

1. ```project.export()```: A streamable method, available starting with SDK version 3.56, that lets you stream an unlimited number of data rows. On earlier SDK versions, use ```export_v2()``` with identical parameters instead; note that it returns a different task type and does not include the streaming methods.

    - Required parameters:  
      - ```"attachments": True```
      - ```"data_row_details": True```
      - ```"project_details": True```
      - ```"label_details": True```
      - ```"performance_details": True```
    - Output: 
       - ```ExportTask```
            - `ExportTask.has_result()` return type:  bool 
            - `ExportTask.has_errors()` return type: bool
            - `ExportTask.get_stream()` return type: Stream[JsonConverterOutput]

In [None]:
VIDEO_PROJECT_ID = ""
project = client.get_project(VIDEO_PROJECT_ID)

In [None]:
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "performance_details": True,
    "label_details": True,
    # For additional information on interpolated frames, see our documentation:
    # https://docs.labelbox.com/docs/video-annotations#video-editor-components
    "interpolated_frames": True,
}
filters = {}

# An ExportTask is returned. It provides information about the status of your task, such as
# any errors encountered, and includes methods to stream your data
export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

Fetch frame-specific objects and classifications, as well as global classifications

In [None]:
import json

# pp is the pprint.PrettyPrinter created in the Imports cell
frames_objects_class_list = []
global_class_list = []

stream = export_task.get_stream()
for output in stream:
    output_json = json.loads(output.json_str)
    for label in output_json["projects"][VIDEO_PROJECT_ID]["labels"]:
        # Frame-specific objects and classifications, keyed by frame
        frames_data = label["annotations"]["frames"]
        for k, v in frames_data.items():
            frames_objects_class_list.append({k: v})
        # Classifications that apply to the whole video
        global_class_list.extend(label["annotations"]["classifications"])

# Print once, after every row has been processed
print("------- Frame specific classifications and objects -------")
pp.pprint(frames_objects_class_list)

print("------ Global classifications -------")
pp.pprint(global_class_list)

Fetch key frame feature map 

In [None]:
keyframe_map = []

stream = export_task.get_stream()
for output in stream:
    output_json = json.loads(output.json_str)
    labels = output_json["projects"][VIDEO_PROJECT_ID]["labels"]
    for label in labels:
        annotations = label["annotations"]["key_frame_feature_map"]
        for key, value in annotations.items():
            keyframe_map.append({key: value})

print("----- Keyframe Feature Map -----")
pp.pprint(keyframe_map)

Fetch segments 

In [None]:
segments_map = []
stream = export_task.get_stream()
for output in stream:
    output_json = json.loads(output.json_str)
    labels = output_json["projects"][VIDEO_PROJECT_ID]["labels"]
    for label in labels:
        annotations = label["annotations"]["segments"]
        for key, value in annotations.items():
            segments_map.append({key: value})

print("----- Segments Feature Map -----")
pp.pprint(segments_map)
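Each of the three cells above calls `export_task.get_stream()` and reads the whole export again. For large video projects, a single pass that gathers all four fields at once may be cheaper. This is a sketch that reuses the key names from the cells above (`projects`, `labels`, `annotations`, `frames`, `classifications`, `key_frame_feature_map`, `segments`); `collect_video_fields` is a hypothetical helper, and it skips rows that do not belong to the given project instead of raising.

```python
import json
from typing import Any, Dict, Iterable, List


def collect_video_fields(json_lines: Iterable[str],
                         project_id: str) -> Dict[str, List[Any]]:
    """Gather frames, global classifications, key frame maps, and segments in one pass."""
    result: Dict[str, List[Any]] = {
        "frames": [],
        "classifications": [],
        "key_frame_feature_map": [],
        "segments": [],
    }
    for line in json_lines:
        row = json.loads(line)
        project = row.get("projects", {}).get(project_id, {})
        for label in project.get("labels", []):
            annotations = label["annotations"]
            # Same {key: value} wrapping as the cells above
            for field in ("frames", "key_frame_feature_map", "segments"):
                result[field].extend(
                    {k: v} for k, v in annotations.get(field, {}).items())
            result["classifications"].extend(annotations.get("classifications", []))
    return result


# Usage with the video export task above:
# fields = collect_video_fields(
#     (output.json_str for output in export_task.get_stream()), VIDEO_PROJECT_ID)
# pp.pprint(fields["segments"])
```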