<img src="https://cdn.comet.ml/img/notebook_logo.png">

# Install Comet

In [None]:
%pip install --upgrade comet_ml

In [None]:
import comet_ml

comet_ml.init(project_name="remote-artifacts")

# Fetch the Metadata File for the Dataset

For this guide, we're going to use the [DOTA](https://captain-whu.github.io/DOTA/dataset.html) dataset. DOTA is a collection of aerial images gathered from different sensors and platforms.

The dataset has been uploaded to an S3 bucket. First, let's download the dataset's metadata file from that bucket.

In [None]:
!wget https://cdn.comet.ml/dota_split/DOTA_1.0.json

# Create an Artifact to Track the Data

First, let's define the class names present in this dataset.

In [None]:
LABEL_CLASS_NAMES = [
    "plane",
    "baseball-diamond",
    "bridge",
    "ground-track-field",
    "small-vehicle",
    "large-vehicle",
    "ship",
    "tennis-court",
    "basketball-court",
    "storage-tank",
    "soccer-ball-field",
    "roundabout",
    "harbor",
    "swimming-pool",
    "helicopter",
]

Next, we load the metadata file downloaded above and reshape it so that each image URL can be tracked as an asset in a Remote Artifact, with that image's annotations attached as asset metadata.

In [None]:
import json

base_url = "https://cdn.comet.ml/dota_split"
metadata_file = "./DOTA_1.0.json"

with open(metadata_file, "r") as f:
    dota_metadata = json.load(f)
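
Before reshaping the metadata, it can help to confirm its layout. The sketch below assumes only what the following cells rely on: top-level `images` and `annotations` lists, an `id` on each image, and an `image_id` on each annotation. The helper name `summarize_metadata` and the small `sample` dictionary are illustrative and not part of the dataset.

```python
def summarize_metadata(metadata):
    """Return basic counts for a metadata dict with 'images' and 'annotations' lists."""
    images = metadata["images"]
    annotations = metadata["annotations"]
    # Collect the ids of every image that has at least one annotation.
    annotated_ids = {annotation["image_id"] for annotation in annotations}
    return {
        "num_images": len(images),
        "num_annotations": len(annotations),
        "num_images_without_annotations": sum(
            1 for image in images if image["id"] not in annotated_ids
        ),
    }


# Hypothetical sample data; file names and boxes are made up for illustration.
sample = {
    "images": [
        {"id": 1, "file_name": "a.png"},
        {"id": 2, "file_name": "b.png"},
    ],
    "annotations": [
        {"image_id": 1, "bbox": [0, 0, 10, 10]},
        {"image_id": 1, "bbox": [5, 5, 20, 20]},
    ],
}

summary = summarize_metadata(sample)
print(summary)
```

Calling `summarize_metadata(dota_metadata)` on the loaded file should give a quick sense of how many images would be skipped later for lacking annotations.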

In [None]:
annotation_map = {}
for annotation in dota_metadata["annotations"]:
    img_id = annotation["image_id"]

    annotation_map.setdefault(img_id, [])
    annotation_map[img_id].append(annotation)
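
The grouping step above can be checked on a small in-memory example. This is a minimal sketch that uses `collections.defaultdict` as an alternative to `setdefault`; the function name `group_annotations_by_image` and the sample annotations are hypothetical.

```python
from collections import defaultdict


def group_annotations_by_image(annotations):
    """Map each image_id to the list of annotations that reference it."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation["image_id"]].append(annotation)
    # Convert to a plain dict so missing ids raise KeyError instead of
    # silently creating empty lists.
    return dict(grouped)


# Hypothetical annotations; only the image_id field mirrors the real file.
sample_annotations = [
    {"image_id": 7, "bbox": [0, 0, 10, 10]},
    {"image_id": 3, "bbox": [1, 1, 4, 4]},
    {"image_id": 7, "bbox": [2, 2, 8, 8]},
]

grouped = group_annotations_by_image(sample_annotations)
print({image_id: len(items) for image_id, items in grouped.items()})
```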

In [None]:
artifact = comet_ml.Artifact(
    name="DOTA", artifact_type="dataset", metadata={"class_names": LABEL_CLASS_NAMES}
)

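for image in dota_metadata["images"]:
    # Skip images that have no annotations in the metadata file.
    annotations = annotation_map.get(image["id"])
    if annotations is None:
        continue

    # Track the image by URL only; the file itself is not uploaded to Comet.
    artifact.add_remote(
        f"{base_url}/images/{image['file_name']}",
        metadata={"annotations": annotations},
    )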
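
To preview what will be tracked before creating the Artifact, the same loop can be run without calling Comet. This is a minimal sketch, assuming the `id` and `file_name` fields used above; `build_remote_assets` is a hypothetical helper, and the sample images and annotations are made up.

```python
def build_remote_assets(images, annotation_map, base_url):
    """Return (assets, skipped): URL/metadata pairs and unannotated file names."""
    assets = []
    skipped = []
    for image in images:
        annotations = annotation_map.get(image["id"])
        if annotations is None:
            # No annotations for this image, so it would not be added.
            skipped.append(image["file_name"])
            continue
        url = f"{base_url}/images/{image['file_name']}"
        assets.append((url, {"annotations": annotations}))
    return assets, skipped


# Hypothetical sample data for illustration only.
sample_images = [
    {"id": 1, "file_name": "P0001.png"},
    {"id": 2, "file_name": "P0002.png"},
]
sample_annotation_map = {1: [{"image_id": 1, "bbox": [0, 0, 10, 10]}]}

assets, skipped = build_remote_assets(
    sample_images, sample_annotation_map, "https://cdn.comet.ml/dota_split"
)
print(assets[0][0])
print(skipped)
```

Each `(url, metadata)` pair corresponds to one `artifact.add_remote(url, metadata=metadata)` call in the cell above.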

# Log the Artifact 

In [None]:
experiment = comet_ml.Experiment()
experiment.log_artifact(artifact)
experiment.end()