# Moving OCO-3 and OCO-2 COGs

This notebook copies COGs produced by the [OCO3-data-transformer](https://github.com/EarthDigitalTwin/OCO3-data-transformer) project into the VEDA staging bucket (`veda-data-store-staging`) so they can be published to the VEDA staging catalog and previewed in the VEDA UI.

## Step 1: Setup

### Import necessary libraries

In [1]:
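import boto3
import concurrent.futures
from concurrent.futures import as_completed
from osgeo import gdal

# Keep GDAL in "return code" mode: process_file() detects empty rasters by
# reading gdal.GetLastErrorMsg() instead of catching exceptions.
gdal.DontUseExceptions()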

### Create the S3 client

In [2]:
s3_client = boto3.client("s3")

### Set constants

The constants below are not expected to change.

In [4]:
source_bucket = "sdap-dev-zarr"
source_dir = "OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog"
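# Number of worker threads used by the ThreadPoolExecutor in Step 5
NUM_WORKERS = 4
destination_bucket = "veda-data-store-staging"
# Maps the file-name prefix (text before the first "_") to the destination
# directory in the staging bucket, named after the VEDA collection ID
collection_prefix_collection_id_map = {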
    "oco3-co2": "oco3-co2-sam-l3-cogs",
    "oco2-co2": "oco2-co2-target-cogs",
    "oco3-sif": "oco3-co2-sif-cogs",
}

The constants below are expected to change from run to run: `collection` selects which collection to copy (`oco3-co2`, `oco2-co2`, or `oco3-sif`), and `object_limit` caps how many files are processed (set it below the total for testing).

In [5]:
collection = "oco3-sif"
prefix = f"{source_dir}/{collection}"
object_limit = 100
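Because `collection` is edited by hand, a typo produces a prefix that matches no objects. A minimal sketch of a fail-fast guard, assuming you want one: `build_prefix` is a hypothetical helper (not part of the original notebook) that checks `collection` against the keys of the prefix-to-collection map before building the same prefix as above.

```python
# Hypothetical helper: validate the hand-edited `collection` value before listing S3.
def build_prefix(source_dir: str, collection: str, prefix_map: dict) -> str:
    if collection not in prefix_map:
        raise ValueError(
            f"Unknown collection {collection!r}; expected one of {sorted(prefix_map)}"
        )
    # File names start with "<collection>_", so this prefix selects one collection
    return f"{source_dir}/{collection}"


mapping = {
    "oco3-co2": "oco3-co2-sam-l3-cogs",
    "oco2-co2": "oco2-co2-target-cogs",
    "oco3-sif": "oco3-co2-sif-cogs",
}
source_dir = "OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog"
print(build_prefix(source_dir, "oco3-sif", mapping))
```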

## Step 2: List all objects for this collection in the source bucket + directory

In [6]:
all_objects = []
continuation_token = None

# list_objects_v2 returns at most 1,000 keys per call, so keep requesting pages
# until a response comes back without a NextContinuationToken.
while True:
    args = dict(Bucket=source_bucket, Prefix=prefix)
    if continuation_token is not None:
        args["ContinuationToken"] = continuation_token
    response = s3_client.list_objects_v2(**args)
    all_objects.extend(obj["Key"] for obj in response.get("Contents", []))
    continuation_token = response.get("NextContinuationToken")
    if continuation_token is None:
        break
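The paging protocol in Step 2 can be exercised offline. This sketch assumes a hypothetical `FakeS3Client` that serves pre-built pages and uses page indices as continuation tokens (real S3 tokens are opaque strings). `list_all_keys`, also hypothetical, stops as soon as a response carries no `NextContinuationToken`, so it terminates even when the prefix matches nothing and the response has no `Contents`.

```python
class FakeS3Client:
    """Stand-in for a boto3 S3 client, used only to test the paging loop."""

    def __init__(self, pages):
        self.pages = pages  # one list of keys per response page

    def list_objects_v2(self, Bucket, Prefix, ContinuationToken=None):
        if not self.pages:  # prefix matched nothing: no "Contents" in the response
            return {"KeyCount": 0}
        index = int(ContinuationToken or 0)
        response = {"Contents": [{"Key": key} for key in self.pages[index]]}
        if index + 1 < len(self.pages):
            response["NextContinuationToken"] = str(index + 1)
        return response


def list_all_keys(client, bucket, prefix):
    keys = []
    token = None
    while True:
        args = dict(Bucket=bucket, Prefix=prefix)
        if token is not None:
            args["ContinuationToken"] = token
        response = client.list_objects_v2(**args)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        token = response.get("NextContinuationToken")
        if token is None:  # last page reached
            return keys


print(list_all_keys(FakeS3Client([["a.tif", "b.tif"], ["c.tif"]]), "bucket", "prefix"))
print(list_all_keys(FakeS3Client([]), "bucket", "prefix"))
```

Outside of tests, boto3's built-in paginator (`s3_client.get_paginator("list_objects_v2")`) handles the same token bookkeeping.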

### Sanity checks

In [7]:
len(all_objects)

2832

In [8]:
all_objects[0]

'OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif'

## Step 3: Define process and copy operations

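The `copy_object` function copies the original file to the destination bucket. The destination directory is the collection ID that `collection_prefix_collection_id_map` assigns to the part of the file name before the first underscore (for example, `oco3-sif`).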
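The key-building logic inside `copy_object` can be checked without touching S3. A minimal sketch: `destination_key` is a hypothetical helper that repeats the same two `split` calls and the dictionary lookup. Applied to the first object listed in Step 2, it yields the destination key that appears in the Step 4 log.

```python
COLLECTION_PREFIX_TO_ID = {
    "oco3-co2": "oco3-co2-sam-l3-cogs",
    "oco2-co2": "oco2-co2-target-cogs",
    "oco3-sif": "oco3-co2-sif-cogs",
}


def destination_key(object_key: str, prefix_map: dict = COLLECTION_PREFIX_TO_ID) -> str:
    file_name = object_key.split("/")[-1]        # drop the source directory
    collection_prefix = file_name.split("_")[0]  # e.g. "oco3-sif"
    return f"{prefix_map[collection_prefix]}/{file_name}"


key = (
    "OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/"
    "oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif"
)
print(destination_key(key))
# oco3-co2-sif-cogs/oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif
```

A file whose prefix is not in the mapping raises `KeyError`, so an unexpected file fails loudly instead of landing in the wrong directory.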

The `process_file` function checks that the first band of the file contains at least one valid pixel and copies the file only if it does. We run this check before copying because, at the time of writing, a single invalid file in the `veda-data-store-staging` bucket causes publication of every file in that bucket to fail. Files that fail the check are skipped and reported as errors.
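A simplified, in-memory illustration of the check, assuming the band's pixel values are already available as a Python list. This sketch treats a pixel as valid if it is neither NaN nor equal to the band's nodata value. It only approximates what GDAL does: `GetStatistics(True, True)` may compute approximate statistics from a sample of pixels or from overviews, which is why its error message mentions sampling. `has_valid_pixel` is a hypothetical helper, not a GDAL function.

```python
import math


def has_valid_pixel(values, nodata=None) -> bool:
    """Return True if at least one value is neither NaN nor equal to `nodata`."""
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            continue  # NaN is treated as missing data
        if nodata is not None and value == nodata:
            continue  # explicit nodata value
        return True
    return False


print(has_valid_pixel([-9999.0, -9999.0], nodata=-9999.0))  # False
print(has_valid_pixel([float("nan"), 0.42]))                # True
```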

In [12]:
def copy_object(object_key: str):
    """Copy one COG from the source bucket into its collection directory in staging."""
    copy_source = {"Bucket": source_bucket, "Key": object_key}
    destination_file_name = object_key.split("/")[-1]
    # The text before the first "_" (e.g. "oco3-sif") identifies the collection
    collection_prefix = destination_file_name.split("_")[0]
    destination_dir = collection_prefix_collection_id_map[collection_prefix]
    destination_object_key = f"{destination_dir}/{destination_file_name}"
    s3_client.copy_object(
        CopySource=copy_source, Bucket=destination_bucket, Key=destination_object_key
    )
    print(
        f"Copied: s3://{source_bucket}/{object_key} to s3://{destination_bucket}/{destination_object_key}"
    )


def process_file(object_key: str):
    """Copy a single file to staging if its first band has at least one valid pixel."""
    dataset = gdal.Open(f"/vsis3/{source_bucket}/{object_key}", gdal.GA_ReadOnly)

    if dataset is None:
        return dict(
            object_key=object_key,
            error_message=f"Failed to open: {gdal.GetLastErrorMsg()}",
            status="error",
        )

    # Clear any error left over from earlier GDAL calls so the message
    # checked below comes from GetStatistics only.
    gdal.ErrorReset()
    band = dataset.GetRasterBand(1)
    # approx_ok=True, force=True: compute (approximate) statistics if none are cached
    _ = band.GetStatistics(True, True)
    err_msg = gdal.GetLastErrorMsg()
    dataset = None  # close the dataset

    if "Failed to compute statistics, no valid pixels found in sampling" in err_msg:
        return dict(
            object_key=object_key,
            error_message=f"GDAL Error: {err_msg}",
            status="error",
        )

    copy_object(object_key=object_key)
    return dict(object_key=object_key, error_message=None, status="success")



## Step 4: Test on one file

In [13]:
first_object_key = all_objects[0]
process_file(first_object_key)

Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif

{'object_key': 'OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif',
 'error_message': None,
 'status': 'success'}

### Optional: Check that the file was copied

In [14]:
!aws s3 ls s3://veda-data-store-staging/oco3-co2-sif-cogs/  # append "| wc -l" to count the files instead

2025-03-27 22:45:27     125814 oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif


## Step 5: Run process + copy for all files (up to `object_limit`)

In [None]:
def main():
    failures = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
        # Map each future back to its object key so failures can be reported
        futures = {
            executor.submit(process_file, object_key): object_key
            for object_key in all_objects[0:object_limit]
        }

        for idx, future in enumerate(as_completed(futures)):
            if idx % 100 == 0:
                print(f"processed file {idx}")
            object_key = futures[future]
            try:
                result = future.result()
            except Exception as exc:
                # e.g. an S3 error raised by copy_object; record it and keep going
                result = dict(object_key=object_key, error_message=repr(exc), status="error")
            if result["status"] == "error":
                failures.append((object_key, result))

    # Write the key and error message of every skipped or failed file to a log file
    with open("failed_files.log", "w") as f:
        for object_key, result in failures:
            f.write(f"{object_key}\t{result['error_message']}\n")


main()

Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2019-10-10T190148Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-02-02T211516Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-02-02T211516Z_unfiltered_Daily_SIF_757nm.tif
processed file 0
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-02-10T180738Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-02-10T180738Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2019-10-10T190148Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-02-29T173750Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-02-29T173750Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-02-10T180738Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-02-10T180738Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-05T202057Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-05T202057Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-02-29T173738Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oc
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-13T171716Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-13T171716Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-05T202057Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-05T202057Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-13T171716Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-13T171716Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-17T154551Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-17T221622Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-17T221622Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-17T221611Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-17T221611Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-21T204250Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-21T204250Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-21T204250Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oc
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-29T173429Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-29T173429Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-04-29T173429Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-04-29T173429Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-05-03T160011Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_cal001_2020-05-03T160011Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_cal001_2020-05-03T160011Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oc
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2019-10-09T103320Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2019-10-09T103320Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2019-10-09T103320Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2019-10-09T103320Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2019-10-13T085540Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2019-10-13T085540Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2019-10-13T085540Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-02-09T093759Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-02-09T093759Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-02-09T093759Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-02-09T093759Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-02-22T103920Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-02-22T103920Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-02-22T103920Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-03-30T140959Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-03-30T140959Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-03-30T141003Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-03-30T141003Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-08T101838Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-08T101838Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-08T101838Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-12T084648Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-12T084648Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-12T084648Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-12T084648Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-16T071458Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-16T071458Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-17T125746Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/o
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-16T071458Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-16T071458Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-17T125746Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-17T125746Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-29T081617Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-29T081617Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-04-29T081617Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-04-29T081617Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-05-03T064158Z_unfiltered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon100_2020-05-03T064158Z_unfiltered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon101_2019-09-24T124619Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-staging/oco3-co2-sif-cogs/oco3-sif_coccon101_2019-09-24T124619Z_filtered_Daily_SIF_757nm.tif
Copied: s3://sdap-dev-zarr/OCO3/outputs/veda/demo-2024.10.28-target/SIMULTEST_TFP_cog/oco3-sif_coccon100_2020-05-03T064159Z_filtered_Daily_SIF_757nm.tif to s3://veda-data-store-stagi
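`future.result()` re-raises any exception raised inside the worker thread, for example an S3 error from `copy_object`. Left uncaught, that exception ends the `as_completed` loop in `main()` before the remaining results are collected and before `failed_files.log` is written. A minimal sketch of catching exceptions per future instead; `run_all` and `flaky` are hypothetical names, and `flaky` simply simulates one failing file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(func, items, max_workers=4):
    """Run func over items in a thread pool; return (results, failures)."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(func, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # one bad item should not stop the batch
                failures.append((item, repr(exc)))
    return results, failures


def flaky(n):
    if n == 3:
        raise RuntimeError("simulated S3 error")
    return n * 10


results, failures = run_all(flaky, range(5))
print(sorted(results))  # [0, 10, 20, 40]
print(failures)         # [(3, "RuntimeError('simulated S3 error')")]
```

Results arrive in completion order, not submission order, which is why the example sorts them before printing.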

