# Example of how to use the S3Links utility

This notebook demonstrates how to use the `helpers.links.S3Links` class to get S3 links for the test files.

In [1]:
import sys
import os
import pprint

current = os.path.abspath('..')
sys.path.append(current)

#from helpers.dataset_lists import BEAM_GROUP
from helpers.links import S3Links, glob_s3bucket

Instantiate the class. This loads the file paths into the `S3Links` object.

In [2]:
s3links = S3Links()

The different formats available are listed using the `formats` attribute.

In [3]:
s3links.formats

['h5repack', 'original']

S3 links for test files for a given format are returned as a list using the `get_links_by_format` method.

In [4]:
s3links.get_links_by_format('h5repack')

['h5cloud/h5repack/ATL03_20181120182818_08110112_006_02_repacked.h5',
 'h5cloud/h5repack/ATL03_20190219140808_08110212_006_02_repacked.h5',
 'h5cloud/h5repack/ATL03_20200217204710_08110612_006_01_repacked.h5',
 'h5cloud/h5repack/ATL03_20211114142614_08111312_006_01_repacked.h5',
 'h5cloud/h5repack/ATL03_20230211164520_08111812_006_01_repacked.h5']
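
The links returned above are keys relative to the bucket. Some tools expect a full `s3://` URI instead, so the keys may need a bucket prefix. The sketch below shows one way to add it. `to_s3_uri` is a hypothetical helper, not part of `helpers.links`, and the bucket name `nasa-cryo-scratch` is an assumption taken from the `s3links.table` output shown later in this notebook.

```python
# Assumption: bucket name copied from the s3links.table output in this notebook.
BUCKET = "nasa-cryo-scratch"


def to_s3_uri(key: str, bucket: str = BUCKET) -> str:
    """Return a full s3:// URI for a bucket-relative key (hypothetical helper)."""
    if key.startswith("s3://"):
        # Already a full URI, so leave it unchanged.
        return key
    return f"s3://{bucket}/{key.lstrip('/')}"


# Two of the keys returned by get_links_by_format('h5repack') above.
links = [
    "h5cloud/h5repack/ATL03_20181120182818_08110112_006_02_repacked.h5",
    "h5cloud/h5repack/ATL03_20190219140808_08110212_006_02_repacked.h5",
]
uris = [to_s3_uri(link) for link in links]
print(uris[0])
```

Passing an existing `s3://` URI through `to_s3_uri` returns it unchanged, so the helper can be applied to a mixed list.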

The S3 link for a given file can be found using the `get_link_by_name` method.

In [5]:
s3links.get_link_by_name('ATL03_20181120182818_08110112_006_02_repacked.h5')

'h5cloud/h5repack/ATL03_20181120182818_08110112_006_02_repacked.h5'

A link can also be returned for a given format by file id using the `get_link_by_fileid` method.

In [6]:
s3links.get_link_by_fileid('original', 0)

'h5cloud/original/ATL03_20181120182818_08110112_006_02.h5'
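
To make the two lookups concrete, here is a minimal sketch of how they could work over a nested dictionary of the form format -> file name -> link. This is not the `S3Links` implementation: `link_by_name` and `link_by_fileid` are hypothetical stand-ins, and ordering files by sorted name is an assumption that the real class may not share.

```python
from typing import Dict

# Toy table: format -> file name -> link. Values copied from the outputs above.
table: Dict[str, Dict[str, str]] = {
    "h5repack": {
        "ATL03_20181120182818_08110112_006_02_repacked.h5":
            "h5cloud/h5repack/ATL03_20181120182818_08110112_006_02_repacked.h5",
    },
    "original": {
        "ATL03_20181120182818_08110112_006_02.h5":
            "h5cloud/original/ATL03_20181120182818_08110112_006_02.h5",
    },
}


def link_by_name(table: Dict[str, Dict[str, str]], name: str) -> str:
    """Search every format for a file name; raise KeyError if it is absent."""
    for files in table.values():
        if name in files:
            return files[name]
    raise KeyError(name)


def link_by_fileid(table: Dict[str, Dict[str, str]], fmt: str, fileid: int) -> str:
    """Return the link at position `fileid` among one format's sorted file names."""
    names = sorted(table[fmt])  # assumption: files are ordered by name
    return table[fmt][names[fileid]]


print(link_by_name(table, "ATL03_20181120182818_08110112_006_02_repacked.h5"))
print(link_by_fileid(table, "original", 0))
```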

The links can be updated using `update_links`, which refreshes them directly from the S3 bucket that holds the test files. The updated links can be used as is; to also update the `s3filelinks.json` file, answer `y` at the prompt.

In [7]:
s3links.update_links()

Differences between self.table and S3 buckets: updating self.table


Update ../helpers/s3filelinks.json (y or n)? y


Updating ../helpers/s3filelinks.json
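
The update step compares the in-memory table with the bucket contents and then rewrites the JSON file. The sketch below illustrates that compare-then-write pattern with plain dictionaries. It is not the `update_links` implementation: `diff_formats` and `write_table` are hypothetical names, the table contents are toy values, and the file is written to a temporary directory rather than to `../helpers/s3filelinks.json`.

```python
import json
import tempfile
from pathlib import Path


def diff_formats(local: dict, remote: dict) -> dict:
    """Report formats that exist only remotely, only locally, or differ in content."""
    shared = set(local) & set(remote)
    return {
        "added": sorted(set(remote) - set(local)),
        "removed": sorted(set(local) - set(remote)),
        "changed": sorted(k for k in shared if local[k] != remote[k]),
    }


def write_table(table: dict, path: Path) -> None:
    """Serialize the table as indented JSON with sorted keys."""
    path.write_text(json.dumps(table, indent=2, sort_keys=True))


# Toy tables: `local` stands in for the loaded links, `remote` for a bucket listing.
local = {"h5repack": {"a.h5": "s3://bucket/h5repack/a.h5"}}
remote = {
    "h5repack": {"a.h5": "s3://bucket/h5repack/a.h5"},
    "geoparquet": {"a.h5.gpq": "s3://bucket/geoparquet/a.h5.gpq"},
}

changes = diff_formats(local, remote)
if any(changes.values()):
    # Write to a temporary directory so no real file is touched.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "s3filelinks.json"
        write_table(remote, out)
        reloaded = json.loads(out.read_text())
```

Writing with sorted keys keeps the JSON file stable between runs, which makes changes easier to review in version control.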


Running `s3links.formats` again shows the updated list of formats.

In [8]:
s3links.formats

['geoparquet',
 'h5repack',
 'kerchunk-original',
 'kerchunk-repacked',
 'original']

In [10]:
s3links.table

{'geoparquet': {'ATL03_20181120182818_08110112_006_02.h5.gpq': 's3://nasa-cryo-scratch/h5cloud/geoparquet/ATL03_20181120182818_08110112_006_02.h5.gpq',
  'ATL03_20190219140808_08110212_006_02.h5.gpq': 's3://nasa-cryo-scratch/h5cloud/geoparquet/ATL03_20190219140808_08110212_006_02.h5.gpq',
  'ATL03_20200217204710_08110612_006_01.h5.gpq': 's3://nasa-cryo-scratch/h5cloud/geoparquet/ATL03_20200217204710_08110612_006_01.h5.gpq',
  'ATL03_20211114142614_08111312_006_01.h5.gpq': 's3://nasa-cryo-scratch/h5cloud/geoparquet/ATL03_20211114142614_08111312_006_01.h5.gpq',
  'ATL03_20230211164520_08111812_006_01.h5.gpq': 's3://nasa-cryo-scratch/h5cloud/geoparquet/ATL03_20230211164520_08111812_006_01.h5.gpq'},
 'h5repack': {'ATL03_20181120182818_08110112_006_02_repacked.h5': 's3://nasa-cryo-scratch/h5cloud/h5repack/ATL03_20181120182818_08110112_006_02_repacked.h5',
  'ATL03_20190219140808_08110212_006_02_repacked.h5': 's3://nasa-cryo-scratch/h5cloud/h5repack/ATL03_20190219140808_08110212_006_02_repac