# Object Storage

DIAS instances store both the original Sentinel data (as downloaded from the ESA Copernicus hub) and the processed CARD products on object storage. The CREODIAS, MUNDI and SOBLOO instances use S3 storage, which is similar to the storage solution used by Amazon Web Services (AWS); ONDA uses ENS. In all cases, the required products must be discovered and transferred from object storage to the disks attached to the VMs before they can be processed.

Access to an S3 store requires credentials, which are typically created when the DIAS account is set up. The credentials, together with the S3 host and bucket details, must be passed to whatever client is used to access the data, for instance a Python script that transfers the data to disk.
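As a minimal sketch of how the credentials, host and bucket fit together, the snippet below assumes the `boto3` library. The helpers `s3_client_kwargs` and `download_object` are hypothetical names introduced for illustration; they are not part of the `cbm` package used later in this notebook, and the host, keys and bucket in the usage comment are placeholders.

```python
import os


def s3_client_kwargs(host, access_key, secret_key, secure=True):
    """Build the keyword arguments for boto3.client('s3', ...)."""
    if not host:
        # An empty host would produce an unusable endpoint such as 'http://'.
        raise ValueError("S3 host must not be empty")
    scheme = "https" if secure else "http"
    # Accept either a bare host name or a full URL.
    if host.startswith(("http://", "https://")):
        endpoint = host
    else:
        endpoint = f"{scheme}://{host}"
    return {
        "endpoint_url": endpoint,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def download_object(bucket, key, local_path, **client_kwargs):
    """Copy one object from the bucket to a file on the VM disk."""
    import boto3  # imported lazily so the helper above needs no extra packages

    client = boto3.client("s3", **client_kwargs)
    # Create the target folder if it does not exist yet.
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    client.download_file(bucket, key, local_path)
    return local_path


# Hypothetical usage (placeholder values, requires network access):
# kwargs = s3_client_kwargs("s3.example.com", "MY_ACCESS_KEY", "MY_SECRET_KEY")
# download_object("DIAS", "path/to/product.jp2", "temp/product.jp2", **kwargs)
```

Keeping the endpoint construction separate from the download makes it easy to check the configuration before any network call is attempted.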


In [1]:
# import required libraries for this Notebook
import os
import rasterio
from ipywidgets import widgets, HBox


In [2]:
# Note: Boto3 can not retrieve public buckets. If the bucket is public, enter
# the name of the bucket in the configuration file 'config/config.json'
# config_ui.all_configurations() # To display the widget for all the configurations.

dias = widgets.Dropdown(
    options=['CREODIAS', 'MUNDI', 'SOBLOO', 'ONDA', 'WEKEO', 'EOSC'],
    description='DIAS Provider: ')

bucket_name = widgets.Text(
        value='DIAS',
        placeholder='Bucket',
        description='Bucket:',
        disabled=False)

only_images = widgets.Checkbox(
    value=True,
    description='Show only images',
    disabled=False)

check_for_hdr = widgets.Checkbox(
    value=True,
    description='Check for header file',
    disabled=False)

reference = widgets.Text(
        value='',
        placeholder='Reference',
        description='Ref. or Prefix:',
        disabled=False)

bucket_options = widgets.VBox(children=[widgets.HBox([dias, bucket_name, reference]),
                                       widgets.HBox([only_images, check_for_hdr])])
bucket_options

# Use with: <variable name>.value
# To filter the results use a prefix (e.g. 'Sentinel-2/MSI/L1C/2018/12/08/', 'S2B/ ').
# Reference example:
# S2B_MSIL2A_20180903T105019_N0208_R051_T31TCG_20180903T171953.SAFE

VBox(children=(HBox(children=(Dropdown(description='DIAS Provider: ', options=('CREODIAS', 'MUNDI', 'SOBLOO', …

In [3]:
# Get list of files from the selected bucket, args: (bucket name, prefix).
from cbm.sources import object_storage
# Default to using the entered text as the prefix. A bare product reference
# (no '/') with 7 or 11 '_'-separated fields is resolved to its storage path.
prefix = reference.value
if prefix != '' and '/' not in prefix and len(prefix.split('_')) in (7, 11):
    try:
        s3path, file_info = object_storage.get_file_location(reference.value, dias.value)
        prefix = s3path
    except Exception:
        print("Can not get information from the reference")
        prefix = reference.value

print(prefix)
bucket_files = object_storage.list_files(bucket_name.value, prefix)

file_list = []
if bucket_files is not None:
    if only_images.value is True:
        for f in bucket_files:
            file = f['Key'].replace(prefix, '', 1)
            if file.lower().endswith(('.png', '.jpg', '.jpeg', '.tiff',
                                       '.bmp', '.gif', '.tif', '.img', '.jp2')):
                file_list.append(file)
        if len(file_list) == 0:
            print("! No files found. !")
        else:
            print("Select an image file from the dropdown list")

    else:
        for f in bucket_files:
            file = f['Key'].replace(prefix, '', 1)
            file_list.append(file)

# print(prefix)
file_list_widget = widgets.Dropdown(options=file_list, description='Select file: ')
file_list_widget

# Use with: file_list_widget.value

The file 'config/main.json' did not exist, a new default   file was created.

Could not retrieve list of the files from the selected buckets: Invalid endpoint: http://


Dropdown(description='Select file: ', options=(), value=None)
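The step from a product reference to a storage prefix can be illustrated with a small sketch. It assumes the folder layout visible in the example path printed further down in this notebook (`Sentinel-2/MSI/<level>/<yyyy>/<mm>/<dd>/<product>.SAFE/`); the function `s2_reference_to_prefix` is a hypothetical helper, and the actual `object_storage.get_file_location` may resolve references differently, for example through a catalogue lookup.

```python
def s2_reference_to_prefix(reference):
    """Derive an object storage prefix from a Sentinel-2 product reference.

    Assumes the layout Sentinel-2/MSI/<level>/<yyyy>/<mm>/<dd>/<product>.SAFE/
    """
    # Drop the '.SAFE' suffix if present so both spellings are accepted.
    name = reference[:-5] if reference.endswith(".SAFE") else reference
    parts = name.split("_")
    if len(parts) != 7 or not parts[0].startswith("S2"):
        raise ValueError(f"Not a Sentinel-2 product reference: {reference!r}")
    level = parts[1].replace("MSI", "", 1)  # 'MSIL2A' -> 'L2A'
    date = parts[2][:8]                     # sensing date as 'yyyymmdd'
    return (f"Sentinel-2/MSI/{level}/"
            f"{date[:4]}/{date[4:6]}/{date[6:8]}/{name}.SAFE/")


ref = "S2B_MSIL2A_20180801T104019_N0206_R008_T31TCG_20180801T180307.SAFE"
print(s2_reference_to_prefix(ref))
# Sentinel-2/MSI/L2A/2018/08/01/S2B_MSIL2A_20180801T104019_N0206_R008_T31TCG_20180801T180307.SAFE/
```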

**Note:** Depending on the file type, an image may need an additional header file containing its geographic information before it is recognized as a 'geo' image. Header files usually have an .hdr extension.

If the header file needs to be downloaded as well, make sure the lines of code that handle it are active (uncomment them if they are commented out).
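One way to find out in advance whether a header file exists is to look for its key in the bucket listing. The sketch below assumes that the header sits next to the image under the same base name with an `.hdr` extension; `header_key_for` and `find_header` are hypothetical helpers, and the keys in the example are made up.

```python
import posixpath


def header_key_for(image_key):
    """Return the object key a matching .hdr file would have."""
    # Object keys always use '/' separators, hence posixpath.
    return posixpath.splitext(image_key)[0] + ".hdr"


def find_header(keys, image_key):
    """Return the header key if it is present in the listing, else None."""
    candidate = header_key_for(image_key)
    return candidate if candidate in set(keys) else None


listing = ["products/scene_a.img", "products/scene_a.hdr", "products/scene_b.jp2"]
print(find_header(listing, "products/scene_a.img"))  # products/scene_a.hdr
print(find_header(listing, "products/scene_b.jp2"))  # None
```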

In [16]:
# A sample script to test downloading from s3 object storage.
# Select a new name for the file (e.g. 'sample_image').
file_name = "sample_image_from_object_storage"

# Get the selected image.
s3image = prefix + file_list_widget.value

# Name of the image file to be stored in data folder
localimg = f"temp/{file_name}.img"

print("-- File to be downloaded from object storage:\n", s3image, '\n')
print("-- The file will be stored in:\n", localimg)
print()

-- File to be downloaded from object storage:
 Sentinel-2/MSI/L2A/2018/08/01/S2B_MSIL2A_20180801T104019_N0206_R008_T31TCG_20180801T180307.SAFE/GRANULE/L2A_T31TCG_A007327_20180801T104716/IMG_DATA/R10m/T31TCG_20180801T104019_B03_10m.jp2 

-- The file will be stored in:
 /home/_jrc_dias/data/sample_image_from_object_storage.img



In [17]:
# Download image and image information file
object_storage.get_file(s3image, localimg, bucket_name.value)

# Get the selected image header file (skipped if the checkbox is unticked).
if check_for_hdr.value is True:
    try:
        s3header = os.path.splitext(s3image)[0] + ".hdr"
        # Save the header next to the image, under the same base name.
        # (config_ui is not imported in this notebook, so it is not used here.)
        localhdr = os.path.splitext(localimg)[0] + ".hdr"
        object_storage.get_file(s3header, localhdr, bucket_name.value)
    except Exception:
        print("No header file found.")


-Downloading to local file-


HBox(children=(FloatProgress(value=0.0, max=82752422.0), HTML(value='')))

File downloaded as:  /home/_jrc_dias/data/sample_image_from_object_storage.img

No header file found.


**Run the cell below to check that the image was downloaded correctly and is recognized as a geo image file.**

In [18]:
# Get information for the downloaded raster image
with rasterio.open(localimg) as src:
    print(src.width, src.height)
    print(src.crs)
    print(src.transform)
    print(src.count)
    print(src.indexes)

10980 10980
EPSG:32631
| 10.00, 0.00, 300000.00|
| 0.00,-10.00, 4700040.00|
| 0.00, 0.00, 1.00|
1
(1,)
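The printed transform maps pixel positions to map coordinates in the image CRS (here EPSG:32631). The sketch below reproduces that mapping by hand, using the values from the output above as defaults; `pixel_to_map` and `raster_bounds` are hypothetical helpers that assume a north-up image with square pixels and no rotation, which matches the zero off-diagonal terms of the printed matrix.

```python
def pixel_to_map(col, row, pixel_size=10.0, x_origin=300000.0, y_origin=4700040.0):
    """Map the upper-left corner of pixel (col, row) to map coordinates."""
    x = x_origin + col * pixel_size
    y = y_origin - row * pixel_size  # rows grow downwards, northing decreases
    return x, y


def raster_bounds(width, height, **kwargs):
    """Return (left, bottom, right, top) of a width x height raster."""
    left, top = pixel_to_map(0, 0, **kwargs)
    right, bottom = pixel_to_map(width, height, **kwargs)
    return left, bottom, right, top


print(pixel_to_map(0, 0))           # (300000.0, 4700040.0)
print(raster_bounds(10980, 10980))  # (300000.0, 4590240.0, 409800.0, 4700040.0)
```

With 10980 pixels of 10 m each, the tile covers 109.8 km in both directions, as expected for a 10 m Sentinel-2 band.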
