<div style='text-align:center;'>
<figure><img src='https://raw.githubusercontent.com/wekeo/wekeo4data/main/img/LogoWekeo_Copernicus_RGB_0.png' alt='Logo EU Copernicus WEkEO' align='right' width='20%'>
</figure>
</div>

<h1><center><code>How To Download WEkEO Data</code></center></h1>

Follow the next few steps to download data from WEkEO via the __HDA API__.  
Please check the following articles for further details:
- [What is the HDA API Python Client and how to use it?](https://help.wekeo.eu/en/articles/6751608-what-is-the-hda-api-python-client-and-how-to-use-it)
- [How to download WEkEO data?](https://help.wekeo.eu/en/articles/6416936-how-to-download-wekeo-data)
- [Official documentation of HDA API](https://hda.readthedocs.io/en/latest/usage.html)

## Step 1. Install the latest version of `hda`

You can run the next cell to install the latest version of `hda`:

In [1]:
!pip install hda -U

Defaulting to user installation because normal site-packages is not writeable
Collecting hda
  Downloading hda-1.15.tar.gz (13 kB)
  Preparing metadata (setup.py) ... done
Collecting tqdm
  Downloading tqdm-4.66.1-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.3/78.3 KB 2.4 MB/s eta 0:00:00
Building wheels for collected packages: hda
  Building wheel for hda (setup.py) ... done
  Created wheel for hda: filename=hda-1.15-py3-none-any.whl size=13948 sha256=ccab912fc2e215804d2db61e8319f40426cb50925e66dfa70297e75da252b9ab
  Stored in directory: /home/jp/.cache/pip/wheels/fc/f4/c1/3966e0fc4c89b122365b9461c257a2d39fb4ea604227261b5b
Successfully built hda
Installing collected packages: tqdm, hda
Successfully installed hda-1.15 tqdm-4.66.1


*__Note__: version used in this notebook is `1.15`*.

## Step 2. Import `hda` module

The `hda` package provides a Python 3 client that can be used to search and download products through the WEkEO Harmonized Data Access (HDA) API. First, let's import the `hda` classes:

In [1]:
from hda import Client, Configuration

## Step 3. Configure credentials and load `hda` Client

### Method 1 (occasional users)

Pass your credentials directly in the script:

In [3]:
# Configure your credentials without a .hdarc file
# (replace the placeholders with your own WEkEO username and password)
conf = Configuration(user="<your_username>", password="<your_password>")
hda_client = Client(config=conf)
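If you prefer not to leave a password in the notebook, one option is to read the credentials from environment variables. The sketch below is only an illustration: the helper `credentials_from_env` and the variable names `WEKEO_USER` and `WEKEO_PASSWORD` are hypothetical and not part of `hda`.

```python
import os


def credentials_from_env(user_var="WEKEO_USER", password_var="WEKEO_PASSWORD"):
    """Return (user, password) read from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    user = os.environ.get(user_var)
    password = os.environ.get(password_var)
    if not user or not password:
        raise RuntimeError(
            f"Set both {user_var} and {password_var} before creating the client."
        )
    return user, password


# Usage with the classes imported in Step 2 (commented out so that the
# snippet also runs when the variables are not set):
# user, password = credentials_from_env()
# conf = Configuration(user=user, password=password)
# hda_client = Client(config=conf)
```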

### Method 2 (regular users)

If you have not yet created your `.hdarc` file to enable **automatic login**, run this cell (otherwise skip it):

In [2]:
from pathlib import Path

hdarc = Path.home() / '.hdarc'
if not hdarc.is_file():
    import getpass
    USERNAME = input('Enter your username: ')
    PASSWORD = getpass.getpass('Enter your password: ')

    # Write the broker URL and your credentials to ~/.hdarc
    with open(hdarc, 'w') as f:
        f.write('url: https://wekeo-broker.apps.mercator.dpi.wekeo.eu/databroker\n')
        f.write(f'user: {USERNAME}\n')
        f.write(f'password: {PASSWORD}\n')
else:
    print('Configuration file already exists.')

# Called without arguments, the client reads its settings from ~/.hdarc
hda_client = Client()

Configuration file already exists.
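If the file already exists but login fails, it can help to confirm that it contains the three entries written by the cell above. The following is a minimal sketch with hypothetical helpers (`read_hdarc`, `missing_hdarc_keys`); it is a simple illustrative parser, not the one `hda` itself uses.

```python
from pathlib import Path


def read_hdarc(path):
    """Parse simple 'key: value' lines into a dict (illustrative parser only)."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            # Split on the first colon only, so URLs stay intact
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip()
    return settings


def missing_hdarc_keys(path, required=("url", "user", "password")):
    """Return the required keys that are absent or empty in the file."""
    settings = read_hdarc(path)
    return [key for key in required if not settings.get(key)]


# Example (commented out because it needs an existing file):
# print(missing_hdarc_keys(Path.home() / ".hdarc"))
```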


## Step 4. Create the request and download data

### Get the dataset metadata

Here we are going to download data from the following Sentinel-1 dataset: __EO:ESA:DAT:SENTINEL-1:SAR__.

To create our request, we can ask the API which parameters are needed.
To do so, we use the `metadata()` function:

In [5]:
help(hda_client.metadata)

Help on method metadata in module hda.api:

metadata(dataset_id) method of hda.api.Client instance
    Returns the metadata object for the given dataset.
    
    :param dataset_id: The dataset ID
    :type dataset_id: str



In [4]:
# Request metadata of a dataset
hda_client.metadata(dataset_id="EO:ESA:DAT:SENTINEL-1:SAR")

{'datasetId': 'EO:ESA:DAT:SENTINEL-1:SAR',
 'parameters': {'boundingBoxes': [{'comment': 'Bounding Box',
    'details': {'crs': 'EPSG:4326', 'extent': []},
    'isRequired': False,
    'label': 'Bounding Box',
    'name': 'bbox'}],
  'dateRangeSelects': [{'comment': 'Sensing Start / Stop Time',
    'details': {'defaultEnd': None,
     'defaultStart': '2014-10-06T00:00:00Z',
     'end': None,
     'start': '2014-10-06T00:00:00Z'},
    'isRequired': True,
    'label': 'Sensing Start / Stop Time',
    'name': 'position'}],
  'multiStringSelects': None,
  'stringChoices': [{'comment': 'swath',
    'details': {'valuesLabels': {'EN': 'EN',
      'EW': 'EW',
      'EW1': 'EW1',
      'EW2': 'EW2',
      'EW3': 'EW3',
      'EW4': 'EW4',
      'EW5': 'EW5',
      'IS1': 'IS1',
      'IS2': 'IS2',
      'IS3': 'IS3',
      'IS4': 'IS4',
      'IS5': 'IS5',
      'IS6': 'IS6',
      'IS7': 'IS7',
      'IW': 'IW',
      'IW1': 'IW1',
      'IW2': 'IW2',
      'IW3': 'IW3',
      'N1': 'N1',
    

In [3]:
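# Areas of interest as [longitude_min, latitude_min, longitude_max, latitude_max]
area_1 = [-53.674881, -5.795006, -53.342956, -5.564997]
area_2 = [-53.689, -6.290, -53.325, -6.037]
area_3 = [-51.806314, -5.794394, -51.541333, -5.523872]

# Time range of the search (ISO 8601, UTC)
date_min = "2021-10-06T00:00:00.000Z"
date_max = "2021-11-07T00:00:00.000Z"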
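The metadata returned earlier marks each parameter with an `isRequired` flag. A small helper can list the mandatory ones before you build the request. This is a sketch that assumes the dictionary layout shown in the output above; `required_parameters` is a hypothetical helper, not an `hda` function.

```python
def required_parameters(metadata):
    """List the names of parameters flagged with 'isRequired': True."""
    names = []
    for group in (metadata.get("parameters") or {}).values():
        # Groups such as 'multiStringSelects' can be None in the response
        for param in group or []:
            if isinstance(param, dict) and param.get("isRequired"):
                names.append(param.get("name"))
    return names


# Reduced copy of the structure printed above, for illustration only
sample = {
    "datasetId": "EO:ESA:DAT:SENTINEL-1:SAR",
    "parameters": {
        "boundingBoxes": [{"name": "bbox", "isRequired": False}],
        "dateRangeSelects": [{"name": "position", "isRequired": True}],
        "multiStringSelects": None,
    },
}
print(required_parameters(sample))
```

With the live client, the same helper could be called as `required_parameters(hda_client.metadata(dataset_id="EO:ESA:DAT:SENTINEL-1:SAR"))`.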

## Create the request

Based on this information we can create the request below.

<div class="alert alert-block alert-info">
    📌 <b>Note</b>: to learn how to get your query from the Data Viewer, please check <a href="https://help.wekeo.eu/en/articles/6416936-how-to-download-wekeo-data#h_85849dcd7a">this article</a>.
</div>

In [5]:
query = {
  "datasetId": "EO:ESA:DAT:SENTINEL-1:SAR",
  "boundingBoxValues": [
    {
      "name": "bbox",
      "bbox": area_1
    }
  ],
  "dateRangeSelectValues": [
    {
      "name": "position",
      "start": date_min,
      "end": date_max
    }
  ],
  "stringChoiceValues": [
    {
      "name": "productType",
      "value": "SLC"
    }
  ]
}
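Three areas were defined earlier, but the request uses only one of them. To run the same search over several areas, the request can be generated by a small function. This is a sketch: `build_query` is a hypothetical helper that simply reproduces the dictionary layout of the request above.

```python
def build_query(bbox, start, end, product_type="SLC",
                dataset_id="EO:ESA:DAT:SENTINEL-1:SAR"):
    """Return a request dict with the same layout as the query above."""
    return {
        "datasetId": dataset_id,
        "boundingBoxValues": [{"name": "bbox", "bbox": list(bbox)}],
        "dateRangeSelectValues": [{"name": "position", "start": start, "end": end}],
        "stringChoiceValues": [{"name": "productType", "value": product_type}],
    }


# Same coordinates and dates as in the cells above
areas = {
    "area_1": [-53.674881, -5.795006, -53.342956, -5.564997],
    "area_2": [-53.689, -6.290, -53.325, -6.037],
    "area_3": [-51.806314, -5.794394, -51.541333, -5.523872],
}
queries = {
    name: build_query(bbox, "2021-10-06T00:00:00.000Z", "2021-11-07T00:00:00.000Z")
    for name, bbox in areas.items()
}
```

Each entry of `queries` can then be passed to `hda_client.search()` in turn.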

<div class="alert alert-block alert-info">
    📌 <b>Note</b>: the geographical coordinates in the <code>bbox</code> are ordered as: <code>[longitude_min, latitude_min, longitude_max, latitude_max]</code>
</div>
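Because a reversed or out-of-range coordinate is easy to miss, a quick check before sending the request may save a round trip to the server. The sketch below assumes coordinates in degrees (EPSG:4326, as reported in the metadata) and does not handle boxes that cross the antimeridian; `validate_bbox` is a hypothetical helper.

```python
def validate_bbox(bbox):
    """Check a [lon_min, lat_min, lon_max, lat_max] box given in degrees."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    lon_min, lat_min, lon_max, lat_max = bbox
    if not (-180 <= lon_min <= 180 and -180 <= lon_max <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= lat_min <= 90 and -90 <= lat_max <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if lon_min >= lon_max or lat_min >= lat_max:
        raise ValueError("minimum values must be smaller than maximum values")
    return list(bbox)


# Passes silently for the first area used in this notebook
validate_bbox([-53.674881, -5.795006, -53.342956, -5.564997])
```

Note that this cannot detect every swap of longitude and latitude, since many values are valid for both.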

## Search data

The `search()` function submits the query and returns the matching products. It may take some time while the server processes the request.

In [6]:
matches = hda_client.search(query)
print(matches)

SearchResults[items=3,volume=0,jobId=Fye3c2LYFULWzyY26h2ltAwSPTY]


In [8]:
print(matches[0].results)

[{'downloadUri': None, 'extraInformation': {'cloudCover': 0, 'footprint': {'coordinates': [[[[-50.86348, -6.379965], [-50.511562, -4.750779], [-52.761547, -4.253418], [-53.120296, -5.876746], [-50.86348, -6.379965]]]], 'type': 'MultiPolygon'}}, 'filename': 'S1A_IW_SLC__1SDV_20211102T090717_20211102T090744_040390_04C99F_964D.zip', 'order': None, 'productInfo': {'datasetId': 'EO:ESA:DAT:SENTINEL-1:SAR', 'product': 'S1A_IW_SLC__1SDV_20211102T090717_20211102T090744_040390_04C99F_964D.SAFE', 'productEndDate': '2021-11-02T09:07:44Z', 'productStartDate': '2021-11-02T09:07:17Z'}, 'size': 0, 'url': 'fe25788f-bf2e-5f4d-bd98-367c736975ca/S1A_IW_SLC__1SDV_20211102T090717_20211102T090744_040390_04C99F_964D.zip'}]


We can see that the search returned **3 items**. The reported **volume is 0** because the results list a `size` of 0 for these products.
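The raw result dictionaries are hard to read. A short helper can pull out the file name and sensing times of each product. This is a sketch that assumes the result layout printed above; `summarise_results` is a hypothetical helper.

```python
def summarise_results(results):
    """Return (filename, start, end) tuples from a list of result dicts."""
    rows = []
    for item in results:
        info = item.get("productInfo") or {}
        rows.append((
            item.get("filename"),
            info.get("productStartDate"),
            info.get("productEndDate"),
        ))
    return rows


# Reduced copy of the first result printed above
example = [{
    "filename": "S1A_IW_SLC__1SDV_20211102T090717_20211102T090744_040390_04C99F_964D.zip",
    "productInfo": {
        "productStartDate": "2021-11-02T09:07:17Z",
        "productEndDate": "2021-11-02T09:07:44Z",
    },
    "size": 0,
}]
for row in summarise_results(example):
    print(row)
```

Assuming the full search object exposes `.results` in the same way as `matches[0]` does above, `summarise_results(matches.results)` would list every match.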

## Download file(s)

On WEkEO's JupyterHub you are limited to 20 GB of storage space, so keep an eye on the total size of the files your request generates.  
Follow one of these two options to download your files:  
- __Option 1__: in your current working directory __for data <20 GB__
- __Option 2__: in an S3 bucket __without data size limit__

<div class="alert alert-block alert-info">
    📌 <b>Note</b>: you need to <a href="https://help.wekeo.eu/en/articles/6344723-registration-and-offer-plans">upgrade your plan</a> to have a tenant where you can <a href="https://help.wekeo.eu/en/articles/6618276-how-to-create-and-access-s3-buckets-on-wekeo">create S3 buckets</a>. 
</div>

### Option 1

You can run `matches.download()` to download all the files of your request.  
Please [read the documentation](https://hda.readthedocs.io/en/latest/usage.html#advanced-client-usage) for advanced usage such as:
- downloading first result: `matches[0].download()`
- downloading last result: `matches[-1].download()`
- downloading first 10 results: `matches[:10].download()`
- downloading even results: `matches[::2].download()`
- etc.

For the purpose of this example, we are going to fetch the first result:

In [9]:
# Change this to an existing directory on your own machine
OUTPUT_PATH = '/media/jp/FreeAgent GoFlex Drive/SAR'
matches[0].download(OUTPUT_PATH)

                                                       

The `download()` function downloads the file(s) your request generated. They are saved to the same folder as this notebook unless you specify an existing directory as `OUTPUT_PATH`.
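Since the target directory has to exist and storage is limited, it can be useful to create the folder and check the free space before downloading. The sketch below uses only the standard library; `prepare_output_dir` and its `needed_bytes` argument are hypothetical names.

```python
import shutil
from pathlib import Path


def prepare_output_dir(path, needed_bytes=0):
    """Create `path` if needed and check that enough free space is available."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    free = shutil.disk_usage(out).free
    if free < needed_bytes:
        raise OSError(f"Only {free} bytes free in {out}, {needed_bytes} needed")
    return out


# Example (commented out because it needs the search results from above):
# target = prepare_output_dir(OUTPUT_PATH, needed_bytes=2 * 1024**3)
# matches[0].download(str(target))
```

When the search reports a non-zero volume, that figure is a natural choice for `needed_bytes`.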

### Option 2

To save your files in an S3 bucket, first download them to a temporary folder:

In [8]:
# Create your temporary folder
import pathlib
OUTPUT_DIR = "/tmp/folder_for_bucket"
pathlib.Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

In [9]:
# Download all files in this new folder
matches.download(OUTPUT_DIR)

2022-12-22 11:36:42,470 INFO Downloading https://wekeo-broker.apps.mercator.dpi.wekeo.eu/databroker/dataorder/download/H1lYFxiGzXjvGNXbBpW2Q0Ls7xo to VPP_2020_S2_T29TMH-010m_V101_s1_TPROD.tif (27.1M)
2022-12-22 11:36:46,822 INFO Download rate 6.2M/s   
2022-12-22 11:36:53,927 INFO Downloading https://wekeo-broker.apps.mercator.dpi.wekeo.eu/databroker/dataorder/download/jX-BJKThm5nsJ4t1JBHGD5fxXH8 to VPP_2020_S2_T29TMJ-010m_V101_s1_TPROD.tif (927.6K)
2022-12-22 11:36:57,661 INFO Download rate 248.5K/s
2022-12-22 11:37:09,150 INFO Downloading https://wekeo-broker.apps.mercator.dpi.wekeo.eu/databroker/dataorder/download/zZk3TkCH3n-Bmwz1xrCnRya7cjY to VPP_2020_S2_T29TNH-010m_V101_s1_TPROD.tif (194.8M)
2022-12-22 11:37:16,993 INFO Download rate 24.8M/s 
2022-12-22 11:37:23,730 INFO Downloading https://wekeo-broker.apps.mercator.dpi.wekeo.eu/databroker/dataorder/download/csP3IS2QiuF-m3DJPbSPFh9ib_s to VPP_2020_S2_T29TNJ-010m_V101_s1_TPROD.tif (49.1M)
2022-12-22 11:37:30,580 INFO Download rat

Once the files are downloaded to the temporary folder, change the following parameters to match your own bucket.  
To find this information, go to [Infrastructure > Storage](https://morpheus.dpi.wekeo.eu/infrastructure/storage/buckets) and click on the bucket of your choice (or add one if you don't have any yet).

<div style='text-align:center;'>
<figure><img src="https://i.imgur.com/YyKyNhx.png">
</figure>
</div>

<div class="alert alert-block alert-info">
    📌 <b>Note</b>: you will find the secret key in the reply from WEkEO User Support.
</div>

In [10]:
import getpass

# Change these parameters for your own bucket
bucketname = "firstbucket"
aws_access_key_id = input('Enter your access key id: ')
aws_secret_access_key = getpass.getpass('Enter your secret access key: ')
endpoint_url = input('Enter your endpoint URL: ')

# Name of the output folder in your bucket
bucket_folder = "My_data"

Finally, run this cell to write all the downloaded files from the temporary folder to your bucket: 

In [12]:
# Import modules
import os
import boto3

# Open boto3 connection
session=boto3.session.Session()

# Connect to your bucket
s3_client = session.client(
    service_name='s3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    endpoint_url=endpoint_url,
)

# Save files into your bucket folder (S3 keys always use '/' as separator)
for root, dirs, files in os.walk(OUTPUT_DIR):
    for file in files:
        s3_client.upload_file(os.path.join(root, file), bucketname, f"{bucket_folder}/{file}")
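The loop above places every file directly under the bucket folder, so files with the same name in different sub-folders would overwrite each other. If you want to keep the sub-folder structure, the object keys can be derived from each file's path relative to the temporary folder. This is a sketch; `iter_upload_pairs` is a hypothetical helper.

```python
import os
from pathlib import Path


def iter_upload_pairs(local_dir, prefix):
    """Yield (local_path, s3_key) pairs, keeping the sub-folder structure."""
    base = Path(local_dir)
    for root, _dirs, files in os.walk(base):
        for name in sorted(files):
            local_path = Path(root) / name
            # as_posix() gives '/' separators on every operating system
            relative = local_path.relative_to(base).as_posix()
            yield str(local_path), f"{prefix.rstrip('/')}/{relative}"


# Example (commented out because it needs the S3 client created above):
# for local_path, key in iter_upload_pairs(OUTPUT_DIR, bucket_folder):
#     s3_client.upload_file(local_path, bucketname, key)
```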

## Additional Information
---

#### Compatible Data Science Toolkits

In [13]:
from importlib.metadata import version; version("hda")

'1.15'

#### Last Modified and Tested

In [14]:
from datetime import date; print(date.today())

2023-04-12


<img src='https://github.com/wekeo/ai4EM_MOOC/raw/04147f290cfdcce341f819eab7ad037b95f25600/img/ai4eo_logos.jpg' alt='Logo EU Copernicus WEkEO' align='center' width='100%'></img>