# Archive Footprint File Search & Download with Capella API
This notebook uses the **Archive Export** endpoints of the Capella API to list the footprint files available for the Capella archive, which are organized week by week, and to download footprints for the entire archive or for specific portions selected by footprint file name.  
The documentation for these endpoints can be found in the **Catalog** section under the **Archive Export** heading: https://docs.capellaspace.com/api/catalog

The cells that require the user to input parameters prior to running are marked with `USER INPUT REQUIRED` in the header. Searching for `USER` in the notebook will also identify where the user must enter inputs.

* **Author:** [Sybrand van Beijma](mailto:sybrand.vanbeijma@capellaspace.com)
    * **Contributor(s):** [Hayley Pippin](mailto:hayley.pippin@capellaspace.com)
* **Last updated:** September 6, 2023
* **Required input(s):**
    * `credentials.json`: JSON containing the user's Capella Console credentials.
* **Output(s):**
    * `.gpkg`, `geoparquet`, or zipped `.shp` files of the requested archive footprints.

## Set Up

### `credentials.json`
Your username and password must be saved in a `.json` file named `credentials.json` and formatted as follows:
```
{"username": "yourusername","password": "xxxxxxxxx"}
```
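A missing or misspelled key in this file only surfaces later as a `KeyError` in the authentication cell. The following is a minimal sketch of an early check; `check_credentials` and `load_credentials` are hypothetical helper names, not part of the Capella API or of this notebook.

```python
import json

def check_credentials(data):
    """Return (username, password), or raise if either key is missing or empty."""
    missing = [key for key in ("username", "password") if not data.get(key)]
    if missing:
        raise ValueError("credentials file is missing: " + ", ".join(missing))
    return data["username"], data["password"]

def load_credentials(path):
    """Read a credentials.json file and validate its contents."""
    with open(path) as f:
        return check_credentials(json.load(f))
```

Calling `load_credentials('credentials.json')` would then return the same two values that the authentication cell reads by hand.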

### Install packages
Run the following cell **once** if the packages are not already installed. Uncomment the line for each package that is missing.

In [None]:
# !pip install requests
# !pip install pandas
# !pip install folium
# json, re, datetime, and urllib ship with Python and do not need to be installed.

### Import packages and define helper functions + API endpoints

In [None]:
import requests
import json
import pandas as pd
import re
from datetime import datetime as dt

# Function to pretty-print JSON-serializable data so it is easier to read
def p(data):
    print(json.dumps(data, indent=2))
    
# Function to sort file names in alphanumeric order
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)
        
# Capella API endpoints
URL = 'https://api.capellaspace.com'
token = '/token'
collections = '/catalog/collections'
catsearch = '/catalog/search'
orders = '/orders/'
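To see why an alphanumeric sort is used instead of plain `sorted`, the sketch below compares the two on a few made-up file names. `natural_key` mirrors the logic of `sorted_alphanumeric`; the names in `names` are hypothetical and do not reflect the real archive naming scheme.

```python
import re

def natural_key(name):
    # Split into digit and non-digit runs so digit runs compare as integers
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"([0-9]+)", name)]

names = ["week_10.gpkg", "week_2.gpkg", "week_1.gpkg"]  # hypothetical names

plain = sorted(names)                      # character-by-character comparison
natural = sorted(names, key=natural_key)   # numeric runs compared by value

print(plain)    # ['week_1.gpkg', 'week_10.gpkg', 'week_2.gpkg']
print(natural)  # ['week_1.gpkg', 'week_2.gpkg', 'week_10.gpkg']
```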

### Authentication (INITIAL USER INPUT REQUIRED)
Re-run this cell about once an hour to re-authenticate with the Capella API.

In [None]:
# Load username and password
with open('PATH TO CREDENTIALS FILE HERE') as f: # USER: Input path to credentials.json file.
    data = json.load(f)
    username = data['username']
    password = data['password']

# Get a valid token from the auth service
r = requests.post(URL + token, 
                  headers = {'Content-Type': 'application/x-www-form-urlencoded'}, auth=(username,password))
r.raise_for_status() # Stop with a clear HTTP error if authentication fails
access_token = r.json()["accessToken"]
# p(access_token)

# GET user ID and org ID
headers = {'Authorization':'Bearer ' + access_token}
r = requests.get(URL + '/user', headers=headers)
r.raise_for_status()
user = r.json()
#p(user)

# Print user email, organization, and current environment
print('User email: ', user['email'], '\nOrganization: ', user['organization']['name'], '\nEnvironment: ', user['apiEnvironmentRole'])
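Because the cell has to be re-run about hourly, a long session could wrap the token request in a small cache that refreshes itself. This is a sketch only: `TokenCache` is a hypothetical helper, and the 3600-second lifetime is an assumption taken from the hourly guidance above, not a documented value.

```python
import time

class TokenCache:
    """Reuse a token until it nears an assumed expiry, then fetch a new one."""

    def __init__(self, fetch_token, lifetime_s=3600, margin_s=60, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable that returns a fresh token string
        self._lifetime_s = lifetime_s    # ASSUMPTION: one hour, per the note above
        self._margin_s = margin_s        # refresh slightly early to avoid races
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self._lifetime_s - self._margin_s)
        if expired:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token

    def headers(self):
        # Same header layout the notebook builds by hand
        return {"Authorization": "Bearer " + self.get()}
```

In this notebook, `fetch_token` would be a function that performs the `POST` to `/token` and returns `r.json()["accessToken"]`; later cells could then pass `cache.headers()` instead of the fixed `headers` dictionary.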

## Archive File Search

### Inspect Available Files

In [None]:
catalog_available = '/catalog/archive-export/available'

r = requests.get(URL + catalog_available, headers=headers)

# Print the list of results. OPTIONS:

# p(r.json()) # View ALL results

# p(r.json()["latest"]) # View LATEST results

# sorted_alphanumeric(r.json()["weekly"]) # View WEEKLY results

# View EARLIEST and MOST RECENT available week results
# print("Earliest available week: ", dt.strptime(re.split(r'/|_|\.' , sorted_alphanumeric(r.json()["weekly"])[0])[-2], "%Y%m%d"), "\nLatest available week:", dt.strptime(re.split(r'/|_|\.' , sorted_alphanumeric(r.json()["weekly"])[-2])[-2], "%Y%m%d")) 

p(r.json()["full"]) # View FULL results
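Before printing a long list, it can help to see how many entries each export type contains. The sketch below counts entries under the three keys used in this cell. `summarize_available` is a hypothetical helper, and it assumes each value is a list of file names (anything else is counted as a single entry).

```python
def summarize_available(available):
    """Return {export type: number of entries} for the keys read in this notebook."""
    summary = {}
    for key in ("latest", "weekly", "full"):
        value = available.get(key, [])
        # ASSUMPTION: values are lists of file names; wrap anything else
        if not isinstance(value, list):
            value = [value]
        summary[key] = len(value)
    return summary

# Made-up response for illustration only
example = {"latest": ["a.gpkg"], "weekly": ["w1.gpkg", "w2.gpkg"], "full": "f.gpkg"}
print(summarize_available(example))
```

With a live response, the call would be `summarize_available(r.json())`.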

### Generate List of Files to Download

In [None]:
# Define list of weekly archive sets
weekly = sorted_alphanumeric(r.json()["weekly"])

# Get files of particular type 
weekly_gpkg = [s for s in weekly if "gpkg" in s] # Weekly .gpkg files
# weekly_shp = [s for s in weekly if "shp" in s] # Weekly .shp.zip files

# Print last 4 weeks of files in list
weekly_gpkg[-4:]
# weekly_shp[-4:]
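The substring test above (`"gpkg" in s`) also matches names that merely contain the text somewhere in the path. A stricter variant, sketched below, groups names by their ending. `group_by_format` is a hypothetical helper, and the assumption that every file name ends in `.gpkg`, `.shp.zip`, or `.geoparquet` should be checked against the actual list.

```python
def group_by_format(file_names, formats=("gpkg", "shp.zip", "geoparquet")):
    """Group file names by extension; names matching no format are skipped."""
    groups = {fmt: [] for fmt in formats}
    for name in file_names:
        for fmt in formats:
            # ASSUMPTION: the export format appears as the file extension
            if name.lower().endswith("." + fmt):
                groups[fmt].append(name)
                break
    return groups

# Hypothetical names for illustration only
sample = ["weekly/a_20230828.gpkg", "weekly/a_20230828.shp.zip", "weekly/gpkg_notes.txt"]
print(group_by_format(sample))
```

In the notebook, `group_by_format(weekly)["gpkg"][-4:]` would play the role of `weekly_gpkg[-4:]`.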

## Download Archive Files

### Subset Based on Specific Parameters (USER INPUT REQUIRED)

In [None]:
catalog_presigned = '/catalog/archive-export/presigned'

# USER: Define parameters for export. exportType is required.
params = {'exportType': 'weekly', # Options: latest, weekly, full
          #'exportFormat': 'gpkg', # Options: gpkg, shp.zip, geoparquet. fileNames should be EMPTY or COMMENTED OUT if this option is selected.
          'fileNames': weekly_gpkg[-4:] # Specify list of filenames. exportFormat should be EMPTY or COMMENTED OUT if this option is selected.
         }

r = requests.post(URL + catalog_presigned, headers=headers, json=params)

# Print response
p(r.json())
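The comments in `params` describe two rules: `exportType` is required, and `exportFormat` and `fileNames` should not be set together. A small builder can enforce those rules before the request is sent. `build_export_params` is a hypothetical helper; the allowed values are copied from the comments in this cell, and no other API behavior is assumed.

```python
EXPORT_TYPES = ("latest", "weekly", "full")
EXPORT_FORMATS = ("gpkg", "shp.zip", "geoparquet")

def build_export_params(export_type, export_format=None, file_names=None):
    """Build the request body, rejecting combinations the cell comments rule out."""
    if export_type not in EXPORT_TYPES:
        raise ValueError("exportType must be one of " + ", ".join(EXPORT_TYPES))
    if export_format and file_names:
        raise ValueError("set exportFormat or fileNames, not both")
    params = {"exportType": export_type}
    if export_format:
        if export_format not in EXPORT_FORMATS:
            raise ValueError("exportFormat must be one of " + ", ".join(EXPORT_FORMATS))
        params["exportFormat"] = export_format
    if file_names:
        params["fileNames"] = list(file_names)
    return params
```

For example, `build_export_params('weekly', file_names=weekly_gpkg[-4:])` produces the same dictionary as the cell above.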

### Full Archive Files

In [None]:
catalog_presigned = '/catalog/archive-export/presigned'

params = {'exportType': 'full',
          'exportFormat': 'gpkg' # Options: gpkg, shp.zip, geoparquet       
         }

r = requests.post(URL + catalog_presigned, headers=headers, json=params)

# Print response
p(r.json())
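The cells above stop at printing the response, so the files still have to be fetched. The sketch below streams one URL to disk with the standard library (rather than `requests`, to keep it self-contained). The layout of the presigned response is not shown in this notebook, so extracting the URLs from `r.json()` is left to the reader; `filename_from_url` and `download` are hypothetical helpers.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse, unquote

def filename_from_url(url):
    """Return the last path segment of a URL, ignoring the query string."""
    return os.path.basename(unquote(urlparse(url).path))

def download(url, out_dir="."):
    """Stream the content at url into out_dir and return the local path."""
    os.makedirs(out_dir, exist_ok=True)
    target = os.path.join(out_dir, filename_from_url(url))
    # Copy in chunks so large archive files are not held in memory
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Given a list `urls` taken from the response, `[download(u, 'footprints') for u in urls]` would save each file under a `footprints` directory. Presigned URLs usually carry their authorization in the query string, so no `Authorization` header is added here; that is an assumption to verify against the API documentation.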