# sat-stac: Working with catalogs

This notebook contains examples of using satstac to work with STAC catalogs.

**Table of contents**

- Working with existing catalogs
- Creating new catalogs
    - Adding catalogs to catalogs
    - Adding collections to catalogs
    - Adding items to collections
- Views (sub-catalogs)
- Publishing catalogs


The examples here use the [test catalog in the sat-stac repo](https://github.com/developmentseed/sat-stac/tree/master/test/catalog). The directory structure of the test catalog is shown below. The catalog.json files under the landsat-8-l1 and sentinel-2-l1c directories are `Collection`s, the remaining catalog.json files are plain `Catalog`s, and the item.json files are `Item`s.

**Test Catalog structure:**

```
catalog
├── catalog.json
└── eo
    ├── catalog.json
    ├── landsat-8-l1
    │   ├── catalog.json
    │   └── item.json
    └── sentinel-2-l1c
        ├── catalog.json
        └── sentinel-2a
            ├── catalog.json
            └── item.json
```


## Working with existing catalogs<a name='existing' />

Existing catalogs can be opened and traversed. Note that a Collection is also a type of Catalog. First, open up a catalog, in this case a root catalog.

In [1]:
import pprint

from satstac import Catalog

# open a root catalog
cat = Catalog.open('test/catalog/catalog.json')

# create pretty printer for later use
pp = pprint.PrettyPrinter(indent=4)

The entire STAC tree can be traversed by following links. STAC links have a "rel" type, and several of the rel types are used to point to other members of the catalog:

- **root**: All Catalogs, Collections and Items have a root link, even a root catalog (to itself).
- **parent**: All Catalogs (and Collections) other than a root catalog, and all Items, have a parent link to their parent catalog.
- **child**: Catalogs (and Collections) may have child links that point to other Catalogs.
- **collection**: Items have a link pointing to their Collection.
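
To make the link structure concrete, here is a minimal sketch that uses a plain Python dictionary rather than sat-stac itself. The catalog record and the `hrefs_by_rel` helper are hypothetical and written only for illustration; the `rel` and `href` keys follow the link layout described above.

```python
# A hand-written, hypothetical catalog record (not read from the test catalog)
catalog = {
    "id": "stac-eo",
    "links": [
        {"rel": "self", "href": "catalog.json"},
        {"rel": "root", "href": "../catalog.json"},
        {"rel": "parent", "href": "../catalog.json"},
        {"rel": "child", "href": "landsat-8-l1/catalog.json"},
        {"rel": "child", "href": "sentinel-2-l1c/catalog.json"},
    ],
}


def hrefs_by_rel(record, rel):
    """Return the hrefs of all links in `record` with the given rel type."""
    return [link["href"] for link in record.get("links", []) if link.get("rel") == rel]


print(hrefs_by_rel(catalog, "child"))
print(hrefs_by_rel(catalog, "parent"))
```

Filtering by `rel` like this only gives back hrefs; each one would still have to be opened and parsed to reach the linked object.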

sat-stac makes traversing a Catalog easy by not just following the links, but instantiating them as objects of the proper type.

In [2]:
# get first (and only in this case) sub-catalog
subcat = [c for c in cat.children()][0]

# print some IDs
print("Root Catalog: ", cat.id)
print("Sub Catalog: ", subcat.id)
print("Sub Catalog parent: ", subcat.parent().id)

# iterate through child catalogs of the sub-catalog
print("Sub Catalog children:")
for child in subcat.children():
    print('    ', child.id)

Root Catalog:  stac
Sub Catalog:  stac-eo
Sub Catalog parent:  stac
Sub Catalog children:
     sentinel-2-l1c
     landsat-8-l1


Being able to traverse a STAC catalog link by link is useful, but following every link in order to find something is tedious. More useful are sat-stac's generator functions, which recursively get all the Catalogs, Collections, or Items within a Catalog.

In [3]:
all_catalogs = cat.catalogs()
all_collections = cat.collections()
all_items = cat.items()

print(all_catalogs)
print(all_collections)
print(all_items)

<generator object Catalog.catalogs at 0x7f119a70bf68>
<generator object Catalog.collections at 0x7f119a70bfc0>
<generator object Catalog.items at 0x7f119a6b6048>


Because these are generator functions, they do not return a list of objects but a generator that can be iterated through with a for loop. Using generator functions is essential because a Catalog could contain a very large number of Items, and a generator produces them one at a time instead of building the whole list first.
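
The lazy, recursive behavior can be sketched without sat-stac. The nested dictionary and the `descendants` function below are hypothetical stand-ins for catalogs on disk, and the traversal order is that of this sketch, not necessarily the order sat-stac uses.

```python
from itertools import islice

# Hypothetical in-memory tree standing in for catalogs on disk
tree = {
    "id": "stac",
    "children": [
        {
            "id": "stac-eo",
            "children": [
                {"id": "landsat-8-l1", "children": []},
                {"id": "sentinel-2-l1c", "children": [{"id": "sentinel-2a", "children": []}]},
            ],
        }
    ],
}


def descendants(node):
    """Lazily yield the id of every catalog below `node`, depth first."""
    for child in node["children"]:
        yield child["id"]
        yield from descendants(child)


# Nothing is traversed until the generator is consumed
gen = descendants(tree)
first_two = list(islice(gen, 2))
print(first_two)

# Consuming the rest picks up where islice stopped
remaining = list(gen)
print(remaining)
```

Taking only the first two results visits only the first two nodes, which is the property that matters when a Catalog holds many Items.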

In [4]:
print('**Catalogs**')
for c in cat.catalogs():
    print(c.id)

print('\n**Collections**')
for c in cat.collections():
    print(c.id)
    
print('\n**Items**')
for i in cat.items():
    print(i.id)

**Catalogs**
sentinel-2-l1c
sentinel-2a
landsat-8-l1
stac-eo

**Collections**
sentinel-2-l1c
landsat-8-l1

**Items**
L1C_T53MNQ_A017245_20181011T011722
LC08_L1GT_120046_20181012_20181012_01_RT


## Creating new catalogs<a name='create' />

New catalogs can be created from a STAC Catalog JSON (as a Python dictionary), or by using the `create()` function. The STAC version field (`stac_version`) is inserted into the Catalog automatically, based on the version of sat-stac.

STAC hierarchy links for the catalog (self, root, parent, child, collection, item) should not be provided (they will just be removed), since they will be created automatically as the catalog is created and sub-catalogs, collections, or items are added.
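
As a rough illustration of what dropping hierarchy links means, the sketch below filters them out of a plain dictionary. The `strip_hierarchy_links` helper, the `links` entries, and the example.com URL are all hypothetical; this is not sat-stac's own code, only the behavior the paragraph above describes.

```python
# The rel types listed above as hierarchy links
HIERARCHY_RELS = {"self", "root", "parent", "child", "collection", "item"}


def strip_hierarchy_links(record):
    """Return a copy of `record` without links whose rel is a hierarchy type."""
    cleaned = dict(record)
    cleaned["links"] = [
        link for link in record.get("links", []) if link.get("rel") not in HIERARCHY_RELS
    ]
    return cleaned


# Hypothetical input that still carries links from an earlier location
cat_json = {
    "id": "mycat",
    "description": "My shiny new STAC catalog",
    "links": [
        {"rel": "self", "href": "catalog.json"},
        {"rel": "child", "href": "old-child/catalog.json"},
        {"rel": "license", "href": "https://example.com/license"},
    ],
}

print(strip_hierarchy_links(cat_json)["links"])
```

Only the non-hierarchy link survives, and the input dictionary is left unmodified.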

The `root` keyword specifies this as the root catalog (there must be a root catalog).

In [5]:
# create a Catalog object with JSON
cat_json = {
    "id": "mycat",
    "description": "My shiny new STAC catalog"
}
mycat = Catalog(cat_json)

# or, use the create() function
desc = 'My shiny new STAC catalog'
mycat = Catalog.create(id='mycat', description=desc, root='https://my.cat')

print(mycat.id)

mycat


When a new Catalog is created like this, it does not yet exist on disk. Since a hierarchical catalog doesn't make much sense unless it's stored somewhere, the Catalog needs to be saved before other Catalogs, Collections, or Items are added to it.

The `save_as()` function can be used to save it as a new file.

In [6]:
# save as a root catalog
mycat.save_as('mycat/catalog.json')

print(mycat.id)
# the filename is then stored with the object
print(mycat.filename)
print(mycat.path)

mycat
mycat/catalog.json
mycat


### Adding catalogs to catalogs<a name='addcat' />

Once a Catalog has been saved, other Catalogs can be added to it. They are saved automatically, and all the necessary links are created in every affected object.

In [7]:
# add a new catalog to a root catalog
cat_json = {
    "id": "mykitten",
    "description": "A child catalog of my shiny new STAC catalog"
}

kitten = Catalog(cat_json)
print('Child catalog filename before adding: ', kitten.filename)

mycat.add_catalog(kitten)
print('Child catalog filename after adding: ', kitten.filename)

Child catalog filename before adding:  None
Child catalog filename after adding:  mycat/mykitten/catalog.json


A Catalog can have any number of child catalogs, and those can also have any number of child catalogs, and so on. This allows a data provider to partition the data in any way they wish. Data could be broken down into catalogs by type, country, source, etc.

### Adding collections to catalogs<a name='addcol' />

Since a Collection is a Catalog, it can be added the same way. A Collection has additional fields beyond those of a normal Catalog that define the set of Items that fall under it. See the [Landsat example collection](https://github.com/developmentseed/sat-stac/blob/master/test/catalog/landsat-8-l1/catalog.json).

In [8]:
from satstac import Collection

# open the Landsat collection
collection = Collection.open('test/catalog/eo/landsat-8-l1/catalog.json')
print('Collection name: ', collection.id)

# add it to the child catalog created above
kitten.add_catalog(collection)
print('Collection filename: ', collection.filename)

print('\n**Collection links**')
pp.pprint(collection.data['links'])

Collection name:  landsat-8-l1
Collection filename:  mycat/mykitten/landsat-8-l1/catalog.json

**Collection links**
[   {   'href': 'https://my.cat/mykitten/landsat-8-l1/catalog.json',
        'rel': 'self'},
    {'href': '../../catalog.json', 'rel': 'root'},
    {'href': '../catalog.json', 'rel': 'parent'}]
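
The relative `root` and `parent` hrefs in this output follow directly from where the files sit on disk. A minimal sketch of that calculation, using only the standard library, is shown below; `relative_href` is a hypothetical helper and not part of sat-stac.

```python
import posixpath


def relative_href(from_file, to_file):
    """Return the path of `to_file` relative to the directory holding `from_file`."""
    return posixpath.relpath(to_file, start=posixpath.dirname(from_file))


collection_file = "mycat/mykitten/landsat-8-l1/catalog.json"

print(relative_href(collection_file, "mycat/catalog.json"))           # root link
print(relative_href(collection_file, "mycat/mykitten/catalog.json"))  # parent link
```

Because the hrefs are relative, the whole `mycat` directory can be moved without rewriting the `root` and `parent` links.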


### Adding items to collections<a name='additem' />

Items must belong to a Collection. They are added to a Collection in much the same way that Collections or Catalogs are added to Catalogs.

In [9]:
from satstac import Item

# open a Landsat item
item = Item.open('test/catalog/eo/landsat-8-l1/item.json')
print('Item name: ', item.id)

# add it to the collection created above
collection.add_item(item)
print('Item filename: ', item.filename)

print('\n**Item links**')
pp.pprint(item.data['links'])

Item name:  LC08_L1GT_120046_20181012_20181012_01_RT
Item filename:  mycat/mykitten/landsat-8-l1/LC08_L1GT_120046_20181012_20181012_01_RT.json

**Item links**
[   {   'href': 'https://my.cat/mykitten/landsat-8-l1/LC08_L1GT_120046_20181012_20181012_01_RT.json',
        'rel': 'self'},
    {'href': '../../catalog.json', 'rel': 'root'},
    {'href': 'catalog.json', 'rel': 'parent'},
    {'href': 'catalog.json', 'rel': 'collection'}]


## Views (sub-catalogs)<a name='views' />

It is often desirable to organize Items into sub-catalogs according to some of their properties, rather than keeping them all together directly under their Collection. For instance, [Landsat on AWS](https://landsatonaws.com/) organizes data first by path, then by row (path and row represent geographic location), then by date. This can be looked at as a series of catalogs, or a view of the Items.

```
root catalog
└── landsat-8 catalog (collection)
    ├── path 170 catalog
    │   └── row 120 catalog
    │       └── 2018-10-31 catalog
    └── path 171 catalog
        └── row 120 catalog
            └── 2018-10-31 catalog
```

Luckily, with sat-stac, the sub-catalogs do not need to be created manually. When an Item is added to a Collection, a string pattern for the path and filename of the Item can be provided. Any property field in the Item (or in the Collection if using the [Commons extension](https://github.com/radiantearth/stac-spec/tree/master/extensions/commons)) can be used in the pattern, and it will be replaced with the value of that property in the Item.

In addition to the item's properties, there are two additional fields that may be used in the patterns:

- id: The id of the item
- date: The datetime property with the time portion stripped off

The `path` provided indicates the sub-catalogs that will be used, while the `filename` provided indicates the filename of the Item relative to its parent catalog. In this example `path` is `${landsat:path}/${landsat:row}`, which means a sub-catalog is created for each Landsat 'path', and each of those contains a catalog for each Landsat 'row'. Each Landsat 'row' catalog in turn contains `item` links with the name `${date}/${id}.json`.
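
The substitution itself can be approximated with the standard library's `string.Template`. This is a sketch under assumptions, not sat-stac's implementation: `PatternTemplate` and `expand` are hypothetical names, and the item dictionary is illustrative (the time portion of its `datetime` is invented).

```python
from string import Template


class PatternTemplate(Template):
    # Allow ':' inside placeholder names so ${landsat:path} is accepted
    idpattern = r"[_a-z][_a-z0-9:]*"


def expand(pattern, item):
    """Fill a ${...} pattern from an item's id, date, and properties."""
    values = dict(item["properties"])
    values["id"] = item["id"]
    # 'date' is the datetime property with the time portion stripped off
    values["date"] = item["properties"]["datetime"].split("T")[0]
    return PatternTemplate(pattern).substitute(values)


# Illustrative item; the time portion of the datetime is invented
item = {
    "id": "LC08_L1GT_120046_20181012_20181012_01_RT",
    "properties": {
        "datetime": "2018-10-12T00:00:00Z",
        "landsat:path": 120,
        "landsat:row": 46,
    },
}

print(expand("${landsat:path}/${landsat:row}", item))
print(expand("${date}/${id}", item) + ".json")
```

A pattern that names a property the item does not have raises `KeyError` in this sketch, which surfaces typos in a pattern early.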

In [10]:
# add the item again, this time with path and filename patterns
path = '${landsat:path}/${landsat:row}'
filename = '${date}/${id}'

collection.add_item(item, path=path, filename=filename)
print('Item filename: ', item.filename)

print('\n**Item links**')
pp.pprint(item.data['links'])

Item filename:  mycat/mykitten/landsat-8-l1/120/46/2018-10-12/LC08_L1GT_120046_20181012_20181012_01_RT.json

**Item links**
[   {   'href': 'https://my.cat/mykitten/landsat-8-l1/120/46/2018-10-12/LC08_L1GT_120046_20181012_20181012_01_RT.json',
        'rel': 'self'},
    {'href': '../../../../../catalog.json', 'rel': 'root'},
    {'href': '../catalog.json', 'rel': 'parent'},
    {'href': '../../../catalog.json', 'rel': 'collection'}]


Running the above code, with the path pattern set to path/row and the filename pattern set to date/id, results in a catalog tree with the following structure.

```
mycat/
├── catalog.json
└── mykitten
    ├── catalog.json
    └── landsat-8-l1
        ├── 120
        │   ├── 46
        │   │   ├── 2018-10-12
        │   │   │   └── LC08_L1GT_120046_20181012_20181012_01_RT.json
        │   │   └── catalog.json
        │   └── catalog.json
        └── catalog.json
```

## Publishing catalogs<a name='publish' />

The STAC spec allows all of the hierarchical links to be stored as relative paths, except for `self`, which must be an absolute path. However, when creating a Catalog that is going to be moved elsewhere, absolute paths do not yet make sense, so sat-stac keeps self links as relative.

The Catalog can be published to a new endpoint by calling `publish()` with the new root link, which is the absolute link to the root catalog. The `publish()` function traverses the tree and updates the self link in every Catalog, Collection, and Item to be an absolute path based on the provided root link.

In [11]:
mycat.publish('https://my.other.cat')
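
The effect on self links can be sketched as follows, assuming each absolute self link is the root link joined with the file's location relative to the root catalog (which matches the `https://my.cat/...` self links in the earlier output). `published_self_href` is a hypothetical helper, not sat-stac's implementation.

```python
import posixpath


def published_self_href(root_url, root_file, member_file):
    """Build an absolute self link for `member_file` under a published root URL."""
    # Location of the member relative to the directory of the root catalog
    relative = posixpath.relpath(member_file, start=posixpath.dirname(root_file))
    return root_url.rstrip("/") + "/" + relative


root_file = "mycat/catalog.json"
members = [
    "mycat/catalog.json",
    "mycat/mykitten/catalog.json",
    "mycat/mykitten/landsat-8-l1/catalog.json",
]

for member in members:
    print(published_self_href("https://my.other.cat", root_file, member))
```

Only the self links depend on the endpoint; the relative `root`, `parent`, and `child` links stay valid wherever the tree is hosted.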