## Preface

One of the pivotal projects the EASIER Data Initiative has produced is [ipfs-stac](https://pypi.org/project/ipfs-stac/). The Python library demonstrates the feasibility of onboarding and interfacing with geospatial data on IPFS. It enables developers and researchers to leverage STAC APIs enriched with Filecoin and IPFS metadata to fetch, pin, and explore data in a familiar manner. In an ever-changing ecosystem, updates, breaking changes, and new infrastructure features will keep emerging. The team has made it a responsibility to adapt to these changes, which requires our projects to remain flexible. This notebook explores the new features and changes to ipfs-stac in version 0.2.


### Changes Summary

1. When fetching content, the file size is now human-readable (progress is now tracked in megabytes)
2. New search functionality via the `searchSTAC` method added to the `Web3` client class. Returns a collection of items
   1. A user can now pass in many of the [query parameter options](https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search#query-parameters-and-fields) to search a STAC catalog
3. Added parameters for content uploads to IPFS:
   1. By default, [CIDv1](https://docs.ipfs.tech/concepts/glossary/#cid-v1) identifiers are created
   2. Added option to select whether to pin content to your IPFS node
   3. Added option to add a [mutable file system](https://docs.ipfs.tech/concepts/file-systems/#mutable-file-system-mfs) (MFS) reference to the content on upload
   4. Added option to provide a filename for content that's uploaded
      1. If a user uploads a file, the filename is extracted from it. You can override this by passing a value to this parameter.
4. Optimized the functions that start and stop the IPFS daemon
5. Assets are no longer fetched by default
6. Added `getAssetNames` function to retrieve the asset names from a collection or item
7. New `Web3` class property that automatically grabs all the collection IDs from the STAC endpoint when instantiated.
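8. `pinned_list` now accepts `pin_type` and `names` arguments to control which pins are listed and whether link names are included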


### Environment Setup
1 - [Install IPFS Kubo CLI](https://docs.ipfs.io/install/ipfs-kubo/) (if you haven't already). This will allow you to run an IPFS node on your local machine.

2 - [Set up a Jupyter Notebook environment](https://www.youtube.com/watch?v=DA6ZAHBPF1U). A convenient method for achieving this is by utilizing the Jupyter integration in Visual Studio Code.

3 - Run `pip install ipfs-stac` to install the latest version of the library.

In [2]:
from ipfs_stac.client import Web3, Asset

### Initialize client
easier = Web3(local_gateway="localhost", gateway_port=8081, stac_endpoint="https://stac.easierdata.info")


### Attributes added to the Web3 class

A couple of attributes have been added to the `Web3` class. They expose more of its current configuration and support high-level exploration of the STAC endpoint:

1. Added `client` attribute - Instance of a PySTAC catalog client to support a variety of additional API functionality.
2. Added `collections` attribute - List of unique collection identifiers to enable the discovery of additional collection metadata.

### Added methods to start and stop IPFS daemon
1. Added `startDaemon` method to `Web3` class - Will attempt to start the IPFS daemon on the device running the Python program. After the program has finished running, the daemon will be shut down. If the daemon is already running, it will not be tagged.
2. Added `shutdown_process` method to `Web3` class - Will shut down the IPFS daemon on the device running the Python program.

In [3]:
easier.startDaemon()

print(f"Collections: {easier.collections} \n")

easier.client


Collections: ['landsat-c2l1', 'GEDI_L4A_AGB_Density_V2_1_2056.v2.1'] 



You can also retrieve a list of [**Collection objects**](https://pystac.readthedocs.io/en/latest/api/pystac.html#collection) through the new `getCollections()` method.

Let's explore the STAC Collection for GEDI.


In [5]:
# Grab the list index containing the GEDI data
gedi_index = easier.collections.index('GEDI_L4A_AGB_Density_V2_1_2056.v2.1')

# Grab the GEDI collection via the list index
easier.getCollections()[gedi_index]


### Enhancements to search

The team has introduced methods and attributes to the `Web3` class which support searching/exploring a STAC catalog that may not be entirely managed by the user. With the following additions, the user experience of being able to query and index unknown assets has been improved:

1. Added `searchSTAC` method to `Web3` class - Searches a STAC catalog through the pystac-client attribute, so you can use the exact same search parameters.
2. Added `getAssetNames` method to `Web3` class - Returns a list of asset names for a given collection or item.

#### searchSTAC
With the `searchSTAC` method, we can define our search parameters more effectively. Using the [query extension](https://github.com/stac-api-extensions/query), we can now define the desired logic and pass it to the `query` parameter.

The following is an example of using the `searchSTAC` method:

In [8]:
# Selecting the index representing the landsat id
landsat_index = easier.collections.index('landsat-c2l1')

# Query parameters 
query_params = {"eo:cloud_cover": {"gte": 0, "lte": 20}}

# Search the entire Landsat collection
search_items = easier.searchSTAC(collections=easier.collections[landsat_index])
print(f"Total scenes for {easier.collections[landsat_index]}: {len(search_items)}.")

# Search the same collection with filter logic
search_items = easier.searchSTAC(
    collections=easier.collections[landsat_index], query=query_params
)
print(f"Total scenes with 0% to 20% cloud coverage: {len(search_items)}")


Total scenes for landsat-c2l1: 465.
Total scenes with 0% to 20% cloud coverage: 70
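
To make the filter logic more concrete, here is a minimal sketch of how a query-extension dictionary like `query_params` can be read: every operator listed for a property must hold for an item to match. The filtering really happens on the STAC API server; `matches_query` and the sample properties below are hypothetical and only illustrate the semantics of the comparison operators.

```python
import operator

# Comparison operators defined by the STAC API query extension
# (only the basic comparisons are covered in this sketch).
OPERATORS = {
    "eq": operator.eq,
    "neq": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def matches_query(properties, query):
    """Return True if an item's properties satisfy every condition in query."""
    for field, conditions in query.items():
        if field not in properties:
            return False
        value = properties[field]
        for op_name, expected in conditions.items():
            if not OPERATORS[op_name](value, expected):
                return False
    return True


query_params = {"eo:cloud_cover": {"gte": 0, "lte": 20}}

# Hypothetical item properties, for illustration only
clear_scene = {"eo:cloud_cover": 4.2}
cloudy_scene = {"eo:cloud_cover": 63.0}

print(matches_query(clear_scene, query_params))   # True
print(matches_query(cloudy_scene, query_params))  # False
```

Because all conditions are combined with a logical AND, adding a second property to the dictionary narrows the result set further.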


Let's refine our search with some spatial features by identifying the items that intersect the mid-Atlantic states.

For this example, we will pass in the geometry from a GeoJSON object.

In [9]:
geojson_feature = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "coordinates": [
                    [
                        [
                          -73.34480105418365,
                          45.04588475346222
                        ],
                        [
                          -79.11384967787046,
                          43.52056357721321
                        ],
                        [
                          -80.47118272957593,
                          42.28803694032902
                        ],
                        [
                          -80.5365358352876,
                          40.79229977072049
                        ],
                        [
                          -82.49364337671977,
                          38.39703841681529
                        ],
                        [
                          -81.93909548527188,
                          37.5876942513829
                        ],
                        [
                          -83.66567377966864,
                          36.58712944146795
                        ],
                        [
                          -75.852486772675,
                          36.563723138993936
                        ],
                        [
                          -75.01397348543532,
                          38.467247825352786
                        ],
                        [
                          -73.86651757284588,
                          40.753014073703696
                        ],
                        [
                          -73.34480105418365,
                          45.04588475346222
                        ]
                    ]
                ],
                "type": "Polygon"
            }
        }
    ],
    "bbox": [
        -90.22893757804734,
        35.65674455842897,
        -76.26898766968725,
        43.58558355601974,
    ],
}

search_items = easier.searchSTAC(
    collections="landsat-c2l1",
    query=query_params,
    intersects=geojson_feature["features"][0]["geometry"],
)
print(
    f"Total scenes with 0% to 20% cloud coverage AND also intersect our area of interest: {len(search_items)}"
)


Total scenes with 0% to 20% cloud coverage AND also intersect our area of interest: 2
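
Note that `intersects` expects a bare GeoJSON geometry, which is why the example passes `geojson_feature["features"][0]["geometry"]` rather than the whole FeatureCollection. The STAC item-search specification also defines a `bbox` parameter as an alternative (only one of `bbox` and `intersects` may be used in a request). The sketch below derives a bounding box from a polygon. `polygon_bbox` is a hypothetical helper, and whether `searchSTAC` forwards `bbox` the same way it forwards `intersects` is an assumption based on it sharing pystac-client's parameters.

```python
def polygon_bbox(geometry):
    """Return [min_lon, min_lat, max_lon, max_lat] for a GeoJSON Polygon."""
    if geometry["type"] != "Polygon":
        raise ValueError("this sketch only handles Polygon geometries")
    # The first ring is the exterior boundary; any further rings are holes.
    exterior = geometry["coordinates"][0]
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return [min(lons), min(lats), max(lons), max(lats)]


# Small illustrative polygon (not the area of interest used above)
example_geometry = {
    "type": "Polygon",
    "coordinates": [
        [[-80.0, 36.5], [-75.0, 36.5], [-75.0, 40.0], [-80.0, 40.0], [-80.0, 36.5]]
    ],
}

print(polygon_bbox(example_geometry))  # [-80.0, 36.5, -75.0, 40.0]
```

A bounding box is coarser than a polygon, so it may return scenes that fall outside an irregular area of interest.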


#### getAssetNames
So far, we've been able to take advantage of the underlying item metadata and spatial component to select items, but what can we download?

To get an idea, we can use the `getAssetNames` method to get a list of the asset IDs. Knowing the asset IDs allows us to drill down into the asset metadata and pull out the reference links to download the data.

What kind of assets can we retrieve from our search results?

In [10]:
easier.getAssetNames(search_items)


['ANG.txt',
 'MTL.json',
 'MTL.txt',
 'MTL.xml',
 'SAA',
 'SZA',
 'VAA',
 'VZA',
 'blue',
 'cirrus',
 'coastal',
 'green',
 'index',
 'lwir11',
 'lwir12',
 'nir08',
 'pan',
 'qa_pixel',
 'qa_radsat',
 'red',
 'reduced_resolution_browse',
 'swir16',
 'swir22',
 'thumbnail']

Eagle-eyed readers may have noticed that `search_items` contained more than one item. ipfs-stac eases the process of pulling out asset names from a collection of items. It supports getting the asset names from `CollectionClient`, `ItemCollection`, and `Item` objects. For iterable objects, it returns the unique asset names across all of their items.
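
The idea of collapsing asset names across several items can be sketched in plain Python. This is not the library's implementation: `unique_asset_names` is a hypothetical helper that works on STAC items represented as dictionaries, and the sample items are made up for illustration. The result is sorted here because the output above also appears in sorted order.

```python
def unique_asset_names(items):
    """Collect the sorted, de-duplicated asset keys across STAC item dicts."""
    names = set()
    for item in items:
        names.update(item.get("assets", {}).keys())
    return sorted(names)


# Hypothetical items, for illustration only
example_items = [
    {"id": "scene-a", "assets": {"red": {}, "blue": {}, "thumbnail": {}}},
    {"id": "scene-b", "assets": {"red": {}, "nir08": {}}},
]

print(unique_asset_names(example_items))  # ['blue', 'nir08', 'red', 'thumbnail']
```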

### Refactored data fetching

Three critical changes have been made to ipfs-stac, which affect the results of the `pinned_list` method and what happens when an instance of an `Asset` is created:

1. The `pinned_list` method now lets you specify which type of pinned content to list with the `pin_type` argument.
2. The `pinned_list` method now has a `names` argument (boolean), which dictates whether or not to include link names associated with each CID. You can think of **link names** as a label, such as a filename, making it much easier to identify content with human-readable names.
3. The data associated with an `Asset` object will no longer be fetched by default. To retrieve the data, you must call the `fetch` method and then access it through the `data` attribute.

#### pinned_list
This method fetches pinned CIDs from the configured node. It will now take two arguments:

1. `pin_type` - (optional string): The type of [pinned CIDs](https://docs.ipfs.tech/how-to/pin-files/#three-kinds-of-pins) to list. Must be one of `direct`, `indirect`, `recursive`, or `all`. Defaults to `recursive` (previously, all pins were listed).
2. `names` - (optional boolean): Whether to include pin/link names alongside the CIDs in the output JSON. Defaults to `False`.

In [11]:
## Usage of updated pinned_list method
recursive_pins = easier.pinned_list()

indirect_pins = easier.pinned_list(pin_type="indirect", names=False)

print(f"Recursive pins: {len(recursive_pins)}")
print(f"Indirect pins: {len(indirect_pins)}")


Recursive pins: 297
Indirect pins: 18900
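
The large gap between the two counts is expected: indirect pins are the child blocks referenced by recursively pinned roots, so a single recursive pin can account for many indirect ones. As a rough sketch, the snippet below tallies pins by type from a mapping shaped like the JSON that Kubo's `pin/ls` RPC endpoint returns. That shape, the `count_pins_by_type` helper, and the sample CIDs are assumptions for illustration, and the return value of `pinned_list` may be structured differently.

```python
from collections import Counter


def count_pins_by_type(pin_ls_response):
    """Tally pins per type from a pin/ls-style response mapping."""
    # Assumed shape: {"Keys": {"<cid>": {"Type": "recursive"}, ...}}
    keys = pin_ls_response.get("Keys", {})
    return Counter(entry["Type"] for entry in keys.values())


# Made-up response, for illustration only
sample_response = {
    "Keys": {
        "cid-root": {"Type": "recursive"},
        "cid-child-1": {"Type": "indirect"},
        "cid-child-2": {"Type": "indirect"},
    }
}

counts = count_pins_by_type(sample_response)
print(f"Recursive pins: {counts['recursive']}")
print(f"Indirect pins: {counts['indirect']}")
```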


In [None]:
## Fetching data for an asset
demo_asset = easier.getAssetFromItem(search_items[0], asset_name="SAA")
print(f"Before: {demo_asset.data}")

demo_asset.fetch()

print(f"After: {len(demo_asset.data)}")


In [None]:
# Alternatively, you can force data to be fetched through the fetch_data argument of getAssetFromItem
demo_asset = easier.getAssetFromItem(search_items[0], asset_name="SAA", fetch_data=True)
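
The deferred-fetch behaviour follows a common lazy-loading pattern: the object is cheap to create, and the download only happens when it is requested. The sketch below illustrates that pattern in isolation. `LazyAsset` and `fake_loader` are hypothetical stand-ins and not the `Asset` class itself, and the caching of the result is a choice made for this sketch.

```python
class LazyAsset:
    """Hypothetical stand-in for an asset whose data loads only on request."""

    def __init__(self, cid, loader):
        self.cid = cid
        self._loader = loader  # callable that returns bytes for a CID
        self.data = None       # nothing is downloaded at construction time

    def fetch(self):
        # Download once; later calls reuse the bytes already held in memory.
        if self.data is None:
            self.data = self._loader(self.cid)
        return self.data


def fake_loader(cid):
    # Placeholder for a real gateway request
    return f"bytes for {cid}".encode()


asset = LazyAsset("example-cid", fake_loader)
print(f"Before: {asset.data}")

asset.fetch()
print(f"After: {len(asset.data)} bytes")
```

Deferring the download keeps searches and metadata inspection fast, since large rasters are only transferred for the assets you actually need.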


### Added ability to write/upload to IPFS Mutable File System

The IPFS Mutable File System (MFS) is a powerful feature to optimize the organization of data stored on the network. 

1. The `uploadToIPFS` method has been updated to support writing to an MFS path
2. The `Asset` class now has an `addToMFS` method which supports writing to a specific directory with the option of specifying a file name.

In [None]:
## Example usage
easier.uploadToIPFS(file_path="./image.tiff", file_name="example.tiff", pin_content=True, mfs_path="images")

demo_asset.addToMFS(filename="blog_post")
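
For readers more familiar with the Kubo CLI, linking existing content into MFS corresponds to the `ipfs files cp` command, which copies a reference to a CID to an absolute MFS path. The sketch below only builds that command as a list of arguments. `mfs_copy_command` is a hypothetical helper, `<cid>` is a placeholder, and this is not necessarily how ipfs-stac performs the operation internally.

```python
import posixpath


def mfs_copy_command(cid, mfs_dir, filename):
    """Build the Kubo CLI call that links an existing CID into MFS."""
    # MFS paths are absolute, so normalise the directory to start with "/".
    destination = posixpath.join("/", mfs_dir.strip("/"), filename)
    return ["ipfs", "files", "cp", f"/ipfs/{cid}", destination]


print(mfs_copy_command("<cid>", "images", "example.tiff"))
# ['ipfs', 'files', 'cp', '/ipfs/<cid>', '/images/example.tiff']
```

The target directory must already exist in MFS; on the command line it can be created with `ipfs files mkdir -p /images`.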


In [5]:
# Finally, shut down the IPFS daemon (this happens automatically if the startDaemon method was used)
easier.shutdown_process()


## Closing remarks

All in all, the team has produced new features that optimize interfacing with STAC catalogs enriched with IPFS metadata. These changes are a huge step forward in showcasing the capabilities of decentralized infrastructure when combined with geospatial data. Stay tuned for more posts that highlight these changes in action. For more technical details, see the [GitHub repository](https://github.com/easierdata/ipfs-stac).