## Reading Data from the STAC API

The provided code writes a catalog (the `catalog` folder) that describes the datasets we created, using the [STAC](http://stacspec.org/) (SpatioTemporal Asset Catalog) specification. We could also provide an API endpoint for searching the datasets by `h3_index`, time, `pt`, and other parameters. This quickstart shows how to search the catalog for data using open-source Python libraries.

To get started you'll need the [pystac](https://github.com/stac-utils/pystac) library installed. You can install it via pip:

```
python -m pip install pystac
```

To access the data, we'll load the catalog file as a `pystac.Catalog`.

In [1]:
from datetime import datetime, timezone
import pystac
import os

In [2]:
def load_catalog_lazily(catalog_path):
    """
    Lazily loads a STAC catalog from the provided file path.
    
    Parameters:
        catalog_path (str): The path to the STAC catalog JSON file.
        
    Returns:
        pystac.Catalog: The loaded STAC catalog object.
    """
    return pystac.Catalog.from_file(catalog_path)


In [3]:
# Load the catalog.json created by the pipeline (data/catalog/catalog.json)
catalog_path = "/Users/bikash/planet/juuuua/finalsubmission/pythonProject/data/catalog/catalog.json"
catalog = load_catalog_lazily(catalog_path)
catalog
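
To see what `pystac` walks when it loads the catalog, it can help to look at the raw JSON. The sketch below uses only the standard library and a tiny stand-in catalog written to a temporary directory; the ids and hrefs are made up for illustration and will differ from the real `catalog.json`. It relies only on the STAC convention that a catalog points to its collections through `links` entries whose `rel` is `child`.

```python
import json
import tempfile
from pathlib import Path


def list_child_links(catalog_file):
    """Return the hrefs of all `child` links in a STAC catalog JSON file."""
    with open(catalog_file, encoding="utf-8") as f:
        catalog_json = json.load(f)
    return [
        link["href"]
        for link in catalog_json.get("links", [])
        if link.get("rel") == "child"
    ]


# Hypothetical stand-in for data/catalog/catalog.json
example_catalog = {
    "type": "Catalog",
    "stac_version": "1.0.0",
    "id": "example-catalog",
    "description": "Stand-in catalog for illustration",
    "links": [
        {"rel": "root", "href": "./catalog.json"},
        {"rel": "child", "href": "./collection-a/collection.json"},
        {"rel": "child", "href": "./collection-b/collection.json"},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = Path(tmp_dir) / "catalog.json"
    example_path.write_text(json.dumps(example_catalog), encoding="utf-8")
    child_hrefs = list_child_links(example_path)

print(child_hrefs)
```

Each `child` href is what `catalog.get_children()` resolves and loads as a collection.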

## 🔍 Searching with the STAC API

The STAC API allows us to search for assets that meet specific criteria, such as:

- **Date and Time**: The period the asset covers.
- **Spatial Extent**: The geographical area of interest.
- **Other Metadata**: Any other properties captured in the STAC item's metadata.

For the purpose of our assignment, we'll focus on answering the following:

1. **🗂️ A List of Items**: Identifying items that match a specified H3 index and date range.
2. **🔗 A List of URLs**: Extracting URLs to Parquet files that meet these criteria.


### Filtering Collections by Date Range


In [4]:
def filter_collections_by_date(catalog, start_date, end_date):
    """
    Filters the collections in the catalog based on a date range.
    
    Parameters:
        catalog (pystac.Catalog): The loaded STAC catalog object.
        start_date (datetime): The start date for filtering.
        end_date (datetime): The end date for filtering.
        
    Returns:
        list: A list of collections that match the date range.
    """
    matching_collections = []
    
    for collection in catalog.get_children():
        # Extract the temporal extent from the collection
        temporal_extent = collection.extent.temporal
        
        if temporal_extent:
            # Extract the start and end dates from the temporal extent
            collection_start, collection_end = temporal_extent.intervals[0]
            
            # A missing start or end means the interval is open on that side
            if collection_start is None:
                collection_start = datetime.min.replace(tzinfo=timezone.utc)
            if collection_end is None:
                collection_end = datetime.max.replace(tzinfo=timezone.utc)
            
            # Normalize both dates to UTC before comparing
            collection_start = collection_start.astimezone(timezone.utc)
            collection_end = collection_end.astimezone(timezone.utc)
            
            # Compare the collection's date range with the provided date range
            if collection_start <= end_date and collection_end >= start_date:
                matching_collections.append(collection)
    
    return matching_collections
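
The comparison at the heart of `filter_collections_by_date` is the standard closed-interval overlap test: two ranges overlap when each one starts no later than the other ends. The standalone sketch below isolates that test with plain `datetime` values. `intervals_overlap` and `utc` are hypothetical helper names, not part of `pystac`, and a missing end is treated as open-ended in the same way as in the function above.

```python
from datetime import datetime, timezone


def intervals_overlap(a_start, a_end, b_start, b_end):
    """Closed-interval overlap test; an end of None means open-ended."""
    far_future = datetime.max.replace(tzinfo=timezone.utc)
    a_end = far_future if a_end is None else a_end
    b_end = far_future if b_end is None else b_end
    return a_start <= b_end and a_end >= b_start


def utc(year, month, day):
    """Build a timezone-aware UTC datetime at midnight."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Query range: a single instant, as in the example at the end of this page
query = (utc(2022, 12, 11), utc(2022, 12, 11))

# Collection covering all of December 2022
covers = intervals_overlap(utc(2022, 12, 1), utc(2022, 12, 31), *query)
# Collection that ended in November 2022
ended = intervals_overlap(utc(2022, 11, 1), utc(2022, 11, 30), *query)
# Open-ended collection that started in January 2022
open_ended = intervals_overlap(utc(2022, 1, 1), None, *query)

print(covers, ended, open_ended)
```

Only the November collection is rejected, because it ends before the query range starts.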


### Filtering Items by H3 Index and Date


In [5]:
def filter_items_by_h3_and_date(collection, h3_index, start_date, end_date):
    """
    Filters items within a collection based on H3 index and date range.
    
    Parameters:
        collection (pystac.Collection): The collection to filter items from.
        h3_index (str): The H3 index to match.
        start_date (datetime): The start date for filtering.
        end_date (datetime): The end date for filtering.
        
    Returns:
        list: A list of items that match the H3 index and date range.
    """
    matching_items = []
    
    for item in collection.get_items():
        # Items described only by a start/end range have no single datetime; skip them
        if item.datetime is None:
            continue

        # Normalize the item date to UTC
        item_date = item.datetime.astimezone(timezone.utc)

        # Filter items based on date and H3 index
        if start_date <= item_date <= end_date:
            if h3_index in item.properties.get('h3_indexes', []):
                matching_items.append(item)

    return matching_items
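
The filter above assumes each item stores its H3 cells as a list under the `h3_indexes` property. The sketch below mirrors the same two checks on plain dictionaries shaped like STAC item JSON, so the logic can be tried without a catalog on disk. The item ids and the second H3 index are invented for illustration, and `item_matches` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Hypothetical items, shaped like the "properties" block of STAC item JSON
example_items = [
    {"id": "item-1", "properties": {
        "datetime": "2022-12-11T00:00:00+00:00",
        "h3_indexes": ["8a0326233ab7fff", "8a0326233aa7fff"]}},
    {"id": "item-2", "properties": {
        "datetime": "2022-12-11T00:00:00+00:00",
        "h3_indexes": ["8a0326233aa7fff"]}},
    {"id": "item-3", "properties": {
        "datetime": "2022-12-12T00:00:00+00:00",
        "h3_indexes": ["8a0326233ab7fff"]}},
]


def item_matches(item, h3_index, start_date, end_date):
    """Apply the date check first, then the H3 membership check."""
    props = item["properties"]
    item_date = datetime.fromisoformat(props["datetime"]).astimezone(timezone.utc)
    if not (start_date <= item_date <= end_date):
        return False
    return h3_index in props.get("h3_indexes", [])


query_day = datetime(2022, 12, 11, tzinfo=timezone.utc)
matching_ids = [
    item["id"]
    for item in example_items
    if item_matches(item, "8a0326233ab7fff", query_day, query_day)
]

print(matching_ids)
```

Here `item-2` is dropped because it lacks the requested H3 index, and `item-3` because its date falls outside the range.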


### Extracting Parquet Files from STAC Items

Each matching item lists its data files as assets. The function below collects the `href` of every asset whose media type marks it as a Parquet file.

In [6]:
def get_parquet_files_from_items(items):
    """
    Extracts the Parquet file paths from the matching STAC items.
    
    Parameters:
        items (list): A list of STAC items to extract Parquet file paths from.
        
    Returns:
        list: A list of URLs pointing to the Parquet files.
    """
    parquet_files = []
    
    for item in items:
        # Iterate over the assets within the item (the asset keys are not needed)
        for asset in item.assets.values():
            # Check if the asset is a Parquet file
            if asset.media_type == "application/x-parquet":
                parquet_files.append(asset.href)
    
    return parquet_files
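
The media type string is a convention chosen by whoever wrote the catalog, so an exact match on `application/x-parquet` can miss files catalogued under a different label. As a hedged variation, the sketch below also accepts `application/vnd.apache.parquet` (assumed here to be another label a catalog might use) and falls back to the `.parquet` file extension when no media type is set. It works on any object with `media_type` and `href` attributes; the `SimpleNamespace` objects are stand-ins for `pystac.Asset`, and `is_parquet_asset` is a hypothetical helper.

```python
from types import SimpleNamespace

# Assumption: media type labels a catalog might use for Parquet assets
PARQUET_MEDIA_TYPES = {
    "application/x-parquet",
    "application/vnd.apache.parquet",
}


def is_parquet_asset(asset):
    """Match on media type when present, otherwise on the file extension."""
    if asset.media_type:
        return asset.media_type in PARQUET_MEDIA_TYPES
    return asset.href.lower().endswith(".parquet")


# Stand-in assets with made-up hrefs
example_assets = [
    SimpleNamespace(media_type="application/x-parquet", href="data/part.0.parquet"),
    SimpleNamespace(media_type=None, href="data/part.1.parquet"),
    SimpleNamespace(media_type="image/png", href="data/preview.png"),
]

parquet_hrefs = [a.href for a in example_assets if is_parquet_asset(a)]
print(parquet_hrefs)
```

The trade-off is that the extension fallback can admit mislabeled files, so the stricter check above remains the safer default when the catalog is known to set media types consistently.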


## Example usage

In [7]:
def main(catalog_path, start_date, end_date, h3_index):
    catalog = load_catalog_lazily(catalog_path)
    matching_collections = filter_collections_by_date(catalog, start_date, end_date)
    matching_items = []
    for collection in matching_collections:
        items = filter_items_by_h3_and_date(collection, h3_index, start_date, end_date)
        matching_items.extend(items)
    parquet_files = get_parquet_files_from_items(matching_items)
    return parquet_files
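
Because the item filter compares exact instants (`start_date <= item_date <= end_date`), passing the same midnight timestamp for both bounds only matches items stamped at exactly 00:00 UTC. If items could carry other times of day, one option is to widen the range to cover the whole day. `day_bounds` below is a hypothetical helper, not part of the pipeline.

```python
from datetime import date, datetime, time, timezone


def day_bounds(day):
    """Return the first and last representable instants of a UTC day."""
    day_start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    day_end = datetime.combine(day, time.max, tzinfo=timezone.utc)
    return day_start, day_end


day_start, day_end = day_bounds(date(2022, 12, 11))

print(day_start.isoformat())
print(day_end.isoformat())
```

The two values can then be passed to `main` as `start_date` and `end_date`.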

In [8]:
catalog_path = "/Users/bikash/planet/juuuua/finalsubmission/pythonProject/data/catalog/catalog.json"
start_date = datetime(2022, 12, 11, tzinfo=timezone.utc)
end_date = datetime(2022, 12, 11, tzinfo=timezone.utc)
h3_index = "8a0326233ab7fff"

parquet_files = main(catalog_path, start_date, end_date, h3_index)

print("Matching Parquet Files:")
for file in parquet_files:
    print(file)


Matching Parquet Files:
/Users/bikash/planet/juuuua/finalsubmission/pythonProject/data/parquet/day=2022-12-11/part.9.parquet
/Users/bikash/planet/juuuua/finalsubmission/pythonProject/data/parquet/day=2022-12-11/part.11.parquet
/Users/bikash/planet/juuuua/finalsubmission/pythonProject/data/parquet/day=2022-12-11/part.10.parquet
/Users/bikash/planet/juuuua/finalsubmission/pythonProject/data/parquet/day=2022-12-11/part.8.parquet
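
The returned paths follow a `day=YYYY-MM-DD` directory layout, as the output above shows. Assuming that layout holds for every file, the partition date can be recovered from a path without opening the file. `partition_day` is a hypothetical helper, and the shortened paths are for illustration only.

```python
import re
from datetime import date

# Matches a Hive-style day partition such as "day=2022-12-11"
DAY_PARTITION = re.compile(r"day=(\d{4})-(\d{2})-(\d{2})")


def partition_day(path):
    """Return the partition date encoded in the path, or None if absent."""
    match = DAY_PARTITION.search(path)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day)


example_paths = [
    "data/parquet/day=2022-12-11/part.9.parquet",
    "data/parquet/other/part.0.parquet",
]

days = [partition_day(p) for p in example_paths]
print(days)
```

This can serve as a quick sanity check that every returned file belongs to the requested date range.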
