# Searching the Catalog
All objects support the same search interface. Searches work by creating a query builder (class `Search`), which can be refined in a fluent programming style before execution by applying filters, sorting, and limits to the result set. `Search` objects are normally created using class methods on one of the primary object types, e.g. `Product.search()`.

The searches are then executed by any of several methods: calling the `count()` method to obtain a count of matching objects, using the `Search` object in an iterating context such as a for loop or a list comprehension to yield each matching object in turn, or calling the `collect()` method which will return a list-like collection object (e.g. `ProductCollection`, `BandCollection`, or `ImageCollection`).

`Search` object methods never mutate the original object, but instead return modified copies. Thus Search objects can be reused for both further modification and repeated executions.
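
The copy-on-modify behavior described above can be illustrated with a small, self-contained sketch. `MiniSearch` is a hypothetical stand-in written for this guide, not the real `Search` class, and it searches an in-memory tuple instead of the catalog service:

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class MiniSearch:
    """Hypothetical query builder; illustrates the pattern only."""

    items: Tuple[dict, ...]
    predicates: Tuple[Callable[[dict], bool], ...] = ()
    max_results: Optional[int] = None

    def filter(self, predicate):
        # Return a modified copy; the original search is left untouched.
        return replace(self, predicates=self.predicates + (predicate,))

    def limit(self, n):
        # Same idea: a new object carrying the new limit.
        return replace(self, max_results=n)

    def __iter__(self):
        # Nothing is evaluated until the search is iterated.
        matches = (i for i in self.items if all(p(i) for p in self.predicates))
        for seen, item in enumerate(matches):
            if self.max_results is not None and seen >= self.max_results:
                return
            yield item

    def count(self):
        return sum(1 for _ in self)


catalog = (
    {"id": "a", "year": 1999},
    {"id": "b", "year": 2010},
    {"id": "c", "year": 2022},
)
base = MiniSearch(catalog)
recent = base.filter(lambda item: item["year"] >= 2000)
first_recent = recent.limit(1)

print(base.count())    # the base search is unchanged by the later calls
print(recent.count())  # the refined copy matches fewer items
print([item["id"] for item in first_recent])
```

Because every refinement returns a new object, `base` can be kept around and branched into several independent searches.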

Let’s look at two of the most commonly searched for types of objects: products and images.

## Finding products

### Filtering, sorting, and limiting
Filtering is achieved through the use of the `Properties` class which allows you to express logical and comparison operations on attributes of an object such as a product or image. Multiple filters are combined as if by `AND`. Please see the API documentation for further details; the uses demonstrated below should be readily apparent. A general-use instance of this class can be imported from `descarteslabs.catalog.properties`.
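
To give a feel for how comparison expressions on attributes can be turned into filters, here is a minimal sketch using Python operator overloading. The `Prop` and `Expr` classes and the `matches_all` helper are hypothetical names invented for this illustration; they are not the implementation of the `Properties` class, which builds expressions that are evaluated by the catalog service.

```python
import operator


class Expr:
    """Hypothetical filter expression: a field, a comparison, and a value."""

    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value

    def matches(self, obj):
        return self.op(obj[self.field], self.value)

    def __repr__(self):
        return f"Expr({self.field!r}, {self.op.__name__}, {self.value!r})"


class Prop:
    """Hypothetical attribute handle; comparing it builds an Expr."""

    def __init__(self, field):
        self.field = field

    def __lt__(self, value):
        return Expr(self.field, operator.lt, value)

    def __gt__(self, value):
        return Expr(self.field, operator.gt, value)


def matches_all(obj, filters):
    # Multiple filters are combined as if by AND.
    return all(f.matches(obj) for f in filters)


start = Prop("start_datetime")
end = Prop("end_datetime")
filters = [start < "2023-01-01", end > "2000-01-01"]

# Made-up records; ISO 8601 date strings compare correctly as text.
product = {"start_datetime": "2013-03-18", "end_datetime": "2024-01-01"}
old_product = {"start_datetime": "1984-03-01", "end_datetime": "1999-12-31"}

print(filters)
print(matches_all(product, filters))
print(matches_all(old_product, filters))
```

The key point is that `start < "2023-01-01"` does not produce `True` or `False`; it produces a description of the comparison that can be applied later.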

Sorting by an attribute of an object in either ascending or descending order is supported for many of the attributes of each object type.

API documentation should be consulted to determine which properties support filtering and/or sorting. This is noted on each attribute’s specific documentation, e.g. `acquired`.

Limiting allows you to restrict search results to at most a specified number of objects.

`Product.search()` is the entry point for searching products. It returns a query builder that you can use to refine your search and can iterate over to retrieve search results.

Count all products with some data before 2023 using `filter()`:

In [None]:
from descarteslabs.catalog import Product, properties as p

search = Product.search().filter(p.start_datetime < "2023-01-01")
search.count()

You can apply multiple filters. To restrict this search to products with data before 2023 and after 2000:

In [None]:
search = search.filter(p.end_datetime > "2000-01-01")
search.count()

All attributes are documented in the [Product API reference](https://docs.descarteslabs.com/descarteslabs/catalog/docs/product.html#descarteslabs.catalog.Product), which also spells out which ones can be used to filter or sort.



### Text search
Add text search to the mix using `find_text()`. This finds all products with “landsat” in the name or description:

In [None]:
landsat_search = search.find_text("landsat").limit(None)
for product in landsat_search:
    print(product)

### Lookup by id and object relationships
If you know a product’s id, look it up directly with `Product.get()`:

In [None]:
landsat8_collection1 = Product.get("usgs:landsat:oli-tirs:c2:l2:v0")
landsat8_collection1

#### Bands
Wherever there are relationships between objects, expect methods such as `Product.bands()` to find related objects. This shows the first five bands of the Landsat 8 product we looked up:

In [None]:
for band in landsat8_collection1.bands().limit(5):
    print(band)

In a similar fashion `Product.images()` returns a search object for images belonging to the product, as detailed in the next section.


## Finding images

### Image filters
`Image` searches support a special method `intersects()` which is used to filter images by means of a geospatial search. Unlike `filter()` this method cannot be used multiple times. It will accept as an argument a GeoJSON dictionary, a shapely geometry, or any of the DL standard `GeoContext` object types. It will select any image for which the image geometry intersects the supplied geometry in lat-lon space (i.e. WGS84). As coordinate system transformations of bounding boxes are involved here, it should be noted that this filtering can be inexact; the overlap of geometries in the native coordinate system of the image may not be the same as that when transformed to the geographic coordinate system.
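
As a rough illustration of an intersection test in lat-lon space, the sketch below compares bounding boxes. This is a deliberately coarse approximation written for this guide, not the algorithm used by the catalog service; the `bounds` and `boxes_intersect` helpers and the two footprints are made up for the example.

```python
def bounds(geojson_polygon):
    """Return (min_lon, min_lat, max_lon, max_lat) of a GeoJSON Polygon's outer ring."""
    ring = geojson_polygon["coordinates"][0]
    lons = [lon for lon, lat in ring]
    lats = [lat for lon, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)


def boxes_intersect(a, b):
    # Two boxes overlap unless one lies entirely to one side of the other.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


aoi = {
    "type": "Polygon",
    "coordinates": [
        [
            [2.915496826171875, 42.044193618165224],
            [2.838592529296875, 41.92475971933975],
            [3.043212890625, 41.929868314485795],
            [2.915496826171875, 42.044193618165224],
        ]
    ],
}

# Hypothetical image footprints as (min_lon, min_lat, max_lon, max_lat).
nearby_footprint = (2.5, 41.5, 3.5, 42.5)
distant_footprint = (10.0, 50.0, 11.0, 51.0)

print(boxes_intersect(bounds(aoi), nearby_footprint))
print(boxes_intersect(bounds(aoi), distant_footprint))
```

A bounding box can overlap an area of interest even when the exact footprint does not, which is one way a geospatial filter can return an image that contributes no pixels.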

Please see the `GeoContext Guide` for more information about working with `GeoContexts`.

Please consult the API documentation for the `Image` class for information on which properties can be filtered.

Search images by the most common attributes: by product, by intersection with a geometry, and by date range:

In [None]:
from descarteslabs.catalog import Image, Product, properties as p

geometry = {
    "type": "Polygon",
    "coordinates": [
        [
            [2.915496826171875, 42.044193618165224],
            [2.838592529296875, 41.92475971933975],
            [3.043212890625, 41.929868314485795],
            [2.915496826171875, 42.044193618165224],
        ]
    ],
}

search = landsat8_collection1.images()
search = search.intersects(geometry)
search = search.filter("2023-01-01" <= p.acquired < "2023-06-01")
search = search.sort("acquired")
search.count()

There are other attributes useful to filter by, documented in the API reference for [`Image`](https://docs.descarteslabs.com/descarteslabs/catalog/docs/image.html#descarteslabs.catalog.Image). For example exclude images with too much cloud cover:

In [None]:
search = search.filter(p.cloud_fraction < 0.2)
search.count()

Filtering by `cloud_fraction` is only reasonable when the product sets this attribute on images. Images that don’t set the attribute are excluded from the results.

The `created` timestamp is added to all objects in the catalog when they are created and is immutable. Restrict the search to results created before some time in the past, to make sure that the image results are stable:

In [None]:
from datetime import datetime

search = search.filter(p.created < datetime(2023, 4, 1))
search.count()

Note that for all timestamps we can use `datetime` instances or strings that can reasonably be parsed as a timestamp. If a timestamp has no explicit timezone, it’s assumed to be in UTC.
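
The "no timezone means UTC" rule can be mirrored on the client side with the standard library. The `as_utc` helper below is a hypothetical name used only for this sketch; it relies on `datetime.fromisoformat`, which accepts fewer string formats than the catalog does.

```python
from datetime import datetime, timezone


def as_utc(value):
    """Take an ISO 8601 string or a datetime and return a timezone-aware UTC datetime."""
    dt = datetime.fromisoformat(value) if isinstance(value, str) else value
    if dt.tzinfo is None:
        # No explicit timezone: interpret the value as UTC.
        return dt.replace(tzinfo=timezone.utc)
    # Explicit timezone: convert to UTC.
    return dt.astimezone(timezone.utc)


print(as_utc("2023-04-01"))
print(as_utc(datetime(2023, 4, 1)))
print(as_utc("2023-04-01T02:00:00+02:00"))
```

All three inputs describe the same instant, which is why a naive `datetime(2023, 4, 1)` and the string `"2023-04-01"` can be used interchangeably in a filter.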

## ImageCollections
We can use the `collect()` method with an image search to obtain an `ImageCollection` with many useful features:

In [None]:
images = search.collect()
images

Our original AOI for the search is available on the image collection:

In [None]:
images.geocontext

We can extract attributes across the collection with `each`, or filter or group based on their attributes with `filter()` and `groupby()`:

In [None]:
list(images.each.acquired.month)

In [None]:
spring = images.filter(lambda i: 3 <= i.acquired.month < 6)
list(spring.groupby(lambda i: i.acquired.month))
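
For readers who want to see what these collection operations amount to, the following plain-Python sketch performs the equivalent steps on a list of made-up acquisition dates. It only mirrors the idea; it does not use `ImageCollection`, and the dates are invented for the example.

```python
from datetime import datetime
from itertools import groupby

# Made-up acquisition dates standing in for the images in a collection.
acquired = [
    datetime(2023, 1, 6),
    datetime(2023, 3, 11),
    datetime(2023, 3, 27),
    datetime(2023, 5, 14),
]

# Extract one attribute from every element.
months = [d.month for d in acquired]

# Keep only the elements that satisfy a predicate.
spring = [d for d in acquired if 3 <= d.month < 6]

# Partition by a key function; itertools.groupby needs input sorted by the key.
by_month = {
    month: list(group)
    for month, group in groupby(
        sorted(spring, key=lambda d: d.month), key=lambda d: d.month
    )
}

print(months)
print(len(spring))
print({month: len(group) for month, group in by_month.items()})
```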

## Rastering imagery
`Image` and `ImageCollection` support a variety of methods that can be used to retrieve the image data associated with an image, including all manner of transformations such as coordinate systems, resolution, compositing, and scaling of pixel brightness. These operations can result in either a numpy ndarray of image data, or a GeoTIFF file on disk containing the image data.

### Rastering images
To support the rastering of images, each image has a `geocontext` attribute which is a `GeoContext` instance describing the geospatial attributes of the image. All the rastering methods use this geocontext by default, but will accept another geocontext if desired. The resolution parameter can be used to change the resolution of the geocontext if desired.

`Image` supports two methods for rastering, `ndarray()` and `download()`. A variety of parameters used to control the rastering are described in the documentation for those methods.

With `ndarray()` the resulting data is returned as a 3-dimensional numpy array. By default the first dimension represents the selected bands; this can be altered with the `bands_axis` parameter.

In [None]:
from descarteslabs.catalog import Image
from descarteslabs.utils import display

image = Image.get(
    "usgs:landsat:oli-tirs:c2:l2:v0:LC08_L2SP_197031_20230106_20230110_02_T1"
)
data = image.ndarray("red green blue", resolution=120)
(data.shape, data.dtype)

In [None]:
display(data, title=image.id, figsize=(10, 5))

The ordering of the axes within the ndarray is `(band, y, x)` or `(band, row, column)`.
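
Many plotting and image libraries expect the band axis last, i.e. `(row, column, band)`. The sketch below shows one way to reorder the axes with NumPy, using a synthetic array in place of real raster data:

```python
import numpy as np

# Synthetic stand-in for a rastered image: 3 bands, 4 rows, 5 columns.
data = np.arange(3 * 4 * 5, dtype=np.uint16).reshape(3, 4, 5)

# Move the band axis to the end to get (y, x, band).
image_yx_band = np.moveaxis(data, 0, -1)

# Indexing the first axis yields a single band as a 2D (y, x) array.
first_band = data[0]

print(data.shape, image_yx_band.shape, first_band.shape)
```

`np.moveaxis` returns a view where possible, so the reordering itself does not copy the pixel data.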

With `download()` the resulting data is stored in the local filesystem and the name of the file is returned.

In [None]:
import os.path
from descarteslabs.catalog import Image

image = Image.get(
    "usgs:landsat:oli-tirs:c2:l2:v0:LC08_L2SP_197031_20230106_20230110_02_T1"
)
file = image.download("red green blue", resolution=120)
os.path.exists(file)

In [None]:
os.remove(file)

### Rastering image collections
`ImageCollection` supports several methods for rastering. A variety of parameters used to control the rastering are described in the documentation for each of these methods.

`stack()` can be used to raster each of the images in the collection and then stack the resulting 3D arrays into a single 4-dimensional array, with the different images along the first axis in the order they appear in the ImageCollection (i.e. the axes are (image, band, y, x)). Note that rastering the images is performed in parallel, so this is significantly faster than rastering each image in the collection in a loop.

In [None]:
from descarteslabs.catalog import Product, properties as p

geometry = {
    "type": "Polygon",
    "coordinates": [
        [
            [2.915496826171875, 42.044193618165224],
            [2.838592529296875, 41.92475971933975],
            [3.043212890625, 41.929868314485795],
            [2.915496826171875, 42.044193618165224],
        ]
    ],
}

search = landsat8_collection1.images()
search = search.intersects(geometry).filter("2021-01-01" <= p.acquired < "2022-01-01")
search = search.filter(p.cloud_fraction <= 0.2)
search = search.sort("acquired")
images = search.collect()
data = images.stack("red green blue", resolution=120)
data.shape

In [None]:
# Display the first few
display(*data[0:4], title=list(images[0:4].each.name), ncols=2, figsize=(10, 10))

`mosaic()` can be used to composite the images to form a single image, resulting in a 3D array. A mosaic composite uses, for each pixel location, the pixel value from the last image in the collection containing a valid (unmasked) pixel value at that location. Since individual images may not cover the same pixels, this operation is typically used to combine overlapping images to obtain a single complete image. If the image collection is sorted by increasing acquisition date, this means the most recent image wins. You can use the `sort()` method on the search object to alter the ordering of the images in the collection, or the `sort()` method on the `ImageCollection` itself to alter the ordering of the images and hence the results of the mosaic operation.

In [None]:
data = images.mosaic("red green blue", resolution=120)
data.shape

In [None]:
display(data, title="Mosaic", figsize=(10, 5))
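
The "last valid pixel wins" rule can be sketched with NumPy masked arrays. This is an illustration on tiny synthetic arrays, not the implementation used by the rastering engine; the local `mosaic` function is written for this example only.

```python
import numpy as np

# Two synthetic single-band images shaped (band, y, x).
# In the mask, True marks an invalid pixel.
older = np.ma.masked_array(
    np.full((1, 2, 2), 10), mask=[[[False, False], [False, True]]]
)
newer = np.ma.masked_array(
    np.full((1, 2, 2), 20), mask=[[[True, False], [True, True]]]
)


def mosaic(images):
    """Composite in order, so later images overwrite earlier ones where valid."""
    data = np.zeros(images[0].shape, dtype=images[0].dtype)
    mask = np.ones(images[0].shape, dtype=bool)
    for image in images:
        valid = ~np.ma.getmaskarray(image)
        data = np.where(valid, np.ma.getdata(image), data)
        mask &= ~valid  # a pixel stays masked only if no image covered it
    return np.ma.masked_array(data, mask=mask)


composite = mosaic([older, newer])
print(composite)
```

Only the top-right pixel is valid in `newer`, so it is the only place where the newer value replaces the older one; the bottom-right pixel is invalid in both images and stays masked.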

See the [Compositing Imagery with Catalog](https://docs.descarteslabs.com/examples-gallery/plot_images_mosaic.html) example for a more in-depth discussion of compositing by mosaic. Other kinds of compositing are not directly supported by the rastering engine, but they are easily achieved using the NumPy package. See the Composite Multi-Product Imagery example for the use of a median composite.
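
As a minimal sketch of the median composite mentioned above, the following applies `np.ma.median` along the image axis of a synthetic stack shaped `(image, band, y, x)`. The values and mask are made up; with real data the stack would come from a rastering call.

```python
import numpy as np

# Synthetic stack of three single-band images, each 1 row by 2 columns.
# In the mask, True marks an invalid pixel.
stack = np.ma.masked_array(
    [
        [[[1.0, 4.0]]],
        [[[3.0, 0.0]]],
        [[[8.0, 6.0]]],
    ],
    mask=[
        [[[False, False]]],
        [[[False, True]]],
        [[[False, False]]],
    ],
)

# Median over the image axis; masked pixels are ignored per location.
median = np.ma.median(stack, axis=0)

print(median.shape)
print(median)
```

The first pixel takes the median of three valid values, while the second uses only the two valid ones.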

Stacking and compositing can be combined using the `stack()` method with the flatten parameter. This uses the `groupby()` method to form a partitioning of the image list into multiple image lists of 1 or more images. Each sub-list is rastered as a composite (mosaic), and the multiple resulting mosaics are stacked. Note in this case that the first dimension of the resulting 4D array is equal to the number of different groups resulting from the flatten operation, and not the number of images in the original ImageCollection.

In this example, we will group the images by the acquisition month. As there is at least one image each month, we end up with twelve partitioned image lists. Thus the resulting stack ends up with twelve mosaics. Note that the flatten operation preserves the original ordering of images within each group, so that if the original image collection is sorted by increasing acquired date, each mosaic will again represent “most recent image wins”.

In [None]:
for month, sublist in images.groupby(lambda i: i.acquired.month):
    print(f"Month {month:02} Images {sublist}")

In [None]:
data = images.stack(
    "red green blue", resolution=120, flatten=lambda i: i.acquired.month
)
data.shape

In [None]:
display(*data, title=[f"{m+1:02d}/2021" for m in range(data.shape[0])], ncols=2)

`ImageCollection` also supports downloading. The `download()` method rasters and saves a separate GeoTIFF file for each image in the image collection (but all using the same geocontext), while the `download_mosaic()` method composites the images in the `ImageCollection` just like the `mosaic()` method but results in a single GeoTIFF file rather than an ndarray. The names of the resulting files are generated by default but can also be set explicitly. See the [API documentation](https://docs.descarteslabs.com/descarteslabs/catalog/docs/image.html#descarteslabs.catalog.ImageCollection) for further information.

## Common Rastering parameters
Many of the rastering methods accept a common set of parameters including `geocontext`, `resolution`, `processing_level`, `scaling`, `data_type`, and `progress`. These parameters are treated consistently across the different methods, and merit some explanation and examples.

### `geocontext` and `resolution`
`Image` and `ImageCollection` objects have a default geocontext associated with them. The `Image.geocontext` attribute represents the geometry of the image, while the `ImageCollection.geocontext` attribute represents the geocontext used in the search that generated the collection, if any. If the geocontext parameter to a rastering method is not specified, this corresponding geocontext of the image or collection will be used by default. The resolution parameter can be used to override the resolution of the geocontext (whether defaulted or explicitly provided).

### `processing_level`
The `processing_level` parameter allows the selection of different processing levels (e.g. `toa_reflectance` or `surface_reflectance`) supported by a product and its bands. When specifying a non-default processing level, the resulting data will often have a different data type and scaling than the raw image data. You must consult the `processing_levels` attribute to determine what processing levels a band supports.

### `scaling` and `data_type`
When band raster data is retrieved, it can be scaled and converted to a variety of data types as required by the user. When neither of these parameters is provided, the original band data (or the selected `processing_level`) is copied into the result without change, while the resulting data type is automatically selected based on the data types of the bands in order to hold all the data without loss of precision or range.

However, the user may specify several different alternative treatments of the band data. One of four automated scaling modes can be specified which direct the operation to rescale the pixel values in each band according to either the range of data in the image or ranges defined in the band attributes and targeting an appropriate output data type.

The raw mode is equivalent to no scaling: the data is preserved as is (after applying any `processing_level`), and the output data type is selected to hold all the band data without loss of precision or range.
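
To illustrate what "hold all the band data without loss of precision or range" can mean in practice, the sketch below uses NumPy's own type promotion rules. This is an assumption made for illustration: the rastering engine may choose output types by different rules, and `common_dtype` is a hypothetical helper, not part of the library.

```python
import numpy as np


def common_dtype(band_dtypes):
    """Return the smallest NumPy dtype that can represent every listed dtype."""
    return np.result_type(*band_dtypes)


print(common_dtype([np.uint8, np.uint16]))
print(common_dtype([np.uint16, np.int16]))
print(common_dtype([np.uint16, np.float32]))
```

Mixing unsigned and signed 16-bit bands needs a wider signed type to cover both ranges, which is why combining bands can produce a larger data type than any single band uses.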