# Cerulean API Documentation

For most users, we recommend using the [Cerulean web application](https://cerulean.skytruth.org/), which provides a visual interface for exploring the complete set of Cerulean data layers.

For users who want to access and download oil slick detection data directly, we provide free programmatic access through an OGC-compliant API ([api.cerulean.skytruth.org](https://api.cerulean.skytruth.org)). Currently, you can download oil slick detections along with the data used to identify potential sources: vessels and offshore oil platform locations (excluding AIS tracks, which are only accessible via the UI). API queries can be made programmatically (e.g. with curl or Python's `requests` library) for direct data access and download. You can also run a query in a browser by pasting it into the address bar; the browser will then either show the results of your query, including a helpful paginated map, or download the data directly.

Below, we provide some working examples of common data queries from our API. This is only a small sample of the types of queries that are possible. To dig deeper, please see our full API docs and check out the current documentation for [tipg](https://developmentseed.org/tipg/) and [CQL-2](https://cran.r-project.org/web/packages/rstac/vignettes/rstac-02-cql2.html), both of which are used by our API.

In [None]:
%%capture
!pip install contextily

In [None]:
import requests
import geopandas as gpd
import contextily as ctx
import matplotlib.pyplot as plt

In [None]:
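def query_to_gdf_vis(data):
    """Convert a GeoJSON response to a GeoDataFrame and plot it on a basemap."""
    # A FeatureCollection has a "features" list; a single Feature does not.
    if "features" in data:
        gdf = gpd.GeoDataFrame.from_features(data["features"])
    else:
        gdf = gpd.GeoDataFrame.from_features([data])

    # Label the coordinates as WGS84, then reproject to Web Mercator for the basemap.
    gdf = gdf.set_crs("EPSG:4326")
    gdf = gdf.to_crs(epsg=3857)
    ax = gdf.plot(figsize=(10, 10), alpha=0.5, edgecolor="k")
    ctx.add_basemap(
        ax, source=ctx.providers.OpenStreetMap.Mapnik, crs=gdf.crs.to_string()
    )
    plt.show()
    return gdf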



# Example 1: Return slicks within a bounding box

For our first example, let's return slick detections found within a specific geographic area. To do this, use the bounding box (`bbox`) parameter. For example, the query below returns model detections in the Mediterranean Sea, using this `bbox` parameter as input:

```
?bbox=10.9,42.3,19.7,36.1
```




**NOTE:** In our examples we use a `limit` parameter to cap the number of entries returned from a query. If unspecified, requests default to `&limit=10`; the maximum is 9999. To paginate, add an `offset` parameter such as `&offset=60`, which skips the first 60 rows and returns entries from row 61 onwards.


In [None]:
example_1_url = (
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items"  # This is the endpoint for the slick data
    "?limit=100"  # This limits the number of entries returned to 100
    "&bbox=10.9,42.3,19.7,36.1"  # This is the bbox for a section of the Mediterranean Sea
)

data = requests.get(example_1_url).json()  # This is the response from the API
# `data` should be a fully formed GeoJSON FeatureCollection

gdf = query_to_gdf_vis(data)  # This is the visualization of the data

You can also inspect the metadata of the response to see how many items in total match the filters you provided. This value is independent of the limit requested.

In [None]:
print("Total Matched:", data["numberMatched"])
print("Total Returned:", data["numberReturned"])
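
The `limit`, `offset`, and `numberMatched` values described above can be combined to page through a full result set. The following is a minimal sketch, not part of the Cerulean tooling: `fetch_all_features` and its `fetch_json` argument are hypothetical names, and the base URL passed in is assumed not to contain `limit` or `offset` already.

```python
def fetch_all_features(base_url, fetch_json, page_size=100, max_pages=50):
    """Collect features across pages using the limit and offset parameters.

    fetch_json is any callable that takes a URL and returns the parsed JSON
    response, e.g. lambda url: requests.get(url).json().
    """
    features = []
    separator = "&" if "?" in base_url else "?"
    for page in range(max_pages):
        offset = page * page_size
        url = f"{base_url}{separator}limit={page_size}&offset={offset}"
        data = fetch_json(url)
        page_features = data.get("features", [])
        features.extend(page_features)
        # Stop when a page comes back empty or every matched item is collected.
        if not page_features or len(features) >= data.get("numberMatched", 0):
            break
    return {"type": "FeatureCollection", "features": features}


# Possible usage (requires network access):
# all_data = fetch_all_features(
#     "https://api.cerulean.skytruth.org/collections/public.slick_plus/items"
#     "?bbox=10.9,42.3,19.7,36.1",
#     lambda url: requests.get(url).json(),
# )
```

The `max_pages` cap is a safety limit chosen for this sketch so that a broad query cannot loop indefinitely.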

To explore the returned data, we can peek inside the dataframe and see which fields are included. For full documentation, see our [standard API docs](https://api.cerulean.skytruth.org/).

In [None]:
gdf.head()

1. **geometry**: A multipolygon representing a slick on the water's surface.
2. **aoi_type_1_ids**: A list of Exclusive Economic Zones (EEZs) that the slick intersects.
3. **aoi_type_2_ids**: A list of International Hydrographic Organization (IHO) Sea Areas that the slick intersects.
4. **aoi_type_3_ids**: A list of Marine Protected Areas (MPAs) that the slick intersects.
5. **area**: An estimate of the total area of the slick in square meters.
6. **cls**: An integer representing the model's best guess of slick classification (not to be interpreted as reliable ground truth).
7. **fill_factor**: A metric indicating how rectangular the slick is.
8. **hitl_cls**: An integer representing the slick classification provided through internal human review. Corresponds to classes defined in `public.cls`.
9. **hitl_cls_name**: The label corresponding to the human reviewed classification of the slick (`hitl_cls`).
10. **id**: A unique identifier for the slick record.
11. **centerlines**: A computed feature collection of linestrings that trace along the center of each slick multipolygon, reducing linear slicks to 1D geometries.
12. **length**: An estimate of the length of the slick in meters.
13. **linearity**: A measure of how linear the slick's shape is.
14. **aspect_ratio_factor**: A measure of the aspect ratio of the slick (its area divided by its centerline length squared).
15. **machine_confidence**: A score from the slick detection model regarding its confidence in the detection. It is used as a rough estimate of detection accuracy.
16. **orchestrator_run**: Identifier for the orchestrator run that processed this slick detection. Typically one per Sentinel-1 Scene.
17. **perimeter**: The perimeter length of the slick in meters.
18. **polsby_popper**: A compactness measure of the slick's shape based on the Polsby-Popper metric.
19. **s1_scene_id**: Sentinel-1 Scene identifier associated with the slick detection.
20. **slick_timestamp**: The timestamp indicating when the slick was detected.
21. **slick_url**: A URL to quickly view the slick detection in the Cerulean UI.
22. **source_type_1_ids**: A list of vessels that are proximal to the slick.
23. **source_type_2_ids**: A list of pieces of infrastructure that are proximal to the slick.
24. **source_type_3_ids**: A list of dark vessels that are proximal to the slick.
25. **max_source_collated_score**: The highest Source Collated Score (aka. Slick-Source Match) for the slick, indicating its adjacency to anthropogenic sources and therefore our strongest indicator that the slick detection is a true positive oil slick (prior to human review).

# Example 2: High-grade results and filter by datetime

For our next example, let’s add a datetime filter to return slick detection data from 2024, sorted by Source Collated Score (aka. Slick-Source Match). To do this, we specify a sorting parameter `&sortby=-max_source_collated_score` (the leading minus sign makes the sort descending) and provide a start and end datetime separated by a slash. The required date format is `YYYY-MM-DDTHH:MM:SSZ`, where the time is in UTC (which matches the timezone used in the S1 imagery naming convention).

**Note:** Source Collated Score (aka. Slick-Source Match) is a score produced from the slick attribution model representing its estimated likelihood that a given slick detection can be attributed to anthropogenic sources. While useful as a proxy, it should not be interpreted as a ground truth or a guaranteed measure of correctness. Rather, it is used primarily to reduce the number of false detections returned by the query. 

We strongly recommend that users consider slicks where `max_source_collated_score > 0` as credible oil, and use discretion when considering slicks with lower values.

In [None]:
example_2_url = (
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items"
    "?limit=100"
    "&bbox=10.9,42.3,19.7,36.1"
    "&datetime=2024-01-01T00:00:00Z/2025-01-01T00:00:00Z"  # Limit results to a specific date range
    "&filter=max_source_collated_score > 0"  # Keep only slicks with a positive Source Collated Score
    "&sortby=-max_source_collated_score"  # Sort by Source Collated Score in descending order
)

data = requests.get(example_2_url).json()
gdf = query_to_gdf_vis(data)
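
If your date range starts out as Python `datetime` objects, the `YYYY-MM-DDTHH:MM:SSZ` interval string can be built rather than typed by hand. This is a small sketch; `to_api_timestamp` and `datetime_interval` are hypothetical helper names, and naive datetimes are assumed to already be in UTC.

```python
from datetime import datetime, timezone


def to_api_timestamp(moment):
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ, converting to UTC first."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def datetime_interval(start, end):
    """Join two timestamps into the start/end form used by the datetime parameter."""
    return f"{to_api_timestamp(start)}/{to_api_timestamp(end)}"


interval = datetime_interval(datetime(2024, 1, 1), datetime(2025, 1, 1))
print("&datetime=" + interval)
# prints: &datetime=2024-01-01T00:00:00Z/2025-01-01T00:00:00Z
```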

# Example 3: Other basic filtering

Our API also allows you to filter results using various properties of the slick detection data. For example, let’s repeat the query from example 1, but limit results to detections with a `max_source_collated_score` greater-than-or-equal-to (GTE) 0.0, and an `area` greater than (GT) 5 square km:

In [None]:
example_3_url = (
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items"
    "?limit=100"
    "&bbox=10.9,42.3,19.7,36.1"
    "&datetime=2024-01-01T00:00:00Z/2025-01-01T00:00:00Z"
    "&sortby=slick_timestamp"  # sort by slick timestamp
    "&filter=max_source_collated_score GTE 0.0 AND area GT 5000000"  # filter by max_source_collated_score greater than or equal to 0.0 and area greater than 5000000 square meters (5 square kilometers)
)

data = requests.get(example_3_url).json()
gdf = query_to_gdf_vis(data)

Note that these filter commands include spaces and abbreviated operators such as GTE (greater-than-or-equal-to), which are patterns enabled by CQL-2. There are a large number of fields available for filtering. We’ll cover a few more common examples below.
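
Because filter expressions contain spaces, it can help to assemble and URL-encode them in one place. The sketch below joins several clauses with `AND` and encodes the query string with the standard library; `build_items_url` is a hypothetical helper, not part of the API.

```python
from urllib.parse import quote, urlencode


def build_items_url(collection_url, filters, **params):
    """Combine filter clauses with AND and URL-encode the full query string."""
    query = dict(params)
    if filters:
        query["filter"] = " AND ".join(filters)
    # quote_via=quote encodes spaces as %20; safe="," keeps bbox values readable.
    return f"{collection_url}?{urlencode(query, quote_via=quote, safe=',')}"


url = build_items_url(
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items",
    ["max_source_collated_score GTE 0.0", "area GT 5000000"],
    limit=100,
    bbox="10.9,42.3,19.7,36.1",
)
print(url)
```

Keeping each clause as a separate list entry makes it easy to add or drop conditions without editing one long string.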

# Example 4: Filter by source

For higher-confidence slicks detected by Cerulean, we apply a second model that finds any vessels or offshore oil infrastructure recorded in the vicinity of those slicks. Let’s repeat our query from example 1, but limit the results to slicks with a possible *vessel* source identified nearby (excluding potential infrastructure sources).

In [None]:
example_4_url = (
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items"
    "?limit=100"
    "&bbox=10.9,42.3,19.7,36.1"
    "&sortby=slick_timestamp"
    "&datetime=2024-01-01T00:00:00Z/2025-01-01T00:00:00Z"
    "&filter=max_source_collated_score GTE 0.0 AND area GT 5000000 AND NOT source_type_1_ids IS NULL"
)

data = requests.get(example_4_url).json()
gdf = query_to_gdf_vis(data)

This one is a little complicated.

`NOT source_type_1_ids IS NULL`

This command returns slicks where Cerulean has identified at least one potential source of type 1 (vessel). Those slicks may or may not also have potential sources of the other types: type 2 (infrastructure) or type 3 (dark vessels). The syntax is a little confusing because of the double negative, but `NOT source_type_1_ids IS NULL` tells the API to fetch all slicks where the `source_type_1_ids` (vessels) field has at least one entry.
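
Since the query above does not exclude slicks that also have infrastructure sources, one option is to narrow the result locally. The sketch below keeps only features with a vessel source and no infrastructure source. It assumes the `source_type_*_ids` fields arrive in the GeoJSON as lists or null; `vessel_only` is a hypothetical helper name.

```python
def vessel_only(data):
    """Keep features with at least one vessel source and no infrastructure source."""
    kept = [
        feat
        for feat in data.get("features", [])
        # bool() treats both None and an empty list as "no entries".
        if bool(feat["properties"].get("source_type_1_ids"))
        and not bool(feat["properties"].get("source_type_2_ids"))
    ]
    return {"type": "FeatureCollection", "features": kept}


# Possible usage with the response from example 4:
# vessel_data = vessel_only(data)
```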

# Example 5: Download data

To download the results of a query directly to a file, we recommend using curl. The default file type is GeoJSON.

In [None]:
import urllib.parse

example_5_url = (
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items"
    "?limit=100"
    "&bbox=10.9,42.3,19.7,36.1"
    "&sortby=slick_timestamp"
    "&datetime=2024-01-01T00:00:00Z/2025-01-01T00:00:00Z"
    "&filter=max_source_collated_score GTE 0.0 AND area GT 5000000 AND NOT source_type_1_ids IS NULL"
)

encoded_url = urllib.parse.quote(
    example_5_url, safe=":/?=&"
)  # handle special characters in the URL

!curl "{encoded_url}" -o /content/example_5.geojson # download the geojson

If you prefer a CSV, you can append `&f=csv` to the query to indicate the preferred filetype, like this:

In [None]:
import urllib.parse

encoded_url = urllib.parse.quote(
    example_5_url + "&f=csv", safe=":/?=&"
)  # handle special characters in the URL

!curl "{encoded_url}" -o /content/example_5.csv # download the csv

You can always check out any particular slick of interest in our online app to get additional context, like the S1 image that generated it, or the actual paths that nearby vessels took. Simply grab the entry from the results in the column named "slick_url", which will look like this:

https://cerulean.skytruth.org/slicks/3582918?ref=api&slick_id=3582918
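
If you need the numeric slick ID on its own, it can be parsed out of a `slick_url` like the one above. This is a minimal sketch based only on the URL shape shown here; `slick_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def slick_id_from_url(slick_url):
    """Extract the slick ID from a slick_url, preferring the slick_id query parameter."""
    parsed = urlparse(slick_url)
    query_ids = parse_qs(parsed.query).get("slick_id")
    if query_ids:
        return int(query_ids[0])
    # Fall back to the last path segment, e.g. /slicks/3582918
    return int(parsed.path.rstrip("/").split("/")[-1])


slick_id = slick_id_from_url(
    "https://cerulean.skytruth.org/slicks/3582918?ref=api&slick_id=3582918"
)
print(slick_id)  # prints 3582918
```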

# Example 6: Return a specific slick by its ID

If you know which slick you want to pull from the API - let’s say it’s slick `3582918` from above - you can fetch it using a query like this:

In [None]:
example_6_url = (
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items?id=3582918"
)

data = requests.get(example_6_url).json()
gdf = query_to_gdf_vis(data)

In [None]:
gdf

# Example 7: Return all slicks detected in a specific Sentinel-1 scene

If you want to return all slick detections in a specific Sentinel-1 scene, use a query like this:

In [None]:
example_7_url = (
    "https://api.cerulean.skytruth.org/collections/public.slick_plus/items"
    "?s1_scene_id=S1A_IW_GRDH_1SDV_20240113T165611_20240113T165636_052091_064BBD_918F"
)

data = requests.get(example_7_url).json()
gdf = query_to_gdf_vis(data)

# Example 8: Filter by Exclusive Economic Zone (EEZ), IHO Sea Area, or Marine Protected Area (MPA)

Cerulean keeps track of the world's EEZs, IHOs, and MPAs using a unique AOI ID that has been assigned to each. To filter slicks by one of these areas of interest, you first need to find its `aoi_id` by querying the `public.aoi`, `public.aoi_eez`, `public.aoi_iho`, or `public.aoi_mpa` tables. Once you have an `aoi_id`, you can find slick detections based on the queryable fields `aoi_type_1_ids` (for EEZs), `aoi_type_2_ids` (for IHOs), or `aoi_type_3_ids` (for MPAs).

## Search for an AOI based on name
Let's query the `public.aoi` table and explore the result to find an `aoi_id` associated with the Greek EEZ.

In [None]:
example_aoi_name = (
    "https://api.cerulean.skytruth.org/collections/public.aoi/items"  # This is the endpoint for all AOI data
    "?filter=LOWER(name) LIKE '%greek%'"  # filter by name
)

data = requests.get(example_aoi_name).json()
print("Number of results:", len(data["features"]))
print(data["features"][0]["properties"])
aoi_id = data["features"][0]["properties"]["id"]

Note that the `id` field here is the `aoi_id` necessary for further filtering. The `type` field indicates whether it is an `aoi_type_1_id` (for EEZs), an `aoi_type_2_id` (for IHOs), or an `aoi_type_3_id` (for MPAs).

## Search for an EEZ based on MRGID ([Marine Regions Gazetteer ID](https://www.marineregions.org/eezsearch.php))
Let's query the `public.aoi_eez` table and explore the result to find an `aoi_id` associated with the Greek EEZ. Its MRGID is `5679`.

In [None]:
example_aoi_mrgid = (
    "https://api.cerulean.skytruth.org/collections/public.aoi_eez/items"  # This is the endpoint for the EEZ data
    "?mrgid=5679"  # filter by MRGID for the Greek EEZ
)
data = requests.get(example_aoi_mrgid).json()
print(data["features"][0]["properties"])
aoi_id = data["features"][0]["properties"]["aoi_id"]

## Return results for a specific EEZ

Now that we have an `aoi_id`, we can query for slicks associated with it.

In [None]:
example_8_url = (
    "https://api.cerulean.skytruth.org/collections/public.get_slicks_by_aoi/items"  # This is the endpoint for the slick data filtered by aoi
    "?limit=100"
    f"&aoi_id={aoi_id}"  # filter by aoi_id calculated above
    "&collation_threshold=0"
)
data = requests.get(example_8_url).json()
gdf = query_to_gdf_vis(data)

Similarly you can search for the WDPAID ([World Database on Protected Areas ID](https://www.protectedplanet.net/en/thematic-areas/wdpa?tab=WDPA)) in the `public.aoi_mpa` table. 

**NOTE:** Not all geometries are true oil detections. It is important to verify the validity of the detections using the original Sentinel-1 imagery that the data was derived from. We recommend using the Cerulean UI to do this.

# Example 9: Iterate Through Sources
In this example, we iterate through two sources (with `mmsi_or_structure_id` values `477932400` and `372519000`) and query the `public.source_plus` collection for slicks associated with each source. Each query filters results to only include records where `source_rank=1` and `source_collated_score>0.0`. 

In [None]:
data_dict = {}
for source_id in [372519000, 477932400]:
    example_9a_url = (
        "https://api.cerulean.skytruth.org/collections/public.source_plus/items?"
        f"mmsi_or_structure_id={source_id}"
        "&filter=source_collated_score GT 0.0 AND source_rank EQ 1"
    )
    data = requests.get(example_9a_url).json()
    data_dict[source_id] = data
query_to_gdf_vis(data_dict[372519000])

1. **geometry**: A multipolygon representing a slick on the water's surface.
2. **git_tag**: Version record (e.g. 1.1.0) indicating major and minor changes to the underlying codebase, and therefore representing compatibility between scores.
3. **mmsi_or_structure_id**: The numeric name used to refer to a specific vessel (MMSI) or piece of infrastructure (structure_id). The Structure ID values are defined by Global Fishing Watch.
4. **slick_confidence**: A score from the slick detection model regarding its confidence in the detection. It is used as a rough estimate of detection accuracy.
5. **id**: A unique identifier for the slick record.
6. **slick_url**: A link used to look at the indicated slick in the User Interface.
7. **source_collated_score**: A relative score between -5 and +5 indicating how strongly we believe a potential source is responsible for a given slick. We recommend typically looking at values greater than 0.
8. **source_rank**: An integer indicating the order of potential sources. E.g. the top 3 likely potential sources have rank 1, 2, and 3 respectively.
9. **source_type**: A key used to distinguish between sources that are VESSEL, INFRA, DARK, or NATURAL. Can be used to distinguish between `MMSI` and `structure_id` if there is ambiguity.
10. **source_url**: A link used to look at all slicks associated with the indicated source in the User Interface.
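
When a response contains several candidate sources per slick, the `id` and `source_rank` fields described above can be used to keep only the most likely source for each slick locally. The following is a sketch under the assumption that both fields appear in each feature's `properties`; `best_source_per_slick` is a hypothetical helper name.

```python
def best_source_per_slick(data):
    """Map each slick id to the properties of its top-ranked (lowest rank) source."""
    best = {}
    for feat in data.get("features", []):
        props = feat["properties"]
        slick_id = props["id"]
        current = best.get(slick_id)
        # Rank 1 is the most likely source, so a smaller rank wins.
        if current is None or props["source_rank"] < current["source_rank"]:
            best[slick_id] = props
    return best


# Possible usage with a response from public.source_plus:
# top_sources = best_source_per_slick(data)
```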


### Filtering the JSON Locally by git_tag
First, we compute the highest `git_tag` present in our existing JSON data using the helper function `highest_git_tag`. Then we filter the features already in that JSON so that only those with a `git_tag` greater than or equal to `1.1.0` remain. Finally, we visualize the filtered data using `query_to_gdf_vis`.


Note that the higher the `git_tag`, the more robust the results can be expected to be. We recommend starting at `git_tag >= 1.1.0`, based on breaking changes introduced after `1.0.11`.

In [None]:
def highest_git_tag(data):
    # Convert each git_tag (e.g., "1.1.0") to a tuple of integers for comparison.
    tags = [
        tuple(map(int, feat["properties"]["git_tag"].split(".")))
        for feat in data.get("features", [])
        if "git_tag" in feat["properties"]
    ]
    return ".".join(map(str, max(tags))) if tags else None


# Example usage: compute the highest git_tag in `data`, which holds the response
# from the last query in the loop above.
high_tag = highest_git_tag(data)
print("Highest git_tag for previous query:", high_tag)

# Now filter the JSON already returned, keeping features with git_tag >= 1.1.0.
filtered_data = {
    "type": "FeatureCollection",
    "features": [
        feat
        for feat in data.get("features", [])
        if "git_tag" in feat["properties"]
        and tuple(map(int, feat["properties"]["git_tag"].split(".")))
        >= tuple(map(int, "1.1.0".split(".")))
    ],
}

# Visualize the filtered results.
query_to_gdf_vis(filtered_data)

# Example 10: Locating a Dark Vessel polluter
You can return a set of potential polluters in a given bounding box and date range. For example, we can specify `&source_type=DARK` to search for pollution events that are tied to anonymous vessels which don't report their location, also known as dark vessels. We'll use the same bounding box for the Mediterranean Sea which was used above.

We'll also filter for sources with `source_collated_score` greater than `0.5` to return only strongly associated examples.

Note: We recommend collated scores above 0 for default searches. If you need higher-likelihood slick-to-source matching, then we recommend increasing to 0.5 or even 1.0. You should be aware that the tradeoff at higher values is that while the percentage of slicks showing true matches will increase, the absolute number will decrease (effectively sacrificing recall in favor of precision). Conversely, if you are willing to handle more false positives, you can lower the number to -0.5 or -1.0 to cast a wider net.

In [None]:
example_10_url = (
    "https://api.cerulean.skytruth.org/collections/public.source_plus/items"
    "?bbox=10.9,42.3,19.7,36.1"
    "&datetime=2025-05-01T00:00:00Z/2025-05-09T00:00:00Z"
    "&source_type=DARK"
    "&filter=source_collated_score GT 0.5"
    "&sortby=-source_collated_score"
)
data = requests.get(example_10_url).json()
gdf = query_to_gdf_vis(data)
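
To get a feel for the precision and recall tradeoff described above, you can count how many returned features would survive at each candidate threshold. This sketch works on JSON already in memory and is only informative when the original query did not already filter on `source_collated_score`; `count_by_threshold` is a hypothetical helper name.

```python
def count_by_threshold(data, thresholds=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Count features whose source_collated_score exceeds each threshold."""
    scores = [
        feat["properties"]["source_collated_score"]
        for feat in data.get("features", [])
        if feat["properties"].get("source_collated_score") is not None
    ]
    # A stricter (higher) threshold can only keep the same number of features or fewer.
    return {t: sum(score > t for score in scores) for t in thresholds}


# Possible usage with a response from public.source_plus:
# print(count_by_threshold(data))
```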

### Verifying the oil
We can visually validate the results by going to the link provided in the `slick_url` field. This will launch the slick in the Cerulean UI alongside the corresponding Sentinel-1 Imagery.

In [None]:
gdf["slick_url"].values

# Conclusion
We hope this summary helps you get started with Cerulean’s API. This is a small sample of the data queries that are currently possible with Cerulean’s API. For full documentation, please see our [standard API docs](https://api.cerulean.skytruth.org/).

# How to Cite
SkyTruth Cerulean API. (n.d.). Query: [Brief description of your query]. Retrieved [Month DD, YYYY] from https://api.cerulean.skytruth.org/