In [2]:
import datetime
import pandas as pd
import pprint
import pyaurorax

aurorax = pyaurorax.PyAuroraX()

# Advanced metadata filter searching

When interacting with the AuroraX search engine, you can use metadata filters to further refine your searches. Other example notebooks have already shown a few cases of this, such as limiting results to times when spacecraft are in certain regions or when an ML model believes an ASI is not cloudy. In this example notebook, we'll explore the full range of metadata filter options available to you when constructing search requests.

First up, a reminder. An important part of using the metadata filters in the AuroraX search engine is knowing the available keys and values. Each data source record has two attributes for this: `ephemeris_metadata_schema` and `data_products_metadata_schema`. The 'ephemeris' schema is used for conjunction and ephemeris searching, and the 'data products' schema is used for data product searching.

In [3]:
# The data sources are what we use for search queries. We list some below,
# and in the following search queries in this notebook, we utilize this
# information for the program, platform, instrument type fields.

# let's list the first 20 data sources just to get a table view of a few
aurorax.search.sources.list_in_table(limit=20)

# the below line gets all data sources, which we'll use later to explore the
# available metadata filters
sources = aurorax.search.sources.list()

Identifier   Program      Platform       Instrument Type    Source Type   Display Name   
3            swarm        swarma         footprint          leo           Swarm A        
29           swarm        swarmb         footprint          leo           Swarm B        
30           swarm        swarmc         footprint          leo           Swarm C        
32           epop         epop           footprint          leo           ePOP           
33           themis       themisa        footprint          heo           THEMIS-A       
34           themis       themisb        footprint          heo           THEMIS-B       
35           themis       themisc        footprint          heo           THEMIS-C       
36           themis       themisd        footprint          heo           THEMIS-D       
37           themis       themise        footprint          heo           THEMIS-E       
38           arase        arase          footprint          heo           Arase          
39        

In [4]:
# using the data source listing that we retrieved further above, let's
# have a look at one of the records
#
# for the first data source, print only the first metadata filter info
print(sources[0].program, sources[0].platform, sources[0].instrument_type)
pprint.pprint(sources[0].ephemeris_metadata_schema[0])  # type: ignore

swarm swarma footprint
{'allowed_values': ['north polar cap',
                    'north cusp',
                    'north cleft',
                    'north auroral oval',
                    'north mid-latitude',
                    'low latitude'],
 'data_type': 'string',
 'description': 'Region based on where the magnetic field line that passes '
                "through the spacecraft intersects the Earth's surface in the "
                "Earth's northern magnetic hemisphere",
 'field_name': 'nbtrace_region',
 'searchable': True}


Above, we see just one of the metadata filters we can use for the Swarm-A spacecraft. We'll leave it up to you from here to explore the additional filters for Swarm, and the available filters for any other data source.
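If you'd like a starting point for that exploration, here is a minimal sketch that summarizes a schema into its searchable keys and allowed values. It assumes every schema entry is a dictionary shaped like the one printed above; `summarize_schema` is our own helper (not part of PyAuroraX), and the second sample entry is hypothetical.

```python
# Sample schema entries shaped like the record printed above. The second entry
# is hypothetical and exists only to show that non-searchable fields are skipped.
sample_schema = [
    {
        "field_name": "nbtrace_region",
        "data_type": "string",
        "searchable": True,
        "allowed_values": [
            "north polar cap",
            "north cusp",
            "north cleft",
            "north auroral oval",
            "north mid-latitude",
            "low latitude",
        ],
    },
    {
        "field_name": "example_internal_field",  # hypothetical field name
        "data_type": "string",
        "searchable": False,
        "allowed_values": [],
    },
]


def summarize_schema(schema):
    """Map each searchable field name to its data type and allowed values."""
    summary = {}
    for entry in schema or []:  # tolerate a missing (None) schema
        if not entry.get("searchable", False):
            continue
        summary[entry["field_name"]] = {
            "data_type": entry.get("data_type"),
            "allowed_values": entry.get("allowed_values") or [],
        }
    return summary


schema_summary = summarize_schema(sample_schema)
for field_name, info in schema_summary.items():
    print(field_name, info["data_type"], info["allowed_values"])
```

With real data, you would pass `sources[0].ephemeris_metadata_schema` in place of `sample_schema`.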

If you prefer to look at all the available metadata filters in a web browser instead, you can head on over to the [AuroraX Conjunction Search webpage](https://aurorax.space/conjunctionSearch/standard). Select your data source(s), and click on the '+' icon for metadata filters, and a modal will pop up. All metadata filters for the selected data sources are displayed in the modal.

Now that we know a bit more about how the data sources come into play with the search engine, let's do a conjunction search with a simple metadata filter.

In [5]:
# search for conjunctions between any THEMIS-ASI instrument, and any Swarm
# spacecraft where the north B-trace region is 'north polar cap'.
#
# NOTE: this region metadata is not derived by AuroraX, but instead by SSCWeb.
# The same is true for several other spacecraft metadata fields.
#
# set timeframe and distance
start = datetime.datetime(2019, 2, 1, 0, 0, 0)
end = datetime.datetime(2019, 2, 10, 23, 59, 59)
distance = 500

# set ground criteria block
ground = [aurorax.search.GroundCriteriaBlock(programs=["themis-asi"])]

# set space criteria block, with a metadata filter
expression1 = aurorax.search.MetadataFilterExpression(key="nbtrace_region", values="north polar cap", operator="=")
metadata_filter = aurorax.search.MetadataFilter(expressions=[expression1])
space = [aurorax.search.SpaceCriteriaBlock(programs=["swarm"], metadata_filters=metadata_filter)]

# perform search
s = aurorax.search.conjunctions.search(start, end, distance, ground=ground, space=space, verbose=True)

[2025-01-27 13:19:57.970188] Search object created
[2025-01-27 13:19:58.013216] Request submitted
[2025-01-27 13:19:58.013283] Request ID: 0722a3e6-2532-4411-a8ce-30d888d9155e
[2025-01-27 13:19:58.013310] Request details available at: https://api.aurorax.space/api/v1/conjunctions/requests/0722a3e6-2532-4411-a8ce-30d888d9155e
[2025-01-27 13:19:58.013333] Waiting for data ...
[2025-01-27 13:19:59.539875] Checking for data ...
[2025-01-27 13:19:59.960137] Data is now available
[2025-01-27 13:19:59.960341] Retrieving data ...
[2025-01-27 13:20:00.516462] Retrieved 902.8 kB of data containing 109 records


In [6]:
# output data
#
# NOTE: while here we format the results into a Pandas dataframe, this
# is not required. We actually don't include Pandas as a dependency since
# it's used simply as a nice add-on to view data. If you're good with slicing
# and dicing lists and dictionaries, you'll be fine without it.
conjunctions = [c.__dict__ for c in s.data]
df = pd.DataFrame(conjunctions)
df.sort_values("start")[0:10]

Unnamed: 0,conjunction_type,start,end,data_sources,min_distance,max_distance,events,closest_epoch,farthest_epoch
89,nbtrace,2019-02-01T02:20:00,2019-02-01T02:21:00,"[DataSource(identifier=53, program='themis-asi...",153.98789,371.020379,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-01T02:20:00,2019-02-01T02:21:00
100,nbtrace,2019-02-01T06:30:00,2019-02-01T06:31:00,"[DataSource(identifier=53, program='themis-asi...",314.7668,382.560968,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-01T06:30:00,2019-02-01T06:31:00
41,nbtrace,2019-02-01T06:30:00,2019-02-01T06:31:00,"[DataSource(identifier=53, program='themis-asi...",385.245792,398.072324,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-01T06:30:00,2019-02-01T06:31:00
33,nbtrace,2019-02-01T06:32:00,2019-02-01T06:32:00,"[DataSource(identifier=51, program='themis-asi...",438.413051,438.413051,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-01T06:32:00,2019-02-01T06:32:00
79,nbtrace,2019-02-01T06:32:00,2019-02-01T06:32:00,"[DataSource(identifier=51, program='themis-asi...",358.971769,358.971769,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-01T06:32:00,2019-02-01T06:32:00
6,nbtrace,2019-02-01T08:04:00,2019-02-01T08:04:00,"[DataSource(identifier=47, program='themis-asi...",361.353685,361.353685,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-01T08:04:00,2019-02-01T08:04:00
53,nbtrace,2019-02-01T08:04:00,2019-02-01T08:04:00,"[DataSource(identifier=47, program='themis-asi...",424.929958,424.929958,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-01T08:04:00,2019-02-01T08:04:00
81,nbtrace,2019-02-02T02:00:00,2019-02-02T02:01:00,"[DataSource(identifier=53, program='themis-asi...",282.989015,435.935278,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-02T02:01:00,2019-02-02T02:00:00
65,nbtrace,2019-02-02T02:00:00,2019-02-02T02:00:00,"[DataSource(identifier=51, program='themis-asi...",466.658885,466.658885,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-02T02:00:00,2019-02-02T02:00:00
103,nbtrace,2019-02-02T05:50:00,2019-02-02T05:51:00,"[DataSource(identifier=53, program='themis-asi...",167.907825,266.571666,"[{'conjunction_type': 'nbtrace', 'e1_source': ...",2019-02-02T05:51:00,2019-02-02T05:50:00
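As the note in the cell above says, Pandas is optional. A rough sketch of the same sort-and-slice using only lists and dictionaries follows. The sample records are hypothetical stand-ins for `c.__dict__`, reusing a few values from the table above, and the threshold of 200 is an arbitrary illustration.

```python
# Hypothetical stand-ins for `c.__dict__`, using a few values from the table above
sample_conjunctions = [
    {"start": "2019-02-01T06:30:00", "min_distance": 314.7668, "max_distance": 382.560968},
    {"start": "2019-02-01T02:20:00", "min_distance": 153.98789, "max_distance": 371.020379},
    {"start": "2019-02-02T05:50:00", "min_distance": 167.907825, "max_distance": 266.571666},
    {"start": "2019-02-01T08:04:00", "min_distance": 361.353685, "max_distance": 361.353685},
]

# Equivalent of df.sort_values("start")[0:10]; ISO 8601 strings sort chronologically
earliest = sorted(sample_conjunctions, key=lambda c: c["start"])[:10]

# Keep only conjunctions whose minimum distance is under 200
# (same units as the `distance` search parameter)
close_passes = [c for c in earliest if c["min_distance"] < 200]

for c in close_passes:
    print(c["start"], round(c["min_distance"], 1))
```

If the real records hold `datetime` objects in `start` rather than strings, the same `sorted` call works unchanged.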


The available expression `operator` values are integrated into the library using type hints. VSCode and other editors that support autocomplete and linting for types will point out the possible choices quite easily. 

Here's the list of possible operators: `=`, `!=`, `>`, `<`, `>=`, `<=`, `between`, `in`, `not in`

Later on in this notebook, we'll go through many of these operators.
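If your editor doesn't surface the type hints, a small client-side check can catch malformed expressions before a request is sent. The sketch below is our own helper, not a PyAuroraX function. It assumes that `in` and `not in` take a list of values and that `between` takes exactly two values (as described later in this notebook); the library or the API may enforce these rules differently.

```python
# Operators listed above
VALID_OPERATORS = ("=", "!=", ">", "<", ">=", "<=", "between", "in", "not in")


def check_expression(key, values, operator):
    """Return the expression as a dict, or raise ValueError if it looks malformed."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unknown operator {operator!r}; expected one of {VALID_OPERATORS}")

    is_sequence = isinstance(values, (list, tuple))
    # ASSUMPTION: membership operators need a list of candidate values
    if operator in ("in", "not in") and not is_sequence:
        raise ValueError(f"{operator!r} expects a list of values, got {values!r}")
    # 'between' needs a [low, high] pair
    if operator == "between" and not (is_sequence and len(values) == 2):
        raise ValueError(f"'between' expects exactly two values, got {values!r}")

    return {"key": key, "values": values, "operator": operator}


checked = check_expression("nbtrace_region", ["north polar cap", "north auroral oval"], "in")
print(checked)

try:
    check_expression("nbtrace_region", "north polar cap", "==")  # '==' is not a listed operator
except ValueError as err:
    print("rejected:", err)
```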

# Single expression, multiple values

In the above example, the metadata filter has only one expression: the spacecraft's north B-field magnetic footprint must be in the north polar cap. Let's adjust this example to keep a single expression, but allow `nbtrace_region` to match multiple values.

In [7]:
# set timeframe and distance
start = datetime.datetime(2019, 2, 1, 0, 0, 0)
end = datetime.datetime(2019, 2, 10, 23, 59, 59)
distance = 500

# set ground criteria block
ground = [aurorax.search.GroundCriteriaBlock(programs=["themis-asi"])]

# set space criteria block, with a metadata filter
#
# now let's do multiple values for a single key
expression1 = aurorax.search.MetadataFilterExpression(key="nbtrace_region", values=["north polar cap", "north auroral oval"], operator="in")
metadata_filter = aurorax.search.MetadataFilter(expressions=[expression1])
space = [aurorax.search.SpaceCriteriaBlock(programs=["swarm"], metadata_filters=metadata_filter)]

# perform search
s = aurorax.search.conjunctions.search(start, end, distance, ground=ground, space=space, verbose=True)

[2025-01-27 13:27:33.366955] Search object created
[2025-01-27 13:27:33.397910] Request submitted
[2025-01-27 13:27:33.397957] Request ID: 067ff167-9931-4514-9fd6-5390c2ad795b
[2025-01-27 13:27:33.397968] Request details available at: https://api.aurorax.space/api/v1/conjunctions/requests/067ff167-9931-4514-9fd6-5390c2ad795b
[2025-01-27 13:27:33.397979] Waiting for data ...
[2025-01-27 13:27:34.825736] Checking for data ...
[2025-01-27 13:27:36.249177] Checking for data ...
[2025-01-27 13:27:37.663061] Checking for data ...
[2025-01-27 13:27:38.106492] Data is now available
[2025-01-27 13:27:38.106697] Retrieving data ...
[2025-01-27 13:27:39.173522] Retrieved 6.8 MB of data containing 820 records


Notice that the `values` parameter turned into a list, and the `operator` became 'in'. This is how we set an expression for multiple values. The search engine combines the values with a logical OR, so this finds results where any Swarm spacecraft was either in the north polar cap OR in the north auroral oval.

Thinking back to the first example, an expression with a single value can be written in either of the following two ways, and both yield the same results.

Method 1: `expression1 = aurorax.search.MetadataFilterExpression(key="nbtrace_region", values="north polar cap", operator="=")`

Method 2: `expression1 = aurorax.search.MetadataFilterExpression(key="nbtrace_region", values=["north polar cap"], operator="in")`

# Multiple expressions

Let's build off the above example to look at doing searches with multiple expressions. As mentioned above, when doing an expression with multiple values, the search engine evaluates each value using a logical OR. What if we wanted it to evaluate using a logical AND?

We can achieve this using two expressions, each with a single value. The default `operator` for a `MetadataFilter` object (the parent object that expressions go into when creating a search object) is 'AND'. 

Let's adjust the above example to see how to do this.

In [8]:
# set timeframe and distance
start = datetime.datetime(2019, 2, 1, 0, 0, 0)
end = datetime.datetime(2019, 2, 10, 23, 59, 59)
distance = 500

# set ground criteria block
ground = [aurorax.search.GroundCriteriaBlock(programs=["themis-asi"])]

# set space criteria block, with a metadata filter
#
# now let's use two expressions on the same key, combined with AND
expression1 = aurorax.search.MetadataFilterExpression(key="nbtrace_region", values=["north polar cap"], operator="in")
expression2 = aurorax.search.MetadataFilterExpression(key="nbtrace_region", values=["north auroral oval"], operator="in")
metadata_filter = aurorax.search.MetadataFilter(expressions=[expression1, expression2],
                                                operator="and")  # AND is the default; we specify it just to be explicit
space = [aurorax.search.SpaceCriteriaBlock(programs=["swarm"], metadata_filters=metadata_filter)]

# perform search
s = aurorax.search.conjunctions.search(start, end, distance, ground=ground, space=space, verbose=True)

[2025-01-27 13:40:27.189376] Search object created
[2025-01-27 13:40:27.220755] Request submitted
[2025-01-27 13:40:27.220823] Request ID: 5177aba9-4a33-4a99-80df-9a015276c017
[2025-01-27 13:40:27.220841] Request details available at: https://api.aurorax.space/api/v1/conjunctions/requests/5177aba9-4a33-4a99-80df-9a015276c017
[2025-01-27 13:40:27.220855] Waiting for data ...
[2025-01-27 13:40:28.669283] Checking for data ...
[2025-01-27 13:40:29.097743] Data is now available
[2025-01-27 13:40:29.097993] Retrieving data ...
[2025-01-27 13:40:29.561111] Retrieved 5 Bytes of data containing 0 records


You'll notice that we found zero conjunctions! This is a 'duh' moment if we take a step back for a second: a spacecraft cannot be in both the north polar cap AND the north auroral oval at the same time!
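To make the OR-versus-AND behaviour concrete, here is a toy model that evaluates expressions locally against a few made-up records. It only mirrors the semantics described in this notebook and is not how the search engine is implemented. The `matches` helper, its tuple-based expressions, and its `combine` argument are our own inventions for illustration.

```python
def matches(record, expressions, combine="and"):
    """Evaluate (key, values, operator) tuples against one record."""
    results = []
    for key, values, op in expressions:
        if op == "=":
            results.append(record.get(key) == values)
        elif op == "in":
            results.append(record.get(key) in values)  # OR across the listed values
        else:
            raise ValueError(f"operator {op!r} is not handled in this sketch")
    return all(results) if combine == "and" else any(results)


# Made-up records: each one has exactly one region at a given moment
records = [
    {"nbtrace_region": "north polar cap"},
    {"nbtrace_region": "north auroral oval"},
    {"nbtrace_region": "low latitude"},
]

# One expression with two values: polar cap OR auroral oval
one_expression = [("nbtrace_region", ["north polar cap", "north auroral oval"], "in")]

# Two expressions on the same key, combined with AND
two_expressions = [
    ("nbtrace_region", ["north polar cap"], "in"),
    ("nbtrace_region", ["north auroral oval"], "in"),
]

or_count = sum(matches(r, one_expression) for r in records)
and_count = sum(matches(r, two_expressions, combine="and") for r in records)
print(or_count, and_count)
```

Because a record holds a single region value, no record can satisfy both expressions, which is why the AND search returns nothing.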

What if we tweak this to find conjunctions where Swarm was in the north auroral oval, and the TII instrument was collecting data? We have this instrument operating information only for Swarm right now, but maybe we'll have more in the future!

In [9]:
# set timeframe and distance
start = datetime.datetime(2019, 2, 1, 0, 0, 0)
end = datetime.datetime(2019, 2, 10, 23, 59, 59)
distance = 500

# set ground criteria block
ground = [aurorax.search.GroundCriteriaBlock(programs=["themis-asi"])]

# set space criteria block, with a metadata filter
#
# now let's combine expressions on two different keys with AND
expression1 = aurorax.search.MetadataFilterExpression(key="nbtrace_region", values="north auroral oval", operator="=")
expression2 = aurorax.search.MetadataFilterExpression(key="tii_on", values="true", operator="=")
metadata_filter = aurorax.search.MetadataFilter(expressions=[expression1, expression2], operator="and")
space = [aurorax.search.SpaceCriteriaBlock(programs=["swarm"], metadata_filters=metadata_filter)]

# perform search
s = aurorax.search.conjunctions.search(start, end, distance, ground=ground, space=space, verbose=True)

[2025-01-27 13:44:35.141453] Search object created
[2025-01-27 13:44:35.169594] Request submitted
[2025-01-27 13:44:35.169651] Request ID: db0e9b29-db30-4778-b245-89f25785f369
[2025-01-27 13:44:35.169665] Request details available at: https://api.aurorax.space/api/v1/conjunctions/requests/db0e9b29-db30-4778-b245-89f25785f369
[2025-01-27 13:44:35.169678] Waiting for data ...
[2025-01-27 13:44:36.592280] Checking for data ...
[2025-01-27 13:44:37.012342] Data is now available
[2025-01-27 13:44:37.012487] Retrieving data ...
[2025-01-27 13:44:37.745755] Retrieved 3.3 MB of data containing 397 records


Hooray, we found some conjunctions!

Remember that with most conjunction searches, you can view the results directly in Swarm-Aurora using `aurorax.search.conjunctions.swarmaurora.open_in_browser(s)` or `print(aurorax.search.conjunctions.swarmaurora.get_url(s))`. More info can be found in the conjunction searching notebook.


# Exploring numerical expression values and operators

For some metadata filter keys, the values are numeric. For example, the values for the `calgary_cloud_ml_v1` key are a string or list of strings, but the `calgary_cloud_ml_v1_confidence` key is a number between 0 and 100. To integrate these numeric keys into our expressions, we have a few different operators at our disposal: `=`, `!=`, `>`, `<`, `>=`, `<=`, and `between`.

Let's have a look at a simple example using the `>=` operator. We're going to find conjunctions with Swarm where the UCalgary cloud ML model thinks any THEMIS ASI data is not cloudy and that classification has a confidence of >= 75%.

In [3]:
# set timeframe and distance
start = datetime.datetime(2020, 1, 1, 0, 0, 0)
end = datetime.datetime(2020, 1, 15, 23, 59, 59)
distance = 500

# set ground criteria block
ground = [
    aurorax.search.GroundCriteriaBlock(
        programs=["themis-asi"],
        metadata_filters=aurorax.search.MetadataFilter(expressions=[
            # only find records that were classified as not cloudy
            aurorax.search.MetadataFilterExpression("calgary_cloud_ml_v1", "classified as not cloudy", operator="="),

            # with a confidence of at least 75%
            aurorax.search.MetadataFilterExpression("calgary_cloud_ml_v1_confidence", 75, operator=">=")
        ]))
]

# set space criteria block
space = [aurorax.search.SpaceCriteriaBlock(programs=["swarm"], hemisphere=["northern"])]

# perform the search
s = aurorax.search.conjunctions.search(
    start=start,
    end=end,
    distance=distance,
    ground=ground,
    space=space,
    verbose=True,
)

[2025-01-27 14:01:43.573001] Search object created
[2025-01-27 14:01:43.605433] Request submitted
[2025-01-27 14:01:43.605468] Request ID: 23cba3db-9142-4030-b700-155584069734
[2025-01-27 14:01:43.605478] Request details available at: https://api.aurorax.space/api/v1/conjunctions/requests/23cba3db-9142-4030-b700-155584069734
[2025-01-27 14:01:43.605487] Waiting for data ...
[2025-01-27 14:01:45.035903] Checking for data ...
[2025-01-27 14:01:45.454915] Data is now available
[2025-01-27 14:01:45.455092] Retrieving data ...
[2025-01-27 14:01:46.068603] Retrieved 1.6 MB of data containing 193 records


# Using the `between` operator

The `between` operator is a special case when constructing an expression: it requires the values to be a list containing exactly two elements, the lower and upper bounds.

Let's have a look at an example similar to the one directly above. Instead of finding conjunctions where the ML model thinks the confidence is above a certain number, let's adjust that to be a confidence between two numbers.
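As a local illustration of how the numeric operators relate to each other, the sketch below applies `>=` and `between` to a handful of made-up confidence values. The `numeric_match` helper is our own, and treating `between` as inclusive of both bounds is an assumption; check the AuroraX documentation if the boundary behaviour matters for your search.

```python
import operator

# Comparison operators from the list above, mapped to Python equivalents
COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def numeric_match(value, values, op):
    """Apply one numeric operator to a single value."""
    if op == "between":
        if not isinstance(values, (list, tuple)) or len(values) != 2:
            raise ValueError("'between' needs exactly two values: [low, high]")
        low, high = values
        return low <= value <= high  # ASSUMPTION: both bounds are inclusive
    return COMPARATORS[op](value, values)


# Made-up confidence values, in percent
confidences = [62.5, 75.0, 88.1, 90.0, 97.3]

at_least_75 = [c for c in confidences if numeric_match(c, 75, ">=")]
between_75_90 = [c for c in confidences if numeric_match(c, [75, 90], "between")]
print(at_least_75)
print(between_75_90)
```

The `between` result is a subset of the `>=` result, which is consistent with the second search in this section returning fewer records than the first.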


In [5]:
# set timeframe and distance
start = datetime.datetime(2020, 1, 1, 0, 0, 0)
end = datetime.datetime(2020, 1, 15, 23, 59, 59)
distance = 500

# set ground criteria block
ground = [
    aurorax.search.GroundCriteriaBlock(
        programs=["themis-asi"],
        metadata_filters=aurorax.search.MetadataFilter(expressions=[
            # only find records that were classified as not cloudy
            aurorax.search.MetadataFilterExpression("calgary_cloud_ml_v1", "classified as not cloudy", operator="="),

            # with a confidence between 75% and 90%
            aurorax.search.MetadataFilterExpression("calgary_cloud_ml_v1_confidence", [75, 90], operator="between")
        ]))
]

# set space criteria block
space = [aurorax.search.SpaceCriteriaBlock(programs=["swarm"], hemisphere=["northern"])]

# perform the search
s = aurorax.search.conjunctions.search(
    start=start,
    end=end,
    distance=distance,
    ground=ground,
    space=space,
    verbose=True,
)

[2025-01-27 14:05:54.959859] Search object created
[2025-01-27 14:05:54.990465] Request submitted
[2025-01-27 14:05:54.990535] Request ID: e8b5eabd-e777-4765-aebd-f63866cfa9dd
[2025-01-27 14:05:54.990563] Request details available at: https://api.aurorax.space/api/v1/conjunctions/requests/e8b5eabd-e777-4765-aebd-f63866cfa9dd
[2025-01-27 14:05:54.990588] Waiting for data ...
[2025-01-27 14:05:56.412971] Checking for data ...
[2025-01-27 14:05:56.844238] Data is now available
[2025-01-27 14:05:56.844398] Retrieving data ...
[2025-01-27 14:05:57.340327] Retrieved 403.2 kB of data containing 48 records


# Ephemeris searching with metadata filters

When doing ephemeris searches instead of the conjunction searches used so far in this notebook, the `metadata_filters` parameter works the same way. All queries share the same way of doing metadata filters, so you can easily port over the above examples to retrieve ephemeris records.

For more examples, you can check out the [Perform ephemeris searches](https://github.com/aurorax-space/pyaurorax/tree/main/examples/notebooks/search/search_ephemeris.ipynb) and [Explore ML-enhanced conjunction and ephemeris searching](https://github.com/aurorax-space/pyaurorax/tree/main/examples/notebooks/search/search_ml_enhanced_searching.ipynb) example notebooks.

# Data product searching with metadata filters

When doing data product searches, the `metadata_filters` parameter again works the same way. The only difference is that the keys and values available for data product metadata filtering differ from the ones used in conjunction or ephemeris searches. The underlying data is different, and therefore so are the available filters.
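One way to keep the two schemas straight is a small lookup from search type to schema attribute. The sketch below uses the attribute names mentioned at the start of this notebook; the `schema_for` helper, the search-type labels, and the stand-in source object (including its `example_field` entry) are hypothetical.

```python
from types import SimpleNamespace

# Which schema attribute applies to which kind of search
SCHEMA_ATTRIBUTE = {
    "conjunctions": "ephemeris_metadata_schema",
    "ephemeris": "ephemeris_metadata_schema",
    "data_products": "data_products_metadata_schema",
}


def schema_for(source, search_type):
    """Return the metadata schema relevant to a search type (empty list if none)."""
    return getattr(source, SCHEMA_ATTRIBUTE[search_type], None) or []


# Stand-in for a data source record; real records come from aurorax.search.sources.list()
fake_source = SimpleNamespace(
    ephemeris_metadata_schema=[{"field_name": "nbtrace_region"}],
    data_products_metadata_schema=[{"field_name": "example_field"}],  # hypothetical key
)

ephemeris_keys = [e["field_name"] for e in schema_for(fake_source, "ephemeris")]
data_product_keys = [e["field_name"] for e in schema_for(fake_source, "data_products")]
print(ephemeris_keys, data_product_keys)
```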

For more examples, you can check out the [Perform data product searches](https://github.com/aurorax-space/pyaurorax/tree/main/examples/notebooks/search/search_data_products.ipynb) example notebook.