## Indexing

Indexing is used to make searches more efficient. Without an index, a search has to scan every document in the collection to find the matching ones. If a search uses indexed fields, it can go through the index instead and then retrieve only the matching documents, without having to go through the full collection. However, not everything needs to be indexed: every index takes up additional storage, which can grow substantially when too many indices are created. It is generally recommended to limit the indices to the most commonly searched fields.

For the examples in this tutorial, neither the gain in search speed nor the extra storage required will be noticeable, but for large collections this has to be taken into account.
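To get a feeling for why an index helps, the following sketch mimics the idea in plain Python. It is only an illustration under simplifying assumptions (MongoDB uses B-tree structures, not a Python list), and the documents, values and function names are made up. A sorted list of `(value, position)` pairs plays the role of the index, and `bisect` locates the matching range without inspecting every document.

```python
import bisect

# Toy "collection": a list of documents (values are invented for illustration)
collection = [
    {'_id': 'a', 'firstmjd': 58366.4},
    {'_id': 'b', 'firstmjd': 58450.4},
    {'_id': 'c', 'firstmjd': 58288.4},
    {'_id': 'd', 'firstmjd': 58363.5},
]


def full_scan(docs, low, high):
    """Without an index: every document has to be inspected."""
    return [d['_id'] for d in docs if low <= d['firstmjd'] <= high]


# Toy "index": (value, position) pairs kept sorted by value.
# This is extra data that has to be stored next to the collection.
index = sorted((d['firstmjd'], i) for i, d in enumerate(collection))
keys = [value for value, _ in index]


def index_scan(docs, low, high):
    """With an index: binary search for the range, then fetch only the matches."""
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return [docs[pos]['_id'] for _, pos in index[start:stop]]


print(full_scan(collection, 58300, 58400))   # order follows the collection
print(index_scan(collection, 58300, 58400))  # order follows the index
```

Both functions return the same documents; the second one simply gets there without looking at the documents that fall outside the range.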

Every collection in MongoDB comes indexed by `_id` by default, but new indices can be created at any time and there are different kinds of indices.

In this section we'll be using a sample of ALeRCE-like objects, which should already be loaded into the database if the instructions in `README.md` have been followed. An example document from this collection looks like this:

In [2]:
from pprint import pprint
from pymongo import MongoClient

client = MongoClient(host='localhost', port=27017, username='mongo', password='mongo')

objects = client.alerce.objects  # This is the collection we'll be using

pprint(objects.find_one())

{'_id': 'AL17kydexvudyfzwq',
 'e_dec': 6.98509819165943e-05,
 'e_ra': 0.000167809096646232,
 'firstmjd': 58366.4399768999,
 'lastmjd': 59542.2826156998,
 'loc': {'coordinates': [-112.883127892771, 52.2932219572289], 'type': 'Point'},
 'meandec': 52.2932219572289,
 'meanra': 67.1168721072289,
 'ndet': 166.0,
 'oid': ['ZTF17aaaacji'],
 'probabilities': [{'class_name': 'AGN',
                    'classifier_name': 'stamp_classifier',
                    'classifier_version': 'stamp_classifier_1.0.0',
                    'probability': 0.086826704,
                    'ranking': 2.0},
                   {'class_name': 'asteroid',
                    'classifier_name': 'stamp_classifier',
                    'classifier_version': 'stamp_classifier_1.0.0',
                    'probability': 0.059574142,
                    'ranking': 4.0},
                   {'class_name': 'bogus',
                    'classifier_name': 'stamp_classifier',
                    'classifier_version': 'stamp_cla

**Note:** The objects in the actual ALeRCE database do not exactly follow this structure. We're using only a simplified version here to show these concepts.

### Creating basic indices

To create a simple index over a field, the structure is similar to what we saw for the `sort` option in the `find` method. It should be a list of pairs with a direction given by `1` or `-1` (ascending or descending, respectively).

In [None]:
from pymongo import IndexModel

objects.create_index([('firstmjd', 1)])

The return value corresponds to the name given to the index in the collection.
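The default name follows a simple pattern: each field and its direction are joined with underscores. A minimal sketch of that rule, where `default_index_name` is a hypothetical helper written for this tutorial and not part of PyMongo:

```python
def default_index_name(keys):
    """Join every (field, direction) pair with underscores, e.g. 'firstmjd_1'."""
    return '_'.join(f'{field}_{direction}' for field, direction in keys)


print(default_index_name([('firstmjd', 1)]))
print(default_index_name([('firstmjd', 1), ('lastmjd', -1)]))
```

Knowing the pattern is handy when an index has to be referred to by name later on.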

In [None]:
objects.index_information()

The indices here are given by index name and have information about the field, order and type of index (the `v` field refers to the version).

One can also drop an index based on the index name:

In [None]:
objects.drop_index('firstmjd_1')
objects.index_information()

At creation, an index can also be given a custom name, instead of relying on MongoDB for the name:

In [None]:
objects.create_index([('firstmjd', 1)], name='first_detection_date')

The field `probabilities` is an array of nested documents. To create an index for a field inside it, we can use dot notation:

In [None]:
objects.create_index([('probabilities.probability', -1)], name='probs')
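An index over a field inside an array holds one entry per array element (MongoDB calls this a multikey index). The sketch below, in plain Python, shows which values a dotted path would pick up from a single document. The helper `index_keys` is hypothetical and only approximates the behaviour, and the document is a shortened version of the example shown earlier.

```python
def index_keys(doc, path):
    """Collect the values found by following a dotted path through nested
    documents, fanning out over arrays along the way."""
    values = [doc]
    for part in path.split('.'):
        next_values = []
        for value in values:
            # An array contributes each of its elements
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and part in item:
                    next_values.append(item[part])
        values = next_values
    # A final array value is also expanded into one key per element
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


doc = {
    'oid': ['ZTF17aaaacji'],
    'probabilities': [
        {'class_name': 'AGN', 'probability': 0.086826704},
        {'class_name': 'asteroid', 'probability': 0.059574142},
    ],
}

print(index_keys(doc, 'probabilities.probability'))
print(index_keys(doc, 'oid'))
```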

As for other common options, it is possible to demand that the elements of the index are unique:

In [None]:
objects.create_index([('oid', 1)], unique=True)

It is also possible to create a partial index, which only indexes documents that fulfill a given condition:

In [None]:
objects.create_index([('lastmjd', 1)], partialFilterExpression={'ndet': {'$gt': 100}})

The above creates an index over `lastmjd`, but only for documents with `ndet` greater than 100.

For now, we'll just remove all indices (note that this will never remove the index over `_id`):

In [None]:
objects.drop_indexes()

### Geospatial indexing

Some fields can use a special kind of index, such as one for locations on a sphere. This is why the field `loc` has the form it does:

In [5]:
doc = objects.find_one()

print(f'loc: {doc["loc"]}')
print(f'RA: {doc["meanra"]}; RA - 180: {doc["meanra"] - 180}')
print(f'Dec: {doc["meandec"]}')

loc: {'type': 'Point', 'coordinates': [-112.883127892771, 52.2932219572289]}
RA: 67.1168721072289; RA - 180: -112.8831278927711
Dec: 52.2932219572289


The first value of `coordinates` inside `loc` corresponds to RA minus 180, while the second is simply Dec. This is because the format with `type` and `coordinates` is defined by [GeoJSON](https://www.mongodb.com/docs/manual/reference/geojson/), which is normally used for geographic coordinates given as longitude (from -180 to 180 degrees) followed by latitude. Since RA runs from 0 to 360 degrees, it has to be shifted by 180. Using the GeoJSON notation allows us to index over the sphere and perform cone searches over the coordinates (which we'll see later on). For this, the index cannot just be ascending or descending and has to use the value `2dsphere`:

In [6]:
objects.create_index([('loc', '2dsphere')])

'loc_2dsphere'
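The conversion between RA/Dec and the GeoJSON point can be sketched with two small helpers. Both `to_geojson_point` and `to_ra_dec` are hypothetical names introduced here for illustration; they assume RA in the range [0, 360) and Dec in [-90, 90], both in degrees.

```python
def to_geojson_point(ra, dec):
    """Build a GeoJSON point from RA and Dec in degrees.

    RA in [0, 360) is shifted to the longitude range [-180, 180);
    Dec is used directly as the latitude.
    """
    if not 0 <= ra < 360:
        raise ValueError('RA must be in [0, 360)')
    if not -90 <= dec <= 90:
        raise ValueError('Dec must be in [-90, 90]')
    return {'type': 'Point', 'coordinates': [ra - 180, dec]}


def to_ra_dec(point):
    """Invert the conversion: recover (RA, Dec) from the GeoJSON point."""
    lon, lat = point['coordinates']
    return lon + 180, lat


point = to_geojson_point(67.1168721072289, 52.2932219572289)
print(point)
print(to_ra_dec(point))
```

Validating the ranges before inserting is a cheap way to avoid documents that the geospatial index would reject.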

### Compound indices

It is also possible to create indices over multiple fields at a time. The entries are sorted by the first field, with each subsequent field breaking ties among entries that share the same values in the preceding ones. This can be very useful depending on the type of search. For instance, we'll create an index over the classifier name and version, so that entries are sorted first by classifier and then by version within each classifier:

In [5]:
objects.create_index([('probabilities.classifier_name', 1), ('probabilities.classifier_version', 1)])  # The list now has a second element (the secondary index)

'probabilities.classifier_name_1_probabilities.classifier_version_1'
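The ordering of a compound index behaves much like sorting tuples in Python. The sketch below is only an analogy, using classifier names and versions that appear in this tutorial's sample data; it does not touch the database.

```python
import bisect

# (classifier_name, classifier_version) pairs, in arbitrary order
entries = [
    ('stamp_classifier', 'stamp_classifier_1.0.4'),
    ('lc_classifier', 'hierarchical_rf_1.1.0'),
    ('stamp_classifier', 'stamp_classifier_1.0.0'),
    ('lc_classifier_top', 'hierarchical_rf_1.1.0'),
]

# Sorting tuples compares the first item, then the second one to break ties
ordered = sorted(entries, key=lambda entry: (entry[0], entry[1]))
for entry in ordered:
    print(entry)

# Entries sharing the first field end up next to each other, so a search
# on the classifier name alone can still take advantage of this ordering
start = bisect.bisect_left(ordered, ('stamp_classifier',))
stamp_only = [e for e in ordered[start:] if e[0] == 'stamp_classifier']
print(stamp_only)
```

The reverse does not hold: entries with the same version are scattered across the list, so a search on the version alone gains little from this ordering.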

Again we'll clean up all the indices:

In [6]:
objects.drop_indexes()

## Projections (selecting fields for output)

The `find` and `find_one` methods have some additional functionality that we'll discuss now. Besides the dictionary with the query filters, a second argument can be passed with another dictionary for the "projection". Projections allow for different manipulations of the output documents, without modifying them in the database. For instance, if only a couple of fields are of interest in the output:

In [7]:
docs = objects.find({'ndet': {'$gte': 100}}, {'firstmjd': True, 'lastmjd': True})

for doc in docs:
    print(doc)

{'_id': 'AL17kydexvudyfzwq', 'lastmjd': 59542.2826156998, 'firstmjd': 58366.4399768999}
{'_id': 'AL17kyhickkapibwi', 'lastmjd': 59530.3737846999, 'firstmjd': 58450.4007060002}
{'_id': 'AL17msbqengdzwbtk', 'lastmjd': 59520.1848958, 'firstmjd': 58288.4348148}
{'_id': 'AL17kvgayignexklo', 'lastmjd': 59540.3450694, 'firstmjd': 58363.4715161999}
{'_id': 'AL17kykzgxdgqsntc', 'lastmjd': 59538.2855556002, 'firstmjd': 58348.4711342999}
{'_id': 'AL17ktitbgrfqqhkq', 'lastmjd': 59540.4023263999, 'firstmjd': 58336.4893518998}
{'_id': 'AL17kvzjrikyiplhk', 'lastmjd': 59540.4116898002, 'firstmjd': 58338.4519213}
{'_id': 'AL17lasirdapyanxs', 'lastmjd': 59542.3199421, 'firstmjd': 58343.4893749999}
{'_id': 'AL17ldcrbgfdyppbw', 'lastmjd': 59550.375, 'firstmjd': 58423.4193402999}
{'_id': 'AL17laogqkaumzppg', 'lastmjd': 59542.3199421, 'firstmjd': 58342.4907986}
{'_id': 'AL17ldheheiulmpbw', 'lastmjd': 59550.3538079001, 'firstmjd': 58423.3785068998}
{'_id': 'AL17ldghniumrlnpo', 'lastmjd': 59530.4050925998, 'f

Here we've selected only objects with at least 100 detections, but are only interested in the first and last MJD. As you can see, even if not explicitly selected, the `_id` field is included by default. This behaviour can be changed:

In [8]:
docs = objects.find({'ndet': {'$gte': 100}}, {'firstmjd': True, 'lastmjd': True, '_id': False})

for doc in docs:
    print(doc)

{'lastmjd': 59542.2826156998, 'firstmjd': 58366.4399768999}
{'lastmjd': 59530.3737846999, 'firstmjd': 58450.4007060002}
{'lastmjd': 59520.1848958, 'firstmjd': 58288.4348148}
{'lastmjd': 59540.3450694, 'firstmjd': 58363.4715161999}
{'lastmjd': 59538.2855556002, 'firstmjd': 58348.4711342999}
{'lastmjd': 59540.4023263999, 'firstmjd': 58336.4893518998}
{'lastmjd': 59540.4116898002, 'firstmjd': 58338.4519213}
{'lastmjd': 59542.3199421, 'firstmjd': 58343.4893749999}
{'lastmjd': 59550.375, 'firstmjd': 58423.4193402999}
{'lastmjd': 59542.3199421, 'firstmjd': 58342.4907986}
{'lastmjd': 59550.3538079001, 'firstmjd': 58423.3785068998}
{'lastmjd': 59530.4050925998, 'firstmjd': 58357.4957869998}
{'lastmjd': 59498.3893634002, 'firstmjd': 58343.4893749999}
{'lastmjd': 59550.2775809998, 'firstmjd': 58443.3156249998}
{'lastmjd': 59540.4386343001, 'firstmjd': 58443.2944676001}
{'lastmjd': 59532.4202777999, 'firstmjd': 58370.515625}
{'lastmjd': 59542.1187153002, 'firstmjd': 58346.3351968001}
{'lastmjd': 

If at least one field is explicitly included, all the others will be implicitly excluded. The opposite is also true:

In [10]:
docs = objects.find({'ndet': {'$gte': 100}}, {'probabilities': False, '_id': False})

for doc in docs:
    print(doc)

{'oid': ['ZTF17aaaacji'], 'lastmjd': 59542.2826156998, 'firstmjd': 58366.4399768999, 'ndet': 166.0, 'loc': {'type': 'Point', 'coordinates': [-112.883127892771, 52.2932219572289]}, 'meanra': 67.1168721072289, 'meandec': 52.2932219572289, 'e_ra': 0.000167809096646232, 'e_dec': 6.98509819165943e-05, 'tid': ['ZTF']}
{'oid': ['ZTF17aaaacpo'], 'lastmjd': 59530.3737846999, 'firstmjd': 58450.4007060002, 'ndet': 226.0, 'loc': {'type': 'Point', 'coordinates': [-111.458553317257, 7.32192741061947]}, 'meanra': 68.5414466827434, 'meandec': 7.32192741061947, 'e_ra': 4.63841947555995e-05, 'e_dec': 5.73385615644401e-05, 'tid': ['ZTF']}
{'oid': ['ZTF17aaaajgn'], 'lastmjd': 59520.1848958, 'firstmjd': 58288.4348148, 'ndet': 345.0, 'loc': {'type': 'Point', 'coordinates': [138.677740238551, 48.5400606710145]}, 'meanra': 318.677740238551, 'meandec': 48.5400606710145, 'e_ra': 8.30365847738501e-05, 'e_dec': 8.06191525756549e-05, 'tid': ['ZTF']}
{'oid': ['ZTF17aaaampi'], 'lastmjd': 59540.3450694, 'firstmjd': 5

The above includes every field, except for `probabilities` and `_id`. Note that it is not possible to mix inclusion and exclusion of fields, except for the case of `_id`:

In [12]:
# Will fail due to mixing inclusion and exclusion
docs = objects.find({'ndet': {'$gte': 100}}, {'probabilities': False, 'firstmjd': True})
# However, it only fails here: the query is not sent to the server until the cursor is iterated
for doc in docs:
    print(doc)

OperationFailure: Cannot do inclusion on field firstmjd in exclusion projection, full error: {'ok': 0.0, 'errmsg': 'Cannot do inclusion on field firstmjd in exclusion projection', 'code': 31253, 'codeName': 'Location31253'}
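The inclusion and exclusion rules seen so far can be summarised with a small emulation in plain Python. This is a simplified sketch: the function `project` is hypothetical, it only handles top-level fields with true/false values, and it raises a `ValueError` where the server would report an `OperationFailure`.

```python
def project(doc, projection):
    """Emulate simple inclusion/exclusion projections on top-level fields."""
    # _id is handled separately, since it is the only field that can be mixed
    flags = {k: bool(v) for k, v in projection.items() if k != '_id'}
    if len(set(flags.values())) > 1:
        raise ValueError('Cannot mix inclusion and exclusion (except for _id)')

    keep_id = bool(projection.get('_id', True))
    if flags:
        including = any(flags.values())
    else:
        # Only _id (or nothing) was given in the projection
        including = bool(projection.get('_id', False))

    if including:
        out = {k: v for k, v in doc.items() if flags.get(k)}
    else:
        out = {k: v for k, v in doc.items() if k not in flags and k != '_id'}
    if keep_id and '_id' in doc:
        out = {'_id': doc['_id'], **out}
    return out


doc = {'_id': 'AL17kydexvudyfzwq', 'firstmjd': 58366.4399768999,
       'lastmjd': 59542.2826156998, 'ndet': 166.0}

print(project(doc, {'firstmjd': True, 'lastmjd': True}))
print(project(doc, {'firstmjd': True, 'lastmjd': True, '_id': False}))
print(project(doc, {'ndet': False, '_id': False}))
```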

It is also possible to rename fields in the output, using the new name as the key and the old name, preceded by `$`, as the value:

In [20]:
docs = objects.find({'ndet': {'$gte': 100}}, {'detections': '$ndet', 'firstmjd': True})

# Renamed ndet to detections
for doc in docs:
    print(doc)

{'_id': 'AL17kydexvudyfzwq', 'firstmjd': 58366.4399768999, 'detections': 166.0}
{'_id': 'AL17kyhickkapibwi', 'firstmjd': 58450.4007060002, 'detections': 226.0}
{'_id': 'AL17msbqengdzwbtk', 'firstmjd': 58288.4348148, 'detections': 345.0}
{'_id': 'AL17kvgayignexklo', 'firstmjd': 58363.4715161999, 'detections': 144.0}
{'_id': 'AL17kykzgxdgqsntc', 'firstmjd': 58348.4711342999, 'detections': 196.0}
{'_id': 'AL17ktitbgrfqqhkq', 'firstmjd': 58336.4893518998, 'detections': 893.0}
{'_id': 'AL17kvzjrikyiplhk', 'firstmjd': 58338.4519213, 'detections': 1134.0}
{'_id': 'AL17lasirdapyanxs', 'firstmjd': 58343.4893749999, 'detections': 565.0}
{'_id': 'AL17ldcrbgfdyppbw', 'firstmjd': 58423.4193402999, 'detections': 197.0}
{'_id': 'AL17laogqkaumzppg', 'firstmjd': 58342.4907986, 'detections': 378.0}
{'_id': 'AL17ldheheiulmpbw', 'firstmjd': 58423.3785068998, 'detections': 126.0}
{'_id': 'AL17ldghniumrlnpo', 'firstmjd': 58357.4957869998, 'detections': 205.0}
{'_id': 'AL17larptsionfnlg', 'firstmjd': 58343.4

It is also possible to project embedded documents:

In [23]:
docs = objects.find({'ndet': {'$gte': 100}}, {'probability': '$probabilities.probability', '_id': False})

# Extracted the probability of every array element into a flat array
for doc in docs:
    print(doc)

{'probability': [0.086826704, 0.059574142, 0.07940478, 0.055563178, 0.7186312, 0.19460636, 0.07008195, 0.12235069, 0.06864654, 0.5443145, 0.00032, 0.00512, 0.103416, 0.014, 0.002, 0.95]}
{'probability': [0.12818314, 0.051327016, 0.04167947, 0.057408877, 0.7214016, 0.18147986, 0.050509796, 0.05055209, 0.04540265, 0.6720556, 0.278, 0.252, 0.182, 0.288, 0.074, 0.07, 0.016, 0.0, 0.012, 0.828, 0.0, 0.0, 0.074, 0.916, 0.002, 0.042]}
{'probability': [0.256, 0.152, 0.294, 0.298, 0.15338813, 0.045076847, 0.099779636, 0.028416526, 0.67333895, 0.12405841, 0.051031448, 0.20627593, 0.11432854, 0.50430566, 0.000512, 0.00064, 0.054096, 0.324, 0.014, 0.626, 0.056, 0.148, 0.596, 0.0, 0.074, 0.126]}
{'probability': [0.09212254, 0.03309303, 0.060435995, 0.042847354, 0.77150106, 0.17492342, 0.05852321, 0.10463365, 0.048219666, 0.61370003, 0.052, 0.002, 0.034, 0.434, 0.472, 0.006, 0.00224, 0.00672, 0.039728, 0.002, 0.0, 0.958]}
{'probability': [4e-05, 5.6e-05, 0.127744, 0.578, 0.004, 0.37, 0.998, 0.002, 0.

The projections can also limit the number of elements returned from an array:

In [17]:
docs = objects.find({'probabilities.ranking': 1}, {'probabilities.$': True, '_id': False})

for doc in docs:
    print(doc)

{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7186312, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7214016, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'lc_classifier_top', 'classifier_version': 'hierarchical_rf_1.1.0', 'class_name': 'Periodic', 'probability': 0.926, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'lc_classifier_transient', 'classifier_version': 'hierarchical_rf_1.1.0', 'class_name': 'SNII', 'probability': 0.298, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.77150106, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'lc_classifier', 'classifier_version': 'hierarchical_rf_1.1.0', 'class_name': 'CV,Nova', 'probabili

The `$` operator seen above returns only the first element of the array that matches the query, even if more than one element does. For this reason, it requires the array to be used within the query itself.

For more control over the returned element, there is also the `$elemMatch` projection operator:

In [8]:
docs = objects.find(
    {
        'ndet': {'$gte': 100}
    }, 
    {
        '_id': False,
        'probabilities': {
            '$elemMatch': {  # The value for $elemMatch has the form of a query, over the fields inside the elements
                'classifier_name': 'stamp_classifier',
                'ranking': 1
            }
        },
    })

for doc in docs:
    print(doc)

{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7186312, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7214016, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.67333895, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.77150106, 'ranking': 1.0}]}
{}
{}
{}
{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7479756, 'ranking': 1.0}]}
{'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probab

**Note:** If more than one element of the array meets the `$elemMatch` criteria, only the first match will be returned.

The objects where no element matches the requirements of the projection are still returned, but are now empty. This is because they still fulfill the main query in the `find` command. Also, in this case the main query is not required to include the array used in the projection.
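In plain Python terms, keeping only the first match is the difference between taking the first item of a search and collecting all of them. The following sketch uses a shortened array with values taken from the sample data; the helper `matches` is hypothetical.

```python
probabilities = [
    {'classifier_version': 'stamp_classifier_1.0.0', 'classifier_name': 'stamp_classifier',
     'class_name': 'AGN', 'probability': 0.086826704, 'ranking': 2.0},
    {'classifier_version': 'stamp_classifier_1.0.0', 'classifier_name': 'stamp_classifier',
     'class_name': 'VS', 'probability': 0.7186312, 'ranking': 1.0},
    {'classifier_version': 'stamp_classifier_1.0.4', 'classifier_name': 'stamp_classifier',
     'class_name': 'VS', 'probability': 0.5443145, 'ranking': 1.0},
]


def matches(element):
    """Same criteria as in the projection: stamp classifier with ranking 1."""
    return element['classifier_name'] == 'stamp_classifier' and element['ranking'] == 1


# First match only (None if nothing matches)
first_only = next((e for e in probabilities if matches(e)), None)

# Every matching element
all_matching = [e for e in probabilities if matches(e)]

print(first_only['classifier_version'])
print([e['classifier_version'] for e in all_matching])
```

Here two versions of the stamp classifier have a top-ranked class, but the first-match approach only ever sees one of them.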

In order to retrieve all matching elements, it is better to use `$filter`:

In [12]:
docs = objects.find(
    {
        'ndet': {'$gte': 100}
    }, 
    {
        '_id': False,
        'probs': {  # This is the name of the output array (can be anything)
            '$filter': {
                'input': '$probabilities',  # This is the name of the input array
                'cond': {  # The condition that needs to match
                    '$and': [
                        {'$eq': ['$$this.ranking', 1]},
                        {'$eq': ['$$this.classifier_name', 'stamp_classifier']}
                    ]
                }
            }
        },
    })

for doc in docs:
    print(doc)

{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7186312, 'ranking': 1.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'VS', 'probability': 0.5443145, 'ranking': 1.0}]}
{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7214016, 'ranking': 1.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'VS', 'probability': 0.6720556, 'ranking': 1.0}]}
{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.67333895, 'ranking': 1.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'VS', 'probability': 0.50430566, 'ranking': 1.0}]}
{'probs': [{'classifier_name': 'stamp_classifier', 'clas

The conditions are written differently this time. The condition (`cond`) for `$filter` must be a single expression with an operator as its key, which is why we explicitly use `$and`. The conditions within the `$and` are still passed as a list of dictionaries, but in each of them the operator is the key, while the value is a two-element list with the field in the first position and the value to compare against in the second.

The name `$$this` refers to elements of the array defined in `input`. When passing the name of the array to `input`, it must be preceded by `$`. It is possible to change the name from `this` to something else using the option `as`:

In [13]:
docs = objects.find(
    {
        'ndet': {'$gte': 100}
    }, 
    {
        '_id': False,
        'probs': {
            '$filter': {
                'input': '$probabilities',
                'as': 'element',  # New name for items
                'cond': {
                    '$and': [
                        {'$eq': ['$$element.ranking', 1]},  # now using 'element' instead of 'this'
                        {'$eq': ['$$element.classifier_name', 'stamp_classifier']}
                    ]
                }
            }
        },
    })

for doc in docs:
    print(doc)

{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7186312, 'ranking': 1.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'VS', 'probability': 0.5443145, 'ranking': 1.0}]}
{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.7214016, 'ranking': 1.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'VS', 'probability': 0.6720556, 'ranking': 1.0}]}
{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'VS', 'probability': 0.67333895, 'ranking': 1.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'VS', 'probability': 0.50430566, 'ranking': 1.0}]}
{'probs': [{'classifier_name': 'stamp_classifier', 'clas

## Array queries

So far we've seen some queries over simple fields. Queries over arrays sometimes work in unexpected ways. First off, a simple query (without projection) will always return the full object, not just the matching elements:

In [30]:
docs = objects.find({'probabilities.classifier_name': 'stamp_classifier'})

print(docs[1])

{'_id': 'AL17kyhickkapibwi', 'oid': ['ZTF17aaaacpo'], 'lastmjd': 59530.3737846999, 'firstmjd': 58450.4007060002, 'ndet': 226.0, 'loc': {'type': 'Point', 'coordinates': [-111.458553317257, 7.32192741061947]}, 'meanra': 68.5414466827434, 'meandec': 7.32192741061947, 'e_ra': 4.63841947555995e-05, 'e_dec': 5.73385615644401e-05, 'tid': ['ZTF'], 'probabilities': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'AGN', 'probability': 0.12818314, 'ranking': 2.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'asteroid', 'probability': 0.051327016, 'ranking': 4.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'bogus', 'probability': 0.04167947, 'ranking': 5.0}, {'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.0', 'class_name': 'SN', 'probability': 0.057408877, 'ranking': 3.0}, {'classifier_name': 

A projection is needed to limit the results within an array. Furthermore, in the following query one might expect to select only objects for which AGN is the top-ranked class of the stamp classifier:

In [29]:
docs = objects.find(
    {
        'probabilities.classifier_name': 'stamp_classifier',
        'probabilities.ranking': 1,
        'probabilities.class_name': 'AGN'
    }, 
    {  # We're using the projection to get only the elements that actually match the query
        '_id': False,
        'probs': {
            '$filter': {
                'input': '$probabilities',
                'cond': {
                    '$and': [
                        {'$eq': ['$$this.ranking', 1]},
                        {'$eq': ['$$this.classifier_name', 'stamp_classifier']},
                        {'$eq': ['$$this.class_name', 'AGN']}
                    ]
                }
            }
        },
    }
)

for doc in docs:
    print(doc)

{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': []}
{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'AGN', 'probability': 0.38717097, 'ranking': 1.0}]}
{'probs': []}
{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'AGN', 'probability': 0.56945103, 'ranking': 1.0}]}
{'probs': []}
{'probs': []}


Why are we getting empty arrays?

The answer is that, when several conditions over array fields are combined, the query returns documents where at least one element of the array fulfills each condition *independently*. In other words, the above will match as long as some element of the array has the stamp classifier as its classifier name, some element has a ranking of one and some element has a class name of AGN, *even if each condition is fulfilled by a different element*.

To make sure that a single element matches all the conditions simultaneously we can use the `$elemMatch` operator for queries:

In [31]:
docs = objects.find(
    {
        'probabilities': {
            '$elemMatch': {
                'classifier_name': 'stamp_classifier',
                'ranking': 1,
                'class_name': 'AGN'
            }
        }
    }, 
    {  # We're using the projection to get only the elements that actually match the query
        '_id': False,
        'probs': {
            '$filter': {
                'input': '$probabilities',
                'cond': {
                    '$and': [
                        {'$eq': ['$$this.ranking', 1]},
                        {'$eq': ['$$this.classifier_name', 'stamp_classifier']},
                        {'$eq': ['$$this.class_name', 'AGN']}
                    ]
                }
            }
        },
    }
)

for doc in docs:
    print(doc)

{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'AGN', 'probability': 0.38717097, 'ranking': 1.0}]}
{'probs': [{'classifier_name': 'stamp_classifier', 'classifier_version': 'stamp_classifier_1.0.4', 'class_name': 'AGN', 'probability': 0.56945103, 'ranking': 1.0}]}
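The difference between the last two queries can be reproduced in plain Python. This is only a sketch of the matching logic under simplifying assumptions (equality conditions on a single array); the functions `independent_match` and `same_element_match` are hypothetical names, and the array is a made-up, shortened example.

```python
probabilities = [
    {'classifier_name': 'stamp_classifier', 'class_name': 'VS', 'ranking': 1.0},
    {'classifier_name': 'stamp_classifier', 'class_name': 'AGN', 'ranking': 2.0},
]

conditions = {
    'classifier_name': 'stamp_classifier',
    'ranking': 1,
    'class_name': 'AGN',
}


def independent_match(array, conditions):
    """Dot-notation behaviour: each condition may be met by a different element."""
    return all(
        any(element.get(key) == value for element in array)
        for key, value in conditions.items()
    )


def same_element_match(array, conditions):
    """$elemMatch behaviour: one element has to meet every condition."""
    return any(
        all(element.get(key) == value for key, value in conditions.items())
        for element in array
    )


print(independent_match(probabilities, conditions))
print(same_element_match(probabilities, conditions))
```

The only change between the two functions is the order of `all` and `any`, which is exactly what produces the documents with empty `probs` arrays in the first query.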


## Geospatial queries

If a geospatial index is being used on a field, it allows us to do geospatial queries. While multiple types of searches are possible depending on the geometries defined, we will focus only on cone searches, which are the most relevant for usage within ALeRCE. For other types of geospatial queries, see [here](https://www.mongodb.com/docs/manual/reference/operator/query-geospatial/).

First we need to create a geospatial index:

In [7]:
objects.create_index([('loc', '2dsphere')])

'loc_2dsphere'

Now, to search for elements within a circle over the sphere we use the operator `$geoWithin`. Inside the operator there are multiple options that can be used, but in the case of the circle we use `$centerSphere`, which takes an array as its value. The first element is an array of coordinates (longitude and latitude, in that order; for us, RA minus 180 and Dec, always in degrees) and the second element is the radius (in radians). In the example below, `3.14 / 180` is approximately one degree expressed in radians:

In [9]:
docs = objects.find(
    {
        'loc': {
            '$geoWithin': {
                '$centerSphere': [[67 - 180, 52], 3.14 / 180]
            }
        }
    },
    {
        'meanra': True,
        'meandec': True,
        '_id': False
    }
)

for doc in docs:
    print(doc)

{'meanra': 67.1168721072289, 'meandec': 52.2932219572289}
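As a sanity check, the result can be verified without the database by computing the angular separation directly. The sketch below uses the haversine formula and also wraps the filter construction in a helper that takes everything in degrees. The names `angular_separation` and `cone_search_filter` are hypothetical, and the helper assumes the `loc` field used throughout this tutorial.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Angular separation in radians between two positions given in degrees
    (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def cone_search_filter(ra, dec, radius_deg):
    """Build the $geoWithin filter for a cone search (all inputs in degrees)."""
    return {
        'loc': {
            '$geoWithin': {
                # Longitude is RA - 180; the radius has to be in radians
                '$centerSphere': [[ra - 180, dec], math.radians(radius_deg)]
            }
        }
    }


radius = math.radians(1.0)  # 1 degree, close to the 3.14 / 180 used in the query
separation = angular_separation(67, 52, 67.1168721072289, 52.2932219572289)

print(math.degrees(separation))  # separation in degrees
print(separation <= radius)      # whether the object is inside the cone
print(cone_search_filter(67, 52, 1.0))
```

Using `math.radians` avoids hard-coding an approximation of pi when converting the radius.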


It is also possible to use other geometries, for instance a `$box`. This takes an array with two coordinate pairs, representing opposite corners of the box (bottom-left first, then top-right):

In [None]:
docs = objects.find(
    {
        'loc': {
            '$geoWithin': {
                '$box': [[67 - 180, 52], [68 - 180, 53]]
            }
        }
    },
    {
        'meanra': True,
        'meandec': True,
        '_id': False
    }
)

for doc in docs:
    print(doc)