# Database access example
A simple example of how to access the AIND Document Database using the AIND Data Access API.  

Demonstrates:  
* How to import the MetadataDbClient from the aind_data_access_api
* How to configure the client to connect to the database (the dev and test alternatives are noted in code comments)
* How to use the retrieve_docdb_records method to query the database
* How to use the aggregate_docdb_records method to aggregate data from the database




## Imports
Make sure to install the aind_data_access_api using `pip install "aind-data-access-api[docdb]"`

In [18]:
from aind_data_access_api.document_db import MetadataDbClient
import pandas as pd

# configure pandas to display all columns
pd.set_option('display.max_columns', None)



## Connect to the metadata database



In [3]:
client = MetadataDbClient(
    host="api.allenneuraldynamics.org",  # to connect to the dev DocDB instead: "api.allenneuraldynamics-test.org"
    database="metadata_index",  # to connect to the test database instead: "test"
    collection="data_assets",  # collection name; it must belong to the database specified above
)
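
To switch between the production settings and the dev/test alternatives mentioned in the comments without editing the notebook, the connection arguments could be read from the environment. This is only a minimal sketch: the helper `docdb_settings` and the variable names `DOCDB_HOST`, `DOCDB_DATABASE`, and `DOCDB_COLLECTION` are hypothetical and not part of aind_data_access_api. The defaults are the values used in the cell above.

```python
import os


def docdb_settings(env=None):
    """Build keyword arguments for MetadataDbClient.

    The environment variable names are hypothetical (an assumption made
    for this sketch); the defaults match the cell above.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("DOCDB_HOST", "api.allenneuraldynamics.org"),
        "database": env.get("DOCDB_DATABASE", "metadata_index"),
        "collection": env.get("DOCDB_COLLECTION", "data_assets"),
    }


settings = docdb_settings()
# The client could then be created with:
# client = MetadataDbClient(**settings)
```

Keeping the settings in a plain dictionary makes it easy to print and check them before any connection is attempted.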

## Create a simple query to demonstrate the 'retrieve_docdb_records' method
Demonstrates the use of the 'retrieve_docdb_records' method to query the database.  

This method is useful for retrieving a single record or a list of records from the database.  

Here we are using a simple filter query to retrieve all records where the 'data_description.project_name' field is "Thalamus in the middle".  

We are also using the 'projection' parameter to specify that we only want to retrieve the 'data_description' field.  

In [None]:


query = {
    "data_description.project_name": "Thalamus in the middle"
}


results = client.retrieve_docdb_records(
    filter_query=query,
    projection={"data_description": 1}
)

len(results)

352
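
A filter can combine several dot-notation fields, and a projection can name nested fields individually instead of a whole sub-document. The sketch below narrows the query to raw assets and keeps only two fields. It assumes that `data_description.data_level` holds the value "raw" for raw assets and that `data_description.name` and `data_description.subject_id` are present, as the columns in the table in the next section suggest. `fetch_raw_assets` is a hypothetical helper name, and only the `filter_query` and `projection` arguments already used above are passed to the client.

```python
def fetch_raw_assets(client, project_name):
    """Retrieve the name and subject_id of raw assets for one project.

    `client` is expected to be a MetadataDbClient (or any object with a
    compatible `retrieve_docdb_records` method).
    """
    # All conditions in the filter must match (implicit AND).
    query = {
        "data_description.project_name": project_name,
        "data_description.data_level": "raw",  # assumed field value
    }
    # Request only the nested fields that are needed.
    projection = {
        "data_description.name": 1,
        "data_description.subject_id": 1,
    }
    return client.retrieve_docdb_records(
        filter_query=query,
        projection=projection,
    )
```

With the client created earlier, this could be called as `fetch_raw_assets(client, "Thalamus in the middle")`. String matching in the filter is case-sensitive, so the project name has to be written exactly as it is stored.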

## Convert the results to a pandas dataframe
The result is a list of nested dictionaries, so we can use the pandas json_normalize function to convert it to a dataframe whose column names use dot notation.  

In [16]:
# results is a list of dictionaries, so we can normalize it directly
pd.json_normalize(results)

,_id,data_description.creation_time,data_description.data_level,data_description.data_summary,data_description.describedBy,data_description.funding_source,data_description.group,data_description.input_data_name,data_description.institution.abbreviation,data_description.institution.name,data_description.institution.registry.abbreviation,data_description.institution.registry.name,data_description.institution.registry_identifier,data_description.investigators,data_description.label,data_description.license,data_description.modality,data_description.name,data_description.platform.abbreviation,data_description.platform.name,data_description.process_name,data_description.project_name,data_description.related_data,data_description.restrictions,data_description.schema_version,data_description.subject_id,data_description.creation_date,data_description.experiment_type,data_description.project_id
0,c7cd72a7-db43-4a37-bbbd-b6e1e8e60327,2024-10-08T10:27:18.270423Z,derived,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': None, 'funder': {'abbreviation': '...",,SmartSPIM_755809_2024-10-02_01-24-27,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Mathew Summer...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_755809_2024-10-02_01-24-27_stitched_...,SmartSPIM,SmartSPIM platform,stitched,Thalamus in the middle,[],,1.0.0,755809,,,
1,31721054-361f-4ff0-a8da-f7f77b2871c5,2024-12-05T19:47:16-08:00,raw,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': 'Han Hou, Jayaram Chandrashekar, K...",,,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Han Hou', 're...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_763080_2024-12-05_19-47-16,SmartSPIM,SmartSPIM platform,,Thalamus in the middle,[],,1.0.3,763080,,,
2,c51dbc32-774f-4847-a12a-fa5d50ec20c5,2024-12-05T16:10:07-08:00,raw,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': 'Han Hou, Jayaram Chandrashekar, K...",,,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Han Hou', 're...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_763079_2024-12-05_16-10-07,SmartSPIM,SmartSPIM platform,,Thalamus in the middle,[],,1.0.3,763079,,,
3,83aa7ae2-2aae-47c6-8664-aad12e4bdacd,2023-08-01 20:56:44-07:00,derived,,https://raw.githubusercontent.com/AllenNeuralD...,[{'funder': {'name': 'National Institute of Ne...,MSMA,SmartSPIM_678706_2023-06-28_16-43-04,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'name': 'Mathew Summers', 'abbreviation': No...",,CC-BY-4.0,[{'name': 'Selective plane illumination micros...,SmartSPIM_678706_2023-06-28_16-43-04_stitched_...,SmartSPIM,SmartSPIM platform,stitched,Thalamus in the middle,[],,1.0.4,678706,,,
4,3754c6dc-1e58-47cd-b8fd-bc0249697454,2024-11-22T14:38:12-08:00,raw,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': 'Han Hou, Jayaram Chandrashekar, K...",,,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Han Hou', 're...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_741213_2024-11-22_14-38-12,SmartSPIM,SmartSPIM platform,,Thalamus in the middle,[],,1.0.1,741213,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
347,008f4f26-9fd1-4480-aa05-259fbdaeac04,2025-02-08T08:49:53.997682Z,derived,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': 'Han Hou, Jayaram Chandrashekar, K...",,SmartSPIM_776193_2025-01-17_02-15-13,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Han Hou', 're...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_776193_2025-01-17_02-15-13_stitched_...,SmartSPIM,SmartSPIM platform,stitched,Thalamus in the middle,[],,1.0.4,776193,,,
348,e7b4a712-7347-4ab6-b2e8-99b8595d5d82,2025-02-08T10:00:56.936201Z,derived,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': 'Han Hou, Jayaram Chandrashekar, K...",,SmartSPIM_776192_2025-01-16_21-37-50,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Han Hou', 're...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_776192_2025-01-16_21-37-50_stitched_...,SmartSPIM,SmartSPIM platform,stitched,Thalamus in the middle,[],,1.0.4,776192,,,
349,0cd0107a-83bb-4708-8bfd-be26660cca27,2025-02-08T09:57:41.772777Z,derived,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': 'Han Hou, Jayaram Chandrashekar, K...",,SmartSPIM_767171_2025-01-02_20-28-03,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Han Hou', 're...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_767171_2025-01-02_20-28-03_stitched_...,SmartSPIM,SmartSPIM platform,stitched,Thalamus in the middle,[],,1.0.4,767171,,,
350,e7fbf44f-f256-4a13-9631-af0bc6f28625,2025-02-08T08:52:55.804230Z,derived,,https://raw.githubusercontent.com/AllenNeuralD...,"[{'fundee': 'Han Hou, Jayaram Chandrashekar, K...",,SmartSPIM_768517_2025-01-03_10-42-25,AIND,Allen Institute for Neural Dynamics,ROR,Research Organization Registry,04szwah67,"[{'abbreviation': None, 'name': 'Han Hou', 're...",,CC-BY-4.0,"[{'abbreviation': 'SPIM', 'name': 'Selective p...",SmartSPIM_768517_2025-01-03_10-42-25_stitched_...,SmartSPIM,SmartSPIM platform,stitched,Thalamus in the middle,[],,1.0.4,768517,,,
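
The flattening shown in the table can be reproduced on a small scale. The toy record below is made up for illustration and only imitates the shape of a real result. It shows that nested dictionaries become dot-notation columns, while lists (such as `modality`) are left as a single cell, which is why some columns in the table above contain `[{...}]` values.

```python
import pandas as pd

# A made-up record shaped like one query result (values are illustrative).
records = [
    {
        "_id": "a",
        "data_description": {
            "subject_id": "000001",
            "institution": {"abbreviation": "AIND"},
            "modality": [{"abbreviation": "SPIM"}],
        },
    },
]

df = pd.json_normalize(records)

# Nested dicts are flattened into dot-notation column names;
# the list under "modality" is kept as-is in one cell.
print(sorted(df.columns))
print(df.loc[0, "data_description.modality"])
```

If the list-valued columns need to be expanded as well, that has to be done as a separate step, since `json_normalize` called this way does not unpack them.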


## Pipeline aggregation example
Demonstrates the use of the 'aggregate_docdb_records' method to aggregate data from the database.  

Here the pipeline has two stages: `$match` selects the records in the 'Thalamus in the middle' project, and `$count` returns how many records matched.  

In [4]:
# Count how many records there are in the 'Thalamus in the middle' project

pipeline = [
    {
        "$match": {
            "data_description.project_name": "Thalamus in the middle"
        }
    },
    {
        "$count": "total"
    }
]

results = client.aggregate_docdb_records(pipeline=pipeline)
results

[{'total': 352}]
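
The same approach extends to grouped counts. The sketch below counts records per `data_description.data_level` within a project, using the standard MongoDB `$group`, `$sum`, and `$sort` stages. It assumes the database backend supports these stages and that the `data_level` field is populated. `count_by_data_level` is a hypothetical helper name, and only the `pipeline` argument already used above is passed to the client.

```python
def count_by_data_level(client, project_name):
    """Count records per data_level for one project.

    `client` is expected to be a MetadataDbClient (or any object with a
    compatible `aggregate_docdb_records` method).
    """
    pipeline = [
        # Keep only the records of the requested project.
        {"$match": {"data_description.project_name": project_name}},
        # Group by data_level and count the records in each group.
        {
            "$group": {
                "_id": "$data_description.data_level",
                "count": {"$sum": 1},
            }
        },
        # Largest group first.
        {"$sort": {"count": -1}},
    ]
    return client.aggregate_docdb_records(pipeline=pipeline)
```

Called as `count_by_data_level(client, "Thalamus in the middle")`, this should return one dictionary per distinct data level, with the level under `_id` and the number of records under `count`. Doing the grouping in the database avoids downloading every record just to count them.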