See it in action: https://meta.cafltar.org
Midden is a data catalog that is easily adaptable to fit within a typical researcher's workflow. Midden solves the problem of "who is collecting what data, and where can I access it?" that is the bane of many large academic research projects.
There are numerous solutions for cataloging data to make it discoverable. Many of these solutions, however, require technical knowledge that is uncommon among academic researchers. Many researchers feel at ease with managing their data workflow through their native filesystem (see here, here, here). Despite that, many data catalogs require complex solutions that is more typical in teams of data scientists and data engineers.
Midden meets researchers where they are comfortable.
Midden is a suite of three tools; an editor, a crawler, and a data catalog. The editor and catalog are static web apps that can be hosted for free from various providers (Github, Azure, netlify) and the crawler is a cross-platform command line interface.
Please visit the Wiki if you intend to actually use Midden.
Midden has a metadata editor that supports fields common in many standard metadata formats including contact info, data dictionaries, methods, tags, spatial information, and much more.
Midden has a cross-platform command-line interface with commands to crawl various data stores (local file system, Google Workspace Shared drive, Azure Data Lake Gen 2, more coming soon...) and collates all metadata into a single file.
Midden supports viewing all metadata through a rich interactive interface that supports global search through datasets and variables.
- Researcher does magic, creates a dataset
- Researcher uses the Editor to create metadata then downloads and saves the file with the dataset
- Researcher (or data manager, or an automated script) runs the Crawler
- The data catalog is updated with the new metadata
- Collaborators find data using the Catalog, rejoice
Insights Dashboard shows interesting statistics of your data holdings.
Variable Data Catalog allows searching for specific variables across all datasets.
Dataset Details shows all metadata of a given dataset.
-
Fork the repository
-
Create a local clone
git clone https://github.com/{your-account}/midden.git
-
Make it yours by editing Midden/Caf.Midden.Wasm/wwwroot/app-config.json
-
Publish to a free static website host (Github Pages, netlify, Azure Static Web Apps)
-
Create metadata, crawl metadata, update catalog.json, push changes to Github, enjoy the envy of your peers.
A midden is a refuse heap created by various entities such as packrats, earthworms, and human societies. It is also a rich source of information for scientists trying to study a system. The Midden Data Catalog takes datasets without any context (i.e. refuge) and helps apply metadata so it becomes information; a digital midden, if you will.
Please help us make this tool better. We welcome Issues and Pull Requests. Contribution guidelines coming soon...
Original work was supported by the R.J. Cook Agronomy Farm, a member of the USDA Long-Term Agroecological Research network.
The Midden web tools rely heavily on the UI component library Ant Design Blazor.
As a work of the United States government, this project is in the public domain within the United States.
Additionally, we waive copyright and related rights in the work worldwide through the CC0 1.0 Universal public domain dedication.