

Providing a hands-on, project-based experience for the online student of data science can be a challenge, especially when the experience involves interaction with a real dataset. Our project explores hosting datasets on Microsoft Azure for project-based experiences in data science training. Our goal is to use the Azure cloud infrastructure to store and curate datasets for use in a secure environment. The larger goal is to select learning modules that support hands-on learning by drawing from the MBDH community.

In this pilot we create and evaluate a training environment for data science that allows students to interact with data. For dataset upload, we draw on our earlier work, Sustainable Environments Actionable Data (SEAD), funded through a grant from the National Science Foundation, and extend it to publish datasets to Azure through an environment that supports dataset curation and post-deposit discovery.

This pilot project focuses on the following activities:

  • Data Deposit: Extend the SEAD data curation and publishing services to publish data to MS Azure so that data products gain additional metadata and receive a persistent identifier.
  • Training and Outreach events
  • Evaluation: Carry out an evaluation of the pilot, including a study of the access control needed to protect the datasets, the access control issues with individual student computers, and the tools in place for tracking student activity and student signup.
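
As a rough illustration of the Data Deposit activity, the sketch below shows how a data product might gain additional metadata and an identifier at publication time. It is not the SEAD implementation and makes no Azure calls: the `DepositRecord` class, the `make_deposit_record` function, and the metadata fields are hypothetical names chosen for this example, and the locally generated UUID only stands in for a persistent identifier that a real identifier service would mint.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DepositRecord:
    """Hypothetical record for a published data product (not SEAD's actual schema)."""
    title: str
    checksum_sha256: str
    size_bytes: int
    identifier: str
    published_at: str
    metadata: dict = field(default_factory=dict)


def make_deposit_record(title: str, payload: bytes, metadata: dict) -> DepositRecord:
    """Attach descriptive metadata and a placeholder identifier to a dataset."""
    return DepositRecord(
        title=title,
        # A checksum lets later users verify the deposited bytes are unchanged.
        checksum_sha256=hashlib.sha256(payload).hexdigest(),
        size_bytes=len(payload),
        # Placeholder only: a real persistent identifier would be minted by an
        # identifier service, not generated locally like this.
        identifier=f"urn:uuid:{uuid.uuid4()}",
        published_at=datetime.now(timezone.utc).isoformat(),
        metadata=dict(metadata),
    )


# Example values are invented for illustration.
record = make_deposit_record(
    title="Example sensor readings",
    payload=b"timestamp,value\n2020-01-01T00:00:00Z,12.5\n",
    metadata={"creator": "Example Student", "license": "CC BY 4.0"},
)
print(record.identifier, record.size_bytes)
```

In a real deposit the payload would be uploaded to cloud storage and the record stored alongside it, so that post-deposit discovery can search the metadata without reading the data itself.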

Find the SEADTrain PID'ified Airbox Data Discovery User Interface below:


The materials were developed by the Data To Insight Center of Indiana University and are available under a Creative Commons 4.0 license. The data used in this training exercise is made available in part through funding from the National Science Foundation under award #1234983. The Azure resources are funded through an award from Microsoft for Azure credits. All software is licensed under an Apache 2.0 license.


A pilot project to test publishing data for educational purposes


