A pilot project to test publishing data for educational purposes

SEADTrain

Providing a hands-on, project-based experience for online students of data science can be a challenge, especially when the experience involves interaction with a real dataset. Our project explores hosting datasets on Microsoft Azure for project-based experiences in data science training. Our goal is to use the Azure cloud infrastructure to store and curate datasets for use in a secure environment. The larger goal is to select learning modules that support hands-on learning by drawing from the MBDH community.

In this pilot we create and evaluate a training environment for data science that allows students to interact with data. For dataset upload, we draw on our earlier work, [Sustainable Environments Actionable Data - SEAD](http://sead-data.net), funded through a grant from the National Science Foundation, and extend it to publish datasets to Azure through an environment that supports dataset curation and post-deposit discovery.

This pilot project focuses on the following activities:

  • Data Deposit: Extend the SEAD data curation and publishing services to publish data to MS Azure so that data products gain additional metadata and receive a persistent identifier.
  • Training and Outreach events
  • Evaluation: Evaluate the pilot, including the access control needed to protect the datasets, the access control issues with individual student computers, and the tools in place for tracking student activity and student signup.

The SEADTrain Airbox Data Discovery User Interface, which serves the Airbox data with persistent identifiers (PIDs), is available at:
https://data-to-insight-center.github.io/SEADTrain/

Contributing

The materials were developed by the Data To Insight Center of Indiana University and are available at https://figshare.com/articles/SEADTrain_Data_Analysis/6873800 under a Creative Commons 4.0 license. The data used in this training exercise is made available in part through funding from the National Science Foundation under award #1234983. The Azure resources are funded through an award from Microsoft for Azure credits. All software is licensed under an Apache 2.0 license.
