Home
Data Science Research Architecture (DSRA), Data Center OS, is an implementation of Apache Mesos. It provides performers with a wide range of distributed computing frameworks and the ability to stage prepackaged Docker containers. Performers can build their own distributed processing system by leveraging the Mesos architecture, or deploy custom, long-running Docker applications into the cloud using Marathon. Docker plays a role similar to that of images in OpenStack: performers can pre-package their applications for cluster deployment. Docker image deployments are managed through the Marathon web interface.
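As a minimal sketch of what a long-running Docker deployment might look like, the snippet below builds a Marathon app definition as JSON. The helper name `marathon_app`, the app id `/demo/web`, the image `nginx:latest`, and the resource values are all hypothetical and chosen only for illustration. Marathon accepts such definitions through its REST API (`POST /v2/apps`) as well as through the web interface; the exact fields supported depend on the Marathon version installed on the cluster.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem=256, instances=1):
    """Build a Marathon app definition for a long-running Docker container.

    All argument values are illustrative assumptions, not DSRA defaults.
    """
    return {
        "id": app_id,            # unique path-style identifier of the app
        "cpus": cpus,            # CPU shares requested per instance
        "mem": mem,              # memory per instance, in megabytes
        "instances": instances,  # number of copies Marathon keeps running
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


# Hypothetical example app: two instances of a stock nginx image.
app = marathon_app("/demo/web", "nginx:latest", instances=2)
payload = json.dumps(app, indent=2)
print(payload)
```

The resulting JSON could be pasted into the Marathon web interface or sent to the Marathon endpoint of the cluster with any HTTP client. The host and port of that endpoint are site-specific and are not shown here.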
This new system is built around the following general requirements, distilled from DevOps operations, hackathons, support requests, and general development efforts over approximately the past year:
- Improve hardware resource utilization
- Migrate from static to dynamic hardware resource allocation
- Elastic infrastructure
- Dynamic GPU utilization alternative to PCI pass-through
- Manage heterogeneous frameworks and standalone applications
- Performers utilize containerized applications (e.g. Docker)
- Maximize support for performers' tools and development languages
  - Hadoop, YARN, Spark, Julia, MPI, RHadoop, MongoDB
  - Java, Python, C++
- Scalable & fault tolerant
- User self-service tools & API access
- Security & multi-tenancy
- Resource management in public clouds (e.g. Amazon, Rackspace)
- Update frameworks independently of applications with no downtime
The goal of the new architecture is to consolidate the existing cloud infrastructure hardware (CPUs, GPUs, storage, memory, etc.) into one large computing environment and to migrate data into federated HDFS storage, moving away from multiple static, standalone clouds at the unclassified level. Mesos will allow performers to concurrently spin up distributed computing frameworks (e.g. Spark, YARN, MPI), complete their computing tasks, and tear the frameworks down on the fly. By consolidating the infrastructure, performers will be able to take advantage of the computing resources from multiple programs as one large cloud, while access to sensitive data and applications remains protected.
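The spin-up and tear-down cycle on a consolidated pool can be illustrated with a toy model. This is only a sketch of the idea of dynamic allocation, not the Mesos API or its scheduling algorithm: the `ResourcePool` class, the framework names, and the CPU and GPU counts are all hypothetical.

```python
class ResourcePool:
    """Toy model of one shared cluster pool (an assumption, not Mesos itself)."""

    def __init__(self, cpus, gpus):
        self.free = {"cpus": cpus, "gpus": gpus}
        self.running = {}

    def launch(self, name, cpus, gpus=0):
        """Reserve resources for a framework; return False if they do not fit."""
        if name in self.running:
            raise ValueError(f"{name} is already running")
        if cpus > self.free["cpus"] or gpus > self.free["gpus"]:
            return False
        self.free["cpus"] -= cpus
        self.free["gpus"] -= gpus
        self.running[name] = {"cpus": cpus, "gpus": gpus}
        return True

    def teardown(self, name):
        """Return a finished framework's resources to the shared pool."""
        used = self.running.pop(name)
        for kind, amount in used.items():
            self.free[kind] += amount


pool = ResourcePool(cpus=64, gpus=4)
spark_ok = pool.launch("spark", cpus=40)       # fits: 24 CPUs remain
mpi_ok = pool.launch("mpi", cpus=20, gpus=4)   # fits: 4 CPUs, 0 GPUs remain
yarn_ok = pool.launch("yarn", cpus=16)         # does not fit yet
pool.teardown("spark")                         # Spark job done, resources freed
yarn_retry = pool.launch("yarn", cpus=16)      # fits after the teardown
print(spark_ok, mpi_ok, yarn_ok, yarn_retry, pool.free)
```

With static, standalone clouds the third framework would need its own dedicated hardware. In the pooled model it simply waits until another framework releases resources.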