
This project provides a serverless platform for the execution of Big Data workloads through Hadoop YARN in container-based clusters. An automated Ansible deployment and a web interface are also provided.


ServerlessYARN

This project provides a platform for the execution of Big Data workloads through Hadoop YARN in container-based clusters.

The platform provides a serverless environment that supports Singularity/Apptainer containers and scales the resources allocated to each container in real time according to its actual usage.

The platform can be deployed automatically through IaC tools such as Ansible, and a web interface is provided to easily manage the platform and execute Big Data workloads. The serverless platform may be deployed on an existing cluster or, for testing purposes, on a virtual Vagrant cluster.

More information

Óscar Castellanos-Rodríguez, Roberto R. Expósito, Jonatan Enes, Guillermo L. Taboada, Juan Touriño, Serverless-like platform for container-based YARN clusters. Future Generation Computer Systems, 155:256-271, February 2024.

Getting Started

Prerequisites

For the Vagrant virtual cluster deployment

  • Vagrant
  • VirtualBox
  • Vagrant plugins: vagrant-hostmanager, vagrant-reload

The vagrant-reload plugin is only necessary when deploying nodes with cgroups v2.

You may install the vagrant plugins with the following commands:

vagrant plugin install vagrant-hostmanager
vagrant plugin install vagrant-reload
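
Before running the deployment, it may help to confirm that both plugins are actually installed. The following is a minimal sketch, not part of the project: the helper name `missing_plugins` is hypothetical, and it assumes that `vagrant plugin list` prints one plugin per line, starting with the plugin name followed by a space.

```shell
# Print the required plugins that do not appear in a plugin listing.
# $1: text produced by `vagrant plugin list`
# NOTE: missing_plugins is a hypothetical helper, not part of ServerlessYARN.
missing_plugins() {
    for plugin in vagrant-hostmanager vagrant-reload; do
        # A plugin counts as installed if some line starts with "<name> ".
        printf '%s\n' "$1" | grep -q "^$plugin " || echo "$plugin"
    done
}

# Only query Vagrant when it is available on this machine.
if command -v vagrant >/dev/null 2>&1; then
    missing_plugins "$(vagrant plugin list)"
fi
```

An empty output means both plugins were found. Keep in mind that vagrant-reload is only needed for nodes with cgroups v2, so it can be ignored otherwise.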

For the existing cluster deployment

  • Python
  • Ansible
  • Passwordless SSH login between nodes

Only one node (the master node) needs to have Ansible installed and passwordless SSH access to the remaining nodes.
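
One way to verify the passwordless SSH requirement from the master node is sketched below. This is an illustrative check rather than a script shipped with the project: the helper name `unreachable_hosts` and the node names in the usage comment are hypothetical. It relies on the standard OpenSSH option `BatchMode=yes`, which makes `ssh` fail instead of prompting for a password.

```shell
# Print every host that does not accept a non-interactive (key-based) login.
# Arguments: hostnames to check.
# NOTE: unreachable_hosts is a hypothetical helper, not part of ServerlessYARN.
unreachable_hosts() {
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a login that would
        # need a password fails and the host is reported.
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            </dev/null 2>/dev/null || echo "$host"
    done
}

# Example usage from the master node (hypothetical node names):
#   unreachable_hosts node0 node1
```

If a host is reported, a key pair can typically be created with `ssh-keygen` and installed on that host with `ssh-copy-id` before trying again.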

Quickstart

The platform needs to be installed and deployed on a master node (or "server" node), while the containers will be deployed on the remaining nodes of the cluster.

  • Clone this repository together with the required frameworks (included as Git submodules):

    git clone --recurse-submodules https://github.com/UDC-GAC/ServerlessYARN.git
    
  • Once cloned, change to the repository root directory:

    cd ServerlessYARN
    
  • Modify ansible/provisioning/config/config.yml to customize your environment.

  • You may deploy the virtual cluster with Vagrant (if needed):

    vagrant up
    

NOTE: You must ensure "id_rsa.pub" doesn't exist when executing "vagrant up" the first time (or after a "vagrant destroy")

  • On the server node (you may use "vagrant ssh" to log in if using a virtual cluster), go to the "ansible/provisioning/scripts" directory within the platform root directory (accessible from "/vagrant" on the virtual cluster). Then run the following scripts to install and set up all the requirements for the platform and start its services:

    python3 load_inventory_from_conf.py
    bash start_all.sh
    

NOTE: When deploying on an existing cluster through SLURM, you may skip the execution of the "load_inventory_from_conf.py" script. The inventory will be generated automatically from the available nodes. A sample script for sbatch is provided in the "slurm" directory.

  • Once you are done, you can shut down the virtual cluster (if applicable) by exiting the server node and executing:

    vagrant halt
    
  • Or you may destroy the virtual cluster with:

    vagrant destroy --force
    

Web Interface

Once the installation and launch have finished, you can open the web interface in your browser at {server-ip}:9000/ui (replace 9000 with the port specified in the config file if you changed it).
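
The address can also be checked from a terminal before opening the browser. The sketch below is only illustrative: the helper name `ui_url` and the IP address in the usage comment are hypothetical, and it assumes the interface is served over plain HTTP, which the text above does not state.

```shell
# Build the web interface address from the server IP and an optional port.
# $1: server IP or hostname; $2: port (defaults to 9000)
# NOTE: ui_url is a hypothetical helper; the http:// scheme is an assumption.
ui_url() {
    echo "http://$1:${2:-9000}/ui"
}

# Example usage (hypothetical address): succeed only if the page responds.
#   curl -sf -o /dev/null "$(ui_url 192.168.56.100)" && echo "web interface is up"
```

Pass the port as a second argument if it was changed in the config file, for example `ui_url 192.168.56.100 8080`.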

You will see a Home page with 5 subpages:

  • Containers: here you can see and manage all the deployed containers
  • Hosts: here you can see and manage all hosts as well as their containers
  • Apps: here you can see and manage all apps as well as their associated containers
  • Services: here you can see and manage all services of the platform
  • Rules: here you can see and manage all scaling rules that are followed by the services

Used tools

  • Vagrant - IaC tool for deploying the virtual cluster
  • VirtualBox - VM Software to support the machines of the cluster
  • Ansible - Configuration Management Tool
  • Apptainer - Singularity/Apptainer Containers management tool
  • Django - Web development framework
  • Python - Programming language
  • Serverless Containers - Container resource scaling framework
  • BDWatchdog - Resource monitoring framework

Authors

License

This project is distributed as free software and is publicly available under the GNU GPLv3 license (see the LICENSE file for more details).
