Omnia Release v0.2 #110

Merged
merged 86 commits into from
Jun 24, 2020
86 commits
7a1ed6a
Added play to install MPI Operator under startservices role
Mar 31, 2020
604434c
Changing from v0.2.2 tag to master
Apr 2, 2020
e918adc
Merge pull request #48 from lwilson/issue-44-mpi-operator
j0hnL Apr 3, 2020
002e06f
Create branch-switcher.yml
Apr 7, 2020
8b314a8
switching to Dell curated containers
j0hnL Apr 14, 2020
6f1333f
changed from base to cpu
j0hnL Apr 15, 2020
916841c
Merge pull request #54 from j0hnL/devel
Apr 15, 2020
6c47867
updating jupyterhub helm chart version
j0hnL Apr 16, 2020
cc18688
Merge pull request #56 from j0hnL/devel
Apr 16, 2020
4068b17
* added helm repo auto install
j0hnL May 5, 2020
00314be
Merge pull request #58 from j0hnL/devel
j0hnL May 5, 2020
4fae67f
changing network scheme to not conflict with existing network setup
j0hnL May 5, 2020
2b1c569
adding documentation for MetalLB
j0hnL May 6, 2020
5cb5026
Merge pull request #60 from j0hnL/devel
j0hnL May 6, 2020
44b6868
Updating instructions on contributions to the project
May 6, 2020
f7b3ccb
fixed wrong info and added more
j0hnL May 6, 2020
10a4f29
Updating instructions on contributions to the project
May 6, 2020
a79a9c1
Removing reduntant text on issue and pull request creation
May 6, 2020
0bdca58
Updating CONTRIBUTING.md to resolve merge issue
May 6, 2020
8fa5e92
Adding full text of DCO
May 6, 2020
eff7e29
Merge pull request #66 from lwilson/issue-63
j0hnL May 6, 2020
6d091a5
Merge branch 'devel' into install_doc_update
May 6, 2020
e7c61e7
Merge pull request #65 from j0hnL/install_doc_update
May 6, 2020
3aac65c
Adding contributors section to README.md
May 8, 2020
fe4878a
Changing markdown image syntax to html in order to specify image width
May 8, 2020
6557659
Removing extra newline
May 8, 2020
60109c5
Update README.md
May 8, 2020
a663a85
Merge pull request #70 from lwilson/devel
j0hnL May 8, 2020
44bfb5d
Updating LICENSE file to unclude Dell Technologies name in copyright
May 8, 2020
4b715c0
Adding Apache2 copyright clause to all yml and bash files
May 8, 2020
8531b16
Merge pull request #72 from lwilson/issue-71
j0hnL May 8, 2020
b70a8b5
Adding omnia logo
May 8, 2020
24ac6ee
Fixing missing newline in README to render markdown
May 8, 2020
2c4915a
Adding shields for license and open issues
May 8, 2020
e650baf
Updating omnia logo image
May 11, 2020
797af8f
Removing space bewteen contributor logos
May 11, 2020
5b0e868
Reducing size of contribtor logos and switching to height limit inste…
May 11, 2020
3e461b7
Adding shields to top of README
May 11, 2020
4c6702e
Reordering shields
May 11, 2020
b53de26
Merge pull request #74 from lwilson/issue-73
j0hnL May 11, 2020
de429f2
adding updated TF example using NGC and latest MPI Operator
j0hnL May 12, 2020
6e54ff6
resolves issue #77
j0hnL May 12, 2020
5774a94
Merge pull request #78 from j0hnL/issue-77
May 12, 2020
b80557f
adding cpu example in pytorch to close #62
j0hnL May 12, 2020
7861e47
Merge pull request #80 from j0hnL/issue-62
May 12, 2020
095f3bf
set defautStorageClass = nfs-client
j0hnL May 15, 2020
c16ff88
moved command back to `shell:` instead in helper script
j0hnL May 15, 2020
d1848f0
Merge pull request #82 from j0hnL/issue-81
May 15, 2020
57570ca
moved jupyterhub install
j0hnL May 19, 2020
0d90a3c
Merge pull request #87 from j0hnL/issue-86
May 19, 2020
ba37f5d
Kubeflow install
j0hnL May 20, 2020
bbf84d2
Merge pull request #88 from j0hnL/issue-16
May 20, 2020
dc01a85
added variable for master_ip
j0hnL May 20, 2020
a88ce63
Merge pull request #90 from j0hnL/issue-89
May 20, 2020
4a45d41
modified kfserving-gatway
j0hnL May 21, 2020
ffc434d
Merge pull request #93 from j0hnL/issue-92
lwilson May 21, 2020
28a01f5
Omnia Edge Install
j0hnL May 21, 2020
c26d16b
changed edge_install to single_node
j0hnL May 22, 2020
a522441
Merge pull request #94 from j0hnL/issue-40
lwilson May 22, 2020
73588a7
Merge branch 'master' of https://github.com/dellhpc/omnia into devel
May 26, 2020
a6b0767
Adding absolute path to CONTRIBUTING.md in order to support /docs-bas…
May 26, 2020
6a05e13
Updating /docs/_config.yml to include logo, title, and description fo…
May 26, 2020
9517673
fix kfserving-gateway limtis
j0hnL May 26, 2020
c3ee72d
Merge pull request #99 from j0hnL/issue-98
May 26, 2020
01784bb
updated documentaion for kubeflow install
j0hnL May 28, 2020
625ab06
Merge pull request #101 from j0hnL/doc-update
May 28, 2020
784448a
Adding contributors to readme
Jun 8, 2020
d7aff9d
Renaming playbooks to address issue #102.
Jun 8, 2020
b85acdc
Merge branch 'devel' of https://github.com/dellhpc/omnia into issue-96
Jun 8, 2020
210f2e8
Merge branch 'devel' of https://github.com/dellhpc/omnia into devel
Jun 8, 2020
d74d49c
Fixing copyright line
Jun 8, 2020
278d62a
Merge pull request #104 from lwilson/issue-103
j0hnL Jun 9, 2020
1b963c2
Merge branch 'devel' of https://github.com/dellhpc/omnia into issue-96
Jun 9, 2020
22d5859
Changing documentation link in top-level README to reference website
Jun 9, 2020
6d50425
Adding separate file for contributors
Jun 9, 2020
b391081
Updated preinstall to mention separate deployment node
Jun 9, 2020
086248d
Moving contributors to separate file
Jun 9, 2020
8ca9149
Adding layer-cake diagrams for Omnia Slurm and K8s stacks
Jun 9, 2020
eb3adf8
Add Xilinx FPGA Device Plugin
j0hnL Jun 11, 2020
fde169d
Merge pull request #107 from j0hnL/issue-51
Jun 11, 2020
effecb7
Adding overview image of Omnia
Jun 15, 2020
a469c93
Updated readme to include overview image
Jun 15, 2020
ccca968
Merge branch 'devel' of https://github.com/dellhpc/omnia into issue-96
Jun 15, 2020
c7e9bfc
Merge pull request #97 from lwilson/issue-96
j0hnL Jun 15, 2020
c5dc9d7
Updating commits-since badge to omnia-v0.2
Jun 16, 2020
4c9a6e3
Merge pull request #112 from lwilson/issue-111
j0hnL Jun 18, 2020
116 changes: 70 additions & 46 deletions CONTRIBUTING.md
@@ -7,59 +7,83 @@ These guidelines are based on the [pravega project](https://github.com/pravega/p

This document will evolve as the project matures. Please be sure to regularly refer back in order to stay in-line with contribution guidelines.

## Issues and Pull Requests
To produce a pull request against Omnia, follow these steps:

* **Create an issue:** Create an issue and describe what you are trying to solve. It doesn't matter whether it is a new feature, a bug fix, or an improvement. All pull requests need to be associated to an issue. See more here: Creating an issue
* **Issue branch:** Create a new branch on your fork of the repository. Typically, you need to branch off master, but there could be exceptions. To branch off master, use git checkout master; git checkout -b <new-branch-name>.
* **Push the changes:** To be able to create a pull request, push the changes to origin: git push --set-upstream origin <new-branch-name>. I'm assuming that origin is your personal repo, e.g., `lwilson/omnia.git`.
* **Branch name:** Use the following pattern to create your new branch name: issue-number-description, e.g., issue-1023-reformat-testutils.
* **Create a pull request:** GitHub gives you the option of creating a pull request. Give it a title following this format Issue ###: Description, _e.g., Issue 1023: Reformat testutils_. Follow the guidelines in the description and try to provide as much information as possible to help the reviewer understand what is being addressed. It is important that you try to do a good job with the description to make the job of the code reviewer easier. A good description not only reduces review time, but also reduces the probability of a misunderstanding with the pull request.
* **Merging:** Merging of pull requests will be handled by project maintainers.

When preparing a pull request it is important to stay up-to-date with the master. We recommend that you rebase against the upstream repository _frequently_. To do this, use the following commands:
```
git pull --rebase upstream master #upstream is dellhpc/omnia
git push --force origin <pr-branch-name> #origin is your fork of the repository (e.g., <github_user_name>/omnia.git)
## How to Contribute to Omnia
Contributions to Omnia are made through [Pull Requests (PRs)](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests). To make a pull request against Omnia, use the following steps:

1. **Create an issue:** [Create an issue](https://help.github.com/en/github/managing-your-work-on-github/creating-an-issue) and describe what you are trying to solve. It does not matter whether it is a new feature, a bug fix, or an improvement. All pull requests need to be associated with an issue. When creating an issue, be sure to use the appropriate issue template (bug fix or feature request) and complete all of the required fields. If your issue does not fit either a bug fix or a feature request, then create a blank issue and be sure to include the following information:
* **Problem description:** Describe what you believe needs to be addressed
* **Problem location:** In which file and at what line does this issue occur?
* **Suggested resolution:** How do you intend to resolve the problem?
2. **Create a personal fork:** All work on Omnia should be done in a [fork of the repository](https://help.github.com/en/github/getting-started-with-github/fork-a-repo). Only the maintainers are allowed to commit directly to the project repository.
3. **Issue branch:** [Create a new branch](https://help.github.com/en/desktop/contributing-to-projects/creating-a-branch-for-your-work) on your fork of the repository. All contributions should be branched from `devel`. Use `git checkout devel; git checkout -b <new-branch-name>` to create the new branch.
* **Branch name:** The branch name should be based on the issue you are addressing. Use the following pattern to create your new branch name: issue-number, e.g., issue-1023.
4. **Commit changes to the issue branch:** It is important to commit your changes to the issue branch. Commit messages should be descriptive of the changes being made.
* **Signing your commits:** All commits to Omnia need to be signed with the [Developer Certificate of Origin (DCO)](https://developercertificate.org/) in order to certify that the contributor has permission to contribute the code. In order to sign commits, use either the `--signoff` or `-s` option to `git commit`:
```
git commit --signoff
git commit -s
```
Make sure you have your user name and e-mail set. The `--signoff | -s` option will use the configured user name and e-mail, so it is important to configure it before the first time you commit. Check the following references:

* [Setting up your github user name](https://help.github.com/articles/setting-your-username-in-git/)
* [Setting up your e-mail address](https://help.github.com/articles/setting-your-commit-email-address-in-git/)

5. **Push the changes to your personal repo:** To be able to create a pull request, push the changes to origin: `git push origin <new-branch-name>`. Here I assume that `origin` is your personal repo, e.g., `lwilson/omnia.git`.
6. **Create a pull request:** [Create a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request) with a title following this format Issue ###: Description (_e.g., Issue 1023: Reformat testutils_). It is important that you do a good job with the description to make the job of the code reviewer easier. A good description not only reduces review time, but also reduces the probability of a misunderstanding with the pull request.
* **Important:** When preparing a pull request it is important to stay up-to-date with the project repository. We recommend that you rebase against the upstream repo _frequently_. To do this, use the following commands:
```
git pull --rebase upstream master #upstream is dellhpc/omnia
git push --force origin <pr-branch-name> #origin is your fork of the repository (e.g., <github_user_name>/omnia.git)
```
* **PR Description:** Be sure to fully describe the pull request. Ideally, your PR description will contain:
1. A description of the main point (_e.g., why was this PR made?_),
2. Linking text to the related issue (_e.g., This PR closes issue #<issue_number>_),
3. How the changes solve the problem, and
4. How to verify that the changes work correctly.
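
The branching and sign-off steps above can be tried out safely in a throwaway repository. The following is a minimal sketch, not part of the Omnia tooling: the identity (`Example Contributor`, `contributor@example.com`) is hypothetical, and the issue number 1023 is the example number used in the text. In a real contribution the branch would be created in your fork and pushed with `git push origin issue-1023`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a temporary repository so nothing touches a real fork.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Hypothetical identity; --signoff / -s reads these two settings.
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

# Stand-in for the project's devel branch, with one commit to branch from.
git checkout --quiet -b devel
git commit --quiet --allow-empty -m "Initial commit"

# Issue branch named after the issue number (issue-<number>).
git checkout --quiet -b issue-1023

# -s appends a Signed-off-by trailer built from user.name and user.email.
git commit --quiet --allow-empty -s -m "Issue 1023: Reformat testutils"

# Show the full commit message, including the trailer.
git log -1 --format=%B
```

If the printed message does not end with a `Signed-off-by:` line carrying the expected name and e-mail, the Git identity is most likely not configured.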

## Omnia Branches and Contribution Flow
The diagram below describes the contribution flow. Omnia has two lifetime branches: `devel` and `master`. The `master` branch is reserved for releases and their associated tags. The `devel` branch is where all development work occurs. The `devel` branch is also the default branch for the project.

![Omnia Branch Flowchart](docs/images/omnia-branch-structure.png "Flowchart of Omnia branches")

## Developer Certificate of Origin
Contributions to Omnia must be signed with the [Developer Certificate of Origin (DCO)](https://developercertificate.org/):
```
## Creating an Issue
When creating an issue, there are two important parts: title and description. The title should be succinct, but give a good idea of what the issue is about. Try to add all important keywords to make it clear to the reader. For example, if the issue is about changing the log level of some messages in the segment store, then instead of saying "Log level" say "Change log level in the segment store". The suggested way includes both the goal and where in the code we are supposed to do it.
Developer Certificate of Origin
Version 1.1

For the description, there are three parts:
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

* *Problem description:* Describe what it is that we need to change. If it is a bug, describe the observed symptoms. If it is a new feature, describe how it is supposed to work in as much detail as possible.
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

* *Problem location:* This part refers to where in the code we are supposed to make changes. For example, if it is a bug in the client, then in this part say at least "Client". If you know more about it, then please add it. For example, if you know that there is an issue with SegmentOutputStreamImpl, say it in this part.

* *Suggestion for an improvement:* This section is designed to let you give a suggestion for how to fix the bug described in the Problem description or how to implement the feature described in that same section. Please make an effort to separate between problem statement (Problem Description section) and solution (Suggestion for an improvement).
Developer's Certificate of Origin 1.1

We next discuss how to create a pull request.

## Creating a Pull Request
When creating a pull request, there are also two important parts: title and description. The title can be the same as the one of the issue, but it must be prefixed with the issue number, e.g.:
```
Issue 724: Change log level in the segment store
```
The description has four parts:
By making a contribution to this project, I certify that:

* __Changelog description:__ This section should be the two or three main points about this PR. A detailed description should be left for the What the code does section. The two or three points here should be used by a committer for the merge log.
* __Purpose of the change:__ Say whether this closes an issue or perhaps is a subtask of an issue. This section should link the PR to at least one issue.
* __What the code does:__ Use this section to freely describe the changes in this PR. Make sure to give as much detail as possible to help a reviewer to do a better job understanding your changes.
* __How to verify it:__ For most of the PRs, the answer here will be trivial: the build must pass, system tests must pass, visual inspection, etc. This section becomes more important when the way to reproduce the issue the PR is resolving is non-trivial, like running some specific command or workload generator.
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

## Signing Your Commits
We require that developers sign off their commits to certify that they have permission to contribute the code in a pull request. This way of certifying is commonly known as the [Developer Certificate of Origin (DCO)](https://developercertificate.org/). We encourage all contributors to read the DCO text before signing a commit and making contributions.
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

To make sure that pull requests have all commits signed off, we use the [Probot DCO plugin](https://probot.github.io/apps/dco/).
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

### Signing off a commit

#### Using the command line
To make sure that pull requests have all commits signed off, we use the Probot DCO plugin.
Use either `--signoff` or `-s` with the commit command.

Make sure you have your user name and e-mail set. The `--signoff | -s` option will use the configured user name and e-mail, so it is important to configure it before the first time you commit. Check the following references:

[Setting up your github user name](https://help.github.com/articles/setting-your-username-in-git/)

[Setting up your e-mail address](https://help.github.com/articles/setting-your-commit-email-address-in-git/)
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
```
2 changes: 1 addition & 1 deletion LICENSE
@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
12 changes: 9 additions & 3 deletions README.md
@@ -1,11 +1,17 @@
# Omnia
<img src="docs/images/omnia-logo.png" width="500px">

![GitHub](https://img.shields.io/github/license/dellhpc/omnia) ![GitHub issues](https://img.shields.io/github/issues-raw/dellhpc/omnia) ![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/dellhpc/omnia?include_prereleases) ![GitHub last commit (branch)](https://img.shields.io/github/last-commit/dellhpc/omnia/devel) ![GitHub commits since tagged version](https://img.shields.io/github/commits-since/dellhpc/omnia/omnia-v0.2/devel)

#### Ansible playbook-based deployment of Slurm and Kubernetes on Dell EMC PowerEdge servers running an RPM-based Linux OS

Omnia (Latin: all or everything) is a deployment tool to turn Dell EMC PowerEdge servers with RPM-based Linux images into a functioning Slurm/Kubernetes cluster.

## Omnia Documentation
For Omnia documentation, including installation and contribution instructions, see [docs](docs/README.md).
For Omnia documentation, including installation and contribution instructions, please see the [website](https://dellhpc.github.io/omnia).

### Current maintainers:
## Current maintainers:
* Lucas A. Wilson (Dell Technologies)
* John Lockman (Dell Technologies)

## Omnia Contributors:
<img src="docs/images/delltech.jpg" height="150px" alt="Dell Technologies"> <img src="docs/images/pisa.png" height="150px" alt="Universita di Pisa">
6 changes: 6 additions & 0 deletions docs/CONTRIBUTORS.md
@@ -0,0 +1,6 @@
# Omnia Maintainers
- Luke Wilson and John Lockman (Dell Technologies)
<img src="images/delltech.jpg" height="90px" alt="Dell Technologies">

# Omnia Contributors
<img src="images/delltech.jpg" height="90px" alt="Dell Technologies"> <img src="images/pisa.png" height="100px" alt="Universita di Pisa">
92 changes: 66 additions & 26 deletions docs/INSTALL.md
@@ -1,7 +1,5 @@
# Installing Omnia

## TL;DR

## TL;DR Installation

### Kubernetes
Install Kubernetes and all dependencies
```
@@ -12,54 +10,96 @@ Initialize K8s cluster
```
ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
```

### Install Kubeflow
```
ansible-playbook -i host_inventory_file kubernetes/kubeflow.yaml
```

### Slurm
```
ansible-playbook -i host_inventory_file slurm/slurm.yml
```
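
Every command above takes an Ansible inventory through `-i host_inventory_file`. As a rough sketch of what such a file could look like, the script below writes a small INI-style inventory to a temporary path. The group names `master` and `compute` and the `192.0.2.x` addresses are assumptions for illustration only; the group names the playbooks actually expect should be checked against their `hosts:` entries.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write an example inventory; group names and addresses are placeholders.
inventory_file="$(mktemp)"
cat > "$inventory_file" <<'EOF'
[master]
192.0.2.10

[compute]
192.0.2.11
192.0.2.12
EOF

# Show what was written.
cat "$inventory_file"

# With Ansible installed, the file is passed with -i, as in the commands above:
#   ansible-playbook -i "$inventory_file" slurm/slurm.yml
```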

## Build/Install
# Omnia
Omnia is a collection of [Ansible](https://www.ansible.com/) playbooks which perform:
* Installation of [Slurm](https://slurm.schedmd.com/) and/or [Kubernetes](https://kubernetes.io/) on servers already provisioned with a standard [CentOS](https://www.centos.org/) image.
* Installation of auxiliary scripts for administrator functions such as moving nodes between Slurm and Kubernetes personalities.

### Kubernetes

* Add additional repositories:
Omnia playbooks perform several tasks:
`common` playbook handles installation of software
* Add yum repositories:
- Kubernetes (Google)
- El Repo (nvidia drivers)
- Nvidia (nvidia-docker)
- El Repo (for Nvidia drivers)
- EPEL (Extra Packages for Enterprise Linux)
* Install common packages
* Install Packages from repos:
- bash-completion
- docker
- gcc
- python-pip
- docker
- kubelet
- kubeadm
- kubectl
- nfs-utils
- nvidia-detect
- yum-plugin-versionlock
* Restart and enable system level services
- Docker
- Kubelet

`computeGPU` playbook installs Nvidia drivers and nvidia-container-runtime-hook
* Add yum repositories:
- Nvidia (container runtime)
* Install Packages from repos:
- kmod-nvidia
- nvidia-x11-drv
- nvidia-container-runtime
- ksonnet (CLI framework for K8S configs)
* Enable GPU Device Plugins (nvidia-container-runtime-hook)
* Modify kubeadm config to allow GPUs as schedulable resource
* Start and enable services
- nvidia-container-runtime-hook
* Restart and enable system level services
- Docker
- Kubelet
* Initialize Cluster
* Configuration:
- Enable GPU Device Plugins (nvidia-container-runtime-hook)
- Modify kubeadm config to allow GPUs as schedulable resource
* Restart and enable system level services
- Docker
- Kubelet

`master` playbook
* Install Helm v3
* (optional) add firewall rules for Slurm and kubernetes

Everything from this point on can be called by using the `init` tag
```
ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
```

`startmaster` playbook
* turn off swap
* Initialize Kubernetes
* Head/master
- Start K8S and pass the startup token to compute/slaves
- Initialize networking (Currently using WeaveNet)
- Setup K8S Dashboard
- Create dynamic/persistent volumes
* Compute/slaves
- Join k8s cluster
- Initialize software defined networking (Calico)

`startworkers` playbook
* turn off swap
* Join k8s cluster
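
Both `startmaster` and `startworkers` turn off swap because the kubelet, by default, refuses to start on a node with swap enabled. How the Omnia roles do this is not shown here; the sketch below only illustrates the usual manual equivalent. It edits a temporary copy of an fstab-style file (the sample entries are made up) so that it can be run without root, and assumes GNU `sed`. On a real node the same edit would target `/etc/fstab`, together with `swapoff -a` run as root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample fstab content in a temporary file; the device names are illustrative.
fstab_copy="$(mktemp)"
cat > "$fstab_copy" <<'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF

# Comment out active swap entries so swap stays off after a reboot.
sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab_copy"

cat "$fstab_copy"

# On a real node (as root), swap is disabled immediately with:
#   swapoff -a
```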

`startservices` playbook
* Setup K8S Dashboard
* Add `stable` repo to helm
* Add `jupyterhub` repo to helm
* Update helm repos
* Deploy NFS client Provisioner
* Deploy Jupyterhub
* Deploy Prometheus
* Install MPI Operator
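
The three Helm repository steps in `startservices` correspond roughly to the Helm v3 commands sketched below. This is an illustration rather than the role's actual tasks: the repository URLs are assumptions (the `stable` URL in particular is the current archive location and may differ from what the role uses), and the function `helm_repo_setup` is a hypothetical helper. By default it only prints the commands; setting `DRY_RUN=0` runs them on a machine with Helm installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed chart repository URLs; verify against the startservices role.
STABLE_REPO_URL="https://charts.helm.sh/stable"
JUPYTERHUB_REPO_URL="https://jupyterhub.github.io/helm-chart/"

# Print the Helm v3 commands; set DRY_RUN=0 to execute them as well.
helm_repo_setup() {
  local cmd
  for cmd in \
    "helm repo add stable ${STABLE_REPO_URL}" \
    "helm repo add jupyterhub ${JUPYTERHUB_REPO_URL}" \
    "helm repo update"; do
    echo "+ ${cmd}"
    if [ "${DRY_RUN:-1}" = "0" ]; then
      ${cmd}
    fi
  done
}

helm_repo_setup
```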


### Slurm
* Download and build Slurm source
* Install necessary dependencies
* Downloads and builds Slurm from source
* Install package dependencies
- Python3
- munge
- MariaDB
- MariaDB development libraries
* Build Slurm configuration files

2 changes: 1 addition & 1 deletion docs/PREINSTALL.md
@@ -5,7 +5,7 @@ Omnia assumes that prior to installation:
* Systems have a base operating system (currently CentOS 7 or 8)
* Network(s) has been cabled and nodes can reach the internet
* SSH Keys for `root` have been installed on all nodes to allow for password-less SSH
* Ansible is installed on the master node
* Ansible is installed on either the master node or a separate deployment node
```
yum install ansible
```