Add documentation for package building #1358

Merged
merged 1 commit into from Apr 6, 2017

Conversation

mellenburg

This process is shrouded in darkness and secrecy from a docs perspective. FIAT LUX!

@d2iq-mergebot
Collaborator

This repo has @mesosphere-mergebot integration. You can interact with the following commands.

@mesosphere-mergebot bump-ee

Contributor

@orsenthil orsenthil left a comment

s/depedent/dependent/g
s/enchances/enhances/g
s/depedencies/dependencies/g
s/istance/instance/g
s/specifc/specific/g

From the `packages` directory, one can run `mkpanda tree` which will essentially do a full, locally-cached DC/OS build. Alternatively, one can name a variant tree like so: `mkpanda tree installer`. This will instruct pkgpanda to only make the packages necessary for building the completed variant.

## Deployment Artifacts
The artifacts are actually delivered to hosts via one of the deployment methods crafted in the `do_create` function of the modules in `gen.build_deploy`.
Contributor

You could consider avoiding the mention of these method names like do_create. If we refactor the code and miss out this document, it could confuse people.

Author

Eh. I see your point, but I at least want one giant arrow that says "THIS IS WHERE WE MAKE ____" where ____ is CF template, ACS template, and the installer. I would rather just leave this than not define it in waiting for a refactor that may never come

Contributor

Ok. Sounds good to me.

Contributor

It'll be easier to remember to update these references if the doc lives closer to the corresponding code.

@mellenburg
Author

@orsenthil Thanks for including the spelling corrections in vim/sed short-hand as always 😄

* `$PKG_PATH`: the writable path for build artifacts

### Package Trees and Variants
In the package directory, the only things that can be placed there besides package-folders is a `*treeinfo.json` file and an `upstream.json` file. A `*treeinfo.json` describes an arbitrary set of packages and how they should be bundled into a bootstrap a tarball.
Author

Note to add: the variant called 'installer' will not be built as a full 'variant' even though it will be built as a bootstrap

Contributor

Ah! I concentrated on the overview as I understood the system. If there are details that I missed, I hope that others who have the context in their heads at the moment can spot them. Branden and Ben would be great candidates for detailed review.

Author

Another note: if using upstream, the treesinfo inside the tree will be ignored. there must be a treeinfo.json in the downstream package tree


The packages built by pkgpanda make up the core set of components that comprise DC/OS. However, to actually stand up a DC/OS istance, tools must be built with baked in configurations to deploy the specifc set of artifacts to a given provider (generic term for any entity providing hardware on which DC/OS will run). For this, DC/OS has a script called [release](../release) which has the function of taking the completed build artifacts, parsing its meta-data, rendering deploy templates with references to the artifacts, bundling the installer with the artifacts, and uploading everything to remote hosting services (S3 or Azure). Templates and the onprem installer are the current deploy methods native to DC/OS, but in theory form of provider could be implemented by just adding a new module to the [provider tools](../gen/build_deploy)

There is one more tool which plays a critical role in producing viable DC/OS builds and that is the [gen library](../). gen is capable of parsing templates for required arguments using conditional template logic and then resolving those targets versus an argument source. To learn more about gen, see [gen README.md](../gen/README.md)
Author

broken link to gen


In addition pkgpanda performs a few other critical functions:
* Under the hood of a DC/OS deployment, `pkgpanda` is also responsible for managing the symlinks and filepaths that will link artifacts to the runtime environment of the host system.
* In each build, the artifacts for every package are cached either locally or an a 3rd party storage provider (AWS and Azure). This greatly enchances build time as most DC/OS changes only touch one independent package at a time. The artifacts include sources so if a 3rd party source should ever suddenly disappear, there will be a reliable backup
Contributor

s/locally or an a/locally or on a/g

* In each build, the artifacts for every package are cached either locally or an a 3rd party storage provider (AWS and Azure). This greatly enchances build time as most DC/OS changes only touch one independent package at a time. The artifacts include sources so if a 3rd party source should ever suddenly disappear, there will be a reliable backup
* Package builds are isolated. By forcing builds to be in a docker container, all sources and depedencies must be specifcally declared.

The packages built by pkgpanda make up the core set of components that comprise DC/OS. However, to actually stand up a DC/OS istance, tools must be built with baked in configurations to deploy the specifc set of artifacts to a given provider (generic term for any entity providing hardware on which DC/OS will run). For this, DC/OS has a script called [release](../release) which has the function of taking the completed build artifacts, parsing its meta-data, rendering deploy templates with references to the artifacts, bundling the installer with the artifacts, and uploading everything to remote hosting services (S3 or Azure). Templates and the onprem installer are the current deploy methods native to DC/OS, but in theory form of provider could be implemented by just adding a new module to the [provider tools](../gen/build_deploy)
Contributor

s/but in theory form of provider/but in theory a new form of provider/g

Contributor

@vespian vespian Mar 15, 2017

to actually stand up a DC/OS istance

to actually spawn a DC/OS instance?

Contributor

@vespian vespian Mar 15, 2017

taking the completed build artifacts, parsing its meta-data,

singular vs. plural mismatch?


#### Sources
Sources are singular artifacts or git repositories that are required to complete the package build. They are described via JSON maps with variable fields; the only guaranteed field is `kind`. There are a few kinds of sources:
* `git`: Points to a git repository which can be private if the build host git client is configured correctly. Required fields: `kind`, `url`, `ref`, `ref_origin`. `ref` is the commit from the repo that should be built and `ref_origin` is the branch on which that commit live.
Contributor

s/live.$/lives./g

As for describing in relation to the `buildinfo.json`:
* If there are no required dependencies outside of the package directory, then no source information is required `buildinfo.json`
* If there is only one source, a `source` field must be included in the `buildinfo.json`. The data from this source will be mounted inside the container as `/pkg/src/$PKG_NAME`
* If there are multiple sources, a `sources` field must be provided in `buildinfo.json` and each of its subfields will be a source like the single source entry. Each of these sources will be mounted in the container as `/pkg/src/<<key name of source in JSON sources>>`
Contributor

👍 To this being explained, remember it taking some time to figure it out the first time I was fighting with a buildinfo.json
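
The mounting rules quoted above can be sketched as a small helper. This is only an illustration under stated assumptions, not pkgpanda's actual code: `source_mount_paths` is a hypothetical name, and the sketch assumes `buildinfo.json` has already been parsed into a dict.

```python
def source_mount_paths(pkg_name, buildinfo):
    """Map each declared source to its mount path inside the build container.

    Hypothetical helper that follows the three cases described in the doc:
    no sources, a single `source`, or a `sources` map keyed by name.
    """
    if "sources" in buildinfo:
        # Multiple sources: each one is mounted under its key in the JSON map.
        return {key: "/pkg/src/" + key for key in buildinfo["sources"]}
    if "source" in buildinfo:
        # Single source: mounted under the package name.
        return {pkg_name: "/pkg/src/" + pkg_name}
    # No external dependencies: nothing is mounted under /pkg/src/.
    return {}


if __name__ == "__main__":
    single = {"source": {"kind": "git"}}
    multiple = {"sources": {"first": {"kind": "git"}, "second": {"kind": "url_extract"}}}
    print(source_mount_paths("example-pkg", single))
    print(source_mount_paths("example-pkg", multiple))
```

The package and source names in the usage lines are placeholders chosen for the example.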

In the package directory, the only things that can be placed there besides package-folders is a `*treeinfo.json` file and an `upstream.json` file. A `*treeinfo.json` describes an arbitrary set of packages and how they should be bundled into a bootstrap a tarball.

#### treeinfo.json
Normally, all packages in the package directory are considered apart of the target set to be added to the bootstrap tarball, but a `*treeinfo.json` allows specifying subsets which we call variants. E.G. a `foobar.treeinfo.json` would describe how to build the foobar variant. By default, all packages are apart of the `<default>` variant. Specifically, consider [treeinfo.json](../packages/treeinfo.json), which is such a tree for the "default" variant:
Contributor

s/apart/a part/g

"bootstrap_package_list": ["dcos-image"]
}
```
The the "exclude" field is to say that the package `"dcos-installer-ui"` should not be built in the default variant. The `"bootstrap_package_list"` is a whitelist of packages that will be placed in the default bootstrap tarball. The bootstrap tarball is the collection of packages transferred to host machines which then orchestrates the bootstrapping of the remaining packages to spin up DC/OS. As such, the installer UI code is completely useless to a DC/OS deployment.
Contributor

s/The the/The/g
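
The roles of `exclude` and `bootstrap_package_list` described in the quoted paragraph can be sketched roughly as follows. `resolve_variant` is a hypothetical helper, not part of pkgpanda, and the sketch deliberately ignores dependency resolution; `example-pkg` is a made-up package name.

```python
def resolve_variant(all_packages, treeinfo):
    """Return (packages to build, packages for the bootstrap tarball).

    Assumption: a missing `exclude` means nothing is excluded, and a missing
    `bootstrap_package_list` means every built package goes into the tarball.
    """
    excluded = set(treeinfo.get("exclude", []))
    built = [pkg for pkg in all_packages if pkg not in excluded]

    whitelist = treeinfo.get("bootstrap_package_list")
    if whitelist is None:
        return built, list(built)
    allowed = set(whitelist)
    return built, [pkg for pkg in built if pkg in allowed]


if __name__ == "__main__":
    packages = ["dcos-image", "dcos-installer-ui", "example-pkg"]
    treeinfo = {
        "exclude": ["dcos-installer-ui"],
        "bootstrap_package_list": ["dcos-image"],
    }
    built, bootstrap = resolve_variant(packages, treeinfo)
    print("built:", built)
    print("bootstrap:", bootstrap)
```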

This file will tell pkgpanda the packages directory containing it should be expanded to include the packages from the given upstream source. Thus, one may create a modified version of DC/OS by just defining the desired package modifications; no forked repositories required!

### Developing Packages
When building a release, `pkgpanda` is only called through its library. However, there is a CLI which is very helpful for developing and debugging packages. To setup pkgpanda locally, be sure docker is installed and able to run without sudo, then run the `./prep_local` script in the root directory (python virtualenv will be required). Finally, make sure the dcos-builder image has been prepped locally by doing `cd pkgpanda/build/docker/dcos-builder/; docker build -t dcos/dcos-builder:dcos-builder_dockerdir-latest .` Right now, this is handled apart of the release scripting, not pkgpanda.
Contributor

For the compound command can we do && instead of ; that will ensure the cd succeeds before trying to build the docker image, whereas the semicolon will not.

Author

good catch, thanks!
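
The difference between `;` and `&&` raised above can be checked with a short sketch. It assumes a POSIX `sh` is available, uses `echo building` as a stand-in for the `docker build` step, and relies on a directory path that is assumed not to exist.

```python
import subprocess

# Assumption: this directory does not exist, so `cd` into it fails.
MISSING_DIR = "/nonexistent-dir-for-demo"


def run_shell(command):
    """Run a command line through the system shell and return its stdout."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide the expected `cd` error message
        universal_newlines=True,
    )
    return result.stdout


# With `;` the second command runs even though `cd` failed.
with_semicolon = run_shell("cd {} ; echo building".format(MISSING_DIR))

# With `&&` the second command is skipped when `cd` fails.
with_and = run_shell("cd {} && echo building".format(MISSING_DIR))

if __name__ == "__main__":
    print("semicolon ran the build step:", bool(with_semicolon.strip()))
    print("&& ran the build step:", bool(with_and.strip()))
```

In the documented command, this means the `&&` form would not attempt the Docker build from the wrong directory if the `cd` fails.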

Contributor

@gpaul gpaul left a comment

This is gold, thanks @mellenburg


DC/OS uses a custom packaging tool called [pkgpanda](../pkgpanda). This tool can read package metadata JSONs in a [package tree](../packages), verify the package dependency list is resolvable by walking through depedent packages, and then use the build script files accompanying the metadata to perform a docker-based build where the artifacts can be transferred out via a volume mount. Finally, these artifacts are packaged in XZ-compressed tar-balls.

In addition pkgpanda performs a few other critical functions:
Contributor

s/In addition/In addition,/


In addition pkgpanda performs a few other critical functions:
* Under the hood of a DC/OS deployment, `pkgpanda` is also responsible for managing the symlinks and filepaths that will link artifacts to the runtime environment of the host system.
* In each build, the artifacts for every package are cached either locally or an a 3rd party storage provider (AWS and Azure). This greatly enchances build time as most DC/OS changes only touch one independent package at a time. The artifacts include sources so if a 3rd party source should ever suddenly disappear, there will be a reliable backup
Contributor

s/greatly enhances/greatly reduces/
Perhaps?


In addition pkgpanda performs a few other critical functions:
* Under the hood of a DC/OS deployment, `pkgpanda` is also responsible for managing the symlinks and filepaths that will link artifacts to the runtime environment of the host system.
* In each build, the artifacts for every package are cached either locally or an a 3rd party storage provider (AWS and Azure). This greatly enchances build time as most DC/OS changes only touch one independent package at a time. The artifacts include sources so if a 3rd party source should ever suddenly disappear, there will be a reliable backup
Contributor

s/reliable backup/reliable backup./

In addition pkgpanda performs a few other critical functions:
* Under the hood of a DC/OS deployment, `pkgpanda` is also responsible for managing the symlinks and filepaths that will link artifacts to the runtime environment of the host system.
* In each build, the artifacts for every package are cached either locally or an a 3rd party storage provider (AWS and Azure). This greatly enchances build time as most DC/OS changes only touch one independent package at a time. The artifacts include sources so if a 3rd party source should ever suddenly disappear, there will be a reliable backup
* Package builds are isolated. By forcing builds to be in a docker container, all sources and depedencies must be specifcally declared.
Contributor

s/By forcing builds to be/By performing builds in a/
s/specifcally/specifically/
perhaps 'explicitly declared'?

* In each build, the artifacts for every package are cached either locally or an a 3rd party storage provider (AWS and Azure). This greatly enchances build time as most DC/OS changes only touch one independent package at a time. The artifacts include sources so if a 3rd party source should ever suddenly disappear, there will be a reliable backup
* Package builds are isolated. By forcing builds to be in a docker container, all sources and depedencies must be specifcally declared.

The packages built by pkgpanda make up the core set of components that comprise DC/OS. However, to actually stand up a DC/OS istance, tools must be built with baked in configurations to deploy the specifc set of artifacts to a given provider (generic term for any entity providing hardware on which DC/OS will run). For this, DC/OS has a script called [release](../release) which has the function of taking the completed build artifacts, parsing its meta-data, rendering deploy templates with references to the artifacts, bundling the installer with the artifacts, and uploading everything to remote hosting services (S3 or Azure). Templates and the onprem installer are the current deploy methods native to DC/OS, but in theory form of provider could be implemented by just adding a new module to the [provider tools](../gen/build_deploy)
Contributor

s|DC/OS istance|DC/OS instance|
s/baked in/baked-in/
s/specifc/specific/
s/but in theory form of/but in theory a new form of/
s/just adding/adding/ <-- :p
s/to the provider tools/to the provider tools./



#### Customized Docker Build Images
By default, all packages are built with the [dcos-builder Dockerfile](../pkgpanda/docker/dcos-builder), however, packages can be built with any docker image. There are two ways to specify other docker images:
Contributor

By default, all packages are built with the dcos-builder Dockerfile. A different docker image can be specified. There are two ways to specify a different docker image:


#### Customized Docker Build Images
By default, all packages are built with the [dcos-builder Dockerfile](../pkgpanda/docker/dcos-builder), however, packages can be built with any docker image. There are two ways to specify other docker images:
* include a folder called `docker` along side the `build` and `buildinfo.json`. This file must contain a Dockerfile and any necessary data to be included with the image.
Contributor

s/folder/directory/
(a folder is a user interface depiction of a directory, AFAICT)

s/along side/alongside/
s/and buildinfo.json./and buildinfo.json files./

Contributor

This docker image will be built as part of the package building process.

* `/pkg/src/`: directory for all sources declared in `buildinfo.json`
* `/pkg/extra/`: mounted `extra` directory from the package folder
* `/opt/mesosphere/environment.export`: file including the environment variables that will link the build container environment to built packages and their artifacts. Most builds depending on other packages should source the environment with this file
* `/opt/mesosphere`: the various accumulation of dependent artifacts as they will be on the host at runtime
Contributor

maybe just:
the location of all packages that were listed as dependencies

Author

It's a little more complicated than that because it also includes the symlinked superposition of lib and bin, but I think I can communicate that better...

* `$PKG_PATH`: the writable path for build artifacts

### Package Trees and Variants
In the package directory, the only things that can be placed there besides package-folders is a `*treeinfo.json` file and an `upstream.json` file. A `*treeinfo.json` describes an arbitrary set of packages and how they should be bundled into a bootstrap a tarball.
Contributor

s/into a bootstrap a tarball/into a bootstrap tarball/

In the package directory, the only things that can be placed there besides package-folders is a `*treeinfo.json` file and an `upstream.json` file. A `*treeinfo.json` describes an arbitrary set of packages and how they should be bundled into a bootstrap a tarball.

#### treeinfo.json
Normally, all packages in the package directory are considered apart of the target set to be added to the bootstrap tarball, but a `*treeinfo.json` allows specifying subsets which we call variants. E.G. a `foobar.treeinfo.json` would describe how to build the foobar variant. By default, all packages are apart of the `<default>` variant. Specifically, consider [treeinfo.json](../packages/treeinfo.json), which is such a tree for the "default" variant:
Contributor

s/E.G./For example,/
s/all package are apart of/all packages form part of/
s/Specifically, consider/Let's consider/

Contributor

@vespian vespian left a comment

I have made a PR against your PR: mesosphere#13 Also, please have a look at: http://english.stackexchange.com/a/37976

Thanks for documenting it, it's really a great step forward for making our build system awesome!

* `url_extract`: URL is to an artifact that is a compressed file that will be automatically decompressed before volume mounting. Supported compression types are: `.tar.gz`, `.tgz`, and `.zip`. Required fields are: `kind`, `url`, and `sha1`

As for describing in relation to the `buildinfo.json`:
* If there are no required dependencies outside of the package directory, then no source information is required `buildinfo.json`
Contributor

required in buildinfo.json
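
The required fields for the `git` and `url_extract` source kinds quoted in this review can be checked with a sketch like the one below. `missing_source_fields` and `REQUIRED_FIELDS` are hypothetical names, and only the two kinds visible in this thread are covered; the doc mentions that other kinds exist.

```python
# Required fields per source kind, as listed in the reviewed doc.
REQUIRED_FIELDS = {
    "git": {"kind", "url", "ref", "ref_origin"},
    "url_extract": {"kind", "url", "sha1"},
}


def missing_source_fields(source):
    """Return the set of required fields absent from one source entry."""
    kind = source.get("kind")
    if kind is None:
        # `kind` is the only field every source is guaranteed to have.
        return {"kind"}
    if kind not in REQUIRED_FIELDS:
        raise ValueError("source kind not covered by this sketch: " + repr(kind))
    return REQUIRED_FIELDS[kind] - set(source)


if __name__ == "__main__":
    # Placeholder values; a real entry would carry a real URL and commit.
    incomplete_git = {"kind": "git", "url": "<repo url>", "ref": "<commit>"}
    print(missing_source_fields(incomplete_git))
```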


#### Customized Docker Build Images
By default, all packages are built with the [dcos-builder Dockerfile](../pkgpanda/docker/dcos-builder), however, packages can be built with any docker image. There are two ways to specify other docker images:
* include a folder called `docker` along side the `build` and `buildinfo.json`. This file must contain a Dockerfile and any necessary data to be included with the image.
Contributor

Do we want to put some extra information on where this container is stored? I sometimes pull the image to, e.g., debug stuff on my laptop.

Author

Good point. will add

@@ -0,0 +1,114 @@
# Overview

DC/OS uses a custom packaging tool called [pkgpanda](../pkgpanda). This tool can read package metadata JSONs in a [package tree](../packages), verify the package dependency list is resolvable by walking through depedent packages, and then use the build script files accompanying the metadata to perform a docker-based build where the artifacts can be transferred out via a volume mount. Finally, these artifacts are packaged in XZ-compressed tar-balls.
Contributor

@vespian vespian Mar 15, 2017

I wonder if we shouldn't add some background why we created our own packaging system. I sometimes see people ranting on HN that we have not-invented-here syndrome.

Contributor

verify that the package dependency list is resolvable

Contributor

@vespian vespian Mar 15, 2017

where the artifacts can be transferred out via a volume mount.

Not sure if we aren't going into too much detail here. This will most probably be either obvious or irrelevant to the reader.

Contributor

I wonder if we shouldn't add some background why we created our own packaging system. I sometimes see people ranting on HN that we have not-invented-here syndrome.

Could link to this: https://dcos.io/docs/1.9/overview/design/installation/

"ref_origin": "master"
}
```
This file will tell pkgpanda the packages directory containing it should be expanded to include the packages from the given upstream source. Thus, one may create a modified version of DC/OS by just defining the desired package modifications; no forked repositories required!
Contributor

I think it may be useful for users to also show how they can refer to upstream repo in buildinfo.json. For example:
https://github.com/mesosphere/dcos-enterprise/blob/master/packages/dcos-integration-test/ee.buildinfo.json#L21-L24

Author

Good point, will drop in that treeinfo example

The artifacts are actually delivered to hosts via one of the deployment methods crafted in the `do_create` function of the modules in `gen.build_deploy`.

## Templates
The modules `gen.build_deploy.aws` and `gen.build_deploy.azure` provide templates that interact directly with the specific provider services and APIs. By leveraging the native tools of a hardware provider, DC/OS can be spun up much faster with appropriate configurations. The downside is that relying on provider APIs can make upgrading much harder as many more settings outside of DC/OS need to be touched. Finally, some settings need to be baked into a template as provider APIs might not allow the required level of configuration flexibility.
Contributor

tools of a hardware provider

cloud provider?

Contributor

@branden branden left a comment

Thanks for writing this, there's a ton of good info here that needs to be documented.

The length and depth of this doc is overwhelming for an overview, and portions of this doc are so detailed that it may need to be updated if we change corresponding code. I think this will be easier to consume and maintain if we split most of it into separate docs under pkgpanda/docs/, gen/docs/, and release/docs/, and link to them from a doc that describes the build process in broad strokes. That will allow us to keep the overview tailored to an audience of first-time or occasional contributors, while keeping the details of (for example) buildinfo semantics in a reference doc that's easy to find and lives near the code it describes.

@@ -0,0 +1,114 @@
# Overview

DC/OS uses a custom packaging tool called [pkgpanda](../pkgpanda). This tool can read package metadata JSONs in a [package tree](../packages), verify the package dependency list is resolvable by walking through depedent packages, and then use the build script files accompanying the metadata to perform a docker-based build where the artifacts can be transferred out via a volume mount. Finally, these artifacts are packaged in XZ-compressed tar-balls.
Contributor

The package tree is the tree of packages w/ dependencies inferred from a treeinfo and the buildinfos for all its packages, not the packages dir itself.

@@ -0,0 +1,114 @@
# Overview
Contributor

The first top-level header in a Markdown file should be a descriptive title for the entire doc. The filename should be a shorter alias (maybe just build.md) and the longer title should go here. "Overview" would be a good first sub-section.

DC/OS uses a custom packaging tool called [pkgpanda](../pkgpanda). This tool can read package metadata JSONs in a [package tree](../packages), verify the package dependency list is resolvable by walking through depedent packages, and then use the build script files accompanying the metadata to perform a docker-based build where the artifacts can be transferred out via a volume mount. Finally, these artifacts are packaged in XZ-compressed tar-balls.

In addition pkgpanda performs a few other critical functions:
* Under the hood of a DC/OS deployment, `pkgpanda` is also responsible for managing the symlinks and filepaths that will link artifacts to the runtime environment of the host system.
Contributor

Deployment is unrelated to build, shouldn't be in this doc. This sounds like an excerpt from a separate doc about pkgpanda.


The packages built by pkgpanda make up the core set of components that comprise DC/OS. However, to actually stand up a DC/OS istance, tools must be built with baked in configurations to deploy the specifc set of artifacts to a given provider (generic term for any entity providing hardware on which DC/OS will run). For this, DC/OS has a script called [release](../release) which has the function of taking the completed build artifacts, parsing its meta-data, rendering deploy templates with references to the artifacts, bundling the installer with the artifacts, and uploading everything to remote hosting services (S3 or Azure). Templates and the onprem installer are the current deploy methods native to DC/OS, but in theory form of provider could be implemented by just adding a new module to the [provider tools](../gen/build_deploy)

There is one more tool which plays a critical role in producing viable DC/OS builds and that is the [gen library](../). gen is capable of parsing templates for required arguments using conditional template logic and then resolving those targets versus an argument source. To learn more about gen, see [gen README.md](../gen/README.md)
Contributor

This is more detail than is necessary to explain where gen fits into the build process. Here we can just say it's responsible for generating config files for DC/OS components from user-provided parameters, and then link to a gen doc for anyone who wants to know more.

@mellenburg
Author

@vespian Thanks for the PR, but I ended up checking those edits one at a time anyway, so you can go ahead and close the PR. A couple points:

  • You propose switching artifacts to artefacts. I think this is incorrect, and the link you posted confirms this view. An artifact is something made by human intention. An artefact is something that was produced by a natural system, experimental error, or perhaps a tool. Given that every time I use the word artifact I am referring to something we intended to build, package, and ship, it seems appropriate. For example, artefact would be relevant if we were talking about the broken symlinks in any decompressed bootstrap tarball.
  • You propose switching customize to customise. I am an American, so customise hurts my eyes. I don't think I will bother with this kind of edit unless we decide organizationally to always use British or American spellings. Otherwise, I say leave it up to the author, because it's entirely topical.

@mellenburg
Author

@branden I am down to break this out into a better distribution of files. However, for that to be done well, https://github.com/dcos/dcos/blob/master/pkgpanda/docs needs to get cleaned up, or it will just be a confusing mess. Specifically, the following look so out of date or irrelevant that they should just be burned:
https://github.com/dcos/dcos/blob/master/pkgpanda/docs/architecture.md
https://github.com/dcos/dcos/blob/master/pkgpanda/docs/deployer.md
https://github.com/dcos/dcos/blob/master/pkgpanda/docs/modules.md

So, mind if I burn those? As for where the new stuff goes, this has a little bit of good info, but half of it is "not implemented yet/TODO": https://github.com/dcos/dcos/blob/master/pkgpanda/docs/for_packagers.md. So I can trim the junk and make it more tutorial-oriented like this doc and split out buildinfo and treeinfo into their own doc.md files if necessary

@branden
Contributor

branden commented Mar 17, 2017

@mellenburg Yeah those docs look interesting from a historical perspective but don't seem relevant otherwise. 💥

Updating for_packagers.md is a great idea. I could see a couple pkgpanda docs coming out of this PR: one for writing and building a package (buildinfo, package variants, mkpanda), and one for building a package tree (treeinfo, bootstrap, tree variants, mkpanda --tree).

Contributor

@gpaul gpaul left a comment

Thanks bro 🚢

@mellenburg mellenburg force-pushed the mellenburg/build_docs branch 2 times, most recently from 62d2f56 to a610edb Compare April 3, 2017 17:55

```bash
/etc/mesosphere/roles/{master,slave}
/etc/mesosphere/roles/{master,slave,aws,aws_master}
Contributor

We should include slave_public here. AWS roles probably aren't worth mentioning in a general doc, they're specific to that deployment method.

active.old/
active.buildinfo.full.json.old
environment.old
environment.export.old
Contributor

These .old directories aren't guaranteed to be present. They're created when a new package set is activated. They're more implementation detail than API, so probably not relevant to this doc.

@@ -1,18 +1,16 @@
# Package concepts

**Package Name**
* Package Name *
Contributor

Unnecessary spaces

@@ -0,0 +1,15 @@
# Package Basics
DC/OS uses a custom packaging tool called [pkgpanda](../pkgpanda) which operates on a "package tree". The [packages](../packages) directory is the root of all the packages and `pkgpanda` is built to construct it into a "package tree". A "package tree" is constructed by looking in each directory in a root and parsing a `buildinfo.json` and `build` file from each directory. If these are present, then the directory is considered a package and added to the tree. This tool can read these metadata JSONs in a "package tree", verify the package dependency list is resolvable by walking through dependent packages, and then use the build script files accompanying the metadata to perform a Docker-based build. Finally, these artifacts are packaged in XZ-compressed tar-balls. For more information on why these design choices were made, take a look at the [docs](https://dcos.io/docs/1.9/overview/design/installation/).
Contributor

We're not going to remember to update this URL with every release. Let's link to the version-agnostic https://dcos.io/docs/overview/design/installation/, which will redirect to the doc for stable.

Contributor

Mentioning Docker and XZ feels overly detailed for a basics doc. Probably better covered in a more detailed doc under pkgpanda/.

* Package builds are isolated. By performing builds in a docker container, all sources and dependencies must be explicitly declared.

The packages built by pkgpanda make up the core set of components that comprise DC/OS. However, to actually spawn a DC/OS instance, tools must be built with baked-in configurations to deploy the specific set of artifacts to a given provider (generic term for any entity providing hardware on which DC/OS will run). For this, DC/OS has a script called [release](../release). Its function is:
* injesting the completed build artifacts and parsing their meta-data
Contributor

ingesting, metadata

`name--id` combination package name + arbitrary information (most often a version indicator). The packaging system
needs to extract the package name from a package id. Valid characters are [a-zA-Z0-9@._+-]. A package-id may not
contain '-' or '--'. Once a package-id is utilized, it should never be re-used with different package contents.
`name--id` combination package name + unique, reproducible package hash
Contributor

name--version

The version can be any valid string. It's not always a hash, and nothing enforces that it is. Config packages include setup in the version, and late binding config packages just have a version of setup.


Every pkgpanda package may put items in several well-known directories to have them available to other packages.
Every pkgpanda package may put items in several well-known directories to have them available to other packages. Additionally, these directories will be sym-linked into system environment paths upon package activation.
Contributor

symlinked

This added sentence is restating the first one. Symlinking is the means by which package files in well-known directories are made available to other packages.

lib/ # Will be linked into $LD_LIBRARY_PATH
bin/ # Will be linked into $PATH
etc/ # Will be linked into system's /etc
dcos.target.wants/ # Will be linked into /etc/systemd/system/dcos.target.wants
Contributor

I think we can be more explicit here and say files under lib/ will be symlinked to /opt/mesosphere/lib/, etc. Also etc/ is linked to /opt/mesosphere/etc/, not /etc/.


The purpose of the above configuration is to generate two bootstrap tarballs with each build. The default tarball will contain the code that handles the import of the remaining DC/OS packages (as dictated by the provider `gen.build_deploy` module) and setup the hosts to startup DC/OS. The installer tarball will be used to create a mock DC/OS environment which can provide packages to hosts via the `dcos_installer` program (see below). It is important to note the distinction that `installer` is a bootstrap variant, and not a DC/OS variant. Thus, in a new variant `installer.foobar.treeinfo.json` would allow crafting a variant onprem installer for the `foobar` DC/OS variant.

## buildinfo.json
Contributor

This gives the impression that the packages/ dir should contain a top-level buildinfo.json, which isn't true. Probably best to describe package variants in the package doc and link to this doc for context on why you'd want to build one.


Packages are always extracted and run from a predictable location. The standard prefix is
`/opt/mesosphere/{name}-{id}/` where `{name}` is the package name and `{id}`, a id associated with the package . This
is effectively the `PREFIX` of the installed package in the `autotools` terms. A package may rely on the absolute path
Contributor

The general term for this is "install prefix".

This is the install prefix for the package.

Contributor

@branden branden left a comment

Looking pretty good, just a few small things left.

@@ -1,15 +1,13 @@
# Package Basics
DC/OS uses a custom packaging tool called [pkgpanda](../pkgpanda) which operates on a "package tree". The [packages](../packages) directory is the root of all the packages and `pkgpanda` is built to construct it into a "package tree". A "package tree" is constructed by looking in each directory in a root and parsing a `buildinfo.json` and `build` file from each directory. If these are present, then the directory is considered a package and added to the tree. This tool can read these metadata JSONs in a "package tree", verify the package dependency list is resolvable by walking through dependent packages, and then use the build script files accompanying the metadata to perform a Docker-based build. Finally, these artifacts are packaged in XZ-compressed tar-balls. For more information on why these design choices were made, take a look at the [docs](https://dcos.io/docs/1.9/overview/design/installation/).
DC/OS uses a custom packaging tool called [pkgpanda](../pkgpanda) which operates on a "package tree". The [packages](../packages) directory is the root of all the packages and `pkgpanda` is built to construct it into a "package tree". A "package tree" is constructed by looking in each directory in a root and parsing a `buildinfo.json` and `build` file from each directory. If these are present, then the directory is considered a package and added to the tree. This tool can read these metadata JSONs in a "package tree", verify the package dependency list is resolvable by walking through dependent packages, and then use the build script files accompanying the metadata to . For more information on why these design choices were made, take a look at the [docs](https://dcos.io/docs/overview/design/installation/).
Contributor

Unfinished sentence:

and then use the build script files accompanying the metadata to .


The `pkginfo.json` contains Packages depending on each other.
Contributor

Looks like "Packages" here should be lower-cased.

`/opt/mesosphere/{name}-{id}/` where `{name}` is the package name and `{id}`, a id associated with the package . This
is effectively the `PREFIX` of the installed package in the `autotools` terms. A package may rely on the absolute path
to the package contents to remaining constant.
`/opt/mesosphere/{name}--{version}/`, where `{name}--{version}` is the full package ID. This path will be the install prefix for the pacakge. A package may rely on the absolute path to the package contents remaining constant.
Contributor

pacakge

@@ -3,7 +3,7 @@ pkgpanda packages expect a file system organized as described here so that symli
may be generated with the bootstrap tarball before actually being placed on the host system.

```bash
/etc/mesosphere/roles/{master,slave,aws,aws_master}
/etc/mesosphere/roles/{master,slave,slave_public,aws,aws_master}
Contributor

These AWS roles should go, they're AWS-specific and aren't helpful for a general doc. If we wanted to be complete we'd need to add Azure roles too. But if we include all the roles for all the deployment methods, this line will just be a nuisance to maintain and will probably become out of date. We should stick to the roles that apply in all deployments: master, slave, and slave_public.


The name which other packages will know this package by and use. Package names must be valid Linux folder names, should
be case insensitive most often lower case only. Valid characters are `[a-zA-Z0-9@._+-]`. They may not start with a hyphen
or a dot. Must be at least one character long. A package name may not contain '--'.

*Package ID*

`name--id` combination package name + unique, reproducible package hash
`name--id` combination package name + arbitrary information (most often a version indicator). The packaging system needs to extract the package name from a package id. Valid characters are `[a-zA-Z0-9@._+-]`
Contributor

name--version. That's the terminology used within pkgpanda itself, and it's confusing to tell someone that the package ID contains an ID.
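
For readers following the naming discussion, the rules quoted in this thread (allowed characters, no leading hyphen or dot, no `--` inside a name, and IDs of the form `name--version`) can be illustrated with a small Python sketch. The helpers `is_valid_package_name` and `parse_package_id` are hypothetical names chosen for illustration and are not pkgpanda's actual API. The sketch also assumes the name is everything before the first `--`, and it treats the version as an opaque non-empty string, in line with the earlier remark that the version can be any valid string.

```python
import re
from typing import Tuple

# Character set quoted in the doc under review.
_VALID_NAME = re.compile(r"[a-zA-Z0-9@._+-]+")


def is_valid_package_name(name: str) -> bool:
    """Check a name against the rules quoted in this thread (hypothetical helper)."""
    if not name or _VALID_NAME.fullmatch(name) is None:
        return False
    if name[0] in "-.":  # may not start with a hyphen or a dot
        return False
    return "--" not in name  # '--' is reserved as the name/version separator


def parse_package_id(package_id: str) -> Tuple[str, str]:
    """Split `name--version` at the first `--` (an assumption, not confirmed API)."""
    name, sep, version = package_id.partition("--")
    if not sep or not version or not is_valid_package_name(name):
        raise ValueError(f"not a valid package ID: {package_id!r}")
    return name, version
```

Because a name may not contain `--`, splitting at the first occurrence recovers the name unambiguously in this sketch; whether a version may itself contain `--` is not settled by the thread, so the sketch leaves the version unchecked.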

lib/ # Will be linked to /opt/mesosphere/lib
bin/ # Will be linked to /opt/mesosphere/bin
etc/ # Will be linked to /opt/mesosphere/etc
dcos.target.wants/ # Will be linked into /etc/systemd/system/dcos.target.wants depending on role
Contributor

This "depending on role" applies to all well-known dirs. For example, you can add a config file intended just for masters by putting it in $PKG_PATH/etc_master/.

Author

TIL
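
As a rough illustration of the point about role-specific directories, the following Python sketch lists which package-relative directories would be considered on a host with a given set of roles. It is only a sketch under stated assumptions: `candidate_link_dirs` is a hypothetical helper, the `<dir>_<role>` naming is generalized from the single `etc_master` example above, and linking a role-specific directory into the same target as its base directory is an assumption rather than confirmed pkgpanda behavior.

```python
from typing import Dict, Iterable

# Link targets quoted in the diff under review.
WELL_KNOWN_DIRS = {
    "lib": "/opt/mesosphere/lib",
    "bin": "/opt/mesosphere/bin",
    "etc": "/opt/mesosphere/etc",
    "dcos.target.wants": "/etc/systemd/system/dcos.target.wants",
}


def candidate_link_dirs(roles: Iterable[str]) -> Dict[str, str]:
    """Map package-relative directory names to link targets for a host.

    Assumption: a role-specific directory is named `<dir>_<role>` (as in
    `etc_master`) and links into the same target as its base directory.
    """
    mapping = dict(WELL_KNOWN_DIRS)
    for role in roles:
        for base, target in WELL_KNOWN_DIRS.items():
            mapping[f"{base}_{role}"] = target
    return mapping
```

For example, `candidate_link_dirs(["master"])` includes `etc_master` alongside the four base directories, while a host with no roles only gets the base set.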

mesos -> /opt/mesosphere/packages/mesos--deadbeefpackage
marathon -> ...
mesos -> /opt/mesosphere/packages/mesos--version
marathon -> ... /opt/mesosphere/packages/marathon--version
Contributor

The ... should be removed.

Contributor

@lingmann lingmann left a comment

This is a huge improvement, thanks @mellenburg !

@lingmann
Contributor

lingmann commented Apr 4, 2017

@branden @vespian can you please update your reviews? I believe this is a big improvement over what we had before, so if you are both in agreement with that, then I think we should land it. We can continue to iterate with future PRs; it won't be the last documentation PR, I'm sure. ;-)

Contributor

@branden branden left a comment

🚢

mellenburg pushed a commit to mesosphere/dcos that referenced this pull request Apr 5, 2017
@mellenburg mellenburg mentioned this pull request Apr 5, 2017
@spahl spahl merged commit dffda93 into dcos:master Apr 6, 2017
9 participants