Moving Genie 3 concepts to the reference documentation (#444)
Showing 13 changed files with 1,660 additions and 1 deletion.
== Concepts

include::_dataModel.adoc[]

include::_howItWorks.adoc[]

include::_netflixExample.adoc[]

include::_netflixDeployment.adoc[]
=== Data Model

The Genie 3 data model contains modifications and additions to the
https://netflix.github.io/genie/concepts/2/DataModel.html[Genie 2 data model] to enable even more flexibility,
modularity and metadata retention.

Several limitations of the Genie 2 data model were uncovered in its more than two years of production use at Netflix.
A few of those limitations were:

* Inability to link multiple applications to a command
* Lack of insight into what the original user request for a job contained
* Too much normalization, particularly between tags and jobs, causing inefficient joins and slow performance
* Lack of system data for job dimensions like attachment size, output size, etc.

The changes made to the data model will become apparent in the following sections.

==== Caveats

* What we called `entities` in Genie 2 we call `resources` in Genie 3, to decouple the concept from the database
implementation and raise it to the user API level
* The specific resource fields are *NOT* defined in this document. These fields are available in the
https://netflix.github.io/genie/docs/{revnumber}/rest/[REST API documentation]
** This documentation covers the higher level purposes of the resources and not their contents

==== Resources

The following sections describe the various resources available from the Genie REST APIs. You should reference the
https://netflix.github.io/genie/docs/{revnumber}/rest/[API Docs] for how to manipulate these resources. These sections
focus on the high level purpose of each resource and how it relies on and/or interacts with other resources within
the system.

===== Tagging

One important aspect to comment on is the tagging of resources. Genie relies heavily on tags for how the system
discovers resources like clusters and commands for a job. Each of the core resources has a set of tags that can be
associated with it. These tags can be set to whatever you want, but it is recommended to come up with some sort of
consistent structure for your tags to make it easier for users to understand their purpose. For example, at Netflix
we've adopted some of the following standards for our tags:

* `sched:{something}`
** This corresponds to any schedule that this resource (likely a cluster) is expected to adhere to
** e.g. `sched:sla` or `sched:adhoc`
* `type:{something}`
** e.g. `type:yarn` or `type:presto` for a cluster or `type:hive` or `type:spark` for a command
* `ver:{something}`
** The specific version of said resource
** e.g. two different Spark commands could have `ver:1.6.1` vs `ver:2.0.0`
* `data:{something}`
** Used to classify the type of data a resource (usually a command) will access
** e.g. `data:prod` or `data:test`
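A `{category}:{value}` convention like the one above is easy to validate mechanically. The following is a minimal
sketch of such a validator; the set of allowed categories is an organization-specific assumption, not something Genie
itself enforces:

```python
# Minimal sketch (not part of Genie) of validating that resource tags
# follow a "category:value" convention like the one described above.

VALID_CATEGORIES = {"sched", "type", "ver", "data"}  # assumed, org-specific


def parse_tag(tag: str) -> tuple:
    """Split a tag such as 'type:yarn' into its category and value."""
    category, sep, value = tag.partition(":")
    if not sep or not category or not value:
        raise ValueError(f"tag {tag!r} is not in 'category:value' form")
    return category, value


def check_tags(tags) -> dict:
    """Group a resource's tags by category, rejecting unknown categories."""
    grouped = {}
    for tag in tags:
        category, value = parse_tag(tag)
        if category not in VALID_CATEGORIES:
            raise ValueError(f"unknown tag category {category!r} in {tag!r}")
        grouped.setdefault(category, set()).add(value)
    return grouped
```

For example, `check_tags({"type:yarn", "sched:sla"})` groups the tags by category, while a bare tag such as
`defaultcluster` would be rejected, nudging users toward the structured scheme.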

===== Configuration Resources

The following resources (applications, commands and clusters) are considered configuration, or admin, resources.
They're generally set up by the Genie administrator and available to all users for use with their jobs.

====== Application Resource

An application resource represents pretty much what you'd expect. It is a reusable set of binaries, configuration
files and setup files that can be used to install and configure (surprise!) an application. Generally these resources
are used when an application isn't already installed and on the `PATH` on a Genie node.

When a job is run and applications are involved in running the job, Genie will download all the dependencies,
configuration files and setup files of each application and store them all in the job working directory. It will then
execute the setup script in order to install that application for that job. Genie is "dumb" as to the contents or
purpose of any of these files, so the onus is on the administrators to create and test these packages.

Applications are very useful for decoupling application binaries from a Genie deployment. For example, you could
deploy a Genie cluster and change the version of Hadoop, Hive or Spark that Genie will download without actually
re-deploying Genie. Applications can be combined together via a command. This will be explained more in the Command
section.

Applications are linked to commands so that binaries and configurations are downloaded and installed at runtime.
Within Netflix this is frequently used to deploy new clients without redeploying a Genie cluster. Our applications
frequently consist of a zip of all the binaries uploaded to S3, along with a setup file to unzip them and configure
environment variables for that application.

It is important to note that applications are entirely optional. If you prefer to just install all client binaries on
a Genie node beforehand, you're free to do that. It will save in overhead for job launch times, but you will lose
flexibility in the trade-off.

====== Command Resource

Command resources primarily represent what a user would enter at the command line to run a process on a cluster, and
what dependencies (applications) would need to be on the `PATH` to make that possible.

Commands can have configuration and setup files just like applications, but primarily they should have an ordered
list of applications associated with them if necessary. For example, let's take a typical scenario involving running
Hive. To run Hive you generally need a few things:

. A cluster to run its processing on (we'll get to that in the Cluster section)
. A `hive-site.xml` file which says what metastore to connect to, amongst other settings
. Hive binaries
. Hadoop configurations and binaries

So a typical setup for Hive in Genie would be to have one, or many, Hive commands configured. Each command would have
its own `hive-site.xml` pointing to a specific metastore (prod, test, etc.). The command would depend on already
configured Hadoop and Hive applications, which would have all the default Hadoop and Hive binaries and
configurations. All of this would be combined in the job working directory in order to run Hive.

You can have any number of commands configured in the system. They should then be linked to the clusters they can
execute on. Clusters are explained next.

====== Cluster

A cluster resource stores all the details of an execution cluster, including connection information, properties, etc.
Some cluster examples are Hadoop, Spark, Presto, etc. Every cluster can be linked to a set of commands that it can
run.

Genie does *not* launch clusters for you. It merely stores metadata about the clusters you have running in your
environment so that jobs using commands and applications can connect to them.

Once a cluster has been linked to commands, your Genie instance is ready to start running jobs. The job resources are
described in the following section. One important thing to note is that the list of commands linked to the cluster is
a priority ordered list. That means that if you have two Pig commands available for the same cluster, the first one
found in the list will be chosen, provided all tags match. See <<How it Works>> for more details.

===== Job Resources

The following resources all relate to a user submitting and monitoring a given job. They are split up from the Genie
2 Job idea to provide better separation of concerns, as usually a user doesn't care about certain things, such as
what node a job ran on or its Linux process exit code.

Users interact with these resources directly, though all but the initial job request are *read-only* in the sense
that you can only get their current state back from Genie.

====== Job Request

This is the resource you use to kick off a job. It contains all the information the system needs to run a job.
Optionally the REST APIs can take attachments. All attachments and file dependencies are downloaded into the root of
the job's working directory. The most important aspects are the command line arguments, the cluster criteria and the
command criteria. These dictate which cluster, command and arguments will be used when the job is executed. See the
<<How it Works>> section for more details.

====== Job

The job resource is created in the system after a Job Request is received. All the information a typical user would
be interested in should be contained within this resource. It has links to the command, cluster and applications used
to run the job, as well as meta information like status, start time, end time and others. See the
https://netflix.github.io/genie/docs/{revnumber}/rest/[REST API documentation] for more details.

====== Job Execution

The job execution resource contains information about where a job was run and other details that may be more
interesting to a system admin than to a regular user. It is frequently useful in debugging.

A job contains all the details of a job request and execution, including any command line arguments. Based on the
request parameters, a cluster and command combination is selected for execution. Job requests can also supply
necessary files to Genie, either as attachments or using the file dependencies field if they already exist in an
accessible file system. As a job executes, its details are recorded in the job record within the Genie database.

==== Wrap-up

Hopefully this guide provides insight into how the Genie data model is thought out and how the pieces work together.
It is meant to be very generic and to support as many use cases as possible without modifications to the Genie
codebase.
=== How it Works

This section is meant to provide context for how Genie can be configured with Clusters, Commands and Applications
(see <<Data Model>> for details), and then how these work together in order to run a job on a Genie node.

==== Resource Configuration

This section describes how configuration of Genie works from an administrator's point of view. It isn't about how to
install and configure the Genie application itself. Rather, it is about how to configure the various resources
involved in running a job.

===== Register Resources

All resources (clusters, commands, applications) should be registered with Genie before attempting to link them
together. Any files these resources depend on should be uploaded somewhere Genie can access them (S3, a web server, a
mounted disk, etc.).

Tagging of the resources, particularly clusters and commands, is extremely important. Genie will use the tags in
order to find a cluster/command combination to run a job. You should come up with a convenient tagging scheme for
your organization. At Netflix we try to stick to a pattern for tag structures like `{tag category}:{tag value}`, for
example `type:yarn` or `data:prod`. This gives the tags some context, so that when users look at what resources are
available they can find what to submit their jobs with, so that each job is routed to the correct cluster/command
combination.

===== Linking Resources

Once resources are registered, they should be linked together. By linking we mean representing the relationships
between the resources.

====== Commands for a Cluster

Adding commands to a cluster means that the administrator acknowledges that this cluster can run a given set of
commands. If a command is not linked to a cluster, it cannot be used to run a job.

The commands are added in priority order. For example, say you have different Spark commands you want to add to a
given YARN cluster, but you want Genie to treat one as the default. Here is how those commands might be tagged:

Spark 1.6.0 (id: spark16)

* `type:sparksubmit`
* `ver:1.6`
* `ver:1.6.0`

Spark 1.6.1 (id: spark161)

* `type:sparksubmit`
* `ver:1.6.1`

Spark 2.0.0 (id: spark200)

* `type:sparksubmit`
* `ver:2.0`
* `ver:2.0.0`

Now if we added the commands to the cluster in the order `spark16, spark161, spark200`, and a user submitted a job
only requesting a command tagged with `type:sparksubmit` (as in they don't care what version, just the default), they
would get Spark 1.6.0. However, if we later deemed 2.0.0 ready to be the default, we would reorder the commands to
`spark200, spark16, spark161`, and that same job, if submitted again, would now run with Spark 2.0.0.
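The selection behavior above can be sketched as a simple rule: the first command in the cluster's priority-ordered
list whose tags are a superset of the requested tags wins. This is an illustration of the idea only, not Genie's
actual implementation, which also considers things like resource status:

```python
# Simplified sketch of priority-ordered command resolution. The real logic
# lives inside the Genie server; this only illustrates the ordering rule.

def select_command(ordered_commands, requested_tags):
    """Return the id of the first command whose tags cover the request."""
    for command_id, tags in ordered_commands:
        if set(requested_tags) <= set(tags):
            return command_id
    return None  # no match: Genie would reject the job


# The Spark commands from the example above, in cluster priority order.
commands = [
    ("spark16", {"type:sparksubmit", "ver:1.6", "ver:1.6.0"}),
    ("spark161", {"type:sparksubmit", "ver:1.6.1"}),
    ("spark200", {"type:sparksubmit", "ver:2.0", "ver:2.0.0"}),
]

# A user asking only for {"type:sparksubmit"} gets spark16, the current
# default. Moving spark200 to the front of the list changes the default
# without the user changing their job request at all.
```

This is exactly why requesting only the tags you truly need (covered under Job Submission below) keeps jobs portable
across version changes.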

====== Applications for a Command

Linking application(s) to commands means that a command has a dependency on said application(s). The order in which
the applications are added is important, because Genie will set up the applications in that order. Meaning, if one
application depends on another (e.g. Spark depends on Hadoop being on the classpath for YARN mode), Hadoop should be
ordered first. All applications must be successfully installed before Genie will start running the job.
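The ordering contract can be pictured as follows. This is a toy illustration, not Genie code; the setup callback
stands in for executing each application's real setup script:

```python
# Toy illustration of ordered application setup: setup steps run in list
# order, and any failure aborts before the job ever starts.

def setup_applications(ordered_apps, run_setup):
    """Run each application's setup in order; stop at the first failure."""
    installed = []
    for app in ordered_apps:
        if not run_setup(app):  # stand-in for running the app's setup script
            raise RuntimeError(f"setup of {app!r} failed; job will not start")
        installed.append(app)
    return installed


# Spark needs Hadoop on the classpath in YARN mode, so Hadoop comes first:
# setup_applications(["hadoop", "spark"], run_setup=...)
```

If Hadoop's setup fails, Spark's setup never runs and the job never starts, matching the "all applications must be
successfully installed" rule above.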

==== Job Submission

OK. The system admin has everything registered and linked together. Things could obviously change, but that's mostly
transparent to end users. They just want to run jobs. How does that work? This section attempts to walk through what
happens at a high level. The example section further down will walk through a step by step example.

===== User Submits a Job Request

In order to submit a job request there is some work a user will have to do up front. What kind of job are they
running? What cluster do they want to run on? What command do they want to use? Do they care about certain details
like version, or do they just want the defaults? Once they determine the answers to these questions they can decide
how to tag their job request in the `clusterCriterias` and `commandCriteria` fields.

The general rule of thumb for these fields is to use the lowest common denominator of tags that accomplishes what the
user requires. This will allow the most flexibility for the job to be moved to different clusters or commands as need
be. For example, if they want to run a Spark job and don't really care about the version, it is better to just say
`type:sparksubmit` (assuming this is the tagging structure at your organization) instead of that *and* `ver:2.0.0`.
This way, when version 2.0.1 or 2.1.0 comes down the pipe, the job moves along with the new default. Obviously, if
they do care about the version they should set it, or any other specific tag.

The `clusterCriterias` field is an array of `ClusterCriteria` objects. This is done to provide a fallback mechanism.
If no match is found for the first `ClusterCriteria` and `commandCriteria` combination, Genie moves on to the second,
and so on until all options are exhausted. This is handy if it is desirable to run a job on some cluster that is only
up some of the time, and it's fine to run it on some other, always available cluster the rest of the time.
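The fallback can be sketched like this (an illustration only; cluster ids and tags here are hypothetical, and the
real server-side matching also intersects with the `commandCriteria`):

```python
# Sketch of the ClusterCriteria fallback: try each criterion in order and
# take the first cluster whose tags satisfy it. Not Genie's actual code.

def resolve_cluster(cluster_criterias, clusters):
    """clusters: list of (cluster_id, tags). Return first match or None."""
    for criteria in cluster_criterias:
        for cluster_id, tags in clusters:
            if set(criteria) <= set(tags):
                return cluster_id
    return None  # every criterion exhausted: the job request fails


# Hypothetical registered clusters.
clusters = [
    ("presto_adhoc", {"type:presto", "sched:adhoc"}),
    ("yarn_sla", {"type:yarn", "sched:sla"}),
]

# Prefer an SLA Presto cluster, but fall back to any Presto cluster:
# resolve_cluster([{"type:presto", "sched:sla"}, {"type:presto"}], clusters)
# matches nothing on the first criterion, then finds "presto_adhoc".
```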

There are other things a user needs to consider when submitting a job. All dependencies which aren't sent as
attachments must already be uploaded somewhere Genie can access them: S3, a web server, a shared disk, etc.

Users should familiarize themselves with whatever the `executable` for their desired command includes. It's possible
the system admin has set up some default parameters they should know about, so as to avoid duplication or unexpected
behavior. They should also make sure they know all the environment variables that may be available to them as a
result of the cluster, command and application setup processes.

===== Genie Receives the Job Request

When Genie receives the job request it does a few things immediately:

. If the job request doesn't have an id, it creates a GUID for the job
. It saves the job request to the database so it is recorded
.. If the id is already in use, a 409 is returned to the user to signal the conflict
. It creates job and job execution records in the database for consistency
. It saves any attachments in a temporary location

Next, Genie will attempt to find a cluster and command matching the requested tag combinations. If none is found, it
will send a failure back to the user and mark the job failed in the database.

If a combination is found, Genie will then attempt to determine whether the node can run the job. By this we mean it
will check the amount of client memory the job requires against the available memory in the Genie allocation. If
there is enough, the job will be accepted and run on this node, and the job's memory is subtracted from the available
pool. If not, the job is rejected with a 503 error message and the user should retry later.

The order of priority for how the memory for a job is determined is:

. The memory a user requested in the job request
.. This cannot exceed the max memory allowed by the system admin for a given job
. The memory set as the default for the given command by the admins
. The default memory size for a job as set by the system admin
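The priority order above can be sketched as a small resolution function. The semantics here (request wins, then
command default, then system default, with the request capped by the admin maximum) are assumed from the list above,
not lifted from Genie source:

```python
# Sketch of the memory-resolution priority: user request > command default
# > system default, with the request checked against the admin maximum.

def resolve_job_memory(requested_mb, command_default_mb,
                       system_default_mb, max_allowed_mb):
    """Pick the job memory in MB following the priority order above."""
    if requested_mb is not None:
        if requested_mb > max_allowed_mb:
            raise ValueError("requested memory exceeds the admin maximum")
        return requested_mb
    if command_default_mb is not None:
        return command_default_mb
    return system_default_mb


# With no user request, the command default (if any) applies:
# resolve_job_memory(None, 2048, 1024, 10240) -> 2048
# A user request within the maximum always wins:
# resolve_job_memory(4096, 2048, 1024, 10240) -> 4096
```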

A successful job submission results in a 202 message to the user, stating that the job is accepted and will be
processed asynchronously by the system.

===== After Job Accepted

Once a job has been accepted to run on a Genie node, a workflow is executed in order to set up the job working
directory and launch the job. Some minor steps are left out for brevity.

. The job is marked in the `INIT` state in the database
. A job directory is created under the admin-configured jobs directory, with the job id as its name
. A run script file is created with the name `run` under the job working directory
.. Currently this is a bash script
. Kill handlers are added to the run script
. Directories for Genie logs, application files, command files and cluster files are created under the job working
directory
. Default environment variables are added to the run script to export their values
. Cluster configuration files are downloaded and stored in the job working directory
. Cluster related variables are written into the run script
. Application configuration and dependency files are downloaded and stored in the job directory, if any applications
are needed
. Application related variables are written into the run script
. Command configuration and dependency files are downloaded and stored in the job directory
. Command related variables are written into the run script
. All job dependency files (including attachments) are downloaded into the job working directory
. Job related variables are written into the run script
. The job script is executed in a forked process
. The script `pid` is stored in the database `job_executions` table and the job is marked as `RUNNING` in the
database
. A monitoring process is created for the pid

===== After Job Running

Once the job is running, Genie will poll the PID periodically, waiting for it to no longer be in use.

NOTE: An assumption is made as to the amount of process churn on the Genie node. We're aware PIDs can be reused, but
reasonably this shouldn't happen within the poll period, given the number of PIDs available to the processes a
typical Genie node will run.

Once the pid no longer exists, Genie checks the done file for the exit code. It marks the job succeeded, failed or
killed depending on that code.
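The final-status decision can be sketched as a tiny mapping. How "killed" is actually recorded in the done file is an
assumption here; only the zero/non-zero split for succeeded vs failed follows directly from the text above:

```python
# Sketch of mapping the done file's exit code to a terminal job status.
# The "killed" signal is modeled as a separate flag; the real done-file
# contents are defined by Genie's run script.

def final_status(exit_code, killed=False):
    """Map the exit code from the done file to a terminal job status."""
    if killed:
        return "KILLED"
    return "SUCCEEDED" if exit_code == 0 else "FAILED"
```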

===== Cleaning Up

To save disk space, Genie will delete application dependencies from the job working directory after a job is
completed. This can be disabled by an admin. If the job is marked for archival, the working directory will be zipped
up and stored in the default archive location as `{jobId}.tar.gz`.

==== User Behavior

Users can check on the status of their job using the `status` API and get the output using the output APIs. See the
https://netflix.github.io/genie/docs/{revnumber}/rest/[REST Documentation] for specifics on how to do that.

==== Wrap Up

WARNING: TODO