## Authoring Jobs in AWS Glue
https://docs.aws.amazon.com/glue/latest/dg/author-job.html

A job is the business logic that performs the extract, transform, and load (ETL) work in AWS Glue. When you start a job, AWS Glue runs a script that extracts data from sources, transforms the data, and loads it into targets. You can create jobs in the ETL section of the AWS Glue console. For more information, see [Working with Jobs on the AWS Glue Console](https://docs.aws.amazon.com/glue/latest/dg/console-jobs.html).

The following diagram summarizes the basic workflow and steps involved in authoring a job in AWS Glue:
<img src="https://docs.aws.amazon.com/glue/latest/dg/images/AuthorJob-overview.png" align="left" alt="Workflow for authoring a job in AWS Glue" width="800">

Topics

- [Workflow Overview](https://docs.aws.amazon.com/glue/latest/dg/author-job.html#author-job-workflow)
- [Adding Jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/add-job.html)
- [Editing Scripts in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/edit-script.html)
- [Working with MongoDB Connections in ETL Jobs](https://docs.aws.amazon.com/glue/latest/dg/integrate-with-mongo-db.html)
- [Developing Scripts Using Development Endpoints](https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint.html)
- [Managing Notebooks](https://docs.aws.amazon.com/glue/latest/dg/notebooks-with-glue.html)

### Workflow Overview

When you author a job, you supply details about data sources, targets, and other information. From these details, AWS Glue generates an Apache Spark script in PySpark or Scala. You can then store your job definition in the AWS Glue Data Catalog.

The following steps describe the overall process of authoring jobs in the AWS Glue console:

1. You choose a data source for your job. The tables that represent your data source must already be defined in your Data Catalog. If the source requires a connection, the connection is also referenced in your job. If your job requires multiple data sources, you can add them later by editing the script.

2. You choose a data target for your job. The tables that represent the data target can be defined in your Data Catalog, or your job can create the target tables when it runs. You choose a target location when you author the job. If the target requires a connection, the connection is also referenced in your job. If your job requires multiple data targets, you can add them later by editing the script.

3. You customize the job-processing environment by providing arguments for your job and generated script. For more information, see [Adding Jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/add-job.html).

4. AWS Glue generates an initial script, which you can edit to add sources, targets, and transforms. For more information about transforms, see [Built-In Transforms](https://docs.aws.amazon.com/glue/latest/dg/built-in-transforms.html).

5. You specify how your job is invoked: on demand, by a time-based schedule, or by an event. For more information, see [Starting Jobs and Crawlers Using Triggers](https://docs.aws.amazon.com/glue/latest/dg/trigger-job.html).

6. Based on your input, AWS Glue generates a PySpark or Scala script. You can tailor the script to your business needs. For more information, see [Editing Scripts in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/edit-script.html).
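
The console steps above can also be pictured as data. The following is a minimal sketch, not the console's own implementation, that assembles dictionaries shaped like the AWS Glue `CreateJob` and `CreateTrigger` API requests: the connection and arguments from steps 1 through 3, and the invocation choice from step 5. The helper functions `build_job_definition` and `build_trigger_definition`, and every job name, role ARN, connection name, and S3 path in the example, are hypothetical. Mapping "by an event" to a conditional trigger that waits for another job to succeed is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


def build_job_definition(
    name: str,
    role_arn: str,
    script_location: str,
    connections: Optional[List[str]] = None,
    arguments: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments shaped like a Glue CreateJob request."""
    job: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        # "glueetl" is the command name for Apache Spark ETL jobs.
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        # Arguments passed to the job and its script on every run (step 3).
        "DefaultArguments": dict(arguments or {}),
    }
    if connections:
        # Connections required by the sources or targets (steps 1 and 2).
        job["Connections"] = {"Connections": list(connections)}
    return job


def build_trigger_definition(
    name: str,
    job_name: str,
    schedule: Optional[str] = None,
    after_job: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments shaped like a Glue CreateTrigger request (step 5)."""
    if schedule and after_job:
        raise ValueError("choose either a schedule or a preceding job, not both")
    trigger: Dict[str, Any] = {"Name": name, "Actions": [{"JobName": job_name}]}
    if schedule:
        # Time-based schedule, written as an AWS cron expression.
        trigger["Type"] = "SCHEDULED"
        trigger["Schedule"] = schedule
    elif after_job:
        # Event-style invocation: start when another job run succeeds.
        trigger["Type"] = "CONDITIONAL"
        trigger["Predicate"] = {
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": after_job, "State": "SUCCEEDED"}
            ]
        }
    else:
        trigger["Type"] = "ON_DEMAND"
    return trigger


# Hypothetical values, for illustration only.
job = build_job_definition(
    name="example-etl-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/example_etl.py",
    connections=["example-jdbc-connection"],
    arguments={"--job-language": "python"},
)
nightly = build_trigger_definition(
    "example-nightly", job["Name"], schedule="cron(0 2 * * ? *)"
)
print(job)
print(nightly)
```

With the AWS SDK for Python installed and credentials configured, dictionaries like these could be passed as keyword arguments, for example `boto3.client("glue").create_job(**job)` and `create_trigger(**nightly)`. The sketch deliberately stops short of making those calls so that it runs without an AWS account.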