## Security in AWS Glue

Cloud security at AWS is the highest priority. As an AWS customer, you benefit from a data center and network architecture that is built to meet the requirements of the most security-sensitive organizations.

__Security__ is a shared responsibility between AWS and you. The [shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/) describes this as security of the cloud and security in the cloud:

- __Security of the cloud__ – AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-party auditors regularly test and verify the effectiveness of our security as part of the [AWS compliance programs](https://aws.amazon.com/compliance/programs/). To learn about the compliance programs that apply to AWS Glue, see [AWS Services in Scope by Compliance Program](https://aws.amazon.com/compliance/services-in-scope/).

- __Security in the cloud__ – Your responsibility is determined by the AWS service that you use. You are also responsible for other factors including the sensitivity of your data, your company’s requirements, and applicable laws and regulations.

This documentation helps you understand how to apply the shared responsibility model when using AWS Glue. The following topics show you how to configure AWS Glue to meet your security and compliance objectives. You also learn how to use other AWS services that help you to monitor and secure your AWS Glue resources.

Topics

- [Data Protection in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/data-protection.html)
- [Identity and Access Management in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/authentication-and-access-control.html)
- [Logging and Monitoring in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/logging-and-monitoring.html)
- [Compliance Validation for AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/compliance.html)
- [Resilience in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/disaster-recovery-resiliency.html)
- [Infrastructure Security in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/infrastructure-security.html)

### Data Protection in AWS Glue

AWS Glue offers several features that are designed to help protect your data.

Topics

- [Encryption at Rest](https://docs.aws.amazon.com/glue/latest/dg/encryption-at-rest.html)
- [Encryption in Transit](https://docs.aws.amazon.com/glue/latest/dg/encryption-in-transit.html)
- [FIPS Compliance](https://docs.aws.amazon.com/glue/latest/dg/fips-compliance.html)
- [Key Management](https://docs.aws.amazon.com/glue/latest/dg/key-management.html)
- [AWS Glue Dependency on Other AWS Services](https://docs.aws.amazon.com/glue/latest/dg/dependency-on-other-services.html)
- [Development Endpoints](https://docs.aws.amazon.com/glue/latest/dg/dev-endpoints.html)

### Encryption at Rest

AWS Glue supports data encryption at rest for [Authoring Jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/author-job.html) and [Developing Scripts Using Development Endpoints](https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint.html). You can configure extract, transform, and load (ETL) jobs and development endpoints to use [AWS Key Management Service (AWS KMS)](https://aws.amazon.com/kms/) keys to write encrypted data at rest. You can also encrypt the metadata stored in the [AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/components-overview.html#data-catalog-intro) using keys that you manage with AWS KMS. Additionally, you can use AWS KMS keys to encrypt job bookmarks and the logs generated by crawlers and ETL jobs.

You can encrypt metadata objects in your AWS Glue Data Catalog in addition to the data written to Amazon Simple Storage Service (Amazon S3) and Amazon CloudWatch Logs by jobs, crawlers, and development endpoints. When you create jobs, crawlers, and development endpoints in AWS Glue, you can provide encryption settings by attaching a security configuration. Security configurations contain Amazon S3-managed server-side encryption keys (SSE-S3) or customer master keys (CMKs) stored in AWS KMS (SSE-KMS). You can create security configurations using the AWS Glue console.
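The encryption settings carried by a security configuration can also be written down as data. The following is a minimal sketch, assuming the field names of the AWS Glue `CreateSecurityConfiguration` API operation; the helper name `build_encryption_configuration`, the key ARN, and the configuration name are hypothetical and exist only for illustration.

```python
import json


def build_encryption_configuration(kms_key_arn=None):
    """Return an EncryptionConfiguration dict for a security configuration.

    Without a key, Amazon S3 data uses SSE-S3 and the other targets stay
    unencrypted. With a key, all three targets use that AWS KMS key.
    """
    if kms_key_arn is None:
        return {
            "S3Encryption": [{"S3EncryptionMode": "SSE-S3"}],
            "CloudWatchEncryption": {"CloudWatchEncryptionMode": "DISABLED"},
            "JobBookmarksEncryption": {"JobBookmarksEncryptionMode": "DISABLED"},
        }
    return {
        "S3Encryption": [
            {"S3EncryptionMode": "SSE-KMS", "KmsKeyArn": kms_key_arn}
        ],
        "CloudWatchEncryption": {
            "CloudWatchEncryptionMode": "SSE-KMS",
            "KmsKeyArn": kms_key_arn,
        },
        "JobBookmarksEncryption": {
            "JobBookmarksEncryptionMode": "CSE-KMS",
            "KmsKeyArn": kms_key_arn,
        },
    }


# Hypothetical key ARN, for illustration only.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
config = build_encryption_configuration(EXAMPLE_KEY_ARN)
print(json.dumps(config, indent=2))

# Assumption: with boto3 installed and credentials configured, the dict
# could be passed to the service like this:
#   boto3.client("glue").create_security_configuration(
#       Name="example-security-config", EncryptionConfiguration=config)
```

Keeping the settings in one function makes it easier to attach the same encryption choices to jobs, crawlers, and development endpoints.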

You can also enable encryption of the entire Data Catalog in your account. You do so by specifying CMKs stored in AWS KMS.

Important
AWS Glue supports only symmetric CMKs. For more information, see [Customer Master Keys (CMKs)](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) in the AWS Key Management Service Developer Guide.
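As a rough illustration of account-wide Data Catalog encryption, the sketch below builds the settings structure that the `PutDataCatalogEncryptionSettings` API operation is assumed to accept. The function name and the key ID are hypothetical, and the key must be a symmetric CMK as noted above.

```python
def build_catalog_encryption_settings(kms_key_id, encrypt_passwords=True):
    """Return a DataCatalogEncryptionSettings dict that uses one KMS key."""
    settings = {
        "EncryptionAtRest": {
            "CatalogEncryptionMode": "SSE-KMS",
            "SseAwsKmsKeyId": kms_key_id,
        }
    }
    if encrypt_passwords:
        # Optionally encrypt connection passwords with the same key.
        settings["ConnectionPasswordEncryption"] = {
            "ReturnConnectionPasswordEncrypted": True,
            "AwsKmsKeyId": kms_key_id,
        }
    return settings


# Hypothetical key alias, for illustration only.
catalog_settings = build_catalog_encryption_settings("alias/example-glue-key")

# Assumption: with boto3 installed and credentials configured:
#   boto3.client("glue").put_data_catalog_encryption_settings(
#       DataCatalogEncryptionSettings=catalog_settings)
```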

With encryption enabled, when you add Data Catalog objects, run crawlers, run jobs, or start development endpoints, SSE-S3 or SSE-KMS keys are used to write data at rest. In addition, you can configure AWS Glue to only access Java Database Connectivity (JDBC) data stores through a trusted Secure Sockets Layer (SSL) protocol.

In AWS Glue, you control encryption settings in the following places:

- The settings of your Data Catalog.

- The security configurations that you create.

- The server-side encryption setting (SSE-S3 or SSE-KMS) that is passed as a parameter to your AWS Glue ETL (extract, transform, and load) job.

For more information about how to set up encryption, see [Setting Up Encryption in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/set-up-encryption.html).

Topics

- [Encrypting Your Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/encrypt-glue-data-catalog.html)
- [Encrypting Connection Passwords](https://docs.aws.amazon.com/glue/latest/dg/encrypt-connection-passwords.html)
- [Encrypting Data Written by Crawlers, Jobs, and Development Endpoints](https://docs.aws.amazon.com/glue/latest/dg/encryption-security-configuration.html)

### Encryption in Transit

AWS provides Secure Sockets Layer (SSL) encryption for data in motion. You can configure [encryption settings](https://docs.aws.amazon.com/glue/latest/dg/console-security-configurations.html) for crawlers, ETL jobs, and development endpoints using security configurations in AWS Glue. You can enable AWS Glue Data Catalog encryption via the settings for the Data Catalog.

As of September 4, 2018, AWS Glue ETL and the AWS Glue Data Catalog support AWS KMS (bring your own key and server-side encryption).

### FIPS Compliance

If you require FIPS 140-2 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint. For more information about the available FIPS endpoints, see [Federal Information Processing Standard (FIPS) 140-2](http://aws.amazon.com/compliance/fips/).

### Key Management

You can use AWS Identity and Access Management (IAM) with AWS Glue to define users, AWS resources, groups, roles and fine-grained policies regarding access, denial, and more.

You can define the access to the metadata using both resource-based and identity-based policies, depending on your organization’s needs. Resource-based policies list the principals that are allowed or denied access to your resources, allowing you to set up policies such as cross-account access. Identity policies are specifically attached to users, groups, and roles within IAM.

For a step-by-step example, see [Restrict access to your AWS Glue Data Catalog with resource-level IAM permissions and resource-based policies](http://aws.amazon.com/blogs/big-data/restrict-access-to-your-aws-glue-data-catalog-with-resource-level-iam-permissions-and-resource-based-policies/) on the AWS Big Data Blog.

The fine-grained access portion of the policy is defined within the Resource clause. This portion defines both the AWS Glue Data Catalog object that the action can be performed on, and what resulting objects get returned by that operation.
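To make the role of the Resource clause concrete, here is a minimal sketch that builds an identity-based policy scoped to a single table. It assumes the Data Catalog ARN layout (`catalog`, `database/<name>`, `table/<database>/<name>`) and assumes that a table-level action also needs the enclosing catalog and database listed as resources. The helper names, account ID, and object names are hypothetical.

```python
import json


def glue_arn(region, account_id, resource):
    """Build an AWS Glue ARN such as arn:aws:glue:<region>:<account>:catalog."""
    return f"arn:aws:glue:{region}:{account_id}:{resource}"


def table_read_policy(region, account_id, database, table):
    """Return a policy that allows glue:GetTable on one table only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GetOneTable",
                "Effect": "Allow",
                "Action": ["glue:GetTable"],
                "Resource": [
                    glue_arn(region, account_id, "catalog"),
                    glue_arn(region, account_id, f"database/{database}"),
                    glue_arn(region, account_id, f"table/{database}/{table}"),
                ],
            }
        ],
    }


# Hypothetical account and object names, for illustration only.
table_policy = table_read_policy("us-east-1", "111122223333", "sales", "orders")
print(json.dumps(table_policy, indent=2))
```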

A development endpoint is an environment that you can use to develop and test your AWS Glue scripts. You can add, delete, or rotate the SSH key of a development endpoint.


### AWS Glue Dependency on Other AWS Services

For a user to work with the AWS Glue console, that user must have a minimum set of permissions that allows them to work with the AWS Glue resources for their AWS account. In addition to these AWS Glue permissions, the console requires permissions from the following services:

- Amazon CloudWatch Logs permissions to display logs.

- AWS Identity and Access Management (IAM) permissions to list and pass roles.

- AWS CloudFormation permissions to work with stacks.

- Amazon Elastic Compute Cloud (Amazon EC2) permissions to list virtual private clouds (VPCs), subnets, security groups, instances, and other objects (to set up Amazon EC2 items such as VPCs when running jobs, crawlers, and creating development endpoints).

- Amazon Simple Storage Service (Amazon S3) permissions to list buckets and objects, and to retrieve and save scripts.

- Amazon Redshift permissions to work with clusters.

- Amazon Relational Database Service (Amazon RDS) permissions to list instances.

### Development Endpoints

A development endpoint is an environment that you can use to develop and test your AWS Glue scripts. You can use AWS Glue to create, edit, and delete development endpoints. The __Dev Endpoints__ tab on the AWS Glue console lists all the development endpoints that are created. You can add, delete, or rotate the SSH key of a development endpoint. You can also create notebooks that use the development endpoint.

You provide configuration values to provision the development environments. These values tell AWS Glue how to set up the network so that you can access the development endpoint securely, and so that your endpoint can access your data stores. Then, you can create a notebook that connects to the development endpoint. You use your notebook to author and test your ETL script.

Use an AWS Identity and Access Management (IAM) role with permissions similar to the IAM role that you use to run AWS Glue ETL jobs. Use a virtual private cloud (VPC), a subnet, and a security group to create a development endpoint that can connect to your data resources securely. You generate an SSH key pair to connect to the development environment using SSH.

You can create development endpoints that access Amazon S3 data, and you can create development endpoints within a VPC that access datasets using JDBC.

You can install an Apache Zeppelin notebook on your local machine and use it to debug and test ETL scripts on a development endpoint. Or, you can host the Zeppelin notebook on an Amazon EC2 instance. A notebook server is a web-based environment that you can use to run your PySpark statements.

AWS Glue tags Amazon EC2 instances with a name that is prefixed with `aws-glue-dev-endpoint`.

You can set up a notebook server on a development endpoint to run PySpark statements with AWS Glue extensions. For more information about Zeppelin notebooks, see [Apache Zeppelin](http://zeppelin.apache.org/).


## Identity and Access Management in AWS Glue

Access to AWS Glue requires credentials. Those credentials must have permissions to access AWS resources, such as an AWS Glue table or an Amazon Elastic Compute Cloud (Amazon EC2) instance. The following sections provide details on how you can use [AWS Identity and Access Management (IAM)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) and AWS Glue to help secure access to your resources.

Topics

- [Authentication](https://docs.aws.amazon.com/glue/latest/dg/authentication-and-access-control.html#authentication)
- [Managing Access Permissions for AWS Glue Resources](https://docs.aws.amazon.com/glue/latest/dg/access-control-overview.html)
- [Granting Cross-Account Access](https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html)
- [Specifying AWS Glue Resource ARNs](https://docs.aws.amazon.com/glue/latest/dg/glue-specifying-resource-arns.html)
- [AWS Glue Access Control Policy Examples](https://docs.aws.amazon.com/glue/latest/dg/glue-policy-examples.html)
- [AWS Glue API Permissions: Actions and Resources Reference](https://docs.aws.amazon.com/glue/latest/dg/api-permissions-reference.html)



### Authentication
You can access AWS as any of the following types of identities:

__AWS account root user__ – When you first create an AWS account, you begin with a single sign-in identity that has complete access to all AWS services and resources in the account. This identity is called the AWS account root user and is accessed by signing in with the email address and password that you used to create the account. We strongly recommend that you do not use the root user for your everyday tasks, even the administrative ones. Instead, [adhere to the best practice of using the root user only to create your first IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#create-iam-users). Then securely lock away the root user credentials and use them to perform only a few account and service management tasks.

__IAM user__ – An [IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html) is an identity within your AWS account that has specific custom permissions (for example, permissions to create a table in AWS Glue). You can use an IAM user name and password to sign in to secure AWS webpages like the [AWS Management Console](https://console.aws.amazon.com/), [AWS Discussion Forums](https://forums.aws.amazon.com/), or the [AWS Support Center](https://console.aws.amazon.com/support/home#/).

In addition to a user name and password, you can also [generate access keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) for each user. You can use these keys when you access AWS services programmatically, either through one of the [several SDKs](https://aws.amazon.com/tools/#sdk) or by using the [AWS Command Line Interface (CLI)](https://aws.amazon.com/cli/). The SDK and CLI tools use the access keys to cryptographically sign your request. If you don’t use AWS tools, you must sign the request yourself. AWS Glue supports Signature Version 4, a protocol for authenticating inbound API requests. For more information about authenticating requests, see [Signature Version 4 Signing Process](https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html) in the AWS General Reference.
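If you sign requests yourself, the core of Signature Version 4 is a chain of HMAC-SHA256 operations that derives a signing key from the secret access key. The sketch below shows only that derivation and the final signature step; building the canonical request and the string to sign is omitted. The secret key is a documentation-style placeholder, and the string to sign uses a placeholder hash rather than a real canonical request.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="glue"):
    """Derive the SigV4 signing key: date -> region -> service -> aws4_request."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature of the string to sign."""
    return hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# Placeholder secret key, not a real credential.
EXAMPLE_SECRET = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
signing_key = derive_signing_key(EXAMPLE_SECRET, "20180904", "us-east-1")

# Placeholder string to sign: the last line stands in for the hash of a
# real canonical request.
string_to_sign = "\n".join([
    "AWS4-HMAC-SHA256",
    "20180904T000000Z",
    "20180904/us-east-1/glue/aws4_request",
    hashlib.sha256(b"").hexdigest(),
])
signature = sign(signing_key, string_to_sign)
print(signature)
```

Because the date, Region, and service are folded into the key, a signature made for one Region or service does not validate for another. In practice the SDKs and the AWS CLI perform all of these steps for you.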

__IAM role__ – An [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) is an IAM identity that you can create in your account that has specific permissions. An IAM role is similar to an IAM user in that it is an AWS identity with permissions policies that determine what the identity can and cannot do in AWS. However, instead of being uniquely associated with one person, a role is intended to be assumable by anyone who needs it. Also, a role does not have standard long-term credentials such as a password or access keys associated with it. Instead, when you assume a role, it provides you with temporary security credentials for your role session. IAM roles with temporary credentials are useful in the following situations:

- __Federated user access__ – Instead of creating an IAM user, you can use existing identities from AWS Directory Service, your enterprise user directory, or a [web identity provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers.html). These are known as federated users. AWS assigns a role to a federated user when access is requested through an identity provider. For more information about federated users, see [Federated users and roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_access-management.html#intro-access-roles) in the IAM User Guide.

- __AWS service access__ – A service role is an IAM role that a service assumes to perform actions on your behalf. Service roles provide access only within your account and cannot be used to grant access to services in other accounts. An IAM administrator can create, modify, and delete a service role from within IAM. For more information, see Creating a role to delegate permissions to an AWS service in the IAM User Guide.

- __Applications running on Amazon EC2__ – You can use an [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) to manage temporary credentials for applications that are running on an EC2 instance and making AWS CLI or AWS API requests. This is preferable to storing access keys within the EC2 instance. To assign an AWS role to an EC2 instance and make it available to all of its applications, you create an instance profile that is attached to the instance. An instance profile contains the role and enables programs that are running on the EC2 instance to get temporary credentials. For more information, see [Using an IAM role to grant permissions to applications running on Amazon EC2 instances](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html) in the IAM User Guide.
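For the AWS service access case described above, a service can assume a role only if the role's trust policy names that service as a principal. The following sketch builds such a trust policy, assuming `glue.amazonaws.com` as the service principal for AWS Glue; the function name is hypothetical.

```python
import json


def service_trust_policy(service_principal="glue.amazonaws.com"):
    """Return a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


glue_trust_policy = service_trust_policy()
print(json.dumps(glue_trust_policy, indent=2))
```

The trust policy only controls who can assume the role. What the role can do once assumed is set separately by the permissions policies attached to it.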




## Managing Access Permissions for AWS Glue Resources

You can have valid credentials to authenticate your requests, but unless you have the appropriate permissions, you can't create or access an AWS Glue resource such as a table in the AWS Glue Data Catalog.

Every AWS resource is owned by an AWS account, and permissions to create or access a resource are governed by permissions policies. An account administrator can attach permissions policies to IAM identities (that is, users, groups, and roles). Some services (such as AWS Glue and Amazon S3) also support attaching permissions policies to the resources themselves.

Note
An account administrator (or administrator user) is a user who has administrative privileges. For more information, see [IAM Best Practices](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) in the IAM User Guide.

When granting permissions, you decide who is getting the permissions, the resources they get permissions for, and the specific actions that you want to allow on those resources.

Note
You can grant access to your data by using AWS Glue methods or by using AWS Lake Formation grants. The AWS Glue methods use AWS Identity and Access Management (IAM) policies to achieve fine-grained access control. Lake Formation uses a simpler `GRANT/REVOKE` permissions model similar to the `GRANT/REVOKE` commands in a relational database system.

This section describes using the AWS Glue methods. For information about using Lake Formation grants, see Granting Lake Formation Permissions in the AWS Lake Formation Developer Guide.

Topics

- Using Permissions Policies to Manage Access to Resources
- AWS Glue Resources and Operations
- Understanding Resource Ownership
- Managing Access to Resources
- Specifying Policy Elements: Actions, Effects, and Principals
- Specifying Conditions in a Policy
- Identity-Based Policies (IAM Policies) for Access Control
- AWS Glue Resource Policies for Access Control

### Using Permissions Policies to Manage Access to Resources

A permissions policy is defined by a JSON object that describes who has access to what. The syntax of the JSON object is largely defined by AWS Identity and Access Management (IAM). For more information, see IAM JSON Policy Reference in the IAM User Guide.

Note
This section discusses using IAM in the context of AWS Glue, but it does not provide detailed information about the IAM service. For more information, see What Is IAM? in the IAM User Guide.

For a table showing all of the AWS Glue API operations and the resources that they apply to, see AWS Glue API Permissions: Actions and Resources Reference.

To learn more about IAM policy syntax and descriptions, see IAM JSON Policy Reference in the IAM User Guide.

AWS Glue supports two kinds of policies:

- Identity-Based Policies (IAM Policies) for Access Control

- AWS Glue Resource Policies for Access Control

By supporting both identity-based and resource policies, AWS Glue gives you fine-grained control over who can access what metadata.

For more examples, see AWS Glue Resource-Based Access Control Policy Examples.

### AWS Glue Resources and Operations

AWS Glue provides a set of operations to work with AWS Glue resources. For a list of available operations, see AWS Glue API.

### Understanding Resource Ownership

The AWS account owns the resources that are created in the account, regardless of who created the resources. Specifically, the resource owner is the AWS account of the principal entity (that is, the AWS account root user, an IAM user, or an IAM role) that authenticates the resource creation request. The following examples illustrate how this works:

- If you use the AWS account root user credentials of your AWS account to create a table, your AWS account is the owner of the resource (in AWS Glue, the resource is a table).

- If you create an IAM user in your AWS account and grant permissions to create a table to that user, the user can create a table. However, your AWS account, which the user belongs to, owns the table resource.

- If you create an IAM role in your AWS account with permissions to create a table, anyone who can assume the role can create a table. Your AWS account, to which the role belongs, owns the table resource.

### Managing Access to Resources

A permissions policy describes who has access to what. The following section explains the available options for creating permissions policies.

Note
This section discusses using IAM in the context of AWS Glue. It doesn't provide detailed information about the IAM service. For complete IAM documentation, see What Is IAM? in the IAM User Guide. For information about IAM policy syntax and descriptions, see IAM JSON Policy Reference in the IAM User Guide.

Policies that are attached to an IAM identity are referred to as identity-based policies (IAM policies). Policies that are attached to a resource are referred to as resource-based policies.

Topics

- Identity-Based Policies (IAM Policies)
- Resource-Based Policies

#### Identity-Based Policies (IAM Policies)

You can attach policies to IAM identities. For example, you can do the following:

- __Attach a permissions policy to a user or a group in your account__ – To grant a user permissions to create an AWS Glue resource, such as a table, you can attach a permissions policy to a user or group that the user belongs to.

- __Attach a permissions policy to a role (grant cross-account permissions)__ – You can attach an identity-based permissions policy to an IAM role to grant cross-account permissions. For example, the administrator in account A can create a role to grant cross-account permissions to another AWS account (for example, account B) or an AWS service as follows:

  1. Account A administrator creates an IAM role and attaches a permissions policy to the role that grants permissions on resources in account A.

  2. Account A administrator attaches a trust policy to the role identifying account B as the principal who can assume the role.

  3. Account B administrator can then delegate permissions to assume the role to any users in account B. Doing this allows users in account B to create or access resources in account A. The principal in the trust policy can also be an AWS service principal if you want to grant an AWS service permissions to assume the role.

For more information about using IAM to delegate permissions, see Access Management in the IAM User Guide.
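The cross-account steps above involve two small policy documents: the trust policy that account A attaches to the role, and the permission that account B grants to its own users so they can assume that role. The sketch below builds both. The account IDs, role name, and function names are hypothetical.

```python
import json


def cross_account_trust_policy(trusted_account_id):
    """Trust policy for the role in account A, naming account B as principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_role_permission(role_arn):
    """Identity-based policy that account B attaches to its users."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}
        ],
    }


# Hypothetical account IDs and role name, for illustration only.
ACCOUNT_A = "111122223333"
ACCOUNT_B = "444455556666"
role_arn = f"arn:aws:iam::{ACCOUNT_A}:role/ExampleGlueCrossAccountRole"

trust_policy = cross_account_trust_policy(ACCOUNT_B)
delegation_policy = assume_role_permission(role_arn)
print(json.dumps(trust_policy, indent=2))
print(json.dumps(delegation_policy, indent=2))
```

Both sides must agree: the trust policy in account A permits account B, and a policy in account B permits specific users to assume the role.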

The following is an example identity-based policy that grants permissions for one AWS Glue action (GetTables). The wildcard character (*) in the Resource value means that you are granting permission to this action to obtain names and details of all the tables in a database in the Data Catalog. If the user also has access to other catalogs through a resource policy, it is given access to these resources too.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetTables",
            "Effect": "Allow",
            "Action": [
                "glue:GetTables"
            ],
            "Resource": "*"
        }
    ]
}
```

For more information about using identity-based policies with AWS Glue, see Identity-Based Policies (IAM Policies) for Access Control. For more information about users, groups, roles, and permissions, see Identities (Users, Groups, and Roles) in the IAM User Guide.

#### Resource-Based Policies

Other services, such as Amazon S3, also support resource-based permissions policies. For example, you can attach a policy to an S3 bucket to manage access permissions to that bucket.
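Unlike an identity-based policy, a resource-based policy names its principal explicitly. As an illustration for the AWS Glue Data Catalog, the sketch below builds a resource policy that grants another account read access to one database. It assumes the Data Catalog ARN layout and assumes that the `PutResourcePolicy` API operation accepts the policy as a JSON string. All account IDs and names are hypothetical.

```python
import json


def catalog_resource_policy(region, owner_account_id, grantee_account_id, database):
    """Return a Data Catalog resource policy granting read access to a database."""
    prefix = f"arn:aws:glue:{region}:{owner_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal is explicit in a resource-based policy.
                "Principal": {"AWS": f"arn:aws:iam::{grantee_account_id}:root"},
                "Action": ["glue:GetDatabase", "glue:GetTables"],
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/*",
                ],
            }
        ],
    }


# Hypothetical values, for illustration only.
resource_policy = catalog_resource_policy(
    "us-east-1", "111122223333", "444455556666", "sales"
)
policy_json = json.dumps(resource_policy)

# Assumption: with boto3 installed and credentials configured:
#   boto3.client("glue").put_resource_policy(PolicyInJson=policy_json)
```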

### Specifying Policy Elements: Actions, Effects, and Principals

For each AWS Glue resource, the service defines a set of API operations. To grant permissions for these API operations, AWS Glue defines a set of actions that you can specify in a policy. Some API operations can require permissions for more than one action in order to perform the API operation. For more information about resources and API operations, see AWS Glue Resources and Operations and AWS Glue API.

The following are the most basic policy elements:

- __Resource__ – You use an Amazon Resource Name (ARN) to identify the resource that the policy applies to. For more information, see AWS Glue Resources and Operations.

- __Action__ – You use action keywords to identify resource operations that you want to allow or deny. For example, you can use `glue:CreateTable` to allow users to create a table.

- __Effect__ – You specify the effect, either allow or deny, when the user requests the specific action. If you don't explicitly grant access to (allow) a resource, access is implicitly denied. You can also explicitly deny access to a resource, which you might do to make sure that a user cannot access it, even if a different policy grants access.

- __Principal__ – In identity-based policies (IAM policies), the user that the policy is attached to is the implicit principal. In resource-based policies, you specify the user, account, service, or other entity that you want to receive permissions. AWS Glue supports resource-based policies, as described in AWS Glue Resource Policies for Access Control.
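The interaction between the Effect values can be sketched with a toy evaluator. This is a deliberately simplified model, not the IAM policy engine: it ignores principals and conditions and matches actions and resources with case-sensitive shell-style wildcards. It shows only the two rules stated above, that access is denied by default and that an explicit deny overrides any allow. All names and ARNs are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def evaluate(policies, action, resource):
    """Return 'Allow', 'ExplicitDeny', or 'ImplicitDeny' for one request."""
    decision = "ImplicitDeny"  # nothing is allowed unless a statement says so
    for policy in policies:
        for stmt in _as_list(policy["Statement"]):
            if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
                if stmt["Effect"] == "Deny":
                    return "ExplicitDeny"  # a deny always wins
                decision = "Allow"
    return decision


# Hypothetical policies and ARNs, for illustration only.
PREFIX = "arn:aws:glue:us-east-1:111122223333"
allow_reads = {"Statement": [
    {"Effect": "Allow", "Action": "glue:GetTable*", "Resource": "*"}
]}
deny_finance = {"Statement": [
    {"Effect": "Deny", "Action": "glue:*", "Resource": f"{PREFIX}:table/finance/*"}
]}
policies = [allow_reads, deny_finance]

print(evaluate(policies, "glue:GetTable", f"{PREFIX}:table/sales/orders"))
print(evaluate(policies, "glue:GetTable", f"{PREFIX}:table/finance/payroll"))
print(evaluate(policies, "glue:DeleteTable", f"{PREFIX}:table/sales/orders"))
```

The three requests illustrate the three outcomes: a read on the `sales` table is allowed, the same read on a `finance` table is explicitly denied, and a delete that no statement mentions is implicitly denied.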

To learn more about IAM policy syntax and descriptions, see IAM JSON Policy Reference in the IAM User Guide.

For a table showing all of the AWS Glue API operations and the resources that they apply to, see AWS Glue API Permissions: Actions and Resources Reference.

### Specifying Conditions in a Policy

When you grant permissions, you can use the access policy language to specify the conditions when a policy should take effect. For example, you might want a policy to be applied only after a specific date. For more information about specifying conditions in a policy language, see Condition in the IAM User Guide.

To express conditions, you use predefined condition keys. There are AWS-wide condition keys and AWS Glue–specific keys that you can use as appropriate. For a complete list of AWS-wide keys, see Available Keys for Conditions in the IAM User Guide.
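As an example of the date case mentioned above, the following sketch adds a `Condition` block that uses the AWS-wide `aws:CurrentTime` key with the `DateGreaterThan` operator, so the statement takes effect only after a given instant. The function name, the action, and the date are hypothetical choices for illustration.

```python
import json
from datetime import datetime, timezone


def allow_after(start, action="glue:GetTables", resource="*"):
    """Return a policy whose Allow statement applies only after `start`."""
    # IAM date conditions use ISO 8601 timestamps; normalize to UTC first.
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [action],
                "Resource": resource,
                "Condition": {"DateGreaterThan": {"aws:CurrentTime": start_utc}},
            }
        ],
    }


# Hypothetical start date, for illustration only.
dated_policy = allow_after(datetime(2030, 1, 1, tzinfo=timezone.utc))
print(json.dumps(dated_policy, indent=2))
```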