# Built-in Transformations

In this lesson, we will explore the built-in transformations available in AWS Glue. By the end of this lesson, you will be able to identify these transformations, understand how to apply them in ETL jobs, and explore relevant use cases.

## Learning Objectives
- Identify built-in transformations available in AWS Glue.
- Understand how to apply transformations in ETL jobs.
- Explore the use cases for each transformation.

## Why This Matters

Built-in transformations simplify the ETL process by providing predefined operations that you configure rather than write from scratch. This reduces the amount of custom code, shortens development time, and helps keep data consistent, which accurate analytics depends on.

## Built-in Transformations Overview

Built-in transformations are predefined functions provided by AWS Glue for manipulating and preparing data during the ETL process. In a script they take a DynamicFrame and return a new one, regardless of the format the data was read from, which makes it easier to clean, enrich, and structure data for analysis.

In [None]:
# Example: Using the 'Drop Fields' Transformation
# This transformation removes unnecessary fields from a dataset.
# Assumes `data` is a DynamicFrame that has already been loaded in the job.
transformed_data = data.drop_fields(paths=['unnecessary_field1', 'unnecessary_field2'])
# transformed_data excludes the specified fields; `data` itself is unchanged.

### Micro-exercise: Built-in Transformations Overview
**Prompt:** List the built-in transformations available in AWS Glue and describe their functionalities.

**Hint:** Refer to the AWS Glue documentation for a comprehensive list.

In [None]:
# Starter code for Micro-exercise 1
# List of built-in transformations
built_in_transformations = ['Drop Fields', 'Map', 'Filter', 'Join', 'Select Fields']
# Print the transformations
for transformation in built_in_transformations:
    print(transformation)
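
The display names in the list above are the ones shown in the visual editor. In a PySpark script, the same transformations are, as far as we know, available both as classes in `awsglue.transforms` and as `DynamicFrame` methods. The lookup table below is a plain-Python sketch that runs without the Glue runtime; `glue_names` and `describe` are hypothetical names used only for illustration, and the class and method names should be checked against the AWS Glue documentation for your Glue version.

```python
# Hypothetical lookup table: display name -> (class in awsglue.transforms,
# equivalent DynamicFrame method, one-line description).
# Verify these names against the AWS Glue documentation before relying on them.
glue_names = {
    'Drop Fields':   ('DropFields',   'drop_fields',   'remove the listed fields'),
    'Select Fields': ('SelectFields', 'select_fields', 'keep only the listed fields'),
    'Map':           ('Map',          'map',           'apply a function to every record'),
    'Filter':        ('Filter',       'filter',        'keep records for which a function returns True'),
    'Join':          ('Join',         'join',          'combine two frames on matching keys'),
}

def describe(display_name):
    """Return a one-line summary for a transformation display name."""
    class_name, method_name, summary = glue_names[display_name]
    return f"{display_name}: {class_name}.apply(...) or frame.{method_name}(...) -> {summary}"

for name in glue_names:
    print(describe(name))
```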

## Applying Transformations

Applying transformations involves selecting the appropriate built-in transformation and configuring it for the dataset at hand, for example by naming the fields to drop or supplying the function to run on each record. This step determines whether the data reaches downstream analytics in the format and quality it requires.

In [None]:
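# Example: Using the 'Map' Transformation
# This transformation applies a function to each record in a dataset.
# Assumes `data` is a DynamicFrame with a numeric field named 'old_field'.
def add_doubled_field(record):
    record['new_field'] = record['old_field'] * 2
    return record  # return the whole record so the existing fields are kept

transformed_data = data.map(f=add_doubled_field)
# Each record in transformed_data now also has 'new_field' with the doubled value.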
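
A mapping function decides what each output record looks like, so it matters whether it returns the incoming record with a field added or a brand-new dictionary. The sketch below imitates record-by-record mapping on a plain list of dictionaries, with no Glue runtime involved; `simulate_map` and the sample records are hypothetical stand-ins for a DynamicFrame, not part of the AWS Glue API.

```python
import copy

def simulate_map(records, f):
    """Apply f to a copy of each record, the way a per-record map would."""
    return [f(copy.deepcopy(record)) for record in records]

def add_doubled_field(record):
    # Add a field and return the full record, so existing fields are kept.
    record['new_field'] = record['old_field'] * 2
    return record

def only_doubled_field(record):
    # Returns a brand-new dict: every other field is lost.
    return {'new_field': record['old_field'] * 2}

records = [{'id': 1, 'old_field': 10}, {'id': 2, 'old_field': 25}]

kept = simulate_map(records, add_doubled_field)
lost = simulate_map(records, only_doubled_field)
print(kept)
print(lost)
```

The first result keeps `id` and `old_field` alongside `new_field`; the second contains `new_field` only.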

### Micro-exercise: Applying Transformations
**Prompt:** Demonstrate how to apply a specific transformation in an ETL job using AWS Glue.

**Starter Code:**
```python
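# Example starter code for applying a transformation.
# Assumes `data` is a DynamicFrame. Swap drop_fields for the DynamicFrame method
# of the transformation you chose (for example select_fields, map, or filter).
transformed_data = data.drop_fields(paths=['field_to_remove'])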
```
**Hint:** Make sure to check the data types before applying transformations.

In [None]:
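# Starter code for Micro-exercise 2
# Applying the 'Map' transformation.
# Assumes `data` is a DynamicFrame with a numeric field named 'old_field'.
def double_old_field(record):
    record['new_field'] = record['old_field'] * 2
    return record

transformed_data = data.map(f=double_old_field)
# Show the transformed records
transformed_data.show()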
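
The hint about checking data types can be made concrete. On a real DynamicFrame, `printSchema()` shows the inferred types, and a field with inconsistent values typically shows up there as a choice type. The sketch below mimics that check locally on a list of dictionaries; `field_types`, `mixed_type_fields`, and the sample records are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def field_types(records):
    """Map each field name to the set of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return dict(seen)

def mixed_type_fields(records):
    """Return the fields whose values do not share a single type, sorted by name."""
    return sorted(f for f, types in field_types(records).items() if len(types) > 1)

records = [
    {'id': 1, 'old_field': 10},
    {'id': 2, 'old_field': '25'},  # a string sneaks in: '25' * 2 is '2525', not 50
]

print(field_types(records))
print(mixed_type_fields(records))
```

Running a check like this before a Map step that does arithmetic avoids silently producing repeated strings instead of doubled numbers.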

## Examples of Built-in Transformations
### Example 1: Using the 'Drop Fields' Transformation
This example demonstrates how to remove unnecessary fields from a dataset to streamline data processing.
```python
transformed_data = data.drop_fields(['unnecessary_field1', 'unnecessary_field2'])
```
### Example 2: Using the 'Map' Transformation
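This example applies a function to each record in a dataset. As written, the lambda returns a new record that holds only `new_field`; to keep the other fields, add the new key to the incoming record and return that record instead.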
```python
transformed_data = data.map(lambda x: {'new_field': x['old_field'] * 2})
```
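
The list of built-in transformations earlier in this lesson also names Filter and Select Fields. As a rough illustration of what they do, the sketch below reproduces their semantics on a plain list of dictionaries. `simulate_filter`, `simulate_select_fields`, and the `orders` data are hypothetical; in a Glue script the corresponding calls would be along the lines of `Filter.apply(frame=..., f=...)` and `SelectFields.apply(frame=..., paths=[...])`, which should be confirmed in the AWS Glue documentation.

```python
def simulate_filter(records, f):
    """Keep only the records for which f returns True."""
    return [record for record in records if f(record)]

def simulate_select_fields(records, paths):
    """Keep only the top-level fields named in paths; missing fields are skipped."""
    return [{k: record[k] for k in paths if k in record} for record in records]

orders = [
    {'order_id': 1, 'amount': 40, 'note': 'gift'},
    {'order_id': 2, 'amount': 250, 'note': ''},
    {'order_id': 3, 'amount': 120, 'note': 'rush'},
]

# Filter first to reduce the number of records, then select to reduce the fields.
large_orders = simulate_filter(orders, lambda r: r['amount'] >= 100)
slim_orders = simulate_select_fields(large_orders, ['order_id', 'amount'])
print(slim_orders)
```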

## Micro-exercises
1. **Prompt:** List the built-in transformations available in AWS Glue and describe their functionalities.
   **Hint:** Refer to the AWS Glue documentation for a comprehensive list.
2. **Prompt:** Demonstrate how to apply a specific transformation in an ETL job using AWS Glue.
   **Starter Code:**
   ```python
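   # Example starter code for applying a transformation.
   # Assumes `data` is a DynamicFrame. Swap drop_fields for the DynamicFrame method
   # of the transformation you chose (for example select_fields, map, or filter).
   transformed_data = data.drop_fields(paths=['field_to_remove'])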
   ```
   **Hint:** Make sure to check the data types before applying transformations.

## Main Exercise: Applying Built-in Transformations to a Dataset
In this exercise, you will select a dataset in AWS Glue, choose a built-in transformation to apply, and run the ETL job to observe the results. You will also document the transformation process and the output.

**Starter Code:**
```python
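# Starter code for the main exercise
# Assumes `glueContext` is an initialized GlueContext.
selected_data = glueContext.create_dynamic_frame.from_catalog(database='your_database', table_name='your_table')
# Swap drop_fields for the DynamicFrame method of your chosen transformation.
transformed_data = selected_data.drop_fields(paths=['your_field'])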
```
**Expected Outcomes:**
- A transformed dataset reflecting the applied transformations.
- Documentation of the transformation steps taken and the rationale behind them.

In [None]:
# Starter code for Main Exercise
# Load data from AWS Glue catalog
selected_data = glueContext.create_dynamic_frame.from_catalog(database='your_database', table_name='your_table')
# Apply the 'Drop Fields' transformation (replace 'your_field' with a real field name)
transformed_data = selected_data.drop_fields(paths=['your_field'])
# Show the transformed data
transformed_data.show()

## Common Mistakes
- Applying a transformation to a field whose data type it does not expect, for example doing arithmetic on a field that holds strings.
- Applying transformations without validating the output, leading to data quality issues.
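
Validating the output can be as simple as comparing record counts and checking which fields survived. The sketch below does this for a drop-fields step on plain lists of dictionaries; `validate_drop` and the sample data are hypothetical. On real DynamicFrames, similar checks could be built from `count()` and `printSchema()`.

```python
def validate_drop(before, after, dropped_fields):
    """Return a list of problems found after a drop-fields step (empty if none)."""
    problems = []
    if len(before) != len(after):
        problems.append(f"row count changed: {len(before)} -> {len(after)}")
    for i, record in enumerate(after):
        leftover = [f for f in dropped_fields if f in record]
        if leftover:
            problems.append(f"record {i} still has {leftover}")
    return problems

before = [{'id': 1, 'tmp': 'x'}, {'id': 2, 'tmp': 'y'}]
good = [{'id': 1}, {'id': 2}]
bad = [{'id': 1}, {'id': 2, 'tmp': 'y'}]

print(validate_drop(before, good, ['tmp']))
print(validate_drop(before, bad, ['tmp']))
```

An empty list means the step behaved as expected; anything else is worth investigating before the data moves downstream.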

## Recap & Next Steps
In this lesson, we covered the built-in transformations available in AWS Glue, how to apply them in ETL jobs, and their typical use cases. These concepts underpin effective data preparation and help maintain data quality. In the next lesson, we will delve deeper into advanced transformation techniques.