# Transformations in Action

In this lesson, we will explore how to apply transformations in ETL jobs, analyze their impact on data quality, and evaluate their performance.

## Learning Objectives
- Demonstrate the application of transformations in ETL jobs.
- Analyze the impact of transformations on data quality.
- Evaluate transformation performance.

## Why This Matters

Transformations decide what reaches the destination of an ETL pipeline. A faulty filter, join, or aggregation can silently drop, duplicate, or distort records, so checking data quality after each step is critical for accurate analysis and decision-making.

## Applying Transformations in ETL
Transformations are operations that modify data as it moves from source to destination in an ETL process. Common transformations include filtering rows, aggregating values, and joining datasets.

In [None]:
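# Example of applying a transformation: filtering data
import pandas as pd

# Sample DataFrame 'data' with a column 'age' (values invented for illustration).
data = pd.DataFrame({'age': [15, 22, 18, 40]})
# Keep only rows where age is greater than 18; rows with a missing age are dropped too.
filtered_data = data[data['age'] > 18]
print(filtered_data)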
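
Joining and aggregating follow the same pattern as filtering. The following is a minimal sketch that chains all three, assuming two small hypothetical DataFrames, `customers` and `orders`, whose columns and values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [17, 34, 52],
})
orders = pd.DataFrame({
    "customer_id": [2, 2, 3, 4],
    "amount": [20.0, 35.0, 10.0, 99.0],
})

# Filter: keep customers older than 18.
adults = customers[customers["age"] > 18]

# Join: attach each adult's orders. An inner join drops rows
# that have no match on either side (customer 4 has no customer record).
adult_orders = adults.merge(orders, on="customer_id", how="inner")

# Aggregate: total order amount per customer.
totals = adult_orders.groupby("customer_id", as_index=False)["amount"].sum()
print(totals)
```

Note that the order for `customer_id` 4 disappears at the join step. Whether that is desired depends on the job, which is why the choice of `how` deserves attention.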

### Micro-Exercise: Applying Transformations
Discuss how transformations can affect data quality. Consider the following transformations: filtering, joining, and aggregating.

## Data Quality Considerations
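Data quality refers to the condition of data based on factors such as accuracy, completeness, reliability, and relevance. Transformations can improve or degrade these factors: for example, a filter can remove valid rows (completeness), and a join on a non-unique key can duplicate rows (accuracy).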

In [None]:
# Example of checking data quality after a transformation
# Let's assume 'cleaned_data' is the DataFrame produced by a transformation.
# We will count null values and duplicate rows.
null_count = cleaned_data.isnull().sum().sum()  # total nulls across all columns
duplicate_count = cleaned_data.duplicated().sum()  # number of fully duplicated rows
print(f'Null values: {null_count}, Duplicates: {duplicate_count}')
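
To see how a transformation changes these numbers, it can help to run the same checks before and after it. The sketch below assumes a hypothetical helper `quality_summary` and two invented DataFrames, `orders` and `regions`. It shows a left join introducing a null value where no match exists.

```python
import pandas as pd

def quality_summary(df: pd.DataFrame) -> dict:
    """Return basic quality metrics for a DataFrame (hypothetical helper)."""
    return {
        "rows": len(df),
        "nulls": int(df.isnull().sum().sum()),     # total missing cells
        "duplicates": int(df.duplicated().sum()),  # fully duplicated rows
    }

# Hypothetical sample data, invented for illustration only.
orders = pd.DataFrame({"order_id": [1, 2, 3], "region_id": [10, 20, 30]})
regions = pd.DataFrame({"region_id": [10, 20], "region": ["North", "South"]})

before = quality_summary(orders)

# A left join keeps every order, but region_id 30 has no match,
# so its 'region' value becomes NaN.
joined = orders.merge(regions, on="region_id", how="left")
after = quality_summary(joined)

print("before:", before)
print("after: ", after)
```

The row count stays the same, yet completeness gets worse. A check on row counts alone would miss this.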

### Micro-Exercise: Transformation Performance Evaluation
Evaluate the performance of a specific transformation in an ETL job. Measure and analyze the execution time and resource usage (for example, memory) of the following transformation: `data.join()`.
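
One possible starting point for the measurement is sketched below, assuming synthetic data of an arbitrary size. The names `left` and `right` are hypothetical. A single timing run is noisy, so treat the number as a rough indication rather than a benchmark.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data, sized arbitrarily for illustration.
n = 100_000
rng = np.random.default_rng(seed=0)
left = pd.DataFrame({"value": rng.random(n)}, index=pd.RangeIndex(n, name="id"))
right = pd.DataFrame({"score": rng.random(n)}, index=pd.RangeIndex(n, name="id"))

# Time the join with a monotonic high-resolution clock.
start = time.perf_counter()
joined = left.join(right, how="inner")  # DataFrame.join aligns on the index
elapsed = time.perf_counter() - start

# memory_usage(deep=True) reports bytes per column, including the index.
memory_mb = joined.memory_usage(deep=True).sum() / 1e6

print(f"Join took {elapsed:.4f} s, result uses about {memory_mb:.2f} MB")
```

For steadier timings, repeat the measurement several times (for instance with the standard library's `timeit` module) and compare results across different input sizes.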

## Examples
### Example 1: Data Cleansing
This example demonstrates how to remove duplicate rows and correct a common formatting issue, leading and trailing whitespace in text values.

In [None]:
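# Example code for data cleansing
# DataFrame.apply passes each column as a Series, so strip values element by element.
# Stripping before deduplicating lets 'Ann ' and 'Ann' count as the same value.
stripped = data.apply(lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v))
cleaned_data = stripped.drop_duplicates()
print(cleaned_data)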

### Example 2: Data Aggregation
This example shows how to aggregate sales data by region to analyze performance.

In [None]:
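# Example code for data aggregation
# Sum the numeric columns per region; numeric_only=True skips text and date columns.
aggregated_data = sales_data.groupby('region').sum(numeric_only=True)
print(aggregated_data)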
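
Aggregation can hide missing values, because `sum` skips nulls by default. The sketch below uses pandas named aggregation on a hypothetical `sales_data` frame (columns and values invented for illustration) to report the total alongside the row count and the non-null count.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration only.
sales_data = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "revenue": [100.0, 150.0, 80.0, None],
})

# Named aggregation: each keyword becomes an output column.
summary = sales_data.groupby("region").agg(
    total_revenue=("revenue", "sum"),  # NaN values are skipped by default
    row_count=("revenue", "size"),     # counts all rows, including NaN
    valid_count=("revenue", "count"),  # counts non-null values only
)
print(summary)
```

A gap between `row_count` and `valid_count`, as for the South region here, signals that a total is based on incomplete data.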

## Micro-Exercises
1. Discuss how transformations can affect data quality.
2. Evaluate the performance of a specific transformation in an ETL job.

## Main Exercise: Transforming Data for Analytics
Select a dataset and apply various transformations to improve its quality. Document the impact of each transformation on data quality, for example by recording row counts, null counts, and duplicate counts before and after each step.

In [None]:
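# Starter code for the exercise
# Assumes 'original_data' is the DataFrame you selected.
transformed_data = original_data.copy()  # work on a copy so the source stays unchanged
# Apply transformations here
transformed_data = transformed_data.drop_duplicates()  # Example transformation
# Document the changes made, for example the row count before and after
print(f'Rows before: {len(original_data)}, rows after: {len(transformed_data)}')
print(transformed_data)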
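
One way to document the process is to record what each step did as it runs. The sketch below assumes a hypothetical helper `apply_step` and an invented `original_data` frame. It is one possible approach, not a required structure for the exercise.

```python
import pandas as pd

# Hypothetical dataset, invented for illustration only.
original_data = pd.DataFrame({
    "name": ["Ana", "Ana", "Ben", None],
    "age": [30, 30, 25, 41],
})

log = []  # one entry per transformation step

def apply_step(df, description, func):
    """Apply func to df and record the row counts before and after."""
    result = func(df)
    log.append({
        "step": description,
        "rows_before": len(df),
        "rows_after": len(result),
    })
    return result

transformed_data = original_data.copy()
transformed_data = apply_step(
    transformed_data, "drop duplicates", lambda df: df.drop_duplicates()
)
transformed_data = apply_step(
    transformed_data, "drop rows with missing name",
    lambda df: df.dropna(subset=["name"]),
)

# The log doubles as documentation of the transformation process.
print(pd.DataFrame(log))
```

The same log entries could also carry null and duplicate counts if the exercise calls for a fuller quality record.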

## Common Mistakes
- Overlooking data quality checks after transformations.
- Failing to document the transformation process and its impact.

## Recap & Next Steps
In this lesson, we learned about applying transformations in ETL processes, the importance of data quality, and how to evaluate transformation performance. Next, we will explore more advanced transformation techniques and their applications.