# Cloud & DevOps for Data Engineering: Exercise


## 1. Docker Practice
- Create a Dockerfile to containerize a basic Python script (e.g., prints "Hello, Docker!").
- Build the Docker image using your Dockerfile.
- Run a container from your image and confirm that the expected output is displayed.
- Briefly explain how containerization benefits data engineering workflows.
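
As a starting point, the script to containerize might look like the sketch below. The file name `hello.py` and the `greeting` helper are assumptions made for illustration; any script that prints the expected line would serve.

```python
# hello.py (hypothetical file name): the script a Dockerfile would copy and run.


def greeting() -> str:
    """Return the message the container is expected to print."""
    return "Hello, Docker!"


if __name__ == "__main__":
    # Running the file directly (or as the container command) prints the message.
    print(greeting())
```

Keeping the message in a small function makes it easy to check the script outside the container before building the image.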


```dockerfile
# Dockerfile
# Your code here

# Build: docker build -t myapp .
# Run: docker run myapp
```



---
## 2. CI/CD Pipeline
- Build a GitHub Actions workflow that automatically runs your project’s test suite on every push and pull request.
- The workflow should use a suitable environment (e.g., Python, Node, etc.), install dependencies, and execute your tests.
- Ensure that failed tests block merging (for example, by making the workflow a required status check in a branch protection rule), so that continuous integration protects code quality.
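
The workflow needs at least one test to run. The sketch below is a hypothetical test module, assuming the project uses pytest, which collects functions named `test_*`. The `normalize_column_name` helper is invented for illustration and stands in for real project code.

```python
# test_transform.py (hypothetical file name): pytest collects test_* functions.


def normalize_column_name(name: str) -> str:
    """Trim, lowercase, and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def test_normalize_column_name():
    # A failing assertion here makes the test run, and so the CI job, fail.
    assert normalize_column_name(" Order ID ") == "order_id"
    assert normalize_column_name("amount") == "amount"
```

In a real project the helper would live in its own module and the test file would import it.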


```yaml
# yaml file

# .github/workflows/ci.yml
# Your code here
```



---
## 3. Serverless Function (AWS Lambda)
- Develop a Python AWS Lambda handler that processes a simple event (e.g., returns a greeting or echoes input).
- Demonstrate how to test the Lambda function locally using AWS SAM CLI (or a similar tool).
- Explain the key components of the handler (event, context) and best practices for local testing before deploying to AWS.
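
A minimal handler sketch follows, assuming the incoming event is a dictionary that may carry a `name` key (an assumption for this example, not a Lambda requirement). `event` holds the invocation payload, and `context` is the runtime metadata object that Lambda passes in; it is unused here.

```python
import json


def lambda_handler(event, context):
    """Return a greeting for the name carried in the event.

    event:   invocation payload; the "name" key is an assumption for this sketch.
    context: runtime metadata object supplied by Lambda; not used here.
    """
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Quick local check without AWS: call the handler with a sample event.
    print(lambda_handler({"name": "Ada"}, None))
```

Calling the handler directly like this is the cheapest local test. With the AWS SAM CLI, a project that has a SAM template can typically be exercised with `sam local invoke` and an `--event` file, which runs the handler inside a Lambda-like container.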


```python
# Your code here
```

---
## 4. Infrastructure as Code (IaC)

- Use Terraform to define and provision cloud resources in a declarative way.
- Task: Write a Terraform configuration file that creates an S3 bucket in AWS.
- Use a test AWS account and choose a globally unique bucket name.
- This exercise helps you practice infrastructure automation and version control for cloud resources.
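
Picking a globally unique bucket name is easier with a random suffix. The helper below is a sketch: the function names are hypothetical, and the check covers only a subset of the S3 naming rules (length, allowed characters, first and last character, no consecutive dots).

```python
import re
import uuid

# Subset of the S3 bucket naming rules: 3-63 characters drawn from lowercase
# letters, digits, dots, and hyphens, starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if the name passes the partial rule check above."""
    return bool(_BUCKET_NAME_RE.fullmatch(name)) and ".." not in name


def unique_bucket_name(prefix: str) -> str:
    """Append a random 12-character hex suffix to the prefix."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


if __name__ == "__main__":
    candidate = unique_bucket_name("de-exercise")
    print(candidate, is_plausible_bucket_name(candidate))
```

The random suffix makes a collision unlikely but does not guarantee uniqueness; only a successful bucket creation confirms that the name was free. The generated value can then be supplied as the bucket name in the Terraform configuration.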



```hcl
# hcl file

# Your code here
```


---

### Challenge
- Implement cloud-based monitoring for your data pipeline.
- Choose either Amazon CloudWatch or Google Cloud Monitoring and Logging (formerly Stackdriver) for this task.
- Set up logging to capture pipeline events and errors.
- Configure at least one alert/notification (alarm) for pipeline failures or anomalies.
- Briefly document your setup and explain how you would use these tools to troubleshoot and ensure pipeline reliability.
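
For the logging step, one option is to emit each pipeline event as a single JSON line, which log services can usually filter on by field. The sketch below uses only the standard library. The `step` field and the logger name `pipeline` are assumptions for illustration, and shipping the output to a cloud log service is left to the runtime or an agent.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "step" is a hypothetical extra field naming the pipeline stage.
        step = getattr(record, "step", None)
        if step is not None:
            payload["step"] = step
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_pipeline_logger(name: str = "pipeline") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = get_pipeline_logger()
    log.info("load finished", extra={"step": "load"})
    try:
        raise ValueError("bad row")
    except ValueError:
        # log.exception records the traceback alongside the error message.
        log.exception("transform failed", extra={"step": "transform"})
```

With logs in this shape, an alert can be built on a count of records whose `level` is `ERROR`, for example through a log-based metric and an alarm on that metric.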


#### Your code here