# Infrastructure as Code with Terraform

## 1. Terraform introduction

What is Infrastructure as Code with Terraform?

Infrastructure as code (IaC) tools allow you to manage infrastructure with configuration files rather than through a graphical user interface. IaC allows you to build, change, and manage your infrastructure in a safe, consistent, and repeatable way by defining resource configurations that you can version, reuse, and share.

With Terraform, you describe your desired infrastructure in a declarative language (HCL, the HashiCorp Configuration Language) stored in plain text files. These files contain the "code" that represents your infrastructure blueprint. It's like creating a recipe for your infrastructure.

When you run Terraform, it reads these code files and communicates with the cloud provider or infrastructure platform of your choice (like AWS, Azure, Google Cloud, etc.). Terraform then automatically sets up and manages the required resources to match the configuration you defined in your code.

### 1.1. Target configuration

- Setting up a stream-based pipeline infrastructure in AWS, using Terraform
- Project infrastructure modules (AWS): 
    1. Kinesis Streams (Producer & Consumer)
    2. Lambda (Serving API)
    3. CloudWatch event (triggers Lambda whenever an event arrives in the Producer Kinesis stream)
    4. S3 Bucket (Model artifacts)
    5. ECR (Image Registry)


![title](images/AWS-stream-pipeline.png)

### 1.2. Install Terraform and AWS configuration

1. To install Terraform on Linux, follow the steps described in this [link](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).
2. To set up the AWS configuration, follow these [steps](https://github.com/DataTalksClub/mlops-zoomcamp/tree/main/06-best-practices/code#iac) (scroll down to the IaC section).

## 2 Terraform: Modules and output variables

In Terraform, modules are a way to organize and package reusable pieces of infrastructure code. Think of modules as functions in programming – they abstract complex logic and allow you to call them wherever needed, promoting code reusability and maintainability.

Modules have input and output variables, which enable them to be flexible and customizable. When you use a module, you can provide input values specific to your use case, and the module returns output values that you can use elsewhere in your Terraform configuration.

We create the folder ```code/infrastructure``` with two files:

- ```main.tf``` : main configuration file where you define the actual resources (virtual machines, networks, databases, etc)
- ```variables.tf```: used to declare input variables that can be used throughout your Terraform configuration.

### 2.1 ```main.tf```

```python
# --> Backend Configuration block
terraform {
  # Terraform Version Requirement
  required_version = ">= 1.0"
  # where Terraform should store its state file (Amazon S3)
  backend "s3" {
    # name of the S3 bucket where the state file will be stored
    bucket  = "tf-state-mlops-zoomcamp"
    # filename for the state file within the S3 bucket
    key     = "mlops-zoomcamp-stg.tfstate"
    # AWS region where the S3 bucket is located
    region  = "eu-west-1"
    # state file is encrypted for better security
    encrypt = true
  }
}

# --> AWS Provider Configuration block
provider "aws" {
  # defined in a separate variables.tf
  region = var.aws_region
}

# --> AWS data source block
# looks up the identity (account ID, user) of the current AWS caller
# The result is stored in a data object called current_identity
data "aws_caller_identity" "current_identity" {}

# assigns the AWS account ID to a local variable
locals {
  # Equivalent to accessing: Class.object_instance.instance_property
  account_id = data.aws_caller_identity.current_identity.account_id
}



```
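*Notes*:
- You need to create your state S3 bucket manually beforehand (```tf-state-mlops-zoomcamp```), as Terraform won't create it for you
- The provider block configures the AWS provider; the provider plugin itself is downloaded from the Terraform Registry when you run ```terraform init```
- Local values are scoped to the module that declares them (here the root module, i.e. all ```.tf``` files in ```infrastructure/```). Variables declared in ```variables.tf``` are also available throughout the root module and can be set from outside (prompt, CLI, ```.tfvars``` files). Neither is visible inside a child module unless passed in explicitly
- There will be different state files depending on the stage of the project (development, production, CI/CD,...). A suffix in the state file name identifies the stage (here ```stg```).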
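
One way to switch the state file per stage, without editing ```main.tf```, is Terraform's partial backend configuration. The sketch below is only an illustration: the file name ```vars/prod.s3.tfbackend``` and the ```prod``` key are assumptions, not something defined in this project.

```hcl
# vars/prod.s3.tfbackend  (hypothetical file name)
# Overrides only the "key" of the S3 backend declared in main.tf;
# bucket, region and encrypt keep the values from the backend block.
# Assumed usage, from code/infrastructure/:
#   terraform init -backend-config=vars/prod.s3.tfbackend -reconfigure
key = "mlops-zoomcamp-prod.tfstate"
```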

### 2.2 ```variables.tf```

```python
# This block defines a variable called aws_region
variable "aws_region" {
  # brief description of the variable's purpose
  description = "AWS region to create resources"
  # If no value is given to aws_region, it defaults to "eu-west-1"
  default     = "eu-west-1"
}

variable "project_id" {
  description = "project_id"
  default = "mlops-zoomcamp"
}

# Kinesis input stream
variable "source_stream_name" {
  description = ""
}

# Kinesis output stream
variable "output_stream_name" {
  description = ""
}

variable "model_bucket" {
  description = "s3_bucket"
}

variable "lambda_function_local_path" {
  description = ""
}

variable "docker_image_local_path" {
  description = ""
}

variable "ecr_repo_name" {
  description = ""
}

variable "lambda_function_name" {
  description = ""
}

```

### 2.3 Modules

Let's create custom modules in ```code/infrastructure/modules``` for:
1. Kinesis, ```kinesis/```
2. S3 bucket, ```s3/```
3. ECR, ```ecr/```
4. Lambda, ```lambda/```


The CloudWatch event will be created within the Lambda module.

### 2.3.1 Kinesis module
We will start working on Kinesis by creating two files:
1. ```code/infrastructure/modules/kinesis/```
    - ```main.tf```
        ```python
            # --> AWS Kinesis Data Stream resource (server, application,...)
            resource "aws_kinesis_stream" "stream" {
              # Variables defined in kinesis/variables.tf
              name             = var.stream_name
              # number of shards in the Kinesis stream.
              shard_count      = var.shard_count
              # retention period of data in the Kinesis stream
              retention_period = var.retention_period
              # shard-level metrics for monitoring the Kinesis stream
              shard_level_metrics = var.shard_level_metrics
              # specify tags for the Kinesis stream
              tags = {
                CreatedBy = var.tags
              }
            }
        
            # --> Output variable Block (could have been a separate output.tf)
            output "stream_arn" {
              # exposes the ARN (Amazon Resource Name) of the Kinesis stream 
              value = aws_kinesis_stream.stream.arn
            }

        ```
    - ```variables.tf```
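
The contents of ```kinesis/variables.tf``` are not shown above. A minimal sketch, declaring exactly the variables that the module's ```main.tf``` references, could look like this. The types, descriptions and defaults are assumptions; ```shard_level_metrics``` and ```tags``` need a default (or must be passed in) because not every module call sets them.

```hcl
# kinesis/variables.tf (sketch; types and defaults are assumed)
variable "stream_name" {
  type        = string
  description = "Name of the Kinesis stream"
}

variable "shard_count" {
  type        = number
  description = "Number of shards in the stream"
}

variable "retention_period" {
  type        = number
  description = "Data retention period, in hours"
}

variable "shard_level_metrics" {
  type        = list(string)
  description = "Shard-level CloudWatch metrics to enable"
  # Assumed default: a small subset of the metrics Kinesis supports
  default = ["IncomingBytes", "OutgoingBytes"]
}

variable "tags" {
  type        = string
  description = "Value of the CreatedBy tag"
  default     = "mlops-zoomcamp"
}
```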

#### Producer stream

We can now add the following code to the ```infrastructure/main.tf``` file as shown below:

```python
# ... previous code

# Create Kinesis Data Stream for ride events
module "source_kinesis_stream" {
  source = "./modules/kinesis"
  retention_period = 48
  shard_count = 2
  stream_name = "${var.source_stream_name}-${var.project_id}"
  tags = var.project_id
}
```
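You are instantiating the Kinesis module (```infrastructure/modules/kinesis```) to create a Kinesis Data Stream for ride events. The arguments in the ```module``` block (```retention_period```, ```shard_count```, ```stream_name```, ```tags```) set the input variables declared in the Kinesis module's ```variables.tf```; any input that is not passed must have a default there. The ```var.*``` references inside the block are resolved against the root module's ```infrastructure/variables.tf```. A child module never reads the parent's variables implicitly.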

Now we can test that everything so far is working by running (from ```code/infrastructure/```):

```
$ terraform init
```
which initializes the working directory: it configures the S3 backend and downloads the AWS provider (into ```.terraform```). Next we run:

```
$ terraform plan
```
which describes your execution plan (which actions Terraform will take in order to build the infrastructure to match the configuration). 

- It will ask for the ```source_stream_name``` value, as it hasn't been configured yet. Give the name ```ride-events```

Finally we can run:

```
$ terraform apply
```
It will ask for the ```source_stream_name``` again. Now you are generating the actual infrastructure on your AWS cloud. You will need to confirm this action.

Now we can confirm that it's created by checking the AWS cloud console.

- Note: Do not assume that idle resources are free. Some AWS resources are billed while they merely exist (a provisioned Kinesis stream, for example, is charged per shard-hour). Make sure to destroy your Terraform resources after you are done to avoid unnecessary costs:

```
$ terraform destroy
```

#### Consumer stream

We add the following code to the ```infrastructure/main.tf```:

```python

# ... previous code

# Producer stream
# ...

# Create Kinesis Data Stream for ride predictions
module "output_kinesis_stream" {
  source = "./modules/kinesis"
  retention_period = 48
  shard_count = 2
  stream_name = "${var.output_stream_name}-${var.project_id}"
  tags = var.project_id
}
```
You are instantiating the Kinesis module (```infrastructure/modules/kinesis```) a second time to create a Kinesis Data Stream for ride predictions. We have a pair of Kinesis streams now.

### 2.3.2. S3 module

We will now create an S3 bucket for model artifacts:

2. ```code/infrastructure/modules/s3/```
    - ```main.tf```
        ```python
           resource "aws_s3_bucket" "s3_bucket" {
             bucket = var.bucket_name
             # only the bucket owner has access to the bucket
             acl    = "private"
             # bucket can be destroyed even if it contains objects
             force_destroy = true
           }
           
           # bucket name to be exposed as output
           output "name" {
             value = aws_s3_bucket.s3_bucket.bucket
           }
        ```
    - ```variables.tf```
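
The S3 module's ```variables.tf``` is not listed. Since its ```main.tf``` only references ```var.bucket_name```, a minimal sketch needs a single declaration (the type and description are assumptions):

```hcl
# s3/variables.tf (sketch)
variable "bucket_name" {
  type        = string
  description = "Name of the S3 bucket that stores the model artifacts"
}
```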

We then update ```infrastructure/main.tf```:

```python

# ... previous code

# Producer stream
# ...

# Consumer stream
# ...

# model bucket
module "s3_bucket" {
  source = "./modules/s3"
  bucket_name = "${var.model_bucket}-${var.project_id}"
}
```

Now you are instantiating the S3 bucket module (```./modules/s3```) to create an AWS S3 bucket for storing models. 

- Note: this S3 bucket is created **during** our Terraform project. It's not the same S3 bucket that we created in the beginning to store our Terraform state.

When executing Terraform (as shown above), it will ask for a value for ```model_bucket``` (shown with its description, ```s3_bucket```). You can give the name ```mlflow-models```

### 2.3.3 ECR module

Terraform creates resources in parallel wherever it finds no dependency between them. However, we want to create our Kinesis streams, S3 bucket, and ECR (not implemented yet) before Lambda. Otherwise Lambda may not find:
- the S3 bucket to take the model from
- the Kinesis stream to take the source event from
- the Docker image to run on (from the ECR registry)

Thus, the Docker image registry (ECR) module below uses ```depends_on``` so that the image is in place before anything that needs it:

3. ```code/infrastructure/modules/ecr/```
    - ```main.tf```
        ```python
            # --> AWS Elastic Container Registry (ECR) repository resource
            resource "aws_ecr_repository" "repo" {
              name                 = var.ecr_repo_name
              # image tags can be updated or overwritten
              image_tag_mutability = "MUTABLE"
              # image scanning will not be performed automatically when
              # new images are pushed to the repository
              image_scanning_configuration {
                scan_on_push = false
              }
              # ECR repository can be destroyed even if it contains images
              force_delete = true
            }

            # In practice, the image build-and-push step is handled
            # separately by the CI/CD pipeline and not the IaC script.
            # But because the lambda config would fail without an
            # existing image URI in ECR, we can also upload any base
            # image to bootstrap the lambda config, unrelated to your
            # inference logic
            # --> The null_resource ensures that the Docker image is available
            # --> in ECR before the rest of the infrastructure, such as the
            # --> Lambda function, is provisioned, preventing errors due to
            # --> missing image URIs in ECR.
            resource "null_resource" "ecr_image" {
              # The Docker image will be rebuilt and pushed to ECR when
              # the associated files change
              triggers = {
                python_file = md5(file(var.lambda_function_local_path))
                docker_file = md5(file(var.docker_image_local_path))
              }
              # Executes a local shell command when this null_resource
              # is created or replaced
              provisioner "local-exec" {
                # multi-line shell script
                command = <<EOF
                        # logs into the ECR repository
                        aws ecr get-login-password # ... 
                            | docker login # ...
                        # Changes the working directory to parent directory 
                        cd ../
                        # builds a Docker image
                        docker build -t # ...
                        # Pushes the Docker image to the ECR repository
                        docker push # ...
                    EOF
              }
            }

            # Data source that looks up the pushed image in ECR
            # (its attributes are used to build the image URI for Lambda)
            data "aws_ecr_image" "lambda_image" {
              # Terraform waits for the image to be uploaded to ECR
              # before the lambda config runs.
              depends_on = [
                null_resource.ecr_image
              ]
              repository_name = var.ecr_repo_name
              image_tag       = var.ecr_image_tag
            }

            # output that exposes the full URI of the uploaded ECR image
            output "image_uri" {
              value     =  # ...
            }
            
        ```       
            
    - ```variables.tf```
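
The ECR module's ```variables.tf``` is not listed either. A sketch covering the variables referenced in its ```main.tf``` and passed in by the root module might be the following. The types and the ```latest``` default for ```ecr_image_tag``` are assumptions; the elided shell commands may need further variables that are not shown here.

```hcl
# ecr/variables.tf (sketch; types and defaults are assumed)
variable "ecr_repo_name" {
  type        = string
  description = "Name of the ECR repository"
}

variable "ecr_image_tag" {
  type        = string
  description = "Tag of the image pushed to ECR"
  # Assumed default, needed because the root module does not pass a tag
  default = "latest"
}

variable "lambda_function_local_path" {
  type        = string
  description = "Local path to the Lambda function source file"
}

variable "docker_image_local_path" {
  type        = string
  description = "Local path to the Dockerfile"
}

variable "account_id" {
  type        = string
  description = "AWS account ID that owns the ECR registry"
}
```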
    

- Notes : the ```null_resource``` is using the Dockerfile and ```lambda_function.py``` defined in ```Part-B/code/```

Now we can update our baseline ```infrastructure/main.tf```:

```python

# ... previous code

# Producer stream
# ...

# Consumer stream
# ...

# S3 bucket
# ...

# image registry
module "ecr_image" {
   source = "./modules/ecr"
   ecr_repo_name = "${var.ecr_repo_name}_${var.project_id}"
   account_id = local.account_id
   lambda_function_local_path = var.lambda_function_local_path
   docker_image_local_path = var.docker_image_local_path
}
```

You are instantiating the ECR image module (```./modules/ecr```) to create an AWS ECR repository and push a Docker image to it.

#### Create environment variables for runtime

Executing Terraform now requires values for many input variables (especially because of the Docker commands that the ECR module runs). Instead of typing them at the prompt each time, we store them in variable definition files (```.tfvars```), one per stage:

   1. Create a new directory ```infrastructure/vars```
   2. We create two files
       - Stage : ```stg.tfvars```
       - Production: ```prod.tfvars```
   3. These files hold the values of the Terraform input variables for the stage and production phases
   4. Now when executing Terraform from ```code/infrastructure/```:
        - ```$ terraform init```
        - ```$ terraform plan -var-file=vars/stg.tfvars```
        - ```$ terraform apply -var-file=vars/stg.tfvars```
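
As an illustration, a ```vars/stg.tfvars``` file assigns a value to each root variable that has no default. Only ```source_stream_name``` below is implied by the stream name used later in this document (```stg_ride_events-mlops-zoomcamp```); every other value, including the relative paths, is a hypothetical placeholder.

```hcl
# vars/stg.tfvars (sketch; all values except source_stream_name are placeholders)
source_stream_name         = "stg_ride_events"
output_stream_name         = "stg_ride_predictions"
model_bucket               = "stg-mlflow-models"
lambda_function_local_path = "../lambda_function.py"
docker_image_local_path    = "../Dockerfile"
ecr_repo_name              = "stg_stream_model_duration"
lambda_function_name       = "stg_prediction_lambda"
```

S3 bucket names must be globally unique, so the bucket value usually needs adjusting.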

### 2.3.4 Lambda module

We now build the Lambda module:

4. ```code/infrastructure/modules/lambda/```
    - ```main.tf```
        ```python
            # This block creates an AWS Lambda function resource
            resource "aws_lambda_function" "kinesis_lambda" {
              function_name = var.lambda_function_name
              # This can also be any base image to bootstrap the lambda config,
              # unrelated to your Inference service on ECR
              # which would be anyway updated regularly via a CI/CD pipeline
                
              # Docker image URI to be used for the Lambda function
              image_uri     = var.image_uri   # required-argument
              # indicate that the Lambda function will use a container image
              package_type  = "Image"
              # defined in 'iam.tf'
              role          = aws_iam_role.iam_lambda.arn
              # enable active tracing for the Lambda function
              tracing_config {
                mode = "Active"
              }
              # Define environment variables for the Lambda function (optional)
              environment {
                variables = {
                  PREDICTIONS_STREAM_NAME = var.output_stream_name
                  MODEL_BUCKET = var.model_bucket
                }
              }
              # Maximum execution time (in seconds) of the Lambda function
              timeout = 180
            }

            # AWS Lambda event invoke configuration resource
            resource "aws_lambda_function_event_invoke_config" "kinesis_lambda_event" {
              function_name = aws_lambda_function.kinesis_lambda.function_name
              # maximum age (in seconds) of an event that Lambda will
              # still attempt to process
              maximum_event_age_in_seconds = 60
              # there should be no retries if the Lambda function invocation fails
              maximum_retry_attempts       = 0
            }
            
            # AWS Lambda event mapping resource (to producer stream)
            resource "aws_lambda_event_source_mapping" "kinesis_mapping" {
              # specifies the ARN of the Kinesis stream to which the 
              # Lambda function will be mapped
              event_source_arn  = var.source_stream_arn
              function_name     = aws_lambda_function.kinesis_lambda.arn
              # Lambda function should start processing records from the most
              # recent records in the Kinesis stream
              starting_position = "LATEST"
              # This ensures that the necessary IAM role policy attachment 
              # is in place before creating the Lambda event source mapping
              depends_on = [
                aws_iam_role_policy_attachment.kinesis_processing
              ]
              # enabled           = var.lambda_event_source_mapping_enabled
              # batch_size        = var.lambda_event_source_mapping_batch_size
            }
        ```
     - ```variables.tf```
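
The Lambda module's ```variables.tf``` is not shown. A sketch declaring the inputs that the root module passes in (see the ```module "lambda_function"``` block) could be the following; types and descriptions are assumptions.

```hcl
# lambda/variables.tf (sketch; types and descriptions are assumed)
variable "lambda_function_name" {
  type        = string
  description = "Name of the Lambda function"
}

variable "image_uri" {
  type        = string
  description = "URI of the container image in ECR"
}

variable "model_bucket" {
  type        = string
  description = "Name of the S3 bucket holding the model artifacts"
}

variable "source_stream_name" {
  type        = string
  description = "Name of the input (producer) Kinesis stream"
}

variable "source_stream_arn" {
  type        = string
  description = "ARN of the input (producer) Kinesis stream"
}

variable "output_stream_name" {
  type        = string
  description = "Name of the output (consumer) Kinesis stream"
}

variable "output_stream_arn" {
  type        = string
  description = "ARN of the output (consumer) Kinesis stream"
}
```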
    

Now we can update our baseline ```infrastructure/main.tf```:

```python

# ... previous code

# Producer stream
# ...

# Consumer stream
# ...

# S3 bucket
# ...

# ECR
# ...

module "lambda_function" {
  source = "./modules/lambda"
  # This specifies the Docker container image URI stored in the Elastic
  # Container Registry (ECR). The Lambda function will use this image 
  # to run the code.
  image_uri = module.ecr_image.image_uri
  # creates a unique name for the Lambda function
  lambda_function_name = "${var.lambda_function_name}_${var.project_id}"
  # It specifies the name of the S3 bucket where the Lambda function can
  # access the machine learning model.
  model_bucket = module.s3_bucket.name
  # creates a unique name for the output Kinesis stream 
  output_stream_name = "${var.output_stream_name}-${var.project_id}"
  # specifies the Amazon Resource Name (ARN) of the output Kinesis stream
  output_stream_arn = module.output_kinesis_stream.stream_arn
  source_stream_name = "${var.source_stream_name}-${var.project_id}"
  source_stream_arn = module.source_kinesis_stream.stream_arn
}

```

You are defining an AWS Lambda function to consume data from a Kinesis stream using an ECR container image, as well as the necessary event source mapping to connect the Lambda function with the Kinesis stream.

#### Configuring Lambda dependencies

The Lambda module depends heavily on Kinesis, S3 and ECR. Therefore, we need to define the AWS Identity and Access Management (IAM) resources that give the Lambda function the access it needs to other AWS services such as Kinesis, CloudWatch, and S3:

1. Create ```lambda/iam.tf``` file.
    - The first blocks connect lambda with Kinesis:
    ```python
       # --> This block creates an AWS IAM Role Resource
       resource "aws_iam_role" "iam_lambda" {
          # Creates a unique IAM role name based on the Lambda
          name = "iam_${var.lambda_function_name}"
          # defines which AWS services are allowed to assume this role.
          # In this case, it allows both Lambda and Kinesis services to
          # assume the role.
          assume_role_policy = 
          # ...
        }
        
        # --> This block creates an IAM policy resource for Kinesis
        resource "aws_iam_policy" "allow_kinesis_processing" {
          # Allows Kinesis processing based on the Lambda function's name
          name        = "allow_kinesis_processing_${var.lambda_function_name}"
          path        = "/"
          description = "IAM policy for Kinesis processing from a lambda"
          
          # Specifies the permissions granted to the IAM role related 
          # to Kinesis. In this case, it allows various Kinesis-related
          # actions on all Kinesis resources
          policy = 
            # ...
        }
        
        # --> This block grants the IAM role the permissions defined in the 
        #     IAM policy
        resource "aws_iam_role_policy_attachment" "kinesis_processing" {
          role       = aws_iam_role.iam_lambda.name
          policy_arn = aws_iam_policy.allow_kinesis_processing.arn
        }
        
        # --> This block creates an inline IAM policy resource
        resource "aws_iam_role_policy" "inline_lambda_policy" {
          name       = "LambdaInlinePolicy"
          role       = aws_iam_role.iam_lambda.id
          # ensures that this policy is created after the IAM role
          depends_on = [aws_iam_role.iam_lambda]
          # defines the inline policy's permissions. In this case, it grants
          # the Lambda function permission to put records to the Kinesis stream
          policy     = 
            # ...
        }
        
    ```
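
The policy documents are elided above. For orientation only, a trust policy that lets both the Lambda and Kinesis services assume a role, as the comments describe, can be written with ```jsonencode```. This is a sketch under that assumption, not the project's actual policy, and the local name ```lambda_assume_role_policy``` is hypothetical.

```hcl
locals {
  # Hypothetical trust policy: which services may assume the role
  lambda_assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = "sts:AssumeRole"
        Principal = {
          Service = ["lambda.amazonaws.com", "kinesis.amazonaws.com"]
        }
      }
    ]
  })
}
```

Such a value could then be assigned to the role's ```assume_role_policy``` argument.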

- The next blocks set up CloudWatch logging:

```python
        # --> This block creates an AWS Lambda permission for CloudWatch
        #     (to trigger Lambda)
        resource "aws_lambda_permission" "allow_cloudwatch_to_trigger_lambda_function" {
          statement_id  = "AllowExecutionFromCloudWatch"
          action        = "lambda:InvokeFunction"
          function_name = aws_lambda_function.kinesis_lambda.function_name
          # indicates that the permission is granted to the CloudWatch 
          # Events service.
          principal     = "events.amazonaws.com"
          # This attribute specifies the ARN of the event source (Kinesis stream)
          # that triggers the Lambda function.
          source_arn    = var.source_stream_arn
        }
        
        # --> This block creates an IAM policy for Logging
        resource "aws_iam_policy" "allow_logging" {
          name        = "allow_logging_${var.lambda_function_name}"
          path        = "/"
          description = "IAM policy for logging from a lambda"
          # specifies the permissions granted to the IAM role for logging
          # purposes. In this case, it allows actions related to CloudWatch Logs
          policy = 
            # ...
        }
        
        # --> This block grants the IAM role the permissions defined 
        #     in the IAM policy for logging
        resource "aws_iam_role_policy_attachment" "lambda_logs" {
          role       = aws_iam_role.iam_lambda.name
          policy_arn = aws_iam_policy.allow_logging.arn
        }
        
```

- And finally the permissions for the S3 bucket:

```python
        
        # --> This block creates an IAM policy for S3
        resource "aws_iam_policy" "lambda_s3_role_policy" {
          name = "lambda_s3_policy_${var.lambda_function_name}"
          description = "IAM Policy for s3"
          # it allows various S3-related actions, including listing 
          # buckets, accessing bucket locations, and performing actions
          # on the specified s3 bucket and its contents
          policy =
            # ...
        }
        
        # --> This block grants the IAM role the permissions defined 
        #     in the IAM policy for S3 actions.
        resource "aws_iam_role_policy_attachment" "iam-policy-attach" {
          role       = aws_iam_role.iam_lambda.name
          policy_arn = aws_iam_policy.lambda_s3_role_policy.arn
        }

```

## 3 Terraform: putting everything together

![title](images/terraform.png)

Now we have the entire pipeline deployed on AWS using Terraform. Let's exercise it end to end:

- Insert a ride event record into our input Kinesis stream (producer stream)
- This triggers the Lambda function
- A version of the ML model (artifact) is fetched from the S3 bucket
- Ride predictions are generated (by the logic of the Docker image in ECR)
- These predictions are published to the output Kinesis stream (consumer stream)

Therefore (from ```code/infrastructure```):

1. Create infrastructure
```bash
# Initialize the working directory (backend, providers, modules)
$ terraform init
```
```bash
# Preview the changes Terraform would make
$ terraform plan -var-file=vars/stg.tfvars
```
```bash
# Create new infra
$ terraform apply -var-file=vars/stg.tfvars
```

2. Additionally, we need to define some environment variables for Lambda:

    ![title](images/clip.png)
    
    - Output Kinesis stream (to write the records)
    - S3 bucket to take the model artifacts from
    - Run ID (the version of the model it's supposed to take)
    

   The script taking care of this is ```code/scripts/deploy_manual.sh```

   We can now run the script:

   ```bash
   $ ./deploy_manual.sh
   ```
The script copies the model artifacts into our production bucket and updates the Lambda function with the environment variables listed above.





3. Now let's try to put a record into our Kinesis stream: 
![title](images/clip2.png)
```bash
$ export KINESIS_STREAM_INPUT="stg_ride_events-mlops-zoomcamp"
$ aws kinesis put-record  \
        --stream-name ${KINESIS_STREAM_INPUT}   \
        --partition-key 1  --cli-binary-format raw-in-base64-out  \
        --data '{"ride": {
                    "PULocationID": 130,
                    "DOLocationID": 205,
                    "trip_distance": 3.66
                    },
                 "ride_id": 156}'
```
  You should get a ```ShardId``` and ```SequenceNumber``` as output. You can also check the CloudWatch logs.

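4. Destroy the infrastructure after use (from ```code/infrastructure```):

```bash
# Delete infra after your work, to avoid costs on any running services
# (pass the same variable file so Terraform does not prompt for values)
$ terraform destroy -var-file=vars/stg.tfvars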
```