# How to Develop CEL Filter Expressions for Cloud Custodian (C7N)

See https://github.com/google/cel-spec for the CEL language definition.

See https://github.com/cloud-custodian/cloud-custodian/issues/5759 for the Cloud Custodian rationale for including CEL as a replacement for the filter language.

We want to move from filter text in C7N DSL to CEL filter text.

## Goal

Here's the target state.

```
policies:
  - name: compute-check
    resource: gcp.instance
    filters:
      - type: cel
        expr: |
          Resource.creationTimestamp < timestamp("2018-08-03T16:00:00-07:00") &&
          Resource.deleteProtection == false &&
          ((Resource.name.startsWith(
              "projects/project-123/zones/us-east1-b/instances/dev") ||
          (Resource.name.startsWith(
              "projects/project-123/zones/us-east1-b/instances/prod"))) &&
          Resource.instanceSize == "m1.standard")
```

We've replaced a legacy YAML filter expression with a CEL expression using easier-to-read logic and comparison operators. 

C7N provides several global objects to the CEL engine:

- `Resource` is the cloud resource JSON document.

- `Now` is the current time (not used in this example).

- `Event` is the (optional) state change event from the cloud.

## Building an IDE

There are a few things we need to test and debug a CEL expression.

1. A CEL engine.
2. Some Resource objects to test against.
3. A way to run the CEL engine against the Resources.

## The IDE CEL Engine

There are several steps to creating and evaluating a CEL expression.

1. Create an environment.
2. Parse the expression.
3. Build a "program" from the expression and any additional functions required.
4. Evaluate the program with the variable bindings.

A convenient approach is to create a mock `CELFilter` class that we can use in a notebook to develop and test expressions.
We may need additional pieces later, but this is enough to start.


In [18]:
import celpy

class CELFilter:
    """Notebook stand-in for the C7N CEL filter."""

    # Type annotations for the global objects an expression may reference.
    decls = {
        "Resource": celpy.celtypes.MapType,
        "Now": celpy.celtypes.TimestampType,
    }

    def __init__(self, expr: str) -> None:
        env = celpy.Environment(annotations=CELFilter.decls)
        ast = env.compile(expr)  # Parse the CEL text.
        self.functions = {}  # c7nlib.FUNCTIONS may need to be mocked to help develop or debug.
        self.prgm = env.program(ast, self.functions)

    def process(self, resource: celpy.celtypes.Value, now: str) -> celpy.celtypes.Value:
        # Bind the global names to the values for this evaluation.
        activation = {
            "Resource": resource,
            "Now": celpy.celtypes.TimestampType(now),
        }
        return self.prgm.evaluate(activation)

In [19]:
my_filter = CELFilter(
"""
Resource.creationTimestamp < timestamp("2018-08-03T16:00:00-07:00") &&
Resource.deleteProtection == false &&
((Resource.name.startsWith(
   "projects/project-123/zones/us-east1-b/instances/dev") ||
(Resource.name.startsWith(
   "projects/project-123/zones/us-east1-b/instances/prod"))) &&
Resource.instanceSize == "m1.standard")
"""
)

In [20]:
example_1_doc = {
    "creationTimestamp": "2018-07-06T05:04:03Z",
    "deleteProtection": False,
    "name": "projects/project-123/zones/us-east1-b/instances/dev/ec2",
    "instanceSize": "m1.standard",
}

In [21]:
my_filter.process(celpy.json_to_cel(example_1_doc), "2018-08-04T08:00:00Z")

BoolType(False)

Hm. It seemed like it should have been True. 

## JSON Conversion

We have a handy JSON-to-CEL conversion function, `celpy.json_to_cel()`. The subtlety is that it can't know which strings are supposed to be timestamps, so they stay strings.

In [5]:
import json
document = json.loads(
"""
{
    "creationTimestamp": "2018-07-06T05:04:03Z",
    "deleteProtection": false,
    "name": "projects/project-123/zones/us-east1-b/instances/dev/ec2",
    "instanceSize": "m1.standard"
}
"""
)

example_2_doc = celpy.json_to_cel(document)
example_2_doc

MapType({StringType('creationTimestamp'): StringType('2018-07-06T05:04:03Z'), StringType('deleteProtection'): BoolType(False), StringType('name'): StringType('projects/project-123/zones/us-east1-b/instances/dev/ec2'), StringType('instanceSize'): StringType('m1.standard')})

In [6]:
my_filter.process(example_2_doc, "2018-08-04T08:00:00Z")

BoolType(False)

We often have a problem with resource fields that are not the right data type.
In this case, we're comparing a string with a timestamp, and the result is (effectively) False.
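
The mismatch is easy to reproduce outside CEL. The following is a plain-Python analogy, not celpy code: a string and a `datetime` have no common ordering, so the comparison cannot succeed until the string is parsed. The `parse_rfc3339` helper is a hypothetical name used only for this sketch.

```python
from datetime import datetime


def parse_rfc3339(text: str) -> datetime:
    """Parse an RFC 3339 timestamp (hypothetical helper for this sketch)."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so spell the UTC offset out explicitly.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


creation_text = "2018-07-06T05:04:03Z"
cutoff = parse_rfc3339("2018-08-03T16:00:00-07:00")

try:
    creation_text < cutoff  # str vs. datetime: no ordering is defined
    comparable = True
except TypeError:
    comparable = False

# Once both sides are timestamps, the comparison is well defined.
is_older = parse_rfc3339(creation_text) < cutoff
print(comparable, is_older)
```

Plain Python raises an exception here, while the notebook output above shows the CEL filter coming back False. In both cases the fix is the same: convert the string before comparing.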

We have some choices:

- Conversion in CEL. This is robust and clear.

- Conversion of the source document before CEL evaluation. This can depend on C7N integration features. This (in turn) requires a careful definition of the source schema for the JSON in order to perform the conversions. This seems fraught with potential complexities.
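
To make the second option concrete, here is a minimal sketch of schema-driven conversion done before evaluation. It is plain Python with no C7N integration. The `TIMESTAMP_FIELDS` set and the `convert_timestamps` helper are hypothetical stand-ins for a real schema, and only top-level keys are handled.

```python
from datetime import datetime
from typing import Any, Dict

# Hypothetical schema: the top-level keys known to hold timestamps.
TIMESTAMP_FIELDS = {"creationTimestamp"}


def convert_timestamps(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document with known timestamp strings parsed."""
    converted = dict(document)
    for key in TIMESTAMP_FIELDS:
        if key in converted:
            # fromisoformat() rejects a trailing "Z" before Python 3.11.
            text = converted[key].replace("Z", "+00:00")
            converted[key] = datetime.fromisoformat(text)
    return converted


raw = {
    "creationTimestamp": "2018-07-06T05:04:03Z",
    "deleteProtection": False,
    "name": "projects/project-123/zones/us-east1-b/instances/dev/ec2",
}
typed = convert_timestamps(raw)
print(type(typed["creationTimestamp"]).__name__)
```

Even this small version needs a maintained list of fields for every resource type, and real resource documents are nested, which is where the complexity comes from.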

## Conversion in CEL

We can convert input strings to more useful CEL types explicitly. 

Instead of `Resource.creationTimestamp`, we use `timestamp(Resource.creationTimestamp)`. 

In [16]:
CEL2 = """
timestamp(Resource.creationTimestamp) < timestamp("2018-08-03T16:00:00-07:00") &&
! Resource.deleteProtection &&
((Resource.name.startsWith(
   "projects/project-123/zones/us-east1-b/instances/dev") ||
(Resource.name.startsWith(
   "projects/project-123/zones/us-east1-b/instances/prod"))) &&
Resource.instanceSize == "m1.standard")
"""
my_filter_2 = CELFilter(CEL2)

In [17]:
my_filter_2.process(example_2_doc, "2018-08-04T08:00:00Z")

BoolType(True)

Yay! 

Let's review and see how this changed things.

## Digging into details

We have a document, `example_2_doc`. Let's also create a `now` value to work with.

We can evaluate different sub-expressions with our document and now value.

In [9]:
now = "2018-08-04T08:00:00Z"

In [15]:
CELFilter("Resource.creationTimestamp").process(example_2_doc, now)

StringType('2018-07-06T05:04:03Z')

Ah. It's a string. We needed to make it a timestamp.

In [10]:
CELFilter("timestamp(Resource.creationTimestamp)").process(example_2_doc, now)

TimestampType('2018-07-06T05:04:03Z')

In [11]:
CELFilter("Resource.deleteProtection").process(example_2_doc, now)

BoolType(False)

In [12]:
CELFilter("""
(Resource.name.startsWith(
   "projects/project-123/zones/us-east1-b/instances/dev") ||
(Resource.name.startsWith(
   "projects/project-123/zones/us-east1-b/instances/prod")))
""").process(example_2_doc, now)

BoolType(True)
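
Evaluating sub-expressions one cell at a time gets repetitive. A small loop can do the same job, sketched here with a hypothetical `probe` helper. It accepts any callable that evaluates one expression, so the stand-in evaluator below (a plain dictionary lookup, not CEL) keeps the sketch runnable without celpy.

```python
from typing import Any, Callable, Dict, Iterable


def probe(
    expressions: Iterable[str], evaluate: Callable[[str], Any]
) -> Dict[str, Any]:
    """Evaluate each expression, recording the result or the exception."""
    results: Dict[str, Any] = {}
    for expr in expressions:
        try:
            results[expr] = evaluate(expr)
        except Exception as ex:  # keep going so every piece gets reported
            results[expr] = ex
    return results


# Stand-in evaluator for illustration only: it looks up canned answers.
canned = {"Resource.deleteProtection": False}
report = probe(
    ["Resource.deleteProtection", "Resource.missing"],
    canned.__getitem__,
)
for expr, value in report.items():
    print(f"{expr!r} -> {value!r}")
```

In the notebook, the evaluator would be something like `lambda e: CELFilter(e).process(example_2_doc, now)`, which reuses the mock class defined earlier.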

## Visibility via logger

We can enable logging. In a notebook, the log lines go to the notebook log by default, so we want our own handlers to capture the output in a separate file.

And, yes, this can be **verbose**. Suggestions are welcome.

This is generally not recommended unless you suspect you've found a bug in CEL or c7nlib.

In [13]:
import logging
logging.basicConfig()
logging.getLogger('').setLevel(logging.WARNING)
# logging.getLogger('').setLevel(logging.INFO)  # kind of loud.

In [14]:
CELFilter("Resource.deleteProtection").process(example_2_doc, now)

BoolType(False)
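
The cells above rely on `logging.basicConfig()` and its default handler. One way to capture the lines in a separate file instead is a dedicated `logging.FileHandler`. This is a sketch using only the standard library. The `log_to_file` and `stop_logging` helpers and the file name are hypothetical, and the handler is attached to the root logger because the names of the loggers used by CEL and c7nlib are not given here.

```python
import logging
import os
import tempfile


def log_to_file(path: str, level: int = logging.INFO) -> logging.Handler:
    """Attach a file handler to the root logger and return it."""
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    root = logging.getLogger("")
    root.addHandler(handler)
    root.setLevel(level)
    return handler


def stop_logging(handler: logging.Handler) -> None:
    """Detach and close the handler so the file is flushed."""
    logging.getLogger("").removeHandler(handler)
    handler.close()


# Hypothetical location; any writable path works.
log_path = os.path.join(tempfile.mkdtemp(), "cel_debug.log")
handler = log_to_file(log_path)
logging.getLogger("demo").info("evaluating filter")  # stands in for library output
stop_logging(handler)

with open(log_path, encoding="utf-8") as log_file:
    captured = log_file.read()
print(captured)
```

Wrapping a single `process()` call between `log_to_file()` and `stop_logging()` keeps the verbose output limited to the evaluation being investigated.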

## Summary

Creating a CEL filter means bringing two things together:

- CEL text 

- External library functions.

Processing a CEL filter means applying the filter against a resource:

- The `CELFilter` instance is applied to a resource to compute a filter result (True or False).

It helps to build a small `CELFilter` placeholder to help us design and debug CEL expressions.

Details vary slightly, so it's difficult to postulate a single, standard design.

We can then provide one or more document examples to determine if the filter works.

We can also evaluate pieces and parts of the overall filter expression to determine if the document is processed correctly.