# Stream Filtering

Experiment: can Kinesis Data Analytics filter the records written to one stream and use them to populate other streams? More precisely: how do we set this up, and what is the latency between a record being written to the main stream and arriving on the filtered stream?

## Setup

First, we need some streams.

In [None]:
import boto3

kinesis_client = boto3.client('kinesis')

In [None]:
# Create the main stream
main_stream_response = kinesis_client.create_stream(
    StreamName='main',
    ShardCount=1)

In [None]:
kinesis_client.describe_stream(StreamName='main')

In [None]:
kinesis_client.create_stream(StreamName='filtered', ShardCount=1)

In [None]:
kinesis_client.describe_stream(StreamName='filtered')
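
`create_stream` returns before the stream is usable, so a `describe_stream` call made straight away may still report `CREATING`. A minimal sketch of blocking until the stream is ready, assuming boto3's Kinesis `stream_exists` waiter; the helper name `wait_until_active` and the default delay and attempt counts are hypothetical choices, not part of the original experiment.

```python
def wait_until_active(client, stream_name, delay=5, max_attempts=24):
    """Block until the stream is ready for reads and writes, then return its status."""
    # The 'stream_exists' waiter polls DescribeStream until the status is ACTIVE.
    waiter = client.get_waiter('stream_exists')
    waiter.wait(
        StreamName=stream_name,
        WaiterConfig={'Delay': delay, 'MaxAttempts': max_attempts},
    )
    description = client.describe_stream(StreamName=stream_name)
    return description['StreamDescription']['StreamStatus']
```

With the client created earlier, this would be called as `wait_until_active(kinesis_client, 'main')` and again for `'filtered'`.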

In [None]:
from datetime import datetime, timezone

def timestamp():
    the_time = datetime.now(timezone.utc)
    return the_time.isoformat()

## Stream Write

In [None]:
import uuid

event = {
    "specversion":"1.0",
    "type":"newFoo",
    "source":"foo",
    "id":str(uuid.uuid4()),
    "time":timestamp(),
    "data":{"fooaddr":"foostuffval",
           "foolist": [1,2,3],
           "barobj": {
               "baraatr1":"yes",
               "barattr2":False,
               "barattr3":122.22
           }}
}

In [None]:
event['source']

In [None]:
import json

prr = kinesis_client.put_record(
    StreamName='main',
    Data=json.dumps(event).encode(),
    PartitionKey=event['source']
)

In [None]:
prr

## Analytics App

Manual experiment:

* write structured records to the stream
* capture them in a schema and filter them to the output stream
* show that they arrive
* show that non-conformant records are ignored
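
To exercise the last two bullets, one option is to write a mix of conformant and non-conformant records and see which ones reach the output stream. The sketch below only builds the payloads; the helper name `make_test_payloads` and the choice of malformed variants are assumptions made for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def make_test_payloads(source='foo'):
    """Return (label, bytes) pairs: one conformant event plus two malformed ones."""
    conformant = {
        'specversion': '1.0',
        'type': 'newFoo',
        'source': source,
        'id': str(uuid.uuid4()),
        'time': datetime.now(timezone.utc).isoformat(),
    }
    # Valid JSON, but missing most of the envelope fields the schema maps.
    missing_fields = {'source': source}
    return [
        ('conformant', json.dumps(conformant).encode()),
        ('missing_fields', json.dumps(missing_fields).encode()),
        # Not JSON at all, so a JSON record format cannot parse it.
        ('not_json', b'this is not json'),
    ]
```

Each payload can then be written with `kinesis_client.put_record(StreamName='main', Data=data, PartitionKey='foo')`, keeping the label alongside so the results are easy to match up.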

Discovered schema:

```
'RecordColumns': [{'Name': 'specversion',
         'Mapping': '$.specversion',
         'SqlType': 'DECIMAL(1,1)'},
        {'Name': 'type', 'Mapping': '$.type', 'SqlType': 'VARCHAR(8)'},
        {'Name': 'source', 'Mapping': '$.source', 'SqlType': 'VARCHAR(4)'},
        {'Name': 'id', 'Mapping': '$.id', 'SqlType': 'VARCHAR(64)'},
        {'Name': 'COL_time', 'Mapping': '$.time', 'SqlType': 'VARCHAR(32)'},
        {'Name': 'fooaddr',
         'Mapping': '$.data.fooaddr',
         'SqlType': 'VARCHAR(16)'},
        {'Name': 'foolist',
         'Mapping': '$.data.foolist[0:]',
         'SqlType': 'VARCHAR(8)'},
        {'Name': 'baraatr1',
         'Mapping': '$.data.barobj.baraatr1',
         'SqlType': 'VARCHAR(4)'},
        {'Name': 'barattr2',
         'Mapping': '$.data.barobj.barattr2',
         'SqlType': 'BOOLEAN'},
        {'Name': 'barattr3',
         'Mapping': '$.data.barobj.barattr3',
         'SqlType': 'DECIMAL(5,2)'}]},
```
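
A schema like the one above can probably also be requested through the API rather than the console. The sketch below assumes the `discover_input_schema` operation of the `kinesisanalyticsv2` client; the helper name `discover_columns` is hypothetical, and both ARNs are supplied by the caller.

```python
def discover_columns(ka_client, stream_arn, role_arn):
    """Ask Kinesis Data Analytics to infer a schema from records already in the stream."""
    response = ka_client.discover_input_schema(
        ResourceARN=stream_arn,
        ServiceExecutionRole=role_arn,
        # Sample from the oldest available record so earlier seed events are included.
        InputStartingPositionConfiguration={'InputStartingPosition': 'TRIM_HORIZON'},
    )
    return response['InputSchema']['RecordColumns']
```

The returned list has the same `Name`, `Mapping` and `SqlType` keys as the discovered schema shown above.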


```sql
-- Open question: do nested JSON objects need parent/child tables (a JSON-to-relational mapping)?
-- Identifiers are double-quoted on the assumption that unquoted names fold to upper case.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "specversion" DECIMAL(1,1),
    "source" VARCHAR(4),
    "type" VARCHAR(8),
    "id" VARCHAR(64),
    "COL_time" VARCHAR(32),
    "fooaddr" VARCHAR(16));
-- Create a pump to insert into the output stream
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
-- Select the columns declared on the destination stream
SELECT STREAM "specversion", "source", "type", "id", "COL_time", "fooaddr"
FROM "SOURCE_SQL_STREAM_001"
-- LIKE compares a string to a pattern (_ matches any single character, % matches any substring)
-- SIMILAR TO compares a string to a regular expression and may use ESCAPE
WHERE "type" SIMILAR TO '%Foo%';
```

In [None]:
ka = boto3.client('kinesisanalyticsv2')

In [None]:
# Dump an existing application

ka.describe_application(
    ApplicationName='sample'
)

In [None]:
# Pump a bare CloudEvents (cloudevents.io) envelope through the stream so the schema discovery tool can define the schema for us.

seed_event = {
    "specversion":"1.0",
    "type":"newFoo",
    "source":"foo",
    "id":str(uuid.uuid4()),
    "time":timestamp()
}

kinesis_client.put_record(
    StreamName='main',
    Data=json.dumps(seed_event).encode(),
    PartitionKey=seed_event['source']
)

### Sample Describe Application Output

```console
{'ApplicationDetail': {'ApplicationARN': 'arn:aws:kinesisanalytics:us-east-1:111111111111:application/sample',
  'ApplicationDescription': 'sample app',
  'ApplicationName': 'sample',
  'RuntimeEnvironment': 'SQL-1_0',
  'ApplicationStatus': 'READY',
  'ApplicationVersionId': 2,
  'CreateTimestamp': datetime.datetime(2020, 1, 14, 14, 12, tzinfo=tzlocal()),
  'LastUpdateTimestamp': datetime.datetime(2020, 1, 14, 14, 13, 13, tzinfo=tzlocal()),
  'ApplicationConfigurationDescription': {'SqlApplicationConfigurationDescription': {'InputDescriptions': [{'InputId': '2.1',
      'NamePrefix': 'SOURCE_SQL_STREAM',
      'InAppStreamNames': ['SOURCE_SQL_STREAM_001'],
      'KinesisStreamsInputDescription': {'ResourceARN': 'arn:aws:kinesis:us-east-1:111111111111:stream/main',
       'RoleARN': 'arn:aws:iam::111111111111:role/service-role/kinesis-analytics-sample-us-east-1'},
      'InputSchema': {'RecordFormat': {'RecordFormatType': 'JSON',
        'MappingParameters': {'JSONMappingParameters': {'RecordRowPath': '$'}}},
       'RecordEncoding': 'UTF-8',
       'RecordColumns': [{'Name': 'specversion',
         'Mapping': '$.specversion',
         'SqlType': 'DECIMAL(1,1)'},
        {'Name': 'type', 'Mapping': '$.type', 'SqlType': 'VARCHAR(8)'},
        {'Name': 'source', 'Mapping': '$.source', 'SqlType': 'VARCHAR(4)'},
        {'Name': 'id', 'Mapping': '$.id', 'SqlType': 'VARCHAR(64)'},
        {'Name': 'COL_time', 'Mapping': '$.time', 'SqlType': 'VARCHAR(32)'}]},
      'InputParallelism': {'Count': 1},
      'InputStartingPositionConfiguration': {}}]}}},
 'ResponseMetadata': {'RequestId': '90b86041-a805-4479-9ac8-6fd837c418f2',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '90b86041-a805-4479-9ac8-6fd837c418f2',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '1272',
   'date': 'Tue, 14 Jan 2020 22:13:38 GMT'},
  'RetryAttempts': 0}}

```

### Role Policy Outline

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadInputKinesis",
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords"
            ],
            "Resource": [
                "arn:aws:kinesis:us-east-1:111111111111:stream/main"
            ]
        },
        {
            "Sid": "WriteOutputKinesis",
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:PutRecord",
                "kinesis:PutRecords"
            ],
            "Resource": [
                "arn:aws:kinesis:region:account-id:stream/%STREAM_NAME_PLACEHOLDER%"
            ]
        },
        {
            "Sid": "WriteOutputFirehose",
            "Effect": "Allow",
            "Action": [
                "firehose:DescribeDeliveryStream",
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": [
                "arn:aws:firehose:region:account-id:deliverystream/%FIREHOSE_NAME_PLACEHOLDER%"
            ]
        },
        {
            "Sid": "ReadInputFirehose",
            "Effect": "Allow",
            "Action": [
                "firehose:DescribeDeliveryStream",
                "firehose:Get*"
            ],
            "Resource": [
                "arn:aws:firehose:region:account-id:deliverystream/%FIREHOSE_NAME_PLACEHOLDER%"
            ]
        },
        {
            "Sid": "ReadS3ReferenceData",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::kinesis-analytics-placeholder-s3-bucket/kinesis-analytics-placeholder-s3-object"
            ]
        },
        {
            "Sid": "ReadEncryptedInputKinesisStream",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:region:account-id:key/%SOURCE_STREAM_ENCRYPTION_KEY_PLACEHOLDER%"
            ],
            "Condition": {
                "StringEquals": {
                    "kms:ViaService": "kinesis.us-east-1.amazonaws.com"
                },
                "StringLike": {
                    "kms:EncryptionContext:aws:kinesis:arn": "arn:aws:kinesis:us-east-1:111111111111:stream/main"
                }
            }
        },
        {
            "Sid": "WriteEncryptedOutputKinesisStream1",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKey"
            ],
            "Resource": [
                "arn:aws:kms:region:account-id:key/%DESTINATION_STREAM_ENCRYPTION_KEY_PLACEHOLDER%"
            ],
            "Condition": {
                "StringEquals": {
                    "kms:ViaService": "kinesis.us-east-1.amazonaws.com"
                },
                "StringLike": {
                    "kms:EncryptionContext:aws:kinesis:arn": "arn:aws:kinesis:region:account-id:stream/%STREAM_NAME_PLACEHOLDER%"
                }
            }
        },
        {
            "Sid": "WriteEncryptedOutputKinesisStream2",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKey"
            ],
            "Resource": [
                "arn:aws:kms:region:account-id:key/%DESTINATION_STREAM_ENCRYPTION_KEY_PLACEHOLDER%"
            ],
            "Condition": {
                "StringEquals": {
                    "kms:ViaService": "kinesis.us-east-1.amazonaws.com"
                },
                "StringLike": {
                    "kms:EncryptionContext:aws:kinesis:arn": "arn:aws:kinesis:region:account-id:stream/%STREAM_NAME_PLACEHOLDER%"
                }
            }
        },
        {
            "Sid": "WriteEncryptedOutputKinesisStream3",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKey"
            ],
            "Resource": [
                "arn:aws:kms:region:account-id:key/%DESTINATION_STREAM_ENCRYPTION_KEY_PLACEHOLDER%"
            ],
            "Condition": {
                "StringEquals": {
                    "kms:ViaService": "kinesis.us-east-1.amazonaws.com"
                },
                "StringLike": {
                    "kms:EncryptionContext:aws:kinesis:arn": "arn:aws:kinesis:region:account-id:stream/%STREAM_NAME_PLACEHOLDER%"
                }
            }
        },
        {
            "Sid": "UseLambdaFunction",
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction",
                "lambda:GetFunctionConfiguration"
            ],
            "Resource": [
                "arn:aws:lambda:region:account-id:function:%FUNCTION_NAME_PLACEHOLDER%:%FUNCTION_VERSION_PLACEHOLDER%"
            ]
        }
    ]
}
```
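
Most of the outline above is placeholder statements. For this experiment only two statements look necessary: read from `main` and write to `filtered`. The sketch below builds such a policy document; the helper name `filter_policy` is hypothetical, and the ARN of the `filtered` stream is an assumption derived from the `main` ARN shown above (same account and region).

```python
import json


def filter_policy(read_stream_arn, write_stream_arn):
    """Build an inline policy: read from one stream, write to another."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Sid': 'ReadInputKinesis',
                'Effect': 'Allow',
                'Action': [
                    'kinesis:DescribeStream',
                    'kinesis:GetShardIterator',
                    'kinesis:GetRecords',
                ],
                'Resource': [read_stream_arn],
            },
            {
                'Sid': 'WriteOutputKinesis',
                'Effect': 'Allow',
                'Action': [
                    'kinesis:DescribeStream',
                    'kinesis:PutRecord',
                    'kinesis:PutRecords',
                ],
                'Resource': [write_stream_arn],
            },
        ],
    }


policy_json = json.dumps(
    filter_policy(
        'arn:aws:kinesis:us-east-1:111111111111:stream/main',
        # Assumed ARN: the 'filtered' stream in the same account and region.
        'arn:aws:kinesis:us-east-1:111111111111:stream/filtered',
    ),
    indent=4,
)
```

The resulting document could be attached with `put_role_policy` on a boto3 IAM client, passing `policy_json` as `PolicyDocument` together with a `RoleName` and a `PolicyName` of your choosing.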

In [None]:
# Describe role output

iam = boto3.client('iam')

In [None]:
iam.get_role(RoleName='kinesis-analytics-sample-us-east-1')

In [None]:
iam.list_role_policies(RoleName='kinesis-analytics-sample-us-east-1')

In [None]:
car = ka.create_application(
    ApplicationName='Dave',
    ApplicationDescription='Dave the wonder app',
    RuntimeEnvironment='SQL-1_0',
    ServiceExecutionRole='uh-oh'  # placeholder: must be the ARN of an IAM role
    # TODO: work out how to specify ApplicationConfiguration (inputs, schema, SQL, outputs);
    # maybe create an application from the console and dump it.
)
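
The describe output above suggests what the missing configuration might look like. The sketch below assembles an `ApplicationConfiguration` dictionary for a SQL application with one Kinesis stream input and one Kinesis stream output, mirroring the values from that output (`SOURCE_SQL_STREAM` prefix, JSON format, row path `$`, UTF-8). The helper name `build_sql_app_config` is hypothetical, and the JSON output format is an assumption.

```python
def build_sql_app_config(input_arn, output_arn, record_columns, sql_text):
    """Assemble an ApplicationConfiguration dict for a SQL-1_0 application."""
    return {
        'SqlApplicationConfiguration': {
            'Inputs': [{
                'NamePrefix': 'SOURCE_SQL_STREAM',
                'KinesisStreamsInput': {'ResourceARN': input_arn},
                'InputParallelism': {'Count': 1},
                'InputSchema': {
                    'RecordFormat': {
                        'RecordFormatType': 'JSON',
                        'MappingParameters': {
                            'JSONMappingParameters': {'RecordRowPath': '$'},
                        },
                    },
                    'RecordEncoding': 'UTF-8',
                    'RecordColumns': record_columns,
                },
            }],
            'Outputs': [{
                # Name of the in-application stream whose rows are delivered.
                'Name': 'DESTINATION_SQL_STREAM',
                'KinesisStreamsOutput': {'ResourceARN': output_arn},
                # Assumption: emit filtered rows as JSON rather than CSV.
                'DestinationSchema': {'RecordFormatType': 'JSON'},
            }],
        },
        'ApplicationCodeConfiguration': {
            'CodeContent': {'TextContent': sql_text},
            'CodeContentType': 'PLAINTEXT',
        },
    }
```

The result would be passed as the `ApplicationConfiguration` argument of `ka.create_application`, with `record_columns` taken from the discovered schema and `sql_text` holding the pump SQL.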

## Stream Read

In [None]:
# Read from the stream

shardId = prr['ShardId']
print('shard id is %s' % shardId)

gsir = kinesis_client.get_shard_iterator(
    StreamName='main',
    ShardId=shardId,
    ShardIteratorType='TRIM_HORIZON'
)
print(gsir)

In [None]:
# Read from the current position of the iterator
grr = kinesis_client.get_records(
    ShardIterator=gsir['ShardIterator']
)

print(grr)
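
A single `get_records` call can come back empty even when the shard holds data, so reading usually means following `NextShardIterator` across several calls. A minimal sketch, assuming the standard `get_records` response fields; the helper name `read_all` and its limits are hypothetical.

```python
import time


def read_all(client, shard_iterator, max_calls=10, pause=1.0):
    """Follow NextShardIterator for up to max_calls calls and collect the records."""
    records = []
    for _ in range(max_calls):
        if shard_iterator is None:
            break  # the shard has been closed and fully read
        response = client.get_records(ShardIterator=shard_iterator, Limit=100)
        records.extend(response['Records'])
        shard_iterator = response.get('NextShardIterator')
        if response.get('MillisBehindLatest', 0) == 0:
            break  # caught up with the tip of the shard
        time.sleep(pause)  # stay well under the per-shard read rate limit
    return records
```

For example, `read_all(kinesis_client, gsir['ShardIterator'])` would collect whatever is currently readable from the shard.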

In [None]:
records = grr['Records']
for r in records:
    print(r)
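
For the latency question, each record returned by `get_records` carries an `ApproximateArrivalTimestamp`, which can be compared with the `time` field the writer put in the event. The sketch below is one rough way to estimate that gap; the helper name and the `time_field` parameter are hypothetical. Records read from the filtered stream may carry the timestamp under a different key (possibly `COL_time`, given the discovered schema), and the result is only as good as the clock agreement between the writer and the service.

```python
import json
from datetime import datetime, timezone


def write_to_arrival_seconds(record, time_field='time'):
    """Seconds between the event's own timestamp and the stream's arrival timestamp."""
    event = json.loads(record['Data'])
    # The writer used datetime.isoformat() with a UTC offset, so this is timezone-aware.
    written = datetime.fromisoformat(event[time_field])
    arrived = record['ApproximateArrivalTimestamp']
    return (arrived - written).total_seconds()
```

Applied to records read from the filtered stream, the value approximates the end-to-end delay through the analytics application; applied to `main`, it approximates the write delay alone.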

## Cleanup

In [None]:
kinesis_client.delete_stream(StreamName='main')
kinesis_client.delete_stream(StreamName='filtered')

In [None]:
kinesis_client.list_streams()