# CLI08 - Stream Workers

## Overview

Macrometa GDN allows you to integrate streaming data and take appropriate actions. Most stream processing use cases involve collecting, analyzing, and integrating or acting on data generated during business activities by various sources.

| Stage | Description | 
|  :----:  |    :----:   |
| Collect | Receive or capture data from various data sources.| 
| Analyze | Analyze data to identify interesting patterns and extract information.| 
| Act | Take actions based on the findings. For example, running simple code, calling an external service, or triggering a complex integration.| 
| Integrate | Make the processed data available to consumers.| 

You can process streams to perform the following actions with your data:

- Transform data from one format to another. For example, from XML to JSON.
- Enrich data received from a specific source by combining it with databases and services.
- Correlate data by joining multiple streams to create an aggregate stream.
- Clean data by filtering it and by modifying the content in messages. For example, obfuscating sensitive information.
- Derive insights by identifying event patterns in data streams.
- Summarize data with time windows and incremental aggregations.
- Perform real-time ETL on collections, tail files, and scrape HTTP endpoints.
- Integrate stream data and trigger actions based on it. An action can be a single service request or a complex enterprise integration flow.

In this tutorial, you build two stream workers: one that generates mock heart rate data, and one that calculates heart rate statistics for each person (minimum, maximum, and average bpm) over a 1-minute sliding window.

## Prerequisites

This tutorial assumes that:

- You have already created a tenant account and have a username and password.
- You have installed the Macrometa CLI as explained in section 01.
- You have generated an API key as explained in section 01.

If the CLI is not installed yet, install it with npm:

```shell
npm install -g gdnsl
```

## 1. Define Variables

```shell
# Variables
url="https://gdn.paas.macrometa.io";
apiKey="XXXX"; # API key goes here, if applicable
email="email"; # Email goes here
# Use LOCAL or ALL, or enter region names. For multiple regions, enter
# comma-separated names, for example: region1, region2.
regions="ALL"
heart_rates_collection="HeartRates";
heart_rates_statistics_collection="HeartRateStatistics";
heart_rate_statistics_worker="HeartRateStatisticsWorker";
mock_heart_rate_data_generator="MockHeartRateDataGenerator";
```
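Before moving on, it can help to fail early if the placeholder values are still in place. The following is a minimal sketch and not part of the Macrometa CLI: `check_config` is a hypothetical helper, and it only checks for the placeholder values `XXXX` and `email` used in the cell above.

```shell
# Hypothetical helper (not part of gdnsl): returns 0 when the variables
# defined above look usable, 1 when a placeholder value is still present.
check_config() {
  rc=0
  if [ -z "$apiKey" ] || [ "$apiKey" = "XXXX" ]; then
    echo "apiKey is not set" >&2
    rc=1
  fi
  if [ -z "$email" ] || [ "$email" = "email" ]; then
    echo "email is not set" >&2
    rc=1
  fi
  if [ -z "$regions" ]; then
    echo "regions is empty; use LOCAL, ALL, or a comma-separated list" >&2
    rc=1
  fi
  return $rc
}
```

For example, run `check_config || echo "Update the variables before continuing"` right after defining the variables.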

## 2. Connecting to GDN

```shell
echo "Creating gdnsl.yaml file"

echo "url: $url
email: $email
apikey: $apiKey
regions:
  - $regions" > gdnsl.yaml

echo "------- CONNECTION SETUP  ------"
# If you are running this from a terminal, you can skip the step above
# and run the command below instead:
# gdnsl init
```
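The cell above writes `$regions` as a single YAML list item, so a value such as `region1, region2` ends up as one entry. The sketch below assumes that gdnsl expects one region per list item, as the `regions:` list layout suggests. `write_gdnsl_yaml` is a hypothetical helper that splits the comma-separated value into separate items and trims surrounding spaces.

```shell
# Hypothetical helper: writes the same gdnsl.yaml layout as the cell above,
# but emits one "  - <region>" list item per comma-separated region name.
write_gdnsl_yaml() {
  out="$1"
  {
    echo "url: $url"
    echo "email: $email"
    echo "apikey: $apiKey"
    echo "regions:"
    # Split on commas, trim leading/trailing whitespace, drop empty entries,
    # then indent each name as a YAML list item.
    printf '%s\n' "$regions" \
      | tr ',' '\n' \
      | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
      | sed '/^$/d; s/^/  - /'
  } > "$out"
}
```

Usage: `write_gdnsl_yaml gdnsl.yaml`. With `regions="ALL"` the output matches the original cell; with `regions="region1, region2"` it produces two list items.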

## 3. Creating Collections

```shell
echo " ------- CREATE GEO-REPLICATED COLLECTION  ------"
gdnsl collection create $heart_rates_collection --type doc --stream
echo "Created collection: $heart_rates_collection"
gdnsl collection create $heart_rates_statistics_collection --type doc
echo "Created collection: $heart_rates_statistics_collection"
```

## 4. Validate Stream Application

### 4.1 Validating heart rate simulator definition

#### Option 1: Use the Mockaroo API to generate mock heart rate data

NOTE: If you use the Mockaroo API, you need to sign up for Mockaroo and paste your API key below. You can find your API key at https://www.mockaroo.com/myaccount.

```shell
mockarooAPIKey="XXXX";

echo "--- Validate $mock_heart_rate_data_generator Application";

gdnsl stream-worker create --name $mock_heart_rate_data_generator \
--description "Mock data generator by calling mockaroo api for heart rate" \
--trigger "HeartRateDataGeneratorTrigger WITH ( interval = 10 sec );" \
--table "HeartRates (name string, bpm int);" \
--sink "MockarooServiceCallSink WITH (type='http-call', sink.id='mockaroo-service', publisher.url='https://api.mockaroo.com/api/a6e130b0?count=10&key=$mockarooAPIKey', map.type='json', method='GET') (triggered_time string);" \
--source "MockarooServiceResponseSink WITH (type='http-call-response', sink.id='mockaroo-service', map.type='json', http.status.code='200') (name string, bpm int);" \
--query "INSERT INTO MockarooServiceCallSink
SELECT time:currentTimestamp() as triggered_time
FROM HeartRateDataGeneratorTrigger;" \
--query "INSERT INTO HeartRates SELECT name, bpm FROM MockarooServiceResponseSink;" \
--validate
```

#### Option 2: Use a custom stream worker to generate the heart rate data

```shell
echo "--- Validate $mock_heart_rate_data_generator Application";

gdnsl stream-worker create --name $mock_heart_rate_data_generator \
--description "Mock data generator by stream worker" \
--trigger "HeartRateDataGeneratorTrigger WITH ( interval = 10 sec );" \
--table "HeartRates (name string, bpm int);" \
--query "@info(name = 'ConsumeProcessedData')
    INSERT INTO HeartRates
    SELECT
    js:eval(\"['Vasili', 'Rivalee', 'Betty', 'Jennifer', 'Alane', 'Sarena', 'Bruno', 'Carolee', 'Emmott', 'Andre'][Math.floor(Math.random() * 10)]\",\"string\") as name,
    js:eval(\"Math.floor(Math.random() * 40) + 40\",\"int\") as bpm
    FROM HeartRateDataGeneratorTrigger;" \
--validate
```
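To see the range of values the two `js:eval` expressions produce, here is a local approximation in shell and awk. It is only a sketch for inspecting the data shape: `mock_heart_rate_events` is a hypothetical helper that prints JSON lines with the same `name` and `bpm` attributes as the `HeartRates` table. As with `Math.floor(Math.random() * 40) + 40`, every bpm value falls between 40 and 79 inclusive.

```shell
# Hypothetical helper: prints N mock events (default 10) as JSON lines,
# mirroring the js:eval expressions in the stream worker above.
mock_heart_rate_events() {
  awk -v n="${1:-10}" 'BEGIN {
    split("Vasili Rivalee Betty Jennifer Alane Sarena Bruno Carolee Emmott Andre", names, " ")
    # srand() seeds from the time of day, so calls within the same second
    # produce the same sequence.
    srand()
    for (i = 0; i < n; i++) {
      name = names[int(rand() * 10) + 1]  # rand() is in [0, 1), so index is 1..10
      bpm = int(rand() * 40) + 40         # 40..79, like Math.floor(Math.random() * 40) + 40
      printf "{\"name\": \"%s\", \"bpm\": %d}\n", name, bpm
    }
  }'
}
```

For example, `mock_heart_rate_events 3` prints three lines shaped like `{"name": "Betty", "bpm": 57}`.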

### 4.2 Validating heart rate statistics stream worker

```shell
echo "--- Validate $heart_rate_statistics_worker Application";

# By default, the stream worker is created in the local region.
# Optionally, you can send a dclist to deploy it to specific regions.
gdnsl stream-worker create --name $heart_rate_statistics_worker \
--description "Calculate the statistics for heart rates" \
--source "HeartRates WITH (type = 'database', collection = 'HeartRates', collection.type='doc', replication.type='global', map.type='json') (name string, bpm int);" \
--table "HeartRateStatistics (eventTime long, name string, minBpm int, maxBpm int, avgBpm double);" \
--query "INSERT INTO HeartRateStatistics
                SELECT
                    eventTimestamp() as eventTime,
                    name as name,
                    min(bpm) as minBpm,
                    max(bpm) as maxBpm,
                    avg(bpm) as avgBpm
                FROM HeartRates window sliding_time(1 min)
                group by name" \
--validate
```
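The statistics query groups events by `name` and keeps only those from the last minute. The awk sketch below approximates one snapshot of that computation for a batch of sample events; it is not how the stream worker runs internally. Assumptions: each input line is `epoch_seconds name bpm`, and `window_stats` is a hypothetical helper. The stream worker emits an updated result as each event arrives, whereas this sketch evaluates the window once, at the time you pass in.

```shell
# Hypothetical helper: reads "epoch_seconds name bpm" lines on stdin and
# prints min/max/avg bpm per name for events in the window (now - win, now].
window_stats() {
  awk -v now="$1" -v win="${2:-60}" '
    # Keep only events inside the sliding window.
    $1 > now - win && $1 <= now {
      name = $2; bpm = $3 + 0
      if (!(name in count) || bpm < min[name]) min[name] = bpm
      if (!(name in count) || bpm > max[name]) max[name] = bpm
      sum[name] += bpm
      count[name]++
    }
    END {
      # One line per name, like one HeartRateStatistics row per group.
      for (name in count)
        printf "%s min=%d max=%d avg=%.2f\n", name, min[name], max[name], sum[name] / count[name]
    }'
}
```

For example, `printf '%s\n' "100 Betty 60" "130 Betty 70" "150 Bruno 55" "20 Betty 99" | window_stats 150 60` prints one line per name, including `Betty min=60 max=70 avg=65.00`; the event at time 20 falls outside the window and is ignored.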

## 5. Create Stream Application

### 5.1 Creating heart rate simulator definition

#### Option 1: Use the Mockaroo API to generate mock heart rate data

```shell
echo "--- Create $mock_heart_rate_data_generator Application";

gdnsl stream-worker create --name $mock_heart_rate_data_generator \
--description "Mock data generator by calling mockaroo api for heart rate" \
--trigger "HeartRateDataGeneratorTrigger WITH ( interval = 10 sec );" \
--table "HeartRates (name string, bpm int);" \
--sink "MockarooServiceCallSink WITH (type='http-call', sink.id='mockaroo-service', publisher.url='https://api.mockaroo.com/api/a6e130b0?count=10&key=$mockarooAPIKey', map.type='json', method='GET') (triggered_time string);" \
--source "MockarooServiceResponseSink WITH (type='http-call-response', sink.id='mockaroo-service', map.type='json', http.status.code='200') (name string, bpm int);" \
--query "INSERT INTO MockarooServiceCallSink
SELECT time:currentTimestamp() as triggered_time
FROM HeartRateDataGeneratorTrigger;" \
--query "INSERT INTO HeartRates SELECT name, bpm FROM MockarooServiceResponseSink;"
```

#### Option 2: Use a custom stream worker to generate the heart rate data

```shell
echo "--- Create $mock_heart_rate_data_generator Application";

gdnsl stream-worker create --name $mock_heart_rate_data_generator \
--description "Mock data generator by stream worker" \
--trigger "HeartRateDataGeneratorTrigger WITH ( interval = 10 sec );" \
--table "HeartRates (name string, bpm int);" \
--query "@info(name = 'ConsumeProcessedData')
    INSERT INTO HeartRates
    SELECT
    js:eval(\"['Vasili', 'Rivalee', 'Betty', 'Jennifer', 'Alane', 'Sarena', 'Bruno', 'Carolee', 'Emmott', 'Andre'][Math.floor(Math.random() * 10)]\",\"string\") as name,
    js:eval(\"Math.floor(Math.random() * 40) + 40\",\"int\") as bpm
    FROM HeartRateDataGeneratorTrigger;"
```

### 5.2 Creating heart rate statistics stream worker

```shell
echo "--- Create $heart_rate_statistics_worker Application";

# By default, the stream worker is created in the local region.
# Optionally, you can send a dclist to deploy it to specific regions.
gdnsl stream-worker create --name $heart_rate_statistics_worker \
--description "Calculate the statistics for heart rates" \
--source "HeartRates WITH (type = 'database', collection = 'HeartRates', collection.type='doc', replication.type='global', map.type='json') (name string, bpm int);" \
--table "HeartRateStatistics (eventTime long, name string, minBpm int, maxBpm int, avgBpm double);" \
--query "INSERT INTO HeartRateStatistics
                SELECT
                    eventTimestamp() as eventTime,
                    name as name,
                    min(bpm) as minBpm,
                    max(bpm) as maxBpm,
                    avg(bpm) as avgBpm
                FROM HeartRates window sliding_time(1 min)
                group by name"
```

## 6. Publish Stream Application

```shell
echo "Activating Stream Workers"
gdnsl stream-worker $heart_rate_statistics_worker --enable
gdnsl stream-worker $mock_heart_rate_data_generator --enable
echo "Activated Stream Workers"
```

## 7. Checking HeartRates documents using C8QL

Note: Wait at least 1 minute after enabling the stream workers before running the queries below, because the statistics worker aggregates data over a 1-minute sliding window.

```shell
gdnsl query "FOR doc IN $heart_rates_collection LIMIT 0, 5 RETURN doc"
```

## 8. Checking HeartRateStatistics documents using C8QL

```shell
gdnsl query "FOR doc IN $heart_rates_statistics_collection LIMIT 0, 5 RETURN doc"
```
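To narrow the results, you can build a more specific C8QL query. This sketch assumes that C8QL supports `FILTER` and `SORT` clauses alongside the `LIMIT` and `RETURN` clauses used above. `latest_stats_query` is a hypothetical helper that only prints the query string, so you can inspect it before passing it to `gdnsl query`.

```shell
# Hypothetical helper: prints a C8QL query returning the latest statistics
# for one person. The name is interpolated as-is, so pass only trusted values.
latest_stats_query() {
  name="$1"
  limit="${2:-5}"
  printf "FOR doc IN %s FILTER doc.name == '%s' SORT doc.eventTime DESC LIMIT %s RETURN doc\n" \
    "$heart_rates_statistics_collection" "$name" "$limit"
}
```

Usage: `gdnsl query "$(latest_stats_query Betty 5)"`. Sorting on `eventTime` relies on the `eventTimestamp()` value that the statistics worker stores in each document.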

## 9. Delete Stream Workers and Collections

```shell
echo "------- DELETE DATA ------"
gdnsl stream-worker delete $heart_rate_statistics_worker;
gdnsl stream-worker delete $mock_heart_rate_data_generator;
gdnsl collection delete $heart_rates_collection;
gdnsl collection delete $heart_rates_statistics_collection;
echo "Stream workers and collections deleted"
```

## Section Completed!

Congratulations! You have completed this tutorial.