# Py09 - CI/CD Tutorial

## Overview

In this tutorial we will build a simple CI/CD script for the GDN platform.

## Prerequisites

Let's assume your

- tenant name is an email address
- user password is xxxxx.

If you need to install pyC8, run the cell below; otherwise you may skip it.

In [None]:
!pip install pyC8

## 1. Import Libraries & Define Variables

The first step is to import the libraries we need and define the variables used throughout this tutorial. This is also the place to add your GDN login credentials, that is, your email and password. Make sure you specify the correct federation URL as well. In this example it is "gdn.paas.macrometa.io", and we use the default geo fabric "_system".
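
When this notebook runs as part of an automated job, hard-coded credentials are easy to leak. As a minimal sketch, the values could be read from environment variables instead. The variable names `GDN_EMAIL`, `GDN_PASSWORD`, `GDN_URL` and `GDN_FABRIC`, and the helper `load_credentials`, are assumptions made for this example and are not defined by pyC8 or GDN.

```python
import os


def load_credentials(environ=os.environ):
    """Read connection settings from environment variables.

    The variable names used here are hypothetical; pick whatever
    names your CI system uses for its secrets.
    """
    email = environ.get("GDN_EMAIL")
    password = environ.get("GDN_PASSWORD")
    missing = [name for name, value in
               (("GDN_EMAIL", email), ("GDN_PASSWORD", password)) if not value]
    if missing:
        # Fail early with a clear message instead of a confusing login error later
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "email": email,
        "password": password,
        # Fall back to the defaults used in this tutorial
        "global_url": environ.get("GDN_URL", "gdn.paas.macrometa.io"),
        "geo_fabric": environ.get("GDN_FABRIC", "_system"),
    }
```

The returned dictionary can then be unpacked into the `email`, `password`, `global_url` and `geo_fabric` variables defined in the cell below.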

In [None]:
import json
import datetime
from c8 import C8Client
from c8.request import Request
import random
import six
import base64
import time
import threading

# Variables
global_url = "gdn.paas.macrometa.io"
email = "email"  # <-- Email goes here
password = "password"  # <-- password goes here
geo_fabric = "_system"

# Tags used to tell document collections and key-value collections apart
COLLECTION_TYPE = {
    "DOCUMENT": "document",
    "KV": "kv",
}

fabric_list = ["cicd-fabric"]

collection_list = [
    {
        "name": "Doc1",
        "type": COLLECTION_TYPE["DOCUMENT"],
        "isEdge": False,
        "hasStream": True,
        "noOfInsertOperation": 100,
        "noOfUpdateOperation": 1,
        "noOfDeleteOperations": 100
    },
    {
        "name": "Doc2",
        "type": COLLECTION_TYPE["DOCUMENT"],
        "isEdge": False,
        "hasStream": True,
        "noOfInsertOperation": 100,
        "noOfUpdateOperation": 1,
        "noOfDeleteOperations": 100
    },
    {
        "name": "KV1",
        "type": COLLECTION_TYPE["KV"],
        "hasStream": True
    }
]

stream_list = [
    {
        "name": "gstream",
        "isLocal": False,
        "noOfInsertOperation": 10,
        "noOfUpdateOperation": 0,
        "noOfDeleteOperations": 0
    },
    {
        "name": "lstream",
        "isLocal": True,
        "noOfInsertOperation": 10,
        "noOfUpdateOperation": 0,
        "noOfDeleteOperations": 0
    }
]

query_worker_list = {
    "insert_data": {
        "name": "insertRecord",
        "value": """FOR i IN 1..100
              INSERT {
                    "firstname":CONCAT("Halie", TO_STRING(i)),
                    "lastname":CONCAT("Linkie", TO_STRING(i)),
                    "email":CONCAT("hlinkie0", TO_STRING(i),"@irs.gov"),
                    "zipcode": CONCAT("2950-53", TO_STRING(i))
                } INTO COLLECTION_NAME""",
        "parameter": {},
    },
    "get_data": {
        "name": "getRecords",
        "value": "FOR doc IN COLLECTION_NAME RETURN doc",
    },
    "update_data": {
        "name": "updateRecord",
        "value": """FOR doc IN COLLECTION_NAME 
      filter doc.email == 'hlinkie05@irs.gov'
      UPDATE { _key:doc._key,  \"lastname\": \"cena\" }
        IN COLLECTION_NAME""",
    },
    "get_count": {
        "name": "countRecords",
        "value": "RETURN COUNT(FOR doc IN COLLECTION_NAME RETURN 1)",
    },
    "delete_data": {
        "name": "deleteRecord",
        "value": """FOR doc IN COLLECTION_NAME 
          filter doc.email == 'hlinkie03@irs.gov'
          REMOVE doc 
          IN COLLECTION_NAME """,
    },
    "delete_all_data": {
        "name": "deleteAllRecord",
        "value": """FOR doc IN COLLECTION_NAME
        REMOVE doc 
        IN COLLECTION_NAME """,
    },
}

graph_list = [
    {
        "name": "social",
        "edgeDefinitions": {
            "collection": "relation",
            "from": ["female", "male"],
            "to": ["female", "male"],
        },
    },
    {"name": "children"},
]

stream_worker_list = [
    {
        "name": "MockHeartRateDataGenerator",
        "definition": '''
@App:name("MockHeartRateDataGenerator")
@App:qlVersion("2")

CREATE TRIGGER HeartRateDataGeneratorTrigger WITH ( interval = 10 sec );

CREATE TABLE HeartRates (name string, bpm int);


-- Note: Generating random bpm and name 
@info(name = 'ConsumeProcessedData')
INSERT INTO HeartRates
SELECT 
js:eval("['Vasili', 'Rivalee', 'Betty', 'Jennifer', 'Alane', 'Sarena', 'Bruno', 'Carolee', 'Emmott', 'Andre'][Math.floor(Math.random() * 10)]","string") as name,
js:eval("Math.floor(Math.random() * 40) + 40","int") as bpm
FROM HeartRateDataGeneratorTrigger;
''',
    },
]

web_socket_data = {}

web_socket_list = []

active_fabric = ""

DOCUMENT_OPERATIONS = {
    "UPDATE": "UPDATE",
    "INSERT": "INSERT",
    "DELETE": "DELETE",
}

info = {
    "noOfCollection": 0,
    "noOfStreamWorker": 0,
    "noOfQueryWorker": 0,
    "noOfStream": 0,
    "noOfFabric": 0,
    "noOfGraph": 0,
}
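
Because every later step is driven by these dictionaries, a typo in the configuration only shows up as a `KeyError` deep inside the pipeline. The following is a rough sketch of a pre-flight check for entries shaped like `collection_list`. The helper `validate_collections` and the two key sets are assumptions written for this example and are not part of pyC8.

```python
# Keys each entry is expected to carry, taken from the configuration in this tutorial
DOCUMENT_KEYS = {"name", "type", "isEdge", "hasStream",
                 "noOfInsertOperation", "noOfUpdateOperation", "noOfDeleteOperations"}
KV_KEYS = {"name", "type", "hasStream"}


def validate_collections(collections, kv_type):
    """Return a list of problems found in a collection_list-style config."""
    problems = []
    seen_names = set()
    for index, entry in enumerate(collections):
        # Key-value entries need fewer keys than document entries
        required = KV_KEYS if entry.get("type") == kv_type else DOCUMENT_KEYS
        missing = sorted(required - set(entry))
        if missing:
            problems.append("entry {} is missing: {}".format(index, ", ".join(missing)))
        name = entry.get("name")
        if name in seen_names:
            problems.append("duplicate collection name: {}".format(name))
        seen_names.add(name)
    return problems


sample = [
    {"name": "Doc1", "type": "document", "isEdge": False, "hasStream": True,
     "noOfInsertOperation": 100, "noOfUpdateOperation": 1, "noOfDeleteOperations": 100},
    {"name": "KV1", "type": "kv", "hasStream": True},
    {"name": "Doc1", "type": "document"},  # deliberately broken entry
]
for problem in validate_collections(sample, kv_type="kv"):
    print(problem)
```

In the notebook itself the call would be `validate_collections(collection_list, COLLECTION_TYPE["KV"])`, and an empty result means the configuration is at least structurally complete.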

## 2. Connecting to GDN

Now that we have imported the required libraries and added our login details, we can connect to GDN. Do this by running the cell below.

The cell output should reflect a successful connection. If it does not, go back to the first step and check the details you entered.

In [None]:
print("\n ------- CONNECTION SETUP  ------")
print("tenant: {}, geofabric:{}".format(email, geo_fabric))
client = C8Client(protocol='https', host=global_url, port=443,
                  email=email, password=password,
                  geofabric=geo_fabric)

tenant = client.tenant(email=email, password=password)
current_fabric = tenant.useFabric(geo_fabric)

## 3. Creating Helper Functions

Helper functions are the basic building blocks of the CI/CD pipeline. They are responsible for creating and deleting collections, streams, query workers, stream workers, etc., based on the lists provided in section 1.
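
Most of the create helpers follow the same check-then-create pattern: ask whether the resource exists, return early if it does, otherwise create it and bump a counter. The sketch below isolates that pattern against an in-memory stand-in so it can be run without a GDN connection. `FakeFabric` and `ensure_collection` are hypothetical names used only for this illustration.

```python
class FakeFabric:
    """In-memory stand-in for a fabric handle; not part of pyC8."""

    def __init__(self):
        self.collections = set()
        self.create_calls = 0

    def has_collection(self, name):
        return name in self.collections

    def create_collection(self, name):
        self.create_calls += 1
        self.collections.add(name)


def ensure_collection(fabric, name, counters):
    """Create the collection only if it is missing; return True when created."""
    if fabric.has_collection(name):
        return False
    fabric.create_collection(name)
    counters["noOfCollection"] += 1
    return True


fabric = FakeFabric()
counters = {"noOfCollection": 0}
first = ensure_collection(fabric, "Doc1", counters)
second = ensure_collection(fabric, "Doc1", counters)  # no-op on the second call
```

Writing the helpers this way makes the pipeline safe to re-run after a partial failure, since resources left over from an earlier run are skipped rather than created twice.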

In [None]:
# Logger function
def log(message):
  print("[{date}]: {message}".format(
      date=datetime.datetime.now(), message=message))

# Return a random integer in [min, max - 1], or min when both bounds are equal
def get_random_number(min, max):
  if min == max:
    return min
  return random.randint(min, max-1)

# Create the document collection
def create_document_collection(name, has_stream=True, is_edge=False):
  has_collection = current_fabric.has_collection(name)
  if has_collection:
    return
  current_fabric.create_collection(name=name, edge=is_edge)
  if has_stream:
    request = Request(
        method='put',
        endpoint='/collection/{}/stream'.format(name),
        data={
            "hasStream": True
        }
    )

    def response_handler(resp):
      if not resp.is_success:
        raise ValueError("Failed to update stream flag {}".format(name))
    # Only send the request when a stream was requested; otherwise `request` is undefined.
    # In an upcoming release this API call will be replaced by a pyC8 internal function
    current_fabric._execute(request, response_handler)
  info["noOfCollection"] += 1


# Create the kv collection
def create_kv_collection(name, has_stream=True):
  has_collection = current_fabric.key_value.has_collection(name)
  if has_collection:
    return
  current_fabric.key_value.create_collection(name, {"stream": has_stream})
  info["noOfCollection"] += 1

# Create the query workers
def create_query_worker(query_worker, query_worker_list):
  try:
    for collection in collection_list:
      query_worker_name = "{collectionName}_{queryWorkerName}".format(
          collectionName=collection["name"], queryWorkerName=query_worker["name"])
      if collection["type"] != COLLECTION_TYPE["DOCUMENT"] or query_worker_name in query_worker_list:
        continue
      else:
        data = {
            "query": {
                "parameter": query_worker.get("parameter", {}),
                "name": query_worker_name,
                "value": query_worker["value"].replace("COLLECTION_NAME", collection["name"])
            }
        }
        current_fabric.save_restql(data)
  except Exception as error:
    log("Error while creating query worker {queryWorker} : -- {err}".format(
        queryWorker=query_worker["name"], err=error))

# Create the GeoFabric
def create_fabric(name, fabric_list, local_dc_name, all_dc_list):
  global_dc_list = set([] if len(all_dc_list) == 0 else
                       [all_dc_list[get_random_number(0, len(all_dc_list) - 1)],
                        all_dc_list[get_random_number(0, len(all_dc_list) - 1)], ]
                       )
  global_dc_list.add(local_dc_name)
  if name in fabric_list:
    return
  current_fabric.create_fabric(name, dclist=global_dc_list)
  info["noOfFabric"] += 1

# Create the graph
def crate_graph(name, details=None):
  has_graph = current_fabric.has_graph(name)
  if has_graph:
    return
  graph = current_fabric.create_graph(name)
  if details is not None:
    graph.create_edge_definition(
        edge_collection=details["collection"],
        from_vertex_collections=details["from"],
        to_vertex_collections=details["to"]
    )
  info["noOfGraph"] += 1

# Create the stream
def create_stream(name, is_local):
  is_stream_exist = current_fabric.has_stream(name, local=is_local)
  if is_stream_exist:
    return
  current_fabric.create_stream(name, is_local)
  info["noOfStream"] += 1


# Listen on a stream and group the received messages by operation type
def websocket_listener(name, subscriber):
  global web_socket_data
  
  web_socket_data[name] = {}
  web_socket_data[name][DOCUMENT_OPERATIONS["UPDATE"]] = []
  web_socket_data[name][DOCUMENT_OPERATIONS["INSERT"]] = []
  web_socket_data[name][DOCUMENT_OPERATIONS["DELETE"]] = []
    
  try:
    while subscriber.connected:
      # Listen on the stream for incoming messages
      message = json.loads(subscriber.recv())
      payload = json.loads(base64.b64decode(message["payload"]))
      operationType = message["properties"].get(
          "op", DOCUMENT_OPERATIONS["INSERT"])
      if operationType == DOCUMENT_OPERATIONS["UPDATE"]:
        web_socket_data[name][DOCUMENT_OPERATIONS["UPDATE"]].append(payload)
      elif operationType == DOCUMENT_OPERATIONS["DELETE"]:
        web_socket_data[name][DOCUMENT_OPERATIONS["DELETE"]].append(payload)
      else:
        web_socket_data[name][DOCUMENT_OPERATIONS["INSERT"]].append(payload)
      # Acknowledge the received message
      subscriber.send(json.dumps({'messageId': message['messageId']}))
  except Exception as e:
    log("WebSocket Closed {}".format(name))

#  Create the stream subscriber
def create_subscriber(
    name,
    is_local=True,
    is_collection_stream=False
):
  global web_socket_list
  
  try:
    subscriber = current_fabric.stream().subscribe(name, subscription_name=name,
                                                   local=is_local, isCollectionStream=is_collection_stream)
    web_socket_list.append(subscriber)
    x = threading.Thread(target=websocket_listener, args=(name, subscriber,))
    x.start()

  except Exception as error:
    error.message = \
        "Error while creating subscriber {} : -- {}".format(name, error)
    raise error

# Create the stream publisher
def create_publisher(name, is_local):
  global web_socket_list
  producer = current_fabric.stream().create_producer(
      name,
      local=is_local
  )
  web_socket_list.append(producer)
  for i in range(10):
    message = {
        "firstname": "Halie{}".format(i),
        "lastname": "Linkie{}".format(i),
        "email": "hlinkie0{}@irs.gov".format(i),
        "zipcode": "2950-53{}".format(i),
    }
    payload = {
        "payload": base64.b64encode(six.b(json.dumps(message))).decode("utf-8")
    }
    producer.send(json.dumps(payload))

# Create the stream based on provided stream list in stream_list
def create_streams():
  for stream in stream_list:
    create_stream(stream["name"], stream["isLocal"])

# Create the stream subscribers based on provided stream list in stream_list
def create_stream_subscribers():
  for stream in stream_list:
    create_subscriber(stream["name"], stream["isLocal"])

# Create the stream publisher based on provided stream list in stream_list
def create_stream_publisher():
  for stream in stream_list:
    create_publisher(stream["name"], stream["isLocal"])

# Create the new fabric
def create_new_fabrics():
  global active_fabric
  local_dc = client.get_local_dc()
  local_dc_name = local_dc["_key"]
  all_dc_list = client.get_dc_list()
  global_dc_list = list(filter(
      lambda dc_name: dc_name != local_dc_name,
      all_dc_list))
  existing_fabric_list = current_fabric.fabrics()
  for fabric in fabric_list:
    create_fabric(fabric, existing_fabric_list, local_dc_name, global_dc_list)
  active_fabric = fabric_list[0 if len(
      fabric_list) == 1 else get_random_number(0, len(fabric_list) - 1)]
  use_fabric()

# Create the collections based on provided collection list in collection_list
def create_collections():
  for collection in collection_list:
    if collection["type"] == COLLECTION_TYPE["KV"]:
      create_kv_collection(collection["name"], collection["hasStream"])
    else:
      create_document_collection(
          collection["name"], collection["hasStream"], collection["isEdge"])

# Create the query workers based on provided query worker list in query_worker_list
def create_query_workers():
  for key, value in query_worker_list.items():
    existing_query_workers = current_fabric.get_all_restql()
    existing_query_workers = list(
        map(lambda query_worker: query_worker["name"], existing_query_workers))
    create_query_worker(value, existing_query_workers)
    info["noOfQueryWorker"] += 1

# Create the subscriber based on provided collection list in collection_list
def create_collections_subscribers():
  _collectionList = list(filter(
      lambda collection: collection["type"] == COLLECTION_TYPE["DOCUMENT"], collection_list))
  for collection in _collectionList:
    create_subscriber(collection["name"], is_collection_stream=True)

# Create the graphs based on provided graph list in graph_list
def create_graphs():
  for graph in graph_list:
    crate_graph(graph["name"], graph.get("edgeDefinitions", None))

# Create the stream workers based on the list provided in stream_worker_list
def create_stream_workers():
  stream_app_list = current_fabric.retrive_stream_app()
  stream_app_list = list(
      map(lambda stream_app: stream_app["name"], stream_app_list["streamApps"]))
  for worker in stream_worker_list:
    if worker["name"] not in stream_app_list:
        result = current_fabric.create_stream_app(worker["definition"])
        info["noOfStreamWorker"] += 1

# Activate the stream worker based on provided stream list in stream_worker_list
def activate_stream_workers(is_active=True):
  for worker in stream_worker_list:
    current_fabric.stream_app(worker["name"]).change_state(is_active)

# Point the pyC8 client at the active fabric
def use_fabric():
  global current_fabric
  current_fabric = tenant.useFabric(active_fabric)
  log("Using fabric {}".format(active_fabric))

# Execute all the query workers provided in the query_worker_list
# for every document collection present in collection_list
def execute_query_workers():
  for query, value in query_worker_list.items():
    for collection in collection_list:
      if collection["type"] == COLLECTION_TYPE["DOCUMENT"]:
        current_fabric.execute_restql(
            "{collectionName}_{queryWorkerName}".format(
                collectionName=collection["name"], queryWorkerName=value["name"]),
            value.get("data", {}))

# Read the websocket data and verify the number of operations
def verify_streams():
  # Waiting for 5 seconds so subscriber receives all the data
  time.sleep(5)
  for stream in stream_list:
    wsData = web_socket_data.get(stream["name"], None)
    if not (wsData and len(wsData[DOCUMENT_OPERATIONS["INSERT"]]) == stream["noOfInsertOperation"] and
            len(wsData[DOCUMENT_OPERATIONS["UPDATE"]]) == stream["noOfUpdateOperation"] and
            len(wsData[DOCUMENT_OPERATIONS["DELETE"]]) == stream["noOfDeleteOperations"]):
      log("Websocket data validation failed for {streamName}".format(
          streamName=stream["name"]))

# Validate the collection stream websocket data and operations
def verify_collection_stream():
  # Waiting for 5 seconds so subscriber receives all the data
  time.sleep(5)
  tmpCol = list(filter(
      lambda collection: collection["type"] == COLLECTION_TYPE["DOCUMENT"], collection_list))
  for collection in tmpCol:
    wsData = web_socket_data[collection["name"]]
    if not (wsData and
            len(wsData[DOCUMENT_OPERATIONS["INSERT"]]) == collection["noOfInsertOperation"] and
            len(wsData[DOCUMENT_OPERATIONS["UPDATE"]]) == collection["noOfUpdateOperation"] and
            len(wsData[DOCUMENT_OPERATIONS["DELETE"]]) == collection["noOfDeleteOperations"]):
      log("Websocket data validation failed for collection {collectionName}".format(
          collectionName=collection["name"]))

# Delete all the query workers present in the fabric
def delete_query_workers():
  query_worker_list = current_fabric.get_all_restql()
  for query in query_worker_list:
    current_fabric.delete_restql(query["name"])

# Delete all the graphs present in the fabric
def delete_graphs():
  graph_list = current_fabric.graphs()
  for graph in graph_list:
    current_fabric.delete_graph(name=graph["name"], drop_collections=True)

# Delete all the collection present in the fabric
def delete_collections():
  collection_list = current_fabric.collections()
  for collection in list(filter(
          lambda collection: not collection["system"],
          collection_list)):
    current_fabric.delete_collection(collection["name"])

# Delete all the fabrics listed in fabric_list
def delete_fabrics():
  for fabric in fabric_list:
    current_fabric.delete_fabric(fabric)

# Terminate all the websocket
def clear_web_socket():
  for web_socket in web_socket_list:
    web_socket.close()

# Delete all the streams present in the fabric
def delete_streams():
  streams_list = []
  global_stream = current_fabric.streams()
  local_stream = current_fabric.streams(True)
  streams_list = global_stream + local_stream

  for stream in streams_list:
    stream_name = stream["name"]

    # Bind the name as a default argument so the error message names the right stream
    def response_handler(resp, stream_name=stream_name):
      if not resp.is_success:
        raise ValueError("Failed to delete stream: {}".format(stream_name))
    request = Request(
        method='delete',
        endpoint='/streams/{}'.format(stream_name)
    )
    # In an upcoming release this API call will be replaced by a pyC8 internal function
    current_fabric._execute(request, response_handler)

# Delete all the stream workers present in the fabric
def delete_stream_worker():
  all_stream_apps = current_fabric.retrive_stream_app()
  for stream_app in all_stream_apps["streamApps"]:
    current_fabric.stream_app(stream_app["name"]).delete()

# It will delete the query-workers, graphs, collections, streams, stream-workers and fabric
def clear():
  global active_fabric
  use_fabric()
  delete_query_workers()
  delete_graphs()
  delete_collections()
  delete_streams()
  delete_stream_worker()
  active_fabric = "_system"
  use_fabric()
  delete_fabrics()
  clear_web_socket()
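
The publisher and listener helpers above agree on a simple envelope: the document is JSON-encoded, base64-encoded, and placed under `payload`, while the listener reads the operation type from `properties["op"]` and treats anything else as an insert. The sketch below reproduces that round trip with the standard library only. `encode_message` and `classify_message` are hypothetical names for this example, and the envelope shape is inferred from the helper code in this tutorial rather than from a GDN specification.

```python
import base64
import json


def encode_message(document, op=None):
    """Wrap a document the way create_publisher does, optionally tagging an operation."""
    envelope = {
        "payload": base64.b64encode(json.dumps(document).encode("utf-8")).decode("utf-8"),
        "properties": {"op": op} if op else {},
    }
    return json.dumps(envelope)


def classify_message(raw, buckets):
    """Decode one raw message and append its payload to the matching bucket."""
    message = json.loads(raw)
    payload = json.loads(base64.b64decode(message["payload"]))
    op = message.get("properties", {}).get("op", "INSERT")
    if op not in ("UPDATE", "DELETE"):
        op = "INSERT"  # same fallback as the listener's else branch
    buckets[op].append(payload)
    return op


buckets = {"INSERT": [], "UPDATE": [], "DELETE": []}
classify_message(encode_message({"firstname": "Halie0"}), buckets)
classify_message(encode_message({"lastname": "cena"}, op="UPDATE"), buckets)
```

Keeping the decoding logic in a small pure function like this makes it possible to unit-test the classification without opening a websocket.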

## 4. Create CI/CD Pipeline

The function below calls a sequence of helper functions. Each helper either creates or deletes elements in GDN.

The current flow is as follows:
- Create fabrics
- Pick a random fabric from the created fabrics
- Create collections
- Create graphs
- Create query workers for each document collection
- Create stream workers 
- Activate the stream workers
- Create the collection subscribers
- Execute the query workers
- Execute CRUD operations using the import API and truncate the collection. This operation is performed on each collection
- Validate the collection streams after query worker and CRUD operation execution with expected output
- Create streams
- Create the stream subscribers
- Create the stream publishers. Each publisher pushes some records into its stream
- Verify the stream output by counting the number of objects received on each stream
- Unpublish the stream workers
- Delete all the query workers, graphs, collections, streams, and stream workers, then delete the fabrics created at the start of the program


To customize the CI/CD pipeline, comment out the steps you do not need and modify the pipeline accordingly.

In [None]:
# Execute the GDN pipeline steps sequentially
def execute_cicd_pipeline():

    log("\n ------- CREATE FABRICS ------")
    create_new_fabrics()
    log("\n ------- CREATED FABRICS ------")

    use_fabric()

    log("\n ------- CREATE GEO-REPLICATED COLLECTION  ------")
    create_collections()
    log("\n ------- CREATED GEO-REPLICATED COLLECTION  ------")

    log("\n ------- CREATE GRAPHS ------")
    create_graphs()
    log("\n ------- CREATED GRAPHS  ------")

    log("\n ------- CREATE QUERY WORKER  ------")
    create_query_workers()
    log("\n ------- CREATED QUERY WORKER  ------")

    log("\n ------- CREATE STREAM WORKERS  ------")
    create_stream_workers()
    log("\n ------- CREATED STREAM WORKERS  ------")

    log("\n ------- PUBLISH STREAM WORKERS  ------")
    activate_stream_workers()
    log("\n ------- PUBLISHED STREAM WORKERS  ------")
    
    log("\n ------- CREATE COLLECTION SUBSCRIBER  ------")
    create_collections_subscribers()
    log("\n ------- CREATED COLLECTION SUBSCRIBER  ------")

    log("\n ------- EXECUTE CRUD USING QUERY WORKER ------")
    execute_query_workers()
    log("\n ------- EXECUTED CRUD USING QUERY WORKER ------")

    log("\n ------- VERIFY COLLECTION STREAM ------")
    verify_collection_stream()
    log("\n ------- VERIFIED COLLECTION STREAM ------")

    log("\n ------- CREATE STREAM ------")
    create_streams()
    log("\n ------- CREATED STREAM ------")

    log("\n ------- SUBSCRIBE STREAM ------")
    create_stream_subscribers()
    log("\n ------- SUBSCRIBED STREAM ------")
    
    log("\n ------- CREATE STREAM PRODUCER ------")
    create_stream_publisher()
    log("\n ------- CREATED STREAM PRODUCER ------")

    log("\n ------- VERIFY STREAM ------")
    verify_streams()
    log("\n ------- VERIFIED STREAM ------")

    log("\n ------- UNPUBLISH STREAM WORKERS  ------")
    activate_stream_workers(False)
    log("\n ------- UNPUBLISHED STREAM WORKERS  ------")

    # Clean up everything created during this run
    log("\n ------- CLEARING ------")
    clear()
    log("\n ------- CLEARED  ------")

    log(json.dumps(info, indent=4))
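
Commenting steps in and out works, but it means editing the function each time. One possible alternative, shown here only as a sketch, is to hold the steps in a list of `(label, function)` pairs, skip steps by label, and run the cleanup in a `finally` block so resources are removed even when a step raises. `run_pipeline` and its parameters are assumptions made for this example; the stub steps stand in for helpers such as `create_collections` and `clear`.

```python
def run_pipeline(steps, cleanup, skip=(), log=print):
    """Run (label, function) steps in order and always call cleanup at the end."""
    completed = []
    try:
        for label, step in steps:
            if label in skip:
                log("------- SKIPPED {} ------".format(label))
                continue
            log("------- START {} ------".format(label))
            step()
            log("------- DONE {} ------".format(label))
            completed.append(label)
    finally:
        # Runs whether the steps succeeded or one of them raised
        cleanup()
    return completed


# Stub steps that only record that they were called
calls = []
steps = [
    ("CREATE COLLECTIONS", lambda: calls.append("collections")),
    ("CREATE GRAPHS", lambda: calls.append("graphs")),
    ("CREATE STREAMS", lambda: calls.append("streams")),
]
done = run_pipeline(steps, cleanup=lambda: calls.append("cleanup"),
                    skip={"CREATE GRAPHS"})
```

With this layout a caller customizes the run by passing a different `skip` set instead of editing the pipeline body.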

## 5. Execute CI/CD script

The cell below executes the CI/CD pipeline.

In [None]:
# Execute the CI/CD pipeline
execute_cicd_pipeline()
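
Note that the `verify_streams` and `verify_collection_stream` helpers only log a message when the counts do not match, so the run still finishes normally. In a CI job it is usually more useful to turn mismatches into a non-zero exit status. The following sketch compares expected counts with received messages and derives an exit code. `find_mismatches` and `exit_code_for` are hypothetical helpers, and the `received` structure mirrors the shape of `web_socket_data` in this tutorial.

```python
def find_mismatches(expected, received):
    """Compare expected operation counts with the messages actually received."""
    failures = []
    for name, operations in expected.items():
        got = received.get(name, {})
        for op, want in operations.items():
            have = len(got.get(op, []))
            if have != want:
                failures.append(
                    "{}: expected {} {} message(s), received {}".format(
                        name, want, op, have))
    return failures


def exit_code_for(failures):
    """Map validation failures to a process exit code (0 means success)."""
    return 1 if failures else 0


expected = {"gstream": {"INSERT": 10, "UPDATE": 0, "DELETE": 0}}
received = {"gstream": {"INSERT": [{}] * 9, "UPDATE": [], "DELETE": []}}
failures = find_mismatches(expected, received)
for failure in failures:
    print(failure)
```

When the notebook is exported as a script, calling `sys.exit(exit_code_for(failures))` at the end would let the CI system mark the job as failed.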

## Section Completed!

Congratulations! You have completed this tutorial.