
# CSYE 7245 Big Data Systems and Intelligence Analytics - Diagrams Tutorial

## Diagrams as Code



***Diagrams lets you draw cloud system architecture in Python code.***

* Use it to prototype a new system architecture without any design tools, or to describe and visualize an existing one.

* Diagrams currently supports the major providers, including AWS, Azure, GCP, Kubernetes, Alibaba Cloud, and Oracle Cloud.

* It also supports on-premise nodes, SaaS, and major programming frameworks and languages.

* Because a diagram is plain Python code, you can track changes to it in any version control system.

NOTE: Diagrams does not control any actual cloud resources, nor does it generate CloudFormation or Terraform code. It only draws cloud system architecture diagrams.


Diagrams can be a good alternative to Draw.io or Lucidchart for designing workflows and pipeline architectures.

### Install the packages below

***1. To install graphviz:***

- pip install graphviz

***2. To install diagrams:***

- pip install diagrams
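
Diagrams renders its output through Graphviz, and `pip install graphviz` installs only the Python bindings, not the Graphviz program itself. The check below is a minimal sketch for confirming that the `dot` executable is reachable before drawing anything. The helper name `graphviz_available` is made up for this tutorial, and the install hint in the message is only an example for Debian-like systems.

```python
import shutil


def graphviz_available():
    """Return True if the Graphviz `dot` executable can be found on PATH."""
    return shutil.which("dot") is not None


if not graphviz_available():
    # Example hint only; the right command depends on your operating system.
    print("Graphviz 'dot' not found. Install the Graphviz system package, "
          "for example with: apt-get install graphviz")
```

Hosted environments such as Colab usually ship with Graphviz already installed, so the message is most likely to appear on a fresh local machine.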



In [None]:
!pip install graphviz



In [None]:
!pip install diagrams


## Diagrams
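* Diagram is the primary object; it represents a whole diagram.

* Diagram also acts as the global diagram context: nodes created inside its with block are added to that diagram.

* You create a diagram context with the Diagram class. The first parameter of the Diagram constructor is the diagram's name, which is also used for the output filename unless you pass filename.

* The cells below show the remaining options in turn: outformat sets the output format (PNG by default), filename overrides the output filename, show=False stops the rendered file from being opened automatically, and graph_attr passes Graphviz graph attributes.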
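
To find the rendered file afterwards, it helps to know how the name turns into a filename. The sketch below assumes the library lower-cases the name and joins its words with underscores, which matches the files produced in this tutorial. `default_output_name` is a hypothetical helper written for illustration, not part of the diagrams API.

```python
def default_output_name(diagram_name, outformat="png"):
    """Guess the file that Diagram(diagram_name) writes when no filename is given.

    Assumption: whitespace-separated words are joined with "_" and lower-cased;
    other characters, such as hyphens and parentheses, are kept as they are.
    """
    stem = "_".join(diagram_name.split()).lower()
    return f"{stem}.{outformat}"


print(default_output_name("Simple Diagram"))
print(default_output_name("Simple Diagram", outformat="jpg"))
```

Passing an explicit filename to Diagram avoids having to guess at all.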

In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2

with Diagram("Simple Diagram"):
    EC2("web")

In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2

with Diagram("Simple Diagram", outformat="jpg"):
    EC2("web")

In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2

with Diagram("Simple Diagram", filename="my_diagram"):
    EC2("web")

In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2

with Diagram("Simple Diagram", show=False):
    EC2("web")

In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2

graph_attr = {
    "fontsize": "45",
    "bgcolor": "transparent"
}

with Diagram("Simple Diagram", show=False, graph_attr=graph_attr):
    EC2("web")

## Nodes
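* Node is the second core object; it represents a single system component.

* Node is an abstract concept: each concrete node class, such as EC2 or RDS, comes from a provider module, for example diagrams.aws.compute.

* Nodes are connected with operators: >> draws an edge from left to right, << draws one from right to left, and - draws an undirected edge. Because - binds more tightly than >> in Python, mixed chains may need parentheses, as in the third line of the first cell below.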


In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB
from diagrams.aws.storage import S3

with Diagram("Web Services", show=False):
    ELB("lb") >> EC2("web") >> RDS("userdb") >> S3("store")
    ELB("lb") >> EC2("web") >> RDS("userdb") << EC2("stat")
    (ELB("lb") >> EC2("web")) - EC2("web") >> RDS("userdb")

In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

with Diagram("Workers", show=False, direction="TB"):
    lb = ELB("lb")
    db = RDS("events")
    lb >> EC2("worker1") >> db
    lb >> EC2("worker2") >> db
    lb >> EC2("worker3") >> db
    lb >> EC2("worker4") >> db
    lb >> EC2("worker5") >> db

In [None]:
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

with Diagram("Grouped Workers", show=False, direction="TB"):
    ELB("lb") >> [EC2("worker1"),
                  EC2("worker2"),
                  EC2("worker3"),
                  EC2("worker4"),
                  EC2("worker5")] >> RDS("events")
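
The cells above chain nodes, and lists of nodes, with >>, << and -. The toy model below sketches how such chaining can be built on Python operator overloading. It is not the diagrams implementation: the `Node` class and the `edges` list here are stand-ins written for this tutorial, and they only record edges instead of drawing them.

```python
edges = []  # recorded as (source label, target label, directed?) tuples


class Node:
    """Toy stand-in for a diagram node; this is not the diagrams Node class."""

    def __init__(self, label):
        self.label = label

    def _connect(self, other, directed=True, reverse=False):
        # `other` may be a single node or a list of nodes.
        for target in (other if isinstance(other, list) else [other]):
            src, dst = (target, self) if reverse else (self, target)
            edges.append((src.label, dst.label, directed))
        return other  # returning the right operand is what makes chaining work

    def __rshift__(self, other):    # self >> other
        return self._connect(other)

    def __lshift__(self, other):    # self << other, i.e. other -> self
        return self._connect(other, reverse=True)

    def __sub__(self, other):       # self - other, undirected
        return self._connect(other, directed=False)

    def __rrshift__(self, others):  # [a, b] >> self
        for node in others:
            node._connect(self)
        return self


# Fan out to a list of workers, then fan back in to one database.
lb, db = Node("lb"), Node("events")
lb >> [Node("worker1"), Node("worker2")] >> db

# `-` binds tighter than `>>`, so this parses as a >> (b - c):
a, b, c = Node("a"), Node("b"), Node("c")
a >> b - c

for edge in edges:
    print(edge)
```

In this model the last statement links b to c first and then a to c, not a to b, which is why a mixed chain needs parentheses when a different grouping is intended.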

## Clusters
Cluster lets you group nodes into an isolated group. Clusters can be nested, as the second cell below shows.

In [None]:
from diagrams import Cluster, Diagram
from diagrams.aws.compute import ECS
from diagrams.aws.database import RDS
from diagrams.aws.network import Route53

with Diagram("Simple Web Service with DB Cluster", show=False):
    dns = Route53("dns")
    web = ECS("service")

    with Cluster("DB Cluster"):
        db_master = RDS("master")
        db_master - [RDS("slave1"),
                     RDS("slave2")]

    dns >> web >> db_master

In [None]:
from diagrams import Cluster, Diagram
from diagrams.aws.compute import ECS, EKS, Lambda
from diagrams.aws.database import Redshift
from diagrams.aws.integration import SQS
from diagrams.aws.storage import S3

with Diagram("Event Processing", show=False):
    source = EKS("k8s source")

    with Cluster("Event Flows"):
        with Cluster("Event Workers"):
            workers = [ECS("worker1"),
                       ECS("worker2"),
                       ECS("worker3")]

        queue = SQS("event queue")

        with Cluster("Processing"):
            handlers = [Lambda("proc1"),
                        Lambda("proc2"),
                        Lambda("proc3")]

    store = S3("events store")
    dw = Redshift("analytics")

    source >> workers >> queue >> handlers
    handlers >> store
    handlers >> dw

## Edges
Edge represents a connection between nodes and lets you set extra properties on it, such as color, style, and label.

In [None]:
from diagrams import Cluster, Diagram, Edge
from diagrams.onprem.analytics import Spark
from diagrams.onprem.compute import Server
from diagrams.onprem.database import PostgreSQL
from diagrams.onprem.inmemory import Redis
from diagrams.onprem.logging import Fluentd
from diagrams.onprem.monitoring import Grafana, Prometheus
from diagrams.onprem.network import Nginx
from diagrams.onprem.queue import Kafka

with Diagram(name="Advanced Web Service with On-Premise (colored)", show=False):
    ingress = Nginx("ingress")

    metrics = Prometheus("metric")
    metrics << Edge(color="firebrick", style="dashed") << Grafana("monitoring")

    with Cluster("Service Cluster"):
        grpcsvc = [
            Server("grpc1"),
            Server("grpc2"),
            Server("grpc3")]

    with Cluster("Sessions HA"):
        master = Redis("session")
        master - Edge(color="brown", style="dashed") - Redis("replica") << Edge(label="collect") << metrics
        grpcsvc >> Edge(color="brown") >> master

    with Cluster("Database HA"):
        master = PostgreSQL("users")
        master - Edge(color="brown", style="dotted") - PostgreSQL("slave") << Edge(label="collect") << metrics
        grpcsvc >> Edge(color="black") >> master

    aggregator = Fluentd("logging")
    aggregator >> Edge(label="parse") >> Kafka("stream") >> Edge(color="black", style="bold") >> Spark("analytics")

    ingress >> Edge(color="darkgreen") << grpcsvc >> Edge(color="darkorange") >> aggregator
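
The color, style and label values used above are ordinary Graphviz edge attributes. As a rough illustration of where they end up, the sketch below formats a single edge statement in Graphviz DOT syntax. `dot_edge` is a hypothetical helper for this tutorial, and the exact DOT text that diagrams generates may differ.

```python
def dot_edge(src, dst, direction="forward", **attrs):
    """Format one Graphviz DOT edge statement with its attributes.

    direction is a Graphviz `dir` value: "forward", "back", "both" or "none".
    """
    all_attrs = {"dir": direction, **attrs}
    body = ", ".join(f'{key}="{value}"' for key, value in all_attrs.items())
    return f'"{src}" -> "{dst}" [{body}];'


# Roughly: metrics << Edge(color="firebrick", style="dashed") << Grafana("monitoring")
print(dot_edge("monitoring", "metric", color="firebrick", style="dashed"))

# Roughly: master - Edge(color="brown", style="dashed") - Redis("replica")
print(dot_edge("session", "replica", direction="none", color="brown", style="dashed"))
```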
    

In [None]:
# Display the rendered "Advanced Web Service" diagram inline.
# Image comes from IPython.display; calling it without this import raises NameError.
from IPython.display import Image

# The filename is derived from the diagram name (lower-cased, spaces replaced by
# underscores), so the hyphen and parentheses are kept. Adjust it if your file differs.
Image(filename='advanced_web_service_with_on-premise_(colored).png', width=600, height=400)

## Message Collecting System and Data Lake on GCP

In [None]:
from diagrams import Cluster, Diagram
from diagrams.gcp.analytics import BigQuery, Dataflow, PubSub
from diagrams.gcp.compute import AppEngine, Functions
from diagrams.gcp.database import BigTable
from diagrams.gcp.iot import IotCore
from diagrams.gcp.storage import GCS

with Diagram("Message Collecting", show=False):
    pubsub = PubSub("pubsub")

    with Cluster("Source of Data"):
        [IotCore("core1"),
         IotCore("core2"),
         IotCore("core3")] >> pubsub

    with Cluster("Targets"):
        with Cluster("Data Flow"):
            flow = Dataflow("data flow")

        with Cluster("Data Lake"):
            flow >> [BigQuery("bq"),
                     GCS("storage")]

        with Cluster("Event Driven"):
            with Cluster("Processing"):
                flow >> AppEngine("engine") >> BigTable("bigtable")

            with Cluster("Serverless"):
                flow >> Functions("func") >> AppEngine("appengine")

    pubsub >> flow


## Airflow Diagrams 

Airflow Diagrams is an Airflow plugin that uses diagrams to visualise your Airflow DAGs at the service level, with nodes from providers such as AWS, GCP, and Azure.

Install Airflow and Airflow Diagrams:
* pip install apache-airflow
* pip install airflow-diagrams


In [None]:
!pip install airflow-diagrams

Collecting airflow-diagrams
  Downloading https://files.pythonhosted.org/packages/ba/e3/675acc23d98346cb00456e9ff610988b681722e8e6d3bfceb65186059eb7/airflow_diagrams-0.1.0-py3-none-any.whl
Installing collected packages: airflow-diagrams
Successfully installed airflow-diagrams-0.1.0


In [None]:
from airflow.models.dag import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago

from airflow_diagrams import generate_diagram_from_dag

with DAG('example_dag', schedule_interval=None, default_args=dict(start_date=days_ago(2))) as dag:
    DummyOperator(task_id='run_this_1') >> [
        DummyOperator(task_id='run_this_2a'), DummyOperator(task_id='run_this_2b')
    ] >> DummyOperator(task_id='run_this_3')

generate_diagram_from_dag(dag=dag, diagram_file="example_dag.py")

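The call above writes a diagrams script to example_dag.py. Its contents should look like the following, and running it renders the DAG:

In [None]: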
from diagrams import Diagram
from diagrams.generic.blank import Blank

with Diagram("example_dag", show=False):
    run_this_1 = Blank("run_this_1")
    run_this_2a = Blank("run_this_2a")
    run_this_2b = Blank("run_this_2b")
    run_this_3 = Blank("run_this_3")
    
    run_this_1 >> run_this_2b
    run_this_2b >> run_this_3
    run_this_1 >> run_this_2a
    run_this_2a >> run_this_3
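
The generated script maps each task to a node and each dependency to a >> edge. The sketch below reproduces that mapping from a plain dictionary, without needing Airflow. `diagram_source` is a hypothetical helper written for this tutorial and is not how airflow-diagrams is implemented. It only builds the script text and does not run it.

```python
def diagram_source(dag_id, downstream):
    """Build the text of a diagrams script from a task dependency mapping.

    downstream maps each task id to the list of task ids that run after it.
    """
    lines = [
        "from diagrams import Diagram",
        "from diagrams.generic.blank import Blank",
        "",
        f'with Diagram("{dag_id}", show=False):',
    ]
    # One Blank node per task, named after the task id.
    for task in downstream:
        lines.append(f'    {task} = Blank("{task}")')
    lines.append("")
    # One edge per dependency.
    for task, children in downstream.items():
        for child in children:
            lines.append(f"    {task} >> {child}")
    return "\n".join(lines) + "\n"


source = diagram_source("example_dag", {
    "run_this_1": ["run_this_2a", "run_this_2b"],
    "run_this_2a": ["run_this_3"],
    "run_this_2b": ["run_this_3"],
    "run_this_3": [],
})
print(source)
```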
    

## References 

https://github.com/mingrammer/diagrams

https://diagrams.mingrammer.com/