
# The Event-Driven la1r
This site showcases everything implemented in my own home la1r.
The goal is to see how much (realtime) tracking, automation and AI still bring convenience and what is just flat-out annoying.
Since I'm running this at home, I and the people in my environment are the test subjects; please regard all the material and integrations tested as such.
All the code described on this site can be found at https://github.com/basraven/la1r

## Key topics
The following key topics will be touched on to showcase la1r:
* Kubernetes, Docker
* AI, Automation
* Streaming (Big Data) pipelines
* Home Automation, Home surveillance
* And many more things!


## Why this site?
I noticed that when combining all of these applications into a single integration environment, it can be challenging to keep all documentation and notes in a single place. To force myself to properly document my steps and also to give back to the open-source community, I decided to publish all of it on this domain. Hopefully it can also help others with similar aspirations.
Feel free to share this site with other enthusiasts.

### "Please **do** try this at home"
Since this overview is focused on sharing, I would like to invite anyone to try everything at home; it will not have insane hardware requirements or require niche hardware setups.
To give a 1-minute overview: this is the conceptual model driving La1r:
![Conceptual](/svg/conceptual.svg)

It shows the conceptual layout of how all components in the Event-Driven architecture are structured.
Since we are only focusing on the conceptual aspects here, implementation details such as the infrastructure stack are not discussed.
The fundamental architecture for La1r follows [a Kappa architectural pattern](https://wikipedia.com/kappa-architecture) for processing data.
This means that the architecture will handle bulk/batch data the same way as it handles streaming/realtime data.

A distinction is made between two types of data streams:
* Raw Data Stream - Data that is not "Governed" and does not conform to the imposed event structures as described on this site. This is often the data sink for commercial off-the-shelf (COTS) components that need to be integrated. To save the hassle of writing custom extensions for those components, it is easier to push the data to a "raw" event stream and transform the raw events into structured events which conform to the event structures.
* Structured Data Stream - This data stream only contains data which conforms to the defined standard for events

To summarize, we identify several conceptual components:
* **Raw Data producer** - Any sensor, smart camera, etc. which is hooked up to the raw data stream and produces data from which well-formed events can be produced.
* **Raw Data Stream** - The data vehicle which stores all incoming data from which events can be created; this can be very raw measurement data or data structured in an application-specific format
* **Streaming Event Transformations** - This can be any application which is connected to the Raw Data Stream and is able to create events conforming to the Event Specifications, based on data coming from the Raw Data Stream (this is non-restrictive; the data can also come from other places). A minimal sketch follows this list.
* **Event Specifications** - All specifications used to structure **all** events which are published on the Structured Event Stream. These event specifications will also be published [on la1r.com](/). They are not a direct part of the actual data flow, but are important enough to be named in this diagram.
* **Structured Event Stream** - The data stream which stores the structured events, forming the logical epicenter of the event-driven la1r. The majority of the events are sourced from transformed raw data from the Raw Data Stream or from the results of analyzing raw/structured events
* **Streaming Analytics Processes** - These are similar to the Streaming Event Transformations, except that they analyze the data to find significant patterns which can be used by other (decoupled) processes further downstream.
* **Structured Event Consumer** - This can be any device which consumes events published on the Structured Event Stream and acts on them with a certain behavior, for example a light switching on based on an event. This consumer also entails translating the Structured Event into a format a device is able to operate on.
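
To make the transformation step concrete, here is a minimal sketch in Python. The raw payload shape, the event fields and the `transform_raw_reading` helper are hypothetical illustrations, not the actual Event Specifications:

```python
import json
import time
import uuid


def transform_raw_reading(raw_message: str) -> dict:
    """Lift a hypothetical application-specific raw payload into a structured event."""
    raw = json.loads(raw_message)  # e.g. '{"temp": 21.4, "dev": "sensor-7"}'
    return {
        "eventId": str(uuid.uuid4()),          # unique id per event
        "eventType": "TemperatureMeasured",    # hypothetical event type
        "source": raw["dev"],                  # originating device
        "timestamp": int(time.time() * 1000),  # epoch millis
        "payload": {"celsius": raw["temp"]},
    }


if __name__ == "__main__":
    print(transform_raw_reading('{"temp": 21.4, "dev": "sensor-7"}'))
```

In a real deployment such a function would sit between the two streams, consuming from the Raw Data Stream and publishing its output to the Structured Event Stream.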

## Conceptual Architecture Principles
The la1r architecture follows several conceptual principles which components in its architecture should follow.
Since this will not capture implementation-specific / technical principles, a section on technical principles is described [in the technical setup page](./technical-setup)
1. Data is realtime and streaming - Always assume that data streaming through the la1r infrastructure is in a streaming "format". Do not unnecessarily store it, or batch it when realtime streaming solutions can also be applied
1. Don't assume information is shared - Since an enterprise environment is conceptually simulated, it should also be simulated that (conceptual) teams are not fully aware of all integrations made by other (conceptual) teams. The implication of this is that there is a need for decoupling and formal information definitions. An example of this is the site you're currently reading, but further efforts should be made, such as formal separation of layers, environments and data, to appropriately conform to this conceptual requirement.
1. Decentralized application paradigms where possible - To support the horizontal scaling capabilities, an effort should be made to apply decentralized paradigms, which often improve scalability and availability when implemented correctly.

{{< columns >}}
## Event Specifications
Since there needs to be a way of formally converging on an aligned data setup, a formal event specification setup has been made.
This event specification dictates how all events in the structured stream should be shaped.
Events not conforming to this standard can be disregarded.
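
A minimal sketch of how such conformance could be checked in Python; the required fields here are assumptions for illustration, not the published specification:

```python
# Hypothetical minimal specification: required fields and their types.
REQUIRED_FIELDS = {
    "eventId": str,
    "eventType": str,
    "source": str,
    "timestamp": int,
    "payload": dict,
}


def conforms(event: dict) -> bool:
    """Check a candidate event against the hypothetical specification."""
    return all(
        field in event and isinstance(event[field], expected_type)
        for field, expected_type in REQUIRED_FIELDS.items()
    )
```

Events for which such a check fails never enter the structured stream.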

[Read more](/docs/conceptual-setup/event-specifications)

<--->

## Governance Catalogs
Since we are still "simulating" an enterprise environment, and since my own memory is sub-optimal, appropriate governance catalogs need to be set up to fully capture the IT landscape on several domains.

[Read more](/docs/conceptual-setup/governance-catalogs)

<--->

{{< /columns >}}

# Governance Catalogs
Since we are still "simulating" an enterprise environment, and since my own memory is sub-optimal, appropriate governance catalogs need to be set up to fully capture the IT landscape on several domains:

* Application Catalogs - These capture which applications are currently running and are in scope of the la1r environments.
* Data Catalogs - These capture what important data is used where (in the landscape); again, if it is not in the data catalog, it is not regarded as part of the scope of La1r.

The concept of governance catalogs is a recurring pattern which large organizations often lack (or at least lack the maturity to appropriately apply) in their landscape.
A simple principle of "if it is not in the domain catalog, it doesn't exist" can be applied in La1r, forcing well-practised governance.
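
A minimal sketch of how such a catalog lookup could enforce this principle; the `ApplicationEntry` structure and the catalog contents are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationEntry:
    name: str
    owner: str
    repository: str


# Hypothetical application catalog: only entries listed here "exist" in La1r.
APPLICATION_CATALOG = {
    "mosquitto": ApplicationEntry("mosquitto", "basraven", "https://github.com/basraven/la1r"),
}


def lookup(name: str) -> ApplicationEntry:
    """Apply 'if it is not in the catalog, it does not exist'."""
    if name not in APPLICATION_CATALOG:
        raise KeyError(f"{name} is not in the application catalog and is out of scope")
    return APPLICATION_CATALOG[name]
```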

## Wider topic of governance
Since governance is a topic which goes far beyond the reach of a catalog, you can regard this page as far from complete.
But to ensure I use my time as efficiently as possible, I will not (for now) dive too deep into other governance practices.

# What's next?
Since I'm (currently) developing la1r by myself, there are only so many things I can do at once.
(Feel free to reach out through Github if you want to get involved!)
For this reason I created this planning page in which I track and prioritize what I will add to la1r next.
Feel free to add comments on this through Github!

# In progress
1. Spark 2.x Cluster in k8s
1. Streaming analytics pipeline with Spark 2.x and Kafka

# Planned
1. Traefik auth proxy middleware with Authelia
1. Streaming facial recognition from images and streaming video
1. Object recognition (garbage bin outside of our house) combined with garbage collection ical (https://inzamelkalender.gad.nl/ical/0402200001574396)
1. Formal managed bare-metal security camera setup
1. "View in repo" button for all pages of the la1r documentation. While reading documentation, for example about Ansible, the visitor should be able to view which scripts are currently discussed by clicking a button to the git repository.
1. Refactor mosquitto to [vernemq](https://vernemq.com/)


The technical setup can be divided into areas:

{{< columns >}}
## Data Processing
All the components which focus on processing data to fit the appropriate schema and on analyzing the processed data with, for example, AI.

[Read more](/docs/technical-setup/data-processing)

<--->


{{< /columns >}}

## Technical architecture principles
The la1r architecture follows several technical principles which components in its architecture should follow.
Since this will not capture conceptual principles, a section on conceptual principles is described [in the conceptual setup page](./conceptual-setup)
1. Only the paranoid survive: apply and practice backup scenarios - Backup scenarios should not only be implemented as a tick in the box for our list of non-functional requirements (NFRs), but should also be practiced where possible.
1. Aim for near-horizontal scaling - All services should be able to scale with cluster size. The infrastructure architecture of my current implementation is rather rigid, but the applications on it should be built for a flexible and horizontally scalable underlying infrastructure.
1. Decentralized application paradigms where possible - To support the horizontal scaling capabilities, an effort should be made to apply decentralized paradigms, which often improve scalability and availability when implemented correctly.
1. Don't assume information is shared - Since an enterprise environment is conceptually simulated, it should also be simulated that (conceptual) teams are not fully aware of all integrations made by other (conceptual) teams. The implication of this is that there is a need for decoupling and formal information definitions. An example of this is the site you're currently reading, but further efforts should be made, such as formal separation of layers, environments and data, to appropriately conform to this conceptual requirement.
1. Behind the VPN (openvpn) by default - Since this is still a learning and experimental environment, I don't want to have to think about security first, every step of the way. This is why the master La1r server hosts a VPN virtual network. All services and internal DNS use that entrypoint by default. This does not mean that nothing is exposed to the outside world, but only the services explicitly exposed through the online-traefik instance are.

# Ansible to prepare the playground
Since there's always a need to install packages on the nodes directly, and I don't want to just use a bunch of shell scripts, all configuration and applications outside of k8s are deployed with Ansible, directed by makefiles. Makefiles because I don't want to remember by heart all the commands I need to spin up Ansible; Ansible because I want to semi-formalize the steps I take.

## Everything on bare-metal
Since I cannot put every step I take in Kubernetes (an example of this is how to set up Kubernetes itself), there is a need for a system such as Ansible.
The goal here is to document every step, no matter how small, in an Ansible Playbook script.
These Ansible scripts can be found at https://github.com/basraven/la1r/ansible

## Makefiles as operators
Since I want to formalize everything into scripts, there needs to be a way to formalize how to call the different playbooks with the appropriate arguments.
This is why the Git repository contains 2 makefiles. Makefiles were chosen because the way these files are called is extremely predictable: ```make <your command>```:
* [Makefile for Ansible](/) - This makefile contains all the Ansible Playbook calls which are made to construct la1r on bare metal
* [Makefile for Kubernetes](/) - This makefile contains all the Kubernetes calls used to set up the Kubernetes nodes. This also contains node setup scripts such as applying taints.
---
title: Data Processing
type: docs
bookToc: false
weight: 4
---
# Data Processing
There are several setups used in the area of Data Processing.
They all focus on contributing to enhanced "intelligent" decision making in the La1r setup.

{{< columns >}}
## (Streaming) Transformations
Since there can be a large difference between how data is received, for example from commercial off-the-shelf (COTS) applications, and how it should be structured to conform to the described event standard, there is a need for (streaming) Transformations.

[Read more](/docs/technical-setup/data-processing/streaming-transformations)

<--->

## (Streaming) Analytics
To enhance La1r with intelligent analytics and decision making, streaming analytics is applied to facilitate these needs.

[Read more](/docs/technical-setup/data-processing/streaming-analytics)


{{< /columns >}}
---
title: (Streaming) Analytics
---
# Streaming analytics as default
Since La1r applies a Kappa architecture (see the [conceptual setup](./conceptual-setup) for more details), it is essential that as many of its processes as possible occur in a streaming fashion.
This also includes all the performed analytics.
Streaming analytics brings new considerations, such as message ordering and the quality of prefix data.
Since these concepts are handled out-of-the-box (OOTB) in Spark 2.x, Spark 2.x is considered the default method of applying streaming analytics.
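
As a minimal sketch of what such a job could look like with Spark 2.x Structured Streaming reading from Kafka; the broker address, topic name and event schema are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("la1r-streaming-analytics").getOrCreate()

# Hypothetical structured event schema; the real one comes from the Event Specifications.
schema = StructType([
    StructField("eventType", StringType()),
    StructField("source", StringType()),
    StructField("timestamp", TimestampType()),
    StructField("celsius", DoubleType()),
])

# Read the structured event stream from Kafka (broker and topic are assumptions).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "structured-events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("event"))
    .select("event.*")
)

# A watermark bounds how late out-of-order messages may arrive,
# addressing the message-ordering consideration mentioned above.
averages = (
    events.withWatermark("timestamp", "10 minutes")
    .groupBy(window(col("timestamp"), "5 minutes"), col("source"))
    .avg("celsius")
)

query = averages.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```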
