# 1. Networking for IIOT

### Contents:
- [Overview](#Overview)
- [IIOT End Devices](#IIOT-End-Devices)
    - [Introduction to some protocols](#Introduction-to-some-protocols)
    - [Sensors](#Sensors)
- [IIOT Gateways](#IIOT-Gateways)
    - [Services running on Raspberry](#Services-running-on-Raspberry)
    - [Setting up the pi](#Setting-up-the-Pi)
        - [OS](#OS)
        - [Username and password](#Username-and-password)
        - [Mosquitto broker](#Mosquitto-broker)
        - [Setting up Internet Connectivity](#Setting-up-Internet-Connectivity)
- [IIOT Middleware](#IIOT-Middleware)
    - [Setting up Elasticsearch](#Setting-up-Elasticsearch)
    - [Setting up Kibana](#Setting-up-Kibana)
    - [Kibana Aggregations](#Kibana-Aggregations)

## Overview
***
### Functional Layers in IIOT Network implementation
***
> 1. IIOT End Devices - IED
> 2. IIOT Gateways - IGW
> 3. IIOT Middleware - IMW
> 4. IIOT Cloud Server - ICS

#### IIOT END DEVICES(IED)
***
- The IoT end-devices or IEDs contain the sensors (or actuators) and are closest to the end node being monitored. They gather data periodically by reading the sensors attached to them and relay it to the IOT Gateway over a suitable protocol/channel like MODBUS, BLE, or directly over the GPIO pins, depending on the type of sensor.
- Different types of sensors connect to the Gateways in different ways. For example, we can connect:
    - Vibration sensor over Bluetooth(BLE)
    - Energy Meter over MODBUS
    - Proximity Sensor over GPIOS of RpiZ
    - Thermocouple Temperature Sensor over I2C
    
#### IIOT GATEWAYS(IGW)
***
- The gateways are devices which are next up in the hierarchy of the network.
- The gateways provide the following services:
    1. **Reliable Connectivity:** between the IIOT Server and the IIOT End Devices. The IGW may buffer the incoming data from various IEDs in case the upstream connection to the IMW is broken, and push the buffered data once the IMW is reachable again.
    2. **Protocol and data bridge:** makes possible the interoperation of IEDs using different protocols and data formats. For example, a sensor might send LDR analog readings as plain text over UART Serial, and the IGW can translate them into a JSON message which is then sent to the IMW. Similarly, an Energy meter might use a protocol like MODBUS to convey its readings, and the IGW will translate the MODBUS formatted data packet into a JSON message to be sent to the IMW. In the other direction, JSON commands sent to IEDs containing actuators may be converted to the proper data format and protocol by the IGW before being forwarded to the IED.
    3. **Management:** The gateway can automatically detect and add new IEDs added to the network or remove faulty IEDs from the network intelligently. In this way the IEDs can be managed via the IGW.
    4. **Storage and Analysis:** In certain cases, the IGW can itself act as a data store and can also perform lightweight analysis on the incoming data, pushing only the most relevant information to the IMW and filtering out the rest. The IGW can also take certain actions, like commanding an actuator based on the analysis of the incoming sensor data, so the closed-loop time for critical actions can be reduced.
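
The protocol and data bridge service described above can be sketched in a few lines of Python. This is only a minimal illustration: the `NAME:value` text format, the function name `uart_line_to_json` and the `device` field are assumptions made for the example, not part of the gateway software described here.

```python
import json
from datetime import datetime, timezone


def uart_line_to_json(line: str, device_id: str) -> str:
    """Translate a plain-text reading such as 'LDR:512' into a JSON message."""
    name, _, raw_value = line.strip().partition(":")
    if not name or not raw_value:
        raise ValueError(f"unrecognised reading: {line!r}")
    message = {
        "device": device_id,                 # hypothetical identifier of the IED
        "sensor": name,                      # e.g. "LDR"
        "value": float(raw_value),           # numeric reading
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(message)


# A line as it might arrive over a serial port (assumed format).
print(uart_line_to_json("LDR:512\r\n", device_id="ied-01"))
```

The resulting JSON string is what the gateway would forward to the IMW, for example by publishing it to an MQTT topic.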
 
#### IIOT MIDDLEWARE(IMW)
***
- Next up in the network is the IIOT Middleware, which in general provides the following services (these may vary slightly between implementations):
    - **Persistent data storage**
    - **Analysis on the data**
    - **Automated provisioning** (discovery and removal) of IIOT gateways
    
####  IIOT Cloud Server (ICS)
- The final layer in the network is the cloud server.
- This server has a public IPv4 and/or IPv6 address and is reachable from anywhere on the internet.
- The cloud server can be used for -
    - permanent data store
    - online/offline analysis
    - realtime visualisation of the raw/processed data
    - VPN access to the Middleware and the local network in the remote factory floor

#### OVERVIEW DIAGRAM
***
![IOTGW-OVERVIEW.jpg](attachment:IOTGW-OVERVIEW.jpg)

***

## IIOT End Devices
***
![IOTGW-OVERVIEW-HIGHLIGHT-IED.png](attachment:IOTGW-OVERVIEW-HIGHLIGHT-IED.png)

- The end devices are those devices that take data from the machines directly or indirectly and send that data via a network to the observer, to make evaluations and optimizations based on that data.
- The raw data can be converted to a desired format and we can run any desired analytics on that data.
- The main objective is to use this data to increase the efficiency of the machines by reducing their energy consumption and idle time where possible. The data can also feed a digital twin that the owner can view in real time.
- The sensors are the end devices in our implementations
- An Arduino board can also be used as an end device that will interact with the sensors and the IGW
- As already mentioned in the overview, the sensors can be connected to the IGW in different ways, for example:
    - Vibration sensor over BLE
    - Energy Meter over MODBUS
    - Proximity sensor over GPIOs of RPiZ
    - Thermocouple Temperature Sensor over I2C

### Introduction to some protocols
***
**Bluetooth Low Energy(BLE)**
- BLE is a wireless personal area network technology rather than a single protocol
- Compared to Classic Bluetooth, Bluetooth Low Energy is intended to provide considerably reduced power consumption and cost while maintaining a similar communication range. 
- Bluetooth Low Energy uses the same 2.4 GHz radio frequencies as classic Bluetooth
- It offers very low power consumption and hence battery life can be very long. 
- It cannot provide the higher data rates offered by Wi-Fi and cellular technologies.
- For more details [click here](https://en.wikipedia.org/wiki/Bluetooth_Low_Energy)

**MODBUS**
- Modbus is a communication protocol originally developed by Modicon (now part of Schneider Electric). In simple terms, it is a method for transmitting information over serial lines between electronic devices. The device requesting the information is called the Modbus Master and the devices supplying information are Modbus Slaves.
- For more details [click here](https://en.wikipedia.org/wiki/Modbus)
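
To make the master/slave exchange more concrete, the sketch below builds the request frame a Modbus RTU master would send to read holding registers (function code 0x03). It is an illustration only: the function names are made up for this example, and the slave address and register address are assumptions rather than values taken from the energy meter used here.

```python
import struct


def modbus_crc16(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU (initial value 0xFFFF, polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_read_request(slave_id: int, start_address: int, count: int) -> bytes:
    """Frame = slave id, function 0x03, start address, register count, CRC."""
    body = struct.pack(">BBHH", slave_id, 0x03, start_address, count)
    # The CRC is appended low byte first.
    return body + struct.pack("<H", modbus_crc16(body))


frame = build_read_request(slave_id=1, start_address=0, count=1)
print(" ".join(f"{b:02x}" for b in frame))   # 01 03 00 00 00 01 84 0a
```

The slave answers with a frame of the same shape containing the register values, which the gateway can then decode and translate into JSON.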

**I2C**
- I2C is a serial communication protocol, so data is transferred bit by bit along a single wire (the SDA line). Like SPI, I2C is synchronous, so the output of bits is synchronized to the sampling of bits by a clock signal shared between the master and the slave. The clock signal is always controlled by the master.
- For more details [click here](http://www.circuitbasics.com/basics-of-the-i2c-communication-protocol/)

### Sensors
***

**Vibration Sensor**

- As mentioned above, we usually connect the vibration sensor to our IGW (i.e. the RPiZ) using BLE.
- The vibration sensor used here is a piezoelectric sensor. Such sensors are versatile devices used for measuring various processes. They use the piezoelectric effect to measure changes in acceleration, pressure, temperature, force, or strain by converting them into an electrical charge.
![Vibration%20Sensor.jpg](attachment:Vibration%20Sensor.jpg)

**Proximity Sensor**

- A proximity sensor is a sensor able to detect the presence of nearby objects without any physical contact. A proximity sensor often emits an electromagnetic field or a beam of electromagnetic radiation (infrared, for instance), and looks for changes in the field or return signal.
![Proximity%20sensor.jpg](attachment:Proximity%20sensor.jpg)

**Energy Meter**

- An energy meter measures the electrical energy used by a load. Energy is the power consumed by the load accumulated over a given interval of time, usually reported in kilowatt-hours (kWh). Energy meters are used in domestic and industrial AC circuits for measuring consumption.
![Energy%20Meter1.png](attachment:Energy%20Meter1.png)
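
Since energy is power accumulated over time, it can be estimated from periodic power readings. The sketch below assumes evenly spaced power samples in watts and integrates them with the trapezoidal rule; the function name and sample values are illustrative only.

```python
def energy_kwh(power_watts, interval_seconds):
    """Integrate evenly spaced power samples (W) with the trapezoidal rule."""
    if len(power_watts) < 2:
        return 0.0
    joules = 0.0
    for previous, current in zip(power_watts, power_watts[1:]):
        joules += 0.5 * (previous + current) * interval_seconds
    return joules / 3.6e6   # 1 kWh = 3.6 million joules


# A constant 1000 W load sampled once a minute for one hour (61 samples).
samples = [1000.0] * 61
print(energy_kwh(samples, interval_seconds=60))   # 1.0
```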

**Temperature Sensor**

- A temperature sensor converts temperature into an electrical signal, typically using a thermocouple or an RTD (Resistance Temperature Detector). A thermocouple is made of two dissimilar metals and generates a small voltage that increases with the temperature difference between its junctions. An RTD is a variable resistance whose electrical resistance increases with temperature in a precise and nearly linear manner.
![Temp%20Sensor.jpg](attachment:Temp%20Sensor.jpg)
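
Because an RTD's resistance changes almost linearly with temperature, a reading can be converted with the approximation R(T) = R0 (1 + αT). The sketch below assumes a PT100 element (R0 = 100 Ω at 0 °C, α ≈ 0.00385 per °C); the sensor type is an assumption made for illustration, and the linear model is only a first-order approximation.

```python
R0_OHMS = 100.0   # PT100 resistance at 0 degC (assumed sensor type)
ALPHA = 0.00385   # per degC, commonly quoted PT100 temperature coefficient


def rtd_temperature_c(resistance_ohms: float) -> float:
    """Invert the linear model R(T) = R0 * (1 + ALPHA * T)."""
    return (resistance_ohms / R0_OHMS - 1.0) / ALPHA


print(round(rtd_temperature_c(138.5), 1))   # 100.0
```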

***

## IIOT Gateways
***
![IOTGW-OVERVIEW-HIGHLIGHT-GW.png](attachment:IOTGW-OVERVIEW-HIGHLIGHT-GW.png)
***
- The IIOT gateway in our implementation is a Raspberry Pi Zero W (RPiZw), which is a small Single Board Computer (SBC).
- The RPiZw runs Raspbian, a Debian-based Linux OS similar to the Ubuntu Linux we use on our desktops.
- Several programs run on the RPiZw to provide the services an IIOT Gateway needs.
- To read more about the pi [click here](https://www.raspberrypi.org/documentation/)

### Services running on Raspberry
***
1. IIOT End Device Driver (IED Driver)
2. Mosquitto MQTT Broker
3. MQTT to IIOT Middleware Publish Service.

**The diagram below shows the microservices running on the Raspberry Pi**
***
![IOTGW-DETAILS.jpg](attachment:IOTGW-DETAILS.jpg)

### Setting up the Pi
***
### OS
***
- We need to install an OS on the Pi, either Raspbian directly or through the NOOBS installer. Raspbian is a Debian-based Linux distribution.
- For our purposes we will use the Raspbian OS
- You can download it from [here](https://www.raspberrypi.org/downloads/raspbian/)
***
![Raspbian.png](attachment:Raspbian.png)
***
- After the OS image has been downloaded to our PC, we need to write (burn) the image to a memory card (16GB) to be used in the Pi
- You can use [Win32 disk imager](https://sourceforge.net/projects/win32diskimager/) to burn the image
- We can also include the drivers on the image before burning it to an SD card, so that our end devices can interact with our Pi
***
### Username and password
***
- By default the username of the pi is 'pi' and the password is 'raspberry'.
- So the first thing we might want to do is to change these defaults for security reasons.
- After connecting our pi to a monitor and other peripherals we can easily change the username and password and also create additional users if necessary.

- At the command line, type sudo raspi-config and press Enter; a menu will appear.


![raspi-config-password.png](attachment:raspi-config-password.png)

- Simply select the change password option and type in the new password
- Alternatively, we can simply type "passwd" to change the password of the current user

- In order to change the username 'pi' we will have to log in as the root user, since it's not possible to rename an account while you're logged into it. To log in as root we first have to enable the account. To do so, type the following command while logged in as the default pi user:

> sudo passwd root

- Choose a secure password for the root user. You can disable the root account later if you wish.

- Now logout of the user pi using the command:

>logout

- Then log back in as the user 'root' using the password you just created. Now we can rename the default pi user. The following command renames the user 'pi' to 'newname'; replace this with whatever name you want. Type the command:

> usermod -l newname pi

- Now that the user name has been changed, the user's home directory should also be renamed to reflect the new login name:

> usermod -m -d /home/newname newname

- Now log out and log back in as newname. You can change the default password from raspberry to something more secure by typing the following command and entering a new password when prompted:

> passwd

- If you wish you can disable the root user account again but first double check newname still has 'sudo' privileges. Check the following update command works:

> sudo apt-get update

- If it works then you can disable the root account by locking the password:

> sudo passwd -l root
***
### Mosquitto broker
***
![IOTGW-DETAILS-HIGHLIGHT-MQTTBROKER.png](attachment:IOTGW-DETAILS-HIGHLIGHT-MQTTBROKER.png)
***
- To download and setup the mosquitto mqtt broker visit the following link: [Mosquitto Broker](https://mosquitto.org/download/)
- Follow the instructions specific to your OS
- After installation, start the mosquitto service on Linux by typing: <font color = red>sudo service mosquitto start</font>
- For illustration, you may use the terminal to subscribe to a topic (python/sin6 in this case) and see the sensor data being published to it
- We use Python code to publish the sensor data to the MQTT topic
- Another Python program subscribes to the topic and then pushes the data to the Database
- A **sample python code** to illustrate the publishing of data to an MQTT topic is as follows:

In [None]:
import paho.mqtt.client as mqttClient
import datetime
import time
import numpy as np
import matplotlib.pyplot as plot

def on_connect(client, userdata, flags, rc):
 
    if rc == 0:
 
        print("Connected to broker")
 
        global Connected                #Use global variable
        Connected = True                #Signal connection 
 
    else:
 
        print("Connection failed")
    
Connected = False

client = mqttClient.Client()                       #create new instance
# client.username_pw_set(user, password=password)  #set username and password
client.on_connect= on_connect                      #attach function to callback
client.connect("localhost", port=1883)             #connect to broker
client.loop_start()                                #start the loop


# Publish a sine wave to an MQTT topic
# Get x values of the sine wave
time1 = np.arange(0, 10, 0.1)
# Amplitude of the sine wave is the sine of a variable like time
amplitude = np.sin(time1)

# while Connected != True:    #Wait for connection
#     time.sleep(0.1)

try:
    # Publish one sample per second to the topic python/sin6
    for value in amplitude:
        client.publish("python/sin6", float(value))
        time.sleep(1)

except KeyboardInterrupt:
    print('Stopped by user')
client.disconnect()                                #close the connection to the broker
client.loop_stop()                                 #stop the background network loop

- You should now be able to see the published values in the terminal that is subscribed to the topic.
- In the next section we will use Python code to subscribe to this topic and push the data to the Database
***
### Setting up Internet Connectivity
***
- These steps are only needed for a wireless connection
- Alternatively we can simply connect an Ethernet cable
- Open the wpa_supplicant configuration file:
> sudo nano /etc/wpa_supplicant/wpa_supplicant.conf
- Then at the bottom of the file we can add our wireless network (there must be no spaces around the = sign)
> network={ <br>
ssid="testing" <br>
psk="testing passwd" <br>
}
***
### Providing a Static IP to the RPi
***
There are two ways of achieving this goal:
1. Assigning a static IP to the RPi using a router
2. Assigning a static IP to the RPi with DHCPCD

**Here we will only describe the latter:**
***
Raspbian Jessie, or Jessie Lite – the current Raspbian operating systems at the moment – have a DHCP client daemon (DHCPCD) that can communicate with the DHCP servers from routers. The configuration file of a DHCP client daemon allows you to change the private IP address of a computer and set it up in the long term. The following instructions will assign a static IPv4 address with 32 bits (not to be confused with an IPv6 address, which has 128 bits available) to the Raspberry Pi.

Before you begin with the assignment of a private IP address for Raspberry Pi, check whether DHCPCD is already activated using the following command:
> sudo service dhcpcd status

In case it’s not, activate DHCPCD as follows:
>sudo service dhcpcd start<br>
sudo systemctl enable dhcpcd

Now make sure that the file /etc/network/interfaces is in its original state. For this, the ‘iface’ configuration needs to be set to ‘manual’ for the interfaces.

To edit the DHCPCD settings, open the configuration file /etc/dhcpcd.conf with the following command:
> sudo nano /etc/dhcpcd.conf

To assign an IP address to the Raspberry Pi, use the setting ‘static ip_address=’ followed by the desired IPv4 address and the suffix ‘/24’ (an abbreviation of the subnet mask 255.255.255.0). For example, if you want to give the Pi the IPv4 address 192.168.0.4, you need the line ‘static ip_address=192.168.0.4/24’. The address must not already be in use anywhere else on the network, so it should also lie outside the address pool of the DHCP server.

You then still need to specify the address of your gateway and domain name server (usually both are the router). The Raspberry Pi sends traffic to the gateway address if the IP address it wants to reach is outside its own subnet (in the example, this would mean outside the range 192.168.0.0 to 192.168.0.255). In the following configuration, the IPv4 address 192.168.0.1 is used as an example for both the gateway and the DNS server. The complete configuration looks like this in our example (where a network cable is used for the internet connection):
> interface eth0<br>
static ip_address=192.168.0.4/24<br>
static routers=192.168.0.1<br>
static domain_name_servers=192.168.0.1<br>

Adjust the lines above to match the IPv4 address that you want to use for your Raspberry Pi and the address assigned to your router. Save the changes with ‘Ctrl + O’ and then press the enter key. Close the configuration file with ‘Ctrl + X’. Restart to adopt the newly assigned static IP address in the network:
> sudo reboot

Now use a ping command to check whether the Raspberry Pi is accessible in the network with its new IP address:
> ping raspberrypi.local

If the connection of the IP address was successful, you’ll see that you can reach it under the new IP address with a ping.
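
The meaning of the ‘/24’ suffix and the rule for when traffic goes through the gateway can be checked with Python's standard `ipaddress` module. This is a small sketch using the example addresses from this section; the helper name `needs_gateway` is made up for illustration.

```python
import ipaddress

# The static address from the example, with its /24 prefix.
pi = ipaddress.ip_interface("192.168.0.4/24")

print(pi.netmask)             # 255.255.255.0
print(pi.network)             # 192.168.0.0/24
print(pi.ip.max_prefixlen)    # 32 (an IPv4 address has 32 bits)


def needs_gateway(destination: str) -> bool:
    """True if the destination lies outside the Pi's own subnet."""
    return ipaddress.ip_address(destination) not in pi.network


print(needs_gateway("192.168.0.1"))   # False: the router is on the same subnet
print(needs_gateway("8.8.8.8"))       # True: must be sent via the gateway
```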

## IIOT Middleware
***
![IOTGW-OVERVIEW-HIGHLIGHT-MW.png](attachment:IOTGW-OVERVIEW-HIGHLIGHT-MW.png)
***
- **Let's setup the components (microservices) of the Middleware**
1. Elasticsearch Database and Elasticsearch DB Client Driver using HTTP REST APIs.
2. Kibana - to browse the data in the database
3. MQTT Client to receive data from MQTT Queue and pass to Elasticsearch REST Client.

### Setting up Elasticsearch
***
- Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of open source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection of lightweight shipping agents known as Beats for sending data to Elasticsearch.
- To install elasticsearch on your Pc visit the following [link](https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-install.html)
- How does Elasticsearch work?
> Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once indexed in Elasticsearch, users can run complex queries against their data and use aggregations to retrieve complex summaries of their data. From Kibana, users can create powerful visualizations of their data, share dashboards, and manage the Elastic Stack.
- What is an Elasticsearch index?
> An Elasticsearch index is a collection of documents that are related to each other. Elasticsearch stores data as JSON documents. Each document correlates a set of keys (names of fields or properties) with their corresponding values (strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data).
Elasticsearch uses a data structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
During the indexing process, Elasticsearch stores documents and builds an inverted index to make the document data searchable in near real-time. Indexing is initiated with the index API, through which you can add or update a JSON document in a specific index.
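
To make the idea of an inverted index more concrete, here is a toy version in plain Python. It is only a sketch of the concept (a mapping from each word to the documents containing it) and says nothing about how Elasticsearch implements it internally; the sample documents are invented.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


docs = {
    1: "vibration sensor on screen printer",
    2: "proximity sensor on reflow oven",
    3: "current meter on reflow oven",
}
index = build_inverted_index(docs)
print(sorted(index["sensor"]))   # [1, 2]
print(sorted(index["oven"]))     # [2, 3]
```

Looking up a word is a single dictionary access, which is why full-text searches over an inverted index are fast.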

- After the installation is complete we need to start the elasticsearch service using the terminal
- Type <font color = red>sudo /bin/systemctl enable elasticsearch.service </font> to enable elastic search
- Type <font color = red>sudo systemctl start elasticsearch.service</font> to start elastic search service
- Type <font color = red>sudo systemctl stop elasticsearch.service</font> to stop elastic search service

**Elasticsearch API**
Now that we have installed Elasticsearch, let's understand how we can store data in the DB using the REST APIs provided by Elasticsearch.

Please visit the link below to read more about REST and HTTP [click here](https://code.tutsplus.com/tutorials/a-beginners-guide-to-http-and-rest--net-16340)

1. REST APIs use HTTP to query, insert, update and delete data from the DB.
2. HTTP is the protocol that makes the World Wide Web possible.
3. Every time we visit a page in a Web Browser, the browser makes an HTTP GET request to fetch the page.
4. Also, every time we submit a form on a webpage, the form data is submitted using a POST request.

The common HTTP verbs are -

1. GET - Query data from DB
2. POST - Insert data into DB

In python, we can use the simple and easy to use Requests library to make HTTP requests.

Install the library by running the cell below.

In [2]:
!pip install requests



- Now let us try to make some HTTP get requests using python by running the code below

In [3]:
import requests
response=requests.get("http://httpbin.org/get", )
print(response.text)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-5e3a50bd-85650acdee47cc4dd73ff527"
  }, 
  "origin": "14.139.128.32", 
  "url": "http://httpbin.org/get"
}



- Let's make another request this time with the get parameters <br>
Run the cell below

In [4]:
response=requests.get("http://httpbin.org/get", params={'user':1})
print(response.text)

{
  "args": {
    "user": "1"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-5e3a50bd-3002e298d2fdc12836bb3348"
  }, 
  "origin": "14.139.128.32", 
  "url": "http://httpbin.org/get?user=1"
}



**Let's now try HTTP POST with Python requests**
1. HTTP POST requests can be used to send data to a server.
2. The data sent over a POST request can be in JSON format
3. Read more about JSON [here](https://en.wikipedia.org/wiki/JSON#Example)

In [5]:
response=requests.post("http://httpbin.org/post",json={"user":1})
print(response.text)

{
  "args": {}, 
  "data": "{\"user\": 1}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "11", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-5e3a50be-e42c86a64378615576589ad3"
  }, 
  "json": {
    "user": 1
  }, 
  "origin": "14.139.128.32", 
  "url": "http://httpbin.org/post"
}



**Elasticsearch API (continued)**
***
![IOTGW-DETAILS-HIGHLIGHT-MWHTTPCLIENT.png](attachment:IOTGW-DETAILS-HIGHLIGHT-MWHTTPCLIENT.png)
***
Now that we have figured out how to make GET and POST requests using python, we will try to insert some data into the Elasticsearch DB. In the process we will learn about the HTTP REST DB Client which is highlighted in the diagram above.

Elasticsearch stores data in indices, which are similar to tables in a relational DB like MySQL.

Let's call the index final6, the name used in the code below. We can name it anything sensible. Another sensible choice is the name of the sensor from which data is inserted into the DB, e.g. lightsensor, moisturesensor, temperaturesensor, etc. Note that Elasticsearch index names must be lowercase.

To insert data into an index called INDEXNAME, we post the data to the URL - http://localhost:9200/INDEXNAME/_doc

Run the following cell to subscribe to the MQTT topic and insert each received sine-wave sample into the Elasticsearch DB as a JSON document.

In [None]:
import requests
import paho.mqtt.client as mqtt
import datetime
import time
import numpy as np
import matplotlib.pyplot as plot

def on_connect(client, userdata, flags, rc):
    print("Connected")
    #Subscribing to topic
    client.subscribe("python/sin6/#")
    
def on_message(client, userdata, msg):
    print(str(msg.payload))
    hi = float(msg.payload)
    document={"raw_data": hi,"timestamp":datetime.datetime.now().isoformat()}
    ElasticsearchBaseURL="http://localhost:9200/"
    index="final6"
    url= ElasticsearchBaseURL + index + "/_doc"
    print("Going to try http POST")
    print(document)
    response=requests.post(url,json=document)
    
client = mqtt.Client()
# client.username_pw_set(user, password=password)    #set username and password (must be called before connect)
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", port=1883)          #connect to broker

client.loop_forever()                           #block and process incoming messages

Connected


- **Now let us review the document stored in the elasticsearch DB**

In [None]:
import requests
import paho.mqtt.client as mqttClient
import datetime
import time

index = 'final6'
ElasticsearchBaseURL='http://localhost:9200/'
response=requests.get(ElasticsearchBaseURL +index + "/_search")
print(response.json())

- **Summary**
***
1. Elasticsearch DB can be used to store and retrieve JSON documents (in an INDEX) created from sensor data
2. The HTTP methods POST and GET are used through the REST API to insert data into and query data from an INDEX
3. HTTP DELETE method can be used to delete an INDEX
4. To insert data we POST to the ElasticsearchBaseURL (default: http://localhost:9200) with INDEX name appended and _doc appended to the path i.e http://localhost:9200/INDEXNAME/_doc
5. To query the data stored in an INDEX, we use the URL http://localhost:9200/INDEXNAME/_search
6. To delete an INDEX, we make an HTTP DELETE request to the URL http://localhost:9200/INDEXNAME
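
The URL patterns in the summary can be collected into a few small helpers. This is a sketch only: the helper names are made up for illustration, the base URL assumes a default local installation, and `delete_index` needs a running Elasticsearch instance and the Requests library to actually do anything.

```python
BASE_URL = "http://localhost:9200"   # assumed default local Elasticsearch address


def index_url(index: str) -> str:
    """URL of the index itself (target of an HTTP DELETE)."""
    if index != index.lower():
        raise ValueError("Elasticsearch index names must be lowercase")
    return f"{BASE_URL}/{index}"


def doc_url(index: str) -> str:
    """URL to POST a new JSON document to."""
    return index_url(index) + "/_doc"


def search_url(index: str) -> str:
    """URL to GET the documents stored in the index."""
    return index_url(index) + "/_search"


def delete_index(index: str) -> bool:
    """Delete the whole index; returns True if Elasticsearch answered 200."""
    import requests   # imported here so the URL helpers work without it
    response = requests.delete(index_url(index))
    return response.status_code == 200


print(doc_url("final6"))      # http://localhost:9200/final6/_doc
print(search_url("final6"))   # http://localhost:9200/final6/_search
```

Deleting an index removes all of its documents, so `delete_index("final6")` should only be run when the stored sine-wave data is no longer needed.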
***

### Setting up Kibana
***
**What is Kibana used for?**
> Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, line graphs, pie charts, and maps. Kibana also includes advanced applications such as Canvas, which allows users to create custom dynamic infographics based on their data, and Elastic Maps for visualizing geospatial data.

Visit the link to install Kibana on your system: https://www.elastic.co/guide/en/kibana/current/targz.html

- *Running Kibana*
***
> Launch ./elasticsearch from elasticsearch-x.y.z/bin/ <br>
Launch ./kibana from kibana-x.y.z/bin

- *Test Kibana*
***
> Visit Kibana @ http://localhost:5601

- To start the kibana service type <font color = red>sudo systemctl start kibana.service</font>

**Add index pattern in Kibana management**
***
- Visit the Management page by clicking on the Management tab in the Left Sidebar.
- Add a new index pattern called final6

**Discover in Kibana**
- Click on the Discover tab on the left sidebar.
- Select the index pattern we just created - final6
- Select the time range in the top right bar to last one hour
- Now we should be able to see the data we just stored in Elasticsearch DB.
***
### Kibana Aggregations

***
- Aggregations are one of the key concepts for understanding and using Kibana.
- They provide the basis of all visualisations
- The aggregations are primarily of 2 types:
> 1. Bucket aggregations
> 2. Metric aggregations

**Bucket aggregations:**
- Each bucket may contain many documents/data points, a single one, or none
- Buckets may overlap, hence one doc/data point may be present in 2 or more buckets
- After bucket aggregation has been done, some docs/data points might not be present in any bucket
***
![Bucket%20aggregations.png](attachment:Bucket%20aggregations.png)
***
**Metric aggregation**
- These aggregations are responsible for calculating a value for each bucket (based on the docs in each bucket)
***
![Metric%20aggregations.png](attachment:Metric%20aggregations.png)
***
- Now if we wish to visualize the aggregated data as a pie chart say then:
***
![Pie%20chart.png](attachment:Pie%20chart.png)
***
- Each bucket represents one slice of the pie
- The size of each slice will be determined by the value of the metric aggregation
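
The interplay of bucket and metric aggregations can be imitated in plain Python. The sketch below is a toy illustration, not Kibana or Elasticsearch code: it groups documents into histogram buckets of a fixed interval (bucket key = floor(value / interval) × interval) and then computes a metric per bucket. The field name `raw_data` matches the documents stored earlier; the sample values are invented.

```python
from collections import defaultdict
from statistics import mean

docs = [{"raw_data": v} for v in (-0.9, -0.4, -0.1, 0.2, 0.3, 0.8)]


def histogram_buckets(documents, field, interval):
    """Bucket aggregation: group documents by value range."""
    buckets = defaultdict(list)
    for doc in documents:
        key = (doc[field] // interval) * interval   # lower edge of the bucket
        buckets[key].append(doc)
    return dict(buckets)


def metric_per_bucket(buckets, field, metric):
    """Metric aggregation: one value per bucket, computed from its documents."""
    return {key: metric([d[field] for d in members])
            for key, members in sorted(buckets.items())}


buckets = histogram_buckets(docs, "raw_data", interval=0.5)
print(metric_per_bucket(buckets, "raw_data", len))    # Count per bucket
print(metric_per_bucket(buckets, "raw_data", mean))   # Average per bucket
```

In a pie chart built from this data, each bucket key would be one slice and the chosen metric (for example the count) would set its size.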

***
**The different types of Bucket and Metric aggregations are:**
***
**Bucket aggregations**
> Histogram <br>
Date Histogram <br>
Range<br>
Date Range<br>
IPV4 Range<br>
Filters<br>
Geohash<br>
Terms<br>
Significant terms<br>
***
**Metric aggregations**
> Count<br>
Sum <br>
Average<br>
Median<br>
Min<br>
Max<br>
Unique count<br>
Standard Deviation<br>
Percentiles<br>
Percentile ranks<br>

# 2. Data analysis and Signal Processing

### Contents:
- [Overview](#Overview)
- [A glance at raw data](#A-glance-at-raw-data)
- [Parameter Estimation and Machine State Identification Algorithms](#Parameter-Estimation-and-Machine-State-Identification-Algorithms)
    - [1: Line Loader](#1:-Line-Loader)
    - [2: Screen printer](#2:-Screen-printer)


### Overview
***
To get a better idea of the data analytics that we have to run on the sensor data, we need a complete picture of the factory's architecture. Only by knowing how the various machines interact with each other can we develop algorithms for tasks such as state detection.
Here we will use the example of the Vinyas factory, whose architecture is shown below:
***
![Layout.png](attachment:Layout.png)
***
The specific details are as follows:<br>
1. Entry loader: vibration sensor.<br>
2. Screen printer: proximity sensor (at exit), vibration sensor, current meter (single phase).<br>
3. Pick-place machine #1: proximity sensor (at entry), vibration sensor, current meter (3 phase).<br>
4. Pick-place machine #2: proximity sensor (at exit), vibration sensor, current meter (3 phase).<br>
5. Reflow oven: proximity sensor (at entry and exit), vibration sensor, current meter (3 phase).<br>
6. The first baking oven: temperature sensor, current meter (3 phase).<br>
7. The second baking oven: temperature sensor, current meter (3 phase).<br>
The ground floor of the factory was fitted with current meters (6), vibration sensors (5), proximity sensors (5), and temperature sensors (2) as listed above, at the locations shown in the layout figure.

**Delays in machines**
***
![image.png](attachment:image.png)
***
### A glance at raw data
***
The following figure indicates data obtained from a pick-and-place machine. It clearly indicates the relationship between three modes of sensing – vibration, current, and proximity sensors.
***
![Data%20Analysis.png](attachment:Data%20Analysis.png)
***
In the figure above, the first plot shows the vibration sensor data (one component of the acceleration) on pick-and-place machine 1 in the Vinyas factory. Note that there are two such machines. The second plot shows the three currents (3 phase) drawn by this machine. The third plot is the proximity sensor data. The sensor is placed at the entry of the machine. When its value is equal to 1, a board has been admitted into the machine.
The above plot can be used to correlate different events. The easiest one to understand is the admission of a board into the pick-and-place equipment: the proximity sensor value goes high. We can then observe the surge in current drawn by the equipment and the considerable increase in vibration when the machine starts operating on the PCB.
Above is one such example of the raw data that is retrieved from the various sensors implemented on the production floor.
With our current architecture, we have used various data analysis tools and techniques to infer the different events that take place during the production of PCBs across each machine on the assembly line and associate the events across the line to provide end-to-end traceability. As shown in the figure below, the correlation of data across machines enables boards to be tracked as they move along the assembly line.
***
![Data%20alalysis2.png](attachment:Data%20alalysis2.png)
***
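
The correlation described above (a proximity rising edge followed by a current surge) can be sketched on synthetic data. This is a minimal illustration only: the column names, the 1 Hz sampling, the 10-second comparison window, and all values are assumptions made for the example, not the notebook's actual data or method.

```python
import numpy as np
import pandas as pd

# Synthetic 1 Hz data with made-up values, not measurements from the factory.
t = pd.Timestamp("2019-07-02 12:00:00") + pd.to_timedelta(np.arange(60), unit="s")
proximity = np.zeros(60, dtype=int)
proximity[20:25] = 1          # board in front of the entry sensor
current = np.full(60, 1.0)    # idle current in amperes (assumed)
current[22:40] = 4.0          # machine working on the board
df = pd.DataFrame({"proximity": proximity, "current": current}, index=t)

# Rising edges: samples where proximity changes from 0 to 1.
rising = df.index[df["proximity"].diff() == 1]

def mean_current(frame, start, seconds):
    """Mean current over the half-open window [start, start + seconds)."""
    stop = start + pd.Timedelta(seconds=seconds)
    window = frame.loc[(frame.index >= start) & (frame.index < stop), "current"]
    return window.mean()

# Compare the mean current in the 10 s before and after each board admission.
surges = []
for edge in rising:
    before = mean_current(df, edge - pd.Timedelta(seconds=10), 10)
    after = mean_current(df, edge, 10)
    surges.append((edge, before, after))
    print(edge, "mean current before:", before, "after:", after)
```
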
Subsequently, based on those events, the various states of the machine can be identified. For example, we want to know when a board is being processed by a particular machine and when the machine is idle. For this, we used Python packages such as pandas, SciPy (`find_peaks`), and NumPy. With our current analysis algorithms, we have been able to infer the following parameters for each machine:
1. Machine ID: Line-Loader, Screen-Printer, Pick-and-Place1, Pick-and-Place2
2. Machine States: Idle, Loading, Printing, Processing, Maintenance, etc.
3. Number of PCBs processed during the given period
4. Arrival and departure instants of PCBs
5. Delays in machines due to maintenance
6. Machine Utilization Factor
7. PCB processing time
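
As a rough illustration of how some of these parameters follow from a table of detected events, the sketch below computes the number of boards, the mean processing time, and a utilization factor for a given period. The function name `summarise`, the table layout, and the definition of utilization as busy time divided by observed time are assumptions made for this example. The values are made up.

```python
import pandas as pd

# Hypothetical event table: one row per processed board, indexed by detection
# time, with the time the machine spent on the board in seconds.
events = pd.DataFrame(
    {"working_time": [8.0, 9.0, 8.5, 8.5]},
    index=pd.to_datetime([
        "2019-07-02 12:00:00", "2019-07-02 12:00:30",
        "2019-07-02 12:01:00", "2019-07-02 12:01:30",
    ]),
)

def summarise(events, period_start, period_end):
    """Summarise the events that fall in [period_start, period_end)."""
    start, end = pd.Timestamp(period_start), pd.Timestamp(period_end)
    in_period = events.loc[(events.index >= start) & (events.index < end)]
    period_s = (end - start).total_seconds()
    busy_s = in_period["working_time"].sum()
    return {
        "boards": len(in_period),
        "mean_processing_time_s": in_period["working_time"].mean(),
        # Assumed definition: fraction of the period spent working on boards.
        "utilization": busy_s / period_s,
    }

summary = summarise(events, "2019-07-02 12:00:00", "2019-07-02 12:02:00")
print(summary)
```
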
***
### Parameter Estimation and Machine State Identification Algorithms
***
The algorithms used to obtain the above parameters are explained below in detail, machine by machine:
***
### 1: Line Loader
***
![Use%20this.png](attachment:Use%20this.png)
***
**Line loader board detections**
***
![Line%20loader%20boards%20detected.png](attachment:Line%20loader%20boards%20detected.png)
***
For the loader, vibration data was used. The parameters calculated for the loader were the number of boards (throughput) and the loading (processing) time per board. As the loader and the screen printer were powered by the same line in series, energy estimations were not performed for this machine. The magnitude of the acceleration in the X and Z axes was used. The Y component of acceleration was not used, to prevent acceleration due to gravity from overshadowing the vibration in the other axes.
From the raw vibration signals, the resultant along the X and Z axes was calculated. A centered rolling standard deviation with a window size of 100 samples (approx. 1.3 seconds) was then applied to the signal. This window was selected because, in the raw data, a loading event was seen to last approximately 2 seconds, so a window of this size further enhances the impulses of these events. The processed signal is shown in orange in the figure above. The green line in the figure marks the height threshold of 0.01 applied to detect peaks, and the peaks above it were identified as loading events. The loading events are shown in the figure as blue stems. A time threshold of 12 seconds was also applied so that spurious vibrations exceeding the height threshold immediately after a board detection were not counted as boards (removing false positives). This value was chosen because the screen printer takes 12 seconds to process one board, so the loader cannot push a new board into the screen printer until the previous board has been processed completely.
The working time, that is, the time taken for one loading event, was calculated from the width of the corresponding peak of the processed signal.
*The algorithm developed for this is shown below:*

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.signal
import numpy as np
import seaborn as sns
import sys

## changing the figure size
plt.rcParams["figure.figsize"]=[15,5]

raw_ld=pd.read_csv('loader.csv', usecols=['timestamp', 'data.ax', 'data.az'])
raw_ld.sort_values(by=['timestamp'], inplace=True)
raw_ld.dropna(axis=0, inplace=True)
# reset the index after dropping rows so that row labels match the positions returned by find_peaks
raw_ld.reset_index(drop=True, inplace=True)
raw_ld.timestamp=pd.to_datetime(raw_ld.timestamp)

#Load LD data: resultant acceleration along the X and Z axes
raw_ld["acc"] = ( raw_ld["data.ax"]**2 + raw_ld["data.az"]**2 ) ** 0.5
# copy so that the new columns are not written to a view of raw_ld
acc=raw_ld[["timestamp", "acc"]].copy()
test_ld=acc
test_ld['acc_raw']=test_ld['acc']
# centered rolling standard deviation over 100 samples
test_ld['acc']=test_ld['acc'].rolling(100, center=True).std()

timethresh=12
test_ld.insert(2,'state',0)

%matplotlib inline

plt.plot(test_ld['timestamp'], test_ld['acc'])

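#Get boards from Loader
# height: threshold on the rolling std; distance: 12 s expressed in samples (72 samples per second)
board=scipy.signal.find_peaks(test_ld.acc, height=(0.01), distance=timethresh*72, width=1)
# spacing between consecutive detections, in samples
differ=np.diff(board[0])
differ=differ.tolist()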

for x in board[0]:
    test_ld.at[x, 'state'] = 1

ld_boards=test_ld[test_ld['state']==1]
## creating another data frame which only has value of states as 1
print('Boards detected in LD:', ld_boards.shape[0])
# counting no. of rows via 'shape' of the new data frame to calculate the no. of boards

fig, ax1 = plt.subplots()


# ax1.plot(test_ld['timestamp'], test_ld['acc_raw'], 'green')
ax1.plot(test_ld['timestamp'], np.ones(test_ld.shape[0])*0.01, 'green')
ax1.plot(test_ld['timestamp'], test_ld['acc'], 'orange')

# ax1.plot(test_ld['timestamp'], np.ones(test_ld.shape[0]*0.01), 'green')
# ax1.xaxis.set_major_locator(mdates.DateFormatter('%H:%M'))

ax1.set_xlabel('Time')

ax2 = ax1.twinx()

ax2.stem(ld_boards['timestamp'], ld_boards['state'], 'b:')

plt.title('Board Detections')

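# peak widths are returned in samples (find_peaks measures them at half prominence by default); dividing by 72 samples per second gives seconds
loading_delays_rawdf=pd.DataFrame({"sample_number":board[0], "working_time":board[1]['widths']/72})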

LD_events=pd.DataFrame({"timestamp":test_ld.iloc[loading_delays_rawdf.sample_number].timestamp, "event":1, "working_time":(loading_delays_rawdf.working_time).tolist()})

LD_events.index=LD_events.timestamp
LD_events.drop('timestamp', axis=1, inplace=True)
LD_events['energy']=float('nan')

LD_events

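# sns.distplot is deprecated in recent seaborn releases; histplot with a KDE overlay gives a comparable plot
sns.histplot(LD_events.working_time, kde=True, stat='density')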
plt.title('Loader Loading Times Histogram')

print('LD loading time mode: ', LD_events.working_time.mode().mean())
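
To try the same processing without `loader.csv`, the sketch below runs it on a synthetic signal. Everything about the signal is made up for illustration: background noise plus three bursts, of which the second is a weaker burst placed 3 seconds after the first to stand in for a spurious vibration. The rate of 72 samples per second is the one implied by the cell above.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

FS = 72                      # samples per second, the rate implied by the cell above
rng = np.random.default_rng(0)

# One minute of synthetic vibration: low background noise plus three bursts.
signal = rng.normal(0.0, 0.002, 60 * FS)
bursts = [(10, 0.05), (13, 0.03), (30, 0.05)]   # (start in s, noise amplitude)
for start_s, amplitude in bursts:
    i = start_s * FS
    signal[i:i + 2 * FS] += rng.normal(0.0, amplitude, 2 * FS)

# Same steps as the loader cell: centered rolling std, then peak picking with
# a height threshold and a 12-second minimum distance between detections.
activity = pd.Series(signal).rolling(100, center=True).std().fillna(0.0)
peaks, props = find_peaks(activity.to_numpy(), height=0.01,
                          distance=12 * FS, width=1)

print("boards detected:", len(peaks))
print("detection times (s):", np.round(peaks / FS, 1))
print("working times (s):", np.round(props["widths"] / FS, 2))
```

Because the weaker burst lies within 12 seconds of a stronger one, the `distance` argument discards it and only the two stronger bursts are counted as boards.
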

### 2: Screen printer
***
![Screen%20Printer%20event%20detection.png](attachment:Screen%20Printer%20event%20detection.png)
***
![SP%20printing%20time.png](attachment:SP%20printing%20time.png)
***
![SP2.png](attachment:SP2.png)
***
The screen printer board detection algorithm uses single-phase current data. The screen printer was found to undergo two events:
1. Printing: when the screen printer applies solder paste on the PCBs, and
2. Cleaning: when the screen printer vacuum cleans the stencil.

Hence the parameters that were calculated for the screen printer were the number of boards (throughput), printing time, cleaning time, energy consumed during printing, energy consumed during cleaning and number of boards printed per cleaning event.
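
The last of these parameters, the number of boards printed per cleaning event, can be derived from a time-ordered event log. The sketch below is one possible way to do it and is not taken from the notebook: the event log is made up, and it only assumes the event codes 1 for printing and 2 for cleaning that the code cell further below uses.

```python
import pandas as pd

# Hypothetical event log: 1 = printing, 2 = cleaning, one event every 30 s.
codes = [1, 1, 1, 2, 1, 1, 2, 1]
times = pd.Timestamp("2019-07-02 12:00:00") + pd.to_timedelta(
    list(range(0, 240, 30)), unit="s")
events = pd.DataFrame({"event": codes}, index=times).sort_index()

# Running count of cleanings: every printing event between two cleanings
# carries the same count, which serves as a cycle label.
cleanings_so_far = (events["event"] == 2).cumsum()

# Count the printing events under each cycle label.
is_print = events["event"] == 1
boards_per_cycle = cleanings_so_far[is_print].value_counts().sort_index()
print(boards_per_cycle)
```

The last entry counts boards printed after the final cleaning in the log, so it describes a cycle that may still be incomplete.
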
For printing events, a centered rolling sum with a window size of 6 samples (6 seconds, since current data is sampled every second) was applied to the raw current data (orange waveform). The processed signal is shown in blue in the figure above. The window size of 6 seconds was selected because the raw data showed that a printing event took approximately 8 seconds in most cases. A larger window, closer to 8, would increase the error because the rolling sum starts rising earlier as more data points are included. A window of 6 seconds included enough samples to produce a peak that is large compared to the data points when the screen printer was idle, and a height threshold between 11 and 15 was applied to it. A distance threshold of 12 seconds was passed to the find peaks function so that other peaks in the vicinity of the current spikes during board processing were not incorrectly identified as boards. As can be seen in the figure, the blue signal includes at least 2 peaks in the processing part, but with the distance threshold only the largest peak was identified as a printing event.
For cleaning events, a centered rolling sum with a window size of 12 samples (12 seconds) was applied to the signal. This window was selected because the raw data showed that a cleaning event took approximately 12 seconds. Even though the window was equal to the average cleaning time observed, the processed signal in blue formed a trapezoidal structure, as can be seen in the figure, and the raw cleaning peak in orange was almost rectangular at the bottom. Given these shapes, a height threshold of 60 was chosen so that it not only detects the peaks above it but also captures the entire width of the raw cleaning signal, which gives the working time of the cleaning event. The same distance threshold of 12 seconds was applied, since a cleaning event took approximately 12 seconds.
The working times were calculated using the widths of the peaks at the chosen threshold. The threshold was picked so that it captures the entire printing and cleaning time. As can be seen in the figures, the threshold on the processed signal captures all the spikes in the current data, which were then attributed to a printing or cleaning event.
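
When reproducing these widths, it may help to check the level at which SciPy actually measures them. If `find_peaks` is called with `width` but without `rel_height`, the width is taken at half of the peak's prominence rather than at the `height` argument. The sketch below shows this on a synthetic rectangular pulse. The current values and the pulse length are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Synthetic 1 Hz current trace in amperes (illustrative values only).
current = np.full(60, 1.0)
current[20:32] = 6.0                     # a rectangular 12-second pulse

# 12-sample rolling sum; rows with an incomplete window are dropped.
rolled = pd.Series(current).rolling(12).sum().dropna().to_numpy()

# Same call pattern as the notebook: height threshold plus width=1.
peaks, props = find_peaks(rolled, height=60, width=1)

print("peak height:", props["peak_heights"])
print("level at which the width is measured:", props["width_heights"])
print("width in samples:", props["widths"])
```

In this idealised case the rolling sum is a triangle that rises from the idle level of 12 to a peak of 72, so the half-prominence level is 42 and the width measured there equals the 12-second pulse length. With real data the peak shape differs, so it is worth confirming the match against the raw signal.
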
In this case, the energy consumed during the events was calculated by integrating the current measured over that duration and multiplying the result by the constant supply voltage of 230 V. As we were able to separate the printing events from the cleaning events and also identify when the screen printer was idle (the raw current level is lower when the screen printer is not working on a board), the energy consumed during each machine state was found. The total energy consumed by the machine over any window of time can therefore also be calculated.
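
A small worked example of this calculation, assuming 1-second samples and treating current times voltage as the power drawn (power factor is ignored, as in the description above): the energy in kWh is the sum of the current samples multiplied by the sample period and by 230 V, divided by 3,600,000 J per kWh. The function name `energy_kwh` and the current values are hypothetical.

```python
import numpy as np

VOLTAGE = 230.0            # volts, the constant supply stated in the text
SAMPLE_PERIOD_S = 1.0      # the current is sampled once per second
JOULES_PER_KWH = 3_600_000

def energy_kwh(current_amps, voltage=VOLTAGE, dt=SAMPLE_PERIOD_S):
    """Approximate energy of one event from its current samples."""
    ampere_seconds = np.sum(current_amps) * dt      # rectangle-rule integral
    return ampere_seconds * voltage / JOULES_PER_KWH

# Hypothetical current samples (A) covering one 8-second printing event.
printing_current = np.array([1.5, 2.0, 2.5, 2.5, 2.5, 2.0, 1.5, 1.0])

energy = energy_kwh(printing_current)
duration_s = len(printing_current) * SAMPLE_PERIOD_S
mean_power_w = energy * JOULES_PER_KWH / duration_s

print("energy (kWh):", energy)
print("mean power (W):", mean_power_w)
```
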
***
*The algorithm developed is shown below:*

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.signal
import numpy as np
import seaborn as sns


plt.rcParams["figure.figsize"]=[15,5]
raw_sp= pd.read_csv('screenprinter.csv', usecols =['timestamp', 'data.A1'])
raw_sp.dropna(axis=0, inplace=True)
# sort_values returns a new frame unless inplace=True is given
raw_sp.sort_values(by=['timestamp'], inplace=True)
raw_sp.reset_index(drop=True, inplace=True)
raw_sp['timestamp'] =  pd.to_datetime(raw_sp['timestamp'])

#Get boards from Screenprinter
# copy so that the new columns are not written to a view of raw_sp
test_sp=raw_sp[['data.A1','timestamp']].copy()
# events=scipy.signal.find_peaks(test_sp['data.A1'], height=(2.22, 5), width=1)

# 6-sample centered rolling sum for printing events, 12-sample sum for cleaning events
test_sp['sum']=test_sp['data.A1'].rolling(6, center=True).sum()
test_sp['sum_mean']=test_sp['sum'].rolling(12,center=True).mean()
test_sp['sum_forcleaning']=test_sp['data.A1'].rolling(12, center=True).sum()

%matplotlib

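# state flag per sample: 2 = idle, overwritten with 0 wherever an event is detected
test_sp['idle']=2

# minimum spacing between detected peaks, in samples (1 sample = 1 second)
sptimethresh=30

# printing events: peaks of the 6-sample rolling sum with a height between 12.5 and 15
printing_delays_raw=scipy.signal.find_peaks(test_sp['sum'],height=(12.5, 15),distance=sptimethresh, width=1)
printing_delays_raw_df=pd.DataFrame({"sample_number":printing_delays_raw[0], "working_time":printing_delays_raw[1]['widths']})
# cleaning events: peaks of the 12-sample rolling sum with a height of at least 60
cleaning_delays_raw=scipy.signal.find_peaks(test_sp['sum_forcleaning'], height=60, distance=sptimethresh, width=1)
cleaning_delays_raw_df=pd.DataFrame({"sample_number":cleaning_delays_raw[0], "working_time":cleaning_delays_raw[1]['widths']})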

# mark the samples covered by each printing or cleaning peak as not idle
# (.ix was removed from pandas; .loc gives the same label-based, inclusive slicing here)
for index, i in enumerate(printing_delays_raw[0]):
    test_sp.loc[int(printing_delays_raw[1]['left_ips'][index]) : int(printing_delays_raw[1]['right_ips'][index]), 'idle']=0

for index, i in enumerate(cleaning_delays_raw[0]):
    test_sp.loc[int(cleaning_delays_raw[1]['left_ips'][index]) : int(cleaning_delays_raw[1]['right_ips'][index]), 'idle']=0

# the remaining plateaus of value 2 are the idle periods
idle_delays_raw=scipy.signal.find_peaks(test_sp['idle'], height=.5, width=1)
idle_delays_raw_df=pd.DataFrame({"sample_number":idle_delays_raw[0], "working_time":idle_delays_raw[1]['widths']})

SP_events=pd.DataFrame({"timestamp":test_sp.iloc[printing_delays_raw_df.sample_number].timestamp, "event":1, "working_time":printing_delays_raw_df.working_time.tolist()})
cleanings=pd.DataFrame({"timestamp":test_sp.iloc[cleaning_delays_raw_df.sample_number].timestamp, "event":2, "working_time":cleaning_delays_raw_df.working_time.tolist()})

SP_events.reset_index(inplace=True)
cleanings.reset_index(inplace=True)

idles=pd.DataFrame({"timestamp":test_sp.iloc[idle_delays_raw_df.sample_number].timestamp, "event":0, "working_time":idle_delays_raw_df.working_time.tolist()})
idles.reset_index(inplace=True)

# energy per event in kWh: sum of 1-second current samples (A*s) x 230 V / 3,600,000 J per kWh
for x, row in SP_events.iterrows():
    SP_events.loc[x,'energy']=test_sp.loc[int(printing_delays_raw[1]['left_ips'][x]):int(printing_delays_raw[1]['right_ips'][x]), 'data.A1'].sum()*230/3600000

for x, row in cleanings.iterrows():
    cleanings.loc[x,'energy']=test_sp.loc[int(cleaning_delays_raw[1]['left_ips'][x]):int(cleaning_delays_raw[1]['right_ips'][x]), 'data.A1'].sum()*230/3600000

for x, row in idles.iterrows():
    idles.loc[x,'energy']=test_sp.loc[int(idle_delays_raw[1]['left_ips'][x]):int(idle_delays_raw[1]['right_ips'][x]), 'data.A1'].sum()*230/3600000

    
SP_events.index=SP_events.timestamp
SP_events.drop('timestamp', axis=1, inplace=True)
SP_events.drop('index', axis=1, inplace=True)

cleanings.index=cleanings.timestamp
cleanings.drop('timestamp', axis=1, inplace=True)
cleanings.drop('index', axis=1, inplace=True)

idles.index=idles.timestamp
idles.drop('timestamp', axis=1, inplace=True)
idles.drop('index', axis=1, inplace=True)


# DataFrame.append was removed in pandas 2.0; pd.concat stacks the three event tables
SP_events=pd.concat([SP_events, cleanings, idles])

print('Printing time Mean: ', SP_events.working_time[SP_events['event']==1].mean())
print('Cleaning time Mode: ', SP_events.working_time[SP_events['event']==2].mode().mean())
SP_events[SP_events.event==1].shape[0]

SP1spower=SP_events[SP_events.event==1]['energy']*3600000/SP_events[SP_events.event==1]['working_time']
SP1spower.mean()

SP0spower=SP_events[SP_events.event==0]['energy']*3600000/SP_events[SP_events.event==0]['working_time']
SP0spower.mean()

SP_events[SP_events.event==1].shape[0]
SP_events[SP_events.event==2].shape[0]

plt.plot(test_sp['timestamp'], test_sp['data.A1'], 'orange')
plt.stem(SP_events[SP_events.event==1].index, SP_events[SP_events.event==1].event*5, 'red')
plt.plot(test_sp['timestamp'], test_sp['sum'], 'blue')
#plt.stem(SP_events.index, SP_events.event==2, 'green')

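# temp_sp refers to the same object as test_sp, so dropna also removes the rows from test_sp
temp_sp=test_sp
temp_sp.dropna(inplace=True)
# sns.distplot is deprecated in recent seaborn releases; histplot with a KDE overlay gives a comparable plot
sns.histplot(temp_sp['sum_mean'].tolist(), kde=True, stat='density')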

pkh_processed=pd.read_csv('sp_pkh_processed2019-07-02 12:51:40.300916.csv')
pkh_processed.timestamp=pd.to_datetime(pkh_processed.timestamp)
plt.stem(pkh_processed[pkh_processed.det_binry_sig==1].timestamp, pkh_processed[pkh_processed.det_binry_sig==1].det_binry_sig)