# AWS Kinesis

<p align=center><img src="images/Kinesis.png" width="200" height="200"/></p>

## Introduction

>IMPORTANT: Although AWS Kinesis is relatively cheap, it is a paid service, and you will be charged if you create a stream in your own account. Pricing for the streams can be found at the following [link](https://aws.amazon.com/kinesis/data-streams/pricing/).

AWS Kinesis can collect streaming data such as event logs, social media feeds, application data, and IoT sensor data in real time or near real-time. Kinesis enables you to process and analyze this data as soon as it arrives, allowing you to respond instantly and gain timely analytics insights.

## Kinesis Services

Amazon Kinesis has four main services: 
- **Kinesis Video Streams**: a service used for stream processing of binary-encoded data, such as audio and video.
- **Kinesis Data Streams**: a serverless streaming data service that makes it easy to capture, process, and store data streams.
- **Kinesis Data Firehose**: an extract, transform, and load (ETL) service that captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services.
- **Kinesis Data Analytics**: a service that enables you to use SQL code to continuously read, process, and store data in near real time.

## 1. Kinesis Video Streams

**Kinesis Video Streams** was designed to stream binary-encoded data into AWS from millions of sources.  Traditionally this is audio and video data but it can be any type of binary-encoded time-series data.  Data can be ingested from devices such as smartphones, security cameras, RADAR, drones, satellites, and dash cams.

<p align="center">
    <img src="images/Kinesis Video Streams.png" width="450" height="250"/>
</p>

## 2. Kinesis Data Streams

**Kinesis Data Streams** is a highly customizable AWS streaming solution. Highly customizable means that all parts involved with stream processing, such as data ingestion, monitoring, elasticity, and consumption, are handled programmatically by the applications you build around the stream. An important consideration is that a stream in provisioned capacity mode does not scale automatically: you have to change the number of shards yourself. (On-demand capacity mode adjusts capacity automatically.)

<p align="center">
    <img src="images/Kinesis Data Streams.png" width="500" height="200"/>
</p>

>A Kinesis Data Stream is a set of Shards.  

A shard contains:
- a sequence of *Data Records*, which in turn are composed of:
  - a *Sequence Number* 
  - a *Partition Key*
  - a *Data Blob* (which is an immutable sequence of bytes)

### Shards

A shard is a uniquely identified sequence of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity. The data capacity of your stream is a function of the number of shards that you specify for the stream. You can increase or decrease the number of shards allocated to your stream to keep up with your data demands.
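Because each shard provides a fixed unit of capacity, a rough shard count can be estimated from the expected traffic. The sketch below assumes the per-shard limits published by AWS for provisioned mode (about 1 MB/s or 1,000 records/s for writes, and about 2 MB/s for shared reads). The function name and its parameters are illustrative, not part of any AWS API.

```python
import math

# Per-shard limits in provisioned mode (approximate, taken from the AWS docs).
WRITE_BYTES_PER_SEC = 1_000_000   # about 1 MB/s of writes per shard
WRITE_RECORDS_PER_SEC = 1_000     # 1,000 records/s of writes per shard
READ_BYTES_PER_SEC = 2_000_000    # about 2 MB/s of shared reads per shard


def estimate_shard_count(records_per_sec: float,
                         avg_record_bytes: float,
                         consumers: int = 1) -> int:
    """Return the smallest shard count that satisfies all three limits."""
    write_bytes = records_per_sec * avg_record_bytes
    # Assumption: every consumer reads the whole stream from shared throughput.
    read_bytes = write_bytes * consumers
    return max(
        1,
        math.ceil(write_bytes / WRITE_BYTES_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_bytes / READ_BYTES_PER_SEC),
    )
```

For example, 5,000 records per second of 2,000 bytes each, read by three consumers, is limited by read throughput rather than by writes. Treat the result as a starting point and confirm it against the stream's metrics.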

### Sequence Number

Each data record has a sequence number that is unique per partition-key within its shard. Kinesis Data Streams assigns the sequence number after you write data to the stream. The longer the time period between write requests, the larger the sequence numbers become.

### Partition Key

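A partition key is used to group data by shard within a stream. Kinesis Data Streams hashes each record's partition key with MD5 and uses the resulting 128-bit value to decide which shard the record is written to, so records that share a partition key are routed to the same shard.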
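To make the grouping concrete: Kinesis Data Streams maps a partition key to a 128-bit integer using an MD5 hash, and each shard owns a range of those hash values. The sketch below reproduces that idea locally. It assumes the shards split the hash space into equal, contiguous ranges, which is a simplification, since a real stream's ranges depend on how it was created and resharded. `hash_key` and `shard_for_key` are illustrative names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key.

    Assumption: shards own equal, contiguous slices of the hash space.
    """
    range_size = HASH_SPACE // shard_count
    # Clamp so the last shard absorbs any remainder of the division.
    return min(hash_key(partition_key) // range_size, shard_count - 1)
```

The same key always maps to the same shard index for a given shard count, which is why a skewed choice of partition key can overload a single shard.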

## Kinesis Data Streams high-level architecture

Streaming data frameworks are described as having five layers: 
- *the source* 
- *stream ingestion* 
- *stream storage* 
- *stream processing*
- *the destination*

Let's walk through an example framework using Kinesis Data Streams. Data can be generated by one or more sources such as logs, mobile devices, click streams, or meters in smart homes. At the ingestion layer, data is collected by one or more producers, formatted into data records, and put into a stream.

The Kinesis Data Stream constitutes the storage layer, which can store data for 24 hours (the default) up to 365 days. Inside the storage layer, the data records are immutable (they cannot be modified once stored). Any update to the data requires a new record. Additionally, data cannot be removed from the stream; it can only expire.

The processing layer is managed by consumers, which are responsible for sending the data records to the destination layer (e.g., a data lake or a data warehouse).
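The producer and consumer roles described above can be sketched with the boto3 Kinesis client calls `put_record`, `get_shard_iterator`, and `get_records`. This is a minimal sketch rather than a production pattern: the client is passed in as a parameter, events are assumed to be JSON-serialisable dictionaries, and the helper names are illustrative. A real consumer would loop over the returned `NextShardIterator` and handle every shard.

```python
import json
from typing import Any, Dict, List


def put_event(client: Any, stream_name: str,
              event: Dict[str, Any], partition_key: str) -> str:
    """Producer: serialise one event and write it to the stream."""
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    # Kinesis assigns the sequence number after the write succeeds.
    return response["SequenceNumber"]


def read_from_start(client: Any, stream_name: str,
                    shard_id: str, limit: int = 100) -> List[Dict[str, Any]]:
    """Consumer: read one batch, starting at the oldest retained record."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(record["Data"]) for record in response["Records"]]
```

With AWS credentials configured, the client would typically come from `boto3.client("kinesis")`. Keep in mind that writing to a real stream incurs charges.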

## 3. Kinesis Data Firehose

**Kinesis Data Firehose** is a data streaming service like Kinesis Data Streams. While Kinesis Data Streams is highly customizable, Data Firehose is a fully managed streaming delivery service.

Kinesis Data Firehose is considered a near real-time streaming solution, as it uses producers to load data into streams in batches. Once inside the stream, the data is delivered to a data store. Ingested data can be dynamically transformed and scaled, and is automatically delivered to a data store (thus, there is no need to develop consumer applications).

Another difference between Kinesis Data Streams and Kinesis Data Firehose is that Kinesis Data Firehose will automatically scale as needed.  
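Since Firehose producers load data in batches, a producer usually groups records before sending them. The sketch below uses the boto3 Firehose call `put_record_batch`, which accepts at most 500 records per request. The helper name is illustrative and the client is passed in. The sketch does not retry failed records or enforce the per-request size limit, both of which a real producer would need.

```python
from typing import Any, Dict, Iterable, List

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def send_in_batches(client: Any, delivery_stream: str,
                    payloads: Iterable[bytes]) -> int:
    """Send payloads in batches and return how many records were rejected."""
    failed = 0
    batch: List[Dict[str, bytes]] = []
    for payload in payloads:
        batch.append({"Data": payload})
        if len(batch) == MAX_BATCH:
            response = client.put_record_batch(
                DeliveryStreamName=delivery_stream, Records=batch)
            failed += response["FailedPutCount"]
            batch = []
    if batch:  # flush the final, partially filled batch
        response = client.put_record_batch(
            DeliveryStreamName=delivery_stream, Records=batch)
        failed += response["FailedPutCount"]
    return failed
```

The client would typically be created with `boto3.client("firehose")`, and the delivery stream is assumed to exist already.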

<p align="center">
    <img src="images/Kinesis Data Firehose.png" width="500" height="200"/>
</p>

## 4. Kinesis Data Analytics

**Kinesis Data Analytics** can read from streams in real time and perform aggregations and analysis on data while it is in motion. It does this with SQL queries, or with Apache Flink applications written in Java or Scala, to perform time-series analytics, feed real-time dashboards, and create real-time metrics. Kinesis Data Analytics has built-in templates and operators for common processing functions to organize, transform, aggregate, and analyze data.

Use cases include ETL, the generation of continuous metrics, and doing responsive real-time analytics.
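To illustrate what a continuous metric looks like, the sketch below counts events per key in fixed, non-overlapping (tumbling) time windows, a common aggregation in streaming analytics. It is plain Python running on an in-memory list, not Kinesis Data Analytics code. The event shape, a `(timestamp_in_seconds, key)` tuple, is an assumption made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[float, str]],
                           window_seconds: int = 60
                           ) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) in non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

A streaming engine computes the same result incrementally as records arrive and emits each window once it closes, instead of waiting for the complete input.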

<p align="center">
    <img src="images/Kinesis Data Analytics.png" width="450" height="150"/>
</p>

## How to create data streams using Kinesis Data Streams

Navigate to the Kinesis console, and select the **Data Streams** section. Choose the **Create stream** button.

<p align="center">
    <img src="images/Create Stream.png" width="600" height="200"/>
</p>

Choose a name for your stream and enter it in the **Data stream name** field. For our use case we will use the **Provisioned** capacity mode.

<p align="center">
    <img src="images/Data Stream Config.png" width="500" height="400"/>
</p>

Once you have entered the name and chosen the capacity mode, click on **Create data stream**. When your stream has finished creating, the **Status** will change from Creating to Active.
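The same console steps can also be scripted. The sketch below assumes the boto3 Kinesis client and uses its `create_stream` call, the `stream_exists` waiter, and `describe_stream_summary`. The helper name and the default of one shard are illustrative choices. As with the console, creating a real stream incurs charges.

```python
from typing import Any


def create_provisioned_stream(client: Any, stream_name: str,
                              shard_count: int = 1) -> str:
    """Create a provisioned stream, wait for it, and return its status."""
    client.create_stream(
        StreamName=stream_name,
        ShardCount=shard_count,
        StreamModeDetails={"StreamMode": "PROVISIONED"},
    )
    # Block until the stream has left the CREATING state.
    client.get_waiter("stream_exists").wait(StreamName=stream_name)
    summary = client.describe_stream_summary(StreamName=stream_name)
    return summary["StreamDescriptionSummary"]["StreamStatus"]
```

With credentials configured, `create_provisioned_stream(boto3.client("kinesis"), "my-stream")` should return `ACTIVE` once the stream is ready, where `my-stream` is a placeholder name. Delete the stream afterwards if you no longer need it.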

## Conclusion
At this point, you should have a good understanding of: 
- What AWS Kinesis is
- The four different Kinesis services and their use cases
- How to create a Kinesis Data Stream