Real Time Intrusion Detection
=============================

Nemesis Networks has hired you as a consultant to help build out their
real-time intrusion detection system.

They want to use Kafka to capture every login and logout event in a
system and then look for suspicious patterns.

For example, if there is a marked uptick in login failures, the system
should raise an alert so that it can lock out people who are trying to
break in by guessing passwords.
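
One way to make "a marked uptick in login failures" concrete is to watch the failure rate over the most recent attempts. The sketch below is only an illustration: the class name `FailureWindow`, the window size, and the threshold are assumptions, not part of the lab.

```python
from collections import deque


class FailureWindow(object):
    """Tracks the most recent login outcomes and flags a high failure rate."""

    def __init__(self, size=20, threshold=0.5):
        # Both defaults are illustrative, not values taken from the lab.
        self.size = size
        self.threshold = threshold
        self.outcomes = deque(maxlen=size)  # True = success, False = failure

    def record(self, success):
        """Adds one login outcome; returns True if an alert should fire."""
        self.outcomes.append(bool(success))
        if len(self.outcomes) < self.size:
            return False  # not enough data to judge yet
        failures = self.outcomes.count(False)
        return failures / float(self.size) > self.threshold
```

Because the window forgets old outcomes, a burst of failures triggers an alert and the alert clears again once successful logins return.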

Step 1
------

Using the steps from the lecture:

- Download Kafka

- Install Kafka

- Start Zookeeper and Kafka

Step 2
------

Use `pip` to install `kafka-python` and `avro`.

Step 3
------

In Kafka, create a topic called `login-topic`. The Python constant
`KAFKA_TOPIC = 'login-topic'` used in Step 7 only names the topic; create
the topic itself with `kafka-topics.sh`:

    ./bin/kafka-topics.sh \
      --zookeeper localhost:2181 \
      --topic login-topic \
      --create \
      --replication-factor 1 \
      --partitions 1

Step 4
------

Create an Avro schema for objects that look like this.

```javascript
{"date":"2015-10-01","time":"08:43:14","user":"alice","op":"login","success":"false"}
```

Field      |Value
-----      |-----
`date`     |Date in `yyyy-mm-dd` format
`time`     |Time in `hh:mm:ss` format
`user`     |Some user ID like `alice` or `bob`
`op`       |`login` or `logout`
`success`  |`true` or `false`


    AVRO_SCHEMA_STRING = '''{
        "namespace": "kafka_lab.avro",
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "date",    "type": "string"},
            {"name": "time",    "type": "string"},
            {"name": "user",    "type": "string"},
            {"name": "op",      "type": "string"},
            {"name": "success", "type": "string"}
        ]
    }
    '''
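
Since every field in this schema is declared as `string`, it can help to check an event against the schema before serializing it. The following is a simplified, pure-Python sketch, not the Avro library's own validation; the helper name `schema_problems` is hypothetical, and it only understands `string` fields.

```python
import json

SCHEMA = json.loads('''{
    "namespace": "kafka_lab.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "date",    "type": "string"},
        {"name": "time",    "type": "string"},
        {"name": "user",    "type": "string"},
        {"name": "op",      "type": "string"},
        {"name": "success", "type": "string"}
    ]
}''')


def schema_problems(event, schema=SCHEMA):
    """Returns a list of mismatches between an event dict and the schema."""
    problems = []
    for field in schema['fields']:
        name = field['name']
        if name not in event:
            problems.append('missing field: ' + name)
        elif field['type'] == 'string' and not isinstance(event[name], str):
            problems.append('field %s should be a string' % name)
    return problems
```

An event whose `success` value is a Python `bool` rather than a string is reported as a mismatch, which is why such values need converting with `str()` before they are written against this schema.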

Step 5
------

Create a producer that uses the `get_login_event()` function defined
below to feed events into the Kafka topic `login-topic`. The producer
should serialize the events using Avro. 

```python
import random
import time

LOGIN_USERS = ['alice','bob','chas','dee','eve']
LOGIN_OPS = ['login','logout']

def get_login_event():
    return {
        'date'    : time.strftime('%F'),
        'time'    : time.strftime('%T'),
        'user'    : random.choice(LOGIN_USERS),
        'op'      : random.choice(LOGIN_OPS),
        'success' : bool(random.randint(0,1)) }
```

Step 6
------

Create a consumer that consumes the login events published to
`login-topic`.

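The consumer keeps a running login success rate and prints it after
each event.

The login success rate is `success/total_count`, where `total_count` is
the number of login attempts seen so far and `success` is the number of
those attempts that succeeded.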
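
The running rate can be kept with two counters. This is a minimal sketch under two assumptions: only `login` operations count toward `total_count`, and `success` may arrive as either a boolean or a string. The class name `SuccessRate` is hypothetical.

```python
class SuccessRate(object):
    """Keeps a running success/total_count ratio for login events."""

    def __init__(self):
        self.total_count = 0
        self.success = 0

    def update(self, event):
        """Counts one event and returns the current rate (None before any login)."""
        if event['op'] == 'login':
            self.total_count += 1
            # Accept True, 'True', or 'true' as a successful login.
            if str(event['success']).lower() == 'true':
                self.success += 1
        return self.rate()

    def rate(self):
        if self.total_count == 0:
            return None  # avoid dividing by zero before the first login
        return self.success / float(self.total_count)
```

A consumer loop would call `update()` once per decoded message and print the returned value.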

Step 7
------

Imports, constants, and schema:

    import io, random, threading, logging, time

    import avro.io
    import avro.schema

    from kafka.client   import KafkaClient
    from kafka.consumer import KafkaConsumer
    from kafka.producer import SimpleProducer


    LOGIN_USERS = ['alice','bob','chas','dee','eve']
    LOGIN_OPS = ['login','logout']

    KAFKA_TOPIC = 'login-topic'

    AVRO_SCHEMA_STRING = '''{
        "namespace": "kafka_lab.avro",
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "date",    "type": "string"},
            {"name": "time",    "type": "string"},
            {"name": "user",    "type": "string"},
            {"name": "op",      "type": "string"},
            {"name": "success", "type": "string"}
        ]
    }
    '''

Avro serialization helper:

    class AvroSerDe:
        '''Serializes and deserializes data structures using Avro.'''
        def __init__(self, avro_schema_string):
            self.schema = avro.schema.parse(avro_schema_string)
            self.datum_writer = avro.io.DatumWriter(self.schema)
            self.datum_reader = avro.io.DatumReader(self.schema)

        def obj_to_bytes(self, obj):
            bytes_writer = io.BytesIO()
            encoder = avro.io.BinaryEncoder(bytes_writer)
            self.datum_writer.write(obj, encoder)
            raw_bytes = bytes_writer.getvalue()
            return raw_bytes

        def bytes_to_obj(self, raw_bytes):
            bytes_reader = io.BytesIO(raw_bytes)
            decoder = avro.io.BinaryDecoder(bytes_reader)
            obj = self.datum_reader.read(decoder)
            return obj

Event generator and producer:

    def get_login_event():
        # Every value is converted to a string to match the Avro schema.
        return {
            'date'    : str(time.strftime('%F')),
            'time'    : str(time.strftime('%T')),
            'user'    : random.choice(LOGIN_USERS),
            'op'      : random.choice(LOGIN_OPS),
            'success' : str(bool(random.randint(0,1))) }

    class Producer(threading.Thread):
        '''Produces login events and publishes them to the Kafka topic.'''
        daemon = True
        def run(self):
            avro_serde = AvroSerDe(AVRO_SCHEMA_STRING)
            client = KafkaClient('localhost:9092')
            producer = SimpleProducer(client)
            while True:
                # Serialize one generated login event with Avro and publish it.
                raw_bytes = avro_serde.obj_to_bytes(get_login_event())
                producer.send_messages(KAFKA_TOPIC, raw_bytes)
                time.sleep(1)

Consumer:

    class Consumer(threading.Thread):
        '''Consumes login events from the Kafka topic and tracks the success rate.'''
        daemon = True
        def run(self):
            avro_serde = AvroSerDe(AVRO_SCHEMA_STRING)
            consumer = KafkaConsumer(KAFKA_TOPIC,
                                     group_id='my_group',
                                     bootstrap_servers=['localhost:9092'])
            # Floats keep the division below from truncating under Python 2.
            attempts = 0.0
            success = 0.0
            failure = 0.0
            for message in consumer:
                event = avro_serde.bytes_to_obj(message.value)
                print('--> ' + str(event))
                # Only login operations count toward the success rate.
                if event['op'] == 'login':
                    attempts += 1.0
                    # The producer sends str(bool), so the value is 'True' or 'False'.
                    if event['success'] == 'True':
                        success += 1.0
                    else:
                        failure += 1.0
                if attempts > 0:
                    print("Success Rate {:.2}".format(success/attempts))
                # Alert once more than three failed logins have accumulated.
                if failure > 3:
                    print("Threat Detected!")
                    print("{} Failured Login Attempts !".format(failure))
                print("\n\n")

Start both threads:

    # Start the producer and consumer threads.
    threads = [ Producer(), Consumer() ]
    for t in threads:
        t.start()
    time.sleep(5)

Sample output (truncated):

    --> {u'date': u'2015-10-07', u'op': u'login', u'user': u'alice', u'success': u'False', u'time': u'23:28:05'}
    Success Rate 0.0



    --> {u'date': u'2015-10-07', u'op': u'login', u'user': u'bob', u'success': u'False', u'time': u'23:28:06'}
    Success Rate 0.0



    --> {u'date': u'2015-10-07', u'op': u'login', u'user': u'bob', u'success': u'True', u'time': u'23:28:07'}
    Success Rate 0.33



    --> {u'date': u'2015-10-07', u'op': u'login', u'user': u'bob', u'success': u'True', u'time': u'23:28:08'}
    Success Rate 0.5



    --> {u'date': u'2015-10-07', u'op': u'login', u'user': u'bob', u'success': u'True', u'time': u'23:28:09'}
    Success Rate 0.6



    --> {u'date': u'2015-10-07', u'op': u'login', u'user': u'alice', u'success': u'False', u'time': u'23:28:10'}
    Success Rate 0.5



    --> {u'date': u'2015-10-07', u'op': u'login', u'user': u'dee', u'success': u'False', u'time': u'23:28:11'}
    Success Rate 0.43
    Threat Detected!
    4.0 Failured Login Attempts !



    --> {u'date': u'2015-10-07', u'op': u'logout', u'user': u'cha

Suppose you had the data from all the servers coming into your Kafka
topic. How would you scale it?
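
One common approach is to add partitions to the topic and key each message by user, so that all events for one user land on the same partition and a single consumer can count that user's failures. The sketch below illustrates the idea only: `partition_for` is a hypothetical helper, and `zlib.crc32` stands in for whatever hash a real Kafka partitioner uses.

```python
import zlib


def partition_for(user, num_partitions):
    """Maps a user ID to a partition number in [0, num_partitions)."""
    # crc32 is a stand-in hash chosen for illustration; the mask keeps the
    # result non-negative across Python versions.
    digest = zlib.crc32(user.encode('utf-8')) & 0xffffffff
    return digest % num_partitions
```

The mapping is deterministic, so repeated events for `alice` always resolve to the same partition for a given partition count. Changing the partition count changes the mapping, which is worth keeping in mind for any per-user state held by consumers.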

Suppose you scale this with hundreds of consumers. What might some of
the issues be? How can you resolve those issues?
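
Two issues are worth considering here. Within one consumer group, a partition is read by at most one consumer, so consumers beyond the partition count sit idle. Each active consumer also sees only part of the stream, so a global success rate has to be assembled from partial counts. A minimal sketch of that second point, assuming each consumer reports a hypothetical `(success, total_count)` pair:

```python
def merge_counts(partials):
    """Combines (success, total_count) pairs reported by several consumers."""
    success = sum(s for s, _ in partials)
    total = sum(t for _, t in partials)
    # Guard against dividing by zero when no consumer has seen a login yet.
    rate = success / float(total) if total else None
    return success, total, rate
```

Summing the raw counts, rather than averaging the per-consumer rates, keeps a consumer that saw few events from distorting the overall figure.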