### About this notebook
This notebook is part 2 of Abnormal Distribution's code deliverable for TCC's Data Analysis Bootcamp. Here we use Python to extract our previously ETL'd dataset from PostgreSQL and export it as intrusion detection system (IDS) rules in <a href="https://suricata.io/">Suricata</a>'s format.

<b>Prerequisite:</b> Completion of "Project 3 Part 1 - ETL.ipynb."

<b>A note about terminology:</b> "Signatures" tell a security control how to interpret input, such as an attack pattern, and "rules" are the functional configuration of those signatures in the control (e.g., Suricata). Functionally, the terms rule and signature are used interchangeably here.

### On to the code

In [2]:
# Ensure suricataparser is available in the local Jupyter environment
# suricataparser is the library that will export IDS signatures from our ETL'd database in Suricata format
!pip install suricataparser



In [35]:
# Import pandas for DataFrames, csv for file export, psycopg2 for database connectivity,
# and suricataparser for building Suricata rules
import csv

import pandas as pd
import psycopg2
import suricataparser

In [44]:
""" Connect to the iot_attack_traffic database

Note that hardcoding user credentials is extremely insecure: anyone with access to this notebook has your creds.
Because this is non-production code we accept this risk. Before the code moves to production we would fetch
the credentials from a secure password store, such as keyring.
(https://theautomatic.net/2020/04/28/how-to-hide-a-password-in-a-python-script/)
"""
# psycopg2.extras is a submodule and must be imported explicitly before DictCursor can be used
import psycopg2.extras

conn = psycopg2.connect("dbname='iot_attack_traffic' user='postgres' host='localhost' password='postgres'")
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
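The docstring above defers secure credential handling to production. As a minimal sketch of one standard-library option, the connection parameters could be read from environment variables instead of being written as literals. The variable names (`IOT_DB_PASSWORD`, `IOT_DB_USER`, and so on) and the helper `connection_params` are hypothetical and not part of the original notebook; a dedicated password store would be a stronger choice.

```python
import os


def connection_params(env=os.environ):
    """Collect database connection parameters from environment variables.

    All variable names are hypothetical; adjust them to your deployment.
    """
    password = env.get("IOT_DB_PASSWORD")
    if not password:
        # Fail early rather than falling back to a hardcoded password
        raise RuntimeError("IOT_DB_PASSWORD is not set")
    return {
        "dbname": env.get("IOT_DB_NAME", "iot_attack_traffic"),
        "user": env.get("IOT_DB_USER", "postgres"),
        "host": env.get("IOT_DB_HOST", "localhost"),
        "password": password,
    }


# Usage (needs a running PostgreSQL instance, so it is left commented out):
# conn = psycopg2.connect(**connection_params())
```

Passing keyword arguments to `psycopg2.connect` also avoids quoting problems that can arise when a password containing quotes is spliced into a connection string.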

In [38]:
# Define the query we'll use to extract traffic and attack patterns from the database
# Start with a var to query the list of attacks we know about ...
attacksQuery = "select * from traffic_patterns"
cur.execute(attacksQuery)
# ... and write the results to var called "attacksList"
attacksList=cur.fetchall()

In [39]:
# Next are the traffic stats (i.e., pull the traffic signatures that will become rules)
tfcQuery="select * from all_traffic"
cur.execute(tfcQuery)
tfc=cur.fetchall()

In [42]:
tfcDf=pd.DataFrame(tfc)
tfcDf.head()

   0      1     2    3     4             5
0  0  38667  1883  tcp  mqtt  MQTT_Publish
1  1  51143  1883  tcp  mqtt  MQTT_Publish
2  2  44761  1883  tcp  mqtt  MQTT_Publish
3  3  60893  1883  tcp  mqtt  MQTT_Publish
4  4  51087  1883  tcp  mqtt  MQTT_Publish
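The DataFrame above labels its columns 0 to 5 because the fetched rows were passed in without names. Any DB-API cursor, psycopg2's included, exposes the column names of the last query through `cursor.description`. The sketch below shows the idea with an in-memory SQLite table standing in for `all_traffic`, so it runs without PostgreSQL. The column names are assumptions based on the header used in the CSV export later in this notebook, and `idx` is a hypothetical name for the leading index column.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and column names are assumed
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "create table all_traffic "
    "(idx integer, src_port integer, dst_port integer, "
    "proto text, service text, pattern text)"
)
cur.execute(
    "insert into all_traffic values (0, 38667, 1883, 'tcp', 'mqtt', 'MQTT_Publish')"
)

cur.execute("select * from all_traffic")
# The first element of each description entry is the column name
columns = [col[0] for col in cur.description]
records = [dict(zip(columns, row)) for row in cur.fetchall()]

cur.close()
conn.close()
print(records[0])
```

With pandas, the same names could be supplied directly, for example `pd.DataFrame(tfc, columns=columns)`.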


In [11]:
# Close the cursor first, then the connection it belongs to
cur.close()
conn.close()

In [36]:
# Load the traffic rows (not the attack list) into a DataFrame and inspect it
tfcDf=pd.DataFrame(tfc)
tfcDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 123117 entries, 0 to 123116
Data columns (total 6 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   0       123117 non-null  int64 
 1   1       123117 non-null  int64 
 2   2       123117 non-null  int64 
 3   3       123117 non-null  object
 4   4       123117 non-null  object
 5   5       123117 non-null  object
dtypes: int64(3), object(3)
memory usage: 5.6+ MB


In [18]:
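# Create an empty rules list, then loop through the traffic and append one rule per row.
# Assumption: each row is (index, src_port, dst_port, proto, service, pattern), as the head() output suggests.
# SIDs start at 1000000, the range conventionally reserved for local rules, so each rule gets a unique sid.
rules = []
for sid, row in enumerate(tfc, start=1000000):
    rule=suricataparser.parse_rules(f'alert {row[3]} any {row[1]} -> any {row[2]} (msg:"{row[5]}"; sid:{sid}; gid:1;)')
    rules.append(rule)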

In [19]:
# Quality check the rules
rules

[[<suricataparser.rule.Rule at 0x7fb6ee2d9210>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d8d60>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9fc0>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9c90>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9f60>],
 [<suricataparser.rule.Rule at 0x7fb6ee2da620>],
 [<suricataparser.rule.Rule at 0x7fb6ee2db4c0>],
 [<suricataparser.rule.Rule at 0x7fb6ee2da830>],
 [<suricataparser.rule.Rule at 0x7fb6ee2dac20>],
 [<suricataparser.rule.Rule at 0x7fb6ee2da920>],
 [<suricataparser.rule.Rule at 0x7fb6ee2da980>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9cf0>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9090>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d98a0>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d8ac0>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9de0>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9000>],
 [<suricataparser.rule.Rule at 0x7fb6ee2dae00>],
 [<suricataparser.rule.Rule at 0x7fb6ee2d9270>],
 [<suricataparser.rule.Rule at 0x7fb6ee2dae30>],
 [<suricataparser.ru
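Generating one rule per traffic row yields many near-identical rules: the first five rows shown earlier differ only in their ephemeral source port. A minimal sketch of one way to collapse them is to build rule text once per distinct (protocol, destination port, pattern) combination. The helper `build_rules` is hypothetical, the row layout (index, src_port, dst_port, proto, service, pattern) is an assumption based on the `head()` output, and the rule text is plain string formatting rather than a suricataparser call.

```python
def build_rules(rows, first_sid=1000000):
    """Return one Suricata rule string per distinct (proto, dst_port, pattern).

    Assumes each row is (index, src_port, dst_port, proto, service, pattern).
    """
    # dict.fromkeys drops duplicates while preserving first-seen order
    distinct = dict.fromkeys(
        (proto, dst_port, pattern)
        for _idx, _src_port, dst_port, proto, _service, pattern in rows
    )
    return [
        f'alert {proto} any any -> any {dst_port} (msg:"{pattern}"; sid:{sid}; rev:1;)'
        for sid, (proto, dst_port, pattern) in enumerate(distinct, start=first_sid)
    ]


sample = [
    (0, 38667, 1883, "tcp", "mqtt", "MQTT_Publish"),
    (1, 51143, 1883, "tcp", "mqtt", "MQTT_Publish"),
]
print(build_rules(sample))
```

The source port is left as `any` here because client ports are typically ephemeral, so matching on them would rarely generalize.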

In [30]:
# Use pandas to load tfc and attacksList into DataFrames for later analysis
tfcDf=pd.DataFrame(tfc)
attacksDF=pd.DataFrame(attacksList)
print(tfcDf.info())
print(attacksDF.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 123117 entries, 0 to 123116
Data columns (total 6 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   0       123117 non-null  int64 
 1   1       123117 non-null  int64 
 2   2       123117 non-null  int64 
 3   3       123117 non-null  object
 4   4       123117 non-null  object
 5   5       123117 non-null  object
dtypes: int64(3), object(3)
memory usage: 5.6+ MB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       12 non-null     int64 
 1   1       12 non-null     object
 2   2       12 non-null     object
dtypes: int64(1), object(2)
memory usage: 416.0+ bytes
None


In [33]:
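# Write the traffic patterns to a CSV file for review.
# Note: Suricata loads plain-text .rules files, not CSV, so this file is a reference export, not a rule file.
import csv
with open('rules.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Assumption: the first column of each row is the row index, so the header needs six names
    writer.writerow(['index','src_port','dst_port','proto','service','pattern'])
    for event in tfc:
        writer.writerow(event)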
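Suricata reads rules from plain-text files with one rule per line, typically listed under `rule-files` in `suricata.yaml`. As a hedged sketch of that last step, the snippet below writes rule strings to such a file. The helper `write_rules_file`, the file name `iot_local.rules`, and the sample rule are hypothetical and serve only as illustration.

```python
import tempfile
from pathlib import Path


def write_rules_file(rules, path):
    """Write one rule per line to a plain-text .rules file and return its path."""
    path = Path(path)
    path.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return path


# Hypothetical rule text for illustration only
example_rules = [
    'alert tcp any any -> any 1883 (msg:"MQTT_Publish"; sid:1000000; rev:1;)',
]
out = write_rules_file(example_rules, Path(tempfile.gettempdir()) / "iot_local.rules")
print(out.read_text(encoding="utf-8"))
```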