# IaaS Linux Logs Review Template - Sentinel
BTV Project Obsidian, 2023

Author: juju43, https://blueteamvillage.org/programs/project-obsidian/ https://discord.gg/blueteamvillage
<img align="right" width="100" height="100" src="https://cfc.blueteamvillage.org/media/call-for-content-2021/img/20200622_BTVillage_logos_RGB_pos_hcOC7Qx.png">

This playbook helps validate the available logs.
It helps baseline the environment and identify gaps and control points.

It targets the DEF CON 31 Blue Team Village Project Obsidian environment and the Microsoft Sentinel platform, but it can be adapted to other logging platforms.

Possible sources
* systemd, journald, su, sshd, sudo, cron, at
* auditd
* osquery
* sysmonforlinux

External sources like EDR or network are intentionally not covered here.

## Resources

* https://github.com/microsoft/msticpy/
* https://infosecjupyterthon.com/
* https://dropbox.tech/security/how-dropbox-security-builds-better-tools-for-threat-detection-and-incident-response
* https://github.com/SigmaHQ/sigma
* cli usage
  * as is `jupyter run notebook.ipynb --allow-errors` - https://docs.jupyter.org/en/latest/running.html#using-a-command-line-interface
  * with parameters `papermill input.ipynb output.ipynb -p alpha 0.6 -p l1_ratio 0.1` - https://papermill.readthedocs.io/en/latest/usage-workflow.html 

(Linux)
* https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/Entity%20Explorer%20-%20Linux%20Host.ipynb
* https://securitydatasets.com/notebooks/atomic/linux/intro.html

## Findings

_Put your findings here_

## Table of Contents

* Import
* Configuration
* Queries
  * Timeperiod
  * Authentication
    * fail
    * success
  * Remote access
  * Privilege escalation
  * Privileged users activities? root, Administrator...
  * Service activities
    * Time
    * Logging
    * Scheduled tasks
  * System boot, on/off
  * Errors, warnings
  * Process activities
  * Network activities
  * File Integrity Monitoring (FIM)
  * Auditd
  * SELinux
  * AV logs?
  * Web logs?
  * Misc

## Import

In [None]:
# Check that we are running Python 3.6 or later
import sys
MIN_REQ_PYTHON = (3,6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)

In [None]:
# Imports
import pandas as pd
import msticpy.nbtools as nbtools
from datetime import datetime,timedelta
import os

In [None]:
# path to config file
os.environ['MSTICPYCONFIG'] = '/home/ubuntu/msticpyconfig.yaml'
from msticpy.nbtools import *
from msticpy.data.data_providers import QueryProvider
from msticpy.common.wsconfig import WorkspaceConfig
from msticpy.nbtools.data_viewer import DataViewer
from msticpy.vis.matrix_plot import plot_matrix
from msticpy.nbtools import process_tree as ptree
print('Imports Complete')

## Configuration

In [None]:
# Interactive settings edit
# https://msticpy.readthedocs.io/en/latest/getting_started/SettingsEditor.html#using-mpconfigfile-to-check-and-manage-your-msticpyconfig-yaml
from msticpy.config import MpConfigFile, MpConfigEdit, MpConfigControls
mpconfig = MpConfigFile()
# mpconfig.load_default()
# mpconfig.view_settings()
mpconfig

In [None]:
# q_times = nbwidgets.QueryTime(units='hours', max_before=72, before=1, max_after=0)
q_times = nbwidgets.QueryTime(origin_time=datetime(2023, 6, 15), units='days', max_before=30, before=1, max_after=0)
#q_times = nbwidgets.QueryTime(origin_time=datetime(2023, 6, 15), units='days', max_before=1, before=0, max_after=0)
# q_times = nbwidgets.QueryTime(origin_time=datetime(2023, 6, 15), units='hours', max_before=4, before=0, max_after=0)

q_times.display()

In [None]:
# If your environment footprint is very large or the time period is too long, insufficiently optimized queries may return 'ADX query timed out' or 'Unknown query error' when run through msticpy.
# Make sure to use appropriate filters.
query_common_args = ''
# query_common_args = f'''| where _SubscriptionId in ("12345", "67890")'''
results_limit = 10

In [None]:
query_common_args = query_common_args.strip()
query_common_args = query_common_args + f'''| where TimeGenerated >= datetime({q_times.start})
| where TimeGenerated <= datetime({q_times.end})'''

In [None]:
query_common_args
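
The concatenation above appends the time bounds each time its cell is re-run. A minimal sketch of an alternative that rebuilds the filter from scratch is shown here; `build_common_args` is a hypothetical helper, not part of msticpy, and the dates and subscription IDs are placeholders.

```python
from datetime import datetime


def build_common_args(start, end, extra_filter=""):
    """Return the shared KQL filter: optional extra filter plus the time bounds."""
    parts = []
    if extra_filter.strip():
        parts.append(extra_filter.strip())
    # ISO 8601 timestamps are accepted by the KQL datetime() literal
    parts.append(f"| where TimeGenerated >= datetime({start.isoformat()})")
    parts.append(f"| where TimeGenerated <= datetime({end.isoformat()})")
    return "\n".join(parts)


# Illustrative bounds; in the notebook these would be q_times.start and q_times.end
example = build_common_args(
    datetime(2023, 6, 14),
    datetime(2023, 6, 15),
    '| where _SubscriptionId in ("12345", "67890")',
)
print(example)
```

Because the function returns a fresh string on every call, re-running the cell does not stack duplicate `where` clauses.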

In [None]:
# Configuration
qry_prov = QueryProvider("AzureSentinel")

In [None]:
# Get the default Microsoft Sentinel workspace details from msticpyconfig.yaml
ws_config = WorkspaceConfig()

# Connect to Microsoft Sentinel with our QueryProvider and config details
qry_prov.connect(ws_config)

In [None]:
# pandas
pd.set_option('display.max_colwidth', 500)

## Queries

### Timeperiod

Let's confirm that we have logs for the targeted timeperiod.

In [None]:
q_times.start

In [None]:
q_times.end

In [None]:
query = f'''Syslog {query_common_args}
| summarize max(TimeGenerated),min(TimeGenerated)
'''
df_timeperiod = qry_prov.exec_query(query)
df_timeperiod.head(results_limit)
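
The comparison between the requested window and the observed minimum and maximum can also be done programmatically. The sketch below is one possible way; `coverage_gaps` and its one-hour tolerance are assumptions for illustration, and all timestamps are expected to be either all naive or all timezone-aware (mixing both raises a `TypeError` in Python).

```python
from datetime import datetime, timedelta


def coverage_gaps(requested_start, requested_end, first_seen, last_seen,
                  tolerance=timedelta(hours=1)):
    """Return human-readable gaps between the requested window and observed logs."""
    gaps = []
    if first_seen - requested_start > tolerance:
        gaps.append(f"no logs before {first_seen.isoformat()}")
    if requested_end - last_seen > tolerance:
        gaps.append(f"no logs after {last_seen.isoformat()}")
    return gaps


# Illustrative values: logs only start six hours into the requested day
gaps = coverage_gaps(
    requested_start=datetime(2023, 6, 14),
    requested_end=datetime(2023, 6, 15),
    first_seen=datetime(2023, 6, 14, 6, 0),
    last_seen=datetime(2023, 6, 14, 23, 30),
)
print(gaps)
```

An empty list means the observed logs span the requested period within the tolerance.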

### Authentication

In [None]:
query = f'''search in (Syslog) "su:"  {query_common_args}
| where Facility == "authpriv" and SyslogMessage contains "su:"
| summarize count() by SourceSystem,ProcessName
| limit {results_limit}'''
# these ones work (each assignment overwrites the previous one, so only the last query is executed)
query = f'''Syslog {query_common_args}
| where Facility == "authpriv" and ProcessName == "su" and SyslogMessage contains "su:"
| summarize count() by ProcessName'''
query = f'''Syslog {query_common_args}
| where Facility == "authpriv" and ProcessName == "su" and SyslogMessage contains "su:"
| summarize count() by SourceSystem,ProcessName'''
df_auth = qry_prov.exec_query(query)
df_auth.head(results_limit)

In [None]:
query = f'''search in (Syslog) "session" {query_common_args}
| where Facility == "authpriv" and SyslogMessage contains "session"
| summarize count() by SourceSystem,ProcessName
| limit {results_limit}'''
df_session = qry_prov.exec_query(query)
df_session.head(results_limit)

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName == "systemd-logind"
| summarize count() by SourceSystem,Facility,ProcessName
| limit {results_limit}'''
df_systemdlogind = qry_prov.exec_query(query)
df_systemdlogind.head(results_limit)

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName in ("xrdp", "xrdp-chansrv", "xrdp-sesman")
| summarize count() by SourceSystem,Facility,ProcessName
| limit {results_limit}'''
df_xrdp = qry_prov.exec_query(query)
df_xrdp.head(results_limit)

### Remote access

In [None]:
query = f'''Syslog {query_common_args}
| where Facility == "authpriv" and ProcessName == "sshd"
| limit {results_limit}'''
df_ssh = qry_prov.exec_query(query)

In [None]:
df_ssh.head(results_limit)
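
To split sshd events into the fail and success buckets listed in the table of contents, the message text can be classified locally. This is a rough sketch: the patterns follow common OpenSSH message wording and may need adjusting per distribution, `classify_sshd` is a hypothetical helper, and it assumes the message text carries no `sshd[pid]:` prefix.

```python
import re

# Patterns based on common OpenSSH log messages; adjust to your distribution
SSH_PATTERNS = [
    ("success", re.compile(r"^Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<src>\S+)")),
    ("fail", re.compile(r"^Failed (?P<method>\S+) for (?:invalid user )?(?P<user>\S+) from (?P<src>\S+)")),
    ("fail", re.compile(r"^Invalid user (?P<user>\S+) from (?P<src>\S+)")),
]


def classify_sshd(message):
    """Return (outcome, fields) for an sshd message, or ("other", {}) if unmatched."""
    for outcome, pattern in SSH_PATTERNS:
        match = pattern.search(message)
        if match:
            return outcome, match.groupdict()
    return "other", {}


# Illustrative messages, not taken from the Obsidian environment
samples = [
    "Accepted publickey for ubuntu from 10.0.0.5 port 50022 ssh2",
    "Failed password for invalid user admin from 203.0.113.7 port 4242 ssh2",
    "Connection closed by 10.0.0.5 port 50022",
]
for sample in samples:
    print(classify_sshd(sample))
```

With a DataFrame of sshd events, something like `df_ssh["SyslogMessage"].map(lambda m: classify_sshd(m)[0]).value_counts()` would give a per-outcome count.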

In [None]:
query = f'''Osquery_CL {query_common_args}
| extend json = parse_json(RawData)
| extend name = tostring(json.name)
| where name == "pack_osquery-custom-pack_authorized_keys"
| summarize count() by name
'''
df_ssh_authorized_keys = qry_prov.exec_query(query)
df_ssh_authorized_keys.head(results_limit)

### Privilege escalation

In [None]:
query = f'''Syslog {query_common_args}
| where Facility == "authpriv" and ProcessName in ("sudo", "doas")
| summarize count() by SourceSystem,ProcessName
| limit {results_limit}'''
df_sudo = qry_prov.exec_query(query)
df_sudo.head(results_limit)
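
Counts per process only show that sudo logs exist. To review who ran what, the sudo command records can be split into fields. The sketch below assumes the usual `invoker : TTY=... ; PWD=... ; USER=... ; COMMAND=...` layout; `parse_sudo` is a hypothetical helper and the sample line is illustrative.

```python
import re

# "<invoking user> : key=value ; key=value ; ..."
SUDO_RE = re.compile(r"^\s*(?P<invoker>\S+) : (?P<fields>.+)$")


def parse_sudo(message):
    """Parse a sudo command record into a dict, or return None for other messages."""
    match = SUDO_RE.match(message)
    if not match:
        return None
    record = {"invoker": match.group("invoker")}
    # COMMAND comes last and may itself contain " ; ", so split it off first
    head, sep, command = match.group("fields").partition("COMMAND=")
    if sep:
        record["COMMAND"] = command
    for item in head.split(" ; "):
        key, eq, value = item.partition("=")
        if eq:
            record[key.strip()] = value.strip()
    return record


sample = "ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/apt update"
print(parse_sudo(sample))
```

PAM session lines logged by sudo do not match this layout and return `None`, so they can be filtered out or handled separately.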

### Service activities

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName == "systemd" and SyslogMessage has_any ("start", "stop")
| summarize count() by Facility,SeverityLevel,SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_services = qry_prov.exec_query(query)
df_services.head(results_limit)

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName == "systemd" and SyslogMessage has_any ("ntpd", "openntpd", "ntpdate", "rdate", "chrony")
| summarize count() by Facility,SeverityLevel,SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_service_time = qry_prov.exec_query(query)
df_service_time.head(results_limit)

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName in ("rsyslog", "syslog-ng", "logrotate", "systemd-journald")
| summarize count() by Facility,SeverityLevel,SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_service_logging = qry_prov.exec_query(query)
df_service_logging.head(results_limit)

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName in ("CRON", "crontab", "systemd-timers")
| summarize count() by Facility,SeverityLevel,SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_service_scheduledtasks = qry_prov.exec_query(query)
df_service_scheduledtasks.head(results_limit)

### System boot, on/off

In [None]:
# FIXME! more filtering needed
query = f'''Syslog  {query_common_args}
| where 
    (Facility == "daemon" and SyslogMessage has_any ("halt", "shutdown", "reboot"))
    or ProcessName in ("systemd-shutdownd")
| summarize count() by Facility,SeverityLevel,SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_system_onoff = qry_prov.exec_query(query)
df_system_onoff.head(results_limit)

### Errors, warnings

In [None]:
query = f'''search in (Syslog) "error" {query_common_args}
| summarize count() by Facility,SeverityLevel,ProcessName,SyslogMessage
| sort by count_ desc
| limit {results_limit}'''
df_errors = qry_prov.exec_query(query)
df_errors.head(results_limit)

In [None]:
query = f'''search in (Syslog) "warn" {query_common_args}
| summarize count() by Facility,SeverityLevel,ProcessName,SyslogMessage
| sort by count_ desc
| limit {results_limit}'''
df_warn = qry_prov.exec_query(query)
df_warn.head(results_limit)

### Process activities

In [None]:
query = f'''Sysmonforlinux_CL {query_common_args}
| summarize count() by RuleName,User,Image,CommandLine,ParentCommandLine
| sort by count_ desc 
| limit {results_limit}
'''
# Enable if have Sysmonforlinux_CL table
# df_process = qry_prov.exec_query(query)
# df_process.head(results_limit)

In [None]:
query = f'''Osquery_CL {query_common_args}
| extend json = parse_json(RawData)
| extend name = tostring(json.name), action = tostring(json.action), pid = tostring(json.columns.pid), cmdline = tostring(json.columns.cmdline), ppid = tostring(json.columns.ppid), pcmdline = tostring(json.columns.pcmdline)
| where name == "pack_osquery-custom-pack_outbound_connections"
| summarize count() by name,action,cmdline,pcmdline
| sort by count_ desc 
| limit {results_limit}'''
df_process2 = qry_prov.exec_query(query)
df_process2.head(results_limit)

### Network activities

In [None]:
query = f'''Osquery_CL {query_common_args}
| extend json = parse_json(RawData)
| extend name = tostring(json.name), action = tostring(json.action), type = tostring(json.columns.type), address = tostring(json.columns.address)
| where name == "pack_osquery-custom-pack_dns_resolvers"
| summarize count() by name,action,type,address
| sort by count_ desc 
| limit {results_limit}'''
df_dns = qry_prov.exec_query(query)
df_dns.head(results_limit)

In [None]:
query = f'''Osquery_CL {query_common_args}
| extend json = parse_json(RawData)
| extend name = tostring(json.name), action = tostring(json.action), username = tostring(json.columns.username), pid = tostring(json.columns.pid), cmdline = tostring(json.columns.cmdline), ppid = tostring(json.columns.ppid), pcmdline = tostring(json.columns.pcmdline), remote_address = tostring(json.columns.remote_address)
| where name == "pack_osquery-custom-pack_outbound_connections"
| summarize count() by name,action,username,cmdline,remote_address
| sort by count_ desc 
| limit {results_limit}'''
df_outbound = qry_prov.exec_query(query)
df_outbound.head(results_limit)

### File Integrity Monitoring

In [None]:
query = f'''Osquery_CL {query_common_args}
| extend json = parse_json(RawData)
| extend name = tostring(json.name), target_path = tostring(json.columns.target_path), username = tostring(json.columns.username), action = tostring(json.columns.action)
| where name == "fim"
| summarize count() by name,action,username,target_path
| sort by count_ desc 
| limit {results_limit}'''
df_fim = qry_prov.exec_query(query)
df_fim.head(results_limit)

### Auditd

Possible breakdowns: by type, keywords, exe.

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName in ("audit", "auditd", "audispd")
| summarize count() by SeverityLevel,ProcessName,SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_auditd = qry_prov.exec_query(query)
df_auditd.head(results_limit)

### SELinux

In [None]:
# https://www.redhat.com/sysadmin/diagnose-selinux-violations
# https://unix.stackexchange.com/questions/642876/have-selinux-allow-syslog-and-logrotate-to-handle-files-outside-of-var-log/642877#642877
query = f'''Syslog {query_common_args}
| where ProcessName in ("setroubleshoot", "setroubleshootd", "sedispatch")
| summarize count() by SyslogMessage
| sort by count_ desc
| limit {results_limit}'''
df_selinux = qry_prov.exec_query(query)
df_selinux.head(results_limit)

In [None]:
# `semanage fcontext -a -t var_log_t /var/opt/microsoft/azuremonitoragent/; restorecon /var/opt/microsoft/azuremonitoragent/`
query = f'''Syslog {query_common_args}
| where ProcessName in ("setroubleshoot")
| parse SyslogMessage with * "SELinux is preventing " bin_path:string " from " action:string " access on " target:string ". For complete SELinux messages run:" *
| summarize count() by ProcessName,bin_path,action,target
| sort by count_ desc
| limit {results_limit}'''
df_selinux_preventing = qry_prov.exec_query(query)
df_selinux_preventing.head(results_limit)

In [None]:
query = f'''Syslog {query_common_args}
| where ProcessName in ("setroubleshoot")
| parse SyslogMessage with * "SELinux is preventing " bin_path:string " from " action:string " access on " target:string "Plugin catch" *
| summarize count() by ProcessName,bin_path,action,target
| sort by count_ desc
| limit {results_limit}'''
# Separate name so the result of the previous cell is not overwritten
df_selinux_preventing_plugin = qry_prov.exec_query(query)
df_selinux_preventing_plugin.head(results_limit)
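
When tuning the `parse` patterns, it can help to test the extraction offline on a few exported messages. The following sketch mirrors the "For complete SELinux messages" variant with a Python regular expression; `parse_selinux_denial` is a hypothetical helper and the sample message is illustrative, modelled on the usual setroubleshoot wording.

```python
import re

SELINUX_RE = re.compile(
    r"SELinux is preventing (?P<bin_path>\S+) from (?P<action>.+?) access on "
    r"(?P<target>.+?)\. For complete SELinux messages"
)


def parse_selinux_denial(message):
    """Extract bin_path, action and target from a setroubleshoot message, else None."""
    match = SELINUX_RE.search(message)
    return match.groupdict() if match else None


# Illustrative message, not taken from the Obsidian environment
sample = (
    "SELinux is preventing /usr/sbin/logrotate from read access on the directory "
    "/var/opt/microsoft/azuremonitoragent. For complete SELinux messages run: "
    "sealert -l 00000000-0000-0000-0000-000000000000"
)
print(parse_selinux_denial(sample))
```

Messages that do not follow this wording return `None`, which makes it easy to count how many records a pattern misses.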

### Local AV, EDR

In [None]:
# in most cases, local service logs. Usually, no EDR alerts.
query = f'''Syslog {query_common_args}
| where ProcessName in ("clamd", "freshclam", "wdavdaemon", "microsoft-mdatp-installer", "microsoft-mdatp-uninstaller", "falcon-sensor", "falcond", "falconctl")
| summarize count() by SeverityLevel,ProcessName,SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_sectools = qry_prov.exec_query(query)
df_sectools.head(results_limit)

### Misc

In [None]:
qry_prov.list_queries()

In [None]:
query = f'''Osquery_CL {query_common_args}
| extend json = parse_json(RawData)
| extend name = tostring(json.name)
| summarize count() by name
| sort by count_ desc 
| limit {results_limit}'''
df_osquery_queries = qry_prov.exec_query(query)
df_osquery_queries.head(results_limit)

In [None]:
query = f'''Osquery_CL {query_common_args}
| extend json = parse_json(RawData)
| extend name = tostring(json.name), pkg_name = tostring(json.columns.name), summary = tostring(json.columns.summary), version = tostring(json.columns.version)
| where name == "pack_osquery-custom-pack_python_packages"
| summarize count() by name,pkg_name,summary,version
| sort by count_ desc 
| limit {results_limit}'''
df_python = qry_prov.exec_query(query)
df_python.head(results_limit)

In [None]:
# Noise ?
query = f'''Syslog {query_common_args}
| summarize count() by ProcessName
| sort by count_ desc 
| limit {results_limit}'''
df_volume = qry_prov.exec_query(query)
df_volume.head(results_limit)

In [None]:
# Missing ProcessName?
# | limit {results_limit}
query = f'''Syslog {query_common_args}
| where ProcessName == ""
| summarize count() by Facility,SeverityLevel,ProcessName,Computer,_ResourceId
| sort by count_ desc '''
df_anomalies1 = qry_prov.exec_query(query)
df_anomalies1.head(results_limit)

In [None]:
# Wrong Computer name?
query = f'''Syslog {query_common_args}
| where Computer == "" or Computer == "localhost"
| summarize count() by Computer,_ResourceId
| sort by count_ desc 
| limit {results_limit}'''
df_anomalies2 = qry_prov.exec_query(query)
df_anomalies2.head(results_limit)

In [None]:
# ASIM tables?
# https://learn.microsoft.com/en-us/azure/sentinel/normalization-schema-audit
# https://learn.microsoft.com/en-us/azure/sentinel/normalization-schema-authentication
# and so on

In [None]:
# PII, credentials detection? many more variants...
query = f'''search in (Syslog) ("--password" or "password=" or "_PASSWORD" or "PASSWORD_" or "credentials=" or "pin=" or "cvv=" or "hl7-org") {query_common_args}
| where not (SyslogMessage has_any ("PROTECTED", "REDACTED", "MASKED"))
| summarize count() by SyslogMessage
| sort by count_ desc 
| limit {results_limit}'''
df_sensitivedata = qry_prov.exec_query(query)
df_sensitivedata.head(results_limit)
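
The same keyword list can be applied locally, for example to spot-check exported messages before sharing findings. This is only an approximation of the query: it uses case-insensitive substring matching rather than KQL term matching, and `looks_sensitive` is a hypothetical helper.

```python
# Marker lists copied from the query above
SENSITIVE_MARKERS = ["--password", "password=", "_PASSWORD", "PASSWORD_",
                     "credentials=", "pin=", "cvv=", "hl7-org"]
MASK_MARKERS = ["PROTECTED", "REDACTED", "MASKED"]


def looks_sensitive(message):
    """Return True if the message contains a sensitive marker and no masking marker."""
    lowered = message.lower()
    if any(marker.lower() in lowered for marker in MASK_MARKERS):
        return False
    return any(marker.lower() in lowered for marker in SENSITIVE_MARKERS)


# Illustrative messages
samples = [
    "mysql --password=hunter2 -u app",
    "export DB_PASSWORD=[REDACTED]",
    "session opened for user root",
]
flagged = [message for message in samples if looks_sensitive(message)]
print(flagged)
```

Substring matching is deliberately broad (for example, `pin=` also matches `spin=`), so expect false positives that need manual review.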

In [None]:
# Time series
# https://msticpy.readthedocs.io/en/latest/visualization/TimeSeriesAnomalies.html
query = f"""
Syslog {query_common_args}
| summarize LogsCount=count() by bin(TimeGenerated, 1h)
| project TimeGenerated, LogsCount
"""
ts_df = qry_prov.exec_query(query)
ts_df = ts_df.set_index("TimeGenerated")

In [None]:
ts_df.head()

In [None]:
ts_df[ts_df['LogsCount'].isna()]
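
Note that `summarize ... by bin()` only returns bins that contain at least one record, so hours without any logs are likely to be absent from the result rather than present with a missing count. A minimal sketch for listing such absent bins is shown below; `missing_bins` is a hypothetical helper and the timestamps are illustrative.

```python
from datetime import datetime, timedelta


def missing_bins(timestamps, step=timedelta(hours=1)):
    """Return the expected bin starts that are absent between the first and last timestamp."""
    seen = set(timestamps)
    if not seen:
        return []
    current, last = min(seen), max(seen)
    missing = []
    while current <= last:
        if current not in seen:
            missing.append(current)
        current += step
    return missing


# Illustrative hourly bins with a two-hour hole
bins = [datetime(2023, 6, 14, 0), datetime(2023, 6, 14, 1), datetime(2023, 6, 14, 4)]
holes = missing_bins(bins)
print(holes)
```

In the notebook, the index of `ts_df` could be passed as `timestamps`, assuming it holds the hourly bin starts returned by the query.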

In [None]:
from msticpy.analysis import timeseries

ts_decomp_df = ts_df.mp_timeseries.analyze(
    # time_column="TimeGenerated"  - if the DF is not indexed by timestamp
    data_column="LogsCount",
    seasonal=7,
    period=24
)

ts_decomp_df.head()