## Project Straylight

### Forward DNS Reconnaissance and Attack Surface Visualization Using the Rapid7 Project Sonar Dataset

__Introduction:__
This walkthrough provides the steps to configure AWS cloud-based resources to query the Forward DNS (FDNS) data stored in the Rapid7 Project Sonar public dataset. The output of this process can supplement passive domain reconnaissance techniques. It can also be integrated as a fully automated and entirely passive process to track the attack surface on a monthly basis.

This notebook takes a domain name (e.g. microsoft.com) as input and queries the Project Sonar public dataset for the applicable Forward DNS entries. It then processes the results by geolocating the IP addresses and producing a heatmap of the domain's global external presence.

The results provide a completely passive method for reconnaissance and mapping of domains without any direct interaction with, querying of, or brute-forcing of the target domain.

__GitHub:__
* https://github.com/brevityinmotion/straylight

__Blog:__
* [External IP Domain Reconnaissance and Attack Surface Visualization in Under 2 Minutes](https://medium.com/@brevityinmotion/external-ip-domain-reconnaissance-and-attack-surface-visualization-in-under-2-minutes-b2ab06105def?sk=45a029919647bd3214e6dd1e8526ca25)

__Credits:__
* Special thanks to Evan Perotti for the awesome walkthrough on querying Project Sonar FDNS and for the query code within the Lambda APIs! Some of the ideas and steps were adapted from Evan's tutorial at: http://securityriskadvisors.com/blog/creating-a-project-sonar-fdns-api-with-aws/
* Thank you to Rapid7 for the availability of this valuable dataset (https://www.rapid7.com/research/project-sonar/) and the blog post detailing how to build and query the dataset (https://blog.rapid7.com/2018/10/16/how-to-conduct-dns-reconnaissance-for-02-using-rapid7-open-data-and-aws/)

## Project Dependencies
For this notebook to work, AWS Athena needs to be configured manually using the following setup information. The queries and approach are from Rapid7's blog post detailing the process (https://blog.rapid7.com/2018/10/16/how-to-conduct-dns-reconnaissance-for-02-using-rapid7-open-data-and-aws/).

The query code is also located in a [Brevity In Motion gist](https://gist.github.com/brevityinmotion/af6f10257c6d7a9fe175e30a5af3d45c).

### Additional Notebooks
The code in this notebook uses functions defined in the following additional notebooks. Each notebook needs to reside in the same directory as this notebook for it to run.
* configuration.ipynb
* corefunctions.ipynb
* tools-maxmind.ipynb

### AWS Athena
Within the AWS Athena console, run the following three queries to configure the environment.
TODO: Codify these commands as Boto3 calls that run directly from the notebook.

#### Query 1:
<code>CREATE DATABASE rapid7fdns;</code>

#### Query 2:
<code>CREATE EXTERNAL TABLE IF NOT EXISTS rapid7_fdns_any (
  `timestamp` timestamp,
  `name` string,
  `type` string,
  `value` string 
) PARTITIONED BY (
  date string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://rapid7-opendata/fdns/any/v1/'
TBLPROPERTIES ('has_encrypted_data'='false');
</code>

#### Query 3:
<code>msck repair table rapid7_fdns_any;</code>
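
As a sketch toward the TODO above, the three setup statements could be submitted with Boto3 instead of the console. This assumes the notebook's execution role is allowed to run Athena queries and write to the results bucket. `run_setup_queries` and `wait_for_query` are hypothetical helper names, and `IF NOT EXISTS` was added to Query 1 so the setup can be rerun safely.

```python
import time

DATABASE = "rapid7fdns"

# Query 2 from above, without the trailing semicolon (one statement per API call).
TABLE_DDL = """CREATE EXTERNAL TABLE IF NOT EXISTS rapid7_fdns_any (
  `timestamp` timestamp,
  `name` string,
  `type` string,
  `value` string
) PARTITIONED BY (
  date string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://rapid7-opendata/fdns/any/v1/'
TBLPROPERTIES ('has_encrypted_data'='false')"""

SETUP_QUERIES = [
    f"CREATE DATABASE IF NOT EXISTS {DATABASE}",
    TABLE_DDL,
    "MSCK REPAIR TABLE rapid7_fdns_any",
]


def wait_for_query(client, execution_id, poll_seconds=2.0):
    """Poll Athena until the query leaves the QUEUED/RUNNING states; return the final state."""
    while True:
        response = client.get_query_execution(QueryExecutionId=execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state not in ("QUEUED", "RUNNING"):
            return state
        time.sleep(poll_seconds)


def run_setup_queries(client, output_location, poll_seconds=2.0):
    """Submit each setup statement in order and stop at the first one that does not succeed."""
    for i, statement in enumerate(SETUP_QUERIES):
        kwargs = {
            "QueryString": statement,
            "ResultConfiguration": {"OutputLocation": output_location},
        }
        if i > 0:  # the database only exists once the first statement has run
            kwargs["QueryExecutionContext"] = {"Database": DATABASE}
        execution_id = client.start_query_execution(**kwargs)["QueryExecutionId"]
        state = wait_for_query(client, execution_id, poll_seconds)
        if state != "SUCCEEDED":
            raise RuntimeError(f"Setup query {i + 1} finished in state {state}")


# Example usage (requires AWS credentials):
# import boto3
# athena = boto3.client("athena", region_name="us-east-1")
# run_setup_queries(athena, "s3://brevity-athena/")
```

Waiting for each statement before submitting the next matters here because the table DDL depends on the database, and the partition repair depends on the table.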

### AWS IAM Roles
When the SageMaker instance is first created, it creates an execution role that grants the notebooks running within SageMaker the relevant access. The role includes base permissions, but it must be extended to the other services that the notebooks use. The following services need to be added:
* AWS S3 - Get and Put access to the buckets for processed data and results, including the location where the Athena query results are stored.
* AWS Secrets Manager - This should be part of the default policy, but it is conditionally limited to secrets whose names follow the SageMaker-* format (the MaxMind secret used below is named AmazonSageMaker-geoip).
* AWS Lambda
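
A hedged starting point for the extended permissions is an inline policy like the one below, built as a Python dict so it can be printed and pasted into the IAM console. The action list is an assumption rather than a least-privilege reference: Athena result buckets typically also need `s3:GetBucketLocation` and `s3:ListBucket`, the Secrets Manager pattern mirrors the `AmazonSageMaker-` prefix of the secret used in this notebook, and `lambda:InvokeFunction` assumes the notebooks only invoke existing functions. `build_notebook_policy` is a hypothetical helper name.

```python
import json


def build_notebook_policy(results_bucket, secret_prefix="AmazonSageMaker-"):
    """Return an IAM policy document (as a dict) covering S3, Secrets Manager and Lambda."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read and write processed data and Athena query results.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{results_bucket}/*"],
            },
            {
                # Bucket-level permissions that Athena result locations commonly need.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{results_bucket}"],
            },
            {
                # Assumption: secrets share the prefix of the AmazonSageMaker-geoip secret.
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [f"arn:aws:secretsmanager:*:*:secret:{secret_prefix}*"],
            },
            {
                # Assumption: the notebooks only invoke existing Lambda functions.
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction"],
                "Resource": ["*"],
            },
        ],
    }


print(json.dumps(build_notebook_policy("brevity-athena"), indent=2))
```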


In [None]:
# Run the core configurations notebook. This generally only needs to be run once.
%run ./configuration.ipynb

In [None]:
# Install the additional dependencies for tools-r7sonar (dependencies_r7sonar is defined in configuration.ipynb)
dependencies_r7sonar()

In [None]:
# The core functions notebook contains generalized functions that apply across use cases
%run ./corefunctions.ipynb

In [None]:
# The MaxMind notebook contains the configuration functions that download the MaxMind databases and CSVs
%run ./tools-maxmind.ipynb

In [None]:
# Establish additional imports
import json, boto3, os, requests, io, time, logging
import pandas as pd
from botocore.exceptions import ClientError

In [None]:
# Download the MaxMind dependencies
secret_name = 'AmazonSageMaker-geoip'
region_name = 'us-east-1'
license_key = get_secret(secret_name, region_name)
# get_secret returns the secret as a dictionary of {key: value} pairs. To use only the license key, reference its key name as shown below.
licensesecret = license_key['license_key']

# This function is located in the tools-maxmind.ipynb notebook
maxmind_geolitecity_db(licensesecret)
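
The `get_secret` helper comes from one of the notebooks loaded above and is not shown here. For readers who want to see what such a helper involves, the following is a minimal sketch using the Secrets Manager `get_secret_value` call, assuming the secret was stored as key/value pairs (which Secrets Manager keeps as a JSON string). `get_secret_sketch` is a hypothetical name chosen so it does not shadow the notebook's own function.

```python
import json


def get_secret_sketch(secret_name, region_name, client=None):
    """Return a Secrets Manager secret parsed from its JSON SecretString into a dict.

    Hypothetical stand-in for the notebook's get_secret helper.
    """
    if client is None:
        import boto3  # imported lazily so the function can be exercised with a stub client
        client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    # Key/value secrets created in the console are stored as a JSON object string.
    return json.loads(response["SecretString"])


# Example usage (requires AWS credentials):
# licensesecret = get_secret_sketch("AmazonSageMaker-geoip", "us-east-1")["license_key"]
```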

In [None]:
# Make sure to update these values
DOMAIN_TO_QUERY = 'microsoft.com' # This should look like 'domain.com'. The wildcard is added automatically below.
ATHENA_BUCKET = 's3://brevity-athena' # Customize this to a bucket in your own account (e.g. 's3://customname-athena').
ATHENA_DB = 'rapid7fdns' # This matches the database created by the queries above and should not need to be changed.
ATHENA_TABLE = 'rapid7_fdns_any' # This matches the table created by the queries above and should not need to be changed.

# Do not modify this query unless you intend to customize it.
# The '%.' prefix matches every subdomain of DOMAIN_TO_QUERY, but not the bare domain itself.
querydomain = '%.' + DOMAIN_TO_QUERY
query = "SELECT * FROM %s WHERE name LIKE '%s' AND date = (SELECT MAX(date) from %s);" % (ATHENA_TABLE,querydomain,ATHENA_TABLE)

execid = queryathena(ATHENA_DB, ATHENA_BUCKET, query)
print(execid)
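
The query above interpolates `DOMAIN_TO_QUERY` directly into SQL, and its `LIKE '%.domain'` pattern returns subdomains only. If either point matters for your use, one possible variant validates the input first and optionally includes the apex record as well. `build_fdns_query` is a hypothetical helper, and lowercasing the input assumes the dataset stores hostnames in lowercase.

```python
import re

# A conservative hostname check: dot-separated labels of letters, digits and hyphens.
_DOMAIN_RE = re.compile(
    r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))+$"
)


def build_fdns_query(domain, table="rapid7_fdns_any", include_apex=True):
    """Build the latest-partition FDNS query for a domain's subdomains (and optionally the apex)."""
    if not _DOMAIN_RE.match(domain):
        raise ValueError(f"Not a valid domain name: {domain!r}")
    domain = domain.lower()  # assumption: names in the dataset are lowercase
    condition = f"name LIKE '%.{domain}'"
    if include_apex:
        condition = f"(name = '{domain}' OR {condition})"
    return (
        f"SELECT * FROM {table} WHERE {condition} "
        f"AND date = (SELECT MAX(date) FROM {table});"
    )


# Example usage:
# query = build_fdns_query(DOMAIN_TO_QUERY, ATHENA_TABLE)
# execid = queryathena(ATHENA_DB, ATHENA_BUCKET, query)
```

Rejecting anything that is not a plain hostname keeps quotes and other SQL syntax out of the statement without needing a full escaping scheme.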

In [None]:
# Use the execution ID to retrieve the results
# The retrieveresults function is in the corefunctions.ipynb notebook
downloadURL = retrieveresults(execid)

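# Download the CSV output and load it into a dataframe
response = requests.get(downloadURL)
response.raise_for_status()  # Fail early if the download URL has expired or is invalid
dfhosts = pd.read_csv(io.StringIO(response.content.decode('utf-8')))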
dfhosts
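
`retrieveresults` lives in corefunctions.ipynb and is not shown. The cell above only needs an HTTPS URL for the query's CSV output, so a helper along these lines would fit: poll `get_query_execution` until the query finishes, then presign the S3 object named in `ResultConfiguration.OutputLocation`. This is a sketch under those assumptions; `retrieve_results_sketch` and its parameters are hypothetical.

```python
import time
from urllib.parse import urlparse


def retrieve_results_sketch(athena, s3, execution_id, poll_seconds=2.0, expires_in=3600):
    """Wait for an Athena query to finish and return a presigned HTTPS URL for its CSV output."""
    while True:
        execution = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state not in ("QUEUED", "RUNNING"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {execution_id} ended in state {state}")
    # OutputLocation has the form s3://bucket/path/<execution_id>.csv
    location = urlparse(execution["ResultConfiguration"]["OutputLocation"])
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": location.netloc, "Key": location.path.lstrip("/")},
        ExpiresIn=expires_in,
    )


# Example usage (requires AWS credentials):
# import boto3
# url = retrieve_results_sketch(boto3.client("athena"), boto3.client("s3"), execid)
```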

In [None]:
# Pass get_location the dataframe and the name of the column containing the IP addresses
# The get_location function is in the corefunctions.ipynb notebook
df_min = get_location(dfhosts, 'value')
df_min.head(10)
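
The `value` column holds whatever the DNS record points to, so only some rows contain IP addresses; CNAME-style records hold hostnames, which cannot be geolocated directly. `get_location` itself is not shown, so the following is a hedged sketch of the general idea: keep rows whose value parses as an IP address, then attach coordinates from a lookup function. `geolocate` and `make_geoip2_lookup` are hypothetical helpers, the lookup uses the `geoip2` package's `Reader.city` API, and the database path is an assumption about where the GeoLite2 City file was downloaded.

```python
import ipaddress


def _is_ip(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(str(value))
        return True
    except ValueError:
        return False


def geolocate(df, ip_column, lookup):
    """Keep rows whose ip_column holds an IP and add latitude/longitude columns.

    lookup maps an IP string to a (latitude, longitude) tuple, or None if unknown.
    """
    ips = df[df[ip_column].map(_is_ip)].copy()
    coords = ips[ip_column].map(lookup)
    located = ips[coords.notna()].copy()
    found = coords.dropna()
    located["latitude"] = [c[0] for c in found]
    located["longitude"] = [c[1] for c in found]
    return located.reset_index(drop=True)


def make_geoip2_lookup(db_path):
    """Build a lookup backed by a local GeoLite2 City database (requires geoip2)."""
    import geoip2.database
    import geoip2.errors
    reader = geoip2.database.Reader(db_path)  # left open for the session in this sketch

    def lookup(ip):
        try:
            location = reader.city(ip).location
        except geoip2.errors.AddressNotFoundError:
            return None
        if location.latitude is None or location.longitude is None:
            return None
        return (location.latitude, location.longitude)

    return lookup


# Example usage (the .mmdb path is an assumption):
# df_geo = geolocate(dfhosts, "value", make_geoip2_lookup("GeoLite2-City.mmdb"))
```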

In [None]:
# Output the results to an Excel spreadsheet (writing .xlsx requires an Excel writer engine such as openpyxl)
# Example code to export the dataframe. This file is not used further in this notebook.
df_min.to_excel("sonar-domains.xlsx")

In [None]:
# Prepare the geolocated results for plotting
# The prepare_location function is in the corefunctions.ipynb notebook
df_plot = prepare_location(df_min)
df_plot.head(50)
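
`prepare_location` is not shown in this notebook. One common way to prepare points for a heatmap, sketched here as an assumption rather than a description of that function, is to round coordinates and count hosts per location so that each point can carry a weight. `aggregate_locations` and its `precision` parameter are hypothetical; it expects `latitude` and `longitude` columns.

```python
def aggregate_locations(df, precision=2):
    """Collapse geolocated rows into one row per rounded coordinate with a host count."""
    rounded = df.assign(
        latitude=df["latitude"].round(precision),
        longitude=df["longitude"].round(precision),
    )
    return (
        rounded.groupby(["latitude", "longitude"])
        .size()
        .reset_index(name="count")
        .sort_values("count", ascending=False, ignore_index=True)
    )


# Example usage (assumes df_geo has latitude/longitude columns):
# df_points = aggregate_locations(df_geo)
```

Rounding to two decimal places groups hosts within roughly a kilometre of each other, which keeps the number of plotted points small for large domains.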

In [None]:
# The map in this cell requires the following two extensions to be enabled. On SageMaker, enable them through a notebook instance lifecycle configuration script.
# The gist for the lifecycle configuration code is at: https://gist.github.com/brevityinmotion/495d1b77bd3f3ea679ef7ccfddce23b3

#!jupyter nbextension enable --py gmaps
#!jupyter nbextension enable --py widgetsnbextension

from ipywidgets.embed import embed_minimal_html
import gmaps
import gmaps.datasets

# The get_heatmap function is in the corefunctions.ipynb notebook
fig = get_heatmap(df_plot)
embed_minimal_html('sonar-heatmap.html', views=[fig]) # Export the map to html
fig
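
`get_heatmap` lives in corefunctions.ipynb and is not shown. As an illustration only, a comparable figure can be built with the jupyter-gmaps API (`gmaps.configure`, `gmaps.figure`, `gmaps.heatmap_layer`). The helper names, the `count` weight column, and passing the Google Maps API key in directly are assumptions; in practice the key could be stored in Secrets Manager like the MaxMind license.

```python
def heatmap_inputs(df, weight_column="count"):
    """Split a dataframe into the (lat, lon) tuples and optional weights a heatmap layer takes."""
    locations = list(zip(df["latitude"], df["longitude"]))
    weights = df[weight_column].tolist() if weight_column in df.columns else None
    return locations, weights


def heatmap_figure(df, api_key, weight_column="count"):
    """Build a gmaps figure with a single heatmap layer (requires the gmaps package)."""
    import gmaps
    gmaps.configure(api_key=api_key)
    locations, weights = heatmap_inputs(df, weight_column)
    fig = gmaps.figure()
    fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
    return fig


# Example usage (the API key source is an assumption):
# fig = heatmap_figure(df_points, api_key=google_maps_key)
# embed_minimal_html('sonar-heatmap.html', views=[fig])
```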

In [None]:
# You can use the following to upload the results to an S3 bucket configured for static website hosting for interactive viewing
#bucket = 'recon.brevityinmotion.com'
#file_name = 'sonar-heatmap.html'

# The upload_file function is in the corefunctions.ipynb notebook
#upload_file(file_name,bucket)
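
`upload_file` is defined in corefunctions.ipynb and is not shown. For a page that should render in a browser from S3 static hosting, the object's content type matters, so a sketch of such an upload sets it explicitly through Boto3's `upload_file` `ExtraArgs`. `upload_html` is a hypothetical helper, and using the file's base name as the object key is an assumption.

```python
import os


def upload_html(file_name, bucket, key=None, client=None):
    """Upload a local HTML file to S3 with a text/html content type and return its key."""
    if client is None:
        import boto3  # imported lazily so the function can be tested with a stub client
        client = boto3.client("s3")
    key = key or os.path.basename(file_name)
    # Without an explicit ContentType, browsers may download the page instead of rendering it.
    client.upload_file(file_name, bucket, key, ExtraArgs={"ContentType": "text/html"})
    return key


# Example usage (requires AWS credentials):
# upload_html('sonar-heatmap.html', 'recon.brevityinmotion.com')
```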

In [None]:
# These are not necessary to run, but they are helpful S3 commands for checking which datasets are the latest available
#! aws s3 ls s3://rapid7-opendata/fdns/any/v1/ --no-sign-request
#! aws s3 ls s3://rapid7-opendata/fdns/any/v1/date=202005/ --no-sign-request
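
The same check can be done from Python with an unsigned Boto3 client, the equivalent of `--no-sign-request`. This sketch lists the `date=` partitions under the FDNS prefix. Because Athena only sees partitions registered in the table, a newer month that appears here but not in the query results would suggest rerunning `msck repair table rapid7_fdns_any;`. `list_fdns_partitions` and `anonymous_s3_client` are hypothetical helper names.

```python
import re

PARTITION_RE = re.compile(r"date=(\d+)/?$")


def list_fdns_partitions(client, bucket="rapid7-opendata", prefix="fdns/any/v1/"):
    """Return the sorted partition dates (e.g. '202005') found directly under the FDNS prefix."""
    dates = []
    paginator = client.get_paginator("list_objects_v2")
    # Delimiter="/" makes S3 return one CommonPrefixes entry per partition "folder".
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common in page.get("CommonPrefixes", []):
            match = PARTITION_RE.search(common["Prefix"])
            if match:
                dates.append(match.group(1))
    return sorted(dates)


def anonymous_s3_client():
    """Unsigned S3 client, the Boto3 counterpart of the CLI's --no-sign-request."""
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config
    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


# Example usage:
# print(list_fdns_partitions(anonymous_s3_client())[-1])  # latest partition date
```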