### Special thanks to Evan Perotti for the awesome walkthrough on querying Project Sonar FDNS and for the query code within the Lambda APIs!

### Additionally, thank you to Rapid7 for the availability of this valuable dataset (https://www.rapid7.com/research/project-sonar/) and the blog post detailing how to build and query the dataset.

### The following steps are adapted from the tutorials at http://securityriskadvisors.com/blog/creating-a-project-sonar-fdns-api-with-aws/ and https://blog.rapid7.com/2018/10/16/how-to-conduct-dns-reconnaissance-for-02-using-rapid7-open-data-and-aws/


### This notebook takes a domain name (e.g., microsoft.com) as input and queries the Project Sonar public dataset for the applicable DNS entries. It then processes the results by geomapping the IP addresses and producing a heatmap of the domain's global external presence.

### The results provide a completely passive method for reconnaissance and mapping of domains, without any direct interaction with, querying of, or brute-forcing of the domain.

In [None]:
import requests
import pandas as pd
import io
import time

## For this notebook to work, AWS Athena must be configured manually using the setup information below. The queries and approach come from Rapid7's blog post detailing the process (https://blog.rapid7.com/2018/10/16/how-to-conduct-dns-reconnaissance-for-02-using-rapid7-open-data-and-aws/).


### Within the Athena interface, you will need to run the following three queries to configure the environment:

#### Query 1:
CREATE DATABASE rapid7fdns;

#### Query 2:
CREATE EXTERNAL TABLE IF NOT EXISTS rapid7_fdns_any (
  `timestamp` timestamp,
  `name` string,
  `type` string,
  `value` string 
) PARTITIONED BY (
  date string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://rapid7-opendata/fdns/any/v1/'
TBLPROPERTIES ('has_encrypted_data'='false');

#### Query 3:
msck repair table rapid7_fdns_any;

#### Todo: codify the commands and dependencies into Boto3 commands to run directly from the notebook.
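
One way the Todo above could be approached is sketched here. It is a minimal sketch, not the notebook's actual implementation: the `run_setup` helper and the `SETUP_STATEMENTS` list are hypothetical names, `IF NOT EXISTS` was added to Query 1 so the setup can be re-run, and the Athena client is passed in by the caller (for example `boto3.client("athena")`). The statements run one at a time because the table can only be created once the database exists.

```python
import time

# The three setup statements from above, paired with the database context each
# one needs. Query 1 has no context because the database does not exist yet.
SETUP_STATEMENTS = [
    ("CREATE DATABASE IF NOT EXISTS rapid7fdns", None),
    ("""CREATE EXTERNAL TABLE IF NOT EXISTS rapid7_fdns_any (
  `timestamp` timestamp,
  `name` string,
  `type` string,
  `value` string
) PARTITIONED BY (
  date string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://rapid7-opendata/fdns/any/v1/'
TBLPROPERTIES ('has_encrypted_data'='false')""", "rapid7fdns"),
    ("MSCK REPAIR TABLE rapid7_fdns_any", "rapid7fdns"),
]


def run_setup(athena, output_location, poll_seconds=2.0, sleep=time.sleep):
    """Run each setup statement in order, waiting for one to finish before the next.

    `athena` is assumed to be a Boto3 Athena client; `output_location` is the
    s3:// URI of a results bucket you own (hypothetical helper, not from the source).
    """
    exec_ids = []
    for sql, database in SETUP_STATEMENTS:
        kwargs = {
            "QueryString": sql,
            "ResultConfiguration": {"OutputLocation": output_location},
        }
        if database:
            kwargs["QueryExecutionContext"] = {"Database": database}
        exec_id = athena.start_query_execution(**kwargs)["QueryExecutionId"]

        # Poll until Athena reports a terminal state for this statement.
        while True:
            status = athena.get_query_execution(QueryExecutionId=exec_id)
            status = status["QueryExecution"]["Status"]
            if status["State"] == "SUCCEEDED":
                break
            if status["State"] in ("FAILED", "CANCELLED"):
                reason = status.get("StateChangeReason", "no reason given")
                raise RuntimeError("Setup query %s: %s" % (status["State"], reason))
            sleep(poll_seconds)
        exec_ids.append(exec_id)
    return exec_ids


# Example (requires AWS credentials and a results bucket you own):
# import boto3
# run_setup(boto3.client("athena"), "s3://customname-athena")
```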

In [None]:
# Optional: these S3 listing commands are helpful for checking which datasets are the latest
#! aws s3 ls s3://rapid7-opendata/fdns/any/v1/ --no-sign-request
#! aws s3 ls s3://rapid7-opendata/fdns/any/v1/date=202005/ --no-sign-request
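
The same check can be done from Python rather than the CLI. The sketch below is an illustration under stated assumptions: `latest_partition` is a hypothetical helper, the S3 client is supplied by the caller, the partition prefixes look like `date=202005/` as in the commands above, and the date strings sort lexicographically (the notebook's own `MAX(date)` query relies on the same assumption). It reads a single page of results and does not paginate.

```python
def latest_partition(s3, bucket="rapid7-opendata", prefix="fdns/any/v1/"):
    """Return the newest `date=` partition value under the prefix, or None.

    `s3` is assumed to be a Boto3 S3 client (hypothetical helper).
    """
    # Delimiter="/" makes S3 return the "folders" directly under the prefix.
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    dates = []
    for entry in response.get("CommonPrefixes", []):
        # entry["Prefix"] looks like 'fdns/any/v1/date=202005/'
        name = entry["Prefix"][len(prefix):].rstrip("/")
        if name.startswith("date="):
            dates.append(name[len("date="):])
    return max(dates) if dates else None


# Example, using an unsigned client as the equivalent of --no-sign-request:
# import boto3
# from botocore import UNSIGNED
# from botocore.config import Config
# s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
# print(latest_partition(s3))
```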

In [None]:
# Recreate the query Lambda entirely within the Jupyter notebook
# To-do: Codify the creation of an S3 bucket to save the Athena results

import json
import boto3
import os

# The core functions notebook contains generalized functions that apply across use cases
%run ./corefunctions.ipynb

# Make sure to update these values
DOMAIN_TO_QUERY = 'microsoft.com' # This should look like 'domain.com'. The wildcard is added automatically below.
ATHENA_BUCKET = 's3://brevity-athena' # Customize this for your own account (e.g., 's3://customname-athena').
ATHENA_DB = 'rapid7fdns' # Matches the database name; no change is needed if it was created using the previous queries.
ATHENA_TABLE = 'rapid7_fdns_any' # Matches the table name; no change is needed if it was created using the previous queries.

# Do not modify this query unless the intent is to customize
querydomain = '%.' + DOMAIN_TO_QUERY
query = "SELECT * FROM %s WHERE name LIKE '%s' AND date = (SELECT MAX(date) from %s);" % (ATHENA_TABLE,querydomain,ATHENA_TABLE)

execid = queryathena(ATHENA_DB, ATHENA_BUCKET, query)
print(execid)
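
Because the query above is built by string formatting, a stray quote or wildcard character in `DOMAIN_TO_QUERY` would change what the `LIKE` clause matches. The sketch below shows one possible way to validate the input before building the same query. `build_fdns_query` is a hypothetical helper, not part of `corefunctions.ipynb`, and it deliberately rejects underscores in the domain, since `_` is a single-character wildcard in `LIKE`.

```python
import re

# One DNS label: letters, digits, and inner hyphens (no underscores or quotes).
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_DOMAIN_RE = re.compile(r"(?:" + _LABEL + r"\.)+" + _LABEL)
_TABLE_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_fdns_query(table, domain):
    """Build the subdomain query for `domain`, refusing unexpected characters."""
    domain = domain.strip().lower().rstrip(".")
    if not _DOMAIN_RE.fullmatch(domain):
        raise ValueError("Not a plain domain name: %r" % domain)
    if not _TABLE_RE.fullmatch(table):
        raise ValueError("Unexpected table name: %r" % table)
    return (
        "SELECT * FROM {t} WHERE name LIKE '%.{d}' "
        "AND date = (SELECT MAX(date) FROM {t});"
    ).format(t=table, d=domain)


print(build_fdns_query("rapid7_fdns_any", "microsoft.com"))
```

The returned string could be passed to `queryathena` in place of `query`.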

In [None]:
# Recreate the results-retrieval Lambda entirely within the Jupyter notebook
# This code is adapted from Evan Perotti's Lambda at http://securityriskadvisors.com/blog/creating-a-project-sonar-fdns-api-with-aws/

import json, boto3, time, requests
import pandas as pd
import io

# Load an external notebook with normalized functions
%run ./corefunctions.ipynb

# Utilize executionID to retrieve results
downloadURL = retrieveresults(execid)

# Load output into dataframe
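response = requests.get(downloadURL)
response.raise_for_status()  # Fail early if the download URL has expired or is invalid
s = response.content
dfhosts = pd.read_csv(io.StringIO(s.decode('utf-8')))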
dfhosts
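
The body of `retrieveresults` lives in `corefunctions.ipynb` and is not shown here. For orientation, the sketch below shows roughly what such a step could look like: wait for the Athena query to finish, then turn its S3 output location into a temporary download URL. `presigned_results_url` and its parameters are assumptions for illustration; the Athena and S3 clients are passed in by the caller.

```python
import time
from urllib.parse import urlparse


def presigned_results_url(athena, s3, exec_id, attempts=30, delay=2.0,
                          expires_in=300, sleep=time.sleep):
    """Poll a query until it finishes, then presign its results file.

    `athena` and `s3` are assumed to be Boto3 clients (hypothetical helper).
    """
    for _ in range(attempts):
        execution = athena.get_query_execution(QueryExecutionId=exec_id)
        execution = execution["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            # OutputLocation looks like 's3://bucket/path/<execution id>.csv'
            location = urlparse(execution["ResultConfiguration"]["OutputLocation"])
            return s3.generate_presigned_url(
                "get_object",
                Params={"Bucket": location.netloc, "Key": location.path.lstrip("/")},
                ExpiresIn=expires_in,
            )
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError("Query %s: %s" % (state, reason))
        sleep(delay)
    raise TimeoutError("Query %s did not finish in time" % exec_id)


# Example (requires AWS credentials):
# import boto3
# url = presigned_results_url(boto3.client("athena"), boto3.client("s3"), execid)
```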

In [None]:
# Load an external notebook with normalized functions
%run ./corefunctions.ipynb
# Pass the central function the dataframe and the column containing the IP address
df_min = get_location(dfhosts, 'value')
df_min.head(10)
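
The `value` column does not hold an IP address in every row: records such as CNAMEs carry hostnames instead. How `get_location` handles this is not shown, so the sketch below only illustrates one plausible pre-filtering step that uses the standard library alone. `public_ips` is a hypothetical helper that keeps unique, publicly routable addresses and drops everything else.

```python
import ipaddress


def public_ips(values):
    """Return the unique, globally routable IP addresses found in `values`."""
    seen = set()
    result = []
    for value in values:
        try:
            ip = ipaddress.ip_address(str(value).strip())
        except ValueError:
            continue  # Not an IP address (for example a CNAME target)
        # Private, loopback, and other reserved ranges cannot be geolocated.
        if ip.is_global and str(ip) not in seen:
            seen.add(str(ip))
            result.append(str(ip))
    return result


print(public_ips(["1.1.1.1", "192.168.1.1", "cname.example.net", "1.1.1.1"]))
```

With the dataframe above, this could be applied as `dfhosts[dfhosts['value'].isin(public_ips(dfhosts['value']))]` before geolocation.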

In [None]:
# Load an external notebook with normalized functions
%run ./corefunctions.ipynb
df_plot = prepare_location(df_min)
df_plot.head(50)
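
A heatmap usually needs one weighted point per location rather than one row per host. What `prepare_location` actually does is not shown, so the following is only a sketch of that kind of aggregation under assumptions: `weight_by_location` is a hypothetical helper that takes `(latitude, longitude)` pairs, rounds them so nearby hosts share a bucket, and counts the hosts in each bucket.

```python
import math
from collections import Counter


def weight_by_location(points, precision=2):
    """Collapse (lat, lon) pairs into sorted (lat, lon, count) tuples."""
    counts = Counter()
    for lat, lon in points:
        # Skip rows where geolocation failed (None or NaN coordinates).
        if lat is None or lon is None:
            continue
        lat, lon = float(lat), float(lon)
        if math.isnan(lat) or math.isnan(lon):
            continue
        counts[(round(lat, precision), round(lon, precision))] += 1
    return [(lat, lon, n) for (lat, lon), n in sorted(counts.items())]


sample = [(47.5, -122.25), (47.5, -122.25), (51.5, -0.25), (None, 1.0)]
print(weight_by_location(sample))
```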

In [None]:
# The map rendered by this cell requires the following two extensions to be enabled. On SageMaker, run these commands from a lifecycle configuration.
#!jupyter nbextension enable --py gmaps
#!jupyter nbextension enable --py widgetsnbextension

import gmaps
import gmaps.datasets

%run ./corefunctions.ipynb
fig = get_heatmap(df_plot)
fig
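
`get_heatmap` is defined in `corefunctions.ipynb` and is not shown here. As a rough illustration of how a weighted heatmap can be assembled with the `gmaps` package, the sketch below assumes rows shaped as `(latitude, longitude, weight)` tuples and a Google Maps API key supplied by the caller. `split_points` and `build_heatmap` are hypothetical names, and `gmaps` is imported inside the function so that the pure helper can be used without it.

```python
def split_points(rows):
    """Split (lat, lon, weight) rows into the two lists a heatmap layer takes."""
    locations = [(lat, lon) for lat, lon, _ in rows]
    weights = [float(weight) for _, _, weight in rows]
    return locations, weights


def build_heatmap(rows, api_key):
    """Return a gmaps figure with one weighted heatmap layer (a sketch)."""
    import gmaps  # Imported lazily; needs the notebook extensions enabled

    gmaps.configure(api_key=api_key)
    locations, weights = split_points(rows)
    fig = gmaps.figure()
    fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
    return fig


# Example (requires a valid Google Maps API key):
# fig = build_heatmap([(47.5, -122.25, 2), (51.5, -0.25, 1)], api_key="YOUR_KEY")
# fig
```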