# OSINT

### Gather IP address data

**Windows**
* Windows Defender Firewall Logging
* Elasticsearch winlogbeat

Analyze with open APIs ip-api and VirusTotal

* Use ip-api for geolocation data
* https://github.com/ashwin-patil/threat-hunting-with-notebooks/blob/master/threat-hunting-with-ipaddress-from-logs-Public.ipynb
* https://nbviewer.jupyter.org/github/ashwin-patil/threat-hunting-with-notebooks/blob/master/Open%20Source%20Threat%20Intel%20lookup%20using%20Requests%20API.ipynb 

# ip-api 

### Query a single IP address
Example query syntax: http://ip-api.com/json/24.48.0.1

In [1]:
import requests, os
from IPython.display import JSON
ip = '93.184.220.29'
r = requests.get(f'http://ip-api.com/json/{ip}')
result_json = r.json()
JSON(result_json)

<IPython.core.display.JSON object>

### Query a batch of IP addresses

Batch query supports 100 addresses at once. The addresses need to be provided in JSON format and sent as HTTP post to http://ip-api.com/batch

Usage limits to note:
* 45 requests per minute

In [2]:
def chunks(ips, n):
    # Yield successive n-sized chunks from ips
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
def ip_api_batch_query(ips,fields,reqs_left,ttl):
    import time, requests
    # Split ips list into chunks, if list longer than 100
    if len(ips) > 100:
        ips_list = list(chunks(ips,100))
    # Keep track of usage limits
    if reqs_left != None:
            if int(reqs_left) == 0:
                time.sleep(int(ttl))
    json_post = [{"query":ip, "fields":fields} for ip in ips]
    r = requests.post("http://ip-api.com/batch",json=json_post)
    r_json, r_headers = r.json(), r.headers
    reqs_left = r_headers['X-Rl']
    ttl = r_headers['X-Rl']
    return r_json, reqs_left, ttl

In [3]:
result_json, reqs_left, ttl = None, None, None
ips = ["93.184.220.29","52.114.77.38"]
fields = "country,regionName,city,lat,lon,isp,query"
result_json, reqs_left, ttl = ip_api_batch_query(ips,fields,reqs_left,ttl)
JSON(result_json)

<IPython.core.display.JSON object>

### Virustotal

Virustotal url scan - API usage, provides a detailed scan from multiple different provides if the URL is classified as clean.

In [4]:
def virustotal_url_query(ip,apikey):
    import time
    params = {'apikey':apikey,'url':ip}
    r_post = requests.post('https://www.virustotal.com/vtapi/v2/url/scan',params=params)
    results_json = r_post.json()
    scan_id = results_json['scan_id']
    params = {'apikey': apikey, 
          'resource': scan_id,
          'allinfo': False,
          'scan': 0}
    print('Waiting for scan results...')
    time.sleep(30)
    report = requests.get('https://www.virustotal.com/vtapi/v2/url/report',params=params)
    return report.json()

apikey = os.getenv('VIRUSTOTAL_API_KEY') # Get API key from environment variables or define as a string here
report = virustotal_url_query(ips[0],apikey)
JSON(report)

Waiting for scan results...


<IPython.core.display.JSON object>

### Reading from Windows Defender Firewall logs
An example of using Pandas Dataframe to read Windows Defender Firewall logs and enriching the logs with APIs

First, load logs to Pandas Dataframe

In [6]:
import pandas as pd
import numpy as np
def windows_firewall_log_reader(log_file):
    with open(log_file) as f:
        lines = f.readlines()
        columns = lines[3].split(': ')
        columns = columns[1].split(' ')
        data = lines[4:]
        data.remove('\n')
        dataa = []
        for x in data:
            x.replace('\n','')
            x = x.split(' ')
            arr = np.array(x)
            dataa.append(arr)
    dataa = np.array(dataa)
    df = pd.DataFrame(columns=columns,data=dataa)
    return df
df = windows_firewall_log_reader('pfirewall.log')
df.head(5)

Unnamed: 0,date,time,action,protocol,src-ip,dst-ip,src-port,dst-port,size,tcpflags,tcpsyn,tcpack,tcpwin,icmptype,icmpcode,info,path\n
0,2020-03-02,12:49:27,ALLOW,UDP,10.0.2.15,239.255.255.250,50596,1900,0,-,-,-,-,-,-,-,SEND\n
1,2020-03-02,12:49:35,ALLOW,UDP,10.0.2.15,192.168.8.4,59929,53,0,-,-,-,-,-,-,-,SEND\n
2,2020-03-02,12:49:35,ALLOW,UDP,10.0.2.15,192.168.8.4,51488,53,0,-,-,-,-,-,-,-,SEND\n
3,2020-03-02,12:49:35,ALLOW,UDP,10.0.2.15,192.168.8.4,53984,53,0,-,-,-,-,-,-,-,SEND\n
4,2020-03-02,12:49:35,ALLOW,TCP,10.0.2.15,151.101.86.217,50308,443,0,-,0,0,0,-,-,-,SEND\n


Enrich IP Address type based on IP ranges.

In [7]:
def ip_type(string):
    import ipaddress as ip
    try:
        if ip.ip_address(string).is_private:
            return 'Private'
        elif ip.ip_address(string).is_multicast:
            return 'Multicast'
        elif ip.ip_address(string).is_unspecified:
            return 'Unspecified'
        elif ip.ip_address(string).is_reserved:
            return 'Reserved'
        elif ip.ip_address(string).is_loopback:
            return 'Loopback'
        elif ip.ip_address(string).is_global:
            return 'Public'
        elif ip.ip_address(string).is_link_local:
            return 'Link Local'
    except ValueError:
        return 'Unknown'
df['dst-ip address type'] = df['dst-ip'].apply(lambda x: ip_type(x))
df['src-ip address type'] = df['src-ip'].apply(lambda x: ip_type(x))
df.head(5)

Unnamed: 0,date,time,action,protocol,src-ip,dst-ip,src-port,dst-port,size,tcpflags,tcpsyn,tcpack,tcpwin,icmptype,icmpcode,info,path\n,dst-ip address type,src-ip address type
0,2020-03-02,12:49:27,ALLOW,UDP,10.0.2.15,239.255.255.250,50596,1900,0,-,-,-,-,-,-,-,SEND\n,Multicast,Private
1,2020-03-02,12:49:35,ALLOW,UDP,10.0.2.15,192.168.8.4,59929,53,0,-,-,-,-,-,-,-,SEND\n,Private,Private
2,2020-03-02,12:49:35,ALLOW,UDP,10.0.2.15,192.168.8.4,51488,53,0,-,-,-,-,-,-,-,SEND\n,Private,Private
3,2020-03-02,12:49:35,ALLOW,UDP,10.0.2.15,192.168.8.4,53984,53,0,-,-,-,-,-,-,-,SEND\n,Private,Private
4,2020-03-02,12:49:35,ALLOW,TCP,10.0.2.15,151.101.86.217,50308,443,0,-,0,0,0,-,-,-,SEND\n,Public,Private


Query addresses to ip-api

In [8]:
dst_ip_public = df[df['dst-ip address type'] == 'Public']
dst_ip_public = df['dst-ip'].unique().tolist()
src_ip_public = df[df['src-ip address type'] == 'Public']
src_ip_public = df['src-ip'].unique().tolist()
fields = "country,regionName,city,isp,query"
reqs_left, ttl = None, None
result_json, reqs_left, ttl = ip_api_batch_query(dst_ip_public,fields,reqs_left,ttl)
result_df = pd.DataFrame.from_dict(result_json)
result_df.head(5)

Unnamed: 0,query,country,regionName,city,isp
0,239.255.255.250,,,,
1,192.168.8.4,,,,
2,151.101.86.217,Sweden,Stockholm County,Stockholm,Fastly
3,192.168.8.8,,,,
4,104.16.93.80,United States,Illinois,Chicago,"Cloudflare, Inc."


List all different ISPs in the logs.

In [9]:
result_df['isp'].unique()

array([nan, 'Fastly', 'Cloudflare, Inc.', 'Akamai Technologies',
       'Google LLC', '', 'Microsoft Corporation', 'Facebook, Inc.',
       'Amazon Technologies Inc.', 'Amazon.com, Inc.',
       'DigitalOcean, LLC', 'LinkedIn Corporation', 'Rackspace Ltd.',
       'Adobe Inc.', 'AppNexus, Inc', 'Level 3 Communications, Inc.',
       'Akamai Technologies, Inc.',
       'MCI Communications Services, Inc. d/b/a Verizon Business',
       'FUNET'], dtype=object)

# Phishing Domains

Usually suspicious links in emails contain the domain names, which can be extracted and queried directly to an API to check if it is unsafe.
If a system administrator is looking for suspicious web traffic in network logs, a reverse lookup is first needed in order to get a domain name from an IP address

A reverse lookup can be performed for example by the [ipwhois Python library](https://github.com/richardpenman/whois). Note that Reverse IP lookups are not always useful, thousands of domains can be under the same address in domain hosts. Check the blog [Website Attribution Without WhoIs – Reverse IP Lookup
](https://nixintel.info/osint/website-attribution-without-whois-reverse-ip-lookup/) for more information.

A free twice-a-day updated feed can be found from [OpenPhish](https://openphish.com/)

**Warning**
Do **NOT** click the URL's below, they may be real phishing domains and could try to download malware or use your information maliciously

In [16]:
from urllib.parse import urlparse
suspicious_domain = "https://tsh.re"
r = requests.get('https://openphish.com/feed.txt')
phishing_domains = r.text.split('\n')
phishing_domains = [urlparse(d).netloc for d in phishing_domains]
if urlparse(suspicious_domain).netloc in phishing_domains:
    print(f"Domain {suspicious_domain} found in OpenPhish phishing feed")
else:
    print(f"Domain {suspicious_domain} not found in OpenPhish phishing feed")

Domain https://tsh.re is a phishing domain


Alternatively, for non-commericial use, [Google Safe Browsing API](https://developers.google.com/safe-browsing/v4) contains an updated list of unsafe web resources. A Google account is required and a project in Google Cloud Console needs to be created with Google Safe Browsing API enabled. The API key is provided after enabling the API. The database which to search from is much larger than the OpenPhish threat feed, but it may not contain the latest phishing domains.

In [34]:
apikey = "AIzaSyDG39wZh6_AHETZeya4TMnv54hKVIfWV84"
url = "https://safebrowsing.googleapis.com/v4/threatMatches:find"
suspicious_links = ["https://irs-gov.czldgzy.com/", # Found in OpenPhish feed
                    "url-facebook20.tk", # Older phishing domain found in https://github.com/mitchellkrogza/Phishing.Database
                    "https://cloudflare.com" # Legitimate website
                   ]
body = {
    "client": {
      "clientId": "Security Company Inc",
      "clientVersion": "1.5.2"
    },
    "threatInfo": {
      "threatTypes":      ["MALWARE", "SOCIAL_ENGINEERING"],
      "platformTypes":    ["ANY_PLATFORM"],
      "threatEntryTypes": ["URL"],
      "threatEntries": [{'url': url} for url in suspicious_links]
    }
}
params = {'key':apikey}
headers = {'Content-type': 'application/json'}
r = requests.post(url, 
                  json=body,
                  params=params,
                  headers=headers)
results = r.json()
if len(results) == 0:
    for link in suspicious_links:
        print(f"{link} not found in Google Safe Browsing")
else:
    print(results)

{'matches': [{'threatType': 'SOCIAL_ENGINEERING', 'platformType': 'ANY_PLATFORM', 'threat': {'url': 'url-facebook20.tk'}, 'cacheDuration': '300s', 'threatEntryType': 'URL'}]}
