# Hashlookup Analysis Workshop

This workshop introduces Hashlookup, a file hash intelligence service operated by CIRCL and used in malware analysis and threat hunting. Given a cryptographic hash, Hashlookup returns what it knows about the corresponding file.

## What is Hashlookup?

Hashlookup is a public API service that provides file intelligence based on cryptographic hashes. It offers:
- **File Intelligence**: Detailed information about known files from various sources
- **Hash Correlation**: Cross-referencing between MD5, SHA-1, and SHA-256 hashes
- **Metadata Enrichment**: File names, sizes, types, and associated information
- **Threat Classification**: Known good/bad file classifications from multiple sources
- **Historical Data**: Timeline information about file observations

## Key Features

### Hashlookup Capabilities:
- **Multi-Hash Support**: Query using MD5, SHA-1, or SHA-256 hashes
- **Bulk Operations**: Process multiple hashes in a single API call
- **File Metadata**: Retrieve comprehensive file information including names and sizes
- **Source Attribution**: Information about where files have been observed
- **Known Good/Bad Classification**: Threat intelligence about file reputation
- **API Integration**: RESTful API for programmatic access and automation

## Data Sources

Hashlookup aggregates data from multiple sources:
- **NSRL (National Software Reference Library)**: Known good software files
- **Threat Intelligence Feeds**: Malware and threat indicators
- **Sandbox Analysis**: Dynamic analysis results from security platforms
- **Community Submissions**: Crowdsourced file intelligence
- **Vendor Databases**: Commercial security product databases

## Documentation

- **Main Service**: https://circl.lu/services/hashlookup
- **API Documentation**: https://hashlookup.circl.lu/

### Use Cases:
- **Malware Analysis**: Identify known malicious files and their characteristics
- **Incident Response**: Quickly assess file reputation during investigations
- **Threat Hunting**: Search for indicators of compromise across file systems
- **Digital Forensics**: Validate file integrity and identify unknown files
- **Security Operations**: Automate file reputation checks in security workflows

## Learning Objectives

By the end of this workshop, you will be able to:
1. Query individual file hashes for detailed intelligence
2. Perform bulk hash lookups for efficient analysis
3. Integrate Hashlookup with local file system analysis
4. Correlate MISP threat intelligence with Hashlookup data
5. Implement automated file reputation workflows
6. Analyze and interpret file intelligence results

## Exercises

In [1]:
import requests
import json

# Configure HTTP session for API communication
print("Configuring API session for Hashlookup...")

# Hashlookup session configuration
hashlookup = requests.Session()

print("API session configured successfully!")
print("Ready to query Hashlookup service.")

Configuring API session for Hashlookup...
API session configured successfully!
Ready to query Hashlookup service.


### Exercise 1.0: Query Individual File Hash Intelligence

**Objective:** Learn how to retrieve comprehensive file intelligence for a specific hash value.

**Understanding Hash-Based File Intelligence:**
Cryptographic hashes serve as unique fingerprints for files, enabling:
- **File Identification**: Uniquely identify files regardless of name or location
- **Integrity Verification**: Confirm files haven't been modified or corrupted
- **Threat Detection**: Identify known malicious files through their hashes
- **Duplicate Detection**: Find identical files across different systems or networks

**Hash Types Supported:**
- **MD5**: 32-character hexadecimal string (legacy but widely used)
- **SHA-1**: 40-character hexadecimal string (stronger than MD5, but practical collisions have been demonstrated)
- **SHA-256**: 64-character hexadecimal string (current security standard)
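
Because the three hash types differ in length, the type of a hex digest can usually be inferred before choosing an endpoint. The following is a minimal sketch of that idea; the helper name `detect_hash_type` and the `HASH_LENGTHS` table are illustrative and not part of Hashlookup or the workshop code.

```python
import re

# Hex digest lengths for the three hash types listed above.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def detect_hash_type(value):
    """Return 'md5', 'sha1', or 'sha256' for a hex digest, else None.

    Hypothetical helper: it only checks the character set and the length,
    so it cannot tell whether the digest belongs to a real file.
    """
    candidate = value.strip().lower()
    if not re.fullmatch(r"[0-9a-f]+", candidate):
        return None
    return HASH_LENGTHS.get(len(candidate))


print(detect_hash_type("aaade8e2a921e9ac40178a263ebb67e3"))  # prints "md5"
```

A check like this is useful for rejecting malformed input (truncated or mistyped hashes) before any request is sent.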

**About This Hash:**
The MD5 hash `aaade8e2a921e9ac40178a263ebb67e3` represents a specific file. Through Hashlookup, we can discover:
- What file this hash represents
- Where it has been observed
- Whether it's classified as malicious or benign
- Associated metadata and file information

**API Endpoint:** `https://hashlookup.circl.lu/lookup/md5/[hash]`

**Security Applications:**
This information helps security analysts:
- Identify unknown files during incident response
- Validate file integrity during forensic analysis
- Assess threat level of suspicious files
- Build comprehensive file intelligence profiles

In [2]:
md5 = "aaade8e2a921e9ac40178a263ebb67e3"
response = hashlookup.get(f"https://hashlookup.circl.lu/lookup/md5/{md5}")
if response.status_code == 200:
    data = response.json()
    print("Hashlookup Result:", json.dumps(data, indent=4))
else:
    print("Error querying Hashlookup:", response.status_code)

Hashlookup Result: {
    "FileName": "snap-hashlookup-import/usr/bin/netcat",
    "FileSize": "39560",
    "MD5": "AAADE8E2A921E9AC40178A263EBB67E3",
    "SHA-1": "4D82414A45E1559C6B06B675BB416C7A6B570430",
    "SHA-256": "2A6FAC3D98E090468962EF18003CB8B89FBFFA7219917CA12567D5E42B156948",
    "SHA-512": "3F3B0C576B79AA8314EF2AC8E2D402D2E282E773851254FEDE829A349B12B711A84B0409E62F081E907944FA664FBF96868FC5333553CEA4EB54BFE30257F650",
    "SSDEEP": "768:nyyFY1k9ZC/TL9Jx+s9CYcV3q8CUXESOrW9ulS:nSk94x+s9CqeFO",
    "TLSH": "T1A003094BA1629A78C06482304BEF8B621570F835DB33567F2B10BB393D72E45572DE26",
    "insert-timestamp": "1728210578.8194463",
    "mimetype": "inode/symlink",
    "source": "snap:dwTAh7MZZ01zyriOZErqd1JynQLiOGvM_490",
    "hashlookup:parent-total": 1,
    "parents": [
        {
            "SHA-1": "9573871195870BD347E6BC5F7347DC0F54F54EA7",
            "snap-authority": "canonical",
            "snap-filename": "dwTAh7MZZ01zyriOZErqd1JynQLiOGvM_490.snap",
            "snap-i

### Exercise 1.1: Bulk Hash Intelligence Queries

**Objective:** Demonstrate efficient batch processing of multiple file hashes for large-scale analysis.

**Understanding Bulk Operations:**
When analyzing multiple files or conducting threat hunting across large datasets, individual hash queries become inefficient. Bulk operations provide:
- **Efficiency**: Process hundreds of hashes in a single API call
- **Rate Limit Compliance**: Avoid overwhelming the API with individual requests
- **Comprehensive Analysis**: Get complete intelligence for entire file sets
- **Workflow Integration**: Enable automated processing of hash lists

**Bulk Query Benefits:**
- **Performance**: Significantly faster than individual queries
- **Resource Optimization**: Reduced network overhead and API calls
- **Batch Processing**: Ideal for automated security workflows
- **Scale**: Handle large incident response or forensic investigations

**API Endpoint:** `https://hashlookup.circl.lu/bulk/md5`

* **Hint**: You can also do bulk `SHA1` queries using `https://hashlookup.circl.lu/bulk/sha1`.
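
For very long hash lists it may be safer to split the input into several bulk requests rather than sending everything at once. The sketch below shows one way to do that. The helper names `chunked` and `bulk_lookup`, the batch size of 100, and the 30-second timeout are assumptions chosen for illustration, not documented Hashlookup limits.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_lookup(session, hashes, batch_size=100,
                url="https://hashlookup.circl.lu/bulk/md5"):
    """Query the bulk endpoint batch by batch and merge the results.

    `session` is expected to behave like a requests.Session.
    `batch_size` is an assumed value, not a limit published by the service.
    """
    results = []
    for batch in chunked(hashes, batch_size):
        response = session.post(url, json={"hashes": batch}, timeout=30)
        response.raise_for_status()  # stop on the first failed batch
        results.extend(response.json())
    return results


# Splitting five items into batches of two:
print(list(chunked([1, 2, 3, 4, 5], 2)))  # prints [[1, 2], [3, 4], [5]]
```

With the session configured earlier, a call would look like `bulk_lookup(hashlookup, hashes)`.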

In [3]:
hashes = [
    "aaade8e2a921e9ac40178a263ebb67e3",
    "30caea6cf2c12bbb1b626b04ab77d7cb",
    "d04249be1ba985d1f94aea79deab52de",
]

response = hashlookup.post(
    "https://hashlookup.circl.lu/bulk/md5", json={"hashes": hashes}
)
if response.status_code == 200:
    data = response.json()
    print("Hashlookup Result:", json.dumps(data, indent=4))
else:
    print("Error querying Hashlookup:", response.status_code)

Hashlookup Result: [
    {
        "FileName": "snap-hashlookup-import/usr/bin/netcat",
        "FileSize": "39560",
        "MD5": "AAADE8E2A921E9AC40178A263EBB67E3",
        "SHA-1": "4D82414A45E1559C6B06B675BB416C7A6B570430",
        "SHA-256": "2A6FAC3D98E090468962EF18003CB8B89FBFFA7219917CA12567D5E42B156948",
        "SHA-512": "3F3B0C576B79AA8314EF2AC8E2D402D2E282E773851254FEDE829A349B12B711A84B0409E62F081E907944FA664FBF96868FC5333553CEA4EB54BFE30257F650",
        "SSDEEP": "768:nyyFY1k9ZC/TL9Jx+s9CYcV3q8CUXESOrW9ulS:nSk94x+s9CqeFO",
        "TLSH": "T1A003094BA1629A78C06482304BEF8B621570F835DB33567F2B10BB393D72E45572DE26",
        "insert-timestamp": "1728210578.8194463",
        "mimetype": "inode/symlink",
        "source": "snap:dwTAh7MZZ01zyriOZErqd1JynQLiOGvM_490"
    },
    {
        "FileName": "snap-hashlookup-import/usr/bin/ls",
        "FileSize": "142312",
        "MD5": "30CAEA6CF2C12BBB1B626B04AB77D7CB",
        "SHA-1": "14074FE0A2C4C45A16472DFA23DC4E05FF8177FF",
 

### Exercise 1.2: File System Analysis and Unknown File Detection

**Objective:** Integrate Hashlookup with local file system analysis to identify unknown or potentially suspicious files.

**Understanding File System Intelligence:**
This exercise demonstrates real-world application of hash-based file intelligence by:
- **System Auditing**: Analyze files present on local systems
- **Baseline Establishment**: Identify which files are known and documented
- **Anomaly Detection**: Find files that aren't in public databases
- **Security Assessment**: Evaluate the security posture of file systems

**Why Analyze System Directories:**
System directories like `/usr/sbin` contain critical system utilities and administrative tools:
- **High-Value Targets**: Attackers often target system directories for persistence
- **Integrity Verification**: Ensure system files haven't been tampered with
- **Baseline Security**: Establish known-good file inventories
- **Change Detection**: Identify unauthorized modifications or additions
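
Auditing a directory starts with hashing every file in it. For large binaries, reading the whole file into memory is unnecessary, since `hashlib` can be fed incrementally. The sketch below illustrates this; the helper name `md5_of_file` and the 1 MiB chunk size are illustrative choices.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest is identical to `hashlib.md5(data).hexdigest()` on the full file contents; only the memory footprint differs.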

In [5]:
import os
import hashlib

directory = "/usr/sbin"
hashes = []
hashes_filenames = {}
for filename in os.listdir(directory):
    filepath = os.path.join(directory, filename)
    if os.path.isfile(filepath):
        with open(filepath, "rb") as f:
            file_data = f.read()
            file_hash = hashlib.md5(file_data).hexdigest()
            hashes.append(file_hash)
            hashes_filenames[file_hash] = filename
print(f"Computed {len(hashes)} hashes from files in {directory}")

response = hashlookup.post(
    "https://hashlookup.circl.lu/bulk/md5", json={"hashes": hashes}
)
if response.status_code == 200:
    data = response.json()
else:
    print("Error querying Hashlookup:", response.status_code)
    data = []  # no results available; every hash below will show as not found

# Collect the MD5 values Hashlookup returned for fast membership tests
found_hashes = {result["MD5"].upper() for result in data if "MD5" in result}

print("Files with not found hashes:")
not_found_count = 0
for file_hash in hashes:
    if file_hash.upper() not in found_hashes:
        print(f"- {hashes_filenames.get(file_hash, 'Unknown file')}: {file_hash}")
        not_found_count += 1
print(f"Total not found hashes: {not_found_count}")

Computed 594 hashes from files in /usr/sbin
Files with not found hashes:
- cracklib-check: baf62fc2abd604a9409f75721cf9f98a
- grub-set-default: af851bdbc8ac0a7c4019ec0a890db4e1
- kvmexit-bpfcc: 81d2956c01679bba273dd6a50974f86f
- kpppoed: d498c8afafcf43dfab3232d24e1578df
- dcstat-bpfcc: e4f80da4c092fdbcc5d869cb12279b75
- telinit: fdfe2a2115cb8a5c02977f93db0a27d1
- getcap: 4d3a58e865b99c906d668b59ed1cd7dd
- NetworkManager: 1f8d6e5ba7a3769a1f3722964bf4a186
- runqlat-bpfcc: 5e653bef163c67339e91576f6573568b
- ntfsundelete: e291a1023d66e6d8cd78090e70f20680
- fixparts: c1b92152aeb14bc091b9ff7c2d99c6dc
- statsnoop-bpfcc: 70e1fad21f0872e849d6d80aea04d670
- filegone-bpfcc: 4ed5498928b114e2c3b9a285cf36d0e4
- sudo_logsrvd: 9975be4024de68184bc6cdf6f0f1c473
- e4defrag: 3b459d991f1f4c5e0d18b21d6aed01a7
- slabratetop-bpfcc: 20be993bfd5a5f54b8eb7649dab3cd77
- e2mmpstatus: 05dc1814befcdb27671ccd73d226f2f5
- runqlen-bpfcc: ade5a54bfe42184408cb89dcf1cb13db
- rcvboxautostart-service: 8c39f0ca454771e1f3cb5b

### Exercise 1.3: MISP and Hashlookup Intelligence Correlation

**Objective:** Demonstrate integration between the MISP threat intelligence platform and the Hashlookup file intelligence service.

**Understanding Intelligence Correlation:**
This exercise shows how to combine different threat intelligence sources:
- **MISP Integration**: Extract file hashes from threat intelligence events
- **Hash Enrichment**: Enhance MISP indicators with Hashlookup intelligence
- **Cross-Platform Analysis**: Correlate data from multiple intelligence sources
- **Comprehensive Profiling**: Build complete threat intelligence pictures

**MISP as Hash Source:**
MISP contains extensive collections of file hashes from:
- **Malware Analysis**: Hashes of analyzed malicious files
- **Incident Reports**: IOCs from security incidents and breaches
- **Threat Intelligence Feeds**: Commercial and open source threat data
- **Community Sharing**: Collaborative threat intelligence from security teams

**Intelligence Enhancement Process:**
1. **Hash Extraction**: Retrieve MD5 hashes from the MISP attribute database
2. **Bulk Analysis**: Submit MISP hashes to Hashlookup for enrichment
3. **Data Correlation**: Match Hashlookup results with MISP context
4. **Intelligence Fusion**: Combine threat context with file intelligence
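
Steps 3 and 4 can be sketched as a small function that sorts the MISP hashes into three groups based on the Hashlookup bulk response. The function name `correlate` and the three report keys are hypothetical. The `KnownMalicious` and `MD5` field names are taken from the sample output in this exercise, and treating the presence of `KnownMalicious` as the only malicious signal is a simplifying assumption.

```python
def correlate(misp_hashes, hashlookup_results):
    """Group MISP MD5 values by what Hashlookup returned for them."""
    # Index results by upper-cased MD5, matching the casing Hashlookup returns
    by_md5 = {r["MD5"].upper(): r for r in hashlookup_results if "MD5" in r}
    report = {"known_malicious": [], "known": [], "not_found": []}
    for value in misp_hashes:
        record = by_md5.get(value.upper())
        if record is None:
            report["not_found"].append(value)
        elif "KnownMalicious" in record:
            report["known_malicious"].append(value)
        else:
            report["known"].append(value)
    return report


# Two abbreviated records from the sample output, plus one made-up hash
sample_results = [
    {"KnownMalicious": "malshare.com",
     "MD5": "B7B5E1253710D8927CBE07D52D2D2E10"},
    {"FileName": "usr/share/windows/nirsoft/NirSoft/advancedrun.exe",
     "MD5": "17FC12902F4769AF3A9271EB4E2DACCE"},
]
sample_hashes = [
    "b7b5e1253710d8927cbe07d52d2d2e10",
    "17fc12902f4769af3a9271eb4e2dacce",
    "0" * 32,  # placeholder value that is absent from the results
]
report = correlate(sample_hashes, sample_results)
print({group: len(values) for group, values in report.items()})
```

A hash that lands in `known` is not necessarily benign: it only means Hashlookup has a record for it without a `KnownMalicious` entry, so the MISP context still needs analyst review.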

**Use Cases:**
- **Threat Hunting**: Search for known malicious files across enterprise systems
- **IOC Validation**: Verify threat indicators before deploying to security tools
- **Incident Response**: Quickly assess file reputation during investigations
- **Intelligence Analysis**: Build comprehensive threat actor profiles

**Analysis Questions:**
- How many MISP hashes were successfully enriched vs not found in Hashlookup?
- Are there hashes with conflicting reputations between sources? How should they be prioritized?
- Which additional fields from Hashlookup (filename, size, sources) helped confirm or refute MISP context?
- What confidence thresholds will you use to automate tagging or blocking actions?
- Which parts of this workflow can be safely automated, and which require analyst review?

In [6]:
import pymisp
import urllib3
import getpass

MISP_BASEURL = "https://training.misp-community.org"
MISP_API_KEY = getpass.getpass("Enter your MISP AuthKey:")

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

misp = pymisp.PyMISP(MISP_BASEURL, MISP_API_KEY, ssl=False)  # Disable SSL verification
print(f"Connected to MISP {misp.root_url} running version: {misp.version['version']}")

Enter your MISP AuthKey: ········


Connected to MISP https://training.misp-community.org running version: 2.5.17.2


In [7]:
attributes = misp.search(
    "attributes",
    type_attribute=["md5"],
    pythonify=True,
)
hashes = [attr.value for attr in attributes]
print(f"Retrieved {len(hashes)} MD5 hashes from MISP.")

response = hashlookup.post(
    "https://hashlookup.circl.lu/bulk/md5", json={"hashes": hashes}
)
if response.status_code == 200:
    print("Hashlookup Result:", json.dumps(response.json(), indent=4))
else:
    print("Error querying Hashlookup:", response.status_code)

Retrieved 181 MD5 hashes from MISP.
Hashlookup Result: [
    {
        "KnownMalicious": "malshare.com",
        "MD5": "B7B5E1253710D8927CBE07D52D2D2E10",
        "SHA-1": "596F1FDB5A3DE40CCCFE1D8183692928B94B8AFB",
        "SHA-256": "EAE876886F19BA384F55778634A35A1D975414E83F22F6111E3E792F706301FE",
        "SSDEEP": "1536:YZKZMY2546PTNGS719+T0GDGPWW2XTAJP7FD8OUFB4VH9QNWPWBLZ:RX2C29+4G8WW2XTO7L8OUGX9QNWP6"
    },
    {
        "FileName": "usr/share/windows/nirsoft/NirSoft/advancedrun.exe",
        "FileSize": "91000",
        "MD5": "17FC12902F4769AF3A9271EB4E2DACCE",
        "SHA-1": "9A4A1581CC3971579574F837E110F3BD6D529DAB",
        "SHA-256": "29AE7B30ED8394C509C561F6117EA671EC412DA50D435099756BBB257FAFB10B",
        "SSDEEP": "1536:JW3osrWjET3tYIrrRepnbZ6ObGk2nLY2jR+utQUN+WXim:HjjET9nX0pnUOik2nXjR+utQK+g3",
        "TLSH": "T14F936D4363E44466E5F31E306A7977228FB1BD32AA70C50F9728BA4E2CB0B61D931757",
        "tar:gname": "root",
        "tar:uname": "root"
    },
    {
        "K

### Task - Use other third-party services to investigate file hashes
- Use the VirusTotal web UI or API to collect a small set of MD5 hashes (export a CSV or copy a few sample hashes).
- Put those MD5 values into the notebook (e.g., assign them to the existing `hashes` list or save to a file and load them).
- Reuse the already-configured `hashlookup` session and run the notebook cell that posts to the `/bulk/md5` endpoint to enrich those hashes.
- After enrichment, check:
    - how many hashes were found vs not found,
    - filenames/sizes/sources returned,
    - any reputation or vendor classifications indicating maliciousness.

In [None]:

# Your code here...
