The Collection phase is the starting point of the threat intelligence lifecycle. Its goal is to gather raw data from a wide range of sources, focusing on public information and passive reconnaissance. This phase is critical because the quality and breadth of collected data directly influence the effectiveness of subsequent enrichment, analysis, and dissemination.

**Goal:** Gather raw, unfiltered threat data.

**How:** Passive reconnaissance and open-source intelligence (OSINT).

**Methods:**

* Public data, web searches, threat feeds
* No interaction with the target's internal systems
* Passive scanning


Key Principles


Passive Data Gathering:
Collection should be performed without interacting directly with or probing the target environment. Only public data is gathered, such as OSINT (Open Source Intelligence), vendor advisories, and threat feeds. [Pascal Thr...telligence | Word]


Legal and Ethical Boundaries:
Only passive browsing is permitted: open pages, download public documents, and take screenshots. Active probing, testing discovered services, or attempting to log in is strictly prohibited and may be illegal. Respect robots.txt and site terms of service. Automated scraping of Google search results is not allowed; use official APIs for automation.

Documentation:
Every query used, date/time, URL, screenshot, and the reason for its relevance should be documented. This forms the evidence chain for the assessment.
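
A lightweight way to keep this evidence chain consistent is to log each observation as one CSV row. The sketch below assumes a CSV log and the field names shown (`EvidenceRecord`, `new_record` and `write_log` are hypothetical names); neither the layout nor the names are a required format, so adapt them to your assessment template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class EvidenceRecord:
    """One passive-collection observation (field names are illustrative)."""
    query: str        # exact search query or tool command used
    url: str          # where the finding was observed
    timestamp: str    # UTC collection time, ISO 8601
    screenshot: str   # path to the saved screenshot
    relevance: str    # why the finding matters to the assessment


def new_record(query: str, url: str, screenshot: str, relevance: str) -> EvidenceRecord:
    """Create a record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return EvidenceRecord(query, url, now, screenshot, relevance)


def write_log(records, stream) -> None:
    """Write a header row plus one CSV row per record to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(EvidenceRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


if __name__ == "__main__":
    buffer = io.StringIO()
    # Hypothetical entry: the URL is a placeholder, not a real finding.
    write_log([new_record("site:nhs.uk filetype:pdf", "https://example.nhs.uk/report.pdf",
                          "pic/ti14.png", "Public PDF naming internal teams")], buffer)
    print(buffer.getvalue())
```

To keep a persistent log, pass a file opened with `open(path, "a", newline="")` instead of the in-memory buffer (and write the header only once).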


# <h1>🔍 Google Dorking</h1>

---

## <h2>1. Navigate to your browser </h2>

1. Open **Kali Linux** (or your workstation) → open **Firefox ESR** <br>**Path:**
   `Applications → Internet → Firefox ESR`

---

## <h2>2. What is Google Dorking?</h2>

Google Dorking is the use of **advanced Google search operators** to find information that isn’t obvious in normal searches.

### 💡 **Simple Description**

Think of Google as a giant library:

* Normal search = “Where are the cookbooks?”
* Google Dorking = “Show me Italian cookbooks with handwritten notes on page 3.”

It’s about asking **very precise** questions.

### 💬 Understanding Google Dorking

* Normal Google: “Where is the bakery?”
* Google Dorking: “Show me all PDF invoices from the bakery stored publicly on this website.”

### 🎯 Why It Matters for This Project

Google Dorking helps uncover:

* publicly accessible documents
* configurations
* login pages
* exposed credentials
* internal reports
* hidden subdomains

Attackers use these for **OSINT, phishing, and initial access**.
You will use them for **ethical analysis**.

---

## <h2>3. Safety & Legal Rules (Read Carefully)</h2>

<div style="border:1px solid #ccc; padding:10px; border-radius:6px; background:#fff8e6;">

✔ **Allowed:** passive browsing, opening pages, downloading public docs, taking screenshots.  
❌ **Not allowed:** logging in, probing systems, scanning, brute forcing, exploiting.

</div>

### 🚨 Mandatory Rules

1. **Only passive recon**: open pages, take notes, don’t interact.
2. **Never attempt login** or submit forms.
3. **Respect robots.txt** and Google’s Terms of Service.
4. For automation, use approved APIs (Google Custom Search, SerpAPI).
5. If you find sensitive data (PII, medical info):

   * redact
   * document
   * report (if authorized)
6. **Document everything**:

   * query
   * URL
   * date/time
   * screenshot
   * significance
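
For rule 4, the Google Custom Search JSON API accepts the same operators as the browser search box. A minimal sketch, assuming you already have an API key and a Programmable Search Engine ID (`cx`); both values below are placeholders, and `build_search_url` / `fetch_results` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query: str, api_key: str, cx: str, start: int = 1) -> str:
    """Return the request URL for one page of results (the API returns at most 10 per call)."""
    params = {"key": api_key, "cx": cx, "q": query, "start": start, "num": 10}
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_results(query: str, api_key: str, cx: str, start: int = 1) -> list[dict]:
    """Fetch one page and keep only the title and link needed for the evidence log."""
    with urllib.request.urlopen(build_search_url(query, api_key, cx, start), timeout=30) as resp:
        data = json.load(resp)
    # "items" is absent when a query has no results.
    return [{"title": i.get("title"), "link": i.get("link")} for i in data.get("items", [])]


if __name__ == "__main__":
    print(build_search_url("site:nhs.uk filetype:pdf", "YOUR_API_KEY", "YOUR_CX"))
```

Paging is done by increasing `start` (1, 11, 21, …); every returned link still needs the manual documentation listed above.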

---

# <h2>4. Core Google Dorking Operators</h2>



<table>
<tr><th>Operator</th><th>Meaning</th></tr>
<tr><td><code>site:</code></td><td>Restrict search to a domain</td></tr>
<tr><td><code>filetype:</code> / <code>ext:</code></td><td>Find specific file types (pdf, xlsx, sql...)</td></tr>
<tr><td><code>intitle:</code></td><td>Look for keywords in the page title</td></tr>
<tr><td><code>inurl:</code></td><td>Look for keywords in the URL</td></tr>
<tr><td><code>intext:</code></td><td>Look for keywords in the page body</td></tr>
<tr><td><code>"exact phrase"</code></td><td>Exact phrase match</td></tr>
<tr><td><code>*</code></td><td>Wildcard (stands in for whole words, not parts of a word)</td></tr>
<tr><td><code>OR</code></td><td>Match either keyword (must be uppercase; <code>|</code> also works)</td></tr>
<tr><td><code>cache:</code></td><td>View Google's cached version (retired by Google in 2024)</td></tr>
<tr><td><code>related:</code></td><td>Find similar websites</td></tr>
<tr><td><code>allintitle:</code></td><td>Require all keywords in the title</td></tr>
<tr><td><code>allinurl:</code></td><td>Require all keywords in the URL</td></tr>
<tr><td><code>-keyword</code></td><td>Exclude keyword</td></tr>
</table>
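
When you run many variations of the same search, it can help to build the strings from the operators in the table instead of retyping them. The helpers below (`any_of`, `dork`) are hypothetical names for a small sketch; they only assemble text and never send a query.

```python
def any_of(*terms: str) -> str:
    """Join alternative terms with Google's uppercase OR operator."""
    return " OR ".join(terms)


def dork(site: str, *parts: str, exclude: tuple[str, ...] = ()) -> str:
    """Build 'site:<domain> <parts...> -<excluded...>' as one query string."""
    tokens = [f"site:{site}", *parts]
    tokens += [f"-{term}" for term in exclude]
    return " ".join(tokens)


if __name__ == "__main__":
    # The configuration-file query from section E below.
    print(dork("nhs.uk", any_of("filetype:env", "filetype:ini", "filetype:sql")))
    # -> site:nhs.uk filetype:env OR filetype:ini OR filetype:sql
    # Directory listings mentioning backups, skipping the main www host.
    print(dork("nhs.uk", '"index of"', '"backup"', exclude=("site:www.nhs.uk",)))
    # -> site:nhs.uk "index of" "backup" -site:www.nhs.uk
```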

---

# <h2>5. Exact Search Examples</h2>

---

## <h3>🔐 A. Search for Exposed Credentials</h3>

```
site:nhs.uk "password" OR "passwd" OR "credentials"
```

### Why it matters

Developers sometimes accidentally leave:

* config files
* changelogs
* old documentation

that contain terms like **"password"** or **"credentials"**.

### Description

Like searching only in your kitchen (**site:nhs.uk**) for labels that say:

* sugar
* salt
* spice

Same concept: narrow location + keywords.

### Screenshots

<img src="pic/ti1.png" alt="Screenshot 1 - Exposed credentials example" width="600"/>

<img src="pic/ti2.png" alt="Screenshot 2 - Exposed credentials example" width="600"/>

<img src="pic/ti3.png" alt="Screenshot 3 - Exposed credentials example" width="600"/>

<img src="pic/ti4.png" alt="Screenshot 4 - Exposed credentials example" width="600"/>

---

## <h3>🗂 B. Search for Backup Folders</h3>

```
site:nhs.uk "index of" "backup"
```

### Why it matters

Backup folders may contain:

* old configs
* logs
* database dumps
* sensitive data

### Description

Like looking inside a closet and seeing dusty boxes labeled “backup”.

* `"index of"`: this phrase often appears when a web server lists the files and folders in a directory.
* `"backup"`: narrows the results to listings that mention “backup” (for example, a folder named backup).


### Screenshots

<img src="pic/ti5.png" alt="Screenshot - index of backup" width="600"/>
<img src="pic/ti6.png" alt="Screenshot - backup folder navigation" width="600"/>
<img src="pic/ti7.png" alt="Screenshot - parent directory" width="600"/>
<img src="pic/ti8.png" alt="Screenshot - parent directory" width="600"/>

---

## <h3>🔑 C. Search for Login/Admin Pages</h3>

```
site:nhs.uk intitle:"login" OR inurl:"/admin"
```

* `intitle:"login"`: look for pages with “login” in the title.
* `inurl:"/admin"`: look for pages with “/admin” in the web address.



### Why attackers look for these

They reveal:

* login portals
* staff portals
* admin dashboards
* vendor systems

### Screenshots

<img src="pic/ti9.png" alt="Screenshot - login pages" width="600"/>
<img src="pic/ti10.png" alt="Screenshot - login page result" width="600"/>

---

## <h3>📓 D. Search for Log Files</h3>

```
site:nhs.uk filetype:log OR "logfile"
```

Description: Like reading someone’s diary, with everything written down step by step.

<img src="pic/ti11.png" alt="Screenshot - log files" width="600"/>
<img src="pic/ti12.png" alt="Screenshot - log files" width="600"/>

---

## <h3>🧩 E. Search for Configuration Files</h3>

```
site:nhs.uk filetype:env OR filetype:ini OR filetype:sql
```

* `filetype:env` → look for `.env` files (environment configs, often with API keys).
* `filetype:ini` → look for `.ini` files (settings/configs).
* `filetype:sql` → look for `.sql` files (database dumps).

### Why it's dangerous

These may contain:

* API keys
* database credentials
* system paths
* environment variables

Description: Like finding recipe cards in your kitchen drawer; some might hold secret family recipes (API keys, credentials).

These files can reveal sensitive information if exposed.

---

## <h3>🔧 F. Find Exposed APIs</h3>

```
site:nhs.uk inurl:"api" filetype:json
```

Description: A vending machine with the **instruction manual** taped to it.

---

## <h3>📄 G. Find Public Documents (PDF, DOCX, XLSX, CSV)</h3>

```
site:nhs.uk filetype:pdf
site:nhs.uk filetype:docx
site:nhs.uk filetype:xlsx
site:nhs.uk filetype:csv
```

These often contain:

* internal reports
* staff lists
* vendor data
* configuration details

Example: `site:nhs.uk filetype:xlsx`

### What it does

* `site:nhs.uk` → only search inside the NHS website.
* `filetype:xlsx` → only look for Excel spreadsheet files (`.xlsx`).

It’s like saying “Show me Excel spreadsheets that are publicly visible on the NHS website.”

### Screenshots

<img src="pic/ti14.png" alt="Screenshot - pdf finding" width="600"/>
<img src="pic/ti15.png" alt="Screenshot - pdf finding" width="600"/>

---

## <h3>📧 H. Find Public Staff Emails</h3>

Why it matters: finds pages where staff email addresses are published (useful for assessing spear-phishing risk).

```
site:nhs.uk "@nhs.uk"
site:nhs.uk "contact" "email"
site:nhs.uk "staff" "email"
```

Risk: Spear-phishing, credential stuffing.

<img src="pic/ti16.png" alt="Screenshot - staff email search" width="600"/>

---



## <h3>📄 1. Exposed Documents</h3>

```
site:rhs.org filetype:pdf
site:rhs.org filetype:docx
site:rhs.org filetype:xlsx
site:rhs.org filetype:csv
```

---

## <h3>🔐 2. Internal Docs & Credentials</h3>

```
site:nhs.uk intext:"password"
site:nhs.uk intext:"username"
site:nhs.uk intext:"confidential"
site:nhs.uk "passw*"
```

---

## <h3>🔑 3. Login Pages & Admin Panels</h3>

```
site:nhs.uk inurl:login
site:nhs.uk inurl:admin
site:nhs.uk inurl:signin
site:nhs.uk inurl:portal
site:nhs.uk intitle:"admin"
```

---

## <h3>🌐 4. Subdomains & Hidden Hosts</h3>

```
site:nhs.uk -site:www.nhs.uk
site:*.nhs.uk
site:nhs.uk inurl:dev OR inurl:test OR inurl:stage
```
<img src="pic/ti19.png" alt="Screenshot - subdomain search" width="600"/>

```
site:nhs.uk "VPN" OR "vpn"
```
<img src="pic/ti20.png" alt="Screenshot - VPN search" width="600"/>

---

## <h3>💾 5. Code Repositories</h3>

```
site:github.com "nhs.uk"
```

<img src="pic/ti17.png" alt="Screenshot - GitHub search" width="600"/>
<img src="pic/ti18.png" alt="Screenshot - GitHub search" width="600"/>

```
site:gitlab.com "nhs.uk"
site:github.com "api.nhs.uk"
```

---

## <h3>☁️ 6. Cloud Storage Leakage</h3>

```
site:amazonaws.com nhs.uk
site:drive.google.com "nhs.uk"
site:docs.google.com "nhs.uk"
site:pastebin.com "nhs.uk"
```

---

## <h3>📊 7. Internal Spreadsheets / Reports</h3>

```
site:sharepoint.com "nhs.uk"
site:templates.office.com "nhs.uk"
site:docs.google.com/file "nhs.uk"
```

<img src="pic/ti21.png" alt="Screenshot - SharePoint search" width="600"/>


---

## <h3>🔓 8. Leaked Credentials</h3>

```
site:pastebin.com "nhs.uk"
intext:"nhs.uk" "password" site:pastebin.com
```
<img src="pic/ti22.png" alt="Screenshot - Pastebin search" width="600"/>

---

## <h3>🔑 9. API Keys / AWS Keys</h3>


Why it matters: Attackers harvest leaked API keys from public repos.

```
site:github.com "AWS_ACCESS_KEY_ID" "nhs"
```
<img src="pic/ti23.png" alt="Screenshot - AWS key search" width="600"/>

```
site:github.com "aws_secret_access_key" "nhs"
site:github.com "PRIVATE_KEY" "nhs"
```

---

## <h3>🗄 10. Database Dumps</h3>

Why it matters: SQL dumps often contain PII and credentials.

```
site:nhs.uk filetype:sql
site:nhs.uk intext:"INSERT INTO" | filetype:sql
```

<img src="pic/ti24.png" alt="Screenshot - SQL dump search" width="600"/>

---

## <h3>📜 11. Certificates / Expiration</h3>

Why it matters: certificate details published on the site can be cross-correlated with crt.sh findings.

```
site:nhs.uk "expires" "certificate"
site:nhs.uk "not after" "certificate"
```
<img src="pic/ti25.png" alt="Screenshot - certificate search" width="600"/>

---

## <h3>📝 12. PDF Metadata</h3>

Why it matters: PDF metadata sometimes shows internal server paths or usernames.

```
site:nhs.uk filetype:pdf "Created by" OR "Producer" OR "XMP"
```
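
Once a matching public PDF has been downloaded (which stays within the allowed passive scope), its Info dictionary can be read offline. This is a rough, standard-library-only sketch, assuming the Info dictionary is stored uncompressed; PDFs that keep it in compressed object streams, or that encode strings as UTF-16 or with escaped parentheses, will be missed or garbled. A dedicated tool such as `exiftool` covers those cases. The regex and `pdf_info_fields` are illustrative names.

```python
import re
import sys

# Matches uncompressed Info-dictionary entries such as /Author (Jane Doe)
INFO_FIELD = re.compile(rb"/(Author|Creator|Producer)\s*\(([^)]*)\)")


def pdf_info_fields(data: bytes) -> dict[str, str]:
    """Return the first Author/Creator/Producer literal string found in raw PDF bytes."""
    found: dict[str, str] = {}
    for key, value in INFO_FIELD.findall(data):
        # latin-1 never fails to decode; UTF-16 strings (starting b'\xfe\xff') will look garbled.
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:  # a public PDF you have already downloaded
        print(pdf_info_fields(fh.read()))
```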

<img src="pic/ti26.png" alt="Screenshot - PDF metadata search" width="600"/>

---

## <h3>🏥 13. Vendor / Third-Party Exposure</h3>

Why it matters: third-party providers can be an attack vector (example: Synnovis).

```
site:nhs.uk "Synnovis" OR "Supplier" OR "Partner" OR "pathology"
```
<img src="pic/ti27.png" alt="Screenshot - vendor exposure search" width="600"/>

---

## <h3>🏥 14. Find Error Messages</h3>

Why it matters: error messages can point to misconfigured servers or exposed configuration files.

<img src="pic/ti13.png" alt="Screenshot - error message search" width="600"/>





# <h2>6. Findings</h2>


### 🔥 High Severity

* Credentials
* API keys
* Database dumps
* PII / medical info

### ⚠️ Medium Severity

* internal PDFs
* internal configurations
* login/admin pages

### 🟢 Low Severity

* press releases
* public blog posts
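
To keep triage consistent across many results, the three levels can be encoded as a keyword lookup. This is a rough sketch: the keyword lists simply restate the bullets above (they are assumptions, not a standard), `triage` is a hypothetical name, and anything unmatched should be reviewed by hand.

```python
# Keyword lists restate the severity bullets above; they are assumptions, not a standard.
SEVERITY_KEYWORDS = {
    "high": ("credential", "password", "api key", "database dump", "pii", "medical"),
    "medium": ("internal pdf", "internal config", "login", "admin"),
    "low": ("press release", "blog"),
}


def triage(finding: str) -> str:
    """Return the highest severity whose keywords appear in the finding description."""
    text = finding.lower()
    for level in ("high", "medium", "low"):
        if any(keyword in text for keyword in SEVERITY_KEYWORDS[level]):
            return level
    return "unclassified"


if __name__ == "__main__":
    print(triage("Admin page leaking an API key"))  # -> high
```

Checking levels from high to low means a finding that matches several lists is always reported at its worst severity.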





# **THEHARVESTER: Passive Reconnaissance on NHS (OSINT Report)**

---

##  **Overview**

We used **theHarvester** as a passive OSINT tool to collect publicly available emails, hostnames, and IP addresses for the domain **nhs.uk**.

This simulates how real attackers perform reconnaissance before phishing or ransomware operations.

### **Total Findings**

* **59 IPs**
* **17 ASNs**
* **39 emails**
* **1,066 unique hostnames**

### **Key Flags**

We are:

* **Not breaking in**
* **Not touching internal systems**
* **Only collecting public internet-facing evidence**

---

#  Setup

### **Open Kali & Create a Workspace**

1. Open **Kali Linux**
2. Click the **Terminal** icon (black screen on the left)
3. Create the folders used for raw output and evidence: `mkdir -p ~/raw ~/evidence`
---

# STEP 1: What theHarvester Does

**theHarvester is like a supercharged Google for OSINT.**
You tell it:

> “Go search the entire internet for ANY emails, subdomains, or IPs related to this company.”

It then:

* Queries search engines
* Queries certificate transparency logs
* Queries DNS
* Queries public-facing identity/login endpoints
* Collects everything
* Gives a structured OSINT report

**It does NOT hack; it only gathers public information.**

---

# STEP 2: Run theHarvester

### **Command**

```bash
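# Query all passive sources for nhs.uk and keep a copy of the console output as raw evidence
theHarvester -d nhs.uk -b all | tee ~/raw/nhs_harvester_full.txt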
```

<img src="pic/ti28.png" alt="Screenshot - staff email search" width="600"/>

### **Breakdown**

* `theHarvester` → tool
* `-d nhs.uk` → domain
* `-b all` → use all data sources (google, bing, crtsh, yahoo, linkedin*, shodan…)

  * Google now blocks most automated queries
  * Bing remains consistent for OSINT

### **Optional: Increase Results**

```bash
theHarvester -d nhs.uk -b all -l 500
```

Meaning: “limit the search to 500 results.” Raise the number to collect more.

---

# STEP 3: Watch theHarvester Run

*screenshots*

<img src="pic/ti29.png" alt="theHarvester output 0" width="600"/>
<img src="pic/ti30.png" alt="theHarvester output 1" width="600"/>
<img src="pic/ti31.png" alt="theHarvester output 2" width="600"/>
<img src="pic/ti32.png" alt="theHarvester output 3" width="600"/>

---

# STEP 4: Save & Extract Clean Lists

You should now have files like:

```
~/raw/nhs_harvester_full.txt
~/evidence/emails.txt
~/evidence/hosts.txt
```

<img src="pic/ti33.png" alt="theHarvester output 3">
<img src="pic/ti34.png" alt="theHarvester output 3">
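
One way to turn the saved output into clean lists is to filter it with regular expressions. This sketch assumes the console output was saved to `~/raw/nhs_harvester_full.txt` and keeps only addresses and hostnames under the target domain; the regexes are deliberately simple, and the script name in the usage line is hypothetical. Filtering by domain also drops out-of-scope strings that appear in the raw output.

```python
import re
import sys
from pathlib import Path

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HOST_RE = re.compile(r"\b(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b")


def in_scope(name: str, domain: str) -> bool:
    """True if `name` is the domain itself or one of its subdomains."""
    name, domain = name.lower(), domain.lower()
    return name == domain or name.endswith("." + domain)


def extract(text: str, domain: str) -> tuple[list[str], list[str]]:
    """Return sorted, de-duplicated (emails, hosts) that belong to `domain`."""
    emails = {e.lower() for e in EMAIL_RE.findall(text)
              if in_scope(e.split("@", 1)[1], domain)}
    # Blank out addresses first so their domain part is not also counted as a host.
    host_text = EMAIL_RE.sub(" ", text)
    hosts = {h.lower() for h in HOST_RE.findall(host_text) if in_scope(h, domain)}
    return sorted(emails), sorted(hosts)


if __name__ == "__main__" and len(sys.argv) > 1:
    raw = Path(sys.argv[1]).read_text(errors="ignore")
    emails, hosts = extract(raw, "nhs.uk")
    out_dir = Path.home() / "evidence"
    out_dir.mkdir(exist_ok=True)
    (out_dir / "emails.txt").write_text("\n".join(emails) + "\n")
    (out_dir / "hosts.txt").write_text("\n".join(hosts) + "\n")
    print(f"{len(emails)} emails, {len(hosts)} hosts written to {out_dir}")
```

Usage: `python3 extract_lists.py ~/raw/nhs_harvester_full.txt`. IP addresses are not matched by `HOST_RE` because the last label must be alphabetic.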

---

#  EMAIL ANALYSIS (OSINT)

Emails are like **doorways for attackers** → phishing, credential stuffing, hierarchy mapping.

---
Screenshot

<img src="pic/ti35.png" alt="theHarvester output 3">

What we found:

## **1. General Contact & Info Emails**

**Examples**

* [charity@nottshc.nhs.uk](mailto:charity@nottshc.nhs.uk)
* [reception.w97061@wales.nhs.uk](mailto:reception.w97061@wales.nhs.uk)
* [enquiries@leadershipacademy.nhs.uk](mailto:enquiries@leadershipacademy.nhs.uk)

**Reveals**

* Public reception, Trust-level queries
* Org hierarchy indirectly

**Attacker Interest**

* Classic phishing targets
* Department mapping

---

## **2. Service / Department Emails**

**Examples**

* [learning@cpft.nhs.uk](mailto:learning@cpft.nhs.uk)
* [neurologyadmin@uhs.nhs.uk](mailto:neurologyadmin@uhs.nhs.uk)
* [electivecare@nhsdorset.nhs.uk](mailto:electivecare@nhsdorset.nhs.uk)

**Reveals**

* Internal workflows
* Medical service specialization

**Attacker Interest**

* Spear-phishing
* Role enumeration

---

## **3. Patient / Supplier Interfaces**

**Examples**

* [pals@snee.nhs.uk](mailto:pals@snee.nhs.uk)
* [patientexperience@dchft.nhs.uk](mailto:patientexperience@dchft.nhs.uk)
* [qualityteam@supplychain.nhs.uk](mailto:qualityteam@supplychain.nhs.uk)

**Reveals**

* Patient feedback pipelines
* Vendor communication channels

**Attacker Interest**

* Fraud
* Social engineering of patients

---

## **4. Trust / Organization-Level Emails**

**Examples**

* [headquarters@dchft.nhs.uk](mailto:headquarters@dchft.nhs.uk)
* [nhsbsa.nhsjobs@nhsbsa.nhs.uk](mailto:nhsbsa.nhsjobs@nhsbsa.nhs.uk)

**Reveals**

* HR
* Recruitment
* Organizational processes

**Attacker Interest**

* HR-themed phishing
* Insider-style attacks

---

## **5. Third-Party / Vendor Emails**

**Examples**

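* [cmartorella@edge-security.com](mailto:cmartorella@edge-security.com) (likely a false positive: this is the address of theHarvester's author, Christian Martorella, printed in the tool's own banner)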
* Synnovis-related references

**Reveals**

* External vendors
* Research partners
* Integration points

**Attacker Interest**

* Vendor compromise
* Supply-chain infiltration

---

#  HOSTNAMES ANALYSIS

Screenshot

<img src="pic/ti36.png" alt="theHarvester output 3">

### **Top Categories of Hostnames Identified**

---

## 🧩 **1. Dev / Staging / Test Environments**

**Examples**

* 111devon.nhs.uk
* staging.111locationconsent.nhs.uk
* test.add-correct-contact-details.nhs.uk

**Reveals**

* Public dev pipelines
* Project names
* Potential debug features

**Attacker Interest**

* Dev environments often have **weaker security**
* Exposed **API keys, tokens, test accounts**
* Pivot paths

---

## 🔐 **2. Identity Portals / SSO / Login Systems**

**Examples**

* nam.dev21.signin.nhs.uk
* autodiscover.nhs.uk
* vpn1.asp.nhs.uk
* future.nhs.uk

**Reveals**

* Azure AD, ADFS, Okta, SAML
* Cross-trust authentication

**Attacker Interest**

* Password spraying
* MFA fatigue attacks
* SAML manipulation

---

## 🏥 **3. Hospitals, Clinics, GP Practices**

**Examples**

* abbeycourtmedicalcentre.nhs.uk
* yorkmedicalgroup.nhs.uk

**Reveals**

* Granular NHS infrastructure
* Legacy systems are common

**Attacker Interest**

* Old PHP/ASP.NET servers

---

## 🌐 **4. Digital Platforms & Services**

**Examples**

* mychart.gslb.addenbrookes.nhs.uk
* mylearning.nhsp.nhs.uk
* nhsappvuecomponentlibraryv1.nonlive.nhsapp.service.nhs.uk

**Reveals**

* Microservices (AWS/Azure)
* VDI platforms
* Learning portals

**Attacker Interest**

* Clone UI for phishing
* Open API reconnaissance

---

## 📡 **5. Supply Chain / Third-Party Services**

**Examples**

* supplychain.nhs.uk
* audit.yorkshire.nhs.uk

**Reveals**

* Heavy dependency on vendors

**Attacker Risk**

* Supplier compromise → NHS compromise

---

## 🛠️ **6. Legacy Tech Indicators**

**Examples**

* `.php`, `.aspx`, `autodiscover.asp.nhs.uk`

**Reveals**

* Outdated tech

**Attacker Interest**

* RCE, SQLi, LFI vulnerabilities
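
These categories can be applied to the full hostname list automatically as a first pass. The sketch below is a simple token match on hostnames (an assumption, not how theHarvester labels results); the cue words and `classify_host` are illustrative and will miss or mislabel some hosts, so treat the output as a triage aid.

```python
import re

# Illustrative cue words; extend them as new naming patterns turn up.
CUES = {
    "dev/staging/test": {"dev", "staging", "stage", "test", "nonlive", "uat", "preprod"},
    "identity/login": {"signin", "login", "sso", "adfs", "autodiscover", "vpn"},
}


def classify_host(host: str) -> list[str]:
    """Return every category whose cue words appear as alphabetic tokens in the hostname."""
    # Digits and dots split tokens apart: "dev21" -> "dev", "vpn1" -> "vpn".
    tokens = set(re.findall(r"[a-z]+", host.lower()))
    return [name for name, cues in CUES.items() if tokens & cues] or ["other"]


if __name__ == "__main__":
    for host in ("nam.dev21.signin.nhs.uk", "vpn1.asp.nhs.uk", "mylearning.nhsp.nhs.uk"):
        print(host, classify_host(host))
```
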

---

#  IP ADDRESS ANALYSIS

Every IP address found is like discovering:

* which doors the organization has,
* which windows exist,
* and which lights are on at night.

Attackers use this to understand:

* what servers exist,
* which technologies they use,
* and where vulnerabilities might hide.

Even one weak, exposed server can give ransomware operators a way in.


<img src="pic/ti37.png" alt="theHarvester output 3">

IPs reveal:

* Architecture
* Cloud providers
* Internal networks
* Legacy exposure

---

## **1. Cloud Infrastructure**

NHS uses:

* Cloudflare
* AWS
* Azure
* Akamai
* DigitalOcean
* Rackspace

**Biggest takeaway:** **hybrid architecture** means more attack surface.

---

## **2. Load Balancers / CDNs**

Examples:

* 13.227.x.x (Amazon CloudFront)
* 108.138.x.x (CloudFront)
* 188.114.x.x (Cloudflare)

**Implication**

* Real servers hidden behind CDNs
* Attackers search for origin servers
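
To check whether a harvested IP belongs to a CDN rather than an origin server, compare it with the provider's published ranges. A sketch using AWS's `ip-ranges.json` (published at `https://ip-ranges.amazonaws.com/ip-ranges.json`), assuming you have downloaded it; only the `prefixes` list with its `ip_prefix` and `service` fields is used. Cloudflare publishes its own ranges separately and would need its own loader. The function and script names are hypothetical.

```python
import ipaddress
import json
import sys


def load_prefixes(doc: dict, service: str = "CLOUDFRONT") -> list:
    """Return the IPv4 networks that AWS's ip-ranges.json lists for one service."""
    return [ipaddress.ip_network(p["ip_prefix"])
            for p in doc.get("prefixes", []) if p.get("service") == service]


def behind(ip: str, networks: list) -> bool:
    """True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 cdn_check.py ip-ranges.json 192.0.2.10 198.51.100.7
    with open(sys.argv[1]) as fh:
        networks = load_prefixes(json.load(fh))
    for ip in sys.argv[2:]:
        print(ip, "CloudFront" if behind(ip, networks) else "not CloudFront")
```

An IP that is not in any CDN range is a better candidate for being an origin server, which is exactly what the passive assessment should record rather than probe.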

---

## **3. Internal NHS Ranges**

* 164.134.x.x
* 213.121.x.x
* 195.99.125.x

**High-value**

* Internal apps
* Legacy systems
* Patient portals

---

## **4. Linux Server Footprint**

Most NHS-facing servers run Linux:

* DNS
* web
* mail
* proxy

Attackers target:

* outdated Apache/Nginx
* SSH brute force
* CVE exploitation

---

## **5. Third-Party Hosting Risks**

Examples:

* 45.x.x.x ranges
* 185.x.x.x Europe

**Supply chain exposure.**

---

## **6. Possible Forgotten Systems**

Suspicious ranges:

* 45.131.x.x
* 188.65.x.x

Often:

* old microsites
* retired projects
* deprecated portals

---

#  URL ANALYSIS

<img src="pic/ti38.png" alt="theHarvester output 3">

---

## 🧩 **1. Exposed Dev / Test / Pre-Prod**

Examples:

* pr-989.imms.dev.vds.platform.nhs.uk
* app.dev-cidv-3971.dev.identity-verification-service.nhs.uk

**Risks**

* Debug menus
* Hardcoded keys
* Open test accounts
  → Common ransomware entry point (HSE Ireland, Synnovis, Travelex)

---

## 🔐 **2. High-Value Identity Portals**

Examples:

* adfs.addenbrookes.nhs.uk/adfs/ls/?SAMLRequest
* nhsi.okta-emea.com/oauth2
* future.nhs.uk/system/login

**Reveals**

* Azure AD
* Okta
* ADFS
* SAML flows

**Attacker Interest**

* Clone login pages
* Replay SAML
* MFA fatigue

---

## 🧩 **3. Third-Party Providers & Suppliers**

Examples:

* accurx.nhs.uk
* bookonline.chesterfieldroyal.nhs.uk

**Risk**

* Supplier compromise
* Weak SaaS security

*(Synnovis breach followed this exact path.)*

---

## 📜 **4. Certificate Transparency / OSINT Logs**

* crt.sh
* ThreatMiner

**Reveals**

* Every publicly logged NHS certificate
* Forgotten subdomains
* Shadow IT
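
crt.sh can return its results as JSON, which makes the forgotten-subdomain check repeatable. A sketch, assuming the `?q=%.<domain>&output=json` query form and that each result's `name_value` field holds one or more names separated by newlines; crt.sh is a free, rate-limited service, so query it sparingly. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain: str) -> str:
    """crt.sh query for every certificate name under the domain (% is crt.sh's wildcard)."""
    return "https://crt.sh/?" + urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})


def names_from_entries(entries: list[dict], domain: str) -> list[str]:
    """Collect unique, in-scope hostnames from crt.sh JSON entries."""
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().lstrip("*.")  # "*.nhs.uk" -> "nhs.uk"
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


def fetch_ct_names(domain: str) -> list[str]:
    """Query crt.sh once and return the de-duplicated hostnames."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=60) as resp:
        return names_from_entries(json.load(resp), domain)


if __name__ == "__main__":
    print(crtsh_url("nhs.uk"))
```

The resulting names can be compared with `~/evidence/hosts.txt`; hosts that appear only in CT logs are candidates for forgotten subdomains.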

---

## 🧠 **5. Internal Product/Stack Names**

Examples:

* identity-verification-service.nhs.uk
* eps.national.nhs.uk
* nrls.nhs.uk

**Reveals**

* Microservices
* Tech stack
* Regional mapping

**Attacker Uses**

* Guess internal endpoints
* Create realistic phishing themes
* Identify weak Trusts

---

## ⚠️ **6. Legacy Indicators**

Examples:

* login_up.php
* `*.aspx` portals

**Risk**

* Known vulnerabilities
* RCE
* SQLi
* LFI

---

# BONUS: PEOPLE

<img src="pic/ti39.png" alt="theHarvester output 3">

# THEHARVESTER: KEY TAKEAWAYS

TheHarvester helped us understand the organization’s digital footprint using only publicly available information. This demonstrates that attackers can build a profile of the organization without touching internal systems. Based on the results, we identified exposed emails, possible outdated subdomains, and potential attack vectors useful for phishing and social engineering.

Our OSINT shows the NHS is:

* **Massive**
* **Fragmented**
* **Filled with third-party systems**
* **Running mixed legacy + modern technologies**
* **Exposing dev/test systems**
* **Using Azure AD + ADFS + Okta across multiple Trusts**

To attackers, this means:

* Many entry points
* Weak dev/test targets
* Multiple suppliers
* Password spraying opportunities
* Real URLs for phishing
* Legacy systems easy to exploit

This pattern matches incidents such as:

* Synnovis ransomware
* WannaCry
* Hillingdon
* Birmingham Women’s Hospital


# <span style="color:#3A6EA5;">Hunter.io ‚Äî Email Discovery, Staff Enumeration & Organisational Mapping</span>

> **Hunter.io is like a professional phonebook and organisational map.**
> You enter a company name and Hunter reveals staff emails, roles, departments, and naming conventions.
>
> For an attacker, this is gold: it shows **who to phish**, **who approves payments**, and **how emails are structured**.

Hunter.io identified **109 NHS email addresses**, exposing a wide and predictable attack surface.
Cross-checking these with **HaveIBeenPwned** shows multiple accounts in historical breaches ‚Äî increasing the risk of phishing, credential stuffing, and social engineering.
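
The HaveIBeenPwned cross-check can be scripted with the HIBP API v3 `breachedaccount` endpoint, which requires an API key sent in the `hibp-api-key` header plus a user agent; by default it returns only breach names, and a 404 means the address was not found. A sketch, assuming you hold a key and are authorised to check these addresses; the default user-agent string and function names are hypothetical, and the API is rate-limited, so pace requests.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

HIBP_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def breach_request(email: str, api_key: str,
                   user_agent: str = "osint-coursework") -> urllib.request.Request:
    """Build an authenticated HIBP v3 request; the address is URL-encoded into the path."""
    url = HIBP_URL + urllib.parse.quote(email, safe="")
    return urllib.request.Request(url, headers={"hibp-api-key": api_key, "user-agent": user_agent})


def breach_names(email: str, api_key: str) -> list[str]:
    """Return breach names for one address, or [] when HIBP answers 404 (not found)."""
    try:
        with urllib.request.urlopen(breach_request(email, api_key), timeout=30) as resp:
            return [breach["Name"] for breach in json.load(resp)]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise  # e.g. 401 (bad key) or 429 (rate limited): let the caller decide
```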

---

# <span style="color:#3A6EA5;">1. Expected Results</span>

By the end of this section, we will have:

* **raw/hunter_rhs.csv** (exported directly from Hunter.io)
* **evidence/hunter_decisionmakers.csv** (role-based / senior staff)
* **evidence/hunter_generic_emails.csv** (service desks & functional mailboxes)
* **Cross-validation** with LinkedIn + TheHarvester
* **A final-report entry** you can paste into your assessment
* **Mitigation recommendations** focused on spear-phishing and decision-maker targeting
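
The two evidence files can be produced from the Hunter export with a short script. This is a sketch: the column names `type` and `seniority`, and the values `generic`, `senior` and `executive`, are assumptions modelled on Hunter's domain-search data. Check them against the header row of your own `raw/hunter_rhs.csv` and adjust the parameters; the function names are hypothetical.

```python
import csv
import sys

DECISION_LEVELS = {"senior", "executive"}  # assumed seniority values


def split_hunter_rows(rows, type_col="type", seniority_col="seniority"):
    """Split Hunter rows into (decision-makers, generic mailboxes)."""
    decision, generic = [], []
    for row in rows:
        kind = (row.get(type_col) or "").strip().lower()
        level = (row.get(seniority_col) or "").strip().lower()
        if kind == "generic":
            generic.append(row)
        elif level in DECISION_LEVELS:
            decision.append(row)
    return decision, generic


def write_rows(rows, path, fieldnames):
    """Write rows to a CSV file, keeping the export's original column order."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        header = reader.fieldnames
    decision, generic = split_hunter_rows(rows)
    write_rows(decision, "evidence/hunter_decisionmakers.csv", header)
    write_rows(generic, "evidence/hunter_generic_emails.csv", header)
```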

---

# <span style="color:#3A6EA5;">2. OSINT FRAMEWORK</span>

<img src="pic/ti45.png" alt="OSINT Framework">


# <span style="color:#3A6EA5;">3. Hunter.io (Click-by-Click Guide)</span>

### **1) Open Firefox**

* Applications ‚Üí Internet ‚Üí **Firefox ESR**
  *(or click the Firefox dock icon)*

### **2) Navigate to Hunter.io**

Enter:

```
https://hunter.io
```
<img src="pic/ti46.png" alt="Hunter.io homepage">

### **3) Sign in / Create a Free Account**

* Click **Sign in** (top-right)
* Or **Sign up** if new
* Free tier allows:

  * Limited searches
  * CSV export
  * Enough for student projects / learning

> *Think of it like entering a library ‚Äî Hunter requires logins to enforce fair usage.*

### **4) Go to “Domain Search”**

Top menu ‚Üí **Domain Search**

<img src="pic/ti47.png" alt="Hunter.io Domain Search">

### **5) Enter the Target Domain**

Example:

```
nhs.uk
```

<img src="pic/ti48.png" alt="Hunter.io domain search results">

Hunter will enumerate emails connected to that domain.

---

# <span style="color:#3A6EA5;">4. What Hunter.io Discovered</span>

Hunter.io returned **109 NHS-associated email addresses**, from a mix of:

* Senior leadership
* Admin/staff support roles
* Recruitment & HR
* Procurement & supply chain
* Service desks & generic mailboxes
* Healthcare operational units

This demonstrates that the NHS has **a broad, semi-public email footprint** accessible via OSINT.

---

# <span style="color:#3A6EA5;">5. Exposure Significance</span>

## **5.1 Large External Attack Surface**

Having 109 staff emails publicly listed means:

* More potential social engineering entry points
* Attackers can impersonate internal staff
* Naming conventions & organisational structure become visible

## **5.2 Predictable & Consistent Email Pattern**

NHS uses:

```
firstname.lastname@nhs.uk
```

Predictable formats enable:

* **Account enumeration**
* **Credential stuffing**
* **Password spraying**
* **Highly targeted spear-phishing**

Even if an email isn't found in OSINT, attackers can *guess* it.
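
The predictability is easy to demonstrate, and the same check is useful to defenders who want to see what an outsider could derive from a staff list. A sketch, assuming the `firstname.lastname` pattern above; the normalisation rule (lower-case, drop anything that is not a letter or digit) and the function names are assumptions, since real mail systems handle duplicates, hyphens, apostrophes and accented letters in their own ways.

```python
import re


def normalise(part: str) -> str:
    """Lower-case and drop spaces, apostrophes, hyphens and other non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", part.lower())


def to_address(first: str, last: str, domain: str = "nhs.uk",
               pattern: str = "{first}.{last}") -> str:
    """Apply an email naming pattern to one person's name."""
    return pattern.format(first=normalise(first), last=normalise(last)) + "@" + domain


if __name__ == "__main__":
    print(to_address("Jane", "O'Neill"))  # -> jane.oneill@nhs.uk
```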

---

# <span style="color:#3A6EA5;">6. Organisational Insights from Hunter Results</span>

Hunter reveals functional mailboxes such as:

* recruitment@...
* patientexperience@...
* foi@...
* audiology@...
* qualityteam@...

This leaks:

* Department structure
* Internal workflows
* Public-facing contact points
* Potential pretexts for phishing

Attackers can turn these into believable scenarios such as:

> “Hi, this is the Quality Team. Please review the attached incident log.”

---

# <span style="color:#3A6EA5;">7. How Attackers Exploit Hunter.io Data</span>

## **7.1 Spear-Phishing**

Attackers now know:

* Real staff names
* Real email formats
* Department roles
* Seniority

Perfect for impersonating:

* HR
* IT Support
* Procurement
* Line managers

## **7.2 Business Email Compromise (BEC)**

Common targets include:

* procurement@
* payroll@
* recruitment@
* leadershipacademy@

BEC scenarios include:

* Fake vendor invoices
* Purchase order fraud
* Payroll redirection
* Supplier impersonation

## **7.3 Credential Attacks**

If an email address appears in a breach, attackers may try:

* Password spraying
* Credential stuffing against NHS portals such as:
  * OWA / Office 365
  * NHS VPN and SSO portals

---

# <span style="color:#3A6EA5;">8. Cross-Validation (Improves Accuracy)</span>

Attackers combine:

1. **Hunter.io emails**
2. **TheHarvester results**
3. **LinkedIn role verification**

### LinkedIn verification (manual):

* Search: `"Firstname Lastname" "NHS" Deputy Director`
* If LinkedIn confirms the job title and organisation, treat the identity as high-confidence.

Example attacker logic:

* If Hunter lists **XXXXXX ‚Äî Deputy Director**
* LinkedIn confirms employment
* That person becomes a **high-value whaling target**.
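
The cross-check can be expressed as a simple corroboration count. In this sketch the three inputs are sets of addresses: from Hunter, from theHarvester, and those whose owner's role you confirmed manually on LinkedIn (recorded by hand, since LinkedIn does not publish addresses). The high/medium/low thresholds and `score_identities` are assumptions, not an established scoring scheme.

```python
def score_identities(hunter: set[str], harvester: set[str],
                     linkedin_confirmed: set[str]) -> dict[str, str]:
    """Label each address high/medium/low by how many of the three sources contain it."""
    sources = [{e.lower() for e in s} for s in (hunter, harvester, linkedin_confirmed)]
    labels = {3: "high", 2: "medium", 1: "low"}
    every = set().union(*sources)
    # sum() of booleans counts the sources that include this address (1 to 3).
    return {email: labels[sum(email in s for s in sources)] for email in sorted(every)}


if __name__ == "__main__":
    print(score_identities({"a@nhs.uk", "b@nhs.uk"}, {"a@nhs.uk"}, {"a@nhs.uk"}))
```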

---

# <span style="color:#3A6EA5;">9. Defensive Recommendations</span>

### 🔐 **1. Enforce MFA for all decision makers + role-based accounts**

### 📨 **2. Minimize generic mailboxes**

Use aliases and logging instead of raw shared inboxes.

### 🎯 **3. Run targeted phishing simulations**

Especially for:

* Leaders
* Finance
* Procurement
* HR

### 📤 **4. Implement full email authentication**

* **DMARC**
* **DKIM**
* **SPF**
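
A quick way to review the DMARC part is to parse the published policy. This sketch assumes you have already fetched the TXT record for `_dmarc.<domain>` (for example with `dig +short TXT _dmarc.nhs.uk`) and flags only three common gaps; it is not a full RFC 7489 validator, and the function names are hypothetical.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record into a {tag: value} dict."""
    tags: dict[str, str] = {}
    # dig prints TXT data in double quotes; drop them before splitting on ';'.
    for part in record.strip().strip('"').split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_findings(record: str) -> list[str]:
    """Flag three common weaknesses in a DMARC policy."""
    tags = parse_dmarc(record)
    issues = []
    if tags.get("v") != "DMARC1":
        issues.append("missing or invalid v=DMARC1 tag")
    if tags.get("p", "").lower() in ("", "none"):
        issues.append("no enforcing policy (p= missing or none): spoofed mail is not blocked")
    if "rua" not in tags:
        issues.append("no rua= address: aggregate reports are not collected")
    return issues


if __name__ == "__main__":
    print(dmarc_findings("v=DMARC1; p=none"))
```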

### 🕵️ **5. Monitor for newly discovered staff emails**

Through:

* Hunter API
* Scheduled theHarvester scans
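
Whichever source feeds the scheduled scans (Hunter's API or a re-run of theHarvester plus the extraction step), the alert itself is a set difference against the previous list. A sketch with hypothetical file and function names, assuming one address per line in each file.

```python
import sys
from pathlib import Path


def load_addresses(path: Path) -> set[str]:
    """Read one address per line; a missing file means an empty baseline."""
    if not path.exists():
        return set()
    return {line.strip().lower() for line in path.read_text().splitlines() if line.strip()}


def new_addresses(baseline: set[str], current: set[str]) -> list[str]:
    """Addresses present in the latest scan but not in the baseline."""
    return sorted({e.lower() for e in current} - {e.lower() for e in baseline})


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python3 email_diff.py baseline_emails.txt latest_emails.txt
    fresh = new_addresses(load_addresses(Path(sys.argv[1])), load_addresses(Path(sys.argv[2])))
    print("\n".join(fresh) if fresh else "No new addresses since the baseline.")
```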

---

# <span style="color:#3A6EA5;">10. Alternatives & Complements if Hunter Finds Few Emails</span>

1. **Snov.io**: similar functionality to Hunter, with CSV export
2. **Clearbit**: deeper organisation/person enrichment
3. **LinkedIn Sales Navigator**: role verification
4. **Google Dorking**
5. **theHarvester**: already used; combine both lists

---



```markdown
Hunter.io was used to enumerate staff emails, role-based accounts, and organisational structures associated with the target NHS domain. A total of 109 valid email addresses were identified, revealing predictable naming conventions and clear departmental mappings. These findings were cross-validated with theHarvester and LinkedIn to improve confidence levels.
```

This is our folder structure:

* Full report: `~/raw/nhs_harvester_full.txt`
* `~/evidence/`
  * `emails.txt`
  * `hosts.txt`
  * `ips.txt`
  * `urls.txt`
  * `people.txt`