# Introduction

This notebook provides a quick introduction to the `phishing-web-collector` library, which allows you to collect phishing domains from various feeds. 

It provides a simple way to retrieve and store phishing domains in a separate CSV file.
By default, all raw data collected from the feeds is stored in a local directory configured by the `storage_path` parameter of the `FeedManager` class.

# Installation


In [1]:
!pip install "phishing-web-collector>=0.2.1"
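
To confirm which version ended up in the active environment, a small check with the standard library's `importlib.metadata` can help. This is only a sketch: `installed_version` is a helper name introduced here for illustration, not part of `phishing-web-collector`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("phishing-web-collector")
if found is None:
    print("phishing-web-collector is not installed in this environment")
else:
    print(f"phishing-web-collector {found} is installed")
```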




# Import libraries  

In [3]:
import csv

import phishing_web_collector as pwc

# Configure phishing feeds


In [4]:
manager = pwc.FeedManager(
    sources=[
        pwc.FeedSource.AD_GUARD_HOME,
        pwc.FeedSource.BINARY_DEFENCE_IP,
        pwc.FeedSource.BLOCKLIST_DE_IP,
        pwc.FeedSource.BOTVRIJ,
        pwc.FeedSource.C2_INTEL_DOMAIN,
        pwc.FeedSource.C2_TRACKER_IP,
        pwc.FeedSource.CERT_PL,
        pwc.FeedSource.DANGEROUS_DOMAINS,
        pwc.FeedSource.GREEN_SNOW_IP,
        pwc.FeedSource.MALWARE_WORLD,
        pwc.FeedSource.MIRAI_SECURITY_IP,
        pwc.FeedSource.OPEN_PHISH,
        pwc.FeedSource.PHISHING_ARMY,
        pwc.FeedSource.PHISHING_DATABASE,
        pwc.FeedSource.PHISH_STATS,
        pwc.FeedSource.PHISH_TANK,
        pwc.FeedSource.PROOF_POINT_IP,
        pwc.FeedSource.THREAT_VIEW_DOMAIN,
        pwc.FeedSource.TWEET_FEED,
        pwc.FeedSource.URL_ABUSE,
        pwc.FeedSource.URL_HAUS,
        pwc.FeedSource.VALDIN,
    ],
    storage_path="feeds_data",
)

# Retrieve feeds

In [5]:
entries = await manager.retrieve_all()
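
The top-level `await` in the cell above works because Jupyter already runs an event loop. In a plain Python script that is not the case, and the coroutine would have to be driven with `asyncio.run()`. The sketch below illustrates the pattern; `StubManager` is a hypothetical stand-in used so the example runs offline, and in a real script it would be replaced by the `manager` configured earlier.

```python
import asyncio


class StubManager:
    """Hypothetical stand-in for pwc.FeedManager so this sketch runs offline."""

    async def retrieve_all(self):
        # Yield to the event loop once, as a real network call would.
        await asyncio.sleep(0)
        return ["entry-1", "entry-2"]


async def main(manager):
    # Inside a coroutine, `await` works the same way as in the notebook cell.
    return await manager.retrieve_all()


# asyncio.run() creates the event loop that Jupyter otherwise provides.
entries_from_script = asyncio.run(main(StubManager()))
print(len(entries_from_script))
```

Note that `asyncio.run()` is meant for scripts; calling it from inside a notebook cell raises a `RuntimeError` because a loop is already running there.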

# Transform entries to CSV

In [7]:

# Reduce each collected entry to the domain part of its URL.
phishing_domains = [pwc.get_domain_from_url(item.url) for item in entries]

# Write one domain per row under a "Domain" header.
with open("phishing_domains.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Domain"])
    for domain in phishing_domains:
        writer.writerow([domain])

print("First 10 phishing domains:")
print(phishing_domains[:10])

First 10 phishing domains:
['swingingtxcpl.tumblr.com', 'xinchaochcdfe.com', 'wingsofhewind.tk', 'ad237.ezcybersearch.com', 'r9---sn--pj2-2v1e.googlevideo.com', 'tracking.etidning.norrteljetidning.se', 'seducio.com', 'r16---sn-a5mekn7d.googlevideo.com', 'iampornholio-deactivated2015111.tumblr.com', 'celebrity-nude-pictures.com']
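
Because the list is built from many feeds, the same domain may well appear more than once. A minimal sketch of deduplicating before writing the CSV, using only the standard library, could look like this. The `unique_domains` helper and the `sample_domains` list (with duplicates and an empty value added on purpose) are illustrative assumptions, not library output.

```python
import csv
import io

# Illustrative input: a few domains with deliberate repeats and one empty value.
sample_domains = [
    "seducio.com",
    "wingsofhewind.tk",
    "seducio.com",
    "",
    "xinchaochcdfe.com",
    "wingsofhewind.tk",
]


def unique_domains(domains):
    """Drop empty values and duplicates while keeping first-seen order."""
    return list(dict.fromkeys(d for d in domains if d))


deduplicated = unique_domains(sample_domains)

# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Domain"])
writer.writerows([d] for d in deduplicated)
print(buffer.getvalue())
```

`dict.fromkeys` keeps insertion order, so the first occurrence of each domain determines its position in the output.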
