# Introduction

This notebook gives a quick introduction to the `phishing-web-collector` library, which collects phishing domains from various feeds.

The notebook shows a simple way to retrieve phishing domains and store them in a separate CSV file.
By default, all raw data collected from the feeds is stored in a local directory set by the `storage_path` parameter of the `FeedManager` class.
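
Once feeds have been retrieved, the raw data directory can be inspected with the standard library. The following is a minimal sketch, assuming `storage_path` is set to `feeds_data` as in the configuration cell of this notebook. The helper name `list_stored_files` is hypothetical and not part of `phishing-web-collector`, and since the layout of the stored files is not described here, the sketch simply walks whatever it finds.

```python
from pathlib import Path


def list_stored_files(storage_path: str) -> list[str]:
    """Return the relative paths of all files under storage_path, sorted."""
    root = Path(storage_path)
    if not root.is_dir():
        # Nothing has been collected yet, or the path is wrong.
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # "feeds_data" matches the storage_path used in this notebook.
    for name in list_stored_files("feeds_data"):
        print(name)
```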

# Installation


In [30]:
!pip install "phishing-web-collector>=0.1.3"


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


# Import libraries

In [31]:
import csv

import phishing_web_collector as pwc

# Configure phishing feeds


In [32]:
manager = pwc.FeedManager(
    sources=[
        pwc.FeedSource.BINARY_DEFENCE_IP,
        pwc.FeedSource.BLOCKLIST_DE_IP,
        pwc.FeedSource.BOTVRIJ,
        pwc.FeedSource.C2_INTEL_DOMAIN,
        pwc.FeedSource.C2_TRACKER_IP,
        pwc.FeedSource.CERT_PL,
        pwc.FeedSource.DANGEROUS_DOMAINS,
        pwc.FeedSource.GREEN_SNOW_IP,
        pwc.FeedSource.MIRAI_SECURITY_IP,
        pwc.FeedSource.OPEN_PHISH,
        pwc.FeedSource.PHISHING_ARMY,
        pwc.FeedSource.PHISHING_DATABASE,
        pwc.FeedSource.PHISH_STATS_API,
        pwc.FeedSource.PHISH_TANK,
        pwc.FeedSource.PROOF_POINT_IP,
        pwc.FeedSource.THREAT_VIEW_DOMAIN,
        pwc.FeedSource.TWEET_FEED,
        pwc.FeedSource.URL_ABUSE,
        pwc.FeedSource.URL_HAUS,
        pwc.FeedSource.VALDIN,
    ],
    storage_path="feeds_data",
)

# Retrieve feeds

In [33]:
entries = await manager.retrieve_all()
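
The top-level `await` in the cell above works because Jupyter executes cells inside an already-running event loop. A plain Python script has no such loop, so the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below illustrates that pattern only. `fake_retrieve_all` is a hypothetical stand-in for `manager.retrieve_all()` and returns plain strings rather than feed entries, so the example runs without network access or the library installed.

```python
import asyncio


async def fake_retrieve_all() -> list[str]:
    """Hypothetical stand-in for manager.retrieve_all(); not the real API."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return ["http://phish.example/login"]


def collect() -> list[str]:
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(fake_retrieve_all())
```

In a script, the equivalent of the notebook cell would presumably be `entries = asyncio.run(manager.retrieve_all())`. Note that `asyncio.run` raises a `RuntimeError` when called from inside a running event loop, which is why the notebook uses `await` directly.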

# Transform entries to CSV

In [34]:

# Extract the domain from the URL of each collected entry.
phishing_domains = [pwc.get_domain_from_url(item.url) for item in entries]

# Write one domain per row under a single "Domain" header.
with open("phishing_domains.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Domain"])
    for domain in phishing_domains:
        writer.writerow([domain])
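
Several feeds may report the same domain, and some entries may yield no domain at all, so it can be useful to clean the list before writing it. The following is a minimal sketch using only the standard library. `write_unique_domains` is a hypothetical helper, not part of `phishing-web-collector`, and lowercasing is an assumption based on domain names being case-insensitive.

```python
import csv
from typing import Iterable, Optional


def write_unique_domains(domains: Iterable[Optional[str]], path: str) -> int:
    """Write de-duplicated, non-empty domains to a one-column CSV; return the count."""
    seen = set()
    unique = []
    for domain in domains:
        if not domain:
            continue  # skip None and empty strings
        normalized = domain.strip().lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)  # keep first-seen order

    with open(path, mode="w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["Domain"])
        writer.writerows([d] for d in unique)
    return len(unique)
```

With the variables from this notebook, a call would look like `write_unique_domains(phishing_domains, "phishing_domains.csv")`.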
