
khramtsova/url_feature_extractor


URL Feature extractor

Feature extractor from the paper "Federated Learning For Cyber Security: SOC Collaboration For Malicious URL Detection".

Code to extract lexical features from URLs. It takes a CSV file of URLs as input and generates 72 features per URL.
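To show what lexical feature extraction involves, here is a minimal Python sketch that reads URLs from CSV text and computes a handful of common features per URL. It is not the repository's implementation and does not reproduce the paper's 72 features. The helper names `lexical_features` and `extract_from_csv`, the feature names, and the assumption that URLs are stored in a column named `url` are all hypothetical.

```python
import csv
import io
import re
from urllib.parse import urlsplit

# Hypothetical, illustrative feature subset; NOT the paper's 72 features.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features for one URL string."""
    # urlsplit only fills the host part when a scheme is present,
    # so prepend one for scheme-less URLs such as "example.com/a".
    parsed = urlsplit(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        "has_ip_host": int(bool(IPV4_RE.match(host))),
        "has_at_symbol": int("@" in url),
    }


def extract_from_csv(csv_text: str, url_column: str = "url") -> list[dict]:
    """Read URLs from CSV text (assumed column name: url_column) and
    return one feature dict per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [lexical_features(row[url_column]) for row in reader]
```

For example, `extract_from_csv("url,type\ngoogle.com,benign\n")` returns a single dict whose `host_length` is 10. Every feature is derived from the URL string alone, without fetching the page or querying DNS, which is what makes lexical features cheap to compute at scale.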

The features extracted from a dataset of more than 700K malicious and benign URLs are available in the archive urls_final_complete.tar.xz.
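To inspect the archive from Python, you can use the standard-library `tarfile` module, which reads `.tar.xz` files directly. The sketch below assumes the archive holds one or more CSV files with a header row. The member names and column layout are not documented here, so `read_csv_members` is a hypothetical helper that loads every `.csv` member it finds.

```python
import csv
import io
import tarfile


def read_csv_members(archive_path: str) -> dict[str, list[dict]]:
    """Return {member name: list of row dicts} for every .csv file
    inside an xz-compressed tar archive (hypothetical helper)."""
    tables: dict[str, list[dict]] = {}
    # "r:xz" opens an LZMA/xz-compressed tar archive for reading.
    with tarfile.open(archive_path, mode="r:xz") as tar:
        for member in tar.getmembers():
            # Skip directories and anything that is not a CSV file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # Wrap the binary stream so the csv module can read it as text.
            with io.TextIOWrapper(fileobj, encoding="utf-8", newline="") as text:
                tables[member.name] = list(csv.DictReader(text))
    return tables
```

Usage would look like `tables = read_csv_members("urls_final_complete.tar.xz")`, after which `tables.keys()` lists the CSV files found. All values come back as strings, so numeric feature columns need converting before training a model.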

The initial URL dataset is a collection drawn from several sources. The URLs are distributed among five classes: malware, defacement, phishing, spam, and benign. Among the sources is ISCX-URL-2016, which was further augmented with:

  1. Benign: Hacker News, PhishStorm, Ebbu2017 Dataset
  2. Malware: URLHaus
  3. Phishing: Openphish, PhishTank

The resulting collection of URLs can be found here.

For details on the class distribution and our other experiments, please consult the paper.
