Kmohamedalie/Phishing-Websites

Complete Jupyter Notebook: Link

Metrics:

| Algorithm | Precision | Recall | F1-score | Accuracy |
|-----------|-----------|--------|----------|----------|
| XGBoost   | 97.01%    | 97.01% | 97.01%   | 97.01%   |
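
The four reported metrics are identical. One possible explanation, which is an assumption and not something this README confirms, is that precision, recall, and F1 were micro-averaged: for single-label classification, micro-averaged precision, recall, and F1 all equal accuracy. The sketch below illustrates that property in plain Python. The function name `micro_metrics` and the label values are hypothetical and are not taken from the notebook or the dataset.

```python
def micro_metrics(y_true, y_pred):
    """Micro-averaged precision, recall, F1, and accuracy for single-label data."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    # Pool counts over all classes (this pooling is what "micro" averaging means).
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp += 1  # correct prediction: a true positive for class t
        else:
            fp += 1  # a false positive for the predicted class p
            fn += 1  # a false negative for the true class t

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    accuracy = tp / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels for illustration only (1 = legitimate, -1 = phishing).
y_true = [1, -1, 1, 1, -1, -1, 1, -1]
y_pred = [1, -1, 1, -1, -1, 1, 1, -1]

metrics = micro_metrics(y_true, y_pred)
print(metrics)
```

Every misclassified sample adds one false positive and one false negative to the pooled counts, so the pooled precision and recall share the same denominator and both reduce to accuracy. Macro or per-class averaging would generally give different values for each metric.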

Additional Information about the dataset

Creators: Rami Mohammad, Lee McCluskey

This dataset was collected mainly from the PhishTank archive, the MillerSmiles archive, and Google’s search operators.

One of the challenges faced by our research was the unavailability of reliable training datasets. In fact, this challenge faces any researcher in the field. Although plenty of articles about predicting phishing websites have been published in recent years, no reliable training dataset has been made publicly available, possibly because there is no agreement in the literature on the definitive features that characterize phishing webpages, which makes it difficult to shape a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.