ML with Phishing

The files in this repo will be used for my "Machine Learning with Phishing" series on my blog. Please refer to the original blog posts for a better understanding of this project.

The dataset used here contains the 10 'baseline features' selected in this study.

Features

Here are the features used for this project:

No.	Identifier	Value type	Description
1	NumDash	Discrete	Counts the number of "-" characters in the webpage URL.
2	NumNumericChars	Discrete	Counts the number of numeric characters in the webpage URL.
3	NumSensitiveWords	Discrete	Counts the number of sensitive words (i.e., "secure", "account", "webscr", "login", "ebayisapi", "signin", "banking", "confirm") in the webpage URL.
4	PctExtHyperlinks	Continuous	Computes the percentage of external hyperlinks in the webpage's HTML source code.
5	PctNullSelfRedirectHyperlinks	Continuous	Computes the percentage of hyperlink fields containing an empty value, a self-redirect value such as "#" or the URL of the current webpage, or an abnormal value such as "file://E:/".
6	FrequentDomainNameMismatch	Binary	Checks whether the most frequent domain name in the HTML source code differs from the domain name of the webpage URL.
7	SubmitInfoToEmail	Binary	Checks whether the HTML source code contains the HTML "mailto" function.
8	PctExtResourceUrlsRT	Categorical	Computes the percentage of external resource URLs in the HTML source code, then applies rules and thresholds to generate the value.
9	ExtMetaScriptLinkRT	Categorical	Computes the percentage of meta, script, and link tags containing an external URL in their attributes, then applies rules and thresholds to generate the value.
10	PctExtNullSelfRedirectHyperlinksRT	Categorical	Computes the percentage of hyperlinks in the HTML source code that use a different domain name, start with "#", or use "JavaScript ::void(0)", then applies rules and thresholds to generate the value.
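The descriptions above leave the exact extraction rules open. As a minimal sketch, assuming the raw URL and HTML of a page are available as strings, features 1-7 could be approximated with the Python standard library as shown below. The names `url_features`, `html_features` and `_HrefCollector` are hypothetical and not taken from this repository, and several rules are simplifying assumptions: each sensitive word counts at most once, only `<a href>` links are inspected, "external" means a different http(s) hostname, and percentages are returned as fractions. Features 8-10 are left out because their rules and thresholds are not given here.

```python
import string
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Word list taken from the NumSensitiveWords description above.
SENSITIVE_WORDS = ("secure", "account", "webscr", "login",
                   "ebayisapi", "signin", "banking", "confirm")


def url_features(url):
    """Features 1-3, computed directly on the URL string."""
    lowered = url.lower()
    return {
        "NumDash": url.count("-"),
        "NumNumericChars": sum(ch in string.digits for ch in url),
        # Assumption: each listed word counts once if it appears anywhere.
        "NumSensitiveWords": sum(word in lowered for word in SENSITIVE_WORDS),
    }


class _HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag ("" if missing)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append((dict(attrs).get("href") or "").strip())


def html_features(page_url, html):
    """Features 4-7 under simplified, assumed rules (not the study's exact ones)."""
    collector = _HrefCollector()
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    hosts = Counter()  # web hostnames referenced by <a> links
    external = null_or_self = 0

    for href in collector.hrefs:
        resolved = urljoin(page_url, href)
        parsed = urlparse(resolved)
        # Only http(s) links carry a web domain we can compare against.
        host = parsed.hostname if parsed.scheme in ("http", "https") else None
        if host:
            hosts[host] += 1
        # Empty, "#...", current-page, or file: links count as null/self-redirect.
        if (href == "" or href.startswith("#") or resolved == page_url
                or href.lower().startswith("file:")):
            null_or_self += 1
        elif host and host != page_host:
            external += 1

    total = len(collector.hrefs)
    top_host = hosts.most_common(1)[0][0] if hosts else page_host
    return {
        # Fractions in [0, 1]; multiply by 100 for a percentage.
        "PctExtHyperlinks": external / total if total else 0.0,
        "PctNullSelfRedirectHyperlinks": null_or_self / total if total else 0.0,
        "FrequentDomainNameMismatch": int(top_host != page_host),
        # Assumption: any "mailto:" occurrence counts, not only in form actions.
        "SubmitInfoToEmail": int("mailto:" in html.lower()),
    }
```

For example, `url_features("https://secure-login.example.com/account/confirm-2024?id=77")` counts 2 dashes, 6 numeric characters and 4 sensitive words ("secure", "login", "account", "confirm").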

andpalmier/MLWithPhishing

Folders and files

Latest commit

History

Repository files navigation

ML with Phishing

Features

About

Topics

Resources

Stars

Watchers

Forks

Languages