Security tool that scans URLs and predicts if they are malicious or not, based on a Logistic Regression algorithm.

IvanHanonoCozzetti/URL-Malware-Analyzer

URL-Malware-Analyzer

Security tool that scans URLs and predicts whether they are malicious. The prediction is based on a list of URLs with their respective labels (many such lists are available online) and a Logistic Regression model from scikit-learn.
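
The workflow described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the toy URLs, the "good"/"bad" label names, the choice of `TfidfVectorizer` with character n-grams, and the `max_iter` value are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; a real run would load a large labeled URL list.
urls = [
    "example.com/about",
    "example.org/docs/index.html",
    "example.net/blog/post-1",
    "example.com/free-prize/claim.exe",
    "example.org/login/verify-account.php?id=123",
    "example.net/download/update.exe",
]
labels = ["good", "good", "good", "bad", "bad", "bad"]

# Assumption: character n-grams are used as features; the repository may tokenize differently.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(urls, labels)

# Score URLs the model has not seen during training.
new_urls = ["example.com/contact", "example.org/free-prize/setup.exe"]
predictions = model.predict(new_urls)           # one label per URL
probabilities = model.predict_proba(new_urls)   # one row of class probabilities per URL

for url, label in zip(new_urls, predictions):
    print(url, "->", label)
```

With only six training URLs the predictions are not meaningful; the sketch only shows how the vectorizer and the classifier fit together.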

Vectorization

Vectorization is the general process of turning a collection of text documents into numerical feature vectors. This specific strategy (tokenization, counting and normalization) is called the "Bag of Words" or "Bag of n-grams" representation.
Documents are described by word occurrences, and the relative position of the words in the document is ignored entirely.
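
To make the idea concrete, here is a small pure-Python sketch of a bag of words and of word n-grams. The helper names `bag_of_words` and `ngrams` and the sample strings are hypothetical and chosen only for illustration; they are not part of scikit-learn or of this repository.

```python
from collections import Counter

def bag_of_words(text):
    """Count word occurrences; word order is discarded (hypothetical helper)."""
    return Counter(text.lower().split())

def ngrams(tokens, n):
    """Return contiguous runs of n tokens, joined by spaces (hypothetical helper)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Two documents with the same words in a different order.
a = bag_of_words("login secure account")
b = bag_of_words("account login secure")
same = a == b  # the two bags are equal because positions are ignored

# A bag of 2-grams keeps some local order information.
bigrams = ngrams("login secure account".split(), 2)
```

Counting n-grams instead of single words is one way to recover part of the ordering that a plain bag of words throws away.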

sklearn.feature_extraction.text
Text analysis is a major application field for machine learning algorithms. However, the raw data, a sequence of symbols, cannot be fed directly to the algorithms themselves, as most of them expect numerical feature vectors with a fixed size rather than raw text documents of variable length.
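
As a brief illustration of that point, the sketch below uses `CountVectorizer` from `sklearn.feature_extraction.text` to turn documents of different lengths into rows of equal width. The sample documents are invented for the example, and `CountVectorizer` with default settings is an assumption; the repository may use a different vectorizer or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical documents of different lengths.
docs = [
    "free prize click here",
    "click",
    "secure login page for your account",
]

vectorizer = CountVectorizer()      # default word-level tokenization and lowercasing
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# Every row has the same number of columns: one per word in the learned vocabulary.
n_docs, n_features = X.shape
print(n_docs, n_features)
print(sorted(vectorizer.vocabulary_))
```

Each document, whatever its length, becomes a vector with one entry per vocabulary word, which is the fixed-size input that estimators such as Logistic Regression expect.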
