This repo provides a dataset of 388448 URLs, each labelled 0 or 1, where 1 marks a malicious URL. The work was done in early 2016. For demonstration purposes, I trained a simple Logistic Regression model and built a simple web app using Flask.

Please note that this implementation is by no means state of the art; there are a number of ways to improve the model. First, you might get better results with deep neural networks (e.g. a Recurrent Neural Network). Second, using the raw URL string directly as input is not a good idea: we need to perform feature engineering and find better features (e.g. web page content or IP/host details).

The data was collected from many sources, then merged and preprocessed. One of the sources is this.
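As a rough illustration of the feature-engineering idea mentioned above, here is a minimal sketch that turns a URL string into a few lexical and host-level features using only the Python standard library. It is not the code used in this repo: the function names `host_is_ip` and `url_features` and the particular feature set are assumptions chosen for the example, and features based on page content or IP reputation would need extra data sources that are not shown here.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Hypothetical feature extractor: map a raw URL to numeric features."""
    raw = url.strip()
    # urlsplit only recognises the host when a scheme is present,
    # so prepend one for bare entries such as "example.com/login".
    to_parse = raw if _SCHEME_RE.match(raw) else "http://" + raw
    try:
        parts = urlsplit(to_parse)
    except ValueError:
        # Malformed input (e.g. broken IPv6 brackets): fall back to empty parts.
        parts = urlsplit("")
    host = parts.hostname or ""
    return {
        "url_length": len(raw),
        "host_length": len(host),
        "host_is_ip": int(host_is_ip(host)),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in raw),
        "num_path_segments": len([s for s in parts.path.split("/") if s]),
        "has_at_symbol": int("@" in raw),
        "has_query": int(bool(parts.query)),
    }
```

Calling `url_features` on every URL in the dataset would give a numeric feature table that could be fed to the classifier instead of, or alongside, the raw string; whether these particular features actually help would have to be checked on the data.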
bhattsameer/Malicious-URL-Detection-using-Machine-Learning
A Dataset for the task of Malicious URL Detection