Dangerous Links

The goal of this project is to develop and test machine learning models that determine whether a site is malicious based on its URL and WHOIS information.

Datasets Used: https://www.unb.ca/cic/datasets/url-2016.html

Python Library for WHOIS: https://pypi.org/project/whois/#description

Publications Referenced for Feature Extraction:

Structure of Directory:

  • feature_extraction.py - script that builds the engineered features used by the models
  • models.py - code for training and testing the models
  • raw_data_agg.py - script that aggregates the original URL and WHOIS data (slow to run, because it performs a WHOIS lookup for every URL)
  • url_and_whois_utils.py - helper functions that make collecting the data and creating the features easier and cleaner
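
To give a rough idea of what "engineered features" from a URL and its WHOIS record can look like, here is a minimal sketch using only the standard library. It is not the code in feature_extraction.py or url_and_whois_utils.py: the function names (`url_features`, `domain_age_days`, `shannon_entropy`) and the chosen features are assumptions made for illustration, and the WHOIS creation date is passed in as a plain `datetime` rather than fetched from a live lookup.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Optional
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def url_features(url: str) -> dict:
    """Hypothetical lexical features computed from the URL string alone."""
    # urlparse only fills in the hostname when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "host_entropy": shannon_entropy(host),
    }


def domain_age_days(creation_date: Optional[datetime],
                    as_of: datetime) -> Optional[int]:
    """Hypothetical WHOIS-derived feature: domain age in whole days.

    Returns None when the WHOIS record had no creation date.
    """
    if creation_date is None:
        return None
    return (as_of - creation_date).days


if __name__ == "__main__":
    print(url_features("http://example.com/a/b?x=1"))
    print(domain_age_days(datetime(2020, 1, 1), datetime(2020, 1, 31)))
```

Returning `None` for a missing creation date keeps "no WHOIS data" distinct from "domain registered today", which a model may want to treat differently.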