The general setup for the problem is a common one: we have a single table of log lines recording Internet traffic between various sources and destinations. Traffic between a source and a destination is labeled as malicious or clean in the dataset, and we'd like to predict ahead of time whether a future connection between a source and a destination will be malicious.
We'll demonstrate an end-to-end workflow using a cybersecurity dataset derived from Los Alamos National Laboratory data. The notebooks show a rapid way to predict whether a connection (defined in several ways) is malicious.
- Quickly build an end-to-end workflow using log-line cybersecurity data
- Find interesting automatically generated features
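
To make the idea of a "connection defined in several ways" concrete, here is a minimal plain-Python sketch. It does not use Featuretools and is not the notebooks' code; it only illustrates the kind of aggregation features that automated feature engineering derives from log lines. The field names, the `bytes_sent` column, and the sample rows are all hypothetical and are not taken from the dataset.

```python
from collections import defaultdict, namedtuple

# Hypothetical log-line schema; the real dataset's columns may differ.
LogLine = namedtuple("LogLine", ["source", "destination", "bytes_sent", "malicious"])

# Made-up sample rows, for illustration only.
LOG_LINES = [
    LogLine("10.0.0.1", "10.0.0.9", 120, False),
    LogLine("10.0.0.1", "10.0.0.9", 300, False),
    LogLine("10.0.0.2", "10.0.0.9", 5000, True),
    LogLine("10.0.0.2", "10.0.0.7", 4200, True),
    LogLine("10.0.0.3", "10.0.0.7", 80, False),
]


def aggregate(log_lines, key_fn):
    """Group log lines by a connection definition and compute simple features."""
    groups = defaultdict(list)
    for line in log_lines:
        groups[key_fn(line)].append(line)

    features = {}
    for key, lines in groups.items():
        sizes = [line.bytes_sent for line in lines]
        features[key] = {
            "count": len(lines),
            "mean_bytes": sum(sizes) / len(sizes),
            "max_bytes": max(sizes),
            # Label for the group: was any log line in it marked malicious?
            "any_malicious": any(line.malicious for line in lines),
        }
    return features


# Two possible definitions of a "connection":
by_pair = aggregate(LOG_LINES, lambda l: (l.source, l.destination))
by_source = aggregate(LOG_LINES, lambda l: l.source)

for key, feats in sorted(by_pair.items()):
    print(key, feats)
```

Changing the `key_fn` argument changes what counts as a connection without touching the feature logic, which is the same separation the workflow relies on when it tries several definitions.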
1. Clone the repo

       git clone https://github.com/Featuretools/predict-malicious-cyber-connections.git

2. Install the requirements

       pip install -r requirements.txt

   You will also need to install graphviz for this demo. Please install graphviz according to the instructions in the Featuretools Documentation.

3. Download the data

   You can download the data from Amazon S3. After downloading, save the CSV to a directory called `data` in the root of this repository.

4. Run the Tutorial notebooks

       jupyter notebook
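
Before launching the notebooks, it can help to confirm that the downloaded CSV landed in the `data` directory. The following is a small optional sketch, not part of the repository; the helper name `find_csv_files` is hypothetical, and it makes no assumption about the CSV's file name beyond the `.csv` extension.

```python
from pathlib import Path


def find_csv_files(data_dir="data"):
    """Return sorted CSV paths directly under data_dir, or [] if it is missing."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(root.glob("*.csv"))


# Run from the root of the repository.
files = find_csv_files()
if files:
    for path in files:
        print("Found:", path)
else:
    print("No CSV files found in ./data; download the data first.")
```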
Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.
Any questions can be directed to help@featurelabs.com.