Malware Detection

Description

We are building a predictive classification model that, based on a system's configuration, predicts whether that system is likely to be attacked by malware.

Data Definition

The dataset contains 82 attributes.

Data Preparation

  1. Data Cleaning
    • Handling Missing Values
    • Skewness
  2. Categorical Data Handling
    • Category reduction
    • Case-sensitive merging
    • Special Character handling
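
The cleaning steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the sample values, the median imputation, the `log1p` transform, and the `min_count` threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from statistics import median


def clean_category(value):
    """Normalise one raw categorical value; None marks a missing value."""
    if value is None:
        return None
    # Replace special characters with spaces, then trim and lowercase so
    # that case and punctuation variants collapse into one category.
    text = re.sub(r"[^0-9A-Za-z]+", " ", str(value)).strip().lower()
    return text or None


def reduce_categories(values, min_count=2, other="other"):
    """Replace categories seen fewer than min_count times with one label."""
    counts = Counter(v for v in values if v is not None)
    return [v if v is None or counts[v] >= min_count else other for v in values]


def fill_missing_numeric(values):
    """Impute missing numeric values with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def reduce_skew(values):
    """Compress a right-skewed, non-negative column with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical raw values, not taken from the dataset.
raw = ["Windows10", "windows10", "WINDOWS10 ", "win#7", "Win 7", "???", None, "Linux"]
cleaned = [clean_category(v) for v in raw]
reduced = reduce_categories(cleaned)
filled = fill_missing_numeric([4.0, None, 8.0, 2.0])
```

Here the three spellings of `Windows10` merge into one category, `win#7` and `Win 7` merge into another, and the single `Linux` entry falls into `other` because it appears fewer than `min_count` times.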

Data Exploration

  1. Target variable
    • Distribution & Bias
  2. EDA Tasks
  3. Data Visualisation
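
Checking the target variable's distribution usually starts with the share of each class. The sketch below shows one way to do that; the `labels` list is made up for illustration and says nothing about the real dataset's balance.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of rows that belong to each target class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical target column: 1 = attacked by malware, 0 = not attacked.
labels = [0, 1, 1, 0, 1, 0, 0, 0]
balance = class_balance(labels)

# A ratio well above 1 would suggest a biased (imbalanced) target.
imbalance_ratio = max(balance.values()) / min(balance.values())
```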

Data Preparation for SageMaker

  1. Dataframe modification and conversion
  2. Train Test Split
  3. Storage in S3 bucket
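
The conversion and split steps above might look like the following sketch. It assumes the CSV layout that SageMaker's built-in algorithms such as XGBoost expect (no header row, target in the first column); this README does not name the algorithm used, so treat that layout as an assumption. The helper names, the 80/20 split, and the sample rows are likewise illustrative.

```python
import csv
import io
import random
from collections import defaultdict


def stratified_split(rows, label_index, test_fraction=0.2, seed=42):
    """Split rows into train and test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_index]].append(row)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def to_headerless_csv(rows, label_index):
    """Serialise rows as CSV text with the label first and no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        label = row[label_index]
        features = row[:label_index] + row[label_index + 1:]
        writer.writerow([label, *features])
    return buffer.getvalue()


# Hypothetical rows: two numeric features followed by a binary label.
rows = [[i, i * 2, i % 2] for i in range(20)]
train, test = stratified_split(rows, label_index=2)
train_csv = to_headerless_csv(train, label_index=2)
```

The resulting text could then be written to a file and uploaded to S3, for example with boto3's `upload_file`; the bucket and key are not given in this README.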

Model Training in SageMaker

Model Inference

Model Performance

  1. Accuracy
  2. F1 score
  3. ROC curve and AUC
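
For reference, the three metrics can be computed from scratch as in the sketch below. The labels, scores, and the 0.5 decision threshold are invented for illustration and are not results from this project; in practice a library such as scikit-learn would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarises ranking quality across all thresholds.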

Feature importance

Pandas Profiling top 15 features