Skip to content

Data based security code for malware detection using various of boost algorithm

License

Notifications You must be signed in to change notification settings

djzdjzdl/Data_Based_Security

Repository files navigation

Data_Based_Security

Overview

  • Exploratory Dataset Analysis & Malware detection usin boost algorithm
  • Use 76 Static Features

how to run?

python main.py

EDA [Data Refining]

  • PE file Static Analysis [Input file : File Directory / Output file : pe_features.csv]
    • filename => Not used as a real feature
    • e_magic
    • e_lfanew
    • e_minalloc
    • e_ovno
    • Signature
    • Machine
    • NumberOfSections
    • TimeDateStamp
    • PointerToSymbolTable
    • NumberOfSymbols
    • SizeOfOptionalHeader
    • Characteristics
    • Magic
    • SizeOfCode
    • AddressOfEntryPoint
    • BaseOfCode
    • ImageBase
    • EntryPoint
    • SectionAlignment
    • FileAlignment
    • SizeOfImage
    • SizeOfHeaders
    • CheckSum
    • Subsystem
    • NumberOfRvaAndSizes
    • CompareNumberOfSections
    • .textSectionName
    • .textSectionVirtualSize
    • .textSection|VirtualSize-SizeOfRawData|
    • .textSectionVirtualAddress
    • .textSectionSizeOfRawData
    • .textSectionPointerToRawData
    • .textSectionCharacteristics
    • .textSectionEntropy
    • .dataSectionName
    • .dataSectionVirtualSize
    • .dataSection|VirtualSize-SizeOfRawData|
    • .dataSectionVirtualAddress
    • .dataSectionSizeOfRawData
    • .dataSectionPointerToRawData
    • .dataSectionCharacteristics
    • .dataSectionEntropy
    • .rsrcSectionName
    • .rsrcSectionVirtualSize
    • .rsrcSection|VirtualSize-SizeOfRawData|
    • .rsrcSectionVirtualAddress
    • .rsrcSectionSizeOfRawData
    • .rsrcSectionPointerToRawData
    • .rsrcSectionCharacteristics
    • .rsrcSectionEntropy
    • .rdataSectionName
    • .rdataSectionVirtualSize
    • .rdataSection|VirtualSize-SizeOfRawData|
    • .rdataSectionVirtualAddress
    • .rdataSectionSizeOfRawData
    • .rdataSectionPointerToRawData
    • .rdataSectionCharacteristics
    • .rdataSectionEntropy
    • .relocSectionName
    • .relocSectionVirtualSize
    • .relocSection|VirtualSize-SizeOfRawData|
    • .relocSectionVirtualAddress
    • .relocSectionSizeOfRawData
    • .relocSectionPointerToRawData
    • .relocSectionCharacteristics
    • .relocSectionEntropy
    • TotalNumberOfFunctionInIAT
    • TotalNumberOfFunctionInEAT
    • exist_rich_header
    • raw_rich_header
    • MajorLinkerVersion
    • MinorLinkerVersion
    • SizeOfInitializedData
    • SizeOfUninitializedData
    • SizeOfStackReserve
    • DllCharacteristics

Malware_Classifier [Train & Validation & Test with Malware features]

  • Using Models
    • xgboost
    • adaboost
    • gbm
    • lgbm
    • catboost

  • Code Explanation
    • Model class [Making each Trained models & Saving each models]

      • Xgboost function [Implementation of xgboost]

      • Adaoost function [Implementation of Adaboost]

      • GBM function [Implementation of GBM]

      • LGBM function [Implementation of LGBM]

      • Catboost function [Implementation of Catboost]

      • Find_Hyperparameter function [Hyperparameter finder automatically]

      • predictions function (lower case only) [Predictions with test data using target model]

      • Get_Csv function [Get Dataset]

      • Check_Drop function [drop columns from dataset]

      • Change_Types function [Change types for training & MinMax Scaling]

      • Check_Describe function [Describe Dataset x]


    • Model_Validation class [Get Scores with Validation Dataset using each models]

      • Check_Xgboost function [Get score xgboost model with Validation dataset]

      • Check_Adaboost function [Get score adaboost model with Validation dataset]

      • Check_GBM function [Get score GBM model with Validation dataset]

      • Check_LGBM function [Get score LGBM model with Validation dataset]

      • Check_Catboost function [Get score catboost model with Validation dataset]


    • Model_Test class [Get prediction with Test Dataset using best models]

      • Input : Model
      • Get_Test_Features function [Get Test Dataset csv file]
      • Do_Test function [Do predictions using model]
      • Save_To_Csv function [Save predictions to csv file]

About

Data based security code for malware detection using various of boost algorithm

Topics

Resources

License

Stars

Watchers

Forks

Languages