Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML and JSON are scanned like structures #236

Merged
merged 2 commits into from Nov 16, 2022
Merged

Conversation

babenek
Copy link
Contributor

@babenek babenek commented Oct 26, 2022

Description

  • Added StructContentProvider to separate structure storage
  • YAML and JSON might be scanned like structures
  • The case in yaml with multiline credential is found:
key: |
  5UpEr5eCrEt

How has this been tested?

  • UnitTests
  • do benchmark with --dept 33

@babenek babenek self-assigned this Oct 28, 2022
@babenek babenek marked this pull request as ready for review November 8, 2022 00:00
@babenek babenek requested a review from a team as a code owner November 8, 2022 00:00
Copy link
Contributor

@csh519 csh519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this category(using extension to analyze) of features should to have abstract class and each specific classes.
I suggest to create new directory like extension_analyzer on credsweeper/file_handler/ as a child and create classes for that. Including xml parsing function.

How do you think?

@babenek
Copy link
Contributor Author

babenek commented Nov 9, 2022

Agree, there is some refactoring required. But it is not analysis by extension. It is a 'bruteforce' data analysis.
I hope, we implement file type recognition with signature sometime like 'is_zip', 'is_gzip'.
Moreover, there is applied recursive scan. So, credential might be hidden ZIP -> YAML -> BASE64 -> someAPI...
The feature is experimental yet and not optimized: CredSweeper has to read a file twice. First - to scan as is. Second - to recognize data type.
So, the refactoring will be big and the architecture will be changed.
I try to add new feature with the PR and #237 without extra changes.

I could separate struct_scan but it might invoke data_scan and this required self.config
There should be the config propagate in the classes. Probably we have to discuss more with xDizzix

@csh519
Copy link
Contributor

csh519 commented Nov 9, 2022

@babenek

Okay I understood what you mean except about whatthepatch case(xDizzix)..
Is it related with recursive scan with proper file type(extension)?
And regarding propagation of config, we can make a config as a classmethod and make it access from all over source code.

But I love singleton style more as we using.

kmnls
kmnls previously approved these changes Nov 11, 2022
Comment on lines 61 to 62
# logger.debug("CONVERTED from '%s' json:\n%s", self.data.decode(encoding='utf-8', errors='strict'),
# str(self.structure))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please unify the logging style with YAML code block(L75-L76)?
And please remove annotated codes.

csh519
csh519 previously approved these changes Nov 15, 2022
Copy link
Contributor

@csh519 csh519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding structure scan.

LGTM 👍

commit 7b2feea
Author: Roman Babenko <babenek@gmail.com>
Date:   Wed Nov 16 01:23:13 2022 +0200

    Squashed commit of the following:

    commit 02768be
    Merge: 6faeb2c d7e44c7
    Author: Roman Babenko <babenek@users.noreply.github.com>
    Date:   Wed Nov 16 00:38:37 2022 +0200

        Merge branch 'main' into structures

    commit d7e44c7
    Author: Roman Babenko <babenek@users.noreply.github.com>
    Date:   Tue Nov 15 23:32:10 2022 +0200

        Version up to 1.4.5 (Samsung#243)

        * Version up to 1.4.5 and template fix

        * Update whatthepath requirement version to avoid huge diffs check.

        * Added test for huge patches parsing

    commit 6faeb2c
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Mon Nov 14 09:16:09 2022 +0200

        Removed extra debug comments

    commit 722c068
    Merge: 1555e12 d2e07b8
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Nov 9 01:47:10 2022 +0200

        Merge remote-tracking branch 'upstream/main' into structures

    commit 1555e12
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Nov 9 01:47:03 2022 +0200

        rename methods

    commit 0e36719
    Merge: ab563ba 4d59bda
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Nov 8 01:19:11 2022 +0200

        Merge remote-tracking branch 'upstream/main' into structures

    commit ab563ba
    Merge: 37287ae 97bdaa0
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Mon Nov 7 09:23:10 2022 +0200

        Merge branch 'auxiliary' into structures

    commit 97bdaa0
    Merge: 6989e46 27b7110
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Mon Nov 7 09:20:05 2022 +0200

        Merge branch 'main' into auxiliary

    commit 6989e46
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Sat Nov 5 10:38:19 2022 +0200

        rename methods

    commit 99f3408
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Sat Nov 5 10:32:13 2022 +0200

        make the method private again

    commit 2108748
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Nov 2 10:21:50 2022 +0200

        fix

    commit 4abdb33
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Nov 2 10:20:46 2022 +0200

        Line umeration

    commit da431c1
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Nov 2 09:50:44 2022 +0200

        Use common code for reduce duplicate code

    commit d3861df
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Nov 2 09:31:23 2022 +0200

        Apply test assertion

    commit f268401
    Author: Roman Babenko <babenek@users.noreply.github.com>
    Date:   Wed Nov 2 09:30:36 2022 +0200

        Update credsweeper/app.py

        Co-authored-by: ShinHyung Choi <sh519.choi@samsung.com>

    commit a956018
    Author: Roman Babenko <babenek@users.noreply.github.com>
    Date:   Wed Nov 2 09:30:25 2022 +0200

        Update credsweeper/file_handler/data_content_provider.py

        Co-authored-by: ShinHyung Choi <sh519.choi@samsung.com>

    commit 37287ae
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Thu Oct 27 10:59:27 2022 +0300

        restored 75% fuzz coverage

    commit 343ba58
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Thu Oct 27 10:58:09 2022 +0300

        minimized

    commit 6b9a5d5
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Thu Oct 27 10:30:13 2022 +0300

        reduced

    commit 7a8c2fd
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Thu Oct 27 08:03:52 2022 +0300

        четвер, 27 жовтня 2022 08:03:52 +0300

    commit 307762f
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 12:45:54 2022 +0300

        flake8fix

    commit 7c68080
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 12:04:06 2022 +0300

        Any struct

    commit 1e2c317
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 11:30:44 2022 +0300

        Fix MyPY. Reduce fuzz coverage

    commit 74fa40e
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 11:18:53 2022 +0300

        fix

    commit e816a68
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 10:53:29 2022 +0300

        fix tests credentials

    commit 9987e12
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 10:40:17 2022 +0300

        Structure scan JSON and YAML

    commit 2ddfee3
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 09:22:08 2022 +0300

        scan struct

    commit 2024255
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Wed Oct 26 08:32:21 2022 +0300

        Commited forgotten sample

    commit c894f31
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 23:58:12 2022 +0300

        refuzzed

    commit f17cbbd
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 18:35:55 2022 +0300

        fix

    commit 4e77dc9
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 18:25:15 2022 +0300

        Improved encode test research

    commit 6594b60
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 17:56:11 2022 +0300

        Encoded test

    commit 5e2775d
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 17:52:40 2022 +0300

        Separated test for docx

    commit 26d727b
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 17:44:43 2022 +0300

        fix

    commit 4ea1137
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 14:58:27 2022 +0300

        apply file type when not None

    commit cc2f6d7
    Author: Roman Babenko <babenek@gmail.com>
    Date:   Tue Oct 25 13:40:40 2022 +0300

        Encoded data might be decoded
@babenek
Copy link
Contributor Author

babenek commented Nov 15, 2022

Rebased after the last release

csh519
csh519 previously approved these changes Nov 16, 2022
Copy link
Contributor

@csh519 csh519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve again.

Copy link
Contributor

@iuriimet iuriimet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@babenek babenek merged commit b912ccb into Samsung:main Nov 16, 2022
@babenek babenek deleted the structures branch November 16, 2022 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants