EML scanner enhancement #525

Merged
merged 5 commits into from
Mar 12, 2024
29 changes: 14 additions & 15 deletions cicd/benchmark.txt
@@ -5,11 +5,11 @@ Authentication Credentials 96 2651 32
 Cryptographic Primitives 54 171 1
 Generic Secret 1210 29585 618
 Generic Token 341 3718 556
-Other 700 3707 37
+Other 715 3695 37
 Password 1483 7145 4224
-Predefined Pattern 428 5292 11
+Predefined Pattern 427 5290 11
 Private Key 1019 1477
-TOTAL: 5331 53746 5479
+TOTAL: 5345 53732 5479
 FileType FileNumber ValidLines Positives Negatives Template
 --------------- ------------ ------------ ----------- ----------- ----------
  190 36319 45 407 80
@@ -83,7 +83,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
 .html 56 30394 10 116 18
 .idl 2 841 4
 .iml 6 699 32
-.in 6 2190 2 46 7
+.in 6 2190 4 44 7
 .inc 2 81 2 1
 .ini 11 1489 6 10 24
 .ipynb 1 210 4
@@ -93,7 +93,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
 .jenkinsfile 1 78 1 6
 .jinja2 1 64 2
 .js 665 705090 321 2445 363
-.json 856 15025976 331 10634 185
+.json 856 15025976 337 10628 185
 .jsp 13 4101 1 38 1
 .jsx 7 1162 19
 .jwt 6 8 6
@@ -120,7 +120,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
 .markdown 3 146 2 2
 .markerb 3 12 2 1
 .marko 1 32 2
-.md 659 172418 395 2404 719
+.md 659 172418 398 2401 719
 .mdx 3 723 7
 .mjml 2 183 3
 .mjs 22 5853 85 309
@@ -146,7 +146,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
 .pl 16 15748 6 34 1
 .pm 3 880 7
 .po 3 2996 15
-.pod 9 1921 1 26 1
+.pod 9 1921 6 21 1
 .pony 1 106 4
 .postinst 2 441 12 3
 .pp 10 687 14 1
@@ -183,7 +183,7 @@ FileType FileNumber ValidLines Positives Negatives Templat
 .rst 89 36385 29 304 62
 .rules 1 6 2
 .sample 2 25 7 3
-.sbt 3 652 1 5 2
+.sbt 3 652 6 2
 .scala 40 6603 12 96 6
 .scss 16 10191 32 1
 .secrets 1 12 1
@@ -228,21 +228,20 @@ FileType FileNumber ValidLines Positives Negatives Templat
 .xib 11 504 164
 .xsl 1 315 1
 .yaml 151 23500 92 379 52
-.yml 450 41925 295 960 359
+.yml 450 41925 294 961 359
 .zsh 7 1109 13
 .zsh-theme 1 121 1
-TOTAL: 10214 19076375 5331 53746 5479
+TOTAL: 10214 19076375 5345 53732 5479
 Detected Credentials: 6022
-credsweeper result_cnt : 5093, lost_cnt : 0, true_cnt : 4663, false_cnt : 430
+credsweeper result_cnt : 5093, lost_cnt : 0, true_cnt : 4677, false_cnt : 416
 Category TP FP TN FN FPR FNR ACC PRC RCL F1
 -------------------------- ---- ---- -------- ---- -------- -------- -------- -------- -------- --------
 Authentication Credentials 74 25 2658 22 0.009318 0.229167 0.983087 0.747475 0.770833 0.758974
 Cryptographic Primitives 38 3 169 16 0.017442 0.296296 0.915929 0.926829 0.703704 0.800000
 Generic Secret 1108 23 30180 102 0.000762 0.084298 0.996021 0.979664 0.915702 0.946604
 Generic Token 299 7 4267 42 0.001638 0.123167 0.989382 0.977124 0.876833 0.924266
-Other 519 329 3415 181 0.087874 0.258571 0.885239 0.612028 0.741429 0.670543
+Other 534 317 3415 181 0.084941 0.253147 0.888014 0.627497 0.746853 0.681992
 Password 1195 37 11332 288 0.003254 0.194201 0.974712 0.969968 0.805799 0.880295
-Predefined Pattern 411 6 5297 17 0.001131 0.039720 0.995987 0.985612 0.960280 0.972781
+Predefined Pattern 410 4 5297 17 0.000755 0.039813 0.996334 0.990338 0.960187 0.975030
 Private Key 1019 0 1477 0 1.000000 1.000000 1.000000 1.000000
-4663 430 19070614 668 0.000023 0.125305 0.999942 0.915570 0.874695 0.894666
-
+4677 416 19070614 668 0.000022 0.124977 0.999943 0.918319 0.875023 0.896149
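The rate columns in the benchmark table (FPR, FNR, ACC, PRC, RCL, F1) appear to follow the standard confusion-matrix definitions. The sketch below recomputes them from the TP/FP/TN/FN counts. The function names are hypothetical and this is not the benchmark's own code; it only assumes the textbook formulas.

```python
from typing import Dict, Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the rate is undefined."""
    return numerator / denominator if denominator else None


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Recompute the table's rate columns from raw counts (standard definitions)."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1 = None
    if precision is not None and recall is not None and precision + recall > 0:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "FPR": _ratio(fp, fp + tn),                  # false positive rate
        "FNR": _ratio(fn, fn + tp),                  # false negative rate
        "ACC": _ratio(tp + tn, tp + fp + tn + fn),   # accuracy
        "PRC": precision,
        "RCL": recall,
        "F1": f1,
    }


if __name__ == "__main__":
    # Counts taken from the updated "Other" row of the benchmark.
    for name, value in confusion_metrics(tp=534, fp=317, tn=3415, fn=181).items():
        print(f"{name}: {value:.6f}")
```

With the updated `Other` row (TP 534, FP 317, TN 3415, FN 181) this reproduces the listed values to six decimals, for example PRC 0.627497 and F1 0.681992.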
34 changes: 23 additions & 11 deletions credsweeper/deep_scanner/eml_scanner.py
@@ -21,32 +21,44 @@ def data_scan(
             depth: int,  #
             recursive_limit_size: int) -> List[Candidate]:
         """Tries to scan EML with text representation"""
-        candidates = []
+        candidates: List[Candidate] = []
 
         try:
             msg = email.message_from_bytes(data_provider.data)
             for part in msg.walk():
                 content_type = part.get_content_type()
                 body = part.get_payload(decode=True)
 
+                if not isinstance(body, (bytes, str)):
+                    continue
                 if "text/plain" == content_type:
-                    eml_text_data_provider = ByteContentProvider(content=body,
-                                                                 file_path=data_provider.file_path,
-                                                                 file_type=data_provider.file_type,
-                                                                 info=f"{data_provider.info}|EML-TEXT")
+                    eml_text_data_provider = ByteContentProvider(
+                        content=(body if isinstance(body, bytes) else body.encode()),
+                        file_path=data_provider.file_path,
+                        file_type=data_provider.file_type,
+                        info=f"{data_provider.info}|EML-TEXT")
                     eml_candidates = self.scanner.scan(eml_text_data_provider)
                     candidates.extend(eml_candidates)
-                elif "text/html" == content_type:
-                    html_data_provider = DataContentProvider(data=body)
-                    if html_data_provider.represent_as_html(depth, recursive_limit_size,
-                                                            self.scanner.keywords_required_substrings_check):
-                        string_data_provider = StringContentProvider(lines=html_data_provider.lines,
-                                                                     line_numbers=html_data_provider.line_numbers,
+                else:
+                    x_data_provider = DataContentProvider(data=(body if isinstance(body, bytes) else body.encode()),
+                                                          file_path=data_provider.file_path,
+                                                          file_type=data_provider.file_type,
+                                                          info=f"{data_provider.info}|EML-DATA")
+                    new_limit = recursive_limit_size - len(body)
+                    if "text/html" == content_type and x_data_provider.represent_as_html(
+                            depth, new_limit, self.scanner.keywords_required_substrings_check):
+                        string_data_provider = StringContentProvider(lines=x_data_provider.lines,
+                                                                     line_numbers=x_data_provider.line_numbers,
                                                                      file_path=data_provider.file_path,
                                                                      file_type=data_provider.file_type,
                                                                      info=f"{data_provider.info}|EML-HTML")
                         html_candidates = self.scanner.scan(string_data_provider)
                         candidates.extend(html_candidates)
+                    elif content_type.startswith("application"):
+                        x_candidates = self.recursive_scan(x_data_provider, depth, new_limit)
+                        candidates.extend(x_candidates)
+                    else:
+                        logger.error(f"{data_provider.file_path}:{content_type}:{type(body)} cannot be supported")
         except Exception as eml_exc:
             logger.error(f"{data_provider.file_path}:{eml_exc}")
         return candidates
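The new `isinstance` guard matters because `get_payload(decode=True)` returns `None` for multipart container parts, which `msg.walk()` also yields. The following standalone sketch uses only the standard-library `email` package to show that behaviour and the routing by content type. `build_sample` and `classify_parts` are hypothetical helpers, not CredSweeper functions, and the routing is simplified: in the diff, an HTML part is scanned as HTML only when `represent_as_html` succeeds.

```python
import email
from email.message import EmailMessage
from typing import List, Tuple


def build_sample() -> bytes:
    """Build a small multipart message: plain text, HTML and a binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "demo"
    msg.set_content("password = hypothetical-secret\n")
    msg.add_alternative("<p>token: hypothetical-token</p>", subtype="html")
    msg.add_attachment(b"\x00\x01binary",
                       maintype="application",
                       subtype="octet-stream",
                       filename="blob.bin")
    return msg.as_bytes()


def classify_parts(raw: bytes) -> List[Tuple[str, str]]:
    """Return (content_type, route) for every part that carries a payload."""
    routes = []
    for part in email.message_from_bytes(raw).walk():
        content_type = part.get_content_type()
        body = part.get_payload(decode=True)
        if not isinstance(body, (bytes, str)):
            # Multipart containers have no payload of their own: body is None.
            continue
        if content_type == "text/plain":
            route = "text"
        elif content_type == "text/html":
            route = "html"
        elif content_type.startswith("application"):
            route = "recursive"
        else:
            route = "unsupported"
        routes.append((content_type, route))
    return routes


if __name__ == "__main__":
    for content_type, route in classify_parts(build_sample()):
        print(f"{content_type} -> {route}")
```

The two container parts (`multipart/mixed` and `multipart/alternative`) are skipped by the guard, so only the three leaf parts are routed.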
18 changes: 10 additions & 8 deletions docs/source/conf.py
@@ -18,7 +18,7 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'CredSweeper'
-copyright = '2023, Samsung CredTeam'
+copyright = '2024, Samsung CredTeam'
author = 'CredTeam'

from credsweeper import __version__ as credsweeper_version
@@ -38,8 +38,10 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.intersphinx',
'sphinx.ext.viewcode',
'sphinx.ext.todo',
'sphinx.ext.napoleon',
'sphinx_autodoc_typehints',
'm2r2',
@@ -55,18 +57,18 @@
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ['_build']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

@@ -76,13 +78,13 @@
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

html_theme = 'sphinx_rtd_theme'

html_theme_options = {
'logo_only': True, #
'navigation_depth': 3 #
8 changes: 8 additions & 0 deletions docs/source/credsweeper.filters.rst
@@ -84,6 +84,14 @@ credsweeper.filters.value\_base64\_encoded\_pem\_check module
    :undoc-members:
    :show-inheritance:
 
+credsweeper.filters.value\_base64\_key\_check module
+----------------------------------------------------
+
+.. automodule:: credsweeper.filters.value_base64_key_check
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 credsweeper.filters.value\_blocklist\_check module
 --------------------------------------------------
 