Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massive reduction in performance after 1.5.0 #438

Closed
Plazmaz opened this issue Dec 18, 2018 · 19 comments · Fixed by #1148
Closed

Massive reduction in performance after 1.5.0 #438

Plazmaz opened this issue Dec 18, 2018 · 19 comments · Fixed by #1148
Labels
bug Something isn't working
Milestone

Comments

@Plazmaz
Copy link

Plazmaz commented Dec 18, 2018

Describe the bug
After the 1.5.0 release, there appears to have been a significant reduction in performance. For instance, here is the output of time bandit -r . using 1.4.0:

17.56user 2.12system 0:21.22elapsed 92%CPU (0avgtext+0avgdata 90712maxresident)k
182728inputs+0outputs (0major+45973minor)pagefaults 0swaps

And here it is with 1.5.0:

261.46user 3.96system 4:27.26elapsed 99%CPU (0avgtext+0avgdata 89164maxresident)k
362728inputs+0outputs (8major+45690minor)pagefaults 0swaps

To Reproduce
Steps to reproduce the behavior:

  1. Run time bandit -r . within a medium to large-size project on 1.4.0
  2. Note the time.
  3. Run the same command again, but with bandit 1.5.0
  4. Compare the two times. You should see a significant drop in performance

Expected behavior
Performance consistent or comparable with 1.4.0

Bandit version

bandit 1.5.0
  python version = 2.7.15rc1 (default, Apr 15 2018, 21:51:34) [GCC 7.3.0]
@Plazmaz
Copy link
Author

Plazmaz commented Dec 18, 2018

Worth noting this is also an issue in 1.5.1

@ericwb ericwb added the bug Something isn't working label Dec 19, 2018
@ehooo
Copy link
Contributor

ehooo commented Dec 22, 2018

I just test rule by rule and the issue is not look related to this.

The only point I see that could be caused is the number of rules from 64 to 70 but according to the test I made the degradation time is around 3 or 4 sec, so should be another thing.

Here you are the code and the result in ODS file Bandit_time.zip

To get this data I clone twice bandit one for 1.4.0 and other for 1.5.0 and on a virtual env with each bandit installed to run the script attached.

Bandit 1.4.0

B602: 0.550184
B405: 0.640002
Total: 0:00:36.750709 - Num: 64
Mean: 0.574229 - Median: 0.569994
With all: 0.730448
Bandit 1.5.0
B110: 0.547545
B402: 0.649634
Total: 0:00:40.508832 - Num: 70
Mean: 0.578697 - Median: 0.577912
With all: 0:00:08.433786

I also check the code diffing and I don't see anything that could cause that degradation.
Any idea?

@lukehinds
Copy link
Member

It might be worth us putting in some debug timers on each plugin call to see if its any in particular that are causing the issue.

@ehooo
Copy link
Contributor

ehooo commented Dec 24, 2018

I just run line_profile

import line_profiler
from bandit.cli.main import _get_profile
from bandit.core import config as b_config
from bandit.core import manager as b_manager


def line_profiler_function():
    targets = ('../django-moth/', )
    formatter = 'txt'

    prof = line_profiler.LineProfiler()

    b_conf = b_config.BanditConfig(config_file=None)
    profile = _get_profile(b_conf, None, None)
    b_mgr = b_manager.BanditManager(b_conf, 'file', False,
                                    profile=profile, verbose=False,
                                    ignore_nosec=False)
    b_mgr.discover_files(targets, True, None)
    for key in b_mgr.b_ts.tests:
        for funct in b_mgr.b_ts.tests[key]:
            prof.add_function(funct)
    f = prof(b_mgr.run_tests)
    f()
    prof.print_stats(open('profile.log', 'w'))
    prof.disable()

Just to resume the issue is on blacklist function:
On 1.4.0:

Total time: 0.307109 s
File: bandit/core/blacklisting.py
Function: blacklist at line 30
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
...
    54                                                   else:
    55       624        683.0      1.1      0.2              name = context.call_function_name_qual
    56     21828      11170.0      0.5      3.6          for check in blacklists[node_type]:
    57     77940      41703.0      0.5     13.6              for qn in check['qualnames']:
    58     56737     233936.0      4.1     76.2                  if fnmatch.fnmatch(name, qn):
    59         4         57.0     14.2      0.0                      return report_issue(check, name)
    60                                           
    61       863        859.0      1.0      0.3      if node_type.startswith('Import'):
    62       242        137.0      0.6      0.0          prefix = ""
    63       242        153.0      0.6      0.0          if node_type == "ImportFrom":
    64       198        252.0      1.3      0.1              if context.node.module is not None:
    65       198        237.0      1.2      0.1                  prefix = context.node.module + "."
    66                                           
    67      3104       1679.0      0.5      0.5          for check in blacklists[node_type]:
    68      5964       5013.0      0.8      1.6              for name in context.node.names:
    69      7226       4191.0      0.6      1.4                  for qn in check['qualnames']:
    70      4130       3066.0      0.7      1.0                      if (prefix + name.name).startswith(qn):
    71         6         99.0     16.5      0.0                          return report_issue(check, name.name)

On 1.5.0:

Total time: 37.5339 s
File: bandit/core/blacklisting.py
Function: blacklist at line 30
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
...
    54                                                   else:
    55       624        688.0      1.1      0.0              name = context.call_function_name_qual
    56                                                       # In the case the Call is an importlib.import, treat the first
    57                                                       # argument name as an actual import module name.
    58       624        393.0      0.6      0.0              if name in ["importlib.import_module", "importlib.__import__"]:
    59                                                           name = context.call_args[0]
    60     24320      15044.0      0.6      0.0          for check in blacklists[node_type]:
    61    102295      88582.0      0.9      0.2              for qn in check['qualnames']:
    62     78600   37401448.0    475.8     99.6                  if fnmatch.fnmatch(name, qn):
    63         4        161.0     40.2      0.0                      return report_issue(check, name)
    64                                           
    65       863       1130.0      1.3      0.0      if node_type.startswith('Import'):
    66       242        146.0      0.6      0.0          prefix = ""
    67       242        161.0      0.7      0.0          if node_type == "ImportFrom":
    68       198        264.0      1.3      0.0              if context.node.module is not None:
    69       198        264.0      1.3      0.0                  prefix = context.node.module + "."
    70                                           
    71      3576       2015.0      0.6      0.0          for check in blacklists[node_type]:
    72      6944       5963.0      0.9      0.0              for name in context.node.names:
    73     12060       7133.0      0.6      0.0                  for qn in check['qualnames']:
    74      8456       6229.0      0.7      0.0                      if (prefix + name.name).startswith(qn):
    75         6        104.0     17.3      0.0                          return report_issue(check, name.name)
name)

@ehooo
Copy link
Contributor

ehooo commented Dec 26, 2018

I don't see any possible solution to solve it.
If I'm not wrong the only changes was on:

For call.py file

  • 'pickle', 'B301',
    - 'dill.loads',
    - 'dill.load',
    - 'dill.Unpickler'
  • 'md5', 'B303':
    - 'hashlib.sha1',
    - 'Crypto.Hash.SHA.new',
    - 'Cryptodome.Hash.MD2.new',
    - 'Cryptodome.Hash.MD4.new',
    - 'Cryptodome.Hash.MD5.new',
    - 'Cryptodome.Hash.SHA.new',
    - 'cryptography.hazmat.primitives.hashes.SHA1'
  • 'ciphers', 'B304':
    - 'Cryptodome.Cipher.ARC2.new',
    - 'Cryptodome.Cipher.ARC4.new',
    - 'Cryptodome.Cipher.Blowfish.new',
    - 'Cryptodome.Cipher.DES.new',
    - 'Cryptodome.Cipher.XOR.new',
  • 'tempnam', 'B325':
    - 'os.tempnam'
    - 'os.tmpnam'
  • 'unverified_context', 'B323',:
    - 'ssl._create_unverified_context'

For imports.py file

  • import_pycrypto', 'B413':
    - 'Crypto.Cipher',
    - 'Crypto.Hash',
    - 'Crypto.IO',
    - 'Crypto.Protocol',
    - 'Crypto.PublicKey',
    - 'Crypto.Random',
    - 'Crypto.Signature',
    - 'Crypto.Util'
  • 'import_pycryptodome', 'B414':
    - 'Cryptodome.Cipher',
    - 'Cryptodome.Hash',
    - 'Cryptodome.IO',
    - 'Cryptodome.Protocol',
    - 'Cryptodome.PublicKey',
    - 'Cryptodome.Random',
    - 'Cryptodome.Signature',
    - 'Cryptodome.Util'

I also test the code with the commit 257b5dd the different is near to None

I realy don't know how to fix it. :(

Any idea??

@lukehinds
Copy link
Member

lukehinds commented Jan 11, 2019

Tracked down the issue to Python 2 and < Release 1.50

Python 3.7.1 & Bandit 1.4.0

result: bandit -r youtube-dl  64.17s user 0.12s system 99% cpu 1:04.47 total

Python 3.7.1 & Bandit 1.5.0

result: bandit -r youtube-dl  64.52s user 0.07s system 99% cpu 1:04.68 total

Python 3.7.1 & Bandit 1.5.1

result: bandit -r youtube-dl  63.88s user 0.09s system 99% cpu 1:04.06 total

Python version 2.7.15 & Bandit 1.4.0

result: bandit -r youtube-dl  73.62s user 0.17s system 99% cpu 1:14.00 total

Python version 2.7.15 & Bandit 1.5.0

result: bandit -r youtube-dl  597.42s user 0.72s system 99% cpu 10:00.45 total

Python version 2.7.15 & Bandit 1.5.1

result: bandit -r youtube-dl  581.50s user 0.46s system 99% cpu 9:45.59 total

Now trying to find the particular change that caused this. As per #422 B303, B304, B305 are where the more delaying factors are based.

For now recommend anyone feeling the impact of this in their gate, change to running bandit under python3.

@Plazmaz
Copy link
Author

Plazmaz commented Jan 14, 2019

@lukehinds unfortunately that's not currently an option in my scenario. The temp fix I've implemented scans a much smaller subset of files, but it's definitely sub-optimal. One thing I noticed was that calls to re seemed to be taking up a large amount of the execution time, so if there's anywhere that doesn't use precompiled regexes, that might be worth looking at.

@ghugo
Copy link
Contributor

ghugo commented Jan 14, 2019

Looks like this is the commit where performance tanks:
85e5667

Commenting out the added blacklisted items brings performance back to being the original speedy self.

gagehugo@zbook:~$ time bandit -r keystone/keystone/ -x keystone/keystone/tests/ -f txt -o /dev/null
[main]	INFO	profile include tests: None
[main]	INFO	profile exclude tests: None
[main]	INFO	cli include tests: None
[main]	INFO	cli exclude tests: None
[main]	INFO	running on Python 2.7.12
525 [0.. 50.. 100.. 150.. 200.. 250.. 300.. 350.. 400.. 450.. 500.. ]
[text]	INFO	Text output written to file: /dev/null

real	1m33.092s
user	1m33.008s
sys	0m0.080s

After commenting out:

    sets.append(utils.build_conf_dict(
        'md5', 'B303',
        ['hashlib.md5',
         'hashlib.sha1',
         'Crypto.Hash.MD2.new',
         'Crypto.Hash.MD4.new',
         'Crypto.Hash.MD5.new',
#         'Crypto.Hash.SHA.new',
         'Cryptodome.Hash.MD2.new',
         'Cryptodome.Hash.MD4.new',
         'Cryptodome.Hash.MD5.new'],
#         'Cryptodome.Hash.SHA.new',
#         'cryptography.hazmat.primitives.hashes.MD5',
#         'cryptography.hazmat.primitives.hashes.SHA1'],
        'Use of insecure MD2, MD4, MD5, or SHA1 hash function.'
        ))
gagehugo@zbook:~$ time bandit -r keystone/keystone/ -x keystone/keystone/tests/ -f txt -o /dev/null
[main]	INFO	profile include tests: None
[main]	INFO	profile exclude tests: None
[main]	INFO	cli include tests: None
[main]	INFO	cli exclude tests: None
[main]	INFO	running on Python 2.7.12
525 [0.. 50.. 100.. 150.. 200.. 250.. 300.. 350.. 400.. 450.. 500.. ]
[text]	INFO	Text output written to file: /dev/null

real	0m6.508s
user	0m6.452s
sys	0m0.052s

@lukehinds
Copy link
Member

nice find @ghugo

I wonder what is the factor that causes the performance variations between python 2&3, something.

@ehooo
Copy link
Contributor

ehooo commented Jan 17, 2019

I just check the function fnmatch on python core and it was changed:
https://github.com/python/cpython/blob/2.7/Lib/fnmatch.py#L69
https://github.com/python/cpython/blob/3.4/Lib/fnmatch.py#L64

Changed on the commit:
python/cpython@5b0c1e0#diff-b62de5c6bce57a277ae7bc60324011ec

@lukehinds
Copy link
Member

lukehinds commented Jan 18, 2019

In light of @ehooo's findings I suggest we update the README with a recommendation that folks use py3 if utilising the crypto plugins with a link to this issue. That way if they have a super need to run Bandit under py2 and with the crypto plugins, they can pursue the python code linked in comment

If no objections I will assign this to me, and close it on the basis of the above readme additions.

@Plazmaz
Copy link
Author

Plazmaz commented Feb 22, 2019

Just wanted to add that there is still a significant reduction in performance with ANY of the tests under B303 enabled, not just the ones added in 85e5667. It seems to get better the more of those I comment out, but is still very slow in general.
With commented out lines in B303:

231.61user 0.15system 3:51.80elapsed 99%CPU (0avgtext+0avgdata 76112maxresident)k
0inputs+0outputs (0major+24931minor)pagefaults 0swaps

B303 Disabled:

15.70user 0.09system 0:15.84elapsed 99%CPU (0avgtext+0avgdata 74372maxresident)k
104inputs+0outputs (0major+21624minor)pagefaults 0swaps

Also, my original test was with the same version of python in two different versions of bandit, so it seems somewhat suspicious that this is an issue in python 2.7.

@ericwb ericwb added this to the Near Future milestone May 9, 2019
@KelSolaar
Copy link

KelSolaar commented May 21, 2019

Hi,

I just came around this issue for similar problems, I'm extending the blacklists with more content and noticed that the time skyrocketed, the culprit being indeed the fnmatch definition and the calls to the re regex library:

image

Is there a strong rational for using regex here? Looking at the shipped blacklists, none seem to be using regex patterns.

@KelSolaar
Copy link

KelSolaar commented May 21, 2019

A quick test I did was just changing this line:

From

if name is not None and fnmatch.fnmatch(name, qn):

To

if name is not None and name == qn:

and I went from 14m48.641s to 0m20.871s while removing 1 false positive. Given that it seems like regex matching should be optional and not the default? I will certainly monkey-patch the definition on our end.

@Plazmaz
Copy link
Author

Plazmaz commented May 23, 2019

FWIW fnmatch is not regex matching. That'd be via the re library, which also allows for precompiled patterns (which might improve performance)

@KelSolaar
Copy link

@Plazmaz : I might misunderstood what you just said but fnmatch is definitely regex matching: https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/fnmatch.py#L39 and it is also quite clear from the profile screengrab I posted.

@Plazmaz
Copy link
Author

Plazmaz commented May 23, 2019

Ah my bad, I didn't notice it used regex internally. I was looking at https://docs.python.org/3.7/library/fnmatch.html and figured it was just glob matching.

@KelSolaar
Copy link

Yeah, nah :) It explains why it is uber slow.

@Plazmaz
Copy link
Author

Plazmaz commented May 23, 2019

I mean, it could precompile those regexes and it would probably help a lot.

bittner added a commit to behave/behave-django that referenced this issue Jan 10, 2022
Bandit UX is seriously broken, only <1.6 works predictably.

Exclude/ignore of files is currently broken in Bandit:
- PyCQA/bandit#693
- PyCQA/bandit#490
- PyCQA/bandit#438 (comment)

Reading settings from configuration files is broken:
- PyCQA/bandit#753
- PyCQA/bandit#595

Reading from pyproject.toml not yet functional:
- Must install "toml" package and use "-c pyproject.toml".
- PyCQA/bandit#758

INI file configuration and CLI usage is unclear:
- PyCQA/bandit#603
- PyCQA/bandit#467
- PyCQA/bandit#396
bittner added a commit to behave/behave-django that referenced this issue Jan 10, 2022
Bandit UX is seriously broken, only <1.6 works predictably.

Exclude/ignore of files is currently broken in Bandit:
- PyCQA/bandit#693
- PyCQA/bandit#490
- PyCQA/bandit#438 (comment)

Reading settings from configuration files is broken:
- PyCQA/bandit#753
- PyCQA/bandit#595

Reading from pyproject.toml not yet functional:
Must install "toml" package and use "-c pyproject.toml".
- PyCQA/bandit#758

INI file configuration and CLI usage is unclear:
- PyCQA/bandit#603
- PyCQA/bandit#467
- PyCQA/bandit#396
bittner added a commit to behave/behave-django that referenced this issue Jan 10, 2022
Bandit UX is seriously broken, only <1.6 works predictably.

Exclude/ignore of files is currently broken in Bandit:
- PyCQA/bandit#693
- PyCQA/bandit#490
- PyCQA/bandit#438 (comment)

Reading settings from configuration files is broken:
- PyCQA/bandit#753
- PyCQA/bandit#595

Reading from pyproject.toml not yet functional:
Must install "toml" package and use "-c pyproject.toml".
- PyCQA/bandit#758

INI file configuration and CLI usage is unclear:
- PyCQA/bandit#603
- PyCQA/bandit#467
- PyCQA/bandit#396
ericwb added a commit to ericwb/bandit that referenced this issue Jun 23, 2024
The blacklisting function is currently using fnmatch.fnmatch()
to do matching of qualified names of blacklist calls. It seems
it is only used for telnetlib and ftplib where they are setting
the qualified name in a file glob style (telnetlib.*).

This change would slightly break backward compatibility if there
are any third-party plugins that use globbing in the qualified
names for blacklisting. I think the likelyhood is small. I also
think it is better to be more explicit in the qualified name
patterns. In the case of ftplib, FTP is insecure, but FTP_TLS is
not. So this already is resolving one false postive.

The other effect of this change is a slight boost to performance.
When scanning cpython prior to this fix, it would take around 1 min.
After the fix, closer to 50 seconds. So a nice little bump in speed.

Fixes: PyCQA#438

Signed-off-by: Eric Brown <eric_wade_brown@yahoo.com>
ericwb added a commit to ericwb/bandit that referenced this issue Jun 23, 2024
The blacklisting function is currently using fnmatch.fnmatch()
to do matching of qualified names of blacklist calls. It seems
it is only used for telnetlib and ftplib where they are setting
the qualified name in a file glob style (telnetlib.*).

This change would slightly break backward compatibility if there
are any third-party plugins that use globbing in the qualified
names for blacklisting. I think the likelyhood is small. I also
think it is better to be more explicit in the qualified name
patterns. In the case of ftplib, FTP is insecure, but FTP_TLS is
not. So this already is resolving one false postive.

The other effect of this change is a slight boost to performance.
When scanning cpython prior to this fix, it would take around 1 min.
After the fix, closer to 50 seconds. So a nice little bump in speed.

Fixes: PyCQA#438

Signed-off-by: Eric Brown <eric_wade_brown@yahoo.com>
ericwb added a commit to ericwb/bandit that referenced this issue Jun 23, 2024
The blacklisting function is currently using fnmatch.fnmatch()
to do matching of qualified names of blacklist calls. It seems
it is only used for telnetlib and ftplib where they are setting
the qualified name in a file glob style (telnetlib.*).

This change would slightly break backward compatibility if there
are any third-party plugins that use globbing in the qualified
names for blacklisting. I think the likelyhood is small. I also
think it is better to be more explicit in the qualified name
patterns. In the case of ftplib, FTP is insecure, but FTP_TLS is
not. So this already is resolving one false postive.

The other effect of this change is a slight boost to performance.
When scanning cpython prior to this fix, it would take around 1 min.
After the fix, closer to 50 seconds. So a nice little bump in speed.

Fixes: PyCQA#438

Signed-off-by: Eric Brown <eric_wade_brown@yahoo.com>
@ericwb ericwb closed this as completed in 4208e9d Jun 24, 2024
ericwb added a commit that referenced this issue Jun 24, 2024
* Performance improvement in blacklist function

The blacklisting function is currently using fnmatch.fnmatch()
to do matching of qualified names of blacklist calls. It seems
it is only used for telnetlib and ftplib where they are setting
the qualified name in a file glob style (telnetlib.*).

This change would slightly break backward compatibility if there
are any third-party plugins that use globbing in the qualified
names for blacklisting. I think the likelyhood is small. I also
think it is better to be more explicit in the qualified name
patterns. In the case of ftplib, FTP is insecure, but FTP_TLS is
not. So this already is resolving one false postive.

The other effect of this change is a slight boost to performance.
When scanning cpython prior to this fix, it would take around 1 min.
After the fix, closer to 50 seconds. So a nice little bump in speed.

Fixes: #438

Signed-off-by: Eric Brown <eric_wade_brown@yahoo.com>

* Add test for usage of FTP_TLS

This change adds an FTP_TLS call to the examples. A high severity
error is no longer reported as a result of the fix in PR #1148
that explicitly now matches blacklist call qualified names rather
than using a file glob.

However, you will notice that there is one more high severity
issue reported in the tests as a result of the import of
ftplib.FTP_TLS because the blacklist import is only checking for
"ftplib".

Fixes: #148

Signed-off-by: Eric Brown <eric_wade_brown@yahoo.com>

---------

Signed-off-by: Eric Brown <eric_wade_brown@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants