
Warning while loading framework and some other questions about output #30

Closed
jiansuozhe opened this issue Jan 27, 2022 · 17 comments

@jiansuozhe

Hi there,

I downloaded the latest version of Counterfit last week and installed all the modules required but still have some problems. When I executed the command 'load art' I got a warning:

load art

The type of the provided estimator is not yet support for automated setting of logits difference loss. Therefore, this attack is defaulting to attacking the loss provided by the model in the provided estimator.
[+] art successfully loaded with defaults (no config file provided)

which did not exist before. When I executed the 'run' command I only got the adversarial input:

run

[-] Running attack HopSkipJump with id 12a70390 on creditfraud)

[-] Preparing attack...
[-] Running attack...
┌─────────┬──────────────┬──────────────────────────┐
│ Success │ Elapsed time │ Total Queries │
├─────────┼──────────────┼──────────────────────────┤
│ 1/1 │ 4.3 │ 24550 (5740.6 query/sec) │
└─────────┴──────────────┴──────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ Adversarial Input                                                        │
├──────────────────────────────────────────────────────────────────────────┤
│ [4462.00 -2.30 1.76 -0.36 2.33 -0.82 -0.07 0.56 -0.40 -0.24 -1.53 2.03   │
│ -6.56 0.17 -1.47 -0.70 -2.28 -4.78 -2.62 -1.34 -0.43 -0.30 -0.93 0.17    │
│ -0.09 -0.15 -0.54 0.04 -0.15 239.93]                                     │
└──────────────────────────────────────────────────────────────────────────┘
[+] Attack completed 12a70390 (HopSkipJump)

There are some differences in the scan summary as well (no 'queries').

Additionally, could you please explain the meaning of some values in the scan summary and the run output? What does 'successes' mean, and when can we say an attack is a 'success' or a 'failure'? I guess 'best score' means the success percentage of samples in an attack, is that correct?

In the run output, I guess the sample index means the number of samples in the attack, right? What do label and attack label mean? What do % Eucl. dist. and Elapsed Time [sec] mean? I think 'queries' means the number of requests made to the target, is that correct? I also saw a list of decimal numbers in the adversarial input value. Are they just random numbers, or do they have some structure? Where do they come from? Do they relate to my attack type?

Thank you very much for your help and patience.

@moohax
Contributor

moohax commented Feb 3, 2022

Hi @jiansuozhe, thanks for your questions.

In v1.0, art is loaded dynamically, and this warning comes from art itself: "The type of the provided estimator is not yet support for automated setting of logits difference loss. Therefore, this attack is defaulting to attacking the loss provided by the model in the provided estimator."

scan and run are two separate commands with mostly distinct purposes. scan is for "fuzzing", while run is for more manual testing. With scan you get more metrics that are useful for baselining, whereas with run you really only care about the output you generate.

Defining what counts as a success or failure is going to depend on the attack and framework you use. For example, in art.py, [check_success](https://github.com/Azure/counterfit/blob/5e385b0a6cf80e90bea76507ec1cdef2c85cea2b/counterfit/frameworks/art/art.py#L480) uses the Adversarial Robustness Toolbox function to define "success" for both evasion attacks and extraction attacks. In augly.py, we just write our own function. It's really up to you how you want to define success; the CFAttack object you pass to reporting.py will have everything you need.
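
If you want to roll your own, here is a minimal sketch of such a check (the initial_labels/final_labels attribute names are assumptions, not the real CFAttack fields):

def check_success(cfattack):
    # count a sample as a success when the attack changed its predicted label
    # (attribute names below are assumed; inspect your CFAttack object for the real ones)
    return [initial != final
            for initial, final in zip(cfattack.initial_labels, cfattack.final_labels)]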

  1. best_score identifies the attack that generates the highest confidence when switching a label, and does so with the fewest queries.
  2. sample_index is the index of the sample from your target.X. During target.load(), samples are loaded as a list into self.X. CF will reference this list to get samples for an attack.
  3. label is the initial label for a sample, before any modifications have been made.
  4. attack_label is the final label after an attack has completed.
  5. Eucl dist. is the % change from the original input to the final output (see the sketch after this list).
  6. Elapsed time is how long the attack took in seconds.
  7. Queries is the number of queries it took to complete the attack (lower is better).
  8. Adversarial Input is the final output from an attack. Here it is a bunch of numbers because it's the creditfraud model; if you did the satellite demo, it would be an image. They are effectively a modified input sample. All samples get loaded when you interact with a target (creditfraud.py).
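
On item 5, a rough sketch of one way such a percentage distance can be computed (not necessarily Counterfit's exact formula):

import numpy as np

def pct_euclidean_change(original, adversarial):
    # L2 distance between the two samples, as a percentage of the original's norm
    original = np.asarray(original, dtype=float)
    adversarial = np.asarray(adversarial, dtype=float)
    return 100.0 * np.linalg.norm(adversarial - original) / np.linalg.norm(original)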

Hopefully this is helpful!

@jiansuozhe
Author

Thank you very much for your help @moohax

@moohax
Contributor

moohax commented Feb 7, 2022

Please don't hesitate to ask more questions!

@moohax moohax closed this as completed Feb 7, 2022
@jiansuozhe
Author

Hello @moohax,

Could you please explain to me why I only got the adversarial input in my running output? Is it because of the estimator?
run

[-] Running attack HopSkipJump with id 12a70390 on creditfraud)

[-] Preparing attack...
[-] Running attack...
┌─────────┬──────────────┬──────────────────────────┐
│ Success │ Elapsed time │ Total Queries │
├─────────┼──────────────┼──────────────────────────┤
│ 1/1 │ 4.3 │ 24550 (5740.6 query/sec) │
└─────────┴──────────────┴──────────────────────────┘
[screenshot of the run output]

@jiansuozhe
Author

Additionally, could you please tell me how we can extract essential information from the adversarial input? For instance, how to find the useful numbers among a bunch of numbers in order to evaluate the model? @moohax

@moohax
Contributor

moohax commented Feb 9, 2022

@jiansuozhe That's what run provides. Compare the input and output with predict -a and predict -i <sample_index>. You can trace into reporting to customize these reports.
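
For example (flag meanings inferred from the description above):

counterfit> predict -i 0    # model output for the original sample at index 0
counterfit> predict -a      # model output for the adversarial sample(s) from the attack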

In terms of "evaluating a model": Counterfit is largely designed as a red team tool, and the traditional sort of "robustness" testing is not necessarily a feature we put front and center. For this type of reporting, I would dig into what art proper offers and add those functions or elements to art.py in post_attack_processing.

@jiansuozhe
Author

Hello @moohax,

Thank you for your reply. You mean that Counterfit is developed to test the protection level of a system and find its weaknesses, right? Could you please tell me how to realize this? For example, if I run an attack on a target and get the run output, I should be able to get some useful information from the output, is that correct? Can I learn, for instance, where the weaknesses in my AI system are, or how to improve my AI algorithm to make it safer, from my output? Or can I only get feedback like "my system is not safe against evasion attacks" or something like that? Thank you.

@moohax
Contributor

moohax commented Feb 10, 2022

The useful information is the output. If there is some metric or some output you would like to see, please let us know.

You could collect all of the outputs (Adversarial Inputs) and use them in an adversarial retraining scheme, but Counterfit has no official retraining mechanism built in. It could be done by adding training code to your target, calling train() in the target's load() function, then reloading the target. Again, that is unofficial and your mileage may vary.
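
An unofficial sketch of that idea (train() and the file names here are hypothetical):

import numpy as np

def load(self):
    clean = np.load("samples.npz")["X"].tolist()         # your normal samples
    adv = np.load("collected_adversarial.npy").tolist()  # attack outputs you saved
    self.model = self.train(clean + adv)                 # train() is your own code
    self.X = clean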

To explore targets and the completed attacks, from the counterfit> terminal, drop into an IPython shell and import CFState.

ipy
>> from counterfit.core.state import CFState
>> CFState.state().targets
>> CFState.state().active_target.attacks

@jiansuozhe
Author

Hello @moohax,

I found that when running HopSkipJump I can get the output, but when running other attacks I only got errors. For instance, when running BoundaryAttack I got "Result too large", and when running BasicIterativeMethod I got "no attribute 'predict wrapper'". Additionally, the most frequent problem is "object of type 'NoneType' has no len()". Is it because the input data (.npz) and the model file (.pkl) you provided can only be used with HopSkipJump? Or do I need to switch my target? Thank you.

@jiansuozhe
Author

@moohax, additionally, I have never created a new input data file before. Could you please give me some tips on how to create an input file (.npz)? Thank you.

@moohax
Contributor

moohax commented Feb 18, 2022

@jiansuozhe

Each attack has varying requirements. The Boundary Attack likely ran successfully, but the output may have been too big for some auxiliary process. You can trace through counterfit.frameworks.art.post_attack_processing() or check_success() in that same module.

A helpful debugging trick is to put from IPython import embed; embed() into a function you want to examine at runtime. It will drop you into an IPython terminal that lets you explore the current state. A more advanced alternative is from IPython.core.debugger import set_trace; set_trace(), which gives you pdb-style debugging.
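
For example, dropped into a framework function (the function shown here is just for illustration):

def check_success(self, cfattack):
    # execution pauses here; inspect cfattack, self, etc., then exit() to resume
    from IPython import embed; embed()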

You may need to switch the target; an attack can be either open-box (you have the model file) or closed-box (you have access to inference only). Hop Skip Jump is a closed-box attack and the Basic Iterative Method is an open-box attack. The implication is that for the Basic Iterative Method, the backend framework (the Adversarial Robustness Toolbox) requires an estimator/classifier that inherits from CLASSIFIER_LOSS_GRADIENTS_TYPE.

Counterfit passes everything back to the framework to be built and run. The targets provided are for demo purposes, and we artificially force a particular ART loading process via a target_classifier attribute attached to the target for testing purposes.

For example, digits_keras vs digits_blackbox

If you provide no target_classifier, Counterfit will assume you are using a closed-box attack. As all of the demo targets have model files, you can use open-box (whitebox) attacks against them. However, depending on the attack, you may need to provide additional items. We tend to focus on closed-box (blackbox) attacks.

Additionally, the most frequent problem is "object of type 'NoneType' has no len()"

I run into this too. It's a bug that shows up when the attack fails to run: because the attack never runs, results never get set, and then successes can't be properly calculated and reported.

@moohax
Contributor

moohax commented Feb 18, 2022

additionally, I have never created a new input data file before. Could you please give me some tips on how to create an input file (.npz)? Thank you.

This is a numpy zip file and is not explicitly a requirement for targets.

self.X should be a list of lists, where each entry is a sample you want an attack to perturb. Whether you keep your data in a text file, a database, or a single image in the target folder, Counterfit only cares that self.X is a list of lists. Counterfit uses get_samples when preparing an attack. This function just pulls a sample; whether or not it will work against the target is handled in your predict function.
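
That said, a minimal sketch of creating such a file and loading it back (the file name and array key are arbitrary):

import numpy as np

# create the file once, offline: stack your samples into a 2-D array and save it
np.savez("my_samples.npz", X=np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))

# then, inside your target's load(), read it back as a list of lists
def load(self):
    self.X = np.load("my_samples.npz")["X"].tolist()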

Similarly, in predict, x is also a list of lists, so you can handle multiple samples or a batch of samples. For example, if you have a simple REST endpoint that does not take a batch of inputs, your predict would look something like this (a sketch; requests and the endpoint attribute are illustrative)...

import requests

def predict(self, x):
    # the endpoint only takes one sample at a time, so loop and collect results
    return [requests.post(self.endpoint, json=sample).json() for sample in x]

If you are working with a local model, or an API that can accept a batch, predict would look like...

def predict(self, x):
    # the endpoint accepts the whole batch at once
    return requests.post(self.endpoint, json=x).json()

You will also return a list of lists from predict, where each list is the output from the target model for the particular sample.

@jiansuozhe
Author

Hello @moohax,

I loaded the framework art and interacted with the target satellite, but I found that almost all the attacks did not work. The two most frequent problems are the following:
[-] Preparing attack...
[-] Running attack...
[!] Failed to run ee34d02c (ZooAttack): 'BlackBoxClassifier' object has no attribute 'channels_first'
Even when I interacted with the target digits_blackbox the problem still occurred. Could you please tell me where the BlackBoxClassifier object is and how I can fix it? Or should I not run the attacks like this? Thank you.

The second problem is the following:
[screenshot: memory allocation error]

Could you please tell me how to fix it? Thank you.

@jiansuozhe
Author

Hello @moohax,

I found that if I do not use the latest version of Counterfit, my attacks actually work. When I interacted with the target tutorial, all the attacks worked. When I interacted with the target satelliteimages, only the pixel attack and the threshold attack did not run properly. With the pixel attack, the problem was similar to before (no attribute XXX); with the threshold attack, the problem is as follows:
[screenshot: "Failed to draw an adversarial image" error]

I do not know if I can fix it by changing the settings of my system.

Since I would like to use Counterfit to build a system that tests the security of AI models used to classify images, could you please tell me whether I just need to consult the tutorial and satelliteimages targets to design my own target? Thank you.

@moohax
Contributor

moohax commented Feb 28, 2022

Nice work! (I know it doesn't seem like it).

Failed to draw an adversarial image... is the Adversarial Robustness Toolbox saying the attack failed. My recommendation is to use Hop Skip Jump as it is Boundary Attack 2.0. In a not too distant update (internal at the moment) you will be able to optimize attack parameters.

The memory allocation error looks like a bug; it seems as though the target is trying to process ALL images in the dataset as a single sample. Double-check that self.X is a list of lists after you load. It's often helpful to set a breakpoint after self.X gets loaded and explore the data to make sure it's what you expect. Same advice for your predict function.
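
A quick sanity check along those lines, dropped at the end of your load() (purely illustrative):

# self.X should be a list of per-sample lists, not one giant flattened array
assert isinstance(self.X, list) and all(isinstance(s, list) for s in self.X)
print(f"{len(self.X)} samples, {len(self.X[0])} values each")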

@jiansuozhe
Author

Hello @moohax,

I downloaded the latest version of Counterfit and found that I could not load from config.json. Could you please tell me how to deal with this problem? Thank you.
[photo of the config.json warning]

@moohax
Contributor

moohax commented Apr 18, 2022

This is just a warning. You can provide a config that would limit the available attacks, or provide defaults. Otherwise Counterfit will just dynamically load all attacks.

Each respective framework implementation can be found under the folder named after the framework, art.py for example. This will give you insight into how it all gets loaded.
