How to create a model with external predictions? #253
That's actually pretty straightforward. Think of the website as the model: instead of a TensorFlow or PyTorch model giving you the predictions, it's the website. Hence, you should implement a subclass of the Foolbox model base class, for which you only need to implement batch_predictions and num_classes. It should look similar to this:
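A minimal sketch (assuming the Foolbox 1.x Model base class; `ExternalModel` and `_query_service` are placeholder names for illustration):

```python
import numpy as np
from foolbox.models import Model


class ExternalModel(Model):  # hypothetical name
    def __init__(self, bounds=(0, 1), channel_axis=1, preprocessing=(0, 1)):
        super(ExternalModel, self).__init__(
            bounds=bounds, channel_axis=channel_axis,
            preprocessing=preprocessing)

    def batch_predictions(self, images):
        # Query the external service once per image and stack the
        # per-class scores into a (batch_size, num_classes) array.
        predictions = [self._query_service(image) for image in images]
        return np.stack(predictions)

    def num_classes(self):
        return 2

    def _query_service(self, image):
        # Placeholder: send `image` to the website and parse the response
        # into a 1-D array of length num_classes().
        raise NotImplementedError
```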
I tried my best to do what you suggested, but after a couple of epochs of uploading the pictures I always get: line 63, in call. This happens as soon as I get a prediction for my noise picture to be a cat with confidence >0. Here is my implementation:

```python
import os

import numpy as np
import scipy.misc
import matplotlib.pyplot as plt
from selenium import webdriver

import foolbox
from foolbox.models import Model
from foolbox.criteria import TargetClassProbability


def save_image_as(image, imagename):
    scipy.misc.toimage(image, cmin=0.0, cmax=1.0).save(imagename + ".png")


def img2array(imagename):
    # Flatten the image into a 1-D vector.
    img = plt.imread(imagename)
    rows, cols, colors = img.shape
    img_size = rows * cols * colors
    img_1D_vector = img.reshape(img_size)
    return img_1D_vector


def find_in_nestlist(mylist, char):
    # Return the index of the first sublist that contains `char`.
    for sub_list in mylist:
        if char in sub_list:
            return mylist.index(sub_list)
    raise ValueError("'{char}' is not in list".format(char=char))


class WModel:
    def upload_and_get_result(self, image):
        self.dirpath = os.path.dirname(os.path.realpath(__file__))
        self.driver = webdriver.Chrome(os.path.join(self.dirpath, 'chromedriver.exe'))
        try:  # upload
            self.driver.get('https://example.com')
            self.driver.find_element_by_name('image').send_keys(os.path.join(self.dirpath, image))
            classify = self.driver.find_element_by_xpath("//input[@type='submit' and @value='Classify']")
            classify.submit()
        except Exception as e:
            print(e)
        try:  # get result of form confidence(float32), class("cat", "not-a-cat")
            ListOfResults = []
            result = self.driver.find_element_by_xpath("/html/body/pre").text  # import results
            SplitResult = result.split('},{', 4)  # format results
            for i in SplitResult:
                a = i.strip('[')
                a = a.strip(']')
                a = a.replace('"class":', '')
                a = a.replace('"confidence":', '')
                a = a.strip('{')
                a = a.strip('}')
                ListOfResults.append(a)
            res = ListOfResults[0].split(', ')
            self.driver.quit()
            if any("car" in sl for sl in res):
                b0 = float(res[find_in_nestlist(res, "car")][1])
            else:
                b0 = 0
            if any("cat" in sl for sl in res):
                b1 = float(res[find_in_nestlist(res, "cat")][1])
            else:
                b1 = 0
            if any("dog" in sl for sl in res):
                b2 = float(res[find_in_nestlist(res, "dog")][1])
            else:
                b2 = 0
            return np.array([[b0, b1, b2]])
        except Exception as e:
            print(e)


class WebsiteModel(Model):  # wrapper Foolbox model for WModel
    def __init__(
            self,
            model=WModel(),
            bounds=(0, 1),
            num_classes=3,
            channel_axis=1,
            preprocessing=(0, 1)):
        super(WebsiteModel, self).__init__(bounds=bounds,
                                           channel_axis=channel_axis,
                                           preprocessing=preprocessing)
        self._model = model
        self._num_classes = num_classes

    def batch_predictions(self, images):
        for image in images:
            rows, cols, colors = 64, 64, 4
            arr2img = image.reshape(rows, cols, colors)
            save_image_as(arr2img, "currentimg")
            # NOTE: returns after the first image of the batch
            return self._model.upload_and_get_result("currentimg.png")

    def num_classes(self):
        return self._num_classes


fmodel = WebsiteModel(WModel(), (0, 1))
criterion0 = TargetClassProbability(1, p=0.5535189)
img_1D_vector = img2array('car.png')
image, label = img_1D_vector, 0  # get source image and label
start_1D_vector = img2array('catadversarial.png')
attack = foolbox.attacks.BoundaryAttack(fmodel, criterion=criterion0)
adversarial = attack(image, label, tune_batch_size=False, starting_point=start_1D_vector)
```

I think it might be worth mentioning that the website either gives me confidence 100% for not-a-cat or confidence >0 for being a cat.
The traceback says it all: the dimension / shape of your predictions is not correct. I am not exactly sure, but I think the method upload_and_get_result should return something like [[0, 1]] (a 2-D array of shape [1, 2] with a leading batch dimension, which you need in batch_predictions) in both cases. Please make sure it really does that for all samples.
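A quick shape check could look like this (a sketch; `wmodel` stands for an instance of your WModel):

```python
# The wrapper output must carry a leading batch dimension,
# e.g. shape (1, 2) for a two-class model.
out = np.asarray(wmodel.upload_and_get_result("currentimg.png"))
assert out.ndim == 2 and out.shape[0] == 1, out.shape
```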
Thank you so much! I updated the code above.
That's hard to tell; it probably has to do with what is returned by upload_and_get_result.
Okay, so what is upload_and_get_result supposed to return? Right now it returns the prediction for a specific class as a float plus the class identifier, and I can't fit more inside a 2-D array of shape (1, 2).
I think I know what happens: the BoundaryAttack uses batching, but your batch_predictions is already returning on the first image. I'd suggest you set tune_batch_size=False when you apply the attack; this should solve the issue. You might also want to fix your batch_predictions function to return an array that contains the logits for all images in the batch.
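A minimal sketch of that fix (assuming upload_and_get_result returns a (1, num_classes) array per image, as in the code above):

```python
def batch_predictions(self, images):
    # Collect one prediction per image, then stack them into a
    # (batch_size, num_classes) array instead of returning early.
    results = []
    for image in images:
        arr2img = image.reshape(64, 64, 4)
        save_image_as(arr2img, "currentimg")
        results.append(self._model.upload_and_get_result("currentimg.png")[0])
    return np.stack(results)
```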
I again updated the code above with what you mentioned, and it works better now! I also added a criterion to make sure I can classify my car.png as a cat with over 90% confidence. The website model was also updated at my request to handle multiple classes and now knows cat, dog and car. The starting vector in my case is a noise picture that gets classified as a cat with over 90% confidence, therefore meeting the criterion. But when the attack tries to evaluate it, it says the starting point is not adversarial. Shape and size are the same as for car.png. Could you please help me again with one of your really competent suggestions? Thank you so much in advance!
If you look into the code (foolbox/foolbox/attacks/boundary_attack.py, lines 649 to 653 at c8af62a), the attack calls a.predictions(starting_point), which is basically equivalent to calling model.predictions(starting_point). I suspect that something in your preprocessing differs between calling the API directly and calling it via model.predictions, and this is the best way to check. Alternatively, are you sure the API is deterministic (i.e. always yields > 90% confidence)?
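A minimal check along those lines (a sketch, reusing the names from your snippet and assuming the Foolbox 1.x predictions helper):

```python
# Compare what the attack will see with what the site returns directly,
# and probe whether repeated queries give the same confidence.
preds = fmodel.predictions(start_1D_vector)
print(preds, np.argmax(preds))
for _ in range(3):
    print(WModel().upload_and_get_result("catadversarial.png"))
```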
There is indeed a difference between calling fmodel.predictions and batch_predictions. fmodel.predictions(startingpoint) returns me:
The difference between predictions and batch_predictions is expected. I can only guess at this point, but basically you have to check why the attack thinks that the starting point is not adversarial. Again, this might simply have to do with the criterion you defined. Internally the adversarial test is as follows: https://github.com/bethgelab/foolbox/blob/master/foolbox/adversarial.py#L212-L234 Take the model.predictions output and test why the outcome of the above code is not adversarial.
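Concretely, that check could look like this (a sketch, using the names from the code above):

```python
# Mirror the linked is_adversarial logic by hand: feed the starting
# point through the model and ask the criterion directly.
predictions = fmodel.predictions(start_1D_vector)
print(predictions)
print(criterion0.is_adversarial(predictions, label))
```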
I think I figured the problem out. I thought there are two criteria that allow me to generate pictures with >90% confidence: TargetClassProbability and ConfidentMisclassification. Both call:

```python
def is_adversarial(self, predictions, label):
    top1 = np.argmax(predictions)
    probabilities = softmax(predictions)
    return (np.max(probabilities) >= self.p) and (top1 != label)
```

So in my case 0.5535 (the probability value) gets compared via >= to 0.9 (which I thought was the prediction value), which is false, therefore the starting point is not adversarial.
@roughentomologyx Great! By default Foolbox thinks that prediction outputs are logits, but that is easy to adapt. You have two options: (1) you write your own criterion (just copy TargetClassProbability and remove the softmax) or (2) you slightly change your model so that it returns logits instead of probabilities (as we typically do for Keras models, check out https://github.com/bethgelab/foolbox/blob/master/foolbox/models/keras.py#L137-L142). |
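A sketch of option (1), assuming the Foolbox 1.x Criterion interface (the class name is made up):

```python
# A criterion like TargetClassProbability but without the softmax,
# for models that already return probabilities.
from foolbox.criteria import Criterion


class TargetProbabilityWithoutSoftmax(Criterion):  # hypothetical name
    def __init__(self, target_class, p):
        super(TargetProbabilityWithoutSoftmax, self).__init__()
        self._target_class = target_class
        assert 0 <= p <= 1
        self.p = p

    def target_class(self):
        return self._target_class

    def is_adversarial(self, predictions, label):
        # `predictions` are treated as probabilities directly.
        top1 = np.argmax(predictions)
        return top1 == self._target_class and predictions[top1] > self.p
```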
Thank you so much. Just changing p is also working fine. |
@roughentomologyx Thanks for your kind words. You can always support Foolbox by spreading the word or by implementing a new attack. Also, we are always happy if you cite Foolbox and the Boundary Attack in your publications. As for your question, the Boundary Attack will try to find an image that is adversarial (e.g. > 90% confidence) yet looks as close as possible to the original image. That is usually possible, and often you cannot notice a difference between the original and the adversarial image after optimisation. That's the (sometimes unsettling) essence of adversarial vulnerability.
Is there a way to stop the attack from converging? I always get "attack converged" after about 1500 steps. In your paper you run the attack for something like 20000 steps to get optimal results, and I want to do that too.
Closing this due to lack of activity. Please reopen if it's still relevant. |
The scenario is the following:
There is a website to which I can send a 128x128 colored PNG, and it gets classified either as a cat with confidence >0 or as not-a-cat (confidence = 0). (I have full authorization to use that website however I want, and I really can send as much as I want, to preempt strange questions.)
I would like to create an adversarial example with Foolbox (it is a car but gets classified as a cat with >90% confidence), but I can't seem to get my head around how to create a model for that, so that I can attack it with the Foolbox boundary attack. Here's a template of what I have:
Any help/hints would be appreciated, thank you in advance.