Advanced Use
There is a flow to Counterfit, both in creating targets and in executing attacks. Here we cover some advanced use cases and show how flexible it can be. You will learn that Counterfit does not care how you get traffic back and forth; it only cares that outputs are returned to the backend framework in the correct way.
Malicious queries look malicious. On some platforms, the predicted image is presented to the ML engineer in a dashboard. If the platform suddenly starts to receive queries that look like television static, it will set off alarms. To hide malicious queries, create a function inside MyTarget that sends normal traffic to the endpoint and call it from the __call__ function. In the example below, every query the attack algorithm makes is accompanied by a random number (1 to 24) of normal queries. This increases your traffic, but depending on the situation it could be worth it.
import random
import requests
...
    def normal_traffic(self, num_queries):
        # Send benign samples drawn from self.X so attack queries blend in.
        for _ in range(num_queries):
            random_sample = random.choice(self.X)
            requests.post(self.model_endpoint, data={"input": random_sample})

    def __call__(self, x):
        sample = x[0].tolist()
        # randrange(1, 25) yields between 1 and 24 benign queries.
        num_benign_queries = random.randrange(1, 25)
        self.normal_traffic(num_benign_queries)
        response = requests.post(self.model_endpoint, data={"input": sample})
        results = response.json()
        cat_proba = results["confidence"]
        not_a_cat_proba = 1 - cat_proba
        return [cat_proba, not_a_cat_proba]
...
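To put a rough number on that trade-off, the sketch below computes how many benign requests `random.randrange(1, 25)` adds per attack query on average, and simulates the total request count for a run. The helper names `expected_benign_per_attack_query` and `simulate_total_requests` are hypothetical and are not part of Counterfit; no network traffic is sent.

```python
import random


def expected_benign_per_attack_query(low=1, high=25):
    """Mean of random.randrange(low, high), which draws uniformly from low..high-1."""
    values = range(low, high)
    return sum(values) / len(values)


def simulate_total_requests(num_attack_queries, low=1, high=25, seed=0):
    """Count requests for a run: one attack query plus its benign cover traffic."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    total = 0
    for _ in range(num_attack_queries):
        total += 1 + rng.randrange(low, high)
    return total


mean_benign = expected_benign_per_attack_query()
print(mean_benign)                    # 12.5
print(simulate_total_requests(1000))  # total requests for 1000 attack queries
```

With these bounds, each attack query brings 12.5 benign requests on average, so the endpoint sees about 13.5 times the attack's own query count. Narrow the range if the target rate-limits or bills per request.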
Most penetration testing tools can create proxies that allow arbitrary traffic to be passed into an internal network. Counterfit does not require any special configuration for this use case. Simply configure the proxy and point model_endpoint at the target or proxy, just as you would for RDP or SSH. For example, you can use the requests library with a SOCKS proxy.
Set up any proxy you like, then use requests to send traffic to the target.
import requests
from counterfit.core.interfaces import AbstractTarget

class MyTarget(AbstractTarget):
    ...
    model_endpoint = "https://10.10.2.11/predict"
    ...
    def request_proxy_session(self):
        # SOCKS support in requests needs the optional extra: pip install "requests[socks]"
        session = requests.Session()
        session.proxies = {
            'http': 'socks5://10.10.1.3:9050',
            'https': 'socks5://10.10.1.3:9051'
        }
        return session

    def __call__(self, x):
        sample = x[0].tolist()
        session = self.request_proxy_session()
        # The payload shape depends on the target API; "input" mirrors the earlier example.
        response = session.post(self.model_endpoint, data={"input": sample})
        ...
Write a function to send a query, then write a function to collect the output. Sometimes APIs will provide a redirect or a separate URI to collect the results from. The example below is fairly simple, but we have used this technique to collect results from a number of obscure places.
import requests
...
    def send_query(self, query_data):
        response = requests.post(self.model_endpoint, data=query_data)
        return response

    def collect_result(self, collection_endpoint):
        response = requests.get(collection_endpoint)
        return response

    def __call__(self, x):
        sample = x[0].tolist()
        response = self.send_query({"input": sample})
        # The first response only says where the prediction can be collected.
        collection_endpoint = response.json()['location']
        result = self.collect_result(collection_endpoint)
        final_result = result.json()
        cat_proba = final_result["confidence"]
        not_a_cat_proba = 1 - cat_proba
        return [cat_proba, not_a_cat_proba]
...
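The example above assumes the result is available as soon as the collection URI is known. If a target needs time to produce the result (an assumption about the target, not something Counterfit requires), a small retry loop around the collection step may help. The sketch below is illustrative: `collect_when_ready` and its parameters are hypothetical names, and `fetch` is any zero-argument callable that returns the parsed response or `None`.

```python
import time


def collect_when_ready(fetch, attempts=5, delay_seconds=1.0):
    """Call fetch() until it returns a dict that contains a 'confidence' key."""
    for _ in range(attempts):
        payload = fetch()
        if isinstance(payload, dict) and "confidence" in payload:
            return payload
        # Result not ready yet; wait before asking again.
        time.sleep(delay_seconds)
    raise TimeoutError(f"no result after {attempts} attempts")


# Stand-in for a slow endpoint: two "not ready" answers, then the result.
fake_responses = iter([None, {"status": "pending"}, {"confidence": 0.8}])
result = collect_when_ready(lambda: next(fake_responses), delay_seconds=0)
print(result["confidence"])  # 0.8
```

Inside a target, `fetch` could be something like `lambda: requests.get(collection_endpoint).json()`, with the returned dictionary used in place of `final_result`.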
At some point you may want to load all frameworks or perform some checks on start. Counterfit uses cmd2 to load a startup script for exactly this reason. Create a .counterfit file at the root of the project, and Counterfit will execute the commands in it on start.
load art
load textattack
You can technically override any of the functions in the parent target class, but you should be careful not to override functions unnecessarily. However, outputs_to_labels is one function that can comfortably be overridden in certain scenarios.
A primary reason to override outputs_to_labels is to incorporate any knowledge you have about the decision threshold of a target model. By default, outputs_to_labels in the parent class reports the class with the highest confidence as the model output. For two classes, that corresponds to an implicit threshold of 0.5; for three classes, a class can win with a score just above 0.3333, and so on.
As an example, suppose you learn through your investigations that a fraud classifier reports fraudulent only when the confidence score exceeds 0.9. In this case, you could override outputs_to_labels as follows:
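import numpy as np
...
    def outputs_to_labels(self, output, threshold=0.9):
        # score[0] is assumed to be the model's confidence for the 'fraudulent' class
        output = np.atleast_2d(output)
        return ['fraudulent' if score[0] > threshold else 'benign' for score in output]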
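To see why the threshold matters, the sketch below compares highest-score labeling with the 0.9 threshold rule on three made-up score rows. `argmax_labels` is a plain-Python stand-in for the default behaviour described above, not Counterfit's actual implementation, and the scores are invented for illustration.

```python
def argmax_labels(outputs, class_names):
    """Pick the class with the highest score in each row."""
    labels = []
    for scores in outputs:
        best_index = max(range(len(scores)), key=lambda i: scores[i])
        labels.append(class_names[best_index])
    return labels


def threshold_labels(outputs, threshold=0.9):
    """Report 'fraudulent' only when the first score exceeds the threshold."""
    return ["fraudulent" if scores[0] > threshold else "benign" for scores in outputs]


# Each row is [fraudulent_score, benign_score]; the values are illustrative only.
outputs = [[0.95, 0.05], [0.70, 0.30], [0.40, 0.60]]
class_names = ["fraudulent", "benign"]

print(argmax_labels(outputs, class_names))  # ['fraudulent', 'fraudulent', 'benign']
print(threshold_labels(outputs))            # ['fraudulent', 'benign', 'benign']
```

The second row is where the two rules disagree. A score of 0.70 counts as fraudulent under highest-score labeling, but the real system would report it as benign, so an attack judged by the default labels could misreport whether it succeeded.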
Counterfit does not include any of the training functionality from the frameworks that are normally used for whitebox attacks. However, it is still possible to train a model inline and then attack the newly trained model. Put the training code inside the __init__ function. Beware that the target will fail to load if the __init__ function fails. Handle errors gracefully so that the reload command continues to work.
    def __init__(self):
        ...
        self.model = self.train_model(self.X, self.y)

    def train_model(self, X, y):
        ...
        model.fit(X, y)
        ...
        return model

    def __call__(self, x):
        results = self.model.predict(x)
        return results
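One way to handle training errors gracefully is to catch them in `__init__`, record the failure, and only raise when the target is actually queried. The sketch below illustrates the idea with a hypothetical `TrainedInlineTarget` class that does not inherit from Counterfit's base class. The toy "model" simply predicts the majority label, and passing `X` and `y` to the constructor is an assumption made to keep the example self-contained.

```python
class TrainedInlineTarget:
    """Hypothetical stand-in for a target that trains its model on load."""

    def __init__(self, X, y):
        self.X = X
        self.y = y
        self.model = None
        self.load_error = None
        try:
            self.model = self.train_model(self.X, self.y)
        except Exception as error:
            # Record the failure instead of letting it abort target loading.
            self.load_error = error

    def train_model(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        # Toy "training": remember the most common label.
        majority = max(set(y), key=y.count)
        return lambda batch: [majority for _ in batch]

    def __call__(self, x):
        if self.model is None:
            raise RuntimeError(f"model is unavailable: {self.load_error}")
        return self.model(x)


good = TrainedInlineTarget([[0], [1], [2]], [1, 1, 0])
print(good([[5]]))  # [1]

bad = TrainedInlineTarget([[0]], [1, 0])  # mismatched data, but loading still succeeds
print(bad.load_error)  # X and y must have the same number of rows
```

The target object always finishes constructing, so a broken training step can be fixed and reloaded, while any attempt to query an untrained target still fails loudly with the original cause.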