A reading list of algorithms for gaining insights into remote (black-box) services.
This page aims to list algorithms (with a short review) related to the following scenario:
a user queries a service provider (through its available APIs) and tries to infer information about the algorithms used to produce the results of those queries.
Related keywords include: transparency, bias, inference, API, queries, reverse engineering, black-box, algorithmic accountability, audit.
| Algorithm/paper | Source | Description | Code | Test |
|---|---|---|---|---|
| Adversarial Learning | KDD (2005) | Reverse engineering of remote linear classifiers, using membership queries | | Experimented (locally) on mail spam classifiers |
| Privacy Oracle: a System for Finding Application Leaks with Black Box Differential Testing | CCS (2008) | Privacy Oracle: a system that uncovers applications’ leaks of personal information in transmissions to remote servers | | Experimented on 26 popular applications |
| Query Strategies for Evading Convex-Inducing Classifiers | JMLR (2012) | Evasion methods for convex classifiers. Considers evasion complexity | | |
| Measuring Personalization of Web Search | WWW (2013) | Develops a methodology for measuring personalization in Web search results | | Experimented on Google Web Search |
| XRay: Enhancing the Web’s Transparency with Differential Correlation | USENIX Security (2014) | Audits which user profile data were used for targeting a particular ad, recommendation, or price | Available here | Demonstrated using Gmail, YouTube, and Amazon recommendation services |
| Peeking Beneath the Hood of Uber | IMC (2015) | Infers implementation details of Uber's surge pricing algorithm | | Four weeks of data from Uber (collected from 43 copies of the Uber app) |
| Bias in Online Freelance Marketplaces: Evidence from TaskRabbit | DAT Workshop (2016) | Measures the ranks produced by TaskRabbit’s search algorithm to look for bias | | Crawled the TaskRabbit website |
| Stealing Machine Learning Models via Prediction APIs | USENIX Security (2016) | Aims at extracting machine learning models in use by remote services (see the extraction sketch below the table) | Available here | Demonstrated on BigML and Amazon Machine Learning services |
| “Why Should I Trust You?”: Explaining the Predictions of Any Classifier | arXiv (2016) | Explains a black-box classifier model by sampling around data instances | Available here | Experimented on religion newsgroups and on multi-domain sentiment datasets |
| Back in Black: Towards Formal, Black Box Analysis of Sanitizers and Filters | IEEE Symposium on Security and Privacy (2016) | Black-box analysis of sanitizers and filters | | |
| Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems | IEEE Symposium on Security and Privacy (2016) | Introduces measures that capture the degree of influence of inputs on outputs of the observed system (see the input-influence sketch below the table) | | Tested in-house on machine learning models on two datasets |
| Uncovering Influence Cookbooks: Reverse Engineering the Topological Impact in Peer Ranking Services | CSCW (2017) | Aims at identifying which centrality metrics are in use in a peer ranking service | | |
| The topological face of recommendation: models and application to bias detection | Complex Networks (2017) | Proposes a bias detection framework for items recommended to users | | Tested on YouTube crawls |
| Membership Inference Attacks Against Machine Learning Models | IEEE Symposium on Security and Privacy (2017) | Given a machine learning model and a record, determines whether this record was used as part of the model’s training dataset (see the membership-inference sketch below the table) | | Tested using Amazon ML and Google Prediction API |
| Adversarial Frontier Stitching for Remote Neural Network Watermarking | Neural Computing and Applications (2019) | Checks whether a remote machine learning model is a "leaked" one: through standard API requests to the remote model, extracts (or not) a zero-bit watermark that was inserted to mark valuable models (e.g., large deep neural networks); see the watermark sketch below the table | Available here | |
| Practical Black-Box Attacks against Machine Learning | Asia CCS (2017) | Studies how vulnerable a remote service is to adversarial classification attacks | | Tested against Amazon and Google classification APIs |
| Towards Reverse-Engineering Black-Box Neural Networks | ICLR (2018) | Infers inner hyperparameters (e.g., number of layers, non-linear activation type) of a remote neural network model by analysing its response patterns to certain inputs | Available here | |
| Data driven exploratory attacks on black box classifiers in adversarial domains | Neurocomputing (2018) | An explore-exploit framework to reverse engineer remote classifier models and assess their vulnerabilities (e.g., for evading a CAPTCHA test) | | Tested on security datasets and on the Google Cloud Prediction API |
| xGEMs: Generating Examplars to Explain Black-Box Models | arXiv (2018) | Searches for bias in the black-box model by training an unsupervised implicit generative model, then summarizes the black-box model's behavior quantitatively by perturbing data samples along the data manifold | | Tested on ResNet models |
| Learning Networks from Random Walk-Based Node Similarities | NeurIPS (2018) | Reconstructs graphs by observing some random-walk commute times | | |
| Identifying the Machine Learning Family from Black-Box Models | CAEPIA (2018) | Determines which kind of machine learning model is behind the returned predictions | | |
| Knockoff Nets: Stealing Functionality of Black-Box Models | CVPR (2019) | Asks to what extent an adversary can steal the functionality of "victim" models based solely on black-box interactions: image in, predictions out | | |
| Stealing Neural Networks via Timing Side Channels | arXiv (2018) | Steals/approximates a model through timing attacks using queries | | |
| Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data | IJCNN (2018) | Steals a black-box model's (CNN's) knowledge by querying it with random natural images (from ImageNet and Microsoft COCO) | Available here | Tested on three problem domains (facial recognition, general object classification, and crosswalk classification) and on Azure |
| Making targeted black-box evasion attacks effective and efficient | arXiv (2019) | Investigates how an adversary can optimally use its query budget for targeted evasion attacks against deep neural networks | | Tested on Google Cloud Vision |
| Online Learning for Measuring Incentive Compatibility in Ad Auctions | WWW (2019) | Measures the incentive compatibility (via regret) of black-box auction platforms | | |
| TamperNN: Efficient Tampering Detection of Deployed Neural Nets | ISSRE (2019) | Algorithms to craft inputs that can detect tampering with a remotely executed classifier model | | Tested on classic image classifiers available in Keras |
| Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints | arXiv (2019) | Through the acquisition of memory access events from bus snooping, layer sequence identification by an LSTM-CTC model, layer topology connection according to the memory access pattern, and layer dimension estimation under data volume constraints, demonstrates that one can accurately recover a network architecture similar to the victim's as the attack starting point | | |
| Stealing Knowledge from Protected Deep Neural Networks Using Composite Unlabeled Data | IJCNN (2019) | Composite method which can be used to attack and extract the knowledge of a black-box model even if it completely conceals its softmax output | | Tested on NVIDIA hardware |
| Neural Network Inversion in Adversarial Setting via Background Knowledge Alignment | CCS (2019) | Model inversion approach in the adversarial setting, based on training an inversion model that acts as an inverse of the original model. With no full knowledge about the original training data, an accurate inversion is still possible by training the inversion model on auxiliary samples drawn from a more generic data distribution | | Tested on Amazon Rekognition API |
| Adversarial Model Extraction on Graph Neural Networks | AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications (DLGMA) (2020) | Introduces GNN model extraction and presents a preliminary approach for it | | |
| Remote Explainability faces the bouncer problem | Nature Machine Intelligence, vol. 2, pp. 529–539 (2020) | Shows the impossibility (with a single request), or the difficulty, of spotting lies in the explanations of a remote AI decision | Available here | |
| GeoDA: a geometric framework for black-box adversarial attacks | CVPR (2020) | Crafts adversarial examples to fool models in a pure black-box setup (no gradients, inferred class only) | Available here | |
| The Imitation Game: Algorithm Selection by Exploiting Black-Box Recommender | Netys (2020) | Parametrizes a local recommendation algorithm by imitating the decisions of a remote, better-trained one | | |
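
Several entries above (e.g., Stealing Machine Learning Models via Prediction APIs, Practical Black-Box Attacks against Machine Learning, Knockoff Nets, Copycat CNN) revolve around the same basic loop: query the remote model, record its answers, and fit a local surrogate. The sketch below is a toy, hypothetical illustration of that loop only; `query_remote` stands in for a real prediction API and the victim model is simulated locally, so this is not any of the papers' actual implementations.

```python
# Minimal sketch of black-box model extraction by surrogate training.
# query_remote() is a hypothetical stand-in for a prediction API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Pretend this is the hidden model behind the remote API.
X_secret, y_secret = make_classification(n_samples=2000, n_features=10, random_state=0)
victim = LogisticRegression().fit(X_secret, y_secret)

def query_remote(x):
    """Stand-in for one API call: input vector in, predicted label out."""
    return victim.predict(x.reshape(1, -1))[0]

# 1. Craft queries (here: random points in the input domain).
rng = np.random.default_rng(0)
queries = rng.normal(size=(500, 10))

# 2. Collect the remote model's answers.
answers = np.array([query_remote(q) for q in queries])

# 3. Train a local surrogate on the (query, answer) pairs.
surrogate = DecisionTreeClassifier(random_state=0).fit(queries, answers)

# 4. Measure agreement between surrogate and victim on fresh inputs.
X_fresh = rng.normal(size=(1000, 10))
agreement = accuracy_score(victim.predict(X_fresh), surrogate.predict(X_fresh))
print(f"surrogate/victim agreement: {agreement:.1%}")
```

The listed extraction papers differ mainly in how the query points are chosen (random natural images, adaptively crafted inputs, etc.) and in the query budget they assume; the surrogate-fitting step itself is typically standard supervised learning.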
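
The membership inference entry asks whether a given record was part of the remote model's training set. The paper builds shadow models; the sketch below is a much simpler, hypothetical confidence-threshold baseline that only illustrates the underlying intuition (overfitted models tend to be more confident on records they were trained on). `query_remote_proba` and the 0.95 threshold are assumptions for the toy setup.

```python
# Minimal sketch of a confidence-threshold membership inference test.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_unseen, y_train, _ = train_test_split(X, y, test_size=0.5, random_state=1)

# Hypothetical stand-in for the remote service: a model that overfits its training set.
victim = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_train, y_train)

def query_remote_proba(x):
    """Stand-in for an API that returns class probabilities for one record."""
    return victim.predict_proba(x.reshape(1, -1))[0]

def looks_like_member(x, threshold=0.95):
    """Guess 'training member' when the remote model is very confident on x."""
    return query_remote_proba(x).max() >= threshold

flagged_train = np.mean([looks_like_member(x) for x in X_train[:200]])
flagged_unseen = np.mean([looks_like_member(x) for x in X_unseen[:200]])
print(f"flagged as members: {flagged_train:.0%} of training records "
      f"vs {flagged_unseen:.0%} of unseen records")
```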
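
The frontier-stitching entry verifies, through standard prediction requests, whether a remote model carries a zero-bit watermark. The sketch below shows only the verification step (query a secret key set, count mismatches against a tolerance); the key inputs, their labels, the `max_errors` tolerance, and the k-NN stand-in for the leaked model are toy placeholders, not the paper's construction (which builds its keys from adversarial examples near the decision frontier).

```python
# Minimal sketch of zero-bit watermark verification via prediction queries.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical key set: inputs whose labels encode the watermark (toy random data here).
rng = np.random.default_rng(2)
key_inputs = rng.normal(size=(20, 5))
key_labels = rng.integers(0, 2, size=20)

# Stand-in for the suspected remote model: a leaked copy that memorised the key set.
leaked_copy = KNeighborsClassifier(n_neighbors=1).fit(key_inputs, key_labels)

def query_remote(x):
    """Stand-in for one standard prediction API call."""
    return leaked_copy.predict(x.reshape(1, -1))[0]

def watermark_present(query, inputs, labels, max_errors=2):
    """Flag the remote model as watermarked if it answers the key set as expected."""
    errors = sum(int(query(x)) != int(y) for x, y in zip(inputs, labels))
    return errors <= max_errors

print("watermark detected:", watermark_present(query_remote, key_inputs, key_labels))
```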
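
The Quantitative Input Influence entry measures how strongly each input influences the observed system's outputs. The sketch below is only a rough, hypothetical illustration of the interventional idea (replace one feature at a time with independently drawn values and look at how often the black-box decision flips); it does not reproduce the paper's actual influence measures, and the victim model is again simulated locally.

```python
# Minimal sketch of input-influence measurement by intervention on a black box.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=5, n_informative=2,
                           n_redundant=0, random_state=3)
# Hypothetical stand-in for the observed remote system.
victim = LogisticRegression().fit(X, y)

def query_remote(batch):
    """Stand-in for a batch of prediction API calls."""
    return victim.predict(batch)

rng = np.random.default_rng(3)
baseline = query_remote(X)

# Randomise one input at a time and measure how often the output changes.
for feature in range(X.shape[1]):
    X_perturbed = X.copy()
    X_perturbed[:, feature] = rng.permutation(X_perturbed[:, feature])
    flipped = np.mean(query_remote(X_perturbed) != baseline)
    print(f"feature {feature}: predictions flipped on {flipped:.1%} of queries")
```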
- TODO