This toolbox has been implemented as part of the project Analyse und Bekämpfung von bandenmäßigem Betrug im Onlinehandel (ABBO). The project was funded by the German Federal Ministry of Education and Research and aimed to impede the success of organized fraud targeting online retailers.
Ubuntu 12.04 LTS:
The ABBO toolbox uses Cython. Therefore, you first need to install:
sudo apt-get install python-dev g++ pip install cython
Once you have installed these packages, you can compile the
code and finally install the ABBO toolbox with
python setup.py build_ext -i python setup.py install
In the following, we present some examples of the usage of the toolbox.
An overview of all available modules can be displayed with
abbo_cli.py --help. Furthermore, a demo of the tool can be found in
The tool allows creating artificial toy data which can be used to
examine attacks on the pseudonymization method. In particular, the
data is derived from several marginal distributions (mostly German
name distributions) stored in
The following command creates a random dataset with 100 entries:
abbo_cli generate -c 50 -n 100 example.json
This will create 100 orders from 50 different customers and store them
in JSON format. Note that in this example, multiple orders can be
assigned to the same customer. In case you require unique assignments,
you can use the
The data set created in the previous step can now be pseudonymized. To
do so, we use the
pseudonymization module of the toolbox. For
instance, we can pseudonymize the data using Bloom filters with a
length of 4000 Bits (
abbo_cli pseudonymize -m 4000 -d colored -n 3 -k 3 -t keyed -e KEY example.json example.dat
In this example, the orders are first decomposed using a
colored 3-gram decomposition (
-d colored and
-n 3). Subsequently, the
resulting elements are inserted into the Bloom filters using 3 keyed
hash functions (
-k 3 and
-t keyed) where the used key is defined
A full list of available options can be displayed with
abbo_cli.py pseudonymize --help.
Hardening Bloom filters
We can further improve the pseudonymization strength of the Bloom
filter by applying the mechanisms implemented in the
module. In particular, it is possible to
add noise to the filters or
merge them to decrease the probability of a de-pseudonymization.
abbo_cli hardening -l 3 -n 50 example.dat harden.dat
This will merge groups of three filters (
-l 3) and add fifty percent
-n 50) to the resulting filter.
Converting to LIBSVM format
The toolbox also provides a module to convert data into the LIBSVM format. This can be helpful in order to apply common machine learning tools like LIBSVM or LIBLINEAR.
abbo_cli convert example.dat example.libsvm
Finally, the toolbox allows predicting fraud using a linear SVM model.
Please note that the provided model is for demonstration purposes only
since it has been trained on toy data. However, the model can easily
be replaced by a model trained on actual data. An example on how to
train and use a (simple) custom model can be found in the