RedCASTLE

Practically Applicable k_s-Anonymity for IoT Streaming Data at the Edge in Node-RED

@inproceedings{redcastle2021,
	author = {Frank Pallas and Julian Legler and Niklas Amslgruber and Elias Grünewald},
	title = {{RedCASTLE}: Practically Applicable $k_s$-Anonymity for {IoT} Streaming Data at the Edge in {Node-RED}},
	year = {2021},
	isbn = {978-1-4503-9167-2/21/12},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	url = {https://doi.org/10.1145/3429881.3430107},
	doi = {10.1145/3493369.3493601},
	location = {Virtual Event, Canada},
	bookTitle = {Proceedings of the 8th International Workshop on Middleware and Applications for the Internet of Things}
}

Summary

This is a project mainl< done during the Summer Semester 2021 at the Technical University Berlin in the module Privacy Engineering. The goal was to implement privacy related features in an actual use case to provide value for others in the future. In this project k-Anonymity for streaming data is implemented in the Node-Red environment.

This repository is based on CASTLEGUARD, which implements the CASTLE (Continuously Anonymizing STreaming data via adaptive cLustEring) algorithm by J. Cao, B. Carminati, E. Ferrari and K. Tan.

Contributions

The CASTLEGUARD algorithm has been extended by several features:

Integrated streaming interface for input and output data based on MQTT
Support for non-numerical data through automatic conversion
No constraints regarding the number of senders
High Configurability
Multiple deployment options

How to run it

There are three possible options for running the system:

Manually: Install all needed dependencies and start it locally
Docker: Build a Docker image from the DOCKERFILE or use the pre-configured docker image from DockerHub
Cloud deployment: Deploy and provision the code on a Cloud VM with TerraForm

Manual

To run the project you need to install these dependencies:

Node-Red
MQTT Mosquitto
Python (>= 3.6)
pandas, numpy, paho-mqtt, matplotlib

The required Python packages can be installed automatically with pip install -r requirements.txt. To pass and retrieve data from the component, you need to specify the host and port of your MQTT server in the config.json or use the default configuration on localhost:1883.

For starting both Mosquitto and Node-Red you can simple execute the setup.sh script (macOS & Linux only) You can access Node-RED on localhost:1880.

Docker

The example setup can also be run using Docker. Simply build a Docker image from the DOCKERFILE or pull the latest Docker image from Docker Hub.

Alternatively, you can run the Docker image with docker run -ti -p 1883:1883 -p 1880:1880 niklasamslgruber/node-red-castle and navigate to localhost:1880 to see Node-RED.

Cloud Deployment

We prepared a Terrafrom deployment to easily deploy RedCASTLE in the cloud.

You need to install Terraform and the GCloud SDK.
Configure GCloud with gcloud init and set it as the default login mechanism gcloud auth application-default login
Then run ones terraform init in the root directory of this repository.
Afterwards you can start the deployment with terraform apply.

To cleanup you need to run terraform destroy.

How to configure

You can modify the default configuration by adjusting the config.json file in CASTLE/src/config.json. The config file should be kept in the src directory. The config.json file is split into two parts (params and io).

`params`

k: Value for k-anonymity
delta: Maximum number of tuples
beta: Number of non-k-anonymized clusters in memory
mu: Threshold for deciding whether to push a new datapoint into an existing cluster or create a new one
seed: Random seed (optional)
sensitive_attribute: The sensitive attribute used for the k-anonymity
quasi_identifiers: All identifiers which should be generalized
non_categorized_columns: All columns which should be automatically transformed into numerical categories (required for all columns with string values)
pid_column: Name of the column with a unique identifier
history: Whether CASTLE should record all input tuples (optional)

`io`

host: Host of your MQTT server (default: localhost) (optional)
port: Port of your MQTT server (localhost: 1883) (optional)
mqtt_topics_in: All MQTT topics the system should subscribe on
mqtt_topic_out: The topic to publish the output data on

How to use

When starting the project over one of the three ways shown above, you should be able to access the Node-RED web ui under <ip-address>:1880, the Node-Red dashboard under <ip-address>:1880/ui and the mqtt broker via <ip-address>:1883.

Quickstart

To run our validation test scenario browse to <ip-address>:1880/ui and press the button "start simulation and castleguard".

Use your own data

change the CASTLE/src/config.json accordingly to your data.
click the "manual start" node in the ks-Anonymization tab of Node-REDs web ui.
Send your data to the mqtt broker <ip-address>:1883
now the anonymized data should be written to the file anonymized_tuples.csv

Overview

When accessing the Node-RED web ui at <ip-address>:1880 you should see the nodes and relations of the current flow. There are 5 flows represented by the tabs at the top: ks-Anonymization, Charging Station Emulator, Statistics, Filtering, Dashboard entrypoint.

ks-Anonymization and Charging Station Emulator are the core flows that will run in parallel. The Charging Station Emulator flow finish by publish to the MQTT broker and ks-Anonymization is triggers by receive a message from the MQTT subscriber node. Statistics is always called as one of the final steps of ks-Anonymization to generate statistics shown in the Node-RED Dashboard (<ip-address>:1880/ui). There are implemented subflows to manipulate data, that can also be found in the node selection of the left side. These manipulations are currently used during the generation of you test data and in the normal executing process at the end of ks-Anonymization when the flow Filtering is called. See section Data manipulation for further information how to use this.

Data send to the MQTT broker have to be in JSON format. In order to deal with this data, you need to make changes to CASTLE/src/config.json. The sensitive_attribute, quasi_identifiers and non_categorized_columns have to be changed in respond to the data you expect to be sent to the MQTT broker. Without these changes, the chances are high that the Castleguard backend will immediately crash when unknown data arrives. A Castleguard crash can be detected by having a look at the "run CASTLEGUARD" node in the ks-Anonymization flow. When this node is running correctly, it should show a blue dot with pid: <number>. rc: <number> means the background process has crashed. For further debugging purposes you could also simply run CASTLEGUARD in a command line and stop Node-RED von starting it by deleting ingoing lines to the "run CASTLEGUARD" node.

Performance Impact

Depending on the used machine and the throughput rate you have to expect multiple seconds to minutes overhead.

Higher throughput lowers the added delay. [sic] This is because the algorithm needs to collect a specific amount of data to achieve the set privacy constraint. With lower throughput the data have to sit longer inside the algorithms clusters to wait until they can get released.

Benchmarks

We performed some benchmarks on a n2-standard-2 GCloud Compute Engine with 2 vCPUs and 8 GB RAM.

Throughput: 40 messages per second were sent and processed.

Throughput: 8 messages per second were sent and processed.

Note the differences in the axis scale.

On the n2-standard-2 GCloud Compute Engine with 2 vCPUs and 8 GB RAM we achieved a maximum of 45 messages per second. Further research showed, that without the use of the modified CASTLEGUARD algorithm, Node-Red and Mosquitto can deal with 95 MQTT messages per second but slowly trends towards 45 msg/s when the message queue fills up. Without the use of a message queue or broker and without the modified CASTLEGUARD algorithm, we where able to measure a constant throughput of 195 messages per second.

These findings indicate that the used message broker may be a possible bottleneck in the current implementation state.

The benchmark images are screenshots from the integrated Node-RED dashboard. (accessible under <ip-address>:1880/ui)

Example Dataset

The used validation use case is a dataset with electric vehicle charging data. The data used is provided by the city of Boulder in Colorado (USA) via their Open Data Plattform in a CC0 1.0 Public Domain Dedication license model. To spice up the dataset, a number of fake persons with specific vehicle models and unique ids are generated and used to enrich the original dataset.

Station Name	Address	Zip/Postal Code	Start Date & Time	End Date & Time	Total Duration (hh:mm:ss)	Charging Time (hh:mm:ss)	Energy (kWh)	GHG Savings (kg)	Gasoline Savings (gallons)	customer id	allow dynamic charging	car brand	car model
BOULDER / JUNCTION ST1	2280 Junction Pl	80301	1/1/2018 17:49	1/1/2018 19:52	2:03:02	2:02:44	6.504	2.732	0.816	1006	true	Tesla	Model Y
BOULDER / JUNCTION ST1	2280 Junction Pl	80301	1/2/2018 8:52	1/2/2018 9:16	0:24:34	0:24:19	2.481	1.042	0.311	1052	true	BMW	i3

What output you can expect

These are just a compressed version of the Data above to show what transformations are made during the k-anonymization process.

The list of ids, for example for Car Brand can be mapped back to the corresponding strings via the generated mapping.json.

Data manipulation

Additionally a few functionalities were added to assist further to achieve privacy when working with personal data. Fro this a filter, reduce and a change function are implemented.

These functions can be configured via a json object. This configuration object has to be added to the message object, not the msg.payload object.

Example configurations for filter, reduce and changes:

Suppressed specific properties:

{
    "suppressed_properties": [
        "ObjectId",
        "Address",
        "City"
    ]
}

message object structure for this example: msg.suppressed_properties.suppressed_properties (array[3])

Filter for specific conditions. Currently supported are range filtering as well as whitelist and blacklist filtering. A entry has to pass all filter conditions, otherwise the entry is removed from the set. You could also use only one or two of the filter conditions.

{
    "filterCondition": {
        "rangeFilter": {
            "columnName": "ObjectId",
            "minValue": 10000,
            "maxValue": 30000
        },
        "whitelistFilter": {
            "columnName": "ObjectId",
            "whitelistValues": [
                10459,
                22794,
                20286,
                872
            ]
        },
        "blacklistFilter": {
            "columnName": "ObjectId",
            "blacklistValues": [
                22794
            ]
        }
    }
}

message object structure for this example: msg.filterCondition.rangeFilter (object) ...

Change or append new properties based on existing properties.

Add a car_price property based on a given mapping of car_model names and prices.

{
    "changeStringEqual": {
        "sourceAttributeName": "car_modell",
        "changeAttributeName": "car_price",
        "change": [
            {
                "conditionStringEqual": "e-tron 55",
                "valueForChangeAttributeName": 84459
            },
            {
                "conditionStringEqual": "e-tron 50",
                "valueForChangeAttributeName": 69100
            }
        ]
    }
}

message object structure for this example: msg.changeConditions.changeStringEqual (object)

Add a alternative way to interpret the car prices by adding car_price_alt which sets different values depending on if the car_price is in a specific range.

{
    "changeRangeBased": {
        "sourceAttributeName": "car_price",
        "changeAttributeName": "car_price_alt",
        "change": [
            {
                "conditionMin": 40000,
                "conditionMax": 120000,
                "valueForChangeAttributeName": "expensive"
            },
            {
                "conditionMin": 0,
                "conditionMax": 39999,
                "valueForChangeAttributeName": "cheap"
            }
        ]
    }
}

message object structure for this example: msg.changeConditions.changeRangeBased (object)

Troubleshooting

For some reason the ZeroMQ packages are not properly installed automatically, so this has to be done manually:

So you access the Node-RED dashboard: http://127.0.0.1:1880
Press the "Hamburger menu" in the top right corner
Manage Palette
In the new window then click on the installation tab
Enter "node-red-contrib-zeromq" in the search field
Then click on "install" above the found entry in the list below.
Reload Dashboard

We already had enormous problems with the ZeroMQ library and the reliable installation and use back then. We then decided on an alternative Node-RED broker that is not extremely outdated and at the same time compatible with zeromq, since the "CASTLE" part in the form of the Python scripts should use zeromq.

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
CASTLE		CASTLE
benchmark_results		benchmark_results
configs		configs
images		images
scripts		scripts
.gitignore		.gitignore
Dockerfile		Dockerfile
Electric_Vehicle_Charging_Station_Energy_Consumption (1).csv		Electric_Vehicle_Charging_Station_Energy_Consumption (1).csv
Electric_Vehicle_Charging_Station_Energy_Consumption_extended.csv		Electric_Vehicle_Charging_Station_Energy_Consumption_extended.csv
LICENSE		LICENSE
README.md		README.md
flow.json		flow.json
flow_cred.json		flow_cred.json
main.tf		main.tf
mapping.json		mapping.json
outputs.tf		outputs.tf
package.json		package.json
requirements.txt		requirements.txt
setup.sh		setup.sh
setup_local.sh		setup_local.sh
start_castle.sh		start_castle.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RedCASTLE

Practically Applicable k_s-Anonymity for IoT Streaming Data at the Edge in Node-RED

Summary

Contributions

How to run it

Manual

Docker

Cloud Deployment

How to configure

`params`

`io`

How to use

Quickstart

Use your own data

Overview

Performance Impact

Benchmarks

Example Dataset

What output you can expect

Data manipulation

Example configurations for filter, reduce and changes:

Troubleshooting

About

Releases

Packages

Contributors 3

Languages

License

PrivacyEngineering/RedCASTLE

Folders and files

Latest commit

History

Repository files navigation

RedCASTLE

Practically Applicable k_s-Anonymity for IoT Streaming Data at the Edge in Node-RED

Summary

Contributions

How to run it

Manual

Docker

Cloud Deployment

How to configure

params

io

How to use

Quickstart

Use your own data

Overview

Performance Impact

Benchmarks

Example Dataset

What output you can expect

Data manipulation

Example configurations for filter, reduce and changes:

Troubleshooting

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

`params`

`io`

Packages