Ancile - Use-Based Privacy for applications
This project implements the following paper:
Eugene Bagdasaryan, Griffin Berlstein, Jason Waterman, Eleanor Birrell, Nate Foster, Fred B. Schneider, Deborah Estrin. 2019. Ancile: Enhancing Privacy for Ubiquitous Computing with Use-Based Privacy. WPES.
Widespread deployment of Intelligent Infrastructure and the Internet of Things (IoT) creates large quantities of passively generated data. This has ushered in an era of data-rich applications, such as location-based services, while posing new privacy threats. This project explores the challenges that arise in applying use-based privacy to such data. We have developed Ancile, a platform that enforces use-based privacy for applications that wish to access users' personal data. We find that Ancile constitutes a functional, performant platform for deploying privacy-enhancing ubiquitous computing applications.
Our system allows applications to submit an arbitrary Python program that requests data from Ancile-registered data sources. Upon receiving this program, Ancile fetches the policy and access tokens associated with the user and the data source. Ancile then attempts to execute the application's program in a restricted environment, enforcing the policies. If the program completes without policy violations, its result is returned to the application.
Use-based privacy (Birrell et al.) focuses on preventing harmful uses (NYTimes) rather than restricting access to data. The application gets to use all necessary data for non-harmful purposes. Each datapoint in Ancile has a policy that specifies which uses are permitted. Furthermore, the framework takes a reactive approach: the policy attached to the data changes after transformations are performed on that data.
- Company's data -- data collected by a company's internal services, such as emails and location data. Novel third-party applications propose new services such as workplace optimization, person/room finders, and depression/suicide prevention. These services require access to sensitive data, but the access granted is usually broader than the application needs. For example, a service that provides information on nearby available rooms does not need constant access to user location data. Unrestricted release of raw data can lead to malicious uses, such as accessing the user's location after hours or outside the office. Ancile can address this problem by defining a policy on the user's location data that shares it only at specific hours or at a specific location.
We define three roles:
- Admin -- responsible for configuring Ancile, approving applications, and maintaining user policies
- Application -- needs access to the user's sensitive data
- User -- owns sensitive information available through OAuth endpoints
Once Ancile is installed we assume the following sample workflow:
- Admin configures Ancile and connects OAuth-enabled data sources
- User registers on Ancile and performs OAuth authentication with the required data sources
- Application developer registers on Ancile
- User picks a policy associated with the application and connected data source
- Application sends a Python program that requests user's data
- Ancile executes the program under the associated policy and, if successful, returns the data to the application; otherwise it returns an error
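The final workflow step can be sketched as follows. This is a minimal illustration, not Ancile's implementation: the function name `execute_program`, the response dictionary shape, and the use of plain `exec` are all assumptions. Note that clearing `__builtins__` is not a real security boundary; Ancile relies on RestrictedPython for that, as described below.

```python
def execute_program(program, allowed_functions):
    """Run a submitted program and report either its result or an error.

    Hypothetical sketch: names and response format are assumptions.
    """
    captured = {}

    def return_to_app(data):
        # Record what the program wants to hand back to the application.
        captured["data"] = data

    # Expose only the whitelisted functions; no builtins, so no imports.
    namespace = {"__builtins__": {}, "return_to_app": return_to_app}
    namespace.update(allowed_functions)

    try:
        exec(program, namespace)
    except Exception as error:
        return {"status": "error", "message": str(error)}
    return {"status": "ok", "data": captured.get("data")}


def fetch_data():
    # Stand-in for a data source call.
    return 5


response = execute_program(
    "x = fetch_data()\nreturn_to_app(x + 1)",
    {"fetch_data": fetch_data},
)
print(response["status"], response["data"])
```

A program that tries something outside the whitelist, such as `import os`, fails inside `exec` and is reported to the application as an error rather than a result.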
Policies define an automaton whose state changes as operations are applied to the data. For example, applying a transformation that fuzzes the location can enable a larger set of further operations on that data.
A policy is defined as a regular expression over an alphabet of operations (Python commands), built with the following operators:
- Sequence -- commandA . commandB: the program has to call commandB only after calling commandA.
- Union -- commandA + commandB: either of the two commands can be invoked.
- Intersection -- commandA & commandB: both commands need to match.
- Iteration -- commandA*: the command can be repeated multiple times.
- Negation -- !commandA: can be any command except commandA.
We use Brzozowski derivatives, which allow the regular expression to be advanced each time a command is called. The approach has two key operations: the D-step, which applies whenever a command is invoked, and the E-step, which applies only when the application wants to get data back from Ancile.
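To make the D-step and E-step concrete, here is a minimal sketch of Brzozowski derivatives over the five operators listed above. All class and function names (`Cmd`, `NotCmd`, `Seq`, `Union`, `Inter`, `Star`, `d_step`, `e_step`, `allows`) are illustrative and are not Ancile's API. Negation is modeled as "any single command except the named one", following the description above, and no simplification of the resulting expressions is attempted.

```python
from dataclasses import dataclass


class Policy:
    """Base class for policy expressions (illustrative names only)."""


@dataclass(frozen=True)
class Empty(Policy):
    """Matches nothing: the policy has been violated."""


@dataclass(frozen=True)
class Done(Policy):
    """Matches only the empty sequence: nothing more is required."""


@dataclass(frozen=True)
class Cmd(Policy):
    name: str


@dataclass(frozen=True)
class NotCmd(Policy):
    name: str  # matches any single command except this one


@dataclass(frozen=True)
class Seq(Policy):
    first: Policy
    second: Policy


@dataclass(frozen=True)
class Union(Policy):
    left: Policy
    right: Policy


@dataclass(frozen=True)
class Inter(Policy):
    left: Policy
    right: Policy


@dataclass(frozen=True)
class Star(Policy):
    inner: Policy


def e_step(p):
    """E-step: True if the policy is satisfied with no further commands."""
    if isinstance(p, (Done, Star)):
        return True
    if isinstance(p, Union):
        return e_step(p.left) or e_step(p.right)
    if isinstance(p, Seq):
        return e_step(p.first) and e_step(p.second)
    if isinstance(p, Inter):
        return e_step(p.left) and e_step(p.right)
    return False  # Empty, Cmd, NotCmd


def d_step(p, command):
    """D-step: the policy that remains after `command` has been invoked."""
    if isinstance(p, Cmd):
        return Done() if p.name == command else Empty()
    if isinstance(p, NotCmd):
        return Done() if p.name != command else Empty()
    if isinstance(p, Seq):
        head = Seq(d_step(p.first, command), p.second)
        # If `first` may be skipped, the command can also start `second`.
        return Union(head, d_step(p.second, command)) if e_step(p.first) else head
    if isinstance(p, Union):
        return Union(d_step(p.left, command), d_step(p.right, command))
    if isinstance(p, Inter):
        return Inter(d_step(p.left, command), d_step(p.right, command))
    if isinstance(p, Star):
        return Seq(d_step(p.inner, command), p)
    return Empty()  # Empty and Done accept no further commands


def allows(policy, commands):
    """Apply a D-step per command, then an E-step at the end."""
    for command in commands:
        policy = d_step(policy, command)
    return e_step(policy)


policy = Seq(Cmd("transform"), Cmd("return_to_app"))
print(allows(policy, ["transform", "return_to_app"]))  # True
print(allows(policy, ["return_to_app"]))               # False
```

Because the sketch never simplifies expressions, the policy term grows with each step; a practical implementation would normalize terms such as a sequence that starts with `Empty`.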
Data Policy Pair
In Ancile, data travels with its policy in a special container, DataPolicyPair. This object is protected using the RestrictedPython framework. To obtain data from the user, the developer submits the following program:
dpp = fetch_data(user=user('firstname.lastname@example.org'))
This puts the fetched data into the object dpp. The developer can only execute functions that are allowed by the policy framework. For example, if the policy specifies:
transform.return_to_app for some commands
then the following program will work:
dpp1 = transform(data=dpp)
return_to_app(dpp1)
return_to_app is a special command that may run only at the end of the policy; if it succeeds, Ancile returns the data to the application.
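The idea of data travelling with its policy can be illustrated with a toy container. This sketch is not Ancile's DataPolicyPair: the class internals, the `PolicyError` exception, and the body of `transform` are assumptions, and the policy is simplified to a plain ordered list of command names rather than a full regular expression.

```python
class PolicyError(Exception):
    """Raised when a command is not permitted by the remaining policy."""


class DataPolicyPair:
    """Toy stand-in for Ancile's container; not the real class."""

    def __init__(self, data, policy):
        self._data = data
        self._policy = list(policy)  # remaining command names, in order

    def _advance(self, command):
        # The next permitted command must be exactly `command`.
        if not self._policy or self._policy[0] != command:
            raise PolicyError(f"{command!r} is not allowed at this point")
        return self._policy[1:]


def transform(data):
    remaining = data._advance("transform")
    # Placeholder transformation; the result carries the advanced policy.
    return DataPolicyPair(data._data.upper(), remaining)


def return_to_app(data):
    remaining = data._advance("return_to_app")
    if remaining:
        raise PolicyError("return_to_app must be the last command")
    return data._data


# Policy "transform.return_to_app" expressed as an ordered list.
dpp = DataPolicyPair("secret", ["transform", "return_to_app"])
dpp1 = transform(data=dpp)
print(return_to_app(dpp1))  # SECRET
```

Calling `return_to_app(dpp)` directly on the untransformed pair raises `PolicyError`, because the policy requires `transform` first.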
Ancile supports custom functions as well as normal third-party libraries to be controlled
by the policies. All custom functions have to be defined under
We use three different types of functions:
- Fetch functions: annotated by @ExternalDecorator(), these functions can get an OAuth token for the user and perform external calls.
- Transformation functions: annotated by their own decorator, these functions take a DataPolicyPair object and return a transformed DataPolicyPair.
- Return functions: annotated by their own decorator, these functions take a DataPolicyPair object and return it if successful.
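As a rough illustration of how a decorator can place a policy check in front of an ordinary function, consider the sketch below. The decorator name `transformation`, the `Pair` class, and the list-based policy are hypothetical and chosen for this example only; they are not Ancile's decorators.

```python
import functools


class PolicyError(Exception):
    """Raised when the policy does not permit the decorated function."""


class Pair:
    """Minimal data-plus-policy container used only in this sketch."""

    def __init__(self, data, allowed):
        self.data = data
        self.allowed = list(allowed)  # remaining command names, in order


def transformation(func):
    """Hypothetical decorator: check the policy, then run func on raw data."""

    @functools.wraps(func)
    def wrapper(data, **kwargs):
        if not data.allowed or data.allowed[0] != func.__name__:
            raise PolicyError(f"{func.__name__} is not permitted")
        # The wrapped function sees only the raw value; the result is
        # re-wrapped together with the advanced policy.
        return Pair(func(data.data, **kwargs), data.allowed[1:])

    return wrapper


@transformation
def round_coordinates(point, digits=1):
    return tuple(round(value, digits) for value in point)


pair = Pair((42.4440, -76.5019), ["round_coordinates"])
result = round_coordinates(pair, digits=1)
print(result.data)  # (42.4, -76.5)
```

The function author writes plain Python over raw values, while the wrapper keeps the data and its policy together on the way in and out.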
Beyond these functions, we also support conditional and collection operations, which we will introduce later.
Here are the installation instructions.
We have a development environment running at https://dev.ancile.smalldata.io
so please feel free to explore it. There are a few test accounts set up for exploration.
- Login with app credentials
- Choose app view in the top-right corner
- Click on Console in the left bar
- Pick the first app
- Specify user as user and press Enter
- My user has the following policy:
- Put the following program and click
dpp1 = indoor_location.fetch_location(user=user('user'))
dpp2 = indoor_location.fuzz_location(data=dpp1['location'], mean=0, std=0.2)
return_to_app(data=dpp2['sta_location_x'])
- You will get my distorted location.
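The fuzz_location call in the program above takes mean and std parameters, which suggests additive Gaussian noise. The sketch below shows what such a transformation might look like; it is an assumption about the behavior, not the code behind indoor_location.fuzz_location. The `rng` parameter and the `sta_location_y` field are hypothetical additions for the example.

```python
import random


def fuzz_location(data, mean=0.0, std=0.2, rng=None):
    """Add independent Gaussian noise to every coordinate in `data`.

    Hypothetical sketch; `rng` exists only to make runs reproducible.
    """
    rng = rng or random.Random()
    return {key: value + rng.gauss(mean, std) for key, value in data.items()}


# 'sta_location_x' appears in the demo program; 'sta_location_y' is assumed.
location = {"sta_location_x": 10.0, "sta_location_y": 4.0}
distorted = fuzz_location(location, mean=0, std=0.2, rng=random.Random(0))
print(distorted["sta_location_x"])
```

A larger std hides the true position more strongly at the cost of accuracy, which is the kind of trade-off a policy can require before the data is returned.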