GreyNSights

The grey area between privacy and utility

GreyNSights is a framework for privacy-preserving data analysis; it currently supports only pandas. The framework lets analysts query a dataset remotely, so that the dataset stays at its source and remains private from the analyst. Analysts keep the same pandas syntax for analyzing and transforming datasets, but cannot view the individual rows. GreyNSights can also query several parties together and return aggregate statistics without revealing the individual counts of each party.

Not for production usage.

The three major principles behind the library:

1. No raw data is exposed, only aggregates

The analyst can query and transform the dataset however they want, but only aggregate results are returned.

2. The aggregates or analysis do not leak any information about individual rows

The aggregate results are differentially private, which protects individual rows from differencing attacks.

3. Pandas' capabilities to transform and process datasets are preserved

The analyst may have to add a few lines of code to initialize the setup with the data owner, but otherwise uses the same pandas syntax, so anyone who already knows pandas can use the library without learning anything new.
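To make the second principle concrete, the following sketch shows a differencing attack on exact sums and how Laplace noise blunts it. This is an illustration only, not GreyNSights' implementation: the helper names, the clipping bound of 100, and the epsilon value are all assumptions chosen for the example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exact_sum(values):
    return sum(values)


def private_sum(values, upper_bound, epsilon, rng):
    """Sum with Laplace noise added.

    Values are clipped to [0, upper_bound], so a single row can change the
    sum by at most upper_bound (the sensitivity of the query).
    """
    clipped = [min(max(v, 0), upper_bound) for v in values]
    return sum(clipped) + laplace_noise(upper_bound / epsilon, rng)


# Hypothetical private column; the attacker wants the last row.
carrots_eaten = [12, 75, 40, 88, 5]
without_last = carrots_eaten[:-1]

# Differencing attack on exact aggregates: two sums reveal the row exactly.
leaked = exact_sum(carrots_eaten) - exact_sum(without_last)

# The same attack on noisy aggregates only yields a noisy difference.
rng = random.Random(0)
noisy_diff = (private_sum(carrots_eaten, 100, 0.5, rng)
              - private_sum(without_last, 100, 0.5, rng))

print("exact difference:", leaked)
print("noisy difference:", noisy_diff)
```

A smaller epsilon means more noise and stronger privacy, at the cost of less accurate aggregates.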

Installation

Clone the repository

git clone https://github.com/kamathhrishi/GreyNSights.git

Install the required packages

pip install -r requirements.txt

Install the library from source

python3 setup.py install

Workflow Diagram

Usage

Analysis using GreyNSights hosted remotely.

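# Initialization code for GreyNSights
import GreyNsights
from GreyNsights.analyst import DataWorker, DataSource, Pointer, Command, Analyst
from GreyNsights.frameworks import framework

# Identify the analyst and the remote data worker, then request the dataset
identity = Analyst("Alice", port=65441, host="127.0.0.1")
worker = DataWorker(port=6544, host="127.0.0.1")
dataset = DataSource(identity, worker, "Sample Data")
config = dataset.get_config()

# Approve the configuration and initialize the pointer to the remote dataset
dataset_pt = config.approve().init_pointer()

# Analysis of the dataset; .get() retrieves the aggregate result
# NOTE: `pandas` is not defined in this snippet. It presumably refers to the
# GreyNSights pandas wrapper made available through `framework`; see the examples.
df = pandas.DataFrame(dataset_pt)
df.columns
df.describe().get()
df['carrots_eaten'].mean().get()
df['carrots_eaten'].sum().get()
(df['carrots_eaten']>70).sum().get()
df['carrots_eaten'].max().get()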

Analysis using Pandas

import pandas

dataset = pandas.read_csv(<PATH>)

# Plain pandas returns results directly, so no .get() is needed
df = pandas.DataFrame(dataset)
df.columns
df.describe()
df['carrots_eaten'].mean()
df['carrots_eaten'].sum()
(df['carrots_eaten']>70).sum()
df['carrots_eaten'].max()
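The two listings above differ mainly in the setup and the trailing .get() call. One way to picture why is a pointer object that records operations and only releases a value once the result is a single aggregate. The toy class below is a hypothetical sketch of that idea using plain Python lists; ColumnPointer and its methods are invented for illustration and are not GreyNSights' API, and the sketch adds no differential-privacy noise.

```python
import statistics


class ColumnPointer:
    """Hypothetical stand-in for a remote column.

    It records operations lazily and only releases a value from get()
    when the final result is a single aggregate, never a list of rows.
    """

    def __init__(self, rows, ops=()):
        self._rows = rows  # in a real system these stay with the data owner
        self._ops = ops    # deferred operations, applied in order by get()

    def _then(self, fn):
        # Return a new pointer with one more deferred operation.
        return ColumnPointer(self._rows, self._ops + (fn,))

    def __gt__(self, threshold):
        return self._then(lambda xs: [x > threshold for x in xs])

    def sum(self):
        return self._then(sum)

    def mean(self):
        return self._then(statistics.mean)

    def max(self):
        return self._then(max)

    def get(self):
        result = self._rows
        for fn in self._ops:
            result = fn(result)
        if isinstance(result, list):
            # Row-level data (raw or transformed) is never handed back.
            raise PermissionError("only aggregates can be retrieved")
        return result


col = ColumnPointer([12, 75, 40, 88, 5])  # made-up values
print((col > 70).sum().get())
print(col.mean().get())
print(col.max().get())
```

Calling col.get() or (col > 70).get() in this sketch raises PermissionError, because the result would still be row-level data.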

Examples

The Accidents example shows how a range of queries can be performed and how datasets can be transformed using GreyNSights.
The Federated Analytics example shows how to analyze the datasets of several parties together. This is restricted to linear queries such as sum, average, std and counts.
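The restriction to queries such as sum, average, std and counts makes sense because each of them can be computed from per-party totals that are simply added together. The sketch below illustrates this with additive secret sharing, so that no one sees an individual party's totals. It is an assumption-laden illustration: the source does not state which protocol GreyNSights uses, and the party data, MODULUS, and function names are invented for the example.

```python
import math
import random

MODULUS = 2**61 - 1  # all share arithmetic is done modulo this number


def make_shares(value, n_parties, rng):
    """Split an integer into n additive shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values, rng):
    """Add up one private integer per party without exposing any of them.

    Every party splits its value into shares, one per party. Party j then
    publishes only the sum of the j-th shares it received.
    """
    n = len(private_values)
    all_shares = [make_shares(v, n, rng) for v in private_values]
    published = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    return sum(published) % MODULUS


# Hypothetical per-party datasets (one numeric column each).
parties = {"A": [3, 5, 8], "B": [10, 2], "C": [7, 7, 7, 1]}
rng = random.Random(42)

count = secure_sum([len(v) for v in parties.values()], rng)
total = secure_sum([sum(v) for v in parties.values()], rng)
squares = secure_sum([sum(x * x for x in v) for v in parties.values()], rng)

mean = total / count
# Population standard deviation (pandas' default is the sample version).
std = math.sqrt(squares / count - mean ** 2)

print(count, total, mean, std)
```

A query such as max or median cannot be assembled from added totals in this way, which is one reason a sum-based scheme would be limited to this family of queries.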

Contributing

Read the CONTRIBUTING documentation.

