# Tulap Mechanism

In [1]:
pip install opendp


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.10 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
import math

import numpy as np
import pandas as pd
import scipy.optimize
from scipy.stats import binom
from postprocessors import Tulap
import opendp.prelude as dp
from opendp.transformations import make_split_dataframe, make_select_column
dp.enable_features("contrib")


In this example we use the California demographics data, loaded from the URL below. The dataset records each individual's age, sex, education, race, income, and marital status.

In [3]:
url = 'https://raw.githubusercontent.com/opendp/opendp/main/docs/source/data/PUMS_california_demographics_1000/data-with-header.csv'
data = pd.read_csv(url)
data.head()

Unnamed: 0,age,sex,educ,race,income,married
0,59,1,9,1,0.0,1
1,31,0,1,3,17000.0,0
2,36,1,11,1,0.0,1
3,54,1,11,1,9100.0,1
4,39,0,5,3,37000.0,0


If we wish to publish inference based on the 'sex' and 'married' columns, we can use the Tulap mechanism. We can perform a uniformly most powerful (UMP) test, obtain a single p-value, and compute confidence intervals as post-processing, all at a fixed privacy cost.

### Step 1: Initializing the Tulap Object

We can initialize a Tulap object with epsilon, delta, and the length of the data. The specified epsilon and delta are used as the parameters of the Tulap distribution, which is constructed by combining discrete Laplace noise with continuous uniform noise. See the proof for how epsilon and delta relate to the parameters of the Tulap distribution.
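
To build intuition for the noise described above, here is a minimal sketch of sampling from an untruncated Tulap distribution (the delta = 0 case). It assumes the construction from Awan and Slavković, in which the noise is the difference of two geometric variables (a discrete Laplace draw) plus Uniform(-1/2, 1/2) noise, with b = exp(-epsilon). The function name `sample_tulap_untruncated` is hypothetical and is not part of the `Tulap` class used in this notebook; the truncation introduced by delta is deliberately left out.

```python
import numpy as np


def sample_tulap_untruncated(m, epsilon, size, rng=None):
    """Draw `size` samples of m + discrete Laplace + Uniform(-1/2, 1/2).

    Assumption: b = exp(-epsilon) and delta = 0 (no truncation).
    This is an illustrative sketch, not the notebook's Tulap class.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = np.exp(-epsilon)
    # The difference of two i.i.d. geometric draws with success
    # probability 1 - b is discrete Laplace with P(k) proportional to b**|k|.
    g1 = rng.geometric(1 - b, size=size)
    g2 = rng.geometric(1 - b, size=size)
    # Continuous uniform noise makes the result real-valued.
    u = rng.uniform(-0.5, 0.5, size=size)
    return m + (g1 - g2) + u


rng = np.random.default_rng(0)
noise = sample_tulap_untruncated(m=0.0, epsilon=1.0, size=100_000, rng=rng)
print("sample mean:", noise.mean())
print("sample variance:", noise.var())
```

The sample mean should sit close to the centre `m`, and the variance close to 2b/(1-b)^2 + 1/12, the sum of the discrete Laplace and uniform variances.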

In [4]:
epsilon = 1
delta = 0.01
size = len(data)
count_sex = sum(data['sex'])
count_married = sum(data['married'])
tulap_sex = Tulap(count_sex, epsilon, delta, size)
tulap_married = Tulap(count_married, epsilon, delta, size)

Let's construct a one-sided DP UMP test with theta = 0.5 as the null hypothesis and theta > 0.5 as the alternative hypothesis.

In [11]:
alpha = 0.05
theta = 0.5
ump_result_sex = tulap_sex.ump_test(theta, alpha, 'left')
ump_result_married = tulap_married.ump_test(theta, alpha, 'left')

print("UMP Test Result for 'sex':", ump_result_sex)
print("UMP Test Result for 'married':", ump_result_married)

UMP Test Result for 'sex': [0. 0. 0. ... 1. 1. 1.]
UMP Test Result for 'married': [0. 0. 0. ... 1. 1. 1.]


### Calculate p-values

For the same null hypothesis, theta = 0.5, we can calculate one-sided p-values for 'sex' and 'married' as below.

The p-values obtained are private.

In [6]:

p_value_sex = tulap_sex.p_value(theta, 'left')
p_value_married = tulap_married.p_value(theta, 'left')

print("P-value for 'sex':", p_value_sex)
print("P-value for 'married':", p_value_married)


P-value for 'sex': 0.8111999187156591
P-value for 'married': 0.9990002996808233
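
For comparison, a non-private one-sided binomial p-value can be computed directly with `scipy.stats.binom`. This is a sketch of the standard exact test for the null hypothesis theta = theta0 against theta > theta0, not of the Tulap procedure. The helper name `binomial_p_value_right` and the demo count are hypothetical, and how the `'left'` argument of `p_value` maps onto this direction is not confirmed here.

```python
from scipy.stats import binom


def binomial_p_value_right(x, n, theta0):
    """Exact non-private p-value P(X >= x) for X ~ Binomial(n, theta0)."""
    # sf(k) is P(X > k), so shift by one to include x itself.
    return binom.sf(x - 1, n, theta0)


# Hypothetical count for illustration only (not taken from the dataset).
n_demo = 1000
x_demo = 520
p_demo = binomial_p_value_right(x_demo, n_demo, 0.5)
print("Non-private one-sided p-value:", p_demo)
```

Comparing this baseline with the private p-value for the same count gives a rough sense of how much the added noise affects the test.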


### Calculate Confidence Interval

The lower bound of the 95% one-sided confidence interval for theta is calculated below:

In [7]:

# Calculate Confidence Intervals
tulap_sex_CI = Tulap(data['sex'], epsilon, delta, size)
tulap_married_CI = Tulap(data['married'], epsilon, delta, size)

ci_sex_lower = tulap_sex_CI.CI(alpha, 'lower')
ci_married_lower = tulap_married_CI.CI(alpha, 'lower')


print("Lower Confidence Interval for 'sex':", ci_sex_lower)
print("Lower Confidence Interval for 'married':", ci_married_lower)


binary search, stepsize =  0.001
binary search, stepsize =  0.001
Lower Confidence Interval for 'sex': 0.0009570312499999998
Lower Confidence Interval for 'married': 0.0009570312499999998


Similarly, the upper bound of the confidence interval for theta can be calculated:

In [9]:

ci_sex_upper = tulap_sex_CI.CI(alpha, 'upper')
ci_married_upper = tulap_married_CI.CI(alpha, 'upper')
print("Upper Confidence Interval for 'sex':", ci_sex_upper)
print("Upper Confidence Interval for 'married':", ci_married_upper)

binary search, stepsize =  0.001
binary search, stepsize =  0.001
Upper Confidence Interval for 'sex': 0.0048710937499999996
Upper Confidence Interval for 'married': 0.0048710937499999996
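
As a sanity check on interval estimates like these, a non-private baseline can be useful. The sketch below computes one-sided Clopper-Pearson bounds for a binomial proportion with `scipy.stats.beta`. It is a standard exact interval and not the Tulap-based procedure; the helper name `clopper_pearson_bounds` and the demo count are hypothetical.

```python
from scipy.stats import beta


def clopper_pearson_bounds(x, n, alpha):
    """One-sided Clopper-Pearson bounds, each at confidence level 1 - alpha.

    Returns (lower, upper) for the success probability of Binomial(n, theta)
    given x observed successes. Non-private baseline only.
    """
    # The beta quantile is undefined at the boundary counts,
    # so those cases are handled explicitly.
    lower = 0.0 if x == 0 else beta.ppf(alpha, x, n - x + 1)
    upper = 1.0 if x == n else beta.ppf(1 - alpha, x + 1, n - x)
    return lower, upper


# Hypothetical count for illustration only (not taken from the dataset).
lower_demo, upper_demo = clopper_pearson_bounds(x=520, n=1000, alpha=0.05)
print("Non-private lower bound:", lower_demo)
print("Non-private upper bound:", upper_demo)
```

With about 1000 observations, the non-private bounds fall close to the observed proportion, which gives a reference point for judging the private bounds.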
