# Tulap Mechanism

The Tulap mechanism is well-suited for privatizing estimates that follow a binomial distribution,
like a count (or boolean sum).

An example of this can be found when counting the number of married people in the California demographics dataset,
a microdata-level dataset with attributes for age, sex, education, race, income and marriage status.

In [1]:
![ -e data.csv ] || wget https://raw.githubusercontent.com/opendp/opendp/main/docs/source/data/PUMS_california_demographics_1000/data.csv

In [2]:
data = open("data.csv").read()

We'll use the Tulap mechanism to conduct statistical inference on the 'married' column,
where values are either 0 or 1, indicating marriage status.

The following transformation parses the CSV, selects the `married` column, casts it to float, clamps each value to [0, 1], and sums the result.

In [3]:
import opendp.prelude as dp
dp.enable_features("contrib")

t_count_married = (
    dp.t.make_split_dataframe(",", col_names=["age", "sex", "educ", "race", "income", "married"]) >>
    dp.t.make_select_column("married", str) >>
    dp.t.then_cast_default(float) >>
    dp.t.then_clamp((0., 1.)) >>
    dp.t.then_sum()
)

# what is the exact number of married individuals?
t_count_married(data)

549.0
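To make the individual stages of the pipeline concrete, here is a minimal plain-Python sketch that mirrors them on a tiny made-up CSV. The sample rows and the `count_married` helper are hypothetical and exist only for illustration; the fallback value of `0.0` for unparseable entries is an assumption about how the default cast behaves for floats. No privacy protection is involved in this sketch.

```python
import csv
import io

COL_NAMES = ["age", "sex", "educ", "race", "income", "married"]

# Made-up rows for illustration only; these are not taken from data.csv.
# The last two rows carry an out-of-range and an unparseable 'married' value.
SAMPLE_CSV = "59,1,9,1,0,1\n31,0,1,3,17000,0\n36,1,11,1,0,5\n54,1,11,1,9100,x\n"


def count_married(csv_text: str) -> float:
    """Mirror the split -> select -> cast -> clamp -> sum stages by hand."""
    total = 0.0
    for row in csv.reader(io.StringIO(csv_text)):
        record = dict(zip(COL_NAMES, row))  # split into named columns
        raw = record["married"]             # select the column as a string
        try:
            value = float(raw)              # cast to float
        except ValueError:
            value = 0.0                     # assumed default for bad input
        value = min(max(value, 0.0), 1.0)   # clamp to [0, 1]
        total += value                      # sum
    return total


print(count_married(SAMPLE_CSV))  # prints 2.0
```

The clamp is what bounds the contribution of any single record to at most one, which is why a stray value such as `5` counts as `1` here.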

The Tulap mechanism behaves similarly to the Laplace or Gaussian mechanism in that it is an additive noise mechanism.

> At this time, the OpenDP implementation of the Tulap mechanism only supports scalar-valued float inputs with a sensitivity of at most one.

In [4]:
epsilon, delta = 0.1, 1e-8
m_count_married = t_count_married >> dp.m.then_tulap(epsilon=epsilon, delta=delta)

# what is the DP estimate of the number of married individuals?
private_estimate = m_count_married(data)
private_estimate

547.2191955328317

The Tulap mechanism comes with a collection of utilities for conducting hypothesis tests and constructing confidence intervals in the setting where data is binomially distributed.

In [5]:
from opendp._extrinsics.tulap import then_binomial_tulap

# same privacy parameters as before; size is the number of records in the dataset
epsilon, delta = 0.1, 1e-8
m_count_married = t_count_married >> then_binomial_tulap(epsilon, delta, size=1_000)
m_count_married(data)

Tulap(553.6118727191592)

### Instantiate the Tulap Object

We can instantiate a Tulap object from a private estimate together with epsilon, delta, and the size of the data. The epsilon and delta specified are used as the parameters of the Tulap distribution, which is constructed by combining discrete Laplace noise with continuous uniform noise. See the proof for how epsilon and delta relate to the parameters of the Tulap distribution.
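The "discrete Laplace plus continuous uniform" construction can be sketched with the standard library alone. This is a simplified illustration, not OpenDP's implementation: it assumes the scale parameter is `b = exp(-epsilon)` and it covers only the untruncated case (as if delta were zero), so the truncation that delta introduces is left out. The helper names `sample_geometric` and `sample_tulap_noise` are hypothetical.

```python
import math
import random


def sample_geometric(b: float, rng: random.Random) -> int:
    """Draw G with P(G = k) = (1 - b) * b**k for k = 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(b))


def sample_tulap_noise(epsilon: float, rng: random.Random) -> float:
    """Untruncated sketch: discrete Laplace noise plus Uniform(-1/2, 1/2)."""
    b = math.exp(-epsilon)  # assumed parameterization
    # The difference of two i.i.d. geometric draws is discrete Laplace.
    discrete_laplace = sample_geometric(b, rng) - sample_geometric(b, rng)
    return discrete_laplace + rng.uniform(-0.5, 0.5)


rng = random.Random(0)
draws = [sample_tulap_noise(0.1, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print("mean:", mean, "variance:", variance)
```

The noise is symmetric around zero, so adding it to a count leaves the estimate unbiased; a smaller epsilon pushes `b` toward one and widens the spread.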

In [7]:
from opendp._extrinsics.tulap import Tulap
tulap_married = Tulap(private_estimate, epsilon=epsilon, delta=delta, size=1_000)

Let's construct a one-sided DP UMP (uniformly most powerful) test with theta = 0.5 as the null hypothesis and theta > 0.5 as the alternative hypothesis.

In [8]:
alpha = 0.05
theta = 0.5
ump_result_married = tulap_married.ump_test(theta, alpha, 'left')
print("UMP Test Result for 'married':", ump_result_married)

UMP Test Result for 'married': [0. 0. 0. ... 1. 1. 1.]


### Calculate p-values

For the same null hypothesis, theta = 0.5, we can calculate a one-sided p-value for 'married' as shown below.

The resulting p-value is itself private, since it is computed from the private estimate.

In [9]:
p_value_married = tulap_married.p_value(theta, 'left')

print("P-value for 'married':", p_value_married)


P-value for 'married': 0.9847932435885637
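As a rough point of reference, the two exact, non-private binomial tail probabilities can be computed directly from the true count of 549 out of 1,000 reported earlier. This sketch makes no assumption about which tail the `'left'` argument selects in the Tulap utilities, and it ignores the added noise entirely, so its values are not expected to match the private p-value. The `binom_pmf` helper is hypothetical.

```python
import math


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Binomial(n, theta) probability of exactly k, computed in log space."""
    log_p = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(theta) + (n - k) * math.log(1.0 - theta)
    )
    return math.exp(log_p)


n, x, theta0 = 1_000, 549, 0.5

# P(X <= x) and P(X >= x) under the null hypothesis theta = theta0.
p_lower_tail = sum(binom_pmf(k, n, theta0) for k in range(0, x + 1))
p_upper_tail = sum(binom_pmf(k, n, theta0) for k in range(x, n + 1))

print("P(X <= 549):", p_lower_tail)
print("P(X >= 549):", p_upper_tail)
```

Working in log space avoids overflow in the binomial coefficients, which become very large for n = 1,000.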


### Calculate Confidence Interval

The 95% one-sided lower confidence interval for theta (alpha = 0.05) is requested below:

In [11]:
ci_married_lower = tulap_married.CI(alpha, 'lower')

print("Lower Confidence Interval for 'married':", ci_married_lower)

binary search, stepsize =  0.001


TypeError: oneside_pvalue() missing 1 required positional argument: 'theta'

Similarly, the upper confidence interval for theta can be requested:

In [12]:

ci_married_upper = tulap_married.CI(alpha, 'upper')
print("Upper Confidence Interval for 'married':", ci_married_upper)

binary search, stepsize =  0.001


TypeError: oneside_pvalue() missing 1 required positional argument: 'theta'
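The output above mentions a binary search, which suggests the interval is found by inverting a one-sided p-value. The general idea can be sketched in a non-private setting: search for the smallest theta that the exact binomial test does not reject at level alpha. This is only an illustration of test inversion on the true count, under the assumption that the p-value is monotone in theta; it is not the Tulap routine, and the helper names are hypothetical.

```python
import math


def upper_tail(x: int, n: int, theta: float) -> float:
    """P(X >= x) for X ~ Binomial(n, theta), computed in log space."""
    if theta <= 0.0:
        return 1.0 if x <= 0 else 0.0
    if theta >= 1.0:
        return 1.0
    total = 0.0
    for k in range(x, n + 1):
        log_p = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(theta) + (n - k) * math.log(1.0 - theta)
        )
        total += math.exp(log_p)
    return min(total, 1.0)


def lower_confidence_bound(x: int, n: int, alpha: float, tol: float = 1e-6) -> float:
    """Bisect for the smallest theta with P_theta(X >= x) >= alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if upper_tail(x, n, mid) < alpha:
            lo = mid  # theta = mid is rejected, so the bound lies above it
        else:
            hi = mid  # theta = mid is not rejected, so the bound is at or below it
    return lo


bound = lower_confidence_bound(549, 1_000, alpha=0.05)
print("non-private lower bound for theta:", bound)
```

A private interval built from the noisy estimate would be somewhat wider, since it also has to account for the noise.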