The goal of this notebook is to extend the RDP cache. At the moment the minimum possible PB spend for a query is 7.766, which is far too high.

Constraints:
- 300k values in an .npy file = 2.3MB. Assuming it scales linearly, we could probably offer 1.2M cached values in ~9.2MB.
    - 10,000 numbers between rdp_constant of 0 and 1, 10,000 between 1 and 2, this should let us get to a max RDP_constant of around 1.2e6/1e4 = 120 which corresponds to an epsilon spend of ~200
    - however at this point the differences between adjacent integer epsilon values is >1% (i.e. rdp of 118 = 197.858, 119 = 199.195, of 120 = 200.530)
- We want very high coverage of epsilon values from 0 to ~5 as these queries would have very strong privacy
    - the difference in epsilon spent at high RDP Constants seems to be very small (i.e. for an RDP constant of 120, the epsilon spent is 200.53. For 119, it's 119.195, for 121, it's 201.86). So perhaps we could have regions with variable levels of granularity/resolution (i.e., cache is more populated for smaller RDP constants, and rounds to the nearest integer for higher RDP constants)

In [2]:
import syft as sy
import numpy as np

## Find rough ballpark numbers

Let's find at which RDP constants the epsilon spend exceeds -> 0.1, 1, 5, 10, 100, 200

In [3]:
current_cache = np.load("/home/e/PycharmProjects/PySyft/packages/syft/src/syft/cache/constant2epsilon_300k.npy")
print(len(current_cache))

300000


In [75]:
def lowest_index_with_epsilon_spend_of(eps_spend):
    for index, value in enumerate(current_cache):
        if value > eps_spend:
            return f"epsilon spend >= {eps_spend} in index={index} with a value={value}"
lowest_index_with_epsilon_spend_of(10)

'epsilon spend >= 10 in index=1 with a value=11.688596249354894'

In [76]:
lowest_index_with_epsilon_spend_of(100)

'epsilon spend >= 100 in index=49 with a value=100.68990516105823'

In [77]:
lowest_index_with_epsilon_spend_of(200)

'epsilon spend >= 200 in index=120 with a value=200.53014130518943'

In [4]:
from syft.core.adp.data_subject_ledger import DataSubjectLedger as DSL

In [4]:
DSL.mro()

[syft.core.adp.data_subject_ledger.DataSubjectLedger,
 syft.core.adp.abstract_ledger_store.AbstractDataSubjectLedger,
 object]

In [5]:
# future
from __future__ import annotations

# stdlib
from functools import partial
import os
from pathlib import Path
import time
from typing import Any
from typing import Callable
from typing import Optional
from typing import TYPE_CHECKING
from typing import Tuple

In [6]:
# third party
from typing_extensions import Final

# relative
# from ...logger import info

if TYPE_CHECKING:
    # stdlib
    from dataclasses import dataclass
else:
    from flax.struct import dataclass

# third party
import jax
from jax import numpy as jnp
from nacl.signing import VerifyKey
import numpy as np
from scipy.optimize import minimize_scalar

In [None]:
# relative
from ...core.node.common.node_manager.user_manager import RefreshBudgetException
from ...lib.numpy.array import capnp_deserialize
from ...lib.numpy.array import capnp_serialize
from ..common.serde.capnp import CapnpModule
from ..common.serde.capnp import get_capnp_schema
from ..common.serde.capnp import serde_magic_header
from ..common.serde.serializable import serializable
from .abstract_ledger_store import AbstractDataSubjectLedger
from .abstract_ledger_store import AbstractLedgerStore


def get_cache_path(cache_filename: str) -> str:
    here = os.path.dirname(__file__)
    root_dir = Path(here) / ".." / ".." / "cache"
    return os.path.abspath(root_dir / cache_filename)


def load_cache(filename: str) -> np.ndarray:
    CACHE_PATH = get_cache_path(filename)
    if not os.path.exists(CACHE_PATH):
        raise Exception(f"Cannot load {CACHE_PATH}")
    cache_array = np.load(CACHE_PATH)
    info(f"Loaded constant2epsilon cache of size: {cache_array.shape}")
    return cache_array


@dataclass
class RDPParams:
    sigmas: jnp.array
    l2_norms: jnp.array
    l2_norm_bounds: jnp.array
    Ls: jnp.array
    coeffs: jnp.array

    def __repr__(self) -> str:
        res = "RDPParams:"
        res = f"{res}\n sigmas:{self.sigmas}"
        res = f"{res}\n l2_norms:{self.l2_norms}"
        res = f"{res}\n l2_norm_bounds:{self.l2_norm_bounds}"
        res = f"{res}\n Ls:{self.Ls}"
        res = f"{res}\n coeffs:{self.coeffs}"

        return res


@partial(jax.jit, static_argnums=3, donate_argnums=(1, 2))
def first_try_branch(
    constant: jax.numpy.DeviceArray,
    rdp_constants: np.ndarray,
    entity_ids_query: np.ndarray,
    max_entity: int,
) -> jax.numpy.DeviceArray:
    summed_constant = constant.take(entity_ids_query) + rdp_constants.take(
        entity_ids_query
    )
    if max_entity < len(rdp_constants):
        return rdp_constants.at[entity_ids_query].set(summed_constant)
    else:
        pad_length = max_entity - len(rdp_constants) + 1
        rdp_constants = jnp.concatenate([rdp_constants, jnp.zeros(shape=pad_length)])
        summed_constant = constant + rdp_constants.take(entity_ids_query)
        return rdp_constants.at[entity_ids_query].set(summed_constant)


@partial(jax.jit, static_argnums=1)
def compute_rdp_constant(rdp_params: RDPParams, private: bool) -> jax.numpy.DeviceArray:
    squared_Ls = rdp_params.Ls**2
    squared_sigma = rdp_params.sigmas**2

    if private:
        # this is calculated on the private true values
        squared_l2 = rdp_params.l2_norms**2
    else:
        # bounds is computed on the metadata
        squared_l2 = rdp_params.l2_norm_bounds**2

    return (squared_Ls * squared_l2 / (2 * squared_sigma)) * rdp_params.coeffs


@jax.jit
def get_budgets_and_mask(
    epsilon_spend: jnp.array, user_budget: jnp.float64
) -> Tuple[float, float, jax.numpy.DeviceArray]:
    # Function to vectorize the result of the budget computation.
    mask = jnp.ones_like(epsilon_spend) * user_budget < epsilon_spend
    # get the highest value which was under budget and represented by False in the mask
    highest_possible_spend = jnp.max(epsilon_spend * (1 - mask))
    return (highest_possible_spend, user_budget, mask)


@serializable(capnp_bytes=True)
class DataSubjectLedger(AbstractDataSubjectLedger):
    """for a particular data subject, this is the list
    of all mechanisms releasing informationo about this
    particular subject, stored in a vectorized form"""

    CONSTANT2EPSILSON_CACHE_FILENAME = "constant2epsilon_300k.npy"
    _cache_constant2epsilon = load_cache(filename=CONSTANT2EPSILSON_CACHE_FILENAME)

    def __init__(
        self,
        constants: Optional[np.ndarray] = None,
        update_number: int = 0,
        timestamp_of_last_update: Optional[float] = None,
    ) -> None:
        self._rdp_constants = (
            constants if constants is not None else np.array([], dtype=np.float64)
        )
        self._update_number = update_number
        self._timestamp_of_last_update = (
            timestamp_of_last_update
            if timestamp_of_last_update is not None
            else time.time()
        )
        self._pending_save = False

    def __eq__(self, other: Any) -> bool:
        if not isinstance(other, DataSubjectLedger):
            return self == other
        return (
            self._update_number == other._update_number
            and self._timestamp_of_last_update == other._timestamp_of_last_update
            and all(self._rdp_constants == other._rdp_constants)
        )

    @property
    def delta(self) -> float:
        FIXED_DELTA: Final = 1e-6
        return FIXED_DELTA  # WARNING: CHANGING DELTA INVALIDATES THE CACHE

    def bind_to_store_with_key(
        self, store: AbstractLedgerStore, user_key: VerifyKey
    ) -> None:
        self.store = store
        self.user_key = user_key

    @staticmethod
    def get_or_create(
        store: AbstractLedgerStore, user_key: VerifyKey
    ) -> Optional[AbstractDataSubjectLedger]:
        ledger: Optional[AbstractDataSubjectLedger] = None
        try:
            # todo change user_key or uid?
            ledger = store.get(key=user_key)
            ledger.bind_to_store_with_key(store=store, user_key=user_key)
        except KeyError:
            print("Creating new Ledger")
            ledger = DataSubjectLedger()
            ledger.bind_to_store_with_key(store=store, user_key=user_key)
        except Exception as e:
            print(f"Failed to read ledger from ledger store. {e}")

        return ledger

    def get_entity_overbudget_mask_for_epsilon_and_append(
        self,
        unique_entity_ids_query: np.ndarray,
        rdp_params: RDPParams,
        get_budget_for_user: Callable,
        deduct_epsilon_for_user: Callable,
        private: bool = True,
    ) -> np.ndarray:
        # coerce to np.int64
        entity_ids_query: np.ndarray = unique_entity_ids_query.astype(np.int64)
        # calculate constants
        rdp_constants = self._get_batch_rdp_constants(
            entity_ids_query=entity_ids_query, rdp_params=rdp_params, private=private
        )

        # here we iteratively attempt to calculate the overbudget mask and save
        # changes to the database
        mask = self._get_overbudgeted_entities(
            get_budget_for_user=get_budget_for_user,
            deduct_epsilon_for_user=deduct_epsilon_for_user,
            rdp_constants=rdp_constants,
        )

        # at this point we are confident that the database budget field has been updated
        # so now we should flush the _rdp_constants that we have calculated to storage
        if self._write_ledger():
            return mask

    def _write_ledger(self) -> bool:

        self._update_number += 1
        try:
            self._pending_save = False
            self.store.set(key=self.user_key, value=self)
            return True
        except Exception as e:
            self._pending_save = True
            print(f"Failed to write ledger to ledger store. {e}")
            raise e

    def _increase_max_cache(self, new_size: int) -> None:
        new_entries = []
        current_size = len(self._cache_constant2epsilon)
        new_alphas = []
        for i in range(new_size - current_size):
            alph, eps = self._get_optimal_alpha_for_constant(
                constant=i + 1 + current_size
            )
            new_entries.append(eps)
            new_alphas.append(alph)

        self._cache_constant2epsilon = np.concatenate(
            [self._cache_constant2epsilon, np.array(new_entries)]
        )

    def _get_fake_rdp_func(self, constant: int) -> Callable:
        def func(alpha: float) -> float:
            return alpha * constant

        return func

    def _get_alpha_search_function(self, rdp_compose_func: Callable) -> Callable:
        log_delta = np.log(self.delta)

        def fun(alpha: float) -> float:  # the input is the RDP's \alpha
            if alpha <= 1:
                return np.inf
            else:
                alpha_minus_1 = alpha - 1
                return np.maximum(
                    rdp_compose_func(alpha)
                    + np.log(alpha_minus_1 / alpha)
                    - (log_delta + np.log(alpha)) / alpha_minus_1,
                    0,
                )

        return fun

    def _get_optimal_alpha_for_constant(
        self, constant: int = 3
    ) -> Tuple[np.ndarray, Callable]:
        f = self._get_fake_rdp_func(constant=constant)
        f2 = self._get_alpha_search_function(rdp_compose_func=f)
        results = minimize_scalar(
            f2, method="Brent", bracket=(1, 2), bounds=[1, np.inf]
        )

        return results.x, results.fun

    def _get_batch_rdp_constants(
        self, entity_ids_query: jnp.ndarray, rdp_params: RDPParams, private: bool = True
    ) -> jnp.ndarray:
        constant = compute_rdp_constant(rdp_params, private)
        if self._rdp_constants.size == 0:
            self._rdp_constants = np.zeros_like(np.asarray(constant, constant.dtype))
        print("constant: ", constant)
        print("_rdp_constants: ", self._rdp_constants)
        print("entity ids query", entity_ids_query)
        print(jnp.max(entity_ids_query))
        self._rdp_constants = first_try_branch(
            constant,
            self._rdp_constants,
            entity_ids_query,
            int(jnp.max(entity_ids_query)),
        )
        return constant

    def _get_epsilon_spend(self, rdp_constants: np.ndarray) -> np.ndarray:
        rdp_constants_lookup = (rdp_constants - 1).astype(np.int64)
        try:
            # needed as np.int64 to use take
            eps_spend = jax.jit(jnp.take)(
                self._cache_constant2epsilon, rdp_constants_lookup
            )
        except IndexError:
            print(f"Cache missed the value at {max(rdp_constants_lookup)}")
            self._increase_max_cache(int(max(rdp_constants_lookup) * 1.1))
            eps_spend = jax.jit(jnp.take)(
                self._cache_constant2epsilon, rdp_constants_lookup
            )
        return eps_spend

    def _calculate_mask_for_current_budget(
        self, get_budget_for_user: Callable, epsilon_spend: np.ndarray
    ) -> Tuple[float, float, np.ndarray]:
        user_budget = get_budget_for_user(verify_key=self.user_key)
        # create a mask of True and False where true is over current user_budget
        return get_budgets_and_mask(epsilon_spend, user_budget)

    def _get_overbudgeted_entities(
        self,
        get_budget_for_user: Callable,
        deduct_epsilon_for_user: Callable,
        rdp_constants: np.ndarray,
    ) -> Tuple[np.ndarray]:
        """TODO:
        In our current implementation, user_budget is obtained by querying the
        Adversarial Accountant's entity2ledger with the Data Scientist's User Key.
        When we replace the entity2ledger with something else, we could perhaps directly
        add it into this method
        """
        epsilon_spend = self._get_epsilon_spend(rdp_constants=rdp_constants)

        # try first time
        (
            highest_possible_spend,
            user_budget,
            mask,
        ) = self._calculate_mask_for_current_budget(
            get_budget_for_user=get_budget_for_user, epsilon_spend=epsilon_spend
        )

        mask = np.array(mask, copy=False)
        highest_possible_spend = float(highest_possible_spend)
        user_budget = float(user_budget)
        print("Epsilon spend ", epsilon_spend)
        print("Highest possible spend ", highest_possible_spend)
        if highest_possible_spend > 0:
            # go spend it in the db
            attempts = 0
            while attempts < 5:
                print(
                    f"Attemping to spend epsilon: {highest_possible_spend}. Try: {attempts}"
                )
                attempts += 1
                try:
                    user_budget = self.spend_epsilon(
                        deduct_epsilon_for_user=deduct_epsilon_for_user,
                        epsilon_spend=highest_possible_spend,
                        old_user_budget=user_budget,
                    )
                    break
                except RefreshBudgetException:  # nosec
                    # this is the only exception we allow to retry
                    (
                        highest_possible_spend,
                        user_budget,
                        mask,
                    ) = self._calculate_mask_for_current_budget(
                        get_budget_for_user=get_budget_for_user,
                        epsilon_spend=epsilon_spend,
                    )
                except Exception as e:
                    print(f"Problem spending epsilon. {e}")
                    raise e

        if user_budget is None:
            raise Exception("Failed to spend_epsilon")

        return mask

    def spend_epsilon(
        self,
        deduct_epsilon_for_user: Callable,
        epsilon_spend: float,
        old_user_budget: float,
    ) -> float:
        # get the budget
        print("got user budget", old_user_budget, "epsilon_spent", epsilon_spend)
        deduct_epsilon_for_user(
            verify_key=self.user_key,
            old_budget=old_user_budget,
            epsilon_spend=epsilon_spend,
        )
        # return the budget we used
        return old_user_budget

In [37]:
class DSL_dummy:
    def __init__(self):
        self._cache_constant2epsilon = np.zeros(0)
        self.delta = 1e-6
    def _increase_max_cache(self, new_size: int) -> None:
        new_entries = []
        current_size = len(self._cache_constant2epsilon)
        new_alphas = []
        for i in range(new_size - current_size):
            current_constant = i + 1 + current_size
            alph, eps = self._get_optimal_alpha_for_constant(
                constant=current_constant
            )
            print(f"current_constant={current_constant}, alpha={alph}, eps={eps}")
            new_entries.append(eps)
            new_alphas.append(alph)

        self._cache_constant2epsilon = np.concatenate(
            [self._cache_constant2epsilon, np.array(new_entries)]
        )

    def _get_fake_rdp_func(self, constant: int) -> Callable:
        def func(alpha: float) -> float:
            return alpha * constant

        return func

    def _get_alpha_search_function(self, rdp_compose_func: Callable) -> Callable:
        log_delta = np.log(self.delta)

        def fun(alpha: float) -> float:  # the input is the RDP's \alpha
            if alpha <= 1:
                return np.inf
            else:
                alpha_minus_1 = alpha - 1
                return np.maximum(
                    rdp_compose_func(alpha)
                    + np.log(alpha_minus_1 / alpha)
                    - (log_delta + np.log(alpha)) / alpha_minus_1,
                    0,
                )

        return fun

    def _get_optimal_alpha_for_constant(
        self, constant: int = 3
    ) -> Tuple[np.ndarray, Callable]:
        f = self._get_fake_rdp_func(constant=constant)
        f2 = self._get_alpha_search_function(rdp_compose_func=f)
        results = minimize_scalar(
            f2, method="Brent", bracket=(1, 2), bounds=[1, np.inf]
        )

        return results.x, results.fun

dsl = DSL_dummy()

In [38]:
dsl._increase_max_cache(10)

current_constant=1, alpha=4.508496357814772, eps=7.766216625311721
current_constant=2, alpha=3.5060933368492333, eps=11.688596249354894
current_constant=3, alpha=3.0573417104696294, eps=14.947919164492593
current_constant=4, alpha=2.7881643639977356, eps=17.861121033014
current_constant=5, alpha=2.6036578950410423, eps=20.551948814041253
current_constant=6, alpha=2.4669985432838435, eps=23.08419874777858
current_constant=7, alpha=2.360495652447485, eps=25.495916596130975
current_constant=8, alpha=2.2744494959477497, eps=27.811968501910776
current_constant=9, alpha=2.2030366150496095, eps=30.049671251820154
current_constant=10, alpha=2.1425203717085815, eps=32.2216609490976


In [32]:
dsl._cache_constant2epsilon

array([ 7.76621663, 11.68859625, 14.94791916, 17.86112103, 20.55194881,
       23.08419875, 25.4959166 , 27.8119685 , 30.04967125, 32.22166095])

In [27]:
class DSL_dummy:
    def __init__(self):
        self._cache_constant2epsilon = np.zeros(0)
        self.delta = 1e-6
    def generate_cache(self, new_size: int) -> None:
        new_entries = []
        current_size = len(self._cache_constant2epsilon)
        new_alphas = []
        for i in range(new_size - current_size):
            current_constant = i + 1 + current_size
            alph, eps = self._get_optimal_alpha_for_constant(
                constant=current_constant
            )
            print(f"current_constant={current_constant}, alpha={alph}, eps={eps}")
            new_entries.append(eps)
            new_alphas.append(alph)

        self._cache_constant2epsilon = np.concatenate(
            [self._cache_constant2epsilon, np.array(new_entries)]
        )

    def _get_fake_rdp_func(self, constant: int) -> Callable:
        def func(alpha: float) -> float:
            return alpha * constant

        return func

    def _get_alpha_search_function(self, rdp_compose_func: Callable) -> Callable:
        log_delta = np.log(self.delta)

        def fun(alpha: float) -> float:  # the input is the RDP's \alpha
            if alpha <= 1:
                return np.inf
            else:
                alpha_minus_1 = alpha - 1
                return np.maximum(
                    rdp_compose_func(alpha)
                    + np.log(alpha_minus_1 / alpha)
                    - (log_delta + np.log(alpha)) / alpha_minus_1,
                    0,
                )

        return fun

    def _get_optimal_alpha_for_constant(
        self, constant: int = 3
    ) -> Tuple[np.ndarray, Callable]:
        f = self._get_fake_rdp_func(constant=constant)
        f2 = self._get_alpha_search_function(rdp_compose_func=f)
        results = minimize_scalar(
            f2, method="Brent", bracket=(1, 2), bounds=[1, np.inf]
        )

        return results.x, results.fun

dsl = DSL_dummy()
dsl.generate_cache(10)

current_constant=1, alpha=4.508496357814772, eps=7.766216625311721
current_constant=2, alpha=3.5060933368492333, eps=11.688596249354894
current_constant=3, alpha=3.0573417104696294, eps=14.947919164492593
current_constant=4, alpha=2.7881643639977356, eps=17.861121033014
current_constant=5, alpha=2.6036578950410423, eps=20.551948814041253
current_constant=6, alpha=2.4669985432838435, eps=23.08419874777858
current_constant=7, alpha=2.360495652447485, eps=25.495916596130975
current_constant=8, alpha=2.2744494959477497, eps=27.811968501910776
current_constant=9, alpha=2.2030366150496095, eps=30.049671251820154
current_constant=10, alpha=2.1425203717085815, eps=32.2216609490976


  w = xb - ((xb - xc) * tmp2 - (xb - xa) * tmp1) / denom


In [28]:
dsl = DSL_dummy()
alphas = []
epsilons = []

# This will also serve as our step size
min_val = 0.0001

for i in np.arange(min_val, 1 + min_val, min_val):
    alpha, eps = dsl._get_optimal_alpha_for_constant(i)
    alphas.append(alpha)
    epsilons.append(eps)

print(alphas[:10], alphas[-10:])
print("Epsilon values")
print(epsilons[:10], epsilons[-10:])

[286.6113345763701, 206.94946372761132, 171.03566817267512, 149.39888177623587, 134.5200190425928, 123.47110612221316, 114.84171780748302, 107.85682934365019, 102.05028906097073, 97.12251083101317] [4.510027823812622, 4.5098575600812385, 4.509687321577367, 4.509517108285959, 4.509346920217402, 4.509176824091769, 4.509006619684135, 4.50883650720463, 4.508666419920537, 4.508496357814772]
Epsilon values
[0.05372712063485988, 0.07773597369831031, 0.09645750759431188, 0.11240310981848231, 0.12655841098879092, 0.13943329329185075, 0.15133263525161025, 0.16245613111787732, 0.17294310302157573, 0.1828953771900762] [7.762158289561581, 7.762609283833006, 7.763060261079307, 7.7635112213030055, 7.763962164506623, 7.764413090692681, 7.764863999863702, 7.765314892022202, 7.765765767170702, 7.766216625311721]


Okay so now we can generate cache values for any RDP constant!

Now we need a mapping from:
{arbitrary RDP constant} (float) -> index of cache (integer)

<br>
<hr>
<br>

What could we try:

1. scaling by a constant factor of 10,000 

This means that a given index i in the cache corresponds to an epsilon value for an RDP constant of i/10,000: 

<table>
<th>
    <tr>
        <th>index</th>
        <th>rdp</th>
    </tr>
    <tr>
        <td> 0 </td>
        <td> 0 </td>
    </tr>
        <tr>
        <td> 1 </td>
        <td> 0.0001 </td>
    </tr>
        <tr>
        <td> 2 </td>
        <td> 0.0002 </td>
    </tr>
    <tr>
        <td> ... </td>
        <td> ... </td>
    </tr>
        <tr>
            <td> <i>i</i> </td>
            <td> <i>i/1000</i> </td>
    </tr>
</table>
    
So if I'm trying to check the epsilon spend for an RDP constant of 80, that would be an index of 80*10_000 = 800,000.

Extending the cache could be very slow at this point- imagine having to extend the cache to support an RDP constant query of 300; you'd have to perform (300*1000) searches
- I wonder if we could approximate a search? Like find a simple polynomial to fit it and see how the % difference changes as i -> infinity, and see if we could use that as a reasonable substitution for manually conducting polynomial searches at high RDP constants

2. get floating point bitwise representation, convert it to integer representation




In [67]:
# SCALING BY 10,000
1/10_000

0.0001

In [29]:
min_val = 0.0001
rdp_constants = np.arange(min_val, 1 + min_val, min_val)
print(len(rdp_constants), rdp_constants[0], rdp_constants[-1])

dsl = DSL_dummy()
alphas = []
epsilons = []

# This will also serve as our step size


for i in rdp_constants:
    alpha, eps = dsl._get_optimal_alpha_for_constant(i)
    alphas.append(alpha)
    epsilons.append(eps)

print(alphas[:10], alphas[-10:])
print("Epsilon values")
print(epsilons[:10], epsilons[-10:])

10000 0.0001 1.0
[286.6113345763701, 206.94946372761132, 171.03566817267512, 149.39888177623587, 134.5200190425928, 123.47110612221316, 114.84171780748302, 107.85682934365019, 102.05028906097073, 97.12251083101317] [4.510027823812622, 4.5098575600812385, 4.509687321577367, 4.509517108285959, 4.509346920217402, 4.509176824091769, 4.509006619684135, 4.50883650720463, 4.508666419920537, 4.508496357814772]
Epsilon values
[0.05372712063485988, 0.07773597369831031, 0.09645750759431188, 0.11240310981848231, 0.12655841098879092, 0.13943329329185075, 0.15133263525161025, 0.16245613111787732, 0.17294310302157573, 0.1828953771900762] [7.762158289561581, 7.762609283833006, 7.763060261079307, 7.7635112213030055, 7.763962164506623, 7.764413090692681, 7.764863999863702, 7.765314892022202, 7.765765767170702, 7.766216625311721]


In [33]:
min_val = 0.0001
rdp_constants = np.concatenate((np.arange(min_val, 50 + min_val, min_val),np.arange(51, 700_051)))
print(len(rdp_constants), rdp_constants[0], rdp_constants[-1])

1200000 0.0001 700050.0


In [38]:
rdp_constants[500_000]

50.00000000000001

In [39]:
from tqdm import tqdm
dsl = DSL_dummy()
epsilons = []

# This will also serve as our step size


for i in tqdm(rdp_constants):
    _, eps = dsl._get_optimal_alpha_for_constant(i)
    epsilons.append(eps)



  w = xb - ((xb - xc) * tmp2 - (xb - xa) * tmp1) / denom
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1200000/1200000 [04:41<00:00, 4265.25it/s]


In [40]:
epsilons[:5]

[0.05372712063485988,
 0.07773597369831031,
 0.09645750759431188,
 0.11240310981848231,
 0.12655841098879092]

In [41]:
epsilons[-5:]

[706259.3859365697,
 706260.3903782812,
 706261.3948199897,
 706262.3992616949,
 706263.403703397]

In [43]:
epsilons[convert_constants_to_indices(np.array([1]))[0]]

[ True]
[1]
sub50:  [10000]
gt50:  [0]


7.766667466447775

In [44]:
np.save('constant2epsilon_1200k.npy', epsilons) # save

In [82]:
lowest_index_with_epsilon_spend_of(25)

'epsilon spend >= 25 in index=6 with a value=25.495916596130975'

In [83]:
lowest_index_with_epsilon_spend_of(50)

'epsilon spend >= 50 in index=19 with a value=51.719404463000245'

Perhaps we could have 10,000 values between RDP_Constant from 0 to 50- resulting in 50 * 10,000 = 500,000 values

Then we could use 700,000 values to support RDP_constants from 50 to 750,000?

So our cache would look like:


<table>
<th>
    <tr>
        <th>index</th>
        <th>rdp</th>
    </tr>
    <tr>
        <td> 0 </td>
        <td> 0.0001 </td>
    </tr>
    <tr>
        <td> 1 </td>
        <td> 0.0002 </td>
    </tr>
    <tr>
        <td> 2 </td>
        <td> 0.0003 </td>
    </tr>
    <tr>
        <td> ... </td>
        <td> ... </td>
    </tr>
    <tr>
        <td> <i>rdp_constant*10_000 - 1</i> </td>
        <td> <i>rdp_constant less than 50 </i> </td>
    </tr>
    <tr>
        <td> ... </td>
        <td> ... </td>
    </tr>
    <tr>
        <td> 500,000 </td>
        <td> 50 </td>
    </tr>
    <tr>
        <td> 500,001 </td>
        <td> 51 </td>
    </tr>
    <tr>
        <td> 500,002 </td>
        <td> 52 </td>
    </tr>
    <tr>
        <td> rdp_constant - 50 + 500,000 </td>
        <td> rdp_constant greater than 50 </td>
    </tr>
    <tr>
        <td> ... </td>
        <td> ... </td>
    </tr>
        <tr>
        <td> 1,200,000 </td>
        <td> 700,050 </td>
    </tr>
</table>

This would let us support a minimum PB query of 0.05, 

In [84]:
current_cache[19:29]

array([51.71940446, 53.52268593, 55.30736617, 57.07468954, 58.82576726,
       60.56159652, 62.28307625, 63.9910202 , 65.6861678 , 67.36919338])

In [87]:
dsl._get_optimal_alpha_for_constant(20)

(1.8130347005616798, 51.719404463000245)

In [24]:
def convert_constants_to_indices(rdp_constant_array: np.ndarray) -> np.ndarray:
    """
    Given an array of RDP Constants, this will return an array of the same size/shape telling you which indices in the DataSubjectLedger's cache you need to query.
    
    This currently assumes the cache generated on May 4th 2022, where there are 1.2M values in total.
    - 500,000 of these correspond to RDP constants between 0 and 50 (10,000 between any two consecutive integers)
    - 700,000 of these correspond to RDP constants between 50 and 700,050    
    
    An easy way to check if you're using the right cache is that the very first value in the cache should be 0.05372712063485988
    
    MAKE SURE THERE ARE NO ZEROS IN THE CACHE!!
    """
    # Find indices for all RDP constants <= 50
    sub50_mask = rdp_constant_array <= 50
    sub50_indices = (((rdp_constant_array - 1) * sub50_mask) * 10_000).astype(int)
    
    # Find indices for all RDP constants > 50
    gt50_mask = rdp_constant_array > 50
    gt50_indices = ((rdp_constant_array - 51 + 500_000) * gt50_mask ).astype(int)
    
    # We should be able to do a straight addition because 
    return sub50_indices + gt50_indices
    
    

def get_epsilon_spent(rdp_constant_array: np.ndarray, cache: np.ndarray) -> np.ndarray:
    indices = convert_constants_to_indices(rdp_constant_array)
    epsilon_spent = cache.take(indices)
    return epsilon_spent

In [25]:
mock_constants = np.random.randint(low=1, high=100, size=(10))
print("Our mock constants are")
print(mock_constants)

idx = convert_constants_to_indices(mock_constants)
print("These were turned into indices of:")
print(idx)

Our mock constants are
[71 41 19 64  9 51 22 87 10 61]
[False  True  True False  True False  True False  True False]
[ 0 41 19  0  9  0 22  0 10  0]
sub50:  [     0 410000 190000      0  90000      0 220000      0 100000      0]
gt50:  [500021      0      0 500014      0 500001      0 500037      0 500011]
These were turned into indices of:
[500021 410000 190000 500014  90000 500001 220000 500037 100000 500011]


# TODO:
- Make sure 0 is not in the cached array!!! otherwise infinite data leaks