Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When Client and Server items are in some certain sizes, server.CreateSetupMessage raise ValueError: basic_string::resize #143

Closed
chesterxgchen opened this issue Jan 5, 2023 · 13 comments · Fixed by #151
Labels
Type: Bug 🐛 Some functionality not working in the codebase as intended

Comments

@chesterxgchen
Copy link

chesterxgchen commented Jan 5, 2023

Description

When client items size and server items size are in certain values, the PSI code throw basic_string::resize Value Error.
For example, using the tests.py https://github.com/OpenMined/PSI/blob/master/private_set_intersection/python/tests.py#L28

if we change the client items and server items range into the following values:
client_items = ["Element " + str(i) for i in range(333334)]
server_items = ["Element " + str(2 * i) for i in range(397)]

You should be able to get the exception

File ~/nvflare-env2/lib/python3.8/site-packages/openmined_psi/__init__.py:140, in server.CreateSetupMessage(self, fpr, num_client_inputs, inputs)
    129 def CreateSetupMessage(
    130     self, fpr: float, num_client_inputs: int, inputs: List[str]
    131 ) -> ServerSetup:
    132     """Create a setup message from the server's dataset to be sent to the client.
    133     Args:
    134         fpr: the probability that any query of size `num_client_inputs` will result in a false positive.
   (...)
    138         A Protobuf with the setup message.
    139     """
--> 140     interm_msg = self.data.CreateSetupMessage(fpr, num_client_inputs, inputs).save()
    141     msg = ServerSetup()
    142     msg.ParseFromString(interm_msg)

ValueError: basic_string::resize

How to Reproduce

  1. check the code in https://github.com/chesterxgchen/psi_mpc_related/blob/main/psi/psi_intersect.ipynb
    command Cell No. 5 should re-produce the error.
    The code in this notebook is essentially the same as https://github.com/OpenMined/PSI/blob/master/private_set_intersection/python/tests.py#L28

  2. pip install the openmined.psi

  3. start the jupyter notebook with https://github.com/chesterxgchen/psi_mpc_related/blob/main/psi/psi_intersect.ipynb

  4. execute the import the psi packages, and def dup() function, then run cmd 5, to see error

Expected Behavior

No error

Screenshots

image

System Information

  • OS: Ubuntu
  • OS Version: 20.04
  • Openmined PSI version: openmined.psi-0.3.5 ( directly downloaded from Pipy)
  • Language Version: python 3.8.16 (default, Dec 7 2022, 01:12:06)
  • Package Manager: Version pip 22.3.1
  • Browser (if applicable): NA
  • Browser Version (if applicable): NA

Additional Context

We experience this issue while testing our openmined-based PSI solution

@chesterxgchen chesterxgchen added the Type: Bug 🐛 Some functionality not working in the codebase as intended label Jan 5, 2023
@chesterxgchen chesterxgchen changed the title when PSI Client and Serve items in some certain size, CreateSetupMessage raise ValueError: basic_string::resize When PSI Client and Server items are in some certain sizes, server.CreateSetupMessage raise ValueError: basic_string::resize Jan 5, 2023
@chesterxgchen chesterxgchen changed the title When PSI Client and Server items are in some certain sizes, server.CreateSetupMessage raise ValueError: basic_string::resize When Client and Server items are in some certain sizes, server.CreateSetupMessage raise ValueError: basic_string::resize Jan 5, 2023
@s0l0ist
Copy link
Contributor

s0l0ist commented Jan 5, 2023

This is a bug with the GCS data structure and having an extremely low fpr. The data structure enum (GCS, BloomFilter) was added much later and was never exposed as a configurable parameter in Python's binding for CreateSetupMessage. We will publish a new release to allow you to specify BloomFilter which temporarily solves the underlying problem.

@chesterxgchen
Copy link
Author

great thanks

@s0l0ist
Copy link
Contributor

s0l0ist commented Jan 6, 2023

This has been fixed in https://github.com/OpenMined/PSI/releases/tag/v1.0.1

@s0l0ist s0l0ist closed this as completed Jan 6, 2023
@chesterxgchen
Copy link
Author

@s0l0ist Nick, seem when the fpr is small 1e-11, it still fails. I updated the notebook:
https://github.com/chesterxgchen/psi_mpc_related/blob/main/psi/psi_intersect.ipynb
In cell 10, you will get
ValueError: basic_string::_M_replace_aux

But I did try with larger fpr=1e-9, the code seems to work

@s0l0ist
Copy link
Contributor

s0l0ist commented Jan 9, 2023

Yep, looks like smaller false positive rate causes the backing GCS datastructure to try and allocate too much memory. This is 'fixed' in the sense that you must use the psi.DataStructure.BLOOM_FILTER enum and pass it into CreateSetupMessage as the last argument (overriding the default GCS option when not specified).

So I admit, the bug still exists in GCS, but the workaround is to use bloom filters.

setup = dup(
        duplicate, s.CreateSetupMessage(fpr, len(client_items), server_items, psi.DataStructure.BLOOM_FILTER), psi.ServerSetup()
)

@chesterxgchen
Copy link
Author

Let me try this. I don't mind extra argument parameter.

@chesterxgchen
Copy link
Author

This works:

setup = dup(
        duplicate, s.CreateSetupMessage(fpr, len(client_items), server_items, psi.DataStructure.BLOOM_FILTER), psi.ServerSetup())

@s0l0ist
Copy link
Contributor

s0l0ist commented Jan 9, 2023

Ah yes, I put the param outside of the function. Oops.

@chesterxgchen
Copy link
Author

@s0l0ist Nick,
Unfortunately, the work around doesn't work. adding extra filter fixed the exception. But now the resulting result is incorrect in some cases. For example:

https://github.com/chesterxgchen/psi_mpc_related/blob/main/psi/psi_intersect.ipynb

Compare Cell 11 and 14. Cell 11 is with extra argument, Cell 14 is not.

In Cell 11, the resulting intersect is 19 items which is larger than the Server Items of 17. Noticed that few larger numbers is much larger than max number of the item from server.

Cell 14 is without extra argument, the result is correct.

@s0l0ist
Copy link
Contributor

s0l0ist commented Jan 10, 2023

Looking at the example code, it looks like the client has many more values than the server. This will yield much higher chances of receiving a false positive value in return and isn't the intended way of how to use this library regardless of the backing data structure used.

The protocol is asymmetric in the sense that both client and server sets can be different sizes, but the real use case is for when the client has much fewer items in its set and wants to check against a server with many more items in its set.

If you flip the client and server set sizes, you should get more reasonable results in expectation of the FPR you've set.

@chesterxgchen
Copy link
Author

chesterxgchen commented Jan 11, 2023

Nick,
This is a different use case. In Federated Learning (FL) case for example, we have two sites participating to the FL study.
They need to find out the intersect of the users (user id match) before they can do the training. Different from the normal social contact use cases for PSI, we need let both sites know the intersection, not just one site.
see site-1 has 1,000 user_ids
site-2 has 10,000,000 user ids
We can treat site-1 as PSI client, site-2 as PSI Server ==> find the intersect, assume the intersection size = 100
at this point, the intersection is only known to site-1. Site-2 doesn't know it.
so we have to reverse the process, make site-1 as PSI server, and site-2 as PSI Client , and take the intersection as the site-1's items ==> find the intersect, which gives site-2 the intersections. In the reverse process, site-2 items = 10,000,000, site-1 items = 100.

All the examples I showed in the notebook ( client and server items) are not arbitrarily chosen, they are actually from test failure cases, which I simplified to put notebook

@s0l0ist
Copy link
Contributor

s0l0ist commented Jan 14, 2023

I see, yeah this is a bug. I'll investigate.

@s0l0ist
Copy link
Contributor

s0l0ist commented Jan 14, 2023

There was a bug that produced more erroneous results than normal; however, it is quite possible for false positives to manifest themselves in the form of intersection index values that are out of range of the original client set. Checkout the my comments in the PR to handle those cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug 🐛 Some functionality not working in the codebase as intended
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants