Add joint ZK data guilds post with Lily #8

Open
Divide-By-0 opened this issue Sep 24, 2023 · 3 comments

Comments

@Divide-By-0
Owner

Divide-By-0 commented Sep 24, 2023

Draft for any signed in user: https://hackmd.io/CMnvMcmzR7CRXOxCLAi2LQ

An alternative to traditional ad platforms.

Discussion:

FAQ (by me and lily):

clarify more how the alternative to platform attestations would work?

yeah so you could bootstrap off of either

  1. the ads that were clicked in the past, in which the server sends you a signature and only pays you once you prove, via a zk proof on chain, that your vector has transitioned
  2. general proof carrying data of any kind, where pcds include any provable data such as those parsable by https://pcd.team/
  • is a given person limited to one group, or can they be in many?

either! if in many, when ads are served to them it would just randomly choose which group they'd prove membership in, maybe based on a probabilistic breakdown. you could also be in a "group" where everyone spends similar amounts to you so that you don't feel like you're carrying freeloaders. it might make sense for it just to be whichever group the user is in that attracts the highest advertiser bid. but yeah the math seems to work out fine regardless!
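A minimal sketch of the two selection rules mentioned above (sample a group from a probabilistic breakdown, or take the group with the highest bid). The group names, bid values, and function names are all made up for illustration; nothing here is part of the draft.

```python
import random

def pick_group_by_bid(user_groups, bids):
    """Return the user's group with the highest advertiser bid.

    Groups with no bid count as 0; ties go to the group listed first.
    """
    return max(user_groups, key=lambda g: bids.get(g, 0.0))

def pick_group_at_random(user_groups, weights, rng=random):
    """Sample one of the user's groups from a probabilistic breakdown."""
    return rng.choices(user_groups, weights=[weights[g] for g in user_groups], k=1)[0]

# Hypothetical example data.
groups = ["cyclists", "home-cooks", "low-spenders"]
bids = {"cyclists": 0.40, "home-cooks": 0.55}
breakdown = {"cyclists": 0.5, "home-cooks": 0.3, "low-spenders": 0.2}

best = pick_group_by_bid(groups, bids)
sampled = pick_group_at_random(groups, breakdown)
```

Either rule only decides which membership the user proves for a given ad slot; the proof itself is out of scope for this sketch.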

  • is this more for existing platforms or as a model for a new platform? or agnostic?

seems like a thing that e.g. bluesky or friend tech might be more into, but i doubt legacy platforms will adopt it

  • did you have in mind a system where the platform pays out part of the ad money to the user, or is that an orthogonal concern to privacy

this is how we encourage users to use this system! you could bootstrap a network effect for a new platform in this way. (if so, I wonder why no one has successfully done that before, since the ZK part isn't necessary for the ad revenue sharing? maybe because they need users to get advertisers and advertisers to pay users, so it's a chicken-and-egg problem?)

  • where does the ad vector come from? is it based on lookalike audiences?

it could be any vector of features, i.e. an n-dimensional representation describing users via facets that people care about! advertisers running regular targeted ads either provide an explicit list of features they want, or just generate some high-dimensional vector similar to the vectors of their existing customers.

if it's the latter, it looks like they would just cross-reference emails to see which of their customers are on the platform and then aggregate those users' vectors, and then target similar vectors. but if you were setting up a system like this with a new platform that hadn't previously done data collection/selling, how would you bootstrap the set of lookalike users? maybe just pay them for use of their email data, I guess?

the former would be easier, but then you'd either have to have the users explicitly provide demographic data (with more financial incentive to distort it) or infer it, in which case I guess you'd have to run the inference locally too and prove you did it honestly. I guess you could send a proof of aggregate data about ads you'd clicked in the past and then the advertiser looks for people who clicked on ads similar to their ad, so working off ad similarity rather than user similarity

how do you deal with bots?

Idea 1, which I like less, is that you could combine it with proof of unique humanity. it's tough though because of the privacy/nullifier tradeoff. Maybe proof of personhood, e.g. a zk proof of gitcoin passport score, can increase your dividends?

Idea 2 (the most promising) is that spending $ is an effective way (maybe the only simple way?) to detect bots. If click-through-to-purchase rate, rather than vector similarity, were the metric that sets the payment ratio, then bot farms become effectively useless. Maybe people who are part of a guild where people actually click through to purchase get an attestation that they did, and get a significantly larger chunk of the ad $ pool. A guild with 0% purchase rates (i.e. all bots) will get 0% of the revenue. so it's advantageous to be part of a guild where folks are buying at a similar rate as you are [and maybe you're even automatically made part of one], so simultaneously people can't freeload and bots are penalized?

New users can also be much cheaper to serve, so then the market evens it out!
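A rough sketch of how Idea 2 could pay out. The comment only says that high-purchase guilds get "a significantly larger chunk" and that a 0% guild gets 0%; splitting the pool in direct proportion to purchase rate is an assumption made here for illustration, and the guild names and numbers are invented.

```python
def split_ad_pool(pool, guild_stats):
    """Split `pool` across guilds in proportion to click-to-purchase rate.

    guild_stats maps guild name -> (clicks, purchases).
    Proportional splitting is an assumed rule, not something the thread fixes.
    """
    rates = {g: (p / c if c else 0.0) for g, (c, p) in guild_stats.items()}
    total = sum(rates.values())
    if total == 0:
        # Nobody purchased anything, so nobody is paid.
        return {g: 0.0 for g in rates}
    return {g: pool * r / total for g, r in rates.items()}

# Hypothetical campaign: the bot farm clicks a lot and buys nothing.
stats = {
    "buyers": (200, 20),     # 10% purchase rate
    "browsers": (100, 5),    # 5% purchase rate
    "bot_farm": (5000, 0),   # 0% purchase rate
}
payouts = split_ad_pool(100.0, stats)
```

With these numbers the bot farm receives nothing no matter how many clicks it generates, and the guild with twice the purchase rate receives twice the payout.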

Idea 3 is that users actually want to see more relevant ads as long as their data is theirs, and so they'll update honestly. Their zk proof is of the form: I know the preimage of the on-chain commitment to this ID hash and vector. This ad vector, which has a valid signature from adco x, was added to or multiplied into it. The new vector commitment, hashed with the same ID commitment, is y. On chain you only see the hash.
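To make the statement in Idea 3 concrete, here is a plain (non-ZK) check of the relation the proof would attest to. Everything is an assumption for illustration: SHA-256 stands in for whatever SNARK-friendly hash a real circuit would use, an HMAC stands in for adco x's public-key signature, and the update is taken to be element-wise addition. In the real scheme the verifier would see only the two commitments, not the inputs.

```python
import hashlib
import hmac
import json

def commit(id_secret: bytes, vector: list) -> str:
    """Commitment to (ID secret, vector). SHA-256 is a stand-in hash."""
    return hashlib.sha256(id_secret + json.dumps(vector).encode()).hexdigest()

def sign_ad_vector(adco_key: bytes, ad_vector: list) -> bytes:
    """Stand-in for the ad company's signature (HMAC, NOT a real signature scheme)."""
    return hmac.new(adco_key, json.dumps(ad_vector).encode(), hashlib.sha256).digest()

def transition_is_valid(old_c, new_c, id_secret, old_vector, ad_vector, ad_sig, adco_key):
    """The relation a zk proof would show without revealing the private inputs."""
    sig_ok = hmac.compare_digest(ad_sig, sign_ad_vector(adco_key, ad_vector))
    old_ok = commit(id_secret, old_vector) == old_c
    new_vector = [a + b for a, b in zip(old_vector, ad_vector)]  # assumed: additive update
    new_ok = commit(id_secret, new_vector) == new_c
    return sig_ok and old_ok and new_ok

# Hypothetical values.
id_secret, adco_key = b"user-id-secret", b"adco-x-key"
old_vec, ad_vec = [1, 0, 2], [0, 1, 1]
old_c = commit(id_secret, old_vec)
new_c = commit(id_secret, [1, 1, 3])
sig = sign_ad_vector(adco_key, ad_vec)
ok = transition_is_valid(old_c, new_c, id_secret, old_vec, ad_vec, sig, adco_key)
```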

how do you preserve user privacy while serving user-level relevant ads?

Neither the platform nor ad server knows the vector of the user. The advertiser can either get a single average aggregate vector, or a smoothed convex hull of the vectors, or mean and variance in each dimension, then based on that can give the most relevant set of ads or decline to. Those ads can be all the same or heterogeneous and distributed internally in the group to the most relevant party!
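As an illustration of one of the aggregates mentioned above (mean and variance in each dimension), here is a small sketch that computes them in the clear. In the proposed system no single party would hold the individual vectors like this; the point is only to show what the advertiser would receive. The function name and sample vectors are made up.

```python
from statistics import mean, pvariance

def guild_summary(vectors):
    """Per-dimension mean and population variance over a guild's vectors."""
    dims = list(zip(*vectors))  # transpose: one tuple of values per dimension
    return {
        "mean": [mean(d) for d in dims],
        "variance": [pvariance(d) for d in dims],
    }

# Hypothetical two-member guild with two-dimensional vectors.
summary = guild_summary([[1.0, 0.0], [3.0, 4.0]])
# summary == {"mean": [2.0, 2.0], "variance": [1.0, 4.0]}
```

An advertiser looking at this summary learns where the guild sits on average and how spread out it is, but not which member holds which vector.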

@lilyjordan
Collaborator

Remaining questions:

1. Inputs and outputs to model
In the following sequence:
a. User keeps some info from platform server (logs, attestations, model output, etc)
b. User contributes proof to recursive SNARK
c. Guild submits completed snark to advertiser for payment
d. Advertiser sends guild set of ads
e. Guild tells platform which ads should go to which users

for each step, can you write out really specifically what type of data gets passed? especially the inputs and outputs to the model that the advertiser receives at step c?

particular points of confusion:

  • are click rate, purchase rate, and vector similarity (eg for vectors of personal characteristics) all direct substitutes for each other?
  • what exactly are click rate and/or purchase rate conditioned on? eg, rate of guild users who have clicked on similar ads in the past (and how is ad similarity measured)? or rate of guild users who have clicked on this advertiser's ads in the past? or could you even have a system where the guild is paid after the ad campaign is run, based on how many users click the ads?

2. How to preserve privacy when computing the SNARK
If you (a user) pass your vector into a recursive computation step, the rest of the guild (or at least the next user) can determine what your vector was, right? So how would we preserve anonymity here?

3. How the ad-serving step works
If the group is homogeneous enough to all be worth targeting with the same ad, then that defeats the point of privacy. So presumably they're heterogeneous and get served different ads. But in that case, how does the guild distribute ads among users without compromising their privacy? Eg, if you bid on an ad relevant to you, you're kind of doxxing yourself as being close to the target audience for that ad. Or if the guild assigns ads to users, how does it do that without knowing individual users' vectors?

@RiverRuby

  1. How to preserve privacy when computing the SNARK
    If you (a user) pass your vector into a recursive computation step, the rest of the guild (or at least the next user) can determine what your vector was, right? So how would we preserve anonymity here?

If everyone in the guild can't see every update, then intuitively I don't think this is true. Here's a potential scheme:

Initialize with some random noise, and then people add their data one by one, or the vector makes multiple random rounds through the group where people add different subsets of their recent data. Then everyone will only see the aggregate vector so far, and won't know what it looked like before or even who came before them in line?
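A toy version of this scheme, assuming the noise is a random mask known only to whoever starts the round and that they subtract it at the end (the comment does not say how the noise is removed, so that step is an assumption). Arithmetic is done modulo a fixed number so the running total looks uniformly random to each participant. All names are hypothetical.

```python
import secrets

MODULUS = 2**32  # assumed: vector entries are small non-negative integers

def start_round(dim):
    """Initiator draws a random mask; the running total starts as that mask."""
    mask = [secrets.randbelow(MODULUS) for _ in range(dim)]
    return mask, list(mask)

def add_contribution(running, vector):
    """Each user adds their vector to the masked total they were handed."""
    return [(r + v) % MODULUS for r, v in zip(running, vector)]

def finish_round(running, mask):
    """Initiator removes the mask, leaving only the sum of contributions."""
    return [(r - m) % MODULUS for r, m in zip(running, mask)]

mask, running = start_round(3)
for user_vector in ([1, 2, 3], [4, 5, 6], [0, 1, 0]):
    running = add_contribution(running, user_vector)
aggregate = finish_round(running, mask)
```

One caveat with a single pass in a fixed order: if the people immediately before and after a user compare what they saw, the difference is that user's contribution, which may be part of why the comment suggests multiple random rounds with partial data.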

@RiverRuby
