Help create Linnia IRIS (InfoRmation Integrity Score) score algorithm #33

Open
godfreyhobbs opened this issue Jun 1, 2018 · 19 comments

Comments

@godfreyhobbs
Contributor

godfreyhobbs commented Jun 1, 2018

Context

Linnia is a core component of the future of the web: Web 3.0. Linnia is a new Ethereum blockchain protocol that brings the power of decentralization to your lifetime data. The Linnia protocol provides the foundation for secure decentralized applications in multiple spheres, including the sphere of electronic healthcare records.

What

We would like to incentivize you, as a member of the Gitcoin/Bounties family, to innovate and help create the Linnia IRIS score algorithm. IRIS stands for InfoRmation Integrity Score. The Linnia IRIS score is a critical part of the Linnia protocol. We will award a 0.7 ETH bounty to the submission for this task, assuming the requirements below are met.
The proposed Linnia IRIS score algorithm must be in the spirit of the following two Linnia papers:

  1. https://github.com/ConsenSys/linnia-resources/blob/master/Technical-Whitepaper.md
  2. https://github.com/ConsenSys/linnia-resources/blob/master/Introducing%20Linnia.pdf

In particular, the following section describes the IRIS score:

  1. https://github.com/ConsenSys/linnia-resources/blob/master/Technical-Whitepaper.md#6-quality-scoring-with-iris

Note: Linnia is a WORK IN PROGRESS. The Linnia smart contracts are only a small subset of what is described in these papers.

Our Ideas

  1. IRIS score should increase as the number of attestations increases
  2. IRIS score may consider the following:
    1. Attestations
    2. Provenance
    3. Metadata including keywords
    4. User Roles
  3. IRIS score of zero indicates junk or garbage data
  4. IRIS score must reflect a real-world value

Requirements

  1. You have read and understood the WIP nature expressed in the README.md
  2. It must be impossible to game the IRIS score algorithm
  3. IRIS score algorithm must work end-to-end
  4. Must NOT be limited to the medical domain
  5. Your ideas must be submitted as Documentation and Code in a fork of this repo
  6. Consider the use of encryption to keep data secure. The computation must not leak or reveal any information related to the nature of the underlying encrypted data
  7. Provide two or three real-world non-trivial end-user facing use cases for your IRIS score algorithm
  8. Be prepared to present your submission to the Linnia team and the Linnia community
  9. Be prepared for a review of your code by the Linnia team and the Linnia community
  10. Finally, use your imagination
@lookfwd

lookfwd commented Jun 1, 2018

Nice!

@godfreyhobbs godfreyhobbs changed the title Help create Linnia IRIS (InfoRmation Integrity Score) score algorithm draft -- Help create Linnia IRIS (InfoRmation Integrity Score) score algorithm Jun 1, 2018
@godfreyhobbs godfreyhobbs changed the title draft -- Help create Linnia IRIS (InfoRmation Integrity Score) score algorithm Help create Linnia IRIS (InfoRmation Integrity Score) score algorithm Jun 4, 2018
@gitcoinbot

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 0.7 ETH (419.71 USD @ $599.58/ETH) attached to it.

@timxor

timxor commented Jun 22, 2018

Only a 2 week window for this — closed already?

@godfreyhobbs
Contributor Author

16 days left

@satoshi101

I can see that for x = IRISmax and y = IRISmax, f = IRISmax/3, which seems counterintuitive. It's also quite counterintuitive that the user's own attestation counts that negatively. The maximum of f is at y = IRISmax and x = 0, in which case f = IRISmax. Again, it's a little counterintuitive that to maximize my IRIS score I should set IRIS(user_x_appreciation_of_data) to 0. To make f zero, you need x = IRISmax (I guess that's a malicious submission) and y = 0, i.e. the crowd believes the data is useless. This makes sense.
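The corner cases in this comment can be checked numerically. The comment that first defined f does not appear in this thread, but if f is linear in x and y, the three cases above pin it down uniquely as f(x, y) = (-2x + y + 2·IRISmax)/3. This sketch uses that form as an assumption, with x as the owner's self-rating, y as the crowd's rating, and a 0-10 scale (IRISmax = 10). The names are hypothetical.

```python
from fractions import Fraction

IRIS_MAX = 10  # assumed 0-10 scale


def f(x, y, iris_max=IRIS_MAX):
    """Assumed combination rule: (-2x + y + 2*IRISmax) / 3.

    x: owner's self-rating, y: crowd rating. Fractions keep thirds exact.
    """
    return (Fraction(y) - 2 * Fraction(x) + 2 * iris_max) / 3


# The three corner cases discussed in the comment.
both_max = f(IRIS_MAX, IRIS_MAX)   # owner and crowd both rate IRISmax
best_case = f(0, IRIS_MAX)         # owner rates 0, crowd rates IRISmax
worst_case = f(IRIS_MAX, 0)        # owner rates IRISmax, crowd rates 0

print(both_max, best_case, worst_case)  # 10/3 10 0
```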

@uivlis

uivlis commented Jul 6, 2018

So I made a couple of mistakes here. In the formulas above, instead of:

IRIS(attestation_for_data_of_user_y) = { IRIS(user_x_appreciation_for_data_of_user_y), if no IRIS(other_users_appreciation_of_data_of_user_y) ; f(median(IRIS(other_users_appreciation_for_data_of_user_y)), IRIS(user_x_appreciation_for_data_of_user_y)), otherwise}

should be:

IRIS(attestation_for_data_of_user_y) = { IRIS(user_y_appreciation_for_data_of_user_y), if no IRIS(other_users_appreciation_of_data_of_user_y) ;
f(IRIS(user_y_appreciation_for_data_of_user_y), median(IRIS(other_users_appreciation_for_data_of_user_y))), otherwise}

The idea behind f(x, y) = (-2x + y + IRISmax) /3 is as it goes:
The problem of the value of information (carats of diamonds, ...) is a dual issue, like a Cartesian plane in which the vertical axis, representing imaginary numbers, is your imagination or your own opinion of the value, and the horizontal axis, representing real numbers, is the reality check, i.e. others' opinion of the value of the information.

Theoretically, the most valuable information, or person, or diamond, or whatever is that which considers themselves not of great importance, but others see great potential. The self-assessment matters more, since the real axis might not exist at the beginning.

You are by no means incentivized either to over-grade your data or to under-grade it. But you are incentivized to be careful in making your decision, which may result in a lower self-adjustment, yet a higher other-adjustment. This is because your IRIS score is first given by your own measurement => you would like higher self-grade. You would also like higher self-grade since you will be graded faster by others who see a great IRIS next to your profile because they also get that same IRIS when there is no other attestation than theirs.
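The corrected rule above can be written as a short function. This is a minimal sketch. It assumes a 0-10 scale (IRISmax = 10) and uses the corrected f(x, y) = (-2x + y + IRISmax)/3 exactly as written above, with x as the owner's self-rating and y as the median of the other users' ratings. The function names are hypothetical. Evaluating f at the extremes shows that this form ranges from -IRISmax/3 to 2·IRISmax/3, which is the normalization issue raised in the reply below.

```python
from fractions import Fraction
from statistics import median

IRIS_MAX = 10  # assumed 0-10 scale


def f(x, y, iris_max=IRIS_MAX):
    """Corrected combination rule: (-2x + y + IRISmax) / 3.

    x: the owner's self-rating of their data
    y: the median rating given by other users (may be a float for even counts)
    """
    return (Fraction(y) - 2 * Fraction(x) + iris_max) / 3


def attestation_iris(self_rating, other_ratings, iris_max=IRIS_MAX):
    """IRIS of a datapoint under the piecewise rule.

    With no ratings from other users, the owner's self-rating is used as-is;
    otherwise it is combined with the median of the others' ratings.
    """
    if not other_ratings:
        return Fraction(self_rating)
    return f(self_rating, median(other_ratings), iris_max)


print(attestation_iris(7, []))         # 7
print(attestation_iris(7, [6, 8, 9]))  # 4/3, since (-14 + 8 + 10) / 3

# Extremes of f over the 0-10 square: the result is not confined to 0-10.
lowest = f(IRIS_MAX, 0)   # -10/3
highest = f(0, IRIS_MAX)  # 20/3
```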

@satoshi101

satoshi101 commented Jul 8, 2018

Theoretically, the most valuable information, or person, or diamond, or whatever is that which considers themselves not of great importance, but others see great potential.

"Theoretically" based on which theory? "the most valuable [...] is that which considers themselves not of great importance" - I don't quite see that. I could see why something of great importance might be obvious to the owner as such. "others see great potential" - or they might not... this is "wisdom of the crowd" VS "wisdom of experts". It's opinion - not something we can easily conclude or verify. Except if there's some obvious theory I'm not aware of.

This is because your IRIS score is first given by your own measurement => you would like higher self-grade. You would also like higher self-grade since you will be graded faster by others who see a great IRIS next to your profile because they also get that same IRIS when there is no other attestation than theirs.

I can see a game-theoretic argument here, but it relies on mechanics that I don't see defined anywhere right now. More specifically:

  • "see a great IRIS next to your profile" - I'm not aware of any IRIS score for profiles. Right now it belongs to datapoints. You might need to define it.
  • "they also get that same IRIS when there is no other attestation than theirs" - You describe some back-propagation of IRIS between profiles as far as I can see. I haven't seen this mechanism described anywhere. You might need to define it.

I don't get those fine details of this method. When I put whatever I understand into code, this is what I see:

[Figure: simulation output]

You can find the code here, and in the same directory there's the Jupyter notebook with the analysis. I used the original formula with the -2 * IRISmax, because with just -IRISmax it doesn't normalize properly to the 0-10 range.

Overall, any such approach seems to me to have an optimal vote for the owner of the data, e.g. it's always optimal for them to vote 7 or 8 (right now the optimum seems to be 0... but whatever). If they do so, then given an arbitrary evaluation, they won't damage their reputation, nor their datapoint's starting odds. Both in the simulation and overall, we're overlooking the fact that nobody would buy a datapoint with a self-rating of e.g. '2'... even if it's really trash and the median would reward the owner for, well... rating it as trash.
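The claim that the owner has an optimal vote can be checked by brute force. For every possible crowd median y, search for the self-rating x that maximizes f. This is a minimal sketch. It assumes integer ratings from 0 to 10 and the normalized form f(x, y) = (-2x + y + 2·IRISmax)/3, which is an assumption about the formula used in the simulation above. Because this f decreases in x, the best response is always x = 0, matching "right now the optimal seems to be 0".

```python
from fractions import Fraction

IRIS_MAX = 10  # assumed 0-10 integer rating scale


def f(x, y, iris_max=IRIS_MAX):
    """Assumed combination rule: (-2x + y + 2*IRISmax) / 3."""
    return (Fraction(y) - 2 * Fraction(x) + 2 * iris_max) / 3


def best_self_rating(y, iris_max=IRIS_MAX):
    """Owner's self-rating x in 0..iris_max that maximizes f for crowd rating y."""
    return max(range(iris_max + 1), key=lambda x: f(x, y, iris_max))


# Owner's best response for every possible crowd median.
best = {y: best_self_rating(y) for y in range(IRIS_MAX + 1)}
print(set(best.values()))  # {0}
```

A rational owner therefore submits the same self-rating whatever the quality of the data.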

Assuming there's an optimal voting strategy for the owner (e.g. vote 8), the rest of the formula is just the median. That's fine, but it isn't immune to real-life threats, e.g. lobbies that up-vote certain datapoints and/or down-vote other people or datapoints.
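To see why the median alone does not stop coordinated voting, consider a datapoint rated by a few honest users plus a lobby that always submits the minimum. This is a minimal sketch. The ratings are made-up illustrative values on a 0-10 scale. The median stays at 8 or 9 while the lobby is small, but it drops to 0 once the lobby casts more than half of the ratings.

```python
from statistics import median

honest = [8, 8, 9, 9, 9]  # hypothetical honest ratings of a good datapoint


def rated_median(lobby_size, lobby_vote=0):
    """Median rating after a lobby adds lobby_size identical votes."""
    return median(honest + [lobby_vote] * lobby_size)


for size in (0, 2, 4, 6):
    print(size, rated_median(size))
```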

What Linnia seems to need is Google's page-rank. There are four problems, though:

a) Google's page-rank is too expensive to implement on blockchain
b) Google's page-rank doesn't work in reality
c) Google's page-rank with all the Google hacks doesn't work in reality in a trustless environment
d) We don't even know if a single IRIS score can be defined

On a): yes, page-rank should be a native Ethereum operation, along with other ranking mechanisms. They will get there one day... but not soon.
On b): link farms etc. mean that Google's page-rank doesn't work in practice. Google's ranking is an ongoing process that keeps its results at an acceptable quality. It uses thousands of features for each page, quite a few of them user-provided (e.g. the average time before you return to Google), i.e. much more than just page-rank.
On c): in the stone age, Google started by giving .edu and other domains a high seed rank before doing random surfing with teleportation, i.e. it wasn't trustless, but exactly the opposite. More recently, Google does tons of testing and manual adjustment of ranking parameters to ensure safe and relevant results. This is done by operators who are Google employees; it's not algorithmic. I believe that if there were an algorithmic solution to the problem, Google wouldn't employ people to adjust the algorithm.
On d): Google now has a huge feature vector for each one of us and matches content according to how well it fits who we appear to be. There's no single "rank" for a page anymore. In Linnia's context, I can see different types of researchers having different data needs.
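For reference, the "random surfing with teleportation" mentioned above is PageRank computed by power iteration. This is a minimal sketch on a toy graph. The node names and the 0.85 damping factor are illustrative assumptions. Passing a `seeds` set makes teleportation land only on trusted nodes, in the spirit of the early .edu seeding, and that is exactly the non-trustless ingredient the comment points out.

```python
def pagerank(links, damping=0.85, seeds=None, iterations=100):
    """PageRank by power iteration.

    links: dict mapping each node to the list of nodes it links to.
    seeds: optional set of trusted nodes; teleportation jumps only to these
           (personalized PageRank). Defaults to all nodes.
    """
    nodes = sorted(links)
    targets = sorted(seeds) if seeds else nodes
    teleport = {n: (1.0 / len(targets) if n in targets else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = links[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: redistribute its rank via the teleport vector.
                for m in nodes:
                    new[m] += damping * rank[n] * teleport[m]
        rank = new
    return rank


# Toy graph: a small "farm" (f1, f2) links heavily to spam.
graph = {
    "edu": ["a"],
    "a": ["b", "edu"],
    "b": ["a"],
    "spam": ["f1"],
    "f1": ["spam", "f2"],
    "f2": ["spam", "f1"],
}
uniform = pagerank(graph)
seeded = pagerank(graph, seeds={"edu"})
print(round(sum(uniform.values()), 6), seeded["spam"])
```

With uniform teleportation, the isolated farm cluster collects a share of the rank. When teleportation is restricted to the seed, it collects none, because no link from the seed reaches it.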

Good book on the subject:

[Image: book cover]

https://www.amazon.com/Whos-1-Science-Rating-Ranking/dp/069116231X/

The cover looks silly but there's quite a bit of good mathematics in there, and it's very well written.

@uivlis

uivlis commented Jul 9, 2018

Ok, I wrote some very stupid comments and then I deleted them.

I think you may be right.
I'll have to come up with something different. I'll think.

@godfreyhobbs
Contributor Author

@uivlis @satoshi101 Thanks for the thoughtful discussion. It is really awesome.

Yes, it may be best to have a set of domain-specific IRIS scores.

It may be a useful exercise to pick a specific real-world use case and walk through how a domain-specific IRIS score would work.

@satoshi101
Copy link

@uivlis

Say I have a precious gem, the most precious in the world. Would I knock on everyone's door saying "look upon my precious gem"? Certainly not, for it would get stolen, even if they acknowledge that it is precious. However, if somebody saw the precious gem that I keep hidden, they would ask me to let them see it a bit more. And they would desire it precisely because I keep it hidden.

Implicitly, we might see traces of a "marketplace" here. The idea that IRIS is the "fair price", i.e. a value that, if someone pays it, lets them own that datapoint so that future cashflows go to them, sounds interesting. Effectively, every datapoint could be a non-fungible token with its own history of exchanges at different price points. Far-fetched, but it could work. If one can't "steal" the gem and own its future cashflows or future appreciation, then it doesn't matter whether you show it or hide it. :) But if ownership of the data changes, then it's a whole different story.

@godfreyhobbs - I think you mentioned in the past the idea of "IRIS score" providers that have different levels of trust or credibility? That is not very decentralized, of course, but it might be realistic. Something like "credit rating agencies".

Overall, this problem is, for me, way too complex for a bounty, let alone one with such broad and strict-looking requirements. I would be very surprised if a "final solution" were found here.

@uivlis

uivlis commented Jul 11, 2018

Just to note: the vision @satoshi101 describes, derived from my aphorisms, of Linnia as a marketplace of data implies that there is no IRIS beforehand; rather, "the marketplace" is the IRIS and decides it.

@satoshi101
Copy link

implies that there is no IRIS beforehand, but rather that "the marketplace" is the IRIS and decides it

Correct. It's quite challenging to set up a functional marketplace, though. To a first order, all you seem to need is an ASK and potentially a BID IRIS in IRIStokens. Then you need a record of at least the last transaction (if any). Whenever the ask and bid prices cross, ownership of the datapoint should be transferred to the new owner.
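The first-order mechanics described above can be sketched as a small model: an ask, an optional bid, the last trade, and a transfer of ownership when the ask and bid cross. This is hypothetical plain Python, not a smart contract. The class and field names are assumptions, prices are integer IRIStoken amounts, settlement is assumed to happen at the ask price, and token payment itself is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datapoint:
    owner: str
    ask: Optional[int] = None         # owner's asking price in IRIStokens
    bid: Optional[int] = None         # best standing bid in IRIStokens
    bidder: Optional[str] = None
    last_price: Optional[int] = None  # price of the last transfer, if any

    def place_bid(self, bidder, price):
        """Record a bid if it beats the current one, then try to match."""
        if self.bid is None or price > self.bid:
            self.bid, self.bidder = price, bidder
        self._match()

    def set_ask(self, owner, price):
        if owner != self.owner:
            raise PermissionError("only the owner can set the ask")
        self.ask = price
        self._match()

    def _match(self):
        """Transfer ownership when the bid meets or exceeds the ask."""
        if self.ask is not None and self.bid is not None and self.bid >= self.ask:
            self.owner, self.last_price = self.bidder, self.ask
            self.ask = self.bid = self.bidder = None


dp = Datapoint(owner="alice")
dp.set_ask("alice", 50)
dp.place_bid("bob", 40)    # no cross: bob's bid stays on the book
dp.place_bid("carol", 55)  # crosses the ask: carol becomes the owner
print(dp.owner, dp.last_price)  # carol 50
```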

Specifically in the case of Linnia, a datapoint might be leased to a user for single use, at a price set by the owner. The owner has an incentive to set a "fair" price compared to the alternatives so that people lease the datapoints; otherwise competitors' datapoints will be leased more often. If someone plans to lease a datapoint many times, or if they believe that the datapoint is valuable and the market will likely appreciate that, they will want to buy it instead of leasing it. For the first transaction, we expect someone to lease a datapoint at least once before buying it; otherwise they are trusting the original owner and the quality of the data they provide. Someone might decide to buy the data without seeing it, but we expect most people to want to see/use the data first, before they decide to own it. Once someone owns a datapoint, they might decide to amend the current lease price. The owner is the receiver of IRIStokens from data leases.

Note that the original issuer of the data loses control of it permanently as soon as the first transfer of ownership happens, i.e. they can't reclaim the data or take it offline. There might be a provision allowing you to take the data offline if you buy it back (potentially at some other IRIStoken price point) and you're the original owner. Generally, we don't expect data and their history to go away.
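The lease rule and the buy-back provision from the two paragraphs above can be sketched the same way. This is a hypothetical model. It assumes that lease fees are credited to whoever currently owns the datapoint, and that taking data offline requires being both the original issuer and the current owner. The names and prices are illustrative, and balance checks are omitted.

```python
from collections import defaultdict


class LeasableDatapoint:
    def __init__(self, issuer, lease_price):
        self.issuer = issuer             # original issuer, never changes
        self.owner = issuer              # current owner, changes on transfer
        self.lease_price = lease_price   # set by the current owner, in IRIStokens
        self.online = True

    def lease(self, balances, lessee):
        """Charge the lessee and credit the current owner (no balance checks)."""
        if not self.online:
            raise RuntimeError("datapoint is offline")
        balances[lessee] -= self.lease_price
        balances[self.owner] += self.lease_price

    def transfer(self, new_owner):
        self.owner = new_owner

    def take_offline(self, caller):
        """Only the original issuer, after buying the data back, may do this."""
        if caller != self.issuer or caller != self.owner:
            raise PermissionError("only the issuer who owns the data can take it offline")
        self.online = False


balances = defaultdict(int)
dp = LeasableDatapoint(issuer="alice", lease_price=3)
dp.lease(balances, "lab1")  # fee goes to alice
dp.transfer("bob")
dp.lease(balances, "lab2")  # fee now goes to bob
print(dict(balances))
```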

This is the basic fabric of a marketplace based on datapoints as non-fungible tokens (à la CryptoKitties) plus a liquid crypto-token, IRIStoken. It seems to me that those rules could be coded in a smart contract. I'm not sure how efficient/scalable an implementation could be. The non-fungible token standard ERC-721 could be a starting point in terms of interface.

One thing that is a bit worrying is that many of those "tokens" might be trash or spam tokens. Someone, or a farm, might create millions of fake datapoints to collect the first lease. Another reason people might create or even trade fake datapoints (in order to increase their apparent value) would be to influence research by injecting tons of fake data that represent, for example, themselves or some distribution of their choice. For example, people might try to associate white Caucasians with high blood pressure by forging data. This means that we are back to the original problem that attestations from doctors or other issuers address (along with the associated centralization), meaning that, for example, a blood sample datapoint must be entered into the system or attested by an accredited healthcare provider. Could a healthcare provider collude with real or manufactured patients to manipulate data? Highly likely, but it might be too much trouble. On the other hand, incidents like these ["Fake Clinics Outnumber Abortion Providers 10 to 1 in Texas"] mean that it's not impossible for medical legal entities to collude to push an agenda.

Another issue is that if the ratio of market participants to datapoints is, for example, 1:1,000,000, it will be really hard for this marketplace to retain efficient-market assumptions. There might be points in time during the development of the system when those conditions hold. At that point, trash and spam datapoints might easily overwhelm market participants, who won't have the time or money to evaluate all those datapoints.
All those issues seem to make such a system lean towards measures for evaluating market participants too, e.g. banning bad actors. This would be damaging to decentralization, and could substantially increase the complexity of implementation, but this might be inevitable.
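Taken together, the two mitigations mentioned above act as a gate on listing. One requires certain datapoints to be entered or attested by an accredited provider, and the other bans bad actors. This is a minimal hypothetical sketch. The accredited and banned sets are assumed to be maintained by some authority, which is exactly the centralization trade-off the comment describes. All identifiers are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    accredited: set = field(default_factory=set)  # accredited healthcare providers
    banned: set = field(default_factory=set)      # participants banned as bad actors
    listed: list = field(default_factory=list)

    def list_datapoint(self, owner, datapoint_id, attested_by):
        """List a datapoint only if an accredited, non-banned provider attested it."""
        if owner in self.banned:
            return False
        if not any(a in self.accredited and a not in self.banned for a in attested_by):
            return False
        self.listed.append(datapoint_id)
        return True


reg = Registry(accredited={"clinic_a"}, banned={"farm_1"})
ok = reg.list_datapoint("alice", "bp-001", attested_by=["clinic_a"])
spam = reg.list_datapoint("farm_1", "bp-999", attested_by=["clinic_a"])
unattested = reg.list_datapoint("bob", "bp-002", attested_by=[])
print(ok, spam, unattested, reg.listed)
```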

It's quite challenging to set up a functional marketplace.

@godfreyhobbs
Copy link
Contributor Author

Currently, with this and other bounties, we are trying not to start with a given solution but allow the Linnia Community the freedom to come up with new solutions that we have not already considered.

@satoshi101 Thanks for your insights. Linnia will primarily be about empowering individuals, so data will be leased, not purchased outright. This way, the individual will always be able to revoke access. The revoke action may happen manually or be triggered by a policy (#35).

That said, consortiums may form and act as proxies for a large number of users. Again, the concept of policy-based sharing permissions becomes critical (bounty issue #35). Consortiums could be either of the following:

  1. A trusted centralized actor
  2. A DAO-type decentralized organization
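The leasing model above, where access can be revoked manually or by a policy, can be sketched as a record with per-viewer grants. This is a hypothetical model and not the Linnia contracts or the design in #35. The policy is assumed to be a callable that returns True while a grant may stay active, and the example policy (at most two uses) is purely illustrative.

```python
class Grant:
    def __init__(self, viewer):
        self.viewer = viewer
        self.uses = 0


class LeasedRecord:
    def __init__(self, owner, policy=None):
        self.owner = owner
        # Hypothetical policy hook: returns True while a grant may stay active.
        self.policy = policy or (lambda grant: True)
        self.grants = {}

    def _check_owner(self, caller):
        if caller != self.owner:
            raise PermissionError("only the data owner can change access")

    def grant(self, caller, viewer):
        self._check_owner(caller)
        self.grants[viewer] = Grant(viewer)

    def revoke(self, caller, viewer):
        """Manual revocation: the owner can always withdraw access."""
        self._check_owner(caller)
        self.grants.pop(viewer, None)

    def access(self, viewer):
        """One use of the lease; if the policy fails, access is revoked automatically."""
        grant = self.grants.get(viewer)
        if grant is None:
            return False
        if not self.policy(grant):
            del self.grants[viewer]
            return False
        grant.uses += 1
        return True


# Example policy: a lease that allows at most two uses.
record = LeasedRecord(owner="alice", policy=lambda g: g.uses < 2)
record.grant("alice", "lab")
record.grant("alice", "insurer")
results = [record.access("lab") for _ in range(3)]
record.revoke("alice", "insurer")
print(results, record.access("insurer"))
```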

@satoshi101

satoshi101 commented Jul 13, 2018

I don't understand exactly what the above means.

I can understand an element of "don't sell", and that there should be a right to revoke. This makes it even more difficult to create an efficient marketplace with the IRIS score as the "value unit". A marketplace might be a way out of the 'carat' corner [1] and out of all those "specific model for each domain" requirements, which don't take any advantage of blockchain and are probably just a matter of 5-10 years of consortium work with paid experts from each domain you are targeting.

Actually, in the context of the whitepaper, I don't even see what this bounty requires.

There are also some contradictions in the whitepaper:

  • "all data that is pushed into the system will have their IRIS score calculated in the same way"
  • "Every time a user pushes data through Linnia, its IRIS score will go up proportionally to the value of its data"
  • "that data will have a good provenance score"
  • "users can't gain much from sharing fake data as the provenance source is low (and so will be the IRIS)"

i.e.

  • "provenance score" VS "IRIS score"
  • "IRIS score for user" or "IRIS score for data" or both

In the code, records have IRIS and users have provenance.

@godfreyhobbs
Contributor Author

@satoshi101
consortium as part of a data marketplace
When I mentioned consortium I was thinking about something like the data-labour union mentioned in this article. The consortium may help make the Linnia marketplace fair and efficient.

IRIS score.
It is likely that the consortium or data-labour union would not play a role in the IRIS score algorithm.

@gitcoinbot

@uivlis Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • warning (3 days)
  • escalation to mods (6 days)


@gitcoinbot

@godfreyhobbs is this one good to pay out to anyone or should I cancel the bounty?

@godfreyhobbs
Contributor Author

godfreyhobbs commented Oct 18, 2018

@uivlis @satoshi101 @tcsiwula we have created an interface to clarify how IRIS fits into the linnia protocol.

Here is the update with some tests:
https://github.com/ConsenSys/Linnia-Smart-Contracts/pull/101/files

@godfreyhobbs
Contributor Author

@satoshi101 Your comment was very helpful:
There's no single "rank" for a page anymore. In Linnia's context, I can see different types of researchers having different data needs.

I have introduced the following mapping to linniaRecords.sol:

mapping (address => uint256) irisProvidersReports;


The irisProvidersReports mapping will allow many different providers to each use their own algorithm. This means that there is no single "rank". Instead, each researcher can choose a different set of weights for the providers' reports in irisProvidersReports.
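To make the idea concrete, here is one way a researcher might combine the per-provider reports into their own score. This is a minimal off-chain sketch in Python. The dictionary mirrors the shape of `mapping (address => uint256) irisProvidersReports`. The provider keys are placeholders rather than real addresses, and the report values and the weighted-average rule are hypothetical choices for illustration, not part of the Linnia contracts.

```python
def researcher_score(reports, weights):
    """Weighted average of provider reports using a researcher's own weights.

    reports: provider address -> reported IRIS (mirrors irisProvidersReports)
    weights: provider address -> non-negative weight chosen by the researcher;
             providers without a positive weight are ignored.
    Returns None if no weighted provider has reported.
    """
    used = {p: w for p, w in weights.items() if p in reports and w > 0}
    total = sum(used.values())
    if total == 0:
        return None
    return sum(reports[p] * w for p, w in used.items()) / total


reports = {"0xProviderA": 80, "0xProviderB": 40, "0xProviderC": 10}

# Two researchers with different data needs trust different providers.
clinical = researcher_score(reports, {"0xProviderA": 3, "0xProviderB": 1})
genomics = researcher_score(reports, {"0xProviderB": 1, "0xProviderC": 1})
print(clinical, genomics)
```

Keeping the weighting off-chain leaves each researcher free to change weights without touching the contract, which fits the "no single rank" point above.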

6 participants