Transparency of the project #113

bleichenbacher-daniel · 2024-04-08T08:41:55Z

Could we please have more transparency about what is happening with this project, to avoid unnecessary and parallel work?

At this point I have rewritten a significant part of the Wycheproof test vector generation, just so that I can continue with the project. Without some coordination, even more time might be wasted that could be spent solving new issues instead. At this point I don't know who is interested in the project or what the potentially outstanding issues are.

For example, it would be helpful to have just a list of cryptographic primitives and their status (e.g. whether they are supported, whether the test vectors have been tested, or which parties might be interested in using them).

FiloSottile · 2024-04-08T09:26:06Z

We're still ramping up a maintenance framework for the project, so I think what comes across as a lack of transparency is really work in progress. For context, until the end of March we worked on getting the handoff finalized, and we're still working on getting the generators published, for example.

What I can tell you is that most, if not all, major cryptographic implementations are interested in consuming the vectors, and some are interested in contributing. I have not heard requests for specific primitives besides ML-KEM and ML-DSA, for which we have some in-progress contributions. There is definitely interest in defining a new reusable format for test vectors, although I don't expect that to be something that works out in the span of a few weeks.

We had a session at OSCW to talk about what implementers want from test vector libraries. I am not sure it's super easy to follow on video, since there's a lot of audience participation, but you might be interested in the recording https://archive.org/details/oscw-2024-fillippo-valsorda-cryptographic-test-vectors or I can send you a transcript.

There is consensus around making the Wycheproof project not just a source of test vectors, but a repository where different sources/people/projects can pool vectors, so that downstreams can use them all at once. We'll work over the next few weeks to make it easier to contribute vectors and to consume them.

You're very welcome to send any new vectors. If you're worried about duplicating work, maybe open an issue to announce what you are working on, and then close it with the PR that submits those vectors?

Note that since we intend to accept vectors from multiple sources, we can't rely on regenerating them all when changing formats or adding new ones; instead we will have to port the old ones and iteratively add new ones.

bleichenbacher-daniel · 2024-04-08T15:31:22Z

Maybe I wasn't clear enough. When I left Google a year ago, two managers independently asked me whether I was willing to continue the project. I also received some non-committal promises that I might get access to my generator code. So I've continued working on the project. Now we have two parallel projects, which is obviously not ideal. Hence it would make sense to have a meeting to clear things up. What worries me most is that I have worked on the project for over a decade, and I don't want to lose it a second time.

Thanks for the link to the video. I have a few comments:

There are already tests comparing the test vectors to the JSON schemas. The JSON schemas and the documentation of the test vector formats are generated from the same source, so that they do not fall out of sync. I don't know how to set this up on GitHub, however.
You talked about test vectors with intermediate values. In most cases these should be relatively easy to generate. Another option would be code that guesses the location where an error occurred. Most of the code I had for this was in Colabs. If this kind of thing is of interest, then maybe it would be possible to recover these Colabs (or rewrite them; they are probably less than 1000 lines of code).
You talked about testing the test vectors. One issue that needs to be discussed is test vectors with unclear states. An example is test vectors with a modified private key. Here it is unclear whether a crash with a modified private key is a vulnerability or not, since in most cases users modifying their own key are just shooting themselves in the foot. However, if private keys can be uploaded to an HSM, then crashes do matter. For such tests it is important to have a way to reach consensus on whether libraries need strong private key validation or not. A big question here is how to decide which checks a library should perform when importing keys, or when performing similar functions that are difficult to attack. I have generated a relatively large number of faulty keys. They have not been published, since I don't know what the expectations are.
Test vector format: if we want to change the format of the test vectors, then it would make sense to tackle this now, before making big announcements.
Data structures for various languages: I think it should be possible to generate the data structure from the same source that generates the JSON schema and documentation.
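To make the idea of checking vector files for consistency concrete, here is a minimal sketch. It assumes the v1 file layout (a top-level `numberOfTests` and `testGroups`, each group holding `tests` with `tcId` and `result`) and uses only the standard library as a rough stand-in for real JSON Schema validation. It is not the existing Wycheproof test code, and the name `check_vector_file` is hypothetical.

```python
import json
from collections import Counter

# Result values used by the v1 format.
ALLOWED_RESULTS = {"valid", "invalid", "acceptable"}


def check_vector_file(data: dict) -> list[str]:
    """Return consistency problems found in one parsed vector file.

    A lightweight stand-in for JSON Schema validation: it only checks a
    few structural invariants of the assumed v1 layout.
    """
    problems = []
    tests = [t for g in data.get("testGroups", []) for t in g.get("tests", [])]
    # The declared test count should match the actual number of tests.
    if data.get("numberOfTests") != len(tests):
        problems.append(
            f"numberOfTests={data.get('numberOfTests')} but found {len(tests)} tests"
        )
    # Test case identifiers should be unique within a file.
    for tc_id, n in Counter(t.get("tcId") for t in tests).items():
        if n > 1:
            problems.append(f"duplicate tcId {tc_id}")
    # Every test should carry one of the known result values.
    for t in tests:
        if t.get("result") not in ALLOWED_RESULTS:
            problems.append(
                f"tcId {t.get('tcId')}: unexpected result {t.get('result')!r}"
            )
    return problems


sample = json.loads(
    '{"numberOfTests": 2, "testGroups": [{"tests": ['
    '{"tcId": 1, "result": "valid"}, {"tcId": 1, "result": "maybe"}]}]}'
)
print(check_vector_file(sample))
# ['duplicate tcId 1', "tcId 1: unexpected result 'maybe'"]
```

A real CI job would more likely run a JSON Schema validator against the published schemas. This sketch only shows the kind of invariants such a check enforces.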

FiloSottile · 2024-04-08T15:46:35Z

I also received some non-committal promises that I might get access to my generator code.

I'm trying to enable that!

Hence it would make sense to have a meeting to clear things up.

Sure! I still don't have an email address for you, but you can reach out at hi@filippo.io and we can set up a call.

I want to be upfront: the goal of C2SP, my own intention, and the community's interest is in growing Wycheproof into a repository for (properly attributed!) test vectors from multiple sources. I think what you worked on can fit perfectly, but I want to be clear it's not the same single-source design of Wycheproof-at-Google.

There are already tests comparing the test vectors to the JSON schemas. The JSON schemas and the documentation of the test vector formats are generated from the same source, so that they do not fall out of sync. I don't know how to set this up on GitHub, however.

Happy to do the GitHub Actions setup.

I'm not sure I see the autogenerated documentation, where is it?

You talked about test vectors with intermediate values. In most cases these should be relatively easy to generate. Another option would be code that guesses the location where an error occurred. Most of the code I had for this was in Colabs. If this kind of thing is of interest, then maybe it would be possible to recover these Colabs (or rewrite them; they are probably less than 1000 lines of code).

Intermediate values are useful while developing an implementation, so I am not sure they make sense in the same format/place as the rest, but they would definitely be useful. Maybe they fit in the more "free-form" part of the repository we talked about at OSCW.

You talked about testing the test vectors. One issue that needs to be discussed is test vectors with unclear states.

When I talk about testing the test vectors, I just mean making sure they were generated correctly given their intention, so we can just write implementations that pass or fail based on the "acceptable" state.
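A rough sketch of how a harness might score an implementation against the three v1 result values follows. The rule that "acceptable" passes either way matches how the format is usually read, but the names `outcome_ok`, `failing_tests`, and the `accepts` hook are hypothetical.

```python
from typing import Callable, Iterable


def outcome_ok(expected: str, accepted: bool) -> bool:
    """Decide whether an implementation's verdict matches a vector's result."""
    if expected == "valid":
        return accepted  # must be accepted
    if expected == "invalid":
        return not accepted  # must be rejected
    if expected == "acceptable":
        return True  # either verdict is fine
    raise ValueError(f"unknown result value: {expected!r}")


def failing_tests(tests: Iterable[dict], accepts: Callable[[dict], bool]) -> list[int]:
    """Return the tcIds whose verdict contradicts the expected result.

    `accepts` is a hypothetical hook wrapping the implementation under test;
    it returns True if the implementation accepted the test input.
    """
    return [t["tcId"] for t in tests if not outcome_ok(t["result"], accepts(t))]


# A deliberately broken "implementation" that accepts everything.
tests = [
    {"tcId": 1, "result": "valid"},
    {"tcId": 2, "result": "invalid"},
    {"tcId": 3, "result": "acceptable"},
]
print(failing_tests(tests, lambda t: True))  # [2]
```

Running one strict and one lenient reference implementation through the same harness is one way to check that each vector's label matches its intent.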

Here it is unclear whether a crash with a modified private key is a vulnerability or not, since in most cases users modifying their own key are just shooting themselves in the foot.

Heh, this is a whole topic among implementers and different libraries take different views. I think it would make sense to have them, maybe with a specific flag/in specific files, to let libraries decide if they fit the threat model.
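Here is a small sketch of the "specific flag" idea, assuming each test carries a `flags` list as in the v1 format. The flag name `ModifiedPrivateKey` is hypothetical and not taken from the published vectors. A library whose threat model excludes corrupted private keys could filter them out before running the rest.

```python
from typing import Iterable, Iterator


def select_tests(tests: Iterable[dict], skip_flags: set[str]) -> Iterator[dict]:
    """Yield the tests that carry none of the flags in `skip_flags`."""
    for t in tests:
        if skip_flags.isdisjoint(t.get("flags", [])):
            yield t


tests = [
    {"tcId": 1, "flags": [], "result": "valid"},
    # "ModifiedPrivateKey" is a hypothetical flag name used for illustration.
    {"tcId": 2, "flags": ["ModifiedPrivateKey"], "result": "invalid"},
]
# A library that considers corrupted private keys out of scope skips them:
print([t["tcId"] for t in select_tests(tests, {"ModifiedPrivateKey"})])  # [1]
```

Putting such tests in separate files would achieve the same thing without any filtering code. The flag approach keeps related vectors together.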

Test vector format: if we want to change the format of the test vectors, then it would make sense to tackle this now, before making big announcements.

I would rather take our time to gather community feedback on the new format. For now, I want to get us set up with refreshed docs, and the tooling to smoothly add and consume vectors in the current v1 format.

Data structures for various languages: I think it should be possible to generate the data structure from the same source that generates the JSON schema and documentation.

Generating them is indeed easy, but knowing what the right data structure is requires language-specific knowledge that I don't have across all languages. For now I think just making the JSON available is a good first step.
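To illustrate why the "right" data structure is a language-specific choice, here is one possible hand-written Python shape for a v1 test case. It assumes the common fields `tcId`, `comment`, `result`, and `flags`. The class `VectorCase`, the helper `parse_tests`, and the decision to keep algorithm-specific fields in an untyped `extra` dict are all assumptions. Other languages might prefer a typed struct per schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class VectorCase:
    tcId: int
    comment: str
    result: str
    flags: list[str] = field(default_factory=list)
    # Algorithm-specific fields (msg, sig, key material, ...) vary per schema,
    # so this sketch keeps them in a generic dict instead of typed attributes.
    extra: dict = field(default_factory=dict)


COMMON_KEYS = {"tcId", "comment", "result", "flags"}


def parse_tests(data: dict) -> list[VectorCase]:
    """Flatten all test groups of a parsed vector file into VectorCase objects."""
    out = []
    for group in data.get("testGroups", []):
        for t in group.get("tests", []):
            out.append(VectorCase(
                tcId=t["tcId"],
                comment=t.get("comment", ""),
                result=t["result"],
                flags=list(t.get("flags", [])),
                extra={k: v for k, v in t.items() if k not in COMMON_KEYS},
            ))
    return out


data = json.loads('{"testGroups": [{"tests": [{"tcId": 7, "comment": "empty message", "msg": "", "result": "valid"}]}]}')
case = parse_tests(data)[0]
print(case.tcId, case.result, case.extra)  # 7 valid {'msg': ''}
```

Whether the per-group parameters (key sizes, hash names, and so on) should be lifted into typed fields is exactly the kind of decision that differs between ecosystems.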

bleichenbacher-daniel · 2024-04-08T17:39:27Z

The autogenerated file I talked about is
https://github.com/C2SP/wycheproof/blob/master/doc/types.md
Unfortunately, this is an old version, which is sad because the main goal was to generate the docs and schemas from the same source, and then test the schemas against the test vectors, which should ensure that the documentation reflects the test vector files. Of course, if that gets out of sync, then nothing is gained.
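A minimal sketch of the "one source, two outputs" idea follows. A single field table drives both a JSON Schema fragment and a Markdown documentation table, so the two cannot drift apart independently. The `FIELDS` table, its contents, and the helper names are hypothetical and are not the generator that produced `doc/types.md`.

```python
import json

# Hypothetical single source of truth: field -> (JSON type, required?, description).
FIELDS = {
    "tcId": ("integer", True, "Identifier of the test case within the file"),
    "comment": ("string", False, "Short description of what is tested"),
    "flags": ("array", False, "Labels for properties of the test case"),
    "result": ("string", True, "Expected outcome: valid, invalid or acceptable"),
}


def to_json_schema(fields: dict) -> dict:
    """Derive a JSON Schema object definition from the field table."""
    props = {}
    for name, (typ, _required, desc) in fields.items():
        prop = {"type": typ, "description": desc}
        if typ == "array":
            prop["items"] = {"type": "string"}
        props[name] = prop
    required = [name for name, (_t, req, _d) in fields.items() if req]
    return {"type": "object", "properties": props, "required": required}


def to_markdown(fields: dict) -> str:
    """Derive a documentation table from the same field table."""
    rows = ["| Field | Type | Required | Description |", "| --- | --- | --- | --- |"]
    for name, (typ, req, desc) in fields.items():
        rows.append(f"| {name} | {typ} | {'yes' if req else 'no'} | {desc} |")
    return "\n".join(rows)


print(json.dumps(to_json_schema(FIELDS)["required"]))  # ["tcId", "result"]
print(to_markdown(FIELDS))
```

The remaining step described above, validating every vector file against the generated schema in CI, is what actually keeps the checked-in docs honest.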

Yes, feedback would be nice. One thing I could do is generate some sample test vector files in a new format, just for discussion. An issue about the test vector format is open (#106). So far there are no comments yet.
