identify more robust output format for data synthesis #29

cartazio · 2023-06-29T19:01:15Z

The first version uses JSON, which is often malformed, and there's no good error recovery in that case. We need to identify and switch to a more "error tolerant", self-aligning format (meaning we can skip a bad pair and still recover the useful outputs).

cartazio · 2023-06-29T21:24:36Z

i've a few ideas for this i'll try.

mmirman · 2023-06-30T17:00:30Z

XML is the key. It's really the only key. Foundation LLMs have a lot of XML in their outputs, so they are super-primed to output it. Also, you can scrub inputs by ensuring the tags are unlikely to be guessed.

Also this is what LMQL is for!
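To make the "skip a bad pair and recover" idea concrete, here is a minimal sketch, not the project's actual implementation. It assumes each synthesized pair is wrapped in a hard-to-guess tag and carries `input` and `output` fields; the tag name `pair-7f3a`, the field names, and the function `extract_pairs` are all hypothetical. Splitting on the opening tag is what makes the format self-aligning: a truncated pair only damages its own segment.

```python
import re


def extract_pairs(text: str, tag: str = "pair-7f3a") -> list[tuple[str, str]]:
    """Recover (input, output) pairs from tagged model output.

    `tag` is a hypothetical, hard-to-guess wrapper name; `input` and
    `output` are assumed field names, not taken from the project.
    """
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    pairs = []
    # Re-synchronize on every opening tag, so one broken pair cannot
    # swallow the pairs that follow it.
    for segment in text.split(open_tag)[1:]:
        body, found, _ = segment.partition(close_tag)
        if not found:
            continue  # wrapper never closed: skip this pair only
        q = re.search(r"<input>(.*?)</input>", body, re.DOTALL)
        a = re.search(r"<output>(.*?)</output>", body, re.DOTALL)
        if q is None or a is None:
            continue  # a field is missing or malformed: skip this pair only
        pairs.append((q.group(1).strip(), a.group(1).strip()))
    return pairs
```

With JSON, a single unbalanced brace or quote typically fails the whole parse; here the well-formed pairs before and after a broken one are still returned.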

cartazio · 2023-06-30T17:15:11Z

Good points. I’ll dig into this direction

mmirman · 2023-06-30T17:20:42Z

I do want to add regex and LMQL to our platform, btw.

cartazio · 2023-07-05T18:55:36Z

Should be a bit more robust now; we can revisit this later.

This was referenced Jun 29, 2023

June July tasks #24

Closed

Data synthesis #3

Closed

cartazio mentioned this issue Jul 5, 2023

Feature/carter/more qol example gen reliability #32

Merged

cartazio closed this as completed Jul 5, 2023
