identify more robust output format for data synthesis #29

Closed
cartazio opened this issue Jun 29, 2023 · 5 comments

Comments

@cartazio
Contributor

The first version uses JSON, which is often malformed, and there's no good error recovery in that case. We need to identify and switch to a more "error-tolerant", self-aligning format (meaning we can skip a bad pair and still recover the useful outputs).
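
To make "skip a bad pair and recover" concrete, here is a minimal sketch of one possible approach: emit one record per line and parse each line independently, so a malformed record costs only itself. The field names `prompt` and `response` and the helper `parse_pairs` are hypothetical, chosen for illustration; this is not the project's actual format or code.

```python
import json


def parse_pairs(raw: str) -> tuple[list[dict], list[str]]:
    """Parse one JSON object per line, collecting unparseable lines separately."""
    good: list[dict] = []
    bad: list[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A broken record only loses this line; the next line starts fresh.
            bad.append(line)
            continue
        # Hypothetical schema: each pair needs both fields.
        if isinstance(record, dict) and "prompt" in record and "response" in record:
            good.append(record)
        else:
            bad.append(line)
    return good, bad


sample = "\n".join([
    '{"prompt": "a", "response": "b"}',
    '{"prompt": "c", "response": ',  # truncated record
    '{"prompt": "e", "response": "f"}',
])
good, bad = parse_pairs(sample)
```

With a single JSON array, the truncated second record would make the whole document unparseable; here the line boundary acts as the resynchronization point.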

This was referenced Jun 29, 2023
@cartazio
Contributor Author

I have a few ideas for this that I'll try.

@mmirman
Contributor

mmirman commented Jun 30, 2023

XML is the key. It's really the only key. Foundation LLMs have a lot of XML in their outputs, so they are super-primed to output it. Also, you can scrub inputs by ensuring the tags are unlikely to be guessed.

Also this is what LMQL is for!
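
A rough sketch of the tag idea, assuming a per-run random tag name so that input text is unlikely to contain it by accident. The helper names `make_tag` and `extract_pairs` and the inner `<q>`/`<a>` tags are hypothetical, and this uses plain string splitting rather than LMQL or an XML parser. Splitting on the opening tag makes each record self-aligning: a truncated record is dropped and parsing resumes at the next opening tag.

```python
import re
import secrets

# Hypothetical inner layout: <q>question</q> followed by <a>answer</a>.
PAIR_RE = re.compile(r"\s*<q>(.*?)</q>\s*<a>(.*?)</a>\s*", re.DOTALL)


def make_tag(base: str = "pair") -> str:
    """Return a tag name with a random suffix, e.g. 'pair-3f9a1c2e'."""
    return f"{base}-{secrets.token_hex(4)}"


def extract_pairs(text: str, tag: str) -> list[tuple[str, str]]:
    """Recover well-formed (question, answer) pairs, skipping broken records."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    pairs: list[tuple[str, str]] = []
    # Everything before the first opening tag is preamble and is ignored.
    for chunk in text.split(open_tag)[1:]:
        body, sep, _rest = chunk.partition(close_tag)
        if not sep:
            continue  # no closing tag: truncated record, skip it
        match = PAIR_RE.fullmatch(body)
        if match:
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs


tag = "pair-x"  # fixed here for a reproducible example; use make_tag() in practice
output = (
    f"<{tag}><q>Q1</q><a>A1</a></{tag}> stray text "
    f"<{tag}><q>Q2</q><a>A2"  # truncated record
    f"<{tag}><q>Q3</q><a>A3</a></{tag}>"
)
pairs = extract_pairs(output, tag)
```

Before building a prompt, one could also check that the chosen tag does not appear in the user-supplied input and generate a new one if it does.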

@cartazio
Contributor Author

Good points. I’ll dig into this direction

@mmirman
Contributor

mmirman commented Jun 30, 2023 via email

@cartazio
Contributor Author

cartazio commented Jul 5, 2023

Should be a bit more robust now; we can revisit this later.

@cartazio cartazio closed this as completed Jul 5, 2023