move some work to backend, improve algorithms #46

Open
FraserLee opened this issue Jul 5, 2023 · 1 comment
@FraserLee
Collaborator

Right now there's some text processing that happens in the front-end, including:

  • Unification of formats for single citations with regexes (the LLM isn't particularly consistent with its output)
  • Unification of formats for citation groups
  • Reordering citations within one message (such that the first is [1], the second [2], etc.)
  • Reordering citations such that each subsequent message starts counting at n + 1

Some (all?) of these should be moved to the backend in Python. This would improve citation quality for alternate front-ends, including any ops retroactively looking at logs.
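As a rough illustration of the two reordering steps, here's a minimal Python sketch. The regex and function name are hypothetical, not taken from the codebase, and it assumes citations have already been unified to a single `[x]` form by the earlier passes:

```python
import re

# Assumed canonical single-citation form after the unification passes;
# the real pattern would depend on what the LLM actually emits.
CITATION = re.compile(r"\[([A-Za-z0-9]+)\]")

def renumber(messages):
    """Rewrite citations so the first citation in the conversation is [1],
    the second [2], and so on, with each message continuing the count
    where the previous one stopped (the n + 1 behaviour above)."""
    counter = 0
    renumbered = []
    for msg in messages:
        seen = {}  # reference id -> number assigned within this message
        def sub(match):
            nonlocal counter
            ref = match.group(1)
            if ref not in seen:
                counter += 1
                seen[ref] = counter
            return f"[{seen[ref]}]"
        renumbered.append(CITATION.sub(sub, msg))
    return renumbered
```

e.g. `renumber(["foo [a] [b]", "bar [a]"])` gives `["foo [1] [2]", "bar [3]"]`; whether a repeated reference should keep its earlier number across messages is a design decision left open here.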

Additionally, we're currently doing all of this through string manipulation, when it's really better suited to some custom ropes. We could even consider serializing as ropes so that the front-end doesn't need to re-parse out where citations are, but my intuition here is that the added JSON overhead wouldn't be worth the trade-off.
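For a sense of what that serialization could look like, a hypothetical rope-style payload (field names invented for illustration, not from the codebase):

```python
# Hypothetical pre-segmented payload: the message arrives as a list of
# spans, so the front-end renders citations directly instead of regexing
# them back out of a flat string.
message = [
    {"kind": "text", "content": "foo "},
    {"kind": "citation", "number": 1, "ref": "a"},
    {"kind": "citation", "number": 2, "ref": "b"},
    {"kind": "text", "content": " bar"},
]
```

The JSON overhead mentioned above is the per-span dict boilerplate relative to a flat string.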

If someone's particularly excited for this issue, I can drop some links to locations in the codebase to get started. If not, I'll tackle it at some point.

@FraserLee
Collaborator Author

After some investigation and a few prototypes, I believe the form of this I had initially envisioned is not viable. It would mean either sacrificing working citations while streaming, or refactoring to a diff-syncing protocol (likely hand-rolled - there doesn't seem to be much out of the box in this space).

Just as an example of why this is non-trivial, if the LLM outputs

```
chunk 1: "foo [a, b, c,"
chunk 2: " d, e] bar"
```

then there's no purely incremental message we can send that both processes out citations ("foo [a, b] bar" -> "foo ", (citation object 1), (citation object 2), " bar") and keeps our outgoing chunks in sync with the ones we receive.

Possible solutions:

  • wait until entire message received, don't stream in real time -> terrible user experience
  • re-send the entire message with every new chunk -> quadratic network usage, not an option
  • come up with a scheme to send diffs instead of purely incremental messages -> might work, but technically complicated and probably pretty fragile
  • maintain an n-chunk buffer such that we can always process far enough ahead before sending data -> most feasible, will proceed with this option (see the sketch below)
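A minimal sketch of that buffered approach, assuming a hypothetical process(chunk, lookahead) helper that rewrites the citations in a chunk, using the still-buffered text to resolve groups that straddle chunk boundaries (names and buffer depth are illustrative, not from the codebase):

```python
from collections import deque

LOOKAHEAD = 2  # assumed depth: enough chunks that any open citation group has closed

def stream_with_buffer(chunks, process):
    """Hold back LOOKAHEAD raw chunks so that by the time a chunk is
    emitted, any citation group it opens has already closed inside the
    buffered lookahead text.

    `process(chunk, lookahead)` is a hypothetical helper that rewrites
    the citations in `chunk`, consulting `lookahead` (the concatenated
    still-buffered text) for groups spanning chunk boundaries.
    """
    buffer = deque()
    for chunk in chunks:
        buffer.append(chunk)
        if len(buffer) > LOOKAHEAD:
            yield process(buffer.popleft(), "".join(buffer))
    while buffer:  # flush the tail once the stream ends
        yield process(buffer.popleft(), "".join(buffer))
```

The trade-off is a fixed latency of LOOKAHEAD chunks before the first output, in exchange for never having to retract text already sent.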
