
Query replay #45

Open · wants to merge 1 commit into main
Conversation

spirali (Contributor) commented Jul 31, 2023

Simple query caching

Fixes #40


spirali (Contributor, Author) commented Jul 31, 2023

@jas-ho @gavento Any opinion on this?

gavento (Contributor) commented Jul 31, 2023

How about making this part of the language model (engine), e.g. as a wrapper?

Pros:

  • Can work in any context, not just query_engine and other special methods.

Cons:

  • Needs to be done separately for each LLM.

Also, it would be nice to have a verification that only cached queries are made, and in the right order.

spirali (Contributor, Author) commented Jul 31, 2023

Checks that queries match are already there.

I wanted to create a ReplayModel, but it has some caveats:

  • We do not have a standardized "config" across the models, so it is not easy to directly pass the underlying configuration; without it, you cannot use replayed queries again in a later replay. I think this is important when you create a replay, then use the replay while creating some new queries, and then use this new context again for a replay.
  • You have to use query_model anyway, because otherwise the query is not stored in the context.

jas-ho (Collaborator) commented Aug 2, 2023

Do I understand correctly that

  • the context from which responses are taken in the replay (replay = Replay(context1)) and
  • the context in which I use the replay (with context2: query_model(..., replay=replay))

are two independent things, @spirali?
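
For concreteness, a minimal sketch of that two-context usage as described above; the `model` object and the exact query_model arguments are placeholders, not the actual API:

# Recording run: queries are executed and their results are logged in context1.
with Context("recording-run") as context1:
    query_model(model, "Tell me a joke.")

# The Replay object reads the recorded query results out of context1.
replay = Replay(context1)

# Replayed run: matching queries are answered from context1's recording,
# while context2 only logs the (replayed) calls.
with Context("replayed-run") as context2:
    query_model(model, "Tell me a joke.", replay=replay)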

prompt: str | FormatStr,
kwargs: dict = None,
with_context=True,
replay: Replay = None,

Collaborator comment: typing should be Optional[Replay] or Replay | None
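
As a concrete sketch of that suggestion (string annotations are used here only to keep the fragment standalone; the real signature would reference the actual classes):

from typing import Optional

def query_model(
    prompt: "str | FormatStr",
    kwargs: Optional[dict] = None,
    with_context: bool = True,
    replay: "Optional[Replay]" = None,   # suggested fix; `Replay | None` is equivalent on Python 3.10+
):
    ...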

class Replay:
    def __init__(self, context: Context):
        self.replays = {}
        for context in context.find_contexts(lambda ctx: ctx.kind == "query"):

Collaborator comment: might be good to extract string constants like "query", "conf", "prompt", etc. into a constants module. It can reduce the risk of bugs from typos, and it can also signal to future devs that these are keys which more than one module relies on.
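
A minimal sketch of what such a constants module might look like (the module name and exact constant names are assumptions for illustration only):

# interlab/constants.py  (hypothetical location)

# Context kinds
QUERY_KIND = "query"

# Keys shared by the query-logging and replay code
CONF_KEY = "conf"
PROMPT_KEY = "prompt"

Call sites would then use e.g. ctx.kind == QUERY_KIND instead of a bare "query" literal.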

            self.replays.setdefault(key, [])
            self.replays[key].append(context.result)
        for replay in self.replays.values():
            replay.reverse()

Collaborator comment: the logic here assumes that future implementations of StorageBase.list adhere to a certain order of listing contexts. This should be specified in the docstring of StorageBase.list and tested in a unit test (if not there already).
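
A rough sketch of the kind of unit test being suggested; every name here (the storage fixture, the uid attribute, and the list() return type) is an assumption about the project API, so treat it only as an outline:

def test_storage_list_preserves_order(storage):
    # `storage` would be any StorageBase implementation under test (e.g. a fixture).
    written = []
    for i in range(5):
        with Context(f"query-{i}", storage=storage) as ctx:
            pass
        written.append(ctx.uid)

    # Require that list() reports contexts back in the order they were written.
    assert [c.uid for c in storage.list()] == written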



@patch("interlab.lang_models.openai._make_openai_chat_query")
def test_replay_model(_make_openai_chat_query):

Collaborator comment: nice test!

jas-ho (Collaborator) commented Aug 2, 2023

Returning to this point by @gavento:

Can work in any context, not just query_engine and other special methods.

calls to query_model are currently made in the following modules: simple_cot_actor, one_shot_llm_actor, json_examples, query_for_json, summarize
...and this will likely expand in the future.

What are your thoughts about making replay easily available consistently across all methods which rely on LLM calls @spirali?

PS: also wanted to say that I find the current implementation very clean and readable! just not sure what the best design will be in the longer run

gavento (Contributor) commented Aug 2, 2023

My main thought on why not: separation of concerns.

query_model currently does one thing: calling a variety of models with a unified API and context logging. (The plan is to also get e.g. a token limit to work as a parameter here, which is not trivial with LC models.)

I am against adding more functionality to it: I think it is bad design, the functionality does not seem core to me (or, at least, I am not convinced this is the right way to do it), and bloating one function with it seems like bad design for the future.

calls to query_model are currently made in the following modules: simple_cot_actor, one_shot_llm_actor, json_examples, query_for_json, summarize
...and this will likely expand in the future.

The prevalence of query_model should IMO be irrelevant here: currently you can also call models directly (for any reason at all, including extra functionality of the call, etc.) and nothing will break (except for a missing log). If we add this to query_model, it becomes a fragile bottleneck.

PS: also wanted to say that I find the current implementation very clean and readable! just not sure what the best design will be in the longer run

So how about adding it as a wrapper around individual LLMs instead? (A rough sketch of what that could look like follows below.)

  • Easy to understand what is going on. Composable, and an independent feature (that we can even deprecate later, which is a good property of any non-core design bit).
  • You can still feed it a Context object as a source of the logs.
  • Tweakable (e.g. slightly different behavior for different actors, etc.).
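
Purely to make the wrapper idea concrete, here is a minimal sketch of what such a per-model wrapper could look like. This is not the interlab API: the model's query method, the "prompt" input key, and the constructor arguments are all assumptions.

class ReplayWrapper:
    """Wraps one model object; replays recorded results, otherwise calls through."""

    def __init__(self, model, source_context=None):
        self.model = model
        # Collect prompt -> recorded results from a previous run's context, if given.
        self.recorded = {}
        if source_context is not None:
            for ctx in source_context.find_contexts(lambda c: c.kind == "query"):
                self.recorded.setdefault(ctx.inputs["prompt"], []).append(ctx.result)

    def query(self, prompt, **kwargs):
        # Serve a recorded result when one is available; otherwise call the real model.
        results = self.recorded.get(prompt)
        if results:
            return results.pop(0)
        return self.model.query(prompt, **kwargs)

Such a wrapper stays composable (you can stack it with other wrappers or skip it entirely) and is easy to tweak per actor, which is the point being argued here.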

spirali (Contributor, Author) commented Aug 8, 2023

I am rewriting the replay functionality from scratch. I want to make it completely generic (any function / method call can be replayed, not only LLM calls). I have a PoC, but I would like a sanity check of the idea before finishing it. Here is how it should look in the end:

def my_function(x, y):
    return x + y

replay = Replay("my-replay-stream-name")

# Storing a replay

with Context("root") as root:
    replay(my_function)(1, 2)


# Using a replay

replay = Replay("my-replay-stream-name", context=root)
with Context("root") as root:
    replay(my_function)(2, 3)  # Calls my_function, as (2, 3) is not in the cache
    replay(my_function)(1, 2)  # Recalls the result from the cache
    replay(my_function)(1, 2)  # Calls my_function, as a second (1, 2) call is not in the cache


# Wrapping any object; "self" is also stored in the replay log

from langchain.llms import OpenAI

llm = replay(OpenAI(...))
llm.generate(["Tell me a joke."])


# Partial matching

replay(my_function, ignore_args=["y"])(2, 3)  # Ignores argument "y" when looking for a matching call in the cache

Any comments?

gavento (Contributor) commented Aug 8, 2023

Thanks!

Three comments:

  1. Why a decorator/wrapper function? This way all code needs to be wrapped in replay, including internal querying, so either we use it everywhere, or you pass the wrapped function instead of an LLModel (but then you lose any LLM information).
     • Why not just have it as a LangChain model wrapper class (a subclass of LLMBase that passes through most attributes and non-query methods, maybe with some modifications), or a model class?
  2. Storage logic: how does the storage/retrieval in contexts work?
     • The demo seems a bit magic; maybe have explicit calls to store vs. retrieve it? E.g. I assume your code will silently load an empty replay if I mistype the replay name now.
     • How are replays stored in non-root contexts? Are they also present in sub-contexts? Etc.
     • I assume that usually there will be only one replay in a context, so make the name optional?
       • (Or a wild idea: have a leaf context for each replay, and thus show it in the UI tree?)
  3. Cache misses and hits (especially with repeated calls with the same values) seem possibly inconsistent or illogical now. The way I think about it as a user (see the sketch after this list):
     • Either it offers "strict" replay, and then a cache miss before a cache hit should be an error, indicating a non-deterministic replayed function (that should not fail silently!). (This is the primary way to present this, IMO.)
     • Or it offers "just caching" to the user and allows e.g. mixing cache hits with cache misses, but then repeated queries for the same value seem problematic anyway (what if I do an extra call during replay that would be uncached normally, but it coincides with some cached value that gets called multiple times? etc.)
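
To make the distinction in point 3 concrete, here is a small sketch of how a cache-miss policy could be expressed; the names and structure are invented for illustration, not taken from the PR:

from enum import Enum

class ReplayPolicy(Enum):
    STRICT = "strict"    # any cache miss is an error: the replayed run must match the recording
    CACHING = "caching"  # a miss falls through to a real call; hits are served from the recording

def resolve_call(cache, key, call_fn, policy):
    """Return the next recorded result for `key`, or handle the miss per `policy`."""
    recorded = cache.get(key)
    if recorded:
        return recorded.pop(0)  # consume recorded results in order
    if policy is ReplayPolicy.STRICT:
        raise RuntimeError(f"Replay mismatch: no recorded result for {key!r}")
    return call_fn()            # "just caching" mode: fall back to a real call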

gavento (Contributor) commented Aug 8, 2023

I also want to pitch games/situations with storable state (pickle is fine) and meaningful steps where the state can be stored/restored ;-)

spirali (Contributor, Author) commented Aug 8, 2023

  1. No, you just pass the instance. Look at my example: I call .generate() there on the wrapped instance without any additional wrapping; it automatically passes through arguments and methods. So you just wrap the instance when it is created and pass the instance around otherwise (see the sketch below). We can add some LangChainWrapper, I am not against it, but it would just be a one-liner.

  2. Right now, it always records calls. When you pass a context to the replay constructor, it uses the stored info when necessary. I wanted to create it like this to avoid having to change existing code when adding new code. I am still experimenting a bit with when things are actually stored.

  3. I agree that we should offer the possibility to set a policy for what to do on a cache miss. What the default should be needs to be discussed.
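
For readers following the thread, a minimal sketch of the kind of pass-through wrapping described in point 1, assuming it is built on __getattr__ delegation; this is an illustration, not the actual implementation:

class ReplayProxy:
    """Sketch of what replay(obj) could return: attributes pass through, method calls are replayed."""

    def __init__(self, wrapped, replay):
        self._wrapped = wrapped
        self._replay = replay

    def __getattr__(self, name):
        attr = getattr(self._wrapped, name)
        if callable(attr):
            # Bound methods are routed through the replay's function wrapping,
            # so call sites like llm.generate(...) need no extra wrapping of their own.
            return self._replay(attr)
        return attr  # plain attributes pass straight through

With something like this, llm = replay(OpenAI(...)) hands back a proxy, and llm.generate(["Tell me a joke."]) is recorded or replayed transparently, with "self" available for the cache key.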

Successfully merging this pull request may close these issues:

  • Caching model responses for quicker development and branching of runs