Define a strategy for prompt testing #144
-
Totally. This is actually a very good point, especially considering that some of the recent changes have introduced bugs. It's also likely we'll need to change the prompt again at some point. In that case, I recommend writing down a list of "questions/prompts" we can ask PandasAI to benchmark whether there are regressions. I suggest we collect them in this conversation and, once we have, say, 50 different use cases, add them to the documentation. In the long run, it would also be cool to run them in CI whenever a prompt changes (a little expensive, but that shouldn't happen too often). What do you think?
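To make the idea concrete, here is a minimal sketch of what such a regression suite could look like as a pytest module. It assumes the `PandasAI(llm).run(df, prompt=...)` entry point from the current API; the `BENCHMARK_PROMPTS` list, the fixture dataframe, and the `check` predicates are illustrative placeholders, not an agreed-upon benchmark.

```python
# Sketch of a prompt regression suite for PandasAI.
# Assumes OPENAI_API_KEY is set in the environment (or pass api_token
# explicitly to OpenAI()); the benchmark entries below are hypothetical.
import pandas as pd
import pytest

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

# Each entry pairs a benchmark prompt with a loose predicate over the
# answer. The idea is to grow this list in this conversation until we
# have ~50 use cases.
BENCHMARK_PROMPTS = [
    ("Which is the happiest country?",
     lambda out: "Canada" in str(out)),
    ("What is the sum of the GDPs of the 2 unhappiest countries?",
     lambda out: "7160000" in str(out).replace(",", "")),
]


@pytest.fixture(scope="module")
def pandas_ai():
    # Each run costs real tokens, which is why the suite should only
    # trigger when a prompt file changes.
    return PandasAI(OpenAI())


@pytest.fixture(scope="module")
def df():
    # Tiny fixture dataframe so the expected answers are stable enough
    # to assert on.
    return pd.DataFrame({
        "country": ["Canada", "France", "Japan"],
        "gdp": [1_930_000, 2_780_000, 4_380_000],
        "happiness_index": [7.23, 6.66, 5.87],
    })


@pytest.mark.parametrize("prompt,check", BENCHMARK_PROMPTS)
def test_prompt_regression(pandas_ai, df, prompt, check):
    answer = pandas_ai.run(df, prompt=prompt)
    assert check(answer), f"Regression on benchmark prompt: {prompt!r}"
```

In CI, this could be gated so it only runs when the prompt files change (e.g. a paths filter on the workflow trigger), which keeps the token cost bounded as suggested above.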
-
Hi,
we've changed and adjusted the code generation prompt a couple of times over the past few days. I highly suggest defining a strategy for prompt evaluation, so we can measure improvements and make sure a change isn't patching one failing scenario while breaking several others. Do we have one at the moment?