Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement of AI Response Parsing Mechanism #537

Merged

Conversation

koid
Copy link
Contributor

@koid koid commented Dec 20, 2023

Type

Enhancement


Description

This PR introduces an enhancement to the AI response parsing mechanism. The main changes include:

  • A regular expression pattern has been introduced to extract the YAML snippet from the AI response, ignoring any explanatory text before the snippet.
  • The load_yaml function in pr_agent/algo/utils.py has been modified to use this pattern and extract the YAML snippet.
  • A new test case has been added in tests/unittest/test_load_yaml.py to verify the correct parsing of AI responses that include a description before the YAML snippet.

PR changes walkthrough

Relevant files                                                                                                                                 
Enhancement
1 files
utils.py                                                                                                       
    pr_agent/algo/utils.py

    The `load_yaml` function has been enhanced with a regular
    expression pattern to extract the YAML snippet from the AI
    response, ignoring any explanatory text before the snippet.
+6/-0
Tests
1 files
test_load_yaml.py                                                                                     
    tests/unittest/test_load_yaml.py

    A new test case has been added to verify the correct parsing
    of AI responses that include a description before the YAML
    snippet.
+12/-0

User description

As with added test cases during PR, AI responses are not only encapsulated in quotes as snippets but may also include explanatory text at the beginning.

While we already handle cases where existing code becomes a snippet, adding a step to extract the snippet portion from AI responses in advance is desired to prevent parsing errors when explanatory text is included.

Copy link
Contributor

PR Description updated to latest commit (4854007)

@koid
Copy link
Contributor Author

koid commented Dec 20, 2023

/describe

Copy link
Contributor

codiumai-pr-agent-pro bot commented Dec 20, 2023

PR Analysis

(review updated until commit 16b61eb)

  • 🎯 Main theme: Enhancement of AI Response Parsing Mechanism
  • 📝 PR summary: This PR introduces an enhancement to the AI response parsing mechanism. It adds a regular expression pattern to extract the YAML snippet from the AI response, ignoring any explanatory text before the snippet. The load_yaml function in pr_agent/algo/utils.py has been modified to use this pattern and extract the YAML snippet. A new test case has been added to verify the correct parsing of AI responses.
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: Yes
  • ⏱️ Estimated effort to review [1-5]: 2, because the PR is relatively small and the changes are straightforward. The code is well-structured and easy to understand.
  • 🔒 Security concerns: No security concerns found

PR Feedback

  • 💡 General suggestions: The PR is well-structured and the changes are clear. The use of regular expressions to extract the YAML snippet is a good approach. However, it would be better to handle the case where the regular expression does not match anything in the response text.

  • 🤖 Code feedback:
    relevant filepr_agent/algo/utils.py
    suggestion      It would be better to handle the case where the regular expression does not match anything in the response text. This can be done by adding an else clause to the if statement and raising an exception or returning an appropriate default value. [important]
    relevant lineif snipet:

    relevant filetests/unittest/test_load_yaml.py
    suggestion      It would be beneficial to add a test case for the scenario where the regular expression does not match anything in the response text. This will ensure that the function behaves as expected in this scenario. [medium]
    relevant linedef test_load_valid_yaml_with_description(self):

How to use

Instructions

To invoke the PR-Agent, add a comment using one of the following commands:
/review: Request a review of your Pull Request.
/describe: Update the PR title and description based on the contents of the PR.
/improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
/ask <QUESTION>: Ask a question about the PR.
/update_changelog: Update the changelog based on the PR's contents.
/add_docs: Generate docstring for new components introduced in the PR.
/generate_labels: Generate labels for the PR based on the PR's contents.
see the tools guide for more details.

To edit any configuration parameter from the configuration.toml, add --config_path=new_value.
For example: /review --pr_reviewer.extra_instructions="focus on the file: ..."
To list the possible configuration parameters, add a /config comment.

@codiumai-pr-agent-pro codiumai-pr-agent-pro bot changed the title ignore header description in ai response Enhancement of AI Response Parsing Mechanism Dec 20, 2023
Copy link
Contributor

PR Description updated to latest commit (4854007)

@koid koid force-pushed the feature/ignore-header-description-in-ai-response branch from f36d77f to 16b61eb Compare December 20, 2023 02:50
@koid
Copy link
Contributor Author

koid commented Dec 20, 2023

I apologize; I found a typo, so I re-pushed it.

@koid
Copy link
Contributor Author

koid commented Dec 20, 2023

/describe

Copy link
Contributor

PR Description updated to latest commit (16b61eb)

@koid
Copy link
Contributor Author

koid commented Dec 20, 2023

/review

Copy link
Contributor

Persistent review updated to latest commit 16b61eb

@mrT23
Copy link
Collaborator

mrT23 commented Dec 20, 2023

/ask explain to me what the following code does. Be specific, and give examples. Be critical, and think of edge cases.

    # remove everything before the first ```yaml
    snipet_pattern = r'```(yaml)?[\s\S]*?```'
    snipet = re.search(snipet_pattern, response_text)
    if snipet:
        response_text = snipet.group()

Copy link
Contributor

Question: explain to me what the following code does. Be specific, and give examples. Be critical, and think of edge cases. python snipet_pattern = r'(yaml)?[sS]*?' snipet = re.search(snipet_pattern, response_text) if snipet: response_text = snipet.group()

Answer:
The provided code is a regular expression pattern matching operation that is used to extract a YAML snippet from a larger text string, which is the AI's response.

Here's a breakdown of the code:

  1. snipet_pattern = r'```(yaml)?[\s\S]*?```': This line defines a regular expression pattern. The pattern is looking for a string that starts with '' optionally followed by 'yaml', then any number of characters (including newline characters due to `[\s\S]`), and ends with ''. The '?' after '(yaml)' and '*?' makes the 'yaml' and the following characters optional and non-greedy respectively. This means it will stop at the first occurrence of '' it finds after the initial ''.

  2. snipet = re.search(snipet_pattern, response_text): This line uses the re.search function to search the response_text for the first location where the snipet_pattern matches.

  3. if snipet: response_text = snipet.group(): If a match is found (if snipet), the matched string is extracted using the group() method and assigned back to response_text.

For example, if the AI's response is:

Here is the answer in YAML format:

```yaml
name: John Smith
age: 35
The code will extract:
name: John Smith
age: 35
and assign it back to `response_text`.

As for edge cases, this code assumes that the YAML snippet will always be enclosed within '```'. If the AI's response doesn't follow this format, the code might not work as expected. Also, if there are multiple YAML snippets in the response, this code will only extract the first one because `re.search` stops after finding the first match.

@mrT23
Copy link
Collaborator

mrT23 commented Dec 20, 2023

Hi @koid
The logic behind this PR is sound, and the tests helped understand the usecase 👍

My request is to move the condition to the fallback:

data = try_fix_yaml(response_text)

Why:

  1. This is not a case that happens very often (with openai). The model usually (>98%) doesn't add a prefix. we specifically instructed him not to
  2. we want to document such cases, and add a proper log message

Once you will move it to the fallback section, we can merge

@mrT23 mrT23 self-requested a review December 20, 2023 07:07
@koid
Copy link
Contributor Author

koid commented Dec 20, 2023

Thank you for the review.
I appreciate your request and the reasons behind it.
I will re-implement it as a fallback within try_fix_yaml.

@koid koid force-pushed the feature/ignore-header-description-in-ai-response branch from ce1c4f8 to e2797ad Compare December 21, 2023 02:06
data = {}
for i in range(1, len(response_text_lines)):
response_text_lines_tmp = '\n'.join(response_text_lines[:-i])
try:
data = yaml.safe_load(response_text_lines_tmp,)
get_logger().info(f"Successfully parsed AI prediction after removing {i} lines")
break
return data
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a 'return' statement because even when it could be loaded as YAML, it was not returning a value.
Although I considered creating a separate PR for this, I decided to include the fix together.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@koid
Copy link
Contributor Author

koid commented Dec 21, 2023

Hi @mrT23

I've completed the corrections, so please review it again.

The changes are as follows:

  • e2797ad Implemented a fallback within try_fix_yaml.
  • f998620 Additionally, modified the test cases from load_yaml to try_fix_yaml to align with the changes. While fixing the above test cases, noticed a discrepancy in the file name prefix, so corrected the file names.
  • a9d7899 Discovered a bug in the implementation of other fallbacks in try_fix_yaml, so fixing them concurrently.

@mrT23
Copy link
Collaborator

mrT23 commented Dec 21, 2023

Thanks @koid
merged

@mrT23 mrT23 merged commit 66cbd6e into Codium-ai:main Dec 21, 2023
2 checks passed
@koid koid deleted the feature/ignore-header-description-in-ai-response branch December 21, 2023 07:08
yochail pushed a commit to yochail/pr-agent that referenced this pull request Feb 11, 2024
…ription-in-ai-response

Enhancement of AI Response Parsing Mechanism
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants