Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically enable /improve --extended mode for large PRs #564

Merged
merged 2 commits into from
Jan 4, 2024

Conversation

zmeir
Copy link
Contributor

@zmeir zmeir commented Jan 3, 2024

Type

Enhancement


Description

  • Introduced a feature that automatically enables extended mode for large PRs. This is determined by the number of files, additions, and deletions in the PR.
  • Modified the logic for reducing the number of suggestions, ensuring it doesn't go below a certain threshold.
  • Updated the documentation and configuration file to include the new parameters for automatically enabling extended mode for large PRs and their default values.

PR changes walkthrough

Relevant files                                                                                                                                 
Enhancement
1 files
pr_code_suggestions.py                                                                           
    pr_agent/tools/pr_code_suggestions.py

    The PR introduces a new feature that automatically enables
    extended mode for large PRs. This is determined by the
    number of files, additions, and deletions in the PR. The PR
    also modifies the logic for reducing the number of
    suggestions, ensuring it doesn't go below a certain
    threshold.

+24/-3
Documentation
1 files
IMPROVE.md                                                                                                   
    docs/IMPROVE.md

    The PR updates the documentation to include the new
    parameters for automatically enabling extended mode for
    large PRs. It also provides default values for these
    parameters.

+4/-0
Configuration changes
1 files
configuration.toml                                                                                   
    pr_agent/settings/configuration.toml

    The PR updates the configuration file to include the new
    parameters for automatically enabling extended mode for
    large PRs and their default values.

+4/-0

User description

Thought it could be nice to add an option to allow the /improve command to automatically run with --extended mode based on the size of the PR (number of files/additions/deletions)

Copy link
Contributor

PR Description updated to latest commit (2f9fbbf)

Copy link
Contributor

PR Analysis

  • 🎯 Main theme: Enhancement of the 'improve' tool to automatically enable extended mode for large PRs
  • 📝 PR summary: This PR introduces a feature that automatically enables extended mode for large PRs based on the number of files, additions, and deletions. It also modifies the logic for reducing the number of suggestions, ensuring it doesn't go below a certain threshold. The documentation and configuration file are updated to include the new parameters.
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: No
  • ⏱️ Estimated effort to review [1-5]: 3, because the PR involves changes in the core logic of the 'improve' tool and introduces new features. The changes are not too complex but require a good understanding of the existing codebase.
  • 🔒 Security concerns: No security concerns found

PR Feedback

💡 General suggestions: The PR is well-structured and the changes are clearly explained. It would be beneficial to add tests to verify the new functionality and ensure it works as expected. Also, consider handling exceptions more specifically rather than using a general exception clause.

🤖 Code feedback:
relevant filepr_agent/tools/pr_code_suggestions.py
suggestion      

Consider handling exceptions more specifically. Using a general exception clause can make it harder to debug issues and may also hide unexpected behavior. [important]

relevant lineexcept Exception as e:

relevant filepr_agent/tools/pr_code_suggestions.py
suggestion      

Consider adding a docstring to the _get_is_extended method to explain its purpose and the meaning of its return value. [medium]

relevant linedef _get_is_extended(self, args: list[str]) -> bool:

relevant filepr_agent/tools/pr_code_suggestions.py
suggestion      

Consider using a more descriptive variable name instead of s in the for loop. This would make the code more readable. [medium]

relevant linefor s in sort_order['Sort Order']:

✨ Usage tips:

To invoke the PR-Agent, add a comment using one of the following commands:

  • /review: Request a review of your Pull Request.
  • /describe: Update the PR title and description based on the contents of the PR.
  • /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
  • /ask <QUESTION>: Ask a question about the PR.
  • /update_changelog: Update the changelog based on the PR's contents.
  • /add_docs 💎: Generate docstring for new components introduced in the PR.
  • /generate_labels 💎: Generate labels for the PR based on the PR's contents.
  • /analyze 💎: Automatically analyzes the PR, and presents changes walkthrough for each component.

See the tools guide for more details.
To edit any configuration parameter from the configuration.toml, add --config_path=new_value.
For example: /review --pr_reviewer.extra_instructions="focus on the file: ..."
To list the possible configuration parameters, add a /config comment.

Comment on lines +289 to +296
max_len = max(
len(data_sorted),
get_settings().pr_code_suggestions.num_code_suggestions,
get_settings().pr_code_suggestions.num_code_suggestions_per_chunk,
)
new_len = int(0.5 + max_len * get_settings().pr_code_suggestions.final_clip_factor)
if new_len < len(data_sorted):
data_sorted = data_sorted[:new_len]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is intended to handle an edge-case where the total size of data_sorted before clipping is already small enough in comparison to num_code_suggestions or num_code_suggestions_per_chunk, and so clipping it will result it potentially much fewer suggestions.

For example:
Assume num_code_suggestions_per_chunk=10, final_clip_factor=0.5 and len(data_sorted)==2 before clipping. Based on these configurations, it would be reasonable to expect that the PR-Agent will give us all of the (2) suggestions it generated, but in reality it would have applied the final_clip_factor and cut the data_sorted list in half, giving us only the first suggestion.
With this "fix" the final_clip_factor won't reduce the size of the data_sorted list if it's already small enough relative to the max number of suggestions we set in the configuration.

@mrT23
Copy link
Collaborator

mrT23 commented Jan 4, 2024

@zmeir i am not sure about the need of this specific PR.

I recommend always using '/improve --extended'.
If the PR is below 8000 tokens, it will make a single call. Only if needed, it will make extra calls, and they will be according the extra tokens needed.

So the command itself is already "automatic", according to the correct input - the number of tokens, that will lead to the minimal number of calls

Can you elaborate why do we need extra logic beyond the original one ?
Or maybe the correct question is - why is the 'extended' mode not the default ?

@zmeir
Copy link
Contributor Author

zmeir commented Jan 4, 2024

@zmeir i am not sure about the need of this specific PR.

I recommend always using '/improve --extended'. If the PR is below 8000 tokens, it will make a single call. Only if needed, it will make extra calls, and they will be according the extra tokens needed.

So the command itself is already "automatic", according to the correct input - the number of tokens

Can you elaborate why do we need extra logic beyond the original one ? Or maybe the correct question is - why is the 'extended' mode not the default ?

I think it comes down to usability and cost.

From manager's perspective, you might not want to control the cost of using the /improve command. If you disable the extended mode (or set a very high threshold to activate it), you know it's just 1 API call. You might even want for even large PRs to only incur 1 API call "by default", and only use the extended mode for specific cases.

However, from a user's perspective, I don't really want to care about the difference, I just want to best results and the simplest usage. So ideally I only want to run /improve and have it use the best strategy according to whatever the algorithmic and monetary constraints may be.

So to your question why the extended mode is not the default, I think this what I was aiming for with this PR - if this is what you want you can set the thresholds to 0 and it will always run with extended mode. But I wanted to still leave some wiggle room in case you're concerned about the extra calls.

I hope that answers your questions.

I should also note that in the extended mode you have some extra logic for clipping some percentage of the suggestions that will result in fewer suggestions than the normal mode, if there was only one call and it produced fewer than the max number of suggestions. This is another part of this PR - I can separate it if you'd like.

@zmeir
Copy link
Contributor Author

zmeir commented Jan 4, 2024

To simplify my previous response: I'm a lazy user that wants to type fewer characters, so I want to type /improve instead of /improve --extended. There was no configuration to allow that so I added the auto_extended_mode. The rest of the thresholds are just added because I could and I wanted to give more control if anyone wants it, but as you said they can all just be 0 and the extended mode will always be the default.

@mrT23
Copy link
Collaborator

mrT23 commented Jan 4, 2024

@zmeir
I am not at peace with this specific PR.

It contains a lot of magic numbers, that are hard to tune or understand their meaning, and how they will impact the results.
In contrast, the 'extended' mode as-is already uses the optimal tradeof parameter - number of tokens, which strikes the correct adaptive balance between the number of calls (I.e. cost) to quality.

Also notice you can always lower the number of possible tokens per model call, and it will give you a similar effect what you talked about, yet in a more controlled, manner and still using a single parameter.

Maybe the default option should just be to enable the 'extended' mode.
(with or without clipping, with or without ranking). But i want to avoid magic numbers, hard to understand their impact.

I am not sure if 500 lines is a lot or not. maybe they have no impact at all, and in practice its just the same as doing 'extended' all the time. or maybe the other way around - extended won't happen, even if needed, because 500 lines are too much. I think this is correct for most users.

@zmeir
Copy link
Contributor Author

zmeir commented Jan 4, 2024

@mrT23 that makes perfect sense, thanks for the explanation.

I made a simpler version of this PR with just the auto_extended_mode toggle and what I believe is a fix for the clipping logic - let me know what you think: #569

@mrT23 mrT23 merged commit 2f9fbbf into Codium-ai:main Jan 4, 2024
1 check passed
@zmeir zmeir deleted the zmeir/enhance/auto_improve_extended branch January 4, 2024 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants