Adding numbered hunks and code suggestions feature #50

Merged Jul 16, 2023 (1 commit)

Conversation

@mrT23 mrT23 commented Jul 15, 2023

Type of PR:
Enhancement


PR Description:
This PR adds two features: converting the hunks of a patch into numbered hunks, and generating code suggestions for a PR. It also adds the supporting changes to the GitHub and GitLab providers and updates the CLI and configuration settings accordingly.


PR Main Files Walkthrough:
- pr_agent/algo/git_patch_processing.py: Added a new function convert_to_hunks_with_lines_numbers that converts a patch to numbered hunks. This function is used to generate more detailed diffs for code suggestions.
- pr_agent/algo/pr_processing.py: Updated the get_pr_diff function to support adding line numbers to hunks. This is used when generating the PR diff for code suggestions.
- pr_agent/cli.py: Added a new command line argument --pr_code_suggestions to trigger the code suggestions feature.
- pr_agent/git_providers/git_provider.py: Added a new abstract method publish_code_suggestion to the GitProvider base class, which is implemented by the GitHub and GitLab providers.
- pr_agent/git_providers/github_provider.py: Implemented the publish_code_suggestion method for the GitHub provider.
- pr_agent/git_providers/gitlab_provider.py: Added the publish_code_suggestion method for the GitLab provider; it currently only raises a 'not implemented yet' exception.
- pr_agent/tools/pr_code_suggestions.py: Added a new class PRCodeSuggestions that handles the generation and publishing of code suggestions for a PR.
- pr_agent/settings/configuration.toml: Updated the configuration settings to include a new setting for the number of code suggestions to generate.
- pr_agent/settings/pr_code_suggestions_prompts.toml: Added a new file that contains the prompts for the code suggestions feature.
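
The provider changes in the walkthrough follow the usual abstract-method pattern. The sketch below only illustrates that pattern: the class names ending in `Sketch`, the `RecordingProvider` test double, and the parameter names of `publish_code_suggestion` are assumptions for illustration, not the signatures used in the PR.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class GitProviderSketch(ABC):
    """Hypothetical, trimmed-down stand-in for the GitProvider base class."""

    @abstractmethod
    def publish_code_suggestion(self, body: str, relevant_file: str,
                                start_line: int, end_line: int) -> None:
        """Post one suggestion on the given line range (parameter names are assumed)."""


class GitLabProviderSketch(GitProviderSketch):
    """Mirrors the walkthrough: the method exists but is not implemented yet."""

    def publish_code_suggestion(self, body: str, relevant_file: str,
                                start_line: int, end_line: int) -> None:
        raise NotImplementedError("code suggestions are not implemented yet for this provider")


class RecordingProvider(GitProviderSketch):
    """Test double that stores suggestions instead of calling a remote API."""

    def __init__(self) -> None:
        self.published: List[Tuple[str, int, int, str]] = []

    def publish_code_suggestion(self, body: str, relevant_file: str,
                                start_line: int, end_line: int) -> None:
        self.published.append((relevant_file, start_line, end_line, body))
```

Because the method is abstract, a provider that forgets to define it cannot be instantiated, while a provider that defines it as a stub fails only when the feature is actually used.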

@mrT23 mrT23 requested a review from hussam789 July 15, 2023 13:10
@mrT23 mrT23 changed the title from "Code suggestions as a separate tool" to "Adding numbered hunks and code suggestions feature" Jul 15, 2023
@mrT23
Collaborator Author

mrT23 commented Jul 16, 2023

Preparing review...

Comment on lines +23 to +24
def get_pr_diff(git_provider: Union[GithubProvider, Any], token_handler: TokenHandler,
                add_line_numbers_to_hunks: bool = False, disable_extra_lines: bool =False) -> str:
Collaborator Author

Suggestion: Consider using a class or a data structure to encapsulate the parameters of the get_pr_diff function. This will make the function signature cleaner and easier to manage as the number of parameters grows.

Suggested change
class PrDiffParams:
    def __init__(self, git_provider, token_handler, add_line_numbers_to_hunks=False, disable_extra_lines=False):
        self.git_provider = git_provider
        self.token_handler = token_handler
        self.add_line_numbers_to_hunks = add_line_numbers_to_hunks
        self.disable_extra_lines = disable_extra_lines

def get_pr_diff(params: PrDiffParams) -> str:
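
One possible way to write the parameter object from this suggestion is a frozen dataclass, which removes the hand-written `__init__`. This is a minimal sketch: the field types are loosened to `Any`, and the body of `get_pr_diff` is a placeholder that only reports which options are active, not the diff logic from the PR.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PrDiffParams:
    git_provider: Any   # assumed: any provider object
    token_handler: Any  # assumed: any token-counting helper
    add_line_numbers_to_hunks: bool = False
    disable_extra_lines: bool = False


def get_pr_diff(params: PrDiffParams) -> str:
    """Placeholder body: describes the requested options instead of building a diff."""
    options = []
    if params.add_line_numbers_to_hunks:
        options.append("numbered hunks")
    if params.disable_extra_lines:
        options.append("no extra context lines")
    return ", ".join(options) or "default diff"
```

A call site then names every option explicitly, for example `get_pr_diff(PrDiffParams(provider, handler, add_line_numbers_to_hunks=True))`, and new options can be added as fields without changing the function signature.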

Comment on lines +28 to +35
self.vars = {
    "title": self.git_provider.pr.title,
    "branch": self.git_provider.get_pr_branch(),
    "description": self.git_provider.get_pr_description(),
    "language": self.main_language,
    "diff": "",  # empty diff for initial calculation
    'num_code_suggestions': settings.pr_code_suggestions.num_code_suggestions,
}
Collaborator Author

Suggestion: Consider using a data class or a named tuple for the vars attribute in the PRCodeSuggestions class. This will make the code more readable and easier to maintain.

Suggested change
from dataclasses import dataclass

@dataclass
class PrVars:
    title: str
    branch: str
    description: str
    language: str
    diff: str
    num_code_suggestions: int

self.vars = PrVars(
    title=self.git_provider.pr.title,
    branch=self.git_provider.get_pr_branch(),
    description=self.git_provider.get_pr_description(),
    language=self.main_language,
    diff="",  # empty diff for initial calculation
    num_code_suggestions=settings.pr_code_suggestions.num_code_suggestions
)
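
If the variables are later handed to a template engine as a mapping, a dataclass can still serve that purpose through `dataclasses.asdict`. The sketch below assumes plain `str.format` as a stand-in for whatever templating the tool uses; `render_prompt` and all example values (including the suggestion count of 4) are hypothetical.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class PrVars:
    title: str
    branch: str
    description: str
    language: str
    diff: str
    num_code_suggestions: int


def render_prompt(template: str, pr_vars: PrVars) -> str:
    """Expand the dataclass into keyword arguments for the template (str.format is an assumption)."""
    return template.format(**asdict(pr_vars))


# Example values only; none of them come from the PR.
initial = PrVars(title="Add numbered hunks", branch="tr/numbered_hunks", description="",
                 language="Python", diff="", num_code_suggestions=4)

# The diff starts empty and is filled in later; replace() returns an updated copy.
with_diff = replace(initial, diff="## src/file.ts\n--new hunk--\n")
```

The trade-off is that attribute access is checked by tooling (`pr_vars.title`), while code that expects a dict has to go through `asdict` first.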

Comment on lines +113 to +184
def convert_to_hunks_with_lines_numbers(patch: str, file) -> str:
    # toDO: (maybe remove '-' and '+' from the beginning of the line)
    """
    ## src/file.ts
    --new hunk--
    881 line1
    882 line2
    883 line3
    884 line4
    885 line6
    886 line7
    887 + line8
    888 + line9
    889 line10
    890 line11
    ...
    --old hunk--
    line1
    line2
    - line3
    - line4
    line5
    line6
    ...

    """
    patch_with_lines_str = f"## {file.filename}\n"
    import re
    patch_lines = patch.splitlines()
    RE_HUNK_HEADER = re.compile(
        r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
    new_content_lines = []
    old_content_lines = []
    match = None
    start1, size1, start2, size2 = -1, -1, -1, -1
    for line in patch_lines:
        if 'no newline at end of file' in line.lower():
            continue

        if line.startswith('@@'):
            match = RE_HUNK_HEADER.match(line)
            if match and new_content_lines:  # found a new hunk, split the previous lines
                if new_content_lines:
                    patch_with_lines_str += '\n--new hunk--\n'
                    for i, line_new in enumerate(new_content_lines):
                        patch_with_lines_str += f"{start2 + i} {line_new}\n"
                if old_content_lines:
                    patch_with_lines_str += '--old hunk--\n'
                    for i, line_old in enumerate(old_content_lines):
                        patch_with_lines_str += f"{line_old}\n"
                new_content_lines = []
                old_content_lines = []
            start1, size1, start2, size2 = map(int, match.groups()[:4])
        elif line.startswith('+'):
            new_content_lines.append(line)
        elif line.startswith('-'):
            old_content_lines.append(line)
        else:
            new_content_lines.append(line)
            old_content_lines.append(line)

    # finishing last hunk
    if match and new_content_lines:
        if new_content_lines:
            patch_with_lines_str += '\n--new hunk--\n'
            for i, line_new in enumerate(new_content_lines):
                patch_with_lines_str += f"{start2 + i} {line_new}\n"
        if old_content_lines:
            patch_with_lines_str += '\n--old hunk--\n'
            for i, line_old in enumerate(old_content_lines):
                patch_with_lines_str += f"{line_old}\n"

Collaborator Author

Suggestion: The convert_to_hunks_with_lines_numbers function is quite long and does multiple things. Consider breaking it down into smaller, more manageable functions. This will make the code easier to read and maintain.

Suggested change
def convert_to_hunks_with_lines_numbers(patch: str, file) -> str:
    # toDO: (maybe remove '-' and '+' from the beginning of the line)
    ...
    patch_with_lines_str = process_patch_lines(patch_lines)
    return patch_with_lines_str.strip()

def process_patch_lines(patch_lines):
    ...
    return patch_with_lines_str
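
The suggestion leaves the helper bodies elided. One way the split could look is sketched below, separating parsing from formatting. This is not the code from the PR: `Hunk`, `split_into_hunks`, `format_hunk`, and `numbered_hunks` are hypothetical names, the function takes a filename string rather than a file object, and the behavior differs in small ways from the quoted function (for example, a "--new hunk--" marker is emitted even when a hunk has no new lines). It reads only the new-file start line from the header, so headers that omit the line counts, such as `@@ -1 +1 @@`, are handled; the quoted version converts all four captured groups with `int`, which fails when an optional group is `None`.

```python
import re
from typing import List, NamedTuple

RE_HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")


class Hunk(NamedTuple):
    new_start: int        # first line number of the hunk in the new file
    new_lines: List[str]  # context and '+' lines
    old_lines: List[str]  # context and '-' lines


def split_into_hunks(patch: str) -> List[Hunk]:
    """Parse a unified-diff patch into hunks, keeping the leading ' ', '+', '-' markers."""
    hunks: List[Hunk] = []
    for line in patch.splitlines():
        if "no newline at end of file" in line.lower():
            continue
        header = RE_HUNK_HEADER.match(line)
        if header:
            hunks.append(Hunk(int(header.group(3)), [], []))
        elif hunks:  # ignore anything before the first hunk header
            if not line.startswith("-"):
                hunks[-1].new_lines.append(line)
            if not line.startswith("+"):
                hunks[-1].old_lines.append(line)
    return hunks


def format_hunk(hunk: Hunk) -> str:
    """Render one hunk: numbered new side first, then the unnumbered old side."""
    out = ["", "--new hunk--"]
    out += [f"{hunk.new_start + i} {text}" for i, text in enumerate(hunk.new_lines)]
    if hunk.old_lines:
        out.append("--old hunk--")
        out += hunk.old_lines
    return "\n".join(out) + "\n"


def numbered_hunks(patch: str, filename: str) -> str:
    body = "".join(format_hunk(h) for h in split_into_hunks(patch))
    return f"## {filename}\n{body}".strip()
```

With this split, the parser can be tested on raw patches without looking at output formatting, and the formatter can be changed (for instance, to drop the '+' and '-' markers mentioned in the toDO) without touching the parsing loop.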

Comment on lines +16 to +17
parser.add_argument('--pr_description', action='store_true', required=False)
parser.add_argument('--pr_code_suggestions', action='store_true', required=False)
Collaborator Author

Suggestion: Consider using subcommands instead of flags for the different modes of operation (pr_description, pr_code_suggestions). This will make the command line interface more intuitive and easier to use.

Suggested change
subparsers = parser.add_subparsers(dest='command')
pr_description_parser = subparsers.add_parser('pr_description')
pr_code_suggestions_parser = subparsers.add_parser('pr_code_suggestions')
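
A slightly fuller sketch of the subcommand layout, using only the standard `argparse` API. The `--pr_url` option and the help strings are assumptions added for illustration; the suggestion itself only names the two subcommands.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PR tool CLI (illustrative sketch)")
    parser.add_argument("--pr_url", required=True)  # assumed top-level option
    # required=True makes argparse reject an invocation that names no subcommand.
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("pr_description", help="generate a PR description")
    subparsers.add_parser("pr_code_suggestions", help="generate code suggestions")
    return parser


def parse(argv):
    return build_parser().parse_args(argv)
```

For example, `parse(["--pr_url", "https://example.com/pr/1", "pr_code_suggestions"])` yields a namespace whose `command` is `"pr_code_suggestions"`. Unlike independent boolean flags, subcommands make the modes mutually exclusive, and each subparser can later take its own options.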

full code suggestions
@hussam789
Collaborator

ToDo:

  • Add Code Suggestion to GitHub polling in handle_request

@mrT23 mrT23 merged commit 84bf95e into main Jul 16, 2023
@mrT23 mrT23 deleted the tr/numbered_hunks branch July 16, 2023 09:27
@UjjawalKRoy

Preparing review...

yochail pushed a commit to yochail/pr-agent that referenced this pull request Feb 11, 2024
Adding numbered hunks and code suggestions feature
@ElonaZharri

Preparing review...

@ElonaZharri

Preparing PR description...

@ElonaZharri

Question: Why is the PR review failing when run locally from source code?

Answer:
The provided code is a Python implementation of a PR tool that generates a diff between two versions of a file, taking into account the language of the file and the token budget. Here's a breakdown of the code:

  1. The get_pr_diff function takes in a GithubProvider object and a TokenHandler object as inputs. It returns a tuple containing the full diff string and the total number of tokens used in the diff.
  2. The pr_generate_extended_diff function takes in a list of languages and a TokenHandler object as inputs. It generates a standard diff string with patch extension, and then prunes the diff if the total number of tokens exceeds the limit.
  3. The pr_generate_compressed_diff function takes in a list of languages, a TokenHandler object, and an optional convert_hunks_to_line_numbers parameter. It generates a compressed diff string by applying Diff Minimization techniques to reduce the number of tokens.
  4. The token_handler object is used to count the number of tokens in each file.
  5. The pr_languages list contains the languages for which the diff is being generated.
  6. The patches list stores the generated diff patches.
  7. The total_tokens variable keeps track of the total number of tokens used in the diff.
  8. The add_line_numbers_to_hunks parameter is used to convert hunks to line numbers if necessary.

The code uses several techniques to reduce the number of tokens in the diff, including:

  1. Starting from the largest diff patch and pruning smaller ones.
  2. Not using extended context lines around the diff.
  3. Applying Diff Minimization techniques to reduce the number of tokens.
  4. Converting hunks to line numbers if necessary.

The code also provides additional information about the files that were not processed due to insufficient token budget, and the modified files that were not included in the diff.
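
The budget idea described in this answer (take the largest patches first and report whatever does not fit) can be sketched as a greedy selection. This is only an illustration of the general technique under stated assumptions, not the tool's implementation: `select_patches` and `whitespace_tokens` are hypothetical names, and whitespace splitting stands in for a real tokenizer.

```python
from typing import Callable, Dict, List, Tuple


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter; a real tool would use a model-specific tokenizer."""
    return len(text.split())


def select_patches(patches: Dict[str, str], max_tokens: int,
                   count_tokens: Callable[[str], int] = whitespace_tokens
                   ) -> Tuple[List[str], List[str], int]:
    """Greedily include patches, largest first, while the token budget allows.

    Returns (included filenames, skipped filenames, total tokens used).
    """
    included: List[str] = []
    skipped: List[str] = []
    total = 0
    by_size = sorted(patches.items(), key=lambda kv: count_tokens(kv[1]), reverse=True)
    for filename, patch in by_size:
        cost = count_tokens(patch)
        if total + cost > max_tokens:
            skipped.append(filename)  # reported later as "not processed"
            continue
        included.append(filename)
        total += cost
    return included, skipped, total
```

The skipped list corresponds to the "files that were not processed due to insufficient token budget" mentioned in the answer.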


✨ Ask tool usage guide:

Overview:
The ask tool answers questions about the PR, based on the PR code changes.
It can be invoked manually by commenting on any PR:

/ask "..."

Note that the tool does not have "memory" of previous questions, and answers each question independently.
You can ask questions about the entire PR, about specific code lines, or about an image related to the PR code changes.

See the ask usage page for a comprehensive guide on using this tool.

6 participants