Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiroot #3815

Merged
merged 1 commit into from
Feb 6, 2023
Merged

Multiroot #3815

merged 1 commit into from
Feb 6, 2023

Conversation

bruntib
Copy link
Contributor

@bruntib bruntib commented Jan 12, 2023

[cmd] Multiroot directory analysis support

Currently the input of analysis is a compilation database JSON file. The
purpose of this PR is to support the following analysis invocations:

# Analyze one source file.
CodeChecker analyze main.c -o reports

#analyze all source files under a directory.
CodeChecker analyze my_project -o reports

This way the analysisa project with multiple root directories (i.e.
separate compilation databases for submodules in subdirectories)
is supported.

@bruntib bruntib added enhancement 🌟 WIP 💣 Work In Progress CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands analyzer 📈 Related to the analyze commands (analysis driver) labels Jan 12, 2023
@bruntib bruntib added this to the release 6.22.0 milestone Jan 12, 2023
@bruntib
Copy link
Contributor Author

bruntib commented Jan 12, 2023

This shouldn't be merged until #3807 is merged first.

@Szelethus
Copy link
Collaborator

I see you have proper descriptions in the commit messages, as well as a proper title -- could you copy paste them here?

@Szelethus Szelethus self-requested a review January 13, 2023 11:09
@whisperity whisperity marked this pull request as draft January 17, 2023 15:00
@bruntib bruntib mentioned this pull request Jan 19, 2023
@bruntib bruntib removed the WIP 💣 Work In Progress label Jan 20, 2023
@bruntib bruntib marked this pull request as ready for review January 23, 2023 10:13
Copy link
Collaborator

@vodorok vodorok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my remarks

otherwise raises an argparse.ArgumentTypeError which constitutes in a
graceful error message automatically by argparse module.
"""
path = os.path.abspath(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use pathlib? That lib has good multiplatform support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using os.path module for handling paths. pathlib is an alternative module. I can see the advantages of its usage, but I think we shouldn't mix them in the code base. For example this existing_abspath() function is returning a path. Should it be a string or a pathlib.Path object? Should this be adapted at the caller side transitively?

@@ -28,7 +28,6 @@ def __init__(self, environ):
super(ClangSAConfigHandler, self).__init__()
self.ctu_dir = ''
self.ctu_on_demand = False
self.log_file = ''
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this never used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member was stored in the construct_config_handler() function, but it is read nowhere. Since the store operation has been removed, this variable became completely unused.

"files which were created during the build. "
"The analyzers will check only the files "
"registered in these build databases.")
parser.add_argument('input',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this change replicated in the check command? There is a logfile argument there too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeChecker check command has a slightly different interface, for example it doesn't have this positional argument. It can be given the compilation database through -l flag, or a build command can be given by -b flag.

Traverse the parent directories of the given path and find the closest
compile_commands.json. If no JSON file exists with this name up to the root
then None returns.
The path of the first compilation database is returned even is it doesn't
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... even if it doesn't ..


while True:
path = os.path.dirname(path)
compile_commands_json = os.path.join(path, COMPILATION_DATABASE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if should we hard code "compile_commands.json" as the used compilation database name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another language I'd say, this is a constant variable. The intention here was to express that "compile_commands.json" is a conventionally used name by other tools (e.g. CMake exports it with this name, Clangd and vim plugins are looking for this automatically, etc.). I just wanted to prevent potential typos at the places of usage. This variable is used at several places.

os.path.splitext(source_file_path)[1] in C_CPP_OBJC_OBJCPP_EXTS


def build_actions_for_file(file_path: str) -> List[Dict]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_build_actions_for_source_file?


elif os.path.isdir(analysis_input):
compilation_database_files = \
find_all_compilation_databases(analysis_input)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure why the per file and the directory based method has different compDB discovery implementations.
Isn/t the single source file case just a specialized (restricted in a sense) case of the directiry based?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the original implementation, that you described. The problem was its extreme bad performance. I couldn't wait for the compilation database collection for analyzing LLVM source code. That's why the directory-based analysis is initiated from the compilation databases, not the source files.

"file": "inner.c"
}]

with open(cls.comp_db_outer, "w",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an inner function the create the compdb-s could be created.

Test if the correct set of build actions return to each test source
file.

WARNING! compilation_database.gather_compilation_database() function is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting consideration. I am thinking maybe we could somehow indicate in the assertion messages the path of the used compilation database? This way we could at least detect if something is wrong with the test environment.

databases.
input The input of the analysis can be either a compilation
database JSON file, a path to a source file or a path
to a directory containing source files.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collected some more places where analyze input is mentioned:
It is at your discretion how much you edit.

  • README.md
    • 71, 116
  • analyzer/user_guide.md
    -859, 910
  • usage.md
    -125
    The numbers are line numbers where the new mention of the new option is missing.

Currently the input of analysis is a compilation database JSON file. The
purpose of this PR is to support the following analysis invocations:

```
# Analyze one source file.
CodeChecker analyze main.c -o reports

#analyze all source files under a directory.
CodeChecker analyze my_project -o reports
```

This way the analysisa project with multiple root directories (i.e.
separate compilation databases for submodules in subdirectories)
is supported.
@vodorok
Copy link
Collaborator

vodorok commented Feb 3, 2023

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer 📈 Related to the analyze commands (analysis driver) CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands enhancement 🌟
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants