Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cyclomatic Complexity at Type Level for C++ #683

Closed
mcserep opened this issue Nov 14, 2023 · 13 comments · Fixed by #727
Closed

Cyclomatic Complexity at Type Level for C++ #683

mcserep opened this issue Nov 14, 2023 · 13 comments · Fixed by #727
Assignees
Labels
Kind: Enhancement 🌟 Level: Moderate (2) Plugin: C++ Issues related to the parsing and presentation of C++ projects. Plugin: Metrics Issues related to the code metrics plugin.

Comments

@mcserep
Copy link
Collaborator

mcserep commented Nov 14, 2023

See #682 for Cyclomatic Complexity at Function Level.

Adapted to the object-oriented paradigm, this metric can also be defined for classes as the sum of its methods complexity metric.

@mcserep mcserep added Kind: Enhancement 🌟 Plugin: C++ Issues related to the parsing and presentation of C++ projects. Plugin: Metrics Issues related to the code metrics plugin. Level: Moderate (2) labels Nov 14, 2023
@mcserep mcserep added this to To do in Roadmap via automation Nov 14, 2023
@mcserep mcserep added this to To do in Lab via automation Nov 14, 2023
@intjftw intjftw assigned intjftw and unassigned intjftw Feb 21, 2024
@Seeker04 Seeker04 self-assigned this Feb 22, 2024
@Seeker04
Copy link
Collaborator

Seeker04 commented Mar 2, 2024

I have a question. How should we handle nested classes?

class A {
	void f();
	void g();

	class B {
		void h();
	}
};

Should we add the complexity of B (coming from h) to A?

Conceptually, I like the idea to penalize containing classes for each of their inner class since they fall under them as a logical unit. This would also discourage using too much class nesting overall.

However, from a practical standpoint, it might be inconvenient for the user to have complexities counted twice or even more times in case of deeper nesting, e.g. A::B::C.

@mcserep @intjftw What's your opinion?

@mcserep
Copy link
Collaborator Author

mcserep commented Mar 3, 2024

@Seeker04 As far as I remember we counted "nested functions" (lamdas) towards a function's McCabe metric in #689, since it increases the complexity of that function to understand. (Should be checked.)

With the same argument a nested type increases the complexity of the container type, therefore I would count it in.

Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Mar 13, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Mar 19, 2024
@Seeker04
Copy link
Collaborator

Progress

Running the following query on the parsing result of tinyxml2

select astValue, value from (CppAstNodeMetrics m join CppAstNode n on m.astNodeId = n.id) where type = 3;

yields the following table:

class DepthTracker                                 | 3.0
class DynArray                                     | 13.0
class DynArray                                     | 14.0
class DynArray                                     | 14.0
class DynArray                                     | 14.0
class DynArray                                     | 14.0
class DynArray                                     | 14.0
class DynArray                                     | 18.0
class DynArray                                     | 19.0
class DynArray                                     | 37.0
class MemPool                                      | 4.0
class MemPoolT : public MemPool                    | 13.0
class MemPoolT : public MemPool                    | 13.0
class MemPoolT : public MemPool                    | 13.0
class MemPoolT : public MemPool                    | 13.0
class MemPoolT : public MemPool                    | 16.0
class TINYXML2_LIB StrPair                         | 63.0
class TINYXML2_LIB XMLAttribute                    | 43.0
class TINYXML2_LIB XMLComment : public XMLNode     | 13.0
class TINYXML2_LIB XMLConstHandle                  | 30.0
class TINYXML2_LIB XMLDeclaration : public XMLNode | 13.0
class TINYXML2_LIB XMLDocument : public XMLNode    | 132.0
class TINYXML2_LIB XMLElement : public XMLNode     | 169.0
class TINYXML2_LIB XMLHandle                       | 30.0
class TINYXML2_LIB XMLNode                         | 140.0
class TINYXML2_LIB XMLPrinter : public XMLVisitor  | 86.0
class TINYXML2_LIB XMLText : public XMLNode        | 21.0
class TINYXML2_LIB XMLUnknown : public XMLNode     | 13.0
class TINYXML2_LIB XMLUtil                         | 110.0
class TINYXML2_LIB XMLVisitor                      | 12.0
struct Block                                       | 4.0
struct Block                                       | 4.0
struct Block                                       | 4.0
struct Block                                       | 4.0
struct Entity                                      | 1.0
struct TestUtil: XMLVisitor                        | 15.0
union Item                                         | 4.0
union Item                                         | 4.0
union Item                                         | 4.0
union Item                                         | 4.0

Most values are correct, however there are a few problems:

  • Template classes, i.e. DynArray, MemPoolT. Each template instantiation results in a class AST node, so it is understandable that we have duplicates with the current implementation as this has no special handling. However, mccabe values between the instantiations differ which is odd, because from the definition of these specific class templates, I don't see how different instantiations could have different mccabe values...
  • Block, Item and Entity are all classes without any methods so they shouldn't appear among these metrics. Block and Item are under template classes (that's also the reason they are all duplicated several times), however Entity is not, so this issue is not exclusive to templates.
    Code of Entity:
struct Entity {
    const char* pattern;
    int length;
    char value;
};
  • Handling of nested classes is not yet implemented.

Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Mar 22, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Mar 22, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Mar 22, 2024
@Seeker04
Copy link
Collaborator

Seeker04 commented Apr 6, 2024

Note: In order to use valgrind, I needed to run ulimit -n 1024, because the allowed number of open file descriptors was ridiculously high in the docker image by default: 1073741816

This solved the valgrind error:

valgrind: m_libcfile.c:66 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.
Segmentation fault (core dumped)

The image was built using the Docker guide.

Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Apr 7, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Apr 7, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Apr 7, 2024
@Seeker04
Copy link
Collaborator

Seeker04 commented Apr 7, 2024

The failed assertion occurs because of this:

If the query executed using query_one() or query_value() returns more than one element, then these functions fail with an assertion.

Source

@Seeker04
Copy link
Collaborator

In the current state the following query on tinyxml2:

select astValue, value from (CppAstNodeMetrics m join CppAstNode n on m.astNodeId = n.id) where type = 3;

yields:

class DepthTracker {                                 | 2
class DynArray {                                     | 13
class DynArray {                                     | 14
class DynArray {                                     | 14
class DynArray {                                     | 14
class DynArray {                                     | 14
class DynArray {                                     | 14
class DynArray {                                     | 18
class DynArray {                                     | 19
class DynArray {                                     | 37
class MemPool {                                      | 2
class MemPoolT : public MemPool {                    | 13
class MemPoolT : public MemPool {                    | 13
class MemPoolT : public MemPool {                    | 13
class MemPoolT : public MemPool {                    | 13
class MemPoolT : public MemPool {                    | 16
class TINYXML2_LIB StrPair {                         | 63
class TINYXML2_LIB XMLAttribute {                    | 43
class TINYXML2_LIB XMLComment : public XMLNode {     | 13
class TINYXML2_LIB XMLConstHandle {                  | 29
class TINYXML2_LIB XMLDeclaration : public XMLNode { | 13
class TINYXML2_LIB XMLDocument : public XMLNode {    | 132
class TINYXML2_LIB XMLElement : public XMLNode {     | 169
class TINYXML2_LIB XMLHandle {                       | 29
class TINYXML2_LIB XMLNode {                         | 140
class TINYXML2_LIB XMLPrinter : public XMLVisitor {  | 86
class TINYXML2_LIB XMLText : public XMLNode {        | 21
class TINYXML2_LIB XMLUnknown : public XMLNode {     | 13
class TINYXML2_LIB XMLUtil {                         | 110
class TINYXML2_LIB XMLVisitor {                      | 9
struct TestUtil: XMLVisitor {                        | 13

Classes with only implicit methods are no longer present and the metric of the others are also correct now.

Template classes still cause some duplicates. A potential solution to that would be to use getTemplateSpecializationKind in the parser checking the TemplateSpecializationKind. Then a new model::Tag could be added for entities (classes and perhaps functions too) which are implicit template instantiations (TSK_ImplicitInstantiation). Then we could simply filter out any such classes from the summation.

@mcserep
Copy link
Collaborator Author

mcserep commented Apr 10, 2024

As discussed today, first the types of the project should be considered and only then the methods of those types should be queried. This should be also more efficient.

Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Apr 11, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Apr 13, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue Apr 13, 2024
@Seeker04
Copy link
Collaborator

I did some investigation. We see results like this for template classes:

class MemPoolT : public MemPool { | 13
class MemPoolT : public MemPool { | 13
class MemPoolT : public MemPool { | 13
class MemPoolT : public MemPool { | 13
class MemPoolT : public MemPool { | 16

because some methods are only generated for the original template AST node and not for the template instantiations.

In case of MemPoolT, there are three such methods, hence the 13 and 16 values:

int tinyxml2::MemPoolT::ItemSize() const
int tinyxml2::MemPoolT<104>::ItemSize() const
int tinyxml2::MemPoolT<112>::ItemSize() const
int tinyxml2::MemPoolT<120>::ItemSize() const
int tinyxml2::MemPoolT<80>::ItemSize() const

tinyxml2::MemPoolT::MemPoolT<ITEM_SIZE>()
tinyxml2::MemPoolT<104>::MemPoolT()
tinyxml2::MemPoolT<112>::MemPoolT()
tinyxml2::MemPoolT<120>::MemPoolT()
tinyxml2::MemPoolT<80>::MemPoolT()

tinyxml2::MemPoolT::~MemPoolT<ITEM_SIZE>()
tinyxml2::MemPoolT<104>::~MemPoolT()
tinyxml2::MemPoolT<112>::~MemPoolT()
tinyxml2::MemPoolT<120>::~MemPoolT()
tinyxml2::MemPoolT<80>::~MemPoolT()

void * tinyxml2::MemPoolT::Alloc()
void * tinyxml2::MemPoolT<104>::Alloc()
void * tinyxml2::MemPoolT<112>::Alloc()
void * tinyxml2::MemPoolT<120>::Alloc()
void * tinyxml2::MemPoolT<80>::Alloc()

void tinyxml2::MemPoolT::Clear()
void tinyxml2::MemPoolT<104>::Clear()
void tinyxml2::MemPoolT<112>::Clear()
void tinyxml2::MemPoolT<120>::Clear()
void tinyxml2::MemPoolT<80>::Clear()

void tinyxml2::MemPoolT::Free(void *)
void tinyxml2::MemPoolT<104>::Free(void *)
void tinyxml2::MemPoolT<112>::Free(void *)
void tinyxml2::MemPoolT<120>::Free(void *)
void tinyxml2::MemPoolT<80>::Free(void *)

void tinyxml2::MemPoolT::SetTracked()
void tinyxml2::MemPoolT<104>::SetTracked()
void tinyxml2::MemPoolT<112>::SetTracked()
void tinyxml2::MemPoolT<120>::SetTracked()
void tinyxml2::MemPoolT<80>::SetTracked()

void tinyxml2::MemPoolT::Trace(const char *)

int tinyxml2::MemPoolT::CurrentAllocs() const

int tinyxml2::MemPoolT::Untracked() const

My first hunch was that these are the ones that does not depend on the template argument, but that's not true, because Trace() uses the template argument, ITEM_SIZE.

It looks like the method AST nodes for the template instantiations are only generated on demand, i.e., if they are used in the translation unit. Trace(), CurrentAllocs() and Untracked() are not used anywhere in tinyxml2 (only in debug builds).

This would also explain the even more diverse set of values for DynArray: 13, 14, 14, 14, 14, 14, 18, 19, 37, where 37 is the correct value. Different template instantiations might use a different set of methods which would require the compiler to generate a different number of method AST nodes for certain instantiations.

This is covered in [temp.inst] in the C++ standard (In C++11, this is §14.7.1/10. In C++14, it is §14.7.1/11, and in C++17 it is §17.7.1/9. Excerpt from C++17 below) - source

An implementation shall not implicitly instantiate a function template, a variable template, a member template, a non-virtual member function, a member class, a static data member of a class template, or a substatement of a constexpr if statement (9.4.1), unless such instantiation is required

The "shall not" part suggests that this is not enforced and is probably compiler dependant. Simpler/older compilers might generate all methods, but g++ 11.4.0 does not.

I still think, the best way to remove these duplicates would be to somehow include information about a record being a template class or template instantiation in the database (by tagging it, for example). Because there is no way to tell the templates and template instantiations apart by observing the current database (i.e., CppAstNode.astValue, Entity.name, Entity.qualifiedName, none of these include the template argument list).

If we could distinguish between the template AST node and its instantiations, then we could simply only include the former in the type McCabe metrics.

@mcserep
Copy link
Collaborator Author

mcserep commented Apr 15, 2024

The "shall not" part suggests that this is not enforced and is probably compiler dependant. Simpler/older compilers might generate all methods, but g++ 11.4.0 does not.

In programming language standards "shall" is the synonym for "must" and "shall not" is a synonym for "must not". "Should" and "should not" has the meaning of optionality.
(See e.g. ISO TR 10176, Information technology - Guidelines for the preparation of programming language standards for reference.

So it is not compiler dependent (and never was).

I still think, the best way to remove these duplicates would be to somehow include information about a record being a template class or template instantiation in the database (by tagging it, for example).

Let's proceed with that approach then.

Is it valid though that the complexity numbers are so different for these template method instantiations?

@Seeker04
Copy link
Collaborator

Yes, it is valid. The three methods which are not used by the template instantiations of MemPoolT are the reason for the 13 values opposed to the 16 which appears for the AST node of the original template.

I implemented the logic that marks and then skips template instantiations - other than explicit specializations - during the metric calculation. It's in the latest commit: 0bf929a

Now, only the proper values appear for the template classes DynArray and MemPoolT. Also, inner classes of class templates such as Block and Item appear only once in the new database as another result.

image

@Seeker04
Copy link
Collaborator

There are still duplicate entities in xerces-c for the following query:

// Lookup the definition (different AST node if not defined in class body)
const auto methodDef = _ctx.db->query_one<AstNode>(
  odb::query<AstNode>::entityHash == methodAstNode->entityHash &&
  odb::query<AstNode>::symbolType == AstNode::SymbolType::Function &&
  odb::query<AstNode>::astType == AstNode::AstType::Definition);

These duplicates can be extracted with the query:

select astValue, count(*) from CppAstNode
  where symbolType = 1 and astType = 3
  group by entityHash having count(*) > 1
  order by astValue;

which yields the following result:

StrX::StrX(const StrX &)                                                       | 20
StrX::StrX(const XMLCh *const)                                                 | 21
StrX::~StrX()                                                                  | 21
XStr::XStr(const XStr &)                                                       | 4
XStr::XStr(const char *const)                                                  | 4
XStr::~XStr()                                                                  | 4
const XMLCh * XStr::unicodeForm() const                                        | 4
const char * StrX::localForm() const                                           | 21
decltype(~std::forward<_Tp>(__t)) std::bit_not<void>::operator()(_Tp &&) const | 2
int main()                                                                     | 5
int main(int, char **)                                                         | 26
std::ostream & operator<<(std::ostream &, const StrX &)                        | 19
void tassert(_Bool, const char *, int)                                         | 3
void usage()                                                                   | 10

This is not a parsing defect, these functions indeed have the displayed number of definitions in the project. This happens, because they are compiled to different binaries, so ODR is not violated. They go to several test binaries, see tests/CMakeLists.txt.

An interesting observation is that the usages() function has 10 static and 10 non-static definitions and only the non-static ones appear as duplicates in the query. This means that the static functions have their translation unit mangled to their hash, however non-static ones and member functions do not. The hash is generated in getUSR() using generateUSRForDecl. Function bodies don't seem to affect the USR, because the main() functions, for example, have different bodies, but still get the same hash.

On the CodeCompass GUI, if I select a node for usages(), I get all 10 definitions:
image
Moreover, right clicking on a reference and hitting "Jump to definition" lists out all of them:
image

This means that the conventional queries are also not smart enough to find the definition that belongs to the corresponding translation unit. I suspect, this is not possible with the current database. CodeCompass does not handle cases where symbols of the same hash (e.g. same signature) get compiled to different binaries.

Either the hash had to be extended (by adding a mangled translation unit name perhaps?), or we would need some new field to store this information in CppAstNode.

@mcserep @intjftw
We should decide whether this problem should belong to the scope of this issue.

If yes, we should decide how to proceed.

If not, then I can replace the query_one call with query and take the first result (if any). This would introduce a potential inaccuracy in this metrics, but it's arguably negligible.

In xerces-c, 14 of the 15886 function definition nodes have such duplicate entities. This is a 0.08% ratio, and it also includes functions that are not methods (like main()) which mean no problem for us here. Also, most duplicates have the same definition and hence the same McCabe value.
Of course, it is a bit arbitrary to only consider xerces-c. Other projects could potentially be affected more by this issue (though I doubt, this would be too common).

@whisperity
Copy link
Member

whisperity commented Apr 27, 2024

@Seeker04 See #198 about the issue of project-level but not ODR-level (program-level) duplications. That patch stalled, and I highly suspect the main reason is that it was developed in a world prior to incremental analysis (#266) making its way into CodeCompass, and as it stands, #198 is highly incompatible with the latter development.

@mcserep mcserep added this to the Upcoming Release milestone Apr 29, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 1, 2024
@Seeker04
Copy link
Collaborator

Seeker04 commented May 1, 2024

@whisperity Thank you, this is indeed the information we would need to make this metric more accurate.

@mcserep @intjftw I pushed a workaround that considers the first function definition found in these cases (and also rebased my branch). Now there are no assert failed errors, all CI checks passed.
Currently, I don't see a better solution without the linkage information @whisperity mentioned and linked above, so I'm removing the draft status from the PR and adding you as reviewers.

Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Seeker04 pushed a commit to Seeker04/CodeCompass that referenced this issue May 10, 2024
Roadmap automation moved this from To do to Done Jun 17, 2024
Lab automation moved this from To do to Done Jun 17, 2024
mcserep added a commit that referenced this issue Jun 17, 2024
Co-authored-by: Seeker04 <zahoranb@proton.me>
Co-authored-by: Máté Cserép <mcserep@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Kind: Enhancement 🌟 Level: Moderate (2) Plugin: C++ Issues related to the parsing and presentation of C++ projects. Plugin: Metrics Issues related to the code metrics plugin.
Projects
Lab
Done
Development

Successfully merging a pull request may close this issue.

4 participants