# Phantom variables/functions in XML, created from non-code files (Origin: bugzilla #760836) #5933

Closed
opened this Issue Jul 2, 2018 · 0 comments

### doxygen commented Jul 2, 2018

status RESOLVED severity major in component documentation for ---
Reported in version 1.8.12-GIT on platform Other
Assigned to: Dimitri van Heesch

Original attachment names and IDs:

On 2016-01-19 13:53:29 +0000, erez.oxman@gmail.com wrote:

Checked in version 1.8.12-GIT ()

The XML output contains memberdefs of kind function/variable that are generated from non-code files.

For example, the following snippet from all.xml was taken while trying to create xml output for package - https://github.com/waylan/Python-Markdown/releases/tag/2.6.5-final

  <memberdef kind="function" id="extra_8txt_1afc80a6c723fc08b438df6fb016fc461e" prot="public" static="no" const="no" explicit="no" inline="no" virt="non-virtual">
<type>title There may</type>
<definition>title There may be</definition>
<argsstring>(index.html) that are distributed with Python-Markdown that are not included here in Extra.The features of those extensions are not part of PHP Markdown Extra</argsstring>
<name>be</name>
<param>
<type>index.</type>
<declname>html</declname>
</param>
<briefdescription>
</briefdescription>
<detaileddescription>
</detaileddescription>
<inbodydescription>
</inbodydescription>
<location file="Python-Markdown-2.6.5-final/docs/extensions/extra.txt" line="37" column="1"/>
</memberdef>
<memberdef kind="function" id="extra_8txt_1aee61e6bce6c1e5b8c0802804ce9a805e" prot="public" static="no" const="no" explicit="no" inline="no" virt="non-virtual">
<type>title There may and not part of Python Markdown Extra If you really would like Extra to include additional we suggest creating your own clone of Extra under a different</type>
<definition>title There may and not part of Python Markdown Extra If you really would like Extra to include additional we suggest creating your own clone of Extra under a different name</definition>
<argsstring>(see the[Extension API](api.html)).Markdown Inside HTML Blocks---------------------------Unlike the other Extra features</argsstring>
<name>name</name>
<param>
<type>see </type>
<declname>the</declname>
<array>(api.html)[Extension API]</array>
</param>
<briefdescription>
</briefdescription>
<detaileddescription>
</detaileddescription>
<inbodydescription>
</inbodydescription>
<location file="Python-Markdown-2.6.5-final/docs/extensions/extra.txt" line="42" column="1"/>
</memberdef>


This bug does not occur in the older build of Doxygen that I normally use, although I'm unsure exactly which version that is: VERSION and the XML output both say 1.8.11, but I haven't updated in around 6 months, which is before the 1.8.11 release date (30-12-2015).

From the looks of it, doxygen crawls and autodocuments .txt, as appears in Python-Markdown-2.6.5-final/docs/extensions/extra.txt
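For anyone who wants to check their own output for the same symptom, here is a minimal sketch that lists function/variable memberdefs whose `<location file=...>` points at a non-code file. The set of "non-code" suffixes and the second memberdef in the sample are assumptions made for illustration; only the `extra.txt` entry is taken from the snippet above.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Assumption: suffixes treated as "not source code" for this check.
NON_CODE_SUFFIXES = {".txt", ".md", ".markdown", ".rst"}


def find_phantom_members(xml_text):
    """Return (kind, name, file, line) for memberdefs located in non-code files."""
    root = ET.fromstring(xml_text)
    hits = []
    for member in root.iter("memberdef"):
        if member.get("kind") not in ("function", "variable"):
            continue
        location = member.find("location")
        if location is None:
            continue
        path = location.get("file", "")
        if PurePosixPath(path).suffix.lower() in NON_CODE_SUFFIXES:
            hits.append((member.get("kind"),
                         member.findtext("name"),
                         path,
                         int(location.get("line", "0"))))
    return hits


# Trimmed-down sample: the first memberdef mirrors the report,
# the second one is a hypothetical genuine function for contrast.
SAMPLE = """<doxygen>
  <memberdef kind="function" id="extra_8txt_1afc80a6c723fc08b438df6fb016fc461e">
    <name>be</name>
    <location file="Python-Markdown-2.6.5-final/docs/extensions/extra.txt" line="37" column="1"/>
  </memberdef>
  <memberdef kind="function" id="hypothetical_real_function">
    <name>convert</name>
    <location file="markdown/__init__.py" line="10" column="1"/>
  </memberdef>
</doxygen>"""

for hit in find_phantom_members(SAMPLE):
    print(hit)
```

Only the `be` entry from `extra.txt` is reported; the hypothetical `.py` member is left alone.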

If you need any more data or examples, please let me know. Thx.

On 2016-01-19 17:50:38 +0000, erez.oxman@gmail.com wrote:

I checked previous versions:

The bug was introduced after Aug 23, 2015.

The version of Aug 23, 2015 (https://github.com/doxygen/doxygen/tree/SHA: 663544c) doesn't seem to have this bug.

Hope this helps

On 2016-01-24 19:06:00 +0000, albert wrote:

Which steps did you take to get the XML output? Please reproduce this on a reduced input and indicate in which output file the problem occurs.

Please attach a self-contained example (source + config file in a tar or zip) that allows us to reproduce the problem.

On 2016-01-24 19:33:29 +0000, erez.oxman@gmail.com wrote:

The config I'm using is the following:

# Doxyfile 1.8.8

#---------------------------------------------------------------------------

# Project related configuration options

#---------------------------------------------------------------------------
DOXYFILE_ENCODING = UTF-8
PROJECT_NAME = "My Project"
PROJECT_NUMBER =
PROJECT_BRIEF =
PROJECT_LOGO =
OUTPUT_DIRECTORY = pakgdoxy
CREATE_SUBDIRS = NO
ALLOW_UNICODE_NAMES = NO
OUTPUT_LANGUAGE = English
BRIEF_MEMBER_DESC = YES
REPEAT_BRIEF = YES
ABBREVIATE_BRIEF =
ALWAYS_DETAILED_SEC = NO
INLINE_INHERITED_MEMB = NO
FULL_PATH_NAMES = YES
STRIP_FROM_PATH =
STRIP_FROM_INC_PATH =
SHORT_NAMES = NO
QT_AUTOBRIEF = NO
MULTILINE_CPP_IS_BRIEF = NO
INHERIT_DOCS = YES
SEPARATE_MEMBER_PAGES = NO
TAB_SIZE = 4
ALIASES =
TCL_SUBST =
OPTIMIZE_OUTPUT_FOR_C = NO
OPTIMIZE_OUTPUT_JAVA = YES
OPTIMIZE_FOR_FORTRAN = NO
OPTIMIZE_OUTPUT_VHDL = NO
EXTENSION_MAPPING =
MARKDOWN_SUPPORT = YES
BUILTIN_STL_SUPPORT = NO
CPP_CLI_SUPPORT = NO
SIP_SUPPORT = NO
IDL_PROPERTY_SUPPORT = YES
DISTRIBUTE_GROUP_DOC = NO
SUBGROUPING = YES
INLINE_GROUPED_CLASSES = NO
INLINE_SIMPLE_STRUCTS = NO
TYPEDEF_HIDES_STRUCT = NO
LOOKUP_CACHE_SIZE = 0

#---------------------------------------------------------------------------

# Build related configuration options

#---------------------------------------------------------------------------
EXTRACT_ALL = YES
EXTRACT_PRIVATE = YES
EXTRACT_PACKAGE = YES
EXTRACT_STATIC = YES
EXTRACT_LOCAL_CLASSES = YES
EXTRACT_LOCAL_METHODS = NO
EXTRACT_ANON_NSPACES = NO
HIDE_UNDOC_MEMBERS = NO
HIDE_UNDOC_CLASSES = NO
HIDE_FRIEND_COMPOUNDS = NO
HIDE_IN_BODY_DOCS = NO
INTERNAL_DOCS = NO
CASE_SENSE_NAMES = YES
HIDE_SCOPE_NAMES = NO
SHOW_INCLUDE_FILES = YES
SHOW_GROUPED_MEMB_INC = NO
FORCE_LOCAL_INCLUDES = NO
INLINE_INFO = YES
SORT_MEMBER_DOCS = YES
SORT_BRIEF_DOCS = NO
SORT_MEMBERS_CTORS_1ST = NO
SORT_GROUP_NAMES = NO
SORT_BY_SCOPE_NAME = NO
STRICT_PROTO_MATCHING = NO
GENERATE_TODOLIST = YES
GENERATE_TESTLIST = YES
GENERATE_BUGLIST = YES
GENERATE_DEPRECATEDLIST= YES
ENABLED_SECTIONS =
MAX_INITIALIZER_LINES = 30
SHOW_USED_FILES = YES
SHOW_FILES = YES
FILE_VERSION_FILTER =
LAYOUT_FILE =
CITE_BIB_FILES =

#---------------------------------------------------------------------------

# Configuration options related to warning and progress messages

#---------------------------------------------------------------------------
QUIET = YES
WARNINGS = YES
WARN_IF_UNDOCUMENTED = YES
WARN_IF_DOC_ERROR = YES
WARN_NO_PARAMDOC = NO
WARN_FORMAT = "$file:$line: $text"
WARN_LOGFILE =

#---------------------------------------------------------------------------

# Configuration options related to the input files

#---------------------------------------------------------------------------

INPUT =
INPUT_ENCODING = UTF-8
FILE_PATTERNS =
RECURSIVE = YES
EXCLUDE =
EXCLUDE_PATTERNS = /tests/ /test/
EXCLUDE_SYMBOLS =
EXAMPLE_PATH =
EXAMPLE_PATTERNS =
EXAMPLE_RECURSIVE = NO
IMAGE_PATH =
INPUT_FILTER =
FILTER_PATTERNS =
FILTER_SOURCE_FILES = NO
FILTER_SOURCE_PATTERNS =
USE_MDFILE_AS_MAINPAGE =

#---------------------------------------------------------------------------

# Configuration options related to source browsing

#---------------------------------------------------------------------------

SOURCE_BROWSER = NO
INLINE_SOURCES = NO
REFERENCED_BY_RELATION = NO
REFERENCES_RELATION = NO
SOURCE_TOOLTIPS = YES
USE_HTAGS = NO

#---------------------------------------------------------------------------

# Configuration options related to the alphabetical class index

#---------------------------------------------------------------------------

ALPHABETICAL_INDEX = YES
COLS_IN_ALPHA_INDEX = 5
IGNORE_PREFIX =

#---------------------------------------------------------------------------

# Configuration options related to the HTML output

#---------------------------------------------------------------------------

GENERATE_HTML = NO
HTML_OUTPUT = html
HTML_FILE_EXTENSION = .html
HTML_FOOTER =
HTML_STYLESHEET =
HTML_EXTRA_STYLESHEET =
HTML_EXTRA_FILES =
HTML_COLORSTYLE_HUE = 220
HTML_COLORSTYLE_SAT = 100
HTML_COLORSTYLE_GAMMA = 80
HTML_TIMESTAMP = YES
HTML_DYNAMIC_SECTIONS = NO
HTML_INDEX_NUM_ENTRIES = 100
GENERATE_DOCSET = NO
DOCSET_FEEDNAME = "Doxygen generated docs"
DOCSET_BUNDLE_ID = org.doxygen.Project
DOCSET_PUBLISHER_ID = org.doxygen.Publisher
DOCSET_PUBLISHER_NAME = Publisher
GENERATE_HTMLHELP = NO
CHM_FILE =
HHC_LOCATION =
GENERATE_CHI = NO
CHM_INDEX_ENCODING =
BINARY_TOC = NO
TOC_EXPAND = NO
GENERATE_QHP = NO
QCH_FILE =
QHP_NAMESPACE = org.doxygen.Project
QHP_VIRTUAL_FOLDER = doc
QHP_CUST_FILTER_NAME =
QHP_CUST_FILTER_ATTRS =
QHP_SECT_FILTER_ATTRS =
QHG_LOCATION =
GENERATE_ECLIPSEHELP = NO
ECLIPSE_DOC_ID = org.doxygen.Project
DISABLE_INDEX = NO
GENERATE_TREEVIEW = NO
ENUM_VALUES_PER_LINE = 4
TREEVIEW_WIDTH = 250
FORMULA_FONTSIZE = 10
FORMULA_TRANSPARENT = YES
USE_MATHJAX = NO
MATHJAX_FORMAT = HTML-CSS
MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest
MATHJAX_EXTENSIONS =
MATHJAX_CODEFILE =
SEARCHENGINE = YES
SERVER_BASED_SEARCH = NO
EXTERNAL_SEARCH = NO
SEARCHENGINE_URL =
SEARCHDATA_FILE = searchdata.xml
EXTERNAL_SEARCH_ID =
EXTRA_SEARCH_MAPPINGS =

#---------------------------------------------------------------------------

# Configuration options related to the LaTeX output

#---------------------------------------------------------------------------

GENERATE_LATEX = NO
LATEX_OUTPUT = latex
LATEX_CMD_NAME = latex
MAKEINDEX_CMD_NAME = makeindex
COMPACT_LATEX = NO
PAPER_TYPE = a4
EXTRA_PACKAGES =
LATEX_FOOTER =
LATEX_EXTRA_FILES =
USE_PDFLATEX = YES
LATEX_BATCHMODE = NO
LATEX_HIDE_INDICES = NO
LATEX_SOURCE_CODE = NO
LATEX_BIB_STYLE = plain

#---------------------------------------------------------------------------

# Configuration options related to the RTF output

#---------------------------------------------------------------------------

GENERATE_RTF = NO
RTF_OUTPUT = rtf
COMPACT_RTF = NO
RTF_STYLESHEET_FILE =
RTF_EXTENSIONS_FILE =

#---------------------------------------------------------------------------

# Configuration options related to the man page output

#---------------------------------------------------------------------------

GENERATE_MAN = NO
MAN_OUTPUT = man
MAN_EXTENSION = .3
MAN_SUBDIR =

#---------------------------------------------------------------------------

# Configuration options related to the XML output

#---------------------------------------------------------------------------

GENERATE_XML = YES
XML_OUTPUT = xml
XML_PROGRAMLISTING = NO

#---------------------------------------------------------------------------

# Configuration options related to the DOCBOOK output

#---------------------------------------------------------------------------

GENERATE_DOCBOOK = NO
DOCBOOK_OUTPUT = docbook
DOCBOOK_PROGRAMLISTING = NO

#---------------------------------------------------------------------------

# Configuration options for the AutoGen Definitions output

#---------------------------------------------------------------------------

GENERATE_AUTOGEN_DEF = NO

#---------------------------------------------------------------------------

# Configuration options related to the Perl module output

#---------------------------------------------------------------------------

GENERATE_PERLMOD = NO
PERLMOD_LATEX = NO
PERLMOD_PRETTY = YES
PERLMOD_MAKEVAR_PREFIX =

#---------------------------------------------------------------------------

# Configuration options related to the preprocessor

#---------------------------------------------------------------------------

ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = NO
EXPAND_ONLY_PREDEF = NO
SEARCH_INCLUDES = YES
INCLUDE_PATH =
INCLUDE_FILE_PATTERNS =
PREDEFINED =
EXPAND_AS_DEFINED =
SKIP_FUNCTION_MACROS = YES

#---------------------------------------------------------------------------

# Configuration options related to external references

#---------------------------------------------------------------------------

TAGFILES =
GENERATE_TAGFILE =
ALLEXTERNALS = NO
EXTERNAL_GROUPS = YES
EXTERNAL_PAGES = YES
PERL_PATH = /usr/bin/perl

#---------------------------------------------------------------------------

# Configuration options related to the dot tool

#---------------------------------------------------------------------------

CLASS_DIAGRAMS = YES
MSCGEN_PATH =
DIA_PATH =
HIDE_UNDOC_RELATIONS = YES
HAVE_DOT = NO
DOT_FONTNAME = Helvetica
DOT_FONTSIZE = 10
DOT_FONTPATH =
CLASS_GRAPH = YES
COLLABORATION_GRAPH = YES
GROUP_GRAPHS = YES
UML_LOOK = NO
UML_LIMIT_NUM_FIELDS = 10
TEMPLATE_RELATIONS = NO
INCLUDE_GRAPH = YES
INCLUDED_BY_GRAPH = YES
CALL_GRAPH = NO
CALLER_GRAPH = NO
GRAPHICAL_HIERARCHY = YES
DIRECTORY_GRAPH = YES
DOT_IMAGE_FORMAT = png
INTERACTIVE_SVG = NO
DOT_PATH =
DOTFILE_DIRS =
MSCFILE_DIRS =
DIAFILE_DIRS =
PLANTUML_JAR_PATH =
DOT_GRAPH_MAX_NODES = 50
MAX_DOT_GRAPH_DEPTH = 0
DOT_TRANSPARENT = NO
DOT_MULTI_TARGETS = NO
GENERATE_LEGEND = YES
DOT_CLEANUP = YES

The xml example I attached is just a snippet of the all.xml.

The zip can be found in https://github.com/waylan/Python-Markdown/releases/tag/2.6.5-final

Tell me if you need anything else. I can attach the full all.xml if it's needed.

I constantly monitor the XML output for various open-source packages. This bug occurred in multiple packages, not only in this one. And again, it was introduced after Aug 23, 2015.

Hope this helps, tell me if there's anything more you need.

On 2016-01-27 18:24:14 +0000, albert wrote:

I didn't find all.xml when running doxygen directly; as far as I recall, it is the result of a "postprocessing" step with XSL etc.
I looked for the text part:
title There may be
(index.html) that are distributed with Python-Markdown that are not included here in Extra.The features of those extensions are not part of PHP Markdown Extra

Which I found in the 1.8.11 version in extra_8txt.xml. This file was not present in the 1.8.10 version.

In version 1.8.10 the txt files were not processed by default; in version 1.8.11 they are. When the txt files are enabled in version 1.8.10, the file extra_8txt.xml appears too, with the same content (except for the version number) as in version 1.8.11.

You might want to disable the *.txt (and *.md) files or even, in this case, the complete docs directory.

On 2016-01-27 19:13:15 +0000, erez.oxman@gmail.com wrote:

Why have txt files been enabled since 1.8.10? What was the feature request?

I think it's a bad solution that by default, non code files will be recognized as proper functions in the xml, with argsstrings, params, code body etc.

As a user, for me, doxygen's strength is to distinguish between actual code and other stuff, and by default it should read only comments of the actual code.

Isn't it better that by default doxygen only parses a specific set of extensions known as code like .h .cpp etc?

Moreover, I think it breaks doxygen: the list of text-file extensions that a user can have is endless, so there's no way to exclude all of them. As the bug title says, a user who runs doxygen on a given repository has a high chance of getting phantom functions as a result. On the other hand, the list of code file extensions is finite.

If the feature request that asked for parsing txt files is important, an easy fix might be to introduce an additional option in the config file that lists the extra non-code file extensions doxygen should parse.

Hope my comment is helpful, if I can provide any more feedback, please let me know.

On 2016-01-28 19:03:54 +0000, albert wrote:

Regarding my statement in Comment 4:
"In version 1.8.10 the txt files were not processed by default; in version 1.8.11 they are. When the txt files are enabled in version 1.8.10, the file extra_8txt.xml appears too, with the same content (except for the version number) as in version 1.8.11."
I now think it is not true, as I have not yet found any reference explaining why the file is present now and was not present in the past. It looks like txt files are processed as C files (for historical reasons, still present for compatibility); setting the txt files to md in EXTENSION_MAPPING might help.

I think that in principle txt files are still useful as they might contain some background information (like in md, dox files etc).

On 2016-01-28 20:43:11 +0000, erez.oxman@gmail.com wrote:

Hi,

I don't think it's there for historical reasons - it's new. It did not occur in the Aug 23, 2015 version on GitHub (https://github.com/doxygen/doxygen/tree/SHA: 663544c), with the attached config file, as described in comment 2.

I've looked at the git commit messages since then and couldn't find one that directly explains why it was changed. I'm not familiar enough with the Doxygen source code to git blame the specific line that defines which extensions are read by default.

I can do more tests and find the specific commit that changed it.

Think of users using doxygen out of the box on a given GitHub repository, containing source files and other non-source files. If Doxygen will create functions/namespaces/classes that are simply not there, they will not give Doxygen a second chance, and that would be a shame. The problem will get bigger as the number of non-source files increases. I personally don't use the HTML output - will these "phantom" functions/classes/namespaces be created there as well?

If you look at any GitHub repository, I agree with you that there is useful information in these files, but I think the easiest way to view them is to directly look at the files and directories of the repository. A simple link to the repository will give you all that info. I don't think Doxygen should crawl these files.

One question just to be clear - The new default will make it read .txt & .md extensions only, or everything? What about .rst .doc .docx .xls .xml .html etc...

On 2016-01-30 17:41:09 +0000, albert wrote:

Did some further research and found pull request 383 (#383) from August 16 2015 and incorporated on August 31 2015 with comment:

Make list of default extensions consistent with language mapping list

In the forum a discrepancy was noted between config.l and config.xml (http://doxygen.10944.n7.nabble.com/FILE-PATTERNS-one-custom-plus-all-defaults-td7308.html). The config.xml list was based on the function initDefaultExtensionMapping in util.cpp and the list in the function Config::check in config.l. Now both routines use the same list.

One of the file extensions added by default was 'txt' and this is by default interpreted as C code. In the past it was possible to specify 'txt' files as well and they would be interpreted as C code (main usage will probably have been to give some text and place this between C comment signs so the default interpreter could be used). The 'txt' files mentioned in this bug report are more markdown style and using an EXTENSION_MAPPING would be beneficial.

Looking at the config.xml file, it has not been updated, so there is now an inconsistency between the documentation and the implementation.
This inconsistency should be removed.
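To make the txt-as-C default and the EXTENSION_MAPPING override concrete, here is a rough sketch of an extension-to-parser lookup. It is not doxygen's actual code: the table holds only a handful of illustrative entries, the lookup is a plain case-sensitive dictionary access, and the function names `parse_extension_mapping` and `parser_for` are made up for this example.

```python
import os

# Illustrative subset of a default extension-to-parser table
# (assumption: modelled as a plain dict, not doxygen's real data structure).
DEFAULT_PARSERS = {
    ".dox": "c", ".txt": "c", ".doc": "c",
    ".c": "c", ".h": "c", ".cpp": "c",
    ".java": "java", ".py": "python",
    ".md": "md", ".markdown": "md",
}


def parse_extension_mapping(value):
    """Turn an EXTENSION_MAPPING-style string such as 'txt=md' into a dict."""
    overrides = {}
    for item in value.split():
        ext, _, parser = item.partition("=")
        if ext and parser:
            overrides["." + ext.lstrip(".")] = parser
    return overrides


def parser_for(filename, overrides=None):
    """Return the parser name for filename, or None if the extension is unknown."""
    ext = os.path.splitext(filename)[1]
    table = dict(DEFAULT_PARSERS)
    table.update(overrides or {})
    return table.get(ext)


print(parser_for("docs/extensions/extra.txt"))
print(parser_for("docs/extensions/extra.txt", parse_extension_mapping("txt=md")))
```

With the default table the Markdown-style `extra.txt` goes to the C parser, which is where the phantom functions come from; with the `txt=md` override it goes to the Markdown parser instead.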

On 2016-01-30 17:49:50 +0000, albert wrote:

List of supported extensions and the parser that is used:
".dox", "c"
".txt", "c"
".doc", "c"
".c", "c"
".C", "c"
".cc", "c"
".CC", "c"
".cxx", "c"
".cpp", "c"
".c++", "c"
".ii", "c"
".ixx", "c"
".ipp", "c"
".i++", "c"
".inl", "c"
".h", "c"
".H", "c"
".hh", "c"
".HH", "c"
".hxx", "c"
".hpp", "c"
".h++", "c"
".idl", "idl"
".ddl", "idl"
".odl", "idl"
".java", "java"
".as", "javascript"
".js", "javascript"
".cs", "csharp"
".d", "d"
".php", "php"
".php4", "php"
".php5", "php"
".inc", "php"
".phtml", "php"
".m", "objective-c"
".M", "objective-c"
".mm", "c"
".py", "python"
".pyw", "python"
".f", "fortran"
".for", "fortran"
".f90", "fortran"
".vhd", "vhdl"
".vhdl", "vhdl"
".tcl", "tcl"
".ucf", "vhdl"
".qsf", "vhdl"
".md", "md"
".markdown", "md"

On 2016-01-30 22:06:47 +0000, erez.oxman@gmail.com wrote:

Thank you for all your hard work.

I can verify that:

1. The said pull request was the cause of the mentioned bug
2. changing the EXTENSION_MAPPING to txt=md solved it.
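
For people who run doxygen over many repositories with a shared config, the workaround in point 2 can be applied programmatically. The following is a minimal sketch that rewrites a single `KEY = value` line in Doxyfile text; `set_option` is a hypothetical helper, and it deliberately ignores `+=` assignments and backslash line continuations.

```python
import re


def set_option(doxyfile_text, key, value):
    """Return doxyfile_text with `key = value`, replacing the first existing
    assignment of key or appending a new line if there is none."""
    pattern = re.compile(r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key} = {value}"
    if pattern.search(doxyfile_text):
        # A callable replacement avoids backslash interpretation in value.
        return pattern.sub(lambda match: new_line, doxyfile_text, count=1)
    return doxyfile_text.rstrip("\n") + "\n" + new_line + "\n"


DOXYFILE = (
    "MARKDOWN_SUPPORT = YES\n"
    "EXTENSION_MAPPING =\n"
    "GENERATE_XML = YES\n"
)

patched = set_option(DOXYFILE, "EXTENSION_MAPPING", "txt=md")
print(patched)
```

The other settings are left untouched, so the same helper can be reused for any single-line option.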

If I may, I suggest changing the default away from txt=c, rather than only changing the documentation. I think that would better fit most GitHub repositories, which tend to have many txt files, especially since doxygen is used these days for languages other than C.

I also think that for the repository used to report this bug, the new default creates a bug, because classes that are not there are created. I also checked the HTML output: please look at the HTML created for class "so" in the "classes" tab, "class list" tab, which simply does not exist in the code. Likewise, under the "classes" tab, "class hierarchy", class "so" seems to inherit from classes that do not exist.

I think that currently the best default would be txt=md or don't parse txt files (which was the default behavior before Aug 31 2015) - both options are programming language agnostic.

On 2016-01-30 22:12:59 +0000, erez.oxman@gmail.com wrote:

Created attachment 320097
comment10 html screenshot

On 2016-01-31 13:05:48 +0000, Dimitri van Heesch wrote:

I'll remove parsing .txt files by default like before. Then the user can configure doxygen to do this using the "c" or "markdown" parser if needed, using FILE_PATTERNS and EXTENSION_MAPPING.
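
As an illustration of the opt-in behaviour described here, the sketch below models FILE_PATTERNS as a list of wildcards matched against file names. This is a simplified model, not doxygen's implementation: `select_inputs`, the pattern lists, and the sample paths are all assumptions for the example.

```python
from fnmatch import fnmatchcase
import posixpath


def select_inputs(paths, file_patterns):
    """Keep the paths whose base name matches at least one wildcard pattern."""
    selected = []
    for path in paths:
        name = posixpath.basename(path)
        if any(fnmatchcase(name, pattern) for pattern in file_patterns):
            selected.append(path)
    return selected


# Hypothetical repository contents and pattern lists.
PATHS = ["markdown/__init__.py", "docs/extensions/extra.txt", "setup.py"]
CODE_ONLY = ["*.py", "*.c", "*.h"]

print(select_inputs(PATHS, CODE_ONLY))              # .txt files are skipped
print(select_inputs(PATHS, CODE_ONLY + ["*.txt"]))  # user opts in to .txt
```

A user who opts in to `*.txt` this way would still need an EXTENSION_MAPPING entry to choose which parser handles those files.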

On 2016-02-02 06:52:26 +0000, erez.oxman@gmail.com wrote:

Thank you,

On the same topic, what about the extensions ".dox" and ".doc"? The same logic for ".txt" should hold.

If I understand correctly from http://doxygen.10944.n7.nabble.com/FILE-PATTERNS-one-custom-plus-all-defaults-td7308.html#a7310, the old default was that ".dox" was parsed and ".doc" was not. IMHO, keeping ".dox" for backward compatibility reasons alone is not a good idea.

On 2016-09-05 13:46:04 +0000, Dimitri van Heesch wrote:

This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.8.12. Please verify if this is indeed the case. Reopen the
bug if you think it is not fixed and please include any additional information
that you think can be relevant (preferably in the form of a self-contained example).