[Cpp] Memory leaks in the C++ runtime #4309

Open
ghost opened this issue Jun 10, 2023 · 6 comments

Comments

@ghost

ghost commented Jun 10, 2023

I'm developing a compiler for my language in Visual Studio 2022 with ANTLR 4.13.0 (previously Flex & Bison). After the compiler exits, the CRT reports memory leaks, but without allocation source information.

Partial output:

Detected memory leaks!
Dumping objects ->
{11750} normal block at 0x0000000000436E90, 128 bytes long.
 Data: <  C       C     > 90 04 43 00 00 00 00 00 90 04 43 00 00 00 00 00 
{11749} normal block at 0x00000000004202B0, 16 bytes long.
 Data: <`iC             > 60 69 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11748} normal block at 0x0000000000430490, 24 bytes long.
 Data: <  C       C     > 90 04 43 00 00 00 00 00 90 04 43 00 00 00 00 00 
{11747} normal block at 0x000000000041FDB0, 16 bytes long.
 Data: <HiC             > 48 69 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11746} normal block at 0x0000000000436F50, 128 bytes long.
 Data: <  C       C     > D0 09 43 00 00 00 00 00 D0 09 43 00 00 00 00 00 
{11745} normal block at 0x000000000041FB30, 16 bytes long.
 Data: < hC             > E8 68 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11744} normal block at 0x00000000004309D0, 24 bytes long.
 Data: <  C       C     > D0 09 43 00 00 00 00 00 D0 09 43 00 00 00 00 00 
{11743} normal block at 0x000000000041F310, 16 bytes long.
 Data: < hC             > D0 68 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11742} normal block at 0x0000000000439B90, 128 bytes long.
 Data: <0 C     0 C     > 30 04 43 00 00 00 00 00 30 04 43 00 00 00 00 00 
{11741} normal block at 0x000000000041FBD0, 16 bytes long.
 Data: <phC             > 70 68 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
(...)
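
The dump above gives allocation numbers ({11750}, {11749}, ...) but no file or line. A hedged sketch, assuming MSVC's debug CRT (`<crtdbg.h>`): enabling `_CRTDBG_LEAK_CHECK_DF` and passing one of those numbers to `_CrtSetBreakAlloc` makes the debugger stop when that block is allocated, so the call stack shows the source even for `operator new` calls inside the ANTLR runtime. `_CRTDBG_MAP_ALLOC` only adds file/line information to malloc-family calls in the translation units compiled with it. `enableCrtLeakChecks` is a hypothetical helper name, and allocation numbers are only reproducible when the program runs the same way on the same input.

```cpp
// Sketch (MSVC debug CRT only): dump unfreed blocks at exit and optionally
// break into the debugger when allocation number N (the {N} in the dump) is
// made. On other compilers or in release builds this compiles to a no-op.
#if defined(_MSC_VER) && defined(_DEBUG)
#define _CRTDBG_MAP_ALLOC  // must precede <cstdlib>/<crtdbg.h>
#include <cstdlib>
#include <crtdbg.h>
#endif

// Hypothetical helper. Returns true if the debug-heap flags were installed.
bool enableCrtLeakChecks(long breakAllocNumber) {
#if defined(_MSC_VER) && defined(_DEBUG)
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);  // read current flags
    flags |= _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF;
    _CrtSetDbgFlag(flags);  // report unfreed blocks at process exit
    if (breakAllocNumber > 0) {
        _CrtSetBreakAlloc(breakAllocNumber);  // e.g. 11750 from the dump
    }
    return true;
#else
    (void)breakAllocNumber;
    return false;
#endif
}
```

Call it first thing in `main`, before the input stream and lexer are created.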

I tested my code statement by statement to make sure the leaks do not come from my own code, and found that they appear once the lexer is initialized:

std::string srcPath;
//
// ...
//
std::ifstream fs(srcPath, std::ios::binary);
antlr4::ANTLRInputStream is(fs);

// Memory leaks appear after this statement
SlakeLexer lexer(&is);

antlr4::CommonTokenStream tokens(&lexer);

SlakeParser parser(&tokens);

It seems the lexer does not release its resources properly when it is destroyed. Issue #4099 mentions a similar problem.

@ghost
Author

ghost commented Jun 11, 2023

Source of the lexer:

lexer grammar SlakeLexer;

COMMA: ',';
QUESTION: '?';
COLON: ':';
SEMICOLON: ';';
LBRACKET: '[';
RBRACKET: ']';
LBRACE: '{';
RBRACE: '}';
LPARENTHESE: '(';
RPARENTHESE: ')';
AT: '@';
DOT: '.';
VARARG: '...';

OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';
OP_MOD: '%';
OP_AND: '&';
OP_OR: '|';
OP_XOR: '^';
OP_NOT: '!';
OP_REV: '~';
OP_ASSIGN: '=';
OP_ASSIGN_ADD: '+=';
OP_ASSIGN_SUB: '-=';
OP_ASSIGN_MUL: '*=';
OP_ASSIGN_DIV: '/=';
OP_ASSIGN_MOD: '%=';
OP_ASSIGN_AND: '&=';
OP_ASSIGN_OR: '|=';
OP_ASSIGN_XOR: '^=';
OP_ASSIGN_REV: '~=';
OP_ASSIGN_LSH: '<<=';
OP_ASSIGN_RSH: '>>=';
OP_SWAP: '<=>';

OP_EQ: '==';
OP_NEQ: '!=';
OP_STRICTEQ: '===';
OP_STRICTNEQ: '!==';
OP_LSH: '<<';
OP_RSH: '>>';
OP_LT: '<';
OP_GT: '>';
OP_LTEQ: '<=';
OP_GTEQ: '>=';
OP_LAND: '&&';
OP_LOR: '||';
OP_INC: '++';
OP_DEC: '--';
OP_MATCH: '=>';
OP_WRAP: '->';
OP_SCOPE: '::';
OP_DOLLAR: '$';

KW_ASYNC: 'async';
KW_AWAIT: 'await';
KW_BASE: 'base';
KW_BREAK: 'break';
KW_CASE: 'case';
KW_CATCH: 'catch';
KW_CLASS: 'class';
KW_CONST: 'const';
KW_CONTINUE: 'continue';
KW_DELETE: 'delete';
KW_DEFAULT: 'default';
KW_ELIF: 'elif';
KW_ELSE: 'else';
KW_ENUM: 'enum';
KW_FALSE: 'false';
KW_FN: 'fn';
KW_FOR: 'for';
KW_FINAL: 'final';
KW_FINALLY: 'finally';
KW_IF: 'if';
KW_MODULE: 'module';
KW_NATIVE: 'native';
KW_NEW: 'new';
KW_NULL: 'null';
KW_OVERRIDE: 'override';
KW_OPERATOR: 'operator';
KW_PUB: 'pub';
KW_RETURN: 'return';
KW_STATIC: 'static';
KW_STRUCT: 'struct';
KW_SWITCH: 'switch';
KW_THIS: 'this';
KW_THROW: 'throw';
KW_TIMES: 'times';
KW_TRAIT: 'trait';
KW_TYPEOF: 'typeof';
KW_INTERFACE: 'interface';
KW_TRUE: 'true';
KW_TRY: 'try';
KW_USING: 'using';
KW_VAR: 'var';
KW_WHILE: 'while';
KW_YIELD: 'yield';

TN_I8: 'i8';
TN_I16: 'i16';
TN_I32: 'i32';
TN_I64: 'i64';
TN_ISIZE: 'isize';
TN_U8: 'u8';
TN_U16: 'u16';
TN_U32: 'u32';
TN_U64: 'u64';
TN_USIZE: 'usize';
TN_F32: 'f32';
TN_F64: 'f64';
TN_STRING: 'string';
TN_BOOL: 'bool';
TN_AUTO: 'auto';
TN_VOID: 'void';
TN_ANY: 'any';

L_INT: '0b' [01]+ | '0' [0-9]* | '0x' [0-9]+ | [1-9] [0-9]*;
L_UINT: L_INT [uU];
L_LONG: L_INT [lL];
L_ULONG: L_INT ( [uU][lL] | [lL][uU]);
L_F32: L_F64 [fF];
L_F64: [0-9]+ '.' ([0-9]+)?;
L_STRING: '"' CharSequence? '"';
L_RAWSTRING: '"""' (.)*? '"""';

ID: [a-zA-Z_][a-zA-Z0-9_]*;

fragment CharSequence: Char+;
fragment Char: StringEscape | ~["\\\r\n];
fragment StringEscape: SimpleEscape | OctEscape | HexEscape;

fragment SimpleEscape: '\\' [\\"rnt0];
fragment OctEscape: '\\' OctDigit OctDigit OctDigit;
fragment HexEscape: '\\' HexDigit HexDigit;

fragment OctDigit: [0-7];
fragment HexDigit: [0-9a-fA-F];

WHITESPACE: [ \t\r\n]+ -> skip;
COMMENT_BLK: '/*' .*? '*/' -> skip;
COMMENT_LINE: '//' ~ [\r\n]* -> skip;

And the content of the input file:

class Base {
	pub i32 data = 0;
	
	operator new(i32 a) {
		println("Base Constructed");
	}

	operator delete() {
		println("Base Destructed");
	}
}

class Derived(@Base) {
	pub i32 data = 0;
	
	operator new(i32 a) {
		base.new(a * 2);
		println("Derived Constructed");
	}

	operator delete() {
		println("Derived Destructed");
	}

	pub void printMembers() {
		println("Base data: ", base.data);
		println("Derived data: ", data);
	}
}

pub i32 main() {
	@Base a = new @Base(123);

	return ++a.data;
}

(The parser source is not provided because the parser does not affect the result.)

@ghost
Author

ghost commented Jun 17, 2023

I have located where the problem originates, using the demo in runtime/Cpp/demo.

According to the log (the complete log file is here), blocks allocated by the code at the following locations are never released and cause the leaks:

runtime/Cpp/runtime/src/atn/LexerATNSimulator.cpp(192)
runtime/Cpp/runtime/src/atn/LexerATNSimulator.cpp(295)
runtime/Cpp/runtime/src/atn/LexerATNSimulator.cpp(536)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(299)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(465)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(531)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(618)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(636)
runtime/Cpp/runtime/src/dfa/DFA.cpp(29)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(179)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(182)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(185)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(188)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(191)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(194)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(197)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(200)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(203)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(206)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(212)
runtime/Cpp/runtime/src/atn/ATNDeserializationOptions.cpp(17)
runtime/Cpp/runtime/src/atn/LexerMoreAction.cpp(16)
runtime/Cpp/runtime/src/atn/LexerSkipAction.cpp(16)
runtime/Cpp/runtime/src/atn/LexerPopModeAction.cpp(16)

Currently, I have no idea how to fix it.

@jimidle
Collaborator

jimidle commented Jun 19, 2023 via email

@ghost
Author

ghost commented Aug 6, 2023

I found that the static data of the lexer and parser (which also holds the DFA caches) is never released: it is allocated in the xxxInitialize functions of the generated lexer and parser sources and never freed afterwards.

So I modified the codegen template to hold this data in a std::unique_ptr instead of a raw pointer, and most of the leak reports disappeared:

Detected memory leaks!
Dumping objects ->
C:\Users\Pyxherb\Desktop\antlr4\runtime\Cpp\runtime\src\atn\ATNDeserializationOptions.cpp(17) : {434} normal block at 0x000002891574C2C0, 3 bytes long.
 Data: <   > 00 01 00 
Object dump complete.

So I think most of the reports were caused by the unreleased static data.
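
The change described above (owning the generated static data through a std::unique_ptr instead of a raw pointer) can be sketched as follows. This is not the generated ANTLR code: `DemoLexerStaticData`, `demoLexerInitialize` and `demoLexerStaticDataInstance` are hypothetical names, and `std::call_once` is assumed here as a stand-in for whatever once-flag the generated sources use. The point is only that the owning pointer now has a destructor, so the data is released during static destruction instead of being left for the CRT to report.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-grammar static data that a generated
// lexer builds once and shares between instances (rule names, deserialized
// ATN, DFA caches, ...). Only the ownership pattern matters here.
struct DemoLexerStaticData {
    std::vector<std::string> ruleNames;
    std::vector<int> dfaCache;               // placeholder for the DFA cache
    static inline int constructedCount = 0;  // demo bookkeeping only

    explicit DemoLexerStaticData(std::vector<std::string> names)
        : ruleNames(std::move(names)) {
        ++constructedCount;
    }
};

namespace {
std::once_flag demoLexerOnceFlag;

// Raw-pointer form described in the issue (allocated, never deleted):
//   DemoLexerStaticData *demoLexerStaticData = nullptr;
// Owning form: the unique_ptr frees the data during static destruction.
std::unique_ptr<DemoLexerStaticData> demoLexerStaticData;

void demoLexerInitialize() {
    demoLexerStaticData = std::make_unique<DemoLexerStaticData>(
        std::vector<std::string>{"COMMA", "QUESTION", "COLON"});
}
}  // namespace

// Called from every lexer constructor; initialization happens exactly once.
DemoLexerStaticData &demoLexerStaticDataInstance() {
    std::call_once(demoLexerOnceFlag, demoLexerInitialize);
    return *demoLexerStaticData;
}
```

Every lexer instance shares the same object, which is why the leak shows up once per run rather than once per lexer.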

@liu876151990

This needs to be modified! I'd like you to revise it and submit a fix. Thanks!

ATNDeserializationOptions.cpp

const ATNDeserializationOptions& ATNDeserializationOptions::getDefaultOptions() {
  static const ATNDeserializationOptions* const defaultOptions = new ATNDeserializationOptions();
  return *defaultOptions;
}

@ghost
Author

ghost commented Aug 28, 2023

Fixed memory leaks in ATNDeserializationOptions.
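
One way such a fix could look is sketched below. It uses a hypothetical stand-in class, `DemoDeserializationOptions`, rather than the runtime's real `ATNDeserializationOptions`, and its three flags and their default values are assumptions. The function returns a function-local static object instead of dereferencing a pointer obtained from `new`. The object is constructed on first use, which is thread-safe since C++11, and destroyed automatically at exit.

```cpp
// Hypothetical stand-in for ATNDeserializationOptions (flags are assumed).
class DemoDeserializationOptions {
public:
    bool isReadOnly() const { return _readOnly; }
    bool isVerifyATN() const { return _verifyATN; }
    bool isGenerateRuleBypassTransitions() const { return _generateRuleBypassTransitions; }

    static const DemoDeserializationOptions &getDefaultOptions();

private:
    bool _readOnly = false;
    bool _verifyATN = true;
    bool _generateRuleBypassTransitions = false;
};

// Original pattern: `static const T *const defaultOptions = new T();`
// is allocated once and never freed, so the debug CRT reports it at exit.
// A function-local static object is destroyed during static destruction.
const DemoDeserializationOptions &DemoDeserializationOptions::getDefaultOptions() {
    static const DemoDeserializationOptions defaultOptions{};
    return defaultOptions;
}
```

The original new-and-never-delete form is a deliberate idiom in some codebases. It avoids destruction-order problems when another static destructor still uses the object during shutdown. The trade-off is exactly the report above, because a leak checker cannot tell an intentional leak from an accidental one.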

@ghost changed the title from "[Cpp] Memory leaks in the lexer" to "[Cpp] Memory leaks in the C++ runtime" on Sep 14, 2023.