Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in C parser #347

Closed
wangjia184 opened this issue Mar 14, 2021 · 8 comments
Closed

Memory leak in C parser #347

wangjia184 opened this issue Mar 14, 2021 · 8 comments
Assignees
Labels
bug C lexer Concerning the generated lexer
Milestone

Comments

@wangjia184
Copy link

BNFC version 2.9.1

Here is the code generated by BNFC

/* Entrypoint: parse Proc from string. */
Proc psProc(const char *str)
{
  YY_BUFFER_STATE buf;
  mylang__init_lexer(0);
  buf = mylang__scan_string(str);
  int result = yyparse();
  mylang__delete_buffer(buf);
  if (result)
  { /* Failure */
    return 0;
  }
  else
  { /* Success */
    return YY_RESULT_Proc_;
  }
}

I successfully call this method, and then called free() on the returned pointer.
but valgrind says there is 16KB memory leaked.

Error Leaked 16.1 kiB
Info at malloc
     at mylang_alloc (Lexer.c:2509)
     at mylang__create_buffer (Lexer.c:2047)
     at mylang_restart (Lexer.c:1987)
     at mylang__init_lexer (mylang.l:124)
     at psProc (mylang.y:197)
     at rho_runtime::interpreter::compiler::builder::build_ast (src/interpreter/compiler/builder.rs:15)
     at rho_runtime::interpreter::compiler::test (src/interpreter/compiler/mod.rs:22)
     at rho_runtime::main (src/main.rs:10)
     at core::ops::function::FnOnce::call_once (function.rs:227)
     at std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
     at std::rt::lang_start::{{closure}} (rt.rs:66)
Summary Leaked 16.1 kiB total

The leak is from mylang__create_buffer (yy_create_buffer) line 2047 below.

/** Allocate and initialize an input buffer state.
 * @param file A readable stream.
 * @param size The character buffer size in bytes. When in doubt, use @c YY_BUF_SIZE.
 * 
 * @return the allocated buffer state.
 */
YY_BUFFER_STATE yy_create_buffer  (FILE * file, int  size )
{
	YY_BUFFER_STATE b;
	b = (YY_BUFFER_STATE) yyalloc( sizeof( struct yy_buffer_state )  );  // <--- this line leaks
	if ( ! b )
		YY_FATAL_ERROR( "out of dynamic memory in yy_create_buffer()" );

	b->yy_buf_size = size;
	b->yy_ch_buf = (char *) yyalloc( (yy_size_t) (b->yy_buf_size + 2)  );
	if ( ! b->yy_ch_buf )
		YY_FATAL_ERROR( "out of dynamic memory in yy_create_buffer()" );
	b->yy_is_our_buffer = 1;
	yy_init_buffer( b, file );
	return b;
}

It is strange that psProc() did call mylang__delete_buffer(yy_delete_buffer) .

Is this a bug? Or I used it incorrectly?

@wangjia184
Copy link
Author

I avoid this issue by using the other edition pProc which accepts FILE* as parameter.

Before that, I need convert the C-string into a FILE* by fmemopen(), there is no memory leak any longer.

@andreasabel andreasabel added C info-needed More information is needed from the bug reporter to confirm the issue. labels Mar 18, 2021
@andreasabel
Copy link
Member

I successfully call this method, and then called free() on the returned pointer.

Maybe you didn't mean this literally, but a simple free(p) won't usually do the job, you need to recursively deallocate the subtrees of p, I suppose.

It is strange that psProc() did call mylang__delete_buffer(yy_delete_buffer) .

This seems to be the correct use pattern, see also the definition of SExpression *getAST(const char *expr) at https://en.wikipedia.org/wiki/GNU_Bison.

(Maybe there is a better reference for using the buffer than Wikipedia, if you happen to find a good tutorial, please link it here.)

@wangjia184
Copy link
Author

wangjia184 commented Mar 19, 2021

Maybe you didn't mean this literally, but a simple free(p) won't usually do the job, you need to recursively deallocate the subtrees of p, I suppose.

Yep, I did that. In my initial test, the AST tree only contains a single root node, And it still says 16KB memory leaked.

The real code traverses AST tree and free all of them. source code is here, C parser is called by Rust via FFI, it is the same how we call it from C.
The source code is here: https://github.com/wangjia184/rholang-rust/blob/main/rho-runtime/src/interpreter/compiler/normalize/mod.rs

Let me put it simple

  • If I load the AST tree by psProc(const char *str), valgrind says there is 16KB leaked
  • If I load the AST tree by pProc(FILE* file), valgrind says there is no leak at all.
    The other part of code do not change.

So I convert a char*str to a FILE* file by using fmemopen() to solve it : https://github.com/wangjia184/rholang-rust/blob/main/rho-runtime/src/interpreter/compiler/builder.rs

@andreasabel andreasabel added this to the 2.9.2 milestone Mar 19, 2021
@andreasabel andreasabel added bug lexer Concerning the generated lexer and removed info-needed More information is needed from the bug reporter to confirm the issue. labels Mar 19, 2021
@andreasabel andreasabel self-assigned this Mar 19, 2021
@andreasabel
Copy link
Member

andreasabel commented Mar 19, 2021

Thanks for getting back to me @wangjia184!
I could confirm the leak. (Could be related to https://westes.github.io/flex/manual/Memory-leak-_002d-16386-bytes-allocated-by-malloc_002e.html#Memory-leak-_002d-16386-bytes-allocated-by-malloc_002e.) However, just calling yylex_destroy as in https://www.javaer101.com/en/article/37968820.html did not do the trick.

The problem is that the generated init_lexer (FILE* inp) calls yyrestart(inp) even if inp is null. (See 73b10de#r48434527.) yyrestart allocates a buffer if there is none yet. But then we are not reading from a file but allocate the buffer ourselves with yy_scan_string. This seems to just overwrite the existing buffer structure without freeing it first.

Not calling yyrestart if we are not parsing from a file seems to fix the leak.

@andreasabel
Copy link
Member

andreasabel commented Mar 19, 2021

Note to myself: Here is a modified driver StringCalc for the Calc.cf example that parses from a string:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "Parser.h"
#include "Printer.h"
#include "Absyn.h"

void usage(void) {
  printf("usage: Call with one of the following argument combinations:\n");
  printf("\t--help\t\tDisplay this help message.\n");
  printf("\t(text)\t\tParse text verbosely.\n");
  printf("\t-s (text)\tSilent mode. Parse text silently.\n");
}

int main(int argc, char ** argv)
{
  Exp parse_tree;
  int quiet = 0;
  char *text = NULL;

  if (argc > 1) {
    if (strcmp(argv[1], "-s") == 0) {
      quiet = 1;
      if (argc > 2) {
        text = argv[2];
      }
    } else {
      text = argv[1];
    }
  }

  if (!text) {
      text = "1 + 2 * 3";
      /* usage(); */
      /* exit(1); */
  }

  parse_tree = psExp(text);
  if (parse_tree)
  {
    printf("\nParse Successful!\n");
    if (!quiet) {
      printf("\n[Abstract Syntax]\n");
      printf("%s\n\n", showExp(parse_tree));
      printf("[Linearized Tree]\n");
      printf("%s\n\n", printExp(parse_tree));
    }
    return 0;
  }
  return 1;
}

This is the valgrind instruction to show the leak:

valgrind --leak-check=full --show-leak-kinds=definite ./StringCalc 

TODO: add tests that check for memory leaks in the testsuite.

@andreasabel
Copy link
Member

andreasabel commented Mar 19, 2021

@wangjia184: The release of 2.9.2 will probably be in a few months only, so if you rely on this fix, please install the master branch.
Thanks for reporting!

Upstream issue (flex): westes/flex#479

@andreasabel
Copy link
Member

There are still some space leaks, e.g. in tests

  • 222_IntegerList
  • 278_Keywords

@andreasabel andreasabel reopened this Apr 2, 2021
andreasabel added a commit that referenced this issue Apr 2, 2021
Also: build Skeleton.o which currently fails for 266_define.

The valgrind check seems to randomly fail when running the testsuite,
but no definite leaks are detected when running a test case manually.
Thus, I leave this test off, as it is not reliable.
@andreasabel
Copy link
Member

Well, dunno, the memory leaks show only (and non-deterministically) when running the testsuite, not when running the tests individually.
So, it is not clear whether these are leaks indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug C lexer Concerning the generated lexer
Projects
None yet
Development

No branches or pull requests

2 participants