Current version of Raptor apparently causes Soprano to segfault: possible to fix? #66

Open
barracuda156 opened this issue Apr 12, 2024 · 13 comments

Comments

@barracuda156

@dajobe Sorry to bother you with this. It may not be a Raptor bug, but Raptor at least appears to be involved.

We are currently unable to build the KDE4 libs because soprano segfaults for an unclear reason, and the logs point at Raptor. Notably, the failure occurs on different macOS versions and different architectures (apparently it works nowhere).
Discussion is here: https://trac.macports.org/ticket/68452

This is what I see in a crash log on a PowerPC:

Process:         onto2vocabularyclass [73240]
Path:            /opt/local/bin/onto2vocabularyclass
Identifier:      onto2vocabularyclass
Version:         ??? (???)
Code Type:       PPC (Native)
Parent Process:  sh [73239]

Date/Time:       2024-04-12 16:26:02.770 +0800
OS Version:      Mac OS X 10.6 (10A190)
Report Version:  6

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00000000c1012beb
Crashed Thread:  0

Thread 0 Crashed:
0   libraptor2.0.dylib            	0x0435746c raptor_free_uri + 48
1   libraptor2.0.dylib            	0x04359604 raptor_free_namespace + 36
2   libraptor2.0.dylib            	0x043596f0 raptor_namespaces_clear + 188
3   libraptor2.0.dylib            	0x0436decc raptor_turtle_parse_terminate + 24
4   libraptor2.0.dylib            	0x043543ec raptor_free_parser + 44
5   libsoprano_raptorparser.so    	0x043489e4 Soprano::Raptor::Parser::parseStream(QTextStream&, QUrl const&, Soprano::RdfSerialization, QString const&) const + 1064

The issue is also confirmed on x86 (the referenced ticket has full logs).

Could this be addressed? It would be really helpful.

@cooljeanius

raptor_free_uri is defined here:

raptor/src/raptor_uri.c

Lines 475 to 505 in 72a8a2d

/**
 * raptor_free_uri:
 * @uri: URI to destroy
 *
 * Destructor - destroy a #raptor_uri object
 **/
void
raptor_free_uri(raptor_uri *uri)
{
  if(!uri)
    return;
  uri->usage--;
#if defined(RAPTOR_DEBUG) && RAPTOR_DEBUG > 1
  RAPTOR_DEBUG3("URI %s usage count now %d\n", uri->string, uri->usage);
#endif
  /* decrement usage, don't free if not 0 yet*/
  if(uri->usage > 0) {
    return;
  }
  /* this does not free the uri */
  if(uri->world->uris_tree)
    raptor_avltree_delete(uri->world->uris_tree, uri);
  if(uri->string)
    RAPTOR_FREE(char*, uri->string);
  RAPTOR_FREE(raptor_uri, uri);
}

I see there's a check to guard against the uri pointer being null, but it looks like one of its subpointers, uri->world, could still be null, though... what happens if you add a check to ensure uri->world isn't null before dereferencing it, does that fix it?
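For illustration only, the kind of guard being suggested might look like the sketch below. It uses hypothetical stand-in types (demo_world, demo_uri) rather than raptor's real definitions, and it is not the content of any actual patch; the live_uris counter is an assumption standing in for the world's URI tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for raptor's internal types; not the real definitions. */
typedef struct demo_world {
  int live_uris;        /* assumed bookkeeping, in place of the URI tree */
} demo_world;

typedef struct demo_uri {
  demo_world *world;    /* may be NULL in this sketch */
  char *string;
  int usage;            /* reference count */
} demo_uri;

/* Allocate a URI object with one reference; world may be NULL. */
static demo_uri *demo_new_uri(demo_world *world, const char *s)
{
  demo_uri *uri = calloc(1, sizeof(*uri));
  if(!uri)
    return NULL;
  uri->string = malloc(strlen(s) + 1);
  if(!uri->string) {
    free(uri);
    return NULL;
  }
  strcpy(uri->string, s);
  uri->world = world;
  uri->usage = 1;
  if(world)
    world->live_uris++;
  return uri;
}

/* Drop one reference. Returns 1 if the object was destroyed, 0 otherwise. */
static int demo_free_uri(demo_uri *uri)
{
  if(!uri)
    return 0;
  if(--uri->usage > 0)
    return 0;
  /* The extra guard: only touch world state when world is non-NULL. */
  if(uri->world)
    uri->world->live_uris--;
  free(uri->string);
  free(uri);
  return 1;
}
```

A guard like this only covers a NULL world pointer. It cannot detect a uri pointer that has already been freed, because a dangling pointer is non-NULL and reading through it is undefined behaviour.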

@dajobe
Owner

dajobe commented Apr 12, 2024

There's not enough information here to help.

My guess is that it's how raptor's functions are being called. Since raptor isn't written in a reference-counted language with garbage collection, it can't guard against use-after-free if the caller frees an object twice.
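As a minimal sketch of that point, assuming a manually reference-counted object similar in spirit to the usage counter in raptor_free_uri: every holder has to take its own reference and release it exactly once. The names here (demo_obj, demo_copy, demo_free, demo_two_holders) are hypothetical and not part of raptor's API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a raptor type. */
typedef struct demo_obj {
  int usage;
} demo_obj;

static int demo_destroyed = 0;   /* counts real deallocations */

/* Create an object owned by one holder. */
static demo_obj *demo_new(void)
{
  demo_obj *o = calloc(1, sizeof(*o));
  if(o)
    o->usage = 1;
  return o;
}

/* A new holder takes its own reference. */
static demo_obj *demo_copy(demo_obj *o)
{
  if(o)
    o->usage++;
  return o;
}

/* Release one reference; the memory goes away only at zero. */
static void demo_free(demo_obj *o)
{
  if(!o)
    return;
  assert(o->usage > 0);
  if(--o->usage > 0)
    return;
  demo_destroyed++;
  free(o);
}

/* Two holders, two references, two releases, one deallocation.
 * Returns the number of deallocations, or -1 on error. */
static int demo_two_holders(void)
{
  demo_obj *first = demo_new();
  if(!first)
    return -1;
  demo_obj *second = demo_copy(first);  /* usage is now 2 */
  demo_free(first);                     /* usage 2 -> 1, still alive */
  int alive = (second->usage == 1);
  demo_free(second);                    /* usage 1 -> 0, destroyed */
  return alive ? demo_destroyed : -1;
}
```

If the second holder had been set with a plain assignment (second = first) instead of demo_copy, the second demo_free would read the counter of an already freed object. The library has no way to notice that from the inside, which is why the pairing is the caller's responsibility.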

I would suggest building and running your app with something like Valgrind or clang's ASan to see if there are such issues.

I have tested raptor release code that way and in other ways, such as with coverity.

cooljeanius added a commit to cooljeanius/raptor2 that referenced this issue Apr 12, 2024
check for potential null pointer dereference; see dajobe#66
(untested)
@barracuda156
Author

@dajobe By the way, since the problem happens on Intel too, maybe you could try sudo port -v build kdelibs4 in MacPorts?
It should download everything prebuilt up to that point, so it will not waste time on compilation.

@barracuda156
Author

@cooljeanius Does it solve the issue on x86 for you?
I can try on PowerPC, of course.

@cooljeanius

@cooljeanius Does it solve the issue on x86 for you? I can try on PowerPC, of course.

You mean cooljeanius/raptor2@85fc5b7? I haven't tested it yet, which is why I haven't submitted a PR for it yet...

@kencu

kencu commented Apr 17, 2024

I did try @cooljeanius's patch. With it, the error no longer occurs in the raptor code, but an error still occurs later in the system.

I rebuilt raptor with clang's AddressSanitizer enabled, and I believe the report (a heap-use-after-free in raptor_free_uri) shows that a double free is happening, as suspected. I'm not 100% sure why or how to fix it at the moment. Perhaps someone familiar with how raptor works, like @dajobe, might see the issue?

$  DYLD_INSERT_LIBRARIES=/Library/Developer/CommandLineTools/usr/lib/clang/8.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib /opt/local/bin/onto2vocabularyclass --name TMO --encoding trig --namespace Nepomuk::Vocabulary --export-module nepomuk /opt/local/share/ontology/pimo/tmo.trig
=================================================================
==54557==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000076d04 at pc 0x000111b9b49f bp 0x7fff53c78000 sp 0x7fff53c77ff8
READ of size 4 at 0x603000076d04 thread T0
    #0 0x111b9b49e in raptor_free_uri raptor_uri.c:487
    #1 0x111ba3189 in raptor_free_namespace raptor_namespace.c:688
    #2 0x111ba2c38 in raptor_namespaces_clear raptor_namespace.c:303
    #3 0x111c28dfb in raptor_turtle_parse_terminate .turtle_parser.y:1535
    #4 0x111b87883 in raptor_free_parser raptor_parse.c:500
    #5 0x111b75eec in Soprano::Raptor::Parser::parseStream(QTextStream&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:271
    #6 0x111b75505 in Soprano::Raptor::Parser::parseFile(QString const&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:200
    #7 0x111b7576a in non-virtual thunk to Soprano::Raptor::Parser::parseFile(QString const&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:192
    #8 0x10bf8b0b1 in main onto2vocabularyclass.cpp:275
    #9 0x7fff8f6925ac in start (libdyld.dylib+0x35ac)

0x603000076d04 is located 20 bytes inside of 24-byte region [0x603000076cf0,0x603000076d08)
freed by thread T0 here:
    #0 0x10bff4db9 in wrap_free (libclang_rt.asan_osx_dynamic.dylib+0x4adb9)
    #1 0x111b9b688 in raptor_free_uri raptor_uri.c:504
    #2 0x111c2541e in yydestruct .turtle_parser.y:203
    #3 0x111c23879 in turtle_parser_parse turtle_parser.c:3178
    #4 0x111c2a8e7 in turtle_parse .turtle_parser.y:1430
    #5 0x111c29d1e in raptor_turtle_parse_chunk .turtle_parser.y:1750
    #6 0x111b897e7 in raptor_parser_parse_chunk raptor_parse.c:482
    #7 0x111b75ed8 in Soprano::Raptor::Parser::parseStream(QTextStream&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:270
    #8 0x111b75505 in Soprano::Raptor::Parser::parseFile(QString const&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:200
    #9 0x111b7576a in non-virtual thunk to Soprano::Raptor::Parser::parseFile(QString const&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:192
    #10 0x10bf8b0b1 in main onto2vocabularyclass.cpp:275
    #11 0x7fff8f6925ac in start (libdyld.dylib+0x35ac)

previously allocated by thread T0 here:
    #0 0x10bff5157 in wrap_calloc (libclang_rt.asan_osx_dynamic.dylib+0x4b157)
    #1 0x111b99135 in raptor_new_uri_from_counted_string raptor_uri.c:150
    #2 0x111b99a08 in raptor_new_uri_relative_to_base_counted raptor_uri.c:302
    #3 0x111b99b48 in raptor_new_uri_relative_to_base raptor_uri.c:325
    #4 0x111c100d7 in turtle_lexer_lex .turtle_lexer.l:514
    #5 0x111c1b674 in turtle_parser_parse turtle_parser.c:1680
    #6 0x111c2a8e7 in turtle_parse .turtle_parser.y:1430
    #7 0x111c29d1e in raptor_turtle_parse_chunk .turtle_parser.y:1750
    #8 0x111b897e7 in raptor_parser_parse_chunk raptor_parse.c:482
    #9 0x111b75ea4 in Soprano::Raptor::Parser::parseStream(QTextStream&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:267
    #10 0x111b75505 in Soprano::Raptor::Parser::parseFile(QString const&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:200
    #11 0x111b7576a in non-virtual thunk to Soprano::Raptor::Parser::parseFile(QString const&, QUrl const&, Soprano::RdfSerialization, QString const&) const raptorparser.cpp:192
    #12 0x10bf8b0b1 in main onto2vocabularyclass.cpp:275
    #13 0x7fff8f6925ac in start (libdyld.dylib+0x35ac)

SUMMARY: AddressSanitizer: heap-use-after-free raptor_uri.c:487 in raptor_free_uri
Shadow bytes around the buggy address:
  0x1c060000ed50: fd fd fd fa fa fa fd fd fd fd fa fa fd fd fd fa
  0x1c060000ed60: fa fa fd fd fd fd fa fa fd fd fd fa fa fa fd fd
  0x1c060000ed70: fd fd fa fa fd fd fd fa fa fa fd fd fd fd fa fa
  0x1c060000ed80: fd fd fd fd fa fa fd fd fd fa fa fa fd fd fd fd
  0x1c060000ed90: fa fa fd fd fd fa fa fa fd fd fd fd fa fa fd fd
=>0x1c060000eda0:[fd]fa fa fa fd fd fd fd fa fa fd fd fd fa fa fa
  0x1c060000edb0: fd fd fd fd fa fa fd fd fd fa fa fa fd fd fd fd
  0x1c060000edc0: fa fa fd fd fd fa fa fa fd fd fd fd fa fa fd fd
  0x1c060000edd0: fd fa fa fa fd fd fd fd fa fa fd fd fd fa fa fa
  0x1c060000ede0: fd fd fd fd fa fa 00 00 00 fa fa fa fd fd fd fd
  0x1c060000edf0: fa fa fd fd fd fa fa fa fd fd fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==54557==ABORTING
Abort trap: 6

@kencu

kencu commented Apr 17, 2024

This appears to be the spot in soprano where these calls emanate from:

https://invent.kde.org/unmaintained/soprano/-/blob/master/parsers/raptor/raptorparser.cpp?ref_type=heads#L271

   // if possible let raptor do the decoding
    if ( QIODevice* dev = stream.device() ) {
        QByteArray buf( bufSize, 0 );
        while ( !dev->atEnd() ) {
            qint64 r = dev->read( buf.data(), buf.size() );
            if ( r <= 0 ||
                 raptor_parser_parse_chunk( parser, ( const unsigned char* )buf.data(), r, 0 ) ) {
                // parse_chunck return failure code.
                // Call it with END=true and then free
                raptor_parser_parse_chunk(parser,0,0,/*END=*/1);
                raptor_free_parser( parser );
                if ( raptorBaseUri ) {
                    raptor_free_uri( raptorBaseUri );
                }
                return StatementIterator();
            }
        }

perhaps "parser" needs to be tested as non-NULL prior to this call?

raptor_free_parser( parser );
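One way to reason about the caller side is to give the parse loop a single cleanup path, so that the parser is released exactly once whether or not a chunk fails. The sketch below uses a hypothetical mock parser (demo_parser and the demo_* functions) in place of the real raptor calls; it only assumes, as in the quoted code, that the chunk function returns non-zero on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mock parser; stands in for the real parser object. */
typedef struct demo_parser {
  int fail_on_chunk;   /* 1-based chunk index that fails; 0 = never */
  int chunks_seen;
} demo_parser;

static int demo_parsers_freed = 0;   /* counts deallocations */

static demo_parser *demo_new_parser(int fail_on_chunk)
{
  demo_parser *p = calloc(1, sizeof(*p));
  if(p)
    p->fail_on_chunk = fail_on_chunk;
  return p;
}

/* Returns non-zero on failure, like the call in the quoted code. */
static int demo_parse_chunk(demo_parser *p, const unsigned char *buf,
                            size_t len, int is_end)
{
  (void)buf;
  (void)len;
  if(is_end)
    return 0;
  p->chunks_seen++;
  return p->chunks_seen == p->fail_on_chunk;
}

static void demo_free_parser(demo_parser *p)
{
  if(!p)
    return;
  demo_parsers_freed++;
  free(p);
}

/* Feed n_chunks dummy chunks. Returns 0 on success, 1 on failure. */
static int demo_parse_all(int n_chunks, int fail_on_chunk)
{
  static const unsigned char chunk[4] = { 'd', 'a', 't', 'a' };
  int rc = 0;
  demo_parser *parser = demo_new_parser(fail_on_chunk);
  if(!parser)
    return 1;

  for(int i = 0; i < n_chunks; i++) {
    if(demo_parse_chunk(parser, chunk, sizeof(chunk), 0)) {
      rc = 1;   /* remember the failure, then fall through to cleanup */
      break;
    }
  }

  /* Single cleanup path: signal end of input, then free exactly once. */
  demo_parse_chunk(parser, NULL, 0, 1);
  demo_free_parser(parser);
  parser = NULL;   /* a later NULL test would now be meaningful */
  return rc;
}
```

Note that a NULL test before the free only helps if the pointer is reset after it has been released. A pointer that still holds the old address passes the test even though the object is gone.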

@kencu

kencu commented Apr 17, 2024

testing for null didn't fix the issue, but removing the line did, right or wrong.

that then leads to this error:

$  DYLD_INSERT_LIBRARIES=/Library/Developer/CommandLineTools/usr/lib/clang/8.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib /opt/local/bin/onto2vocabularyclass --name TMO --encoding trig --namespace Nepomuk::Vocabulary --export-module nepomuk /opt/local/share/ontology/pimo/tmo.trig
Failed to parse file/opt/local/share/ontology/pimo/tmo.trig(Parsing failed (3): syntax error, unexpected end of file, expecting } (line: 75, column: -1))

which may indicate what the problem really is...

@dajobe
Owner

dajobe commented Apr 17, 2024

If you can demonstrate this issue/crash with the latest release build of raptor and the 'rapper' utility, then I can probably look deeper.

FWIW I only have amd64, aarch64 (linux) / arm64 (darwin), armv7l, riscv arches here to test.

@barracuda156
Author

FWIW I only have amd64, aarch64 (linux) / arm64 (darwin), armv7l, riscv arches here to test.

The issue is present on x86_64, AFAICT.

@kencu

kencu commented Apr 19, 2024

I don't think this issue has anything to do with raptor really -- although I guess ideally it shouldn't crash when given bogus data, that is not raptor's fault.

The main problem is most likely soprano, which is ancient, out-of-date, unsupported upstream, and horribly fails its test suite when that is attempted.

@barracuda156
Author

I don't think this issue has anything to do with raptor really -- although I guess ideally it shouldn't crash when given bogus data, that is not raptor's fault.

The main problem is most likely soprano, which is ancient, out-of-date, unsupported upstream, and horribly fails its test suite when that is attempted.

@kencu But it presumably worked at some point, right? At least in the sense of the KDE4 ports building and working with it (not necessarily passing its own test suite).
What has changed to break it? If soprano has introduced some bug, we can roll back, of course, or fix related code in it.

@kencu

kencu commented Apr 20, 2024

Yes, soprano worked last year when I built kdelibs4 on Sonoma. Something has broken it since then.

not raptor, though. Let's leave this man in peace.
