Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Csound sometimes segfaults on macOS and Linux with JackoInit and pyinit #1182

Closed
nwhetsell opened this issue Sep 2, 2019 · 11 comments
Closed
Assignees
Labels

Comments

@nwhetsell
Copy link
Contributor

To reproduce on macOS Mojave 10.14.6:

  1. Install Homebrew by following the instructions at https://brew.sh.

  2. Enter in Terminal:

    brew install jack
    brew install --build-from-source csound
    brew services start jack
    cat <<EOS > test.orc
    sr = $(jack_samplerate)
    ksmps = $(jack_bufsize)
    
    JackoInit "default", "csound"
    pyinit
    EOS
  3. Enter in Terminal:

    csound --nosound --orc test.orc

It may require doing step 3 a few times, but Csound eventually segfaults:

0dBFS level = 32768.0
--Csound version 6.13 (double samples) Sep  2 2019
[commit: none]
libsndfile-1.0.28
scoreless operation
orchname:  test.orc
rtaudio: PortAudio module enabled ...
using callback interface
rtmidi: PortMIDI module enabled
Elapsed time at end of orchestra compile: real: 0.002s, CPU: 0.002s
sorting score ...
	... done
Elapsed time at end of score sort: real: 0.002s, CPU: 0.002s
graphics suppressed, ascii substituted
0dBFS level = 32768.0
Created Jack client "csound" for Jack server "default".
Activated Jack client "csound".
orch now loaded
audio buffered in 1024 sample-frame blocks
not writing to sound disk
SECTION 1:
Score finished in csoundPerform().
inactive allocs returned to freespace
end of score.		   overall amps:      0.0
	   overall samples out of range:        0
0 errors in performance
Elapsed time at end of performance: real: 0.603s, CPU: 0.022s
no sound written to disk

csound command: Segmentation fault

To stop JACK, enter in Terminal:

brew services stop jack
@kunstmusik
Copy link
Member

I used jack from homebrew but custom local build. I could reproduce this segfault but building with RelWithDebInfo, I couldn't find any usable information from running with lldb. I looked at the jacko opcode source but it is @gogins C++ code and I don't have mental space to debug (lots of pthread code there). I did see this when I loaded into lldb:

(lldb) target create "csound"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/Cellar/python@2/2.7.16_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 52, in <module>
    import weakref
  File "/usr/local/Cellar/python@2/2.7.16_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/weakref.py", line 14, in <modul
e>
    from _weakref import (
ImportError: cannot import name _remove_dead_weakref
rCurrent executable set to 'csound' (x86_64).```

which I do not know is relevant or not.  

Removing either pyinit or JackoInit, I wasn't able to segfault, so there is something about the two together. 

Also tested on 10.14.6. 

@vlazzarini
Copy link
Member

vlazzarini commented Sep 5, 2019

That lldb warning is nothing to do with Csound, but something with lldb and python,

https://stackoverflow.com/questions/52320028/getting-remove-dead-weakref-error-for-lldb

The jacko code does some unusual things like running Csound performKsmps from inside it and it is hard for us to debug it.

@nwhetsell
Copy link
Contributor Author

nwhetsell commented Sep 5, 2019

Removing either pyinit or JackoInit, I wasn't able to segfault, so there is something about the two together.

I’ve been able to reproduce this by removing JackoInit pyinit and setting 0dbfs to 1—

cat <<EOS > test.orc
sr = $(jack_samplerate)
ksmps = $(jack_bufsize)

0dbfs = 1
JackoInit "default", "csound"
EOS

—but it usually requires running csound --nosound --orc test.orc many more times than with JackoInit and pyinit together.

@nwhetsell
Copy link
Contributor Author

Sorry, that should’ve been “removing pyinit”, not removing JackoInit.

@gogins
Copy link
Contributor

gogins commented Oct 17, 2019

You should not assign this to me unless the problem can be reproduced on Linux, the only operating system that I currently use.

@vlazzarini
Copy link
Member

I am sure it can be reproduced there. Nothing in particular to do with MacOS in this.

@gogins
Copy link
Contributor

gogins commented Oct 17, 2019 via email

@gogins
Copy link
Contributor

gogins commented Oct 20, 2019

With this .orc file on Ubuntu

sr = 48000
ksmps = 128
nchnls = 2

JackoInit "default", "csound"
pyinit

I can reproduce the occasional crash. When I use valgrind with the helgrind tool I see some errors that may be related to the crash. I will investigate further.

mkg@bodhimandala:~/csound-extended/nudruz/examples$ valgrind --tool=helgrind csound --nosound --orc jackoinit-test.orc 
==7712== Helgrind, a thread error detector
==7712== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==7712== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7712== Command: csound --nosound --orc jackoinit-test.orc
==7712== 
0dBFS level = 32768.0
--Csound version 6.14 (double samples) Oct  2 2019
[commit: 3b08a449cc147dec35ba843ced86e0058a9ac865]
libsndfile-1.0.28
WARNING: could not open library '/usr/lib/csound/plugins64-6.0/libfaustcsound.so' (/usr/lib/csound/plugins64-6.0/libfaustcsound.so: undefined symbol: _Z16deleteDSPFactoryP16llvm_dsp_factory)
scoreless operation
orchname:  jackoinit-test.orc
rtaudio: ALSA module enabled
rtmidi: ALSA Raw MIDI module enabled
Elapsed time at end of orchestra compile: real: 0.764s, CPU: 0.758s
sorting score ...
	... done
Elapsed time at end of score sort: real: 0.774s, CPU: 0.768s
graphics suppressed, ascii substituted
0dBFS level = 32768.0
Created Jack client "csound" for Jack server "default".
Activated Jack client "csound".
orch now loaded
audio buffered in 256 sample-frame blocks
not writing to sound disk
SECTION 1:
Score finished in csoundPerform().
inactive allocs returned to freespace
end of score.		   overall amps:      0.0      0.0
	   overall samples out of range:        0        0
0 errors in performance
Elapsed time at end of performance: real: 1.870s, CPU: 1.859s
no sound written to disk
Aeolus: csoundModuleDestroy...
Aeolus: csoundModuleDestroy.
==7712== ---Thread-Announcement------------------------------------------
==7712== 
==7712== Thread #1 is the program's root thread
==7712== 
==7712== ----------------------------------------------------------------
==7712== 
==7712== Thread #1 unlocked a not-locked lock at 0xBDC6D50
==7712==    at 0x4C34534: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7712==    by 0xE124F02: csoundModuleDestroy (signalflowgraph.cpp:1534)
==7712==    by 0x4F9AB54: csoundDestroyModules (csmodule.c:715)
==7712==    by 0x4E85B50: reset (csound.c:3224)
==7712==    by 0x4E87A9F: csoundDestroy (csound.c:1379)
==7712==    by 0x1095AD: main (csound_main.c:329)
==7712==  Lock at 0xBDC6D50 was first observed
==7712==    at 0x4C37F2A: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7712==    by 0x4FA11D4: csoundCreateMutex (threads.c:479)
==7712==    by 0xE1249D7: SignalFlowGraphState (signalflowgraph.cpp:261)
==7712==    by 0xE1249D7: csoundModuleCreate_signalflowgraph (signalflowgraph.cpp:1494)
==7712==    by 0x4F9AED3: csoundLoadExternal (csmodule.c:363)
==7712==    by 0x4F9B741: csoundLoadModules (csmodule.c:535)
==7712==    by 0x4E89972: csoundReset (csound.c:3407)
==7712==    by 0x4E8A140: csoundCreate (csound.c:1315)
==7712==    by 0x109467: main (csound_main.c:320)
==7712==  Address 0xbdc6d50 is 0 bytes inside a block of size 40 alloc'd
==7712==    at 0x4C30F2F: malloc (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7712==    by 0x4FA11C1: csoundCreateMutex (threads.c:477)
==7712==    by 0xE1249D7: SignalFlowGraphState (signalflowgraph.cpp:261)
==7712==    by 0xE1249D7: csoundModuleCreate_signalflowgraph (signalflowgraph.cpp:1494)
==7712==    by 0x4F9AED3: csoundLoadExternal (csmodule.c:363)
==7712==    by 0x4F9B741: csoundLoadModules (csmodule.c:535)
==7712==    by 0x4E89972: csoundReset (csound.c:3407)
==7712==    by 0x4E8A140: csoundCreate (csound.c:1315)
==7712==    by 0x109467: main (csound_main.c:320)
==7712==  Block was alloc'd by thread #1
==7712== 
==7712== 

csound command: Segmentation fault
==7712== ----------------------------------------------------------------
==7712== 
==7712== Thread #1's call to pthread_mutex_destroy failed
==7712==    with error code 16 (EBUSY: Device or resource busy)
==7712==    at 0x4C33EB3: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==7712==    by 0x4FA123D: csoundDestroyMutex (threads.c:534)
==7712==    by 0xE124F0C: csoundModuleDestroy (signalflowgraph.cpp:1535)
==7712==    by 0x4F9AB54: csoundDestroyModules (csmodule.c:715)
==7712==    by 0x4E85B50: reset (csound.c:3224)
==7712==    by 0x4E87A9F: csoundDestroy (csound.c:1379)
==7712==    by 0x1095AD: main (csound_main.c:329)
==7712== 
==7712== 
==7712== For counts of detected and suppressed errors, rerun with: -v
==7712== Use --history-level=approx or =none to gain increased speed, at
==7712== the cost of reduced accuracy of conflicting-access information
==7712== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 202 from 81)

@gogins
Copy link
Contributor

gogins commented Oct 21, 2019

That was in the signal flow graph opcodes, but this may be the real problem, which only is printed when using the drd tool:

==11470== Destroying condition variable that is being waited upon: cond 0xf5795a0, mutex 0xf5795e0 locked by thread 2
==11470==    at 0x401CFD7: munmap (syscall-template.S:78)
==11470==    by 0x40197DC: _dl_unmap_segments (dl-unmap-segments.h:32)
==11470==    by 0x40197DC: _dl_unmap (tlsdesc.c:139)
==11470==    by 0x40161D9: _dl_close_worker.part.0 (dl-close.c:680)
==11470==    by 0x4016AA9: _dl_close_worker (dl-close.c:125)
==11470==    by 0x4016AA9: _dl_close (dl-close.c:842)
==11470==    by 0x56802DE: _dl_catch_exception (dl-error-skeleton.c:196)
==11470==    by 0x568036E: _dl_catch_error (dl-error-skeleton.c:215)
==11470==    by 0x619F734: _dlerror_run (dlerror.c:162)
==11470==    by 0x619F0B2: dlclose (dlclose.c:46)
==11470==    by 0x4FA6B74: csoundDestroyModules (csmodule.c:723)
==11470==    by 0x4E91B80: reset (csound.c:3226)
==11470==    by 0x4E93ACF: csoundDestroy (csound.c:1381)
==11470==    by 0x1097CA: signal_handler (csound_main.c:226)
==11470== mutex 0xf5795e0 was first observed at:
==11470==    at 0x4C385F0: pthread_mutex_init (in /usr/lib/valgrind/vgpreload_drd-amd64-linux.so)
==11470==    by 0xF36BE61: jack_messagebuffer_init (in /usr/lib/x86_64-linux-gnu/libjack.so.0.0.28)
==11470==    by 0xF369007: jack_client_open_aux (in /usr/lib/x86_64-linux-gnu/libjack.so.0.0.28)
==11470==    by 0xF36962E: jack_client_open (in /usr/lib/x86_64-linux-gnu/libjack.so.0.0.28)
==11470==    by 0x1736F42B: ???
==11470==    by 0x4EA91D8: init0 (insert.c:249)
==11470==    by 0x4EAEDFC: musmon (musmon.c:313)
==11470==    by 0x4FA92F3: csoundStart (main.c:548)
==11470==    by 0x10947F: main (csound_main.c:324)
==11470== 

@gogins
Copy link
Contributor

gogins commented Oct 21, 2019

This crash appears to be happening because I did not anticipate the case of calling JackoInit without actually using any other Jacko opcodes.

@gogins gogins changed the title Csound sometimes segfaults on macOS with JackoInit and pyinit Csound sometimes segfaults on macOS and Linux with JackoInit and pyinit Oct 22, 2019
@gogins
Copy link
Contributor

gogins commented Oct 22, 2019

This is now fixed for me.

Opcodes/jacko.cpp:

  1. The Jack process callback was returning a nonzero value even though it was succeeding. This caused Jack to shut down and then Csound also would shut it down. The process callback now always returns 0.
  2. There are two kinds of Jack shutdown required, (1) after performance has ended, and (2) in the csoundModuleDestroy function which is called (as in the test case here) even if there actually is no performance. If the shutdown happens twice, there is a crash. There is now a test for whether shutdown has already happened.
  3. There is a gap between when the client (Csound) shuts down its Jack connection, and when the complete Jack signal audio processing graph shuts down. There was a crash called by trying to read and write Jack audio buffers that had been deallocated by Csound before the Csound performance was completely finished. There is now a test in every Jack input and output opcode at every kperiod for whether the Jack processing has finished, and if so the Jack buffers are not accessed.
  4. Message printing is more regularized.

Tested with Jack and the Aeolus software organ, also with the test case here, and both of these also with Valgrind's helgrind thread safety and data race tool.

Opcodes/signalflowgraph.cpp:

A mutex was unlocked a second time in csoundModuleDestroy, since it has already been unlocked it is simply destroyed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants